Making it easier to play my Tic-Tac-Toe agent
Background
As part of my senior project in undergrad, I made a Tic-Tac-Toe-playing reinforcement learning (RL) agent. It uses the simple temporal difference (TD) update rule from Chapter 1 of Sutton and Barto's RL book, Reinforcement Learning: An Introduction. In fact, every detail needed to build the agent is in that chapter: it walks through a Tic-Tac-Toe case study and covers everything from what the environment must provide to the agent's update rule. Definitely worth a read, especially if you want to implement one yourself.
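For readers who haven't seen it, the Chapter 1 rule nudges the value of the earlier state toward the value of the state that follows it: V(s) ← V(s) + α[V(s') − V(s)]. Here is a minimal Python sketch of that rule, not the code from my repository. The function name `td_update`, the string state labels, and the step size of 0.1 are all made up for illustration; the 0.5 starting value and the 1.0 value for a winning state follow the book's setup.

```python
from collections import defaultdict


def td_update(values, state, next_state, alpha=0.1):
    """Apply V(s) <- V(s) + alpha * (V(s') - V(s)) and return the new V(s).

    `values` maps a hashable state to its estimated probability of winning.
    `alpha` is the step size (0.1 is an arbitrary illustrative choice).
    """
    values[state] += alpha * (values[next_state] - values[state])
    return values[state]


# Unseen states start at 0.5, as in the book's case study.
values = defaultdict(lambda: 0.5)

# Hypothetical state labels; a real agent would key on the board itself.
values["win"] = 1.0

# Moving from "before_win" into "win" pulls the earlier estimate upward.
new_value = td_update(values, "before_win", "win")
print(round(new_value, 2))
```

Repeating this update over many games is what gradually moves each state's value toward its true chance of leading to a win.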
The Update
A friend asked me how he could play against my trained agent, so I wrote a simple script that makes it easy for anyone (with a tiny bit of terminal knowledge) to do so. Here’s how:
Head over to my GitHub repository and clone it:
git clone https://github.com/jfpettit/senior-practicum.git
Once you’ve cloned it, cd into the repository and then into the Tic-Tac-Toe folder:
cd senior-practicum/TD_tictactoe/
Finally, run the game with:
python tictactoe_runner.py
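Here’s a sample of a game I played against it, so you know what kind of output to expect in your terminal. Moves are entered as row,column, both counted from zero, so 1,1 is the center square and 2,0 is the bottom-left corner: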
Jacobs-MacBook-Pro:TD_TicTacToe jacobpettit$ python tictactoe_runner.py
Select piece to play as: input X or O:x
[['-' '-' '-']
['-' '-' '-']
['-' '-' '-']]
Input your move coordinates, separated by a comma: 1,1
[['-' '-' 'O']
['-' 'X' '-']
['-' '-' '-']]
Input your move coordinates, separated by a comma: 2,0
[['-' '-' 'O']
['-' 'X' '-']
['X' '-' 'O']]
Input your move coordinates, separated by a comma: 1,2
[['-' '-' 'O']
['O' 'X' 'X']
['X' '-' 'O']]
Input your move coordinates, separated by a comma: 2,1
[['-' 'O' 'O']
['O' 'X' 'X']
['X' 'X' 'O']]
Input your move coordinates, separated by a comma: 0,0
[['X' 'O' 'O']
['O' 'X' 'X']
['X' 'X' 'O']]
So, in this case, nobody won. The code doesn’t print out the winner of the game, so don’t expect any output after that last move.
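If you want to check the result yourself, something like the following would do it. This is a sketch, not part of the runner script: the function name `game_result` is hypothetical, and it assumes the board is a 3x3 grid of 'X', 'O', and '-' strings like the ones printed above.

```python
def game_result(board):
    """Return 'X' or 'O' for a winner, 'draw' for a full board, else None."""
    # Collect all eight lines: three rows, three columns, two diagonals.
    lines = [list(row) for row in board]
    lines += [list(col) for col in zip(*board)]
    lines.append([board[i][i] for i in range(3)])
    lines.append([board[i][2 - i] for i in range(3)])

    for line in lines:
        # A line wins when all three cells hold the same non-empty piece.
        if line[0] != "-" and line.count(line[0]) == 3:
            return line[0]

    # No winner: the game is a draw only if no empty squares remain.
    if all(cell != "-" for row in board for cell in row):
        return "draw"
    return None


# The final board from the sample game above.
final_board = [
    ["X", "O", "O"],
    ["O", "X", "X"],
    ["X", "X", "O"],
]
print(game_result(final_board))  # prints "draw"
```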