Making it easier to play my Tic-Tac-Toe agent

Background

As part of my undergraduate senior project, I built a Tic-Tac-Toe-playing RL agent. It used the simple temporal-difference (TD) update rule from Chapter 1 of Sutton and Barto's RL book, Reinforcement Learning: An Introduction. In fact, every detail needed to build the agent is in that chapter: the authors work through a Tic-Tac-Toe case study that covers everything from what the environment must provide to the agent's update rule. It's definitely worth a read, especially if you want to implement one yourself.
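For reference, the Chapter 1 rule nudges the estimated value of the state before a greedy move toward the value of the state after it: V(s) ← V(s) + α[V(s′) − V(s)], where α is a small step-size parameter. The sketch below is a minimal illustration, not the code from my repository. It assumes boards are stored as 9-character strings, uses a hypothetical td_update helper and an arbitrary α of 0.1, and, following the book's case study, starts unseen states at 0.5 (the estimated probability of winning) with wins pinned to 1.

```python
from collections import defaultdict

# Hypothetical step size; the book treats alpha as a small positive parameter.
ALPHA = 0.1

# Estimated probability that X (our agent) wins from each state. States are
# assumed to be 9-character strings read row by row, '-' for an empty square.
# Unseen states start at 0.5, as in the Chapter 1 case study.
values = defaultdict(lambda: 0.5)


def td_update(values, state, next_state, alpha=ALPHA):
    """Move V(state) a fraction alpha of the way toward V(next_state)."""
    values[state] += alpha * (values[next_state] - values[state])
    return values[state]


# Winning states are pinned to 1.0 (losses and full-board draws would be 0.0).
values["XXXOO----"] = 1.0

# A greedy move from "XX-OO----" reached that win, so its estimate rises.
td_update(values, "XX-OO----", "XXXOO----")
print(values["XX-OO----"])  # 0.55
```

In the book, only greedy moves trigger this update; exploratory moves are made without learning from them.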

The Update

A friend asked how he could play against my trained agent, so I wrote a simple script that lets anyone with a little terminal knowledge play against it. Here's how:

Head over to my GitHub repository and clone it:

git clone https://github.com/jfpettit/senior-practicum.git

Once you've cloned it, cd into the repository and then into the Tic-Tac-Toe folder:

cd senior-practicum/TD_tictactoe/

Finally, run the game with:

python tictactoe_runner.py

Here's a sample game I played, so you know what output to expect in your terminal:

Jacobs-MacBook-Pro:TD_TicTacToe jacobpettit$ python tictactoe_runner.py 
Select piece to play as: input X or O:x
[['-' '-' '-']
 ['-' '-' '-']
 ['-' '-' '-']]
Input your move coordinates, separated by a comma: 1,1
[['-' '-' 'O']
 ['-' 'X' '-']
 ['-' '-' '-']]
Input your move coordinates, separated by a comma: 2,0
[['-' '-' 'O']
 ['-' 'X' '-']
 ['X' '-' 'O']]
Input your move coordinates, separated by a comma: 1,2
[['-' '-' 'O']
 ['O' 'X' 'X']
 ['X' '-' 'O']]
Input your move coordinates, separated by a comma: 2,1
[['-' 'O' 'O']
 ['O' 'X' 'X']
 ['X' 'X' 'O']]
Input your move coordinates, separated by a comma: 0,0
[['X' 'O' 'O']
 ['O' 'X' 'X']
 ['X' 'X' 'O']]

In this case, the game ended in a draw. The script doesn't announce the winner, so don't expect any output after the last move.
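Since the runner doesn't report the result, here is a minimal sketch of how you could check it yourself. It assumes, based on the sample output, that coordinates are zero-indexed row,column pairs (1,1 is the center, 2,0 the bottom-left corner) and that '-' marks an empty square. The helpers parse_move, winner, and outcome are hypothetical and are not code from the senior-practicum repository. Replaying the sample game, with the agent's replies read off the printed boards, confirms the draw.

```python
from typing import List, Optional, Tuple

Board = List[List[str]]
EMPTY = "-"  # assumed empty-square marker, matching the printed boards


def parse_move(text: str) -> Tuple[int, int]:
    """Parse 'row,col' (zero-indexed, as in the sample game) into a pair."""
    row_str, col_str = text.split(",")
    row, col = int(row_str), int(col_str)  # int() tolerates surrounding spaces
    if not (0 <= row < 3 and 0 <= col < 3):
        raise ValueError(f"coordinates out of range: {text!r}")
    return row, col


def winner(board: Board) -> Optional[str]:
    """Return 'X' or 'O' if that piece fills a row, column, or diagonal, else None."""
    lines = [list(row) for row in board]                          # rows
    lines += [[board[r][c] for r in range(3)] for c in range(3)]  # columns
    lines.append([board[i][i] for i in range(3)])                 # main diagonal
    lines.append([board[i][2 - i] for i in range(3)])             # anti-diagonal
    for line in lines:
        if line[0] != EMPTY and line.count(line[0]) == 3:
            return line[0]
    return None


def outcome(board: Board) -> str:
    """Describe the board as 'X wins', 'O wins', 'draw', or 'in progress'."""
    w = winner(board)
    if w is not None:
        return f"{w} wins"
    if all(cell != EMPTY for row in board for cell in row):
        return "draw"
    return "in progress"


# Replay the sample game: my moves as X, the agent's replies as O.
moves = [("1,1", "X"), ("0,2", "O"), ("2,0", "X"), ("2,2", "O"),
         ("1,2", "X"), ("1,0", "O"), ("2,1", "X"), ("0,1", "O"), ("0,0", "X")]
board = [[EMPTY] * 3 for _ in range(3)]
for text, piece in moves:
    row, col = parse_move(text)
    if board[row][col] != EMPTY:
        raise ValueError(f"square already taken: {text!r}")
    board[row][col] = piece

print(outcome(board))  # draw
```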
