Jacob Pettit

My blog about learning machine learning and other cool, science-y things.

Read this first

Beginner friendly reinforcement learning with rlpack

Trained PPO agent playing LunarLander

Lately, I’ve been working on learning more about deep reinforcement learning and decided to start writing my own RL framework as a way to get really familiar with some of the algorithms. In the process, I also thought it could be cool to make my framework a resource for beginners to easily get started with reinforcement learning. With that in mind, the goal wasn’t state of the art performance. However, in the future, I might decide that a high standard of performance is something I want to prioritize over beginner-friendliness.

This framework isn’t finished, but this post indicates the first version of it that I’m happy with being done. Up until now, it’s been the main project I’ve focused on, but moving forward I’ll be putting more time into exploring different things and will more passively work on expanding rlpack. Find the repository here or...

Continue reading →

Making it easier to play my Tic-Tac-Toe agent


As part of my senior project in undergrad, I made a Tic-tac-toe playing RL agent. It used a simple temporal difference (TD) update rule that can be found in Chapter 1 of Sutton and Barto’s RL book. In fact, all of the details for how to build the agent can be found in Chapter 1 of that book. They do a case study of making a Tic-Tac-Toe playing algorithm and cover everything from what’s needed from the environment, to the update rule for the agent. Definitely worth a read, especially if you want to implement one yourself.

The Update

A friend asked me how he could play my trained agent, so I chose to go ahead and write a simple script to make it easy for anyone (with a tiny bit of terminal knowledge) to play against it. Here’s how to do it:

Head over to my GitHub repository and clone it:

git clone https://github.com/jfpettit/senior-practicum.git

Once you’ve cloned it, go...

Continue reading →

Introducing gym-snake-rl



Although there are existing implementations of the classic Snake game (play the game here), I wanted to create my own implementation for a few reasons:

  • Opportunity to learn more about environment design, including designing an observation space and reward signals.
  • Write an implementation with random map generation, so that this code could be used to work on generalization in RL. See OpenAI’s blog post on this topic.
  • Create snake as a multi-agent system, and create versions of the environment where there are fewer units of food than there are snakes, so that we can investigate what competitive behavior evolves.
  • Implement a vectorized observation space for the snake game, in an attempt to require less computational power than games that only provide the screen images as observations. For example, CoinRun, OpenAI’s procedurally generated environment for working on...

Continue reading →