Introducing gym-snake-rl
Motivation #
Although there are existing implementations of the classic Snake game (play the game here), I wanted to create my own for a few reasons:
- Opportunity to learn more about environment design, including designing an observation space and reward signals.
- Write an implementation with random map generation, so that this code could be used to work on generalization in RL. See OpenAI’s blog post on this topic.
- Create snake as a multi-agent system, and create versions of the environment where there are fewer units of food than there are snakes, so that we can investigate what competitive behavior evolves.
- Implement a vectorized observation space for the snake game, in an attempt to require less computational power than games that only provide screen images as observations. For example, CoinRun, OpenAI’s procedurally generated environment for working on generalization in RL, only gives screen images as input; it was also fickle and difficult to set up when I tried to use it.
Getting up and running #
Installing and using my code is very simple. Start by cloning my GitHub repository:
git clone https://github.com/jfpettit/gym-snake-rl.git
Then, run the following:
pip install -e gym-snake-rl
Now you’re able to use the environment in your Python code like any other Gym environment:
import gym
import gym_snake_rl
env = gym.make('BasicSnake-small_vector-16-v0')
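Once made, the environment follows the standard Gym API, so a quick sanity check is a random-action rollout. The reset/step calls below are the usual Gym ones, nothing specific to this package:
# Continuing from the snippet above: roll out one episode with random actions.
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    # env.render()  # uncomment to watch; rendering support may vary by observation type
env.close()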
Training our first agent #
This step is optional, but if you’d like, you can use OpenAI’s Baselines to train from the command line. You can do this with the following:
Make sure you’ve installed gym-snake-rl, then install Baselines:
git clone https://github.com/openai/baselines.git
cd baselines
Now, ensure you have TensorFlow installed:
pip install tensorflow
or
pip install tensorflow-gpu # if you have a GPU that works with CUDA and have it set up with the appropriate drivers
Finally:
pip install -e .
That should set you up with Baselines installed. If you have an error or need more clarification on installation, look on OpenAI’s repository, which I’ve linked just above.
Now we can train an agent in our snake environment with the following command:
python -m baselines.run --alg=ppo2 --env=gym_snake_rl:BasicSnake-small_vector-16-v0 --num_timesteps=5e5 --save_path=~/basicsnake_smallvec_ppo2
A table of diagnostics will be printed to the terminal and will update periodically throughout training. It takes about 15 minutes to train on my computer, a mid-2012 MacBook Pro.
Once training has completed, you can watch your agent playing the game with:
python -m baselines.run --alg=ppo2 --env=gym_snake_rl:BasicSnake-small_vector-16-v0 --num_timesteps=0 --load_path=~/basicsnake_smallvec_ppo2 --play
Now that you’re up and running, let’s look at some of the finer points of the code I wrote.
Multi-agent Snake #
First off, disclaimer: multi-agent snake is still very much in beta. I guess the whole project is in beta, but multi-agent snake is particularly so.
When you are using the multi-agent snake environment, observations will be output as a tuple of tuples. So, you’ll need to make sure to feed each of your agents the appropriate tuple of observations. There are variants of multi-agent snake written with 2, 3, and 10 snakes already. You’ll be able to either use Gym to make one of those pre-built environment configurations, or you can take a look at my code and directly set up your own environment configuration. Each of the 2, 3, and 10 snake variants also have sub-variants; one with one food per snake (so 3 snakes, 3 foods) and one with one food total (so 10 snakes, 1 food). Hopefully the setups with fewer foods than snakes will enable some research into the evolution of competitive behavior between agents when they are placed in every-agent-for-themself type situations.
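To give a sense of how you might route those nested observations, here’s a rough sketch. The environment id below is a placeholder (check the registered names in the package’s __init__.py), and passing actions back as a tuple is my assumption, so adapt it to the actual multi-agent API:
import gym
import gym_snake_rl

# NOTE: placeholder id; look up the registered multi-agent names in the
# package's __init__.py before running this.
env = gym.make('MultiSnake-small_vector-16-v0')

obs = env.reset()  # a tuple of per-snake observation tuples

# Trivial stand-in policy: every snake samples a random action. In practice you
# would hand each snake's observation tuple to its own agent here. Whether the
# actions go back as a tuple is an assumption; adapt to the environment's API.
actions = tuple(env.action_space.sample() for _ in obs)
obs, rewards, done, info = env.step(actions)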
Random map generation #
An agent can be trained on randomly generated maps using the exact same command as above, but instead of setting --env=gym_snake_rl:BasicSnake-small_vector-16-v0, set --env=gym_snake_rl:ObstacleBasicSnake-small_vector-16-v0. Mine didn’t learn much besides to be afraid of the walls.
The code to randomly generate maps isn’t very sophisticated. As such, I don’t recommend placing more obstacles than about a third of the length of your longest map side. So, if you’re on a 16 by 16 grid, you probably shouldn’t request more than 5 or 6 obstacles. You only need to worry about this if you’re not using one of the prebuilt configurations; each of the prebuilts is conservative in the number of obstacles it selects in order to avoid any issues popping up.
What issues? #
Well, when setting a really large number of obstacles, I’d end up with pockets of open map space completely encircled by walls and obstacles, so the snake couldn’t get to them. This is an issue because the food placement algorithm randomly places food anywhere that isn’t a wall or a snake, so the food could end up somewhere the snake can’t reach. Here’s an example of what I’m talking about:
This picture was produced by using a 25 by 25 map and placing 50 obstacles on it. You can see that the snake (green) is in a completely different pocket of the map than the food (red), and so will never be able to reach it. Luckily, the map regenerates each episode, but it’s probably still best to avoid this kind of situation during training. It’s still possible for this issue to occur with a smaller number of obstacles; however, I found my code to be quite robust when requesting no more than about a third of the longest map side’s length in obstacles. So in this case, that would be num_obstacles=8.
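That rule of thumb is easy to encode if you want a quick sanity check for a custom configuration. This is just my heuristic, not something the library enforces:
# My rule of thumb, not something the code enforces: keep the obstacle count
# under roughly a third of the longest map side.
def max_recommended_obstacles(height, width):
    return max(height, width) // 3

print(max_recommended_obstacles(16, 16))  # 5
print(max_recommended_obstacles(25, 25))  # 8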
Sidenote #
If you’re using the small_vector observation setup, it is necessary to use a map with walls. We’ll get into this more now, as we cover the different observation types.
Observation types #
This environment has 5 different observation types:
- raw: This observation type is the raw numpy array your agent is operating in. One channel, with values ranging from 0 to 255.
- rgb: This is an RGB image of your map. So three color channels, with colors in the typical RGB range.
- rgb5: This is the RGB image, but zoomed in a few pixels.
- small_vector: Our first vectorized state representation. This includes a flattened vector of the 8 squares surrounding the snake’s head and the square the snake’s head is on (so it would look something like (0, 0, 0, 0, 100, 0, 0, 0, 0), where 100 is where the snake’s head is). Also included in the full observation vector are the length of the snake (an integer) and the Euclidean distance from each food on the map.
- big_vector: This is simply the raw numpy array of the map, flattened. On the end of the observation vector are the length of the snake and the Euclidean distance from each piece of food on the map.
It is necessary to use a map with walls when you’re using the small_vector observation type because the observation is gathered by indexing into the array at each of the 8 squares around the snake’s head. If you do this when there are no walls, you are all but guaranteed to hit an error from trying to access an index outside of the array. I didn’t write a better fix for this because it didn’t seem very important.
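To make the failure mode concrete, here’s a minimal sketch of that kind of neighborhood lookup (my own illustration, not the library’s actual code):
import numpy as np

# Minimal sketch of the idea: collect the values of the 8 cells around the
# snake's head plus the head cell itself.
grid = np.zeros((16, 16), dtype=np.uint8)
head_row, head_col = 7, 7  # head safely in the interior of the map

cells = [grid[head_row + dr, head_col + dc]
         for dr in (-1, 0, 1)
         for dc in (-1, 0, 1)]

# If the map had no walls and the head sat on an edge (say head_row = 15), then
# head_row + 1 would be 16, which falls outside the 16x16 array and raises an
# IndexError. A border of wall cells keeps the head at least one cell inside
# the array, so every index above stays valid.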
Reward functions #
My bit of googling around about reward functions in Snake indicates that the typical reward function is +1 for eating food, -1 for dying, and 0 for everything else. This code uses that reward function as the default. However, there is a second option: a reward function that implements potential-based reward shaping (PBRS). You can read more about it here, or a quick Google search will turn up relevant information.
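For context, the general PBRS trick is to add a shaping term on each transition of the form
F(s, s') = gamma * Phi(s') - Phi(s)
where Phi is a potential function over states and gamma is the discount factor. In this code the potential is based on the distance between the snake’s head and the food, as the snippet further down shows.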
The current PBRS function is very much in beta and is an area of this code that needs to be improved. I’ve run some simple experiments with the current function and haven’t had much luck learning anything. It’s an area I’ll be actively experimenting with, and if someone wants to contribute to this code, this would be a simple way to get started. Altering the existing potential-based reward function doesn’t require much understanding of how the code works; you’d just need to navigate to gym-snake-rl/gym_snake_rl/core/core_snake.py and edit lines 263 through 269. The current reward function formulation is:
if step_closer_to_food:
    # small positive shaping reward for moving the head closer to the food
    reward = 0.01 * abs((gamma * current_dist) - old_dist)
elif step_away_from_food:
    # small negative shaping reward for moving the head away from the food
    reward = -0.01 * abs((gamma * current_dist) - old_dist)
Gamma is a discount factor and is set to 0.99 by default.
In this environment, it is important to be aware that applying reward standardization may yield NaNs and cause errors. It is possible to have an episode, or a set of episodes, where the returns are all zeros or all ones. A vector of identical values has a standard deviation of 0, so when you compute (returns - mean(returns)) / standard_deviation(returns), that division by zero will produce NaNs. If you’re using PBRS, this is no longer a concern, since you won’t have a return vector made up entirely of the same number.
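If you do want to standardize returns while using the default reward function, a common workaround (not something the environment does for you) is to add a small epsilon to the denominator:
import numpy as np

def standardize(returns, eps=1e-8):
    # The epsilon keeps the division finite when every return in the batch is
    # identical (standard deviation of zero).
    return (returns - returns.mean()) / (returns.std() + eps)

print(standardize(np.zeros(10)))  # prints all zeros instead of NaNs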
I anticipate that when the PBRS function in this code is fixed, it’ll prove helpful in training agents like the one trained on randomly generated maps above to actually go towards the food, instead of getting stuck in any weird behavior loops.
Environment names #
There are a bunch of environments contained in this code, and I’m going to cover the naming convention and list all of their names here.
Naming convention #
Environment names follow this structure:
Name-observation_type-size-v0
All of the environments here are currently v0, and the prebuilt sizes are 16, 32, and 64.
Example list of names #
Now for the fun part.
Partial List #
BasicSnake-raw-16-v0
BasicSnake-raw-32-v0
BasicSnake-raw-64-v0
BasicSnake-rgb-16-v0
BasicSnake-rgb-32-v0
BasicSnake-rgb-64-v0
BasicSnake-rgb5-16-v0
BasicSnake-rgb5-32-v0
BasicSnake-rgb5-64-v0
BasicSnake-small_vector-16-v0
BasicSnake-small_vector-32-v0
BasicSnake-small_vector-64-v0
BasicSnake-big_vector-16-v0
BasicSnake-big_vector-32-v0
BasicSnake-big_vector-64-v0
BasicSnakePBRS-raw-16-v0
BasicSnakePBRS-raw-32-v0
BasicSnakePBRS-raw-64-v0
BasicSnakePBRS-rgb-16-v0
BasicSnakePBRS-rgb-32-v0
BasicSnakePBRS-rgb-64-v0
BasicSnakePBRS-rgb5-16-v0
BasicSnakePBRS-rgb5-32-v0
BasicSnakePBRS-rgb5-64-v0
BasicSnakePBRS-small_vector-16-v0
BasicSnakePBRS-small_vector-32-v0
BasicSnakePBRS-small_vector-64-v0
BasicSnakePBRS-big_vector-16-v0
BasicSnakePBRS-big_vector-32-v0
BasicSnakePBRS-big_vector-64-v0
HungrySnake-raw-16-v0
HungrySnake-raw-32-v0
HungrySnake-raw-64-v0
HungrySnake-rgb-16-v0
HungrySnake-rgb-32-v0
HungrySnake-rgb-64-v0
HungrySnake-rgb5-16-v0
HungrySnake-rgb5-32-v0
HungrySnake-rgb5-64-v0
HungrySnake-small_vector-16-v0
HungrySnake-small_vector-32-v0
HungrySnake-small_vector-64-v0
HungrySnake-big_vector-16-v0
HungrySnake-big_vector-32-v0
HungrySnake-big_vector-64-v0
HungrySnakePBRS-raw-16-v0
HungrySnakePBRS-raw-32-v0
HungrySnakePBRS-raw-64-v0
HungrySnakePBRS-rgb-16-v0
HungrySnakePBRS-rgb-32-v0
HungrySnakePBRS-rgb-64-v0
HungrySnakePBRS-rgb5-16-v0
HungrySnakePBRS-rgb5-32-v0
HungrySnakePBRS-rgb5-64-v0
HungrySnakePBRS-small_vector-16-v0
HungrySnakePBRS-small_vector-32-v0
HungrySnakePBRS-small_vector-64-v0
HungrySnakePBRS-big_vector-16-v0
HungrySnakePBRS-big_vector-32-v0
HungrySnakePBRS-big_vector-64-v0
BabySnake-raw-16-v0
BabySnake-raw-32-v0
BabySnake-raw-64-v0
BabySnake-rgb-16-v0
BabySnake-rgb-32-v0
BabySnake-rgb-64-v0
BabySnake-rgb5-16-v0
BabySnake-rgb5-32-v0
BabySnake-rgb5-64-v0
BabySnake-small_vector-16-v0
BabySnake-small_vector-32-v0
BabySnake-small_vector-64-v0
BabySnake-big_vector-16-v0
BabySnake-big_vector-32-v0
BabySnake-big_vector-64-v0
BabySnakePBRS-raw-16-v0
BabySnakePBRS-raw-32-v0
BabySnakePBRS-raw-64-v0
BabySnakePBRS-rgb-16-v0
BabySnakePBRS-rgb-32-v0
BabySnakePBRS-rgb-64-v0
BabySnakePBRS-rgb5-16-v0
BabySnakePBRS-rgb5-32-v0
BabySnakePBRS-rgb5-64-v0
BabySnakePBRS-small_vector-16-v0
BabySnakePBRS-small_vector-32-v0
BabySnakePBRS-small_vector-64-v0
BabySnakePBRS-big_vector-16-v0
BabySnakePBRS-big_vector-32-v0
BabySnakePBRS-big_vector-64-v0
… and so on.
For a full list of names, see the __init__.py file in gym-snake-rl/gym_snake_rl/. They all follow the convention above, so it’s pretty simple to get the environment you want by picking a game variant from the list and subbing in the observation type, size, and v0 tags.
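For example, assembling an id programmatically (all of these pieces appear in the list above):
import gym
import gym_snake_rl

# Build an id following the Name-observation_type-size-v0 convention.
name, obs_type, size = 'BasicSnake', 'small_vector', 16
env_id = f'{name}-{obs_type}-{size}-v0'  # 'BasicSnake-small_vector-16-v0'
env = gym.make(env_id)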
HungrySnake and BabySnake variants #
Both of these game variants are from here.
The BabySnake variant is very simple: every time the snake eats a piece of food, the episode ends. The HungrySnake variant has a limit on how many steps the snake can take after eating, which simulates increasing hunger. The limit resets each time the snake eats.
References #
I referenced OpenAI’s Baselines for their setup and code-running instructions. Much of my code was built on top of nicomon24’s code. In fact, my project started as a clone of their project; I built the features I wanted on top of their code, and in some cases rewrote it to fit what I wanted to do.
My creation of this environment was inspired by OpenAI’s work titled Quantifying Generalization in Reinforcement Learning, and I wanted to enable people without lots of computational power, like myself, to work on things similar to what they did.
Closing out #
If you’ve read for this long, thank you.
Any issues with the code should be opened at the GitHub repository linked towards the start of the blog, and if you write an addition to the code, you should submit a pull request on GitHub. If you have further questions, you can tweet me or email me (jfpettit@gmail.com).
Citation #
@article{gym-snake-rl2019,
    author={Jacob Pettit},
    title={Introducing gym-snake-rl},
    url={https://jfpettit.github.io/Introducing-gym-snake-rl/},
    year={2019}
}
Update History #
Update 1 on July 12, 2019: Renamed package from gym-snakerl to gym-snake-rl.
Update 2 on July 23, 2019: Fixed stray gym-snakerl’s to gym-snake-rl.