retro-gamer/prompt.md
Chris Proctor 5ca97dc5d0 Initial commit
2026-05-08 14:07:17 -04:00

Your task is to write a new Python project and package called retro-gamer. The goal of this package is to serve as a toolkit for computer science beginners to learn about reinforcement learning. The retro-gamer package will train agents to play games implemented with the retro-games framework. In this toolkit, students will not focus on writing code; instead, they will specify metadata about the games their agents are playing, so that the training model can choose more effective representations of the game and thereby train more effectively. Students will also adjust hyperparameters of the training model. The goal is to use the game and the training model as objects to think with, to reason about and learn about reinforcement learning.

First, read the code and documentation for the retro-games framework (/Users/chrisp/Repos/MWC/packages/retro; https://retro-games.readthedocs.io/en/latest/). In the framework, games are implemented in a character grid parametrized by board size. The observation space for a game includes its character grid and a dictionary containing game state. One key in the game state will be used as the reward function (usually “score”). Each position in the character grid can hold a character with a foreground and a background color, but training agents will ignore color. The game operates on a tick model, where all keystrokes entered since the last turn are aggregated, each agent has a chance to act, and then the game moves to the next turn. The action space for our agents will include the choice of a single keystroke (or none) per turn.
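To make the tick model concrete, here is a minimal sketch of one agent turn. `ToyGame`, `Observation`, `step`, and the key names `KEY_LEFT` and `KEY_RIGHT` are hypothetical stand-ins invented for illustration; the real retro-games classes and key names should be taken from the framework's code and documentation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyGame:
    """Hypothetical stand-in for a retro-games game (not the real API)."""
    width: int = 5
    player_x: int = 0
    state: dict = field(default_factory=lambda: {"score": 0})

    def tick(self, keystrokes: list[str]) -> None:
        """Advance one turn, applying every keystroke entered since the last turn."""
        for key in keystrokes:
            if key == "KEY_RIGHT" and self.player_x < self.width - 1:
                self.player_x += 1
                self.state["score"] += 1
            elif key == "KEY_LEFT" and self.player_x > 0:
                self.player_x -= 1

    def grid(self) -> list[str]:
        """Return the board as rows of characters; color is not included."""
        row = ["."] * self.width
        row[self.player_x] = "@"
        return ["".join(row)]


@dataclass
class Observation:
    grid: list[str]  # character grid, color ignored
    state: dict      # copy of the game state dictionary


def step(game: ToyGame, action: Optional[str]) -> Observation:
    """Send at most one keystroke for this turn, then observe the result."""
    game.tick([] if action is None else [action])
    return Observation(grid=game.grid(), state=dict(game.state))
```

Passing `None` as the action models the "no keystroke" choice, so the agent always has one more option than the number of keys the game recognizes.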

Then we are going to plan the retro-gamer package so that it can train an agent to play a given game. Our planning will focus on proposed changes which need to be made to the retro-games framework to allow agents to train and play games. (I anticipate that these changes will include an entry point specifying a function which returns the initialized game.) We will also develop a specification of required and optional metadata which can be provided about the game to make training more effective. This metadata will include:

  • Board size: required
  • Character set: a list of characters which can appear on the board. The contents of each pixel will be represented by a one-hot encoding; the size of the character set determines the length of the encoding vector. When the character set is specified, any characters outside of it will be ignored. If the character set is not specified, the training agent will conduct initial exploration to observe the characters which appear. The length of this initial exploration can be specified as a hyperparameter on the trainer. Another hyperparameter on the trainer will specify what to do when we encounter a character outside the observed set: rebuild the model with an extended character set, or ignore unknown characters.
  • Actions: required. A list of keystrokes recognized by the game.
  • Spatial: expressing whether the game board should be considered spatial, or just UI for displaying data. Spatial games will use a convolutional neural net architecture; otherwise, games will flatten the pixel array and use a multilayer perceptron architecture.
  • Reward: required. A key in the state dictionary to use as the reward function.
  • Observe state: optional. A list of state keys to include in the observation space. This list must not include the reward key. The values in observed state keys must be integers, floats, or bools. When observe state is not specified, the default is an empty list.
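The metadata fields above could be captured in a small data class, sketched here together with the one-hot encoding of the character grid. The field names, the order of the two `board_size` integers, and the choice to encode an ignored character as an all-zero vector are assumptions made for illustration, not a settled schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GameMetadata:
    # Field names are illustrative; the final schema is still to be specified.
    board_size: tuple[int, int]   # required; order of the two integers is an assumption
    actions: list[str]            # required; keystrokes recognized by the game
    reward: str                   # required; state key used as the reward function
    spatial: bool                 # True -> CNN, False -> flatten and use an MLP
    character_set: Optional[list[str]] = None  # None -> discover by exploration
    observe_state: list[str] = field(default_factory=list)  # defaults to empty

    def __post_init__(self) -> None:
        if self.reward in self.observe_state:
            raise ValueError("observe_state must not include the reward key")


def one_hot_grid(grid: list[str], character_set: list[str]) -> list[list[list[int]]]:
    """Encode each cell as a one-hot vector whose length is len(character_set).

    Assumption: a character outside the set is "ignored" by leaving its
    vector all zeros.
    """
    index = {ch: i for i, ch in enumerate(character_set)}
    encoded = []
    for row in grid:
        encoded_row = []
        for ch in row:
            vec = [0] * len(character_set)
            if ch in index:
                vec[index[ch]] = 1
            encoded_row.append(vec)
        encoded.append(encoded_row)
    return encoded
```

With a character set of `[".", "@"]`, each cell becomes a two-element vector, so the encoding length grows with the character set exactly as the specification describes.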

A key feature of the retro-gamer package is the trainer class, which constructs a deep Q-learning model for training an agent to play the game, based on the provided metadata. The trainer class will be highly interpretable through its log file. When a trainer is initialized, it uses its own hyperparameters and the game metadata to construct a training model. A detailed description of the training model is written to the trainer log file, including the rationale for design decisions (e.g., whether to use a CNN or an MLP). Additional trainer hyperparameters will include things such as the number of layers in the neural network and the size of the layers. I want to think about the right level of abstraction to expose to students in specifying the network architecture. In a sense, a PyTorch model is a specification for the network architecture, but this provides more flexibility than students will know what to do with, and is likely to be frustrating because most networks they specify will not work well. So perhaps we might allow students to specify the number of layers and then infer the remaining parameters, providing an explanation for our decisions. (We should make it possible for more advanced students to fully specify the PyTorch model.) Training parameters, such as learning rate, learning rate decay, training duration, whether to prioritize experiences with the highest temporal-difference error, etc., can also be specified as hyperparameters to the trainer.
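One way to picture "specify the number of layers and infer the rest" is a planning function that returns both the inferred sizes and the reasons for them, ready to be written to the log. This is a sketch under stated assumptions: `ArchitecturePlan`, `plan_architecture`, and the sizing heuristics (channel doubling from 16 for a CNN, halving widths for an MLP) are invented for illustration and are not a recommendation from the source. It deliberately avoids PyTorch so that it stays a plain description of the network rather than the network itself.

```python
from dataclasses import dataclass


@dataclass
class ArchitecturePlan:
    kind: str                # "cnn" or "mlp"
    input_size: int          # total number of input values
    hidden_sizes: list[int]  # CNN: channels per layer; MLP: units per layer
    output_size: int         # one Q-value per available action
    rationale: list[str]     # human-readable reasons, destined for the log


def plan_architecture(board_size, num_characters, num_state_values,
                      num_actions, spatial, num_layers=2):
    """Infer layer sizes from game metadata (heuristics are assumptions)."""
    rows, cols = board_size
    input_size = rows * cols * num_characters + num_state_values
    rationale = [
        f"Each of the {rows * cols} cells is one-hot encoded over "
        f"{num_characters} characters, plus {num_state_values} observed state values."
    ]
    if spatial:
        kind = "cnn"
        # Assumed heuristic: start at 16 channels and double per layer.
        hidden_sizes = [16 * 2 ** i for i in range(num_layers)]
        rationale.append("The board is spatial, so convolutional layers are used.")
    else:
        kind = "mlp"
        # Assumed heuristic: first layer near the input size (32 to 256),
        # then halve per layer with a floor of 16 units.
        first = max(32, min(256, input_size))
        hidden_sizes = [max(16, first // 2 ** i) for i in range(num_layers)]
        rationale.append("The board is not spatial, so it is flattened for an MLP.")
    # The agent may also press no key, so there is one extra output.
    output_size = num_actions + 1
    rationale.append(f"{output_size} outputs: one per keystroke plus 'no key'.")
    return ArchitecturePlan(kind, input_size, hidden_sizes, output_size, rationale)
```

A student would then only choose `num_layers`, while an advanced student could bypass the planner and supply a full PyTorch model.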

When the trainer trains an agent, it creates a directory containing a model/training specification in TOML format; a log file whose first entry is a detailed description of the model architecture and its design rationale; and saved model weights with training snapshots. The retro-games package will have to be updated to allow a game to be initialized in headless mode, so the trainer can access board state but nothing is written to standard out. We will need to test that we can configure a game to run while having the agent observe the board state and send keystrokes as actions. Please propose the system design that will allow this.
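The training directory described above might be laid out as in the following sketch. The file names `spec.toml` and `training.log`, the `snapshots` subdirectory, and the helper functions are all hypothetical. The standard library can read TOML (`tomllib`, Python 3.11+) but cannot write it, so this sketch uses a small hand-written emitter that only handles flat tables of strings, numbers, booleans, and lists.

```python
import json
from pathlib import Path


def toml_value(value) -> str:
    """Format a simple Python value as a TOML value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value)  # JSON string escapes are valid in TOML basic strings
    if isinstance(value, list):
        return "[" + ", ".join(toml_value(v) for v in value) + "]"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def to_toml(spec: dict[str, dict]) -> str:
    """Render a dict of flat tables, e.g. {"model": {...}, "training": {...}}."""
    lines = []
    for table, entries in spec.items():
        lines.append(f"[{table}]")
        for key, value in entries.items():
            lines.append(f"{key} = {toml_value(value)}")
        lines.append("")
    return "\n".join(lines)


def create_run_directory(root: Path, name: str, spec: dict[str, dict],
                         description: str) -> Path:
    """Create the agent directory with its spec, log, and snapshot folder."""
    run_dir = root / name
    (run_dir / "snapshots").mkdir(parents=True, exist_ok=False)
    (run_dir / "spec.toml").write_text(to_toml(spec), encoding="utf-8")
    # The first log entry is the architecture description and its rationale.
    (run_dir / "training.log").write_text(description + "\n", encoding="utf-8")
    return run_dir
```

Refusing to overwrite an existing directory (`exist_ok=False`) is a design choice in this sketch, meant to keep each agent's specification, log, and weights from being silently mixed with a previous run.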

The retro-gamer package should also provide a CLI through which agents and their training regimes can be created (model and training hyperparameters can be specified as options), trained, and used to play games.
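As a starting point for discussion, the CLI could be organized as three subcommands built with `argparse`. The subcommand names (`create`, `train`, `play`), option names, default values, and the `module:function` form of the `--game` entry point are all assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical retro-gamer CLI with create/train/play subcommands."""
    parser = argparse.ArgumentParser(prog="retro-gamer")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", help="create an agent and its training regime")
    create.add_argument("agent_dir", help="directory to hold the agent's files")
    create.add_argument("--game", required=True,
                        help="entry point for a function returning an initialized game")
    create.add_argument("--layers", type=int, default=2,
                        help="number of network layers (default is an assumption)")
    create.add_argument("--learning-rate", type=float, default=1e-3,
                        help="learning rate (default is an assumption)")

    train = sub.add_parser("train", help="train an existing agent")
    train.add_argument("agent_dir")
    train.add_argument("--episodes", type=int, default=1000,
                       help="training duration in episodes (default is an assumption)")

    play = sub.add_parser("play", help="have a trained agent play its game")
    play.add_argument("agent_dir")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

A session might then look like `retro-gamer create agents/a1 --game mygame:make_game --layers 3`, followed by `retro-gamer train agents/a1` and `retro-gamer play agents/a1`.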