Reference
=========

Game description fields
-----------------------

Game descriptions are written in the ``[tool.retro-gamer]`` section of your
game project's ``pyproject.toml``. ``retro-gamer create`` reads this section
and copies the metadata into the training run's ``config.toml``, where it can
also be inspected or hand-edited.

A complete example for the Snake game:

.. code-block:: toml

    [tool.retro-gamer]
    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]
    reward = "score"
    character_set = ["@", "*", ">", "<", "^", "v"]
    spatial = true
    observe_state = []

You do not need to specify the board size: ``retro-gamer`` reads it directly
from your game's ``board_size`` attribute.

The fields are described below.

``actions``
~~~~~~~~~~~

**Required.** A list of keystroke names the agent may send to the game each
turn. Use arrow key names for directional games, or single characters for
character-key games.

.. code-block:: toml

    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]

The agent also has access to a no-op action (doing nothing). The total number
of actions in the Q-network output is ``len(actions) + 1``.

``reward``
~~~~~~~~~~

**Required.** The key in the game's state dictionary to use as the reward
signal. The reward computed for each turn is the *change* in this value from
the previous turn.

.. code-block:: toml

    reward = "score"

``character_set``
~~~~~~~~~~~~~~~~~

**Optional.** A list of single characters that may appear on the board. Each
character occupies one "slot" in the one-hot encoding. Characters not in this
list are treated as empty space.

.. code-block:: toml

    character_set = ["@", "*", ">", "<", "^", "v"]

If omitted, ``retro-gamer`` runs an exploration phase to discover the
characters that appear in practice. The length of this phase is controlled by
the ``exploration_turns`` hyperparameter.

``spatial``
~~~~~~~~~~~

**Optional.** Default: ``true``. Whether to treat the board as a 2D spatial scene.
When ``true``, the trainer uses a convolutional neural network (CNN) that can
detect patterns in the relative positions of characters. When ``false``, the
trainer uses a multilayer perceptron (MLP) that sees the board as a flat list
of numbers without positional structure.

.. code-block:: toml

    spatial = true

``observe_state``
~~~~~~~~~~~~~~~~~

**Optional.** Default: ``[]``. A list of keys from the game's state dictionary
to append to the observation vector. The values must be numbers (integers,
floats, or booleans). The reward key must not appear in this list.

.. code-block:: toml

    observe_state = ["lives", "level"]

.. _hyperparameters:

Hyperparameters
---------------

Hyperparameters are stored in the ``[hyperparameters]`` section of
``config.toml``. They can be set via ``retro-gamer create`` options or edited
directly.

Learning and optimization
~~~~~~~~~~~~~~~~~~~~~~~~~

``learning_rate`` (default: ``0.001``)
    The step size used by the Adam optimizer when updating network weights.
    Larger values converge faster but may be unstable; smaller values are more
    stable but slower.

``lr_decay`` (default: ``0.995``)
    Multiplicative decay applied to the learning rate after each episode. The
    learning rate decreases geometrically over training, helping the network
    fine-tune later without destabilizing early progress.

``gamma`` (default: ``0.99``)
    The discount factor for future rewards. A value of 1.0 makes the agent
    value all future rewards equally; smaller values make the agent
    increasingly myopic.

Exploration
~~~~~~~~~~~

``epsilon`` (default: ``1.0``)
    The initial exploration rate. At each turn, the agent takes a random
    action with probability ``epsilon`` and exploits its current Q-function
    with probability ``1 - epsilon``.

``epsilon_decay`` (default: ``0.995``)
    Multiplicative decay applied to ``epsilon`` after each episode.

``epsilon_min`` (default: ``0.05``)
    The floor below which ``epsilon`` will not fall. A small amount of
    continued exploration prevents the agent from becoming permanently
    committed to a suboptimal policy.

Memory and sampling
~~~~~~~~~~~~~~~~~~~

``batch_size`` (default: ``64``)
    The number of experiences sampled from the replay buffer per training
    step.

``memory_capacity`` (default: ``10000``)
    The maximum number of experiences the replay buffer can hold. When full,
    the oldest experiences are discarded.

``prioritize_experiences`` (default: ``false``)
    Whether to use prioritized experience replay. When ``true``, experiences
    with larger TD errors are sampled more frequently. This often improves
    sample efficiency at a modest computational cost.

Network architecture
~~~~~~~~~~~~~~~~~~~~

``n_layers`` (default: ``2``)
    The number of hidden layers in the MLP head (for spatial games, this
    follows the CNN; for non-spatial games, it is the full network).

``layer_size`` (default: ``128``)
    The width (number of units) in each hidden layer.

Training duration
~~~~~~~~~~~~~~~~~

``training_episodes`` (default: ``1000``)
    The total number of game episodes to run. Each episode runs until the game
    ends or ``max_turns_per_episode`` turns have elapsed.

``max_turns_per_episode`` (default: ``2000``)
    A safety cutoff preventing a single episode from running indefinitely (for
    example, if the agent finds a way to avoid dying).

``target_update_freq`` (default: ``100``)
    How many training steps between updates of the target network. More
    frequent updates make training targets move faster (less stable); less
    frequent updates make them more stable but slower to reflect new learning.

Character discovery
~~~~~~~~~~~~~~~~~~~

``exploration_turns`` (default: ``200``)
    When ``character_set`` is not specified, the number of random turns to run
    at the start of training to discover which characters appear on the board.

``unknown_character_strategy`` (default: ``"ignore"``)
    What to do when a character appears during training that is not in the
    established ``character_set``. ``"ignore"`` treats it as an empty cell;
    ``"extend"`` rebuilds the model with an extended character set.

CLI reference
-------------

``retro-gamer create``
~~~~~~~~~~~~~~~~~~~~~~

Create a new training run directory with ``config.toml``. Game metadata is
read automatically from the ``[tool.retro-gamer]`` section of your game's
``pyproject.toml``; you do not pass it on the command line.

.. code-block:: console

    % retro-gamer create --game MODULE --output DIR [OPTIONS]

**Required options:**

- ``--game MODULE``: Python module containing ``create_game()`` (e.g.
  ``retro.examples.snake``). The ``[tool.retro-gamer]`` section is read from
  the ``pyproject.toml`` found in or above the module's source directory.
- ``--output DIR``: Directory to create for this training run.

**Hyperparameter options** (all optional; see :ref:`hyperparameters`):

- ``--training-episodes N``
- ``--n-layers N``
- ``--layer-size N``
- ``--learning-rate F``
- ``--lr-decay F``
- ``--gamma F``
- ``--epsilon-decay F``
- ``--epsilon-min F``
- ``--batch-size N``
- ``--memory-capacity N``
- ``--target-update-freq N``
- ``--max-turns-per-episode N``
- ``--exploration-turns N``
- ``--prioritize-experiences`` / ``--no-prioritize-experiences``

``retro-gamer train``
~~~~~~~~~~~~~~~~~~~~~

Train (or resume training) a DQN agent.

.. code-block:: console

    % retro-gamer train RUN_DIR [--resume CHECKPOINT]

``RUN_DIR`` must contain a ``config.toml`` generated by ``retro-gamer create``.
If ``--resume`` is given, training resumes from the specified checkpoint file
(relative or absolute path).

``retro-gamer play``
~~~~~~~~~~~~~~~~~~~~

Watch a trained agent play the game in the terminal.

.. code-block:: console

    % retro-gamer play RUN_DIR [--checkpoint NAME] [--framerate N]

``--checkpoint`` defaults to ``final``. You can specify a checkpoint by name
(e.g. ``ep_0100``) or by path relative to ``RUN_DIR/checkpoints/``.
``--framerate`` sets the target frames per second (default: 12). Press Enter
or Escape to quit.


``retro-gamer info``
~~~~~~~~~~~~~~~~~~~~

Print a summary of a training run: metadata, hyperparameters, recent episode
log, and available checkpoints.

.. code-block:: console

    % retro-gamer info RUN_DIR

Training run directory structure
--------------------------------

A training run is a self-contained directory with the following contents:

.. code-block:: text

    runs/snake/
    ├── config.toml          # game description + hyperparameters
    ├── training.log         # architecture rationale + per-episode log
    └── checkpoints/
        ├── ep_0100.pt       # model weights at episode 100
        ├── ep_0200.pt
        ├── ...
        └── final.pt         # model weights at training completion

``config.toml`` is written by ``retro-gamer create`` and updated (with the
discovered character set and resolved hyperparameters) when
``retro-gamer train`` begins. Editing ``config.toml`` between ``create`` and
``train`` is the recommended way to adjust hyperparameters.

``training.log`` begins with the full architecture description generated at
training startup, followed by one line per episode in the format::

    [EP NNNN] total_reward=F steps=N epsilon=F avg_loss=F

Checkpoint files are PyTorch state dictionaries containing model weights,
optimizer state, the current epsilon, and the total number of training steps
completed. They can be loaded with ``retro-gamer play`` or directly with the
Python API.

Python API
----------

For advanced use, ``retro-gamer``'s components are importable as a library.

.. code-block:: python

    from retro_gamer import GameMetadata, GameEnvironment, DQNTrainer
    from retro.examples.snake import create_game

    # Read metadata from [tool.retro-gamer] in the game's pyproject.toml
    metadata = GameMetadata.from_pyproject("retro.examples.snake")

    trainer = DQNTrainer(
        create_game,
        metadata,
        "runs/snake/",
        training_episodes=500,
        n_layers=2,
        layer_size=128,
    )
    trainer.train()

``GameEnvironment`` provides a gym-style interface for stepping through a game
programmatically:

.. code-block:: python

    from retro_gamer import GameEnvironment

    # create_game and metadata are defined as in the previous example
    env = GameEnvironment(create_game, metadata)
    obs = env.reset()                   # returns initial observation vector
    obs, reward, done = env.step("KEY_RIGHT")

The observation is a flat NumPy array of dtype ``float32``. For spatial games,
the first ``C × H × W`` elements are the board (channel-first one-hot
encoding); for non-spatial games, the board is encoded ``H × W × C`` and then
flattened. Any ``observe_state`` values are appended at the end.
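
The two board layouts can be made concrete with a small sketch. It is written
in plain Python so that it runs without dependencies, and it returns a list
rather than the ``float32`` NumPy array that ``GameEnvironment`` produces.
``flat_index`` and ``encode_observation`` are hypothetical helpers, not part of
the ``retro_gamer`` API. The sketch also assumes that both layouts are
flattened in row-major order and that a character outside ``character_set``
leaves all of its cell's slots at zero.

.. code-block:: python

    def flat_index(channel, y, x, n_channels, height, width, spatial):
        """Position of one cell's one-hot slot in the flat board vector."""
        if spatial:
            # Channel-first (C, H, W) layout, flattened row-major (assumed).
            return (channel * height + y) * width + x
        # Channel-last (H, W, C) layout, flattened row-major (assumed).
        return (y * width + x) * n_channels + channel


    def encode_observation(rows, character_set, spatial=True, extra_state=()):
        """Illustrative encoder; hypothetical, not the library's own code."""
        height, width = len(rows), len(rows[0])
        n_channels = len(character_set)
        slot = {ch: i for i, ch in enumerate(character_set)}

        obs = [0.0] * (n_channels * height * width)
        for y, row in enumerate(rows):
            for x, ch in enumerate(row):
                if ch in slot:  # characters outside the set stay all-zero
                    i = flat_index(slot[ch], y, x,
                                   n_channels, height, width, spatial)
                    obs[i] = 1.0

        # observe_state values (e.g. lives, level) follow the board.
        return obs + [float(v) for v in extra_state]


    rows = ["@ *", "  >"]          # hypothetical 2 x 3 board
    chars = ["@", "*", ">"]
    obs = encode_observation(rows, chars, spatial=True, extra_state=[3, 1])
    print(len(obs))  # 20: 3 * 2 * 3 board slots plus 2 state values

In the channel-first layout, all cells belonging to one character sit next to
each other, which matches the channel-per-character input a CNN typically
expects. In the ``H × W × C`` layout, the slots for a single cell are
adjacent instead.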