Reference
=========

Game description fields
-----------------------

Game descriptions are written in the ``[tool.retro-gamer]`` section of your
game project's ``pyproject.toml``. ``retro-gamer create`` reads this section
and copies the metadata into the training run's ``config.toml``, where it can
also be inspected or hand-edited.

A complete example for the Snake game:

.. code-block:: toml

    [tool.retro-gamer]
    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]
    reward = "score"
    character_set = ["@", "*", ">", "<", "^", "v"]

You do not need to specify the board size: ``retro-gamer`` reads it directly
from your game's ``board_size`` attribute.

The fields are described below.

``actions``
~~~~~~~~~~~

**Required.** A list of keystroke names the agent may send to the game each
turn. Use arrow key names for directional games, or single characters for
character-key games.

.. code-block:: toml

    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]

The agent also has access to a no-op action (doing nothing). The total number
of actions in the Q-network output is ``len(actions) + 1``.

``reward``
~~~~~~~~~~

**Required.** The key in the game's state dictionary to use as the reward
signal. The reward computed for each turn is the *change* in this value from
the previous turn.

.. code-block:: toml

    reward = "score"

``character_set``
~~~~~~~~~~~~~~~~~

**Optional.** A list of single characters that may appear on the board. Each
character occupies one "slot" in the one-hot encoding. Characters not in this
list are treated as empty space.

.. code-block:: toml

    character_set = ["@", "*", ">", "<", "^", "v"]

If omitted, ``retro-gamer`` runs an exploration phase to discover the
characters that appear in practice. The length of this phase is controlled by
the ``exploration_turns`` hyperparameter.

Preprocessing options
---------------------

Preprocessing options live in the ``[preprocessing]`` section of a run's
``config.toml``.
They control how the game's board and state are transformed into the
observation vector that the neural network sees. ``retro-gamer create`` writes
sensible defaults; you can edit them by hand before running
``retro-gamer train``.

.. note::

    Changes to any ``[preprocessing]`` option—or to the game description
    fields above—make existing checkpoints incompatible. Run
    ``retro-gamer clean`` before retraining after such changes.

``spatial`` (default: ``false``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Whether to treat the board as a 2D spatial scene. When ``true``, the trainer
uses a convolutional neural network (CNN); when ``false``, a multilayer
perceptron (MLP) that sees the board as a flat list of numbers.

``board`` (default: ``true``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Whether to include the board encoding in the observation vector. Set to
``false`` to train on game state variables only, with no board at all. This is
useful for games with small, enumerable state spaces where a lookup table
(classic Q-learning) is sufficient.

When ``board = false``:

- ``spatial`` must also be ``false`` (no board means no 2D scene for a CNN).
- At least one key must be listed in ``observe_state``.
- ``character_set`` is not required and character discovery is skipped.

.. code-block:: toml

    [preprocessing]
    board = false
    observe_state = ["board_state"]

``observe_state`` (default: ``[]``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A list of keys from ``game.state`` to include in the observation vector,
appended after the board encoding (or as the entire observation when
``board = false``). Scalar values contribute one element each; list or tuple
values are flattened.

.. code-block:: toml

    observe_state = ["apple_dx", "apple_dy"]

The keys must be present in ``game.state`` at every step, initialized in
``create_game()`` before the game starts. All values that are lists or tuples
must always have the same length from episode to episode.

.. warning::

    ``observe_state`` keys must be initialized to their final shape in
    ``create_game()`` before the game starts. If a key is absent or its list
    length changes between episodes, training will crash with an error
    explaining which key changed and by how much.

    This happens because the neural network's input layer has a fixed size
    determined at the start of training; it cannot adapt to a changing
    observation shape mid-run. Always initialize every observed key with a
    placeholder of the correct type and length before the first
    ``game.step()`` call.

``observe_state_sizes`` (auto-discovered)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A table mapping each ``observe_state`` key to its flat size (``1`` for
scalars, ``N`` for sequences of length N). This is written automatically to
``config.toml`` the first time ``retro-gamer train`` runs, after the trainer
samples ``game.state`` to discover the actual sizes:

.. code-block:: toml

    observe_state_sizes = {board_state = 9}

You do not need to set this manually. Once written, it is used to detect
changes in state shape when resuming training—an incompatible change here
requires running ``retro-gamer clean`` and starting fresh.

``egocentric`` (default: ``false``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When ``true``, the board observation is cropped to a square window centred on
a specific agent rather than the full board. This gives the agent a local,
first-person-like view and makes the observation invariant to the agent's
absolute position on the board.

Requires ``egocentric_player`` and ``egocentric_radius``.

``egocentric_player``
~~~~~~~~~~~~~~~~~~~~~~

The name of the agent to use as the centre of the egocentric crop. Must match
the ``name`` attribute of one of the game's agents.

.. code-block:: toml

    egocentric_player = "Snake head"

``egocentric_radius``
~~~~~~~~~~~~~~~~~~~~~~

The half-side-length of the egocentric crop window, in cells. The resulting
observation covers a ``(2r+1) × (2r+1)`` region.
Larger values give the agent a wider view; smaller values focus it on the
immediate vicinity.

.. code-block:: toml

    egocentric_radius = 8  # 17×17 window

When ``egocentric_radius`` is set, ``board_size`` in ``[metadata]`` is
automatically updated to ``[2r+1, 2r+1]`` so the network is sized correctly.

.. _hyperparameters:

Hyperparameters
---------------

Hyperparameters are split across two sections of ``config.toml``:

- ``[model]`` — network architecture (changing these requires starting fresh)
- ``[training]`` — learning algorithm parameters (safe to change at any time)

Both sections can be set via ``retro-gamer create`` options or edited
directly.

Learning and optimization
~~~~~~~~~~~~~~~~~~~~~~~~~

``learning_rate`` (default: ``0.0001``)
    The step size used by the Adam optimizer when updating network weights.
    Larger values converge faster but may be unstable; smaller values are more
    stable but slower.

``learning_rate_decay`` (default: ``0.9999``)
    Multiplicative decay applied to the learning rate after each episode. The
    learning rate decreases geometrically over training, helping the network
    fine-tune later without destabilizing early progress. With the default
    value, the learning rate decays to about 13.5 % of its starting value
    after 20 000 episodes.

``gamma`` (default: ``0.99``)
    The discount factor for future rewards. A value of 1.0 makes the agent
    value all future rewards equally; smaller values make the agent
    increasingly myopic.

Exploration
~~~~~~~~~~~

``epsilon`` (default: ``1.0``)
    The initial exploration rate. At each turn, the agent takes a random
    action with probability ``epsilon`` and exploits its current Q-function
    with probability ``1 - epsilon``.

``epsilon_decay`` (default: ``0.9997``)
    Multiplicative decay applied to ``epsilon`` after each episode.

``epsilon_min`` (default: ``0.05``)
    The floor below which ``epsilon`` will not fall. A small amount of
    continued exploration prevents the agent from becoming permanently
    committed to a suboptimal policy.

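
The multiplicative schedules described above can be checked with a few lines
of arithmetic. The following is a minimal sketch, not ``retro-gamer``'s own
code: it assumes each decay is applied exactly once per episode, as stated,
and that the floor is applied as a simple ``max``. The helper names
``decayed`` and ``episodes_to_floor`` are hypothetical.

.. code-block:: python

    import math


    def decayed(initial, decay, episodes, floor=0.0):
        """Value after `episodes` multiplicative decays, clamped at `floor`."""
        return max(floor, initial * decay ** episodes)


    def episodes_to_floor(initial, decay, floor):
        """Smallest episode count at which the decayed value reaches `floor`."""
        return math.ceil(math.log(floor / initial) / math.log(decay))


    # Fraction of the starting learning rate left after a default-length run
    # (learning_rate_decay = 0.9999, training_episodes = 20000).
    lr_fraction = decayed(1.0, 0.9999, 20_000)

    # Episode at which epsilon hits epsilon_min with the default settings
    # (epsilon = 1.0, epsilon_decay = 0.9997, epsilon_min = 0.05).
    eps_floor_episode = episodes_to_floor(1.0, 0.9997, 0.05)

Under these assumptions, ``lr_fraction`` comes out at roughly 0.135, and
``epsilon`` reaches its floor a little before episode 10 000, about halfway
through a default 20 000-episode run.
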
Memory and sampling
~~~~~~~~~~~~~~~~~~~

``batch_size`` (default: ``64``)
    The number of experiences sampled from the replay buffer per training
    step.

``memory_capacity`` (default: ``50000``)
    The maximum number of experiences the replay buffer can hold. When full,
    the oldest experiences are discarded.

``prioritize_experiences`` (default: ``true``)
    Whether to use prioritized experience replay. When ``true``, experiences
    with larger TD errors are sampled more frequently. This often improves
    sample efficiency at a modest computational cost.

Model architecture
~~~~~~~~~~~~~~~~~~

These live in the ``[model]`` section. Changing them requires starting fresh
(run ``retro-gamer clean`` before retraining).

``hidden_sizes`` (default: ``[128, 64]``)
    A list of integers giving the size of each hidden layer in the MLP head.
    The default creates two layers: 128 units then 64. For spatial games this
    follows the CNN; for non-spatial games it is the full network. Larger or
    deeper networks can represent more complex Q-functions but train more
    slowly and may need more episodes.

Training duration
~~~~~~~~~~~~~~~~~

``training_episodes`` (default: ``20000``)
    The total number of game episodes to run. Each episode runs until the game
    ends or ``max_turns_per_episode`` turns have elapsed.

``max_turns_per_episode`` (default: ``2000``)
    A safety cutoff preventing a single episode from running indefinitely (for
    example, if the agent finds a way to avoid dying).

``target_update_freq`` (default: ``500``)
    How many training steps between updates of the target network. More
    frequent updates make training targets move faster (less stable); less
    frequent updates make them more stable but slower to reflect new learning.

``train_every`` (default: ``4``)
    Run one training step every N game steps. Higher values speed up episode
    collection at the cost of fewer gradient updates per experience. The
    default of 4 is a good balance for most games; set to 1 to train on every
    step.

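
Because ``target_update_freq`` counts training steps while ``train_every``
counts game steps, the two settings multiply. The sketch below illustrates
that arithmetic only; it is not the trainer's implementation. It assumes
training starts on the first eligible game step (ignoring any warm-up while
the replay buffer fills), and ``update_counts`` is a hypothetical helper name.

.. code-block:: python

    def update_counts(game_steps, train_every=4, target_update_freq=500):
        """Return (training_steps, target_updates) after `game_steps` steps.

        Assumes one training step per `train_every` game steps and one
        target-network update per `target_update_freq` training steps.
        """
        training_steps = game_steps // train_every
        target_updates = training_steps // target_update_freq
        return training_steps, target_updates


    # With the defaults, the target network is refreshed once every
    # 4 * 500 = 2000 game steps.
    counts = update_counts(10_000)

Under these assumptions, raising ``train_every`` without lowering
``target_update_freq`` also stretches the interval between target updates
when measured in game steps.
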
Character discovery
~~~~~~~~~~~~~~~~~~~

``exploration_turns`` (default: ``200``)
    When ``character_set`` is not specified, the number of random turns to run
    at the start of training to discover which characters appear on the board.

``unknown_character_strategy`` (default: ``"ignore"``)
    What to do when a character appears during training that is not in the
    established ``character_set``. ``"ignore"`` treats it as an empty cell;
    ``"extend"`` rebuilds the model with an extended character set.

CLI reference
-------------

``retro-gamer create``
~~~~~~~~~~~~~~~~~~~~~~

Create a new training run directory with ``config.toml``. Game metadata is
read automatically from the ``[tool.retro-gamer]`` section of your game's
``pyproject.toml``; you do not pass it on the command line.

.. code-block:: console

    % retro-gamer create --game GAME --output DIR [OPTIONS]

**Required options:**

- ``--game GAME`` — Your game, specified as a file path or a Python module
  name:

  - File path: ``--game my_game.py`` or ``--game my_game/``
  - Module name: ``--game retro.examples.snake``

  The ``[tool.retro-gamer]`` section is read from the ``pyproject.toml`` found
  in or above the game file.

- ``--output DIR`` — Directory to create for this training run.

**Hyperparameter options** (all optional; see :ref:`hyperparameters`):

- ``--training-episodes N``
- ``--hidden-sizes SIZES`` — comma-separated, e.g. ``512,256``
- ``--learning-rate F``
- ``--learning-rate-decay F``
- ``--gamma F``
- ``--epsilon-decay F``
- ``--epsilon-min F``
- ``--batch-size N``
- ``--memory-capacity N``
- ``--target-update-freq N``
- ``--max-turns-per-episode N``
- ``--exploration-turns N``
- ``--train-every N``
- ``--prioritize-experiences`` / ``--no-prioritize-experiences``

``retro-gamer train``
~~~~~~~~~~~~~~~~~~~~~

Train a DQN agent.

.. code-block:: console

    % retro-gamer train RUN_DIR

``RUN_DIR`` must contain a ``config.toml`` generated by
``retro-gamer create``.
If checkpoints already exist in ``RUN_DIR``, training automatically resumes
from the latest one so prior work is never lost.

If all configured episodes have already been completed, the command prints a
message and exits immediately. To keep training, increase
``training_episodes`` in ``config.toml`` and run again.

**Incompatible changes.** Some config changes make existing checkpoints
unusable. If you change any of the following, ``retro-gamer train`` will
detect the mismatch and refuse to resume, with a clear explanation:

- ``actions``, ``reward``, ``character_set``, ``board_size``
  (``[metadata]``) — game description
- ``spatial``, ``board``, ``observe_state``, ``observe_state_sizes``,
  ``egocentric``, ``egocentric_player``, ``egocentric_radius``
  (``[preprocessing]``) — observation encoding
- ``hidden_sizes`` (``[model]``) — network architecture

Run ``retro-gamer clean RUN_DIR`` to remove the old checkpoints and start
fresh. Other hyperparameter changes (learning rate, epsilon, etc.) are safe
and take effect immediately on the next training run.

``retro-gamer play``
~~~~~~~~~~~~~~~~~~~~

Watch a trained agent play the game in the terminal.

.. code-block:: console

    % retro-gamer play RUN_DIR [--checkpoint NAME] [--framerate N]

By default, the latest available checkpoint is loaded. Use ``--checkpoint`` to
load a specific one by name (e.g. ``ep_0100``). ``--framerate`` sets the
target frames per second (default: 12). Press Enter or Escape to quit.

``retro-gamer clean``
~~~~~~~~~~~~~~~~~~~~~

Remove all checkpoints and the training log from a run directory.

.. code-block:: console

    % retro-gamer clean RUN_DIR

Prompts for confirmation before deleting. Use ``--yes`` / ``-y`` to skip the
prompt. The ``config.toml`` is preserved so you can run ``retro-gamer train``
immediately to start fresh with the same settings.

Use this after making an incompatible change (see ``retro-gamer train`` above)
or any time you want to restart training from scratch.

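
To judge by hand whether an edit to ``config.toml`` calls for a ``clean``, the
list of incompatible keys under ``retro-gamer train`` can be turned into a
small comparison. This is only an illustrative sketch: it is not how
``retro-gamer`` itself detects mismatches, and ``INCOMPATIBLE_KEYS`` and
``incompatible_changes`` are hypothetical names. It takes two configs that
have already been parsed into nested dictionaries.

.. code-block:: python

    # Keys the documentation lists as invalidating existing checkpoints,
    # grouped by config.toml section.
    INCOMPATIBLE_KEYS = {
        "metadata": ["actions", "reward", "character_set", "board_size"],
        "preprocessing": [
            "spatial", "board", "observe_state", "observe_state_sizes",
            "egocentric", "egocentric_player", "egocentric_radius",
        ],
        "model": ["hidden_sizes"],
    }


    def incompatible_changes(old, new):
        """List 'section.key' names whose values differ between two configs."""
        changed = []
        for section, keys in INCOMPATIBLE_KEYS.items():
            old_section = old.get(section, {})
            new_section = new.get(section, {})
            for key in keys:
                if old_section.get(key) != new_section.get(key):
                    changed.append(f"{section}.{key}")
        return changed


    before = {"model": {"hidden_sizes": [128, 64]},
              "training": {"learning_rate": 0.0001}}
    after = {"model": {"hidden_sizes": [256, 128]},
             "training": {"learning_rate": 0.001}}
    changes = incompatible_changes(before, after)

Here only ``model.hidden_sizes`` is reported; the ``learning_rate`` edit is
ignored because ``[training]`` values are safe to change. Note that a
comparison like this would also flag ``observe_state_sizes`` the first time
the trainer writes it, so it is best used on configs saved after the first
``train`` run.
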
``retro-gamer info``
~~~~~~~~~~~~~~~~~~~~~

Print a summary of a training run: metadata, hyperparameters, recent
checkpoint log, and available checkpoints.

.. code-block:: console

    % retro-gamer info RUN_DIR

Training run directory structure
---------------------------------

A training run is a self-contained directory with the following contents:

.. code-block:: text

    runs/snake/
    ├── config.toml          # game description + hyperparameters
    ├── training.log         # architecture rationale + per-checkpoint log
    └── checkpoints/
        ├── ep_0100.pt       # model weights at episode 100
        ├── ep_0200.pt
        └── ...              # one file saved every 100 episodes

``config.toml`` is written by ``retro-gamer create`` and updated (with the
discovered character set and resolved hyperparameters) when
``retro-gamer train`` begins. It has five sections: ``[game]``,
``[metadata]``, ``[preprocessing]``, ``[model]``, and ``[training]``. Editing
``config.toml`` between ``create`` and ``train`` is the recommended way to
adjust hyperparameters.

``training.log`` begins with the full network architecture description, then
one line per checkpoint (every 100 episodes) in the format::

    [ep_NNNN] ep=SSSS-NNNN avg_reward=F avg_steps=N epsilon=F avg_loss=F time=Xm Xs total=Xm Xs

Each field averages over the episodes since the previous checkpoint:

- ``ep=SSSS-NNNN`` — episode range covered by this entry
- ``avg_reward`` — mean total reward per episode (positive = good)
- ``avg_steps`` — mean episode length in game turns
- ``epsilon`` — current exploration rate (approaches ``epsilon_min`` over
  time)
- ``avg_loss`` — mean Huber loss across training steps (should decrease as
  learning stabilises). Huber loss equals ½·(q−t)² for small errors
  (|q−t| ≤ 1) and |q−t|−½ for large ones, so it grows only linearly rather
  than quadratically with the error, which keeps gradients bounded even when
  Q-values are large. Values in the range 0–10 are typical; a slow downward
  trend over thousands of episodes is the healthy pattern. A loss that grows
  without bound indicates a learning rate that is too high.
- ``time`` — wall-clock time for this checkpoint interval
- ``total`` — cumulative training time across all sessions

When training is resumed, a ``=== Resumed from ... ===`` line is appended so
the log records the full history of a run across multiple sessions.

Python API
----------

For advanced use, ``retro-gamer``'s components are importable as a library.
See the :doc:`api` reference for full details.

.. code-block:: python

    from retro_gamer import GameMetadata, DQNTrainer
    from retro.examples.snake import create_game

    metadata = GameMetadata.from_pyproject("retro.examples.snake")
    trainer = DQNTrainer(create_game, metadata, "runs/snake/")
    trainer.train()
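
Independently of the library API, the checkpoint lines in ``training.log`` can
be read with the standard library alone, for example to plot ``avg_reward``
over time. The sketch below is a rough illustration based only on the line
template given in the directory-structure section; the exact number
formatting in real logs is an assumption, the ``sample`` line is invented for
illustration rather than copied from a real run, and ``parse_log_line`` is a
hypothetical helper, not part of ``retro_gamer``.

.. code-block:: python

    import re

    # Matches the documented template up to avg_loss; the trailing time and
    # total fields are left unparsed.
    LINE_RE = re.compile(
        r"\[(?P<checkpoint>ep_\d+)\]\s+"
        r"ep=(?P<first>\d+)-(?P<last>\d+)\s+"
        r"avg_reward=(?P<avg_reward>\S+)\s+"
        r"avg_steps=(?P<avg_steps>\S+)\s+"
        r"epsilon=(?P<epsilon>\S+)\s+"
        r"avg_loss=(?P<avg_loss>\S+)"
    )


    def parse_log_line(line):
        """Return a dict of fields for a checkpoint line, or None otherwise."""
        match = LINE_RE.search(line)
        if match is None:
            return None  # architecture header, "Resumed from" marker, etc.
        fields = match.groupdict()
        return {
            "checkpoint": fields["checkpoint"],
            "first_episode": int(fields["first"]),
            "last_episode": int(fields["last"]),
            "avg_reward": float(fields["avg_reward"]),
            "avg_steps": float(fields["avg_steps"]),
            "epsilon": float(fields["epsilon"]),
            "avg_loss": float(fields["avg_loss"]),
        }


    # Invented example line that follows the documented template.
    sample = ("[ep_0200] ep=0101-0200 avg_reward=3.25 avg_steps=87 "
              "epsilon=0.9418 avg_loss=0.412 time=0m 42s total=1m 20s")
    entry = parse_log_line(sample)

Lines that do not follow the template, such as the architecture description
or a ``=== Resumed from ... ===`` marker, yield ``None`` and can simply be
skipped.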