Reference
=========

Game description fields
-----------------------

Game descriptions are written in the ``[tool.retro-gamer]`` section of
your game project's ``pyproject.toml``. ``retro-gamer create`` reads
this section and copies the metadata into the training run's
``config.toml``, where it can also be inspected or hand-edited.

A complete example for the Snake game:

.. code-block:: toml

   [tool.retro-gamer]
   actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]
   reward = "score"
   character_set = ["@", "*", ">", "<", "^", "v"]
   spatial = true
   observe_state = []

You do not need to specify the board size: ``retro-gamer`` reads it
directly from your game's ``board_size`` attribute.
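
To inspect these fields from Python, the section can be parsed with the
standard-library ``tomllib`` module (Python 3.11 or later). This is a
minimal sketch for illustration, not necessarily how ``retro-gamer``
loads the file; the inline TOML string stands in for your
``pyproject.toml``.

.. code-block:: python

   import tomllib

   # Stand-in for the contents of your game's pyproject.toml.
   PYPROJECT = """
   [tool.retro-gamer]
   actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]
   reward = "score"
   character_set = ["@", "*", ">", "<", "^", "v"]
   spatial = true
   observe_state = []
   """

   description = tomllib.loads(PYPROJECT)["tool"]["retro-gamer"]
   print(description["actions"])
   print(description["spatial"])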

The fields are described below.

``actions``
~~~~~~~~~~~

**Required.** A list of keystroke names the agent may send to the game
each turn. Use arrow key names for directional games, or single
characters for character-key games.

.. code-block:: toml

   actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]

The agent also has access to a no-op action (doing nothing). The total
number of actions in the Q-network output is ``len(actions) + 1``.
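
As a rough illustration of the output size, the sketch below builds an
index-to-action table. Placing the no-op at index 0 and representing it
as ``None`` are assumptions made for this example only; the text above
specifies just the total count.

.. code-block:: python

   actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]

   # Hypothetical layout: no-op first, then the declared keystrokes.
   NOOP = None
   action_table = [NOOP, *actions]

   n_outputs = len(action_table)  # one Q-value per entry
   print(n_outputs)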

``reward``
~~~~~~~~~~

**Required.** The key in the game's state dictionary to use as the
reward signal. The reward computed for each turn is the *change* in
this value from the previous turn.

.. code-block:: toml

   reward = "score"
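
The per-turn reward described above amounts to a difference between
consecutive state values. A minimal sketch, assuming the state
dictionaries are plain ``dict`` objects (the function name
``turn_reward`` is hypothetical):

.. code-block:: python

   def turn_reward(previous_state: dict, current_state: dict, key: str = "score") -> float:
       """Return the change in state[key] between two consecutive turns."""
       return float(current_state[key] - previous_state[key])

   # Hypothetical state dictionaries from two consecutive turns.
   before = {"score": 3}
   after = {"score": 4}
   print(turn_reward(before, after))  # 1.0

A turn in which the score does not change therefore yields a reward of
zero, and a falling score yields a negative reward.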

``character_set``
~~~~~~~~~~~~~~~~~

**Optional.** A list of single characters that may appear on the board.
Each character occupies one "slot" in the one-hot encoding. Characters
not in this list are treated as empty space.

.. code-block:: toml

   character_set = ["@", "*", ">", "<", "^", "v"]

If omitted, ``retro-gamer`` runs an exploration phase to discover the
characters that appear in practice. The length of this phase is
controlled by the ``exploration_turns`` hyperparameter.
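
The slot-per-character encoding can be sketched with NumPy as follows.
This is an illustration rather than the library's implementation: the
function name ``one_hot_board`` is hypothetical, and it assumes that
empty or unlisted characters encode as all zeros.

.. code-block:: python

   import numpy as np

   character_set = ["@", "*", ">", "<", "^", "v"]
   slot = {ch: i for i, ch in enumerate(character_set)}

   def one_hot_board(rows: list[str]) -> np.ndarray:
       """Encode a board as a (C, H, W) float32 array, one channel per character."""
       height, width = len(rows), len(rows[0])
       encoded = np.zeros((len(character_set), height, width), dtype=np.float32)
       for y, row in enumerate(rows):
           for x, ch in enumerate(row):
               if ch in slot:  # anything else stays all-zero (assumed "empty")
                   encoded[slot[ch], y, x] = 1.0
       return encoded

   board = ["@ *", " > ", "  #"]  # "#" is not in character_set
   encoded = one_hot_board(board)
   print(encoded.shape)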

``spatial``
~~~~~~~~~~~

**Optional.** Default: ``true``. Whether to treat the board as a 2D
spatial scene. When ``true``, the trainer uses a convolutional neural
network (CNN) that can detect patterns in the relative positions of
characters. When ``false``, the trainer uses a multilayer perceptron
(MLP) that sees the board as a flat list of numbers without positional
structure.

.. code-block:: toml

   spatial = true

``observe_state``
~~~~~~~~~~~~~~~~~

**Optional.** Default: ``[]``. A list of keys from the game's state
dictionary to append to the observation vector. The values must be
numbers (integers, floats, or booleans). The reward key must not
appear in this list.

.. code-block:: toml

   observe_state = ["lives", "level"]
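
The two constraints above (numeric values only, and no reward key) can
be checked with a few lines of Python. The helper below is a
hypothetical sketch, not part of the ``retro-gamer`` API:

.. code-block:: python

   def state_features(state: dict, observe_state: list[str], reward_key: str) -> list[float]:
       """Collect the observed state values as floats, in declaration order."""
       if reward_key in observe_state:
           raise ValueError("the reward key must not appear in observe_state")
       features = []
       for key in observe_state:
           value = state[key]
           if not isinstance(value, (bool, int, float)):
               raise TypeError(f"state[{key!r}] is not a number: {value!r}")
           features.append(float(value))
       return features

   state = {"score": 12, "lives": 3, "level": 2}  # hypothetical game state
   print(state_features(state, ["lives", "level"], reward_key="score"))  # [3.0, 2.0]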

.. _hyperparameters:

Hyperparameters
---------------

Hyperparameters are stored in the ``[hyperparameters]`` section of
``config.toml``. They can be set via ``retro-gamer create`` options or
edited directly.

Learning and optimization
~~~~~~~~~~~~~~~~~~~~~~~~~

``learning_rate`` (default: ``0.001``)
    The step size used by the Adam optimizer when updating network
    weights. Larger values converge faster but may be unstable; smaller
    values are more stable but slower.

``lr_decay`` (default: ``0.995``)
    Multiplicative decay applied to the learning rate after each
    episode. The learning rate decreases geometrically over training,
    helping the network fine-tune later without destabilizing early
    progress.

``gamma`` (default: ``0.99``)
    The discount factor for future rewards. A value of 1.0 makes the
    agent value all future rewards equally; smaller values make the
    agent increasingly myopic.
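
To get a feel for the default values, the following sketch evaluates
the geometric learning-rate schedule and the discount weight directly.
The helper names are hypothetical; the formulas follow from the
descriptions above.

.. code-block:: python

   learning_rate = 0.001
   lr_decay = 0.995
   gamma = 0.99

   def lr_after(episodes: int) -> float:
       """Learning rate after the given number of completed episodes."""
       return learning_rate * lr_decay ** episodes

   def discount_weight(turns_ahead: int) -> float:
       """Weight applied to a reward received `turns_ahead` turns in the future."""
       return gamma ** turns_ahead

   print(lr_after(1000))
   print(discount_weight(100))

With the defaults, the learning rate after 1000 episodes is below 1% of
its starting value, and a reward 100 turns away carries a little over a
third of the weight of an immediate one.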

Exploration
~~~~~~~~~~~

``epsilon`` (default: ``1.0``)
    The initial exploration rate. At each turn, the agent takes a
    random action with probability ``epsilon`` and exploits its current
    Q-function with probability ``1 - epsilon``.

``epsilon_decay`` (default: ``0.995``)
    Multiplicative decay applied to ``epsilon`` after each episode.

``epsilon_min`` (default: ``0.05``)
    The floor below which ``epsilon`` will not fall. A small amount of
    continued exploration prevents the agent from becoming permanently
    committed to a suboptimal policy.
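
Taken together, the three settings define a decaying schedule with a
floor. The sketch below assumes the floor is applied by clamping
(``max``), which is one natural reading of the descriptions above; the
function name is hypothetical.

.. code-block:: python

   import math

   epsilon = 1.0
   epsilon_decay = 0.995
   epsilon_min = 0.05

   def epsilon_after(episodes: int) -> float:
       """Exploration rate after the given number of completed episodes."""
       return max(epsilon_min, epsilon * epsilon_decay ** episodes)

   # Smallest number of episodes after which the floor is reached.
   episodes_to_floor = math.ceil(math.log(epsilon_min / epsilon) / math.log(epsilon_decay))
   print(episodes_to_floor)

With the defaults, ``epsilon`` reaches its floor after 598 episodes, a
little over halfway through the default 1000 training episodes.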

Memory and sampling
~~~~~~~~~~~~~~~~~~~

``batch_size`` (default: ``64``)
    The number of experiences sampled from the replay buffer per
    training step.

``memory_capacity`` (default: ``10000``)
    The maximum number of experiences the replay buffer can hold. When
    full, the oldest experiences are discarded.

``prioritize_experiences`` (default: ``false``)
    Whether to use prioritized experience replay. When ``true``,
    experiences with larger TD errors are sampled more frequently.
    This often improves sample efficiency at a modest computational
    cost.
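
The following sketch illustrates the buffer behaviour and the two
sampling modes using only the standard library. It is a simplification:
the dictionary-based experience records and random TD errors are
placeholders, and full prioritized replay typically also involves a
priority exponent and importance-sampling corrections, which are
omitted here.

.. code-block:: python

   import random
   from collections import deque

   memory_capacity = 10000
   batch_size = 64

   # A bounded deque drops the oldest item automatically once it is full.
   buffer = deque(maxlen=memory_capacity)
   for turn in range(12000):
       td_error = random.random()  # placeholder; a real trainer computes this
       buffer.append({"turn": turn, "td_error": td_error})

   # Uniform sampling (prioritize_experiences = false).
   uniform_batch = random.sample(list(buffer), batch_size)

   # Proportional prioritization: sampling weight grows with |TD error|.
   # Note that random.choices samples with replacement.
   weights = [abs(item["td_error"]) + 1e-6 for item in buffer]
   prioritized_batch = random.choices(list(buffer), weights=weights, k=batch_size)

   print(len(buffer), buffer[0]["turn"])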

Network architecture
~~~~~~~~~~~~~~~~~~~~

``n_layers`` (default: ``2``)
    The number of hidden layers in the MLP head. For spatial games the
    head follows the CNN; for non-spatial games it is the entire
    network.

``layer_size`` (default: ``128``)
    The width (number of units) in each hidden layer.
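
As a rough picture of how these two settings shape the fully connected
part of the network, the sketch below lists the layer dimensions. It
assumes a final linear output layer with one unit per action plus the
no-op; the function name and the example board are hypothetical.

.. code-block:: python

   def mlp_layer_shapes(input_size: int, n_actions: int,
                        n_layers: int = 2, layer_size: int = 128) -> list[tuple[int, int]]:
       """Return (in_features, out_features) for each fully connected layer."""
       sizes = [input_size] + [layer_size] * n_layers + [n_actions + 1]  # +1 for the no-op
       return list(zip(sizes[:-1], sizes[1:]))

   # Hypothetical non-spatial game: 10x10 board, 6 characters, 4 actions.
   print(mlp_layer_shapes(input_size=10 * 10 * 6, n_actions=4))
   # [(600, 128), (128, 128), (128, 5)]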

Training duration
~~~~~~~~~~~~~~~~~

``training_episodes`` (default: ``1000``)
    The total number of game episodes to run. Each episode runs until
    the game ends or ``max_turns_per_episode`` turns have elapsed.

``max_turns_per_episode`` (default: ``2000``)
    A safety cutoff preventing a single episode from running
    indefinitely (for example, if the agent finds a way to avoid
    dying).

``target_update_freq`` (default: ``100``)
    The number of training steps between updates of the target
    network. More frequent updates make the training targets shift
    faster, which is less stable; less frequent updates are more stable
    but slower to reflect new learning.
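
The effect of ``target_update_freq`` can be pictured with a toy loop in
which dictionaries stand in for network weights. This is a schematic
sketch, not the trainer's actual code:

.. code-block:: python

   target_update_freq = 100

   online_weights = {"version": 0}   # stand-in for the online network's weights
   target_weights = dict(online_weights)

   syncs = 0
   for step in range(1, 1001):
       online_weights["version"] = step           # pretend a gradient step happened
       if step % target_update_freq == 0:
           target_weights = dict(online_weights)  # copy online -> target
           syncs += 1

   print(syncs, target_weights["version"])  # 10 1000

Between synchronizations the target copy stays frozen, so the training
targets only change every ``target_update_freq`` steps.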

Character discovery
~~~~~~~~~~~~~~~~~~~

``exploration_turns`` (default: ``200``)
    When ``character_set`` is not specified, the number of random
    turns to run at the start of training to discover which
    characters appear on the board.

``unknown_character_strategy`` (default: ``"ignore"``)
    What to do when a character appears during training that is not
    in the established ``character_set``. ``"ignore"`` treats it as
    an empty cell; ``"extend"`` rebuilds the model with an extended
    character set.
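
The difference between the two strategies can be sketched as a small
decision function. The name ``handle_unknown`` is hypothetical, and the
model rebuild that ``"extend"`` triggers is only indicated by a
comment.

.. code-block:: python

   def handle_unknown(ch: str, character_set: list[str], strategy: str = "ignore") -> list[str]:
       """Return the character set to use after seeing `ch` on the board."""
       if ch in character_set or strategy == "ignore":
           return character_set           # unknown characters read as empty cells
       if strategy == "extend":
           return character_set + [ch]    # the model would need one more input channel
       raise ValueError(f"unknown strategy: {strategy!r}")

   known = ["@", "*"]
   print(handle_unknown("#", known, "ignore"))  # ['@', '*']
   print(handle_unknown("#", known, "extend"))  # ['@', '*', '#']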

CLI reference
-------------

``retro-gamer create``
~~~~~~~~~~~~~~~~~~~~~~

Create a new training run directory with ``config.toml``. Game metadata
is read automatically from the ``[tool.retro-gamer]`` section of your
game's ``pyproject.toml``; you do not pass it on the command line.

.. code-block:: console

   % retro-gamer create --game MODULE --output DIR [OPTIONS]

**Required options:**

- ``--game MODULE`` — Python module containing ``create_game()``
  (e.g. ``retro.examples.snake``). The ``[tool.retro-gamer]`` section
  is read from the ``pyproject.toml`` found in or above the module's
  source directory.
- ``--output DIR`` — Directory to create for this training run.

**Hyperparameter options** (all optional; see :ref:`hyperparameters`):

- ``--training-episodes N``
- ``--n-layers N``
- ``--layer-size N``
- ``--learning-rate F``
- ``--lr-decay F``
- ``--gamma F``
- ``--epsilon-decay F``
- ``--epsilon-min F``
- ``--batch-size N``
- ``--memory-capacity N``
- ``--target-update-freq N``
- ``--max-turns-per-episode N``
- ``--exploration-turns N``
- ``--prioritize-experiences`` / ``--no-prioritize-experiences``
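
The upward search for ``pyproject.toml`` described under ``--game`` can
be pictured as a walk through parent directories. The helper below is a
hypothetical sketch using ``pathlib``; it is not taken from the
``retro-gamer`` source.

.. code-block:: python

   from pathlib import Path

   def find_pyproject(start: Path) -> Path | None:
       """Return the nearest pyproject.toml in `start` or any parent directory."""
       for directory in [start, *start.parents]:
           candidate = directory / "pyproject.toml"
           if candidate.is_file():
               return candidate
       return None

The starting directory would typically be the one containing the game
module's source file.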

``retro-gamer train``
~~~~~~~~~~~~~~~~~~~~~

Train (or resume training) a DQN agent.

.. code-block:: console

   % retro-gamer train RUN_DIR [--resume CHECKPOINT]

``RUN_DIR`` must contain a ``config.toml`` generated by ``retro-gamer
create``. If ``--resume`` is given, training resumes from the specified
checkpoint file (relative or absolute path).

``retro-gamer play``
~~~~~~~~~~~~~~~~~~~~

Watch a trained agent play the game in the terminal.

.. code-block:: console

   % retro-gamer play RUN_DIR [--checkpoint NAME] [--framerate N]

``--checkpoint`` defaults to ``final``. You can specify a checkpoint by
name (e.g. ``ep_0100``) or by path relative to ``RUN_DIR/checkpoints/``.
``--framerate`` sets the target frames per second (default: 12). Press
Enter or Escape to quit.

``retro-gamer info``
~~~~~~~~~~~~~~~~~~~~

Print a summary of a training run: metadata, hyperparameters, recent
episode log, and available checkpoints.

.. code-block:: console

   % retro-gamer info RUN_DIR

Training run directory structure
--------------------------------

A training run is a self-contained directory with the following
contents:

.. code-block:: text

   runs/snake/
   ├── config.toml       # game description + hyperparameters
   ├── training.log      # architecture rationale + per-episode log
   └── checkpoints/
       ├── ep_0100.pt    # model weights at episode 100
       ├── ep_0200.pt
       ├── ...
       └── final.pt      # model weights at training completion

``config.toml`` is written by ``retro-gamer create`` and updated (with
the discovered character set and resolved hyperparameters) when
``retro-gamer train`` begins. Editing ``config.toml`` between ``create``
and ``train`` is the recommended way to adjust hyperparameters.

``training.log`` begins with the full architecture description
generated at training startup, followed by one line per episode in the
format::

   [EP NNNN] total_reward=F  steps=N  epsilon=F  avg_loss=F
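
Because the per-episode lines follow a fixed pattern, they are easy to
parse for plotting or analysis. The sketch below uses a regular
expression; the sample line is made up to match the documented format,
and the function name is hypothetical.

.. code-block:: python

   import re

   LINE = re.compile(
       r"\[EP (?P<episode>\d+)\]\s+total_reward=(?P<total_reward>\S+)"
       r"\s+steps=(?P<steps>\d+)\s+epsilon=(?P<epsilon>\S+)\s+avg_loss=(?P<avg_loss>\S+)"
   )

   def parse_episode_line(line: str) -> dict | None:
       """Parse one per-episode log line; return None for non-matching lines."""
       match = LINE.search(line)
       if match is None:
           return None
       return {
           "episode": int(match["episode"]),
           "total_reward": float(match["total_reward"]),
           "steps": int(match["steps"]),
           "epsilon": float(match["epsilon"]),
           "avg_loss": float(match["avg_loss"]),
       }

   # Made-up sample line that follows the documented format.
   sample = "[EP 0042] total_reward=7.0  steps=311  epsilon=0.810  avg_loss=0.0345"
   print(parse_episode_line(sample))

Lines from the architecture description at the top of the log do not
match the pattern and are returned as ``None``.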

Checkpoint files are PyTorch state dictionaries containing model
weights, optimizer state, the current epsilon, and the total number of
training steps completed. They can be loaded with
``retro-gamer play`` or directly with the Python API.

Python API
----------

For advanced use, ``retro-gamer``'s components are importable as a
library.

.. code-block:: python

   from retro_gamer import GameMetadata, GameEnvironment, DQNTrainer
   from retro.examples.snake import create_game

   # Read metadata from [tool.retro-gamer] in the game's pyproject.toml
   metadata = GameMetadata.from_pyproject("retro.examples.snake")

   trainer = DQNTrainer(
       create_game, metadata, "runs/snake/",
       training_episodes=500,
       n_layers=2,
       layer_size=128,
   )
   trainer.train()

``GameEnvironment`` provides a gym-style interface for stepping through
a game programmatically:

.. code-block:: python

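   from retro.examples.snake import create_game
   from retro_gamer import GameEnvironment, GameMetadata

   metadata = GameMetadata.from_pyproject("retro.examples.snake")

   env = GameEnvironment(create_game, metadata)
   obs = env.reset()             # returns the initial observation vector
   obs, reward, done = env.step("KEY_RIGHT")  # send one keystroke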

The observation is a flat NumPy array of dtype ``float32``. For spatial
games, the first ``C × H × W`` elements are the board (channel-first
one-hot encoding); for non-spatial games, the board is encoded
``H × W × C`` and then flattened. Any ``observe_state`` values are
appended at the end.
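
The two layouts can be compared on a tiny example. This sketch assumes
row-major (C-order) flattening, which is NumPy's default but is not
stated explicitly above; the board contents and extra values are made
up.

.. code-block:: python

   import numpy as np

   C, H, W = 2, 2, 3
   # Channel-first one-hot board: channel 0 marks row 0, column 0;
   # channel 1 marks row 0, column 1.
   board = np.zeros((C, H, W), dtype=np.float32)
   board[0, 0, 0] = 1.0
   board[1, 0, 1] = 1.0
   extras = np.array([3.0, 2.0], dtype=np.float32)  # e.g. observe_state values

   # Spatial layout: C x H x W, flattened, then the extra values.
   spatial_obs = np.concatenate([board.reshape(-1), extras])
   # Non-spatial layout: H x W x C, flattened, then the extra values.
   flat_obs = np.concatenate([board.transpose(1, 2, 0).reshape(-1), extras])

   # The board can be recovered from the spatial layout by slicing and reshaping.
   recovered = spatial_obs[: C * H * W].reshape(C, H, W)
   print(spatial_obs.shape, np.array_equal(recovered, board))

Both layouts contain the same values; only the order of the board
elements differs.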