retro-gamer/docs/reference.rst
Chris Proctor 5ca97dc5d0 Initial commit
2026-05-08 14:07:17 -04:00

Reference
=========

Game description fields
-----------------------

Game descriptions are written in the ``[tool.retro-gamer]`` section of
your game project's ``pyproject.toml``. ``retro-gamer create`` reads
this section and copies the metadata into the training run's
``config.toml``, where it can also be inspected or hand-edited.

A complete example for the Snake game:

.. code-block:: toml

    [tool.retro-gamer]
    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]
    reward = "score"
    character_set = ["@", "*", ">", "<", "^", "v"]
    spatial = true
    observe_state = []

You do not need to specify the board size: ``retro-gamer`` reads it
directly from your game's ``board_size`` attribute.

The fields are described below.

``actions``
~~~~~~~~~~~

**Required.** A list of keystroke names the agent may send to the game
each turn. Use arrow key names for directional games, or single
characters for character-key games.

.. code-block:: toml

    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]

The agent also has access to a no-op action (doing nothing). The total
number of actions in the Q-network output is ``len(actions) + 1``.
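
The relationship between the ``actions`` list and the Q-network outputs can be sketched as follows. Representing the no-op as ``None`` and placing it at index 0 are assumptions made for illustration; this reference does not say where ``retro-gamer`` puts it.

```python
actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]

# Assumption: the no-op is represented as None and placed first.
NO_OP = None
action_table = [NO_OP] + actions

# One Q-network output per entry: len(actions) + 1.
n_outputs = len(action_table)


def keystroke_for(index):
    """Return the keystroke for an output index, or None for the no-op."""
    return action_table[index]
```
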
``reward``
~~~~~~~~~~

**Required.** The key in the game's state dictionary to use as the
reward signal. The reward computed for each turn is the *change* in
this value from the previous turn.

.. code-block:: toml

    reward = "score"

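
As a minimal sketch of the per-turn reward described above, the following computes the change in the reward key between two consecutive state dictionaries. The function name and the example states are hypothetical.

```python
def turn_reward(previous_state, current_state, reward_key="score"):
    """Reward for one turn: the change in the reward key's value."""
    return current_state[reward_key] - previous_state[reward_key]


# Hypothetical state dictionaries from two consecutive turns.
before = {"score": 3, "lives": 2}
after = {"score": 4, "lives": 2}
reward = turn_reward(before, after)
```

A turn on which the score does not change therefore yields a reward of zero, however high the score already is.
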
``character_set``
~~~~~~~~~~~~~~~~~

**Optional.** A list of single characters that may appear on the board.
Each character occupies one "slot" in the one-hot encoding. Characters
not in this list are treated as empty space.

.. code-block:: toml

    character_set = ["@", "*", ">", "<", "^", "v"]

If omitted, ``retro-gamer`` runs an exploration phase to discover the
characters that appear in practice. The length of this phase is
controlled by the ``exploration_turns`` hyperparameter.
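
The one-hot encoding can be illustrated with plain Python lists. This sketch assumes that empty space (and any character outside the set) is encoded as an all-zero vector; the reference only says such characters are treated as empty space, so the exact representation inside ``retro-gamer`` may differ. The function names are hypothetical.

```python
character_set = ["@", "*", ">", "<", "^", "v"]
slot = {ch: i for i, ch in enumerate(character_set)}


def one_hot_cell(ch):
    """Encode one board cell; characters outside the set become all zeros."""
    vec = [0.0] * len(character_set)
    if ch in slot:
        vec[slot[ch]] = 1.0
    return vec


def encode_board(rows):
    """Encode a board given as a list of equal-length strings (H x W x C)."""
    return [[one_hot_cell(ch) for ch in row] for row in rows]


# A made-up 2 x 4 board.
board = ["  * ", " @> "]
encoded = encode_board(board)
```
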
``spatial``
~~~~~~~~~~~

**Optional** (default: ``true``). Whether to treat the board as a 2D
spatial scene. When ``true``, the trainer uses a convolutional neural
network (CNN) that can detect patterns in the relative positions of
characters. When ``false``, the trainer uses a multilayer perceptron
(MLP) that sees the board as a flat list of numbers without positional
structure.

.. code-block:: toml

    spatial = true

``observe_state``
~~~~~~~~~~~~~~~~~

**Optional** (default: ``[]``). A list of keys from the game's state
dictionary to append to the observation vector. Each value must be
numeric (an integer, float, or boolean). The reward key must not
appear in this list.

.. code-block:: toml

    observe_state = ["lives", "level"]

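
The constraints stated in the field descriptions above can be checked with a small helper. This is an illustrative sketch operating on an already-parsed table (a plain ``dict``); ``validate_description`` is a hypothetical name, not part of ``retro-gamer``.

```python
def validate_description(desc):
    """Return a list of problems found in a [tool.retro-gamer] table."""
    problems = []
    # actions and reward are the two required fields.
    for field in ("actions", "reward"):
        if field not in desc:
            problems.append(f"missing required field: {field}")
    # character_set entries must be single characters.
    if any(len(ch) != 1 for ch in desc.get("character_set", [])):
        problems.append("character_set entries must be single characters")
    # The reward key must not also be observed.
    if desc.get("reward") in desc.get("observe_state", []):
        problems.append("reward key must not appear in observe_state")
    return problems
```
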
.. _hyperparameters:

Hyperparameters
---------------

Hyperparameters are stored in the ``[hyperparameters]`` section of
``config.toml``. They can be set via ``retro-gamer create`` options or
edited directly.

Learning and optimization
~~~~~~~~~~~~~~~~~~~~~~~~~

``learning_rate`` (default: ``0.001``)
    The step size used by the Adam optimizer when updating network
    weights. Larger values converge faster but may be unstable; smaller
    values are more stable but slower.

``lr_decay`` (default: ``0.995``)
    Multiplicative decay applied to the learning rate after each
    episode. The learning rate decreases geometrically over training,
    helping the network fine-tune later without destabilizing early
    progress.

``gamma`` (default: ``0.99``)
    The discount factor for future rewards. A value of 1.0 makes the
    agent value all future rewards equally; smaller values make the
    agent increasingly myopic.

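
The geometric decay of the learning rate and the effect of ``gamma`` can be illustrated in a few lines of plain Python. This is a sketch using the default values above, not ``retro-gamer``'s implementation; the function names are hypothetical.

```python
def learning_rate_at(episode, initial=0.001, decay=0.995):
    """Learning rate after `episode` completed episodes of geometric decay."""
    return initial * decay ** episode


def discounted_return(rewards, gamma=0.99):
    """Sum of rewards, each discounted by gamma per turn into the future."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With the defaults, the learning rate after 1000 episodes is a little under 1% of its starting value, and a reward 100 turns away is weighted by ``0.99 ** 100``, about 0.37.
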
Exploration
~~~~~~~~~~~

``epsilon`` (default: ``1.0``)
    The initial exploration rate. At each turn, the agent takes a
    random action with probability ``epsilon`` and exploits its current
    Q-function with probability ``1 - epsilon``.

``epsilon_decay`` (default: ``0.995``)
    Multiplicative decay applied to ``epsilon`` after each episode.

``epsilon_min`` (default: ``0.05``)
    The floor below which ``epsilon`` will not fall. A small amount of
    continued exploration prevents the agent from becoming permanently
    committed to a suboptimal policy.

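
The three exploration settings combine into an epsilon-greedy policy with a decaying, floored exploration rate. The following is a minimal sketch under the defaults above; ``epsilon_at`` and ``choose_action`` are hypothetical names, and the real trainer may organize this differently.

```python
import random


def epsilon_at(episode, initial=1.0, decay=0.995, minimum=0.05):
    """Exploration rate after `episode` episodes, clamped at the floor."""
    return max(minimum, initial * decay ** episode)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice: a random index with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

With the defaults, ``epsilon`` reaches its floor of 0.05 after just under 600 episodes, so the remaining episodes of a 1000-episode run are played mostly greedily.
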
Memory and sampling
~~~~~~~~~~~~~~~~~~~

``batch_size`` (default: ``64``)
    The number of experiences sampled from the replay buffer per
    training step.

``memory_capacity`` (default: ``10000``)
    The maximum number of experiences the replay buffer can hold. When
    full, the oldest experiences are discarded.

``prioritize_experiences`` (default: ``false``)
    Whether to use prioritized experience replay. When ``true``,
    experiences with larger TD errors are sampled more frequently.
    This often improves sample efficiency at a modest computational
    cost.

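
The behaviour of ``memory_capacity`` and ``batch_size`` can be sketched with a bounded ``deque``. This shows uniform sampling only (the ``prioritize_experiences = false`` case); the class is a hypothetical illustration, not ``retro-gamer``'s buffer.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer that drops the oldest experience when full."""

    def __init__(self, capacity=10000, seed=None):
        self.items = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, experience):
        self.items.append(experience)

    def sample(self, batch_size=64):
        """Uniformly sample up to batch_size experiences without replacement."""
        k = min(batch_size, len(self.items))
        return self.rng.sample(list(self.items), k)


# A tiny buffer makes the eviction visible: 0 and 1 are discarded.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(step)
```
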
Network architecture
~~~~~~~~~~~~~~~~~~~~

``n_layers`` (default: ``2``)
    The number of hidden layers in the MLP head (for spatial games,
    this follows the CNN; for non-spatial games, it is the full
    network).

``layer_size`` (default: ``128``)
    The width (number of units) in each hidden layer.

Training duration
~~~~~~~~~~~~~~~~~

``training_episodes`` (default: ``1000``)
    The total number of game episodes to run. Each episode runs until
    the game ends or ``max_turns_per_episode`` turns have elapsed.

``max_turns_per_episode`` (default: ``2000``)
    A safety cutoff preventing a single episode from running
    indefinitely (for example, if the agent finds a way to avoid
    dying).

``target_update_freq`` (default: ``100``)
    How many training steps between updates of the target network.
    More frequent updates make training targets move faster (less
    stable); less frequent updates make them more stable but slower
    to reflect new learning.

Character discovery
~~~~~~~~~~~~~~~~~~~

``exploration_turns`` (default: ``200``)
    When ``character_set`` is not specified, the number of random
    turns to run at the start of training to discover which
    characters appear on the board.

``unknown_character_strategy`` (default: ``"ignore"``)
    What to do when a character appears during training that is not
    in the established ``character_set``. ``"ignore"`` treats it as
    an empty cell; ``"extend"`` rebuilds the model with an extended
    character set.

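
Conceptually, character discovery amounts to collecting the distinct characters seen on the board during the random turns. The sketch below assumes each observed board is a list of strings and that a space marks an empty cell; both are assumptions for illustration, and ``discover_characters`` is a hypothetical name.

```python
def discover_characters(frames, blank=" "):
    """Collect the distinct non-blank characters seen across board frames."""
    seen = set()
    for rows in frames:
        for row in rows:
            seen.update(row)
    # Assumption: the blank character marks an empty cell.
    seen.discard(blank)
    return sorted(seen)


# Two made-up one-row frames.
found = discover_characters([["@ *"], [" >@"]])
```
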
CLI reference
-------------

``retro-gamer create``
~~~~~~~~~~~~~~~~~~~~~~

Create a new training run directory with ``config.toml``. Game metadata
is read automatically from the ``[tool.retro-gamer]`` section of your
game's ``pyproject.toml``; you do not pass it on the command line.

.. code-block:: console

    % retro-gamer create --game MODULE --output DIR [OPTIONS]

**Required options:**

- ``--game MODULE``: Python module containing ``create_game()``
  (e.g. ``retro.examples.snake``). The ``[tool.retro-gamer]`` section
  is read from the ``pyproject.toml`` found in or above the module's
  source directory.
- ``--output DIR``: Directory to create for this training run.

**Hyperparameter options** (all optional; see :ref:`hyperparameters`):

- ``--training-episodes N``
- ``--n-layers N``
- ``--layer-size N``
- ``--learning-rate F``
- ``--lr-decay F``
- ``--gamma F``
- ``--epsilon-decay F``
- ``--epsilon-min F``
- ``--batch-size N``
- ``--memory-capacity N``
- ``--target-update-freq N``
- ``--max-turns-per-episode N``
- ``--exploration-turns N``
- ``--prioritize-experiences`` / ``--no-prioritize-experiences``

``retro-gamer train``
~~~~~~~~~~~~~~~~~~~~~

Train (or resume training) a DQN agent.

.. code-block:: console

    % retro-gamer train RUN_DIR [--resume CHECKPOINT]

``RUN_DIR`` must contain a ``config.toml`` generated by ``retro-gamer
create``. If ``--resume`` is given, training resumes from the specified
checkpoint file (relative or absolute path).

``retro-gamer play``
~~~~~~~~~~~~~~~~~~~~

Watch a trained agent play the game in the terminal.

.. code-block:: console

    % retro-gamer play RUN_DIR [--checkpoint NAME] [--framerate N]

``--checkpoint`` defaults to ``final``. You can specify a checkpoint by
name (e.g. ``ep_0100``) or by path relative to ``RUN_DIR/checkpoints/``.
``--framerate`` sets the target frames per second (default: 12). Press
Enter or Escape to quit.

``retro-gamer info``
~~~~~~~~~~~~~~~~~~~~

Print a summary of a training run: metadata, hyperparameters, recent
episode log, and available checkpoints.

.. code-block:: console

    % retro-gamer info RUN_DIR

Training run directory structure
--------------------------------

A training run is a self-contained directory with the following
contents:

.. code-block:: text

    runs/snake/
    ├── config.toml          # game description + hyperparameters
    ├── training.log         # architecture rationale + per-episode log
    └── checkpoints/
        ├── ep_0100.pt       # model weights at episode 100
        ├── ep_0200.pt
        ├── ...
        └── final.pt         # model weights at training completion

``config.toml`` is written by ``retro-gamer create`` and updated (with
the discovered character set and resolved hyperparameters) when
``retro-gamer train`` begins. Editing ``config.toml`` between ``create``
and ``train`` is the recommended way to adjust hyperparameters.

``training.log`` begins with the full architecture description
generated at training startup, followed by one line per episode in the
format::

    [EP NNNN] total_reward=F steps=N epsilon=F avg_loss=F

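
Per-episode lines in this format are easy to parse for plotting or analysis. The sketch below assumes the numeric fields are written as plain decimals; the example line is made up, and ``parse_episode_line`` is a hypothetical helper rather than part of ``retro-gamer``.

```python
import re

# Pattern for one per-episode line in the format shown above.
LINE = re.compile(
    r"\[EP (?P<episode>\d+)\] total_reward=(?P<total_reward>\S+) "
    r"steps=(?P<steps>\d+) epsilon=(?P<epsilon>\S+) avg_loss=(?P<avg_loss>\S+)"
)


def parse_episode_line(line):
    """Return a dict of fields, or None if the line is not an episode line."""
    match = LINE.search(line)
    if match is None:
        return None
    return {
        "episode": int(match["episode"]),
        "total_reward": float(match["total_reward"]),
        "steps": int(match["steps"]),
        "epsilon": float(match["epsilon"]),
        "avg_loss": float(match["avg_loss"]),
    }


# A made-up example line, not real training output.
record = parse_episode_line(
    "[EP 0042] total_reward=3.0 steps=118 epsilon=0.8102 avg_loss=0.0153"
)
```

Lines from the architecture description at the top of the log do not match the pattern and come back as ``None``, so they can be skipped when reading the file line by line.
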
Checkpoint files are PyTorch-serialized dictionaries containing the
model weights, the optimizer state, the current epsilon, and the total
number of training steps completed. They can be loaded with
``retro-gamer play`` or directly with the Python API.

Python API
----------

For advanced use, ``retro-gamer``'s components are importable as a
library.

.. code-block:: python

    from retro_gamer import GameMetadata, GameEnvironment, DQNTrainer
    from retro.examples.snake import create_game

    # Read metadata from [tool.retro-gamer] in the game's pyproject.toml
    metadata = GameMetadata.from_pyproject("retro.examples.snake")

    trainer = DQNTrainer(
        create_game, metadata, "runs/snake/",
        training_episodes=500,
        n_layers=2,
        layer_size=128,
    )
    trainer.train()

``GameEnvironment`` provides a gym-style interface for stepping through
a game programmatically:

.. code-block:: python

    from retro_gamer import GameEnvironment, GameMetadata
    from retro.examples.snake import create_game

    metadata = GameMetadata.from_pyproject("retro.examples.snake")
    env = GameEnvironment(create_game, metadata)
    obs = env.reset()                          # initial observation vector
    obs, reward, done = env.step("KEY_RIGHT")  # advance the game by one turn

The observation is a flat NumPy array of dtype ``float32``. For spatial
games, the first ``C × H × W`` elements are the board (channel-first
one-hot encoding); for non-spatial games, the board is encoded
``H × W × C`` and then flattened. Any ``observe_state`` values are
appended at the end.
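
The two layouts differ only in how a board cell maps to a position in the flat vector. The index arithmetic below is a sketch of that mapping in plain Python; the function names are hypothetical, and ``C``, ``H``, and ``W`` stand for the number of character slots, board height, and board width.

```python
def spatial_index(c, y, x, H, W):
    """Flat index of channel c at row y, column x in C x H x W layout."""
    return c * H * W + y * W + x


def flat_index(c, y, x, W, C):
    """Flat index of the same cell in H x W x C layout."""
    return (y * W + x) * C + c


def observation_length(C, H, W, n_state):
    """Board elements plus the appended observe_state values."""
    return C * H * W + n_state
```

For a 2 x 3 board with 6 character slots and two ``observe_state`` keys, the observation has ``6 * 2 * 3 + 2 = 38`` elements, and the state values occupy the last two positions in either layout.
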