Updates across the board

Chris Proctor
2026-06-22 16:41:31 -04:00
parent 5ca97dc5d0
commit 73624d1a0c
33 changed files with 3104 additions and 643 deletions


@@ -17,8 +17,6 @@ A complete example for the Snake game:
actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]
reward = "score"
character_set = ["@", "*", ">", "<", "^", "v"]
spatial = true
observe_state = []
You do not need to specify the board size: ``retro-gamer`` reads it
directly from your game's ``board_size`` attribute.
@@ -65,54 +63,156 @@ If omitted, ``retro-gamer`` runs an exploration phase to discover the
characters that appear in practice. The length of this phase is
controlled by the ``exploration_turns`` hyperparameter.
``spatial``
~~~~~~~~~~~
Preprocessing options
---------------------
**Optional; default ``true``.** Whether to treat the board as a 2D
spatial scene. When ``true``, the trainer uses a convolutional neural
network (CNN) that can detect patterns in the relative positions of
characters. When ``false``, the trainer uses a multilayer perceptron
(MLP) that sees the board as a flat list of numbers without positional
structure.
Preprocessing options live in the ``[preprocessing]`` section of a run's
``config.toml``. They control how the game's board and state are
transformed into the observation vector that the neural network sees.
``retro-gamer create`` writes sensible defaults; you can edit them by
hand before running ``retro-gamer train``.
.. note::
Changes to any ``[preprocessing]`` option—or to the game description
fields above—make existing checkpoints incompatible. Run
``retro-gamer clean`` before retraining after such changes.
``spatial`` (default: ``false``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Whether to treat the board as a 2D spatial scene. When ``true``, the
trainer uses a convolutional neural network (CNN); when ``false``, a
multilayer perceptron (MLP) that sees the board as a flat list of
numbers.
``board`` (default: ``true``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Whether to include the board encoding in the observation vector. Set
to ``false`` to train on game state variables only, with no board at
all. This is useful for games with small, enumerable state spaces where
a lookup table (classic Q-learning) is sufficient.
When ``board = false``:
- ``spatial`` must also be ``false`` (no board means no 2D scene for a CNN).
- At least one key must be listed in ``observe_state``.
- ``character_set`` is not required and character discovery is skipped.
.. code-block:: toml
spatial = true
[preprocessing]
board = false
observe_state = ["board_state"]
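The constraints listed for ``board = false`` can be checked mechanically. The following is a minimal sketch, not ``retro-gamer``'s actual validation code: the helper name ``check_preprocessing`` and its messages are hypothetical, and it assumes the ``[preprocessing]`` table has already been loaded into a dictionary.

```python
def check_preprocessing(preprocessing):
    """Return a list of problems found in a [preprocessing] table (hypothetical helper)."""
    problems = []
    board = preprocessing.get("board", True)        # documented default: true
    spatial = preprocessing.get("spatial", False)   # documented default: false
    observe_state = preprocessing.get("observe_state", [])
    if not board:
        # No board means there is no 2D scene for a CNN to look at.
        if spatial:
            problems.append("spatial must be false when board is false")
        # Without a board, the state keys are the entire observation.
        if not observe_state:
            problems.append("observe_state needs at least one key when board is false")
    return problems


print(check_preprocessing({"board": False, "observe_state": ["board_state"]}))  # []
```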
``observe_state``
~~~~~~~~~~~~~~~~~
``observe_state`` (default: ``[]``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Optional; default ``[]``.** A list of keys from the game's state
dictionary to append to the observation vector. The values must be
numbers (integers, floats, or booleans). The reward key must not
appear in this list.
A list of keys from ``game.state`` to include in the observation
vector, appended after the board encoding (or as the entire
observation when ``board = false``). Scalar values contribute one
element each; list or tuple values are flattened.
.. code-block:: toml
observe_state = ["lives", "level"]
observe_state = ["apple_dx", "apple_dy"]
The keys must be present in ``game.state`` at every step, initialized
in ``create_game()`` before the game starts. All values that are lists
or tuples must always have the same length from episode to episode.
.. warning::
``observe_state`` keys must be initialized to their final shape in
``create_game()`` before the game starts. If a key is absent or its
list length changes between episodes, training will crash with an
error explaining which key changed and by how much. This happens
because the neural network's input layer has a fixed size determined
at the start of training; it cannot adapt to a changing observation
shape mid-run.
Always initialize every observed key with a placeholder of the
correct type and length before the first ``game.step()`` call.
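To make the flattening rule concrete, here is a rough sketch of how observed keys could be turned into a flat vector. The function ``flatten_state`` is hypothetical and only handles scalars and one-level lists or tuples; the real encoder may differ.

```python
def flatten_state(state, keys):
    """Flatten the observed keys of a state dict into one list of floats."""
    vector = []
    for key in keys:
        value = state[key]  # raises KeyError if the key was never initialized
        if isinstance(value, (list, tuple)):
            vector.extend(float(v) for v in value)  # sequences contribute N elements
        else:
            vector.append(float(value))             # scalars contribute one element
    return vector


# Hypothetical state with two scalars and one fixed-length list.
state = {"apple_dx": 3, "apple_dy": -2, "board_state": [0, 1, 0, 2, 0, 0, 1, 0, 0]}
print(flatten_state(state, ["apple_dx", "apple_dy"]))  # [3.0, -2.0]
```

Because the result has one element per scalar and ``N`` per sequence, a key whose list length changes between episodes would change the vector length, which is why the placeholder must already have its final shape.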
``observe_state_sizes`` (auto-discovered)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A table mapping each ``observe_state`` key to its flat size (``1`` for
scalars, ``N`` for sequences of length N). This is written automatically
to ``config.toml`` the first time ``retro-gamer train`` runs, after the
trainer samples ``game.state`` to discover the actual sizes:
.. code-block:: toml
observe_state_sizes = {board_state = 9}
You do not need to set this manually. Once written, it is used to
detect changes in state shape when resuming training—an incompatible
change here requires running ``retro-gamer clean`` and starting fresh.
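The size discovery and the later shape check might look roughly like the sketch below. Both helper names are hypothetical, and the comparison logic is an assumption about how a mismatch could be reported, not a description of the trainer's code.

```python
def discover_sizes(state, keys):
    """Map each observed key to its flat size: 1 for scalars, N for sequences."""
    sizes = {}
    for key in keys:
        value = state[key]
        sizes[key] = len(value) if isinstance(value, (list, tuple)) else 1
    return sizes


def size_changes(recorded, current):
    """Return {key: (recorded_size, current_size)} for every key whose size differs."""
    return {
        key: (recorded[key], current.get(key))
        for key in recorded
        if recorded[key] != current.get(key)
    }


sizes = discover_sizes({"board_state": [0] * 9, "lives": 3}, ["board_state", "lives"])
print(sizes)  # {'board_state': 9, 'lives': 1}
```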
``egocentric`` (default: ``false``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When ``true``, the board observation is cropped to a square window
centred on a specific agent rather than the full board. This gives the
agent a local, first-person-like view and makes the observation
invariant to the agent's absolute position on the board.
Requires ``egocentric_player`` and ``egocentric_radius``.
``egocentric_player``
~~~~~~~~~~~~~~~~~~~~~~
The name of the agent to use as the centre of the egocentric crop.
Must match the ``name`` attribute of one of the game's agents.
.. code-block:: toml
egocentric_player = "Snake head"
``egocentric_radius``
~~~~~~~~~~~~~~~~~~~~~~
The half-side-length of the egocentric crop window, in cells. The
resulting observation covers a ``(2r+1) × (2r+1)`` region. Larger
values give the agent a wider view; smaller values focus it on the
immediate vicinity.
.. code-block:: toml
egocentric_radius = 8 # 17×17 window
When ``egocentric_radius`` is set, ``board_size`` in ``[metadata]`` is
automatically updated to ``[2r+1, 2r+1]`` so the network is sized
correctly.
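As an illustration of the crop geometry, the sketch below extracts a ``(2r+1) × (2r+1)`` window around an agent position. The function ``egocentric_crop`` is hypothetical, and filling out-of-bounds cells with a blank character is an assumption; the documentation does not say how ``retro-gamer`` pads the window at the board's edge.

```python
def egocentric_crop(board, center, radius, fill=" "):
    """Crop a (2r+1) x (2r+1) window of `board` centred on `center`.

    board  : list of equal-length rows (strings or lists of characters)
    center : (row, col) position of the agent
    fill   : character used for cells outside the board (an assumption)
    """
    rows, cols = len(board), len(board[0])
    r0, c0 = center
    window = []
    for r in range(r0 - radius, r0 + radius + 1):
        row = []
        for c in range(c0 - radius, c0 + radius + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(board[r][c] if inside else fill)
        window.append(row)
    return window


board = ["....", ".@..", "...*"]
window = egocentric_crop(board, (1, 1), 1)
print(window[1][1])  # '@' is always at the centre of the window
```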
.. _hyperparameters:
Hyperparameters
---------------
Hyperparameters are stored in the ``[hyperparameters]`` section of
``config.toml``. They can be set via ``retro-gamer create`` options or
edited directly.
Hyperparameters are split across two sections of ``config.toml``:
- ``[model]`` — network architecture (changing these requires starting fresh)
- ``[training]`` — learning algorithm parameters (safe to change at any time)
Both sections can be set via ``retro-gamer create`` options or edited directly.
Learning and optimization
~~~~~~~~~~~~~~~~~~~~~~~~~
``learning_rate`` (default: ``0.001``)
``learning_rate`` (default: ``0.0001``)
The step size used by the Adam optimizer when updating network
weights. Larger values converge faster but may be unstable; smaller
values are more stable but slower.
``lr_decay`` (default: ``0.995``)
``learning_rate_decay`` (default: ``0.9999``)
Multiplicative decay applied to the learning rate after each
episode. The learning rate decreases geometrically over training,
helping the network fine-tune later without destabilizing early
progress.
progress. With the default value, the learning rate decays to about
13 % of its starting value after 20 000 episodes.
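The figure quoted for ``learning_rate_decay`` follows from geometric decay. A short check, assuming the decay is applied exactly once per episode as described:

```python
def decayed_learning_rate(initial, decay, episodes):
    """Learning rate after `episodes` multiplicative decay steps."""
    return initial * decay ** episodes


# Fraction of the starting learning rate left after 20 000 episodes.
fraction = decayed_learning_rate(1.0, 0.9999, 20_000)
print(f"{fraction:.3f}")  # 0.135
```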
``gamma`` (default: ``0.99``)
The discount factor for future rewards. A value of 1.0 makes the
@@ -127,7 +227,7 @@ Exploration
random action with probability ``epsilon`` and exploits its current
Q-function with probability ``1 - epsilon``.
``epsilon_decay`` (default: ``0.995``)
``epsilon_decay`` (default: ``0.9997``)
Multiplicative decay applied to ``epsilon`` after each episode.
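To get a feel for how the decay rate sets the length of the exploration phase, the sketch below estimates how many episodes pass before ``epsilon`` reaches ``epsilon_min``. The starting value of ``1.0`` is an assumption, and ``episodes_until_floor`` is a hypothetical helper.

```python
import math


def episodes_until_floor(epsilon_start, epsilon_decay, epsilon_min):
    """Smallest number of episodes after which epsilon has decayed to epsilon_min."""
    return math.ceil(math.log(epsilon_min / epsilon_start) / math.log(epsilon_decay))


# Assumed starting epsilon of 1.0 and the documented epsilon_min of 0.05.
fast = episodes_until_floor(1.0, 0.995, 0.05)    # roughly 600 episodes
slow = episodes_until_floor(1.0, 0.9997, 0.05)   # roughly 10 000 episodes
print(fast, slow)
```

Under these assumptions, a decay of ``0.9997`` keeps the agent exploring for about half of a 20 000-episode run.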
``epsilon_min`` (default: ``0.05``)
@@ -142,31 +242,33 @@ Memory and sampling
The number of experiences sampled from the replay buffer per
training step.
``memory_capacity`` (default: ``10000``)
``memory_capacity`` (default: ``50000``)
The maximum number of experiences the replay buffer can hold. When
full, the oldest experiences are discarded.
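The discard-oldest behaviour is the same as a bounded double-ended queue. This sketch uses Python's ``collections.deque`` to illustrate it; whether ``retro-gamer`` uses a ``deque`` internally is not stated in the documentation.

```python
from collections import deque

# Tiny capacity for illustration; the documented default is 50000.
memory = deque(maxlen=3)
for experience in ["e1", "e2", "e3", "e4"]:
    memory.append(experience)  # appending to a full deque drops the oldest item

print(list(memory))  # ['e2', 'e3', 'e4']
```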
``prioritize_experiences`` (default: ``false``)
``prioritize_experiences`` (default: ``true``)
Whether to use prioritized experience replay. When ``true``,
experiences with larger TD errors are sampled more frequently.
This often improves sample efficiency at a modest computational
cost.
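The idea behind prioritized sampling can be sketched with weighted random choice. This is a simplified illustration only: it samples in proportion to the absolute TD error and omits the priority exponent and importance-sampling correction used in full prioritized replay. The function name and the small ``1e-6`` offset are assumptions.

```python
import random


def sample_prioritized(experiences, td_errors, batch_size, rng=random):
    """Sample a batch, favouring experiences with larger absolute TD error."""
    # The small offset keeps zero-error experiences from never being sampled.
    weights = [abs(error) + 1e-6 for error in td_errors]
    return rng.choices(experiences, weights=weights, k=batch_size)


rng = random.Random(0)
batch = sample_prioritized(["low", "high"], [0.0, 10.0], 100, rng)
print(batch.count("high"))  # almost every draw is the high-error experience
```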
Network architecture
~~~~~~~~~~~~~~~~~~~~
Model architecture
~~~~~~~~~~~~~~~~~~
``n_layers`` (default: ``2``)
The number of hidden layers in the MLP head (for spatial games,
this follows the CNN; for non-spatial games, it is the full
network).
These live in the ``[model]`` section. Changing them requires starting fresh
(run ``retro-gamer clean`` before retraining).
``layer_size`` (default: ``128``)
The width (number of units) in each hidden layer.
``hidden_sizes`` (default: ``[128, 64]``)
A list of integers giving the size of each hidden layer in the MLP
head. The default creates two layers: 128 units then 64. For spatial
games this follows the CNN; for non-spatial games it is the full
network. Larger or deeper networks can represent more complex
Q-functions but train more slowly and may need more episodes.
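To see how ``hidden_sizes`` affects network size, the sketch below lists the weight-matrix shapes of a fully connected network and counts its parameters. It assumes plain dense layers with biases and one output per action; the input size of 9 and the 4 actions are illustrative values, and both helper names are hypothetical.

```python
def layer_shapes(input_size, hidden_sizes, n_actions):
    """Return (inputs, outputs) for each dense layer, including the output layer."""
    sizes = [input_size, *hidden_sizes, n_actions]
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(shapes):
    """Weights plus biases for every dense layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


shapes = layer_shapes(9, [128, 64], 4)
print(shapes)                   # [(9, 128), (128, 64), (64, 4)]
print(parameter_count(shapes))  # 9796
```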
Training duration
~~~~~~~~~~~~~~~~~
``training_episodes`` (default: ``1000``)
``training_episodes`` (default: ``20000``)
The total number of game episodes to run. Each episode runs until
the game ends or ``max_turns_per_episode`` turns have elapsed.
@@ -175,12 +277,18 @@ Training duration
indefinitely (for example, if the agent finds a way to avoid
dying).
``target_update_freq`` (default: ``100``)
``target_update_freq`` (default: ``500``)
How many training steps between updates of the target network.
More frequent updates make training targets move faster (less
stable); less frequent updates make them more stable but slower
to reflect new learning.
``train_every`` (default: ``4``)
Run one training step every N game steps. Higher values speed up
episode collection at the cost of fewer gradient updates per
experience. The default of 4 is a good balance for most games;
set to 1 to train on every step.
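The interaction between ``train_every`` and ``target_update_freq`` can be sketched by counting events over a run of game steps. This is a simplification: it assumes training begins on the first eligible step, whereas a real trainer may wait until the replay buffer holds enough experiences. The function ``count_updates`` is hypothetical.

```python
def count_updates(game_steps, train_every=4, target_update_freq=500):
    """Count training steps and target-network updates over `game_steps` game steps."""
    training_steps = 0
    target_updates = 0
    for step in range(1, game_steps + 1):
        if step % train_every == 0:
            training_steps += 1  # one gradient update every `train_every` game steps
            if training_steps % target_update_freq == 0:
                target_updates += 1  # copy online weights into the target network
    return training_steps, target_updates


print(count_updates(10_000))  # (2500, 5)
```

With the defaults, the target network is refreshed once every 2000 game steps.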
Character discovery
~~~~~~~~~~~~~~~~~~~
@@ -207,23 +315,26 @@ game's ``pyproject.toml``; you do not pass it on the command line.
.. code-block:: console
% retro-gamer create --game MODULE --output DIR [OPTIONS]
% retro-gamer create --game GAME --output DIR [OPTIONS]
**Required options:**
- ``--game MODULE``: Python module containing ``create_game()``
(e.g. ``retro.examples.snake``). The ``[tool.retro-gamer]`` section
is read from the ``pyproject.toml`` found in or above the module's
source directory.
- ``--game GAME``: Your game, specified as a file path or a Python
module name:
- File path: ``--game my_game.py`` or ``--game my_game/``
- Module name: ``--game retro.examples.snake``
The ``[tool.retro-gamer]`` section is read from the ``pyproject.toml``
found in or above the game file.
- ``--output DIR`` — Directory to create for this training run.
**Hyperparameter options** (all optional; see :ref:`hyperparameters`):
- ``--training-episodes N``
- ``--n-layers N``
- ``--layer-size N``
- ``--hidden-sizes SIZES`` — comma-separated, e.g. ``512,256``
- ``--learning-rate F``
- ``--lr-decay F``
- ``--learning-rate-decay F``
- ``--gamma F``
- ``--epsilon-decay F``
- ``--epsilon-min F``
@@ -232,20 +343,40 @@ game's ``pyproject.toml``; you do not pass it on the command line.
- ``--target-update-freq N``
- ``--max-turns-per-episode N``
- ``--exploration-turns N``
- ``--train-every N``
- ``--prioritize-experiences`` / ``--no-prioritize-experiences``
``retro-gamer train``
~~~~~~~~~~~~~~~~~~~~~
Train (or resume training) a DQN agent.
Train a DQN agent.
.. code-block:: console
% retro-gamer train RUN_DIR [--resume CHECKPOINT]
% retro-gamer train RUN_DIR
``RUN_DIR`` must contain a ``config.toml`` generated by ``retro-gamer
create``. If ``--resume`` is given, training resumes from the specified
checkpoint file (relative or absolute path).
create``. If checkpoints already exist in ``RUN_DIR``, training
automatically resumes from the latest one so prior work is never lost.
If all configured episodes have already been completed, the command
prints a message and exits immediately. To keep training, increase
``training_episodes`` in ``config.toml`` and run again.
**Incompatible changes.** Some config changes make existing checkpoints
unusable. If you change any of the following, ``retro-gamer train`` will
detect the mismatch and refuse to resume, with a clear explanation:
- ``actions``, ``reward``, ``character_set``, ``board_size``
(``[metadata]``) — game description
- ``spatial``, ``board``, ``observe_state``, ``observe_state_sizes``,
``egocentric``, ``egocentric_player``, ``egocentric_radius``
(``[preprocessing]``) — observation encoding
- ``hidden_sizes`` (``[model]``) — network architecture
Run ``retro-gamer clean RUN_DIR`` to remove the old checkpoints and start
fresh. Other hyperparameter changes (learning rate, epsilon, etc.) are
safe and take effect immediately on the next training run.
``retro-gamer play``
~~~~~~~~~~~~~~~~~~~~
@@ -256,16 +387,32 @@ Watch a trained agent play the game in the terminal.
% retro-gamer play RUN_DIR [--checkpoint NAME] [--framerate N]
``--checkpoint`` defaults to ``final``. You can specify a checkpoint by
name (e.g. ``ep_0100``) or by path relative to ``RUN_DIR/checkpoints/``.
By default, the latest available checkpoint is loaded. Use
``--checkpoint`` to load a specific one by name (e.g. ``ep_0100``).
``--framerate`` sets the target frames per second (default: 12). Press
Enter or Escape to quit.
``retro-gamer clean``
~~~~~~~~~~~~~~~~~~~~~
Remove all checkpoints and the training log from a run directory.
.. code-block:: console
% retro-gamer clean RUN_DIR
Prompts for confirmation before deleting. Use ``--yes`` / ``-y`` to skip
the prompt. The ``config.toml`` is preserved so you can run
``retro-gamer train`` immediately to start fresh with the same settings.
Use this after making an incompatible change (see ``retro-gamer train``
above) or any time you want to restart training from scratch.
``retro-gamer info``
~~~~~~~~~~~~~~~~~~~~~
Print a summary of a training run: metadata, hyperparameters, recent
episode log, and available checkpoints.
checkpoint log, and available checkpoints.
.. code-block:: console
@@ -285,60 +432,49 @@ contents:
└── checkpoints/
├── ep_0100.pt # model weights at episode 100
├── ep_0200.pt
├── ...
└── final.pt # model weights at training completion
└── ... # one file saved every 100 episodes
``config.toml`` is written by ``retro-gamer create`` and updated (with
the discovered character set and resolved hyperparameters) when
``retro-gamer train`` begins. Editing ``config.toml`` between ``create``
and ``train`` is the recommended way to adjust hyperparameters.
``retro-gamer train`` begins. It has five sections: ``[game]``,
``[metadata]``, ``[preprocessing]``, ``[model]``, and ``[training]``.
Editing ``config.toml`` between ``create`` and ``train`` is the
recommended way to adjust hyperparameters.
``training.log`` begins with the full architecture description
generated at training startup, followed by one line per episode in the
format::
``training.log`` begins with the full network architecture description,
then one line per checkpoint (every 100 episodes) in the format::
[EP NNNN] total_reward=F steps=N epsilon=F avg_loss=F
[ep_NNNN] ep=SSSS-NNNN avg_reward=F avg_steps=N epsilon=F avg_loss=F time=Xm Xs total=Xm Xs
Checkpoint files are PyTorch state dictionaries containing model
weights, optimizer state, the current epsilon, and the total number of
training steps completed. They can be loaded with
``retro-gamer play`` or directly with the Python API.
Each field averages over the episodes since the previous checkpoint:
- ``ep=SSSS-NNNN`` — episode range covered by this entry
- ``avg_reward`` — mean total reward per episode (positive = good)
- ``avg_steps`` — mean episode length in game turns
- ``epsilon`` — current exploration rate (approaches ``epsilon_min`` over time)
- ``avg_loss`` — mean Huber loss across training steps (should decrease as learning
stabilises). With ``q`` the predicted Q-value and ``t`` the training target, Huber
loss equals ½·(q−t)² when |q−t| ≤ 1 and |q−t|−½ otherwise, so large errors are
penalised linearly rather than quadratically. Values in the range 0–10 are
typical; a slow downward trend over thousands of episodes is the healthy
pattern. A loss that grows without bound indicates a learning rate that is
too high.
- ``time`` — wall-clock time for this checkpoint interval
- ``total`` — cumulative training time across all sessions
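For reference, the Huber loss reported as ``avg_loss`` can be written out for a single prediction. This sketch uses the standard definition with a threshold of 1, which matches the two cases given above; the names ``q`` (predicted Q-value) and ``t`` (target) are illustrative.

```python
def huber_loss(q, t, delta=1.0):
    """Huber loss for one prediction q against target t."""
    error = abs(q - t)
    if error <= delta:
        return 0.5 * error ** 2           # quadratic for small errors
    return delta * (error - 0.5 * delta)  # linear for large errors


print(huber_loss(1.5, 1.0))   # 0.125
print(huber_loss(10.0, 0.0))  # 9.5
```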
When training is resumed, a ``=== Resumed from ... ===`` line is appended
so the log records the full history of a run across multiple sessions.
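Checkpoint lines in ``training.log`` can be parsed with a regular expression for plotting or analysis. The sketch below follows the documented field order, but how each number is formatted is an assumption and the sample line is invented for illustration. ``parse_log_line`` is a hypothetical helper, not part of ``retro-gamer``.

```python
import re

# Pattern for the documented checkpoint-line format (number formatting is assumed).
LOG_LINE = re.compile(
    r"\[ep_(?P<checkpoint>\d+)\]\s+"
    r"ep=(?P<first>\d+)-(?P<last>\d+)\s+"
    r"avg_reward=(?P<avg_reward>\S+)\s+"
    r"avg_steps=(?P<avg_steps>\S+)\s+"
    r"epsilon=(?P<epsilon>\S+)\s+"
    r"avg_loss=(?P<avg_loss>\S+)"
)


def parse_log_line(line):
    """Return a dict of fields, or None for lines that are not checkpoint entries."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None  # architecture description, "Resumed from" markers, and so on
    entry = {
        "checkpoint": int(match["checkpoint"]),
        "episodes": (int(match["first"]), int(match["last"])),
    }
    for name in ("avg_reward", "avg_steps", "epsilon", "avg_loss"):
        entry[name] = float(match[name])
    return entry


# Invented sample line in the documented format.
sample = ("[ep_0200] ep=0101-0200 avg_reward=3.20 avg_steps=84 "
          "epsilon=0.9418 avg_loss=0.3125 time=0m 42s total=1m 25s")
entry = parse_log_line(sample)
print(entry["episodes"], entry["avg_loss"])  # (101, 200) 0.3125
```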
Python API
----------
For advanced use, ``retro-gamer``'s components are importable as a
library.
library. See the :doc:`api` reference for full details.
.. code-block:: python
from retro_gamer import GameMetadata, GameEnvironment, DQNTrainer
from retro_gamer import GameMetadata, DQNTrainer
from retro.examples.snake import create_game
# Read metadata from [tool.retro-gamer] in the game's pyproject.toml
metadata = GameMetadata.from_pyproject("retro.examples.snake")
trainer = DQNTrainer(
create_game, metadata, "runs/snake/",
training_episodes=500,
n_layers=2,
layer_size=128,
)
trainer = DQNTrainer(create_game, metadata, "runs/snake/")
trainer.train()
``GameEnvironment`` provides a gym-style interface for stepping through
a game programmatically:
.. code-block:: python
from retro_gamer import GameEnvironment
env = GameEnvironment(create_game, metadata)
obs = env.reset() # returns initial observation vector
obs, reward, done = env.step("KEY_RIGHT")
The observation is a flat NumPy array of dtype ``float32``. For spatial
games, the first ``C × H × W`` elements are the board (channel-first
one-hot encoding); for non-spatial games, the board is encoded
``H × W × C`` and then flattened. Any ``observe_state`` values are
appended at the end.
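The two board layouts differ only in the order of the elements. The sketch below builds both from a small board using plain lists. It assumes that characters outside ``character_set`` are encoded as all zeros, which the documentation does not confirm, and both function names are hypothetical.

```python
def one_hot_channel_first(board, character_set):
    """Flat C x H x W encoding: one full H x W plane per character."""
    flat = []
    for ch in character_set:
        for row in board:
            flat.extend(1.0 if cell == ch else 0.0 for cell in row)
    return flat


def one_hot_channel_last(board, character_set):
    """Flat H x W x C encoding: one C-element one-hot vector per cell."""
    flat = []
    for row in board:
        for cell in row:
            flat.extend(1.0 if cell == ch else 0.0 for ch in character_set)
    return flat


board = ["@*", ".."]          # 2 x 2 board
character_set = ["@", "*"]    # 2 channels; "." is not in the set
print(one_hot_channel_first(board, character_set))
# [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(one_hot_channel_last(board, character_set))
# [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```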