Reference
=========

Game description fields
-----------------------

Game descriptions are written in the ``[tool.retro-gamer]`` section of
your game project's ``pyproject.toml``. ``retro-gamer create`` reads
this section and copies the metadata into the training run's
``config.toml``, where it can also be inspected or hand-edited.

A complete example for the Snake game:

.. code-block:: toml

    [tool.retro-gamer]
    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]
    reward = "score"
    character_set = ["@", "*", ">", "<", "^", "v"]

You do not need to specify the board size: ``retro-gamer`` reads it
directly from your game's ``board_size`` attribute.

The fields are described below.

``actions``
~~~~~~~~~~~

**Required.** A list of keystroke names the agent may send to the game
each turn. Use arrow key names for directional games, or single
characters for character-key games.

.. code-block:: toml

    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]

The agent also has access to a no-op action (doing nothing). The total
number of actions in the Q-network output is ``len(actions) + 1``.

``reward``
~~~~~~~~~~

**Required.** The key in the game's state dictionary to use as the
reward signal. The reward computed for each turn is the *change* in
this value from the previous turn.

.. code-block:: toml

    reward = "score"

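
As a rough illustration of the "change since the previous turn" rule, the
per-turn reward could be computed as below. This is a minimal sketch, not
the trainer's actual code: ``RewardTracker`` and its methods are
hypothetical names, and it assumes the first turn is measured against the
value seen when the episode starts.

.. code-block:: python

    class RewardTracker:
        """Turns a running total (e.g. ``score``) into per-turn rewards."""

        def __init__(self, key):
            self.key = key
            self.previous = None

        def reset(self, state):
            # Remember the starting value so the first turn has a baseline.
            self.previous = state[self.key]

        def step(self, state):
            # Reward is the change in the tracked value since the last turn.
            current = state[self.key]
            reward = current - self.previous
            self.previous = current
            return reward


    tracker = RewardTracker("score")
    tracker.reset({"score": 0})
    rewards = [tracker.step({"score": s}) for s in (0, 10, 10, 25)]
    print(rewards)  # [0, 10, 0, 15]

A score that only ever increases therefore yields a sparse reward signal:
zero on most turns, and a positive value on the turns where points are
scored.
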

``character_set``
~~~~~~~~~~~~~~~~~

**Optional.** A list of single characters that may appear on the board.
Each character occupies one "slot" in the one-hot encoding. Characters
not in this list are treated as empty space.

.. code-block:: toml

    character_set = ["@", "*", ">", "<", "^", "v"]

If omitted, ``retro-gamer`` runs an exploration phase to discover the
characters that appear in practice. The length of this phase is
controlled by the ``exploration_turns`` hyperparameter.

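
To make the "one slot per character" idea concrete, here is a small sketch
of one-hot encoding a board. It is an illustration under assumptions rather
than ``retro-gamer``'s implementation: the function name
``one_hot_board``, the list-of-strings board format, and the choice to
encode unlisted characters as all zeros are hypothetical.

.. code-block:: python

    def one_hot_board(rows, character_set):
        """Encode each cell as a list with one slot per known character."""
        index = {ch: i for i, ch in enumerate(character_set)}
        encoded = []
        for row in rows:
            for ch in row:
                cell = [0] * len(character_set)
                # Characters outside the set (including spaces) stay all-zero,
                # i.e. they are treated as empty space.
                if ch in index:
                    cell[index[ch]] = 1
                encoded.extend(cell)
        return encoded


    character_set = ["@", "*", ">", "<", "^", "v"]
    board = ["> *", " #@"]
    vector = one_hot_board(board, character_set)
    print(len(vector))  # 36 (6 cells x 6 slots)
    print(vector[:6])   # [0, 0, 1, 0, 0, 0], the ">" cell

The length of the encoded board grows with both the board area and the size
of the character set, which is one reason to list only the characters the
agent needs to distinguish.
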

Preprocessing options
---------------------

Preprocessing options live in the ``[preprocessing]`` section of a run's
``config.toml``. They control how the game's board and state are
transformed into the observation vector that the neural network sees.
``retro-gamer create`` writes sensible defaults; you can edit them by
hand before running ``retro-gamer train``.

.. note::

    Changes to any ``[preprocessing]`` option, or to the game description
    fields above, make existing checkpoints incompatible. Run
    ``retro-gamer clean`` before retraining after such changes.

``spatial`` (default: ``false``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Whether to treat the board as a 2D spatial scene. When ``true``, the
trainer uses a convolutional neural network (CNN); when ``false``, a
multilayer perceptron (MLP) that sees the board as a flat list of
numbers.

``board`` (default: ``true``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Whether to include the board encoding in the observation vector. Set
to ``false`` to train on game state variables only, with no board at
all. This is useful for games with small, enumerable state spaces where
a lookup table (classic Q-learning) is sufficient.

When ``board = false``:

- ``spatial`` must also be ``false`` (no board means no 2D scene for a CNN).
- At least one key must be listed in ``observe_state``.
- ``character_set`` is not required and character discovery is skipped.

.. code-block:: toml

    [preprocessing]
    board = false
    observe_state = ["board_state"]

``observe_state`` (default: ``[]``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A list of keys from ``game.state`` to include in the observation
vector, appended after the board encoding (or as the entire
observation when ``board = false``). Scalar values contribute one
element each; list or tuple values are flattened.

.. code-block:: toml

    observe_state = ["apple_dx", "apple_dy"]

The keys must be present in ``game.state`` at every step, initialized
in ``create_game()`` before the game starts. All values that are lists
or tuples must always have the same length from episode to episode.

.. warning::

    ``observe_state`` keys must be initialized to their final shape in
    ``create_game()`` before the game starts. If a key is absent or its
    list length changes between episodes, training will crash with an
    error explaining which key changed and by how much. This happens
    because the neural network's input layer has a fixed size determined
    at the start of training; it cannot adapt to a changing observation
    shape mid-run.

    Always initialize every observed key with a placeholder of the
    correct type and length before the first ``game.step()`` call.

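
The flattening rule for ``observe_state`` can be sketched as follows. This
is an illustration under assumptions, not the library's code:
``flatten_observed_state`` is a hypothetical helper, the plain dictionary
stands in for ``game.state``, and only one level of nesting is handled.

.. code-block:: python

    def flatten_observed_state(state, keys):
        """Flatten the selected state values into one list of floats."""
        flat = []
        for key in keys:
            value = state[key]  # raises KeyError if the key was never initialized
            if isinstance(value, (list, tuple)):
                # Sequences contribute one element per item.
                flat.extend(float(v) for v in value)
            else:
                # Scalars contribute a single element.
                flat.append(float(value))
        return flat


    state = {
        "apple_dx": 3,
        "apple_dy": -2,
        "board_state": [0, 1, 2, 0, 0, 1, 0, 0, 0],
    }
    keys = ["apple_dx", "apple_dy", "board_state"]
    observation = flatten_observed_state(state, keys)
    print(len(observation))  # 11 (1 + 1 + 9)

Because the result feeds a fixed-size input layer, a ``board_state`` list
that had 9 items in one episode and 10 in the next would change the length
of ``observation``, which is the situation the warning above describes.
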

``observe_state_sizes`` (auto-discovered)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A table mapping each ``observe_state`` key to its flat size (``1`` for
scalars, ``N`` for sequences of length N). This is written automatically
to ``config.toml`` the first time ``retro-gamer train`` runs, after the
trainer samples ``game.state`` to discover the actual sizes:

.. code-block:: toml

    observe_state_sizes = {board_state = 9}

You do not need to set this manually. Once written, it is used to
detect changes in state shape when resuming training. An incompatible
change here requires running ``retro-gamer clean`` and starting fresh.

``egocentric`` (default: ``false``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When ``true``, the board observation is cropped to a square window
centred on a specific agent rather than the full board. This gives the
agent a local, first-person-like view and makes the observation
invariant to the agent's absolute position on the board.

Requires ``egocentric_player`` and ``egocentric_radius``.

``egocentric_player``
~~~~~~~~~~~~~~~~~~~~~~

The name of the agent to use as the centre of the egocentric crop.
Must match the ``name`` attribute of one of the game's agents.

.. code-block:: toml

    egocentric_player = "Snake head"

``egocentric_radius``
~~~~~~~~~~~~~~~~~~~~~~

The half-side-length of the egocentric crop window, in cells. The
resulting observation covers a ``(2r+1) × (2r+1)`` region. Larger
values give the agent a wider view; smaller values focus it on the
immediate vicinity.

.. code-block:: toml

    egocentric_radius = 8  # 17×17 window

When ``egocentric_radius`` is set, ``board_size`` in ``[metadata]`` is
automatically updated to ``[2r+1, 2r+1]`` so the network is sized
correctly.


.. _hyperparameters:

Hyperparameters
---------------

Hyperparameters are split across two sections of ``config.toml``:

- ``[model]``: network architecture (changing these requires starting fresh)
- ``[training]``: learning algorithm parameters (safe to change at any time)

Both sections can be set via ``retro-gamer create`` options or edited directly.

Learning and optimization
~~~~~~~~~~~~~~~~~~~~~~~~~

``learning_rate`` (default: ``0.0001``)
    The step size used by the Adam optimizer when updating network
    weights. Larger values converge faster but may be unstable; smaller
    values are more stable but slower.

``learning_rate_decay`` (default: ``0.9999``)
    Multiplicative decay applied to the learning rate after each
    episode. The learning rate decreases geometrically over training,
    helping the network fine-tune later without destabilizing early
    progress. With the default value, the learning rate decays to about
    13.5 % of its starting value after 20 000 episodes.

``gamma`` (default: ``0.99``)
    The discount factor for future rewards. A value of 1.0 makes the
    agent value all future rewards equally; smaller values make the
    agent increasingly myopic.

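
The decay figure quoted for ``learning_rate_decay`` follows from plain
geometric decay. The short calculation below reproduces it; the variable
names simply mirror the config keys, and it assumes the decay is applied
exactly once per episode as described.

.. code-block:: python

    learning_rate = 0.0001
    learning_rate_decay = 0.9999
    episodes = 20_000

    # After n episodes the rate is learning_rate * decay ** n.
    remaining_fraction = learning_rate_decay ** episodes
    final_rate = learning_rate * remaining_fraction

    print(f"{remaining_fraction:.3f}")  # 0.135
    print(f"{final_rate:.2e}")          # 1.35e-05

The same formula is a quick way to check how a different decay value or a
longer run would change the final learning rate before committing to a
long training session.
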

Exploration
~~~~~~~~~~~

``epsilon`` (default: ``1.0``)
    The initial exploration rate. At each turn, the agent takes a
    random action with probability ``epsilon`` and exploits its current
    Q-function with probability ``1 - epsilon``.

``epsilon_decay`` (default: ``0.9997``)
    Multiplicative decay applied to ``epsilon`` after each episode.

``epsilon_min`` (default: ``0.05``)
    The floor below which ``epsilon`` will not fall. A small amount of
    continued exploration prevents the agent from becoming permanently
    committed to a suboptimal policy.

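
The three exploration settings combine into a schedule that can be sketched
as follows. This is a minimal illustration of epsilon-greedy selection with
a decaying rate, not the trainer's code: ``epsilon_at`` and
``choose_action`` are hypothetical helpers whose defaults mirror the
documented values.

.. code-block:: python

    import random


    def epsilon_at(episode, epsilon=1.0, epsilon_decay=0.9997, epsilon_min=0.05):
        """Exploration rate after ``episode`` completed episodes."""
        return max(epsilon_min, epsilon * epsilon_decay ** episode)


    def choose_action(q_values, epsilon, rng=random):
        """Epsilon-greedy choice over a list of Q-values."""
        if rng.random() < epsilon:
            return rng.randrange(len(q_values))  # explore: uniform random action
        # Exploit: index of the largest Q-value.
        return max(range(len(q_values)), key=q_values.__getitem__)


    print(round(epsilon_at(1_000), 3))                  # 0.741
    print(epsilon_at(20_000))                           # 0.05 (the floor)
    print(choose_action([0.1, 0.9, 0.3], epsilon=0.0))  # 1

With the default values, ``epsilon`` reaches the ``epsilon_min`` floor after
roughly 10 000 episodes, about halfway through a default-length run.
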

Memory and sampling
~~~~~~~~~~~~~~~~~~~

``batch_size`` (default: ``64``)
    The number of experiences sampled from the replay buffer per
    training step.

``memory_capacity`` (default: ``50000``)
    The maximum number of experiences the replay buffer can hold. When
    full, the oldest experiences are discarded.

``prioritize_experiences`` (default: ``true``)
    Whether to use prioritized experience replay. When ``true``,
    experiences with larger TD errors are sampled more frequently.
    This often improves sample efficiency at a modest computational
    cost.

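
The behaviour of these three settings can be sketched with a toy buffer.
This is a deliberately simplified illustration, not ``retro-gamer``'s
implementation: ``ReplayBuffer`` is a hypothetical class, and sampling in
direct proportion to the absolute TD error is an assumption (full
prioritized replay schemes also use a priority exponent and
importance-sampling corrections).

.. code-block:: python

    import random
    from collections import deque


    class ReplayBuffer:
        """Fixed-capacity buffer; the oldest experience is dropped when full."""

        def __init__(self, capacity):
            self.experiences = deque(maxlen=capacity)
            self.priorities = deque(maxlen=capacity)

        def add(self, experience, td_error=1.0):
            self.experiences.append(experience)
            # A small constant keeps zero-error experiences sampleable.
            self.priorities.append(abs(td_error) + 1e-3)

        def sample(self, batch_size, prioritize=True, rng=random):
            weights = list(self.priorities) if prioritize else None
            # Sampling with replacement, weighted by priority when enabled.
            return rng.choices(list(self.experiences), weights=weights, k=batch_size)


    buffer = ReplayBuffer(capacity=3)
    for step in range(5):
        buffer.add(("state", step), td_error=step)
    print(list(buffer.experiences))          # [('state', 2), ('state', 3), ('state', 4)]
    print(len(buffer.sample(batch_size=4)))  # 4

Here ``capacity`` plays the role of ``memory_capacity``, the ``batch_size``
argument matches the setting of the same name, and ``prioritize`` switches
between weighted and uniform sampling.
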

Model architecture
~~~~~~~~~~~~~~~~~~

These live in the ``[model]`` section. Changing them requires starting fresh
(run ``retro-gamer clean`` before retraining).

``hidden_sizes`` (default: ``[128, 64]``)
    A list of integers giving the size of each hidden layer in the MLP
    head. The default creates two layers: 128 units then 64. For spatial
    games this follows the CNN; for non-spatial games it is the full
    network. Larger or deeper networks can represent more complex
    Q-functions but train more slowly and may need more episodes.

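
To get a feel for how ``hidden_sizes`` affects network size, the sketch
below lists the fully connected layers of a non-spatial network and counts
their parameters. The helper names are hypothetical, and the 10×10 board
with six character slots is an invented example rather than a measurement
of any bundled game.

.. code-block:: python

    def mlp_layer_shapes(input_size, hidden_sizes, num_actions):
        """List the (inputs, outputs) pair of each fully connected layer."""
        sizes = [input_size, *hidden_sizes, num_actions]
        return list(zip(sizes[:-1], sizes[1:]))


    def parameter_count(shapes):
        """Weights plus biases for a stack of fully connected layers."""
        return sum(n_in * n_out + n_out for n_in, n_out in shapes)


    # Hypothetical 10x10 board with 6 one-hot slots per cell, and
    # len(actions) + 1 = 5 outputs (four arrow keys plus the no-op).
    shapes = mlp_layer_shapes(10 * 10 * 6, [128, 64], 5)
    print(shapes)                   # [(600, 128), (128, 64), (64, 5)]
    print(parameter_count(shapes))  # 85509

In this example most parameters sit in the first layer, so the observation
size (board area times character slots) matters at least as much as the
hidden layer sizes themselves.
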

Training duration
~~~~~~~~~~~~~~~~~

``training_episodes`` (default: ``20000``)
    The total number of game episodes to run. Each episode runs until
    the game ends or ``max_turns_per_episode`` turns have elapsed.

``max_turns_per_episode`` (default: ``2000``)
    A safety cutoff preventing a single episode from running
    indefinitely (for example, if the agent finds a way to avoid
    dying).

``target_update_freq`` (default: ``500``)
    How many training steps between updates of the target network.
    More frequent updates make training targets move faster (less
    stable); less frequent updates make them more stable but slower
    to reflect new learning.

``train_every`` (default: ``4``)
    Run one training step every N game steps. Higher values speed up
    episode collection at the cost of fewer gradient updates per
    experience. The default of 4 is a good balance for most games;
    set to 1 to train on every step.

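
The interaction between ``train_every`` and ``target_update_freq`` can be
sketched by counting events over a number of game steps. ``count_updates``
is a hypothetical helper, and the sketch assumes training starts on the
first eligible step (a real trainer may first wait for the replay buffer to
fill).

.. code-block:: python

    def count_updates(game_steps, train_every=4, target_update_freq=500):
        """Count training steps and target-network syncs over ``game_steps``."""
        training_steps = 0
        target_updates = 0
        for step in range(1, game_steps + 1):
            if step % train_every == 0:
                training_steps += 1  # one gradient update
                if training_steps % target_update_freq == 0:
                    target_updates += 1  # sync the target network
        return training_steps, target_updates


    print(count_updates(10_000))                 # (2500, 5)
    print(count_updates(10_000, train_every=1))  # (10000, 20)

Since ``target_update_freq`` counts training steps rather than game steps,
raising ``train_every`` also spaces target-network updates further apart in
game time.
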

Character discovery
~~~~~~~~~~~~~~~~~~~

``exploration_turns`` (default: ``200``)
    When ``character_set`` is not specified, the number of random
    turns to run at the start of training to discover which
    characters appear on the board.

``unknown_character_strategy`` (default: ``"ignore"``)
    What to do when a character appears during training that is not
    in the established ``character_set``. ``"ignore"`` treats it as
    an empty cell; ``"extend"`` rebuilds the model with an extended
    character set.

CLI reference
-------------

``retro-gamer create``
~~~~~~~~~~~~~~~~~~~~~~

Create a new training run directory with ``config.toml``. Game metadata
is read automatically from the ``[tool.retro-gamer]`` section of your
game's ``pyproject.toml``; you do not pass it on the command line.

.. code-block:: console

    % retro-gamer create --game GAME --output DIR [OPTIONS]

**Required options:**

- ``--game GAME``: Your game, specified as a file path or a Python
  module name:

  - File path: ``--game my_game.py`` or ``--game my_game/``
  - Module name: ``--game retro.examples.snake``

  The ``[tool.retro-gamer]`` section is read from the ``pyproject.toml``
  found in or above the game file.

- ``--output DIR``: Directory to create for this training run.

**Hyperparameter options** (all optional; see :ref:`hyperparameters`):

- ``--training-episodes N``
- ``--hidden-sizes SIZES``: comma-separated, e.g. ``512,256``
- ``--learning-rate F``
- ``--learning-rate-decay F``
- ``--gamma F``
- ``--epsilon-decay F``
- ``--epsilon-min F``
- ``--batch-size N``
- ``--memory-capacity N``
- ``--target-update-freq N``
- ``--max-turns-per-episode N``
- ``--exploration-turns N``
- ``--train-every N``
- ``--prioritize-experiences`` / ``--no-prioritize-experiences``


``retro-gamer train``
~~~~~~~~~~~~~~~~~~~~~

Train a DQN agent.

.. code-block:: console

    % retro-gamer train RUN_DIR

``RUN_DIR`` must contain a ``config.toml`` generated by
``retro-gamer create``. If checkpoints already exist in ``RUN_DIR``,
training automatically resumes from the latest one, so prior work is
preserved.

If all configured episodes have already been completed, the command
prints a message and exits immediately. To keep training, increase
``training_episodes`` in ``config.toml`` and run again.

**Incompatible changes.** Some config changes make existing checkpoints
unusable. If you change any of the following, ``retro-gamer train`` will
detect the mismatch and refuse to resume, with a clear explanation:

- ``actions``, ``reward``, ``character_set``, ``board_size``
  (``[metadata]``): game description
- ``spatial``, ``board``, ``observe_state``, ``observe_state_sizes``,
  ``egocentric``, ``egocentric_player``, ``egocentric_radius``
  (``[preprocessing]``): observation encoding
- ``hidden_sizes`` (``[model]``): network architecture

Run ``retro-gamer clean RUN_DIR`` to remove the old checkpoints and start
fresh. Other hyperparameter changes (learning rate, epsilon, etc.) are
safe and take effect on the next training run.

``retro-gamer play``
~~~~~~~~~~~~~~~~~~~~

Watch a trained agent play the game in the terminal.

.. code-block:: console

    % retro-gamer play RUN_DIR [--checkpoint NAME] [--framerate N]

By default, the latest available checkpoint is loaded. Use
``--checkpoint`` to load a specific one by name (e.g. ``ep_0100``).
``--framerate`` sets the target frames per second (default: 12). Press
Enter or Escape to quit.

``retro-gamer clean``
~~~~~~~~~~~~~~~~~~~~~

Remove all checkpoints and the training log from a run directory.

.. code-block:: console

    % retro-gamer clean RUN_DIR

The command prompts for confirmation before deleting. Use ``--yes`` /
``-y`` to skip the prompt. The ``config.toml`` is preserved so you can run
``retro-gamer train`` immediately to start fresh with the same settings.

Use this after making an incompatible change (see ``retro-gamer train``
above) or any time you want to restart training from scratch.

``retro-gamer info``
~~~~~~~~~~~~~~~~~~~~~

Print a summary of a training run: metadata, hyperparameters, recent
checkpoint log, and available checkpoints.

.. code-block:: console

    % retro-gamer info RUN_DIR


Training run directory structure
---------------------------------

A training run is a self-contained directory with the following
contents:

.. code-block:: text

    runs/snake/
    ├── config.toml          # game description + hyperparameters
    ├── training.log         # architecture rationale + per-checkpoint log
    └── checkpoints/
        ├── ep_0100.pt       # model weights at episode 100
        ├── ep_0200.pt
        └── ...              # one file saved every 100 episodes

``config.toml`` is written by ``retro-gamer create`` and updated (with
the discovered character set and resolved hyperparameters) when
``retro-gamer train`` begins. It has five sections: ``[game]``,
``[metadata]``, ``[preprocessing]``, ``[model]``, and ``[training]``.
Editing ``config.toml`` between ``create`` and ``train`` is the
recommended way to adjust hyperparameters.

``training.log`` begins with the full network architecture description,
then one line per checkpoint (every 100 episodes) in the format::

    [ep_NNNN] ep=SSSS-NNNN avg_reward=F avg_steps=N epsilon=F avg_loss=F time=Xm Xs total=Xm Xs

The fields summarise the episodes since the previous checkpoint:

- ``ep=SSSS-NNNN``: episode range covered by this entry
- ``avg_reward``: mean total reward per episode (higher is better)
- ``avg_steps``: mean episode length in game turns
- ``epsilon``: current exploration rate (approaches ``epsilon_min`` over time)
- ``avg_loss``: mean Huber loss across training steps (should decrease as learning
  stabilises). For a prediction ``q`` and target ``t``, Huber loss equals
  ``½·(q−t)²`` when ``|q−t|`` is below 1 and ``|q−t|−½`` otherwise, so it grows
  only linearly, rather than quadratically, when errors are large. Values in the
  range 0–10 are typical; a slow downward trend over thousands of episodes is the
  healthy pattern. A loss that keeps growing usually indicates a learning rate
  that is too high.
- ``time``: wall-clock time for this checkpoint interval
- ``total``: cumulative training time across all sessions

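
The Huber loss behind ``avg_loss`` can be written out directly. The
function below is a plain-Python sketch of the standard definition for a
single prediction, with the threshold ``delta`` set to 1 to match the
formulas given for ``avg_loss``; it is not taken from the trainer's source.

.. code-block:: python

    def huber_loss(q, t, delta=1.0):
        """Quadratic for small errors, linear for large ones."""
        error = abs(q - t)
        if error <= delta:
            return 0.5 * error ** 2
        return delta * (error - 0.5 * delta)


    print(huber_loss(1.5, 1.0))   # 0.125 (quadratic branch)
    print(huber_loss(11.0, 1.0))  # 9.5   (linear branch)

For the second case a squared-error loss would give ``0.5 * 10 ** 2 = 50.0``,
which shows how the linear branch damps the effect of occasional large
errors on the reported average.
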

When training is resumed, a ``=== Resumed from ... ===`` line is appended
so the log records the full history of a run across multiple sessions.

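
Checkpoint lines in ``training.log`` can be read back with a small parser
such as the one below, for example to plot ``avg_reward`` over time. This is
a sketch under assumptions: ``parse_log_line`` is a hypothetical helper, the
sample line is made up to follow the documented layout, and the pattern
assumes numeric fields are plain decimals and durations are written as
``Xm Xs``.

.. code-block:: python

    import re

    LINE = re.compile(
        r"\[(?P<checkpoint>ep_\d+)\] ep=(?P<first>\d+)-(?P<last>\d+) "
        r"avg_reward=(?P<avg_reward>\S+) avg_steps=(?P<avg_steps>\S+) "
        r"epsilon=(?P<epsilon>\S+) avg_loss=(?P<avg_loss>\S+) "
        r"time=(?P<time>\d+m \d+s) total=(?P<total>\d+m \d+s)"
    )


    def parse_log_line(line):
        """Return the fields of one checkpoint line, or None for other lines."""
        match = LINE.search(line)
        if match is None:
            return None  # architecture description, resume markers, etc.
        fields = match.groupdict()
        for key in ("avg_reward", "avg_steps", "epsilon", "avg_loss"):
            fields[key] = float(fields[key])
        return fields


    # Made-up sample line that follows the documented layout.
    sample = (
        "[ep_0200] ep=0101-0200 avg_reward=12.4 avg_steps=87 "
        "epsilon=0.942 avg_loss=0.351 time=1m 5s total=2m 12s"
    )
    entry = parse_log_line(sample)
    print(entry["checkpoint"], entry["avg_reward"], entry["total"])
    # ep_0200 12.4 2m 12s
    print(parse_log_line("=== Resumed from ep_0200 ==="))  # None

Lines that do not match, such as the architecture description or the resume
markers, return ``None`` and can simply be skipped.
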

Python API
----------

For advanced use, ``retro-gamer``'s components are importable as a
library. See the :doc:`api` reference for full details.

.. code-block:: python

    from retro_gamer import GameMetadata, DQNTrainer
    from retro.examples.snake import create_game

    # Load the game description from the game's pyproject.toml.
    metadata = GameMetadata.from_pyproject("retro.examples.snake")

    # Train using "runs/snake/" as the run directory.
    trainer = DQNTrainer(create_game, metadata, "runs/snake/")
    trainer.train()