Updates across the board

Chris Proctor
2026-06-22 16:41:31 -04:00
parent 5ca97dc5d0
commit 73624d1a0c
33 changed files with 3104 additions and 643 deletions

docs/integration.rst Normal file

@@ -0,0 +1,186 @@
Integrating a Trained Model
===========================
Once you have trained a model, you can use it in two ways:

- **PolicyInput**: the model replaces the keyboard, driving an existing
  player-controlled agent. Use this to watch a trained agent play, or to
  run automated evaluations.
- **TrainedPolicy in play_turn**: call ``get_action(game)`` from inside any
  agent's ``play_turn`` to embed the model as an autonomous character (for
  example, a smart enemy) alongside human-controlled or other agents.

Loading a trained model
-----------------------

Both approaches start by creating a :class:`retro_gamer.TrainedPolicy`:

.. code-block:: python

    from retro_gamer import TrainedPolicy

    ai = TrainedPolicy("runs/snake/")

This reads ``config.toml``, rebuilds the network, and loads the latest
checkpoint. To load a specific checkpoint instead:

.. code-block:: python

    ai = TrainedPolicy("runs/snake/", checkpoint="ep_0500")

PolicyInput: model as player
----------------------------

:class:`retro_gamer.PolicyInput` is an input source: it implements the same
interface as keyboard input, but chooses actions using the trained model. Pass
it to ``game.play()`` and everything else works exactly as usual:

.. code-block:: python

    from retro.examples.snake import create_game
    from retro_gamer import TrainedPolicy, PolicyInput

    ai = TrainedPolicy("runs/snake/")
    game = create_game()
    game.play(input_source=PolicyInput(ai, game))

On each turn, ``PolicyInput`` observes the current board and game state, runs
the model, and sends the chosen action to the game exactly as if the player
had pressed that key.

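The per-turn cycle described above (ask the model for a key, then deliver it as a keystroke) can be illustrated without either library installed. This is only a conceptual sketch, not how ``PolicyInput`` is implemented: ``ScriptedPolicy``, ``Cursor``, and ``run_turns`` are hypothetical stand-ins, and the only names taken from this page are ``get_action(game)`` and ``handle_keystroke``.

```python
class ScriptedPolicy:
    """Hypothetical stand-in for a trained model: replays a fixed list of keys."""

    def __init__(self, keys):
        self._keys = iter(keys)

    def get_action(self, game):
        return next(self._keys)


class Cursor:
    """Toy keyboard-driven agent that moves one cell per keystroke."""

    MOVES = {
        "KEY_RIGHT": (1, 0),
        "KEY_LEFT": (-1, 0),
        "KEY_UP": (0, -1),
        "KEY_DOWN": (0, 1),
    }

    def __init__(self):
        self.position = (0, 0)

    def handle_keystroke(self, key, game):
        dx, dy = self.MOVES.get(key, (0, 0))
        x, y = self.position
        self.position = (x + dx, y + dy)


def run_turns(policy, agents, turns, game=None):
    """Each turn: ask the policy for a key, then deliver it to every
    agent that listens for keystrokes (assumed delivery rule)."""
    for _ in range(turns):
        key = policy.get_action(game)
        for agent in agents:
            if hasattr(agent, "handle_keystroke"):
                agent.handle_keystroke(key, game)


cursor = Cursor()
run_turns(ScriptedPolicy(["KEY_RIGHT", "KEY_RIGHT", "KEY_DOWN"]), [cursor], turns=3)
print(cursor.position)  # (2, 1)
```

Because the key is delivered to every agent that defines ``handle_keystroke``, the sketch also shows why a single input source drives all keyboard-listening agents at once.
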
TrainedPolicy in play_turn: model as autonomous character
---------------------------------------------------------

To embed a trained model as an autonomous game character, create a
``TrainedPolicy`` at module level and call ``get_action(game)`` from inside
the agent's ``play_turn``. Placing it at module level means the model is
loaded from disk once, not once per episode.

.. code-block:: python

    from retro.game import Game
    from retro.examples.snake import Apple, SnakeHead
    from retro_gamer import TrainedPolicy

    _ai = TrainedPolicy("runs/snake/")

    class AISnake(SnakeHead):
        def handle_keystroke(self, k, game): pass  # ignore keyboard

        def play_turn(self, game):
            key = _ai.get_action(game)
            if key == 'KEY_RIGHT': self.direction = (1, 0)
            elif key == 'KEY_LEFT': self.direction = (-1, 0)
            elif key == 'KEY_UP': self.direction = (0, -1)
            elif key == 'KEY_DOWN': self.direction = (0, 1)
            super().play_turn(game)

    human_snake = SnakeHead()
    ai_snake = AISnake()
    ai_snake.position = (16, 8)
    apple = Apple()
    game = Game([human_snake, ai_snake, apple], {"score": 0}, board_size=(32, 16))
    apple.relocate(game)
    game.play()

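The ``if``/``elif`` chain in ``play_turn`` can also be written as a lookup table. A minimal sketch, assuming the same key names and direction vectors as the example above; ``KEY_DIRECTIONS`` and ``direction_for_key`` are illustrative names and not part of ``retro`` or ``retro_gamer``.

```python
# Hypothetical helper: maps the key names used in the example above
# to (dx, dy) direction vectors.
KEY_DIRECTIONS = {
    "KEY_RIGHT": (1, 0),
    "KEY_LEFT": (-1, 0),
    "KEY_UP": (0, -1),
    "KEY_DOWN": (0, 1),
}


def direction_for_key(key, current):
    """Return the direction for ``key``; keep ``current`` for any other key."""
    return KEY_DIRECTIONS.get(key, current)


print(direction_for_key("KEY_UP", (1, 0)))     # (0, -1)
print(direction_for_key("KEY_ENTER", (1, 0)))  # (1, 0)
```

Inside ``play_turn`` this would reduce the chain to ``self.direction = direction_for_key(key, self.direction)``, which also makes the "unrecognized key keeps the current direction" behavior explicit.
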
Training an enemy model
~~~~~~~~~~~~~~~~~~~~~~~~

You can use the same training pipeline to produce a model for an enemy agent.
``retro-gamer`` does not care *which* character it is training; it only cares
that it can control one character through the keyboard and read a reward signal
from the game state. To train an enemy:

1. **Create an enemy-perspective game variant.** Write (or add) a
   ``create_game``-style function, in a separate file or alongside your main
   one, where the enemy agent is the keyboard-driven character and the reward
   key in the game state reflects the enemy's objective (for example, a bonus
   for catching the player). The human player can be absent, replaced by a
   random-moving agent, or driven by a ``TrainedPolicy`` once you have a trained
   player model.

   .. code-block:: python

       from retro.game import Game

       # EnemyAgent and RandomPlayer are agent classes you define for your game.
       def create_enemy_training_game():
           enemy = EnemyAgent()     # the character the trainer will control
           player = RandomPlayer()  # a stand-in; no human involved
           game = Game([enemy, player], {'enemy_reward': 0}, board_size=(32, 16))
           return game

2. **Train normally against this variant.**

   .. code-block:: console

       % retro-gamer create --game my_game:create_enemy_training_game \
           --output runs/enemy/
       % retro-gamer train runs/enemy/

3. **Embed the trained model in your main game** using ``get_action``, exactly
   as shown above.

.. note::

    Because ``retro-gamer`` injects actions through the game's global input
    source, *all* keyboard-listening agents in the training game will receive
    the trainer's keystrokes. The cleanest approach is to make the enemy the
    only keyboard-driven character in the training variant; any other
    characters should advance on their own without reading from the keyboard.

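Step 1 asks for a reward key that reflects the enemy's objective. One way this might look is sketched below, using plain stand-in objects so it runs without ``retro`` installed. The ``EnemyAgent`` body, the ``CATCH_BONUS`` value, and the assumption that the game exposes its state dictionary as ``game.state`` are all illustrative and not taken from ``retro-gamer``.

```python
from types import SimpleNamespace


class EnemyAgent:
    """Hypothetical enemy: moves on arrow keys, rewarded for catching the player."""

    MOVES = {
        "KEY_RIGHT": (1, 0),
        "KEY_LEFT": (-1, 0),
        "KEY_UP": (0, -1),
        "KEY_DOWN": (0, 1),
    }
    CATCH_BONUS = 10  # illustrative value, not a retro-gamer default

    def __init__(self, position, player):
        self.position = position
        self.player = player

    def handle_keystroke(self, key, game):
        dx, dy = self.MOVES.get(key, (0, 0))
        x, y = self.position
        self.position = (x + dx, y + dy)

    def play_turn(self, game):
        # Record the enemy's objective in the game state, under the reward key.
        if self.position == self.player.position:
            game.state["enemy_reward"] += self.CATCH_BONUS


player = SimpleNamespace(position=(2, 0))          # stationary stand-in player
game = SimpleNamespace(state={"enemy_reward": 0})  # stand-in for a real Game
enemy = EnemyAgent((0, 0), player)

for key in ["KEY_RIGHT", "KEY_RIGHT"]:
    enemy.handle_keystroke(key, game)
    enemy.play_turn(game)

print(game.state["enemy_reward"])  # 10
```

Only the enemy defines ``handle_keystroke`` here, which matches the note above: the stand-in player never reads the keyboard.
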
Adversarial training
~~~~~~~~~~~~~~~~~~~~~

Once you have separate training runs for the player and the enemy, you can
train them *against each other* iteratively. The idea is simple: train the
player against the current enemy model, then train the enemy against the
updated player model, and repeat. Each side is forced to improve against an
increasingly capable opponent.

The key technique is to load the opponent's model at module level in each
training game variant, so it is loaded from disk once per run rather than
once per episode:

.. code-block:: python

    # enemy_training_game.py
    from retro.game import Game
    from retro_gamer import TrainedPolicy

    _player = TrainedPolicy("runs/player/")  # loaded once when the module is imported

    # EnemyAgent and AIPlayer are agent classes you define for your game.
    def create_game():
        enemy = EnemyAgent()
        player = AIPlayer(_player)  # uses _player.get_action in play_turn
        return Game([enemy, player], {'enemy_reward': 0}, board_size=(32, 16))

You then alternate training runs:

.. code-block:: console

    % retro-gamer train runs/player/   # train player against current enemy
    % retro-gamer train runs/enemy/    # train enemy against updated player
    % retro-gamer train runs/player/   # train player again
    # ...

How many episodes to run before switching is itself a design decision: too
few and neither model has time to adapt; too many and each side overfits to
its current opponent. Watching how the strategies evolve, and asking *why*
each model behaves as it does at each stage, connects directly to concepts
in multi-agent reinforcement learning and adversarial training.

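The alternation above can be generated rather than typed by hand. A minimal sketch that only builds the command list; ``alternating_schedule`` is an illustrative helper, and it uses nothing beyond the ``retro-gamer train <run directory>`` form shown above.

```python
import shlex


def alternating_schedule(rounds, player_run="runs/player/", enemy_run="runs/enemy/"):
    """Build the commands for ``rounds`` rounds of player-then-enemy training."""
    commands = []
    for _ in range(rounds):
        commands.append(["retro-gamer", "train", player_run])
        commands.append(["retro-gamer", "train", enemy_run])
    return commands


schedule = alternating_schedule(2)
for command in schedule:
    print(shlex.join(command))
```

To execute the schedule, each command could be passed to ``subprocess.run(command, check=True)``. How many episodes each invocation trains for is decided by the run's own configuration, which this sketch does not touch.
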
Differences between the two approaches
---------------------------------------

.. list-table::
    :header-rows: 1
    :widths: 35 65

    * - ``PolicyInput``
      - ``TrainedPolicy`` in ``play_turn``
    * - Replaces human input for the whole game
      - One autonomous agent among many
    * - Game code is unchanged
      - Agent's ``play_turn`` calls ``get_action``
    * - One model drives all player-controlled agents
      - Each agent instance has its own model
    * - Simpler: just pass to ``game.play()``
      - More flexible: mix human and AI characters