Updates across the board

2026-06-22 16:41:31 -04:00
parent 5ca97dc5d0
commit 73624d1a0c
33 changed files with 3104 additions and 643 deletions
--- a/docs/integration.rst
+++ b/docs/integration.rst
@@ -0,0 +1,186 @@
+Integrating a Trained Model
+===========================
+
+Once you have trained a model, you can use it in two ways:
+
+- **PolicyInput** — the model replaces the keyboard, driving an existing
+  player-controlled agent. Use this to watch a trained agent play, or to
+  run automated evaluations.
+- **TrainedPolicy in play_turn** — call ``get_action(game)`` from inside any
+  agent's ``play_turn`` to embed the model as an autonomous character (for
+  example, a smart enemy) alongside human-controlled or other agents.
+
+Loading a trained model
+-----------------------
+
+Both approaches start by creating a :class:`retro_gamer.TrainedPolicy`:
+
+.. code-block:: python
+
+   from retro_gamer import TrainedPolicy
+
+   ai = TrainedPolicy("runs/snake/")
+
+This reads ``config.toml``, rebuilds the network, and loads the latest
+checkpoint. To load a specific checkpoint instead:
+
+.. code-block:: python
+
+   ai = TrainedPolicy("runs/snake/", checkpoint="ep_0500")
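+
+The checkpoint name ``ep_0500`` suggests that checkpoints are labeled by
+episode number. This page does not document the naming scheme or where
+checkpoints are stored, so the ``ep_<number>`` naming here is an assumption.
+Under that assumption, a hypothetical helper such as ``latest_checkpoint`` can
+pick the highest-numbered name from a list. It compares parsed integers rather
+than strings, because a plain string comparison would rank ``ep_9500`` above
+``ep_10000``.
+
+.. code-block:: python
+
+   import re
+
+   # Hypothetical naming scheme, inferred from the "ep_0500" example.
+   _CHECKPOINT_RE = re.compile(r"^ep_(\d+)$")
+
+   def episode_number(name):
+       """Return the episode encoded in a name like 'ep_0500', or None."""
+       match = _CHECKPOINT_RE.match(name)
+       return int(match.group(1)) if match else None
+
+   def latest_checkpoint(names):
+       """Return the name with the highest episode number, ignoring other names."""
+       numbered = [(episode_number(n), n) for n in names
+                   if episode_number(n) is not None]
+       if not numbered:
+           raise ValueError("no names of the form 'ep_<number>'")
+       return max(numbered)[1]
+
+   print(latest_checkpoint(["ep_0500", "ep_9500", "ep_10000", "notes.txt"]))  # prints ep_10000
+
+You could then pass the result as ``checkpoint=`` to ``TrainedPolicy``. This is
+useful, for example, when comparing an early checkpoint with a late one.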
+
+PolicyInput: model as player
+----------------------------
+
+:class:`retro_gamer.PolicyInput` is an input source — it implements the same
+interface as keyboard input, but chooses actions using the trained model. Pass
+it to ``game.play()`` and everything else works exactly as usual:
+
+.. code-block:: python
+
+   from retro.examples.snake import create_game
+   from retro_gamer import TrainedPolicy, PolicyInput
+
+   ai = TrainedPolicy("runs/snake/")
+   game = create_game()
+   game.play(input_source=PolicyInput(ai, game))
+
+On each turn, ``PolicyInput`` observes the current board and game state, runs
+the model, and sends the chosen action to the game exactly as if the player
+had pressed that key.
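+
+The sketch below illustrates the "runs the model" step. It turns one score per
+action into a key name by taking the highest score. The ``ACTIONS`` order, the
+``choose_key`` helper, and greedy selection are all assumptions made for this
+example. ``retro-gamer`` may order actions differently, or it may sample an
+action instead of always taking the best one.
+
+.. code-block:: python
+
+   # Hypothetical action order; retro-gamer's real mapping may differ.
+   ACTIONS = ("KEY_UP", "KEY_DOWN", "KEY_LEFT", "KEY_RIGHT")
+
+   def choose_key(scores):
+       """Return the key whose score (for example a Q-value) is highest."""
+       if len(scores) != len(ACTIONS):
+           raise ValueError(f"expected {len(ACTIONS)} scores, got {len(scores)}")
+       best = max(range(len(scores)), key=lambda i: scores[i])  # first index wins ties
+       return ACTIONS[best]
+
+   print(choose_key([0.1, 2.0, -1.0, 0.5]))  # prints KEY_DOWN
+
+Greedy choice makes the agent's behavior repeatable for a given board (assuming
+the model itself is deterministic). That is usually what you want when watching
+or evaluating it.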
+
+TrainedPolicy in play_turn: model as autonomous character
+---------------------------------------------------------
+
+To embed a trained model as an autonomous game character, create a
+``TrainedPolicy`` at module level and call ``get_action(game)`` from inside
+the agent's ``play_turn``. Placing it at module level means the model is
+loaded from disk once — not once per episode.
+
+.. code-block:: python
+
+   from retro.game import Game
+   from retro.examples.snake import Apple, SnakeHead
+   from retro_gamer import TrainedPolicy
+
+   _ai = TrainedPolicy("runs/snake/")
+
+   _DIRECTIONS = {
+       'KEY_RIGHT': (1, 0),
+       'KEY_LEFT': (-1, 0),
+       'KEY_UP': (0, -1),
+       'KEY_DOWN': (0, 1),
+   }
+
+   class AISnake(SnakeHead):
+       def handle_keystroke(self, k, game):
+           pass  # ignore the keyboard; the model steers this snake
+
+       def play_turn(self, game):
+           key = _ai.get_action(game)
+           if key in _DIRECTIONS:  # any other key leaves the direction unchanged
+               self.direction = _DIRECTIONS[key]
+           super().play_turn(game)
+
+   human_snake = SnakeHead()
+   ai_snake = AISnake()
+   ai_snake.position = (16, 8)
+   apple = Apple()
+
+   game = Game([human_snake, ai_snake, apple], {"score": 0}, board_size=(32, 16))
+   apple.relocate(game)
+   game.play()
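+
+``AISnake`` only calls ``_ai.get_action(game)`` and compares the result with
+arrow-key names. Any object with a ``get_action`` method that returns such a
+name can therefore stand in for the model. The hypothetical ``RandomPolicy``
+below is useful for wiring up the game before a model exists. It also works as
+a baseline to compare the trained snake against.
+
+.. code-block:: python
+
+   import random
+
+   class RandomPolicy:
+       """Hypothetical stand-in for TrainedPolicy: a random arrow key per turn."""
+
+       KEYS = ("KEY_UP", "KEY_DOWN", "KEY_LEFT", "KEY_RIGHT")
+
+       def __init__(self, seed=None):
+           self._rng = random.Random(seed)  # pass a seed for repeatable games
+
+       def get_action(self, game):
+           # The game is ignored here; a trained policy would observe it.
+           return self._rng.choice(self.KEYS)
+
+   policy = RandomPolicy(seed=0)
+   print([policy.get_action(None) for _ in range(5)])
+
+To try it, replace ``_ai = TrainedPolicy("runs/snake/")`` with
+``_ai = RandomPolicy()``. ``AISnake`` itself does not change.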
+
+Training an enemy model
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can use the same training pipeline to produce a model for an enemy agent.
+``retro-gamer`` does not care *which* character it is training — it only cares
+that it can control one character through the keyboard and read a reward signal
+from the game state. To train an enemy:
+
+1. **Create an enemy-perspective game variant.** Write a game-creation
+   function, either in a separate file or alongside your main ``create_game``,
+   in which the enemy agent is the keyboard-driven character and the reward key
+   in the game state reflects the enemy's objective (for example, a bonus for
+   catching the player). The human player can be absent, replaced by a
+   random-moving agent, or driven by a ``TrainedPolicy`` once you have a trained
+   player model.
+
+   .. code-block:: python
+
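+      from retro.game import Game
+
+      # EnemyAgent and RandomPlayer are agent classes you define yourself.
+
+      def create_enemy_training_game():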
+          enemy = EnemyAgent()       # the character the trainer will control
+          player = RandomPlayer()    # a stand-in; no human involved
+          game = Game([enemy, player], {'enemy_reward': 0}, board_size=(32, 16))
+          return game
+
+2. **Train normally against this variant.**
+
+   .. code-block:: console
+
+      % retro-gamer create --game my_game:create_enemy_training_game \
+                           --output runs/enemy/
+      % retro-gamer train runs/enemy/
+
+3. **Embed the trained model in your main game** using ``get_action``, exactly
+   as shown above.
+
+.. note::
+
+   Because ``retro-gamer`` injects actions through the game's global input
+   source, *all* keyboard-listening agents in the training game will receive
+   the trainer's keystrokes. The cleanest approach is to make the enemy the
+   only keyboard-driven character in the training variant — any other
+   characters should advance on their own without reading from the keyboard.
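+
+To make step 1 concrete, here is a sketch of a keyboard-driven ``EnemyAgent``
+that adds a bonus to ``enemy_reward`` when it lands on its target's square. It
+relies on several assumptions:
+
+- agents follow the conventions of the snake example (``position``,
+  ``character``, ``handle_keystroke(keystroke, game)``, and
+  ``play_turn(game)``);
+- ``game.state`` is the dictionary passed to ``Game``;
+- arrow keys arrive under names such as ``"KEY_UP"``.
+
+The ``target`` attribute, the bonus of 10, and the clamping to a (32, 16) board
+are illustrative choices, not part of ``retro-gamer``.
+
+.. code-block:: python
+
+   class EnemyAgent:
+       """Hypothetical keyboard-driven enemy rewarded for catching its target."""
+
+       character = "E"     # glyph drawn on the board (assumed retro convention)
+       CATCH_BONUS = 10    # illustrative reward value
+       MOVES = {"KEY_UP": (0, -1), "KEY_DOWN": (0, 1),
+                "KEY_LEFT": (-1, 0), "KEY_RIGHT": (1, 0)}
+
+       def __init__(self, target=None, position=(0, 0), board_size=(32, 16)):
+           self.target = target          # the agent being chased; may be set later
+           self.position = position
+           self.board_size = board_size
+           self.direction = (0, 0)       # stand still until a key arrives
+
+       def handle_keystroke(self, keystroke, game):
+           # During training these keys come from retro-gamer, not a human.
+           # Accept plain strings or objects exposing the key name as .name.
+           name = getattr(keystroke, "name", keystroke)
+           if name in self.MOVES:
+               self.direction = self.MOVES[name]
+
+       def play_turn(self, game):
+           x, y = self.position
+           dx, dy = self.direction
+           width, height = self.board_size
+           # Clamp so the enemy never leaves the board.
+           self.position = (min(max(x + dx, 0), width - 1),
+                            min(max(y + dy, 0), height - 1))
+           if self.target is not None and self.position == self.target.position:
+               # Awarded on every turn the two share a square.
+               game.state["enemy_reward"] += self.CATCH_BONUS
+
+Because ``EnemyAgent()`` needs no arguments, it fits the
+``create_enemy_training_game`` example unchanged. Set ``enemy.target = player``
+there once the player exists; otherwise the bonus never fires.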
+
+Adversarial training
+~~~~~~~~~~~~~~~~~~~~~
+
+Once you have separate training runs for the player and the enemy, you can
+train them *against each other* iteratively. The idea is simple: train the
+player against the current enemy model, then train the enemy against the
+updated player model, and repeat. Each side is forced to improve against an
+increasingly capable opponent.
+
+The key technique is to load the opponent's model at module level in each
+training game variant, so it is loaded from disk once per run rather than
+once per episode:
+
+.. code-block:: python
+
+   # enemy_training_game.py
+   from retro.game import Game
+   from retro_gamer import TrainedPolicy
+
+   # EnemyAgent and AIPlayer are agent classes you define yourself.
+
+   _player = TrainedPolicy("runs/player/")   # loaded once when the module is imported
+
+   def create_game():
+       enemy = EnemyAgent()
+       player = AIPlayer(_player)           # uses _player.get_action in play_turn
+       return Game([enemy, player], {'enemy_reward': 0}, board_size=(32, 16))
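+
+The example above leaves ``AIPlayer`` for you to write. One possible shape is
+sketched below, assuming the same agent conventions as the snake example (a
+``position``, a ``character`` to draw, and a ``play_turn(game)`` method). Each
+turn, it asks its policy for a key and moves one square, clamped to a (32, 16)
+board. It deliberately defines no ``handle_keystroke``. Assuming agents without
+that method are simply not sent keys, the trainer's keystrokes then reach only
+the enemy.
+
+.. code-block:: python
+
+   class AIPlayer:
+       """Hypothetical player agent steered by a policy instead of the keyboard."""
+
+       character = "P"     # glyph drawn on the board (assumed retro convention)
+       MOVES = {"KEY_UP": (0, -1), "KEY_DOWN": (0, 1),
+                "KEY_LEFT": (-1, 0), "KEY_RIGHT": (1, 0)}
+
+       def __init__(self, policy, position=(16, 8), board_size=(32, 16)):
+           self.policy = policy          # any object with get_action(game)
+           self.position = position
+           self.board_size = board_size
+
+       # Deliberately no handle_keystroke method: keystrokes during training
+       # are meant for the enemy only.
+
+       def play_turn(self, game):
+           key = self.policy.get_action(game)
+           dx, dy = self.MOVES.get(key, (0, 0))  # unknown keys mean "stay put"
+           x, y = self.position
+           width, height = self.board_size
+           self.position = (min(max(x + dx, 0), width - 1),
+                            min(max(y + dy, 0), height - 1))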
+
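+Create each run against its own training variant (for example,
+``retro-gamer create --game enemy_training_game:create_game --output runs/enemy/``).
+Then alternate training runs: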
+
+.. code-block:: console
+
+   % retro-gamer train runs/player/   # train player against current enemy
+   % retro-gamer train runs/enemy/    # train enemy against updated player
+   % retro-gamer train runs/player/   # train player again
+   # ...
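+
+To script the alternation, a small Python driver can issue the same commands.
+This sketch uses only the ``retro-gamer train`` command shown above. It assumes
+that each ``train`` invocation stops on its own; how to bound the number of
+episodes per run is not covered on this page. Running each step as a separate
+process also means each training variant is re-imported. Its module-level
+``TrainedPolicy`` therefore loads the opponent's latest checkpoint.
+
+.. code-block:: python
+
+   import subprocess
+
+   def alternating_runs(rounds, player_run="runs/player/", enemy_run="runs/enemy/"):
+       """Build the commands for `rounds` rounds of player-then-enemy training."""
+       commands = []
+       for _ in range(rounds):
+           commands.append(["retro-gamer", "train", player_run])  # player vs. current enemy
+           commands.append(["retro-gamer", "train", enemy_run])   # enemy vs. updated player
+       return commands
+
+   def run_all(commands, dry_run=True):
+       """Print each command and, unless dry_run is set, run it to completion."""
+       for cmd in commands:
+           print(" ".join(cmd))
+           if not dry_run:
+               # check=True stops the loop if a training run fails.
+               subprocess.run(cmd, check=True)
+
+   run_all(alternating_runs(rounds=3))
+
+With ``dry_run=True`` (the default), the driver only prints the commands. Pass
+``dry_run=False`` to actually run them.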
+
+How many episodes to run before switching is itself a design decision: too
+few and neither model has time to adapt; too many and each side overfits to
+its current opponent. Watching how the strategies evolve — and asking *why*
+each model behaves as it does at each stage — connects directly to concepts
+in multi-agent reinforcement learning and adversarial training.
+
+Differences between the two approaches
+---------------------------------------
+
+.. list-table::
+   :header-rows: 1
+   :widths: 35 65
+
+   * - ``PolicyInput``
+     - ``TrainedPolicy`` in ``play_turn``
+   * - Replaces human input for the whole game
+     - One autonomous agent among many
+   * - Game code is unchanged
+     - Agent's ``play_turn`` calls ``get_action``
+   * - One model drives all player-controlled agents
+     - Each agent class can use its own model
+   * - Simpler — just pass to ``game.play()``
+     - More flexible — mix human and AI characters