
Integrating a Trained Model
===========================

Once you have trained a model, you can use it in two ways:

- **PolicyInput** — the model replaces the keyboard, driving an existing
  player-controlled agent. Use this to watch a trained agent play, or to
  run automated evaluations.
- **TrainedPolicy in play_turn** — call ``get_action(game)`` from inside any
  agent's ``play_turn`` to embed the model as an autonomous character (for
  example, a smart enemy) alongside human-controlled or other agents.

Loading a trained model
-----------------------

Both approaches start by creating a :class:`retro_gamer.TrainedPolicy`:

.. code-block:: python

   from retro_gamer import TrainedPolicy

   ai = TrainedPolicy("runs/snake/")

This reads ``config.toml``, rebuilds the network, and loads the latest
checkpoint. To load a specific checkpoint instead:

.. code-block:: python

   ai = TrainedPolicy("runs/snake/", checkpoint="ep_0500")

PolicyInput: model as player
----------------------------

:class:`retro_gamer.PolicyInput` is an input source — it implements the same
interface as keyboard input, but chooses actions using the trained model. Pass
it to ``game.play()`` and everything else works exactly as usual:

.. code-block:: python

   from retro.examples.snake import create_game
   from retro_gamer import TrainedPolicy, PolicyInput

   ai = TrainedPolicy("runs/snake/")
   game = create_game()
   game.play(input_source=PolicyInput(ai, game))

On each turn, ``PolicyInput`` observes the current board and game state, runs
the model, and sends the chosen action to the game exactly as if the player
had pressed that key.
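
For the automated evaluations mentioned above, a loop like the following could
average the final score over several episodes. It is a sketch under two
assumptions that are not confirmed here: that ``game.play()`` returns when the
episode ends, and that the state dictionary passed to ``Game`` is readable
afterwards as ``game.state``. ``evaluate`` and ``StubGame`` are hypothetical
names; the stub only lets the sketch run without ``retro`` installed.

.. code-block:: python

   from statistics import mean

   def evaluate(create_game, make_input_source, episodes=10, score_key="score"):
       """Play `episodes` fresh games and return the mean final score."""
       scores = []
       for _ in range(episodes):
           game = create_game()
           # Assumption: play() blocks until the episode is over.
           game.play(input_source=make_input_source(game))
           # Assumption: the state dict is exposed as game.state.
           scores.append(game.state[score_key])
       return mean(scores)

   class StubGame:
       """Stand-in for a real game so the sketch runs on its own."""
       def __init__(self):
           self.state = {"score": 0}

       def play(self, input_source=None):
           # Pretend the episode ran and ended with three points.
           self.state["score"] = 3

   average = evaluate(StubGame, lambda game: None, episodes=4)

With a real game, the call would presumably look like
``evaluate(create_game, lambda game: PolicyInput(ai, game))``, building a fresh
``PolicyInput`` for each new game.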

TrainedPolicy in play_turn: model as autonomous character
---------------------------------------------------------

To embed a trained model as an autonomous game character, create a
``TrainedPolicy`` at module level and call ``get_action(game)`` from inside
the agent's ``play_turn``. Placing it at module level means the model is
loaded from disk once — not once per episode.

.. code-block:: python

   from retro.game import Game
   from retro.examples.snake import Apple, SnakeHead
   from retro_gamer import TrainedPolicy

   _ai = TrainedPolicy("runs/snake/")

   class AISnake(SnakeHead):
       def handle_keystroke(self, k, game): pass  # ignore keyboard

       def play_turn(self, game):
           key = _ai.get_action(game)
           if key == 'KEY_RIGHT': self.direction = (1, 0)
           elif key == 'KEY_LEFT': self.direction = (-1, 0)
           elif key == 'KEY_UP': self.direction = (0, -1)
           elif key == 'KEY_DOWN': self.direction = (0, 1)
           super().play_turn(game)

   human_snake = SnakeHead()
   ai_snake = AISnake()
   ai_snake.position = (16, 8)
   apple = Apple()

   game = Game([human_snake, ai_snake, apple], {"score": 0}, board_size=(32, 16))
   apple.relocate(game)
   game.play()

Training an enemy model
~~~~~~~~~~~~~~~~~~~~~~~~

You can use the same training pipeline to produce a model for an enemy agent.
``retro-gamer`` does not care *which* character it is training — it only cares
that it can control one character through the keyboard and read a reward signal
from the game state. To train an enemy:

1. **Create an enemy-perspective game variant.** Write a game-factory
   function, either in a separate file or alongside your main ``create_game``,
   in which the enemy agent is the keyboard-driven character and the reward key
   in the game state reflects the enemy's objective (for example, a bonus for
   catching the player). The human player can be absent, replaced by a
   randomly moving agent, or driven by a ``TrainedPolicy`` once you have a
   trained player model.

   .. code-block:: python

      from retro.game import Game

      # EnemyAgent and RandomPlayer are placeholders for your own agent classes.
      def create_enemy_training_game():
          enemy = EnemyAgent()       # the character the trainer will control
          player = RandomPlayer()    # a stand-in; no human involved
          game = Game([enemy, player], {'enemy_reward': 0}, board_size=(32, 16))
          return game

2. **Train normally against this variant.**

   .. code-block:: console

      % retro-gamer create --game my_game:create_enemy_training_game \
                           --output runs/enemy/
      % retro-gamer train runs/enemy/

3. **Embed the trained model in your main game** using ``get_action``, exactly
   as shown above.
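
The reward key from step 1 has to be kept up to date by the game itself. One
way the training variant might do that is a small helper called from the
enemy's ``play_turn``. This is a minimal sketch: ``update_enemy_reward`` and
``catch_bonus`` are hypothetical names, and the bonus value is arbitrary.

.. code-block:: python

   def update_enemy_reward(state, enemy_position, player_position, catch_bonus=10):
       """Add `catch_bonus` to state['enemy_reward'] when the enemy reaches the player.

       Returns True if the player was caught on this turn.
       """
       caught = enemy_position == player_position
       if caught:
           state["enemy_reward"] += catch_bonus
       return caught

   state = {"enemy_reward": 0}
   update_enemy_reward(state, (3, 4), (5, 4))   # different cells: no change
   update_enemy_reward(state, (5, 4), (5, 4))   # same cell: bonus applied

Keeping the reward logic in one function makes it easy to adjust the enemy's
objective without touching its movement code.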

.. note::

   Because ``retro-gamer`` injects actions through the game's global input
   source, *all* keyboard-listening agents in the training game will receive
   the trainer's keystrokes. The cleanest approach is to make the enemy the
   only keyboard-driven character in the training variant — any other
   characters should advance on their own without reading from the keyboard.
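
A stand-in character that advances on its own, as the note recommends, could
look like the sketch below. It is written as a plain class so that it runs
without ``retro`` installed. Reading the board dimensions from
``game.board_size`` is an assumption, and ``STEPS`` and ``StubGame`` are
hypothetical names used only for illustration.

.. code-block:: python

   import random

   # The four single-cell moves, as (dx, dy).
   STEPS = [(1, 0), (-1, 0), (0, -1), (0, 1)]

   class RandomPlayer:
       """Stand-in character that wanders on its own and ignores the keyboard."""

       def __init__(self, position=(0, 0), seed=None):
           self.position = position
           self.rng = random.Random(seed)

       def handle_keystroke(self, keystroke, game):
           # Deliberately empty: the trainer's keystrokes must not move this agent.
           pass

       def play_turn(self, game):
           dx, dy = self.rng.choice(STEPS)
           width, height = game.board_size  # assumption: exposed on the game
           x, y = self.position
           # Clamp to the board so the stand-in never steps off the edge.
           self.position = (min(max(x + dx, 0), width - 1),
                            min(max(y + dy, 0), height - 1))

   class StubGame:
       """Only carries board_size, so the sketch runs on its own."""
       board_size = (32, 16)

   player = RandomPlayer(position=(0, 0), seed=1)
   player.handle_keystroke("KEY_RIGHT", StubGame())  # has no effect
   start = player.position
   for _ in range(100):
       player.play_turn(StubGame())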

Adversarial training
~~~~~~~~~~~~~~~~~~~~~

Once you have separate training runs for the player and the enemy, you can
train them *against each other* iteratively. The idea is simple: train the
player against the current enemy model, then train the enemy against the
updated player model, and repeat. Each side is forced to improve against an
increasingly capable opponent.

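The key technique is to load the opponent's model at module level in each
training game variant, so it is loaded from disk once per run rather than
once per episode. Because ``TrainedPolicy`` loads the latest checkpoint by
default, each new run starts against the opponent's most recent model: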

.. code-block:: python

   # enemy_training_game.py
   from retro.game import Game
   from retro_gamer import TrainedPolicy

   # EnemyAgent and AIPlayer are placeholders for your own agent classes.
   _player = TrainedPolicy("runs/player/")   # loaded once when the module is imported

   def create_game():
       enemy = EnemyAgent()
       player = AIPlayer(_player)           # uses _player.get_action in play_turn
       return Game([enemy, player], {'enemy_reward': 0}, board_size=(32, 16))
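
``AIPlayer`` is not defined in the example above. One possible shape is
sketched here: an agent that accepts any object with a ``get_action(game)``
method, so a ``TrainedPolicy`` can be swapped for a test double. It is a plain
class that runs without ``retro`` installed. The movement logic (no bounds or
collision checks) and the ``FixedPolicy`` helper are illustrative assumptions.

.. code-block:: python

   class AIPlayer:
       """Agent whose moves come from any object exposing get_action(game)."""

       # Arrow-key names mapped to (dx, dy) steps.
       DIRECTIONS = {
           "KEY_RIGHT": (1, 0),
           "KEY_LEFT": (-1, 0),
           "KEY_UP": (0, -1),
           "KEY_DOWN": (0, 1),
       }

       def __init__(self, policy, position=(0, 0)):
           self.policy = policy
           self.position = position
           self.direction = (1, 0)

       def handle_keystroke(self, keystroke, game):
           pass  # never read the keyboard; the trainer's keys drive the other agent

       def play_turn(self, game):
           key = self.policy.get_action(game)
           # Keep the current direction if the policy returns any other key.
           self.direction = self.DIRECTIONS.get(key, self.direction)
           x, y = self.position
           dx, dy = self.direction
           self.position = (x + dx, y + dy)  # sketch only: no bounds checks

   class FixedPolicy:
       """Test double standing in for TrainedPolicy: always returns one key."""
       def __init__(self, key):
           self.key = key

       def get_action(self, game):
           return self.key

   player = AIPlayer(FixedPolicy("KEY_DOWN"), position=(4, 4))
   player.play_turn(game=None)

Passing the policy in through the constructor keeps the module-level load in
the game file while letting the agent class be exercised without a trained
model.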

You then alternate training runs:

.. code-block:: console

   % retro-gamer train runs/player/   # train player against current enemy
   % retro-gamer train runs/enemy/    # train enemy against updated player
   % retro-gamer train runs/player/   # train player again
   # ...

How many episodes to run before switching is itself a design decision: too
few and neither model has time to adapt; too many and each side overfits to
its current opponent. Watching how the strategies evolve — and asking *why*
each model behaves as it does at each stage — connects directly to concepts
in multi-agent reinforcement learning and adversarial training.
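
The alternation itself can be scripted. The sketch below only builds the list
of ``retro-gamer train`` commands shown earlier; ``alternating_schedule`` is a
hypothetical helper, and how many episodes each run performs is left to
whatever each run is already configured to do.

.. code-block:: python

   def alternating_schedule(run_dirs, rounds):
       """Return `rounds` passes over `run_dirs` as `retro-gamer train` argument lists."""
       return [["retro-gamer", "train", run_dir]
               for _ in range(rounds)
               for run_dir in run_dirs]

   schedule = alternating_schedule(["runs/player/", "runs/enemy/"], rounds=2)
   for command in schedule:
       # Print each command; nothing is executed here.
       print(" ".join(command))

To launch the runs rather than print them, each entry could be passed to
``subprocess.run(command, check=True)``, which stops the sequence if a
training run fails.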

Differences between the two approaches
---------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - ``PolicyInput``
     - ``TrainedPolicy`` in ``play_turn``
   * - Replaces human input for the whole game
     - One autonomous agent among many
   * - Game code is unchanged
     - Agent's ``play_turn`` calls ``get_action``
   * - One model drives all player-controlled agents
     - Each agent class can load its own model
   * - Simpler — just pass to ``game.play()``
     - More flexible — mix human and AI characters