Integrating a Trained Model
===========================

Once you have trained a model, you can use it in two ways:

- **PolicyInput**: the model replaces the keyboard, driving an existing
  player-controlled agent. Use this to watch a trained agent play, or to
  run automated evaluations.
- **TrainedPolicy in play_turn**: call ``get_action(game)`` from inside any
  agent's ``play_turn`` to embed the model as an autonomous character (for
  example, a smart enemy) alongside human-controlled or other agents.

Loading a trained model
-----------------------

Both approaches start by creating a :class:`retro_gamer.TrainedPolicy`:

.. code-block:: python

    from retro_gamer import TrainedPolicy

    ai = TrainedPolicy("runs/snake/")

This reads ``config.toml``, rebuilds the network, and loads the latest
checkpoint. To load a specific checkpoint instead:

.. code-block:: python

    ai = TrainedPolicy("runs/snake/", checkpoint="ep_0500")

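The checkpoint name ``ep_0500`` suggests that checkpoints are numbered by episode. As a rough illustration of what "latest" could mean under that naming, the sketch below picks the highest-numbered name from a list. The ``ep_NNNN`` pattern and the ``latest_checkpoint`` helper are assumptions made for illustration; how ``TrainedPolicy`` actually selects a checkpoint is not described here.

```python
import re

# Hypothetical helper, not part of retro_gamer. It assumes checkpoints are
# named "ep_" followed by an episode number, as in the "ep_0500" example.
_CHECKPOINT_NAME = re.compile(r"^ep_(\d+)$")


def latest_checkpoint(names):
    """Return the name with the highest episode number, or None if none match."""
    numbered = []
    for name in names:
        match = _CHECKPOINT_NAME.match(name)
        if match:
            # Compare by integer value so "ep_1000" ranks above "ep_900".
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    return max(numbered)[1]
```

Comparing the parsed integers rather than the raw strings keeps the ordering correct even if the numbers are not zero-padded to the same width.
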
PolicyInput: model as player
----------------------------

:class:`retro_gamer.PolicyInput` is an input source: it implements the same
interface as keyboard input, but chooses actions using the trained model. Pass
it to ``game.play()`` and everything else works exactly as usual:

.. code-block:: python

    from retro.examples.snake import create_game
    from retro_gamer import TrainedPolicy, PolicyInput

    ai = TrainedPolicy("runs/snake/")
    game = create_game()
    game.play(input_source=PolicyInput(ai, game))

On each turn, ``PolicyInput`` observes the current board and game state, runs
the model, and sends the chosen action to the game exactly as if the player
had pressed that key.

TrainedPolicy in play_turn: model as autonomous character
---------------------------------------------------------

To embed a trained model as an autonomous game character, create a
``TrainedPolicy`` at module level and call ``get_action(game)`` from inside
the agent's ``play_turn``. Placing it at module level means the model is
loaded from disk once, not once per episode.

.. code-block:: python

    from retro.game import Game
    from retro.examples.snake import Apple, SnakeHead
    from retro_gamer import TrainedPolicy

    _ai = TrainedPolicy("runs/snake/")

    class AISnake(SnakeHead):
        def handle_keystroke(self, k, game): pass  # ignore keyboard

        def play_turn(self, game):
            key = _ai.get_action(game)
            if key == 'KEY_RIGHT': self.direction = (1, 0)
            elif key == 'KEY_LEFT': self.direction = (-1, 0)
            elif key == 'KEY_UP': self.direction = (0, -1)
            elif key == 'KEY_DOWN': self.direction = (0, 1)
            super().play_turn(game)

    human_snake = SnakeHead()
    ai_snake = AISnake()
    ai_snake.position = (16, 8)
    apple = Apple()

    game = Game([human_snake, ai_snake, apple], {"score": 0}, board_size=(32, 16))
    apple.relocate(game)
    game.play()

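The ``if``/``elif`` chain in ``play_turn`` can also be written as a lookup table. The sketch below is self-contained so that it runs without the game installed. ``KEY_TO_DIRECTION`` and ``direction_for`` are illustrative names, not part of ``retro`` or ``retro_gamer``; the key names and direction tuples are the ones used in the ``AISnake`` example.

```python
# Illustrative names only. The key strings and direction tuples mirror the
# four branches of AISnake.play_turn.
KEY_TO_DIRECTION = {
    'KEY_RIGHT': (1, 0),
    'KEY_LEFT': (-1, 0),
    'KEY_UP': (0, -1),
    'KEY_DOWN': (0, 1),
}


def direction_for(key, current):
    """Return the direction for ``key``, or ``current`` if the key is not a move."""
    return KEY_TO_DIRECTION.get(key, current)
```

Inside ``play_turn`` this would read ``self.direction = direction_for(_ai.get_action(game), self.direction)``, which keeps the current heading if the model returns a key outside the table.
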
Training an enemy model
~~~~~~~~~~~~~~~~~~~~~~~~

You can use the same training pipeline to produce a model for an enemy agent.
``retro-gamer`` does not care *which* character it is training: it only needs
to control one character through the keyboard and read a reward signal
from the game state. To train an enemy:

1. **Create an enemy-perspective game variant.** Write (or add) a
   ``create_game``-style function, in a separate file or alongside your main
   one, where the enemy agent is the keyboard-driven character and the reward
   key in the game state reflects the enemy's objective (for example, a bonus
   for catching the player). The human player can be absent, replaced by a
   random-moving agent, or driven by a ``TrainedPolicy`` once you have a trained
   player model.

   .. code-block:: python

       from retro.game import Game

       # EnemyAgent and RandomPlayer are your own agent classes.
       def create_enemy_training_game():
           enemy = EnemyAgent()      # the character the trainer will control
           player = RandomPlayer()   # a stand-in; no human involved
           game = Game([enemy, player], {'enemy_reward': 0}, board_size=(32, 16))
           return game

2. **Train normally against this variant.**

   .. code-block:: console

       % retro-gamer create --game my_game:create_enemy_training_game \
           --output runs/enemy/
       % retro-gamer train runs/enemy/

3. **Embed the trained model in your main game** using ``get_action``, exactly
   as shown above.

.. note::

   Because ``retro-gamer`` injects actions through the game's global input
   source, *all* keyboard-listening agents in the training game will receive
   the trainer's keystrokes. The cleanest approach is to make the enemy the
   only keyboard-driven character in the training variant. Any other
   characters should advance on their own without reading from the keyboard.

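For illustration, here is one way the stand-in opponent and the reward bookkeeping from step 1 might look. This is a sketch under assumptions: it relies only on agents having a ``position`` and a ``play_turn(game)`` method, as in the snake example, and it assumes the game exposes its size as ``game.board_size`` and its state dictionary as ``game.state``. ``RandomPlayer``, ``award_catch``, the ``character`` value, and the bonus amount are all hypothetical.

```python
import random
from types import SimpleNamespace


class RandomPlayer:
    """Stand-in opponent: wanders one step per turn and never reads the keyboard."""
    character = 'P'  # hypothetical display character

    def __init__(self, position=(0, 0), rng=None):
        self.position = position
        self.rng = rng or random.Random()

    def play_turn(self, game):
        # Assumption: game.board_size is a (width, height) tuple.
        width, height = game.board_size
        dx, dy = self.rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = self.position
        # Clamp so the agent never steps off the board.
        self.position = (min(max(x + dx, 0), width - 1),
                         min(max(y + dy, 0), height - 1))


def award_catch(game, enemy, player, bonus=10):
    """Add a bonus to the reward key when the enemy reaches the player."""
    if enemy.position == player.position:
        # Assumption: game.state is the dict that was passed to Game(...).
        game.state['enemy_reward'] += bonus
        return True
    return False


# Tiny stand-in for a Game object, used only to exercise the sketch.
demo_game = SimpleNamespace(board_size=(32, 16), state={'enemy_reward': 0})
```

Because ``RandomPlayer`` defines no keystroke handler, it follows the advice in the note: only the enemy reacts to the trainer's input.
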
Adversarial training
~~~~~~~~~~~~~~~~~~~~~

Once you have separate training runs for the player and the enemy, you can
train them *against each other* iteratively. The idea is simple: train the
player against the current enemy model, then train the enemy against the
updated player model, and repeat. Each side is forced to improve against an
increasingly capable opponent.

The key technique is to load the opponent's model at module level in each
training game variant, so it is loaded from disk once per run rather than
once per episode:

.. code-block:: python

    # enemy_training_game.py
    from retro.game import Game
    from retro_gamer import TrainedPolicy

    # EnemyAgent and AIPlayer are your own agent classes, defined or imported here.

    _player = TrainedPolicy("runs/player/")  # loaded once when the module is imported

    def create_game():
        enemy = EnemyAgent()
        player = AIPlayer(_player)           # uses _player.get_action in play_turn
        return Game([enemy, player], {'enemy_reward': 0}, board_size=(32, 16))

You then alternate training runs:

.. code-block:: console

    % retro-gamer train runs/player/   # train player against current enemy
    % retro-gamer train runs/enemy/    # train enemy against updated player
    % retro-gamer train runs/player/   # train player again
    # ...

How many episodes to run before switching is itself a design decision: too
few and neither model has time to adapt; too many and each side overfits to
its current opponent. Watching how the strategies evolve, and asking *why*
each model behaves as it does at each stage, connects directly to concepts
in multi-agent reinforcement learning and adversarial training.

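The alternation can also be scripted. The sketch below builds the sequence of ``retro-gamer train`` commands from the console example and, when asked, runs them with ``subprocess``. The run directories are the ones used in this section; ``alternating_schedule`` and ``run_schedule`` are illustrative helpers, and the sketch does not control how many episodes each invocation runs.

```python
import subprocess


def alternating_schedule(rounds, player_run="runs/player/", enemy_run="runs/enemy/"):
    """Return the train commands for ``rounds`` player-then-enemy alternations."""
    commands = []
    for _ in range(rounds):
        commands.append(["retro-gamer", "train", player_run])
        commands.append(["retro-gamer", "train", enemy_run])
    return commands


def run_schedule(commands):
    """Run each command in order; check=True stops at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling ``run_schedule(alternating_schedule(3))`` would run three player/enemy rounds. Because each ``train`` command starts a new process, the module-level ``TrainedPolicy`` in each game variant is created afresh and should pick up the opponent's newest checkpoint.
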
Differences between the two approaches
---------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - ``PolicyInput``
     - ``TrainedPolicy`` in ``play_turn``
   * - Replaces human input for the whole game
     - One autonomous agent among many
   * - Game code is unchanged
     - Agent's ``play_turn`` calls ``get_action``
   * - One model drives all player-controlled agents
     - Each agent can use its own model
   * - Simpler: just pass to ``game.play()``
     - More flexible: mix human and AI characters