Introduction
============

``retro-gamer`` grew out of a question about how students learn difficult ideas in computer science. Reinforcement learning, the branch of machine learning in which an agent learns to act well by interacting with an environment and receiving rewards, is one of the most powerful and widely deployed ideas in modern computing. It underlies systems that play chess and Go at superhuman levels, control industrial robots, optimize power grids, and personalize recommendation feeds. It is also genuinely hard to understand, not because the core ideas are especially abstract, but because the feedback between a student's understanding and the system's behavior is usually invisible. You adjust a hyperparameter, run a training loop, and get a number. What happened inside, and why, remains opaque.

The design hypothesis of ``retro-gamer`` is that this opacity is not inevitable. If a student already knows a game well (how it works, what the pieces mean, what counts as doing well), then training an agent on that game gives them a concrete anchor for reasoning about what the learning algorithm is doing and why. When the trainer decides to use a convolutional neural network instead of a simpler model, it explains its reasoning. When training stalls, the student can ask: did I describe the game accurately? Is the reward sending the right signal? Would a different exploration strategy help? These are exactly the questions that build genuine conceptual understanding.

``retro-gamer`` is developed as part of the Making With Code curriculum, a project-based high school computer science curriculum emphasizing personally meaningful creation and deep conceptual engagement. In the games unit, students design and implement their own games using the ``retro-games`` framework. The extension into reinforcement learning is a natural next step: you built the game; now let's see if a machine can learn to play it.


How retro-gamer works
---------------------

Rather than asking you to write a training algorithm yourself, ``retro-gamer`` asks you to describe the game you want to train on. This description, written in your game project's ``pyproject.toml``, tells the trainer things the game's code alone doesn't make obvious: which characters matter, which piece of game state represents success, whether the board should be understood spatially or as a flat data display.

From this description, the trainer constructs a deep Q-learning model suited to the game. It writes out a plain-language explanation of every architectural decision it makes, then begins training. As training proceeds, it logs each episode's reward, loss, and exploration rate. Trained model snapshots, called checkpoints, are saved periodically, so you can watch how the agent's skill develops over time. When you're done training, you can load any checkpoint and watch the agent play.

A typical workflow looks like this. First, describe your game in the ``[tool.retro-gamer]`` section of your game project's ``pyproject.toml``:

.. code-block:: toml

    [tool.retro-gamer]
    actions = ["KEY_RIGHT", "KEY_UP", "KEY_LEFT", "KEY_DOWN"]
    reward = "score"
    character_set = ["@", "*", ">", "<", "^", "v"]

Then create a training run, train, and watch the result:

.. code-block:: console

    % retro-gamer create --game my_game --output runs/snake/
    % retro-gamer train runs/snake/
    % retro-gamer play runs/snake/ --checkpoint ep_0500

The ``create`` command sets up the training run directory; ``train`` runs the learning algorithm; ``play`` loads a checkpoint and lets you watch the trained agent live in the terminal.

What you will learn
-------------------

Working with ``retro-gamer`` is designed to build understanding of a cluster of related ideas:

**Reinforcement learning** is the framework in which an agent interacts with an environment, receiving observations and rewards, and learns to choose actions that maximize its long-term reward.
The ``retro-gamer`` training loop is a concrete instance of this framework: the agent is the neural network, the environment is the game, the observation is the encoded board and game state, and the reward is the change in score from one turn to the next.

**Neural network architecture** shapes what a model can and cannot learn. When you declare a game ``spatial``, the trainer builds a convolutional neural network that can detect patterns in the relative positions of game pieces. When you declare it non-spatial, it builds a simpler network that ignores position. Seeing the consequence of this choice in training behavior is a direct experience of why architecture matters.

**Observation design** determines what information is available to the agent. If you leave a character out of the ``character_set``, the agent will not distinguish it from empty space. If the game module defines a ``get_state()`` function, the agent also receives those computed values as part of its observation. The consequences of these choices for what the agent can learn are reasonably predictable, and making and checking those predictions is exactly the kind of reasoning the tool is designed to support.

**Reward engineering** is the craft of specifying what counts as doing well in a way the agent can actually optimize. Using score as the reward is natural for many games, but some games have sparse rewards (the agent rarely earns points), and some have reward signals that are easy to game. Experimenting with what to use as a reward, and observing how that choice shapes training, is one of the richest paths into understanding what reinforcement learning is actually optimizing.

**Hyperparameter tuning** is the practice of adjusting training settings such as learning rate, exploration probability, and network size to improve training efficiency and final performance.
``retro-gamer`` exposes these settings explicitly and explains their role in the training log, so tuning them is connected to conceptual understanding rather than uninformed search.

The interpretable training log
------------------------------

A key feature of ``retro-gamer`` is its training log. When training begins, the trainer writes a complete, plain-language account of the model it built: why it chose the architecture it did, what the observation vector contains, what actions the agent can take, and how the exploration and learning schedules are set up. Here is an example from training a snake agent:

.. code-block:: text

    [INIT] === Network Architecture ===
    [INIT] Board: 32×16, character set: 6 chars (one-hot per cell)
    [INIT] Observed state keys: 0 | Actions (incl. no-op): 5
    [INIT] spatial=True → using CNN architecture
    [INIT] Rationale: the board is a 2-D spatial scene; a CNN captures
    [INIT] local patterns (walls, items nearby) more efficiently than an MLP.
    [INIT] CNN: Conv2d(6→32, k=3, pad=1) → ReLU → Conv2d(32→64, k=3, pad=1) → ReLU
    [INIT] CNN output: 64 channels × 16×32 = 32768 features (flattened)
    [INIT] MLP head input: 32768 (conv) + 0 (state) = 32768
    [INIT] MLP: 32768 → 128 → 128 → 5
    [INIT] Hidden layers: 2 | Layer width: 128
    [INIT] Output: 5 Q-values
    [INIT] Actions: ['KEY_RIGHT', 'KEY_UP', 'KEY_LEFT', 'KEY_DOWN'] + (no-op)
    ...
    [EP 0001] total_reward=0.0 steps=2000 epsilon=0.9950 avg_loss=0.023540
    [EP 0100] total_reward=3.0 steps=1847 epsilon=0.6065 avg_loss=0.001204
    [EP 0500] total_reward=9.0 steps=1203 epsilon=0.0821 avg_loss=0.000387

The episode log shows total reward (score earned), how many turns the episode lasted, the current exploration rate (``epsilon``), and the average prediction error (``avg_loss``). Reading this log, and connecting changes in these numbers to what you know about the game and the algorithm, is one of the main activities the tool is designed to support.
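
The sizes reported in the example log can be checked by hand. The sketch below is an illustration only, not ``retro-gamer``'s actual code: the helper names ``one_hot_board`` and ``conv_output_size`` are hypothetical, and encoding unlisted characters as all zeros is an assumption based on the statement that omitted characters look like empty space to the agent. It one-hot encodes a 32×16 board over the six-character set and applies the standard convolution output-size formula to reproduce the 32768 flattened features.

.. code-block:: python

    # Hypothetical sketch: names and encoding details are assumptions,
    # not retro-gamer's actual implementation.
    CHARACTER_SET = ["@", "*", ">", "<", "^", "v"]  # from the example config
    WIDTH, HEIGHT = 32, 16                           # board size from the log


    def one_hot_board(rows, character_set):
        """Encode a list of strings as channels[c][y][x] holding 0.0 or 1.0.

        Characters outside character_set (including spaces) set no channel,
        so they are indistinguishable from empty space.
        """
        index = {ch: i for i, ch in enumerate(character_set)}
        channels = [[[0.0] * len(row) for row in rows] for _ in character_set]
        for y, row in enumerate(rows):
            for x, ch in enumerate(row):
                if ch in index:
                    channels[index[ch]][y][x] = 1.0
        return channels


    def conv_output_size(size, kernel=3, padding=1, stride=1):
        """Standard convolution output-size formula along one dimension."""
        return (size + 2 * padding - kernel) // stride + 1


    # An empty board with a snake head '@' at x=3, y=2 and an unlisted '#'.
    board = [" " * WIDTH for _ in range(HEIGHT)]
    board[2] = board[2][:3] + "@" + board[2][4:]
    board[5] = "#" + board[5][1:]
    encoded = one_hot_board(board, CHARACTER_SET)

    # With k=3 and pad=1, both conv layers preserve height and width.
    out_h = conv_output_size(conv_output_size(HEIGHT))
    out_w = conv_output_size(conv_output_size(WIDTH))
    flattened = 64 * out_h * out_w  # 64 channels after the second conv

    print(len(encoded), len(encoded[0]), len(encoded[0][0]))  # 6 16 32
    print(flattened)                                          # 32768

Because ``#`` is not in the character set, it contributes nothing to the encoding; only the ``@`` cell is set. Under this assumed encoding, that is what "will not distinguish it from empty space" means in practice.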