Initial commit

Chris Proctor
2026-05-08 14:07:17 -04:00
commit 5ca97dc5d0
36 changed files with 4147 additions and 0 deletions


@@ -0,0 +1,26 @@
---
name: retro-gamer architecture
description: Two-package structure, key design decisions, and how the packages interact
type: project
---
retro-gamer trains DQN agents to play games built with the retro-games framework. Two packages are involved:
**retro** (`/Users/chrisp/Repos/MWC/packages/retro`) — the game framework, modified:
- `retro/input.py`: `InputSource` protocol, `TerminalInput`, `ProgrammaticInput` (for RL). `ProgrammaticInput.press(key)` queues a keystroke for the next step.
- `retro/views/`: `View` protocol, `TerminalView` (moved from `view.py`), `HeadlessView` (reads the board into a `board_characters` list of lists without terminal output). `view.py` is kept as a compatibility shim.
- `retro/game.py`: `Game.step()` runs one turn (uses `self.input_source`, calls `self.view.render()` if a view is set). `Game.play()` loops over `step()` with its own `TerminalInput`/`TerminalView`. `Game.start()` must be called before the first `step()`.
- `retro/examples/snake.py`: added a `create_game(**kwargs)` factory function that returns an initialized `Game`.
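The stepping contract described in this list (queue a key with `press`, call `start()` once, then call `step()`) can be sketched with stand-in classes. This is a minimal illustration under stated assumptions, not the retro source: `ToyGame`, `get_keys`, `rows`, the `render(game)` signature, and the key names are hypothetical. Only `press`, `start`, `step`, `input_source`, `view`, and `board_characters` come from the notes above.

```python
from collections import deque


class ProgrammaticInput:
    """Stand-in input source that is fed keys by code instead of a terminal."""

    def __init__(self):
        self._queue = deque()

    def press(self, key):
        # Queue a keystroke; it is consumed by the next step.
        self._queue.append(key)

    def get_keys(self):  # hypothetical method name
        keys = list(self._queue)
        self._queue.clear()
        return keys


class HeadlessView:
    """Stand-in view that copies the board into a list of lists, printing nothing."""

    def __init__(self):
        self.board_characters = []

    def render(self, game):  # hypothetical signature
        self.board_characters = [row[:] for row in game.rows()]


class ToyGame:
    """Hypothetical one-row game: '@' moves left or right on a 1x5 board."""

    def __init__(self, input_source, view=None, width=5):
        self.input_source = input_source
        self.view = view
        self.width = width
        self.started = False
        self.x = 0

    def start(self):
        self.started = True
        self.x = self.width // 2

    def rows(self):
        return [["@" if col == self.x else "." for col in range(self.width)]]

    def step(self):
        # One turn: read queued input, update state, render if a view is set.
        if not self.started:
            raise RuntimeError("call start() before step()")
        for key in self.input_source.get_keys():
            if key == "KEY_LEFT":
                self.x = max(0, self.x - 1)
            elif key == "KEY_RIGHT":
                self.x = min(self.width - 1, self.x + 1)
        if self.view is not None:
            self.view.render(self)


def run_demo():
    source = ProgrammaticInput()
    view = HeadlessView()
    game = ToyGame(source, view)
    game.start()
    source.press("KEY_RIGHT")
    game.step()
    return view.board_characters
```

Keeping the input source and the view as plain collaborators of the game is what lets an RL environment swap in programmatic input and a headless view without the framework knowing about RL.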
**retro-gamer** (`/Users/chrisp/Repos/MWC/packages/retro-gamer`) — the RL toolkit:
- `metadata.py`: `GameMetadata` dataclass (board_size, actions, reward, character_set, spatial, observe_state). TOML load/save via `from_toml()`/`to_toml()`.
- `env.py`: `GameEnvironment(game_factory, metadata)` with `reset()→obs`, `step(action)→(obs,reward,done)`. Manages `ProgrammaticInput` + `HeadlessView` internally. The reward is the change in `state[reward_key]` between steps.
- `observation.py`: one-hot encodes the board into an (H,W,C) array; for spatial games, transposes it to (C,H,W) before flattening; the values of the observed state keys are appended. Always returns a flat 1D `np.ndarray`.
- `network.py`: `build_network(metadata, hp)→(model, rationale_str)`. `_SpatialNet` uses Conv2d→flatten→MLP; `_FlatNet` uses an MLP only. The first C*H*W elements of the flat observation vector are the board (channel-first); the remainder is state.
- `memory.py`: `ReplayMemory` (FIFO deque), `PrioritizedReplayMemory` (alpha/beta sampling).
- `trainer.py`: `DQNTrainer`. Discovers character_set if it is not given. Writes the architecture rationale to `training.log` on init. Saves `config.toml` (merging with any existing file to preserve the `game` section). Checkpoints every 100 episodes, plus a final checkpoint.
- `cli.py`: `retro-gamer create/train/play/info`. `create` writes `config.toml` with the game module name. `train` loads the config and calls `DQNTrainer`. `play` runs the model with a terminal display, using `ProgrammaticInput` + `HeadlessView` for observations and `TerminalView` for display.
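The observation layout described for `observation.py` and `network.py` can be illustrated with a short NumPy sketch. This is an assumption-laden illustration rather than the package's code: the function name `encode_observation`, its parameters, and the example board are hypothetical. It only follows the layout stated above (one-hot board, channel-first for spatial games, state values appended, flat 1D result).

```python
import numpy as np


def encode_observation(board, character_set, state_values=(), spatial=True):
    """One-hot encode a board and append scalar state values (hypothetical helper)."""
    index = {ch: i for i, ch in enumerate(character_set)}
    height, width = len(board), len(board[0])

    # Start from an (H, W, C) one-hot array: one channel per known character.
    one_hot = np.zeros((height, width, len(character_set)), dtype=np.float32)
    for r, row in enumerate(board):
        for c, ch in enumerate(row):
            one_hot[r, c, index[ch]] = 1.0

    # Spatial games use channel-first (C, H, W) so a conv net can reshape it back.
    if spatial:
        one_hot = one_hot.transpose(2, 0, 1)

    flat = one_hot.reshape(-1)
    state = np.asarray(state_values, dtype=np.float32)
    return np.concatenate([flat, state])


# Example: a 2x2 board with two characters and one state value.
board = [[".", "@"], [".", "."]]
character_set = [".", "@"]
obs = encode_observation(board, character_set, state_values=(3.0,))

# A spatial network would split the vector back into board planes and state.
C, H, W = len(character_set), 2, 2
planes = obs[: C * H * W].reshape(C, H, W)
state = obs[C * H * W:]
```

Here `planes[1]` is the channel for `@`, and `state` holds the single appended value. Returning one flat vector in both cases keeps replay memory and the trainer independent of whether the network is convolutional.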
**Why:** Designed as a teaching tool for students learning RL. Students specify `GameMetadata` and hyperparameters; the trainer makes the architecture decisions and logs its rationale.
**How to apply:** When adding features, keep RL concepts out of the retro framework (`input.py` and `views/` should not reference RL). When extending the trainer, log all design decisions to `training.log`.