Initial commit

2026-05-08 14:07:17 -04:00
commit 5ca97dc5d0
36 changed files with 4147 additions and 0 deletions
--- a/memory/project_architecture.md
+++ b/memory/project_architecture.md
@@ -0,0 +1,26 @@
+---
+name: retro-gamer architecture
+description: Two-package structure, key design decisions, and how the packages interact
+type: project
+---
+
+retro-gamer trains DQN agents to play games built on the retro-games framework. Two packages are involved:
+
+**retro** (`/Users/chrisp/Repos/MWC/packages/retro`) — the game framework, modified:
+- `retro/input.py` — `InputSource` protocol, `TerminalInput`, `ProgrammaticInput` (for RL). `ProgrammaticInput.press(key)` queues a keystroke for the next step.
+- `retro/views/` — `View` protocol, `TerminalView` (moved from `view.py`), `HeadlessView` (reads board into `board_characters` list-of-lists without terminal output). `view.py` kept as compat shim.
+- `retro/game.py` — `Game.step()` runs one turn (uses `self.input_source`, calls `self.view.render()` if set). `Game.play()` loops over `step()` with its own TerminalInput/TerminalView. `Game.start()` must be called before first `step()`.
+- `retro/examples/snake.py` — added `create_game(**kwargs)` factory function returning an initialized Game.
+
+**retro-gamer** (`/Users/chrisp/Repos/MWC/packages/retro-gamer`) — the RL toolkit:
+- `metadata.py` — `GameMetadata` dataclass (board_size, actions, reward, character_set, spatial, observe_state). TOML load/save via `from_toml()`/`to_toml()`.
+- `env.py` — `GameEnvironment(game_factory, metadata)` with `reset()→obs`, `step(action)→(obs,reward,done)`. Manages `ProgrammaticInput` + `HeadlessView` internally. The reward is the change in `state[reward_key]` from one step to the next.
+- `observation.py` — one-hot encodes the board into an (H,W,C) array; for spatial games, transposes it to (C,H,W) before flattening. State values are appended after the board. Always returns a flat 1D np.ndarray.
+- `network.py` — `build_network(metadata, hp)→(model, rationale_str)`. `_SpatialNet` uses Conv2d→flatten→MLP; `_FlatNet` uses MLP only. The flat obs vector's first C*H*W elements are board (channel-first), remainder is state.
+- `memory.py` — `ReplayMemory` (FIFO deque), `PrioritizedReplayMemory` (alpha/beta sampling).
+- `trainer.py` — `DQNTrainer`. Discovers character_set if not given. Writes architecture rationale to `training.log` on init. Saves `config.toml` (merges with existing to preserve `game` section). Checkpoints every 100 episodes, plus a final checkpoint.
+- `cli.py` — `retro-gamer create/train/play/info`. `create` writes config.toml with the game module name. `train` loads the config and calls DQNTrainer. `play` runs the model in the terminal, using ProgrammaticInput + HeadlessView for observations and TerminalView for display.
+
+**Why:** Designed as a pedagogical tool for students learning RL. Students specify GameMetadata and hyperparameters; the trainer makes architecture decisions and logs its rationale.
+
+**How to apply:** When adding features, keep RL concepts out of the retro framework (input.py and views/ should not reference RL). When extending the trainer, log all design decisions to training.log.
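The observation layout described for `observation.py` and `network.py` (board first in channel-first order, state values last) can be sketched as below. This is only an illustration under stated assumptions: `encode_observation` and its parameters are hypothetical names, it uses plain Python lists to stay dependency-free where the notes say the real module returns an `np.ndarray`, and the handling of non-spatial games and unknown characters is a guess.

```python
def encode_observation(board, character_set, state, state_keys, spatial):
    """Return a flat list: one-hot board values first, then state values.

    Hypothetical helper, not the project's API. `board` is a list of rows
    of single characters, like HeadlessView's `board_characters`.
    """
    index = {ch: i for i, ch in enumerate(character_set)}
    height, width, channels = len(board), len(board[0]), len(character_set)

    # One-hot encode into an (H, W, C) nested list.
    # Assumption: characters missing from character_set encode as all zeros.
    hwc = [[[1.0 if index.get(board[r][c]) == k else 0.0
             for k in range(channels)]
            for c in range(width)]
           for r in range(height)]

    if spatial:
        # Channel-first (C, H, W) order, then flatten.
        flat = [hwc[r][c][k]
                for k in range(channels)
                for r in range(height)
                for c in range(width)]
    else:
        # Assumption: non-spatial games flatten (H, W, C) directly.
        flat = [hwc[r][c][k]
                for r in range(height)
                for c in range(width)
                for k in range(channels)]

    # State values are appended after the board portion.
    flat.extend(float(state[key]) for key in state_keys)
    return flat


board = [["*", "."],
         [".", "@"]]
obs = encode_observation(board, [".", "*", "@"], {"score": 3}, ["score"], spatial=True)
board_part, state_part = obs[:3 * 2 * 2], obs[3 * 2 * 2:]
```

Here `board_part` holds the first C*H*W = 12 elements that a convolutional network would reshape to (C,H,W), and `state_part` holds the single appended state value.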