retro-gamer/memory/project_architecture.md at 5f74dbba8d0aacb8204ec577335ad2f08868aab7


Chris Proctor 5ca97dc5d0 Initial commit

2026-05-08 14:07:17 -04:00

2.8 KiB



name	description	type
retro-gamer architecture	Two-package structure, key design decisions, and how the packages interact	project

retro-gamer trains DQN agents to play games built on the retro-games framework. Two packages are involved:

retro (/Users/chrisp/Repos/MWC/packages/retro) — the game framework, modified:

retro/input.py — InputSource protocol, TerminalInput, ProgrammaticInput (for RL). ProgrammaticInput.press(key) queues a keystroke for the next step.
retro/views/ — View protocol, TerminalView (moved from view.py), HeadlessView (reads board into board_characters list-of-lists without terminal output). view.py kept as compat shim.
retro/game.py — Game.step() runs one turn (reads from self.input_source and calls self.view.render() if a view is set). Game.play() loops over step() with its own TerminalInput/TerminalView. Game.start() must be called before the first step().
retro/examples/snake.py — added create_game(**kwargs) factory function returning an initialized Game.
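
The input and stepping contract described above can be illustrated with a minimal, dependency-free sketch. This is not the retro source: the `next_key` method name, the `ToyGame` class, and the key strings are hypothetical stand-ins chosen for illustration. Only `press(key)`, `start()`, `step()`, and `input_source` come from the notes above.

```python
from collections import deque
from typing import Optional, Protocol


class InputSource(Protocol):
    """Anything a game can poll for a key. `next_key` is a hypothetical name."""

    def next_key(self) -> Optional[str]: ...


class ProgrammaticInput:
    """Queues keystrokes from code instead of reading a terminal."""

    def __init__(self) -> None:
        self._queue: deque = deque()

    def press(self, key: str) -> None:
        # Queue a keystroke to be consumed by a later step.
        self._queue.append(key)

    def next_key(self) -> Optional[str]:
        # Return the oldest queued key, or None when nothing was pressed.
        return self._queue.popleft() if self._queue else None


class ToyGame:
    """Hypothetical stand-in for Game: one call to step() is one turn."""

    def __init__(self, input_source: InputSource) -> None:
        self.input_source = input_source
        self.started = False
        self.x = 0

    def start(self) -> None:
        self.started = True

    def step(self) -> None:
        if not self.started:
            raise RuntimeError("call start() before the first step()")
        key = self.input_source.next_key()
        if key == "KEY_RIGHT":
            self.x += 1
        elif key == "KEY_LEFT":
            self.x -= 1
```

Because the game only polls its input source, an RL loop can call `press()` and then `step()` without the framework knowing anything about agents or rewards.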

retro-gamer (/Users/chrisp/Repos/MWC/packages/retro-gamer) — the RL toolkit:

metadata.py — GameMetadata dataclass (board_size, actions, reward, character_set, spatial, observe_state). TOML load/save via from_toml()/to_toml().
env.py — GameEnvironment(game_factory, metadata) with reset()→obs, step(action)→(obs,reward,done). Manages ProgrammaticInput + HeadlessView internally. The reward is the change in state[reward_key] from one step to the next.
observation.py — one-hot encodes the board to an (H,W,C) array; for spatial games, transposes it to (C,H,W) before flattening; state values are then appended. Always returns a flat 1D np.ndarray.
network.py — build_network(metadata, hp)→(model, rationale_str). _SpatialNet uses Conv2d→flatten→MLP; _FlatNet uses an MLP only. The first C×H×W elements of the flat observation vector are the board (channel-first); the remainder is state.
memory.py — ReplayMemory (FIFO deque), PrioritizedReplayMemory (alpha/beta sampling).
trainer.py — DQNTrainer. Discovers character_set if not given. Writes the architecture rationale to training.log on init. Saves config.toml (merging with any existing file to preserve the game section). Checkpoints every 100 episodes, plus a final checkpoint.
cli.py — retro-gamer create/train/play/info. create writes config.toml with the game module name. train loads the config and calls DQNTrainer. play runs a trained model in the terminal, using ProgrammaticInput + HeadlessView for observations and TerminalView for display.
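
The observation layout described for observation.py and network.py can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the package's code: the function names `encode_observation` and `split_observation` and their parameters are hypothetical, and the sketch assumes every board character appears in `character_set`.

```python
import numpy as np


def encode_observation(board, character_set, state, state_keys, spatial):
    """Return a flat 1D float32 vector: one-hot board first, then state values.

    Hypothetical helper; assumes every character on the board is in character_set.
    """
    index = {ch: i for i, ch in enumerate(character_set)}
    h, w, c = len(board), len(board[0]), len(character_set)
    one_hot = np.zeros((h, w, c), dtype=np.float32)  # (H, W, C)
    for y, row in enumerate(board):
        for x, ch in enumerate(row):
            one_hot[y, x, index[ch]] = 1.0
    if spatial:
        # Channel-first layout, so a convolutional net can reshape it directly.
        one_hot = one_hot.transpose(2, 0, 1)  # (C, H, W)
    extras = np.array([state[k] for k in state_keys], dtype=np.float32)
    return np.concatenate([one_hot.reshape(-1), extras])


def split_observation(obs, c, h, w):
    """Recover the (C, H, W) board and the trailing state values (spatial case)."""
    board = obs[: c * h * w].reshape(c, h, w)
    return board, obs[c * h * w:]
```

Keeping the observation flat means the replay memory and trainer can treat every game the same way; only the network needs to know how many leading elements belong to the board.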

Why: Designed as a pedagogy tool for students learning RL. Students specify GameMetadata and hyperparameters; the trainer makes architecture decisions and logs rationale.
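
As a rough illustration of that division of labor, the sketch below pairs a metadata dataclass with a function that picks a network family and explains why. The field names come from the list above, but the field types, the defaults, and the `choose_architecture` function are assumptions made for this example. In the package itself, build_network returns a model together with its rationale string.

```python
from dataclasses import dataclass, field


@dataclass
class GameMetadata:
    # Field names follow the notes above; types and defaults are assumptions.
    board_size: tuple                 # assumed to be (height, width)
    actions: list                     # keys the agent may press
    reward: str                       # assumed: state key whose change is the reward
    character_set: list = field(default_factory=list)
    spatial: bool = True
    observe_state: list = field(default_factory=list)


def choose_architecture(metadata):
    """Hypothetical helper: return (network_kind, rationale) for logging."""
    if metadata.spatial:
        return (
            "conv",
            "spatial=True: board positions are meaningful, so convolutions "
            "are applied before the MLP.",
        )
    return (
        "mlp",
        "spatial=False: the board is treated as a flat feature vector, "
        "so an MLP alone is used.",
    )
```

A student would only edit the metadata; the rationale string is what would end up in training.log.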

How to apply: When adding features, keep RL concepts out of the retro framework (input.py and views/ should not reference RL). When extending the trainer, log all design decisions to training.log.
