Initial commit
training_log.md
# Forager Training Log

Document each training attempt below. For each attempt, write your hypothesis
before you run the experiment, then fill in the evidence and analysis after.

Use `retro-gamer info runs/forager/` to see a summary of your run, and
`cat runs/forager/training.log` to see the full log.

---

## Attempt 1

### Hypothesis

*Before training, predict what will happen with the default configuration.
Will the agent learn to find the food? How quickly? What might go wrong?*

Your prediction:

### Configuration

*Copy the relevant sections of `runs/forager/config.toml` here.*

```toml

```

### Evidence

*Paste the first and last few lines of your training log, and any interesting
moments in between.*

```

```
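
If the log is long, a small shell function can pull out just the first and last few lines for pasting above. This is only a sketch: `log_excerpt` is a hypothetical helper name, not part of `retro-gamer`, and it assumes a POSIX-style shell with `head` and `tail` available.

```shell
# Hypothetical helper (not part of retro-gamer): print the first and last
# N lines of a log file, with a "..." marker line in between.
log_excerpt() {
    file="$1"
    n="${2:-5}"   # lines to take from each end; defaults to 5
    head -n "$n" "$file"
    echo "..."
    tail -n "$n" "$file"
}
```

For example, `log_excerpt runs/forager/training.log 5` prints the first five and last five lines of the log. Any interesting moments in between still need to be picked out by hand.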

### Analysis

*What happened? How do the numbers (`avg_reward`, `avg_steps`, `epsilon`, `avg_loss`)
tell the story of what the agent learned? Did the result match your prediction?*

---

## Attempt 2

### Hypothesis

*Based on what you observed in Attempt 1, what will you change and why?
Predict the outcome.*

### Configuration

```toml

```

### Evidence

```

```

### Analysis

---

## Attempt 3 (if needed)

### Hypothesis

### Configuration

```toml

```

### Evidence

```

```

### Analysis

---

## Final analysis

**Which attempt produced the best-trained agent? Run `retro-gamer play` on your
best run's checkpoints and describe what the agent does.**

*Your answer:*

**Compare two of your attempts. What changed between them, and how did that
change affect the training curve?**

*Your answer:*

**If you had more time, what would you try next to improve the agent further?
Refer to specific hyperparameters or configuration options.**

*Your answer:*