
# Estimation: Questions

---

## Checkpoint 1: Single, Binary, and Ordinal Predictors

**1. Record the slope, intercept, and test RMSE for your three single-predictor income models.**

| Predictor  | Slope | Intercept | Test RMSE |
|------------|-------|-----------|-----------|
| `education`|       |           |           |
| `health`   |       |           |           |
|            |       |           |           |
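
If it helps to see where the three numbers in each row come from, here is a minimal sketch in plain Python. The data are made up, and `fit_line` and `rmse` are hypothetical helper names rather than functions from the lab notebook; your lab code may well compute the same quantities with a library instead.

```python
import math

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values standing in for one predictor and `income` (assumption).
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2, 3, 3, 5, 6, 6]
test_x = [2, 4, 6]
test_y = [2, 5, 7]

# Fit on the training split only, then score on the held-out test split.
slope, intercept = fit_line(train_x, train_y)
test_pred = [intercept + slope * xi for xi in test_x]
test_rmse = rmse(test_y, test_pred)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, test RMSE={test_rmse:.3f}")
```

The point of the sketch is the order of operations: the slope and intercept come from the training data, and the RMSE you record is measured on test rows the model never saw.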

**2. What does the slope mean in plain language for each model?**


**3. `income` ranges from 1 to 8. On that scale, is a test RMSE of 1.5 good or bad? Why?**


**4. Which predictor gives the lowest test RMSE?**


**5. `health` is self-reported: people rate their own health from 1 (poor) to 5 (excellent), with no objective measurement behind the number. Two people in similar physical condition might rate themselves differently depending on what they're used to comparing themselves against. How might that kind of bias affect a model trained on `health`?**


**6. For your binary predictor (`exercise` or `no_doctor`), what's the model's predicted income difference between the two groups?**


**7. If you used `exercise`: does the predicted difference mean that exercising more causes higher income? Could it run the other way, with higher income making regular exercise easier (gym access, safer neighborhoods, flexible work hours)? What does this tell you about associations in general?**


**8. Look back at your `education` and `health` models. Is the assumption that each step is evenly spaced more or less reasonable for these than it was for Pokémon's `generation`? Why?**

---

## Checkpoint 2: Multiple Regression

**9. Record the train and test RMSE for your multiple regression model predicting `health` from `education`, `income`, `exercise`, `age`, and `no_doctor`.**

| Split | RMSE |
|-------|------|
| Train |      |
| Test  |      |
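
For reference, here is a minimal sketch of how a train RMSE and a test RMSE can be computed for a model with several predictors. It is plain Python on synthetic data: the three predictor columns, the coefficients used to generate the outcome, and the helper names `solve`, `fit`, `predict`, and `rmse` are all assumptions made for illustration, not the lab's actual code or data.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit(rows, y):
    """Least squares via the normal equations; returns [intercept, coef_1, ...]."""
    x = [[1.0] + list(r) for r in rows]  # prepend a constant column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(w, rows):
    return [w[0] + sum(c * v for c, v in zip(w[1:], r)) for r in rows]

def rmse(actual, predicted):
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic data (assumption): three predictors and a noisy linear outcome.
random.seed(0)
rows = [(random.randint(1, 6), random.randint(1, 8), random.randint(0, 1))
        for _ in range(200)]
y = [1.0 + 0.3 * p + 0.2 * q + 0.5 * r + random.gauss(0, 0.5) for p, q, r in rows]

# Hold out the last 50 rows as a test set; fit on the first 150 only.
train_rows, test_rows = rows[:150], rows[150:]
train_y, test_y = y[:150], y[150:]

w = fit(train_rows, train_y)
train_rmse = rmse(train_y, predict(w, train_rows))
test_rmse = rmse(test_y, predict(w, test_rows))
print(f"train RMSE: {train_rmse:.3f}, test RMSE: {test_rmse:.3f}")
```

Both numbers come from the same fitted coefficients; only the rows being scored differ, which is what makes the two values comparable.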

**10. How does test RMSE compare to your single-predictor models? Does train RMSE diverge from test RMSE, or do they stay close together?**


**11. Suppose an insurance company used a model like this one to predict an applicant's health, and used that prediction to decide whether to offer coverage or how much to charge for it. What are the potential benefits of this use? What are the potential harms? Who might be unfairly affected?**


**12. If someone were denied insurance coverage because of this model's prediction, should they have the right to know why? Should they be able to challenge it? What would they need in order to do either?**


---

## Checkpoint 3: Closing Position

**13. Under what conditions, if any, is it appropriate to use health survey data like BRFSS to build predictive models for commercial purposes? Write a short paragraph (4–6 sentences).**

*Your answer:*