Estimation: Questions
Checkpoint 1: Single, Binary, and Ordinal Predictors
1. Record the slope, intercept, and test RMSE for your three single-predictor income models.
| Predictor | Slope | Intercept | Test RMSE |
|---|---|---|---|
| education | | | |
| health | | | |
2. What does the slope mean in plain language for each model?
3. income ranges from 1 to 8. Is a test RMSE of 1.5 good or bad?
4. Which predictor gives the lowest test RMSE?
5. health is self-reported: people rate their own health from 1 (poor) to 5 (excellent), with no objective measurement behind the number. Two people in similar physical condition might rate themselves differently depending on what they're used to comparing themselves against. How might that kind of bias affect a model trained on health?
6. For your binary predictor (exercise or no_doctor), what's the model's predicted income difference between the two groups?
7. If you used exercise: does this mean that exercising causes higher income? Could the relationship run the other way, with higher income making regular exercise easier (gym access, safer neighborhoods, flexible work hours)? What does this tell you about associations in general?
8. Look back at your education and health models. A linear model treats each step on an ordinal scale as evenly spaced. Is that assumption more or less reasonable for education and health than it was for Pokémon's generation? Why?
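
If it helps to see where the numbers in questions 1, 3, and 6 come from, here is a minimal sketch using only the Python standard library. It does not use BRFSS: the data is synthetic, and the 1 to 6 coding for education, the 0/1 coding for exercise, the 400/100 split, and every coefficient are assumptions made up for illustration. The helper names (`fit_line`, `rmse`, `clip_income`) are hypothetical as well. Your notebook may fit models differently.

```python
import math
import random


def fit_line(x, y):
    """Least-squares line for one predictor; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def clip_income(value):
    """Round to the nearest category and keep it inside the 1 to 8 range."""
    return min(8, max(1, round(value)))


# Synthetic stand-in data (NOT BRFSS). The education and exercise codings
# and all coefficients below are assumptions for illustration only.
rng = random.Random(0)
education = [rng.randint(1, 6) for _ in range(500)]
exercise = [rng.randint(0, 1) for _ in range(500)]
income = [clip_income(2 + 0.8 * ed + 0.5 * ex + rng.gauss(0, 1.5))
          for ed, ex in zip(education, exercise)]

# Assumed split: first 400 rows for training, last 100 for testing.
train, test = slice(0, 400), slice(400, 500)

# Ordinal predictor: slope, intercept, and test RMSE (question 1).
slope, intercept = fit_line(education[train], income[train])
test_pred = [intercept + slope * ed for ed in education[test]]
model_rmse = rmse(income[test], test_pred)

# Reference point for question 3: always predict the training mean.
train_mean = sum(income[train]) / len(income[train])
baseline_rmse = rmse(income[test], [train_mean] * len(income[test]))

# Binary predictor (question 6): with a 0/1 predictor, the fitted slope
# equals the difference between the two groups' mean incomes.
b_slope, b_intercept = fit_line(exercise[train], income[train])
group1 = [inc for inc, ex in zip(income[train], exercise[train]) if ex == 1]
group0 = [inc for inc, ex in zip(income[train], exercise[train]) if ex == 0]
mean_gap = sum(group1) / len(group1) - sum(group0) / len(group0)

print(f"education: slope={slope:.2f} intercept={intercept:.2f}")
print(f"test RMSE: model={model_rmse:.2f} mean-only baseline={baseline_rmse:.2f}")
print(f"exercise: slope={b_slope:.2f} group mean gap={mean_gap:.2f}")
```

Comparing a model's test RMSE with the mean-only baseline is one way to judge whether a value like 1.5 is good or bad: an RMSE only means something relative to the range of the target and to what a trivial prediction would achieve.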
Checkpoint 2: Multiple Regression
9. Record the train and test RMSE for your multiple regression model predicting health from education, income, exercise, age, and no_doctor.
| | RMSE |
|---|---|
| Train | |
| Test | |
10. How does test RMSE compare to your single-predictor models? Does train RMSE diverge from test RMSE, or do they stay close together?
11. Suppose an insurance company used a model like this one to predict an applicant's health, and used that prediction to decide whether to provide insurance or how much to charge for it. What are the potential benefits of this use? The potential harms? Who might be unfairly affected?
12. If someone were denied insurance coverage because of this model's prediction, should they have the right to know why? Should they be able to challenge it? What would they need in order to do either?
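
For questions 9 and 10, the train and test RMSE comparison can be sketched as follows. This is a dependency-free illustration under stated assumptions, not the course's implementation: the data is synthetic (not BRFSS), the codings for education, age, exercise, and no_doctor are made up, and the helpers `solve`, `fit`, `predict`, and `rmse` are hypothetical names. It fits ordinary least squares through the normal equations, which a regression library would normally handle for you.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit(rows, y):
    """Least squares via the normal equations; w[0] is the intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, rows):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in rows]


def rmse(actual, predicted):
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Synthetic stand-in data (NOT BRFSS); every coding and coefficient here
# is an assumption chosen only so the example runs.
rng = random.Random(1)
rows, health = [], []
for _ in range(500):
    edu, inc = rng.randint(1, 6), rng.randint(1, 8)
    ex, nd = rng.randint(0, 1), rng.randint(0, 1)
    age = rng.randint(18, 80)
    score = (2.5 + 0.15 * edu + 0.1 * inc + 0.4 * ex
             - 0.01 * age - 0.3 * nd + rng.gauss(0, 0.8))
    rows.append((edu, inc, ex, age, nd))
    health.append(min(5, max(1, round(score))))  # keep health on the 1-5 scale

# Assumed split: first 400 rows for training, last 100 for testing.
train_rows, test_rows = rows[:400], rows[400:]
train_y, test_y = health[:400], health[400:]

w = fit(train_rows, train_y)
train_rmse = rmse(train_y, predict(w, train_rows))
test_rmse = rmse(test_y, predict(w, test_rows))

print(f"train RMSE={train_rmse:.3f} test RMSE={test_rmse:.3f}")
```

A test RMSE well above the train RMSE would suggest the model has fit quirks of the training rows. Values that stay close together suggest it generalizes about as well as it fits.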
Checkpoint 3: Closing Position
13. Under what conditions, if any, is it appropriate to use health survey data like BRFSS to build predictive models for commercial purposes? Write a short paragraph (4–6 sentences).
Your answer: