{
"cells": [
{
"cell_type": "markdown",
"id": "worldwide-blood",
"metadata": {},
"source": [
"# A Data Science Investigation About Fatal Car Crashes in America "
]
},
{
"cell_type": "markdown",
"id": "understanding-numbers",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences describing your research.*\n",
"\n",
"This project uses FiveThirtyEight data on the drivers involved in fatal car crashes in every U.S. state. The data includes whether those drivers were speeding, alcohol-impaired, or distracted. I group the states into regions to determine which region of America is the deadliest to drive in."
]
},
{
"cell_type": "markdown",
"id": "greater-circular",
"metadata": {},
"source": [
"## Overarching Question: What is the deadliest region in America to drive in?"
]
},
{
"cell_type": "markdown",
"id": "appreciated-testimony",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences explaining why this question.*\n",
"\n",
"I am interested in this question because I live on the Northeast coast, where there are a lot of car accidents. People drive very fast here, and the roads are not always properly paved or maintained. I want to know whether people who get into accidents are just unlucky or whether the crashes are their own fault."
]
},
{
"cell_type": "markdown",
"id": "permanent-pollution",
"metadata": {},
"source": [
"# Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "technical-evans",
"metadata": {},
"outputs": [],
"source": [
"#Include any import statements you will need\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "overhead-sigma",
"metadata": {},
"outputs": [],
"source": [
"### 💻 FILL IN YOUR DATASET FILE NAME BELOW 💻 ###\n",
"\n",
"file_name = \"B_D - bad-drivers.csv\"\n",
"dataset_path = \"data/\" + file_name\n",
"\n",
"df = pd.read_csv(dataset_path)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "heated-blade",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>State</th>\n",
" <th>number_drivers_fatal_billion_miles</th>\n",
" <th>percentage_drivers_fatal_speeding</th>\n",
" <th>percentage_drivers_fatal_alcohol_impaired</th>\n",
" <th>percentage_drivers_fatal_not_distracted</th>\n",
" <th>percentage_drivers_fatal_no_previous_accidents</th>\n",
" <th>car_insurance_premiums</th>\n",
" <th>region</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Alabama</td>\n",
" <td>18.8</td>\n",
" <td>39</td>\n",
" <td>30</td>\n",
" <td>96</td>\n",
" <td>80</td>\n",
" <td>784.55</td>\n",
" <td>Southeast</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Alaska</td>\n",
" <td>18.1</td>\n",
" <td>41</td>\n",
" <td>25</td>\n",
" <td>90</td>\n",
" <td>94</td>\n",
" <td>1053.48</td>\n",
" <td>West</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Arizona</td>\n",
" <td>18.6</td>\n",
" <td>35</td>\n",
" <td>28</td>\n",
" <td>84</td>\n",
" <td>96</td>\n",
" <td>899.47</td>\n",
" <td>Southeast</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Arkansas</td>\n",
" <td>22.4</td>\n",
" <td>18</td>\n",
" <td>26</td>\n",
" <td>94</td>\n",
" <td>95</td>\n",
" <td>827.34</td>\n",
" <td>Southeast</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>California</td>\n",
" <td>12.0</td>\n",
" <td>35</td>\n",
" <td>28</td>\n",
" <td>91</td>\n",
" <td>89</td>\n",
" <td>878.41</td>\n",
" <td>West</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State number_drivers_fatal_billion_miles \\\n",
"0 Alabama 18.8 \n",
"1 Alaska 18.1 \n",
"2 Arizona 18.6 \n",
"3 Arkansas 22.4 \n",
"4 California 12.0 \n",
"\n",
" percentage_drivers_fatal_speeding \\\n",
"0 39 \n",
"1 41 \n",
"2 35 \n",
"3 18 \n",
"4 35 \n",
"\n",
" percentage_drivers_fatal_alcohol_impaired \\\n",
"0 30 \n",
"1 25 \n",
"2 28 \n",
"3 26 \n",
"4 28 \n",
"\n",
" percentage_drivers_fatal_not_distracted \\\n",
"0 96 \n",
"1 90 \n",
"2 84 \n",
"3 94 \n",
"4 91 \n",
"\n",
" percentage_drivers_fatal_no_previous_accidents car_insurance_premiums \\\n",
"0 80 784.55 \n",
"1 94 1053.48 \n",
"2 96 899.47 \n",
"3 95 827.34 \n",
"4 89 878.41 \n",
"\n",
" region \n",
"0 Southeast \n",
"1 West \n",
"2 Southeast \n",
"3 Southeast \n",
"4 West "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "continental-franklin",
"metadata": {},
"source": [
"**Data Overview**\n",
"\n",
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*\n",
"\n",
"### Where is this dataset from?\n",
"\n",
"I got the dataset from FiveThirtyEight, where it was used for an October 2014 article called\n",
"\"Dear Mona, Which state has the worst drivers?\" The article was written by Mona Chalabi, a data editor at the Guardian US,\n",
"a columnist at New York Magazine, and a lead news writer for FiveThirtyEight.\n",
"\n",
"The data is about fatal collisions in each state. There are 8 columns:\n",
"\n",
"1. State\n",
"2. Number Of Drivers Involved In Fatal Collisions Per Billion Miles\n",
"3. Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding\n",
"4. Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired\n",
"5. Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted\n",
"6. Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents\n",
"7. Car Insurance Premiums ($)\n",
"8. Region\n",
"\n",
"### How was this dataset cleaned?\n",
"\n",
"I did not need to do much cleaning myself, but I added a column called \"region\" that sorts the states into 5 regions: Northwest, Midwest, Southeast, West, and Northeast. I also excluded the column on losses incurred by insurance companies for collisions per insured driver. Insurance companies are well known for finding ways to avoid paying customers for collisions, so those losses are not an accurate representation of fatal car crashes.\n",
"\n",
"## What specific research questions will you investigate?\n",
"\n",
"1. Which region has the highest percentage of alcohol-impaired drivers in fatal collisions?\n",
"\n",
"2. Which region has the highest car insurance premiums?\n",
"\n",
"3. Which region is the most unlucky for fatal collisions, meaning it has the highest percentage of drivers in fatal collisions who were not distracted and had no previous accidents?\n",
"\n",
"4. Is there a connection between the percentage of drivers in fatal collisions who were speeding and more expensive car insurance premiums?\n",
"\n"
]
},
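{
"cell_type": "markdown",
"id": "region-column-sketch",
"metadata": {},
"source": [
"The `region` column was already in the CSV when it was loaded above. If you wanted to add it in pandas instead, the cell below is a minimal sketch of one way to do that. It assumes a hypothetical `state_to_region` dictionary that only covers the five states shown by `df.head()`, so a full version would need an entry for every state. Excluding the insurance-losses column could similarly be done with `DataFrame.drop(columns=...)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "region-column-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical lookup table: only the five states visible in df.head(),\n",
"# using the regions assigned in this project\n",
"state_to_region = {\n",
"    \"Alabama\": \"Southeast\",\n",
"    \"Alaska\": \"West\",\n",
"    \"Arizona\": \"Southeast\",\n",
"    \"Arkansas\": \"Southeast\",\n",
"    \"California\": \"West\",\n",
"}\n",
"\n",
"sample = pd.DataFrame({\"State\": [\"Alabama\", \"Alaska\", \"Arizona\", \"Arkansas\", \"California\"]})\n",
"\n",
"# map() looks up each state's region; states missing from the dictionary become NaN\n",
"sample[\"region\"] = sample[\"State\"].map(state_to_region)\n",
"\n",
"# List any states that still need a region before grouping by region\n",
"missing = sample.loc[sample[\"region\"].isna(), \"State\"].tolist()\n",
"print(sample)\n",
"print(\"States without a region:\", missing)"
]
},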
{
"cell_type": "code",
"execution_count": 5,
"id": "f7bba5f3-5911-4a76-ad43-f6ce78cd4fb3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['State', 'number_drivers_fatal_billion_miles',\n",
" 'percentage_drivers_fatal_speeding',\n",
" 'percentage_drivers_fatal_alcohol_impaired',\n",
" 'percentage_drivers_fatal_not_distracted',\n",
" 'percentage_drivers_fatal_no_previous_accidents',\n",
" 'car_insurance_premiums', 'region'],\n",
" dtype='object')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.columns"
]
},
{
"cell_type": "markdown",
"id": "infinite-instrument",
"metadata": {},
"source": [
"# Methods and Results"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "basic-canadian",
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"\n",
"# Apply seaborn's default theme to all plots that follow (the parentheses are needed to call it)\n",
"sns.set_theme()"
]
},
{
"cell_type": "markdown",
"id": "recognized-positive",
"metadata": {},
"source": [
"## First Research Question: Which region has the highest percentage of alcohol-impaired drivers in fatal collisions?"
]
},
{
"cell_type": "markdown",
"id": "graduate-palmer",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"id": "endless-variation",
"metadata": {},
"source": [
"*Explain how you will approach this research question below. Consider the following:* \n",
" - *Which aspects of the dataset will you use?* \n",
" - *How will you reorganize/store the data?* \n",
" - *What data science tools/functions will you use and why?* \n",
" \n",
"✏️ *Write your answer below:*\n",
"\n",
"To answer this question, I will group the states by region with `groupby` and use `mean` to calculate each region's average percentage of drivers involved in fatal collisions who were alcohol-impaired. Finally, I will make a bar plot with seaborn's `barplot` to compare these regional averages.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "portuguese-japan",
"metadata": {},
"source": [
"### Results "
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "negative-highlight",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"region\n",
"Southeast 29.687500\n",
"West 30.363636\n",
"Northwest 31.000000\n",
"Northeast 31.444444\n",
"Midwest 31.666667\n",
"Name: percentage_drivers_fatal_alcohol_impaired, dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#######################################################################\n",
"### 💻 YOUR WORK GOES HERE TO ANSWER THE FIRST RESEARCH QUESTION 💻 \n",
"### \n",
"### Your data analysis may include a statistic and/or a data visualization\n",
"#######################################################################\n",
"\n",
"region = df.groupby(\"region\").percentage_drivers_fatal_alcohol_impaired.mean().sort_values()\n",
"region\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "victorian-burning",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='region', ylabel='percentage_drivers_fatal_alcohol_impaired'>"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(data=df, x=\"region\", y=\"percentage_drivers_fatal_alcohol_impaired\", errorbar=\"sd\")"
]
},
{
"cell_type": "markdown",
"id": "collectible-puppy",
"metadata": {},
"source": [
"## Second Research Question: Which region has the highest car insurance premiums?\n"
]
},
{
"cell_type": "markdown",
"id": "demographic-future",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"id": "incorporate-roller",
"metadata": {},
"source": [
"*Explain how you will approach this research question below. Consider the following:* \n",
" - *Which aspects of the dataset will you use?* \n",
" - *How will you reorganize/store the data?* \n",
" - *What data science tools/functions will you use and why?* \n",
"\n",
"✏️ *Write your answer below:*\n",
"\n",
"To answer this question, I will group the states by region and use `groupby` and `mean` to calculate the average car insurance premium for each region. Then, I will compare the regional averages in a bar plot to see which region's premiums are the highest.\n"
]
},
{
"cell_type": "markdown",
"id": "juvenile-creation",
"metadata": {},
"source": [
"### Results "
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "pursuant-surrey",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"region\n",
"Midwest 756.630833\n",
"West 855.624545\n",
"Southeast 905.472500\n",
"Northeast 975.038889\n",
"Northwest 1160.163333\n",
"Name: car_insurance_premiums, dtype: float64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#######################################################################\n",
"### 💻 YOUR WORK GOES HERE TO ANSWER THE SECOND RESEARCH QUESTION 💻 \n",
"###\n",
"### Your data analysis may include a statistic and/or a data visualization\n",
"#######################################################################\n",
"\n",
"df.groupby(\"region\").car_insurance_premiums.mean().sort_values()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "located-night",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='region', ylabel='car_insurance_premiums'>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(data=df, x=\"region\", y=\"car_insurance_premiums\", errorbar=\"sd\")"
]
},
{
"cell_type": "markdown",
"id": "8ab785a8-ac72-4fec-8d4b-7ad7f93de32d",
"metadata": {},
"source": [
"## Third Research Question: Which region is the most unlucky for fatal collisions?"
]
},
{
"cell_type": "markdown",
"id": "810dc600-da04-437d-a546-4d6c5bec01c6",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"id": "be64d030-0f40-4c32-ac3e-be494e64b3a7",
"metadata": {},
"source": [
"*Explain how you will approach this research question below. Consider the following:* \n",
" - *Which aspects of the dataset will you use?* \n",
" - *How will you reorganize/store the data?* \n",
" - *What data science tools/functions will you use and why?* \n",
"\n",
"✏️ *Write your answer below:*\n",
"\n",
"To answer this question, I will organize the data for each state by the region it is in. Then, I will compare each region's average percentage of drivers involved in fatal collisions who were not distracted and its average percentage of drivers involved in fatal collisions who had not been involved in any previous accidents. I treat higher percentages on these two measures as a sign that a region's fatal crashes are more a matter of bad luck than of careless driving."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "096fe314-2953-4644-86e0-cd717f77eb8f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>percentage_drivers_fatal_not_distracted</th>\n",
" <th>percentage_drivers_fatal_no_previous_accidents</th>\n",
" </tr>\n",
" <tr>\n",
" <th>region</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Midwest</th>\n",
" <td>88.833333</td>\n",
" <td>86.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Northeast</th>\n",
" <td>87.777778</td>\n",
" <td>85.111111</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Northwest</th>\n",
" <td>91.333333</td>\n",
" <td>93.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Southeast</th>\n",
" <td>83.000000</td>\n",
" <td>89.312500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>West</th>\n",
" <td>84.000000</td>\n",
" <td>91.727273</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" percentage_drivers_fatal_not_distracted \\\n",
"region \n",
"Midwest 88.833333 \n",
"Northeast 87.777778 \n",
"Northwest 91.333333 \n",
"Southeast 83.000000 \n",
"West 84.000000 \n",
"\n",
" percentage_drivers_fatal_no_previous_accidents \n",
"region \n",
"Midwest 86.666667 \n",
"Northeast 85.111111 \n",
"Northwest 93.666667 \n",
"Southeast 89.312500 \n",
"West 91.727273 "
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"region = df.groupby(\"region\")\n",
"region_mean = region[[\"percentage_drivers_fatal_not_distracted\", \"percentage_drivers_fatal_no_previous_accidents\"]].mean().sort_values(\"region\")\n",
"region_mean"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "3777e6d1-08a5-4747-9100-9af86f88e90a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='region', ylabel='percentage_drivers_fatal_not_distracted'>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(data=region_mean, x=\"region\", y=\"percentage_drivers_fatal_not_distracted\")\n"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "21d5122b-9973-43f9-b4f0-7db7a6c936d9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='region', ylabel='percentage_drivers_fatal_no_previous_accidents'>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(data=region_mean, x=\"region\", y=\"percentage_drivers_fatal_no_previous_accidents\")"
]
},
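{
"cell_type": "markdown",
"id": "luck-score-sketch",
"metadata": {},
"source": [
"Comparing the two bar plots above is one way to judge which region is the most unlucky. Another is to combine both measures into a single hypothetical \"luck score\" by averaging them for each region, as sketched in the next cell. This score is not part of the original analysis and assumes both percentages deserve equal weight. The values are copied from the `region_mean` table above so that the cell runs on its own. The Northwest has the highest value on both measures in that table, so it ranks first under this average (or under any weighted average of the two)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "luck-score-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Regional averages copied from the region_mean table above\n",
"luck_means = pd.DataFrame(\n",
"    {\n",
"        \"percentage_drivers_fatal_not_distracted\": [88.833333, 87.777778, 91.333333, 83.000000, 84.000000],\n",
"        \"percentage_drivers_fatal_no_previous_accidents\": [86.666667, 85.111111, 93.666667, 89.312500, 91.727273],\n",
"    },\n",
"    index=pd.Index([\"Midwest\", \"Northeast\", \"Northwest\", \"Southeast\", \"West\"], name=\"region\"),\n",
")\n",
"\n",
"# Hypothetical luck score: the unweighted mean of the two percentages, highest first\n",
"luck_score = luck_means.mean(axis=1).sort_values(ascending=False)\n",
"luck_score"
]
},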
{
"cell_type": "markdown",
"id": "d66967db-fe78-4889-824e-f7bce4e02cc8",
"metadata": {},
"source": [
"## Fourth Research Question: Is there a connection between the average percentage of drivers involved in fatal collisions who were speeding and the region with the most expensive car insurance premiums?"
]
},
{
"cell_type": "markdown",
"id": "9661b6d4-3c4f-42a2-8916-b9df38375760",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"id": "cc44ade7-3ae9-44a0-b821-29ecb1b66385",
"metadata": {},
"source": [
"*Explain how you will approach this research question below. Consider the following:* \n",
" - *Which aspects of the dataset will you use?* \n",
" - *How will you reorganize/store the data?* \n",
" - *What data science tools/functions will you use and why?* \n",
"\n",
"✏️ *Write your answer below:*\n",
"\n",
"To answer this question, I will organize the data for each state by the region it is in. Then, I will make a bar plot of each region's average percentage of drivers involved in fatal collisions who were speeding and a bar plot of each region's average car insurance premium. Comparing the two will show whether the region with the most expensive car insurance also has the highest percentage of speeding drivers."
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "b921b74c-951a-4f30-a42e-292f011fd61a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>percentage_drivers_fatal_speeding</th>\n",
" <th>car_insurance_premiums</th>\n",
" </tr>\n",
" <tr>\n",
" <th>region</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Midwest</th>\n",
" <td>27.166667</td>\n",
" <td>756.630833</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Northeast</th>\n",
" <td>32.444444</td>\n",
" <td>975.038889</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Northwest</th>\n",
" <td>39.333333</td>\n",
" <td>1160.163333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Southeast</th>\n",
" <td>27.687500</td>\n",
" <td>905.472500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>West</th>\n",
" <td>39.909091</td>\n",
" <td>855.624545</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" percentage_drivers_fatal_speeding car_insurance_premiums\n",
"region \n",
"Midwest 27.166667 756.630833\n",
"Northeast 32.444444 975.038889\n",
"Northwest 39.333333 1160.163333\n",
"Southeast 27.687500 905.472500\n",
"West 39.909091 855.624545"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"region = df.groupby(\"region\")\n",
"region_speed_insur = region[[\"percentage_drivers_fatal_speeding\", \"car_insurance_premiums\"]].mean().sort_values(\"region\")\n",
"region_speed_insur"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "c242ede2-9440-4d49-8967-e6dfce650ec1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='region', ylabel='percentage_drivers_fatal_speeding'>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(data=region_speed_insur, y=\"percentage_drivers_fatal_speeding\", x=\"region\")"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "c8079c91-3eb2-4dfd-88c0-53f8a81c9892",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='region', ylabel='car_insurance_premiums'>"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.barplot(data=region_speed_insur, x=\"region\", y=\"car_insurance_premiums\")"
]
},
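{
"cell_type": "markdown",
"id": "speed-premium-corr-sketch",
"metadata": {},
"source": [
"The two bar plots above compare speeding and premiums by eye. A correlation coefficient is one way to put a number on the \"connection\" this question asks about. The next cell is a sketch that computes the Pearson correlation between the five regional averages, copied from the `region_speed_insur` table above so that the cell runs on its own. With only five points this is weak evidence. The West, which has the highest speeding percentage but comparatively low premiums, also works against a positive relationship. Running `df[\"percentage_drivers_fatal_speeding\"].corr(df[\"car_insurance_premiums\"])` would instead use every state in the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "speed-premium-corr-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Regional averages copied from the region_speed_insur table above\n",
"speed_vs_premium = pd.DataFrame(\n",
"    {\n",
"        \"percentage_drivers_fatal_speeding\": [27.166667, 32.444444, 39.333333, 27.687500, 39.909091],\n",
"        \"car_insurance_premiums\": [756.630833, 975.038889, 1160.163333, 905.472500, 855.624545],\n",
"    },\n",
"    index=pd.Index([\"Midwest\", \"Northeast\", \"Northwest\", \"Southeast\", \"West\"], name=\"region\"),\n",
")\n",
"\n",
"# Series.corr uses the Pearson method by default\n",
"r = speed_vs_premium[\"percentage_drivers_fatal_speeding\"].corr(speed_vs_premium[\"car_insurance_premiums\"])\n",
"print(f\"Pearson r across the five regions: {r:.2f}\")"
]
},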
{
"cell_type": "markdown",
"id": "infectious-symbol",
"metadata": {},
"source": [
"# Discussion"
]
},
{
"cell_type": "markdown",
"id": "furnished-camping",
"metadata": {
"code_folding": []
},
"source": [
"## Considerations"
]
},
{
"cell_type": "markdown",
"id": "bearing-stadium",
"metadata": {},
"source": [
"*It's important to recognize the limitations of our research.\n",
"Consider the following:*\n",
"\n",
"- *Do the results give an accurate depiction of your research question? Why or why not?*\n",
"- *What were limitations of your dataset?*\n",
"- *Are there any known biases in the data?*\n",
"\n",
"✏️ *Write your answer below:*"
]
},
{
"cell_type": "markdown",
"id": "beneficial-invasion",
"metadata": {},
"source": [
"## Summary"
]
},
{
"cell_type": "markdown",
"id": "about-raise",
"metadata": {},
"source": [
"*Summarize what you discovered through the research. Consider the following:*\n",
"\n",
"- *What did you learn about your media consumption/digital habits?*\n",
"- *Did the results make sense?*\n",
"- *What was most surprising?*\n",
"- *How will this project impact you going forward?*\n",
"\n",
"✏️ *Write your answer below:*"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_json": true,
"text_representation": {
"extension": ".Rmd",
"format_name": "rmarkdown",
"format_version": "1.2",
"jupytext_version": "1.9.1"
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}