Entered my data on bad drivers

Now I'm trying to figure out how I can separate the states into 5 regions
tsmith37
2025-11-03 17:50:08 -05:00
parent 3d908eeb28
commit 6871602f17
3 changed files with 954 additions and 12 deletions

@@ -0,0 +1,625 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "worldwide-blood",
"metadata": {},
"source": [
"# Introduction"
]
},
{
"cell_type": "markdown",
"id": "understanding-numbers",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences describing your research.*\n",
"\n",
"This project uses state-by-state data on the factors involved in fatal car crashes in the United States. I will use it to determine which region of the country is the deadliest to drive in."
]
},
{
"cell_type": "markdown",
"id": "greater-circular",
"metadata": {},
"source": [
"## Overarching Question: Which region of America is the deadliest to drive in?"
]
},
{
"cell_type": "markdown",
"id": "appreciated-testimony",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences explaining why this question.*\n",
"\n",
"I am interested in this because I live on the Northeast Coast and we have a lot of car \n",
"accidents. People drive very fast here, and the roads are not always properly paved and maintained. I want to know whether it's just bad luck when people get into accidents or whether it's their own fault."
]
},
{
"cell_type": "markdown",
"id": "permanent-pollution",
"metadata": {},
"source": [
"# Data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "technical-evans",
"metadata": {},
"outputs": [],
"source": [
"#Include any import statements you will need\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "overhead-sigma",
"metadata": {},
"outputs": [],
"source": [
"### 💻 FILL IN YOUR DATASET FILE NAME BELOW 💻 ###\n",
"\n",
"file_name = \"bad-drivers.csv\"\n",
"dataset_path = \"data/\" + file_name\n",
"\n",
"df = pd.read_csv(dataset_path)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "heated-blade",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>State</th>\n",
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
" <th>Car Insurance Premiums ($)</th>\n",
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Alabama</td>\n",
" <td>18.8</td>\n",
" <td>39</td>\n",
" <td>30</td>\n",
" <td>96</td>\n",
" <td>80</td>\n",
" <td>784.55</td>\n",
" <td>145.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Alaska</td>\n",
" <td>18.1</td>\n",
" <td>41</td>\n",
" <td>25</td>\n",
" <td>90</td>\n",
" <td>94</td>\n",
" <td>1053.48</td>\n",
" <td>133.93</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Arizona</td>\n",
" <td>18.6</td>\n",
" <td>35</td>\n",
" <td>28</td>\n",
" <td>84</td>\n",
" <td>96</td>\n",
" <td>899.47</td>\n",
" <td>110.35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Arkansas</td>\n",
" <td>22.4</td>\n",
" <td>18</td>\n",
" <td>26</td>\n",
" <td>94</td>\n",
" <td>95</td>\n",
" <td>827.34</td>\n",
" <td>142.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>California</td>\n",
" <td>12.0</td>\n",
" <td>35</td>\n",
" <td>28</td>\n",
" <td>91</td>\n",
" <td>89</td>\n",
" <td>878.41</td>\n",
" <td>165.63</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State \\\n",
"0 Alabama \n",
"1 Alaska \n",
"2 Arizona \n",
"3 Arkansas \n",
"4 California \n",
"\n",
" Number of drivers involved in fatal collisions per billion miles \\\n",
"0 18.8 \n",
"1 18.1 \n",
"2 18.6 \n",
"3 22.4 \n",
"4 12.0 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
"0 39 \n",
"1 41 \n",
"2 35 \n",
"3 18 \n",
"4 35 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
"0 30 \n",
"1 25 \n",
"2 28 \n",
"3 26 \n",
"4 28 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
"0 96 \n",
"1 90 \n",
"2 84 \n",
"3 94 \n",
"4 91 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
"0 80 \n",
"1 94 \n",
"2 96 \n",
"3 95 \n",
"4 89 \n",
"\n",
" Car Insurance Premiums ($) \\\n",
"0 784.55 \n",
"1 1053.48 \n",
"2 899.47 \n",
"3 827.34 \n",
"4 878.41 \n",
"\n",
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
"0 145.08 \n",
"1 133.93 \n",
"2 110.35 \n",
"3 142.39 \n",
"4 165.63 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "continental-franklin",
"metadata": {},
"source": [
"**Data Overview**\n",
"\n",
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*\n",
"\n",
"I got the dataset from FiveThirtyEight. It was used for an article called\n",
"\"Dear Mona, Which state has the worst drivers?\" in October 2014. The article was written by Mona Chalabi, a data editor at the Guardian US, \n",
"a columnist at New York Magazine, and a lead news writer for FiveThirtyEight.\n",
"\n",
"The data is about fatal collisions in each state. There are 51 rows (the 50 states plus the District of Columbia) and 8 columns: the state name and 7 measures,\n",
"such as \"Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired\" and \"Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted\".\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "6ba44c9c-60d1-46a4-8257-b4e8eeea348d",
"metadata": {},
"source": [
"I will recategorise the data so that each state's data is grouped into one of the five regions of the United States."
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "f7bba5f3-5911-4a76-ad43-f6ce78cd4fb3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>State</th>\n",
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
" <th>Car Insurance Premiums ($)</th>\n",
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>New York</td>\n",
" <td>12.3</td>\n",
" <td>32</td>\n",
" <td>29</td>\n",
" <td>88</td>\n",
" <td>80</td>\n",
" <td>1234.31</td>\n",
" <td>150.01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State \\\n",
"32 New York \n",
"\n",
" Number of drivers involved in fatal collisions per billion miles \\\n",
"32 12.3 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
"32 32 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
"32 29 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
"32 88 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
"32 80 \n",
"\n",
" Car Insurance Premiums ($) \\\n",
"32 1234.31 \n",
"\n",
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
"32 150.01 "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"Northeast = df[df.State == \"New York\"]\n",
"Northeast"
]
},
{
"cell_type": "markdown",
"id": "infinite-instrument",
"metadata": {},
"source": [
"# Methods and Results"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "basic-canadian",
"metadata": {},
"outputs": [],
"source": [
"#Import any helper files you need here"
]
},
{
"cell_type": "markdown",
"id": "recognized-positive",
"metadata": {},
"source": [
"## First Research Question: Is drinking and driving the biggest cause of fatal collisions?"
]
},
{
"cell_type": "markdown",
"id": "graduate-palmer",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"id": "endless-variation",
"metadata": {},
"source": [
"*Explain how you will approach this research question below. Consider the following:* \n",
" - *Which aspects of the dataset will you use?* \n",
" - *How will you reorganize/store the data?* \n",
" - *What data science tools/functions will you use and why?* \n",
" \n",
"✏️ *Write your answer below:*\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "portuguese-japan",
"metadata": {},
"source": [
"### Results "
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "negative-highlight",
"metadata": {},
"outputs": [],
"source": [
"#######################################################################\n",
"### 💻 YOUR WORK GOES HERE TO ANSWER THE FIRST RESEARCH QUESTION 💻 \n",
"### \n",
"### Your data analysis may include a statistic and/or a data visualization\n",
"#######################################################################"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "victorian-burning",
"metadata": {},
"outputs": [],
"source": [
"# 💻 YOU CAN ADD NEW CELLS WITH THE \"+\" BUTTON "
]
},
{
"cell_type": "markdown",
"id": "collectible-puppy",
"metadata": {},
"source": [
"## Second Research Question: Which state is the unluckiest for fatal collisions?\n"
]
},
{
"cell_type": "markdown",
"id": "demographic-future",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"id": "incorporate-roller",
"metadata": {},
"source": [
"*Explain how you will approach this research question below. Consider the following:* \n",
" - *Which aspects of the dataset will you use?* \n",
" - *How will you reorganize/store the data?* \n",
" - *What data science tools/functions will you use and why?* \n",
"\n",
"✏️ *Write your answer below:*\n"
]
},
{
"cell_type": "markdown",
"id": "juvenile-creation",
"metadata": {},
"source": [
"### Results "
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "pursuant-surrey",
"metadata": {},
"outputs": [],
"source": [
"#######################################################################\n",
"### 💻 YOUR WORK GOES HERE TO ANSWER THE SECOND RESEARCH QUESTION 💻 \n",
"###\n",
"### Your data analysis may include a statistic and/or a data visualization\n",
"#######################################################################"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "located-night",
"metadata": {},
"outputs": [],
"source": [
"# 💻 YOU CAN ADD NEW CELLS WITH THE \"+\" BUTTON "
]
},
{
"cell_type": "markdown",
"id": "infectious-symbol",
"metadata": {},
"source": [
"# Discussion"
]
},
{
"cell_type": "markdown",
"id": "furnished-camping",
"metadata": {
"code_folding": []
},
"source": [
"## Considerations"
]
},
{
"cell_type": "markdown",
"id": "bearing-stadium",
"metadata": {},
"source": [
"*It's important to recognize the limitations of our research.\n",
"Consider the following:*\n",
"\n",
"- *Do the results give an accurate depiction of your research question? Why or why not?*\n",
"- *What were the limitations of your dataset?*\n",
"- *Are there any known biases in the data?*\n",
"\n",
"✏️ *Write your answer below:*"
]
},
{
"cell_type": "markdown",
"id": "beneficial-invasion",
"metadata": {},
"source": [
"## Summary"
]
},
{
"cell_type": "markdown",
"id": "about-raise",
"metadata": {},
"source": [
"*Summarize what you discovered through the research. Consider the following:*\n",
"\n",
"- *What did you learn about your media consumption/digital habits?*\n",
"- *Did the results make sense?*\n",
"- *What was most surprising?*\n",
"- *How will this project impact you going forward?*\n",
"\n",
"✏️ *Write your answer below:*"
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_json": true,
"text_representation": {
"extension": ".Rmd",
"format_name": "rmarkdown",
"format_version": "1.2",
"jupytext_version": "1.9.1"
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.7"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}
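
The notebook's stated next step is to separate the states into five regions, and the `Northeast` cell currently selects only New York. A minimal sketch of one way to build that grouping follows. The region names and the state assignments in `REGIONS` are assumptions: they follow one common five-region split, and sources differ on where border states such as Delaware, Maryland and the District of Columbia belong. `REGIONS`, `STATE_TO_REGION` and `states_in` are hypothetical names that do not appear in the notebook.

```python
# Assumed five-region split; change the lists to match the definition your class uses.
# State names are spelled exactly as they appear in the "State" column of bad-drivers.csv.
REGIONS = {
    "Northeast": [
        "Connecticut", "Maine", "Massachusetts", "New Hampshire", "New Jersey",
        "New York", "Pennsylvania", "Rhode Island", "Vermont",
    ],
    "Southeast": [
        "Alabama", "Arkansas", "Delaware", "District of Columbia", "Florida",
        "Georgia", "Kentucky", "Louisiana", "Maryland", "Mississippi",
        "North Carolina", "South Carolina", "Tennessee", "Virginia", "West Virginia",
    ],
    "Midwest": [
        "Illinois", "Indiana", "Iowa", "Kansas", "Michigan", "Minnesota",
        "Missouri", "Nebraska", "North Dakota", "Ohio", "South Dakota", "Wisconsin",
    ],
    "Southwest": ["Arizona", "New Mexico", "Oklahoma", "Texas"],
    "West": [
        "Alaska", "California", "Colorado", "Hawaii", "Idaho", "Montana",
        "Nevada", "Oregon", "Utah", "Washington", "Wyoming",
    ],
}

# Invert the table so each state name looks up its region directly.
STATE_TO_REGION = {
    state: region for region, states in REGIONS.items() for state in states
}


def states_in(region):
    """Return the states assigned to a region, sorted alphabetically."""
    return sorted(REGIONS[region])


print(STATE_TO_REGION["New York"])
print(states_in("Southwest"))
```

In the notebook, the lookup could then be attached to the existing DataFrame with `df["Region"] = df["State"].map(STATE_TO_REGION)`, after which `df[df["Region"] == "Northeast"]` would return every Northeast state rather than New York alone.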

@@ -13,7 +13,9 @@
"id": "understanding-numbers",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences describing your research.*"
"*✏️ Write 2-3 sentences describing your research.*\n",
"\n",
"This project uses state-by-state data on the factors involved in fatal car crashes in the United States. I will use it to determine which region of the country is the deadliest to drive in."
]
},
{
@@ -21,7 +23,7 @@
"id": "greater-circular",
"metadata": {},
"source": [
"## Overarching Question: [✏️ PUT YOUR QUESTION HERE ✏️]"
"## Overarching Question: Which region of America is the deadliest to drive in?"
]
},
{
@@ -29,7 +31,10 @@
"id": "appreciated-testimony",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences explaining why this question.*"
"*✏️ Write 2-3 sentences explaining why this question.*\n",
"\n",
"I am interested in this because I live on the Northeast Coast and we have a lot of car \n",
"accidents. People drive very fast here, and the roads are not always properly paved and maintained. I want to know whether it's just bad luck when people get into accidents or whether it's their own fault."
]
},
{
@@ -42,7 +47,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "technical-evans",
"metadata": {},
"outputs": [],
@@ -54,14 +59,14 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "overhead-sigma",
"metadata": {},
"outputs": [],
"source": [
"### 💻 FILL IN YOUR DATASET FILE NAME BELOW 💻 ###\n",
"\n",
"file_name = \"YOUR_DATASET_FILE_NAME.csv\"\n",
"file_name = \"bad-drivers.csv\"\n",
"dataset_path = \"data/\" + file_name\n",
"\n",
"df = pd.read_csv(dataset_path)"
@@ -69,10 +74,164 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"id": "heated-blade",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>State</th>\n",
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
" <th>Car Insurance Premiums ($)</th>\n",
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Alabama</td>\n",
" <td>18.8</td>\n",
" <td>39</td>\n",
" <td>30</td>\n",
" <td>96</td>\n",
" <td>80</td>\n",
" <td>784.55</td>\n",
" <td>145.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Alaska</td>\n",
" <td>18.1</td>\n",
" <td>41</td>\n",
" <td>25</td>\n",
" <td>90</td>\n",
" <td>94</td>\n",
" <td>1053.48</td>\n",
" <td>133.93</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Arizona</td>\n",
" <td>18.6</td>\n",
" <td>35</td>\n",
" <td>28</td>\n",
" <td>84</td>\n",
" <td>96</td>\n",
" <td>899.47</td>\n",
" <td>110.35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Arkansas</td>\n",
" <td>22.4</td>\n",
" <td>18</td>\n",
" <td>26</td>\n",
" <td>94</td>\n",
" <td>95</td>\n",
" <td>827.34</td>\n",
" <td>142.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>California</td>\n",
" <td>12.0</td>\n",
" <td>35</td>\n",
" <td>28</td>\n",
" <td>91</td>\n",
" <td>89</td>\n",
" <td>878.41</td>\n",
" <td>165.63</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State \\\n",
"0 Alabama \n",
"1 Alaska \n",
"2 Arizona \n",
"3 Arkansas \n",
"4 California \n",
"\n",
" Number of drivers involved in fatal collisions per billion miles \\\n",
"0 18.8 \n",
"1 18.1 \n",
"2 18.6 \n",
"3 22.4 \n",
"4 12.0 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
"0 39 \n",
"1 41 \n",
"2 35 \n",
"3 18 \n",
"4 35 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
"0 30 \n",
"1 25 \n",
"2 28 \n",
"3 26 \n",
"4 28 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
"0 96 \n",
"1 90 \n",
"2 84 \n",
"3 94 \n",
"4 91 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
"0 80 \n",
"1 94 \n",
"2 96 \n",
"3 95 \n",
"4 89 \n",
"\n",
" Car Insurance Premiums ($) \\\n",
"0 784.55 \n",
"1 1053.48 \n",
"2 899.47 \n",
"3 827.34 \n",
"4 878.41 \n",
"\n",
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
"0 145.08 \n",
"1 133.93 \n",
"2 110.35 \n",
"3 142.39 \n",
"4 165.63 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
@@ -84,7 +243,113 @@
"source": [
"**Data Overview**\n",
"\n",
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*"
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*\n",
"\n",
"I got the dataset from FiveThirtyEight. It was used for an article called\n",
"\"Dear Mona, Which state has the worst drivers?\" in October 2014. The article was written by Mona Chalabi, a data editor at the Guardian US, \n",
"a columnist at New York Magazine, and a lead news writer for FiveThirtyEight.\n",
"\n",
"The data is about fatal collisions in each state. There are 51 rows (the 50 states plus the District of Columbia) and 8 columns: the state name and 7 measures,\n",
"such as \"Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired\" and \"Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted\".\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "6ba44c9c-60d1-46a4-8257-b4e8eeea348d",
"metadata": {},
"source": [
"I will recategorise the data so that each state's data is grouped into one of the five regions of the United States."
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "f7bba5f3-5911-4a76-ad43-f6ce78cd4fb3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>State</th>\n",
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
" <th>Car Insurance Premiums ($)</th>\n",
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>New York</td>\n",
" <td>12.3</td>\n",
" <td>32</td>\n",
" <td>29</td>\n",
" <td>88</td>\n",
" <td>80</td>\n",
" <td>1234.31</td>\n",
" <td>150.01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State \\\n",
"32 New York \n",
"\n",
" Number of drivers involved in fatal collisions per billion miles \\\n",
"32 12.3 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
"32 32 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
"32 29 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
"32 88 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
"32 80 \n",
"\n",
" Car Insurance Premiums ($) \\\n",
"32 1234.31 \n",
"\n",
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
"32 150.01 "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"Northeast = df[df.State == \"New York\"]\n",
"Northeast"
]
},
{
@@ -110,7 +375,7 @@
"id": "recognized-positive",
"metadata": {},
"source": [
"## First Research Question: [✏️ PUT YOUR QUESTION HERE ✏️]\n"
"## First Research Question: Is drinking and driving the biggest cause of fatal collisions?"
]
},
{
@@ -172,7 +437,7 @@
"id": "collectible-puppy",
"metadata": {},
"source": [
"## Second Research Question: [✏️ PUT YOUR QUESTION HERE ✏️]\n"
"## Second Research Question: Which state is the unluckiest for fatal collisions?\n"
]
},
{
@@ -310,7 +575,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.13.7"
},
"toc": {
"base_numbering": 1,

data/bad-drivers.csv Normal file

@@ -0,0 +1,52 @@
State,Number of drivers involved in fatal collisions per billion miles,Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding,Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired,Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted,Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents,Car Insurance Premiums ($),Losses incurred by insurance companies for collisions per insured driver ($)
Alabama,18.8,39,30,96,80,784.55,145.08
Alaska,18.1,41,25,90,94,1053.48,133.93
Arizona,18.6,35,28,84,96,899.47,110.35
Arkansas,22.4,18,26,94,95,827.34,142.39
California,12,35,28,91,89,878.41,165.63
Colorado,13.6,37,28,79,95,835.5,139.91
Connecticut,10.8,46,36,87,82,1068.73,167.02
Delaware,16.2,38,30,87,99,1137.87,151.48
District of Columbia,5.9,34,27,100,100,1273.89,136.05
Florida,17.9,21,29,92,94,1160.13,144.18
Georgia,15.6,19,25,95,93,913.15,142.8
Hawaii,17.5,54,41,82,87,861.18,120.92
Idaho,15.3,36,29,85,98,641.96,82.75
Illinois,12.8,36,34,94,96,803.11,139.15
Indiana,14.5,25,29,95,95,710.46,108.92
Iowa,15.7,17,25,97,87,649.06,114.47
Kansas,17.8,27,24,77,85,780.45,133.8
Kentucky,21.4,19,23,78,76,872.51,137.13
Louisiana,20.5,35,33,73,98,1281.55,194.78
Maine,15.1,38,30,87,84,661.88,96.57
Maryland,12.5,34,32,71,99,1048.78,192.7
Massachusetts,8.2,23,35,87,80,1011.14,135.63
Michigan,14.1,24,28,95,77,1110.61,152.26
Minnesota,9.6,23,29,88,88,777.18,133.35
Mississippi,17.6,15,31,10,100,896.07,155.77
Missouri,16.1,43,34,92,84,790.32,144.45
Montana,21.4,39,44,84,85,816.21,85.15
Nebraska,14.9,13,35,93,90,732.28,114.82
Nevada,14.7,37,32,95,99,1029.87,138.71
New Hampshire,11.6,35,30,87,83,746.54,120.21
New Jersey,11.2,16,28,86,78,1301.52,159.85
New Mexico,18.4,19,27,67,98,869.85,120.75
New York,12.3,32,29,88,80,1234.31,150.01
North Carolina,16.8,39,31,94,81,708.24,127.82
North Dakota,23.9,23,42,99,86,688.75,109.72
Ohio,14.1,28,34,99,82,697.73,133.52
Oklahoma,19.9,32,29,92,94,881.51,178.86
Oregon,12.8,33,26,67,90,804.71,104.61
Pennsylvania,18.2,50,31,96,88,905.99,153.86
Rhode Island,11.1,34,38,92,79,1148.99,148.58
South Carolina,23.9,38,41,96,81,858.97,116.29
South Dakota,19.4,31,33,98,86,669.31,96.87
Tennessee,19.5,21,29,82,81,767.91,155.57
Texas,19.4,40,38,91,87,1004.75,156.83
Utah,11.3,43,16,88,96,809.38,109.48
Vermont,13.6,30,30,96,95,716.2,109.61
Virginia,12.7,19,27,87,88,768.95,153.72
Washington,10.6,42,33,82,86,890.03,111.62
West Virginia,23.8,34,28,97,87,992.61,152.56
Wisconsin,13.8,36,33,39,84,670.31,106.62
Wyoming,17.4,42,32,81,90,791.14,122.04
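
The notebook's first research question asks whether drinking and driving is the biggest cause of fatal collisions, and two columns of this file speak to it directly. A rough sketch of a first comparison follows, averaging the speeding and alcohol-impaired percentages. It uses only the standard library and five rows copied from the file above so that it runs without the data file; `SAMPLE_CSV` and `column_means` are hypothetical names, and averaging state percentages is an assumed starting point, not the notebook's chosen method.

```python
import csv
import io
from statistics import mean

# Column names exactly as they appear in the header of data/bad-drivers.csv.
SPEEDING = "Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding"
ALCOHOL = "Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired"

# First five states from the file, reduced to the two columns being compared.
SAMPLE_CSV = f"""State,{SPEEDING},{ALCOHOL}
Alabama,39,30
Alaska,41,25
Arizona,35,28
Arkansas,18,26
California,35,28
"""


def column_means(csv_text, columns):
    """Return the mean of each named column across all rows of the CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: mean(float(row[col]) for row in rows) for col in columns}


means = column_means(SAMPLE_CSV, [SPEEDING, ALCOHOL])
for col, value in means.items():
    print(f"{value:5.1f}  {col}")
```

With the full file loaded in the notebook, `df[[SPEEDING, ALCOHOL]].mean()` would give the same kind of comparison across all 51 rows. A higher average share does not by itself show that one factor is the biggest cause, since a single crash can involve both speeding and alcohol.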