Entered my data on bad drivers

Now I'm trying to figure out how I can separate the states into 5 regions.
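
One possible way to do the region split is to map each state name to a region label and then group by that label. The sketch below is only an illustration: the STATE_TO_REGION lookup is hypothetical and partial, the five-way split (Northeast, Southeast, Midwest, Southwest, West) is an assumed convention rather than anything defined by the dataset, and the small DataFrame stands in for bad-drivers.csv using values shown in the notebook output.

```python
import pandas as pd

# Hypothetical, partial state-to-region lookup (an assumption, not part of the
# dataset). Extend it to all states, including the Midwest, before real use.
STATE_TO_REGION = {
    "Alabama": "Southeast",
    "Arkansas": "Southeast",
    "Alaska": "West",
    "California": "West",
    "Arizona": "Southwest",
    "New York": "Northeast",
}

RATE = "Number of drivers involved in fatal collisions per billion miles"

# Small stand-in for the full dataset (values copied from the notebook output).
df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas", "California", "New York"],
    RATE: [18.8, 18.1, 18.6, 22.4, 12.0, 12.3],
})

# Add a Region column; any state missing from the lookup becomes NaN.
df["Region"] = df["State"].map(STATE_TO_REGION)

# Average fatal-collision rate per region.
region_means = df.groupby("Region")[RATE].mean()
print(region_means)
```

With the full lookup filled in, a single region can still be pulled out on its own with df[df["Region"] == "Northeast"], and checking df["Region"].isna() shows any states that were left out of the lookup.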
tsmith37
2025-11-03 17:50:08 -05:00
parent 3d908eeb28
commit 6871602f17
3 changed files with 954 additions and 12 deletions


@@ -13,7 +13,9 @@
"id": "understanding-numbers",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences describing your research.*"
"*✏️ Write 2-3 sentences describing your research.*\n",
"\n",
"It's a collection of data on the reasons fatal car crashes occur in every state of America, and it will be used to determine which region of America is the deadliest. "
]
},
{
@@ -21,7 +23,7 @@
"id": "greater-circular",
"metadata": {},
"source": [
"## Overarching Question: [✏️ PUT YOUR QUESTION HERE ✏️]"
"## Overarching Question: What is the deadliest region in America to drive on?"
]
},
{
@@ -29,7 +31,10 @@
"id": "appreciated-testimony",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences explaining why this question.*"
"*✏️ Write 2-3 sentences explaining why this question.*\n",
"\n",
"I am interested in this because I live on the Northeast Coast and we have a lot of car \n",
"accidents. People drive very fast here. The roads are not always paved properly and maintained. I want to know if it's just bad luck when people get into accidents or if it's their own fault. "
]
},
{
@@ -42,7 +47,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 3,
"id": "technical-evans",
"metadata": {},
"outputs": [],
@@ -54,14 +59,14 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 5,
"id": "overhead-sigma",
"metadata": {},
"outputs": [],
"source": [
"### 💻 FILL IN YOUR DATASET FILE NAME BELOW 💻 ###\n",
"\n",
"file_name = \"YOUR_DATASET_FILE_NAME.csv\"\n",
"file_name = \"bad-drivers.csv\"\n",
"dataset_path = \"data/\" + file_name\n",
"\n",
"df = pd.read_csv(dataset_path)"
@@ -69,10 +74,164 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 8,
"id": "heated-blade",
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>State</th>\n",
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
" <th>Car Insurance Premiums ($)</th>\n",
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Alabama</td>\n",
" <td>18.8</td>\n",
" <td>39</td>\n",
" <td>30</td>\n",
" <td>96</td>\n",
" <td>80</td>\n",
" <td>784.55</td>\n",
" <td>145.08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Alaska</td>\n",
" <td>18.1</td>\n",
" <td>41</td>\n",
" <td>25</td>\n",
" <td>90</td>\n",
" <td>94</td>\n",
" <td>1053.48</td>\n",
" <td>133.93</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Arizona</td>\n",
" <td>18.6</td>\n",
" <td>35</td>\n",
" <td>28</td>\n",
" <td>84</td>\n",
" <td>96</td>\n",
" <td>899.47</td>\n",
" <td>110.35</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Arkansas</td>\n",
" <td>22.4</td>\n",
" <td>18</td>\n",
" <td>26</td>\n",
" <td>94</td>\n",
" <td>95</td>\n",
" <td>827.34</td>\n",
" <td>142.39</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>California</td>\n",
" <td>12.0</td>\n",
" <td>35</td>\n",
" <td>28</td>\n",
" <td>91</td>\n",
" <td>89</td>\n",
" <td>878.41</td>\n",
" <td>165.63</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State \\\n",
"0 Alabama \n",
"1 Alaska \n",
"2 Arizona \n",
"3 Arkansas \n",
"4 California \n",
"\n",
" Number of drivers involved in fatal collisions per billion miles \\\n",
"0 18.8 \n",
"1 18.1 \n",
"2 18.6 \n",
"3 22.4 \n",
"4 12.0 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
"0 39 \n",
"1 41 \n",
"2 35 \n",
"3 18 \n",
"4 35 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
"0 30 \n",
"1 25 \n",
"2 28 \n",
"3 26 \n",
"4 28 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
"0 96 \n",
"1 90 \n",
"2 84 \n",
"3 94 \n",
"4 91 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
"0 80 \n",
"1 94 \n",
"2 96 \n",
"3 95 \n",
"4 89 \n",
"\n",
" Car Insurance Premiums ($) \\\n",
"0 784.55 \n",
"1 1053.48 \n",
"2 899.47 \n",
"3 827.34 \n",
"4 878.41 \n",
"\n",
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
"0 145.08 \n",
"1 133.93 \n",
"2 110.35 \n",
"3 142.39 \n",
"4 165.63 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
@@ -84,7 +243,113 @@
"source": [
"**Data Overview**\n",
"\n",
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*"
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*\n",
"\n",
"I got the data set from FiveThirtyEight. It was used for an article called\n",
"\"Dear Mona, Which state has the worst drivers?\" in October 2014. The person who wrote the article is Mona Chalabi, they are a data editor at the Guardian US, \n",
"a columnist at New York Margazine, and a lead news writer for FiveThirtyEight.\n",
"\n",
"The date is about fatal collisions in each state. There are 7 rows, some of the rows\n",
"are about \"Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired\" and \"Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted\"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "6ba44c9c-60d1-46a4-8257-b4e8eeea348d",
"metadata": {},
"source": [
"I will recategorise the data so that all of the states data will be separated into the five regions of the United States"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "f7bba5f3-5911-4a76-ad43-f6ce78cd4fb3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>State</th>\n",
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
" <th>Car Insurance Premiums ($)</th>\n",
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>New York</td>\n",
" <td>12.3</td>\n",
" <td>32</td>\n",
" <td>29</td>\n",
" <td>88</td>\n",
" <td>80</td>\n",
" <td>1234.31</td>\n",
" <td>150.01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" State \\\n",
"32 New York \n",
"\n",
" Number of drivers involved in fatal collisions per billion miles \\\n",
"32 12.3 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
"32 32 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
"32 29 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
"32 88 \n",
"\n",
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
"32 80 \n",
"\n",
" Car Insurance Premiums ($) \\\n",
"32 1234.31 \n",
"\n",
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
"32 150.01 "
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"Northeast = df[df.State == \"New York\"]\n",
"Northeast"
]
},
{
@@ -110,7 +375,7 @@
"id": "recognized-positive",
"metadata": {},
"source": [
"## First Research Question: [✏️ PUT YOUR QUESTION HERE ✏️]\n"
"## First Research Question: Is drinking and driving the biggest cause of fatal collisions?\\"
]
},
{
@@ -172,7 +437,7 @@
"id": "collectible-puppy",
"metadata": {},
"source": [
"## Second Research Question: [✏️ PUT YOUR QUESTION HERE ✏️]\n"
"## Second Research Question: What state is the most unluckiest state for fatel collisions?\n"
]
},
{
@@ -310,7 +575,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.13.7"
},
"toc": {
"base_numbering": 1,