generated from mwc/project_argument
Entered my data on bad drivers.
Now I'm trying to figure out how to separate the states into 5 regions.
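One possible way to separate the states into regions is sketched below: build a state-to-region lookup and attach it to the DataFrame with `Series.map`. The five-region grouping used here (Northeast, Southeast, Midwest, Southwest, West) is an assumption, since sources disagree on border states such as Delaware, Maryland, and the District of Columbia. The names `REGIONS`, `STATE_TO_REGION`, `add_region`, and `RATE` are hypothetical and do not come from the notebook. The sample rows are copied from `data/bad-drivers.csv` so the sketch runs on its own.

```python
import pandas as pd

# ASSUMPTION: one common five-region grouping; adjust border states as needed.
REGIONS = {
    "Northeast": ["Connecticut", "Maine", "Massachusetts", "New Hampshire",
                  "New Jersey", "New York", "Pennsylvania", "Rhode Island",
                  "Vermont"],
    "Southeast": ["Alabama", "Arkansas", "Delaware", "District of Columbia",
                  "Florida", "Georgia", "Kentucky", "Louisiana", "Maryland",
                  "Mississippi", "North Carolina", "South Carolina",
                  "Tennessee", "Virginia", "West Virginia"],
    "Midwest": ["Illinois", "Indiana", "Iowa", "Kansas", "Michigan",
                "Minnesota", "Missouri", "Nebraska", "North Dakota", "Ohio",
                "South Dakota", "Wisconsin"],
    "Southwest": ["Arizona", "New Mexico", "Oklahoma", "Texas"],
    "West": ["Alaska", "California", "Colorado", "Hawaii", "Idaho", "Montana",
             "Nevada", "Oregon", "Utah", "Washington", "Wyoming"],
}

# Invert the grouping into a flat {state: region} lookup.
STATE_TO_REGION = {
    state: region for region, states in REGIONS.items() for state in states
}


def add_region(df):
    """Return a copy of df with a 'Region' column looked up from 'State'."""
    out = df.copy()
    out["Region"] = out["State"].map(STATE_TO_REGION)
    return out


# Column name exactly as it appears in the CSV header.
RATE = "Number of drivers involved in fatal collisions per billion miles"

# A few rows copied from bad-drivers.csv, standing in for the full file.
sample = pd.DataFrame({
    "State": ["New York", "New Jersey", "Texas", "Arizona", "Ohio"],
    RATE: [12.3, 11.2, 19.4, 18.6, 14.1],
})

# Average the per-state rate within each region.
by_region = add_region(sample).groupby("Region")[RATE].mean()
print(by_region)
```

In the notebook, `add_region(df)` could be called on the DataFrame loaded from `data/bad-drivers.csv` in place of `sample`. Note that a plain mean of state rates treats every state equally regardless of how many miles are driven there, so it is only a rough regional comparison.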
625
.ipynb_checkpoints/argument-checkpoint.ipynb
Normal file
@@ -0,0 +1,625 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "worldwide-blood",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Introduction"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "understanding-numbers",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*✏️ Write 2-3 sentences describing your research.*\n",
|
||||
"\n",
|
||||
    "My research uses a collection of data on the reasons fatal car crashes occur in every state of America to determine which region of America is the deadliest for drivers. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "greater-circular",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "## Overarching Question: What is the deadliest region in America to drive in?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "appreciated-testimony",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*✏️ Write 2-3 sentences explaining why this question.*\n",
|
||||
"\n",
|
||||
"I am interested in this because I live on the Northeast Coast and we have a lot of car \n",
|
||||
    "accidents. People drive very fast here. The roads are not always properly paved or maintained. I want to know if it's just bad luck when people get into accidents or if it's their own fault. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "permanent-pollution",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "technical-evans",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Include any import statements you will need\n",
|
||||
"import pandas as pd\n",
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "overhead-sigma",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"### 💻 FILL IN YOUR DATASET FILE NAME BELOW 💻 ###\n",
|
||||
"\n",
|
||||
"file_name = \"bad-drivers.csv\"\n",
|
||||
"dataset_path = \"data/\" + file_name\n",
|
||||
"\n",
|
||||
"df = pd.read_csv(dataset_path)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "heated-blade",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>State</th>\n",
|
||||
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
|
||||
" <th>Car Insurance Premiums ($)</th>\n",
|
||||
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Alabama</td>\n",
|
||||
" <td>18.8</td>\n",
|
||||
" <td>39</td>\n",
|
||||
" <td>30</td>\n",
|
||||
" <td>96</td>\n",
|
||||
" <td>80</td>\n",
|
||||
" <td>784.55</td>\n",
|
||||
" <td>145.08</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>Alaska</td>\n",
|
||||
" <td>18.1</td>\n",
|
||||
" <td>41</td>\n",
|
||||
" <td>25</td>\n",
|
||||
" <td>90</td>\n",
|
||||
" <td>94</td>\n",
|
||||
" <td>1053.48</td>\n",
|
||||
" <td>133.93</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>Arizona</td>\n",
|
||||
" <td>18.6</td>\n",
|
||||
" <td>35</td>\n",
|
||||
" <td>28</td>\n",
|
||||
" <td>84</td>\n",
|
||||
" <td>96</td>\n",
|
||||
" <td>899.47</td>\n",
|
||||
" <td>110.35</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>Arkansas</td>\n",
|
||||
" <td>22.4</td>\n",
|
||||
" <td>18</td>\n",
|
||||
" <td>26</td>\n",
|
||||
" <td>94</td>\n",
|
||||
" <td>95</td>\n",
|
||||
" <td>827.34</td>\n",
|
||||
" <td>142.39</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>California</td>\n",
|
||||
" <td>12.0</td>\n",
|
||||
" <td>35</td>\n",
|
||||
" <td>28</td>\n",
|
||||
" <td>91</td>\n",
|
||||
" <td>89</td>\n",
|
||||
" <td>878.41</td>\n",
|
||||
" <td>165.63</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" State \\\n",
|
||||
"0 Alabama \n",
|
||||
"1 Alaska \n",
|
||||
"2 Arizona \n",
|
||||
"3 Arkansas \n",
|
||||
"4 California \n",
|
||||
"\n",
|
||||
" Number of drivers involved in fatal collisions per billion miles \\\n",
|
||||
"0 18.8 \n",
|
||||
"1 18.1 \n",
|
||||
"2 18.6 \n",
|
||||
"3 22.4 \n",
|
||||
"4 12.0 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
|
||||
"0 39 \n",
|
||||
"1 41 \n",
|
||||
"2 35 \n",
|
||||
"3 18 \n",
|
||||
"4 35 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
|
||||
"0 30 \n",
|
||||
"1 25 \n",
|
||||
"2 28 \n",
|
||||
"3 26 \n",
|
||||
"4 28 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
|
||||
"0 96 \n",
|
||||
"1 90 \n",
|
||||
"2 84 \n",
|
||||
"3 94 \n",
|
||||
"4 91 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
|
||||
"0 80 \n",
|
||||
"1 94 \n",
|
||||
"2 96 \n",
|
||||
"3 95 \n",
|
||||
"4 89 \n",
|
||||
"\n",
|
||||
" Car Insurance Premiums ($) \\\n",
|
||||
"0 784.55 \n",
|
||||
"1 1053.48 \n",
|
||||
"2 899.47 \n",
|
||||
"3 827.34 \n",
|
||||
"4 878.41 \n",
|
||||
"\n",
|
||||
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
|
||||
"0 145.08 \n",
|
||||
"1 133.93 \n",
|
||||
"2 110.35 \n",
|
||||
"3 142.39 \n",
|
||||
"4 165.63 "
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "continental-franklin",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Data Overview**\n",
|
||||
"\n",
|
||||
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*\n",
|
||||
"\n",
|
||||
"I got the data set from FiveThirtyEight. It was used for an article called\n",
|
||||
    "\"Dear Mona, Which State Has The Worst Drivers?\" in October 2014. The article was written by Mona Chalabi, who is a data editor at the Guardian US, \n",
|
||||
    "a columnist at New York Magazine, and a lead news writer for FiveThirtyEight.\n",
|
||||
"\n",
|
||||
    "The data is about fatal collisions in each state. There are 7 columns of measurements; some of the columns\n",
|
||||
    "are about \"Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired\" and \"Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted\".\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6ba44c9c-60d1-46a4-8257-b4e8eeea348d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "I will recategorise the data so that each state's data is grouped into one of the five regions of the United States."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"id": "f7bba5f3-5911-4a76-ad43-f6ce78cd4fb3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>State</th>\n",
|
||||
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
|
||||
" <th>Car Insurance Premiums ($)</th>\n",
|
||||
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>32</th>\n",
|
||||
" <td>New York</td>\n",
|
||||
" <td>12.3</td>\n",
|
||||
" <td>32</td>\n",
|
||||
" <td>29</td>\n",
|
||||
" <td>88</td>\n",
|
||||
" <td>80</td>\n",
|
||||
" <td>1234.31</td>\n",
|
||||
" <td>150.01</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" State \\\n",
|
||||
"32 New York \n",
|
||||
"\n",
|
||||
" Number of drivers involved in fatal collisions per billion miles \\\n",
|
||||
"32 12.3 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
|
||||
"32 32 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
|
||||
"32 29 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
|
||||
"32 88 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
|
||||
"32 80 \n",
|
||||
"\n",
|
||||
" Car Insurance Premiums ($) \\\n",
|
||||
"32 1234.31 \n",
|
||||
"\n",
|
||||
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
|
||||
"32 150.01 "
|
||||
]
|
||||
},
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"\n",
|
||||
"Northeast = df[df.State == \"New York\"]\n",
|
||||
"Northeast"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "infinite-instrument",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Methods and Results"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "basic-canadian",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Import any helper files you need here"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "recognized-positive",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "## First Research Question: Is drinking and driving the biggest cause of fatal collisions?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "graduate-palmer",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Methods"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "endless-variation",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*Explain how you will approach this research question below. Consider the following:* \n",
|
||||
" - *Which aspects of the dataset will you use?* \n",
|
||||
" - *How will you reorganize/store the data?* \n",
|
||||
" - *What data science tools/functions will you use and why?* \n",
|
||||
" \n",
|
||||
"✏️ *Write your answer below:*\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "portuguese-japan",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Results "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"id": "negative-highlight",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#######################################################################\n",
|
||||
"### 💻 YOUR WORK GOES HERE TO ANSWER THE FIRST RESEARCH QUESTION 💻 \n",
|
||||
"### \n",
|
||||
"### Your data analysis may include a statistic and/or a data visualization\n",
|
||||
"#######################################################################"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"id": "victorian-burning",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 💻 YOU CAN ADD NEW CELLS WITH THE \"+\" BUTTON "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "collectible-puppy",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "## Second Research Question: Which state is the unluckiest for fatal collisions?\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "demographic-future",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Methods"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "incorporate-roller",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*Explain how you will approach this research question below. Consider the following:* \n",
|
||||
" - *Which aspects of the dataset will you use?* \n",
|
||||
" - *How will you reorganize/store the data?* \n",
|
||||
" - *What data science tools/functions will you use and why?* \n",
|
||||
"\n",
|
||||
"✏️ *Write your answer below:*\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "juvenile-creation",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Results "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"id": "pursuant-surrey",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#######################################################################\n",
|
||||
"### 💻 YOUR WORK GOES HERE TO ANSWER THE SECOND RESEARCH QUESTION 💻 \n",
|
||||
"###\n",
|
||||
"### Your data analysis may include a statistic and/or a data visualization\n",
|
||||
"#######################################################################"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"id": "located-night",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# 💻 YOU CAN ADD NEW CELLS WITH THE \"+\" BUTTON "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "infectious-symbol",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Discussion"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "furnished-camping",
|
||||
"metadata": {
|
||||
"code_folding": []
|
||||
},
|
||||
"source": [
|
||||
"## Considerations"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bearing-stadium",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*It's important to recognize the limitations of our research.\n",
|
||||
"Consider the following:*\n",
|
||||
"\n",
|
||||
"- *Do the results give an accurate depiction of your research question? Why or why not?*\n",
|
||||
    "- *What were the limitations of your dataset?*\n",
|
||||
"- *Are there any known biases in the data?*\n",
|
||||
"\n",
|
||||
"✏️ *Write your answer below:*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "beneficial-invasion",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Summary"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "about-raise",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*Summarize what you discovered through the research. Consider the following:*\n",
|
||||
"\n",
|
||||
"- *What did you learn about your media consumption/digital habits?*\n",
|
||||
"- *Did the results make sense?*\n",
|
||||
"- *What was most surprising?*\n",
|
||||
"- *How will this project impact you going forward?*\n",
|
||||
"\n",
|
||||
"✏️ *Write your answer below:*"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"jupytext": {
|
||||
"cell_metadata_json": true,
|
||||
"text_representation": {
|
||||
"extension": ".Rmd",
|
||||
"format_name": "rmarkdown",
|
||||
"format_version": "1.2",
|
||||
"jupytext_version": "1.9.1"
|
||||
}
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.13.7"
|
||||
},
|
||||
"toc": {
|
||||
"base_numbering": 1,
|
||||
"nav_menu": {},
|
||||
"number_sections": false,
|
||||
"sideBar": true,
|
||||
"skip_h1_title": false,
|
||||
"title_cell": "Table of Contents",
|
||||
"title_sidebar": "Contents",
|
||||
"toc_cell": false,
|
||||
"toc_position": {},
|
||||
"toc_section_display": true,
|
||||
"toc_window_display": false
|
||||
},
|
||||
"varInspector": {
|
||||
"cols": {
|
||||
"lenName": 16,
|
||||
"lenType": 16,
|
||||
"lenVar": 40
|
||||
},
|
||||
"kernels_config": {
|
||||
"python": {
|
||||
"delete_cmd_postfix": "",
|
||||
"delete_cmd_prefix": "del ",
|
||||
"library": "var_list.py",
|
||||
"varRefreshCmd": "print(var_dic_list())"
|
||||
},
|
||||
"r": {
|
||||
"delete_cmd_postfix": ") ",
|
||||
"delete_cmd_prefix": "rm(",
|
||||
"library": "var_list.r",
|
||||
"varRefreshCmd": "cat(var_dic_list()) "
|
||||
}
|
||||
},
|
||||
"types_to_exclude": [
|
||||
"module",
|
||||
"function",
|
||||
"builtin_function_or_method",
|
||||
"instance",
|
||||
"_Feature"
|
||||
],
|
||||
"window_display": false
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
289
argument.ipynb
@@ -13,7 +13,9 @@
|
||||
"id": "understanding-numbers",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*✏️ Write 2-3 sentences describing your research.*"
|
||||
"*✏️ Write 2-3 sentences describing your research.*\n",
|
||||
"\n",
|
||||
    "My research uses a collection of data on the reasons fatal car crashes occur in every state of America to determine which region of America is the deadliest for drivers. "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -21,7 +23,7 @@
|
||||
"id": "greater-circular",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Overarching Question: [✏️ PUT YOUR QUESTION HERE ✏️]"
|
||||
    "## Overarching Question: What is the deadliest region in America to drive in?"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -29,7 +31,10 @@
|
||||
"id": "appreciated-testimony",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*✏️ Write 2-3 sentences explaining why this question.*"
|
||||
"*✏️ Write 2-3 sentences explaining why this question.*\n",
|
||||
"\n",
|
||||
"I am interested in this because I live on the Northeast Coast and we have a lot of car \n",
|
||||
    "accidents. People drive very fast here. The roads are not always properly paved or maintained. I want to know if it's just bad luck when people get into accidents or if it's their own fault. "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -42,7 +47,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 3,
|
||||
"id": "technical-evans",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@@ -54,14 +59,14 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 5,
|
||||
"id": "overhead-sigma",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"### 💻 FILL IN YOUR DATASET FILE NAME BELOW 💻 ###\n",
|
||||
"\n",
|
||||
"file_name = \"YOUR_DATASET_FILE_NAME.csv\"\n",
|
||||
"file_name = \"bad-drivers.csv\"\n",
|
||||
"dataset_path = \"data/\" + file_name\n",
|
||||
"\n",
|
||||
"df = pd.read_csv(dataset_path)"
|
||||
@@ -69,10 +74,164 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 8,
|
||||
"id": "heated-blade",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>State</th>\n",
|
||||
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
|
||||
" <th>Car Insurance Premiums ($)</th>\n",
|
||||
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>0</th>\n",
|
||||
" <td>Alabama</td>\n",
|
||||
" <td>18.8</td>\n",
|
||||
" <td>39</td>\n",
|
||||
" <td>30</td>\n",
|
||||
" <td>96</td>\n",
|
||||
" <td>80</td>\n",
|
||||
" <td>784.55</td>\n",
|
||||
" <td>145.08</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>1</th>\n",
|
||||
" <td>Alaska</td>\n",
|
||||
" <td>18.1</td>\n",
|
||||
" <td>41</td>\n",
|
||||
" <td>25</td>\n",
|
||||
" <td>90</td>\n",
|
||||
" <td>94</td>\n",
|
||||
" <td>1053.48</td>\n",
|
||||
" <td>133.93</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>2</th>\n",
|
||||
" <td>Arizona</td>\n",
|
||||
" <td>18.6</td>\n",
|
||||
" <td>35</td>\n",
|
||||
" <td>28</td>\n",
|
||||
" <td>84</td>\n",
|
||||
" <td>96</td>\n",
|
||||
" <td>899.47</td>\n",
|
||||
" <td>110.35</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>3</th>\n",
|
||||
" <td>Arkansas</td>\n",
|
||||
" <td>22.4</td>\n",
|
||||
" <td>18</td>\n",
|
||||
" <td>26</td>\n",
|
||||
" <td>94</td>\n",
|
||||
" <td>95</td>\n",
|
||||
" <td>827.34</td>\n",
|
||||
" <td>142.39</td>\n",
|
||||
" </tr>\n",
|
||||
" <tr>\n",
|
||||
" <th>4</th>\n",
|
||||
" <td>California</td>\n",
|
||||
" <td>12.0</td>\n",
|
||||
" <td>35</td>\n",
|
||||
" <td>28</td>\n",
|
||||
" <td>91</td>\n",
|
||||
" <td>89</td>\n",
|
||||
" <td>878.41</td>\n",
|
||||
" <td>165.63</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" State \\\n",
|
||||
"0 Alabama \n",
|
||||
"1 Alaska \n",
|
||||
"2 Arizona \n",
|
||||
"3 Arkansas \n",
|
||||
"4 California \n",
|
||||
"\n",
|
||||
" Number of drivers involved in fatal collisions per billion miles \\\n",
|
||||
"0 18.8 \n",
|
||||
"1 18.1 \n",
|
||||
"2 18.6 \n",
|
||||
"3 22.4 \n",
|
||||
"4 12.0 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
|
||||
"0 39 \n",
|
||||
"1 41 \n",
|
||||
"2 35 \n",
|
||||
"3 18 \n",
|
||||
"4 35 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
|
||||
"0 30 \n",
|
||||
"1 25 \n",
|
||||
"2 28 \n",
|
||||
"3 26 \n",
|
||||
"4 28 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
|
||||
"0 96 \n",
|
||||
"1 90 \n",
|
||||
"2 84 \n",
|
||||
"3 94 \n",
|
||||
"4 91 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
|
||||
"0 80 \n",
|
||||
"1 94 \n",
|
||||
"2 96 \n",
|
||||
"3 95 \n",
|
||||
"4 89 \n",
|
||||
"\n",
|
||||
" Car Insurance Premiums ($) \\\n",
|
||||
"0 784.55 \n",
|
||||
"1 1053.48 \n",
|
||||
"2 899.47 \n",
|
||||
"3 827.34 \n",
|
||||
"4 878.41 \n",
|
||||
"\n",
|
||||
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
|
||||
"0 145.08 \n",
|
||||
"1 133.93 \n",
|
||||
"2 110.35 \n",
|
||||
"3 142.39 \n",
|
||||
"4 165.63 "
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"df.head()"
|
||||
]
|
||||
@@ -84,7 +243,113 @@
|
||||
"source": [
|
||||
"**Data Overview**\n",
|
||||
"\n",
|
||||
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*"
|
||||
"*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*\n",
|
||||
"\n",
|
||||
"I got the data set from FiveThirtyEight. It was used for an article called\n",
|
||||
    "\"Dear Mona, Which State Has The Worst Drivers?\" in October 2014. The article was written by Mona Chalabi, who is a data editor at the Guardian US, \n",
|
||||
    "a columnist at New York Magazine, and a lead news writer for FiveThirtyEight.\n",
|
||||
"\n",
|
||||
    "The data is about fatal collisions in each state. There are 7 columns of measurements; some of the columns\n",
|
||||
    "are about \"Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired\" and \"Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted\".\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6ba44c9c-60d1-46a4-8257-b4e8eeea348d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
    "I will recategorise the data so that each state's data is grouped into one of the five regions of the United States."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"id": "f7bba5f3-5911-4a76-ad43-f6ce78cd4fb3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<div>\n",
|
||||
"<style scoped>\n",
|
||||
" .dataframe tbody tr th:only-of-type {\n",
|
||||
" vertical-align: middle;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe tbody tr th {\n",
|
||||
" vertical-align: top;\n",
|
||||
" }\n",
|
||||
"\n",
|
||||
" .dataframe thead th {\n",
|
||||
" text-align: right;\n",
|
||||
" }\n",
|
||||
"</style>\n",
|
||||
"<table border=\"1\" class=\"dataframe\">\n",
|
||||
" <thead>\n",
|
||||
" <tr style=\"text-align: right;\">\n",
|
||||
" <th></th>\n",
|
||||
" <th>State</th>\n",
|
||||
" <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
|
||||
" <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
|
||||
" <th>Car Insurance Premiums ($)</th>\n",
|
||||
" <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
|
||||
" </tr>\n",
|
||||
" </thead>\n",
|
||||
" <tbody>\n",
|
||||
" <tr>\n",
|
||||
" <th>32</th>\n",
|
||||
" <td>New York</td>\n",
|
||||
" <td>12.3</td>\n",
|
||||
" <td>32</td>\n",
|
||||
" <td>29</td>\n",
|
||||
" <td>88</td>\n",
|
||||
" <td>80</td>\n",
|
||||
" <td>1234.31</td>\n",
|
||||
" <td>150.01</td>\n",
|
||||
" </tr>\n",
|
||||
" </tbody>\n",
|
||||
"</table>\n",
|
||||
"</div>"
|
||||
],
|
||||
"text/plain": [
|
||||
" State \\\n",
|
||||
"32 New York \n",
|
||||
"\n",
|
||||
" Number of drivers involved in fatal collisions per billion miles \\\n",
|
||||
"32 12.3 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding \\\n",
|
||||
"32 32 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired \\\n",
|
||||
"32 29 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted \\\n",
|
||||
"32 88 \n",
|
||||
"\n",
|
||||
" Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents \\\n",
|
||||
"32 80 \n",
|
||||
"\n",
|
||||
" Car Insurance Premiums ($) \\\n",
|
||||
"32 1234.31 \n",
|
||||
"\n",
|
||||
" Losses incurred by insurance companies for collisions per insured driver ($) \n",
|
||||
"32 150.01 "
|
||||
]
|
||||
},
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"\n",
|
||||
"Northeast = df[df.State == \"New York\"]\n",
|
||||
"Northeast"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -110,7 +375,7 @@
|
||||
"id": "recognized-positive",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## First Research Question: [✏️ PUT YOUR QUESTION HERE ✏️]\n"
|
||||
    "## First Research Question: Is drinking and driving the biggest cause of fatal collisions?"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -172,7 +437,7 @@
|
||||
"id": "collectible-puppy",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Second Research Question: [✏️ PUT YOUR QUESTION HERE ✏️]\n"
|
||||
    "## Second Research Question: Which state is the unluckiest for fatal collisions?\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -310,7 +575,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.7"
|
||||
"version": "3.13.7"
|
||||
},
|
||||
"toc": {
|
||||
"base_numbering": 1,
|
||||
|
||||
52
data/bad-drivers.csv
Normal file
@@ -0,0 +1,52 @@
|
||||
State,Number of drivers involved in fatal collisions per billion miles,Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding,Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired,Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted,Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents,Car Insurance Premiums ($),Losses incurred by insurance companies for collisions per insured driver ($)
|
||||
Alabama,18.8,39,30,96,80,784.55,145.08
|
||||
Alaska,18.1,41,25,90,94,1053.48,133.93
|
||||
Arizona,18.6,35,28,84,96,899.47,110.35
|
||||
Arkansas,22.4,18,26,94,95,827.34,142.39
|
||||
California,12,35,28,91,89,878.41,165.63
|
||||
Colorado,13.6,37,28,79,95,835.5,139.91
|
||||
Connecticut,10.8,46,36,87,82,1068.73,167.02
|
||||
Delaware,16.2,38,30,87,99,1137.87,151.48
|
||||
District of Columbia,5.9,34,27,100,100,1273.89,136.05
|
||||
Florida,17.9,21,29,92,94,1160.13,144.18
|
||||
Georgia,15.6,19,25,95,93,913.15,142.8
|
||||
Hawaii,17.5,54,41,82,87,861.18,120.92
|
||||
Idaho,15.3,36,29,85,98,641.96,82.75
|
||||
Illinois,12.8,36,34,94,96,803.11,139.15
|
||||
Indiana,14.5,25,29,95,95,710.46,108.92
|
||||
Iowa,15.7,17,25,97,87,649.06,114.47
|
||||
Kansas,17.8,27,24,77,85,780.45,133.8
|
||||
Kentucky,21.4,19,23,78,76,872.51,137.13
|
||||
Louisiana,20.5,35,33,73,98,1281.55,194.78
|
||||
Maine,15.1,38,30,87,84,661.88,96.57
|
||||
Maryland,12.5,34,32,71,99,1048.78,192.7
|
||||
Massachusetts,8.2,23,35,87,80,1011.14,135.63
|
||||
Michigan,14.1,24,28,95,77,1110.61,152.26
|
||||
Minnesota,9.6,23,29,88,88,777.18,133.35
|
||||
Mississippi,17.6,15,31,10,100,896.07,155.77
|
||||
Missouri,16.1,43,34,92,84,790.32,144.45
|
||||
Montana,21.4,39,44,84,85,816.21,85.15
|
||||
Nebraska,14.9,13,35,93,90,732.28,114.82
|
||||
Nevada,14.7,37,32,95,99,1029.87,138.71
|
||||
New Hampshire,11.6,35,30,87,83,746.54,120.21
|
||||
New Jersey,11.2,16,28,86,78,1301.52,159.85
|
||||
New Mexico,18.4,19,27,67,98,869.85,120.75
|
||||
New York,12.3,32,29,88,80,1234.31,150.01
|
||||
North Carolina,16.8,39,31,94,81,708.24,127.82
|
||||
North Dakota,23.9,23,42,99,86,688.75,109.72
|
||||
Ohio,14.1,28,34,99,82,697.73,133.52
|
||||
Oklahoma,19.9,32,29,92,94,881.51,178.86
|
||||
Oregon,12.8,33,26,67,90,804.71,104.61
|
||||
Pennsylvania,18.2,50,31,96,88,905.99,153.86
|
||||
Rhode Island,11.1,34,38,92,79,1148.99,148.58
|
||||
South Carolina,23.9,38,41,96,81,858.97,116.29
|
||||
South Dakota,19.4,31,33,98,86,669.31,96.87
|
||||
Tennessee,19.5,21,29,82,81,767.91,155.57
|
||||
Texas,19.4,40,38,91,87,1004.75,156.83
|
||||
Utah,11.3,43,16,88,96,809.38,109.48
|
||||
Vermont,13.6,30,30,96,95,716.2,109.61
|
||||
Virginia,12.7,19,27,87,88,768.95,153.72
|
||||
Washington,10.6,42,33,82,86,890.03,111.62
|
||||
West Virginia,23.8,34,28,97,87,992.61,152.56
|
||||
Wisconsin,13.8,36,33,39,84,670.31,106.62
|
||||
Wyoming,17.4,42,32,81,90,791.14,122.04
|
||||
|