{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "worldwide-blood",
   "metadata": {},
   "source": [
    "# Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "understanding-numbers",
   "metadata": {},
   "source": [
    "*✏️ Write 2-3 sentences describing your research.*\n",
    "\n",
    "It's a collection of data on the reasons fatal car crashes occur in every state of America, and it will be used to determine which region of America is the deadliest. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "greater-circular",
   "metadata": {},
   "source": [
    "## Overarching Question: What is the deadliest region in America to drive on?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "appreciated-testimony",
   "metadata": {},
   "source": [
    "*✏️ Write 2-3 sentences explaining why this question.*\n",
    "\n",
    "I am interested in this because I live on the Northeast Coast and we have a lot of car \n",
    "accidents. People drive very fast here. The roads are not always paved properly and maintained. I want to know if it's just bad luck when people get into accidents or if it's their own fault. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "permanent-pollution",
   "metadata": {},
   "source": [
    "# Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "technical-evans",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Include any import statements you will need\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "overhead-sigma",
   "metadata": {},
   "outputs": [],
   "source": [
    "### 💻 FILL IN YOUR DATASET FILE NAME BELOW 💻 ###\n",
    "\n",
    "file_name = \"bad-drivers.csv\"\n",
    "dataset_path = \"data/\" + file_name\n",
    "\n",
    "df = pd.read_csv(dataset_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "heated-blade",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
       "      <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
       "      <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
       "      <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
       "      <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
       "      <th>Car Insurance Premiums ($)</th>\n",
       "      <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Alabama</td>\n",
       "      <td>18.8</td>\n",
       "      <td>39</td>\n",
       "      <td>30</td>\n",
       "      <td>96</td>\n",
       "      <td>80</td>\n",
       "      <td>784.55</td>\n",
       "      <td>145.08</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Alaska</td>\n",
       "      <td>18.1</td>\n",
       "      <td>41</td>\n",
       "      <td>25</td>\n",
       "      <td>90</td>\n",
       "      <td>94</td>\n",
       "      <td>1053.48</td>\n",
       "      <td>133.93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Arizona</td>\n",
       "      <td>18.6</td>\n",
       "      <td>35</td>\n",
       "      <td>28</td>\n",
       "      <td>84</td>\n",
       "      <td>96</td>\n",
       "      <td>899.47</td>\n",
       "      <td>110.35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Arkansas</td>\n",
       "      <td>22.4</td>\n",
       "      <td>18</td>\n",
       "      <td>26</td>\n",
       "      <td>94</td>\n",
       "      <td>95</td>\n",
       "      <td>827.34</td>\n",
       "      <td>142.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>California</td>\n",
       "      <td>12.0</td>\n",
       "      <td>35</td>\n",
       "      <td>28</td>\n",
       "      <td>91</td>\n",
       "      <td>89</td>\n",
       "      <td>878.41</td>\n",
       "      <td>165.63</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        State  \\\n",
       "0     Alabama   \n",
       "1      Alaska   \n",
       "2     Arizona   \n",
       "3    Arkansas   \n",
       "4  California   \n",
       "\n",
       "   Number of drivers involved in fatal collisions per billion miles  \\\n",
       "0                                               18.8                  \n",
       "1                                               18.1                  \n",
       "2                                               18.6                  \n",
       "3                                               22.4                  \n",
       "4                                               12.0                  \n",
       "\n",
       "   Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding  \\\n",
       "0                                                 39                      \n",
       "1                                                 41                      \n",
       "2                                                 35                      \n",
       "3                                                 18                      \n",
       "4                                                 35                      \n",
       "\n",
       "   Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired  \\\n",
       "0                                                 30                              \n",
       "1                                                 25                              \n",
       "2                                                 28                              \n",
       "3                                                 26                              \n",
       "4                                                 28                              \n",
       "\n",
       "   Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted  \\\n",
       "0                                                 96                            \n",
       "1                                                 90                            \n",
       "2                                                 84                            \n",
       "3                                                 94                            \n",
       "4                                                 91                            \n",
       "\n",
       "   Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents  \\\n",
       "0                                                 80                                                        \n",
       "1                                                 94                                                        \n",
       "2                                                 96                                                        \n",
       "3                                                 95                                                        \n",
       "4                                                 89                                                        \n",
       "\n",
       "   Car Insurance Premiums ($)  \\\n",
       "0                      784.55   \n",
       "1                     1053.48   \n",
       "2                      899.47   \n",
       "3                      827.34   \n",
       "4                      878.41   \n",
       "\n",
       "   Losses incurred by insurance companies for collisions per insured driver ($)  \n",
       "0                                             145.08                             \n",
       "1                                             133.93                             \n",
       "2                                             110.35                             \n",
       "3                                             142.39                             \n",
       "4                                             165.63                             "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "continental-franklin",
   "metadata": {},
   "source": [
    "**Data Overview**\n",
    "\n",
    "*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*\n",
    "\n",
    "I got the data set from FiveThirtyEight. It was used for an article called\n",
    "\"Dear Mona, Which state has the worst drivers?\" in October 2014. The person who wrote the article is Mona Chalabi, they are a data editor at the Guardian US, \n",
    "a columnist at New York Margazine, and a lead news writer for FiveThirtyEight.\n",
    "\n",
    "The date is about fatal collisions in each state. There are 7 rows, some of the rows\n",
    "are about \"Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired\" and \"Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted\"\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ba44c9c-60d1-46a4-8257-b4e8eeea348d",
   "metadata": {},
   "source": [
    "I will recategorise the data so that all of the states data will be separated into the five regions of the United States"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "f7bba5f3-5911-4a76-ad43-f6ce78cd4fb3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Number of drivers involved in fatal collisions per billion miles</th>\n",
       "      <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding</th>\n",
       "      <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired</th>\n",
       "      <th>Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted</th>\n",
       "      <th>Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents</th>\n",
       "      <th>Car Insurance Premiums ($)</th>\n",
       "      <th>Losses incurred by insurance companies for collisions per insured driver ($)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>New York</td>\n",
       "      <td>12.3</td>\n",
       "      <td>32</td>\n",
       "      <td>29</td>\n",
       "      <td>88</td>\n",
       "      <td>80</td>\n",
       "      <td>1234.31</td>\n",
       "      <td>150.01</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       State  \\\n",
       "32  New York   \n",
       "\n",
       "    Number of drivers involved in fatal collisions per billion miles  \\\n",
       "32                                               12.3                  \n",
       "\n",
       "    Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding  \\\n",
       "32                                                 32                      \n",
       "\n",
       "    Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired  \\\n",
       "32                                                 29                              \n",
       "\n",
       "    Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted  \\\n",
       "32                                                 88                            \n",
       "\n",
       "    Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents  \\\n",
       "32                                                 80                                                        \n",
       "\n",
       "    Car Insurance Premiums ($)  \\\n",
       "32                     1234.31   \n",
       "\n",
       "    Losses incurred by insurance companies for collisions per insured driver ($)  \n",
       "32                                             150.01                             "
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "Northeast = df[df.State == \"New York\"]\n",
    "Northeast"
   ]
  },
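  {
   "cell_type": "markdown",
   "id": "region-grouping-sketch",
   "metadata": {},
   "source": [
    "The single-state filter above could be extended to whole regions by mapping each state to a region and then grouping. Below is a minimal sketch under stated assumptions: the region assignments follow one common five-region scheme and are not part of the dataset, `fatal_rate` is a hypothetical short name for the \"Number of drivers involved in fatal collisions per billion miles\" column, and only six sample rows copied from the table outputs above are included.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Six sample rows copied from the table outputs above.\n",
    "# 'fatal_rate' is a hypothetical short name for the per-billion-miles column.\n",
    "sample = pd.DataFrame({\n",
    "    'State': ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'New York'],\n",
    "    'fatal_rate': [18.8, 18.1, 18.6, 22.4, 12.0, 12.3],\n",
    "})\n",
    "\n",
    "# Assumed region assignments (not part of the dataset); extend to all states.\n",
    "state_to_region = {\n",
    "    'Alabama': 'Southeast',\n",
    "    'Alaska': 'West',\n",
    "    'Arizona': 'Southwest',\n",
    "    'Arkansas': 'Southeast',\n",
    "    'California': 'West',\n",
    "    'New York': 'Northeast',\n",
    "}\n",
    "\n",
    "# Label each row with its region, then average the rate within each region.\n",
    "sample['Region'] = sample['State'].map(state_to_region)\n",
    "region_means = sample.groupby('Region')['fatal_rate'].mean()\n",
    "print(region_means.sort_values(ascending=False))\n",
    "```\n",
    "\n",
    "A plain mean treats every state equally and does not weight by miles driven, so the regional numbers are only a rough comparison."
   ]
  },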
  {
   "cell_type": "markdown",
   "id": "infinite-instrument",
   "metadata": {},
   "source": [
    "# Methods and Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "basic-canadian",
   "metadata": {},
   "outputs": [],
   "source": [
    "#Import any helper files you need here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "recognized-positive",
   "metadata": {},
   "source": [
    "## First Research Question: Is drinking and driving the biggest cause of fatal collisions?\\"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "graduate-palmer",
   "metadata": {},
   "source": [
    "### Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "endless-variation",
   "metadata": {},
   "source": [
    "*Explain how you will approach this research question below. Consider the following:* \n",
    "  - *Which aspects of the dataset will you use?* \n",
    "  - *How will you reorganize/store the data?* \n",
    "  - *What data science tools/functions will you use and why?* \n",
    "  \n",
    "✏️ *Write your answer below:*\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "portuguese-japan",
   "metadata": {},
   "source": [
    "### Results "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "negative-highlight",
   "metadata": {},
   "outputs": [],
   "source": [
    "#######################################################################\n",
    "### 💻 YOUR WORK GOES HERE TO ANSWER THE FIRST RESEARCH QUESTION 💻 \n",
    "### \n",
    "### Your data analysis may include a statistic and/or a data visualization\n",
    "#######################################################################"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "victorian-burning",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 💻 YOU CAN ADD NEW CELLS WITH THE \"+\" BUTTON "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "collectible-puppy",
   "metadata": {},
   "source": [
    "## Second Research Question: What state is the most unluckiest state for fatel collisions?\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "demographic-future",
   "metadata": {},
   "source": [
    "### Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "incorporate-roller",
   "metadata": {},
   "source": [
    "*Explain how you will approach this research question below. Consider the following:* \n",
    "  - *Which aspects of the dataset will you use?* \n",
    "  - *How will you reorganize/store the data?* \n",
    "  - *What data science tools/functions will you use and why?* \n",
    "\n",
    "✏️ *Write your answer below:*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "juvenile-creation",
   "metadata": {},
   "source": [
    "### Results "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "pursuant-surrey",
   "metadata": {},
   "outputs": [],
   "source": [
    "#######################################################################\n",
    "### 💻 YOUR WORK GOES HERE TO ANSWER THE SECOND RESEARCH QUESTION 💻 \n",
    "###\n",
    "### Your data analysis may include a statistic and/or a data visualization\n",
    "#######################################################################"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "located-night",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 💻 YOU CAN ADD NEW CELLS WITH THE \"+\" BUTTON "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "infectious-symbol",
   "metadata": {},
   "source": [
    "# Discussion"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "furnished-camping",
   "metadata": {
    "code_folding": []
   },
   "source": [
    "## Considerations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bearing-stadium",
   "metadata": {},
   "source": [
    "*It's important to recognize the limitations of our research.\n",
    "Consider the following:*\n",
    "\n",
    "- *Do the results give an accurate depiction of your research question? Why or why not?*\n",
    "- *What were limitations of your datset?*\n",
    "- *Are there any known biases in the data?*\n",
    "\n",
    "✏️ *Write your answer below:*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "beneficial-invasion",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "about-raise",
   "metadata": {},
   "source": [
    "*Summarize what you discovered through the research. Consider the following:*\n",
    "\n",
    "- *What did you learn about your media consumption/digital habits?*\n",
    "- *Did the results make sense?*\n",
    "- *What was most surprising?*\n",
    "- *How will this project impact you going forward?*\n",
    "\n",
    "✏️ *Write your answer below:*"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_json": true,
   "text_representation": {
    "extension": ".Rmd",
    "format_name": "rmarkdown",
    "format_version": "1.2",
    "jupytext_version": "1.9.1"
   }
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}