{
"cells": [
{
"cell_type": "markdown",
"id": "worldwide-blood",
"metadata": {},
"source": [
"# Introduction"
]
},
{
"cell_type": "markdown",
"id": "understanding-numbers",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences describing your research.*"
]
},
{
"cell_type": "markdown",
"id": "greater-circular",
"metadata": {},
"source": [
"## Overarching Question: [✏️ PUT YOUR QUESTION HERE ✏️]"
]
},
{
"cell_type": "markdown",
"id": "appreciated-testimony",
"metadata": {},
"source": [
"*✏️ Write 2-3 sentences explaining why this question.*"
]
},
{
"cell_type": "markdown",
"id": "permanent-pollution",
"metadata": {},
"source": [
"# Data"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "technical-evans",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "overhead-sigma",
"metadata": {},
"outputs": [],
"source": [
"file_name = \"US_births_2000-2014_SSA.csv\"\n",
"dataset_path = \"data/\" + file_name\n",
"\n",
"df = pd.read_csv(dataset_path)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "heated-blade",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>date_of_month</th>\n",
" <th>day_of_week</th>\n",
" <th>births</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2000</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>6</td>\n",
" <td>9083</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2000</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>7</td>\n",
" <td>8006</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2000</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>11363</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2000</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>2</td>\n",
" <td>13032</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2000</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>12558</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" year month date_of_month day_of_week births\n",
"0 2000 1 1 6 9083\n",
"1 2000 1 2 7 8006\n",
"2 2000 1 3 1 11363\n",
"3 2000 1 4 2 13032\n",
"4 2000 1 5 3 12558"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "continental-franklin",
"metadata": {},
"source": [
"**Data Overview**\n",
"\n",
"*This data set contains U.S. births data for the years 2000 to 2014, as provided by the Social Security Administration. The columns are the year, month, date of the month, day of the week, and the number of births on each day.*"
]
},
{
"cell_type": "markdown",
"id": "infinite-instrument",
"metadata": {},
"source": [
"# Methods and Results"
]
},
{
"cell_type": "markdown",
"id": "recognized-positive",
"metadata": {},
"source": [
"## First Research Question: On average, are fewer children born on the 13th of the month?\n"
]
},
{
"cell_type": "markdown",
"id": "graduate-palmer",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"id": "endless-variation",
"metadata": {},
"source": [
"To research this question, I will use the date_of_month and births columns.\n",
"\n",
"To reorganize the data, I will create a new table that looks at each date of the month individually: I will group the data by date_of_month and then find the mean number of births for each date.\n",
"\n",
"I will then create a bar plot from this new table to show the average number of births for each date of the month."
]
},
{
"cell_type": "markdown",
"id": "portuguese-japan",
"metadata": {},
"source": [
"### Results "
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "negative-highlight",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"date_of_month\n",
"1 11131.261111\n",
"2 11280.261111\n",
"3 11346.894444\n",
"4 11137.694444\n",
"5 11312.138889\n",
"6 11320.716667\n",
"7 11463.422222\n",
"8 11453.622222\n",
"9 11358.888889\n",
"10 11478.633333\n",
"11 11411.655556\n",
"12 11513.794444\n",
"13 11111.466667\n",
"14 11534.950000\n",
"15 11483.327778\n",
"16 11436.950000\n",
"17 11508.733333\n",
"18 11542.627778\n",
"19 11474.044444\n",
"20 11573.594444\n",
"21 11551.100000\n",
"22 11394.511111\n",
"23 11241.972222\n",
"24 11073.350000\n",
"25 10958.522222\n",
"26 11118.394444\n",
"27 11308.238889\n",
"28 11397.377778\n",
"29 11354.822485\n",
"30 11393.484848\n",
"31 11074.933333\n",
"Name: births, dtype: float64"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_births = df.groupby('date_of_month')['births'].mean()\n",
"average_births"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "victorian-burning",
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns \n",
"sns.set_theme()"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "affd0a87-f44b-47d9-829f-7015f815f3ba",
"metadata": {},
"outputs": [],
"source": [
"average_births_df = average_births.reset_index()\n",
"average_births_df.columns = [\"date_of_month\", \"average_births\"]"
]
},
{
"cell_type": "code",
"execution_count": 53,
"id": "7ea67a24-008f-45e6-9082-96a13804b45f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='date_of_month', ylabel='Count'>"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# One bin per date; weighting by average_births makes each bar's height that date's average\n",
"sns.histplot(data=average_births_df, x=\"date_of_month\", weights=\"average_births\", bins=31)"
]
},
{
"cell_type": "markdown",
"id": "69a032ce-31ec-4613-8f21-d31d52d02fa2",
"metadata": {},
"source": [
"Looking at these results, the average number of births for each date of the month falls within about 615 births of every other date (from roughly 10,959 on the 25th to 11,574 on the 20th), which is not a huge difference, but the 13th is on the lower end. Only three dates of the month have a lower average: the 24th, the 25th, and the 31st. Overall, fewer babies are born on the 13th than on the majority of other dates."
]
},
{
"cell_type": "markdown",
"id": "collectible-puppy",
"metadata": {},
"source": [
"## Second Research Question: Of babies being born on the 13th of the month, which day of the week is the least common on average?\n"
]
},
{
"cell_type": "markdown",
"id": "demographic-future",
"metadata": {},
"source": [
"### Methods"
]
},
{
"cell_type": "markdown",
"id": "incorporate-roller",
"metadata": {},
"source": [
"I will now use three columns to organize and show my data: date_of_month, day_of_week, and births.\n",
"First, I will extract only the rows with a date_of_month equal to 13. This will allow me to find the average number of births for each day_of_week in this new data set, which I will then turn into a bar plot to easily see which day_of_week is the most and least common."
]
},
{
"cell_type": "markdown",
"id": "juvenile-creation",
"metadata": {},
"source": [
"### Results "
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "pursuant-surrey",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
" ... \n",
"5474 False\n",
"5475 False\n",
"5476 False\n",
"5477 False\n",
"5478 False\n",
"Name: date_of_month, Length: 5479, dtype: bool"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.date_of_month==13"
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "located-night",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>year</th>\n",
" <th>month</th>\n",
" <th>date_of_month</th>\n",
" <th>day_of_week</th>\n",
" <th>births</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>2000</td>\n",
" <td>1</td>\n",
" <td>13</td>\n",
" <td>4</td>\n",
" <td>11815</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>2000</td>\n",
" <td>2</td>\n",
" <td>13</td>\n",
" <td>7</td>\n",
" <td>7933</td>\n",
" </tr>\n",
" <tr>\n",
" <th>72</th>\n",
" <td>2000</td>\n",
" <td>3</td>\n",
" <td>13</td>\n",
" <td>1</td>\n",
" <td>11157</td>\n",
" </tr>\n",
" <tr>\n",
" <th>103</th>\n",
" <td>2000</td>\n",
" <td>4</td>\n",
" <td>13</td>\n",
" <td>4</td>\n",
" <td>11907</td>\n",
" </tr>\n",
" <tr>\n",
" <th>133</th>\n",
" <td>2000</td>\n",
" <td>5</td>\n",
" <td>13</td>\n",
" <td>6</td>\n",
" <td>8747</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5338</th>\n",
" <td>2014</td>\n",
" <td>8</td>\n",
" <td>13</td>\n",
" <td>3</td>\n",
" <td>12817</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5369</th>\n",
" <td>2014</td>\n",
" <td>9</td>\n",
" <td>13</td>\n",
" <td>6</td>\n",
" <td>8903</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5399</th>\n",
" <td>2014</td>\n",
" <td>10</td>\n",
" <td>13</td>\n",
" <td>1</td>\n",
" <td>11241</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5430</th>\n",
" <td>2014</td>\n",
" <td>11</td>\n",
" <td>13</td>\n",
" <td>4</td>\n",
" <td>12103</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5460</th>\n",
" <td>2014</td>\n",
" <td>12</td>\n",
" <td>13</td>\n",
" <td>6</td>\n",
" <td>8596</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>180 rows × 5 columns</p>\n",
"</div>"
],
"text/plain": [
" year month date_of_month day_of_week births\n",
"12 2000 1 13 4 11815\n",
"43 2000 2 13 7 7933\n",
"72 2000 3 13 1 11157\n",
"103 2000 4 13 4 11907\n",
"133 2000 5 13 6 8747\n",
"... ... ... ... ... ...\n",
"5338 2014 8 13 3 12817\n",
"5369 2014 9 13 6 8903\n",
"5399 2014 10 13 1 11241\n",
"5430 2014 11 13 4 12103\n",
"5460 2014 12 13 6 8596\n",
"\n",
"[180 rows x 5 columns]"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"thirteen=df[df.date_of_month==13]\n",
"thirteen"
]
},
{
"cell_type": "code",
"execution_count": 51,
"id": "4e2a89c4-5686-4865-94e6-ac30f5366e0d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"day_of_week\n",
"1 11805.541667\n",
"2 12865.000000\n",
"3 12736.040000\n",
"4 12563.714286\n",
"5 11949.960000\n",
"6 8550.760000\n",
"7 7390.296296\n",
"Name: births, dtype: float64"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"average_day = thirteen.groupby('day_of_week')['births'].mean()\n",
"average_day"
]
},
{
"cell_type": "code",
"execution_count": 59,
"id": "75b165fc-375b-4144-bf86-f85f04653cc3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<Axes: xlabel='day_of_week', ylabel='Count'>"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
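],
"source": [
"# weights=\"births\" sums births over every 13th in each bin, so the bars show totals per day of week, not averages\n",
"sns.histplot(data=thirteen, x=\"day_of_week\", weights=\"births\", bins=7)"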
]
},
{
"cell_type": "markdown",
"id": "97dbf4e7-d3d2-4320-be82-0b0df1e47369",
"metadata": {},
"source": [
"Based on the table and graph I have created here, Friday (day_of_week 5) is clearly not the least common day for children to be born on the 13th. It sits right around the middle, which suggests that the myth of fewer children being born on Friday the 13th is false."
]
},
{
"cell_type": "markdown",
"id": "infectious-symbol",
"metadata": {},
"source": [
"# Discussion"
]
},
{
"cell_type": "markdown",
"id": "furnished-camping",
"metadata": {
"code_folding": []
},
"source": [
"## Considerations"
]
},
{
"cell_type": "markdown",
"id": "bearing-stadium",
"metadata": {},
"source": [
"I think that my results give a fairly accurate answer to my research questions because they look at births in the U.S. over a span of 15 years (2000 to 2014). I believe this is a large enough sample to support the conclusions we have drawn. If we wanted to improve on this analysis and expand the research, we have a few options:\n",
"1. We could look at the data over a longer period of time to see if that has any impact.\n",
"2. We could look at the data in terms of locations to see if the superstition exists more in different areas of the country.\n",
"3. We could expand the data to other countries to see if this superstition shows in their data.\n",
"\n",
"Because our data covers only the United States from 2000 to 2014, the results could be biased toward that place and time period."
]
},
{
"cell_type": "markdown",
"id": "beneficial-invasion",
"metadata": {},
"source": [
"## Summary"
]
},
{
"cell_type": "markdown",
"id": "about-raise",
"metadata": {},
"source": [
"I believe the results make sense given my predictions, but they do not match the superstition surrounding Friday the 13th. It is often assumed that mothers avoid having their children on Friday the 13th because it comes with bad luck, but it is often impossible to plan ahead for such specifics when it comes to birth. Our results do not show births on Friday the 13th to be notably uncommon. The 13th is on the lower end among dates of the month, but only by a few hundred births. The averages by date of the month were fairly evenly distributed, with no one date very different from another. In terms of day of the week, Friday the 13th fell right in the middle, with fewer births occurring on the weekends. As someone who was born on Friday the 13th, I always assumed I had a rare birth date, but going forward I can see that it is much the same as any other birth date."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cfbdbeb-48a8-4f15-a1f7-e07e9e7846c4",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"cell_metadata_json": true,
"text_representation": {
"extension": ".Rmd",
"format_name": "rmarkdown",
"format_version": "1.2",
"jupytext_version": "1.9.1"
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}