"
+ ],
+ "text/plain": [
+ " year month date_of_month day_of_week births\n",
+ "0 2000 1 1 6 9083\n",
+ "1 2000 1 2 7 8006\n",
+ "2 2000 1 3 1 11363\n",
+ "3 2000 1 4 2 13032\n",
+ "4 2000 1 5 3 12558"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "continental-franklin",
+ "metadata": {},
+ "source": [
+ "**Data Overview**\n",
+ "\n",
+ "*This dataset contains daily U.S. birth counts for the years 2000 to 2014, as provided by the Social Security Administration. The columns are the year, month, date of the month, day of the week, and the number of births on each day.*"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "infinite-instrument",
+ "metadata": {},
+ "source": [
+ "# Methods and Results"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "recognized-positive",
+ "metadata": {},
+ "source": [
+ "## First Research Question: On average, are fewer children born on the 13th of the month?\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "graduate-palmer",
+ "metadata": {},
+ "source": [
+ "### Methods"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "endless-variation",
+ "metadata": {},
+ "source": [
+ "To answer this question, I will use the date_of_month and births columns. \n",
+ "\n",
+ "To reorganize the data, I will build a new table that holds the average number of births for each date of the month. To do this, I will group the data by date_of_month and take the mean of births within each group.\n",
+ "\n",
+ "I will then plot this table as a bar chart showing the average number of births for each date of the month. \n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "portuguese-japan",
+ "metadata": {},
+ "source": [
+ "### Results "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "id": "negative-highlight",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "date_of_month\n",
+ "1 11131.261111\n",
+ "2 11280.261111\n",
+ "3 11346.894444\n",
+ "4 11137.694444\n",
+ "5 11312.138889\n",
+ "6 11320.716667\n",
+ "7 11463.422222\n",
+ "8 11453.622222\n",
+ "9 11358.888889\n",
+ "10 11478.633333\n",
+ "11 11411.655556\n",
+ "12 11513.794444\n",
+ "13 11111.466667\n",
+ "14 11534.950000\n",
+ "15 11483.327778\n",
+ "16 11436.950000\n",
+ "17 11508.733333\n",
+ "18 11542.627778\n",
+ "19 11474.044444\n",
+ "20 11573.594444\n",
+ "21 11551.100000\n",
+ "22 11394.511111\n",
+ "23 11241.972222\n",
+ "24 11073.350000\n",
+ "25 10958.522222\n",
+ "26 11118.394444\n",
+ "27 11308.238889\n",
+ "28 11397.377778\n",
+ "29 11354.822485\n",
+ "30 11393.484848\n",
+ "31 11074.933333\n",
+ "Name: births, dtype: float64"
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "average_births = df.groupby('date_of_month')['births'].mean()\n",
+ "average_births"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "id": "victorian-burning",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import seaborn as sns \n",
+ "sns.set_theme()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "id": "affd0a87-f44b-47d9-829f-7015f815f3ba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "average_births_df = average_births.reset_index()\n",
+ "average_births_df.columns = [\"date_of_month\", \"average_births\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "id": "7ea67a24-008f-45e6-9082-96a13804b45f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 53,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sns.histplot(data=average_births_df, x=\"date_of_month\", weights=\"average_births\", bins=31)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "69a032ce-31ec-4613-8f21-d31d52d02fa2",
+ "metadata": {},
+ "source": [
+ "Looking at these results, the averages for all dates of the month fall within roughly 600 births of one another (about 10,959 to 11,574), which is not a large difference, but the 13th is on the lower end at about 11,111. Only three dates have a lower average: the 24th, the 25th, and the 31st. Overall, fewer babies are born on the 13th than on most other dates."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "collectible-puppy",
+ "metadata": {},
+ "source": [
+ "## Second Research Question: Among babies born on the 13th of the month, which day of the week is the least common on average?\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "demographic-future",
+ "metadata": {},
+ "source": [
+ "### Methods"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "incorporate-roller",
+ "metadata": {},
+ "source": [
+ "I will now use three columns to organize and show my data: date_of_month, day_of_week, and births. \n",
+ "First, I will keep only the rows where date_of_month equals 13. I can then compare the births for each day_of_week within this subset and turn the result into a bar plot that makes it easy to see which day_of_week is the most and least common. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "juvenile-creation",
+ "metadata": {},
+ "source": [
+ "### Results "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "id": "pursuant-surrey",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 False\n",
+ "1 False\n",
+ "2 False\n",
+ "3 False\n",
+ "4 False\n",
+ " ... \n",
+ "5474 False\n",
+ "5475 False\n",
+ "5476 False\n",
+ "5477 False\n",
+ "5478 False\n",
+ "Name: date_of_month, Length: 5479, dtype: bool"
+ ]
+ },
+ "execution_count": 42,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.date_of_month==13"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "id": "located-night",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sns.histplot(data=df[df.date_of_month == 13], x=\"day_of_week\", weights=\"births\", bins=7)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "97dbf4e7-d3d2-4320-be82-0b0df1e47369",
+ "metadata": {},
+ "source": [
+ "Based on the plot above, Friday is clearly not the least common day of the week for births on the 13th. It sits near the middle, which suggests that the myth of fewer children being born on Friday the 13th does not hold in this data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "infectious-symbol",
+ "metadata": {},
+ "source": [
+ "# Discussion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "furnished-camping",
+ "metadata": {
+ "code_folding": []
+ },
+ "source": [
+ "## Considerations"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bearing-stadium",
+ "metadata": {},
+ "source": [
+ "I think my results give a fairly accurate answer to my research questions because they cover every day of U.S. births over 15 years (2000 to 2014). I believe this is a large enough sample to support the conclusions drawn here. To improve on this analysis and extend the research, there are a few options:\n",
+ "1. We could look at the data over a longer period of time to see whether that changes the pattern.\n",
+ "2. We could break the data down by location to see whether the superstition is stronger in some parts of the country.\n",
+ "3. We could expand the data to other countries to see whether the superstition shows up in their births.\n",
+ "\n",
+ "Because the data cover only the United States from 2000 to 2014, the results may not generalize to other places or time periods. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "beneficial-invasion",
+ "metadata": {},
+ "source": [
+ "## Summary"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "about-raise",
+ "metadata": {},
+ "source": [
+ "The results made sense given my predictions, but they do not match the superstition surrounding Friday the 13th. It is often assumed that mothers avoid having their children on Friday the 13th because it is considered bad luck, but it is often impossible to plan a birth that precisely. Our results give no indication that births on Friday the 13th are unusually uncommon. The 13th is on the lower end among dates of the month, but only by a few hundred births. The averages by date of the month were very evenly distributed, so no one date was very different from another. In terms of day of the week, Friday the 13th fell right in the middle, with fewer births occurring on the weekends. As someone who was born on Friday the 13th, I always assumed I had a rare birth date, but I can now see that it is much the same as any other."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9cfbdbeb-48a8-4f15-a1f7-e07e9e7846c4",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "jupytext": {
+ "cell_metadata_json": true,
+ "text_representation": {
+ "extension": ".Rmd",
+ "format_name": "rmarkdown",
+ "format_version": "1.2",
+ "jupytext_version": "1.9.1"
+ }
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.5"
+ },
+ "toc": {
+ "base_numbering": 1,
+ "nav_menu": {},
+ "number_sections": false,
+ "sideBar": true,
+ "skip_h1_title": false,
+ "title_cell": "Table of Contents",
+ "title_sidebar": "Contents",
+ "toc_cell": false,
+ "toc_position": {},
+ "toc_section_display": true,
+ "toc_window_display": false
+ },
+ "varInspector": {
+ "cols": {
+ "lenName": 16,
+ "lenType": 16,
+ "lenVar": 40
+ },
+ "kernels_config": {
+ "python": {
+ "delete_cmd_postfix": "",
+ "delete_cmd_prefix": "del ",
+ "library": "var_list.py",
+ "varRefreshCmd": "print(var_dic_list())"
+ },
+ "r": {
+ "delete_cmd_postfix": ") ",
+ "delete_cmd_prefix": "rm(",
+ "library": "var_list.r",
+ "varRefreshCmd": "cat(var_dic_list()) "
+ }
+ },
+ "types_to_exclude": [
+ "module",
+ "function",
+ "builtin_function_or_method",
+ "instance",
+ "_Feature"
+ ],
+ "window_display": false
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/argument.ipynb b/argument.ipynb
index 4ed27b4..f164070 100644
--- a/argument.ipynb
+++ b/argument.ipynb
@@ -42,26 +42,23 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 34,
"id": "technical-evans",
"metadata": {},
"outputs": [],
"source": [
- "#Include any import statements you will need\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 33,
"id": "overhead-sigma",
"metadata": {},
"outputs": [],
"source": [
- "### 💻 FILL IN YOUR DATASET FILE NAME BELOW 💻 ###\n",
- "\n",
- "file_name = \"YOUR_DATASET_FILE_NAME.csv\"\n",
+ "file_name = \"US_births_2000-2014_SSA.csv\"\n",
"dataset_path = \"data/\" + file_name\n",
"\n",
"df = pd.read_csv(dataset_path)"
@@ -69,10 +66,97 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 18,
"id": "heated-blade",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
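+ "<div>\n",
+ "<style scoped>\n",
+ "    .dataframe tbody tr th:only-of-type {\n",
+ "        vertical-align: middle;\n",
+ "    }\n",
+ "\n",
+ "    .dataframe tbody tr th {\n",
+ "        vertical-align: top;\n",
+ "    }\n",
+ "\n",
+ "    .dataframe thead th {\n",
+ "        text-align: right;\n",
+ "    }\n",
+ "</style>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr style=\"text-align: right;\">\n",
+ "      <th></th>\n",
+ "      <th>year</th>\n",
+ "      <th>month</th>\n",
+ "      <th>date_of_month</th>\n",
+ "      <th>day_of_week</th>\n",
+ "      <th>births</th>\n",
+ "    </tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr>\n",
+ "      <th>0</th>\n",
+ "      <td>2000</td>\n",
+ "      <td>1</td>\n",
+ "      <td>1</td>\n",
+ "      <td>6</td>\n",
+ "      <td>9083</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>1</th>\n",
+ "      <td>2000</td>\n",
+ "      <td>1</td>\n",
+ "      <td>2</td>\n",
+ "      <td>7</td>\n",
+ "      <td>8006</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>2</th>\n",
+ "      <td>2000</td>\n",
+ "      <td>1</td>\n",
+ "      <td>3</td>\n",
+ "      <td>1</td>\n",
+ "      <td>11363</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>3</th>\n",
+ "      <td>2000</td>\n",
+ "      <td>1</td>\n",
+ "      <td>4</td>\n",
+ "      <td>2</td>\n",
+ "      <td>13032</td>\n",
+ "    </tr>\n",
+ "    <tr>\n",
+ "      <th>4</th>\n",
+ "      <td>2000</td>\n",
+ "      <td>1</td>\n",
+ "      <td>5</td>\n",
+ "      <td>3</td>\n",
+ "      <td>12558</td>\n",
+ "    </tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"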
+ ],
+ "text/plain": [
+ " year month date_of_month day_of_week births\n",
+ "0 2000 1 1 6 9083\n",
+ "1 2000 1 2 7 8006\n",
+ "2 2000 1 3 1 11363\n",
+ "3 2000 1 4 2 13032\n",
+ "4 2000 1 5 3 12558"
+ ]
+ },
+ "execution_count": 18,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
"df.head()"
]
@@ -84,7 +168,7 @@
"source": [
"**Data Overview**\n",
"\n",
- "*✏️ Write 2-3 sentences describing this dataset. Be sure to include where the data comes from and what it contains.*"
+ "*This dataset contains daily U.S. birth counts for the years 2000 to 2014, as provided by the Social Security Administration. The columns are the year, month, date of the month, day of the week, and the number of births on each day.*"
]
},
{
@@ -95,22 +179,12 @@
"# Methods and Results"
]
},
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "basic-canadian",
- "metadata": {},
- "outputs": [],
- "source": [
- "#Import any helper files you need here"
- ]
- },
{
"cell_type": "markdown",
"id": "recognized-positive",
"metadata": {},
"source": [
- "## First Research Question: [✏️ PUT YOUR QUESTION HERE ✏️]\n"
+ "## First Research Question: On average, are fewer children born on the 13th of the month?\n"
]
},
{
@@ -126,12 +200,12 @@
"id": "endless-variation",
"metadata": {},
"source": [
- "*Explain how you will approach this research question below. Consider the following:* \n",
- " - *Which aspects of the dataset will you use?* \n",
- " - *How will you reorganize/store the data?* \n",
- " - *What data science tools/functions will you use and why?* \n",
- " \n",
- "✏️ *Write your answer below:*\n",
+ "To answer this question, I will use the date_of_month and births columns. \n",
+ "\n",
+ "To reorganize the data, I will build a new table that holds the average number of births for each date of the month. To do this, I will group the data by date_of_month and take the mean of births within each group.\n",
+ "\n",
+ "I will then plot this table as a bar chart showing the average number of births for each date of the month. \n",
+ "\n",
"\n"
]
},
@@ -145,26 +219,117 @@
},
{
"cell_type": "code",
- "execution_count": 17,
+ "execution_count": 44,
"id": "negative-highlight",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "date_of_month\n",
+ "1 11131.261111\n",
+ "2 11280.261111\n",
+ "3 11346.894444\n",
+ "4 11137.694444\n",
+ "5 11312.138889\n",
+ "6 11320.716667\n",
+ "7 11463.422222\n",
+ "8 11453.622222\n",
+ "9 11358.888889\n",
+ "10 11478.633333\n",
+ "11 11411.655556\n",
+ "12 11513.794444\n",
+ "13 11111.466667\n",
+ "14 11534.950000\n",
+ "15 11483.327778\n",
+ "16 11436.950000\n",
+ "17 11508.733333\n",
+ "18 11542.627778\n",
+ "19 11474.044444\n",
+ "20 11573.594444\n",
+ "21 11551.100000\n",
+ "22 11394.511111\n",
+ "23 11241.972222\n",
+ "24 11073.350000\n",
+ "25 10958.522222\n",
+ "26 11118.394444\n",
+ "27 11308.238889\n",
+ "28 11397.377778\n",
+ "29 11354.822485\n",
+ "30 11393.484848\n",
+ "31 11074.933333\n",
+ "Name: births, dtype: float64"
+ ]
+ },
+ "execution_count": 44,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
- "#######################################################################\n",
- "### 💻 YOUR WORK GOES HERE TO ANSWER THE FIRST RESEARCH QUESTION 💻 \n",
- "### \n",
- "### Your data analysis may include a statistic and/or a data visualization\n",
- "#######################################################################"
+ "average_births = df.groupby('date_of_month')['births'].mean()\n",
+ "average_births"
]
},
{
"cell_type": "code",
- "execution_count": 16,
+ "execution_count": 45,
"id": "victorian-burning",
"metadata": {},
"outputs": [],
"source": [
- "# 💻 YOU CAN ADD NEW CELLS WITH THE \"+\" BUTTON "
+ "import seaborn as sns \n",
+ "sns.set_theme()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "id": "affd0a87-f44b-47d9-829f-7015f815f3ba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "average_births_df = average_births.reset_index()\n",
+ "average_births_df.columns = [\"date_of_month\", \"average_births\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "id": "7ea67a24-008f-45e6-9082-96a13804b45f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 53,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sns.histplot(data=average_births_df, x=\"date_of_month\", weights=\"average_births\", bins=31)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "69a032ce-31ec-4613-8f21-d31d52d02fa2",
+ "metadata": {},
+ "source": [
+ "Looking at these results, the averages for all dates of the month fall within roughly 600 births of one another (about 10,959 to 11,574), which is not a large difference, but the 13th is on the lower end at about 11,111. Only three dates have a lower average: the 24th, the 25th, and the 31st. Overall, fewer babies are born on the 13th than on most other dates."
]
},
{
@@ -172,7 +337,7 @@
"id": "collectible-puppy",
"metadata": {},
"source": [
- "## Second Research Question: [✏️ PUT YOUR QUESTION HERE ✏️]\n"
+ "## Second Research Question: Among babies born on the 13th of the month, which day of the week is the least common on average?\n"
]
},
{
@@ -188,12 +353,8 @@
"id": "incorporate-roller",
"metadata": {},
"source": [
- "*Explain how you will approach this research question below. Consider the following:* \n",
- " - *Which aspects of the dataset will you use?* \n",
- " - *How will you reorganize/store the data?* \n",
- " - *What data science tools/functions will you use and why?* \n",
- "\n",
- "✏️ *Write your answer below:*\n"
+ "I will now use three columns to organize and show my data: date_of_month, day_of_week, and births. \n",
+ "First, I will keep only the rows where date_of_month equals 13. I can then compare the births for each day_of_week within this subset and turn the result into a bar plot that makes it easy to see which day_of_week is the most and least common. "
]
},
{
@@ -206,26 +367,258 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 42,
"id": "pursuant-surrey",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 False\n",
+ "1 False\n",
+ "2 False\n",
+ "3 False\n",
+ "4 False\n",
+ " ... \n",
+ "5474 False\n",
+ "5475 False\n",
+ "5476 False\n",
+ "5477 False\n",
+ "5478 False\n",
+ "Name: date_of_month, Length: 5479, dtype: bool"
+ ]
+ },
+ "execution_count": 42,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
"source": [
- "#######################################################################\n",
- "### 💻 YOUR WORK GOES HERE TO ANSWER THE SECOND RESEARCH QUESTION 💻 \n",
- "###\n",
- "### Your data analysis may include a statistic and/or a data visualization\n",
- "#######################################################################"
+ "df.date_of_month==13"
]
},
{
"cell_type": "code",
- "execution_count": 15,
+ "execution_count": 48,
"id": "located-night",
"metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "