{
"cells": [
{
"cell_type": "markdown",
"id": "4647d855",
"metadata": {},
"source": [
"# Lab 04: Data Science Tools\n",
"\n",
"## 0. Intro to Jupyter Notebooks\n",
"\n",
"Welcome to your first Jupyter notebook! Notebooks are made up of cells. Some cells contain text (like this one) and others contain Python code.\n",
"\n",
"Each cell can be in two different modes: editing or running. To edit a cell, double-click on it. When you're done editing, press **Shift+Enter** to run it. You can use [Markdown](https://www.markdownguide.org/cheat-sheet/) to add basic formatting to the text. Before you go on, try editing the text in this cell. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "355492e0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"50"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Other cells are code cells, containing Python code. (This is a comment, of course!)\n",
"# Try running this cell (again, shift+Enter). You'll see the result of the final statement \n",
"# printed below the cell. \n",
"# Then try changing the Python code and re-run it.\n",
"\n",
"30+20"
]
},
{
"cell_type": "markdown",
"id": "03fc2d21",
"metadata": {},
"source": [
"## 0.1 Cells share state\n",
"\n",
"Even though code cells run one at a time, anything that happens in a cell (like declaring a variable or running a function) affects the whole notebook. Try running these two cells a few times, in different orders. What happens when you run **Cell B** over and over?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "76cfb7a4",
"metadata": {},
"outputs": [],
"source": [
"# Cell A\n",
"x = 10\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6de0894c",
"metadata": {},
"outputs": [],
"source": [
"# Cell B\n",
"x = x * 2\n",
"x"
]
},
{
"cell_type": "markdown",
"id": "36e850fd",
"metadata": {},
"source": [
"## 0.2 Saving your work\n",
"\n",
"When you finish working on a notebook, save your work using the top left icon in the menu bar above. Your notebook is stored in the file `lab_04.ipynb` in the lab directory. You can commit your changes to `ipynb` files just like any other file. Once you finish with Jupyter, you can stop the server by **Ctrl + C** in the terminal. \n",
"\n",
"*If you're doing this lab on a cloud-based platform like Binder, then you can't save your work. So don't close the tab!*"
]
},
{
"cell_type": "markdown",
"id": "88057cce",
"metadata": {},
"source": [
"---\n",
"\n",
"## 1. Pandas\n",
"\n",
"Pandas is probably the most important Python library for data science. Pandas provides an object called a **DataFrame**, which is basically a table with rows and columns. Most of the time, you will load data into Pandas using a `.csv` file. CSV files can be exported from Excel or Google Sheets, and are a common format for public data sets. \n",
"\n",
"In this lab, we'll be working with two data sets: The first contains Pokémon characteristics and the second comes from a wide-scale survey conducted by the US Centers for Disease Control ([details](https://www.cdc.gov/brfss/annual_data/annual_2020.html)). We will demonstrate techniques with Pokémon; your job is to replicate these tasks with the CDC dataset. \n",
"\n",
"**Note:** Pandas has *extensive* capabilities, and there's no way we could possibly present them all here. If you have a clearly-formed idea of what you want to do with tabular data, there's a way to do it. This lab introduces *some* of what Pandas can do, but expect to spend time reading the documentation and Stack Overflow when you start working on new tasks. \n",
"\n",
"## 1.0 Getting started\n",
"\n",
"First, we'll import pandas (using the conventional variable name `pd`) and load the two datasets. *Run these cells and every code cell you encounter in this notebook.*"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "43f4949a",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd \n",
"pokemon = pd.read_csv(\"pokemon.csv\")\n",
"people = pd.read_csv(\"brfss_2020.csv\")\n",
"#find another csv file!"
]
},
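{
"cell_type": "markdown",
"id": "sketch-read-csv-md",
"metadata": {},
"source": [
"As an optional aside, `pd.read_csv` accepts not only a file path but also any file-like object. The sketch below parses a tiny CSV held in a string through Python's standard `io.StringIO`, which is a convenient way to experiment without creating files. The three rows are illustrative, and `csv_text` and `toy` are hypothetical names used only here. Notice that the `True`/`False` text is parsed as a boolean column."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-read-csv-code",
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# A tiny CSV held in a string (illustrative rows; only a few pokemon.csv columns).\n",
"csv_text = \"\"\"name,type,speed,legendary\n",
"Bulbasaur,Grass,45,False\n",
"Articuno,Ice,85,True\n",
"Jynx,Ice,95,False\n",
"\"\"\"\n",
"\n",
"# read_csv treats the StringIO object like an open file.\n",
"toy = pd.read_csv(io.StringIO(csv_text))\n",
"toy.dtypes"
]
},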
{
"cell_type": "markdown",
"id": "5179cb62",
"metadata": {},
"source": [
"## 1.1 A first look\n",
"\n",
"#### Demo\n",
"\n",
"Let's start by learning the *shape* of the data. How many columns are there? How many rows? What kinds of data are included?"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "420d195a",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" name \n",
" type \n",
" subtype \n",
" total \n",
" hp \n",
" attack \n",
" defense \n",
" special_attack \n",
" special_defense \n",
" speed \n",
" generation \n",
" legendary \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" Bulbasaur \n",
" Grass \n",
" Poison \n",
" 318 \n",
" 45 \n",
" 49 \n",
" 49 \n",
" 65 \n",
" 65 \n",
" 45 \n",
" 1 \n",
" False \n",
" \n",
" \n",
" 1 \n",
" Ivysaur \n",
" Grass \n",
" Poison \n",
" 405 \n",
" 60 \n",
" 62 \n",
" 63 \n",
" 80 \n",
" 80 \n",
" 60 \n",
" 1 \n",
" False \n",
" \n",
" \n",
" 2 \n",
" Venusaur \n",
" Grass \n",
" Poison \n",
" 525 \n",
" 80 \n",
" 82 \n",
" 83 \n",
" 100 \n",
" 100 \n",
" 80 \n",
" 1 \n",
" False \n",
" \n",
" \n",
" 3 \n",
" VenusaurMega Venusaur \n",
" Grass \n",
" Poison \n",
" 625 \n",
" 80 \n",
" 100 \n",
" 123 \n",
" 122 \n",
" 120 \n",
" 80 \n",
" 1 \n",
" False \n",
" \n",
" \n",
" 4 \n",
" Charmander \n",
" Fire \n",
" NaN \n",
" 309 \n",
" 39 \n",
" 52 \n",
" 43 \n",
" 60 \n",
" 50 \n",
" 65 \n",
" 1 \n",
" False \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 795 \n",
" Diancie \n",
" Rock \n",
" Fairy \n",
" 600 \n",
" 50 \n",
" 100 \n",
" 150 \n",
" 100 \n",
" 150 \n",
" 50 \n",
" 6 \n",
" True \n",
" \n",
" \n",
" 796 \n",
" DiancieMega Diancie \n",
" Rock \n",
" Fairy \n",
" 700 \n",
" 50 \n",
" 160 \n",
" 110 \n",
" 160 \n",
" 110 \n",
" 110 \n",
" 6 \n",
" True \n",
" \n",
" \n",
" 797 \n",
" HoopaHoopa Confined \n",
" Psychic \n",
" Ghost \n",
" 600 \n",
" 80 \n",
" 110 \n",
" 60 \n",
" 150 \n",
" 130 \n",
" 70 \n",
" 6 \n",
" True \n",
" \n",
" \n",
" 798 \n",
" HoopaHoopa Unbound \n",
" Psychic \n",
" Dark \n",
" 680 \n",
" 80 \n",
" 160 \n",
" 60 \n",
" 170 \n",
" 130 \n",
" 80 \n",
" 6 \n",
" True \n",
" \n",
" \n",
" 799 \n",
" Volcanion \n",
" Fire \n",
" Water \n",
" 600 \n",
" 80 \n",
" 110 \n",
" 120 \n",
" 130 \n",
" 90 \n",
" 70 \n",
" 6 \n",
" True \n",
" \n",
" \n",
"
\n",
"
800 rows × 12 columns
\n",
"
"
],
"text/plain": [
" name type subtype total hp attack defense \\\n",
"0 Bulbasaur Grass Poison 318 45 49 49 \n",
"1 Ivysaur Grass Poison 405 60 62 63 \n",
"2 Venusaur Grass Poison 525 80 82 83 \n",
"3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n",
"4 Charmander Fire NaN 309 39 52 43 \n",
".. ... ... ... ... .. ... ... \n",
"795 Diancie Rock Fairy 600 50 100 150 \n",
"796 DiancieMega Diancie Rock Fairy 700 50 160 110 \n",
"797 HoopaHoopa Confined Psychic Ghost 600 80 110 60 \n",
"798 HoopaHoopa Unbound Psychic Dark 680 80 160 60 \n",
"799 Volcanion Fire Water 600 80 110 120 \n",
"\n",
" special_attack special_defense speed generation legendary \n",
"0 65 65 45 1 False \n",
"1 80 80 60 1 False \n",
"2 100 100 80 1 False \n",
"3 122 120 80 1 False \n",
"4 60 50 65 1 False \n",
".. ... ... ... ... ... \n",
"795 100 150 50 6 True \n",
"796 160 110 110 6 True \n",
"797 150 130 70 6 True \n",
"798 170 130 80 6 True \n",
"799 130 90 70 6 True \n",
"\n",
"[800 rows x 12 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pokemon"
]
},
{
"cell_type": "markdown",
"id": "2cb67dff",
"metadata": {},
"source": [
"OK, 800 Pokémon, with 12 columns for each. And you can see all the columns. Not all the data is shown in this preview, of course. If there were more columns than could be displayed, you could see them all by typing `pokemon.columns`. \n",
"\n",
"---\n",
"\n",
"#### Your turn!\n",
"\n",
"Now do the same for your data set, `people`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dab2c1e7",
"metadata": {},
"outputs": [],
"source": [
"#Your code here"
]
},
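{
"cell_type": "markdown",
"id": "sketch-shape-md",
"metadata": {},
"source": [
"Typing a DataFrame's name shows a preview. A few attributes answer the shape questions directly: `.shape` is a `(rows, columns)` tuple, `.columns` holds the column labels, and `.dtypes` lists each column's data type. Here is a minimal sketch on a small hypothetical DataFrame named `toy`; the same attributes should work on `pokemon` and `people`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-shape-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical mini-table; the column names echo the pokemon data set.\n",
"toy = pd.DataFrame({\n",
"    \"name\": [\"Bulbasaur\", \"Charmander\", \"Articuno\"],\n",
"    \"hp\": [45, 39, 90],\n",
"    \"legendary\": [False, False, True],\n",
"})\n",
"\n",
"print(toy.shape)          # (number of rows, number of columns)\n",
"print(list(toy.columns))  # column labels\n",
"print(toy.dtypes)         # data type of each column"
]
},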
{
"cell_type": "markdown",
"id": "9bbf59e9",
"metadata": {},
"source": [
"---\n",
"## 1.2 Descriptive Statistics\n",
"\n",
"#### Demo\n",
"\n",
"Let's get a sense of the data contained in some of the columns. For categorical data like `generation`, it makes sense to look at value counts--showing us how many of each category there are. You can use the optional keyword `normalize=True` to see percentage of total instead of frequencies. You can put the optional keyword `normalize=True` in the () to see percentage of total instead of frequencies."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "7707f6f2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 166\n",
"5 165\n",
"3 160\n",
"4 121\n",
"2 106\n",
"6 82\n",
"Name: generation, dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pokemon.generation.value_counts()"
]
},
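{
"cell_type": "markdown",
"id": "sketch-value-counts-md",
"metadata": {},
"source": [
"To see what `normalize=True` changes, here is a sketch on a small, made-up series (not the real generation data): the same call returns counts by default and fractions of the total with `normalize=True`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-value-counts-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up generation labels for six rows (not the real counts).\n",
"generation = pd.Series([1, 1, 1, 2, 2, 3], name=\"generation\")\n",
"\n",
"counts = generation.value_counts()                # number of rows in each category\n",
"shares = generation.value_counts(normalize=True)  # fraction of all rows in each category\n",
"\n",
"print(counts)\n",
"print(shares)\n",
"print(shares * 100)  # multiply by 100 to read the fractions as percentages"
]
},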
{
"cell_type": "markdown",
"id": "421adbdd",
"metadata": {},
"source": [
"---\n",
"#### Your turn!\n",
"\n",
"**1.2.0.** In this survey, people are grouped into age bands of 18, 25, 35, 45, 55, and 65. Using the people survey, what percentage of people are in each age band? (When we talk about \"people\" in this lab, we're referring to the people who responded to the survey, not the whole US population.)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ccbc1a21",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"65 0.336326\n",
"55 0.206369\n",
"45 0.157669\n",
"35 0.135527\n",
"25 0.108866\n",
"18 0.055244\n",
"Name: age, dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Hint: pokemon.generation.value_counts()\n",
"people.age.value_counts(normalize=True)"
]
},
{
"cell_type": "markdown",
"id": "1c120fcc",
"metadata": {},
"source": [
"**1.2.1.** The `exercise` column indicates whether a person has done any physical activity or exercise in the last 30 days, outside of work. What percentage of people have done exercise?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b13a99c2",
"metadata": {},
"outputs": [],
"source": [
"#Hint: pokemon.generation.value_counts()\n",
"test3 = people.exercise.value_counts(True)\n",
"test3"
]
},
{
"cell_type": "markdown",
"id": "cec02e11",
"metadata": {},
"source": [
"--- \n",
"#### Demo\n",
"For numeric data, we could start by looking at the mean value. We can select multiple columns and get all the column means at once."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58aa56db",
"metadata": {},
"outputs": [],
"source": [
"pokemon[[\"hp\", \"attack\", \"defense\", \"speed\"]].mean()"
]
},
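{
"cell_type": "markdown",
"id": "sketch-column-means-md",
"metadata": {},
"source": [
"A note on brackets, sketched on a hypothetical DataFrame named `stats`: a single column name in single brackets returns a series, while a list of names (double brackets) returns a smaller dataframe. Calling `.mean()` on that dataframe gives one value per column. In recent pandas versions, calling `.mean()` on a dataframe that still contains text columns like `name` raises an error, so selecting the numeric columns first avoids that."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-column-means-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical numbers for three made-up rows.\n",
"stats = pd.DataFrame({\n",
"    \"name\": [\"a\", \"b\", \"c\"],\n",
"    \"hp\": [40, 60, 80],\n",
"    \"attack\": [50, 70, 90],\n",
"})\n",
"\n",
"single = stats[\"hp\"]              # single brackets: one column -> Series\n",
"subset = stats[[\"hp\", \"attack\"]]  # double brackets: list of columns -> DataFrame\n",
"means = subset.mean()              # Series with one mean per selected column\n",
"\n",
"print(type(single).__name__, type(subset).__name__)\n",
"means"
]
},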
{
"cell_type": "markdown",
"id": "a8a08607",
"metadata": {},
"source": [
"We can also compute the mean of boolean data. In this case, True will map to 1 and False will map to 0. So the mean value equals the percentage of data which is True. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d165f28",
"metadata": {},
"outputs": [],
"source": [
"pokemon.legendary.mean()"
]
},
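{
"cell_type": "markdown",
"id": "sketch-bool-mean-md",
"metadata": {},
"source": [
"A quick check of the claim above on a hypothetical boolean series `flags`: its mean equals the number of `True` values divided by the number of values."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-bool-mean-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical flags: one of four values is True.\n",
"flags = pd.Series([True, False, False, False])\n",
"\n",
"share_true = flags.mean()                # True counts as 1, False as 0\n",
"count_true = flags.sum()                 # number of True values\n",
"share_by_hand = count_true / len(flags)  # fraction of values that are True\n",
"\n",
"print(share_true, share_by_hand)"
]
},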
{
"cell_type": "markdown",
"id": "2e64d3a1",
"metadata": {},
"source": [
"---\n",
"#### Your turn!\n",
"**1.2.3.** Using the people survey, What are the mean height and weight of people in this survey?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c9338ff0",
"metadata": {},
"outputs": [],
"source": [
"#Hint: pokemon[[\"hp\", \"attack\", \"defense\", \"speed\"]].mean()\n",
"people[['height', 'weight']].mean()"
]
},
{
"cell_type": "markdown",
"id": "8b7738ef",
"metadata": {},
"source": [
"---\n",
"## 1.3 Filtering\n",
"\n",
"Sometimes we're just interested in a selection of the data set. The way to do this is to create a boolean series, and then use this to select which rows you want to include. Vocabulary note: A dataframe is two-dimensional, with rows and columns. A series (a single row or a single column) is one-dimensional. \n",
"\n",
"#### Demo\n",
"`pokemon.legendary` is already boolean, so we can use this to select just the legendary pokémon. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "2b1b1c85",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" name \n",
" type \n",
" subtype \n",
" total \n",
" hp \n",
" attack \n",
" defense \n",
" special_attack \n",
" special_defense \n",
" speed \n",
" generation \n",
" legendary \n",
" \n",
" \n",
" \n",
" \n",
" 156 \n",
" Articuno \n",
" Ice \n",
" Flying \n",
" 580 \n",
" 90 \n",
" 85 \n",
" 100 \n",
" 95 \n",
" 125 \n",
" 85 \n",
" 1 \n",
" True \n",
" \n",
" \n",
" 157 \n",
" Zapdos \n",
" Electric \n",
" Flying \n",
" 580 \n",
" 90 \n",
" 90 \n",
" 85 \n",
" 125 \n",
" 90 \n",
" 100 \n",
" 1 \n",
" True \n",
" \n",
" \n",
" 158 \n",
" Moltres \n",
" Fire \n",
" Flying \n",
" 580 \n",
" 90 \n",
" 100 \n",
" 90 \n",
" 125 \n",
" 85 \n",
" 90 \n",
" 1 \n",
" True \n",
" \n",
" \n",
" 162 \n",
" Mewtwo \n",
" Psychic \n",
" NaN \n",
" 680 \n",
" 106 \n",
" 110 \n",
" 90 \n",
" 154 \n",
" 90 \n",
" 130 \n",
" 1 \n",
" True \n",
" \n",
" \n",
" 163 \n",
" MewtwoMega Mewtwo X \n",
" Psychic \n",
" Fighting \n",
" 780 \n",
" 106 \n",
" 190 \n",
" 100 \n",
" 154 \n",
" 100 \n",
" 130 \n",
" 1 \n",
" True \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 795 \n",
" Diancie \n",
" Rock \n",
" Fairy \n",
" 600 \n",
" 50 \n",
" 100 \n",
" 150 \n",
" 100 \n",
" 150 \n",
" 50 \n",
" 6 \n",
" True \n",
" \n",
" \n",
" 796 \n",
" DiancieMega Diancie \n",
" Rock \n",
" Fairy \n",
" 700 \n",
" 50 \n",
" 160 \n",
" 110 \n",
" 160 \n",
" 110 \n",
" 110 \n",
" 6 \n",
" True \n",
" \n",
" \n",
" 797 \n",
" HoopaHoopa Confined \n",
" Psychic \n",
" Ghost \n",
" 600 \n",
" 80 \n",
" 110 \n",
" 60 \n",
" 150 \n",
" 130 \n",
" 70 \n",
" 6 \n",
" True \n",
" \n",
" \n",
" 798 \n",
" HoopaHoopa Unbound \n",
" Psychic \n",
" Dark \n",
" 680 \n",
" 80 \n",
" 160 \n",
" 60 \n",
" 170 \n",
" 130 \n",
" 80 \n",
" 6 \n",
" True \n",
" \n",
" \n",
" 799 \n",
" Volcanion \n",
" Fire \n",
" Water \n",
" 600 \n",
" 80 \n",
" 110 \n",
" 120 \n",
" 130 \n",
" 90 \n",
" 70 \n",
" 6 \n",
" True \n",
" \n",
" \n",
"
\n",
"
65 rows × 12 columns
\n",
"
"
],
"text/plain": [
" name type subtype total hp attack defense \\\n",
"156 Articuno Ice Flying 580 90 85 100 \n",
"157 Zapdos Electric Flying 580 90 90 85 \n",
"158 Moltres Fire Flying 580 90 100 90 \n",
"162 Mewtwo Psychic NaN 680 106 110 90 \n",
"163 MewtwoMega Mewtwo X Psychic Fighting 780 106 190 100 \n",
".. ... ... ... ... ... ... ... \n",
"795 Diancie Rock Fairy 600 50 100 150 \n",
"796 DiancieMega Diancie Rock Fairy 700 50 160 110 \n",
"797 HoopaHoopa Confined Psychic Ghost 600 80 110 60 \n",
"798 HoopaHoopa Unbound Psychic Dark 680 80 160 60 \n",
"799 Volcanion Fire Water 600 80 110 120 \n",
"\n",
" special_attack special_defense speed generation legendary \n",
"156 95 125 85 1 True \n",
"157 125 90 100 1 True \n",
"158 125 85 90 1 True \n",
"162 154 90 130 1 True \n",
"163 154 100 130 1 True \n",
".. ... ... ... ... ... \n",
"795 100 150 50 6 True \n",
"796 160 110 110 6 True \n",
"797 150 130 70 6 True \n",
"798 170 130 80 6 True \n",
"799 130 90 70 6 True \n",
"\n",
"[65 rows x 12 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"legendary = pokemon[pokemon.legendary]\n",
"legendary"
]
},
{
"cell_type": "markdown",
"id": "7ece74db",
"metadata": {},
"source": [
"Let's get all the ice pokémon. We can create a boolean series from another series..."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "d3ffce2d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
" ... \n",
"795 False\n",
"796 False\n",
"797 False\n",
"798 False\n",
"799 False\n",
"Name: type, Length: 800, dtype: bool"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pokemon.type == \"Ice\""
]
},
{
"cell_type": "markdown",
"id": "53210011",
"metadata": {},
"source": [
"And then use this series to select just the ice pokémon. "
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "0572dd7d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" name \n",
" type \n",
" subtype \n",
" total \n",
" hp \n",
" attack \n",
" defense \n",
" special_attack \n",
" special_defense \n",
" speed \n",
" generation \n",
" legendary \n",
" \n",
" \n",
" \n",
" \n",
" 133 \n",
" Jynx \n",
" Ice \n",
" Psychic \n",
" 455 \n",
" 65 \n",
" 50 \n",
" 35 \n",
" 115 \n",
" 95 \n",
" 95 \n",
" 1 \n",
" False \n",
" \n",
" \n",
" 156 \n",
" Articuno \n",
" Ice \n",
" Flying \n",
" 580 \n",
" 90 \n",
" 85 \n",
" 100 \n",
" 95 \n",
" 125 \n",
" 85 \n",
" 1 \n",
" True \n",
" \n",
" \n",
" 238 \n",
" Swinub \n",
" Ice \n",
" Ground \n",
" 250 \n",
" 50 \n",
" 50 \n",
" 40 \n",
" 30 \n",
" 30 \n",
" 50 \n",
" 2 \n",
" False \n",
" \n",
" \n",
" 239 \n",
" Piloswine \n",
" Ice \n",
" Ground \n",
" 450 \n",
" 100 \n",
" 100 \n",
" 80 \n",
" 60 \n",
" 60 \n",
" 50 \n",
" 2 \n",
" False \n",
" \n",
" \n",
" 243 \n",
" Delibird \n",
" Ice \n",
" Flying \n",
" 330 \n",
" 45 \n",
" 55 \n",
" 45 \n",
" 65 \n",
" 45 \n",
" 75 \n",
" 2 \n",
" False \n",
" \n",
" \n",
" 257 \n",
" Smoochum \n",
" Ice \n",
" Psychic \n",
" 305 \n",
" 45 \n",
" 30 \n",
" 15 \n",
" 85 \n",
" 65 \n",
" 65 \n",
" 2 \n",
" False \n",
" \n",
" \n",
" 395 \n",
" Snorunt \n",
" Ice \n",
" NaN \n",
" 300 \n",
" 50 \n",
" 50 \n",
" 50 \n",
" 50 \n",
" 50 \n",
" 50 \n",
" 3 \n",
" False \n",
" \n",
" \n",
" 396 \n",
" Glalie \n",
" Ice \n",
" NaN \n",
" 480 \n",
" 80 \n",
" 80 \n",
" 80 \n",
" 80 \n",
" 80 \n",
" 80 \n",
" 3 \n",
" False \n",
" \n",
" \n",
" 397 \n",
" GlalieMega Glalie \n",
" Ice \n",
" NaN \n",
" 580 \n",
" 80 \n",
" 120 \n",
" 80 \n",
" 120 \n",
" 80 \n",
" 100 \n",
" 3 \n",
" False \n",
" \n",
" \n",
" 398 \n",
" Spheal \n",
" Ice \n",
" Water \n",
" 290 \n",
" 70 \n",
" 40 \n",
" 50 \n",
" 55 \n",
" 50 \n",
" 25 \n",
" 3 \n",
" False \n",
" \n",
" \n",
" 399 \n",
" Sealeo \n",
" Ice \n",
" Water \n",
" 410 \n",
" 90 \n",
" 60 \n",
" 70 \n",
" 75 \n",
" 70 \n",
" 45 \n",
" 3 \n",
" False \n",
" \n",
" \n",
" 400 \n",
" Walrein \n",
" Ice \n",
" Water \n",
" 530 \n",
" 110 \n",
" 80 \n",
" 90 \n",
" 95 \n",
" 90 \n",
" 65 \n",
" 3 \n",
" False \n",
" \n",
" \n",
" 415 \n",
" Regice \n",
" Ice \n",
" NaN \n",
" 580 \n",
" 80 \n",
" 50 \n",
" 100 \n",
" 100 \n",
" 200 \n",
" 50 \n",
" 3 \n",
" True \n",
" \n",
" \n",
" 522 \n",
" Glaceon \n",
" Ice \n",
" NaN \n",
" 525 \n",
" 65 \n",
" 60 \n",
" 110 \n",
" 130 \n",
" 95 \n",
" 65 \n",
" 4 \n",
" False \n",
" \n",
" \n",
" 524 \n",
" Mamoswine \n",
" Ice \n",
" Ground \n",
" 530 \n",
" 110 \n",
" 130 \n",
" 80 \n",
" 70 \n",
" 60 \n",
" 80 \n",
" 4 \n",
" False \n",
" \n",
" \n",
" 530 \n",
" Froslass \n",
" Ice \n",
" Ghost \n",
" 480 \n",
" 70 \n",
" 80 \n",
" 70 \n",
" 80 \n",
" 70 \n",
" 110 \n",
" 4 \n",
" False \n",
" \n",
" \n",
" 643 \n",
" Vanillite \n",
" Ice \n",
" NaN \n",
" 305 \n",
" 36 \n",
" 50 \n",
" 50 \n",
" 65 \n",
" 60 \n",
" 44 \n",
" 5 \n",
" False \n",
" \n",
" \n",
" 644 \n",
" Vanillish \n",
" Ice \n",
" NaN \n",
" 395 \n",
" 51 \n",
" 65 \n",
" 65 \n",
" 80 \n",
" 75 \n",
" 59 \n",
" 5 \n",
" False \n",
" \n",
" \n",
" 645 \n",
" Vanilluxe \n",
" Ice \n",
" NaN \n",
" 535 \n",
" 71 \n",
" 95 \n",
" 85 \n",
" 110 \n",
" 95 \n",
" 79 \n",
" 5 \n",
" False \n",
" \n",
" \n",
" 674 \n",
" Cubchoo \n",
" Ice \n",
" NaN \n",
" 305 \n",
" 55 \n",
" 70 \n",
" 40 \n",
" 60 \n",
" 40 \n",
" 40 \n",
" 5 \n",
" False \n",
" \n",
" \n",
" 675 \n",
" Beartic \n",
" Ice \n",
" NaN \n",
" 485 \n",
" 95 \n",
" 110 \n",
" 80 \n",
" 70 \n",
" 80 \n",
" 50 \n",
" 5 \n",
" False \n",
" \n",
" \n",
" 676 \n",
" Cryogonal \n",
" Ice \n",
" NaN \n",
" 485 \n",
" 70 \n",
" 50 \n",
" 30 \n",
" 95 \n",
" 135 \n",
" 105 \n",
" 5 \n",
" False \n",
" \n",
" \n",
" 788 \n",
" Bergmite \n",
" Ice \n",
" NaN \n",
" 304 \n",
" 55 \n",
" 69 \n",
" 85 \n",
" 32 \n",
" 35 \n",
" 28 \n",
" 6 \n",
" False \n",
" \n",
" \n",
" 789 \n",
" Avalugg \n",
" Ice \n",
" NaN \n",
" 514 \n",
" 95 \n",
" 117 \n",
" 184 \n",
" 44 \n",
" 46 \n",
" 28 \n",
" 6 \n",
" False \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name type subtype total hp attack defense \\\n",
"133 Jynx Ice Psychic 455 65 50 35 \n",
"156 Articuno Ice Flying 580 90 85 100 \n",
"238 Swinub Ice Ground 250 50 50 40 \n",
"239 Piloswine Ice Ground 450 100 100 80 \n",
"243 Delibird Ice Flying 330 45 55 45 \n",
"257 Smoochum Ice Psychic 305 45 30 15 \n",
"395 Snorunt Ice NaN 300 50 50 50 \n",
"396 Glalie Ice NaN 480 80 80 80 \n",
"397 GlalieMega Glalie Ice NaN 580 80 120 80 \n",
"398 Spheal Ice Water 290 70 40 50 \n",
"399 Sealeo Ice Water 410 90 60 70 \n",
"400 Walrein Ice Water 530 110 80 90 \n",
"415 Regice Ice NaN 580 80 50 100 \n",
"522 Glaceon Ice NaN 525 65 60 110 \n",
"524 Mamoswine Ice Ground 530 110 130 80 \n",
"530 Froslass Ice Ghost 480 70 80 70 \n",
"643 Vanillite Ice NaN 305 36 50 50 \n",
"644 Vanillish Ice NaN 395 51 65 65 \n",
"645 Vanilluxe Ice NaN 535 71 95 85 \n",
"674 Cubchoo Ice NaN 305 55 70 40 \n",
"675 Beartic Ice NaN 485 95 110 80 \n",
"676 Cryogonal Ice NaN 485 70 50 30 \n",
"788 Bergmite Ice NaN 304 55 69 85 \n",
"789 Avalugg Ice NaN 514 95 117 184 \n",
"\n",
" special_attack special_defense speed generation legendary \n",
"133 115 95 95 1 False \n",
"156 95 125 85 1 True \n",
"238 30 30 50 2 False \n",
"239 60 60 50 2 False \n",
"243 65 45 75 2 False \n",
"257 85 65 65 2 False \n",
"395 50 50 50 3 False \n",
"396 80 80 80 3 False \n",
"397 120 80 100 3 False \n",
"398 55 50 25 3 False \n",
"399 75 70 45 3 False \n",
"400 95 90 65 3 False \n",
"415 100 200 50 3 True \n",
"522 130 95 65 4 False \n",
"524 70 60 80 4 False \n",
"530 80 70 110 4 False \n",
"643 65 60 44 5 False \n",
"644 80 75 59 5 False \n",
"645 110 95 79 5 False \n",
"674 60 40 40 5 False \n",
"675 70 80 50 5 False \n",
"676 95 135 105 5 False \n",
"788 32 35 28 6 False \n",
"789 44 46 28 6 False "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ice = pokemon[pokemon.type == \"Ice\"]\n",
"ice"
]
},
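{
"cell_type": "markdown",
"id": "sketch-mask-count-md",
"metadata": {},
"source": [
"Because a boolean mask is itself a series, you can count the matching rows with `.sum()` without building the filtered dataframe. The sketch below uses a few illustrative rows (not the real data) in a hypothetical `toy` dataframe. Notice that filtering keeps the original row labels, which is why the index in the ice output above starts at 133 rather than 0."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-mask-count-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Illustrative rows, not the real data set.\n",
"toy = pd.DataFrame({\n",
"    \"name\": [\"Jynx\", \"Charmander\", \"Articuno\", \"Squirtle\"],\n",
"    \"type\": [\"Ice\", \"Fire\", \"Ice\", \"Water\"],\n",
"})\n",
"\n",
"is_ice = toy.type == \"Ice\"     # boolean mask: one True/False per row\n",
"ice_rows = toy[is_ice]          # keep only the rows where the mask is True\n",
"\n",
"print(is_ice.sum())             # True counts as 1, so the sum is the number of matches\n",
"print(len(ice_rows))            # same number, from the filtered DataFrame\n",
"print(ice_rows.index.tolist())  # original row labels are kept"
]
},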
{
"cell_type": "markdown",
"id": "c01f16a5",
"metadata": {},
"source": [
"---\n",
"#### Your turn!\n",
"\n",
"**1.3.0.** `no_doctor` indicates whether there was a time in the last year when the person needed to see a doctor, but could not afford to do so. Create a dataframe containing only these people. "
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "e6f35a92",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" age \n",
" sex \n",
" income \n",
" education \n",
" sexual_orientation \n",
" height \n",
" weight \n",
" health \n",
" no_doctor \n",
" exercise \n",
" sleep \n",
" \n",
" \n",
" \n",
" \n",
" 0 \n",
" 55 \n",
" female \n",
" 5 \n",
" 2 \n",
" other \n",
" 1.55 \n",
" 83.01 \n",
" 2 \n",
" True \n",
" True \n",
" 7 \n",
" \n",
" \n",
" 2 \n",
" 35 \n",
" female \n",
" 8 \n",
" 4 \n",
" heterosexual \n",
" 1.65 \n",
" 77.11 \n",
" 4 \n",
" True \n",
" True \n",
" 7 \n",
" \n",
" \n",
" 24 \n",
" 35 \n",
" male \n",
" 8 \n",
" 3 \n",
" heterosexual \n",
" 1.73 \n",
" 94.35 \n",
" 4 \n",
" True \n",
" False \n",
" 8 \n",
" \n",
" \n",
" 50 \n",
" 35 \n",
" female \n",
" 4 \n",
" 2 \n",
" heterosexual \n",
" 1.78 \n",
" 81.65 \n",
" 4 \n",
" True \n",
" False \n",
" 10 \n",
" \n",
" \n",
" 66 \n",
" 45 \n",
" female \n",
" 6 \n",
" 4 \n",
" heterosexual \n",
" 1.57 \n",
" 72.57 \n",
" 4 \n",
" True \n",
" True \n",
" 7 \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 166407 \n",
" 18 \n",
" male \n",
" 5 \n",
" 2 \n",
" heterosexual \n",
" 1.68 \n",
" 68.04 \n",
" 3 \n",
" True \n",
" True \n",
" 8 \n",
" \n",
" \n",
" 166409 \n",
" 25 \n",
" male \n",
" 6 \n",
" 2 \n",
" heterosexual \n",
" 1.57 \n",
" 58.51 \n",
" 4 \n",
" True \n",
" False \n",
" 7 \n",
" \n",
" \n",
" 166414 \n",
" 55 \n",
" female \n",
" 8 \n",
" 3 \n",
" heterosexual \n",
" 1.63 \n",
" 88.45 \n",
" 3 \n",
" True \n",
" False \n",
" 6 \n",
" \n",
" \n",
" 166416 \n",
" 65 \n",
" female \n",
" 5 \n",
" 2 \n",
" heterosexual \n",
" 1.50 \n",
" 55.34 \n",
" 3 \n",
" True \n",
" False \n",
" 6 \n",
" \n",
" \n",
" 166423 \n",
" 35 \n",
" female \n",
" 5 \n",
" 4 \n",
" heterosexual \n",
" 1.60 \n",
" 68.04 \n",
" 4 \n",
" True \n",
" True \n",
" 6 \n",
" \n",
" \n",
"
\n",
"
13784 rows × 11 columns
\n",
"
"
],
"text/plain": [
" age sex income education sexual_orientation height weight \\\n",
"0 55 female 5 2 other 1.55 83.01 \n",
"2 35 female 8 4 heterosexual 1.65 77.11 \n",
"24 35 male 8 3 heterosexual 1.73 94.35 \n",
"50 35 female 4 2 heterosexual 1.78 81.65 \n",
"66 45 female 6 4 heterosexual 1.57 72.57 \n",
"... ... ... ... ... ... ... ... \n",
"166407 18 male 5 2 heterosexual 1.68 68.04 \n",
"166409 25 male 6 2 heterosexual 1.57 58.51 \n",
"166414 55 female 8 3 heterosexual 1.63 88.45 \n",
"166416 65 female 5 2 heterosexual 1.50 55.34 \n",
"166423 35 female 5 4 heterosexual 1.60 68.04 \n",
"\n",
" health no_doctor exercise sleep \n",
"0 2 True True 7 \n",
"2 4 True True 7 \n",
"24 4 True False 8 \n",
"50 4 True False 10 \n",
"66 4 True True 7 \n",
"... ... ... ... ... \n",
"166407 3 True True 8 \n",
"166409 4 True False 7 \n",
"166414 3 True False 6 \n",
"166416 3 True False 6 \n",
"166423 4 True True 6 \n",
"\n",
"[13784 rows x 11 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Hint ice = pokemon[pokemon.type == \"Ice\"]\n",
"noDoc = people[people.no_doctor]\n",
"noDoc"
]
},
{
"cell_type": "markdown",
"id": "c72e2155",
"metadata": {},
"source": [
"Let's get the high-speed ice pokémon. You can join conditions together using the `&` (and) and `|` (or) operators. `~` means \"not\", so `pokemon[~(pokemon.type == \"Ice\")]` would select all the non-ice pokémon. Due to order of operations, each condition needs to be wrapped in parentheses.\n",
"\n",
"You could get the pokémon who are fire or ice by selecting `pokemon[(pokemon.type == \"Fire\") | (pokemon.type == \"Ice\")]`, but lets get the high-speed ice pokemon now."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "97f3332e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" name \n",
" type \n",
" subtype \n",
" total \n",
" hp \n",
" attack \n",
" defense \n",
" special_attack \n",
" special_defense \n",
" speed \n",
" generation \n",
" legendary \n",
" \n",
" \n",
" \n",
" \n",
" 133 \n",
" Jynx \n",
" Ice \n",
" Psychic \n",
" 455 \n",
" 65 \n",
" 50 \n",
" 35 \n",
" 115 \n",
" 95 \n",
" 95 \n",
" 1 \n",
" False \n",
" \n",
" \n",
" 156 \n",
" Articuno \n",
" Ice \n",
" Flying \n",
" 580 \n",
" 90 \n",
" 85 \n",
" 100 \n",
" 95 \n",
" 125 \n",
" 85 \n",
" 1 \n",
" True \n",
" \n",
" \n",
" 396 \n",
" Glalie \n",
" Ice \n",
" NaN \n",
" 480 \n",
" 80 \n",
" 80 \n",
" 80 \n",
" 80 \n",
" 80 \n",
" 80 \n",
" 3 \n",
" False \n",
" \n",
" \n",
" 397 \n",
" GlalieMega Glalie \n",
" Ice \n",
" NaN \n",
" 580 \n",
" 80 \n",
" 120 \n",
" 80 \n",
" 120 \n",
" 80 \n",
" 100 \n",
" 3 \n",
" False \n",
" \n",
" \n",
" 524 \n",
" Mamoswine \n",
" Ice \n",
" Ground \n",
" 530 \n",
" 110 \n",
" 130 \n",
" 80 \n",
" 70 \n",
" 60 \n",
" 80 \n",
" 4 \n",
" False \n",
" \n",
" \n",
" 530 \n",
" Froslass \n",
" Ice \n",
" Ghost \n",
" 480 \n",
" 70 \n",
" 80 \n",
" 70 \n",
" 80 \n",
" 70 \n",
" 110 \n",
" 4 \n",
" False \n",
" \n",
" \n",
" 676 \n",
" Cryogonal \n",
" Ice \n",
" NaN \n",
" 485 \n",
" 70 \n",
" 50 \n",
" 30 \n",
" 95 \n",
" 135 \n",
" 105 \n",
" 5 \n",
" False \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name type subtype total hp attack defense \\\n",
"133 Jynx Ice Psychic 455 65 50 35 \n",
"156 Articuno Ice Flying 580 90 85 100 \n",
"396 Glalie Ice NaN 480 80 80 80 \n",
"397 GlalieMega Glalie Ice NaN 580 80 120 80 \n",
"524 Mamoswine Ice Ground 530 110 130 80 \n",
"530 Froslass Ice Ghost 480 70 80 70 \n",
"676 Cryogonal Ice NaN 485 70 50 30 \n",
"\n",
" special_attack special_defense speed generation legendary \n",
"133 115 95 95 1 False \n",
"156 95 125 85 1 True \n",
"396 80 80 80 3 False \n",
"397 120 80 100 3 False \n",
"524 70 60 80 4 False \n",
"530 80 70 110 4 False \n",
"676 95 135 105 5 False "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"high_speed_ice = pokemon[(pokemon.type == \"Ice\") & (pokemon.speed >= 80)]\n",
"high_speed_ice"
]
},
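{
"cell_type": "markdown",
"id": "sketch-combine-conditions-md",
"metadata": {},
"source": [
"Here is a minimal sketch of `&`, `|`, and `~` on a hypothetical four-row dataframe `toy` with illustrative values. It also uses `Series.isin`, a standard pandas method that this lab doesn't otherwise cover: it builds the same mask as chaining several `==` tests with `|`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-combine-conditions-code",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Illustrative rows, not the real data set.\n",
"toy = pd.DataFrame({\n",
"    \"name\": [\"Jynx\", \"Charmander\", \"Articuno\", \"Squirtle\"],\n",
"    \"type\": [\"Ice\", \"Fire\", \"Ice\", \"Water\"],\n",
"    \"speed\": [95, 65, 85, 43],\n",
"})\n",
"\n",
"# Each condition is parenthesized because &, | and ~ bind more tightly than == and >=.\n",
"fast_ice = toy[(toy.type == \"Ice\") & (toy.speed >= 90)]\n",
"fire_or_ice = toy[(toy.type == \"Fire\") | (toy.type == \"Ice\")]\n",
"not_ice = toy[~(toy.type == \"Ice\")]\n",
"\n",
"# isin() builds the same mask as the | chain above.\n",
"fire_or_ice_2 = toy[toy.type.isin([\"Fire\", \"Ice\"])]\n",
"\n",
"print(fast_ice[\"name\"].tolist())\n",
"print(fire_or_ice[\"name\"].tolist())\n",
"print(not_ice[\"name\"].tolist())\n",
"print(fire_or_ice.equals(fire_or_ice_2))"
]
},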
{
"cell_type": "markdown",
"id": "22215027",
"metadata": {},
"source": [
"---\n",
"#### Your turn!\n",
"**1.3.1.** `health` asks people for their general health, with the meanings of numbers shown below. Create a dataframe which contains people whose general health is good or better. \n",
"\n",
"| number | health status | \n",
"| ------ | ----------- |\n",
"| 1 | Poor |\n",
"| 2 | Fair |\n",
"| 3 | Good |\n",
"| 4 | Very good |\n",
"| 5 | Excellent |"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a80f0adc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" \n",
" age \n",
" sex \n",
" income \n",
" education \n",
" sexual_orientation \n",
" height \n",
" weight \n",
" health \n",
" no_doctor \n",
" exercise \n",
" sleep \n",
" \n",
" \n",
" \n",
" \n",
" 2 \n",
" 35 \n",
" female \n",
" 8 \n",
" 4 \n",
" heterosexual \n",
" 1.65 \n",
" 77.11 \n",
" 4 \n",
" True \n",
" True \n",
" 7 \n",
" \n",
" \n",
" 3 \n",
" 55 \n",
" male \n",
" 8 \n",
" 4 \n",
" heterosexual \n",
" 1.83 \n",
" 81.65 \n",
" 5 \n",
" False \n",
" True \n",
" 8 \n",
" \n",
" \n",
" 4 \n",
" 55 \n",
" female \n",
" 8 \n",
" 4 \n",
" heterosexual \n",
" 1.80 \n",
" 76.66 \n",
" 4 \n",
" False \n",
" True \n",
" 8 \n",
" \n",
" \n",
" 5 \n",
" 55 \n",
" male \n",
" 8 \n",
" 4 \n",
" heterosexual \n",
" 1.80 \n",
" 74.84 \n",
" 5 \n",
" False \n",
" True \n",
" 7 \n",
" \n",
" \n",
" 8 \n",
" 55 \n",
" female \n",
" 6 \n",
" 4 \n",
" heterosexual \n",
" 1.73 \n",
" 63.50 \n",
" 5 \n",
" False \n",
" True \n",
" 7 \n",
" \n",
" \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" ... \n",
" \n",
" \n",
" 166411 \n",
" 65 \n",
" male \n",
" 7 \n",
" 1 \n",
" heterosexual \n",
" 1.78 \n",
" 117.93 \n",
" 4 \n",
" False \n",
" True \n",
" 7 \n",
" \n",
" \n",
" 166415 \n",
" 35 \n",
" male \n",
" 8 \n",
" 4 \n",
" heterosexual \n",
" 1.75 \n",
" 99.79 \n",
" 4 \n",
" False \n",
" True \n",
" 7 \n",
" \n",
" \n",
" 166417 \n",
" 35 \n",
" female \n",
" 8 \n",
" 2 \n",
" heterosexual \n",
" 1.73 \n",
" 95.25 \n",
" 4 \n",
" False \n",
" False \n",
" 4 \n",
" \n",
" \n",
" 166421 \n",
" 25 \n",
" male \n",
" 7 \n",
" 2 \n",
" heterosexual \n",
" 1.78 \n",
" 86.18 \n",
" 4 \n",
" False \n",
" True \n",
" 6 \n",
" \n",
" \n",
" 166423 \n",
" 35 \n",
" female \n",
" 5 \n",
" 4 \n",
" heterosexual \n",
" 1.60 \n",
" 68.04 \n",
" 4 \n",
" True \n",
" True \n",
" 6 \n",
" \n",
" \n",
"
\n",
"
93585 rows × 11 columns
\n",
"
"
],
"text/plain": [
" age sex income education sexual_orientation height weight \\\n",
"2 35 female 8 4 heterosexual 1.65 77.11 \n",
"3 55 male 8 4 heterosexual 1.83 81.65 \n",
"4 55 female 8 4 heterosexual 1.80 76.66 \n",
"5 55 male 8 4 heterosexual 1.80 74.84 \n",
"8 55 female 6 4 heterosexual 1.73 63.50 \n",
"... ... ... ... ... ... ... ... \n",
"166411 65 male 7 1 heterosexual 1.78 117.93 \n",
"166415 35 male 8 4 heterosexual 1.75 99.79 \n",
"166417 35 female 8 2 heterosexual 1.73 95.25 \n",
"166421 25 male 7 2 heterosexual 1.78 86.18 \n",
"166423 35 female 5 4 heterosexual 1.60 68.04 \n",
"\n",
" health no_doctor exercise sleep \n",
"2 4 True True 7 \n",
"3 5 False True 8 \n",
"4 4 False True 8 \n",
"5 5 False True 7 \n",
"8 5 False True 7 \n",
"... ... ... ... ... \n",
"166411 4 False True 7 \n",
"166415 4 False True 7 \n",
"166417 4 False False 4 \n",
"166421 4 False True 6 \n",
"166423 4 True True 6 \n",
"\n",
"[93585 rows x 11 columns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Hint: high_speed_ice = pokemon[(pokemon.type == \"Ice\") & (pokemon.speed >= 80)]\n",
"good_health = people[people.health > 3]\n",
"good_health"
]
},
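{
"cell_type": "markdown",
"id": "ex-boolean-mask",
"metadata": {},
"source": [
"A comparison like `people.health > 3` produces a boolean mask: a Series with one `True`/`False` value per row. Indexing a dataframe with that mask keeps only the rows where it is `True`. The sketch below separates those two steps on a small made-up dataframe (`toy` is hypothetical, not survey data).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for the survey dataframe\n",
"toy = pd.DataFrame({'health': [5, 2, 4, 3], 'sleep': [8, 6, 7, 5]})\n",
"\n",
"# Step 1: the comparison gives one True/False value per row\n",
"mask = toy.health > 3\n",
"# Step 2: indexing with the mask keeps the rows where it is True\n",
"good = toy[mask]\n",
"\n",
"print(mask.tolist())  # [True, False, True, False]\n",
"print(good.index.tolist())  # [0, 2]\n",
"```"
]
},
```python
import pandas as pd

# Hypothetical toy data standing in for the survey dataframe
toy = pd.DataFrame({'health': [5, 2, 4, 3], 'sleep': [8, 6, 7, 5]})

# Step 1: the comparison gives one True/False value per row
mask = toy.health > 3
# Step 2: indexing with the mask keeps the rows where it is True
good = toy[mask]

print(mask.tolist())  # [True, False, True, False]
print(good.index.tolist())  # [0, 2]
```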
{
"cell_type": "markdown",
"id": "0e3ec616",
"metadata": {},
"source": [
"**1.3.2.** `education` indicates the highest level of education completed, with codes as follows. Create a dataframe which only contains female college graduates who needed a doctor but couldn't afford one. (The survey asked people for their current sex, and only had options for male and female.)\n",
"\n",
"| number | education level | \n",
"| ------ | ----------- |\n",
"| 1 | Did not graduate from high school |\n",
"| 2 | Graduated from high school |\n",
"| 3 | Attended some college |\n",
"| 4 | Graduated from college |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ee0b68f",
"metadata": {},
"outputs": [],
"source": [
"#Hint: high_speed_ice = pokemon[(pokemon.type == \"Ice\") & (pokemon.speed >= 80)]\n",
"female_grads_no_doctor = people[people.no_doctor & (people.education == 4) & (people.sex == 'female')]\n",
"female_grads_no_doctor"
]
},
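{
"cell_type": "markdown",
"id": "ex-combine-masks",
"metadata": {},
"source": [
"When combining conditions, wrap each comparison in parentheses: Python's `&` binds more tightly than `==`, so `people.education == 4 & people.sex == 'female'` would not group the way it reads. The keyword `and` does not work either, because it needs a single `True`/`False` and raises a `ValueError` when given a Series. A sketch on hypothetical toy data (`toy` is made up, using the survey's column names):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data using the survey's column names\n",
"toy = pd.DataFrame({\n",
"    'sex': ['female', 'male', 'female', 'female'],\n",
"    'education': [4, 4, 2, 4],\n",
"    'no_doctor': [True, True, True, False],\n",
"})\n",
"\n",
"# Parenthesize each comparison; & combines the masks row by row\n",
"subset = toy[toy.no_doctor & (toy.education == 4) & (toy.sex == 'female')]\n",
"print(subset.index.tolist())  # [0]\n",
"\n",
"# `and` needs one True/False value, so it fails on a whole Series\n",
"try:\n",
"    toy[(toy.education == 4) and (toy.sex == 'female')]\n",
"except ValueError as err:\n",
"    print('ValueError:', err)\n",
"```"
]
},
```python
import pandas as pd

# Hypothetical toy data using the survey's column names
toy = pd.DataFrame({
    'sex': ['female', 'male', 'female', 'female'],
    'education': [4, 4, 2, 4],
    'no_doctor': [True, True, True, False],
})

# Parenthesize each comparison; & combines the masks row by row
subset = toy[toy.no_doctor & (toy.education == 4) & (toy.sex == 'female')]
print(subset.index.tolist())  # [0]

# `and` needs one True/False value, so it fails on a whole Series
try:
    toy[(toy.education == 4) and (toy.sex == 'female')]
except ValueError as err:
    print('ValueError:', err)
```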
{
"cell_type": "markdown",
"id": "b41ee22d",
"metadata": {},
"source": [
"---\n",
"## 1.4. Grouping\n",
"\n",
"Now things get crazy. You can group a dataframe by one or more columns and then compute statistics for each group. \n",
"\n",
"#### Demo\n",
"\n",
"Do different types of pokémon move at different speeds? We'll use `sort_values` to put these in order from slow to fast."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d8c9a61c",
"metadata": {},
"outputs": [],
"source": [
"pokemon.groupby(\"type\").speed.mean().sort_values()"
]
},
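{
"cell_type": "markdown",
"id": "ex-groupby-mean",
"metadata": {},
"source": [
"`groupby` splits the rows into one group per distinct `type`, `.speed.mean()` averages the speeds within each group, and pandas combines the results into a Series indexed by type. A sketch with made-up numbers (the `toy` speeds are hypothetical, not real pokémon stats):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: made-up speeds for three types\n",
"toy = pd.DataFrame({\n",
"    'type': ['Fire', 'Water', 'Fire', 'Rock', 'Water'],\n",
"    'speed': [90, 50, 70, 30, 60],\n",
"})\n",
"\n",
"# Split by type, average speed within each group, sort slow to fast\n",
"by_type = toy.groupby('type').speed.mean().sort_values()\n",
"print(by_type.index.tolist())  # ['Rock', 'Water', 'Fire']\n",
"print(by_type.tolist())  # [30.0, 55.0, 80.0]\n",
"```"
]
},
```python
import pandas as pd

# Hypothetical toy data: made-up speeds for three types
toy = pd.DataFrame({
    'type': ['Fire', 'Water', 'Fire', 'Rock', 'Water'],
    'speed': [90, 50, 70, 30, 60],
})

# Split by type, average speed within each group, sort slow to fast
by_type = toy.groupby('type').speed.mean().sort_values()
print(by_type.index.tolist())  # ['Rock', 'Water', 'Fire']
print(by_type.tolist())  # [30.0, 55.0, 80.0]
```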
{
"cell_type": "markdown",
"id": "bbecbb71",
"metadata": {},
"source": [
"---\n",
"#### Your turn!\n",
"\n",
"**1.4.0.** `income` records people's annual income in the bands below. `sleep` records the average number of hours someone sleeps per night. Is there a difference in average hours of sleep by income level?\n",
"\n",
"| number | annual income (in thousands of dollars) | \n",
"| ------ | ----------- |\n",
"| 1 | Less than 10 |\n",
"| 2 | 10-15 |\n",
"| 3 | 15-20 |\n",
"| 4 | 20-25 |\n",
"| 5 | 25-35 |\n",
"| 6 | 35-50 |\n",
"| 7 | 50-75 |\n",
"| 8 | More than 75 |"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9ce2d00f",
"metadata": {},
"outputs": [],
"source": [
"#Hint: pokemon.groupby(\"type\").speed.mean().sort_values()\n",
"sleep_by_income = people.groupby(\"income\").sleep.mean()\n",
"sleep_by_income"
]
},
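{
"cell_type": "markdown",
"id": "ex-groupby-sorted-keys",
"metadata": {},
"source": [
"By default `groupby` sorts the result by the group key, so the income bands come out in order from lowest to highest and no `sort_values` is needed to read the trend. Because a mean over only a few people can be misleading, it may also help to check how many people fall in each band with `.count()`. A sketch on hypothetical toy data (`toy` is made up):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: income band codes and hours of sleep\n",
"toy = pd.DataFrame({'income': [8, 1, 8, 3, 1], 'sleep': [7, 6, 8, 7, 5]})\n",
"\n",
"# Groups come back sorted by income band (groupby's default sort=True)\n",
"sleep_means = toy.groupby('income').sleep.mean()\n",
"print(sleep_means.index.tolist())  # [1, 3, 8]\n",
"print(sleep_means.tolist())  # [5.5, 7.0, 7.5]\n",
"\n",
"# Number of non-missing sleep values in each band\n",
"print(toy.groupby('income').sleep.count().tolist())  # [2, 1, 2]\n",
"```"
]
},
```python
import pandas as pd

# Hypothetical toy data: income band codes and hours of sleep
toy = pd.DataFrame({'income': [8, 1, 8, 3, 1], 'sleep': [7, 6, 8, 7, 5]})

# Groups come back sorted by income band (groupby's default sort=True)
sleep_means = toy.groupby('income').sleep.mean()
print(sleep_means.index.tolist())  # [1, 3, 8]
print(sleep_means.tolist())  # [5.5, 7.0, 7.5]

# Number of non-missing sleep values in each band
print(toy.groupby('income').sleep.count().tolist())  # [2, 1, 2]
```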
{
"cell_type": "markdown",
"id": "3ff40dde",
"metadata": {},
"source": [
"---\n",
"#### Demo\n",
"Do types differ in other stats? Let's compare the mean hit points, attack, and defense of each type, sorted by hit points. "
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "1f1dd086",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" hp \n",
" attack \n",
" defense \n",
" \n",
" \n",
" type \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Bug \n",
" 56.884058 \n",
" 70.971014 \n",
" 70.724638 \n",
" \n",
" \n",
" Electric \n",
" 59.795455 \n",
" 69.090909 \n",
" 66.295455 \n",
" \n",
" \n",
" Ghost \n",
" 64.437500 \n",
" 73.781250 \n",
" 81.187500 \n",
" \n",
" \n",
" Steel \n",
" 65.222222 \n",
" 92.703704 \n",
" 126.370370 \n",
" \n",
" \n",
" Rock \n",
" 65.363636 \n",
" 92.863636 \n",
" 100.795455 \n",
" \n",
" \n",
" Dark \n",
" 66.806452 \n",
" 88.387097 \n",
" 70.225806 \n",
" \n",
" \n",
" Poison \n",
" 67.250000 \n",
" 74.678571 \n",
" 68.821429 \n",
" \n",
" \n",
" Grass \n",
" 67.271429 \n",
" 73.214286 \n",
" 70.800000 \n",
" \n",
" \n",
" Fighting \n",
" 69.851852 \n",
" 96.777778 \n",
" 65.925926 \n",
" \n",
" \n",
" Fire \n",
" 69.903846 \n",
" 84.769231 \n",
" 67.769231 \n",
" \n",
" \n",
" Psychic \n",
" 70.631579 \n",
" 71.456140 \n",
" 67.684211 \n",
" \n",
" \n",
" Flying \n",
" 70.750000 \n",
" 78.750000 \n",
" 66.250000 \n",
" \n",
" \n",
" Ice \n",
" 72.000000 \n",
" 72.750000 \n",
" 71.416667 \n",
" \n",
" \n",
" Water \n",
" 72.062500 \n",
" 74.151786 \n",
" 72.946429 \n",
" \n",
" \n",
" Ground \n",
" 73.781250 \n",
" 95.750000 \n",
" 84.843750 \n",
" \n",
" \n",
" Fairy \n",
" 74.117647 \n",
" 61.529412 \n",
" 65.705882 \n",
" \n",
" \n",
" Normal \n",
" 77.275510 \n",
" 73.469388 \n",
" 59.846939 \n",
" \n",
" \n",
" Dragon \n",
" 83.312500 \n",
" 112.125000 \n",
" 86.375000 \n",
" \n",
" \n",
"\n",
""
],
"text/plain": [
" hp attack defense\n",
"type \n",
"Bug 56.884058 70.971014 70.724638\n",
"Electric 59.795455 69.090909 66.295455\n",
"Ghost 64.437500 73.781250 81.187500\n",
"Steel 65.222222 92.703704 126.370370\n",
"Rock 65.363636 92.863636 100.795455\n",
"Dark 66.806452 88.387097 70.225806\n",
"Poison 67.250000 74.678571 68.821429\n",
"Grass 67.271429 73.214286 70.800000\n",
"Fighting 69.851852 96.777778 65.925926\n",
"Fire 69.903846 84.769231 67.769231\n",
"Psychic 70.631579 71.456140 67.684211\n",
"Flying 70.750000 78.750000 66.250000\n",
"Ice 72.000000 72.750000 71.416667\n",
"Water 72.062500 74.151786 72.946429\n",
"Ground 73.781250 95.750000 84.843750\n",
"Fairy 74.117647 61.529412 65.705882\n",
"Normal 77.275510 73.469388 59.846939\n",
"Dragon 83.312500 112.125000 86.375000"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ptypes = pokemon.groupby(\"type\")\n",
"ptypes[[\"hp\", \"attack\", \"defense\"]].mean().sort_values(\"hp\")"
]
},
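{
"cell_type": "markdown",
"id": "ex-groupby-multicol",
"metadata": {},
"source": [
"Selecting a list of columns from a grouped dataframe returns a dataframe with one mean per column for each group, so `sort_values` needs the name of the column to sort by. A sketch with hypothetical toy stats (`toy` is made up):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy stats for two types\n",
"toy = pd.DataFrame({\n",
"    'type': ['Bug', 'Dragon', 'Bug', 'Dragon'],\n",
"    'hp': [40, 90, 60, 80],\n",
"    'attack': [50, 120, 70, 100],\n",
"})\n",
"\n",
"# One row per type, one column per stat, ordered by mean hp\n",
"stats = toy.groupby('type')[['hp', 'attack']].mean().sort_values('hp')\n",
"print(stats)\n",
"```"
]
},
```python
import pandas as pd

# Hypothetical toy stats for two types
toy = pd.DataFrame({
    'type': ['Bug', 'Dragon', 'Bug', 'Dragon'],
    'hp': [40, 90, 60, 80],
    'attack': [50, 120, 70, 100],
})

# One row per type, one column per stat, ordered by mean hp
stats = toy.groupby('type')[['hp', 'attack']].mean().sort_values('hp')
print(stats)
```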
{
"cell_type": "markdown",
"id": "d2f5ce5c",
"metadata": {},
"source": [
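"Which type/subtype combinations are most likely to have legendary pokémon? Because `legendary` is `True` or `False`, its mean within each group is the fraction of pokémon in that group that are legendary. We keep only the combinations where that fraction is above 0.5."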
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e1b2e97",
"metadata": {},
"outputs": [],
"source": [
"legendary_percentages = pokemon.groupby([\"type\", \"subtype\"]).legendary.mean().sort_values() \n",
"legendary_percentages[legendary_percentages > 0.5] "
]
},
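{
"cell_type": "markdown",
"id": "ex-groupby-two-keys",
"metadata": {},
"source": [
"Grouping by two columns gives a result with a two-level index, with one entry per type/subtype pair that actually occurs in the data. A sketch with hypothetical toy rows (`toy` is made up), where `True` counts as 1 and `False` as 0:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy rows; True counts as 1 and False as 0 in the mean\n",
"toy = pd.DataFrame({\n",
"    'type': ['Psychic', 'Psychic', 'Psychic', 'Bug'],\n",
"    'subtype': ['Flying', 'Flying', 'Flying', 'Flying'],\n",
"    'legendary': [True, True, False, False],\n",
"})\n",
"\n",
"share = toy.groupby(['type', 'subtype']).legendary.mean()\n",
"print(share[('Psychic', 'Flying')])  # 2 of 3 legendary, about 0.667\n",
"print(share[share > 0.5])\n",
"```"
]
},
```python
import pandas as pd

# Hypothetical toy rows; True counts as 1 and False as 0 in the mean
toy = pd.DataFrame({
    'type': ['Psychic', 'Psychic', 'Psychic', 'Bug'],
    'subtype': ['Flying', 'Flying', 'Flying', 'Flying'],
    'legendary': [True, True, False, False],
})

share = toy.groupby(['type', 'subtype']).legendary.mean()
print(share[('Psychic', 'Flying')])  # 2 of 3 legendary, about 0.667
print(share[share > 0.5])
```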
{
"cell_type": "markdown",
"id": "4697836f",
"metadata": {},
"source": [
"---\n",
"#### Your turn!\n",
"**1.4.1.** Is there a difference in people's general health by sex and education level? "
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "b9f40fc7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"sex education\n",
"female 1 2.848040\n",
"male 1 3.031525\n",
"female 2 3.315797\n",
"male 2 3.440818\n",
"female 3 3.483379\n",
"male 3 3.549105\n",
" 4 3.826512\n",
"female 4 3.844340\n",
"Name: health, dtype: float64"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"#Hint: legendary_percentages = pokemon.groupby([\"type\", \"subtype\"]).legendary.mean().sort_values() \n",
"\n",
"health_by_sex_edu = people.groupby([\"sex\", \"education\"]).health.mean().sort_values()\n",
"health_by_sex_edu"
]
},
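{
"cell_type": "markdown",
"id": "ex-unstack",
"metadata": {},
"source": [
"With two grouping columns the result is a long Series with a two-level index. One optional way to compare the sexes side by side is `unstack`, which moves one index level into the columns. A sketch on hypothetical toy data (`toy` is made up):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: health scores by sex and education code\n",
"toy = pd.DataFrame({\n",
"    'sex': ['female', 'male', 'female', 'male'],\n",
"    'education': [1, 1, 4, 4],\n",
"    'health': [3, 2, 4, 5],\n",
"})\n",
"\n",
"means = toy.groupby(['sex', 'education']).health.mean()\n",
"# Move the 'sex' level into columns: one row per education level\n",
"table = means.unstack('sex')\n",
"print(table)\n",
"```"
]
},
```python
import pandas as pd

# Hypothetical toy data: health scores by sex and education code
toy = pd.DataFrame({
    'sex': ['female', 'male', 'female', 'male'],
    'education': [1, 1, 4, 4],
    'health': [3, 2, 4, 5],
})

means = toy.groupby(['sex', 'education']).health.mean()
# Move the 'sex' level into columns: one row per education level
table = means.unstack('sex')
print(table)
```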
{
"cell_type": "markdown",
"id": "93ca2aa9",
"metadata": {},
"source": [
"---\n",
"## 1.5. Plotting \n",
"\n",
"Pandas has excellent built-in plotting capabilities, but \n",
"we are going to use the [seaborn](https://seaborn.pydata.org/) library because it's a bit \n",
"more intuitive and produces more beautiful plots. `set_theme`, called here without any arguments, applies seaborn's default theme (background style, color palette, and font scaling) to every plot that follows. "
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "dadbe683",
"metadata": {},
"outputs": [],
"source": [
"import seaborn as sns\n",
"sns.set_theme()"
]
},
{
"cell_type": "markdown",
"id": "76bab223",
"metadata": {},
"source": [
"#### Demo\n",
"\n",
"**When you want to visualize the distribution of a series**, a [histogram](https://seaborn.pydata.org/generated/seaborn.histplot.html) puts data into bins and plots the number of data points in each bin.\n",
"\n",
"Let's see the distribution of pokémon attack values. Assigning `x=\"attack\"` places attack values along the x-axis (vertical bars), while `y=\"attack\"` places them along the y-axis (horizontal bars). The number of bins is chosen automatically, but you can set it with the optional `bins` argument. "
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "242e70d2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.histplot(data=pokemon, x=\"attack\")"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "6ae16707",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.histplot(data=pokemon, y=\"attack\", bins=5)"
]
},
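{
"cell_type": "markdown",
"id": "ex-histogram-bins",
"metadata": {},
"source": [
"To see what a histogram counts, NumPy's `np.histogram` returns the bin edges and the number of values in each bin. `histplot` does its own binning internally, so treat this as a sketch of the idea on hypothetical values rather than a reproduction of seaborn's output. Every bin except the last is half-open; the last bin also includes the maximum.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical attack values\n",
"attack = np.array([20, 35, 50, 55, 60, 80, 95, 100])\n",
"\n",
"# Five equal-width bins spanning the minimum (20) to the maximum (100)\n",
"counts, edges = np.histogram(attack, bins=5)\n",
"print(edges.tolist())  # [20.0, 36.0, 52.0, 68.0, 84.0, 100.0]\n",
"print(counts.tolist())  # [2, 1, 2, 1, 2]\n",
"```"
]
},
```python
import numpy as np

# Hypothetical attack values
attack = np.array([20, 35, 50, 55, 60, 80, 95, 100])

# Five equal-width bins spanning the minimum (20) to the maximum (100)
counts, edges = np.histogram(attack, bins=5)
print(edges.tolist())  # [20.0, 36.0, 52.0, 68.0, 84.0, 100.0]
print(counts.tolist())  # [2, 1, 2, 1, 2]
```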
{
"cell_type": "markdown",
"id": "d9b4cf0e",
"metadata": {},
"source": [
"---\n",
"#### Your turn!\n",
"\n",
"**1.5.0.** Plot a histogram of people's heights."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "379ad264",
"metadata": {},
"outputs": [],
"source": [
"#Hint: sns.histplot(data=pokemon, x=\"attack\")\n",
"sns.histplot(data=people, x=\"height\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}