generated from mwc/lab_pokemon
3085 lines
98 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "90041b00-672b-4bd4-a8e8-0cab3f0548af",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Lab 04: Data Science Tools\n",
|
||
"\n",
|
||
"## 0. Jupyter Notebooks\n",
|
||
"\n",
|
||
"Welcome to your first Jupyter notebook! Notebooks are made up of cells. Some cells contain text (like this one) and others contain Python code. \n",
|
||
"\n",
"Each cell can be in one of two modes: editing or running. To edit a cell, double-click on it. When you're done editing, press **Shift+Enter** to run it. You can use [Markdown](https://www.markdownguide.org/cheat-sheet/) to add basic formatting to the text. Before you go on, try editing the text in this cell."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "5923b0d7-c0e0-48fa-b765-4aa6002c2d4f",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"25"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Other cells are code cells, containing Python code. (This is a comment, of course!)\n",
"# Try running this cell (again, Shift+Enter). You'll see the value of the final expression \n",
|
||
"# printed below the cell. \n",
|
||
"# Then try changing the Python code and re-run it.\n",
|
||
"\n",
|
||
"5*5\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "257ef44f-8f53-4136-9d0d-23a811ec53e9",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 0.1 Cells share state\n",
|
||
"\n",
|
||
"Even though code cells run one at a time, anything that happens in a cell (like declaring a variable or running a function) affects the whole notebook. Try running these two cells a few times, in different orders. What happens when you run *Cell B* over and over?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "0e2a2927-f6d1-4b13-97ae-ff97416723e9",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"20"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Cell A\n",
|
||
"\n",
|
||
"x"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "69dd7908-b213-4d0f-8016-e46a4a491961",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"20"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"# Cell B\n",
|
||
"x = x * 2\n",
|
||
"x"
|
||
]
|
||
},
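{
"cell_type": "markdown",
"id": "example-shared-state",
"metadata": {},
"source": [
"The shared state described in 0.1 can be sketched outside a notebook. This is only an illustration, not how Jupyter is implemented: a plain dictionary (`namespace`, a hypothetical name) stands in for the kernel's memory, and the starting value of 10 is an assumption, since the notebook does not fix one.\n",
"\n",
"```python\n",
"# One namespace plays the role of the notebook's shared memory.\n",
"namespace = {\"x\": 10}  # assumed starting value, not taken from the lab\n",
"\n",
"cell_b = \"x = x * 2\"  # the same statement as Cell B\n",
"\n",
"# Running Cell B three times keeps doubling the same x.\n",
"for _ in range(3):\n",
"    exec(cell_b, namespace)\n",
"\n",
"print(namespace[\"x\"])\n",
"```\n",
"\n",
"Each run starts from whatever value the previous run left behind, which is why the order and number of runs matter."
]
},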
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "adc581ac-db13-40a8-bcfc-bf5d6e5472c5",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 0.2 Saving your work\n",
|
||
"\n",
"When you finish working on a notebook, save your work using the icon in the menu bar above. Your notebook is stored in the file `lab_04.ipynb` in the lab directory. You can commit your changes to `.ipynb` files just like any other file. Once you finish with Jupyter, you can stop the server by pressing **Control+C** in the Terminal. \n",
|
||
"\n",
|
||
"*If you're doing this lab on a cloud-based platform like Binder, then you can't save your work. So don't close the tab!*"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c9c4aec2-949d-4a2e-b736-f5182b1f9ff7",
|
||
"metadata": {},
|
||
"source": [
|
||
"---\n",
|
||
"\n",
|
||
"## 1. Pandas\n",
|
||
"\n",
|
||
"Pandas is probably the most important Python library for data science. Pandas provides an object called a **DataFrame**, which is basically a table with rows and columns. Most of the time, you will load data into Pandas using a `.csv` file. CSV files can be exported from Excel or Google Sheets, and are a common format for public data sets. \n",
|
||
"\n",
"In this lab, we'll be working with two data sets: the first contains Pokémon characteristics and the second comes from a large-scale survey conducted by the US Centers for Disease Control and Prevention ([details](https://www.cdc.gov/brfss/annual_data/annual_2020.html)). We will demonstrate techniques with the Pokémon data; your job is to replicate these tasks with the CDC data set. \n",
|
||
"\n",
"**Note:** Pandas has *extensive* capabilities, and there's no way we could present them all here. If you have a clearly formed idea of what you want to do with tabular data, there's almost certainly a way to do it. This lab introduces *some* of what Pandas can do, but expect to spend time reading the documentation and Stack Overflow when you start working on new tasks. \n",
|
||
"\n",
|
||
"### 1.0 Getting started\n",
|
||
"\n",
|
||
"First, we'll import pandas (using the conventional variable name `pd`) and load the two datasets. *Run these cells and every code cell you encounter in this notebook.*"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "ba09a0f8-27d9-456f-aeff-3980e3362d5b",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import pandas as pd"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "a29d508a-2d9a-4d62-9ff6-7a0ecfd5eba4",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"pokemon = pd.read_csv(\"pokemon.csv\")\n",
|
||
"people = pd.read_csv(\"brfss_2020.csv\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "d4e0b811-b8bf-4e9a-a934-3aad8f0520bb",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 1.1 A first look\n",
|
||
"\n",
|
||
"#### Demo\n",
|
||
"\n",
|
||
"Let's start by learning the *shape* of the data. How many columns are there? How many rows? What kinds of data are included?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "579d8dda-ca39-48b1-8819-b17651029729",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>name</th>\n",
|
||
" <th>type</th>\n",
|
||
" <th>subtype</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>hp</th>\n",
|
||
" <th>attack</th>\n",
|
||
" <th>defense</th>\n",
|
||
" <th>special_attack</th>\n",
|
||
" <th>special_defense</th>\n",
|
||
" <th>speed</th>\n",
|
||
" <th>generation</th>\n",
|
||
" <th>legendary</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>Bulbasaur</td>\n",
|
||
" <td>Grass</td>\n",
|
||
" <td>Poison</td>\n",
|
||
" <td>318</td>\n",
|
||
" <td>45</td>\n",
|
||
" <td>49</td>\n",
|
||
" <td>49</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>45</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>Ivysaur</td>\n",
|
||
" <td>Grass</td>\n",
|
||
" <td>Poison</td>\n",
|
||
" <td>405</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>62</td>\n",
|
||
" <td>63</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>Venusaur</td>\n",
|
||
" <td>Grass</td>\n",
|
||
" <td>Poison</td>\n",
|
||
" <td>525</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>82</td>\n",
|
||
" <td>83</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>VenusaurMega Venusaur</td>\n",
|
||
" <td>Grass</td>\n",
|
||
" <td>Poison</td>\n",
|
||
" <td>625</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>123</td>\n",
|
||
" <td>122</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>Charmander</td>\n",
|
||
" <td>Fire</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>309</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>43</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>795</th>\n",
|
||
" <td>Diancie</td>\n",
|
||
" <td>Rock</td>\n",
|
||
" <td>Fairy</td>\n",
|
||
" <td>600</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>150</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>150</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>796</th>\n",
|
||
" <td>DiancieMega Diancie</td>\n",
|
||
" <td>Rock</td>\n",
|
||
" <td>Fairy</td>\n",
|
||
" <td>700</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>160</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>160</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>797</th>\n",
|
||
" <td>HoopaHoopa Confined</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>Ghost</td>\n",
|
||
" <td>600</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>150</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>798</th>\n",
|
||
" <td>HoopaHoopa Unbound</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>Dark</td>\n",
|
||
" <td>680</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>160</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>170</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>799</th>\n",
|
||
" <td>Volcanion</td>\n",
|
||
" <td>Fire</td>\n",
|
||
" <td>Water</td>\n",
|
||
" <td>600</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>800 rows × 12 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" name type subtype total hp attack defense \\\n",
|
||
"0 Bulbasaur Grass Poison 318 45 49 49 \n",
|
||
"1 Ivysaur Grass Poison 405 60 62 63 \n",
|
||
"2 Venusaur Grass Poison 525 80 82 83 \n",
|
||
"3 VenusaurMega Venusaur Grass Poison 625 80 100 123 \n",
|
||
"4 Charmander Fire NaN 309 39 52 43 \n",
|
||
".. ... ... ... ... .. ... ... \n",
|
||
"795 Diancie Rock Fairy 600 50 100 150 \n",
|
||
"796 DiancieMega Diancie Rock Fairy 700 50 160 110 \n",
|
||
"797 HoopaHoopa Confined Psychic Ghost 600 80 110 60 \n",
|
||
"798 HoopaHoopa Unbound Psychic Dark 680 80 160 60 \n",
|
||
"799 Volcanion Fire Water 600 80 110 120 \n",
|
||
"\n",
|
||
" special_attack special_defense speed generation legendary \n",
|
||
"0 65 65 45 1 False \n",
|
||
"1 80 80 60 1 False \n",
|
||
"2 100 100 80 1 False \n",
|
||
"3 122 120 80 1 False \n",
|
||
"4 60 50 65 1 False \n",
|
||
".. ... ... ... ... ... \n",
|
||
"795 100 150 50 6 True \n",
|
||
"796 160 110 110 6 True \n",
|
||
"797 150 130 70 6 True \n",
|
||
"798 170 130 80 6 True \n",
|
||
"799 130 90 70 6 True \n",
|
||
"\n",
|
||
"[800 rows x 12 columns]"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pokemon"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ee8b0718-56f9-4fc8-bd35-fa0ccb445179",
|
||
"metadata": {},
|
||
"source": [
"OK: 800 Pokémon, with 12 columns for each. All 12 columns fit in this preview, but only the first and last five rows are shown. If there were more columns than could be displayed, you could list them all with `pokemon.columns`. \n",
|
||
"\n",
|
||
"#### Your turn\n",
|
||
"\n",
|
||
"Now do the same for your data set, `people`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "c9e5e4ec-b197-450c-ae2d-318006fa0a2f",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>income</th>\n",
|
||
" <th>education</th>\n",
|
||
" <th>sexual_orientation</th>\n",
|
||
" <th>height</th>\n",
|
||
" <th>weight</th>\n",
|
||
" <th>health</th>\n",
|
||
" <th>no_doctor</th>\n",
|
||
" <th>exercise</th>\n",
|
||
" <th>sleep</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>other</td>\n",
|
||
" <td>1.55</td>\n",
|
||
" <td>83.01</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>65</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.65</td>\n",
|
||
" <td>78.02</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.65</td>\n",
|
||
" <td>77.11</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.83</td>\n",
|
||
" <td>81.65</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.80</td>\n",
|
||
" <td>76.66</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166420</th>\n",
|
||
" <td>45</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.63</td>\n",
|
||
" <td>86.18</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166421</th>\n",
|
||
" <td>25</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.78</td>\n",
|
||
" <td>86.18</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166422</th>\n",
|
||
" <td>25</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.91</td>\n",
|
||
" <td>45.36</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166423</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.60</td>\n",
|
||
" <td>68.04</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166424</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.75</td>\n",
|
||
" <td>86.18</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>166425 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" age sex income education sexual_orientation height weight \\\n",
|
||
"0 55 female 5 2 other 1.55 83.01 \n",
|
||
"1 65 female 8 1 heterosexual 1.65 78.02 \n",
|
||
"2 35 female 8 4 heterosexual 1.65 77.11 \n",
|
||
"3 55 male 8 4 heterosexual 1.83 81.65 \n",
|
||
"4 55 female 8 4 heterosexual 1.80 76.66 \n",
|
||
"... ... ... ... ... ... ... ... \n",
|
||
"166420 45 female 8 3 heterosexual 1.63 86.18 \n",
|
||
"166421 25 male 7 2 heterosexual 1.78 86.18 \n",
|
||
"166422 25 female 1 2 heterosexual 1.91 45.36 \n",
|
||
"166423 35 female 5 4 heterosexual 1.60 68.04 \n",
|
||
"166424 35 male 7 2 heterosexual 1.75 86.18 \n",
|
||
"\n",
|
||
" health no_doctor exercise sleep \n",
|
||
"0 2 True True 7 \n",
|
||
"1 3 False False 8 \n",
|
||
"2 4 True True 7 \n",
|
||
"3 5 False True 8 \n",
|
||
"4 4 False True 8 \n",
|
||
"... ... ... ... ... \n",
|
||
"166420 1 False False 6 \n",
|
||
"166421 4 False True 6 \n",
|
||
"166422 1 False False 8 \n",
|
||
"166423 4 True True 6 \n",
|
||
"166424 3 False False 8 \n",
|
||
"\n",
|
||
"[166425 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"people"
|
||
]
|
||
},
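{
"cell_type": "markdown",
"id": "example-shape-columns",
"metadata": {},
"source": [
"Section 1.1 asks how many rows and columns there are and what kinds of data they hold. As a minimal sketch, assuming a tiny made-up table (`toy`, with invented names and values rather than the lab's data), the standard DataFrame attributes `shape`, `columns`, and `dtypes` answer those questions directly:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy table; these rows are made up for illustration.\n",
"toy = pd.DataFrame({\n",
"    \"name\": [\"A\", \"B\", \"C\"],\n",
"    \"hp\": [45, 60, 80],\n",
"    \"legendary\": [False, False, True],\n",
"})\n",
"\n",
"n_rows, n_cols = toy.shape  # (number of rows, number of columns)\n",
"print(n_rows, n_cols)\n",
"print(list(toy.columns))    # every column name, even when a preview truncates them\n",
"print(toy.dtypes)           # the kind of data stored in each column\n",
"```\n",
"\n",
"The same three attributes should work on `pokemon` and `people` once they are loaded."
]
},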
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "7fab76ef-d453-4568-a916-4d4c29535a42",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 1.2 Descriptive Statistics\n",
|
||
"\n",
|
||
"#### Demo\n",
|
||
"\n",
"Let's get a sense of the data contained in some of the columns. For categorical data like `generation`, it makes sense to look at value counts, which show how many rows fall into each category. You can pass the optional keyword `normalize=True` to see each category's proportion of the total (a fraction between 0 and 1) instead of raw counts. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "9afca362-9edc-423c-981b-dc42107d5de0",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"1 166\n",
|
||
"5 165\n",
|
||
"3 160\n",
|
||
"4 121\n",
|
||
"2 106\n",
|
||
"6 82\n",
|
||
"Name: generation, dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pokemon.generation.value_counts()"
|
||
]
|
||
},
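{
"cell_type": "markdown",
"id": "example-value-counts-normalize",
"metadata": {},
"source": [
"The demo above shows raw counts only. A minimal sketch of the `normalize=True` option, assuming a tiny made-up column (`toy` is hypothetical and not drawn from `pokemon.csv`):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: three rows from generation 1, one from generation 2.\n",
"toy = pd.DataFrame({\"generation\": [1, 1, 1, 2]})\n",
"\n",
"counts = toy.generation.value_counts()                # raw counts per category\n",
"shares = toy.generation.value_counts(normalize=True)  # fractions that sum to 1\n",
"\n",
"print(counts)\n",
"print(shares * 100)  # multiply by 100 to read the fractions as percentages\n",
"```\n",
"\n",
"Here generation 1 accounts for 3 of the 4 rows, so its share is 0.75."
]
},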
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a9b98eee-bdc2-4c63-bab2-ee82e2466d0f",
|
||
"metadata": {},
|
||
"source": [
|
||
"For numeric data, we could start by looking at the mean value. We can select multiple columns and get all the column means at once."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"id": "5fe580d0-5939-4152-9f8c-4c32d35a4479",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"hp 69.25875\n",
|
||
"attack 79.00125\n",
|
||
"defense 73.84250\n",
|
||
"speed 68.27750\n",
|
||
"dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pokemon[[\"hp\", \"attack\", \"defense\", \"speed\"]].mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0d8e6e78-fcfc-4c38-a418-545fe4216a44",
|
||
"metadata": {},
|
||
"source": [
"We can also compute the mean of boolean data. In this case, True maps to 1 and False maps to 0, so the mean equals the proportion of values that are True (multiply by 100 for a percentage). "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"id": "dc69ef53-70cd-4ae0-80e7-c9c8e28de76f",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.08125"
|
||
]
|
||
},
|
||
"execution_count": 9,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pokemon.legendary.mean()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "69333e87-8df2-4b46-9005-2b8c9df3a7b4",
|
||
"metadata": {},
|
||
"source": [
"Just over 8% of Pokémon are legendary."
|
||
]
|
||
},
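{
"cell_type": "markdown",
"id": "example-boolean-mean",
"metadata": {},
"source": [
"To see why the mean of a boolean column is a proportion, here is a small sketch on made-up flags (`flags` is a hypothetical Series, not part of the lab's data):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical flags: two True values out of five.\n",
"flags = pd.Series([True, False, False, True, False])\n",
"\n",
"# True counts as 1 and False as 0, so the mean is\n",
"# (number of True values) / (number of rows).\n",
"share_true = flags.mean()\n",
"\n",
"print(share_true)\n",
"print(flags.sum() / len(flags))  # the same ratio, computed by hand\n",
"```\n",
"\n",
"Both lines print the same fraction; multiplying it by 100 gives the percentage."
]
},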
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f563d97d-d9d3-4f2d-a46a-5d5dfc6382de",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Your turn\n",
|
||
"\n",
|
||
"**1.2.0.** In this survey, people are grouped into age bands of 18-24, 25-34, 35-44, 45-54, 55-64, and 65+, with the lower bound reported. What percentage of people are in each age band? (When we talk about \"people\" in this lab, we're referring to the people who responded to the survey, not the whole US population.)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"id": "8fbcc766-8399-4f93-a6c8-e0607250a72a",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"65 55973\n",
|
||
"55 34345\n",
|
||
"45 26240\n",
|
||
"35 22555\n",
|
||
"25 18118\n",
|
||
"18 9194\n",
|
||
"Name: age, dtype: int64"
|
||
]
|
||
},
|
||
"execution_count": 11,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"people.age.value_counts()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "38006e7b-4771-4c29-86a8-19d04a50fc25",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.2.1.** What are the mean height and weight of people in this survey?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"id": "b7f910c8-3d40-49ae-b270-678734c04100",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"height 1.705082\n",
|
||
"weight 83.053588\n",
|
||
"dtype: float64"
|
||
]
|
||
},
|
||
"execution_count": 12,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"people[[\"height\", \"weight\"]].mean()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f74634bb-8664-46e4-b371-6f45cbb7c8ef",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.2.2.** The `exercise` column indicates whether a person has done any physical activity or exercise in the last 30 days, outside of work. What percentage of people have done exercise?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"id": "f3891188-a85f-4089-8388-d4d81c7438ad",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0.7858014120474688"
|
||
]
|
||
},
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"people.exercise.mean()\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f6082e65-321c-4ee0-9457-74f9bb1b0363",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 1.3 Filtering\n",
|
||
"\n",
"Sometimes we're only interested in a subset of the data set. To select one, create a boolean series (one True/False value per row) and use it to choose which rows to keep. Vocabulary note: A dataframe is two-dimensional, with rows and columns. A series (a single row or a single column) is one-dimensional. \n",
|
||
"\n",
|
||
"#### Demo\n",
"`pokemon.legendary` is already boolean, so we can use it directly to select just the legendary Pokémon. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"id": "12c0c6c9-c07b-4183-82f6-5e346c74aac9",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>name</th>\n",
|
||
" <th>type</th>\n",
|
||
" <th>subtype</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>hp</th>\n",
|
||
" <th>attack</th>\n",
|
||
" <th>defense</th>\n",
|
||
" <th>special_attack</th>\n",
|
||
" <th>special_defense</th>\n",
|
||
" <th>speed</th>\n",
|
||
" <th>generation</th>\n",
|
||
" <th>legendary</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>156</th>\n",
|
||
" <td>Articuno</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Flying</td>\n",
|
||
" <td>580</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>125</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>157</th>\n",
|
||
" <td>Zapdos</td>\n",
|
||
" <td>Electric</td>\n",
|
||
" <td>Flying</td>\n",
|
||
" <td>580</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>125</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>158</th>\n",
|
||
" <td>Moltres</td>\n",
|
||
" <td>Fire</td>\n",
|
||
" <td>Flying</td>\n",
|
||
" <td>580</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>125</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>162</th>\n",
|
||
" <td>Mewtwo</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>680</td>\n",
|
||
" <td>106</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>154</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>163</th>\n",
|
||
" <td>MewtwoMega Mewtwo X</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>Fighting</td>\n",
|
||
" <td>780</td>\n",
|
||
" <td>106</td>\n",
|
||
" <td>190</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>154</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>795</th>\n",
|
||
" <td>Diancie</td>\n",
|
||
" <td>Rock</td>\n",
|
||
" <td>Fairy</td>\n",
|
||
" <td>600</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>150</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>150</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>796</th>\n",
|
||
" <td>DiancieMega Diancie</td>\n",
|
||
" <td>Rock</td>\n",
|
||
" <td>Fairy</td>\n",
|
||
" <td>700</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>160</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>160</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>797</th>\n",
|
||
" <td>HoopaHoopa Confined</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>Ghost</td>\n",
|
||
" <td>600</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>150</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>798</th>\n",
|
||
" <td>HoopaHoopa Unbound</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>Dark</td>\n",
|
||
" <td>680</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>160</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>170</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>799</th>\n",
|
||
" <td>Volcanion</td>\n",
|
||
" <td>Fire</td>\n",
|
||
" <td>Water</td>\n",
|
||
" <td>600</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>65 rows × 12 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" name type subtype total hp attack defense \\\n",
|
||
"156 Articuno Ice Flying 580 90 85 100 \n",
|
||
"157 Zapdos Electric Flying 580 90 90 85 \n",
|
||
"158 Moltres Fire Flying 580 90 100 90 \n",
|
||
"162 Mewtwo Psychic NaN 680 106 110 90 \n",
|
||
"163 MewtwoMega Mewtwo X Psychic Fighting 780 106 190 100 \n",
|
||
".. ... ... ... ... ... ... ... \n",
|
||
"795 Diancie Rock Fairy 600 50 100 150 \n",
|
||
"796 DiancieMega Diancie Rock Fairy 700 50 160 110 \n",
|
||
"797 HoopaHoopa Confined Psychic Ghost 600 80 110 60 \n",
|
||
"798 HoopaHoopa Unbound Psychic Dark 680 80 160 60 \n",
|
||
"799 Volcanion Fire Water 600 80 110 120 \n",
|
||
"\n",
|
||
" special_attack special_defense speed generation legendary \n",
|
||
"156 95 125 85 1 True \n",
|
||
"157 125 90 100 1 True \n",
|
||
"158 125 85 90 1 True \n",
|
||
"162 154 90 130 1 True \n",
|
||
"163 154 100 130 1 True \n",
|
||
".. ... ... ... ... ... \n",
|
||
"795 100 150 50 6 True \n",
|
||
"796 160 110 110 6 True \n",
|
||
"797 150 130 70 6 True \n",
|
||
"798 170 130 80 6 True \n",
|
||
"799 130 90 70 6 True \n",
|
||
"\n",
|
||
"[65 rows x 12 columns]"
|
||
]
|
||
},
|
||
"execution_count": 14,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"legendary = pokemon[pokemon.legendary]\n",
|
||
"legendary"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "b4ad804a-f5f0-441f-bb83-51f360c1c154",
|
||
"metadata": {},
|
||
"source": [
"Let's get all the Ice Pokémon. We can create a boolean series by comparing another series with a value..."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"id": "5d089acf-7b76-4f91-8803-42a4a9a11e3e",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"0 False\n",
|
||
"1 False\n",
|
||
"2 False\n",
|
||
"3 False\n",
|
||
"4 False\n",
|
||
" ... \n",
|
||
"795 False\n",
|
||
"796 False\n",
|
||
"797 False\n",
|
||
"798 False\n",
|
||
"799 False\n",
|
||
"Name: type, Length: 800, dtype: bool"
|
||
]
|
||
},
|
||
"execution_count": 17,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"pokemon.type == \"Ice\"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a5ea9e89-f466-48de-9133-346c99f4a6c1",
|
||
"metadata": {},
|
||
"source": [
"And then use this series to select just the Ice Pokémon. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"id": "510fa0fc-2b38-4725-9bbf-ec57d62792be",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>name</th>\n",
|
||
" <th>type</th>\n",
|
||
" <th>subtype</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>hp</th>\n",
|
||
" <th>attack</th>\n",
|
||
" <th>defense</th>\n",
|
||
" <th>special_attack</th>\n",
|
||
" <th>special_defense</th>\n",
|
||
" <th>speed</th>\n",
|
||
" <th>generation</th>\n",
|
||
" <th>legendary</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>133</th>\n",
|
||
" <td>Jynx</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>455</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>115</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>156</th>\n",
|
||
" <td>Articuno</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Flying</td>\n",
|
||
" <td>580</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>125</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>238</th>\n",
|
||
" <td>Swinub</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Ground</td>\n",
|
||
" <td>250</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>40</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>239</th>\n",
|
||
" <td>Piloswine</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Ground</td>\n",
|
||
" <td>450</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>243</th>\n",
|
||
" <td>Delibird</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Flying</td>\n",
|
||
" <td>330</td>\n",
|
||
" <td>45</td>\n",
|
||
" <td>55</td>\n",
|
||
" <td>45</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>45</td>\n",
|
||
" <td>75</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>257</th>\n",
|
||
" <td>Smoochum</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>305</td>\n",
|
||
" <td>45</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>15</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>395</th>\n",
|
||
" <td>Snorunt</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>300</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>396</th>\n",
|
||
" <td>Glalie</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>480</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>397</th>\n",
|
||
" <td>GlalieMega Glalie</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>580</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>398</th>\n",
|
||
" <td>Spheal</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Water</td>\n",
|
||
" <td>290</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>40</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>55</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>25</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>399</th>\n",
|
||
" <td>Sealeo</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Water</td>\n",
|
||
" <td>410</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>75</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>45</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>400</th>\n",
|
||
" <td>Walrein</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Water</td>\n",
|
||
" <td>530</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>415</th>\n",
|
||
" <td>Regice</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>580</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>200</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>522</th>\n",
|
||
" <td>Glaceon</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>525</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>524</th>\n",
|
||
" <td>Mamoswine</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Ground</td>\n",
|
||
" <td>530</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>530</th>\n",
|
||
" <td>Froslass</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Ghost</td>\n",
|
||
" <td>480</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>643</th>\n",
|
||
" <td>Vanillite</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>305</td>\n",
|
||
" <td>36</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>44</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>644</th>\n",
|
||
" <td>Vanillish</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>395</td>\n",
|
||
" <td>51</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>75</td>\n",
|
||
" <td>59</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>645</th>\n",
|
||
" <td>Vanilluxe</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>535</td>\n",
|
||
" <td>71</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>79</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>674</th>\n",
|
||
" <td>Cubchoo</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>305</td>\n",
|
||
" <td>55</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>40</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>40</td>\n",
|
||
" <td>40</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>675</th>\n",
|
||
" <td>Beartic</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>485</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>676</th>\n",
|
||
" <td>Cryogonal</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>485</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>135</td>\n",
|
||
" <td>105</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>788</th>\n",
|
||
" <td>Bergmite</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>304</td>\n",
|
||
" <td>55</td>\n",
|
||
" <td>69</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>32</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>28</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>789</th>\n",
|
||
" <td>Avalugg</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>514</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>117</td>\n",
|
||
" <td>184</td>\n",
|
||
" <td>44</td>\n",
|
||
" <td>46</td>\n",
|
||
" <td>28</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" name type subtype total hp attack defense \\\n",
|
||
"133 Jynx Ice Psychic 455 65 50 35 \n",
|
||
"156 Articuno Ice Flying 580 90 85 100 \n",
|
||
"238 Swinub Ice Ground 250 50 50 40 \n",
|
||
"239 Piloswine Ice Ground 450 100 100 80 \n",
|
||
"243 Delibird Ice Flying 330 45 55 45 \n",
|
||
"257 Smoochum Ice Psychic 305 45 30 15 \n",
|
||
"395 Snorunt Ice NaN 300 50 50 50 \n",
|
||
"396 Glalie Ice NaN 480 80 80 80 \n",
|
||
"397 GlalieMega Glalie Ice NaN 580 80 120 80 \n",
|
||
"398 Spheal Ice Water 290 70 40 50 \n",
|
||
"399 Sealeo Ice Water 410 90 60 70 \n",
|
||
"400 Walrein Ice Water 530 110 80 90 \n",
|
||
"415 Regice Ice NaN 580 80 50 100 \n",
|
||
"522 Glaceon Ice NaN 525 65 60 110 \n",
|
||
"524 Mamoswine Ice Ground 530 110 130 80 \n",
|
||
"530 Froslass Ice Ghost 480 70 80 70 \n",
|
||
"643 Vanillite Ice NaN 305 36 50 50 \n",
|
||
"644 Vanillish Ice NaN 395 51 65 65 \n",
|
||
"645 Vanilluxe Ice NaN 535 71 95 85 \n",
|
||
"674 Cubchoo Ice NaN 305 55 70 40 \n",
|
||
"675 Beartic Ice NaN 485 95 110 80 \n",
|
||
"676 Cryogonal Ice NaN 485 70 50 30 \n",
|
||
"788 Bergmite Ice NaN 304 55 69 85 \n",
|
||
"789 Avalugg Ice NaN 514 95 117 184 \n",
|
||
"\n",
|
||
" special_attack special_defense speed generation legendary \n",
|
||
"133 115 95 95 1 False \n",
|
||
"156 95 125 85 1 True \n",
|
||
"238 30 30 50 2 False \n",
|
||
"239 60 60 50 2 False \n",
|
||
"243 65 45 75 2 False \n",
|
||
"257 85 65 65 2 False \n",
|
||
"395 50 50 50 3 False \n",
|
||
"396 80 80 80 3 False \n",
|
||
"397 120 80 100 3 False \n",
|
||
"398 55 50 25 3 False \n",
|
||
"399 75 70 45 3 False \n",
|
||
"400 95 90 65 3 False \n",
|
||
"415 100 200 50 3 True \n",
|
||
"522 130 95 65 4 False \n",
|
||
"524 70 60 80 4 False \n",
|
||
"530 80 70 110 4 False \n",
|
||
"643 65 60 44 5 False \n",
|
||
"644 80 75 59 5 False \n",
|
||
"645 110 95 79 5 False \n",
|
||
"674 60 40 40 5 False \n",
|
||
"675 70 80 50 5 False \n",
|
||
"676 95 135 105 5 False \n",
|
||
"788 32 35 28 6 False \n",
|
||
"789 44 46 28 6 False "
|
||
]
|
||
},
|
||
"execution_count": 16,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"ice = pokemon[pokemon.type == \"Ice\"]\n",
|
||
"ice"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0af5f534-0bec-4577-beee-29b350102265",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's get the high-speed ice pokémon. You can combine conditions using the `&` (and) and `|` (or) operators. `~` means \"not\", so `pokemon[~(pokemon.type == \"Ice\")]` would select all the non-ice pokémon. Because `&` and `|` bind more tightly than comparisons such as `==` and `>=`, each condition needs to be wrapped in parentheses."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"id": "05d4c5c2-c6b4-4795-9799-c884b15445a1",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>name</th>\n",
|
||
" <th>type</th>\n",
|
||
" <th>subtype</th>\n",
|
||
" <th>total</th>\n",
|
||
" <th>hp</th>\n",
|
||
" <th>attack</th>\n",
|
||
" <th>defense</th>\n",
|
||
" <th>special_attack</th>\n",
|
||
" <th>special_defense</th>\n",
|
||
" <th>speed</th>\n",
|
||
" <th>generation</th>\n",
|
||
" <th>legendary</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>133</th>\n",
|
||
" <td>Jynx</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Psychic</td>\n",
|
||
" <td>455</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>35</td>\n",
|
||
" <td>115</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>156</th>\n",
|
||
" <td>Articuno</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Flying</td>\n",
|
||
" <td>580</td>\n",
|
||
" <td>90</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>125</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>True</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>396</th>\n",
|
||
" <td>Glalie</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>480</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>397</th>\n",
|
||
" <td>GlalieMega Glalie</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>580</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>120</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>100</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>524</th>\n",
|
||
" <td>Mamoswine</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Ground</td>\n",
|
||
" <td>530</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>530</th>\n",
|
||
" <td>Froslass</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>Ghost</td>\n",
|
||
" <td>480</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>80</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>110</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>676</th>\n",
|
||
" <td>Cryogonal</td>\n",
|
||
" <td>Ice</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>485</td>\n",
|
||
" <td>70</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>95</td>\n",
|
||
" <td>135</td>\n",
|
||
" <td>105</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" name type subtype total hp attack defense \\\n",
|
||
"133 Jynx Ice Psychic 455 65 50 35 \n",
|
||
"156 Articuno Ice Flying 580 90 85 100 \n",
|
||
"396 Glalie Ice NaN 480 80 80 80 \n",
|
||
"397 GlalieMega Glalie Ice NaN 580 80 120 80 \n",
|
||
"524 Mamoswine Ice Ground 530 110 130 80 \n",
|
||
"530 Froslass Ice Ghost 480 70 80 70 \n",
|
||
"676 Cryogonal Ice NaN 485 70 50 30 \n",
|
||
"\n",
|
||
" special_attack special_defense speed generation legendary \n",
|
||
"133 115 95 95 1 False \n",
|
||
"156 95 125 85 1 True \n",
|
||
"396 80 80 80 3 False \n",
|
||
"397 120 80 100 3 False \n",
|
||
"524 70 60 80 4 False \n",
|
||
"530 80 70 110 4 False \n",
|
||
"676 95 135 105 5 False "
|
||
]
|
||
},
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"high_speed_ice = pokemon[(pokemon.type == \"Ice\") & (pokemon.speed >= 80)]\n",
|
||
"high_speed_ice"
|
||
]
|
||
},
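{
"cell_type": "markdown",
"id": "sketch-parentheses-in-conditions",
"metadata": {},
"source": [
"To see why the parentheses matter, here is a minimal sketch on a made-up dataframe (`toy` and its three rows are hypothetical stand-ins, not the lab's `pokemon` data). Without parentheses, Python evaluates `\"Ice\" & toy.speed` first, so the expression raises an error instead of filtering.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for the lab's `pokemon` dataframe.\n",
"toy = pd.DataFrame({\n",
"    \"name\": [\"Jynx\", \"Swinub\", \"Charmander\"],\n",
"    \"type\": [\"Ice\", \"Ice\", \"Fire\"],\n",
"    \"speed\": [95, 50, 65],\n",
"})\n",
"\n",
"# With parentheses, each comparison is evaluated first, then combined with `&`.\n",
"fast_ice = toy[(toy.type == \"Ice\") & (toy.speed >= 80)]\n",
"\n",
"# Without parentheses, `\"Ice\" & toy.speed` is evaluated first,\n",
"# so the expression fails instead of filtering.\n",
"try:\n",
"    toy[toy.type == \"Ice\" & toy.speed >= 80]\n",
"    unparenthesized_failed = False\n",
"except (TypeError, ValueError):\n",
"    unparenthesized_failed = True\n",
"\n",
"print(fast_ice[\"name\"].tolist())\n",
"print(unparenthesized_failed)\n",
"```\n",
"\n",
"Only the parenthesized version returns a filtered dataframe."
]
},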
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c84dc7ce-24f2-4ac7-92d7-99ed331488e0",
|
||
"metadata": {},
|
||
"source": [
|
||
"You could get the pokémon that are fire or ice by selecting `pokemon[(pokemon.type == \"Fire\") | (pokemon.type == \"Ice\")]`."
|
||
]
|
||
},
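{
"cell_type": "markdown",
"id": "sketch-isin-fire-or-ice",
"metadata": {},
"source": [
"When the list of accepted values grows, chaining `|` gets long. One alternative is the `isin` method of a series. The sketch below compares the two approaches on a made-up dataframe (`toy` is hypothetical, not the lab's `pokemon` data).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data standing in for the lab's `pokemon` dataframe.\n",
"toy = pd.DataFrame({\n",
"    \"name\": [\"Charmander\", \"Jynx\", \"Squirtle\"],\n",
"    \"type\": [\"Fire\", \"Ice\", \"Water\"],\n",
"})\n",
"\n",
"# Two ways to select the rows whose type is Fire or Ice.\n",
"with_or = toy[(toy.type == \"Fire\") | (toy.type == \"Ice\")]\n",
"with_isin = toy[toy.type.isin([\"Fire\", \"Ice\"])]\n",
"\n",
"# Both selections contain the same rows.\n",
"print(with_or.equals(with_isin))\n",
"```\n",
"\n",
"Either form works for two values; `isin` mainly helps readability when there are many."
]
},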
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "1f0e9625-b194-450d-b003-b88798cc2f45",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Your turn\n",
|
||
"\n",
|
||
"**1.3.0.** `no_doctor` indicates whether there was a time in the last year when the person needed to see a doctor, but could not afford to do so. Create a dataframe containing only these people. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"id": "198cb0c6-3f43-43c2-9eee-3939c12ea537",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>income</th>\n",
|
||
" <th>education</th>\n",
|
||
" <th>sexual_orientation</th>\n",
|
||
" <th>height</th>\n",
|
||
" <th>weight</th>\n",
|
||
" <th>health</th>\n",
|
||
" <th>no_doctor</th>\n",
|
||
" <th>exercise</th>\n",
|
||
" <th>sleep</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>other</td>\n",
|
||
" <td>1.55</td>\n",
|
||
" <td>83.01</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.65</td>\n",
|
||
" <td>77.11</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>24</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.73</td>\n",
|
||
" <td>94.35</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>50</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.78</td>\n",
|
||
" <td>81.65</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>10</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>66</th>\n",
|
||
" <td>45</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.57</td>\n",
|
||
" <td>72.57</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166407</th>\n",
|
||
" <td>18</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.68</td>\n",
|
||
" <td>68.04</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166409</th>\n",
|
||
" <td>25</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.57</td>\n",
|
||
" <td>58.51</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166414</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.63</td>\n",
|
||
" <td>88.45</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166416</th>\n",
|
||
" <td>65</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.50</td>\n",
|
||
" <td>55.34</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166423</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.60</td>\n",
|
||
" <td>68.04</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>13784 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" age sex income education sexual_orientation height weight \\\n",
|
||
"0 55 female 5 2 other 1.55 83.01 \n",
|
||
"2 35 female 8 4 heterosexual 1.65 77.11 \n",
|
||
"24 35 male 8 3 heterosexual 1.73 94.35 \n",
|
||
"50 35 female 4 2 heterosexual 1.78 81.65 \n",
|
||
"66 45 female 6 4 heterosexual 1.57 72.57 \n",
|
||
"... ... ... ... ... ... ... ... \n",
|
||
"166407 18 male 5 2 heterosexual 1.68 68.04 \n",
|
||
"166409 25 male 6 2 heterosexual 1.57 58.51 \n",
|
||
"166414 55 female 8 3 heterosexual 1.63 88.45 \n",
|
||
"166416 65 female 5 2 heterosexual 1.50 55.34 \n",
|
||
"166423 35 female 5 4 heterosexual 1.60 68.04 \n",
|
||
"\n",
|
||
" health no_doctor exercise sleep \n",
|
||
"0 2 True True 7 \n",
|
||
"2 4 True True 7 \n",
|
||
"24 4 True False 8 \n",
|
||
"50 4 True False 10 \n",
|
||
"66 4 True True 7 \n",
|
||
"... ... ... ... ... \n",
|
||
"166407 3 True True 8 \n",
|
||
"166409 4 True False 7 \n",
|
||
"166414 3 True False 6 \n",
|
||
"166416 3 True False 6 \n",
|
||
"166423 4 True True 6 \n",
|
||
"\n",
|
||
"[13784 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 15,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"no_doctor = people[people.no_doctor]\n",
|
||
"no_doctor\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "9d213707-a15b-4751-8df9-48aa568af209",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.3.1.** `health` records each person's answer about their general health, coded as shown below. Create a dataframe containing only the people whose general health is good or better. \n",
|
||
"\n",
|
||
"| number | health status | \n",
|
||
"| ------ | ----------- |\n",
|
||
"| 1 | Poor |\n",
|
||
"| 2 | Fair |\n",
|
||
"| 3 | Good |\n",
|
||
"| 4 | Very good |\n",
|
||
"| 5 | Excellent |"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"id": "8a8c1ad6-4c1e-4996-ab5e-5212dadb1851",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>income</th>\n",
|
||
" <th>education</th>\n",
|
||
" <th>sexual_orientation</th>\n",
|
||
" <th>height</th>\n",
|
||
" <th>weight</th>\n",
|
||
" <th>health</th>\n",
|
||
" <th>no_doctor</th>\n",
|
||
" <th>exercise</th>\n",
|
||
" <th>sleep</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>65</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.65</td>\n",
|
||
" <td>78.02</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.65</td>\n",
|
||
" <td>77.11</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.83</td>\n",
|
||
" <td>81.65</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.80</td>\n",
|
||
" <td>76.66</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.80</td>\n",
|
||
" <td>74.84</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166418</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.57</td>\n",
|
||
" <td>63.50</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166419</th>\n",
|
||
" <td>45</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.52</td>\n",
|
||
" <td>68.04</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166421</th>\n",
|
||
" <td>25</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.78</td>\n",
|
||
" <td>86.18</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166423</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.60</td>\n",
|
||
" <td>68.04</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166424</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>male</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.75</td>\n",
|
||
" <td>86.18</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>False</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>142249 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" age sex income education sexual_orientation height weight \\\n",
|
||
"1 65 female 8 1 heterosexual 1.65 78.02 \n",
|
||
"2 35 female 8 4 heterosexual 1.65 77.11 \n",
|
||
"3 55 male 8 4 heterosexual 1.83 81.65 \n",
|
||
"4 55 female 8 4 heterosexual 1.80 76.66 \n",
|
||
"5 55 male 8 4 heterosexual 1.80 74.84 \n",
|
||
"... ... ... ... ... ... ... ... \n",
|
||
"166418 55 male 7 2 heterosexual 1.57 63.50 \n",
|
||
"166419 45 female 8 2 heterosexual 1.52 68.04 \n",
|
||
"166421 25 male 7 2 heterosexual 1.78 86.18 \n",
|
||
"166423 35 female 5 4 heterosexual 1.60 68.04 \n",
|
||
"166424 35 male 7 2 heterosexual 1.75 86.18 \n",
|
||
"\n",
|
||
" health no_doctor exercise sleep \n",
|
||
"1 3 False False 8 \n",
|
||
"2 4 True True 7 \n",
|
||
"3 5 False True 8 \n",
|
||
"4 4 False True 8 \n",
|
||
"5 5 False True 7 \n",
|
||
"... ... ... ... ... \n",
|
||
"166418 3 False True 8 \n",
|
||
"166419 3 False True 7 \n",
|
||
"166421 4 False True 6 \n",
|
||
"166423 4 True True 6 \n",
|
||
"166424 3 False False 8 \n",
|
||
"\n",
|
||
"[142249 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 46,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"health = people[people['health'] >= 3]\n",
|
||
"health"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "7add542b-bfd2-481a-b5b4-4e1ca744078a",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.3.2.** `education` indicates the highest level of education completed, with codes as follows. Create a dataframe containing only female college graduates who needed a doctor but couldn't afford one. (The survey asked people for their current sex, and only had options for male and female.)\n",
|
||
"\n",
|
||
"| number | education level | \n",
|
||
"| ------ | ----------- |\n",
|
||
"| 1 | Did not graduate from high school |\n",
|
||
"| 2 | Graduated from high school |\n",
|
||
"| 3 | Attended some college |\n",
|
||
"| 4 | Graduated from college |"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"id": "315682ae-7d54-4d78-9a63-d23c83ba1576",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>age</th>\n",
|
||
" <th>sex</th>\n",
|
||
" <th>income</th>\n",
|
||
" <th>education</th>\n",
|
||
" <th>sexual_orientation</th>\n",
|
||
" <th>height</th>\n",
|
||
" <th>weight</th>\n",
|
||
" <th>health</th>\n",
|
||
" <th>no_doctor</th>\n",
|
||
" <th>exercise</th>\n",
|
||
" <th>sleep</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.65</td>\n",
|
||
" <td>77.11</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>66</th>\n",
|
||
" <td>45</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.57</td>\n",
|
||
" <td>72.57</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>135</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.70</td>\n",
|
||
" <td>81.65</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>146</th>\n",
|
||
" <td>65</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.55</td>\n",
|
||
" <td>72.57</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>259</th>\n",
|
||
" <td>65</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.57</td>\n",
|
||
" <td>56.70</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166121</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.65</td>\n",
|
||
" <td>136.08</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166219</th>\n",
|
||
" <td>55</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>other</td>\n",
|
||
" <td>1.52</td>\n",
|
||
" <td>59.87</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166272</th>\n",
|
||
" <td>25</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.52</td>\n",
|
||
" <td>98.88</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166381</th>\n",
|
||
" <td>45</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.52</td>\n",
|
||
" <td>49.90</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>166423</th>\n",
|
||
" <td>35</td>\n",
|
||
" <td>female</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>heterosexual</td>\n",
|
||
" <td>1.60</td>\n",
|
||
" <td>68.04</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>True</td>\n",
|
||
" <td>6</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>2321 rows × 11 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" age sex income education sexual_orientation height weight \\\n",
|
||
"2 35 female 8 4 heterosexual 1.65 77.11 \n",
|
||
"66 45 female 6 4 heterosexual 1.57 72.57 \n",
|
||
"135 55 female 6 4 heterosexual 1.70 81.65 \n",
|
||
"146 65 female 5 4 heterosexual 1.55 72.57 \n",
|
||
"259 65 female 4 4 heterosexual 1.57 56.70 \n",
|
||
"... ... ... ... ... ... ... ... \n",
|
||
"166121 35 female 8 4 heterosexual 1.65 136.08 \n",
|
||
"166219 55 female 5 4 other 1.52 59.87 \n",
|
||
"166272 25 female 8 4 heterosexual 1.52 98.88 \n",
|
||
"166381 45 female 5 4 heterosexual 1.52 49.90 \n",
|
||
"166423 35 female 5 4 heterosexual 1.60 68.04 \n",
|
||
"\n",
|
||
" health no_doctor exercise sleep \n",
|
||
"2 4 True True 7 \n",
|
||
"66 4 True True 7 \n",
|
||
"135 4 True True 7 \n",
|
||
"146 5 True True 7 \n",
|
||
"259 3 True True 6 \n",
|
||
"... ... ... ... ... \n",
|
||
"166121 4 True True 5 \n",
|
||
"166219 3 True True 5 \n",
|
||
"166272 3 True True 8 \n",
|
||
"166381 5 True True 6 \n",
|
||
"166423 4 True True 6 \n",
|
||
"\n",
|
||
"[2321 rows x 11 columns]"
|
||
]
|
||
},
|
||
"execution_count": 50,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"education = people[(people['sex'] == 'female') & (people['education'] == 4) & (people['no_doctor'] == True)]\n",
|
||
"education"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "646d1148-7d94-4521-a04a-fbf17ade1235",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 1.4. Grouping\n",
|
||
"\n",
|
||
"Now things get crazy. You can group a dataframe using one or more columns, and then compare their statistics. \n",
|
||
"\n",
|
||
"#### Demo\n",
|
||
"\n",
|
||
"Do different types of pokémon move at different speeds? We'll use `sort_values` to put these in order from slow to fast."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "069ea0ab-eff6-4985-9f46-db956fe1df91",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"pokemon.groupby(\"type\").speed.mean().sort_values()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "bdc801b7-d3ae-45bb-80f4-ebeb474e20a1",
|
||
"metadata": {},
|
||
"source": [
|
||
"Do types differ in other stats? Let's sort by hit points. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "5c420c0e-b5d2-49ae-ab98-3305ee076169",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"ptypes = pokemon.groupby(\"type\")\n",
|
||
"ptypes[[\"hp\", \"attack\", \"defense\"]].mean().sort_values(\"hp\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "cc9a3d19-0ecd-487b-b34f-b748c44fc9c9",
|
||
"metadata": {},
|
||
"source": [
|
||
"Which type/subtype combinations are most likely to have legendary pokémon?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "444a580d-e70c-48a1-bf87-77f98b8c9f85",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"legendary_percentages = pokemon.groupby([\"type\", \"subtype\"]).legendary.mean().sort_values() \n",
|
||
"legendary_percentages[legendary_percentages > 0.5]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "de23775b-8670-4371-913d-d8fa1d1f3a76",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Your turn\n",
|
||
"\n",
|
||
"**1.4.0.** `income` records peoples' annual income, in the following bands. `sleep` records the average hours of sleep someone gets per night. Is there a difference in the average hours of sleep by income level?\n",
|
||
"\n",
|
||
"| number | annual income, in $1000 | \n",
|
||
"| ------ | ----------- |\n",
|
||
"| 1 | Less than 10 |\n",
|
||
"| 2 | 10-15 |\n",
|
||
"| 3 | 15-20 |\n",
|
||
"| 4 | 20-25 |\n",
|
||
"| 5 | 25-35 |\n",
|
||
"| 6 | 35-50 |\n",
|
||
"| 7 | 50-75 |\n",
|
||
"| 8 | More than 75 |"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "75c1ac4f-3914-4c0a-a156-2e084002df66",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# YOUR CODE HERE"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f6413f2b-26a0-4b70-976f-90e45558c4bb",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.4.0.** Is there a difference in peoples' income or general health, by sex and education level? "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "d46df8a1-bbc2-45a4-9be1-cee1858cbf21",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# YOUR CODE HERE"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "931d602b-ddf4-4c8b-80e0-f886267cce76",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 1.5. Plotting \n",
|
||
"\n",
|
||
"Pandas has excellent built-in plotting capabilities, but \n",
|
||
"we are going to use the [seaborn](https://seaborn.pydata.org/) library because it's a bit \n",
|
||
"more intuitive and produces more beautiful plots. `set_theme`, called here without any arguments, assigns the default color palette. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "b1e06e4f-6b9e-42af-a27c-dbb525a259ce",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import seaborn as sns\n",
|
||
"sns.set_theme()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a15ad672-13e8-4bdd-bc31-a489a1730daf",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Demo\n",
|
||
"\n",
|
||
"**When you want to visualize the distribution of a series**, a [histogram](https://seaborn.pydata.org/generated/seaborn.histplot.html) puts data into bins and plots the number of data points in each bin.\n",
|
||
"\n",
|
||
"Let's see the distribution of pokémon attack values. Note how assigning `x=\"attack\"` spreads attack values over the x-axis, while `y=\"attack\"` spreads attack values over the y-axis. The number of bins is selected automatically, but you can specify this with the optional `bins` argument. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "5ce066fe-f81d-4b78-a394-c5c2f4dc9f46",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"sns.histplot(data=pokemon, x=\"attack\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "bceb253b-ef4f-4aa2-aef4-cab2b3ca6d59",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"sns.histplot(data=pokemon, y=\"attack\", bins=5)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2aac9186-86c0-41db-a1c4-8719bb78b46b",
|
||
"metadata": {},
|
||
"source": [
|
||
"**When you want to compare the distribution of a numeric variable across categories**, a [barplot](https://seaborn.pydata.org/generated/seaborn.barplot.html) is a good choice. Choose one numeric column and one categorical column. \n",
|
||
"\n",
|
||
"Let's see pokémon hit points by legendary/non-legendary. `ci=\"sd\"` shows the standard deviation for each category. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "92be1ad0-12bb-49f0-a3f6-85fcfd98e943",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"sns.barplot(data=pokemon, x=\"legendary\", y=\"hp\", ci=\"sd\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "4f75e1fa-a5d7-4d2c-a458-8190a7cd700e",
|
||
"metadata": {},
|
||
"source": [
|
||
"Here, we use a barplot to show average hit points by type. `ci=None` removes the standard deviation bars, because they clutter up the plot with too much detail. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "17f1c289-5990-4420-bfcb-e50eee0b8af6",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"sns.barplot(data=pokemon, x=\"hp\", y=\"type\", ci=None, palette=\"muted\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "213d6139-203f-4d81-a4b1-6f98cb184662",
|
||
"metadata": {},
|
||
"source": [
|
||
"**When you want to show how many observations are the intersection of multiple categories,** a [countplot](https://seaborn.pydata.org/generated/seaborn.countplot.html) is a good choice. \n",
|
||
"\n",
|
||
"To demonstrate this, let's convert the numeric variable `speed` into a categorical variable, `speed_category`, using the built-in function [cut](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html). "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "3c8e9f47-9aea-4bf0-a628-7aa1a66a8eee",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"bins = [0, 50, 100, 200]\n",
|
||
"labels = [\"slow\", \"medium\", \"fast\"]\n",
|
||
"pokemon[\"speed_category\"] = pd.cut(pokemon.speed, bins=bins, labels=labels)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "22f78bec-3d18-4133-ba9f-6595d7181ded",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"sns.countplot(data=pokemon, x=\"speed_category\", hue=\"legendary\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "fd508c13-9900-4be1-958f-4f9e9e9b633a",
|
||
"metadata": {},
|
||
"source": [
|
||
"**When you want to show the relationship between two numeric variables**, a [scatterplot](https://seaborn.pydata.org/generated/seaborn.scatterplot.html) is a good choice. \n",
|
||
"\n",
|
||
"Here, we plot pokémon hit points against their speed. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "444d9832-bd57-4238-9ea4-5ee898847170",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"sns.scatterplot(data=pokemon, x=\"hp\", y=\"speed\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "03e81709-393d-4c41-bce5-2dffc9cf5553",
|
||
"metadata": {},
|
||
"source": [
|
||
"You can distinguish between categories within a scatter plot by assigning a categorical variable to `hue`. We set the marker size with `s` and their opacity with `alpha`. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "86f9747b-00a3-407f-9b73-0bce40bac50d",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"sns.scatterplot(data=pokemon, x=\"hp\", y=\"speed\", hue=\"legendary\", alpha=0.5, s=60)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f3741251-2a2b-437e-b68f-084fb4399e9f",
|
||
"metadata": {},
|
||
"source": [
|
||
"Finally, if you want scatter plots across multiple categories, a [relplot](https://seaborn.pydata.org/generated/seaborn.relplot.html) lets you distribute categories across rows and colums in a grid. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "7385237c-6a5c-4041-af46-559d6d84d1fa",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"favorite_types = pokemon[pokemon.type.isin([\"Fire\", \"Water\", \"Grass\"])]\n",
|
||
"sns.relplot(data=favorite_types, x=\"hp\", y=\"speed\", hue=\"legendary\", col=\"type\", s=100)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c6a20904-416d-44be-a4f3-2107200fb3c2",
|
||
"metadata": {},
|
||
"source": [
|
||
"#### Your turn\n",
|
||
"\n",
|
||
"**1.5.0.** Plot a histogram of peoples' heights."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "3b268a30-42ff-4ab8-b2cd-c58a76121f9c",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Your code here"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "9b0c9120-fff4-42b2-8ab6-3aa2eba47806",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.5.1.** Plot a bar chart showing peoples' average hours of sleep by age. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "ee30c851-14b1-4901-9182-4304d54d53a6",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Your code here"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "15d94323-2d65-4100-9916-101516f6ccf1",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.5.2.** Plot a bar chart showing peoples' likelihood of getting exercise by income. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "13eeecd8-2518-4ed9-aac5-727a96b5bf80",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Your code here"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "b2705fef-470d-494c-86c1-8b3bd34b3660",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.5.3.** Plot a bar chart showing average reported health by age. For each age, show average health for those who get exercise and those who don't."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "4ee2eb69-2f9a-42e7-b5d3-9499631bfd06",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Your code here"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "84b1e240-4f75-4c86-8c1f-1026aa223717",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.5.4.** Create a plot showing the number of people at each income level, for each education level. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "d7e02da8-beab-40e7-95d0-74a5c2bc838e",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Your code here"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ac717580-4157-402c-9262-b2b50dfe606f",
|
||
"metadata": {},
|
||
"source": [
|
||
"**1.5.5.** Plot side-by-side scatter plots showing the relationship between height and weight for males and females. (There are so many overlapping dots that the plot will be more informative if you lower the opacity of each dot. Try using `alpha=0.1` and `edgecolor=None`.)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "b00dd7d6-226b-469c-86d8-b71b328aa576",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Your code here"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "e9ff7225-5d08-428b-90e8-ee60f4a4049a",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 2. Crafting a data argument\n",
|
||
"\n",
|
||
"Everything up to here are just tools, worthless without a clear research question and a convincing argument. Choose a research question that interests you which might be answerable using the `people` dataset. Then do your best to find the answer in the space below. This answer should include data analysis (code cells) as well as written argument (text cells) explaining what the data means and why you believe it answers your question. \n",
|
||
"\n",
|
||
"Examples of research questions might include:\n",
|
||
"\n",
|
||
"- Do older people tend to have higher incomes?\n",
|
||
"- Do people who sleep at least 6 hours a night tend to report better health? \n",
|
||
"- Is it more common for males to be bisexual than females?\n",
|
||
"\n",
|
||
"**A note of caution:** this lab has given you tools for exploring associations--patterns that tend to co-occur. These tools *do not* equip you to argue that one variable causes another to change. For example: Plot 1.5.4 showed that people who are taller also tend to be heaver, with a lot of individual variation. But are people heavier *because* they are taller? Are they taller because they are heavier? Or maybe neither variable causes the other--perhaps they're both caused by something else. If you want to be able to answer questions like these, take a course on statistics."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "6f934273-b829-4bc2-a7f4-a27a3fc44a99",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Your code here. Feel free to add new text cells and code cells as necessary."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "8b4b852b-402c-45d4-b3bb-840e47b249ed",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.12.5"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|