{ "cells": [ { "cell_type": "markdown", "id": "90041b00-672b-4bd4-a8e8-0cab3f0548af", "metadata": {}, "source": [ "# Lab 04: Data Science Tools\n", "\n", "## 0. Jupyter Notebooks\n", "\n", "Welcome to your first Jupyter notebook! Notebooks are made up of cells. Some cells contain text (like this one) and others contain Python code.\n", "\n", "Each cell can be in two different modes: editing or running. To edit a cell, double-click on it. When you're done editing, press **shift+Enter** to run it. You can use [Markdown](https://www.markdownguide.org/cheat-sheet/) to add basic formatting to the text. Before you go on, try editing the text in this cell." ] }, { "cell_type": "code", "execution_count": 13, "id": "5923b0d7-c0e0-48fa-b765-4aa6002c2d4f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Other cells are code cells, containing Python code. (This is a comment, of course!)\n", "# Try running this cell (again, shift+Enter). You'll see the result of the final statement \n", "# printed below the cell. \n", "# Then try changing the Python code and re-run it.\n", "\n", "1+1+1+2" ] }, { "cell_type": "markdown", "id": "257ef44f-8f53-4136-9d0d-23a811ec53e9", "metadata": {}, "source": [ "### 0.1 Cells share state\n", "\n", "Even though code cells run one at a time, anything that happens in a cell (like declaring a variable or running a function) affects the whole notebook. Try running these two cells a few times, in different orders. What happens when you run *Cell B* over and over?" 
] }, { "cell_type": "code", "execution_count": 1, "id": "0e2a2927-f6d1-4b13-97ae-ff97416723e9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "9" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Cell A\n", "x = 9\n", "x" ] }, { "cell_type": "code", "execution_count": 2, "id": "69dd7908-b213-4d0f-8016-e46a4a491961", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "18" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Cell B\n", "x = x * 2\n", "x" ] }, { "cell_type": "markdown", "id": "adc581ac-db13-40a8-bcfc-bf5d6e5472c5", "metadata": {}, "source": [ "### 0.2 Saving your work\n", "\n", "When you finish working on a notebook, save your work using the icon in the menu bar above. Your notebook is stored in the file `lab_pokemon.ipynb` in the lab directory. You can commit your changes to `ipynb` files just like any other file. Once you finish with Jupyter, you can stop the server by pressing **Control + C** in the Terminal. \n", "\n", "*If you're doing this lab on a cloud-based platform like Binder, then you can't save your work. So don't close the tab!*" ] }, { "cell_type": "markdown", "id": "0269bf0f-b993-4dfe-99cd-a7d38e94546c", "metadata": {}, "source": [ "---\n", "\n", "## 1. Pandas\n", "\n", "Pandas is probably the most important Python library for data science. Pandas provides an object called a **DataFrame**, which is basically a table with rows and columns. Most of the time, you will load data into Pandas using a `.csv` file. CSV files can be exported from Excel or Google Sheets, and are a common format for public data sets. \n", "\n", "In this lab, we'll be working with two data sets: The first contains Pokémon characteristics and the second comes from a wide-scale survey conducted by the US Centers for Disease Control ([details](https://www.cdc.gov/brfss/annual_data/annual_2020.html)). 
We will demonstrate techniques with Pokémon; your job is to replicate these tasks with the CDC dataset. \n", "\n", "**Note:** Pandas has *extensive* capabilities, and there's no way we could possibly present them all here. If you have a clear idea of what you want to do with tabular data, there's almost certainly a way to do it. This lab introduces *some* of what Pandas can do, but expect to spend time reading the documentation and Stack Overflow when you start working on new tasks. \n", "\n", "### 1.0 Getting started\n", "\n", "First, we'll import pandas (using the conventional alias `pd`) and load the two datasets. *Run this cell and every code cell you encounter in this notebook.*" ] }, { "cell_type": "code", "execution_count": null, "id": "f60aa4b0-7050-4e43-9619-5f8500770cb0", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "pokemon = pd.read_csv(\"pokemon.csv\")\n", "people = pd.read_csv(\"brfss_2020.csv\")" ] }, { "cell_type": "markdown", "id": "d4e0b811-b8bf-4e9a-a934-3aad8f0520bb", "metadata": {}, "source": [ "### 1.1 A first look\n", "\n", "#### Demo\n", "\n", "Let's start by learning the *shape* of the data. How many columns are there? How many rows? What kinds of data are included?" ] }, { "cell_type": "code", "execution_count": null, "id": "579d8dda-ca39-48b1-8819-b17651029729", "metadata": {}, "outputs": [], "source": [ "# Evaluating a DataFrame on the last line of a cell displays a preview,\n", "# including its row and column counts.\n", "pokemon" ] }, { "cell_type": "markdown", "id": "ee8b0718-56f9-4fc8-bd35-fa0ccb445179", "metadata": {}, "source": [ "OK, 800 Pokémon, with 12 columns for each. And you can see all the columns. Not all the data is shown in this preview, of course. If there were more columns than could be displayed, you could see them all by typing `pokemon.columns`. \n", "\n", "#### Your turn\n", "\n", "Now do the same for your data set, `people`." 
] }, { "cell_type": "code", "execution_count": 2, "id": "c9e5e4ec-b197-450c-ae2d-318006fa0a2f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | age | sex | income | education | sexual_orientation | height | weight | health | no_doctor | exercise | sleep |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 55 | female | 5 | 2 | other | 1.55 | 83.01 | 2 | True | True | 7 |
| 1 | 65 | female | 8 | 1 | heterosexual | 1.65 | 78.02 | 3 | False | False | 8 |
| 2 | 35 | female | 8 | 4 | heterosexual | 1.65 | 77.11 | 4 | True | True | 7 |
| 3 | 55 | male | 8 | 4 | heterosexual | 1.83 | 81.65 | 5 | False | True | 8 |
| 4 | 55 | female | 8 | 4 | heterosexual | 1.80 | 76.66 | 4 | False | True | 8 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 166420 | 45 | female | 8 | 3 | heterosexual | 1.63 | 86.18 | 1 | False | False | 6 |
| 166421 | 25 | male | 7 | 2 | heterosexual | 1.78 | 86.18 | 4 | False | True | 6 |
| 166422 | 25 | female | 1 | 2 | heterosexual | 1.91 | 45.36 | 1 | False | False | 8 |
| 166423 | 35 | female | 5 | 4 | heterosexual | 1.60 | 68.04 | 4 | True | True | 6 |
| 166424 | 35 | male | 7 | 2 | heterosexual | 1.75 | 86.18 | 3 | False | False | 8 |

166425 rows × 11 columns
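
The preview above reports the size of the table in its footer. As a minimal sketch of how to get the same facts programmatically, the cell below rebuilds a small stand-in from the first five rows shown (only six of the eleven columns, to keep it short). The name `sample_people` is made up for this illustration; in the lab you would call the same attributes on `people`.

```python
import pandas as pd

# Five rows copied from the preview above. This is a stand-in for the real
# `people` DataFrame, which is loaded with pd.read_csv("brfss_2020.csv").
sample_people = pd.DataFrame({
    "age": [55, 65, 35, 55, 55],
    "sex": ["female", "female", "female", "male", "female"],
    "income": [5, 8, 8, 8, 8],
    "height": [1.55, 1.65, 1.65, 1.83, 1.80],
    "weight": [83.01, 78.02, 77.11, 81.65, 76.66],
    "sleep": [7, 8, 7, 8, 8],
})

# .shape is a (rows, columns) tuple
n_rows, n_cols = sample_people.shape
print(n_rows, "rows and", n_cols, "columns")

# .columns lists every column name, even when the preview truncates them
column_names = list(sample_people.columns)
print(column_names)

# .dtypes shows what kind of data each column holds
print(sample_people.dtypes)
```

On the full dataset, `people.shape` should agree with the "166425 rows × 11 columns" footer.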

| | name | type | subtype | total | hp | attack | defense | special_attack | special_defense | speed | generation | legendary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 156 | Articuno | Ice | Flying | 580 | 90 | 85 | 100 | 95 | 125 | 85 | 1 | True |
| 157 | Zapdos | Electric | Flying | 580 | 90 | 90 | 85 | 125 | 90 | 100 | 1 | True |
| 158 | Moltres | Fire | Flying | 580 | 90 | 100 | 90 | 125 | 85 | 90 | 1 | True |
| 162 | Mewtwo | Psychic | NaN | 680 | 106 | 110 | 90 | 154 | 90 | 130 | 1 | True |
| 163 | MewtwoMega Mewtwo X | Psychic | Fighting | 780 | 106 | 190 | 100 | 154 | 100 | 130 | 1 | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 795 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
| 796 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
| 797 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
| 798 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |
| 799 | Volcanion | Fire | Water | 600 | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |

65 rows × 12 columns
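
Every row in the table above has `legendary` set to `True`, which suggests it came from filtering the full table. The code that produced it is not shown here, so the following is only a sketch of one common way to do this in pandas: using a boolean column as a row mask. `sample_pokemon` is a made-up stand-in with four rows and columns copied from the table, plus one non-legendary row (Bulbasaur) added for contrast.

```python
import pandas as pd

# Small stand-in for the full `pokemon` DataFrame. The Bulbasaur row is an
# addition for this illustration; the other rows come from the table above.
sample_pokemon = pd.DataFrame({
    "name": ["Bulbasaur", "Articuno", "Zapdos", "Moltres", "Mewtwo"],
    "type": ["Grass", "Ice", "Electric", "Fire", "Psychic"],
    "hp": [45, 90, 90, 90, 106],
    "legendary": [False, True, True, True, True],
})

# Indexing a DataFrame with a boolean Series keeps only the rows where
# the Series is True.
legendary_only = sample_pokemon[sample_pokemon["legendary"]]
print(legendary_only)
print(len(legendary_only), "of", len(sample_pokemon), "rows kept")
```

Filtering keeps each row's original index label, which is consistent with the index in the table above jumping from 158 to 162.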

| type | hp | attack | defense |
|---|---|---|---|
| Bug | 56.884058 | 70.971014 | 70.724638 |
| Electric | 59.795455 | 69.090909 | 66.295455 |
| Ghost | 64.437500 | 73.781250 | 81.187500 |
| Steel | 65.222222 | 92.703704 | 126.370370 |
| Rock | 65.363636 | 92.863636 | 100.795455 |
| Dark | 66.806452 | 88.387097 | 70.225806 |
| Poison | 67.250000 | 74.678571 | 68.821429 |
| Grass | 67.271429 | 73.214286 | 70.800000 |
| Fighting | 69.851852 | 96.777778 | 65.925926 |
| Fire | 69.903846 | 84.769231 | 67.769231 |
| Psychic | 70.631579 | 71.456140 | 67.684211 |
| Flying | 70.750000 | 78.750000 | 66.250000 |
| Ice | 72.000000 | 72.750000 | 71.416667 |
| Water | 72.062500 | 74.151786 | 72.946429 |
| Ground | 73.781250 | 95.750000 | 84.843750 |
| Fairy | 74.117647 | 61.529412 | 65.705882 |
| Normal | 77.275510 | 73.469388 | 59.846939 |
| Dragon | 83.312500 | 112.125000 | 86.375000 |
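
The table above has one row per `type`, sorted from lowest to highest `hp`, and its values look like per-type averages. The code that generated it is not shown here, so treat the following as a guess at one way to build such a summary: group by a column, take the mean of selected columns, then sort. `sample_pokemon` is a made-up six-row stand-in using values from the legendary table earlier in this notebook.

```python
import pandas as pd

# Six rows copied from the legendary Pokémon table; a stand-in for the
# full `pokemon` DataFrame.
sample_pokemon = pd.DataFrame({
    "name": ["Articuno", "Zapdos", "Moltres", "Mewtwo", "Diancie", "Volcanion"],
    "type": ["Ice", "Electric", "Fire", "Psychic", "Rock", "Fire"],
    "hp": [90, 90, 90, 106, 50, 80],
    "attack": [85, 90, 100, 110, 100, 110],
    "defense": [100, 85, 90, 90, 150, 120],
})

type_means = (
    sample_pokemon
    .groupby("type")[["hp", "attack", "defense"]]  # one group per type
    .mean()                                        # average within each group
    .sort_values("hp")                             # lowest average hp first
)
print(type_means)
```

After `groupby`, the grouping column becomes the index of the result, which matches `type` labelling the rows in the table above.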