{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"width:1000 px\">\n",
    "\n",
    "<div style=\"float:right; width:98 px; height:98px;\">\n",
    "<img src=\"https://raw.githubusercontent.com/Unidata/MetPy/master/src/metpy/plots/_static/unidata_150x150.png\" alt=\"Unidata Logo\" style=\"height: 98px;\">\n",
    "</div>\n",
    "\n",
    "<h1>Introduction to Pandas</h1>\n",
    "<h3>Unidata Python Workshop</h3>\n",
    "\n",
    "<div style=\"clear:both\"></div>\n",
    "</div>\n",
    "\n",
    "<hr style=\"height:2px;\">\n",
    "\n",
    "### Questions\n",
    "1. What is Pandas?\n",
    "1. What are the basic Pandas data structures?\n",
    "1. How can I read data into Pandas?\n",
    "1. What are some of the data operations available in Pandas?\n",
    "\n",
    "### Objectives\n",
    "1. <a href=\"#series\">Data Series</a>\n",
    "1. <a href=\"#frames\">Data Frames</a>\n",
    "1. <a href=\"#loading\">Loading Data in Pandas</a>\n",
    "1. <a href=\"#missing\">Missing Data</a>\n",
    "1. <a href=\"#manipulating\">Manipulating Data</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"series\"></a>\n",
    "## Data Series\n",
    "Data series are one of the fundamental data structures in Pandas. You can think of them like a dictionary; they have a key (index) and value (data/values) like a dictionary, but also have some handy functionality attached to them.\n",
    "\n",
    "To start out, let's create a series from scratch. We'll imagine these are temperature observations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    23\n",
       "1    20\n",
       "2    25\n",
       "3    18\n",
       "dtype: int64"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "temperatures = pd.Series([23, 20, 25, 18])\n",
    "temperatures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The values on the left are the index (zero based integers by default) and on the right are the values. Notice that the data type is an integer. Any NumPy datatype is acceptable in a series.\n",
    "\n",
    "That's great, but it'd be more useful if the station were associated with those values. In fact you could say we want the values *indexed* by station name."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TOP    23\n",
       "OUN    20\n",
       "DAL    25\n",
       "DEN    18\n",
       "dtype: int64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "temperatures = pd.Series([23, 20, 25, 18], index=['TOP', 'OUN', 'DAL', 'DEN'])\n",
    "temperatures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, very similar to a dictionary, we can use the index to access and modify elements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "25"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "temperatures['DAL']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DAL    25\n",
       "OUN    20\n",
       "dtype: int64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "temperatures[['DAL', 'OUN']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also do basic filtering, math, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TOP    23\n",
       "DAL    25\n",
       "dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "temperatures[temperatures > 20]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TOP    25\n",
       "OUN    22\n",
       "DAL    27\n",
       "DEN    20\n",
       "dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "temperatures + 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember how I said that series are like dictionaries? We can create a series straight from a dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TOP    14\n",
       "OUN    18\n",
       "DEN     9\n",
       "PHX    11\n",
       "DAL    23\n",
       "dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dps = {'TOP': 14,\n",
    "       'OUN': 18,\n",
    "       'DEN': 9,\n",
    "       'PHX': 11,\n",
    "       'DAL': 23}\n",
    "\n",
    "dewpoints = pd.Series(dps)\n",
    "dewpoints"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It's also easy to check and see if an index exists in a given series:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'PHX' in dewpoints"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "'PHX' in temperatures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Series have a name attribute and their index has a name attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "temperatures.name = 'temperature'\n",
    "temperatures.index.name = 'station'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "station\n",
       "TOP    23\n",
       "OUN    20\n",
       "DAL    25\n",
       "DEN    18\n",
       "Name: temperature, dtype: int64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "temperatures"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "     <ul>\n",
    "         <li>Create a series of pressures for stations TOP, OUN, DEN, and DAL (assign any values you like).</li>\n",
    "         <li>Set the series name and series index name.</li>\n",
    "         <li>Print the pressures for all stations which have a dewpoint below 15.</li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE GOES HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>SOLUTION</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "station\n",
      "TOP    1012.1\n",
      "DEN    1008.8\n",
      "Name: pressure, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# %load solutions/make_series.py\n",
    "\n",
    "# Cell content replaced by load magic replacement.\n",
    "pressures = pd.Series([1012.1, 1010.6, 1008.8, 1011.2], index=['TOP', 'OUN', 'DEN', 'DAL'])\n",
    "pressures.name = 'pressure'\n",
    "pressures.index.name = 'station'\n",
    "print(pressures[dewpoints < 15])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "<hr style=\"height:2px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"frames\"></a>\n",
    "## Data Frames\n",
    "Series are great, but what about a bunch of related series? Something like a table or a spreadsheet? Enter the data frame. A data frame can be thought of as a dictionary of data series. They have indexes for their rows and their columns. Each data series can be of a different type, but they will all share a common index.\n",
    "\n",
    "The easiest way to create a data frame by hand is to use a dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>TOP</td>\n",
       "      <td>23</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OUN</td>\n",
       "      <td>20</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>DEN</td>\n",
       "      <td>25</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DAL</td>\n",
       "      <td>18</td>\n",
       "      <td>23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  station  temperature  dewpoint\n",
       "0     TOP           23        14\n",
       "1     OUN           20        18\n",
       "2     DEN           25         9\n",
       "3     DAL           18        23"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = {'station': ['TOP', 'OUN', 'DEN', 'DAL'],\n",
    "        'temperature': [23, 20, 25, 18],\n",
    "        'dewpoint': [14, 18, 9, 23]}\n",
    "\n",
    "df = pd.DataFrame(data)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can access columns (data series) using dictionary type notation or attribute type notation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    23\n",
       "1    20\n",
       "2    25\n",
       "3    18\n",
       "Name: temperature, dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['temperature']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    14\n",
       "1    18\n",
       "2     9\n",
       "3    23\n",
       "Name: dewpoint, dtype: int64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dewpoint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice the index is shared and that the name of the column is attached as the series name.\n",
    "\n",
    "You can also create a new column and assign values. If I only pass a scalar it is duplicated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>wspeed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>TOP</td>\n",
       "      <td>23</td>\n",
       "      <td>14</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OUN</td>\n",
       "      <td>20</td>\n",
       "      <td>18</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>DEN</td>\n",
       "      <td>25</td>\n",
       "      <td>9</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>DAL</td>\n",
       "      <td>18</td>\n",
       "      <td>23</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  station  temperature  dewpoint  wspeed\n",
       "0     TOP           23        14     0.0\n",
       "1     OUN           20        18     0.0\n",
       "2     DEN           25         9     0.0\n",
       "3     DAL           18        23     0.0"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['wspeed'] = 0.\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's set the index to be the station."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>wspeed</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>station</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>TOP</th>\n",
       "      <td>TOP</td>\n",
       "      <td>23</td>\n",
       "      <td>14</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OUN</th>\n",
       "      <td>OUN</td>\n",
       "      <td>20</td>\n",
       "      <td>18</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DEN</th>\n",
       "      <td>DEN</td>\n",
       "      <td>25</td>\n",
       "      <td>9</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DAL</th>\n",
       "      <td>DAL</td>\n",
       "      <td>18</td>\n",
       "      <td>23</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        station  temperature  dewpoint  wspeed\n",
       "station                                       \n",
       "TOP         TOP           23        14     0.0\n",
       "OUN         OUN           20        18     0.0\n",
       "DEN         DEN           25         9     0.0\n",
       "DAL         DAL           18        23     0.0"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index = df.station\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Well, that's close, but we now have a redundant column, so let's get rid of it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>wspeed</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>station</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>TOP</th>\n",
       "      <td>23</td>\n",
       "      <td>14</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OUN</th>\n",
       "      <td>20</td>\n",
       "      <td>18</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DEN</th>\n",
       "      <td>25</td>\n",
       "      <td>9</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DAL</th>\n",
       "      <td>18</td>\n",
       "      <td>23</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         temperature  dewpoint  wspeed\n",
       "station                               \n",
       "TOP               23        14     0.0\n",
       "OUN               20        18     0.0\n",
       "DEN               25         9     0.0\n",
       "DAL               18        23     0.0"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df.drop('station', axis='columns')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also add data and order it by providing index values. Note that the next cell contains data that's \"out of order\" compared to the dataframe shown above. However, by providing the index that corresponds to each value, the data is organized correctly into the dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>wspeed</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>station</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>TOP</th>\n",
       "      <td>23</td>\n",
       "      <td>14</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OUN</th>\n",
       "      <td>20</td>\n",
       "      <td>18</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1018</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DEN</th>\n",
       "      <td>25</td>\n",
       "      <td>9</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DAL</th>\n",
       "      <td>18</td>\n",
       "      <td>23</td>\n",
       "      <td>0.0</td>\n",
       "      <td>998</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         temperature  dewpoint  wspeed  pressure\n",
       "station                                         \n",
       "TOP               23        14     0.0      1000\n",
       "OUN               20        18     0.0      1018\n",
       "DEN               25         9     0.0      1010\n",
       "DAL               18        23     0.0       998"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['pressure'] = pd.Series([1010,1000,998,1018], index=['DEN','TOP','DAL','OUN'])\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's get a row from the dataframe instead of a column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "temperature      25.0\n",
       "dewpoint          9.0\n",
       "wspeed            0.0\n",
       "pressure       1010.0\n",
       "Name: DEN, dtype: float64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc['DEN']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can even transpose the data easily if we needed that do make things easier to merge/munge later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>station</th>\n",
       "      <th>TOP</th>\n",
       "      <th>OUN</th>\n",
       "      <th>DEN</th>\n",
       "      <th>DAL</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>temperature</th>\n",
       "      <td>23.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dewpoint</th>\n",
       "      <td>14.0</td>\n",
       "      <td>18.0</td>\n",
       "      <td>9.0</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>wspeed</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pressure</th>\n",
       "      <td>1000.0</td>\n",
       "      <td>1018.0</td>\n",
       "      <td>1010.0</td>\n",
       "      <td>998.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "station         TOP     OUN     DEN    DAL\n",
       "temperature    23.0    20.0    25.0   18.0\n",
       "dewpoint       14.0    18.0     9.0   23.0\n",
       "wspeed          0.0     0.0     0.0    0.0\n",
       "pressure     1000.0  1018.0  1010.0  998.0"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Look at the `values` attribute to access the data as a 1D or 2D array for series and data frames recpectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[  23.,   14.,    0., 1000.],\n",
       "       [  20.,   18.,    0., 1018.],\n",
       "       [  25.,    9.,    0., 1010.],\n",
       "       [  18.,   23.,    0.,  998.]])"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([23, 20, 25, 18])"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.temperature.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "     <ul>\n",
    "         <li>Add a series of rain observations to the existing data frame.</li>\n",
    "         <li>Apply an instrument correction of -2 to the dewpoint observations.</li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE GOES HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>SOLUTION</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>wspeed</th>\n",
       "      <th>pressure</th>\n",
       "      <th>rain</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>station</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>TOP</th>\n",
       "      <td>23</td>\n",
       "      <td>12</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1000</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>OUN</th>\n",
       "      <td>20</td>\n",
       "      <td>16</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1018</td>\n",
       "      <td>0.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DEN</th>\n",
       "      <td>25</td>\n",
       "      <td>7</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1010</td>\n",
       "      <td>0.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DAL</th>\n",
       "      <td>18</td>\n",
       "      <td>21</td>\n",
       "      <td>0.0</td>\n",
       "      <td>998</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         temperature  dewpoint  wspeed  pressure  rain\n",
       "station                                               \n",
       "TOP               23        12     0.0      1000   0.0\n",
       "OUN               20        16     0.0      1018   0.4\n",
       "DEN               25         7     0.0      1010   0.2\n",
       "DAL               18        21     0.0       998   0.0"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# %load solutions/rain_obs.py\n",
    "\n",
    "# Cell content replaced by load magic replacement.\n",
    "df['rain'] = [0, 0.4, 0.2, 0]\n",
    "df.dewpoint = df.dewpoint - 2\n",
    "df\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "<hr style=\"height:2px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"loading\"></a>\n",
    "## Loading Data in Pandas\n",
    "The real power of pandas is in manupulating and summarizing large sets of tabular data. To do that, we'll need a large set of tabular data. We've included a file in this directory called `JAN17_CO_ASOS.txt` that has all of the ASOS observations for several stations in Colorado for January of 2017. It's a few hundred thousand rows of data in a tab delimited format. Let's load it into Pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('Jan17_CO_ASOS.txt', sep='\\t')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>valid</th>\n",
       "      <th>tmpc</th>\n",
       "      <th>dwpc</th>\n",
       "      <th>mslp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:05</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:10</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:13</td>\n",
       "      <td>1.00</td>\n",
       "      <td>-7.50</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:15</td>\n",
       "      <td>-3.00</td>\n",
       "      <td>-9.00</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  station             valid  tmpc    dwpc    mslp\n",
       "0     FNL  2017-01-01 00:00      M       M      M\n",
       "1     FNL  2017-01-01 00:05      M       M      M\n",
       "2     FNL  2017-01-01 00:10      M       M      M\n",
       "3     LMO  2017-01-01 00:13   1.00   -7.50      M\n",
       "4     FNL  2017-01-01 00:15  -3.00   -9.00      M"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('Jan17_CO_ASOS.txt', sep='\\t', parse_dates=['valid'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>valid</th>\n",
       "      <th>tmpc</th>\n",
       "      <th>dwpc</th>\n",
       "      <th>mslp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:05:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:10:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:13:00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>-7.50</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>-3.00</td>\n",
       "      <td>-9.00</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  station               valid  tmpc    dwpc    mslp\n",
       "0     FNL 2017-01-01 00:00:00      M       M      M\n",
       "1     FNL 2017-01-01 00:05:00      M       M      M\n",
       "2     FNL 2017-01-01 00:10:00      M       M      M\n",
       "3     LMO 2017-01-01 00:13:00   1.00   -7.50      M\n",
       "4     FNL 2017-01-01 00:15:00  -3.00   -9.00      M"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('Jan17_CO_ASOS.txt', sep='\\t', parse_dates=['valid'], na_values='M')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>valid</th>\n",
       "      <th>tmpc</th>\n",
       "      <th>dwpc</th>\n",
       "      <th>mslp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:00:00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:05:00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:10:00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:13:00</td>\n",
       "      <td>1.0</td>\n",
       "      <td>-7.5</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>-3.0</td>\n",
       "      <td>-9.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  station               valid   tmpc     dwpc     mslp\n",
       "0     FNL 2017-01-01 00:00:00     NaN      NaN     NaN\n",
       "1     FNL 2017-01-01 00:05:00     NaN      NaN     NaN\n",
       "2     FNL 2017-01-01 00:10:00     NaN      NaN     NaN\n",
       "3     LMO 2017-01-01 00:13:00     1.0     -7.5     NaN\n",
       "4     FNL 2017-01-01 00:15:00    -3.0     -9.0     NaN"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look in detail at those column names. Turns out we need to do some cleaning of this file. Welcome to real world data analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['station', 'valid', ' tmpc ', '  dwpc ', '  mslp'], dtype='object')"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.columns = ['station', 'time', 'temperature', 'dewpoint', 'pressure']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>time</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:00:00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:05:00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:10:00</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:13:00</td>\n",
       "      <td>1.0</td>\n",
       "      <td>-7.5</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>-3.0</td>\n",
       "      <td>-9.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  station                time  temperature  dewpoint  pressure\n",
       "0     FNL 2017-01-01 00:00:00          NaN       NaN       NaN\n",
       "1     FNL 2017-01-01 00:05:00          NaN       NaN       NaN\n",
       "2     FNL 2017-01-01 00:10:00          NaN       NaN       NaN\n",
       "3     LMO 2017-01-01 00:13:00          1.0      -7.5       NaN\n",
       "4     FNL 2017-01-01 00:15:00         -3.0      -9.0       NaN"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For other formats of data CSV, fixed width, etc. that are tools to read it as well. You can even read excel files straight into Pandas."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "<hr style=\"height:2px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"missing\"></a>\n",
    "## Missing Data\n",
    "We've already dealt with some missing data by turning the 'M' string into actual NaN's while reading the file in. We can do one better though and delete any rows that have all values missing. There are similar operations that could be performed for columns. You can even drop if any values are missing, all are missing, or just those you specify are missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "169658"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = df.dropna(axis='rows', how='all', subset=['temperature', 'dewpoint', 'pressure'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "72550"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>time</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:13:00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>-7.50</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>-3.00</td>\n",
       "      <td>-9.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1V6</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>-9.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-01-01 00:23:00</td>\n",
       "      <td>-12.00</td>\n",
       "      <td>-18.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:34:00</td>\n",
       "      <td>-0.22</td>\n",
       "      <td>-8.22</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   station                time  temperature  dewpoint  pressure\n",
       "3      LMO 2017-01-01 00:13:00         1.00     -7.50       NaN\n",
       "4      FNL 2017-01-01 00:15:00        -3.00     -9.00       NaN\n",
       "5      1V6 2017-01-01 00:15:00         0.00     -9.00       NaN\n",
       "7      0CO 2017-01-01 00:23:00       -12.00    -18.00       NaN\n",
       "10     LMO 2017-01-01 00:34:00        -0.22     -8.22       NaN"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "\n",
    "Our dataframe df has data in which we dropped any entries that were missing all of the temperature, dewpoint and pressure observations. Let's modify our command some and create a new dataframe df2 that only keeps observations that have all three variables (i.e. if a pressure is missing, the whole entry is dropped). This is useful if you were doing some computation that requires a complete observation to work.\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE GOES HERE\n",
    "# df2 = "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>SOLUTION</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>time</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10422</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-24 07:15:00</td>\n",
       "      <td>-1.00</td>\n",
       "      <td>-2.00</td>\n",
       "      <td>929.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10427</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-24 07:35:00</td>\n",
       "      <td>-1.00</td>\n",
       "      <td>-2.00</td>\n",
       "      <td>929.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10434</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-24 07:55:00</td>\n",
       "      <td>-2.00</td>\n",
       "      <td>-3.00</td>\n",
       "      <td>929.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10435</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-24 07:56:00</td>\n",
       "      <td>-2.22</td>\n",
       "      <td>-2.78</td>\n",
       "      <td>998.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10440</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-24 08:15:00</td>\n",
       "      <td>-3.00</td>\n",
       "      <td>-4.00</td>\n",
       "      <td>929.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169573</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-12-30 19:56:00</td>\n",
       "      <td>-10.00</td>\n",
       "      <td>-12.22</td>\n",
       "      <td>1018.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169594</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-12-30 20:56:00</td>\n",
       "      <td>-10.00</td>\n",
       "      <td>-12.78</td>\n",
       "      <td>1017.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169615</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-12-30 21:56:00</td>\n",
       "      <td>-8.28</td>\n",
       "      <td>-12.22</td>\n",
       "      <td>1016.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169637</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-12-30 22:56:00</td>\n",
       "      <td>-8.28</td>\n",
       "      <td>-11.72</td>\n",
       "      <td>1015.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169657</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-12-30 23:56:00</td>\n",
       "      <td>-8.89</td>\n",
       "      <td>-12.22</td>\n",
       "      <td>1016.2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>7953 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       station                time  temperature  dewpoint  pressure\n",
       "10422      FNL 2017-01-24 07:15:00        -1.00     -2.00     929.5\n",
       "10427      FNL 2017-01-24 07:35:00        -1.00     -2.00     929.5\n",
       "10434      FNL 2017-01-24 07:55:00        -2.00     -3.00     929.5\n",
       "10435      FNL 2017-01-24 07:56:00        -2.22     -2.78     998.7\n",
       "10440      FNL 2017-01-24 08:15:00        -3.00     -4.00     929.5\n",
       "...        ...                 ...          ...       ...       ...\n",
       "169573     FNL 2017-12-30 19:56:00       -10.00    -12.22    1018.7\n",
       "169594     FNL 2017-12-30 20:56:00       -10.00    -12.78    1017.1\n",
       "169615     FNL 2017-12-30 21:56:00        -8.28    -12.22    1016.7\n",
       "169637     FNL 2017-12-30 22:56:00        -8.28    -11.72    1015.9\n",
       "169657     FNL 2017-12-30 23:56:00        -8.89    -12.22    1016.2\n",
       "\n",
       "[7953 rows x 5 columns]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# %load solutions/drop_obs.py\n",
    "\n",
    "# Cell content replaced by load magic replacement.\n",
    "df2 = df.dropna(how='any')\n",
    "df2\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lastly, we still have the original index values. Let's reindex to a new zero-based index for only the rows that have valid data in them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>time</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:13:00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>-7.50</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>-3.00</td>\n",
       "      <td>-9.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1V6</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>-9.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-01-01 00:23:00</td>\n",
       "      <td>-12.00</td>\n",
       "      <td>-18.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:34:00</td>\n",
       "      <td>-0.22</td>\n",
       "      <td>-8.22</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72545</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-12-30 23:35:00</td>\n",
       "      <td>-7.00</td>\n",
       "      <td>-11.28</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72546</th>\n",
       "      <td>1V6</td>\n",
       "      <td>2017-12-30 23:50:00</td>\n",
       "      <td>-5.00</td>\n",
       "      <td>-10.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72547</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-12-30 23:54:00</td>\n",
       "      <td>-4.00</td>\n",
       "      <td>-10.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72548</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-12-30 23:55:00</td>\n",
       "      <td>-7.00</td>\n",
       "      <td>-11.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72549</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-12-30 23:56:00</td>\n",
       "      <td>-8.89</td>\n",
       "      <td>-12.22</td>\n",
       "      <td>1016.2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>72550 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      station                time  temperature  dewpoint  pressure\n",
       "0         LMO 2017-01-01 00:13:00         1.00     -7.50       NaN\n",
       "1         FNL 2017-01-01 00:15:00        -3.00     -9.00       NaN\n",
       "2         1V6 2017-01-01 00:15:00         0.00     -9.00       NaN\n",
       "3         0CO 2017-01-01 00:23:00       -12.00    -18.00       NaN\n",
       "4         LMO 2017-01-01 00:34:00        -0.22     -8.22       NaN\n",
       "...       ...                 ...          ...       ...       ...\n",
       "72545     LMO 2017-12-30 23:35:00        -7.00    -11.28       NaN\n",
       "72546     1V6 2017-12-30 23:50:00        -5.00    -10.00       NaN\n",
       "72547     0CO 2017-12-30 23:54:00        -4.00    -10.00       NaN\n",
       "72548     LMO 2017-12-30 23:55:00        -7.00    -11.00       NaN\n",
       "72549     FNL 2017-12-30 23:56:00        -8.89    -12.22    1016.2\n",
       "\n",
       "[72550 rows x 5 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>time</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:13:00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>-7.50</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>FNL</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>-3.00</td>\n",
       "      <td>-9.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1V6</td>\n",
       "      <td>2017-01-01 00:15:00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>-9.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-01-01 00:23:00</td>\n",
       "      <td>-12.00</td>\n",
       "      <td>-18.00</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>LMO</td>\n",
       "      <td>2017-01-01 00:34:00</td>\n",
       "      <td>-0.22</td>\n",
       "      <td>-8.22</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   station                time  temperature  dewpoint  pressure\n",
       "3      LMO 2017-01-01 00:13:00         1.00     -7.50       NaN\n",
       "4      FNL 2017-01-01 00:15:00        -3.00     -9.00       NaN\n",
       "5      1V6 2017-01-01 00:15:00         0.00     -9.00       NaN\n",
       "7      0CO 2017-01-01 00:23:00       -12.00    -18.00       NaN\n",
       "10     LMO 2017-01-01 00:34:00        -0.22     -8.22       NaN"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "<hr style=\"height:2px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name=\"manipulating\"></a>\n",
    "## Manipulating Data\n",
    "We can now take our data and do some intersting things with it. Let's start with a simple min/max."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Min: -28.72\n",
      "Max: 39.0\n"
     ]
    }
   ],
   "source": [
    "print(f'Min: {df.temperature.min()}\\nMax: {df.temperature.max()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also do some useful statistics on data with attached methods like corr for correlation coefficient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.7453312035648769"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.temperature.corr(df.dewpoint)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also call a `groupby` on the data frame to start getting some summary information for each station."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>station</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0CO</th>\n",
       "      <td>-1.926889</td>\n",
       "      <td>-7.491375</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1V6</th>\n",
       "      <td>7.574364</td>\n",
       "      <td>-4.872335</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FNL</th>\n",
       "      <td>8.656791</td>\n",
       "      <td>-0.228522</td>\n",
       "      <td>1014.852848</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LMO</th>\n",
       "      <td>12.006185</td>\n",
       "      <td>0.387110</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         temperature  dewpoint     pressure\n",
       "station                                    \n",
       "0CO        -1.926889 -7.491375          NaN\n",
       "1V6         7.574364 -4.872335          NaN\n",
       "FNL         8.656791 -0.228522  1014.852848\n",
       "LMO        12.006185  0.387110          NaN"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('station').mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "\n",
    "Calculate the min, max, and standard deviation of the temperature field grouped by each station.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate min\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate max\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate standard deviation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>SOLUTION</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "station\n",
      "0CO   -26.00\n",
      "1V6   -18.00\n",
      "FNL   -27.00\n",
      "LMO   -28.72\n",
      "Name: temperature, dtype: float64\n",
      "station\n",
      "0CO    15.00\n",
      "1V6    30.00\n",
      "FNL    37.22\n",
      "LMO    39.00\n",
      "Name: temperature, dtype: float64\n",
      "station\n",
      "0CO     7.701259\n",
      "1V6     8.413450\n",
      "FNL    11.413245\n",
      "LMO    10.716778\n",
      "Name: temperature, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "# %load solutions/calc_stats.py\n",
    "\n",
    "# Cell content replaced by load magic replacement.\n",
    "print(df.groupby('station').temperature.min())\n",
    "print(df.groupby('station').temperature.max())\n",
    "print(df.groupby('station').temperature.std())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let me show you how to do all of that and more in a single call."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"8\" halign=\"left\">temperature</th>\n",
       "      <th colspan=\"5\" halign=\"left\">dewpoint</th>\n",
       "      <th colspan=\"8\" halign=\"left\">pressure</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>...</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>25%</th>\n",
       "      <th>50%</th>\n",
       "      <th>75%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>station</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0CO</th>\n",
       "      <td>25044.0</td>\n",
       "      <td>-1.926889</td>\n",
       "      <td>7.701259</td>\n",
       "      <td>-26.00</td>\n",
       "      <td>-7.00</td>\n",
       "      <td>-2.00</td>\n",
       "      <td>5.00</td>\n",
       "      <td>15.00</td>\n",
       "      <td>25044.0</td>\n",
       "      <td>-7.491375</td>\n",
       "      <td>...</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>11.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1V6</th>\n",
       "      <td>11632.0</td>\n",
       "      <td>7.574364</td>\n",
       "      <td>8.413450</td>\n",
       "      <td>-18.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>7.00</td>\n",
       "      <td>13.00</td>\n",
       "      <td>30.00</td>\n",
       "      <td>11632.0</td>\n",
       "      <td>-4.872335</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>12.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FNL</th>\n",
       "      <td>11504.0</td>\n",
       "      <td>8.656791</td>\n",
       "      <td>11.413245</td>\n",
       "      <td>-27.00</td>\n",
       "      <td>0.61</td>\n",
       "      <td>8.28</td>\n",
       "      <td>16.72</td>\n",
       "      <td>37.22</td>\n",
       "      <td>11500.0</td>\n",
       "      <td>-0.228522</td>\n",
       "      <td>...</td>\n",
       "      <td>7.0</td>\n",
       "      <td>18.28</td>\n",
       "      <td>7953.0</td>\n",
       "      <td>1014.852848</td>\n",
       "      <td>8.87871</td>\n",
       "      <td>929.5</td>\n",
       "      <td>1010.3</td>\n",
       "      <td>1015.4</td>\n",
       "      <td>1020.0</td>\n",
       "      <td>1034.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LMO</th>\n",
       "      <td>24370.0</td>\n",
       "      <td>12.006185</td>\n",
       "      <td>10.716778</td>\n",
       "      <td>-28.72</td>\n",
       "      <td>4.22</td>\n",
       "      <td>12.50</td>\n",
       "      <td>19.50</td>\n",
       "      <td>39.00</td>\n",
       "      <td>24370.0</td>\n",
       "      <td>0.387110</td>\n",
       "      <td>...</td>\n",
       "      <td>7.0</td>\n",
       "      <td>19.11</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>4 rows × 24 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        temperature                                                          \\\n",
       "              count       mean        std    min   25%    50%    75%    max   \n",
       "station                                                                       \n",
       "0CO         25044.0  -1.926889   7.701259 -26.00 -7.00  -2.00   5.00  15.00   \n",
       "1V6         11632.0   7.574364   8.413450 -18.00  1.00   7.00  13.00  30.00   \n",
       "FNL         11504.0   8.656791  11.413245 -27.00  0.61   8.28  16.72  37.22   \n",
       "LMO         24370.0  12.006185  10.716778 -28.72  4.22  12.50  19.50  39.00   \n",
       "\n",
       "        dewpoint            ...             pressure                        \\\n",
       "           count      mean  ...  75%    max    count         mean      std   \n",
       "station                     ...                                              \n",
       "0CO      25044.0 -7.491375  ... -1.0  11.00      0.0          NaN      NaN   \n",
       "1V6      11632.0 -4.872335  ...  0.0  12.00      0.0          NaN      NaN   \n",
       "FNL      11500.0 -0.228522  ...  7.0  18.28   7953.0  1014.852848  8.87871   \n",
       "LMO      24370.0  0.387110  ...  7.0  19.11      0.0          NaN      NaN   \n",
       "\n",
       "                                                \n",
       "           min     25%     50%     75%     max  \n",
       "station                                         \n",
       "0CO        NaN     NaN     NaN     NaN     NaN  \n",
       "1V6        NaN     NaN     NaN     NaN     NaN  \n",
       "FNL      929.5  1010.3  1015.4  1020.0  1034.7  \n",
       "LMO        NaN     NaN     NaN     NaN     NaN  \n",
       "\n",
       "[4 rows x 24 columns]"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('station').describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's suppose we're going to make a meteogram or similar and want to get all of the data for a single station."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>time</th>\n",
       "      <th>temperature</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-01-01 00:23:00</td>\n",
       "      <td>-12.0</td>\n",
       "      <td>-18.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-01-01 00:43:00</td>\n",
       "      <td>-12.0</td>\n",
       "      <td>-18.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-01-01 01:03:00</td>\n",
       "      <td>-12.0</td>\n",
       "      <td>-18.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-01-01 01:23:00</td>\n",
       "      <td>-12.0</td>\n",
       "      <td>-18.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0CO</td>\n",
       "      <td>2017-01-01 02:03:00</td>\n",
       "      <td>-12.0</td>\n",
       "      <td>-19.0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  station                time  temperature  dewpoint  pressure\n",
       "0     0CO 2017-01-01 00:23:00        -12.0     -18.0       NaN\n",
       "1     0CO 2017-01-01 00:43:00        -12.0     -18.0       NaN\n",
       "2     0CO 2017-01-01 01:03:00        -12.0     -18.0       NaN\n",
       "3     0CO 2017-01-01 01:23:00        -12.0     -18.0       NaN\n",
       "4     0CO 2017-01-01 02:03:00        -12.0     -19.0       NaN"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby('station').get_group('0CO').head().reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-success\">\n",
    "    <b>EXERCISE</b>:\n",
    "     <ul>\n",
    "         <li>Round the temperature column to whole degrees.</li>\n",
    "         <li>Group the observations by temperature and use the count method to see how many instances of the rounded temperatures there are in the dataset.</li>\n",
    "    </ul>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "# YOUR CODE GOES HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    <b>SOLUTION</b>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>station</th>\n",
       "      <th>time</th>\n",
       "      <th>dewpoint</th>\n",
       "      <th>pressure</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>temperature</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>-29.0</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-28.0</th>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-27.0</th>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-26.0</th>\n",
       "      <td>15</td>\n",
       "      <td>15</td>\n",
       "      <td>15</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-25.0</th>\n",
       "      <td>31</td>\n",
       "      <td>31</td>\n",
       "      <td>31</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35.0</th>\n",
       "      <td>103</td>\n",
       "      <td>103</td>\n",
       "      <td>103</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36.0</th>\n",
       "      <td>75</td>\n",
       "      <td>75</td>\n",
       "      <td>75</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37.0</th>\n",
       "      <td>40</td>\n",
       "      <td>40</td>\n",
       "      <td>40</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38.0</th>\n",
       "      <td>7</td>\n",
       "      <td>7</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39.0</th>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>69 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "             station  time  dewpoint  pressure\n",
       "temperature                                   \n",
       "-29.0              1     1         1         0\n",
       "-28.0              4     4         4         0\n",
       "-27.0              5     5         4         0\n",
       "-26.0             15    15        15         0\n",
       "-25.0             31    31        31         0\n",
       "...              ...   ...       ...       ...\n",
       " 35.0            103   103       103        11\n",
       " 36.0             75    75        75         9\n",
       " 37.0             40    40        40         6\n",
       " 38.0              7     7         7         0\n",
       " 39.0              4     4         4         0\n",
       "\n",
       "[69 rows x 4 columns]"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# %load solutions/temperature_count.py\n",
    "\n",
    "# Cell content replaced by load magic replacement.\n",
    "df.temperature = df.temperature.round()\n",
    "df.groupby('temperature').count()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"#top\">Top</a>\n",
    "<hr style=\"height:2px;\">"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}