Basic Time Series Plotting
1. Obtaining Data
To learn about time series analysis, we first need to find some data and get it into Python. In this case we're going to use data from the National Data Buoy Center (NDBC). After obtaining the data with siphon, we'll use the pandas library to subset and manipulate it.
Each buoy has many types of data available; you can read all about them in the NDBC Web Data Guide. Siphon also provides a mechanism to check which data types are available for a given buoy.
from siphon.simplewebservice.ndbc import NDBC
data_types = NDBC.buoy_data_types('46042')
print(data_types)
In this case, we'll just stick with the standard meteorological data. The "realtime" data from NDBC contain approximately 45 days of data from each buoy. We'll retrieve that record for buoy 46042 and then do some cleaning of the data.
df = NDBC.realtime_observations('46042')
df.tail()
Let's get rid of the columns with all missing data. We could use the drop method and manually name all of the columns, but that would require us to know which ones are all NaN, and that sounds like manual labor, something that programmers hate. Pandas has the dropna method, which drops rows or columns where any or all values are NaN. In this case, let's drop all columns whose values are all NaN.
df = df.dropna(axis='columns', how='all')
df.head()
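To see why how='all' matters, here is a small sketch on a made-up frame (the values are invented for illustration) comparing it with how='any', which would also throw away columns that are only partly missing.

```python
import numpy as np
import pandas as pd

# Small made-up frame: one column is entirely missing, one is partially missing
demo = pd.DataFrame({
    'wind_speed': [5.0, 6.2, 7.1],
    'wave_height': [1.2, np.nan, 1.4],
    'visibility': [np.nan, np.nan, np.nan],
})

# how='all' drops only the columns in which every value is NaN
only_all_nan_dropped = demo.dropna(axis='columns', how='all')

# how='any' drops every column that contains at least one NaN
any_nan_dropped = demo.dropna(axis='columns', how='any')

print(list(only_all_nan_dropped.columns))  # ['wind_speed', 'wave_height']
print(list(any_nan_dropped.columns))       # ['wind_speed']
```

With real buoy data, how='any' would likely discard useful columns that have only occasional gaps, which is why the cell above uses how='all'.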
- Use the realtime_observations method to retrieve supplemental data for buoy 41002. **Note:** assign the data to something other than df, or you'll have to rerun the data download cell above. We suggest using the name supl_obs.
# YOUR CODE GOES HERE
# supl_obs =
# %load solutions/get_obs.py
supl_obs = NDBC.realtime_observations('41002', data_type='supl')
supl_obs.tail()
Finally, we need to trim down the data. The realtime record contains about 45 days' worth of observations; let's keep just the last week.
import pandas as pd
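# Keep rows from the last 7 days; Timestamp.now(tz='UTC') replaces the deprecated Timestamp.utcnow()
idx = df.time >= (pd.Timestamp.now(tz='UTC') - pd.Timedelta(days=7))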
df = df[idx]
df.head()
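Because the cell above compares against the current time, its output changes every time you run it. The sketch below applies the same boolean-mask idea to made-up, timezone-aware hourly data with a fixed reference time, so the result is reproducible. Both sides of the comparison need to be timezone-aware (or both naive); this sketch assumes UTC throughout.

```python
import pandas as pd

# Made-up hourly observations covering 10 days, with timezone-aware UTC timestamps
times = pd.date_range('2023-07-01', periods=10 * 24, freq=pd.offsets.Hour(), tz='UTC')
obs = pd.DataFrame({'time': times, 'wind_speed': range(len(times))})

# Use a fixed reference time instead of "now" so the result does not change between runs
reference = pd.Timestamp('2023-07-11 00:00', tz='UTC')
recent = obs[obs.time >= reference - pd.Timedelta(days=7)]

print(recent.time.min())  # 2023-07-04 00:00:00+00:00
print(len(recent))        # 168
```

Seven days of hourly data gives 7 × 24 = 168 rows, and the filtered frame keeps its original index labels, which is why the next step re-zeros the index.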
We're almost ready, but the index is no longer very meaningful: because we dropped the older rows, it now starts at a non-zero value. That's harmless, but let's re-zero the index so we have a nice clean data frame to start with.
df.reset_index(drop=True, inplace=True)
df.head()
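The drop=True argument matters here: without it, pandas moves the old index labels into a new column named index. A small sketch on made-up data shows the difference, using the non-inplace form of reset_index, which returns a new frame instead of modifying the original.

```python
import pandas as pd

# A made-up frame whose index starts at 5, as it would after filtering out earlier rows
trimmed = pd.DataFrame({'wind_speed': [4.1, 4.8, 5.3]}, index=[5, 6, 7])

# drop=True discards the old labels; drop=False keeps them as an 'index' column
renumbered = trimmed.reset_index(drop=True)
kept = trimmed.reset_index(drop=False)

print(list(renumbered.index))  # [0, 1, 2]
print(list(kept.columns))      # ['index', 'wind_speed']
```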
2. Basic Timeseries Plotting
Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. We're going to learn the basics of creating timeseries plots with matplotlib by plotting buoy wind, gust, temperature, and pressure data.
# Convention for import of the pyplot interface
import matplotlib.pyplot as plt
# Set-up to have matplotlib use its support for notebook inline plots
%matplotlib inline
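The %matplotlib inline line is IPython magic, so it only works inside Jupyter or IPython; in a plain Python script it fails with a SyntaxError. If you want to run these examples as a script, one minimal option (assuming you just want image files rather than an interactive window) is to select the non-interactive Agg backend and save figures with savefig. The data values and the file name wind.png below are made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: renders to files, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot([0, 1, 2], [3.0, 5.5, 4.2], color='tab:orange', label='Windspeed')  # made-up values
ax.legend(loc='upper left')

# Save into a temporary directory; 'wind.png' is only an illustrative name
out_path = os.path.join(tempfile.mkdtemp(), 'wind.png')
fig.savefig(out_path, dpi=100)
plt.close(fig)  # release the figure's memory, useful when a script makes many plots
```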
We'll start by plotting the windspeed observations from the buoy.
plt.rc('font', size=12)
fig, ax = plt.subplots(figsize=(10, 6))
# Specify how our lines should look
ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')
# Label the axes, add a title, and turn on the grid
ax.set_xlabel('Time')
ax.set_ylabel('Speed (m/s)')
ax.set_title('Buoy Wind Data')
ax.grid(True)
ax.legend(loc='upper left');
Our x-axis labels look a little crowded; let's try labeling only each day in our time series.
# Helpers to format and locate ticks for dates
from matplotlib.dates import DateFormatter, DayLocator
# Set the x-axis to do major ticks on the days and label them like '07/20'
ax.xaxis.set_major_locator(DayLocator())
ax.xaxis.set_major_formatter(DateFormatter('%m/%d'))
fig
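DateFormatter takes the same strftime codes as Python's datetime.strftime, which is why '%m/%d' produces labels like '07/20'. Since Matplotlib stores dates as floats internally, a quick way to preview a format string without drawing a plot is to call the formatter directly on a converted date, as in this sketch (the date itself is arbitrary).

```python
from datetime import datetime, timezone

from matplotlib.dates import DateFormatter, date2num

# Convert a datetime to Matplotlib's internal float date representation
when = date2num(datetime(2023, 7, 20, 12, 0, tzinfo=timezone.utc))

# A DateFormatter is callable: give it a float date and it returns the tick label text
numeric_label = DateFormatter('%m/%d')(when)
month_label = DateFormatter('%b %d')(when)

print(numeric_label)  # 07/20
print(month_label)    # the month abbreviation from %b depends on the locale
```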
Now we can add wind gust speeds to the same plot as a dashed olive line.
# Use linestyle keyword to style our plot
ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--',
        label='Wind Gust')
# Redisplay the legend to show our new wind gust line
ax.legend(loc='upper left')
fig
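Matplotlib accepts a handful of shorthand line-style strings. The sketch below draws one horizontal line with each style on made-up data so you can compare them; it selects the off-screen Agg backend so it also runs outside a notebook (drop the matplotlib.use line inside Jupyter). The long names 'solid', 'dashed', 'dashdot', and 'dotted' are accepted as well.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering so this runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
x = [0, 1, 2, 3]

# Solid, dashed, dash-dot, and dotted, one horizontal line per style
styles = ['-', '--', '-.', ':']
lines = [ax.plot(x, [i] * len(x), linestyle=style, label=repr(style))[0]
         for i, style in enumerate(styles)]
ax.legend(loc='upper left')

print([line.get_linestyle() for line in lines])  # ['-', '--', '-.', ':']
plt.close(fig)
```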
- Create your own figure and axes (myfig, myax = plt.subplots(figsize=(10, 6))) that plot temperature.
- Change the x-axis major tick labels to display the abbreviated month and day (e.g., 'Sep DD', where DD is the day number). Look at the table of formatters for help.
- Make sure you include a legend and labels!
- BONUS: try changing the linestyle, e.g., a blue dashed line.
# YOUR CODE GOES HERE
# %load solutions/basic_plot.py
myfig, myax = plt.subplots(figsize=(10, 6))
# Plot temperature
myax.plot(df.time, df.air_temperature, color='tab:blue', linestyle='-.', label='Temperature')
myax.set_xlabel('Time')
myax.set_ylabel('Temperature (degC)')
myax.set_title('Buoy 46042 Data')
myax.grid(True)
# format x axis labels
myax.xaxis.set_major_locator(DayLocator())
myax.xaxis.set_major_formatter(DateFormatter('%b %d'))
myax.legend(loc='upper left');
myfig
3. Multiple y-axes
What if we wanted to plot another variable in vastly different units on our plot?
Let's return to our wind data plot and add pressure.
# plot pressure data on same figure
ax.plot(df.time, df.pressure, color='black', label='Pressure')
ax.set_ylabel('Pressure')
ax.legend(loc='upper left')
fig
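That is less than ideal. Sea-level pressure is on the order of 1000 hPa while wind speeds are on the order of 10 m/s, so sharing one y-axis squashes the wind lines flat and we can't see any detail in them. Instead, we can create a twin of the x-axis and have a secondary y-axis on the right side of the plot. We'll create a totally new figure here.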
fig, ax = plt.subplots(figsize=(10, 6))
axb = ax.twinx()
# Same as above
ax.set_xlabel('Time')
ax.set_ylabel('Speed (m/s)')
ax.set_title('Buoy Data')
ax.grid(True)
# Plotting on the first y-axis
ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')
ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--', label='Wind Gust')
ax.legend(loc='upper left');
# Plotting on the second y-axis
axb.set_ylabel('Pressure (hPa)')
axb.plot(df.time, df.pressure, color='black', label='pressure')
ax.xaxis.set_major_locator(DayLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b %d'))
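twinx creates a second Axes that shares the x-axis with the original but has its own, independently scaled y-axis drawn on the right. A small sketch with made-up wind and pressure values (using the off-screen Agg backend, which you can omit in Jupyter) shows that each y-axis autoscales to its own data while the x-limits stay linked.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))
axb = ax.twinx()  # new Axes sharing ax's x-axis, with its own y-axis on the right

# Made-up values on very different scales
ax.plot([0, 1, 2], [5.0, 7.5, 6.0], color='tab:orange')       # wind speed, m/s
axb.plot([0, 1, 2], [1012.0, 1009.5, 1011.0], color='black')  # pressure, hPa

# Each y-axis autoscales to its own data...
wind_ylim = ax.get_ylim()
pressure_ylim = axb.get_ylim()
print(wind_ylim, pressure_ylim)

# ...while the x-axis is shared between the two Axes
print(ax.get_shared_x_axes().joined(ax, axb))  # True
plt.close(fig)
```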
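We're closer, but the pressure line is drawn over the legend and is missing from it. That's because the legend belongs to our primary y-axis (ax), while the pressure line lives on the twin axis (axb), which is drawn on top. We need to collect the entries from both axes into one legend and attach it to axb so it sits above all of the lines.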
fig, ax = plt.subplots(figsize=(10, 6))
axb = ax.twinx()
# Same as above
ax.set_xlabel('Time')
ax.set_ylabel('Speed (m/s)')
ax.set_title('Buoy 46042 Data')
ax.grid(True)
# Plotting on the first y-axis
ax.plot(df.time, df.wind_speed, color='tab:orange', label='Windspeed')
ax.plot(df.time, df.wind_gust, color='tab:olive', linestyle='--', label='Wind Gust')
# Plotting on the second y-axis
axb.set_ylabel('Pressure (hPa)')
axb.plot(df.time, df.pressure, color='black', label='pressure')
ax.xaxis.set_major_locator(DayLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b %d'))
# Handling of getting lines and labels from all axes for a single legend
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = axb.get_legend_handles_labels()
axb.legend(lines + lines2, labels + labels2, loc='upper left');
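The handle-and-label concatenation above generalizes to any number of twinned axes. The sketch below wraps it in a small function; combined_legend is a hypothetical helper name (not a Matplotlib API), and the data are made up. As in the cell above, the legend is attached to the twin axes so that it is drawn above the lines of both axes.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; omit this line inside Jupyter
import matplotlib.pyplot as plt


def combined_legend(target_ax, *axes, **legend_kwargs):
    """Draw a single legend on target_ax with the entries from every Axes in axes.

    Hypothetical helper for illustration; it is not part of Matplotlib.
    """
    handles, labels = [], []
    for a in axes:
        ax_handles, ax_labels = a.get_legend_handles_labels()
        handles.extend(ax_handles)
        labels.extend(ax_labels)
    return target_ax.legend(handles, labels, **legend_kwargs)


fig, ax = plt.subplots(figsize=(10, 6))
axb = ax.twinx()

# Made-up wind and pressure values
ax.plot([0, 1, 2], [5.0, 7.5, 6.0], color='tab:orange', label='Windspeed')
ax.plot([0, 1, 2], [7.0, 9.0, 8.5], color='tab:olive', linestyle='--', label='Wind Gust')
axb.plot([0, 1, 2], [1012.0, 1009.5, 1011.0], color='black', label='Pressure')

# Attach the legend to the twin axes so it is drawn above the lines of both axes
leg = combined_legend(axb, ax, axb, loc='upper left')
print([text.get_text() for text in leg.get_texts()])  # ['Windspeed', 'Wind Gust', 'Pressure']
plt.close(fig)
```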
Now create your own figure that includes:
- A blue line representing the wave height measurements.
- A green line representing wind speed on a secondary y-axis.
- Proper labels/title.
- **Bonus**: Make the wave height data plot as points only with no line. Look at the documentation for the linestyle and marker arguments.
# YOUR CODE GOES HERE
# %load solutions/adv_plot.py
myfig, myax = plt.subplots(figsize=(10, 6))
myaxb = myax.twinx()
# Same as above
myax.set_xlabel('Time')
myax.set_ylabel('Wave Height (m)')
myax.set_title('Buoy Data')
myax.grid(True)
# Plotting on the first y-axis
myax.plot(df.time, df.wave_height, color='tab:blue', label='Waveheight (m)',
          linestyle='None', marker='o')
# Plotting on the second y-axis
myaxb.set_ylabel('Windspeed (m/s)')
myaxb.plot(df.time, df.wind_speed, color='tab:green', label='Windspeed (m/s)')
myax.xaxis.set_major_locator(DayLocator())
myax.xaxis.set_major_formatter(DateFormatter('%b %d'))
# Handling of getting lines and labels from all axes for a single legend
mylines, mylabels = myax.get_legend_handles_labels()
mylines2, mylabels2 = myaxb.get_legend_handles_labels()
# Attach the legend to the twin axes so the wind speed line is not drawn over it
myaxb.legend(mylines + mylines2, mylabels + mylabels2, loc='upper left');
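The bonus works because linestyle='None' suppresses the connecting line while marker='o' still draws a circle at every data point. A minimal sketch on made-up wave heights (with the off-screen Agg backend, which you can omit in Jupyter) confirms what the resulting line object stores; markersize is an optional tweak not used in the solution above.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# linestyle='None' removes the connecting line; marker='o' draws a circle per point
points, = ax.plot([0, 1, 2], [1.2, 1.5, 1.3], color='tab:blue',
                  linestyle='None', marker='o', markersize=4, label='Wave Height')

print(points.get_linestyle(), points.get_marker())  # None o
plt.close(fig)
```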