Siphon Overview
1. Use Siphon to Access a THREDDS Catalog
THREDDS is a server that provides remote access to datasets along with a variety of server-side services. THREDDS makes data access more uniform regardless of the on-disk format.
- Data Access Services:
- HTTP Download
- Web Mapping/Coverage Service (WMS/WCS)
- OPeNDAP
- NetCDF Subset Service
- CDMRemote
There is a server with real-time data set up at http://thredds.ucar.edu that we'll use to explore the capabilities of THREDDS and learn how to access data. Let's open that link and explore in the browser what's available on THREDDS. Explore the NEXRAD level 3 data specifically.
THREDDS Catalogs
- XML descriptions of data and metadata
- Access methods
- Easily processed with siphon.catalog.TDSCatalog
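To make "XML descriptions of data and metadata" concrete, here is a minimal sketch that parses a catalog document with only the standard library. The sample XML is a hypothetical, heavily trimmed catalog written for illustration (real catalogs carry much more metadata), and `summarize_catalog` is an illustrative helper, not part of Siphon. TDSCatalog does this kind of work for you.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down catalog document (illustration only).
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example catalog">
  <service name="all" serviceType="Compound" base="">
    <service name="http" serviceType="HTTPServer" base="/thredds/fileServer/"/>
    <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  </service>
  <dataset name="Example directory" ID="example">
    <dataset name="file_20240101_0000.nids" urlPath="example/file_20240101_0000.nids"/>
    <dataset name="file_20240101_0010.nids" urlPath="example/file_20240101_0010.nids"/>
  </dataset>
</catalog>
"""

# Namespace used by THREDDS catalog elements.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}


def summarize_catalog(xml_text):
    """Return (access services, dataset names) found in a catalog document."""
    root = ET.fromstring(xml_text.strip())
    # Keep only services that have a non-empty base path (skips the compound wrapper).
    services = {
        s.get("serviceType"): s.get("base")
        for s in root.iterfind(".//t:service", NS)
        if s.get("base")
    }
    # Keep only datasets that point at an actual file (they carry a urlPath).
    names = [d.get("name") for d in root.iterfind(".//t:dataset", NS) if d.get("urlPath")]
    return services, names


services, dataset_names = summarize_catalog(SAMPLE_CATALOG)
print(services)
print(dataset_names)
```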
from datetime import datetime, timedelta
from siphon.catalog import TDSCatalog
Let's get data from yesterday at this time. We'll use the timedelta object to do this in an easy way.
date = datetime.utcnow() - timedelta(days=1)
print(date)
We'll then go find the URL for the level 3 radar data. Let's get the N0Q (tilt 1 base reflectivity) product for the LRX radar. Notice that we change the html extension to xml. Siphon will do that for us, but it issues a warning.
cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/nexrad/level3/'
f'N0Q/LRX/{date:%Y%m%d}/catalog.xml')
cat.datasets
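The URL above is built with an f-string date format, and the catalog page seen in the browser ends in html rather than xml. The sketch below shows both steps with a fixed date so the result is reproducible. The helper names `level3_catalog_url` and `to_xml_url` are hypothetical and exist only for this illustration; they are not Siphon functions.

```python
from datetime import datetime


def level3_catalog_url(date, product="N0Q", site="LRX"):
    """Build the level 3 catalog URL for one product, radar site, and day."""
    base = "http://thredds.ucar.edu/thredds/catalog/nexrad/level3"
    # {date:%Y%m%d} formats the datetime as e.g. 20240305
    return f"{base}/{product}/{site}/{date:%Y%m%d}/catalog.xml"


def to_xml_url(url):
    """Swap a trailing .html for .xml, leaving other URLs unchanged."""
    suffix = ".html"
    if url.endswith(suffix):
        return url[: -len(suffix)] + ".xml"
    return url


fixed_date = datetime(2024, 3, 5, 18, 30)
url = level3_catalog_url(fixed_date)
print(url)
print(to_xml_url("http://thredds.ucar.edu/thredds/catalog/catalog.html"))
```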
2. Filtering Data
We could manually look through the list above, figure out which dataset we want, and generate its name (or index). Siphon provides some helpers to simplify this process, provided the dataset names follow a pattern that includes the timestamp:
request_time = date.replace(hour=18, minute=30, second=0, microsecond=0)
ds = cat.datasets.filter_time_nearest(request_time)
ds
We can also find the list of datasets within a time range:
datasets = cat.datasets.filter_time_range(request_time, request_time + timedelta(hours=1))
print(datasets)
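To see why the timestamp must appear in the dataset name, here is a rough sketch of the idea behind these helpers: pull a time out of each name with a regular expression, then compare times. This is not Siphon's implementation. The regular expression, the helper names, the example file names, and the choice of an inclusive time range are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed name pattern: YYYYMMDD_HHMM somewhere in the file name.
TIME_RE = re.compile(
    r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})_(?P<hour>\d{2})(?P<minute>\d{2})"
)


def name_time(name):
    """Return the datetime embedded in a dataset name, or None if absent."""
    match = TIME_RE.search(name)
    if match is None:
        return None
    return datetime(**{key: int(val) for key, val in match.groupdict().items()})


def nearest(names, when):
    """Return the name whose embedded time is closest to `when`."""
    timed = [(name_time(n), n) for n in names]
    timed = [(t, n) for t, n in timed if t is not None]
    return min(timed, key=lambda pair: abs(pair[0] - when))[1]


def in_range(names, start, end):
    """Return names whose embedded time lies in [start, end] (inclusive here)."""
    result = []
    for n in names:
        t = name_time(n)
        if t is not None and start <= t <= end:
            result.append(n)
    return result


# Hypothetical dataset names, for illustration only.
names = [
    "Level3_LRX_N0Q_20240305_1820.nids",
    "Level3_LRX_N0Q_20240305_1826.nids",
    "Level3_LRX_N0Q_20240305_1833.nids",
    "Level3_LRX_N0Q_20240305_1915.nids",
    "Level3_LRX_N0Q_20240305_1940.nids",
]
when = datetime(2024, 3, 5, 18, 30)
closest = nearest(names, when)
window = in_range(names, when, when + timedelta(hours=1))
print(closest)
print(window)
```

A name without a recognizable timestamp is simply skipped in this sketch, which is why the helpers only work when the naming pattern is consistent.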
- Starting from http://thredds.ucar.edu/, find the level 2 radar data for the Tulsa, OK radar (KINX) for the previous day.
- Grab the URL and create a TDSCatalog instance.
- Using Siphon, find the data available in the catalog between 12Z and 18Z on the previous day.
# YOUR CODE GOES HERE
# %load solutions/datasets.py
# Solution from above in case you had trouble
date = datetime.utcnow() - timedelta(days=1)
cat = TDSCatalog(f'https://thredds.ucar.edu/thredds/catalog/nexrad/level2/KINX/{date:%Y%m%d}/catalog.xml')
request_time = date.replace(hour=12, minute=0, second=0, microsecond=0)
datasets = cat.datasets.filter_time_range(request_time, request_time + timedelta(hours=6))
print(datasets)
3. Use Siphon to Perform Remote Data Access
Accessing catalogs is only part of the story; Siphon is much more useful if you're trying to access/download datasets.
For instance, using our data that we just retrieved:
# Solution from above in case you had trouble
date = datetime.utcnow() - timedelta(days=1)
cat = TDSCatalog(f'https://thredds.ucar.edu/thredds/catalog/nexrad/level2/KINX/{date:%Y%m%d}/catalog.xml')
request_time = date.replace(hour=12, minute=0, second=0, microsecond=0)
datasets = cat.datasets.filter_time_range(request_time, request_time + timedelta(hours=6))
ds = datasets[0]
We can ask Siphon to download the file locally:
ds.download()
Look in your file explorer panel or run the cell below to verify that we did actually download the file!
import os
os.listdir()
Or better yet, get a file-like object that lets us read from the file as if it were local:
fobj = ds.remote_open()
data = fobj.read()
print(len(data))
This is handy if you have Python code to read a particular format.
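As a small illustration of that point, a reader written against the file-like interface does not care where the bytes come from. In the sketch below, `read_magic` is a hypothetical helper and an in-memory `io.BytesIO` with made-up bytes stands in for the object returned by `ds.remote_open()`, so it runs without a network connection.

```python
import io


def read_magic(fobj, size=4):
    """Return the first `size` bytes of a file-like object and its total length."""
    head = fobj.read(size)
    rest = fobj.read()  # read whatever remains after the header bytes
    return head, len(head) + len(rest)


# Stand-in for a remote file: 4 made-up header bytes followed by 12 zero bytes.
fake_remote = io.BytesIO(b"DEMO" + bytes(12))
magic, total = read_magic(fake_remote)
print(magic, total)
```

The same function would accept a file opened locally with `open(path, 'rb')`, since it only relies on `read()`.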
It's also possible to get access to the file through services that provide netCDF4-like access, but for the remote file. This access allows downloading information only for variables of interest, or for (index-based) subsets of that data:
nc = ds.remote_access()
By default this uses CDMRemote (if available), but it's also possible to ask for OPeNDAP (using netCDF4-python). There is even xarray support, which works well with the declarative plotting interface; more on that later.
print(list(nc.variables))