siphon.catalog
Code to support reading and parsing catalog files from a THREDDS Data Server (TDS).
It helps identify the latest dataset and find the proper URLs to access the data.
- class siphon.catalog.CaseInsensitiveDict(*args, **kwargs)[source]
Extend dict to use a case-insensitive key set.
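The idea of a case-insensitive key set can be illustrated with a minimal sketch. The class below is a hypothetical stand-in (LowerKeyDict is an illustrative name, and the URL is a placeholder); it is not siphon's implementation and only assumes that string keys are normalized to lower case.

```python
class LowerKeyDict(dict):
    """Hypothetical stand-in: a dict that lower-cases string keys."""

    @staticmethod
    def _norm(key):
        # Only strings are normalized; other key types pass through unchanged.
        return key.lower() if isinstance(key, str) else key

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Route initial items through __setitem__ so they are normalized too.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        super().__setitem__(self._norm(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._norm(key))

    def __contains__(self, key):
        return super().__contains__(self._norm(key))

    def get(self, key, default=None):
        return super().get(self._norm(key), default)


# Placeholder URL; 'OPENDAP' is one of the service types named in this page.
urls = LowerKeyDict({'OPENDAP': 'https://example.invalid/dodsC/data.nc'})
opendap_url = urls['opendap']
```

With such a mapping, 'OPENDAP', 'OpenDAP' and 'opendap' all refer to the same entry.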
- class siphon.catalog.CaseInsensitiveStr(*args)[source]
Extend str to use case-insensitive comparison and lookup.
- class siphon.catalog.CatalogRef(base_url, element_node)[source]
An object for holding catalog references obtained from a THREDDS Client Catalog.
- name
The name of the CatalogRef element
- Type:
- href
URL to the CatalogRef’s THREDDS Client Catalog
- Type:
- title
Title of the CatalogRef element
- Type:
- follow()[source]
Follow the catalog reference and return a new TDSCatalog.
- Returns:
The referenced catalog
- Return type:
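A catalog reference usually carries a relative href that has to be combined with the URL of the catalog it came from before it can be followed. The sketch below shows that resolution step with the standard library only; the URLs are placeholders, and the use of urljoin is an assumption about how such a reference can be resolved, not a statement about siphon's internals.

```python
from urllib.parse import urljoin

# Placeholder values standing in for base_url and the href attribute.
base_url = 'https://thredds.example.invalid/thredds/catalog/catalog.xml'
relative_href = 'archive/catalog.xml'

# Resolve the relative reference against the parent catalog's URL.
resolved_href = urljoin(base_url, relative_href)
```

The resolved URL is what would then be handed to TDSCatalog to open the referenced catalog.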
- class siphon.catalog.CompoundService(service_node)[source]
Hold information about compound services.
- services
A list of SimpleService objects
- Type:
- __init__(service_node)[source]
Initialize a CompoundService object.
- class siphon.catalog.Dataset(element_node, catalog_url='')[source]
An object for holding Datasets obtained from a THREDDS Client Catalog.
- access_urls
A dictionary of access urls whose keywords are the access service types defined in the catalog (for example, “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)
- Type:
- access_with_service(service, use_xarray=None)[source]
Access the dataset using a particular service.
Return a Python object capable of communicating with the server using the particular service. For instance, for ‘HTTPServer’ this is a file-like object capable of HTTP communication; for OPENDAP this is a netCDF4 dataset.
- Parameters:
service (str) – The name of the service for accessing the dataset
- Return type:
An instance appropriate for communicating using service.
- download(filename=None)[source]
Download the dataset to a local file.
- Parameters:
filename (str, optional) – The full path to which the dataset will be saved
- make_access_urls(catalog_url, all_services, metadata=None)[source]
Make fully qualified urls for the access methods enabled on the dataset.
- Parameters:
catalog_url (str) – The top level server url
all_services (List[SimpleService]) – list of SimpleService objects associated with the dataset
metadata (dict) – Metadata from the TDSCatalog
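Building a fully qualified access URL amounts to combining the server part of the catalog URL with a service base path and the dataset's URL path. The following is a minimal sketch under stated assumptions: build_access_urls is a hypothetical helper, services are represented as a plain dict of service type to base path rather than SimpleService objects, and all URLs and paths are placeholders.

```python
from urllib.parse import urlparse, urlunparse


def build_access_urls(catalog_url, services, url_path):
    """Hypothetical helper: map each service type to a full access URL."""
    server = urlparse(catalog_url)
    urls = {}
    for service_type, base in services.items():
        # Keep scheme and host from the catalog URL; replace the path.
        path = base + url_path
        urls[service_type] = urlunparse((server.scheme, server.netloc, path, '', '', ''))
    return urls


access_urls = build_access_urls(
    'https://thredds.example.invalid/thredds/catalog/model/catalog.xml',
    {'OPENDAP': '/thredds/dodsC/', 'HTTPServer': '/thredds/fileServer/'},
    'model/run_20171118_0000.nc',
)
```

The resulting dictionary has the same shape as the access_urls attribute described above: keys are service types and values are URLs.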
- remote_access(service=None, use_xarray=None)[source]
Access the remote dataset.
Open the remote dataset and get a netCDF4-compatible Dataset object providing index-based subsetting capabilities.
- remote_open(mode='b', encoding='ascii', errors='ignore')[source]
Open the remote dataset for random access.
Get a file-like object for reading from the remote dataset, providing random access, similar to a local file.
- Parameters:
mode (‘b’ or ‘t’, optional) – Mode with which to open the remote data; ‘b’ for binary, ‘t’ for text. Defaults to ‘b’.
encoding (str, optional) – If mode is text, the encoding to use to decode the binary data into text. Defaults to ‘ascii’.
errors (str, optional) – If mode is text, the error handling behavior to pass to bytes.decode. Defaults to ‘ignore’.
- Returns:
fobj – A random access, file-like object for reading data
- Return type:
file-like object
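The effect of the default encoding and errors values in text mode can be seen directly with bytes.decode. This small sketch uses made-up bytes and does not contact any server; it only illustrates what ‘ascii’ combined with ‘ignore’ does to non-ASCII input.

```python
# Made-up payload containing one non-ASCII byte (0xb0, a Latin-1 degree sign).
raw = b'temp: 21\xb0C\n'

# With errors='ignore', bytes that cannot be decoded as ASCII are dropped.
as_text = raw.decode('ascii', 'ignore')

# With errors='replace', they become the Unicode replacement character instead.
as_text_replaced = raw.decode('ascii', 'replace')
```

Because undecodable bytes are silently dropped by default, a different encoding or errors value may be preferable when the remote text is not plain ASCII.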
- resolve_url(catalog_url)[source]
Resolve the url of the dataset when reading latest.xml.
- Parameters:
catalog_url (str) – The catalog url to be resolved
- subset(service=None)[source]
Subset the dataset.
Open the remote dataset and get a client for talking to service.
- Parameters:
service (str, optional) – The name of the service for subsetting the dataset. Defaults to ‘NetcdfSubset’ or ‘NetcdfServer’, in that order, depending on the services listed in the catalog.
- Return type:
a client for communicating using service
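The default service selection described for subset (prefer ‘NetcdfSubset’, then ‘NetcdfServer’, depending on what the catalog lists) can be sketched as follows. pick_subset_service is a hypothetical helper, the exception types are assumptions, and the URLs are placeholders; the sketch only demonstrates the ordering rule.

```python
def pick_subset_service(access_urls, service=None):
    """Hypothetical helper: choose the service name used for subsetting."""
    if service is None:
        # Try the documented defaults in order of preference.
        for candidate in ('NetcdfSubset', 'NetcdfServer'):
            if candidate in access_urls:
                return candidate
        raise RuntimeError('No subset service is listed for this dataset.')
    if service not in access_urls:
        raise ValueError(service + ' is not listed for this dataset.')
    return service


# Placeholder access URLs keyed by service type.
both = {'NetcdfServer': 'https://example.invalid/ncss2/d.nc',
        'NetcdfSubset': 'https://example.invalid/ncss/d.nc'}
only_server = {'NetcdfServer': 'https://example.invalid/ncss2/d.nc'}

chosen_default = pick_subset_service(both)
chosen_fallback = pick_subset_service(only_server)
```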
- class siphon.catalog.DatasetCollection[source]
Extend IndexableMapping to allow datetime-based filter queries.
Indexing works like a dictionary. The dataset name (‘my_data.nc’, a string) is the key, and the value returned is an instance of Dataset. Positional indexing (e.g., [0]) is another valid method of indexing.
DatasetCollection is commonly encountered as the datasets attribute of a TDSCatalog. If a regex in filter_time_nearest or filter_time_range does not provide sufficient flexibility, or the TDSCatalog does not provide accurate times, iterating over datasets can be useful as part of implementing a custom filter. For example, in for ds in catalog.datasets: print(ds), ds will be the dataset name, and ds can be used to implement further filtering logic.
- filter_time_nearest(time, regex=None, strptime=None)[source]
Filter keys for an item closest to the desired time.
Loops over all keys in the collection and uses regex to extract and build datetimes. The collection of datetimes is compared to time and the value that has a datetime closest to that requested is returned. If none of the keys in the collection match the regex, indicating that the keys are not date/time-based, a ValueError is raised.
- Parameters:
time (datetime.datetime) – The desired time
regex (str, optional) – The regular expression to use to extract date/time information from the key. If given, this should contain either (1) named groups ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, and ‘microsecond’, as appropriate, or (2) a group named ‘strptime’ (e.g., r’_s(?P<strptime>\d{13})’ for GOES-16 data) to be parsed with strptime. With option (1), when a match is found, any of those groups missing from the pattern will be assigned a value of 0. The default pattern looks for patterns like 20171118_2356.
strptime (str, optional) – the format string that corresponds to regex option (2) above. For example, GOES-16 data with a julian date matching the regex above is parsed with ‘%Y%j%H%M%S’.
- Returns:
The value with a time closest to that desired
- Return type:
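The named-group approach to finding the nearest time can be sketched with the standard library. Everything here is illustrative: PATTERN is an assumed regex for keys such as 20171118_2356 (not necessarily siphon's default), key_to_datetime and nearest_key are hypothetical helpers, and the file names are made up. The sketch returns the matching key rather than a Dataset value.

```python
import re
from datetime import datetime

# Assumed pattern with named groups; second and microsecond are left out.
PATTERN = re.compile(
    r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})_(?P<hour>\d{2})(?P<minute>\d{2})'
)


def key_to_datetime(key):
    """Build a datetime from the named groups, or return None if no match."""
    match = PATTERN.search(key)
    if match is None:
        return None
    parts = {name: int(value) for name, value in match.groupdict().items()}
    # Fields not present in the pattern keep datetime's default of 0.
    return datetime(**parts)


def nearest_key(keys, time):
    """Return the key whose extracted datetime is closest to time."""
    dated = [(key_to_datetime(key), key) for key in keys]
    dated = [(stamp, key) for stamp, key in dated if stamp is not None]
    if not dated:
        raise ValueError('No keys matched the date/time pattern.')
    return min(dated, key=lambda pair: abs(pair[0] - time))[1]


names = ['radar_20171118_2330.nc', 'radar_20171118_2356.nc', 'radar_20171119_0020.nc']
closest = nearest_key(names, datetime(2017, 11, 19, 0, 0))
```

Here the requested time is midnight on 2017-11-19, and the key stamped 23:56 on the previous day is only four minutes away, so it is the one selected.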
- filter_time_range(start, end, regex=None, strptime=None)[source]
Filter keys for all items within the desired time range.
Loops over all keys in the collection and uses regex to extract and build datetimes. From the collection of datetimes, all values within start and end (inclusive) are returned. If none of the keys in the collection match the regex, indicating that the keys are not date/time-based, a ValueError is raised.
- Parameters:
start (datetime.datetime) – The start of the desired time range, inclusive
end (datetime.datetime) – The end of the desired time range, inclusive
regex (str, optional) – The regular expression to use to extract date/time information from the key. If given, this should contain either (1) named groups ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, and ‘microsecond’, as appropriate, or (2) a group named ‘strptime’ (e.g., r’_s(?P<strptime>\d{13})’ for GOES-16 data) to be parsed with strptime. With option (1), when a match is found, any of those groups missing from the pattern will be assigned a value of 0. The default pattern looks for patterns like 20171118_2356.
strptime (str, optional) – the format string that corresponds to regex option (2) above. For example, GOES-16 data with a julian date matching the regex above is parsed with ‘%Y%j%H%M%S’.
- Returns:
All values corresponding to times within the specified range
- Return type:
List[Dataset]
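The ‘strptime’ option combined with an inclusive time range can be sketched in the same spirit. The regex and format string are the ones quoted in the parameter description; keys_in_range is a hypothetical helper, the file names are made up, and the sketch returns keys rather than Dataset values.

```python
import re
from datetime import datetime

# Regex and format string as quoted in the parameter description.
REGEX = re.compile(r'_s(?P<strptime>\d{13})')
FORMAT = '%Y%j%H%M%S'  # year, day of year, hour, minute, second


def keys_in_range(keys, start, end):
    """Return keys whose extracted time lies within [start, end]."""
    selected = []
    matched_any = False
    for key in keys:
        match = REGEX.search(key)
        if match is None:
            continue
        matched_any = True
        stamp = datetime.strptime(match.group('strptime'), FORMAT)
        if start <= stamp <= end:  # both bounds are inclusive
            selected.append(key)
    if not matched_any:
        raise ValueError('No keys matched the date/time pattern.')
    return selected


# Made-up names; day 296 of 2017 is 23 October.
names = ['abi_s20172961832189.nc', 'abi_s20172961837189.nc', 'abi_s20172961842189.nc']
window = keys_in_range(
    names,
    datetime(2017, 10, 23, 18, 35),
    datetime(2017, 10, 23, 18, 42, 18),
)
```

The last key is stamped exactly at the end bound (18:42:18) and is still returned, which is what "inclusive" means here.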
- class siphon.catalog.IndexableMapping[source]
Extend OrderedDict to allow index-based access to values.
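Index-based access on top of an ordered mapping can be sketched as below. IndexedDict is a hypothetical stand-in, not siphon's class, and it assumes that integer keys are always meant as positions while any other key is looked up normally.

```python
from collections import OrderedDict


class IndexedDict(OrderedDict):
    """Hypothetical stand-in: integers index values by insertion order."""

    def __getitem__(self, key):
        if isinstance(key, int):
            # Positional access, including negative indices.
            return list(self.values())[key]
        return super().__getitem__(key)


files = IndexedDict()
files['a.nc'] = 'first'
files['b.nc'] = 'second'

first_value = files[0]
last_value = files[-1]
by_name = files['b.nc']
```

One consequence of this design is that integers cannot be used as ordinary keys, which is acceptable when keys are dataset names.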
- class siphon.catalog.SimpleService(service_node)[source]
Hold information about an access service enabled on a dataset.
- access_urls
A dictionary of access urls whose keywords are the access service types defined in the catalog (for example, “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)
- class siphon.catalog.TDSCatalog(catalog_url)[source]
Parse information from a THREDDS Client Catalog.
- services
A list of SimpleService listed in the catalog
- Type:
List
- catalog_refs
A dictionary of CatalogRef objects whose keys are the titles of the catalog refs.
- Type:
- __init__(catalog_url)[source]
Initialize the TDSCatalog object.
- Parameters:
catalog_url (str) – The URL of a THREDDS client catalog
- property latest
Get the latest dataset, if available.
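To make the attributes above more concrete, the sketch below parses a tiny, hand-written catalog document with the standard library and pulls out service types, dataset names, and catalog reference titles. The XML content is invented for illustration, and this is not how TDSCatalog is implemented; it only assumes the THREDDS InvCatalog 1.0 and XLink namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the searches below.
NS = {
    't': 'http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0',
    'xlink': 'http://www.w3.org/1999/xlink',
}

# Invented catalog content: one service, one dataset, one catalog reference.
CATALOG_XML = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink">
  <service name="odap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="run_20171118_0000.nc" urlPath="model/run_20171118_0000.nc"/>
  <catalogRef xlink:href="archive/catalog.xml" xlink:title="archive" name=""/>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

service_types = [node.get('serviceType') for node in root.findall('t:service', NS)]
dataset_names = [node.get('name') for node in root.findall('t:dataset', NS)]
# Namespaced attributes are addressed with the '{uri}name' form.
ref_titles = [node.get('{%s}title' % NS['xlink']) for node in root.findall('t:catalogRef', NS)]
```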
- siphon.catalog.get_latest_access_url(catalog_url, access_method)[source]
Get the data access url to the latest data using a specified access method.
This applies to data available from a top-level dataset catalog (URL). Currently, only a single “latest” dataset is supported.
- Parameters:
- Returns:
access_url – Data access URL to be used to access the latest data available from a given catalog using the specified access_method. Typically a single string, but not always.
- Return type: