siphon.catalog

Code to support reading and parsing catalog files from a THREDDS Data Server (TDS).

It helps identify the latest dataset and find the proper URLs to access the data.

class siphon.catalog.CaseInsensitiveDict(*args, **kwargs)[source]

Extend dict to use a case-insensitive key set.

__init__(*args, **kwargs)[source]

Create a dict with a set of lowercase keys.

pop(key, *args, **kwargs)[source]

Remove and return the value associated with case-insensitive key.
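
The idea of a dict with a lowercase key set can be sketched with the standard library alone. This is a minimal illustration, not siphon's implementation; the class name LowerKeyDict and the example URL are hypothetical.

```python
class LowerKeyDict(dict):
    """Hypothetical stand-in for a dict with case-insensitive keys."""

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Route every initial item through __setitem__ so keys are lowercased.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __contains__(self, key):
        return super().__contains__(key.lower())

    def pop(self, key, *args):
        # Lowercase the key, then defer to dict.pop (an optional default may follow).
        return super().pop(key.lower(), *args)


# Placeholder URL, for illustration only.
urls = LowerKeyDict({'OPENDAP': 'https://example.invalid/dodsC/data.nc'})
same_value = urls['opendap'] == urls['OpenDAP']
```

With this shape, lookups, membership tests, and pop all ignore the case of the key the caller supplies.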

class siphon.catalog.CaseInsensitiveStr(*args)[source]

Extend str to use case-insensitive comparison and lookup.

__init__(*args)[source]

Create str with a _lowered property.
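
A str subclass that caches a lowercase copy could look roughly like the sketch below. It is an assumption about the general technique, not siphon's code; the name LoweredStr is hypothetical. Note that __ne__ and __hash__ have to be overridden together with __eq__, otherwise the inherited str versions stay case-sensitive.

```python
class LoweredStr(str):
    """Hypothetical str subclass that compares and hashes case-insensitively."""

    def __init__(self, *args):
        # The string value itself is set by str.__new__; here we only cache
        # the lowercase form used for comparison and hashing.
        self._lowered = str.lower(self)

    def __eq__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        return self._lowered == other.lower()

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally, so hash the lowercase form.
        return hash(self._lowered)


service = LoweredStr('OPENDAP')
matches = service == 'opendap'
```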

class siphon.catalog.CatalogRef(base_url, element_node)[source]

An object for holding catalog references obtained from a THREDDS Client Catalog.

name

str – The name of the CatalogRef element

href

str – url to the CatalogRef’s THREDDS Client Catalog

title

str – Title of the CatalogRef element

__init__(base_url, element_node)[source]

Initialize the catalogRef object.

Parameters:
  • base_url (str) – URL to the base catalog that owns this reference
  • element_node (Element) – An Element representing a catalogRef node
follow()[source]

Follow the catalog reference and return a new TDSCatalog.

Returns:The referenced catalog
Return type:TDSCatalog
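
As a rough illustration of how a reference's href relates to base_url, the sketch below resolves a relative href against the owning catalog's URL with standard URL joining. Whether siphon resolves references exactly this way is an assumption, and both URLs are placeholders.

```python
from urllib.parse import urljoin

# Hypothetical URLs, for illustration only.
base_url = 'https://example.invalid/thredds/catalog/grib/catalog.xml'

# A relative href replaces the last path segment of the base catalog URL.
relative = urljoin(base_url, 'GFS/catalog.xml')

# An href starting with "/" is resolved against the server root instead.
rooted = urljoin(base_url, '/thredds/catalog/other/catalog.xml')
```
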
class siphon.catalog.CompoundService(service_node)[source]

Hold information about compound services.

name

str – The name of the compound service

service_type

str – The service type (for this object, service type will always be “COMPOUND”)

services

list[SimpleService] – A list of SimpleService objects

__init__(service_node)[source]

Initialize a CompoundService object.

Parameters:service_node (Element) – An Element representing a compound service node
is_resolver()[source]

Return whether the service is a resolver service.

For a compound service, this is always False because it will never be a resolver.

class siphon.catalog.Dataset(element_node, catalog_url='')[source]

An object for holding Datasets obtained from a THREDDS Client Catalog.

name

str – The name of the Dataset element

url_path

str – url to the accessible dataset

access_urls

CaseInsensitiveDict[str, str] – A dictionary of access urls whose keywords are the access service types defined in the catalog (for example, “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

__init__(element_node, catalog_url='')[source]

Initialize the Dataset object.

Parameters:
  • element_node (Element) – An Element representing a Dataset node
  • catalog_url (str) – The top level server url
access_with_service(service, use_xarray=None)[source]

Access the dataset using a particular service.

Return a Python object capable of communicating with the server using the particular service. For instance, for ‘HTTPServer’ this is a file-like object capable of HTTP communication; for ‘OPENDAP’ this is a netCDF4 dataset.

Parameters:service (str) – The name of the service for accessing the dataset
Returns:An instance appropriate for communicating using service.
add_access_element_info(access_element)[source]

Create an access method from a catalog element.

download(filename=None)[source]

Download the dataset to a local file.

Parameters:filename (str, optional) – The full path to which the dataset will be saved
make_access_urls(catalog_url, all_services, metadata=None)[source]

Make fully qualified urls for the access methods enabled on the dataset.

Parameters:
remote_access(service=None, use_xarray=None)[source]

Access the remote dataset.

Open the remote dataset and get a netCDF4-compatible Dataset object providing index-based subsetting capabilities.

Parameters:service (str, optional) – The name of the service to use for access to the dataset, either ‘CdmRemote’ or ‘OPENDAP’. Defaults to ‘CdmRemote’.
Returns:Object for netCDF4-like access to the dataset
Return type:Dataset
remote_open()[source]

Open the remote dataset for random access.

Get a file-like object for reading from the remote dataset, providing random access, similar to a local file.

Returns:A random access, file-like object
resolve_url(catalog_url)[source]

Resolve the url of the dataset when reading latest.xml.

Parameters:catalog_url (str) – The catalog url to be resolved
subset(service=None)[source]

Subset the dataset.

Open the remote dataset and get a client for talking to service.

Parameters:service (str, optional) – The name of the service for subsetting the dataset. Defaults to ‘NetcdfSubset’ or ‘NetcdfServer’, in that order, depending on the services listed in the catalog.
Returns:A client for communicating using service
class siphon.catalog.DatasetCollection[source]

Extend IndexableMapping to allow datetime-based filter queries.

filter_time_nearest(time, regex=None)[source]

Filter keys for an item closest to the desired time.

Loops over all keys in the collection and uses regex to extract and build datetimes. The collection of datetimes is compared to time, and the value whose datetime is closest to the requested time is returned. If none of the keys in the collection match the regex, indicating that the keys are not date/time-based, a ValueError is raised.

Parameters:
  • time (datetime.datetime) – The desired time
  • regex (str, optional) – The regular expression to use to extract date/time information from the key. If given, this should contain named groups: ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, and ‘microsecond’, as appropriate. When a match is found, any of those groups missing from the pattern will be assigned a value of 0. The default pattern looks for patterns like: 20171118_2356.
Returns:The value with a time closest to that desired
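
The procedure described above can be sketched with the standard library. This is an illustration only: the pattern is an assumed regex for names like 20171118_2356 (not necessarily the library's default), and the names PATTERN, key_to_datetime, and nearest are hypothetical.

```python
import re
from datetime import datetime

# Assumed pattern for keys such as "radar_20171118_2356.nc".
PATTERN = re.compile(r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})'
                     r'_(?P<hour>\d{2})(?P<minute>\d{2})')
FIELDS = ('year', 'month', 'day', 'hour', 'minute', 'second', 'microsecond')


def key_to_datetime(key, pattern=PATTERN):
    """Return the datetime encoded in key, or None if the pattern does not match."""
    match = pattern.search(key)
    if match is None:
        return None
    groups = match.groupdict()
    # Named groups missing from the pattern are assigned 0.
    return datetime(*(int(groups.get(name) or 0) for name in FIELDS))


def nearest(mapping, time, pattern=PATTERN):
    """Return the value whose key encodes the datetime closest to time."""
    dated = []
    for key, value in mapping.items():
        dt = key_to_datetime(key, pattern)
        if dt is not None:
            dated.append((dt, value))
    if not dated:
        raise ValueError('No keys matched the date/time pattern.')
    return min(dated, key=lambda item: abs(item[0] - time))[1]


# Made-up keys and values, for illustration only.
files = {
    'radar_20171118_2300.nc': 'a',
    'radar_20171118_2356.nc': 'b',
    'radar_20171119_0100.nc': 'c',
}
closest = nearest(files, datetime(2017, 11, 19, 0, 10))
```

Here the requested time is 14 minutes after the second key's time, which is closer than either neighbor, so the second value is selected.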

filter_time_range(start, end, regex=None)[source]

Filter keys for all items within the desired time range.

Loops over all keys in the collection and uses regex to extract and build datetimes. From the collection of datetimes, all values between start and end (inclusive) are returned. If none of the keys in the collection match the regex, indicating that the keys are not date/time-based, a ValueError is raised.

Parameters:
  • start (datetime.datetime) – The start of the desired time range, inclusive
  • end (datetime.datetime) – The end of the desired time range, inclusive
  • regex (str, optional) – The regular expression to use to extract date/time information from the key. If given, this should contain named groups: ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, and ‘microsecond’, as appropriate. When a match is found, any of those groups missing from the pattern will be assigned a value of 0. The default pattern looks for patterns like: 20171118_2356.
Returns:All values corresponding to times within the specified range
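
An inclusive range filter along the lines described above might be sketched as follows. The pattern is again an assumption, the function name in_range is hypothetical, and this simplified version expects every named group in the pattern to be matched.

```python
import re
from datetime import datetime

# Assumed pattern for keys such as "radar_20171118_2356.nc".
PATTERN = re.compile(r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})'
                     r'_(?P<hour>\d{2})(?P<minute>\d{2})')


def in_range(mapping, start, end, pattern=PATTERN):
    """Return values whose keys encode a datetime between start and end, inclusive."""
    matched = []
    for key, value in mapping.items():
        match = pattern.search(key)
        if match is not None:
            # Named groups map directly onto datetime keyword arguments;
            # second and microsecond are absent here and default to 0.
            parts = {name: int(text) for name, text in match.groupdict().items()}
            matched.append((datetime(**parts), value))
    if not matched:
        raise ValueError('No keys matched the date/time pattern.')
    return [value for dt, value in matched if start <= dt <= end]


# Made-up keys and values, for illustration only.
files = {
    'radar_20171118_2300.nc': 'a',
    'radar_20171118_2356.nc': 'b',
    'radar_20171119_0100.nc': 'c',
}
selected = in_range(files, datetime(2017, 11, 18, 23, 0), datetime(2017, 11, 18, 23, 59))
```

Because both bounds are inclusive, the key stamped exactly at the start time is part of the result.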

class siphon.catalog.IndexableMapping[source]

Extend OrderedDict to allow index-based access to values.
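
Index-based access on top of an ordered mapping can be illustrated with a small sketch. It assumes string keys and is not the library's implementation; the class name PositionalDict is hypothetical.

```python
from collections import OrderedDict


class PositionalDict(OrderedDict):
    """Hypothetical OrderedDict that also accepts an integer position."""

    def __getitem__(self, key):
        if isinstance(key, int):
            # Treat integers as positions into the insertion-ordered values.
            return list(self.values())[key]
        return super().__getitem__(key)


# Made-up entries, for illustration only.
datasets = PositionalDict([('first.nc', 'a'), ('second.nc', 'b')])
first = datasets[0]
last = datasets[-1]
by_name = datasets['second.nc']
```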

class siphon.catalog.SimpleService(service_node)[source]

Hold information about an access service enabled on a dataset.

name

str – The name of the service

service_type

str – The service type (e.g. “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

access_urls

dict[str, str] – A dictionary of access urls whose keywords are the access service types defined in the catalog (for example, “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

__init__(service_node)[source]

Initialize the SimpleService object.

Parameters:service_node (Element) – An Element representing a service node
is_resolver()[source]

Return whether the service is a resolver service.

class siphon.catalog.TDSCatalog(catalog_url)[source]

Parse information from a THREDDS Client Catalog.

catalog_url

str – The url path of the catalog to parse.

base_tds_url

str – The top level server address

datasets

DatasetCollection[str, Dataset] – A dictionary of Dataset objects whose keys are the datasets’ names

services

List – A list of the SimpleService objects listed in the catalog

catalog_refs

DatasetCollection[str, CatalogRef] – A dictionary of CatalogRef objects whose keys are the catalog refs’ titles

__init__(catalog_url)[source]

Initialize the TDSCatalog object.

Parameters:catalog_url (str) – The URL of a THREDDS client catalog
latest

Get the latest dataset, if available.

siphon.catalog.get_latest_access_url(catalog_url, access_method)[source]

Get the data access url to the latest data using a specified access method.

The latest data must be available from a top level dataset catalog (url). Currently only supports the existence of one “latest” dataset.

Parameters:
  • catalog_url (str) – The URL of a top level data catalog
  • access_method (str) – Desired data access method (e.g. “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)
Returns:

access_url – Data access URL to be used to access the latest data available from a given catalog using the specified access_method. Typically a single string, but not always.

Return type:

str
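
A comparable lookup can be written with the TDSCatalog.latest property and the Dataset.access_urls attribute documented above. The sketch below is a hedged usage example, not a replacement for this function: the helper name latest_access_url is hypothetical, the catalog URL must be supplied by the caller, and running it requires siphon to be installed and the server to be reachable.

```python
def latest_access_url(catalog_url, service='OPENDAP'):
    """Return the access URL of the latest dataset for one service type."""
    # Imported here so the sketch can be loaded without siphon installed.
    from siphon.catalog import TDSCatalog

    catalog = TDSCatalog(catalog_url)
    latest = catalog.latest  # the latest Dataset, if the catalog provides one
    # access_urls is documented as case-insensitive, keyed by service type.
    return latest.access_urls[service]
```

The service argument is one of the service types defined in the catalog; a KeyError would be expected if the dataset does not offer that service.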