siphon.catalog

Code to support reading and parsing catalog files from a THREDDS Data Server (TDS).

They help identify the latest dataset and find the proper URLs to access the data.
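
A minimal usage sketch (the catalog URL below is only an example; substitute the catalog you need):

    from siphon.catalog import TDSCatalog

    # Open a THREDDS client catalog (example URL)
    cat = TDSCatalog('https://thredds.ucar.edu/thredds/catalog/'
                     'grib/NCEP/GFS/Global_0p25deg/catalog.xml')

    # Datasets can be looked up by name or by position
    ds = cat.datasets[0]
    print(ds.name)
    print(ds.access_urls)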

class siphon.catalog.CaseInsensitiveDict(*args, **kwargs)[source]

Extend dict to use a case-insensitive key set.

__init__(*args, **kwargs)[source]

Create a dict with a set of lowercase keys.

pop(key, *args, **kwargs)[source]

Remove and return the value associated with case-insensitive key.

class siphon.catalog.CaseInsensitiveStr(*args)[source]

Extend str to use case-insensitive comparison and lookup.

__init__(*args)[source]

Create str with a _lowered property.

class siphon.catalog.CatalogRef(base_url, element_node)[source]

An object for holding catalog references obtained from a THREDDS Client Catalog.

name

The name of the CatalogRef element

Type:

str

href

url to the CatalogRef’s THREDDS Client Catalog

Type:

str

title

Title of the CatalogRef element

Type:

str

__init__(base_url, element_node)[source]

Initialize the catalogRef object.

Parameters:
  • base_url (str) – URL to the base catalog that owns this reference

  • element_node (Element) – An Element representing a catalogRef node

follow()[source]

Follow the catalog reference and return a new TDSCatalog.

Returns:

The referenced catalog

Return type:

TDSCatalog
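
A short sketch of following a reference into a nested catalog (the catalog URL is illustrative):

    from siphon.catalog import TDSCatalog

    cat = TDSCatalog('https://thredds.ucar.edu/thredds/catalog.xml')  # example URL
    ref = cat.catalog_refs[0]      # catalog_refs supports positional and title-based lookup
    sub_catalog = ref.follow()     # returns a new TDSCatalog for the referenced catalog
    print(list(sub_catalog.catalog_refs))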

class siphon.catalog.CompoundService(service_node)[source]

Hold information about compound services.

name

The name of the compound service

Type:

str

service_type

The service type (for this object, service type will always be “COMPOUND”)

Type:

str

services

A list of SimpleService objects

Type:

list[SimpleService]

__init__(service_node)[source]

Initialize a CompoundService object.

Parameters:

service_node (Element) – An Element representing a compound service node

is_resolver()[source]

Return whether the service is a resolver service.

For a compound service, this is always False because it will never be a resolver.

class siphon.catalog.Dataset(element_node, catalog_url='')[source]

An object for holding Datasets obtained from a THREDDS Client Catalog.

name

The name of the Dataset element

Type:

str

url_path

url to the accessible dataset

Type:

str

access_urls

A dictionary of access urls whose keywords are the access service types defined in the catalog (for example, “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

Type:

CaseInsensitiveDict[str, str]

__init__(element_node, catalog_url='')[source]

Initialize the Dataset object.

Parameters:
  • element_node (Element) – An Element representing a Dataset node

  • catalog_url (str) – The top level server url

access_with_service(service, use_xarray=None)[source]

Access the dataset using a particular service.

Return a Python object capable of communicating with the server using the particular service. For instance, for ‘HTTPServer’ this is a file-like object capable of HTTP communication; for ‘OPENDAP’ this is a netCDF4 dataset.

Parameters:

service (str) – The name of the service for accessing the dataset

Return type:

An instance appropriate for communicating using service.
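
A brief sketch, assuming ds is a Dataset from a catalog that enables both services named below:

    # File-like object for raw HTTP access
    fobj = ds.access_with_service('HTTPServer')

    # netCDF4-like dataset object via OPENDAP
    nc = ds.access_with_service('OPENDAP')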

add_access_element_info(access_element)[source]

Create an access method from a catalog element.

download(filename=None)[source]

Download the dataset to a local file.

Parameters:

filename (str, optional) – The full path to which the dataset will be saved
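
For example (the target path is illustrative):

    # With no filename, the dataset's own name is used for the local file
    ds.download()

    # Or save to an explicit path
    ds.download('/tmp/my_data.nc')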

make_access_urls(catalog_url, all_services, metadata=None)[source]

Make fully qualified urls for the access methods enabled on the dataset.

Parameters:
  • catalog_url (str) – The top level server url

  • all_services (list[SimpleService]) – The services listed in the catalog

  • metadata (dict, optional) – Metadata from the catalog

remote_access(service=None, use_xarray=None)[source]

Access the remote dataset.

Open the remote dataset and get a netCDF4-compatible Dataset object providing index-based subsetting capabilities.

Parameters:

service (str, optional) – The name of the service to use for access to the dataset, either ‘CdmRemote’ or ‘OPENDAP’. Defaults to ‘CdmRemote’.

Returns:

Object for netCDF4-like access to the dataset

Return type:

Dataset
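
A short sketch of opening a remote dataset (service availability depends on the catalog):

    # Default service (CdmRemote), or request OPENDAP explicitly
    nc = ds.remote_access()
    nc_dap = ds.remote_access('OPENDAP')
    print(list(nc.variables))

    # Passing use_xarray=True returns an xarray-backed dataset instead (requires xarray)
    # xr_ds = ds.remote_access(use_xarray=True)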

remote_open(mode='b', encoding='ascii', errors='ignore')[source]

Open the remote dataset for random access.

Get a file-like object for reading from the remote dataset, providing random access, similar to a local file.

Parameters:
  • mode (‘b’ or ‘t’, optional) – Mode with which to open the remote data; ‘b’ for binary, ‘t’ for text. Defaults to ‘b’.

  • encoding (str, optional) – If mode is text, the encoding to use to decode the binary data into text. Defaults to ‘ascii’.

  • errors (str, optional) – If mode is text, the error handling behavior to pass to bytes.decode. Defaults to ‘ignore’.

Returns:

fobj – A random access, file-like object for reading data

Return type:

file-like object
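
For example, assuming ds is a Dataset that enables the HTTPServer service:

    # Read the first few bytes without downloading the whole file
    fobj = ds.remote_open()
    header = fobj.read(4)

    # Or read as text (encoding here is illustrative)
    tobj = ds.remote_open(mode='t', encoding='utf-8')
    first_line = tobj.readline()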

resolve_url(catalog_url)[source]

Resolve the url of the dataset when reading latest.xml.

Parameters:

catalog_url (str) – The catalog url to be resolved

subset(service=None)[source]

Subset the dataset.

Open the remote dataset and get a client for talking to service.

Parameters:

service (str, optional) – The name of the service for subsetting the dataset. Defaults to ‘NetcdfSubset’ or ‘NetcdfServer’, in that order, depending on the services listed in the catalog.

Return type:

a client for communicating using service
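
A sketch of a NetcdfSubset request, assuming the service is enabled; the bounding box and variable name are placeholders:

    from datetime import datetime

    ncss = ds.subset()
    query = ncss.query()
    query.lonlat_box(north=45, south=40, east=-90, west=-100)
    query.time(datetime.utcnow())
    query.variables('Temperature_surface')   # dataset-specific variable name
    data = ncss.get_data(query)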

class siphon.catalog.DatasetCollection[source]

Extend IndexableMapping to allow datetime-based filter queries.

Indexing works like a dictionary. The dataset name (‘my_data.nc’, a string) is the key, and the value returned is an instance of Dataset. Positional indexing (e.g., [0]) is another valid method of indexing.

DatasetCollection is commonly encountered as the datasets attribute of a TDSCatalog. If a regex in filter_time_nearest or filter_time_range does not provide sufficient flexibility, or the TDSCatalog does not provide accurate times, iterating over datasets can be useful as part of implementing a custom filter. For example, in for ds in catalog.datasets: print(ds), ds will be the dataset name, which can then be used to look up the dataset and implement further filtering logic, as sketched below.
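
A minimal sketch of such a custom filter (the name pattern is purely illustrative):

    import re

    # Keep only datasets whose names contain an hour-like token such as '_1200'
    names = [name for name in catalog.datasets if re.search(r'_\d{4}', name)]
    for name in names:
        ds = catalog.datasets[name]
        print(ds.name, ds.url_path)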

filter_time_nearest(time, regex=None, strptime=None)[source]

Filter keys for an item closest to the desired time.

Loops over all keys in the collection and uses regex to extract and build datetimes. The collection of datetimes is compared to time, and the value whose datetime is closest to that requested is returned. If none of the keys in the collection match the regex, indicating that the keys are not date/time-based, a ValueError is raised.

Parameters:
  • time (datetime.datetime) – The desired time

  • regex (str, optional) – The regular expression used to extract date/time information from the key. If given, this should contain either (1) named groups ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, and ‘microsecond’, as appropriate (when a match is found, any of those groups missing from the pattern is assigned a value of 0; the default pattern looks for strings like 20171118_2356), or (2) a group named ‘strptime’ (e.g., r’_s(?P<strptime>\d{13})’ for GOES-16 data) to be parsed with strptime.

  • strptime (str, optional) – the format string that corresponds to regex option (2) above. For example, GOES-16 data with a julian date matching the regex above is parsed with ‘%Y%j%H%M%S’.

Returns:

The value with a time closest to that desired

Return type:

Dataset
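
For example, assuming the dataset names encode times matching the default pattern:

    from datetime import datetime

    ds = catalog.datasets.filter_time_nearest(datetime(2017, 11, 18, 23, 56))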

filter_time_range(start, end, regex=None, strptime=None)[source]

Filter keys for all items within the desired time range.

Loops over all keys in the collection and uses regex to extract and build datetimes. From the collection of datetimes, all values whose times fall between start and end (inclusive) are returned. If none of the keys in the collection match the regex, indicating that the keys are not date/time-based, a ValueError is raised.

Parameters:
  • start (datetime.datetime) – The start of the desired time range, inclusive

  • end (datetime.datetime) – The end of the desired time range, inclusive

  • regex (str, optional) – The regular expression used to extract date/time information from the key. If given, this should contain either (1) named groups ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, and ‘microsecond’, as appropriate (when a match is found, any of those groups missing from the pattern is assigned a value of 0; the default pattern looks for strings like 20171118_2356), or (2) a group named ‘strptime’ (e.g., r’_s(?P<strptime>\d{13})’ for GOES-16 data) to be parsed with strptime.

  • strptime (str, optional) – the format string that corresponds to regex option (2) above. For example, GOES-16 data with a julian date matching the regex above is parsed with ‘%Y%j%H%M%S’.

Returns:

All values corresponding to times within the specified range

Return type:

List[Dataset]
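
For example, to collect the datasets from the last six hours (assuming the names encode times matching the default pattern):

    from datetime import datetime, timedelta

    now = datetime.utcnow()
    recent = catalog.datasets.filter_time_range(now - timedelta(hours=6), now)
    for ds in recent:
        print(ds.name)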

class siphon.catalog.IndexableMapping[source]

Extend OrderedDict to allow index-based access to values.

class siphon.catalog.SimpleService(service_node)[source]

Hold information about an access service enabled on a dataset.

name

The name of the service

Type:

str

service_type

The service type (e.g., “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

Type:

str

access_urls

A dictionary of access urls whose keywords are the access service types defined in the catalog (for example, “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

Type:

dict[str, str]

__init__(service_node)[source]

Initialize the SimpleService object.

Parameters:

service_node (Element) – An Element representing a service node

is_resolver()[source]

Return whether the service is a resolver service.

class siphon.catalog.TDSCatalog(catalog_url)[source]

Parse information from a THREDDS Client Catalog.

catalog_url

The url path of the catalog to parse.

Type:

str

base_tds_url

The top level server address

Type:

str

datasets

A dictionary of Dataset objects, whose keys are the dataset names

Type:

DatasetCollection[str, Dataset]

services

A list of the SimpleService objects listed in the catalog

Type:

List

catalog_refs

A dictionary of CatalogRef objects, whose keys are the catalog reference titles.

Type:

DatasetCollection[str, CatalogRef]

__init__(catalog_url)[source]

Initialize the TDSCatalog object.

Parameters:

catalog_url (str) – The URL of a THREDDS client catalog

property latest

Get the latest dataset, if available.
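
For example, assuming cat is a TDSCatalog for a catalog that exposes a ‘latest’ resolver:

    latest_ds = cat.latest
    print(latest_ds.name)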

siphon.catalog.get_latest_access_url(catalog_url, access_method)[source]

Get the data access url to the latest data using a specified access method.

This is available for data served from a top level dataset catalog (URL). Currently only a single “latest” dataset is supported.

Parameters:
  • catalog_url (str) – The URL of a top level data catalog

  • access_method (str) – The desired data access method (e.g., “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

Returns:

access_url – Data access URL to be used to access the latest data available from a given catalog using the specified access_method. Typically a single string, but not always.

Return type:

str
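
A short sketch (the catalog URL is only an example, and the access method must be enabled on the latest dataset):

    from siphon.catalog import get_latest_access_url

    url = get_latest_access_url(
        'https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.xml',
        'OPENDAP')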