siphon.catalog

Code to support reading and parsing catalog files from a THREDDS Data Server (TDS).

They help identify the latest dataset and find the proper URLs to access the data.
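
A minimal usage sketch (the catalog URL below is only an example; substitute the catalog you need):

    from siphon.catalog import TDSCatalog

    # Open a THREDDS client catalog (example URL)
    cat = TDSCatalog('https://thredds.ucar.edu/thredds/catalog/'
                     'grib/NCEP/GFS/Global_0p25deg/catalog.xml')

    # Datasets can be looked up by name or by position
    ds = cat.datasets[0]
    print(ds.name)
    print(ds.access_urls)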

class siphon.catalog.CaseInsensitiveDict(*args, **kwargs)[source]

Extend dict to use a case-insensitive key set.

__init__(*args, **kwargs)[source]

Create a dict with a set of lowercase keys.

pop(key, *args, **kwargs)[source]

Remove and return the value associated with case-insensitive key.

class siphon.catalog.CaseInsensitiveStr(*args)[source]

Extend str to use case-insensitive comparison and lookup.

__init__(*args)[source]

Create str with a _lowered property.

class siphon.catalog.CatalogRef(base_url, element_node)[source]

An object for holding catalog references obtained from a THREDDS Client Catalog.

name

The name of the CatalogRef element

Type:

str

href

url to the CatalogRef’s THREDDS Client Catalog

Type:

str

title

Title of the CatalogRef element

Type:

str

__init__(base_url, element_node)[source]

Initialize the catalogRef object.

Parameters:
  • base_url (str) – URL to the base catalog that owns this reference

  • element_node (Element) – An Element representing a catalogRef node

follow()[source]

Follow the catalog reference and return a new TDSCatalog.

Returns:

The referenced catalog

Return type:

TDSCatalog
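
A short sketch of following a reference into a nested catalog (the catalog URL is illustrative):

    from siphon.catalog import TDSCatalog

    cat = TDSCatalog('https://thredds.ucar.edu/thredds/catalog.xml')  # example URL
    ref = cat.catalog_refs[0]      # catalog_refs supports positional and title-based lookup
    sub_catalog = ref.follow()     # returns a new TDSCatalog for the referenced catalog
    print(list(sub_catalog.catalog_refs))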

class siphon.catalog.CompoundService(service_node)[source]

Hold information about compound services.

name

The name of the compound service

Type:

str

service_type

The service type (for this object, service type will always be “COMPOUND”)

Type:

str

services

A list of SimpleService objects

Type:

list[SimpleService]

__init__(service_node)[source]

Initialize a CompoundService object.

Parameters:

service_node (Element) – An Element representing a compound service node

is_resolver()[source]

Return whether the service is a resolver service.

For a compound service, this is always False because it will never be a resolver.

class siphon.catalog.Dataset(element_node, catalog_url='')[source]

An object for holding Datasets obtained from a THREDDS Client Catalog.

name

The name of the Dataset element

Type:

str

url_path

url to the accessible dataset

Type:

str

access_urls

A dictionary of access urls whose keywords are the access service types defined in the catalog (for example, “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

Type:

CaseInsensitiveDict[str, str]

__init__(element_node, catalog_url='')[source]

Initialize the Dataset object.

Parameters:
  • element_node (Element) – An Element representing a Dataset node

  • catalog_url (str) – The top level server url

access_with_service(service, use_xarray=None)[source]

Access the dataset using a particular service.

Return a Python object capable of communicating with the server using the particular service. For instance, for ‘HTTPServer’ this is a file-like object capable of HTTP communication; for ‘OPENDAP’ this is a netCDF4 dataset.

Parameters:

service (str) – The name of the service for accessing the dataset

Return type:

An instance appropriate for communicating using service.
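
A brief sketch, assuming ds is a Dataset from a catalog that enables both services named below:

    # File-like object for raw HTTP access
    fobj = ds.access_with_service('HTTPServer')

    # netCDF4-like dataset object via OPENDAP
    nc = ds.access_with_service('OPENDAP')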

add_access_element_info(access_element)[source]

Create an access method from a catalog element.

download(filename=None)[source]

Download the dataset to a local file.

Parameters:

filename (str, optional) – The full path to which the dataset will be saved
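
For example (the target path is illustrative):

    # With no filename, the dataset's own name is used for the local file
    ds.download()

    # Or save to an explicit path
    ds.download('/tmp/my_data.nc')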

make_access_urls(catalog_url, all_services, metadata=None)[source]

Make fully qualified urls for the access methods enabled on the dataset.

Parameters:
  • catalog_url (str) – The top level server url

  • all_services (list[SimpleService]) – The services listed in the catalog

  • metadata (dict, optional) – Metadata from the catalog

remote_access(service=None, use_xarray=None)[source]

Access the remote dataset.

Open the remote dataset and get a netCDF4-compatible Dataset object providing index-based subsetting capabilities.

Parameters:

service (str, optional) – The name of the service to use for access to the dataset, either ‘CdmRemote’ or ‘OPENDAP’. Defaults to ‘CdmRemote’.

Returns:

Object for netCDF4-like access to the dataset

Return type:

Dataset
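
A short sketch of opening a remote dataset (service availability depends on the catalog):

    # Default service (CdmRemote), or request OPENDAP explicitly
    nc = ds.remote_access()
    nc_dap = ds.remote_access('OPENDAP')
    print(list(nc.variables))

    # Passing use_xarray=True returns an xarray-backed dataset instead (requires xarray)
    # xr_ds = ds.remote_access(use_xarray=True)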

remote_open(mode='b', encoding='ascii', errors='ignore')[source]

Open the remote dataset for random access.

Get a file-like object for reading from the remote dataset, providing random access, similar to a local file.

Parameters:
  • mode (‘b’ or ‘t’, optional) – Mode with which to open the remote data; ‘b’ for binary, ‘t’ for text. Defaults to ‘b’.

  • encoding (str, optional) – If mode is text, the encoding to use to decode the binary data into text. Defaults to ‘ascii’.

  • errors (str, optional) – If mode is text, the error handling behavior to pass to bytes.decode. Defaults to ‘ignore’.

Returns:

fobj – A random access, file-like object for reading data

Return type:

file-like object
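
For example, assuming ds is a Dataset that enables the HTTPServer service:

    # Read the first few bytes without downloading the whole file
    fobj = ds.remote_open()
    header = fobj.read(4)

    # Or read as text (encoding here is illustrative)
    tobj = ds.remote_open(mode='t', encoding='utf-8')
    first_line = tobj.readline()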

resolve_url(catalog_url)[source]

Resolve the url of the dataset when reading latest.xml.

Parameters:

catalog_url (str) – The catalog url to be resolved

subset(service=None)[source]

Subset the dataset.

Open the remote dataset and get a client for talking to service.

Parameters:

service (str, optional) – The name of the service for subsetting the dataset. Defaults to ‘NetcdfSubset’ or ‘NetcdfServer’, in that order, depending on the services listed in the catalog.

Return type:

a client for communicating using service
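
A sketch of a NetcdfSubset request, assuming the service is enabled; the bounding box and variable name are placeholders:

    from datetime import datetime

    ncss = ds.subset()
    query = ncss.query()
    query.lonlat_box(north=45, south=40, east=-90, west=-100)
    query.time(datetime.utcnow())
    query.variables('Temperature_surface')   # dataset-specific variable name
    data = ncss.get_data(query)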

class siphon.catalog.DatasetCollection[source]

Extend IndexableMapping to allow datetime-based filter queries.

Indexing works like a dictionary. The dataset name (‘my_data.nc’, a string) is the key, and the value returned is an instance of Dataset. Positional indexing (e.g., [0]) is another valid method of indexing.

DatasetCollection is commonly encountered as the datasets attribute of a TDSCatalog. If a regex in filter_time_nearest or filter_time_range does not provide sufficient flexibility, or the TDSCatalog does not provide accurate times, iterating over datasets can be useful as part of implementing a custom filter. For example, in for ds in catalog.datasets: print(ds), ds will be the dataset name, which can then be used to look up the dataset and implement further filtering logic, as sketched below.
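
A minimal sketch of such a custom filter (the name pattern is purely illustrative):

    import re

    # Keep only datasets whose names contain an hour-like token such as '_1200'
    names = [name for name in catalog.datasets if re.search(r'_\d{4}', name)]
    for name in names:
        ds = catalog.datasets[name]
        print(ds.name, ds.url_path)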

filter_time_nearest(time, regex=None, strptime=None)[source]

Filter keys for an item closest to the desired time.

Loops over all keys in the collection and uses regex to extract and build datetimes. The collection of datetimes is compared to time, and the value whose datetime is closest to that requested is returned. If none of the keys in the collection match the regex, indicating that the keys are not date/time-based, a ValueError is raised.

Parameters:
  • time (datetime.datetime) – The desired time

  • regex (str, optional) – The regular expression used to extract date/time information from the key. If given, this should contain either (1) named groups ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, and ‘microsecond’, as appropriate (when a match is found, any of those groups missing from the pattern is assigned a value of 0; the default pattern looks for strings like 20171118_2356), or (2) a group named ‘strptime’ (e.g., r’_s(?P<strptime>\d{13})’ for GOES-16 data) to be parsed with strptime.

  • strptime (str, optional) – the format string that corresponds to regex option (2) above. For example, GOES-16 data with a julian date matching the regex above is parsed with ‘%Y%j%H%M%S’.

Returns:

The value with a time closest to that desired

Return type:

Dataset
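
For example, assuming the dataset names encode times matching the default pattern:

    from datetime import datetime

    ds = catalog.datasets.filter_time_nearest(datetime(2017, 11, 18, 23, 56))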

filter_time_range(start, end, regex=None, strptime=None)[source]

Filter keys for all items within the desired time range.

Loops over all keys in the collection and uses regex to extract and build datetimes. From the collection of datetimes, all values whose times fall between start and end (inclusive) are returned. If none of the keys in the collection match the regex, indicating that the keys are not date/time-based, a ValueError is raised.

Parameters:
  • start (datetime.datetime) – The start of the desired time range, inclusive

  • end (datetime.datetime) – The end of the desired time range, inclusive

  • regex (str, optional) – The regular expression used to extract date/time information from the key. If given, this should contain either (1) named groups ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, and ‘microsecond’, as appropriate (when a match is found, any of those groups missing from the pattern is assigned a value of 0; the default pattern looks for strings like 20171118_2356), or (2) a group named ‘strptime’ (e.g., r’_s(?P<strptime>\d{13})’ for GOES-16 data) to be parsed with strptime.

  • strptime (str, optional) – the format string that corresponds to regex option (2) above. For example, GOES-16 data with a julian date matching the regex above is parsed with ‘%Y%j%H%M%S’.

Returns:

All values corresponding to times within the specified range

Return type:

List[Dataset]
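
For example, to collect the datasets from the last six hours (assuming the names encode times matching the default pattern):

    from datetime import datetime, timedelta

    now = datetime.utcnow()
    recent = catalog.datasets.filter_time_range(now - timedelta(hours=6), now)
    for ds in recent:
        print(ds.name)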

class siphon.catalog.IndexableMapping[source]

Extend OrderedDict to allow index-based access to values.

class siphon.catalog.SimpleService(service_node)[source]

Hold information about an access service enabled on a dataset.

name

The name of the service

Type:

str

service_type

The service type (e.g., “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

Type:

str

access_urls

A dictionary of access urls whose keywords are the access service types defined in the catalog (for example, “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

Type:

dict[str, str]

__init__(service_node)[source]

Initialize the SimpleService object.

Parameters:

service_node (Element) – An Element representing a service node

is_resolver()[source]

Return whether the service is a resolver service.

class siphon.catalog.TDSCatalog(catalog_url)[source]

Parse information from a THREDDS Client Catalog.

catalog_url

The url path of the catalog to parse.

Type:

str

base_tds_url

The top level server address

Type:

str

datasets

A dictionary of Dataset objects, whose keys are the dataset names

Type:

DatasetCollection[str, Dataset]

services

A list of the SimpleService objects listed in the catalog

Type:

List

catalog_refs

A dictionary of CatalogRef objects, whose keys are the catalog reference titles.

Type:

DatasetCollection[str, CatalogRef]

__init__(catalog_url)[source]

Initialize the TDSCatalog object.

Parameters:

catalog_url (str) – The URL of a THREDDS client catalog

property latest

Get the latest dataset, if available.
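
For example, assuming cat is a TDSCatalog for a catalog that exposes a ‘latest’ resolver:

    latest_ds = cat.latest
    print(latest_ds.name)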

siphon.catalog.get_latest_access_url(catalog_url, access_method)[source]

Get the data access url to the latest data using a specified access method.

This is available for data served from a top level dataset catalog (URL). Currently only a single “latest” dataset is supported.

Parameters:
  • catalog_url (str) – The URL of a top level data catalog

  • access_method (str) – The desired data access method (e.g., “OPENDAP”, “NetcdfSubset”, “WMS”, etc.)

Returns:

access_url – Data access URL to be used to access the latest data available from a given catalog using the specified access_method. Typically a single string, but not always.

Return type:

str
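
A short sketch (the catalog URL is only an example, and the access method must be enabled on the latest dataset):

    from siphon.catalog import get_latest_access_url

    url = get_latest_access_url(
        'https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.xml',
        'OPENDAP')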