Package `netCDF4`

Version 1.7.0

Introduction

netcdf4-python is a Python interface to the netCDF C library.

netCDF version 4 has many features not found in earlier versions of the library and is implemented on top of HDF5. This module can read and write files in both the new netCDF 4 and the old netCDF 3 format, and can create files that are readable by HDF5 clients. The API modelled after Scientific.IO.NetCDF, and should be familiar to users of that module.

Most new features of netCDF 4 are implemented, such as multiple unlimited dimensions, groups and data compression. All the new numeric data types (such as 64 bit and unsigned integer types) are implemented. Compound (struct), variable length (vlen) and enumerated (enum) data types are supported, but not the opaque data type. Mixtures of compound, vlen and enum data types (such as compound types containing enums, or vlens containing compound types) are not supported.

Quick Install

the easiest way to get going is to install via pip install netCDF4. (or if you use the conda package manager conda install -c conda-forge netCDF4).

Developer Install

Clone the github repository. Make sure you either clone recursively, or run git submodule update --init to ensure all the submodules are also checked out.
Make sure the dependencies are satisfied (Python 3.8 or later, numpy, Cython, cftime, setuptools, the HDF5 C library, and the netCDF C library). For MPI parallel IO support, an MPI-enabled versions of the netcdf library is required, as is mpi4py. Parallel IO further depends on the existence of MPI-enabled HDF5 or the PnetCDF library.
By default, the utility nc-config (installed with netcdf-c) will be run used to determine where all the dependencies live.
If nc-config is not in your default PATH, you can set the NETCDF4_DIR environment variable and setup.py will look in $NETCDF4_DIR/bin. You can also use the file setup.cfg to set the path to nc-config, or enter the paths to the libraries and include files manually. Just edit the setup.cfg file in a text editor and follow the instructions in the comments. To disable the use of nc-config, set the env var USE_NCCONFIG to 0. To disable the use of setup.cfg, set USE_SETUPCFG to 0. As a last resort, the library and include paths can be set via environment variables. If you go this route, set USE_NCCONFIG and USE_SETUPCFG to 0, and specify NETCDF4_LIBDIR, NETCDF4_INCDIR, HDF5_LIBDIR and HDF5_INCDIR. If the dependencies are not found in any of the paths specified by environment variables, then standard locations (such as /usr and /usr/local) are searched.
if the env var NETCDF_PLUGIN_DIR is set to point to the location of the netcdf-c compression plugins built by netcdf >= 4.9.0, they will be installed inside the package. In this case HDF5_PLUGIN_PATH will be set to the package installation path on import, so the extra compression algorithms available in netcdf-c >= 4.9.0 will automatically be available. Otherwise, the user will have to set HDF5_PLUGIN_PATH explicitly to have access to the extra compression plugins.
run pip install -v . (as root if necessary)
run the tests in the 'test' directory by running python run_all.py.

Tutorial

Creating/Opening/Closing a netCDF file
Groups in a netCDF file
Dimensions in a netCDF file
Variables in a netCDF file
Attributes in a netCDF file
Dealing with time coordinates
Writing data to and retrieving data from a netCDF variable
Reading data from a multi-file netCDF dataset
Efficient compression of netCDF variables
Beyond homogeneous arrays of a fixed type - compound data types
Variable-length (vlen) data types
Enum data type
Parallel IO
Dealing with strings
In-memory (diskless) Datasets

All of the code in this tutorial is available in examples/tutorial.py, except the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

Creating/Opening/Closing a netCDF file

To create a netCDF file from python, you simply call the Dataset constructor. This is also the method used to open an existing netCDF file. If the file is open for write access (mode='w', 'r+' or 'a'), you may write any type of data including new dimensions, groups, variables and attributes. netCDF files come in five flavors (NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET, NETCDF3_64BIT_DATA, NETCDF4_CLASSIC<code>, and </code>NETCDF4). NETCDF3_CLASSIC was the original netcdf binary format, and was limited to file sizes less than 2 Gb. NETCDF3_64BIT_OFFSET was introduced in version 3.6.0 of the library, and extended the original binary format to allow for file sizes greater than 2 Gb. NETCDF3_64BIT_DATA is a new format that requires version 4.4.0 of the C library - it extends the NETCDF3_64BIT_OFFSET binary format to allow for unsigned/64 bit integer data types and 64-bit dimension sizes. NETCDF3_64BIT is an alias for NETCDF3_64BIT_OFFSET. NETCDF4_CLASSIC files use the version 4 disk format (HDF5), but omits features not found in the version 3 API. They can be read by netCDF 3 clients only if they have been relinked against the netCDF 4 library. They can also be read by HDF5 clients. NETCDF4 files use the version 4 disk format (HDF5) and use the new features of the version 4 API. The netCDF4 module can read and write files in any of these formats. When creating a new file, the format may be specified using the format keyword in the Dataset constructor. The default format is NETCDF4. To see how a given file is formatted, you can examine the data_model attribute. Closing the netCDF file is accomplished via the Dataset.close() method of the Dataset instance.

Here's an example:

>>> from netCDF4 import Dataset
>>> rootgrp = Dataset("test.nc", "w", format="NETCDF4")
>>> print(rootgrp.data_model)
NETCDF4
>>> rootgrp.close()

Remote OPeNDAP-hosted datasets can be accessed for reading over http if a URL is provided to the Dataset constructor instead of a filename. However, this requires that the netCDF library be built with OPenDAP support, via the --enable-dap configure option (added in version 4.0.1).

Groups in a netCDF file

netCDF version 4 added support for organizing data in hierarchical groups, which are analogous to directories in a filesystem. Groups serve as containers for variables, dimensions and attributes, as well as other groups. A Dataset creates a special group, called the 'root group', which is similar to the root directory in a unix filesystem. To create Group instances, use the Dataset.createGroup() method of a Dataset or Group instance. Dataset.createGroup() takes a single argument, a python string containing the name of the new group. The new Group instances contained within the root group can be accessed by name using the groups dictionary attribute of the Dataset instance. Only NETCDF4 formatted files support Groups, if you try to create a Group in a netCDF 3 file you will get an error message.

>>> rootgrp = Dataset("test.nc", "a")
>>> fcstgrp = rootgrp.createGroup("forecasts")
>>> analgrp = rootgrp.createGroup("analyses")
>>> print(rootgrp.groups)
{'forecasts': <class 'netCDF4._netCDF4.Group'>
group /forecasts:
    dimensions(sizes):
    variables(dimensions):
    groups: , 'analyses': <class 'netCDF4._netCDF4.Group'>
group /analyses:
    dimensions(sizes):
    variables(dimensions):
    groups: }
>>>

Groups can exist within groups in a Dataset, just as directories exist within directories in a unix filesystem. Each Group instance has a groups attribute dictionary containing all of the group instances contained within that group. Each Group instance also has a path attribute that contains a simulated unix directory path to that group. To simplify the creation of nested groups, you can use a unix-like path as an argument to Dataset.createGroup().

>>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
>>> fcstgrp2 = rootgrp.createGroup("/forecasts/model2")

If any of the intermediate elements of the path do not exist, they are created, just as with the unix command 'mkdir -p'. If you try to create a group that already exists, no error will be raised, and the existing group will be returned.

Here's an example that shows how to navigate all the groups in a Dataset. The function walktree is a Python generator that is used to walk the directory tree. Note that printing the Dataset or Group object yields summary information about it's contents.

>>> def walktree(top):
...     yield top.groups.values()
...     for value in top.groups.values():
...         yield from walktree(value)
>>> print(rootgrp)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes):
    variables(dimensions):
    groups: forecasts, analyses
>>> for children in walktree(rootgrp):
...     for child in children:
...         print(child)
<class 'netCDF4._netCDF4.Group'>
group /forecasts:
    dimensions(sizes):
    variables(dimensions):
    groups: model1, model2
<class 'netCDF4._netCDF4.Group'>
group /analyses:
    dimensions(sizes):
    variables(dimensions):
    groups:
<class 'netCDF4._netCDF4.Group'>
group /forecasts/model1:
    dimensions(sizes):
    variables(dimensions):
    groups:
<class 'netCDF4._netCDF4.Group'>
group /forecasts/model2:
    dimensions(sizes):
    variables(dimensions):
    groups:

Dimensions in a netCDF file

netCDF defines the sizes of all variables in terms of dimensions, so before any variables can be created the dimensions they use must be created first. A special case, not often used in practice, is that of a scalar variable, which has no dimensions. A dimension is created using the Dataset.createDimension() method of a Dataset or Group instance. A Python string is used to set the name of the dimension, and an integer value is used to set the size. To create an unlimited dimension (a dimension that can be appended to), the size value is set to None or 0. In this example, there both the time and level dimensions are unlimited. Having more than one unlimited dimension is a new netCDF 4 feature, in netCDF 3 files there may be only one, and it must be the first (leftmost) dimension of the variable.

>>> level = rootgrp.createDimension("level", None)
>>> time = rootgrp.createDimension("time", None)
>>> lat = rootgrp.createDimension("lat", 73)
>>> lon = rootgrp.createDimension("lon", 144)

All of the Dimension instances are stored in a python dictionary.

>>> print(rootgrp.dimensions)
{'level': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0, 'time': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0, 'lat': <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73, 'lon': <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144}

Using the python len function with a Dimension instance returns current size of that dimension. Dimension.isunlimited() method of a Dimension instance be used to determine if the dimensions is unlimited, or appendable.

>>> print(len(lon))
144
>>> print(lon.isunlimited())
False
>>> print(time.isunlimited())
True

Printing the Dimension object provides useful summary info, including the name and length of the dimension, and whether it is unlimited.

>>> for dimobj in rootgrp.dimensions.values():
...     print(dimobj)
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
<class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
<class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73
<class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144

Dimension names can be changed using the Datatset.renameDimension method of a Dataset or Group instance.

Variables in a netCDF file

netCDF variables behave much like python multidimensional array objects supplied by the numpy module. However, unlike numpy arrays, netCDF4 variables can be appended to along one or more 'unlimited' dimensions. To create a netCDF variable, use the Dataset.createVariable() method of a Dataset or Group instance. The Dataset.createVariable()j method has two mandatory arguments, the variable name (a Python string), and the variable datatype. The variable's dimensions are given by a tuple containing the dimension names (defined previously with Dataset.createDimension()). To create a scalar variable, simply leave out the dimensions keyword. The variable primitive datatypes correspond to the dtype attribute of a numpy array. You can specify the datatype as a numpy dtype object, or anything that can be converted to a numpy dtype object. Valid datatype specifiers include: 'f4' (32-bit floating point), 'f8' (64-bit floating point), 'i4' (32-bit signed integer), 'i2' (16-bit signed integer), 'i8' (64-bit signed integer), 'i1' (8-bit signed integer), 'u1' (8-bit unsigned integer), 'u2' (16-bit unsigned integer), 'u4' (32-bit unsigned integer), 'u8' (64-bit unsigned integer), or 'S1' (single-character string). The old Numeric single-character typecodes ('f','d','h', 's','b','B','c','i','l'), corresponding to ('f4','f8','i2','i2','i1','i1','S1','i4','i4'), will also work. The unsigned integer types and the 64-bit integer type can only be used if the file format is NETCDF4.

The dimensions themselves are usually also defined as variables, called coordinate variables. The Dataset.createVariable() method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.

>>> times = rootgrp.createVariable("time","f8",("time",))
>>> levels = rootgrp.createVariable("level","i4",("level",))
>>> latitudes = rootgrp.createVariable("lat","f4",("lat",))
>>> longitudes = rootgrp.createVariable("lon","f4",("lon",))
>>> # two dimensions unlimited
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
>>> temp.units = "K"

To get summary info on a Variable instance in an interactive session, just print it.

>>> print(temp)
<class 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
    units: K
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.969209968386869e+36 used

You can use a path to create a Variable inside a hierarchy of groups.

>>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))

If the intermediate groups do not yet exist, they will be created.

You can also query a Dataset or Group instance directly to obtain Group or Variable instances using paths.

>>> print(rootgrp["/forecasts/model1"])  # a Group instance
<class 'netCDF4._netCDF4.Group'>
group /forecasts/model1:
    dimensions(sizes):
    variables(dimensions): float32 temp(time,level,lat,lon)
    groups:
>>> print(rootgrp["/forecasts/model1/temp"])  # a Variable instance
<class 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
path = /forecasts/model1
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.969209968386869e+36 used

All of the variables in the Dataset or Group are stored in a Python dictionary, in the same way as the dimensions:

>>> print(rootgrp.variables)
{'time': <class 'netCDF4._netCDF4.Variable'>
float64 time(time)
unlimited dimensions: time
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used, 'level': <class 'netCDF4._netCDF4.Variable'>
int32 level(level)
unlimited dimensions: level
current shape = (0,)
filling on, default _FillValue of -2147483647 used, 'lat': <class 'netCDF4._netCDF4.Variable'>
float32 lat(lat)
unlimited dimensions:
current shape = (73,)
filling on, default _FillValue of 9.969209968386869e+36 used, 'lon': <class 'netCDF4._netCDF4.Variable'>
float32 lon(lon)
unlimited dimensions:
current shape = (144,)
filling on, default _FillValue of 9.969209968386869e+36 used, 'temp': <class 'netCDF4._netCDF4.Variable'>
float32 temp(time, level, lat, lon)
    units: K
unlimited dimensions: time, level
current shape = (0, 0, 73, 144)
filling on, default _FillValue of 9.969209968386869e+36 used}

Variable names can be changed using the Dataset.renameVariable() method of a Dataset instance.

Variables can be sliced similar to numpy arrays, but there are some differences. See Writing data to and retrieving data from a netCDF variable for more details.

Attributes in a netCDF file

There are two types of attributes in a netCDF file, global and variable. Global attributes provide information about a group, or the entire dataset, as a whole. Variable attributes provide information about one of the variables in a group. Global attributes are set by assigning values to Dataset or Group instance variables. Variable attributes are set by assigning values to Variable instances variables. Attributes can be strings, numbers or sequences. Returning to our example,

>>> import time
>>> rootgrp.description = "bogus example script"
>>> rootgrp.history = "Created " + time.ctime(time.time())
>>> rootgrp.source = "netCDF4 python module tutorial"
>>> latitudes.units = "degrees north"
>>> longitudes.units = "degrees east"
>>> levels.units = "hPa"
>>> temp.units = "K"
>>> times.units = "hours since 0001-01-01 00:00:00.0"
>>> times.calendar = "gregorian"

The Dataset.ncattrs() method of a Dataset, Group or Variable instance can be used to retrieve the names of all the netCDF attributes. This method is provided as a convenience, since using the built-in dir Python function will return a bunch of private methods and attributes that cannot (or should not) be modified by the user.

>>> for name in rootgrp.ncattrs():
...     print("Global attr {} = {}".format(name, getattr(rootgrp, name)))
Global attr description = bogus example script
Global attr history = Created Mon Jul  8 14:19:41 2019
Global attr source = netCDF4 python module tutorial

The __dict__ attribute of a Dataset, Group or Variable instance provides all the netCDF attribute name/value pairs in a python dictionary:

>>> print(rootgrp.__dict__)
{'description': 'bogus example script', 'history': 'Created Mon Jul  8 14:19:41 2019', 'source': 'netCDF4 python module tutorial'}

Attributes can be deleted from a netCDF Dataset, Group or Variable using the python del statement (i.e. del grp.foo removes the attribute foo the the group grp).

Writing data to and retrieving data from a netCDF variable

Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.

>>> import numpy as np
>>> lats =  np.arange(-90,91,2.5)
>>> lons =  np.arange(-180,180,2.5)
>>> latitudes[:] = lats
>>> longitudes[:] = lons
>>> print("latitudes =\n{}".format(latitudes[:]))
latitudes =
[-90.  -87.5 -85.  -82.5 -80.  -77.5 -75.  -72.5 -70.  -67.5 -65.  -62.5
 -60.  -57.5 -55.  -52.5 -50.  -47.5 -45.  -42.5 -40.  -37.5 -35.  -32.5
 -30.  -27.5 -25.  -22.5 -20.  -17.5 -15.  -12.5 -10.   -7.5  -5.   -2.5
   0.    2.5   5.    7.5  10.   12.5  15.   17.5  20.   22.5  25.   27.5
  30.   32.5  35.   37.5  40.   42.5  45.   47.5  50.   52.5  55.   57.5
  60.   62.5  65.   67.5  70.   72.5  75.   77.5  80.   82.5  85.   87.5
  90. ]

Unlike NumPy's array objects, netCDF Variable objects with unlimited dimensions will grow along those dimensions if you assign data outside the currently defined range of indices.

>>> # append along two unlimited dimensions by assigning to slice.
>>> nlats = len(rootgrp.dimensions["lat"])
>>> nlons = len(rootgrp.dimensions["lon"])
>>> print("temp shape before adding data = {}".format(temp.shape))
temp shape before adding data = (0, 0, 73, 144)
>>>
>>> from numpy.random import uniform
>>> temp[0:5, 0:10, :, :] = uniform(size=(5, 10, nlats, nlons))
>>> print("temp shape after adding data = {}".format(temp.shape))
temp shape after adding data = (5, 10, 73, 144)
>>>
>>> # levels have grown, but no values yet assigned.
>>> print("levels shape after adding pressure data = {}".format(levels.shape))
levels shape after adding pressure data = (10,)

Note that the size of the levels variable grows when data is appended along the level dimension of the variable temp, even though no data has yet been assigned to levels.

>>> # now, assign data to levels dimension variable.
>>> levels[:] =  [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]

However, that there are some differences between NumPy and netCDF variable slicing rules. Slices behave as usual, being specified as a start:stop:step triplet. Using a scalar integer index i takes the ith element and reduces the rank of the output array by one. Boolean array and integer sequence indexing behaves differently for netCDF variables than for numpy arrays. Only 1-d boolean arrays and integer sequences are allowed, and these indices work independently along each dimension (similar to the way vector subscripts work in fortran). This means that

>>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
(4, 4)

returns an array of shape (4,4) when slicing a netCDF variable, but for a numpy array it returns an array of shape (4,). Similarly, a netCDF variable of shape (2,3,4,5) indexed with [0, array([True, False, True]), array([False, True, True, True]), :] would return a (2, 3, 5) array. In NumPy, this would raise an error since it would be equivalent to [0, [0,1], [1,2,3], :]. When slicing with integer sequences, the indices need not be sorted and may contain duplicates (both of these are new features in version 1.2.1). While this behaviour may cause some confusion for those used to NumPy's 'fancy indexing' rules, it provides a very powerful way to extract data from multidimensional netCDF variables by using logical operations on the dimension arrays to create slices.

For example,

>>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]

will extract time indices 0,2 and 4, pressure levels 850, 500 and 200 hPa, all Northern Hemisphere latitudes and Eastern Hemisphere longitudes, resulting in a numpy array of shape (3, 3, 36, 71).

>>> print("shape of fancy temp slice = {}".format(tempdat.shape))
shape of fancy temp slice = (3, 3, 36, 71)

Special note for scalar variables: To extract data from a scalar variable v with no associated dimensions, use numpy.asarray(v) or v[…]. The result will be a numpy scalar array.

By default, netcdf4-python returns numpy masked arrays with values equal to the missing_value or _FillValue variable attributes masked for primitive and enum data types. The Dataset.set_auto_mask() Dataset and Variable methods can be used to disable this feature so that numpy arrays are always returned, with the missing values included. Prior to version 1.4.0 the default behavior was to only return masked arrays when the requested slice contained missing values. This behavior can be recovered using the Dataset.set_always_mask() method. If a masked array is written to a netCDF variable, the masked elements are filled with the value specified by the missing_value attribute. If the variable has no missing_value, the _FillValue is used instead.

Dealing with time coordinates

Time coordinate values pose a special challenge to netCDF users. Most metadata standards (such as CF) specify that time should be measure relative to a fixed date using a certain calendar, with units specified like hours since YY-MM-DD hh:mm:ss. These units can be awkward to deal with, without a utility to convert the values to and from calendar dates. The functions num2date and date2num are provided by cftime to do just that. Here's an example of how they can be used:

>>> # fill in times.
>>> from datetime import datetime, timedelta
>>> from cftime import num2date, date2num
>>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
>>> times[:] = date2num(dates,units=times.units,calendar=times.calendar)
>>> print("time values (in units {}):\n{}".format(times.units, times[:]))
time values (in units hours since 0001-01-01 00:00:00.0):
[17533104. 17533116. 17533128. 17533140. 17533152.]
>>> dates = num2date(times[:],units=times.units,calendar=times.calendar)
>>> print("dates corresponding to time values:\n{}".format(dates))
 [cftime.DatetimeGregorian(2001, 3, 1, 0, 0, 0, 0, has_year_zero=False)
  cftime.DatetimeGregorian(2001, 3, 1, 12, 0, 0, 0, has_year_zero=False)
  cftime.DatetimeGregorian(2001, 3, 2, 0, 0, 0, 0, has_year_zero=False)
  cftime.DatetimeGregorian(2001, 3, 2, 12, 0, 0, 0, has_year_zero=False)
  cftime.DatetimeGregorian(2001, 3, 3, 0, 0, 0, 0, has_year_zero=False)]

num2date() converts numeric values of time in the specified units and calendar to datetime objects, and date2num() does the reverse. All the calendars currently defined in the CF metadata convention are supported. A function called date2index() is also provided which returns the indices of a netCDF time variable corresponding to a sequence of datetime instances.

Reading data from a multi-file netCDF dataset

If you want to read data from a variable that spans multiple netCDF files, you can use the MFDataset class to read the data as if it were contained in a single file. Instead of using a single filename to create a Dataset instance, create a MFDataset instance with either a list of filenames, or a string with a wildcard (which is then converted to a sorted list of files using the python glob module). Variables in the list of files that share the same unlimited dimension are aggregated together, and can be sliced across multiple files. To illustrate this, let's first create a bunch of netCDF files with the same variable (with the same unlimited dimension). The files must in be in NETCDF3_64BIT_OFFSET, NETCDF3_64BIT_DATA, NETCDF3_CLASSIC or NETCDF4_CLASSIC format (NETCDF4 formatted multi-file datasets are not supported).

>>> for nf in range(10):
...     with Dataset("mftest%s.nc" % nf, "w", format="NETCDF4_CLASSIC") as f:
...         _ = f.createDimension("x",None)
...         x = f.createVariable("x","i",("x",))
...         x[0:10] = np.arange(nf*10,10*(nf+1))

Now read all the files back in at once with MFDataset

>>> from netCDF4 import MFDataset
>>> f = MFDataset("mftest*nc")
>>> print(f.variables["x"][:])
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]

Note that MFDataset can only be used to read, not write, multi-file datasets.

Efficient compression of netCDF variables

Data stored in netCDF Variable objects can be compressed and decompressed on the fly. The compression algorithm used is determined by the compression keyword argument to the Dataset.createVariable() method. zlib compression is always available, szip is available if the linked HDF5 library supports it, and zstd, bzip2, blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are available via optional external plugins. The complevel keyword regulates the speed and efficiency of the compression for zlib, bzip and zstd (1 being fastest, but lowest compression ratio, 9 being slowest but best compression ratio). The default value of complevel is 4. Setting shuffle=False will turn off the HDF5 shuffle filter, which de-interlaces a block of data before zlib compression by reordering the bytes. The shuffle filter can significantly improve compression ratios, and is on by default if compression=zlib. Setting fletcher32 keyword argument to Dataset.createVariable() to True (it's False by default) enables the Fletcher32 checksum algorithm for error detection. It's also possible to set the HDF5 chunking parameters and endian-ness of the binary data stored in the HDF5 file with the chunksizes and endian keyword arguments to Dataset.createVariable(). These keyword arguments only are relevant for NETCDF4 and NETCDF4_CLASSIC files (where the underlying file format is HDF5) and are silently ignored if the file format is NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET or NETCDF3_64BIT_DATA. If the HDF5 library is built with szip support, compression=szip can also be used (in conjunction with the szip_coding and szip_pixels_per_block keyword arguments).

If your data only has a certain number of digits of precision (say for example, it is temperature data that was measured with a precision of 0.1 degrees), you can dramatically improve compression by quantizing (or truncating) the data. There are two methods supplied for doing this. You can use the least_significant_digit keyword argument to Dataset.createVariable() to specify the power of ten of the smallest decimal place in the data that is a reliable value. For example if the data has a precision of 0.1, then setting least_significant_digit=1 will cause data the data to be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). This is done at the python level and is not a part of the underlying C library. Starting with netcdf-c version 4.9.0, a quantization capability is provided in the library. This can be used via the significant_digits Dataset.createVariable() kwarg (new in version 1.6.0). The interpretation of significant_digits is different than least_signficant_digit in that it specifies the absolute number of significant digits independent of the magnitude of the variable (the floating point exponent). Either of these approaches makes the compression 'lossy' instead of 'lossless', that is some precision in the data is sacrificed for the sake of disk space.

In our example, try replacing the line

>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))

with

>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')

and then

>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)

or with netcdf-c >= 4.9.0

>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)

and see how much smaller the resulting files are.

Beyond homogeneous arrays of a fixed type - compound data types

Compound data types map directly to numpy structured (a.k.a 'record') arrays. Structured arrays are akin to C structs, or derived types in Fortran. They allow for the construction of table-like structures composed of combinations of other data types, including other compound types. Compound types might be useful for representing multiple parameter values at each point on a grid, or at each time and space location for scattered (point) data. You can then access all the information for a point by reading one variable, instead of reading different parameters from different variables. Compound data types are created from the corresponding numpy data type using the Dataset.createCompoundType() method of a Dataset or Group instance. Since there is no native complex data type in netcdf (but see Support for complex numbers), compound types are handy for storing numpy complex arrays. Here's an example:

>>> f = Dataset("complex.nc","w")
>>> size = 3 # length of 1-d complex array
>>> # create sample complex data.
>>> datac = np.exp(1j*(1.+np.linspace(0, np.pi, size)))
>>> # create complex128 compound data type.
>>> complex128 = np.dtype([("real",np.float64),("imag",np.float64)])
>>> complex128_t = f.createCompoundType(complex128,"complex128")
>>> # create a variable with this data type, write some data to it.
>>> x_dim = f.createDimension("x_dim",None)
>>> v = f.createVariable("cmplx_var",complex128_t,"x_dim")
>>> data = np.empty(size,complex128) # numpy structured array
>>> data["real"] = datac.real; data["imag"] = datac.imag
>>> v[:] = data # write numpy structured array to netcdf compound var
>>> # close and reopen the file, check the contents.
>>> f.close(); f = Dataset("complex.nc")
>>> v = f.variables["cmplx_var"]
>>> datain = v[:] # read in all the data into a numpy structured array
>>> # create an empty numpy complex array
>>> datac2 = np.empty(datain.shape,np.complex128)
>>> # .. fill it with contents of structured array.
>>> datac2.real = datain["real"]; datac2.imag = datain["imag"]
>>> print('{}: {}'.format(datac.dtype, datac)) # original data
complex128: [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]
>>>
>>> print('{}: {}'.format(datac2.dtype, datac2)) # data from file
complex128: [ 0.54030231+0.84147098j -0.84147098+0.54030231j -0.54030231-0.84147098j]

Compound types can be nested, but you must create the 'inner' ones first. All possible numpy structured arrays cannot be represented as Compound variables - an error message will be raise if you try to create one that is not supported. All of the compound types defined for a Dataset or Group are stored in a Python dictionary, just like variables and dimensions. As always, printing objects gives useful summary information in an interactive session:

>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): x_dim(3)
    variables(dimensions): {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True} cmplx_var(x_dim)
    groups:
>>> print(f.variables["cmplx_var"])
<class 'netCDF4._netCDF4.Variable'>
compound cmplx_var(x_dim)
compound data type: {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}
unlimited dimensions: x_dim
current shape = (3,)
>>> print(f.cmptypes)
{'complex128': <class 'netCDF4._netCDF4.CompoundType'>: name = 'complex128', numpy dtype = {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}}
>>> print(f.cmptypes["complex128"])
<class 'netCDF4._netCDF4.CompoundType'>: name = 'complex128', numpy dtype = {'names':['real','imag'], 'formats':['<f8','<f8'], 'offsets':[0,8], 'itemsize':16, 'aligned':True}

Variable-length (vlen) data types

NetCDF 4 has support for variable-length or "ragged" arrays. These are arrays of variable length sequences having the same type. To create a variable-length data type, use the Dataset.createVLType() method method of a Dataset or Group instance.

>>> f = Dataset("tst_vlen.nc","w")
>>> vlen_t = f.createVLType(np.int32, "phony_vlen")

The numpy datatype of the variable-length sequences and the name of the new datatype must be specified. Any of the primitive datatypes can be used (signed and unsigned integers, 32 and 64 bit floats, and characters), but compound data types cannot. A new variable can then be created using this datatype.

>>> x = f.createDimension("x",3)
>>> y = f.createDimension("y",4)
>>> vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y","x"))

Since there is no native vlen datatype in numpy, vlen arrays are represented in python as object arrays (arrays of dtype object). These are arrays whose elements are Python object pointers, and can contain any type of python object. For this application, they must contain 1-D numpy arrays all of the same type but of varying length. In this case, they contain 1-D numpy int32 arrays of random length between 1 and 10.

>>> import random
>>> random.seed(54321)
>>> data = np.empty(len(y)*len(x),object)
>>> for n in range(len(y)*len(x)):
...     data[n] = np.arange(random.randint(1,10),dtype="int32")+1
>>> data = np.reshape(data,(len(y),len(x)))
>>> vlvar[:] = data
>>> print("vlen variable =\n{}".format(vlvar[:]))
vlen variable =
[[array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int32) array([1, 2], dtype=int32)
  array([1, 2, 3, 4], dtype=int32)]
 [array([1, 2, 3], dtype=int32)
  array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
  array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)]
 [array([1, 2, 3, 4, 5, 6, 7], dtype=int32) array([1, 2, 3], dtype=int32)
  array([1, 2, 3, 4, 5, 6], dtype=int32)]
 [array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)
  array([1, 2, 3, 4, 5], dtype=int32) array([1, 2], dtype=int32)]]
>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): x(3), y(4)
    variables(dimensions): int32 phony_vlen_var(y,x)
    groups:
>>> print(f.variables["phony_vlen_var"])
<class 'netCDF4._netCDF4.Variable'>
vlen phony_vlen_var(y, x)
vlen data type: int32
unlimited dimensions:
current shape = (4, 3)
>>> print(f.vltypes["phony_vlen"])
<class 'netCDF4._netCDF4.VLType'>: name = 'phony_vlen', numpy dtype = int32

Numpy object arrays containing python strings can also be written as vlen variables, For vlen strings, you don't need to create a vlen data type. Instead, simply use the python str builtin (or a numpy string datatype with fixed length greater than 1) when calling the Dataset.createVariable() method.

>>> z = f.createDimension("z",10)
>>> strvar = f.createVariable("strvar", str, "z")

In this example, an object array is filled with random python strings with random lengths between 2 and 12 characters, and the data in the object array is assigned to the vlen string variable.

>>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
>>> data = np.empty(10,"O")
>>> for n in range(10):
...     stringlen = random.randint(2,12)
...     data[n] = "".join([random.choice(chars) for i in range(stringlen)])
>>> strvar[:] = data
>>> print("variable-length string variable:\n{}".format(strvar[:]))
variable-length string variable:
['Lh' '25F8wBbMI' '53rmM' 'vvjnb3t63ao' 'qjRBQk6w' 'aJh' 'QF'
 'jtIJbJACaQk4' '3Z5' 'bftIIq']
>>> print(f)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): x(3), y(4), z(10)
    variables(dimensions): int32 phony_vlen_var(y,x), <class 'str'> strvar(z)
    groups:
>>> print(f.variables["strvar"])
<class 'netCDF4._netCDF4.Variable'>
vlen strvar(z)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (10,)

It is also possible to set contents of vlen string variables with numpy arrays of any string or unicode data type. Note, however, that accessing the contents of such variables will always return numpy arrays with dtype object.

Enum data type

netCDF4 has an enumerated data type, which is an integer datatype that is restricted to certain named values. Since Enums don't map directly to a numpy data type, they are read and written as integer arrays.

Here's an example of using an Enum type to hold cloud type data. The base integer data type and a python dictionary describing the allowed values and their names are used to define an Enum data type using Dataset.createEnumType().

>>> nc = Dataset('clouds.nc','w')
>>> # python dict with allowed values and their names.
>>> enum_dict = {'Altocumulus': 7, 'Missing': 255,
... 'Stratus': 2, 'Clear': 0,
... 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5,
... 'Cumulonimbus': 1, 'Stratocumulus': 3}
>>> # create the Enum type called 'cloud_t'.
>>> cloud_type = nc.createEnumType(np.uint8,'cloud_t',enum_dict)
>>> print(cloud_type)
<class 'netCDF4._netCDF4.EnumType'>: name = 'cloud_t', numpy dtype = uint8, fields/values ={'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0, 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5, 'Cumulonimbus': 1, 'Stratocumulus': 3}

A new variable can be created in the usual way using this data type. Integer data is written to the variable that represents the named cloud types in enum_dict. A ValueError will be raised if an attempt is made to write an integer value not associated with one of the specified names.

>>> time = nc.createDimension('time',None)
>>> # create a 1d variable of type 'cloud_type'.
>>> # The fill_value is set to the 'Missing' named value.
>>> cloud_var = nc.createVariable('primary_cloud',cloud_type,'time',
...                               fill_value=enum_dict['Missing'])
>>> # write some data to the variable.
>>> cloud_var[:] = [enum_dict[k] for k in ['Clear', 'Stratus', 'Cumulus',
...                                        'Missing', 'Cumulonimbus']]
>>> nc.close()
>>> # reopen the file, read the data.
>>> nc = Dataset('clouds.nc')
>>> cloud_var = nc.variables['primary_cloud']
>>> print(cloud_var)
<class 'netCDF4._netCDF4.Variable'>
enum primary_cloud(time)
    _FillValue: 255
enum data type: uint8
unlimited dimensions: time
current shape = (5,)
>>> print(cloud_var.datatype.enum_dict)
{'Altocumulus': 7, 'Missing': 255, 'Stratus': 2, 'Clear': 0, 'Nimbostratus': 6, 'Cumulus': 4, 'Altostratus': 5, 'Cumulonimbus': 1, 'Stratocumulus': 3}
>>> print(cloud_var[:])
[0 2 4 -- 1]
>>> nc.close()

Parallel IO

If MPI parallel enabled versions of netcdf and hdf5 or pnetcdf are detected, and mpi4py is installed, netcdf4-python will be built with parallel IO capabilities enabled. Parallel IO of NETCDF4 or NETCDF4_CLASSIC formatted files is only available if the MPI parallel HDF5 library is available. Parallel IO of classic netcdf-3 file formats is only available if the PnetCDF library is available. To use parallel IO, your program must be running in an MPI environment using mpi4py.

>>> from mpi4py import MPI
>>> import numpy as np
>>> from netCDF4 import Dataset
>>> rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for 4-process run)

To run an MPI-based parallel program like this, you must use mpiexec to launch several parallel instances of Python (for example, using mpiexec -np 4 python mpi_example.py). The parallel features of netcdf4-python are mostly transparent - when a new dataset is created or an existing dataset is opened, use the parallel keyword to enable parallel access.

>>> nc = Dataset('parallel_test.nc','w',parallel=True)

The optional comm keyword may be used to specify a particular MPI communicator (MPI_COMM_WORLD is used by default). Each process (or rank) can now write to the file indepedently. In this example the process rank is written to a different variable index on each task

>>> d = nc.createDimension('dim',4)
>>> v = nc.createVariable('var', np.int64, 'dim')
>>> v[rank] = rank
>>> nc.close()

% ncdump parallel_test.nc
netcdf parallel_test {
dimensions:
    dim = 4 ;
variables:
    int64 var(dim) ;
data:

    var = 0, 1, 2, 3 ;
}

There are two types of parallel IO, independent (the default) and collective. Independent IO means that each process can do IO independently. It should not depend on or be affected by other processes. Collective IO is a way of doing IO defined in the MPI-IO standard; unlike independent IO, all processes must participate in doing IO. To toggle back and forth between the two types of IO, use the Variable.set_collective() Variable method. All metadata operations (such as creation of groups, types, variables, dimensions, or attributes) are collective. There are a couple of important limitations of parallel IO:

parallel IO for NETCDF4 or NETCDF4_CLASSIC formatted files is only available if the netcdf library was compiled with MPI enabled HDF5.
parallel IO for all classic netcdf-3 file formats is only available if the netcdf library was compiled with PnetCDF.
If a variable has an unlimited dimension, appending data must be done in collective mode. If the write is done in independent mode, the operation will fail with a a generic "HDF Error".
You can write compressed data in parallel only with netcdf-c >= 4.7.4 and hdf5 >= 1.10.3 (although you can read in parallel with earlier versions). To write compressed data in parallel, the variable must be in 'collective IO mode'. This is done automatically on variable creation if compression is turned on, but if you are appending to a variable in an existing file, you must use Variable.set_collective()(True) before attempting to write to it.
You cannot use variable-length (VLEN) data types.

Dealing with strings

The most flexible way to store arrays of strings is with the Variable-length (vlen) string data type. However, this requires the use of the NETCDF4 data model, and the vlen type does not map very well numpy arrays (you have to use numpy arrays of dtype=object, which are arrays of arbitrary python objects). numpy does have a fixed-width string array data type, but unfortunately the netCDF data model does not. Instead fixed-width byte strings are typically stored as arrays of 8-bit characters. To perform the conversion to and from character arrays to fixed-width numpy string arrays, the following convention is followed by the python interface. If the _Encoding special attribute is set for a character array (dtype S1) variable, the chartostring() utility function is used to convert the array of characters to an array of strings with one less dimension (the last dimension is interpreted as the length of each string) when reading the data. The character set (usually ascii) is specified by the _Encoding attribute. If _Encoding is 'none' or 'bytes', then the character array is converted to a numpy fixed-width byte string array (dtype S#), otherwise a numpy unicode (dtype U#) array is created. When writing the data, stringtochar() is used to convert the numpy string array to an array of characters with one more dimension. For example,

>>> from netCDF4 import stringtochar
>>> nc = Dataset('stringtest.nc','w',format='NETCDF4_CLASSIC')
>>> _ = nc.createDimension('nchars',3)
>>> _ = nc.createDimension('nstrings',None)
>>> v = nc.createVariable('strings','S1',('nstrings','nchars'))
>>> datain = np.array(['foo','bar'],dtype='S3')
>>> v[:] = stringtochar(datain) # manual conversion to char array
>>> print(v[:]) # data returned as char array
[[b'f' b'o' b'o']
 [b'b' b'a' b'r']]
>>> v._Encoding = 'ascii' # this enables automatic conversion
>>> v[:] = datain # conversion to char array done internally
>>> print(v[:])  # data returned in numpy string array
['foo' 'bar']
>>> nc.close()

Even if the _Encoding attribute is set, the automatic conversion of char arrays to/from string arrays can be disabled with Variable.set_auto_chartostring().

A similar situation is often encountered with numpy structured arrays with subdtypes containing fixed-wdith byte strings (dtype=S#). Since there is no native fixed-length string netCDF datatype, these numpy structure arrays are mapped onto netCDF compound types with character array elements. In this case the string <-> char array conversion is handled automatically (without the need to set the _Encoding attribute) using numpy views. The structured array dtype (including the string elements) can even be used to define the compound data type - the string dtype will be converted to character array dtype under the hood when creating the netcdf compound type. Here's an example:

>>> nc = Dataset('compoundstring_example.nc','w')
>>> dtype = np.dtype([('observation', 'f4'),
...                      ('station_name','S10')])
>>> station_data_t = nc.createCompoundType(dtype,'station_data')
>>> _ = nc.createDimension('station',None)
>>> statdat = nc.createVariable('station_obs', station_data_t, ('station',))
>>> data = np.empty(2,dtype)
>>> data['observation'][:] = (123.,3.14)
>>> data['station_name'][:] = ('Boulder','New York')
>>> print(statdat.dtype) # strings actually stored as character arrays
{'names':['observation','station_name'], 'formats':['<f4',('S1', (10,))], 'offsets':[0,4], 'itemsize':16, 'aligned':True}
>>> statdat[:] = data # strings converted to character arrays internally
>>> print(statdat[:])  # character arrays converted back to strings
[(123.  , b'Boulder') (  3.14, b'New York')]
>>> print(statdat[:].dtype)
{'names':['observation','station_name'], 'formats':['<f4','S10'], 'offsets':[0,4], 'itemsize':16, 'aligned':True}
>>> statdat.set_auto_chartostring(False) # turn off auto-conversion
>>> statdat[:] = data.view(dtype=[('observation', 'f4'),('station_name','S1',10)])
>>> print(statdat[:])  # now structured array with char array subtype is returned
[(123.  , [b'B', b'o', b'u', b'l', b'd', b'e', b'r', b'', b'', b''])
 (  3.14, [b'N', b'e', b'w', b' ', b'Y', b'o', b'r', b'k', b'', b''])]
>>> nc.close()

Note that there is currently no support for mapping numpy structured arrays with unicode elements (dtype U#) onto netCDF compound types, nor is there support for netCDF compound types with vlen string components.

In-memory (diskless) Datasets

You can create netCDF Datasets whose content is held in memory instead of in a disk file. There are two ways to do this. If you don't need access to the memory buffer containing the Dataset from within python, the best way is to use the diskless=True keyword argument when creating the Dataset. If you want to save the Dataset to disk when you close it, also set persist=True. If you want to create a new read-only Dataset from an existing python memory buffer, use the memory keyword argument to pass the memory buffer when creating the Dataset. If you want to create a new in-memory Dataset, and then access the memory buffer directly from Python, use the memory keyword argument to specify the estimated size of the Dataset in bytes when creating the Dataset with mode='w'. Then, the Dataset.close() method will return a python memoryview object representing the Dataset. Below are examples illustrating both approaches.

>>> # create a diskless (in-memory) Dataset,
>>> # and persist the file to disk when it is closed.
>>> nc = Dataset('diskless_example.nc','w',diskless=True,persist=True)
>>> d = nc.createDimension('x',None)
>>> v = nc.createVariable('v',np.int32,'x')
>>> v[0:5] = np.arange(5)
>>> print(nc)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): x(5)
    variables(dimensions): int32 v(x)
    groups:
>>> print(nc['v'][:])
[0 1 2 3 4]
>>> nc.close() # file saved to disk
>>> # create an in-memory dataset from an existing python
>>> # python memory buffer.
>>> # read the newly created netcdf file into a python
>>> # bytes object.
>>> with open('diskless_example.nc', 'rb') as f:
...     nc_bytes = f.read()
>>> # create a netCDF in-memory dataset from the bytes object.
>>> nc = Dataset('inmemory.nc', memory=nc_bytes)
>>> print(nc)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): x(5)
    variables(dimensions): int32 v(x)
    groups:
>>> print(nc['v'][:])
[0 1 2 3 4]
>>> nc.close()
>>> # create an in-memory Dataset and retrieve memory buffer
>>> # estimated size is 1028 bytes - this is actually only
>>> # used if format is NETCDF3
>>> # (ignored for NETCDF4/HDF5 files).
>>> nc = Dataset('inmemory.nc', mode='w',memory=1028)
>>> d = nc.createDimension('x',None)
>>> v = nc.createVariable('v',np.int32,'x')
>>> v[0:5] = np.arange(5)
>>> nc_buf = nc.close() # close returns memoryview
>>> print(type(nc_buf))
<class 'memoryview'>
>>> # save nc_buf to disk, read it back in and check.
>>> with open('inmemory.nc', 'wb') as f:
...     f.write(nc_buf)
>>> nc = Dataset('inmemory.nc')
>>> print(nc)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): x(5)
    variables(dimensions): int32 v(x)
    groups:
>>> print(nc['v'][:])
[0 1 2 3 4]
>>> nc.close()

Support for complex numbers

Although there is no native support for complex numbers in netCDF, there are some common conventions for storing them. Two of the most common are to either use a compound datatype for the real and imaginary components, or a separate dimension. netCDF4 supports reading several of these conventions, as well as writing using one of two conventions (depending on file format). This support for complex numbers is enabled by setting auto_complex=True when opening a Dataset:

>>> complex_array = np.array([0 + 0j, 1 + 0j, 0 + 1j, 1 + 1j, 0.25 + 0.75j])
>>> with netCDF4.Dataset("complex.nc", "w", auto_complex=True) as nc:
...     nc.createDimension("x", size=len(complex_array))
...     var = nc.createVariable("data", "c16", ("x",))
...     var[:] = complex_array
...     print(var)
<class 'netCDF4._netCDF4.Variable'>
compound data(x)
compound data type: complex128
unlimited dimensions:
current shape = (5,)

When reading files using auto_complex=True, netCDF4 will interpret variables stored using the following conventions as complex numbers:

compound datatypes with two float or double members who names begin with r and i (case insensitive)
a dimension of length 2 named complex or ri

When writing files using auto_complex=True, netCDF4 will use:

a compound datatype named _PFNC_DOUBLE_COMPLEX_TYPE (or *FLOAT* as appropriate) with members r and i for netCDF4 formats;
or a dimension of length 2 named _pfnc_complex for netCDF3 or classic formats.

Support for complex numbers is handled via the nc-complex library. See there for further details.

contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

copyright: 2008 by Jeffrey Whitaker.

license: Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Functions

def chartostring(b, encoding='utf-8'): chartostring(b,encoding='utf-8')

convert a character array to a string array with one less dimension.

b: Input character array (numpy datatype 'S1' or 'U1'). Will be converted to a array of strings, where each string has a fixed length of b.shape[-1] characters.

optional kwarg encoding can be used to specify character encoding (default utf-8). If encoding is 'none' or 'bytes', a numpy.string_ btye array is returned.

returns a numpy string array with datatype 'UN' (or 'SN') and shape b.shape[:-1] where where N=b.shape[-1].
def date2index(dates, nctime, calendar=None, select='exact', has_year_zero=None): Return indices of a netCDF time variable corresponding to the given dates.

dates: A datetime object or a sequence of datetime objects. The datetime objects should not include a time-zone offset.

nctime: A netCDF time variable object. The nctime object must have a units attribute.

calendar: describes the calendar to be used in the time calculations. All the values currently defined in the CF metadata convention <http://cfconventions.org/cf-conventions/cf-conventions#calendar>__ are supported. Valid calendars 'standard', 'gregorian', 'proleptic_gregorian' 'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'. Default is None which means the calendar associated with the first input datetime instance will be used.

select: 'exact', 'before', 'after', 'nearest' The index selection method. exact will return the indices perfectly matching the dates given. before and after will return the indices corresponding to the dates just before or just after the given dates if an exact match cannot be found. nearest will return the indices that correspond to the closest dates.

has_year_zero: if set to True, astronomical year numbering is used and the year zero exists. If set to False for real-world calendars, then historical year numbering is used and the year 1 is preceded by year -1 and no year zero exists. The defaults are set to conform with CF version 1.9 conventions (False for 'julian', 'gregorian'/'standard', True for 'proleptic_gregorian' (ISO 8601) and True for the idealized calendars 'noleap'/'365_day', '360_day', 366_day'/'all_leap') The defaults can only be over-ridden for the real-world calendars, for the the idealized calendars the year zero always exists and the has_year_zero kwarg is ignored. This kwarg is not needed to define calendar systems allowed by CF (the calendar-specific defaults do this).

returns an index (indices) of the netCDF time variable corresponding to the given datetime object(s).
def date2num(dates, units, calendar=None, has_year_zero=None, longdouble=False): Return numeric time values given datetime objects. The units of the numeric time values are described by the units argument and the calendar keyword. The datetime objects must be in UTC with no time-zone offset. If there is a time-zone offset in units, it will be applied to the returned numeric values.

dates: A datetime object or a sequence of datetime objects. The datetime objects should not include a time-zone offset. They can be either native python datetime instances (which use the proleptic gregorian calendar) or cftime.datetime instances.

units: a string of the form since describing the time units. can be days, hours, minutes, seconds, milliseconds or microseconds. is the time origin. months since is allowed only for the 360_day calendar and common_years since is allowed only for the 365_day calendar.

calendar: describes the calendar to be used in the time calculations. All the values currently defined in the CF metadata convention <http://cfconventions.org/cf-conventions/cf-conventions#calendar>__ are supported. Valid calendars 'standard', 'gregorian', 'proleptic_gregorian' 'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'. Default is None which means the calendar associated with the first input datetime instance will be used.

has_year_zero: If set to True, astronomical year numbering is used and the year zero exists. If set to False for real-world calendars, then historical year numbering is used and the year 1 is preceded by year -1 and no year zero exists. The defaults are set to conform with CF version 1.9 conventions (False for 'julian', 'gregorian'/'standard', True for 'proleptic_gregorian' (ISO 8601) and True for the idealized calendars 'noleap'/'365_day', '360_day', 366_day'/'all_leap') Note that CF v1.9 does not specifically mention whether year zero is allowed in the proleptic_gregorian calendar, but ISO-8601 has a year zero so we have adopted this as the default. The defaults can only be over-ridden for the real-world calendars, for the the idealized calendars the year zero always exists and the has_year_zero kwarg is ignored. This kwarg is not needed to define calendar systems allowed by CF (the calendar-specific defaults do this).

longdouble: If set True, output is in the long double float type (numpy.float128) instead of float (numpy.float64), allowing microsecond accuracy when converting a time value to a date and back again. Otherwise this is only possible if the discretization of the time variable is an integer multiple of the units.

returns a numeric time value, or an array of numeric time values with approximately 1 microsecond accuracy.
def get_alignment(): get_alignment()

return current netCDF alignment within HDF5 files in a tuple (threshold,alignment). See netcdf C library documentation for nc_get_alignment for details. Values can be reset with set_alignment().

This function was added in netcdf 4.9.0.
def get_chunk_cache(): get_chunk_cache()

return current netCDF chunk cache information in a tuple (size,nelems,preemption). See netcdf C library documentation for nc_get_chunk_cache for details. Values can be reset with set_chunk_cache().
def getlibversion(): getlibversion()

returns a string describing the version of the netcdf library used to build the module, and when it was built.
def num2date(times, units, calendar='standard', only_use_cftime_datetimes=True, only_use_python_datetimes=False, has_year_zero=None): Return datetime objects given numeric time values. The units of the numeric time values are described by the units argument and the calendar keyword. The returned datetime objects represent UTC with no time-zone offset, even if the specified units contain a time-zone offset.

times: numeric time values.

units: a string of the form since describing the time units. can be days, hours, minutes, seconds, milliseconds or microseconds. is the time origin. months since is allowed only for the 360_day calendar and common_years since is allowed only for the 365_day calendar.

calendar: describes the calendar used in the time calculations. All the values currently defined in the CF metadata convention <http://cfconventions.org/cf-conventions/cf-conventions#calendar>__ are supported. Valid calendars 'standard', 'gregorian', 'proleptic_gregorian' 'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'. Default is 'standard', which is a mixed Julian/Gregorian calendar.

only_use_cftime_datetimes: if False, python datetime.datetime objects are returned from num2date where possible; if True dates which subclass cftime.datetime are returned for all calendars. Default True.

only_use_python_datetimes: always return python datetime.datetime objects and raise an error if this is not possible. Ignored unless only_use_cftime_datetimes=False. Default False.

has_year_zero: if set to True, astronomical year numbering is used and the year zero exists. If set to False for real-world calendars, then historical year numbering is used and the year 1 is preceded by year -1 and no year zero exists. The defaults are set to conform with CF version 1.9 conventions (False for 'julian', 'gregorian'/'standard', True for 'proleptic_gregorian' (ISO 8601) and True for the idealized calendars 'noleap'/'365_day', '360_day', 366_day'/'all_leap') The defaults can only be over-ridden for the real-world calendars, for the the idealized calendars the year zero always exists and the has_year_zero kwarg is ignored. This kwarg is not needed to define calendar systems allowed by CF (the calendar-specific defaults do this).

returns a datetime instance, or an array of datetime instances with microsecond accuracy, if possible.

Note: If only_use_cftime_datetimes=False and use_only_python_datetimes=False, the datetime instances returned are 'real' python datetime objects if calendar='proleptic_gregorian', or calendar='standard' or 'gregorian' and the date is after the breakpoint between the Julian and Gregorian calendars (1582-10-15). Otherwise, they are ctime.datetime objects which support some but not all the methods of native python datetime objects. The datetime instances do not contain a time-zone offset, even if the specified units contains one.
def set_alignment(threshold, alignment): set_alignment()(threshold,alignment)

Change the HDF5 file alignment. See netcdf C library documentation for nc_set_alignment for details.

This function was added in netcdf 4.9.0.
def set_chunk_cache(size=None, nelems=None, preemption=None): set_chunk_cache(self,size=None,nelems=None,preemption=None)

change netCDF4 chunk cache settings. See netcdf C library documentation for nc_set_chunk_cache for details.
def stringtoarr(string, NUMCHARS, dtype='S'): stringtoarr(a, NUMCHARS,dtype='S')

convert a string to a character array of length NUMCHARS

a: Input python string.

NUMCHARS: number of characters used to represent string (if len(a) < NUMCHARS, it will be padded on the right with blanks).

dtype: type of numpy array to return. Default is 'S', which means an array of dtype 'S1' will be returned. If dtype='U', a unicode array (dtype = 'U1') will be returned.

returns a rank 1 numpy character array of length NUMCHARS with datatype 'S1' (default) or 'U1' (if dtype='U')
def stringtochar(a, encoding='utf-8'): stringtochar(a,encoding='utf-8')

convert a string array to a character array with one extra dimension

a: Input numpy string array with numpy datatype 'SN' or 'UN', where N is the number of characters in each string. Will be converted to an array of characters (datatype 'S1' or 'U1') of shape a.shape + (N,).

optional kwarg encoding can be used to specify character encoding (default utf-8). If encoding is 'none' or 'bytes', a numpy.string_ the input array is treated a raw byte strings (numpy.string_).

returns a numpy character array with datatype 'S1' or 'U1' and shape a.shape + (N,), where N is the length of each string in a.

Classes

class CompoundType (...)

A CompoundType instance is used to describe a compound data type, and can be passed to the the Dataset.createVariable() method of a Dataset or Group instance. Compound data types map to numpy structured arrays. See CompoundType for more details.

The instance variables dtype and name should not be modified by the user.

__init__(group, datatype, datatype_name)

CompoundType constructor.

group: Group instance to associate with the compound datatype.

datatype: A numpy dtype object describing a structured (a.k.a record) array. Can be composed of homogeneous numeric or character data types, or other structured array data types.

datatype_name: a Python string containing a description of the compound data type.

Note 1: When creating nested compound data types, the inner compound data types must already be associated with CompoundType instances (so create CompoundType instances for the innermost structures first).

Note 2: CompoundType instances should be created using the Dataset.createCompoundType() method of a Dataset or Group instance, not using this class directly.

Instance variables

var dtype: Return an attribute of instance, which is of type owner.
var dtype_view: Return an attribute of instance, which is of type owner.
var name: Return an attribute of instance, which is of type owner.

class Dataset (...)

A netCDF Dataset is a collection of dimensions, groups, variables and attributes. Together they describe the meaning of data and relations among data fields stored in a netCDF file. See Dataset for more details.

A list of attribute names corresponding to global netCDF attributes defined for the Dataset can be obtained with the Dataset.ncattrs() method. These attributes can be created by assigning to an attribute of the Dataset instance. A dictionary containing all the netCDF attribute name/value pairs is provided by the __dict__ attribute of a Dataset instance.

The following class variables are read-only and should not be modified by the user.

dimensions: The dimensions dictionary maps the names of dimensions defined for the Group or Dataset to instances of the Dimension class.

variables: The variables dictionary maps the names of variables defined for this Dataset or Group to instances of the Variable class.

groups: The groups dictionary maps the names of groups created for this Dataset or Group to instances of the Group class (the Dataset class is simply a special case of the Group class which describes the root group in the netCDF4 file).

cmptypes: The cmptypes dictionary maps the names of compound types defined for the Group or Dataset to instances of the CompoundType class.

vltypes: The vltypes dictionary maps the names of variable-length types defined for the Group or Dataset to instances of the VLType class.

enumtypes: The enumtypes dictionary maps the names of Enum types defined for the Group or Dataset to instances of the EnumType class.

data_model: data_model describes the netCDF data model version, one of NETCDF3_CLASSIC, NETCDF4, NETCDF4_CLASSIC, NETCDF3_64BIT_OFFSET or NETCDF3_64BIT_DATA.

file_format: same as data_model, retained for backwards compatibility.

disk_format: disk_format describes the underlying file format, one of NETCDF3, HDF5, HDF4, PNETCDF, DAP2, DAP4 or UNDEFINED. Only available if using netcdf C library version >= 4.3.1, otherwise will always return UNDEFINED.

parent: parent is a reference to the parent Group instance. None for the root group or Dataset instance.

path: path shows the location of the Group in the Dataset in a unix directory format (the names of groups in the hierarchy separated by backslashes). A Dataset instance is the root group, so the path is simply '/'.

keepweakref: If True, child Dimension and Variables objects only keep weak references to the parent Dataset or Group.

_ncstring_attrs__: If True, all text attributes will be written as variable-length strings.

__init__(self, filename, mode="r", clobber=True, diskless=False, persist=False, keepweakref=False, memory=None, encoding=None, parallel=False, comm=None, info=None, format='NETCDF4')

Dataset constructor.

filename: Name of netCDF file to hold dataset. Can also be a python 3 pathlib instance or the URL of an OpenDAP dataset. When memory is set this is just used to set the filepath().

mode: access mode. r means read-only; no data can be modified. w means write; a new file is created, an existing file with the same name is deleted. x means write, but fail if an existing file with the same name already exists. a and r+ mean append; an existing file is opened for reading and writing, if file does not exist already, one is created. Appending s to modes r, w, r+ or a will enable unbuffered shared access to NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET or NETCDF3_64BIT_DATA formatted files. Unbuffered access may be useful even if you don't need shared access, since it may be faster for programs that don't access data sequentially. This option is ignored for NETCDF4 and NETCDF4_CLASSIC formatted files.

clobber: if True (default), opening a file with mode='w' will clobber an existing file with the same name. if False, an exception will be raised if a file with the same name already exists. mode=x is identical to mode=w with clobber=False.

format: underlying file format (one of 'NETCDF4', 'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC'<code>, </code>'NETCDF3_64BIT_OFFSET' or 'NETCDF3_64BIT_DATA'. Only relevant if mode = 'w' (if mode = 'r','a' or 'r+' the file format is automatically detected). Default 'NETCDF4', which means the data is stored in an HDF5 file, using netCDF 4 API features. Setting format='NETCDF4_CLASSIC' will create an HDF5 file, using only netCDF 3 compatible API features. netCDF 3 clients must be recompiled and linked against the netCDF 4 library to read files in NETCDF4_CLASSIC format. 'NETCDF3_CLASSIC' is the classic netCDF 3 file format that does not handle 2+ Gb files. 'NETCDF3_64BIT_OFFSET' is the 64-bit offset version of the netCDF 3 file format, which fully supports 2+ GB files, but is only compatible with clients linked against netCDF version 3.6.0 or later. 'NETCDF3_64BIT_DATA' is the 64-bit data version of the netCDF 3 file format, which supports 64-bit dimension sizes plus unsigned and 64 bit integer data types, but is only compatible with clients linked against netCDF version 4.4.0 or later.

diskless: If True, create diskless (in-core) file. This is a feature added to the C library after the netcdf-4.2 release. If you need to access the memory buffer directly, use the in-memory feature instead (see memory kwarg).

persist: if diskless=True, persist file to disk when closed (default False).

keepweakref: if True, child Dimension and Variable instances will keep weak references to the parent Dataset or Group object. Default is False, which means strong references will be kept. Having Dimension and Variable instances keep a strong reference to the parent Dataset instance, which in turn keeps a reference to child Dimension and Variable instances, creates circular references. Circular references complicate garbage collection, which may mean increased memory usage for programs that create may Dataset instances with lots of Variables. It also will result in the Dataset object never being deleted, which means it may keep open files alive as well. Setting keepweakref=True allows Dataset instances to be garbage collected as soon as they go out of scope, potentially reducing memory usage and open file handles. However, in many cases this is not desirable, since the associated Variable instances may still be needed, but are rendered unusable when the parent Dataset instance is garbage collected.

memory: if not None, create or open an in-memory Dataset. If mode = r, the memory kwarg must contain a memory buffer object (an object that supports the python buffer interface). The Dataset will then be created with contents taken from this block of memory. If mode = w, the memory kwarg should contain the anticipated size of the Dataset in bytes (used only for NETCDF3 files). A memory buffer containing a copy of the Dataset is returned by the Dataset.close() method. Requires netcdf-c version 4.4.1 for mode=r netcdf-c 4.6.2 for mode=w. To persist the file to disk, the raw bytes from the returned buffer can be written into a binary file. The Dataset can also be re-opened using this memory buffer.

encoding: encoding used to encode filename string into bytes. Default is None (sys.getdefaultfileencoding() is used).

parallel: open for parallel access using MPI (requires mpi4py and parallel-enabled netcdf-c and hdf5 libraries). Default is False. If True, comm and info kwargs may also be specified.

comm: MPI_Comm object for parallel access. Default None, which means MPI_COMM_WORLD will be used. Ignored if parallel=False.

info: MPI_Info object for parallel access. Default None, which means MPI_INFO_NULL will be used. Ignored if parallel=False.

auto_complex: if True, then automatically convert complex number types

Subclasses

netCDF4._netCDF4.Group
netCDF4._netCDF4.MFDataset

Static methods

def fromcdl(cdlfilename, ncfilename=None, mode='a', format='NETCDF4'): fromcdl(cdlfilename, ncfilename=None, mode='a',format='NETCDF4')

call ncgen via subprocess to create Dataset from CDL text representation. Requires ncgen to be installed and in $PATH.

cdlfilename: CDL file.

ncfilename: netCDF file to create. If not given, CDL filename with suffix replaced by .nc is used..

mode: Access mode to open Dataset (Default 'a').

format: underlying file format to use (one of 'NETCDF4', 'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC'<code>, </code>'NETCDF3_64BIT_OFFSET' or 'NETCDF3_64BIT_DATA'. Default 'NETCDF4'.

Dataset instance for ncfilename is returned.

Instance variables

var auto_complex: Return an attribute of instance, which is of type owner.
var cmptypes: Return an attribute of instance, which is of type owner.
var data_model: Return an attribute of instance, which is of type owner.
var dimensions: Return an attribute of instance, which is of type owner.
var disk_format: Return an attribute of instance, which is of type owner.
var enumtypes: Return an attribute of instance, which is of type owner.
var file_format: Return an attribute of instance, which is of type owner.
var groups: Return an attribute of instance, which is of type owner.
var keepweakref: Return an attribute of instance, which is of type owner.
var name: string name of Group instance
var parent: Return an attribute of instance, which is of type owner.
var path: Return an attribute of instance, which is of type owner.
var variables: Return an attribute of instance, which is of type owner.
var vltypes: Return an attribute of instance, which is of type owner.

Methods

def close(self)

close(self)

Close the Dataset.

def createCompoundType(self, datatype, datatype_name)

createCompoundType(self, datatype, datatype_name)

Creates a new compound data type named datatype_name from the numpy dtype object datatype.

Note: If the new compound data type contains other compound data types (i.e. it is a 'nested' compound type, where not all of the elements are homogeneous numeric data types), then the 'inner' compound types must be created first.

The return value is the CompoundType class instance describing the new datatype.

def createDimension(self, dimname, size=None)

createDimension(self, dimname, size=None)

Creates a new dimension with the given dimname and size.

size must be a positive integer or None, which stands for "unlimited" (default is None). Specifying a size of 0 also results in an unlimited dimension. The return value is the Dimension class instance describing the new dimension. To determine the current maximum size of the dimension, use the len function on the Dimension instance. To determine if a dimension is 'unlimited', use the Dimension.isunlimited() method of the Dimension instance.

def createEnumType(self, datatype, datatype_name, enum_dict)

createEnumType(self, datatype, datatype_name, enum_dict)

Creates a new Enum data type named datatype_name from a numpy integer dtype object datatype, and a python dictionary defining the enum fields and values.

The return value is the EnumType class instance describing the new datatype.

def createGroup(self, groupname)

createGroup(self, groupname)

Creates a new Group with the given groupname.

If groupname is specified as a path, using forward slashes as in unix to separate components, then intermediate groups will be created as necessary (analogous to mkdir -p in unix). For example, createGroup('/GroupA/GroupB/GroupC') will create GroupA, GroupA/GroupB, and GroupA/GroupB/GroupC, if they don't already exist. If the specified path describes a group that already exists, no error is raised.

The return value is a Group class instance.

def createVLType(self, datatype, datatype_name)

createVLType(self, datatype, datatype_name)

Creates a new VLEN data type named datatype_name from a numpy dtype object datatype.

The return value is the VLType class instance describing the new datatype.

def createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', fill_value=None, chunk_cache=None)

createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', fill_value=None, chunk_cache=None)

Creates a new variable with the given varname, datatype, and dimensions. If dimensions are not given, the variable is assumed to be a scalar.

If varname is specified as a path, using forward slashes as in unix to separate components, then intermediate groups will be created as necessary For example, createVariable('/GroupA/GroupB/VarC', float, ('x','y')) will create groups GroupA and GroupA/GroupB, plus the variable GroupA/GroupB/VarC, if the preceding groups don't already exist.

The datatype can be a numpy datatype object, or a string that describes a numpy dtype object (like the dtype.str attribute of a numpy array). Supported specifiers include: 'S1' or 'c' (NC_CHAR), 'i1' or 'b' or 'B' (NC_BYTE), 'u1' (NC_UBYTE), 'i2' or 'h' or 's' (NC_SHORT), 'u2' (NC_USHORT), 'i4' or 'i' or 'l' (NC_INT), 'u4' (NC_UINT), 'i8' (NC_INT64), 'u8' (NC_UINT64), 'f4' or 'f' (NC_FLOAT), 'f8' or 'd' (NC_DOUBLE). datatype can also be a CompoundType instance (for a structured, or compound array), a VLType instance (for a variable-length array), or the python str builtin (for a variable-length string array). Numpy string and unicode datatypes with length greater than one are aliases for str.

Data from netCDF variables is presented to python as numpy arrays with the corresponding data type.

dimensions must be a tuple containing Dimension instances and/or dimension names (strings) that have been defined previously using Dataset.createDimension(). The default value is an empty tuple, which means the variable is a scalar.

If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture.

If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is deprecated in favor of compression='zlib'.

The optional keyword complevel is an integer between 0 and 9 describing the level of compression desired (default 4). Ignored if compression=None. A value of zero disables compression.

If the optional keyword shuffle is True, the HDF5 shuffle filter will be applied before compressing the data with zlib (default True). This significantly improves compression. Default is True. Ignored if zlib=False.

The optional kwarg blosc_shuffleis ignored unless the blosc compressor is used. blosc_shuffle can be 0 (no shuffle), 1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 1.

The optional kwargs szip_coding and szip_pixels_per_block are ignored unless the szip compressor is used. szip_coding can be ec (entropy coding) or nn (nearest neighbor coding). Default is nn. szip_pixels_per_block can be 4, 8, 16 or 32 (default 8).

If the optional keyword fletcher32 is True, the Fletcher32 HDF5 checksum algorithm is activated to detect errors. Default False.

If the optional keyword contiguous is True, the variable data is stored contiguously on disk. Default False. Setting to True for a variable with an unlimited dimension will trigger an error. Fixed size variables (with no unlimited dimension) with no compression filters are contiguous by default.

The optional keyword chunksizes can be used to manually specify the HDF5 chunksizes for each dimension of the variable. A detailed discussion of HDF chunking and I/O performance is available here. The default chunking scheme in the netcdf-c library is discussed here. Basically, you want the chunk size for each dimension to match as closely as possible the size of the data block that users will read from the file. chunksizes cannot be set if contiguous=True.

The optional keyword endian can be used to control whether the data is stored in little or big endian format on disk. Possible values are little, big or native (default). The library will automatically handle endian conversions when the data is read, but if the data is always going to be read on a computer with the opposite format as the one used to create the file, there may be some performance advantage to be gained by setting the endian-ness.

The optional keyword fill_value can be used to override the default netCDF _FillValue (the value that the variable gets filled with before any data is written to it, defaults given in the dict netCDF4.default_fillvals). If fill_value is set to False, then the variable is not pre-filled.

If the optional keyword parameters least_significant_digit or significant_digits are specified, variable data will be truncated (quantized). In conjunction with compression='zlib' this produces 'lossy', but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). From the PSL metadata conventions: "least_significant_digit – power of ten of the smallest decimal place in unpacked data that is a reliable value." Default is None, or no quantization, or 'lossless' compression. If significant_digits=3 then the data will be quantized so that three significant digits are retained, independent of the floating point exponent. The keyword argument quantize_mode controls the quantization algorithm (default 'BitGroom', 'BitRound' and 'GranularBitRound' also available). The 'GranularBitRound' algorithm may result in better compression for typical geophysical datasets. This significant_digits kwarg is only available with netcdf-c >= 4.9.0, and only works with NETCDF4 or NETCDF4_CLASSIC formatted files.

When creating variables in a NETCDF4 or NETCDF4_CLASSIC formatted file, HDF5 creates something called a 'chunk cache' for each variable. The default size of the chunk cache may be large enough to completely fill available memory when creating thousands of variables. The optional keyword chunk_cache allows you to reduce (or increase) the size of the default chunk cache when creating a variable. The setting only persists as long as the Dataset is open - you can use the set_var_chunk_cache method to change it the next time the Dataset is opened. Warning - messing with this parameter can seriously degrade performance.

The return value is the Variable class instance describing the new variable.

A list of names corresponding to netCDF variable attributes can be obtained with the Variable method Variable.ncattrs(). A dictionary containing all the netCDF attribute name/value pairs is provided by the __dict__ attribute of a Variable instance.

Variable instances behave much like array objects. Data can be assigned to or retrieved from a variable with indexing and slicing operations on the Variable instance. A Variable instance has six Dataset standard attributes: dimensions, dtype, shape, ndim, name and least_significant_digit. Application programs should never modify these attributes. The dimensions attribute is a tuple containing the names of the dimensions associated with this variable. The dtype attribute is a string describing the variable's data type (i4, f8, S1,<code> etc). The </code>shape attribute is a tuple describing the current sizes of all the variable's dimensions. The name attribute is a string containing the name of the Variable instance. The least_significant_digit attributes describes the power of ten of the smallest decimal place in the data the contains a reliable value. assigned to the Variable instance. The ndim attribute is the number of variable dimensions.

def delncattr(self, name)

delncattr(self,name,value)

delete a netCDF dataset or group attribute. Use if you need to delete a netCDF attribute with the same name as one of the reserved python attributes.

def filepath(self, encoding=None)

filepath(self,encoding=None)

Get the file system path (or the opendap URL) which was used to open/create the Dataset. Requires netcdf >= 4.1.2. The path is decoded into a string using sys.getfilesystemencoding() by default, this can be changed using the encoding kwarg.

def get_variables_by_attributes(self, **kwargs)

get_variables_by_attributes(self, **kwargs)

Returns a list of variables that match specific conditions.

Can pass in key=value parameters and variables are returned that contain all of the matches. For example,

>>> # Get variables with x-axis attribute.
>>> vs = nc.get_variables_by_attributes(axis='X')
>>> # Get variables with matching "standard_name" attribute
>>> vs = nc.get_variables_by_attributes(standard_name='northward_sea_water_velocity')

Can pass in key=callable parameter and variables are returned if the callable returns True. The callable should accept a single parameter, the attribute value. None is given as the attribute value when the attribute does not exist on the variable. For example,

>>> # Get Axis variables
>>> vs = nc.get_variables_by_attributes(axis=lambda v: v in ['X', 'Y', 'Z', 'T'])
>>> # Get variables that don't have an "axis" attribute
>>> vs = nc.get_variables_by_attributes(axis=lambda v: v is None)
>>> # Get variables that have a "grid_mapping" attribute
>>> vs = nc.get_variables_by_attributes(grid_mapping=lambda v: v is not None)

def getncattr(self, name, encoding='utf-8')

getncattr(self,name)

retrieve a netCDF dataset or group attribute. Use if you need to get a netCDF attribute with the same name as one of the reserved python attributes.

option kwarg encoding can be used to specify the character encoding of a string attribute (default is utf-8).

def has_blosc_filter(self)

has_blosc_filter(self) returns True if blosc compression filter is available

def has_bzip2_filter(self)

has_bzip2_filter(self) returns True if bzip2 compression filter is available

def has_szip_filter(self)

has_szip_filter(self) returns True if szip compression filter is available

def has_zstd_filter(self)

has_zstd_filter(self) returns True if zstd compression filter is available

def isopen(self)

isopen(self)

Is the Dataset open or closed?

def ncattrs(self)

ncattrs(self)

return netCDF global attribute names for this Dataset or Group in a list.

def renameAttribute(self, oldname, newname)

renameAttribute(self, oldname, newname)

rename a Dataset or Group attribute named oldname to newname.

def renameDimension(self, oldname, newname)

renameDimension(self, oldname, newname)

rename a Dimension named oldname to newname.

def renameGroup(self, oldname, newname)

renameGroup(self, oldname, newname)

rename a Group named oldname to newname (requires netcdf >= 4.3.1).

def renameVariable(self, oldname, newname)

renameVariable(self, oldname, newname)

rename a Variable named oldname to newname

def set_always_mask(self, value)

set_always_mask(self, True_or_False)

Call Variable.set_always_mask() for all variables contained in this Dataset or Group, as well as for all variables in all its subgroups.

True_or_False: Boolean determining if automatic conversion of masked arrays with no missing values to regular numpy arrays shall be applied for all variables. Default True. Set to False to restore the default behaviour in versions prior to 1.4.1 (numpy array returned unless missing values are present, otherwise masked array returned).