xarray with MetPy Tutorial

xarray is a powerful Python package that provides N-dimensional labeled arrays and datasets following the Common Data Model. While the process of integrating xarray features into MetPy is ongoing, this tutorial demonstrates how xarray can be used within the current version of MetPy. MetPy’s integration primarily works through accessors which allow simplified projection handling and coordinate identification. Unit and calculation support is currently available in a limited fashion, but should be improved in future versions.

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import xarray as xr

# Any import of metpy will activate the accessors
import metpy.calc as mpcalc
from metpy.testing import get_test_data

Getting Data

While xarray can handle a wide variety of n-dimensional data (essentially anything that can be stored in a netCDF file), a common use case is working with model output. Such model data can be obtained from a THREDDS Data Server using the siphon package, but for this tutorial, we will use an example subset of GFS data from Hurrican Irma (September 5th, 2017).

# Open the netCDF file as a xarray Dataset
data = xr.open_dataset(get_test_data('irma_gfs_example.nc', False))

# View a summary of the Dataset
print(data)

Out:

<xarray.Dataset>
Dimensions:                              (isobaric1: 21, isobaric3: 31, latitude: 81, longitude: 131, time1: 9)
Coordinates:
  * time1                                (time1) datetime64[ns] 2017-09-05T12:00:00 ... 2017-09-06T12:00:00
    reftime                              datetime64[ns] ...
  * latitude                             (latitude) float32 50.0 49.5 ... 10.0
  * isobaric3                            (isobaric3) float64 100.0 ... 1e+05
  * isobaric1                            (isobaric1) float64 1e+04 ... 1e+05
  * longitude                            (longitude) float32 250.0 ... 315.0
Data variables:
    Vertical_velocity_pressure_isobaric  (time1, isobaric1, latitude, longitude) float32 ...
    Relative_humidity_isobaric           (time1, isobaric3, latitude, longitude) float32 ...
    Temperature_isobaric                 (time1, isobaric3, latitude, longitude) float32 ...
    u-component_of_wind_isobaric         (time1, isobaric3, latitude, longitude) float32 ...
    v-component_of_wind_isobaric         (time1, isobaric3, latitude, longitude) float32 ...
    Geopotential_height_isobaric         (time1, isobaric3, latitude, longitude) float32 ...
    LatLon_361X720-0p25S-180p00E         int32 ...
Attributes:
    Originating_or_generating_Center:                                        ...
    Originating_or_generating_Subcenter:                                     ...
    GRIB_table_version:                                                      ...
    Type_of_generating_process:                                              ...
    Analysis_or_forecast_generating_process_identifier_defined_by_originating...
    Conventions:                                                             ...
    history:                                                                 ...
    featureType:                                                             ...
    History:                                                                 ...
    geospatial_lat_min:                                                      ...
    geospatial_lat_max:                                                      ...
    geospatial_lon_min:                                                      ...
    geospatial_lon_max:                                                      ...

Preparing Data

To make use of the data within MetPy, we need to parse the dataset for projection and coordinate information following the CF conventions. For this, we use the data.metpy.parse_cf() method, which will return a new, parsed DataArray or Dataset.

Additionally, we rename our data variables for easier reference.

# To parse the full dataset, we can call parse_cf without an argument, and assign the returned
# Dataset.
data = data.metpy.parse_cf()

# If we instead want just a single variable, we can pass that variable name to parse_cf and
# it will return just that data variable as a DataArray.
data_var = data.metpy.parse_cf('Temperature_isobaric')

# To rename variables, supply a dictionary between old and new names to the rename method
data.rename({
    'Vertical_velocity_pressure_isobaric': 'omega',
    'Relative_humidity_isobaric': 'relative_humidity',
    'Temperature_isobaric': 'temperature',
    'u-component_of_wind_isobaric': 'u',
    'v-component_of_wind_isobaric': 'v',
    'Geopotential_height_isobaric': 'height'
}, inplace=True)

Units

MetPy’s DataArray accessor has a unit_array property to obtain a pint.Quantity array of just the data from the DataArray (metadata is removed) and a convert_units method to convert the the data from one unit to another (keeping it as a DataArray). For now, we’ll just use convert_units to convert our pressure coordinates to hPa.

data['isobaric1'].metpy.convert_units('hPa')
data['isobaric3'].metpy.convert_units('hPa')

Coordinates

You may have noticed how we directly accessed the vertical coordinates above using their names. However, in general, if we are working with a particular DataArray, we don’t have to worry about that since MetPy is able to parse the coordinates and so obtain a particular coordinate type directly. There are two ways to do this:

  1. Use the data_var.metpy.coordinates method
  2. Use the data_var.metpy.x, data_var.metpy.y, data_var.metpy.vertical, data_var.metpy.time properties

The valid coordinate types are:

  • x
  • y
  • vertical
  • time

(Both approaches and all four types are shown below)

# Get multiple coordinates (for example, in just the x and y direction)
x, y = data['temperature'].metpy.coordinates('x', 'y')

# If we want to get just a single coordinate from the coordinates method, we have to use
# tuple unpacking because the coordinates method returns a generator
vertical, = data['temperature'].metpy.coordinates('vertical')

# Or, we can just get a coordinate from the property
time = data['temperature'].metpy.time

# To verify, we can inspect all their names
print([coord.name for coord in (x, y, vertical, time)])

Out:

['longitude', 'latitude', 'isobaric3', 'time1']

Projections

Getting the cartopy coordinate reference system (CRS) of the projection of a DataArray is as straightforward as using the data_var.metpy.cartopy_crs property:

data_crs = data['temperature'].metpy.cartopy_crs
print(data_crs)

Out:

<cartopy.crs.PlateCarree object at 0x2b7e049c5a40>

The cartopy Globe can similarly be accessed via the data_var.metpy.cartopy_globe property:

data_globe = data['temperature'].metpy.cartopy_globe
print(data_globe)

Out:

<cartopy._crs.Globe object at 0x2b7e04d52780>

Calculations

Most of the calculations in metpy.calc will accept DataArrays by converting them into their corresponding unit arrays. While this may often work without any issues, we must keep in mind that because the calculations are working with unit arrays and not DataArrays:

  • The calculations will return unit arrays rather than DataArrays
  • Broadcasting must be taken care of outside of the calculation, as it would only recognize dimensions by order, not name

As an example, we calculate geostropic wind at 500 hPa below:

lat, lon = xr.broadcast(y, x)
f = mpcalc.coriolis_parameter(lat)
dx, dy = mpcalc.lat_lon_grid_deltas(lon, lat, initstring=data_crs.proj4_init)
heights = data['height'].loc[time[0]].loc[{vertical.name: 500.}]
u_geo, v_geo = mpcalc.geostrophic_wind(heights, f, dx, dy)
print(u_geo)
print(v_geo)

Out:

[[ 2.27255554  6.25140312  3.28255416 ... 33.11408008 32.73451717  32.57653661] [ 3.81629673  4.38877614  4.32523476 ... 31.00803157 30.75375118  30.53090851] [ 5.12733416  4.83910466  5.80060347 ... 29.1938277  29.03383493  28.58473536] ... [-1.91178314 -1.14643259 -1.91181665 ... -7.0079228  -7.39214953  -6.88190745] [-2.801646   -2.66973201 -2.40253719 ... -6.40453277 -7.20430366  -6.93719275] [-7.13845126 -8.40509536 -3.63934311 ... -5.60696811 -7.55739378  -8.39839852]] meter / second
[[-14.79152481 -14.39816817 -16.89520568 ...   5.58291437   6.513699    7.39606131] [ -8.30101213 -13.77064356 -17.09080998 ...   5.27352106   6.10386071    7.17918929] [-11.98245151 -13.54015832 -14.85469639 ...   4.91926188   5.74746248    6.7213265 ] ... [  4.25739862   3.4827536    2.96632359 ...   3.2245386    3.2245386    1.93346357] [  6.60970778   3.10076929   2.02109591 ...   2.42597342   3.63896014    3.37233348] [  4.79936391   4.23716749   2.54368012 ...   1.55552506   2.96791422    3.81293338]] meter / second

Also, a limited number of calculations directly support xarray DataArrays or Datasets (they can accept and return xarray objects). Right now, this includes

  • Derivative functions
    • first_derivative
    • second_derivative
    • gradient
    • laplacian
  • Cross-section functions
    • cross_section_components
    • normal_component
    • tangential_component
    • absolute_momentum

More details can be found by looking at the documentation for the specific function of interest.

There is also the special case of the helper function, grid_deltas_from_dataarray, which takes a DataArray input, but returns unit arrays for use in other calculations. We could rewrite the above geostrophic wind example using this helper function as follows:

heights = data['height'].loc[time[0]].loc[{vertical.name: 500.}]
lat, lon = xr.broadcast(y, x)
f = mpcalc.coriolis_parameter(lat)
dx, dy = mpcalc.grid_deltas_from_dataarray(heights)
u_geo, v_geo = mpcalc.geostrophic_wind(heights, f, dx, dy)
print(u_geo)
print(v_geo)

Out:

[[ 2.27255554  6.25140312  3.28255416 ... 33.11408008 32.73451717  32.57653661] [ 3.81629673  4.38877614  4.32523476 ... 31.00803157 30.75375118  30.53090851] [ 5.12733416  4.83910466  5.80060347 ... 29.1938277  29.03383493  28.58473536] ... [-1.91178314 -1.14643259 -1.91181665 ... -7.0079228  -7.39214953  -6.88190745] [-2.801646   -2.66973201 -2.40253719 ... -6.40453277 -7.20430366  -6.93719275] [-7.13845126 -8.40509536 -3.63934311 ... -5.60696811 -7.55739378  -8.39839852]] meter / second
[[-14.79152481 -14.39816817 -16.89520568 ...   5.58291437   6.513699    7.39606131] [ -8.30101213 -13.77064356 -17.09080998 ...   5.27352106   6.10386071    7.17918929] [-11.98245151 -13.54015832 -14.85469639 ...   4.91926188   5.74746248    6.7213265 ] ... [  4.25739862   3.4827536    2.96632359 ...   3.2245386    3.2245386    1.93346357] [  6.60970778   3.10076929   2.02109591 ...   2.42597342   3.63896014    3.37233348] [  4.79936391   4.23716749   2.54368012 ...   1.55552506   2.96791422    3.81293338]] meter / second

Plotting

Like most meteorological data, we want to be able to plot these data. DataArrays can be used like normal numpy arrays in plotting code, which is the recommended process at the current point in time, or we can use some of xarray’s plotting functionality for quick inspection of the data.

(More detail beyond the following can be found at xarray’s plotting reference.)

# A very simple example example of a plot of 500 hPa heights
data['height'].loc[time[0]].loc[{vertical.name: 500.}].plot()
plt.show()
../_images/sphx_glr_xarray_tutorial_001.png
# Let's add a projection and coastlines to it
ax = plt.axes(projection=ccrs.LambertConformal())
ax._hold = True  # Work-around for CartoPy 0.16/Matplotlib 3.0.0 incompatibility
data['height'].loc[time[0]].loc[{vertical.name: 500.}].plot(ax=ax, transform=data_crs)
ax.coastlines()
plt.show()
../_images/sphx_glr_xarray_tutorial_002.png
# Or, let's make a full 500 hPa map with heights, temperature, winds, and humidity

# Select the data for this time and level
data_level = data.loc[{vertical.name: 500., time.name: time[0]}]

# Create the matplotlib figure and axis
fig, ax = plt.subplots(1, 1, figsize=(12, 8), subplot_kw={'projection': data_crs})

# Plot RH as filled contours
rh = ax.contourf(x, y, data_level['relative_humidity'], levels=[70, 80, 90, 100],
                 colors=['#99ff00', '#00ff00', '#00cc00'])

# Plot wind barbs, but not all of them
wind_slice = slice(5, -5, 5)
ax.barbs(x[wind_slice], y[wind_slice],
         data_level['u'].metpy.unit_array[wind_slice, wind_slice].to('knots'),
         data_level['v'].metpy.unit_array[wind_slice, wind_slice].to('knots'),
         length=6)

# Plot heights and temperature as contours
h_contour = ax.contour(x, y, data_level['height'], colors='k', levels=range(5400, 6000, 60))
h_contour.clabel(fontsize=8, colors='k', inline=1, inline_spacing=8,
                 fmt='%i', rightside_up=True, use_clabeltext=True)
t_contour = ax.contour(x, y, data_level['temperature'], colors='xkcd:deep blue',
                       levels=range(248, 276, 2), alpha=0.8, linestyles='--')
t_contour.clabel(fontsize=8, colors='xkcd:deep blue', inline=1, inline_spacing=8,
                 fmt='%i', rightside_up=True, use_clabeltext=True)

# Add geographic features
ax.add_feature(cfeature.LAND.with_scale('50m'), facecolor=cfeature.COLORS['land'])
ax.add_feature(cfeature.OCEAN.with_scale('50m'), facecolor=cfeature.COLORS['water'])
ax.add_feature(cfeature.STATES.with_scale('50m'), edgecolor='#c7c783', zorder=0)
ax.add_feature(cfeature.LAKES.with_scale('50m'), facecolor=cfeature.COLORS['water'],
               edgecolor='#c7c783', zorder=0)

# Set a title and show the plot
ax.set_title(('500 hPa Heights (m), Temperature (K), Humidity (%) at ' +
              time[0].dt.strftime('%Y-%m-%d %H:%MZ')))
plt.show()
../_images/sphx_glr_xarray_tutorial_003.png

What Could Go Wrong?

Depending on your dataset and what you are trying to do, you might run into problems with xarray and MetPy. Below are examples of some of the most common issues

  • parse_cf not able to parse due to conflict
  • An axis not being available
  • An axis not being interpretable
  • Arrays not broadcasting in calculations

Coordinate Conflict

Code:

temperature = data.metpy.parse_cf('Temperature')

Error Message:

/home/user/env/MetPy/metpy/xarray.py:305: UserWarning: DataArray
of requested variable has more than one x coordinate. Specify the
unique axes using the coordinates argument.

Fix:

Specify the coordinates argument to the parse_cf method to map the T (time), Z (vertical), Y, and X axes (as applicable to your dataset) to the corresponding coordinates.

temperature = data.metpy.parse_cf('Temperature',
                                  coordinates={'T': 'time', 'Z': 'isobaric',
                                               'Y': 'y', 'X': 'x'})

Axis Unavailable

Code:

data['Temperature'].metpy.vertical

Error Message:

AttributeError: vertical attribute is not available.

This means that your data variable does not have the coordinate that was requested, at least as far as the parser can recognize. Verify that you are requesting a coordinate that your data actually has, and if it still is not available, you will need to manually specify the coordinates via the coordinates argument discussed above.

Axis Not Interpretable

Code:

x, y, ensemble = data['Temperature'].metpy.coordinates('x', 'y', 'ensemble')

Error Message:

AttributeError: 'ensemble' is not an interpretable axis

This means that you are requesting a coordinate that MetPy is (currently) unable to parse. While this means it cannot be recognized automatically, you can still obtain your desired coordinate directly by accessing it by name. If you have a need for systematic identification of a new coordinate type, we welcome pull requests for such new functionality on GitHub!

Broadcasting in Calculations

Code:

theta = mpcalc.potential_temperature(data['isobaric3'], data['temperature'])

Error Message:

ValueError: operands could not be broadcast together with shapes (9,31,81,131) (31,)

This is a symptom of the incomplete integration of xarray with MetPy’s calculations; the calculations currently convert the DataArrays to unit arrays that do not recognize which coordinates match with which. And so, we must do some manipulations.

Fix 1 (xarray broadcasting):

pressure, temperature = xr.broadcast(data['isobaric3'], data['temperature'])
theta = mpcalc.potential_temperature(pressure, temperature)

Fix 2 (unit array broadcasting):

theta = mpcalc.potential_temperature(
    data['isobaric3'].metpy.unit_array[None, :, None, None],
    data['temperature'].metpy.unit_array
)

Total running time of the script: ( 0 minutes 3.074 seconds)

Gallery generated by Sphinx-Gallery