Intermediate NumPy
1. Using axes to slice arrays
The solution to the last exercise in the Numpy Basics notebook introduces an important concept when working with NumPy: the axis. The axis indicates the dimension along which a function operates, provided the function reduces multiple values to a single value.
Let's look at a concrete example with sum:
# Convention for import to get shortened namespace
import numpy as np
# Create an array for testing
a = np.arange(12).reshape(3, 4)
a
# This calculates the total of all values in the array
np.sum(a)
# Keep this in mind:
a.shape
# Instead, sum along axis 0 (down the rows), giving one total per column:
np.sum(a, axis=0)
# Or sum along axis 1 (across the columns), giving one total per row:
np.sum(a, axis=1)
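One way to keep the axis convention straight is to check the shape of each result: the axis passed to the function is the one that disappears. The sketch below reuses the same 3x4 array. The `keepdims=True` option is not used elsewhere in this notebook; it is shown only because it keeps the reduced axis with length 1.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# axis=0 collapses the 3 rows, leaving one total per column
col_totals = np.sum(a, axis=0)
print(col_totals.shape)  # (4,)

# axis=1 collapses the 4 columns, leaving one total per row
row_totals = np.sum(a, axis=1)
print(row_totals.shape)  # (3,)

# keepdims=True retains the collapsed axis with length 1
row_totals_2d = np.sum(a, axis=1, keepdims=True)
print(row_totals_2d.shape)  # (3, 1)
```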
- Finish the code below to calculate advection. The trick is to figure out how to do the summation.
# Synthetic data
temp = np.random.randn(100, 50)
u = np.random.randn(100, 50)
v = np.random.randn(100, 50)
# Calculate the gradient components
gradx, grady = np.gradient(temp)
# Turn into an array of vectors:
# axis 0 is x position
# axis 1 is y position
# axis 2 is the vector components
grad_vec = np.dstack([gradx, grady])
print(grad_vec.shape)
# Turn wind components into vector
wind_vec = np.dstack([u, v])
# Calculate advection, the dot product of wind and the negative of gradient
# DON'T USE NUMPY.DOT (doesn't work). Multiply and add.
# %load solutions/advection.py
advec = (wind_vec * -grad_vec).sum(axis=-1)
print(advec.shape)
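The multiply-and-sum pattern above can also be written with `np.einsum`, which may make the summed axis easier to see. This is only a sketch of an equivalent formulation, not part of the original solution; it uses its own random arrays (with an arbitrary seed) in place of the wind and gradient vectors. `np.dot` is not suitable here because, for two (100, 50, 2) arrays, it pairs the last axis of the first array with the second-to-last axis of the second, so the shapes do not line up.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
wind_vec = rng.standard_normal((100, 50, 2))  # stand-in for the wind vectors
grad_vec = rng.standard_normal((100, 50, 2))  # stand-in for the gradient vectors

# Multiply matching components, then sum over the last (component) axis
advec = (wind_vec * -grad_vec).sum(axis=-1)

# The same per-point dot product with einsum: index k is the component
# axis, and it is summed because it does not appear in the output 'ij'
advec_einsum = np.einsum('ijk,ijk->ij', wind_vec, -grad_vec)

print(advec.shape)
print(np.allclose(advec, advec_einsum))
```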
2. Indexing Arrays with Boolean Values
NumPy can easily create arrays of boolean values and use them to select which values to extract from an array.
# Create some synthetic data representing temperature and wind speed data
np.random.seed(19990503) # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
50 + 2 * np.random.randn(100))
spd = (np.abs(10 * np.sin(np.linspace(0, 2 * np.pi, 100)) +
10 + 5 * np.random.randn(100)))
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(temp, 'tab:red')
plt.plot(spd, 'tab:blue');
By comparing a NumPy array with a value, we get an array of booleans holding the result of the comparison between each element and that value:
temp > 45
We can use the resulting array to index into the NumPy array and retrieve the values where the result was true:
print(temp[temp > 45])
So long as the size of the boolean array matches the data, the boolean array can come from anywhere
print(temp[spd > 10])
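To make the size requirement concrete, here is a small sketch using made-up values (the `values` array is hypothetical, not the notebook's data). It shows that a comparison produces a `bool` array, that summing the mask counts the selected elements, and that a mask of the wrong length is rejected.

```python
import numpy as np

values = np.array([42.0, 47.5, 51.0, 44.9, 60.2])  # made-up temperatures
mask = values > 45

print(mask.dtype)  # bool
print(mask.sum())  # number of True entries
print(values[mask].size == np.count_nonzero(mask))

# A mask whose length does not match the data is rejected
try:
    values[np.array([True, False])]
    mismatch_rejected = False
except IndexError as err:
    mismatch_rejected = True
    print('IndexError:', err)
```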
# Make a copy so we don't modify the original data
temp2 = temp.copy()
# Replace all places where spd is <10 with NaN (not a number) so matplotlib skips it
temp2[spd < 10] = np.nan
plt.plot(temp2, 'tab:red')
We can also combine multiple boolean arrays using the bitwise operators (& for and, | for or, ~ for not). Each comparison MUST be wrapped in parentheses, because the bitwise operators bind more tightly than comparisons.
print(temp[(temp < 45) & (spd > 10)])
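The following sketch illustrates the three bitwise operators and what happens when the parentheses are left out. The arrays `temp_demo` and `spd_demo` are made-up values chosen so the results are easy to check by eye; `np.logical_and` is shown as an equivalent spelling, not as something the notebook itself uses.

```python
import numpy as np

temp_demo = np.array([40.0, 50.0, 44.0, 60.0])  # made-up values
spd_demo = np.array([12.0, 15.0, 5.0, 8.0])

both = (temp_demo < 45) & (spd_demo > 10)    # elementwise "and"
either = (temp_demo < 45) | (spd_demo > 10)  # elementwise "or"
neither = ~either                            # elementwise "not"

print(both)
print(either)
print(neither)

# np.logical_and gives the same result as & on boolean arrays
print(np.array_equal(both, np.logical_and(temp_demo < 45, spd_demo > 10)))

# Without parentheses, & binds tighter than < and >, so Python
# first tries to evaluate 45 & spd_demo, which fails for float data
try:
    temp_demo < 45 & spd_demo > 10
    failed = False
except (TypeError, ValueError) as err:
    failed = True
    print(type(err).__name__)
```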
- Heat index is only defined for temperatures >= 80F and relative humidity values >= 40%. Using the data generated below, use boolean indexing to extract the data where heat index has a valid value.
# Here's the "data"
np.random.seed(19990503) # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
80 + 2 * np.random.randn(100))
rh = (np.abs(20 * np.cos(np.linspace(0, 4 * np.pi, 100)) +
50 + 5 * np.random.randn(100)))
# Create a mask for the two conditions described above
# good_heat_index =
# Use this mask to grab the temperature and relative humidity values that together
# will give good heat index values
# temp[] ?
# BONUS POINTS: Plot only the data where heat index is defined by
# inverting the mask (using `~mask`) and setting invalid values to np.nan
# %load solutions/heat_index.py
import numpy as np
# Here's the "data"
np.random.seed(19990503) # Make sure we all have the same data
temp = (20 * np.cos(np.linspace(0, 2 * np.pi, 100)) +
80 + 2 * np.random.randn(100))
rh = (np.abs(20 * np.cos(np.linspace(0, 4 * np.pi, 100)) +
50 + 5 * np.random.randn(100)))
# Create a mask for the two conditions described above
# rh is in percent (values near 50), so the 40% threshold is 40, not 0.4
good_heat_index = (temp >= 80) & (rh >= 40)
# Use this mask to grab the temperature and relative humidity values that together
# will give good heat index values
print(temp[good_heat_index])
print(rh[good_heat_index])
# BONUS POINTS: Plot only the data where heat index is defined by
# inverting the mask (using ~mask) and setting invalid values to np.nan
temp[~good_heat_index] = np.nan
plt.plot(temp, 'tab:red')
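Assigning through `~mask` modifies the array in place. If the original data should stay untouched, one alternative is `np.where`, which builds a new array by choosing between two sources. The sketch below uses made-up temperature and humidity values (`temp_demo`, `rh_demo`) with humidity in percent; it is an illustration of the idea, not the notebook's solution.

```python
import numpy as np

temp_demo = np.array([75.0, 82.0, 90.0, 85.0])  # made-up temperatures (F)
rh_demo = np.array([55.0, 35.0, 60.0, 45.0])    # made-up humidity (%)

mask = (temp_demo >= 80) & (rh_demo >= 40)

# Keep temp_demo where the mask is True, use NaN elsewhere
masked = np.where(mask, temp_demo, np.nan)

print(masked)
print(temp_demo)  # the original array is unchanged
```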
3. Indexing using arrays of indices
You can also use a list or array of indices to extract particular values--this is a natural extension of the regular indexing. For instance, just as we can select the first element:
print(temp[0])
We can also extract the first, fifth, and tenth elements:
print(temp[[0, 4, 9]])
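A few properties of index-array selection are worth seeing on data that is easy to read. The sketch below uses a small made-up array (`data`) and shows that the order and repetition of indices are honored, that negative indices count from the end, and that the result is a copy rather than a view.

```python
import numpy as np

data = np.arange(10, 20)  # [10, 11, ..., 19]

picked = data[[0, 4, 9]]
print(picked)  # [10 14 19]

# The result follows the order of the indices, and repeats are allowed
reordered = data[[9, 0, 0]]
print(reordered)  # [19 10 10]

# Negative indices count from the end, as with regular indexing
from_end = data[[-1, -2]]
print(from_end)  # [19 18]

# Index-array selection returns a copy, so editing it leaves data alone
picked[0] = -1
print(data[0])  # 10
```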
One place this comes into play is sorting NumPy arrays with argsort. This function returns the indices that would put the array's items in sorted order. So for our temp "data":
inds = np.argsort(temp)
print(inds)
We can pass this array of indices into temp to get it in sorted order:
print(temp[inds])
Or we can slice inds to give only the 10 highest temperatures:
ten_highest = inds[-10:]
print(temp[ten_highest])
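One caveat: `np.argsort` places NaN values at the end of the order, so if the array contains NaN (as `temp` would after invalid heat index values are set to `np.nan`), the last slots of the index array point at NaN rather than at the highest values. The sketch below, using a made-up `readings` array, shows one possible way to drop those indices before slicing.

```python
import numpy as np

readings = np.array([71.0, np.nan, 88.5, 64.2, 93.1])  # made-up data

order = np.argsort(readings)
print(order)  # the index of the NaN entry comes last

# Keep only the sorted indices that point at non-NaN values
valid = ~np.isnan(readings)
valid_order = order[valid[order]]

# Two highest valid readings, largest first
top2 = valid_order[-2:][::-1]
print(readings[top2])
```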
There are other NumPy arg functions that return indices rather than values; IPython's wildcard search lists them:
np.*arg*?
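As a brief illustration of a few of these functions, the sketch below applies `np.argmax`, `np.argmin`, `np.argwhere`, and `np.nanargmax` to a small made-up array. It is a sampling rather than a complete list.

```python
import numpy as np

readings = np.array([71.0, 88.5, 64.2, 93.1, 88.5])  # made-up data

# Index of the largest and smallest values
hottest = np.argmax(readings)
coldest = np.argmin(readings)
print(hottest, coldest)  # 3 2

# Indices where a condition holds (one row per match, so flatten for 1-D)
above_80 = np.argwhere(readings > 80).ravel()
print(above_80)  # [1 3 4]

# NaN-aware variant: ignores NaN when locating the maximum
with_gap = np.array([1.0, np.nan, 3.0])
print(np.nanargmax(with_gap))  # 2
```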