Basic Data Structures

What are Data Structures?

Imagine you want to plot hourly precipitation measurements from a rain gauge on a bar graph. Your first step will be to put the precipitation data inside a Python data structure. Data structures are how computer programs store information. This information can subsequently be processed, analyzed and visualized.

There are many different types of data structures depending on the kind of data you wish to store. You will have to choose the data structures that best meet the requirements of the problem you are trying to solve. Fortunately, Python, as a "batteries-included" language, gives you a practical set of data structures to choose from. We will do a minimal exploration of Python data structures, just enough to get you going. For a more complete treatment, see the Data Structures section in the Python documentation.

Python Data Structures

Scientific data can be large and complex and may require data structures designed for scientific programming, which we cover later in the Scientific Python Package section. This notebook covers the basic Python data structures meant for general-purpose programming, which you will need for programs of any kind. Choosing the right data structure for your problem will help your programs run correctly and efficiently, and make them easier for others to understand. We will examine three Python data structures: lists and tuples, which are both Python sequences, and dictionaries.

Sequences

Python offers several data structures for storing sequences of information, such as hourly mean sea level pressure readings from a weather station, or three-dimensional coordinates describing a location in a climate model. We will discuss two of them: lists and tuples.

List

A Python list is a sequence of values that are usually the same kind of item. Lists are ordered, which means the items of a list stay in the order in which they were inserted. They can contain strings, numbers, or more complex items. Lists are mutable, which is a fancy way of saying they can be changed after they are created. Here is a Python list of decadal concentrations of carbon dioxide (ppm) measured at the Mauna Loa observatory from 1970 to 2010, assigned to the co2 variable.

In [1]:
co2 = [325.68, 338.68, 354.35, 371.13, 387.37]

The list is delimited by square brackets, the values are separated by commas, and the whole list is assigned to the co2 variable with the = assignment operator.

What Can You Do with a List?

Once you have created your list, there are many options to further manipulate the list. We will examine just a few examples.

  • Add an Item to the End of the List

Continuing with our list of carbon dioxide concentrations, we want to add a prediction for the year 2020 of 400.0 ppm to the co2 list. We can use the append method to add an item to the end of the list. (A method is like a function, but denoted with the . notation after the variable it is acting on. Instead of append(co2, 400.0) you have co2.append(400.0).)

In [2]:
co2.append(400.0)
print(co2)
[325.68, 338.68, 354.35, 371.13, 387.37, 400.0]
  • Add an Item to the Front of the List

The carbon dioxide concentration in 1960 at Mauna Loa was 316.91 ppm. We can use the insert method to add an item to the list at the location of our choosing, in this case location (or index) 0. (Python sequences start at index 0, not 1 as in MATLAB or Fortran.)

In [3]:
co2.insert(0, 316.91)
print(co2)
[316.91, 325.68, 338.68, 354.35, 371.13, 387.37, 400.0]
  • Change a Value in the List

We want to improve our estimate of the year 2020 carbon dioxide concentration to a value of 401.0 ppm. We will access the 7th value in the list with the square bracket notation and assign the new value to it.

In [4]:
co2[6] = 401.0  # Remember, 7th item at index 6 because we start at 0, not 1
print(co2)
[316.91, 325.68, 338.68, 354.35, 371.13, 387.37, 401.0]
Tuples

Tuples are also ordered sequences of information, but they are immutable, which means that once they are created, they cannot change. Immutability may seem like a strange concept given that computer programs are constantly manipulating and changing data, but your program becomes easier to understand when you can guarantee something is unchanging. Tuples tend to contain related items such as the x and y coordinates of a point in a Cartesian plane, or the author, title and journal in a scholarly citation.

Here we define a tuple representing a geographic coordinate expressed as latitude, longitude and elevation in meters:

In [5]:
location = (40.0, -105.3, 1655.1)

The tuple is delimited by parentheses, the values are separated by commas, and the whole tuple is assigned to the location variable with the = assignment operator. Because tuples are immutable, unlike lists, there are no operations to change them in place.
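
As a minimal sketch of what immutability means in practice, the cell below tries to assign to an element of a tuple, which raises a TypeError, and then builds a new tuple instead. The replacement elevation of 1700.0 and the updated_location name are illustrative assumptions, not part of the original example.

```python
location = (40.0, -105.3, 1655.1)

try:
    location[2] = 1700.0  # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print('Cannot modify a tuple:', error_message)

# To "change" a tuple, build a new one from the items of the old one.
# 1700.0 is a made-up elevation used only for illustration.
updated_location = (location[0], location[1], 1700.0)
print(updated_location)
```

The original location tuple is untouched; updated_location is a separate object.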

Built-in Functions for Lists and Tuples

There are several built-in Python functions to examine both lists and tuples. Let's look at a few. We can find out the length of the tuple or list with the built-in Python len function:

In [6]:
print(len(co2))
7

We can also find the minimum and maximum values of a sequence with the built-in min and max functions:

In [7]:
print(min(co2), max(co2))
316.91 401.0
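
The same built-in functions work on tuples. Here is a small sketch that reuses the location tuple defined earlier (redefined so the cell runs on its own). The sorted function is one more built-in, shown for illustration; it returns a new list rather than changing its input. The minimum and maximum of a latitude, longitude and elevation are not physically meaningful; they only demonstrate the calls.

```python
location = (40.0, -105.3, 1655.1)

n_values = len(location)
smallest, largest = min(location), max(location)
print(n_values, smallest, largest)

# sorted() accepts any sequence and always returns a new list,
# so it works on an immutable tuple too.
ordered = sorted(location)
print(ordered)
```
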
Accessing Data from Lists and Tuples

Python offers a rich variety of options to access values inside lists and tuples, and you will eventually want to understand indexing, slicing and striding expressions. For brevity, we will only examine a couple of examples of getting values out of sequences. Again, note that valid indices for lists and tuples start at 0 and end at the length of the sequence minus 1.

Indexing

Individual items inside the list can be obtained with the square bracket notation. Here we will assign a couple of values from inside the list to two variables: co2_1960 and co2_2010. We will then print the values with Python 3 positional formatting.

In [8]:
co2_1960 = co2[0]  # index 0 at 1960
co2_2010 = co2[5]  # index 5 at 2010
print('Mauna Loa carbon dioxide concentration '
      'in 1960 was {0} ppm and '
      'in 2010 was {1} ppm.'.format(co2_1960, co2_2010))
Mauna Loa carbon dioxide concentration in 1960 was 316.91 ppm and in 2010 was 387.37 ppm.
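
As a brief taste of the slicing and striding expressions mentioned earlier, here is a minimal sketch. It redefines the co2 list with the seven values from the previous cells so that it runs on its own; the variable names are illustrative.

```python
co2 = [316.91, 325.68, 338.68, 354.35, 371.13, 387.37, 401.0]

latest = co2[-1]        # negative indices count from the end of the list
first_three = co2[0:3]  # slice: indices 0, 1 and 2 (the stop index is excluded)
every_other = co2[::2]  # stride: every second value, starting at index 0

print(latest)
print(first_three)
print(every_other)
```

Slices return a new list and leave co2 unchanged. The same expressions work on tuples, where a slice returns a new tuple.
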
Multiple Assignments for Unpacking Tuples

Python sequences also allow for multiple assignments for unpacking. This trick is quite handy for tuples:

In [9]:
lat, lon, elev = location  # unpacking the tuple
print('lat {0}, lon {1}, elevation {2}'.format(lat, lon, elev))
lat 40.0, lon -105.3, elevation 1655.1
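
Unpacking requires the number of names on the left-hand side to match the number of items in the sequence. The following sketch shows what happens when they do not match; the unpack_error name is only for illustration.

```python
location = (40.0, -105.3, 1655.1)

try:
    lat, lon = location  # two names, but the tuple holds three values
except ValueError as err:
    unpack_error = str(err)
    print('Unpacking failed:', unpack_error)
```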

Dictionaries

Dictionary data structures are easy to understand because you are already familiar with them. When you look up a word definition in a language dictionary or use an index in the back of a book, you are using a dictionary data structure. Dictionaries are composed of key and value pairs. For example,

hydrometeor - an atmospheric phenomenon or entity involving water or water vapor, such as rain or a cloud

Here, the key is "hydrometeor" and the value is "an atmospheric phenomenon or entity involving water or water vapor, such as rain or a cloud."

Let's build upon the earlier tuple example by defining a dictionary of METAR weather stations. The keys are strings representing the METAR ICAO identifier, and the values are tuples representing the location of the station expressed as latitude, longitude and elevation in meters.

In [10]:
metars = {
    'KPRG': (48.96, 2.44, 66),
    'FAGM': (-26.24, 28.15, 1671),
    'KNYC': (40.71, -74.01, 10)}

Unlike lists and tuples, dictionary entries are not accessed by position. Since Python 3.7, dictionaries do preserve the order in which entries were inserted, but older versions of Python made no such guarantee. Either way, this is rarely a concern, as you will be using Python dictionary operations to look up information by key rather than by position.

What Can You Do with a Dictionary?
  • Look up a Value in a Dictionary

Let's look up the METAR location for KNYC.

In [11]:
print(metars['KNYC'])
(40.71, -74.01, 10)
  • Add a Value to a Dictionary

Let's add the METAR station SBSP for São Paulo, Brazil to our METAR dictionary:

In [12]:
metars['SBSP'] = (-23.63, -46.66, 801)

Note that the keys in dictionary data structures are unique. This means, for example, that if you provide a more accurate location for the São Paulo METAR, then the new value will replace the old one stored under that key:

In [13]:
metars['SBSP'] = (-23.627, -46.655, 803.1)
print(metars['SBSP'])
(-23.627, -46.655, 803.1)

There will not be two SBSP keys in the dictionary.
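
To make the uniqueness of keys concrete, the sketch below rebuilds a smaller version of the dictionary, assigns to the same key twice, and confirms that the number of entries does not grow. It also shows the in operator and the get method, two standard ways to avoid an error when a key might be absent. The 'XXXX' identifier is a made-up placeholder, not a real station.

```python
metars = {
    'FAGM': (-26.24, 28.15, 1671),
    'KNYC': (40.71, -74.01, 10)}

metars['SBSP'] = (-23.63, -46.66, 801)
metars['SBSP'] = (-23.627, -46.655, 803.1)  # same key: the value is replaced
print(len(metars))  # number of entries after both assignments

print('SBSP' in metars)    # membership test on the keys
print(metars.get('XXXX'))  # get returns None instead of raising KeyError
```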

Further Reading

There are many topics concerning Python data structures that we did not cover in the interest of brevity. We encourage you to research more elaborate indexing, slicing and striding expressions. We also did not cover sets, which are data structures composed of unique, unordered values, similar to the keys of a dictionary. Several valuable built-in Python functions merit study as well: filter(), map(), and sorted(), to name a few. Lastly, in the "Flow Control" notebook, we will examine Python list comprehensions, which process information inside sequences and dictionaries.