Preprocessing Data

Many of the tracking algorithms require input data in specific formats. Here we detail how to perform some of the typically required preprocessing steps using the cf-python library. Other tools can be used for the same tasks, but we focus on cf-python because it provides a uniform interface and is already a dependency of TCTrack.

For full documentation of the routines described on these pages, and more, see the cf-python documentation.

Combining in Time

Data are typically split across several files in time, but must often be combined into a single file. The code example below illustrates how this can be done with cf-python:

import cf

# Read the list of input files. This automatically concatenates in time.
input_files = [...]
field = cf.read(input_files)[0]

# (Optionally) select a time interval, here the first three months of 1950.
# open_upper=True excludes the upper bound (1 April) from the interval.
time_interval = cf.wi(cf.dt("1950-01-01"), cf.dt("1950-04-01"), open_upper=True)
field = field.subspace(T=time_interval)

# Write the combined data to a single file
cf.write(field, "combined-output.nc")
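
The list of input files is left unspecified above. One possible way to build it is sketched below using only the Python standard library. The file names, the "psl_*.nc" pattern, and the temporary directory are hypothetical and only serve to make the sketch runnable; it assumes the real files are named so that sorting by name also sorts them by time.

```python
import tempfile
from pathlib import Path

# Hypothetical layout: one file per month, named so that sorting by name
# also sorts by time (e.g. psl_195001.nc, psl_195002.nc, ...).
data_dir = Path(tempfile.mkdtemp())
for name in ("psl_195003.nc", "psl_195001.nc", "psl_195002.nc"):
    (data_dir / name).touch()  # empty placeholder files for illustration

# Collect the matching files and sort them into a deterministic order
input_files = sorted(str(path) for path in data_dir.glob("psl_*.nc"))

print([Path(f).name for f in input_files])
```

With real data, a list built this way could be passed to cf.read as in the example above.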

Combining Variables

Variables will often be stored in separate files. To combine them with cf-python, read them in separately and then write them together:

# Read the separate input files
field1 = cf.read("var1_file.nc")[0]
field2 = cf.read("var2_file.nc")[0]

# Write the combined fields to a single file
cf.write([field1, field2], "combined_file.nc")

Separating Variables

If variables instead need to be separated into multiple files, as for TSTORMS, the opposite procedure is followed:

# Read in the combined file (the unpacking assumes it contains exactly two fields)
field1, field2 = cf.read("combined_file.nc")

# Write to separate files
cf.write(field1, "var1_file.nc")
cf.write(field2, "var2_file.nc")

Subsampling

Sometimes we wish to subsample, e.g. to move from hourly to daily data. This can again be done using cf-python’s subspace method, this time providing a slice or a list of indices to extract the values of interest:

# Read the input file
field1 = cf.read("var1_file.nc")[0]

# Generate subspaces as required (indices are zero-based)
# Take the element at index 5 of the 'Z' coordinate
field2 = field1.subspace(Z=[5])
# Take the elements at indices 0 and 5 of the 'X' coordinate
field3 = field1.subspace(X=[0, 5])
# Take every second element of the 'Y' coordinate from index 3 up to, but excluding, index -3
field4 = field1.subspace(Y=slice(3, -3, 2))
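
The index-based selections above use zero-based positions. Assuming the slice follows Python's standard slicing rules on a non-cyclic axis, the positions it keeps can be previewed on a plain list before subspacing a field. The sketch below is illustrative only: the axis size of 10, the 48 hourly steps, and the variable names are hypothetical, and no cf-python calls are made.

```python
# Preview which positions a slice keeps, using plain Python lists.
# Hypothetical sizes: a 'Y' axis with 10 points and 48 hourly time steps.
y_positions = list(range(10))
hourly_steps = list(range(48))

# Every second position from index 3 up to, but excluding, index -3
kept_y = y_positions[slice(3, -3, 2)]

# Every 24th step, i.e. one value per day from hourly data
kept_t = hourly_steps[slice(None, None, 24)]

print(kept_y)
print(kept_t)
```

If the preview matches the intended selection, the same slice could then be passed to subspace, e.g. as T=slice(None, None, 24). This picks one instantaneous value per day and does not average over the day.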

Note that if only a single element is taken (e.g. a single pressure level), the field will retain it as a size-one dimension of the data. To remove this size-one dimension from the data, use cf-python’s squeeze before writing to file:

# Read the input file
field1 = cf.read("var1_file.nc")[0]

# Select the pressure level at index 5 ('Z' coordinate)
field2 = field1.subspace(Z=[5])

# Squeeze to remove the single-valued Z from field dimensions
field2.squeeze(inplace=True)
# or, for a new field
field3 = field2.squeeze()

Operations

cf-python provides various operations to calculate new fields. These include both mathematical operations and statistical collapses.

For example, to calculate vorticity from coincident velocity data we can use curl_xy:

# Read the separate input files
u_field = cf.read("u_file.nc")[0]
v_field = cf.read("v_file.nc")[0]

# Calculate vorticity
w_field = cf.curl_xy(u_field, v_field, radius="earth")
w_field.nc_set_variable("vorticity")
# The curl of the horizontal wind is the relative (not absolute) vorticity
w_field.set_property("standard_name", "atmosphere_upward_relative_vorticity")
w_field.set_property("units", "s-1")

# Save the new variable to NetCDF
cf.write(w_field, "vorticity_file.nc")

Or to take a mean over a coordinate:

# Read the input file
field = cf.read("file.nc")[0]

# Take the mean in the zonal 'X' coordinate and squeeze to remove 'X' dimension
field_zonal_mean = field.collapse("mean", axes="X")
field_zonal_mean.squeeze(inplace=True)

# Save the new variable to NetCDF
cf.write(field_zonal_mean, "zonal_mean_file.nc")

Setting Fill Values

Sometimes it is useful to replace missing values after an operation, before writing to file. This can be done using cf-python’s filled method. For example, to set any missing (masked) values to 0.0 after calculating vorticity as above, use:

w_field.filled(fill_value=0.0, inplace=True)

before writing to file.

Setting NetCDF Variable Names

To set specific NetCDF variable names for the fields and coordinates, you can use the nc_set_variable methods:

field = cf.read("var1_file.nc")[0]

# Set the new NetCDF variable names for the field and coordinates
field.nc_set_variable("slp")
field.coordinate("latitude").nc_set_variable("lat")

# Save with the new NetCDF variable names
cf.write(field, "slp_file.nc")

Regridding

Note

Regridding with cf-python requires esmpy and ESMF to be installed as dependencies. Other tools are also available, including xarray, NCO (ncremap), and CDO (cdo remap…).

Regridding a variable involves either using the grid of an existing variable or creating a new grid; both are shown below. The interpolation method is specified using the method argument, with options such as "linear", "conservative", and nearest-neighbour search (see the cf-python regridding documentation for details).

To use an existing variable:

# Get the fields for the two variables
field1 = cf.read("var1_file.nc")[0]
field2 = cf.read("var2_file.nc")[0]

# Regrid field1 onto the grid of field2
field1 = field1.regrids(field2, method="linear")
field1.nc_clear_dataset_chunksizes()  # Avoids a possible error when writing

To regrid onto a new grid:

field = cf.read("var1_file.nc")[0]

# Create a new grid at regular longitude and latitude coordinates
domain = cf.Domain.create_regular((-180, 180, 1), (-90, 90, 1))

# Regrid
field = field.regrids(domain, method="linear")
field.nc_clear_dataset_chunksizes()  # Avoids a possible error when writing

Note that regridding can be performed in place using inplace=True.

Gaussian Grid

If, as in TRACK, a regular Gaussian grid is required (i.e. the latitudes are the arcsines of the roots of a Legendre polynomial), the new longitudes and latitudes need to be defined explicitly. These are then used to define new cf.DimensionCoordinate objects for the regridding:

import cf
import numpy as np

field = cf.read("var1_file.nc")[0]

# Define a regular Gaussian grid with 'n' points per hemisphere
n = 256
lon = np.arange(0, 360, 360 / (4 * n))
lat = np.degrees(np.arcsin(np.polynomial.legendre.leggauss(2 * n)[0]))

# Copy and modify the latitude and longitude DimensionCoordinates
domain = field.domain.copy()
lat_coord = domain.dimension_coordinate("latitude")
lat_coord.set_data(lat, inplace=True)
lat_coord.del_bounds()
lon_coord = domain.dimension_coordinate("longitude")
lon_coord.set_data(lon, inplace=True)
lon_coord.del_bounds()

# Regrid
field = field.regrids((lat_coord, lon_coord), method="linear")
field.nc_clear_dataset_chunksizes()  # Avoids a possible error when writing
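
As a cross-check on the latitude definition, the Gaussian latitudes can also be computed without NumPy by finding the roots of the Legendre polynomial with Newton's method. This is a minimal standard-library sketch and not part of TCTrack or cf-python; the function name gaussian_latitudes, the starting guess, the iteration cap, and the tolerance are assumptions made for illustration.

```python
import math


def gaussian_latitudes(n_lat):
    """Return n_lat Gaussian latitudes in degrees, ordered south to north."""
    roots = []
    for k in range(1, n_lat + 1):
        # Standard starting guess for the k-th root of the Legendre polynomial
        x = math.cos(math.pi * (k - 0.25) / (n_lat + 0.5))
        for _ in range(100):
            # Evaluate P_n(x) and P_{n-1}(x) with the three-term recurrence
            p_prev, p = 1.0, x
            for j in range(2, n_lat + 1):
                p_prev, p = p, ((2 * j - 1) * x * p - (j - 1) * p_prev) / j
            # Derivative of P_n from P_n and P_{n-1}
            dp = n_lat * (x * p - p_prev) / (x * x - 1.0)
            dx = p / dp
            x -= dx  # Newton update
            if abs(dx) < 1e-15:
                break
        roots.append(x)
    # Latitude is the arcsine of each root
    return sorted(math.degrees(math.asin(x)) for x in roots)


# Small illustrative grid: two latitudes per hemisphere
lats = gaussian_latitudes(4)
print([round(value, 4) for value in lats])
```

With n = 256 as above, gaussian_latitudes(2 * n) should agree with the NumPy-based lat array to within floating-point tolerance.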