Preprocessing Data
==================

Many of the tracking algorithms require the input data to be in specific formats.
Here we detail how to perform some of the typically required preprocessing steps
using the cf-python library.
Other tools can be used for the same tasks; however, we focus on cf-python since it
provides a uniform interface and is a dependency of TCTrack.

For full documentation of the routines described on these pages, and more, see the
`cf python documentation `_.

.. _combine_time:

Combining in Time
-----------------

Typically, data is separated into different files in time, but must often be combined
into a single file. The code example below illustrates how this can be done with
cf-python:

.. code-block:: python

    import cf

    # Read the list of input files. This automatically concatenates in time.
    input_files = [...]
    field = cf.read(input_files)[0]

    # (Optionally) Select a time interval. This uses the first three months of 1950
    time_interval = cf.wi(cf.dt("1950-01-01"), cf.dt("1950-04-01"), open_upper=True)
    field = field.subspace(T=time_interval)

    # Write the combined data to a single file
    cf.write(field, "combined-output.nc")

Combine Variables
-----------------

Variables will often be stored in separate files. To combine them with cf-python,
simply read them in separately and then write them together:

.. code-block:: python

    import cf

    # Read the separate input files
    field1 = cf.read("var1_file.nc")[0]
    field2 = cf.read("var2_file.nc")[0]

    # Write the combined fields to a single file
    cf.write([field1, field2], "combined_file.nc")

Separating Variables
--------------------

If variables instead need to be separated into multiple files, such as in
:doc:`TSTORMS <../tracking-algorithms/tstorms>`, the opposite procedure is followed:

.. code-block:: python

    import cf

    # Read in the combined file
    field1, field2 = cf.read("combined_file.nc")

    # Write to separate files
    cf.write(field1, "var1_file.nc")
    cf.write(field2, "var2_file.nc")

Subsampling
-----------

Sometimes we wish to subsample, e.g.
to move from hourly data to daily. This can again be done using cf-python's
``subspace`` command, this time providing a ``slice`` or indices to extract the
values of interest:

.. code-block:: python

    import cf

    # Read the input file
    field1 = cf.read("var1_file.nc")[0]

    # Generate subspaces as required
    # Take the element at index 5 of the 'Z' coordinate
    field2 = field1.subspace(Z=[5])
    # Take the elements at indices 0 and 5 of the 'X' coordinate
    field3 = field1.subspace(X=[0, 5])
    # Every second element of the 'Y' coordinate between indices 3 and -3
    field4 = field1.subspace(Y=slice(3, -3, 2))

Note that if only a single element is taken (e.g. a slice of a single pressure level)
then the field will retain this as a coordinate dimension.
To remove the single-valued coordinate from the field, use cf-python's ``squeeze``
before writing to file:

.. code-block:: python

    import cf

    # Read the input file
    field1 = cf.read("var1_file.nc")[0]

    # Slice the pressure level at index 5 ('Z' coordinate)
    field2 = field1.subspace(Z=[5])

    # Squeeze to remove the single-valued Z from field dimensions
    field2.squeeze(inplace=True)

    # or, for a new field
    field3 = field2.squeeze()

Operations
----------

cf-python provides various operations to calculate new fields. These include both
mathematical operations and statistical collapses.

For example, to calculate vorticity from coincident velocity data we can use
``curl_xy``:

.. code-block:: python

    import cf

    # Read the separate input files
    u_field = cf.read("u_file.nc")[0]
    v_field = cf.read("v_file.nc")[0]

    # Calculate vorticity and set its metadata
    w_field = cf.curl_xy(u_field, v_field, radius="earth")
    w_field.nc_set_variable("vorticity")
    w_field.set_property("standard_name", "atmosphere_upward_absolute_vorticity")
    w_field.set_property("units", "s-1")

    # Save the new variable to NetCDF
    cf.write(w_field, "vorticity_file.nc")

Or to take a mean over a coordinate:

.. code-block:: python

    import cf

    # Read the input file
    field = cf.read("file.nc")[0]

    # Take the mean in the zonal 'X' coordinate and squeeze to remove 'X' dimension
    field_zonal_mean = field.collapse("mean", axes="X")
    field_zonal_mean.squeeze(inplace=True)

    # Save the new variable to NetCDF
    cf.write(field_zonal_mean, "zonal_mean_file.nc")

Setting Fill Values
^^^^^^^^^^^^^^^^^^^

Sometimes it is useful to replace fill values after an operation and before writing to
file. This can be done using cf-python's ``filled`` routine.
For example, to set any null or masked values to ``0.0`` after calculating vorticity
above, use:

.. code-block:: python

    w_field.filled(fill_value=0.0, inplace=True)

before writing to file.

Set NetCDF Variable Name
------------------------

To set specific NetCDF variable names for the fields and coordinates you can use the
``nc_set_variable`` methods:

.. code-block:: python

    import cf

    field = cf.read("var1_file.nc")[0]

    # Set the new netcdf variable names for the field and coordinates
    field.nc_set_variable("slp")
    field.coordinate("latitude").nc_set_variable("lat")

    # Save with the new netcdf variable names
    cf.write(field, "slp_file.nc")

Regridding
----------

.. note::
    Regridding with cf-python requires
    :ref:`esmpy and ESMF to be installed as dependencies `.
    There are also other tools available including xarray, `NCO `_ (ncremap), and
    `CDO `_ (cdo remap...).

Regridding variables involves either using the grid of an existing variable or
creating a new grid. Both approaches are shown below.
The interpolation method can be specified using the ``method`` argument, with options
such as ``"linear"``, ``"conservative"``, and nearest neighbour search
(`see here for details `_).

To use an existing variable:

.. code-block:: python

    import cf

    # Get the fields for the two variables
    field1 = cf.read("var1_file.nc")[0]
    field2 = cf.read("var2_file.nc")[0]

    # Regrid field1 onto the grid of field2
    field1 = field1.regrids(field2, method="linear")
    field1.nc_clear_dataset_chunksizes()  # Avoids a possible error when writing

To regrid onto a new grid:

.. code-block:: python

    import cf

    field = cf.read("var1_file.nc")[0]

    # Create a new grid at regular longitude and latitude coordinates
    domain = cf.Domain.create_regular((-180, 180, 1), (-90, 90, 1))

    # Regrid
    field = field.regrids(domain, method="linear")
    field.nc_clear_dataset_chunksizes()  # Avoids a possible error when writing

Note that regridding can be performed in place using ``inplace=True``.

Gaussian Grid
^^^^^^^^^^^^^

If, as in :doc:`TRACK <../tracking-algorithms/track>`, a regular `Gaussian grid `_ is
required (i.e. the latitudes are the arcsines of the roots of a Legendre polynomial),
the new longitudes and latitudes need to be defined.
These are used to define new ``cf.DimensionCoordinate`` objects to be used for the
regridding.

.. code-block:: python

    import cf
    import numpy as np

    field = cf.read("var1_file.nc")[0]

    # Define a regular Gaussian grid with 'n' points per hemisphere
    n = 256
    lon = np.arange(0, 360, 360 / (4 * n))
    lat = np.degrees(np.arcsin(np.polynomial.legendre.leggauss(2 * n)[0]))

    # Copy and modify the latitude and longitude DimensionCoordinates
    domain = field.domain.copy()
    lat_coord = domain.dimension_coordinate("latitude")
    lat_coord.set_data(lat, inplace=True)
    lat_coord.del_bounds()
    lon_coord = domain.dimension_coordinate("longitude")
    lon_coord.set_data(lon, inplace=True)
    lon_coord.del_bounds()

    # Regrid
    field = field.regrids((lat_coord, lon_coord), method="linear")
    field.nc_clear_dataset_chunksizes()  # Avoids a possible error when writing
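
Before regridding, it can be useful to check the Gaussian grid coordinates on their
own. The following is a minimal sketch using only NumPy, not a TCTrack or cf-python
routine. The helper name ``gaussian_grid`` is hypothetical, and it assumes the same
definition as above: ``2 * n`` latitudes from the Legendre roots and ``4 * n`` equally
spaced longitudes.

.. code-block:: python

    import numpy as np


    def gaussian_grid(n):
        """Return (lat, lon) in degrees for a regular Gaussian grid.

        Hypothetical helper: 'n' is the number of latitudes per hemisphere.
        """
        # The roots of the degree-2n Legendre polynomial are the sines of the latitudes
        roots, _weights = np.polynomial.legendre.leggauss(2 * n)
        lat = np.degrees(np.arcsin(roots))
        # 4n equally spaced longitudes starting at 0 degrees east
        lon = np.arange(4 * n) * (360.0 / (4 * n))
        return lat, lon


    lat, lon = gaussian_grid(32)
    print(lat.size, lon.size)  # 64 128

    # The latitudes are symmetric about the equator and never reach the poles
    assert np.allclose(lat, -lat[::-1])
    assert np.abs(lat).max() < 90

Scaling an integer ``np.arange`` by the spacing, rather than passing a floating-point
step, is a design choice in this sketch so that the number of longitudes is always
exactly ``4 * n``.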