Output File Format

One of the key features of TCTrack is its output file format. Whilst each of the wrapped codes writes output in its own custom format with minimal or zero metadata, TCTrack provides results from all codes in a single standardised format that is compliant with the CF Conventions for metadata.

These pages describe the contents of this file and how it can be used downstream.

CF-Conventions and FAIR Data

The CF (Climate and Forecast) Conventions for NetCDF data are best introduced in their own words:

“The CF metadata conventions are designed to promote the processing and sharing of files created with the NetCDF API. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities.”

TCTrack produces CF-compliant files with clear metadata for all tracking algorithms.

FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable. It outlines expectations for high-quality metadata and standards-compliant file formats. By providing clear variable definitions and provenance information, and by following established conventions, FAIR datasets enable robust scientific workflows, reproducibility, and integration into a wide range of analysis tools. This approach maximizes the scientific value of the data and supports transparent, collaborative research. It also ensures that datasets and files remain meaningful, useable, and reliable, even when shared standalone or a significant time after generation.

Overview

Each NetCDF output file produced by the to_netcdf() method of a Tracker contains a collection of tropical cyclone trajectories, with each trajectory representing a single storm track.

The format follows the recommendations for trajectory data in the CF Conventions (H.4 Trajectory Data), using two-dimensional arrays of shape (n_trajectory, n_observation). Any trajectory shorter than n_observation is padded at the end with fill values.
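
As a rough illustration of this layout (a sketch using only NumPy, not TCTrack's own code), two tracks of different lengths can be padded to a common width and the padding masked again before analysis. The fill value of -999.9 mirrors the latitude missing_value shown in the example output later on this page; the track values themselves are invented.

```python
import numpy as np

# Assumed fill value, matching the latitude missing_value in the example dump.
FILL = -999.9

# Two made-up tracks of different lengths (latitudes in degrees north).
tracks = [
    [-9.26, -9.80, -10.45, -11.10],
    [12.00, 12.60],
]

n_trajectory = len(tracks)
n_observation = max(len(track) for track in tracks)

# Allocate the (n_trajectory, n_observation) array pre-filled with the fill
# value, then copy each track into the start of its row.
latitude = np.full((n_trajectory, n_observation), FILL)
for i, track in enumerate(tracks):
    latitude[i, : len(track)] = track

# Downstream, mask the padding before computing any statistics.
masked = np.ma.masked_values(latitude, FILL)
track_lengths = masked.count(axis=1)

print(latitude.shape)
print(track_lengths)
```

Masking first means that reductions such as masked.min() or masked.mean(axis=1) ignore the padded entries rather than being skewed by the fill value.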

File Format

Dimensions

  • trajectory: Number of detected storm tracks (each a unique trajectory).

    • Has a corresponding dimension coordinate variable indicating the trajectory index

  • observation: Maximum number of time steps (observations) in the trajectories.

    • Has a corresponding dimension coordinate variable indicating the observation index

Variables

Required variables

These coordinate variables are always present in a dataset:

  • trajectory: Unique index for each trajectory.

    • Dimensions: (trajectory)

    • Dimension coordinate.

    • Designated by the cf_role="trajectory_id" attribute.

  • observation: Index for each observation within a trajectory.

    • Dimensions: (observation)

    • Dimension coordinate.

  • time: Time of each observation.

    • Dimensions: (trajectory, observation)

    • Auxiliary coordinate.

    • Attributes include units and calendar amongst others

  • latitude: Latitude of the observation.

    • Dimensions: (trajectory, observation)

    • Auxiliary coordinate.

  • longitude: Longitude of the observation.

    • Dimensions: (trajectory, observation)

    • Auxiliary coordinate.
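
The units and calendar attributes on time are what make the stored numbers interpretable. In practice a library such as cftime, or netCDF4.num2date as used in the plotting example on this page, does this decoding. The pure-Python sketch below only illustrates the arithmetic for the 360_day calendar and the units string shown in the example output; the helper name decode_360_day is hypothetical, and values are rounded to the nearest hour for simplicity.

```python
def decode_360_day(offset_days, ref=(1950, 1, 1, 12)):
    """Convert 'days since <ref>' to (year, month, day, hour) in a 360_day calendar.

    Illustrative only: every month has 30 days and every year has 360 days.
    `ref` is the (year, month, day, hour) of the reference time in the units
    string, here assumed to be 1950-01-01 12:00.
    """
    ref_year, ref_month, ref_day, ref_hour = ref
    # Express the reference time as whole hours since the start of year 0.
    ref_hours = (
        (ref_year * 360 + (ref_month - 1) * 30 + (ref_day - 1)) * 24 + ref_hour
    )
    # Add the offset, rounded to the nearest hour.
    total_hours = ref_hours + round(offset_days * 24)

    # Peel off hours, then years, then months from the running total.
    total_days, hour = divmod(total_hours, 24)
    year, day_of_year = divmod(total_days, 360)
    month, day = divmod(day_of_year, 30)
    return year, month + 1, day + 1, hour


# 4.5 days after 1950-01-01 12:00 is 1950-01-06 00:00.
print(decode_360_day(4.5))
```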

Optional variables

A number of additional variables may also be written to file for each trajectory, depending on the algorithm, the data used, and the user configuration. Most commonly these will be measures of “intensity” along the tracks, but they may also include auxiliary coordinates such as grid indices.

  • Intensity variables

    • Dimensions: (trajectory, observation)

    • Attributes should include units and may also provide a CF cell method indicating how the variable was calculated.

Ancillary Field Variables

Files always include two ancillary status flag variables:

  • start_flag

    • Dimensions: (trajectory)

  • end_flag

    • Dimensions: (trajectory)

These indicate any tracks that start or end within 1 day of the input dataset bounds and may therefore extend outside this range.
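
One way such flags could be derived is sketched below. This is not TCTrack's implementation: the function name boundary_flags, the convention that 1 marks a flagged track, and the example times are all assumptions made for illustration, with times expressed as plain day offsets.

```python
import numpy as np


def boundary_flags(track_times, data_start, data_end, window=1.0):
    """Return (start_flag, end_flag) arrays with one entry per trajectory.

    track_times: list of 1-D sequences of observation times (days).
    data_start, data_end: first and last time in the input dataset (days).
    window: distance from a bound (days) within which a track is flagged.
    """
    # A track is flagged if its first time lies within `window` of the start.
    start_flag = np.array(
        [int(times[0] - data_start <= window) for times in track_times]
    )
    # A track is flagged if its last time lies within `window` of the end.
    end_flag = np.array(
        [int(data_end - times[-1] <= window) for times in track_times]
    )
    return start_flag, end_flag


# Made-up example: the dataset covers days 0 to 30.
tracks = [
    [0.5, 1.0, 1.5, 2.0],      # starts half a day after the dataset begins
    [10.0, 10.5, 11.0],        # well inside the dataset
    [28.0, 28.5, 29.0, 29.5],  # ends half a day before the dataset finishes
]
start_flag, end_flag = boundary_flags(tracks, data_start=0.0, data_end=30.0)
print(start_flag, end_flag)
```

Flagged tracks can then be excluded from, or treated separately in, statistics such as storm lifetime, since their true start or end may lie outside the input data.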

Global Attributes

Files contain a number of global metadata attributes:

  • Conventions: Indicates the version of the CF-conventions the file was generated with

  • featureType: trajectory, a CF attribute aiding data processing and recognition of the data format

  • tctrack_version: Attribute detailing the software version used to generate the file. Indicates semantic version and commit hash.

  • tctrack_tracker: Tracker name identifying the algorithm used (e.g., TSTORMSTracker)

  • <TCTrack-Tracker>_parameters: JSON-encoded dictionaries recording all key parameters used for detection and tracking, to aid reproducibility.
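
Because the parameter attributes are stored as JSON strings, they can be turned back into dictionaries with the standard json module once read from a file. The sketch below uses a shortened copy of the stitch_parameters string from the example output on this page in place of a value read from a real file.

```python
import json

# Shortened copy of the stitch_parameters attribute from the example output;
# in practice this string would be read from the file's global attributes.
stitch_parameters = (
    '{"r_crit": 900.0, "wind_crit": 17.0, "vort_crit": 3.5e-05, '
    '"n_day_crit": 2, "do_filter": true}'
)

params = json.loads(stitch_parameters)

# JSON types map onto Python types: numbers to float or int, true/false to bool.
print(params["r_crit"], params["n_day_crit"], params["do_filter"])
```

Decoding the attribute in this way makes it straightforward to compare the settings of two runs programmatically, for example by comparing the resulting dictionaries key by key.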

Inspection and Usage

As CF-compliant NetCDF files, TCTrack outputs can be read by a range of downstream tools both within and beyond the Python ecosystem, including netCDF4-python, xarray, and cf-python.

Perhaps the quickest way of inspecting the metadata is to use the ncdump utility:

ncdump -h my_tctrack_output.nc
netcdf my_tctrack_output {
dimensions:
        trajectory = 2 ;
        observation = 13 ;
variables:
        int64 trajectory(trajectory) ;
                trajectory:standard_name = "trajectory" ;
                trajectory:cf_role = "trajectory_id" ;
                trajectory:long_name = "trajectory index" ;
        int64 observation(observation) ;
                observation:standard_name = "observation" ;
                observation:long_name = "observation index" ;
        double time(trajectory, observation) ;
                time:standard_name = "time" ;
                time:long_name = "time" ;
                time:units = "days since 1950-01-01 12:00:00.000000" ;
                time:missing_value = -100000000. ;
                time:calendar = "360_day" ;
        double latitude(trajectory, observation) ;
        ...

Here we show how to inspect a file using cf-python, whose output highlights the structure described above:

import cf

# Read the input file. This automatically concatenates in time.
fieldlist = cf.read("my_tctrack_output.nc")

for field in fieldlist:
    field.dump()
------------------------------------------------------------------
Field: air_pressure_at_sea_level (ncvar%air_pressure_at_sea_level)
------------------------------------------------------------------
Conventions = 'CF-1.12'
detect_parameters = '{"u_in_file": "u_ref_interpolated_final.nc", "v_in_file":
                     "v_ref_interpolated_final.nc", "vort_in_file":
                     "vort850_interpolated_final.nc", "tm_in_file":
                     "tm_interpolated_final.nc", "slp_in_file":
                     "slp_interpolated_final.nc", "use_sfc_wind": true,
                     "vort_crit": 3.5e-05, "tm_crit": 0.0, "thick_crit": 50.0,
                     "dist_crit": 4.0, "lat_bound_n": 70.0, "lat_bound_s":
                     -70.0, "do_spline": false, "do_thickness": false}'
featureType = 'trajectory'
long_name = 'Sea Level Pressure'
missing_value = np.float64(-10000000000.0)
standard_name = 'air_pressure_at_sea_level'
stitch_parameters = '{"r_crit": 900.0, "wind_crit": 17.0, "vort_crit": 3.5e-05,
                     "tm_crit": 0.5, "thick_crit": 50.0, "n_day_crit": 2,
                     "do_filter": true, "lat_bound_n": 40.0, "lat_bound_s":
                     -40.0, "do_spline": false, "do_thickness": false}'
tctrack_tracker = 'TSTORMSTracker'
tctrack_version = '0.1.dev157+g482a888b7'
tstorms_parameters = '{"tstorms_dir": "/home/jwa34/rds/hpc-
                      work/TSTORMS_clean/tropical_storms_pub/", "output_dir":
                      "/home/jwa34/rds/rds-inspire-tc-
                      TqEGHMWTn8A/test_data/tstorms/output_test/", "input_dir":
                      "/rds/project/rds-TqEGHMWTn8A/test_data/tstorms"}'
units = 'Pa'

Data(trajectory(2), observation(13)) = [[1002.11, ..., nan]] Pa

Cell Method: area: maximum (lesser circle of radius 4.0 degrees)

Field Ancillary: status_flag
    long_name = 'Trajectory starting at start of dataset flag.'
    standard_name = 'status_flag'
    Data(trajectory(2)) = [0, 0]

Field Ancillary: status_flag
    long_name = 'Trajectory finishing at end of dataset flag.'
    standard_name = 'status_flag'
    Data(trajectory(2)) = [0, 0]

Domain Axis: observation(13)
Domain Axis: trajectory(2)

Dimension coordinate: trajectory
    cf_role = 'trajectory_id'
    long_name = 'trajectory index'
    standard_name = 'trajectory'
    Data(trajectory(2)) = [0, 1]

Dimension coordinate: observation
    long_name = 'observation index'
    standard_name = 'observation'
    Data(observation(13)) = [0, ..., 12]

Auxiliary coordinate: time
    calendar = '360_day'
    long_name = 'time'
    missing_value = np.float64(-100000000.0)
    standard_name = 'time'
    units = 'days since 1950-01-01 12:00:00.000000'
    Data(trajectory(2), observation(13)) = [[1950-01-06 00:00:00, ..., -275828-03-21 12:00:00]] 360_day

Auxiliary coordinate: latitude
    long_name = 'latitude'
    missing_value = np.float64(-999.9)
    standard_name = 'latitude'
    units = 'degrees_north'
    Data(trajectory(2), observation(13)) = [[-9.26, ..., nan]] degrees_north

Auxiliary coordinate: longitude
    long_name = 'longitude'
    missing_value = np.float64(-999.9)
    standard_name = 'longitude'
    units = 'degrees_east'
    Data(trajectory(2), observation(13)) = [[67.32, ..., nan]] degrees_east

...

Plotting example

Tracks can be visualised using a variety of software. Here we demonstrate a simple example using netCDF4 with cartopy, though users may also explore cf-python, xarray, or huracanpy.

import netCDF4
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

# Open the NetCDF file
with netCDF4.Dataset("my_tctrack_output.nc") as ncfile:
    # Read variables
    lat_var = ncfile.variables["latitude"]
    lon_var = ncfile.variables["longitude"]
    time_var = ncfile.variables["time"]
    intensity_var = ncfile.variables["wind_speed"]
    traj_var = ncfile.variables["trajectory"]

    lats = lat_var[:]
    lons = lon_var[:]
    intensity = intensity_var[:]
    traj_labels = traj_var[:]
    times = time_var[:]

    # Convert times to datetime objects
    missing_time = getattr(time_var, "missing_value", np.nan)
    times = np.ma.masked_where(times == missing_time, times)
    time_units = getattr(time_var, "units")
    time_calendar = getattr(time_var, "calendar")
    times_dt = netCDF4.num2date(times, units=time_units, calendar=time_calendar)

    # Get intensity metadata for labels
    intensity_name = getattr(intensity_var, "long_name")
    intensity_units = getattr(intensity_var, "units", "")
    min_intensity = np.nanmin(intensity)
    max_intensity = np.nanmax(intensity)

    plt.figure(figsize=(10, 6))
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()
    ax.gridlines(draw_labels=True)

    # Plot each trajectory
    for i in traj_labels:
        times_i = times_dt[i, :].compressed()
        label = (
            f"{times_i[0].strftime('%Y-%m-%d %H:%M')} to "
            f"{times_i[-1].strftime('%Y-%m-%d %H:%M')}"
        )
        pl = ax.plot(
            lons[i], lats[i], "--", transform=ccrs.PlateCarree(), label=f"{label}"
        )
        sc = ax.scatter(
            lons[i],
            lats[i],
            c=intensity[i],
            cmap="viridis",
            s=40,
            vmin=min_intensity,
            vmax=max_intensity,
            transform=ccrs.PlateCarree(),
        )

    plt.colorbar(sc, label=f"{intensity_name} ({intensity_units})")
    plt.title(f"All Trajectories Colored by {intensity_name}")
    plt.legend()
    plt.savefig("my_tracks.png")
Example tracks plotted using the above code snippet.

References