Output File Format
==================
One of the key features of TCTrack is its output file format.
Whilst all of the codes wrapped provide output data in a custom formats with
minimal or zero metadata, TCTrack, by contrast, provides data in a standardised results
format across all codes that is compliant with the
`CF-Conventions `_ for metadata.
These pages describe the contents of this file and how it can be used downstream.
.. Import the tempest_extremes module to use references throughout this page.
.. py:module:: tctrack.core
:no-index:
CF-Conventions and FAIR Data
----------------------------
The `CF (Climate and Forecast) Conventions for NetCDF data `_
are best introduced in their own words:
*"The CF metadata conventions are designed to promote the processing and sharing of
files created with the NetCDF API. The conventions define metadata that provide a
definitive description of what the data in each variable represents, and the spatial
and temporal properties of the data. This enables users of data from different
sources to decide which quantities are comparable, and facilitates building
applications with powerful extraction, regridding, and display capabilities."*
TCTrack produces CF-compliant files for all tracking algorithms with clear
metadata.
`FAIR `_ is an acronym for
Findable, Accessible, Interoperable, and Reusable.
It outlines expectations for high-quality metadata and standards-compliant file formats.
By providing clear variable definitions, provenance information, and following
established conventions, such datasets enable robust scientific workflows, reproducibility,
and integration into a wide range of analysis tools.
This approach maximizes the scientific value of the data and supports transparent,
collaborative research. It also ensures that datasets and files are meaningful, useable,
and reliable, even when shared standalone or a significant time after generation.
Overview
--------
Each NetCDF output file produced by the :meth:`~TCTracker.to_netcdf` method of a Tracker
contains a collection of tropical cyclone trajectories, with each trajectory representing
a single storm track.
The format follows the
`CF Conventions: H4 Trajectory Data `_
recommendations for trajectory data using two-dimensional arrays of
shape ``(n_trajectory, n_observation)``.
Any trajectories shorter than ``n_observations`` end with fill values.
File Format
-----------
Dimensions
^^^^^^^^^^
- **trajectory**: Number of detected storm tracks (each a unique trajectory).
- Has a corresponding dimension coordinate variable indicating the trajectory index
- **observation** Maximum number of time steps (observations) in the trajectories.
- Has a corresponding dimension coordinate variable indicating the observation index
Variables
^^^^^^^^^
Required variables
""""""""""""""""""
These coordinate variables are always present in a dataset:
- **trajectory**: Unique index for each trajectory.
- *Dimensions*: ``(trajectory)``
- Dimension coordinate.
- Designated by the ``cf_role="trajectory_id"`` *attribute*.
- **observation**: Index for each observation within a trajectory.
- *Dimensions*: ``(observation)``
- Dimension coordinate.
- **time**: Time of each observation.
- *Dimensions*: ``(trajectory, observation)``
- Auxiliary coordinate.
- *Attributes* include ``units`` and ``calendar`` amongst others
- **latitude**: Latitude of the observation.
- *Dimensions*: ``(trajectory, observation)``
- Auxiliary coordinate.
- **longitude**: Longitude of the observation.
- *Dimensions*: ``(trajectory, observation)``
- Auxiliary coordinate.
Optional variables
""""""""""""""""""
A number of additional variables may also be written to file for each trajectory
depending on the algorithm, data used, and user-configuration.
Most commonly these will be measures of "intensity" along the tracks, but may also include
auxilliary coordinates such as grid indices.
- **Intensity variables**
- *Dimensions*: ``(trajectory, observation)``
- *Attributes* should include ``units`` and may also provide a
`CF cell method `_
indicating how the variable was calculated.
Ancillary Field Variables
"""""""""""""""""""""""""
Files always include two
`Ancillary status flag variables `_:
- **start_flag**
- *Dimensions*: ``(trajectory)``
- **end_flag**
- *Dimensions*: ``(trajectory)``
These indicate any tracks that start or end within 1 day of the input dataset bounds
and may therefore extend outside this range.
Global Attributes
^^^^^^^^^^^^^^^^^
Files contain a number of global metadata attributes
- **Conventions**: Indicates the version of the CF-conventions the file was generated
with
- **featureType**: ``trajectory``, a CF attribute aiding data processing and regognition of data format
- **tctrack_version**: Attribute detailing the software version used to generate the file.
Indicates semantic version and commit hash.
- **tctrack_tracker**: Tracker name identifying the algorithm used (e.g., TSTORMSTracker)
- **_parameters**: JSON-encoded dictionaries recording all key parameters used for detection and tracking, to aid reproducibility.
Inspection and Usage
--------------------
As a CF-compliant NetCDF file TCTrack outputs can be accessed using a number of downstream tools and softwares including NetCDF-Python, xarray, cf-python etc. both within and
beyond the Python ecosystem.
Perhaps the quickest way of inspecting the metadata is to use the
`ncdump utility `_:
.. code-block:: shell
ncdump -h my_tctrack_output.nc
::
netcdf my_tctrack_output {
dimensions:
trajectory = 2 ;
observation = 13 ;
variables:
int64 trajectory(trajectory) ;
trajectory:standard_name = "trajectory" ;
trajectory:cf_role = "trajectory_id" ;
trajectory:long_name = "trajectory index" ;
int64 observation(observation) ;
observation:standard_name = "observation" ;
observation:long_name = "observation index" ;
double time(trajectory, observation) ;
time:standard_name = "time" ;
time:long_name = "time" ;
time:units = "days since 1950-01-01 12:00:00.000000" ;
time:missing_value = -100000000. ;
time:calendar = "360_day" ;
double latitude(trajectory, observation) ;
...
Here we show how to inspect a file using cf-python which highlights the structure
described above:
.. code-block:: python
import cf
# Read the input file. This automatically concatenates in time.
fieldlist = cf.read("my_tctrack_output.nc")
for field in fieldlist:
field.dump()
::
------------------------------------------------------------------
Field: air_pressure_at_sea_level (ncvar%air_pressure_at_sea_level)
------------------------------------------------------------------
Conventions = 'CF-1.12'
detect_parameters = '{"u_in_file": "u_ref_interpolated_final.nc", "v_in_file":
"v_ref_interpolated_final.nc", "vort_in_file":
"vort850_interpolated_final.nc", "tm_in_file":
"tm_interpolated_final.nc", "slp_in_file":
"slp_interpolated_final.nc", "use_sfc_wind": true,
"vort_crit": 3.5e-05, "tm_crit": 0.0, "thick_crit": 50.0,
"dist_crit": 4.0, "lat_bound_n": 70.0, "lat_bound_s":
-70.0, "do_spline": false, "do_thickness": false}'
featureType = 'trajectory'
long_name = 'Sea Level Pressure'
missing_value = np.float64(-10000000000.0)
standard_name = 'air_pressure_at_sea_level'
stitch_parameters = '{"r_crit": 900.0, "wind_crit": 17.0, "vort_crit": 3.5e-05,
"tm_crit": 0.5, "thick_crit": 50.0, "n_day_crit": 2,
"do_filter": true, "lat_bound_n": 40.0, "lat_bound_s":
-40.0, "do_spline": false, "do_thickness": false}'
tctrack_tracker = 'TSTORMSTracker'
tctrack_version = '0.1.dev157+g482a888b7'
tstorms_parameters = '{"tstorms_dir": "/home/jwa34/rds/hpc-
work/TSTORMS_clean/tropical_storms_pub/", "output_dir":
"/home/jwa34/rds/rds-inspire-tc-
TqEGHMWTn8A/test_data/tstorms/output_test/", "input_dir":
"/rds/project/rds-TqEGHMWTn8A/test_data/tstorms"}'
units = 'Pa'
Data(trajectory(2), observation(13)) = [[1002.11, ..., nan]] Pa
Cell Method: area: maximum (lesser circle of radius 4.0 degrees)
Field Ancillary: status_flag
long_name = 'Trajectory starting at start of dataset flag.'
standard_name = 'status_flag'
Data(trajectory(2)) = [0, 0]
Field Ancillary: status_flag
long_name = 'Trajectory finishing at end of dataset flag.'
standard_name = 'status_flag'
Data(trajectory(2)) = [0, 0]
Domain Axis: observation(13)
Domain Axis: trajectory(2)
Dimension coordinate: trajectory
cf_role = 'trajectory_id'
long_name = 'trajectory index'
standard_name = 'trajectory'
Data(trajectory(2)) = [0, 1]
Dimension coordinate: observation
long_name = 'observation index'
standard_name = 'observation'
Data(observation(13)) = [0, ..., 12]
Auxiliary coordinate: time
calendar = '360_day'
long_name = 'time'
missing_value = np.float64(-100000000.0)
standard_name = 'time'
units = 'days since 1950-01-01 12:00:00.000000'
Data(trajectory(2), observation(13)) = [[1950-01-06 00:00:00, ..., -275828-03-21 12:00:00]] 360_day
Auxiliary coordinate: latitude
long_name = 'latitude'
missing_value = np.float64(-999.9)
standard_name = 'latitude'
units = 'degrees_north'
Data(trajectory(2), observation(13)) = [[-9.26, ..., nan]] degrees_north
Auxiliary coordinate: longitude
long_name = 'longitude'
missing_value = np.float64(-999.9)
standard_name = 'longitude'
units = 'degrees_east'
Data(trajectory(2), observation(13)) = [[67.32, ..., nan]] degrees_east
...
Plotting example
----------------
Tracks can be visualised using variety of softwares.
Here we demonstrate a simple example using NetCDF with cartopy, though users may also
explore cf-python, xarray, or `hurucanpy `_.
.. code-block:: python
import netCDF4
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
# Open the NetCDF file
with netCDF4.Dataset("my_tctrack_output.nc") as ncfile:
# Read variables
lat_var = ncfile.variables["latitude"]
lon_var = ncfile.variables["longitude"]
time_var = ncfile.variables["time"]
intensity_var = ncfile.variables["wind_speed"]
traj_var = ncfile.variables["trajectory"]
lats = lat_var[:]
lons = lon_var[:]
intensity = intensity_var[:]
traj_labels = traj_var[:]
times = time_var[:]
# Convert times to datetime objects
missing_time = getattr(time_var, "missing_value", np.nan)
times = np.ma.masked_where(times == missing_time, times)
time_units = getattr(time_var, "units")
time_calendar = getattr(time_var, "calendar")
times_dt = netCDF4.num2date(times, units=time_units, calendar=time_calendar)
# Get intensity metadata for labels
intensity_name = getattr(intensity_var, "long_name")
intensity_units = getattr(intensity_var, "units", "")
min_intensity = np.nanmin(intensity)
max_intensity = np.nanmax(intensity)
plt.figure(figsize=(10, 6))
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
# Plot each trajectory
for i in traj_labels:
times_i = times_dt[i, :].compressed()
label = (
f"{times_i[0].strftime('%Y-%m-%d %H:%M')} to "
f"{times_i[-1].strftime('%Y-%m-%d %H:%M')}"
)
pl = ax.plot(
lons[i], lats[i], "--", transform=ccrs.PlateCarree(), label=f"{label}"
)
sc = ax.scatter(
lons[i],
lats[i],
c=intensity[i],
cmap="viridis",
s=40,
vmin=min_intensity,
vmax=max_intensity,
transform=ccrs.PlateCarree(),
)
plt.colorbar(sc, label=f"{intensity_name} ({intensity_units})")
plt.title(f"All Trajectories Colored by {intensity_name}")
plt.legend()
plt.savefig("my_tracks.png")
.. image:: ../images/data_tracks_plot.png
:alt: Example tracks plotted using the above code snippet.
References
----------
- `CF Conventions: Trajectory Data `_
- `FAIR Data on go-fair.org `_