
terraflow.climate

The climate module provides spatial interpolation and index-based matching for aligning climate observations to raster cells.

Overview

Per-cell climate interpolation via three configurable spatial algorithms:

  • "linear" (default): scipy.interpolate.griddata; fast, no extra dependencies
  • "kriging": Ordinary Kriging via pykrige; geostatistically optimal, and also produces per-cell uncertainty ({var}_krig_std columns)
  • "idw": Inverse Distance Weighting (power=2); faster than kriging, no uncertainty output

Additional features:

  • Index-based matching: match by row order or by explicit cell ID for pre-aligned data
  • Graceful fallbacks: global mean values for cells outside the interpolation range or where the data are too sparse to interpolate
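A minimal sketch of the "linear" method combined with the mean fallback, calling scipy.interpolate.griddata directly rather than going through terraflow. The station coordinates and values are made up, and two details are assumptions: that coordinates are passed as (lon, lat) pairs, and that the "global mean" is the plain mean of the station values for each variable.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical stations as (lon, lat) pairs with one variable; the values follow
# temp = 10 + lon + lat, so linear interpolation reproduces them exactly.
station_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
station_temp = np.array([10.0, 11.0, 11.0, 12.0])

# Cell centroids: the first lies inside the stations' convex hull, the second outside.
cell_xy = np.array([[0.5, 0.25], [2.0, 2.0]])

# "linear": Delaunay-based interpolation; points outside the hull come back as NaN.
pred = griddata(station_xy, station_temp, cell_xy, method="linear")

# Fallback (assumed behaviour): replace NaN predictions with the station mean.
pred = np.where(np.isnan(pred), station_temp.mean(), pred)
```

The first cell comes out at about 10.75. The second lies outside the convex hull, so griddata returns NaN and the fallback substitutes the station mean, 11.0.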

Quick Example

import numpy as np
import pandas as pd
from terraflow.climate import ClimateInterpolator

# Load station data (needs lat and lon columns plus numeric climate variables)
climate_df = pd.read_csv("weather_stations.csv")

# Create a spatial interpolator with kriging (adds {var}_krig_std uncertainty columns)
interpolator = ClimateInterpolator(
    climate_df=climate_df,
    strategy="spatial",
    interpolation_method="kriging",
    fallback_to_mean=True,
)

# Interpolate values at raster cell centroids (WGS84)
cell_lats = np.array([39.14, 38.55])
cell_lons = np.array([-100.82, -99.20])
interpolated = interpolator.interpolate(cell_lats, cell_lons)

Use Cases

  • Spatial: Weather station networks, satellite gridded data
  • Index: Pre-processed per-cell climate datasets

API Reference

climate

Climate data handling with spatial interpolation and matching strategies.

This module provides tools for aligning climate data with raster cells using index-based lookup or spatial interpolation. Three spatial methods are supported:

  • linear (default): Delaunay-triangulation linear interpolation via scipy.interpolate.griddata. Fast, no extra dependencies, no uncertainty estimate.
  • kriging: Ordinary Kriging via pykrige. Geostatistically optimal, i.e. the Best Linear Unbiased Predictor under the selected variogram model, with automatic variogram model selection (spherical / exponential / Gaussian) based on Leave-One-Out Cross-Validation (LOOCV). Returns the per-cell prediction standard deviation as {var}_krig_std columns. Requires at least MIN_KRIGING_STATIONS stations.
  • idw: Inverse Distance Weighting (power=2). Faster than kriging, no extra dependencies beyond numpy, no uncertainty estimate.
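For reference, IDW with power=2 weights each station by 1/d², where d is the station's distance to the target cell. The sketch below is a plain-NumPy illustration, not terraflow's code: the function name idw is hypothetical, distances are plain Euclidean distances on whatever coordinates are passed (terraflow's distance metric is not documented here), and a cell that coincides with a station simply takes that station's value.

```python
import numpy as np

def idw(station_xy, station_z, query_xy, power=2.0):
    """Inverse Distance Weighting: each station is weighted by 1 / distance**power."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_z = np.asarray(station_z, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_station).
    d = np.linalg.norm(query_xy[:, None, :] - station_xy[None, :, :], axis=-1)
    out = np.empty(len(query_xy))
    for k, dk in enumerate(d):
        hit = dk == 0.0
        if hit.any():
            # Query point coincides with a station: return that station's value.
            out[k] = station_z[hit][0]
        else:
            w = 1.0 / dk**power
            out[k] = np.sum(w * station_z) / np.sum(w)
    return out

estimates = idw([[0, 0], [2, 0]], [0.0, 10.0], [[1, 0], [0, 0], [0.5, 0]])
```

With stations at (0, 0) and (2, 0) holding 0 and 10, the midpoint gets 5.0, the first station's location returns 0.0 exactly, and (0.5, 0) gets weights 4 and 1/2.25, giving about 1.0.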

Cross-validation (LOOCV) is computed at initialisation time for the kriging method and stored in ClimateInterpolator.cv_metrics. The pipeline writes these metrics to report.json under interpolation_cv.
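LOOCV itself is independent of the interpolation method: drop one station, predict its value from the remaining stations, and repeat for every station. A minimal sketch follows. loocv_metrics and mean_predictor are hypothetical names, the predictor is a simple station-mean baseline standing in for kriging, and the {"rmse", "mae"} dict layout is an assumption, since the exact structure of cv_metrics and of the interpolation_cv entry is not specified here.

```python
import numpy as np

def loocv_metrics(xy, z, predict):
    """Leave-one-out CV: predict each station from all the others."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    errors = np.empty(len(z))
    for i in range(len(z)):
        keep = np.arange(len(z)) != i          # every station except i
        pred = predict(xy[keep], z[keep], xy[i:i + 1])
        errors[i] = pred[0] - z[i]
    return {
        "rmse": float(np.sqrt(np.mean(errors**2))),
        "mae": float(np.mean(np.abs(errors))),
    }

def mean_predictor(train_xy, train_z, query_xy):
    # Baseline predictor: the mean of the training stations, everywhere.
    return np.full(len(query_xy), train_z.mean())

metrics = loocv_metrics([[0, 0], [1, 0], [2, 0]], [1.0, 2.0, 3.0], mean_predictor)
```

With stations valued 1, 2 and 3, the leave-one-out predictions are 2.5, 2.0 and 1.5, so RMSE = √1.5 ≈ 1.22 and MAE = 1.0. Any interpolator with the same (train_xy, train_z, query_xy) signature can be plugged in.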

Example

import pandas as pd
from terraflow.climate import ClimateInterpolator

climate_df = pd.read_csv("climate.csv")
cell_lats = [40.5, 40.6, 40.7]
cell_lons = [-74.5, -74.4, -74.3]

interpolator = ClimateInterpolator(
    climate_df=climate_df,
    strategy="spatial",
    interpolation_method="kriging",
    fallback_to_mean=True,
)
cell_climate = interpolator.interpolate(cell_lats, cell_lons)
# Assuming climate.csv has mean_temp and total_rain columns, this returns a
# DataFrame with mean_temp, total_rain, mean_temp_krig_std and
# total_rain_krig_std columns.

ClimateInterpolator(climate_df, strategy='spatial', interpolation_method='linear', cell_id_column=None, fallback_to_mean=True)

Spatially interpolate or index-match climate data to raster cells.

Supports two top-level strategies for aligning climate observations to raster cells:

  • "spatial" — interpolate from weather-station coordinates to cell centroids. Three algorithms are available via interpolation_method: "linear" (default), "kriging", "idw".
  • "index" — match cells to climate records by row order or cell ID.

When interpolation_method="kriging", the interpolator:

  1. Selects the best variogram model (spherical / exponential / Gaussian) by Leave-One-Out Cross-Validation (LOOCV) RMSE on the first climate variable.
  2. Stores per-variable LOOCV RMSE and MAE in cv_metrics.
  3. Returns {var}_krig_std columns alongside the point estimates.
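The three kriging steps can be sketched in plain NumPy as a minimal ordinary-kriging illustration. This is not terraflow's implementation and does not use PyKrige: the variogram formulas are the common "practical range" forms without a nugget, the sill and range are set heuristically (sample variance and half the largest station separation) instead of being estimated from the data as PyKrige does, and distances are Euclidean in whatever planar coordinates are passed. The returned std is the square root of the ordinary-kriging variance, the kind of quantity the {var}_krig_std columns report.

```python
import numpy as np

# Variogram models in "practical range" form with no nugget: gamma(0) = 0,
# and gamma approaches the sill at roughly distance rng.
def spherical(h, sill, rng):
    r = np.minimum(h / rng, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)

def exponential(h, sill, rng):
    return sill * (1.0 - np.exp(-3.0 * h / rng))

def gaussian(h, sill, rng):
    return sill * (1.0 - np.exp(-3.0 * (h / rng) ** 2))

MODELS = {"spherical": spherical, "exponential": exponential, "gaussian": gaussian}

def pairwise(a, b):
    """Euclidean distances between rows of a (n, 2) and b (m, 2) -> (n, m)."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def ok_predict(xy, z, query, model, sill, rng):
    """Ordinary kriging estimate and prediction std at each query point."""
    n = len(z)
    # Kriging system: [Gamma 1; 1^T 0] [w; mu] = [gamma0; 1].
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = model(pairwise(xy, xy), sill, rng)
    a[n, n] = 0.0
    b = np.ones((n + 1, len(query)))
    b[:n] = model(pairwise(xy, query), sill, rng)
    sol = np.linalg.solve(a, b)
    w, mu = sol[:n], sol[n]
    est = w.T @ z
    # Kriging variance = sum_i w_i * gamma_i0 + mu; clip tiny negative round-off.
    var = np.sum(w * b[:n], axis=0) + mu
    return est, np.sqrt(np.clip(var, 0.0, None))

def loocv_rmse(xy, z, model, sill, rng):
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        est, _ = ok_predict(xy[keep], z[keep], xy[i:i + 1], model, sill, rng)
        errors.append(est[0] - z[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def select_model(xy, z):
    """Pick the variogram model with the lowest LOOCV RMSE (heuristic sill/range)."""
    sill = float(np.var(z))
    rng = float(pairwise(xy, xy).max() / 2.0)
    scores = {name: loocv_rmse(xy, z, f, sill, rng) for name, f in MODELS.items()}
    return min(scores, key=scores.get), scores, sill, rng
```

select_model mirrors step 1 on a single variable; running loocv_rmse (plus an MAE counterpart) for each variable with the chosen model would correspond to step 2. With no nugget, predictions at a station reproduce its value with near-zero std.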

Attributes:

  • climate_df (DataFrame): Climate data with lat, lon, and numeric variable columns.
  • strategy (str): Top-level matching strategy ("spatial" or "index").
  • interpolation_method (str): Spatial algorithm ("linear", "kriging", or "idw"). Only used when strategy="spatial".
  • cell_id_column (str or None): Optional column for explicit cell-ID matching ("index" strategy).
  • fallback_to_mean (bool): If True, replace NaN predictions with the global variable mean.
  • cv_metrics (dict): LOOCV results, populated at initialisation when interpolation_method="kriging"; an empty dict for other methods.
  • climate_columns (list[str]): Numeric climate variable columns (excludes lat, lon, and cell_id_column).
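A sketch of how climate_columns could be derived from the description above, assuming "numeric" means pandas numeric dtypes; terraflow's actual selection logic is not shown on this page, and the station table is made-up sample input.

```python
import pandas as pd

def climate_columns(df, cell_id_column=None):
    """Numeric columns of df, excluding lat, lon, and the optional cell-ID column."""
    excluded = {"lat", "lon"}
    if cell_id_column is not None:
        excluded.add(cell_id_column)
    numeric = df.select_dtypes(include="number").columns
    return [col for col in numeric if col not in excluded]

# Hypothetical station table; "station" is text, so it is never a climate variable.
stations = pd.DataFrame({
    "station": ["A", "B"],
    "cell_id": [101, 102],
    "lat": [39.14, 38.55],
    "lon": [-100.82, -99.20],
    "mean_temp": [12.1, 13.4],
    "total_rain": [540.0, 610.0],
})
print(climate_columns(stations, cell_id_column="cell_id"))  # ['mean_temp', 'total_rain']
```

Without cell_id_column, an integer ID column would itself be treated as a climate variable, which is one reason to pass it explicitly when the data carry one.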

Initialise the interpolator and (for kriging) select the variogram model.

Parameters:

  • climate_df (DataFrame, required): DataFrame with climate data. Must have lat and lon columns.
  • strategy (Literal['spatial', 'index'], default 'spatial'): "spatial" for coordinate-based interpolation; "index" for direct row/ID matching.
  • interpolation_method (Literal['linear', 'kriging', 'idw'], default 'linear'): Spatial interpolation algorithm. "linear" uses scipy.interpolate.griddata; "kriging" uses PyKrige OrdinaryKriging with automatic variogram model selection; "idw" uses inverse distance weighting. Only relevant when strategy="spatial".
  • cell_id_column (Optional[str], default None): Column name for cell-ID matching ("index" strategy only).
  • fallback_to_mean (bool, default True): Replace NaN / extrapolated values with the global variable mean.

Raises:

  • ValueError: If strategy or interpolation_method is invalid, required columns are missing, or coordinate ranges are violated.
  • ImportError: If interpolation_method="kriging" and PyKrige is not installed.

Source code in terraflow/climate.py
def __init__(
    self,
    climate_df: pd.DataFrame,
    strategy: Literal["spatial", "index"] = "spatial",
    interpolation_method: Literal["linear", "kriging", "idw"] = "linear",
    cell_id_column: Optional[str] = None,
    fallback_to_mean: bool = True,
):
    """Initialise the interpolator and (for kriging) select variogram model.

    Parameters
    ----------
    climate_df:
        DataFrame with climate data.  Must have ``lat`` and ``lon``
        columns.
    strategy:
        ``"spatial"`` for coordinate-based interpolation; ``"index"``
        for direct row/ID matching.
    interpolation_method:
        Spatial interpolation algorithm.  ``"linear"`` uses
        ``scipy.griddata``; ``"kriging"`` uses PyKrige OrdinaryKriging
        with automatic variogram model selection; ``"idw"`` uses inverse
        distance weighting.  Only relevant when ``strategy="spatial"``.
    cell_id_column:
        Column name for cell-ID matching (``"index"`` strategy only).
    fallback_to_mean:
        Replace NaN / extrapolated values with the global variable mean.

    Raises
    ------
    ValueError
        If ``strategy`` or ``interpolation_method`` is invalid, required
        columns are missing, or coordinate ranges are violated.
    ImportError
        If ``interpolation_method="kriging"`` and PyKrige is not installed.
    """
    if strategy not in ("spatial", "index"):
        raise ValueError(f"strategy must be 'spatial' or 'index', got '{strategy}'")
    if interpolation_method not in ("linear", "kriging", "idw"):
        raise ValueError(
            f"interpolation_method must be 'linear', 'kriging', or 'idw', "
            f"got '{interpolation_method}'"
        )

    self.climate_df = climate_df.copy()
    self.strategy = strategy
    self.interpolation_method = interpolation_method
    self.cell_id_column = cell_id_column
    self.fallback_to_mean = fallback_to_mean
    self.cv_metrics: Dict = {}
    self._krig_variogram_model: str = "spherical"  # overwritten by _init_kriging
    self.variogram_params: dict = {}

    self._validate_columns()

    if self.strategy == "spatial":
        self._validate_spatial_data()

    self._climate_mean = self._compute_mean_climate()

    if self.strategy == "spatial" and self.interpolation_method == "kriging":
        self._init_kriging()

    logger.info(
        f"ClimateInterpolator initialised: strategy='{strategy}', "
        f"interpolation_method='{interpolation_method}', records={len(self.climate_df)}, variables={self.climate_columns}"
    )

interpolate(cell_lats, cell_lons)

Interpolate or match climate data to cell locations.

Parameters:

  • cell_lats (ndarray, required): Cell latitudes (WGS84). Any array-like is accepted; it is converted with np.asarray.
  • cell_lons (ndarray, required): Cell longitudes (WGS84), the same length as cell_lats.

Returns:

  • DataFrame: One row per cell, with a column for each climate variable. When interpolation_method="kriging", additional {var}_krig_std columns hold the prediction standard deviation (the square root of the kriging variance).

Raises:

  • ValueError: If cell_lats and cell_lons have different lengths.

Source code in terraflow/climate.py
def interpolate(self, cell_lats: np.ndarray, cell_lons: np.ndarray) -> pd.DataFrame:
    """Interpolate or match climate data to cell locations.

    Parameters
    ----------
    cell_lats:
        Array of cell latitudes (WGS84).
    cell_lons:
        Array of cell longitudes (WGS84).

    Returns
    -------
    pd.DataFrame
        One row per cell, columns for each climate variable.  When
        ``interpolation_method="kriging"``, additional
        ``{var}_krig_std`` columns are included (prediction standard
        deviation from the kriging variance).

    Raises
    ------
    ValueError
        If array lengths do not match.
    """
    cell_lats = np.asarray(cell_lats)
    cell_lons = np.asarray(cell_lons)

    if len(cell_lats) != len(cell_lons):
        raise ValueError(
            f"cell_lats and cell_lons must have same length. "
            f"Got {len(cell_lats)} and {len(cell_lons)}"
        )

    if self.strategy == "spatial":
        return self._interpolate_spatial(cell_lats, cell_lons)
    else:
        return self._match_by_index(cell_lats, cell_lons)

CoordinateRange

Bases: BaseModel

Validated geographic coordinate pair.

Ensures latitude is in [-90, 90] and longitude in [-180, 180]. Pydantic provides automatic validation and type coercion.
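The range checks can be reproduced in plain Python, for instance to pre-screen coordinates before building a model. The hypothetical check_latitude below mirrors validate_latitude but uses a chained comparison, a design choice worth noting: every comparison involving NaN evaluates to False, so `not -90 <= v <= 90` rejects NaN, whereas the two-comparison form `v < -90 or v > 90` lets NaN through.

```python
def check_latitude(v):
    """Return v as a float if it is a number in [-90, 90]; raise ValueError otherwise."""
    if not isinstance(v, (int, float)):
        raise ValueError(f"Latitude must be numeric, got {type(v).__name__}")
    # Chained comparison is False for NaN, so NaN is rejected as out of range.
    if not -90 <= v <= 90:
        raise ValueError(f"Latitude must be in [-90, 90], got {v}")
    return float(v)

nan = float("nan")
print(nan < -90 or nan > 90)  # False: this form would accept NaN
print(check_latitude(45))     # 45.0
```

The same pattern applies to longitude with bounds of -180 and 180.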

validate_latitude(v) classmethod

Ensure latitude is in valid range [-90, 90].

Source code in terraflow/climate.py
@field_validator("latitude")
@classmethod
def validate_latitude(cls, v: float) -> float:
    """Ensure latitude is in valid range [-90, 90]."""
    if not isinstance(v, (int, float)):
        raise ValueError(f"Latitude must be numeric, got {type(v).__name__}")
    if v < -90 or v > 90:
        raise ValueError(f"Latitude must be in [-90, 90], got {v}")
    return float(v)

validate_longitude(v) classmethod

Ensure longitude is in valid range [-180, 180].

Source code in terraflow/climate.py
@field_validator("longitude")
@classmethod
def validate_longitude(cls, v: float) -> float:
    """Ensure longitude is in valid range [-180, 180]."""
    if not isinstance(v, (int, float)):
        raise ValueError(f"Longitude must be numeric, got {type(v).__name__}")
    if v < -180 or v > 180:
        raise ValueError(f"Longitude must be in [-180, 180], got {v}")
    return float(v)