ADR-003: Climate Data Interpolation Strategies¶
Status: Accepted
Date: 2026-02-06
Deciders: TerraFlow Contributors
Context¶
Previously, TerraFlow applied a single global mean climate value to all sampled raster cells in a ROI. This approach:
- Ignored spatial variation in climate across the region
- Reduced model accuracy for geographically diverse regions
- Limited applicability to large-scale agricultural analysis
We needed a system to apply per-cell climate values while supporting different data sources and matching strategies.
flowchart TD
A[Weather Stations] --> B{Interpolation Strategy}
C[Raster Cells] --> B
B -->|Spatial| D[scipy.interpolate.griddata]
B -->|Index| E[Direct Matching]
D --> F[Triangulation]
F --> G[Linear Interpolation]
G --> H[Per-Cell Climate Values]
E --> I[Row-to-Cell Mapping]
I --> H
H --> J[Suitability Scoring]
style D fill:#00b0ff,stroke:#0091ea,color:#fff
style E fill:#7c4dff,stroke:#651fff,color:#fff
style H fill:#2d8a55,stroke:#1e5c3a,color:#fff
Decision¶
Implement ClimateInterpolator with two configurable strategies:
1. Spatial Interpolation Strategy (Default)¶
Use case: Climate data with lat/lon coordinates (point observations, station networks, gridded data)
Implementation: scipy.interpolate.griddata with linear interpolation
- Triangulates climate observation points
- Interpolates values for each raster cell based on its geographic location
- Falls back to nearest-neighbor when there are too few points for linear triangulation
- Uses the global mean as a fallback for cells outside the convex hull of the observation points
Advantages:
- ✅ Supports arbitrary observation locations
- ✅ Produces smooth spatial gradients
- ✅ Graceful fallback for sparse data
- ✅ No strict requirement for exact cell-to-record mapping
Limitations:
- Requires ≥3 non-collinear observation points for triangulation
- Cells outside the convex hull are not extrapolated; they receive the fallback value (global mean)
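The fallback chain described above can be sketched as follows. This is a minimal illustration, not TerraFlow's actual code: the function name `interpolate_spatial` and its argument names are hypothetical, it handles one climate variable at a time, and it assumes at least one observation and non-collinear points when three or more are given.

```python
import numpy as np
from scipy.interpolate import griddata


def interpolate_spatial(obs_lats, obs_lons, obs_values, cell_lats, cell_lons):
    """Hypothetical helper: linear griddata with nearest and mean fallbacks."""
    points = np.column_stack([obs_lons, obs_lats]).astype(float)
    values = np.asarray(obs_values, dtype=float)
    targets = np.column_stack([cell_lons, cell_lats]).astype(float)

    if len(values) < 3:
        # Too few points to triangulate: use nearest-neighbor instead.
        return griddata(points, values, targets, method="nearest")

    # Linear interpolation over the Delaunay triangulation of the observations.
    # Cells outside the convex hull come back as NaN.
    result = griddata(points, values, targets, method="linear", fill_value=np.nan)

    # Replace out-of-hull cells with the global mean of the observations.
    outside = np.isnan(result)
    result[outside] = np.nanmean(values)
    return result


# Four stations on a unit square; the first cell lies inside the hull,
# the second lies outside it and receives the global mean.
obs_lats = [0.0, 0.0, 1.0, 1.0]
obs_lons = [0.0, 1.0, 0.0, 1.0]
obs_temp = [0.0, 2.0, 1.0, 3.0]
cells = interpolate_spatial(obs_lats, obs_lons, obs_temp, [0.25, 5.0], [0.5, 5.0])
```

Passing `fill_value=np.nan` makes the out-of-hull cells easy to detect, so the mean fallback stays a separate, explicit step.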
2. Index-Based Matching Strategy¶
Use case: Climate data aligned by row index (e.g., pre-processed per-cell datasets)
Implementation: Direct row-to-cell matching
- Climate CSV row i → Raster cell i
- Optional cell_id_column for explicit ID-based matching (future enhancement)
- Tolerates mismatched row and cell counts: cells without a matching row fall back to the global mean
Advantages:
- ✅ Fast (no interpolation computation)
- ✅ Deterministic (no randomness in interpolation)
- ✅ Works with any data volume
Limitations:
- Requires data to be pre-aligned to cells
- Less suitable for point observation data
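Row-to-cell matching with a mean fallback might look like the sketch below. The function `match_by_index` is a hypothetical name used for illustration only, and the behavior for surplus rows (they are ignored) is an assumption, since the ADR does not specify it.

```python
from statistics import fmean


def match_by_index(values, n_cells, fallback_to_mean=True):
    """Hypothetical helper: assign climate row i to raster cell i."""
    if not values:
        raise ValueError("at least one climate row is required")

    # Value used for cells that have no matching row.
    fill = fmean(values) if fallback_to_mean else float("nan")

    # Rows beyond n_cells are ignored (assumption); missing rows are filled.
    matched = list(values[:n_cells])
    matched.extend([fill] * (n_cells - len(matched)))
    return matched


# Two rows for four cells: the last two cells get the mean of the rows.
per_cell = match_by_index([10.0, 20.0], n_cells=4)
```

Because no geometry is involved, the cost is a single pass over the rows.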
Implementation Details¶
Configuration¶
# config.yml
climate:
  strategy: "spatial"      # or "index"
  cell_id_column: null     # optional, for index matching with explicit IDs
  fallback_to_mean: true   # use global mean for missing/extrapolated values
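One way to check these keys after the YAML has been parsed into a dictionary is sketched here. `ClimateConfig` and `parse_climate_config` are hypothetical names, not TerraFlow's API, and the default of `True` for `fallback_to_mean` is an assumption taken from the example value above.

```python
from dataclasses import dataclass
from typing import Optional

VALID_STRATEGIES = ("spatial", "index")


@dataclass(frozen=True)
class ClimateConfig:
    """Hypothetical container for the `climate` config block."""
    strategy: str = "spatial"              # the ADR's stated default
    cell_id_column: Optional[str] = None
    fallback_to_mean: bool = True          # assumed default

    def __post_init__(self):
        # Reject strategies other than the two defined in this ADR.
        if self.strategy not in VALID_STRATEGIES:
            raise ValueError(
                f"unknown strategy {self.strategy!r}; expected one of {VALID_STRATEGIES}"
            )


def parse_climate_config(raw: dict) -> ClimateConfig:
    """Build a ClimateConfig from an already-parsed config dictionary."""
    return ClimateConfig(**(raw.get("climate") or {}))


# A missing `climate` block yields the defaults.
default_cfg = parse_climate_config({})
```

Validating the strategy name at load time surfaces a typo in the config before any raster work starts.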
Validation¶
Climate CSV must have:
- lat column: latitude in [-90, 90]
- lon column: longitude in [-180, 180]
- ≥1 climate variable column (e.g., mean_temp, total_rain)
Warnings logged for:
- Duplicate coordinates
- NaN values in climate variables
- Sparse data requiring fallback
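The checks listed above could be expressed roughly as follows. This is a sketch under stated assumptions: `validate_climate_rows` is a hypothetical function, rows are plain dictionaries (for example from `csv.DictReader`), and warnings are returned as strings rather than logged. The sparse-data warning is omitted because it depends on the interpolation step.

```python
import math


def validate_climate_rows(rows):
    """Hypothetical validator: returns (variable_names, warnings)."""
    if not rows:
        raise ValueError("climate CSV is empty")

    columns = set(rows[0])
    missing = {"lat", "lon"} - columns
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    variables = sorted(columns - {"lat", "lon"})
    if not variables:
        raise ValueError("at least one climate variable column is required")

    warnings = []
    seen = set()
    for i, row in enumerate(rows):
        lat, lon = float(row["lat"]), float(row["lon"])
        # Range checks are hard errors; a NaN coordinate also fails them.
        if not -90 <= lat <= 90:
            raise ValueError(f"row {i}: latitude {lat} outside [-90, 90]")
        if not -180 <= lon <= 180:
            raise ValueError(f"row {i}: longitude {lon} outside [-180, 180]")
        # Duplicates and NaN values only produce warnings.
        if (lat, lon) in seen:
            warnings.append(f"row {i}: duplicate coordinates ({lat}, {lon})")
        seen.add((lat, lon))
        for name in variables:
            if math.isnan(float(row[name])):
                warnings.append(f"row {i}: NaN in {name}")
    return variables, warnings


rows = [
    {"lat": 10, "lon": 20, "mean_temp": 15.0},
    {"lat": 10, "lon": 20, "mean_temp": float("nan")},
]
variables, warnings = validate_climate_rows(rows)
```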
Code Integration¶
from terraflow.climate import ClimateInterpolator
from terraflow.ingest import load_climate_csv

# Load climate data
climate_df = load_climate_csv("climate.csv")

# Create interpolator with config strategy
# (cfg is the loaded TerraFlow config object)
interpolator = ClimateInterpolator(
    climate_df=climate_df,
    strategy=cfg.climate.strategy,
    cell_id_column=cfg.climate.cell_id_column,
    fallback_to_mean=cfg.climate.fallback_to_mean,
)

# Get per-cell climate values
# (cell_lats and cell_lons hold the coordinates of the sampled raster cells)
cell_climate = interpolator.interpolate(cell_lats, cell_lons)
Consequences¶
Positive¶
- ✅ Per-cell climate variation improves model accuracy
- ✅ Supports both observation-based and pre-processed climate data
- ✅ Graceful handling of incomplete/sparse data
- ✅ Backward compatible with existing ROI/raster workflows
- ✅ Foundation for future temporal analysis (time series interpolation)
Negative¶
- ⚠️ ~6 ms per cell overhead for spatial interpolation (acceptable for <10K cells)
- ⚠️ Requires climate data with valid coordinates (breaking change for old-style CSVs)
- ⚠️ Adds scipy dependency
Neutral¶
- Requires config update for explicit strategy selection (defaults to spatial)
Alternatives Considered¶
1. Nearest-Neighbor Only¶
- Rejected: Less smooth gradients, discontinuities at observation boundaries
2. Kriging Interpolation¶
- Rejected: Higher computational cost, requires variogram fitting
- Can revisit for optional advanced mode (future)
3. Pre-computed Interpolation Grids¶
- Rejected: Requires external preprocessing, less flexible
- Can combine with current approach (accept pre-interpolated climate)
Related Decisions¶
- ADR-001: Single-band raster processing (climate respects this constraint)
- ADR-002: BBox ROI only (climate interpolation works within ROI)
Future Enhancements¶
- Temporal Interpolation: Extend to monthly/seasonal climate variation
- Advanced Interpolation: Optional kriging for statistically rigorous analysis
- Multi-Variable Weighting: Interpolate with variable-specific weights
- Climate Data Sources: Direct integration with climate APIs (ERA5, MERRA2)
- Per-Cell Climate Attribution: Track which observations influenced each cell's estimate