
TerraFlow Feature Roadmap

Last Updated: 2026-02-06

Overview

This document outlines the strategic direction for TerraFlow development, organized by priority and implementation complexity. Features are grouped into three main tracks:

  • Track 1: Stability & Quality (COMPLETED): reliability, testability, and maintainability improvements
  • Track 2: Capability Expansion (v0.2.0 RELEASED; further features PLANNED): per-cell climate interpolation, enhanced data support, progress tracking, and fingerprinting
  • Track 3: Production Features (FUTURE): cloud integration, web services, temporal analysis, and operational deployment

Track 1: Stability & Quality ✅ COMPLETED

Features completed in this phase improve reliability, testability, and maintainability.

Fully Implemented

All features in this track have been completed and released in the v0.1.x series.

✅ Resource Management

  • Fix unclosed rasterio file handles (resource leak)
  • Implement proper context manager patterns (see the sketch below)
  • Add resource cleanup tests
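
For reference, a minimal sketch of the context-manager pattern referred to above; the helper name read_band is illustrative, not part of the TerraFlow API:

import rasterio

def read_band(path: str, band: int = 1):
    """Read one band; the context manager closes the dataset even on error."""
    with rasterio.open(path) as src:   # closed automatically on exit
        return src.read(band), src.transform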

✅ Error Handling & Validation

  • File existence validation before operations
  • ROI bounds validation (min < max)
  • Raster band count validation
  • Model parameter range validation (v_min < v_max, etc.)
  • Climate CSV column validation
  • Configuration file validation with helpful error messages
  • Fix overly broad exception handling (bare except)

✅ Testing

  • CLI unit tests (arguments, errors, help)
  • Error path tests (missing files, invalid configs)
  • Geo module edge case tests (invalid bounds, non-intersecting ROI)
  • Ingest module tests (file validation, malformed data)
  • Test coverage for 14 critical scenarios

✅ Documentation

  • Comprehensive module docstrings
  • Function parameter/return/exception documentation
  • Architecture Decision Records (ADRs)
  • CLI help text with examples

✅ Dependencies

  • Remove unused imports (xarray, geopandas)
  • Add version constraints for reproducibility
  • Sync package version (0.2.0) across pyproject.toml and __init__.py

✅ Code Quality

  • Fix spatial sampling bias (random.sample vs list slicing)
  • Improve logging messages with context
  • Add validation hooks in Pydantic models (see the sketch below)
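
As an illustration of those validation hooks, a minimal sketch using a Pydantic v2 model validator; the field names mirror the v_min < v_max constraint listed under Error Handling & Validation, but the actual TerraFlow models may differ:

from pydantic import BaseModel, model_validator

class ModelParams(BaseModel):
    v_min: float
    v_max: float

    @model_validator(mode="after")
    def check_range(self):
        # Reject inverted ranges early, with a message naming both values
        if self.v_min >= self.v_max:
            raise ValueError(f"v_min ({self.v_min}) must be less than v_max ({self.v_max})")
        return self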

Track 2: Capability Expansion (✅ v0.2.0 RELEASED, 📌 v1.0 PLANNED)

Features that add significant value while maintaining focus on agricultural modeling.

Enhanced Climate Data Support (v0.2.0)

Per-cell climate interpolation is now fully implemented with two configurable strategies (see the sketch below):

  • ✅ Spatial interpolation using scipy.interpolate.griddata
  • ✅ Index-based matching for pre-aligned data
  • ✅ Graceful fallback to global mean for sparse data
  • ✅ Comprehensive validation and 32 test cases
  • ✅ Full documentation and ADR-003
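
The sketch below illustrates the interpolation-with-fallback logic described above; the function signature and sparsity threshold are illustrative rather than the released API:

import numpy as np
from scipy.interpolate import griddata

def interpolate_climate(station_xy: np.ndarray, values: np.ndarray,
                        cell_xy: np.ndarray) -> np.ndarray:
    """Interpolate station values onto cell centroids, falling back to the
    global mean when the station network is too sparse to triangulate."""
    if len(station_xy) < 3:
        return np.full(len(cell_xy), values.mean())
    interp = griddata(station_xy, values, cell_xy, method="linear")
    # Cells outside the convex hull of the stations come back as NaN
    return np.where(np.isnan(interp), values.mean(), interp)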

📌 Progress Tracking & Observability v1.0 PLANNED

Goal: Users can monitor long-running jobs and understand what's happening.

Planned Features

  • Progress bar: Show sampling progress for large ROIs

    Sampling cells: [████████░░] 80/100

  • Runtime estimation: Predict total time based on raster size
  • Sampling statistics: Valid cell count, sampling ratio, geographic extent

Implementation: pipeline.py | Effort: 4-6 hours | Tests: Progress accuracy, timeout handling
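
One possible implementation, assuming tqdm is added as a dependency (it is not one today); cells and sampler are placeholders for the pipeline's own types:

from tqdm import tqdm

def sample_cells(cells, sampler):
    """Sample each candidate cell, reporting progress on stderr."""
    results = []
    for cell in tqdm(cells, desc="Sampling cells", unit="cell"):
        results.append(sampler(cell))
    return results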


📌 Run Fingerprinting & Reproducibility v1.0 PLANNED

Goal: Track inputs/outputs for reproducibility and auditing.

Manifest File

Generate manifest.json with complete provenance tracking:

{
  "version": "0.2.0",
  "timestamp": "2026-02-06T14:30:00Z",
  "config_hash": "sha256:abc123...",
  "raster_hash": "sha256:def456...",
  "sampled_cells": 500,
  "execution_time_seconds": 12.34
}

Implementation: utils.py, pipeline.py, new fingerprint.py | Effort: 6-8 hours
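
A sketch of what the proposed fingerprint.py could provide, hashing the config and raster files and writing the manifest format shown above; the function names and chunk size are illustrative:

import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):   # 1 MiB chunks
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()

def write_manifest(config_path: Path, raster_path: Path,
                   sampled_cells: int, elapsed_seconds: float,
                   out_path: Path = Path("manifest.json")) -> None:
    manifest = {
        "version": "0.2.0",
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "config_hash": sha256_of(config_path),
        "raster_hash": sha256_of(raster_path),
        "sampled_cells": sampled_cells,
        "execution_time_seconds": round(elapsed_seconds, 2),
    }
    out_path.write_text(json.dumps(manifest, indent=2))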


📌 Large Raster Optimization v1.0 PLANNED

Goal: Process rasters that do not fit in memory by streaming windowed reads and writing results in batches instead of building one large DataFrame.

Implementation Strategy

  1. Keep rasterio window-based reads
  2. Process in overlapping windows with stride
  3. Write batches to parquet file
  4. Return summary stats instead of full DataFrame

Implementation Files: pipeline.py, new output.py | Estimated Effort: 12-16 hours | Tests Required: Memory usage, output consistency vs current approach | Dependencies: pyarrow (for parquet)
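
A sketch of the windowed approach under two simplifying assumptions: the windows here do not overlap (the plan calls for overlapping windows with a stride), and the per-window scoring is a placeholder:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import rasterio
from rasterio.windows import Window

def stream_scores(raster_path: str, out_path: str, size: int = 512) -> None:
    """Score the raster window by window, appending each batch to parquet."""
    writer = None
    with rasterio.open(raster_path) as src:
        for row in range(0, src.height, size):
            for col in range(0, src.width, size):
                win = Window(col, row,
                             min(size, src.width - col),
                             min(size, src.height - row))
                block = src.read(1, window=win)
                batch = pd.DataFrame({"value": block.ravel()})   # placeholder scoring
                table = pa.Table.from_pandas(batch, preserve_index=False)
                if writer is None:
                    writer = pq.ParquetWriter(out_path, table.schema)
                writer.write_table(table)
    if writer is not None:
        writer.close()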


📌 Multi-Band Raster Support (Priority: MEDIUM)

Goal: Process multi-band data and create composite models.

Features

  • Band selection: Config parameter band: 1 (default) or bands: [1, 2, 3]
  • Composite scoring: Combine multiple band indices
  • Weighted average: score = 0.4*ndvi + 0.3*evi + 0.3*moisture
  • Separate models: Run different model params per band
  • Auto-detection: Recognize common indices (NDVI, EVI, NDBI, etc.) from metadata

Implementation Files: config.py, geo.py, model.py | Estimated Effort: 10-12 hours | Tests Required: Band validation, multi-band aggregation | Related ADR: See adr-001-band-selection.md
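
An illustrative sketch of the weighted-average composite; the band-to-index mapping and weights below are placeholders, not a proposed default:

import numpy as np
import rasterio

# Hypothetical mapping of band number to (index name, weight)
BAND_WEIGHTS = {1: ("ndvi", 0.4), 2: ("evi", 0.3), 3: ("moisture", 0.3)}

def composite_score(raster_path: str) -> np.ndarray:
    """Per-pixel score = 0.4*ndvi + 0.3*evi + 0.3*moisture."""
    with rasterio.open(raster_path) as src:
        score = np.zeros((src.height, src.width), dtype="float64")
        for band, (_name, weight) in BAND_WEIGHTS.items():
            score += weight * src.read(band).astype("float64")
    return score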


📌 Polygon ROI Support (Priority: MEDIUM)

Goal: Support arbitrary polygon regions of interest (state boundaries, farms, etc.).

Current Limitation

  • Only bounding box (4 parameters) supported
  • Users with irregular regions must pre-process

Proposed Features

  • GeoJSON input: Specify ROI as GeoJSON polygon
  • Shapefile support: Load from .shp/.gpkg files
  • Named regions: Integrate with GADM or similar
  • Boundary buffering: Expand point locations to study areas

Implementation Strategy

  1. Read polygon geometry via fiona/geopandas
  2. Rasterize polygon to create mask
  3. Apply mask to clipped raster
  4. Continue normal pipeline

Implementation Files: config.py, geo.py, new polygon.py | Estimated Effort: 10-14 hours | Tests Required: Polygon accuracy, rasterization edge cases | Dependencies: fiona or geopandas (optional) | Related ADR: See adr-002-bbox-roi.md
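
A sketch of the rasterize-and-mask strategy above using geopandas and rasterio.features; both remain optional dependencies, and the helper shown here is illustrative:

import geopandas as gpd
import numpy as np
import rasterio
from rasterio.features import geometry_mask

def mask_to_polygon(raster_path: str, roi_path: str) -> np.ndarray:
    """Return band 1 with cells outside the ROI polygon set to NaN."""
    roi = gpd.read_file(roi_path)                    # GeoJSON, .shp, or .gpkg
    with rasterio.open(raster_path) as src:
        roi = roi.to_crs(src.crs)                    # align CRS with the raster
        outside = geometry_mask(roi.geometry,
                                out_shape=(src.height, src.width),
                                transform=src.transform)
        data = src.read(1).astype("float64")
    data[outside] = np.nan
    return data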


Track 3: Production Features (FUTURE)

Features for operational deployment, monitoring, and integration.

🎯 Cloud-Native Integration

  • S3/GCS input: Read rasters directly from cloud storage
  • COG support: Optimize for Cloud-Optimized GeoTIFF reads
  • Cloud output: Write results to cloud blob storage
  • Distributed processing: Support Spark/Dask for parallel regions

Estimated Effort: 20+ hours | Priority: Q3 2026+
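
For orientation, rasterio can already read Cloud-Optimized GeoTIFFs over S3 through GDAL's virtual filesystem, so windowed ROI reads need not download the whole file; the bucket path below is a placeholder and credentials are assumed to come from the environment:

import rasterio
from rasterio.windows import from_bounds

COG_URL = "s3://example-bucket/suitability/ndvi_2026.tif"   # placeholder path

def read_roi_from_cog(bounds):
    """Read only the ROI window from a COG stored on S3."""
    with rasterio.Env():                                     # GDAL/AWS session
        with rasterio.open(COG_URL) as src:
            window = from_bounds(*bounds, transform=src.transform)
            return src.read(1, window=window)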


🎯 Web Service & API

  • REST API: Flask/FastAPI endpoint for running the pipeline
  • Job queuing: Background task processing (Celery)
  • Web UI: Interactive map for ROI selection
  • Authentication: API key management for hosted instance

Estimated Effort: 40+ hours | Priority: Q4 2026+


🎯 Temporal Analysis

  • Time series: Process multiple rasters across time
  • Trend detection: Identify suitability changes over seasons/years
  • Phenology: Track seasonal crop development
  • Anomaly detection: Identify unusual years/regions

Estimated Effort: 24-30 hours | Priority: Q3 2026+


🎯 Model Enhancements

  • Machine learning: Learn weights from training data instead of manual specification
  • Bayesian uncertainty: Quantify confidence in scores
  • Sensitivity analysis: Identify which factors matter most
  • Custom indices: Let users define new vegetation/climate indices

Estimated Effort: 30+ hours | Priority: Q4 2026+


🎯 Validation & Benchmarking

  • Cross-validation: Leave-one-out or k-fold validation on known sites
  • Uncertainty bounds: Confidence intervals around predictions
  • Comparison mode: Compare results across model versions
  • Performance profiling: Benchmark against standard datasets

Estimated Effort: 12-18 hours | Priority: Q2 2026


Implementation Timeline

v0.2.0 (Q1 2026) ✅ RELEASED
└─ Enhanced climate support

v1.0.0 (Q2 2026) 📌 PLANNED
├─ Progress tracking
├─ Run fingerprinting
├─ Large raster optimization
├─ Multi-band support
├─ Polygon ROI
├─ Comprehensive tests
└─ Production documentation

v1.1.0 (Q3 2026)
├─ Cloud integration
├─ Temporal analysis
└─ Validation framework

v2.0.0 (Q4 2026)
├─ Web API
├─ ML models
└─ Advanced analytics

Community Contribution Opportunities

Areas well-suited for external contributions:

  1. Integration tests for different raster types (GeoTIFF, COG, NetCDF)
  2. Documentation improvements and examples
  3. UI/visualization enhancements
  4. Cloud provider adapters (AWS, GCP, Azure)
  5. Model examples for specific crops/regions
  6. Performance optimization for specific hardware

Decision Framework

For new features, consider:

  1. Alignment: Does it serve agricultural modeling use cases?
  2. Simplicity: Can it be explained in <5 minutes?
  3. Testability: Can the feature be rigorously tested?
  4. Maintenance: What's the ongoing support burden?
  5. Dependencies: Do new libraries add value?

References