TerraFlow Feature Roadmap¶
Last Updated: 2026-02-06
Overview¶
This document outlines the strategic direction for TerraFlow development, organized by priority and implementation complexity. Features are grouped into three main tracks:
- Track 1: Stability & Quality COMPLETED
  - Reliability, testability, and maintainability improvements
- Track 2: Capability Expansion v0.2.0 RELEASED
  - Per-cell climate interpolation and enhanced data support
- Track 3: Production Features v1.0 PLANNED
  - Progress tracking, fingerprinting, and operational deployment
Track 1: Stability & Quality ✅ COMPLETED¶
Features completed in this phase improve reliability, testability, and maintainability.
Fully Implemented
All features in this track have been completed and released in the v0.1.x series.
✅ Resource Management¶
- Fix unclosed rasterio file handles (resource leak)
- Implement proper context manager patterns
- Add resource cleanup tests
✅ Error Handling & Validation¶
- File existence validation before operations
- ROI bounds validation (min < max)
- Raster band count validation
- Model parameter range validation (v_min < v_max, etc.)
- Climate CSV column validation
- Configuration file validation with helpful error messages
- Fix overly broad exception handling (bare except)
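The validation items above can be illustrated with a minimal sketch. The names `ConfigError`, `ROI`, `validate_roi`, and `parse_band_count` are hypothetical and chosen for illustration; they are not taken from the TerraFlow codebase.

```python
from dataclasses import dataclass


class ConfigError(ValueError):
    """Raised when a configuration value fails validation (hypothetical name)."""


@dataclass(frozen=True)
class ROI:
    """Bounding-box region of interest; the field names are assumptions."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def validate_roi(roi: ROI) -> None:
    """Reject inverted or zero-area bounding boxes (min must be < max)."""
    if not roi.xmin < roi.xmax:
        raise ConfigError(f"ROI xmin ({roi.xmin}) must be less than xmax ({roi.xmax})")
    if not roi.ymin < roi.ymax:
        raise ConfigError(f"ROI ymin ({roi.ymin}) must be less than ymax ({roi.ymax})")


def parse_band_count(raw: str) -> int:
    """Parse a band count, catching only the exception we expect."""
    try:
        count = int(raw)
    except ValueError as exc:  # narrow handler instead of a bare `except:`
        raise ConfigError(f"Band count must be an integer, got {raw!r}") from exc
    if count < 1:
        raise ConfigError(f"Band count must be at least 1, got {count}")
    return count
```

Catching only `ValueError` keeps unrelated failures (for example `KeyboardInterrupt`) from being swallowed, which is the usual motivation for removing bare `except` clauses.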
✅ Testing¶
- CLI unit tests (arguments, errors, help)
- Error path tests (missing files, invalid configs)
- Geo module edge case tests (invalid bounds, non-intersecting ROI)
- Ingest module tests (file validation, malformed data)
- Test coverage for 14 critical scenarios
✅ Documentation¶
- Comprehensive module docstrings
- Function parameter/return/exception documentation
- Architecture Decision Records (ADRs)
- CLI help text with examples
✅ Dependencies¶
- Remove unused imports (xarray, geopandas)
- Add version constraints for reproducibility
- Sync package version (0.2.0) across pyproject.toml and __init__.py
✅ Code Quality¶
- Fix spatial sampling bias (random.sample vs list slicing)
- Improve logging messages with context
- Add validation hooks in Pydantic models
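To show why list slicing biases spatial sampling, here is a small sketch on a hypothetical 10x10 grid stored in row-major order. The helper `sample_cells` is an illustrative name, not TerraFlow's actual function.

```python
import random


def sample_cells(cells, n, seed=None):
    """Pick up to n cells uniformly at random, without replacement."""
    rng = random.Random(seed)      # local RNG so a seed gives repeatable runs
    k = min(n, len(cells))         # random.sample raises if k > len(cells)
    return rng.sample(cells, k)


# Row-major list of (row, col) cells for a 10x10 grid.
cells = [(r, c) for r in range(10) for c in range(10)]

# Slicing takes the first 20 cells, which all lie in rows 0 and 1.
biased = cells[:20]

# Random sampling draws from the whole grid.
unbiased = sample_cells(cells, 20, seed=42)
```

With slicing, every selected cell comes from the top edge of the region, so any north-south gradient in the data is invisible to the model. Sampling without replacement spreads the selection across the full extent.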
Track 2: Capability Expansion ✅ v0.2.0 RELEASED¶
Features that add significant value while maintaining focus on agricultural modeling.
Enhanced Climate Data Support (v0.2.0)
Per-cell climate interpolation is now fully implemented with two configurable strategies:
- ✅ Spatial interpolation using scipy.interpolate.griddata
- ✅ Index-based matching for pre-aligned data
- ✅ Graceful fallback to global mean for sparse data
- ✅ Comprehensive validation and 32 test cases
- ✅ Full documentation and ADR-003
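As a rough illustration of the index-based strategy combined with the global-mean fallback, consider the sketch below. It assumes climate values are keyed by a cell identifier; the function name `match_climate_by_index` and the data layout are assumptions, and the spatial strategy based on scipy.interpolate.griddata is not shown.

```python
from statistics import fmean


def match_climate_by_index(cell_ids, climate_by_id):
    """Return one climate value per cell, in the order of cell_ids.

    Cells with no matching record fall back to the global mean of the
    available climate values.
    """
    if not climate_by_id:
        raise ValueError("climate data is empty; no global mean available")
    global_mean = fmean(climate_by_id.values())
    return [climate_by_id.get(cell_id, global_mean) for cell_id in cell_ids]


# Cell 1 has no record, so it receives the mean of 10.0 and 20.0.
values = match_climate_by_index([0, 1, 2], {0: 10.0, 2: 20.0})
```

Falling back to the mean keeps the pipeline running on sparse data, at the cost of flattening local variation for the affected cells.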
📌 Progress Tracking & Observability v1.0 PLANNED¶
Goal: Users can monitor long-running jobs and understand what's happening.
Planned Features
- Progress bar: Show sampling progress for large ROIs
- Runtime estimation: Predict total time based on raster size
- Sampling statistics: Valid cell count, sampling ratio, geographic extent
Implementation: pipeline.py | Effort: 4-6 hours | Tests: Progress accuracy, timeout handling
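One simple way to approach runtime estimation is to extrapolate from observed throughput. This differs from predicting purely from raster size, and is offered only as a sketch; both function names are hypothetical.

```python
def estimate_remaining_seconds(done: int, total: int, elapsed: float) -> float:
    """Extrapolate remaining time from the average time per processed cell."""
    if total <= 0 or not 0 <= done <= total:
        raise ValueError("require total > 0 and 0 <= done <= total")
    if done == 0:
        return float("inf")  # no throughput information yet
    return elapsed * (total - done) / done


def format_progress(done: int, total: int, elapsed: float) -> str:
    """Build a one-line progress message suitable for a log or progress bar."""
    remaining = estimate_remaining_seconds(done, total, elapsed)
    return f"{done}/{total} cells ({done / total:.0%}), ~{remaining:.0f}s remaining"


print(format_progress(250, 1000, 5.0))
# prints: 250/1000 cells (25%), ~15s remaining
```

A linear extrapolation like this is only as good as the assumption that per-cell cost stays constant across the raster.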
📌 Run Fingerprinting & Reproducibility v1.0 PLANNED¶
Goal: Track inputs/outputs for reproducibility and auditing.
Manifest File
Generate manifest.json with complete provenance tracking.
Implementation: utils.py, pipeline.py, new fingerprint.py | Effort: 6-8 hours
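The manifest schema is not specified in this roadmap. As a sketch of what provenance tracking could look like, the example below hashes each input file and the configuration; every field name in the resulting dictionary, and the helper names `sha256_of_file` and `build_manifest`, are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large rasters are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_paths, config, version):
    """Collect input hashes, a config hash, and run metadata (hypothetical schema)."""
    # sort_keys makes the config hash independent of dictionary ordering.
    config_json = json.dumps(config, sort_keys=True)
    return {
        "terraflow_version": version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "config_sha256": hashlib.sha256(config_json.encode("utf-8")).hexdigest(),
        "inputs": {str(p): sha256_of_file(p) for p in input_paths},
    }


def write_manifest(manifest, out_dir):
    """Write the manifest as manifest.json inside out_dir and return its path."""
    out_path = Path(out_dir) / "manifest.json"
    out_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return out_path
```

Two runs with identical inputs and configuration then produce the same hashes, which makes it straightforward to check whether a result can be reproduced.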
📌 Large Raster Optimization¶
- Keep rasterio window-based reads
- Process in overlapping windows with stride
- Write batches to parquet file
- Return summary stats instead of full DataFrame
Implementation Files: pipeline.py, new output.py
Estimated Effort: 12-16 hours
Tests Required: Memory usage, output consistency vs current approach
Dependencies: pyarrow (for parquet)
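The window arithmetic behind strided processing can be sketched without rasterio or pyarrow. The `Window` tuple and `iter_windows` generator below are illustrative names; reading each window from the raster and writing batches to parquet are deliberately left out.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    row_off: int
    col_off: int
    height: int
    width: int


def _offsets(length: int, size: int, stride: int) -> range:
    """Start offsets such that the last window reaches the end of the axis."""
    return range(0, max(length - size, 0) + stride, stride)


def iter_windows(n_rows: int, n_cols: int, size: int, stride: int) -> Iterator[Window]:
    """Yield windows covering an n_rows x n_cols grid.

    A stride smaller than size makes neighbouring windows overlap;
    windows at the bottom and right edges are clipped to the grid.
    """
    if size <= 0 or not 0 < stride <= size:
        raise ValueError("require size > 0 and 0 < stride <= size")
    for row_off in _offsets(n_rows, size, stride):
        for col_off in _offsets(n_cols, size, stride):
            yield Window(
                row_off,
                col_off,
                min(size, n_rows - row_off),
                min(size, n_cols - col_off),
            )


# A 1000x1000 raster with 512-pixel windows and a 256-pixel stride.
windows = list(iter_windows(1000, 1000, size=512, stride=256))
```

Because each window is bounded by `size`, peak memory depends on the window dimensions and not on the raster dimensions. Note that overlapping windows visit some cells more than once, so any summary statistics accumulated across windows need to account for the overlap.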
📌 Multi-Band Raster Support (Priority: MEDIUM)¶
Goal: Process multi-band data and create composite models.
Features¶
- Band selection: Config parameter band: 1 (default) or bands: [1, 2, 3]
- Composite scoring: Combine multiple band indices
  - Weighted average: score = 0.4*ndvi + 0.3*evi + 0.3*moisture
- Separate models: Run different model params per band
- Auto-detection: Recognize common indices from metadata
  - NDVI, EVI, NDBI, etc.
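The weighted-average formula above could be implemented along these lines. The weights come from the example formula; the function name `composite_score` and the requirement that weights sum to 1 are assumptions made for this sketch.

```python
def composite_score(indices: dict, weights: dict) -> float:
    """Combine per-index values into one score using a weighted average."""
    if set(indices) != set(weights):
        raise ValueError("indices and weights must have the same keys")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * indices[name] for name in weights)


# Weights taken from the example formula above.
WEIGHTS = {"ndvi": 0.4, "evi": 0.3, "moisture": 0.3}

score = composite_score({"ndvi": 0.8, "evi": 0.6, "moisture": 0.5}, WEIGHTS)
# 0.4*0.8 + 0.3*0.6 + 0.3*0.5 is approximately 0.65
```

Requiring the weights to sum to 1 keeps the composite on the same scale as its inputs, which makes scores comparable across different weight choices.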
Implementation Files: config.py, geo.py, model.py
Estimated Effort: 10-12 hours
Tests Required: Band validation, multi-band aggregation
Related ADR: See adr-001-band-selection.md
📌 Polygon ROI Support (Priority: MEDIUM)¶
Goal: Support arbitrary polygon regions of interest (state boundaries, farms, etc.).
Current Limitation¶
- Only bounding box (4 parameters) supported
- Users with irregular regions must pre-process
Proposed Features¶
- GeoJSON input: Specify ROI as GeoJSON polygon
- Shapefile support: Load from .shp/.gpkg files
- Named regions: Integrate with GADM or similar
- Boundary buffering: Expand point locations to study areas
Implementation Strategy¶
- Read polygon geometry via fiona/geopandas
- Rasterize polygon to create mask
- Apply mask to clipped raster
- Continue normal pipeline
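The rasterize-and-mask steps above can be illustrated in pure Python. A real implementation would more likely rely on rasterio or geopandas; this sketch swaps in a plain even-odd ray-casting test on cell centers, and all function names and the grid parameters are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(polygon, n_rows, n_cols, x_origin, y_origin, cell_size):
    """Return a grid of booleans: True where the cell center is inside.

    Assumes a north-up raster: (x_origin, y_origin) is the top-left
    corner and y decreases as the row index increases.
    """
    mask = []
    for row in range(n_rows):
        y = y_origin - (row + 0.5) * cell_size
        mask.append([
            point_in_polygon(x_origin + (col + 0.5) * cell_size, y, polygon)
            for col in range(n_cols)
        ])
    return mask


def apply_mask(values, mask, nodata=None):
    """Replace cells outside the polygon with a nodata marker."""
    return [
        [v if keep else nodata for v, keep in zip(value_row, mask_row)]
        for value_row, mask_row in zip(values, mask)
    ]


# A 2x2 square inside a 4x4 grid of unit cells.
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
mask = polygon_mask(square, 4, 4, x_origin=0.0, y_origin=4.0, cell_size=1.0)
```

Testing cell centers is a common convention, but it means cells that only partially overlap the polygon are included or excluded as a whole; that choice is one of the rasterization edge cases the tests would need to pin down.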
Implementation Files: config.py, geo.py, new polygon.py
Estimated Effort: 10-14 hours
Tests Required: Polygon accuracy, rasterization edge cases
Dependencies: fiona or geopandas (optional)
Related ADR: See adr-002-bbox-roi.md
Track 3: Production Features (FUTURE)¶
Features for operational deployment, monitoring, and integration.
🎯 Cloud-Native Integration¶
- S3/GCS input: Read rasters directly from cloud storage
- COG support: Optimize for Cloud-Optimized GeoTIFF reads
- Cloud output: Write results to cloud blob storage
- Distributed processing: Support Spark/Dask for parallel regions
Estimated Effort: 20+ hours | Priority: Q3 2026+
🎯 Web Service & API¶
- REST API: Flask/FastAPI endpoint for running pipeline
- Job queuing: Background task processing (Celery)
- Web UI: Interactive map for ROI selection
- Authentication: API key management for hosted instance
Estimated Effort: 40+ hours | Priority: Q4 2026+
🎯 Temporal Analysis¶
- Time series: Process multiple rasters across time
- Trend detection: Identify suitability changes over seasons/years
- Phenology: Track seasonal crop development
- Anomaly detection: Identify unusual years/regions
Estimated Effort: 24-30 hours | Priority: Q3 2026+
🎯 Model Enhancements¶
- Machine learning: Learn weights from training data instead of manual specification
- Bayesian uncertainty: Quantify confidence in scores
- Sensitivity analysis: Identify which factors matter most
- Custom indices: Let users define new vegetation/climate indices
Estimated Effort: 30+ hours | Priority: Q4 2026+
🎯 Validation & Benchmarking¶
- Cross-validation: Leave-one-out or k-fold validation on known sites
- Uncertainty bounds: Confidence intervals around predictions
- Comparison mode: Compare results across model versions
- Performance profiling: Benchmark against standard datasets
Estimated Effort: 12-18 hours | Priority: Q2 2026
Implementation Timeline¶
v0.2.0 (Q1 2026)
├─ Progress tracking ✓
├─ Large raster optimization ✓
├─ Enhanced climate support ✓
└─ Run fingerprinting ✓
v1.0.0 (Q2 2026)
├─ Multi-band support ✓
├─ Polygon ROI ✓
├─ Comprehensive tests ✓
└─ Production documentation ✓
v1.1.0 (Q3 2026)
├─ Cloud integration ✓
├─ Temporal analysis ✓
└─ Validation framework ✓
v2.0.0 (Q4 2026)
├─ Web API ✓
├─ ML models ✓
└─ Advanced analytics ✓
Community Contribution Opportunities¶
Areas well-suited for external contributions:
- Integration tests for different raster types (GeoTIFF, COG, NetCDF)
- Documentation improvements and examples
- UI/visualization enhancements
- Cloud provider adapters (AWS, GCP, Azure)
- Model examples for specific crops/regions
- Performance optimization for specific hardware
Decision Framework¶
For new features, consider:
- Alignment: Does it serve agricultural modeling use cases?
- Simplicity: Can it be explained in <5 minutes?
- Testability: Can the feature be rigorously tested?
- Maintenance: What's the ongoing support burden?
- Dependencies: Do new libraries add value?