Architecture Boundaries

TerraFlow splits the pipeline into two primary layers:

flowchart TD
    subgraph Ingest["Ingest Layer"]
        A[File System]
        B[Cloud Storage]
        C[APIs]
        A & B & C --> D[Data Loaders]
    end
    D --> E[Validated Data]
    subgraph Core["Core Layer"]
        E --> F[Configuration Validation]
        F --> G[ROI Clipping]
        G --> H[Climate Aggregation]
        H --> I[Suitability Scoring]
        I --> J[Artifact Generation]
    end
    J --> K[results.csv]
    J --> L[fingerprint.json]
    J --> M[results.html]
    style Ingest fill:#00b0ff,stroke:#0091ea,color:#fff
    style Core fill:#2d8a55,stroke:#1e5c3a,color:#fff
    style E fill:#40a86e,stroke:#2d6a4f,color:#fff

Ingest layer

Ingest-layer details will be finalized later.

Core layer

The core layer:

  • Validates configuration.
  • Clips raster data to the ROI.
  • Aggregates climate metrics.
  • Computes suitability scores and labels.
  • Writes run artifacts.

The core layer should not contain any file system discovery or remote fetch logic; it relies on the ingest layer to provide all data.
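As a rough illustration of this rule, the sketch below passes an in-memory object from an ingest-side function to a core-side function. The names `ValidatedData`, `load_from_memory`, and `score_cells`, and the threshold-based scoring rule, are assumptions made for this example; they are not taken from the TerraFlow code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedData:
    """Hypothetical hand-off object: everything the core layer needs, in memory."""
    cell_ids: tuple[str, ...]
    temps_c: tuple[float, ...]  # one mean temperature per cell


def load_from_memory(rows: list[tuple[str, float]]) -> ValidatedData:
    """Ingest-side stand-in: the only place that knows where data comes from."""
    ids, temps = zip(*rows)
    return ValidatedData(cell_ids=ids, temps_c=temps)


def score_cells(data: ValidatedData, t_min: float, t_max: float) -> dict[str, float]:
    """Core-side step: a pure function with no paths and no network calls."""
    scores = {}
    for cell_id, temp in zip(data.cell_ids, data.temps_c):
        # Illustrative rule only: 1.0 inside the temperature range, else 0.0.
        scores[cell_id] = 1.0 if t_min <= temp <= t_max else 0.0
    return scores


data = load_from_memory([("c1", 18.5), ("c2", 31.0)])
print(score_cells(data, t_min=10.0, t_max=25.0))
```

Because `score_cells` only sees `ValidatedData`, it can be exercised with hand-built inputs and never needs a temporary directory or a mocked client.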

Why the boundary matters

Keeping ingestion and core computation separate ensures three properties:

Deterministic & Testable

Pipeline logic remains deterministic and testable without file system dependencies.

Extensible Data Sources

Future data sources (e.g., cloud buckets) can be added without rewriting scoring logic.
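One way this could look in practice is a small loader interface that every source implements. The sketch assumes a hypothetical `DataLoader` protocol and two stand-in loaders (`InMemoryLoader`, `FakeBucketLoader`); none of these names come from TerraFlow, and the fake bucket loader does no real network access.

```python
from typing import Protocol


class DataLoader(Protocol):
    """Hypothetical ingest interface: any source that can yield temperatures."""

    def load(self) -> list[float]: ...


class InMemoryLoader:
    def __init__(self, values: list[float]) -> None:
        self._values = values

    def load(self) -> list[float]:
        return list(self._values)


class FakeBucketLoader:
    """Stand-in for a future cloud source; a real one would fetch objects here."""

    def __init__(self, objects: dict[str, float]) -> None:
        self._objects = objects

    def load(self) -> list[float]:
        # Sort keys so the output order does not depend on insertion order.
        return [self._objects[key] for key in sorted(self._objects)]


def score(temps: list[float], t_min: float, t_max: float) -> list[float]:
    """Core scoring: unchanged no matter which loader supplied the data."""
    return [1.0 if t_min <= t <= t_max else 0.0 for t in temps]


def run(loader: DataLoader) -> list[float]:
    """Orchestration: ingest first, then hand plain data to the core."""
    return score(loader.load(), t_min=10.0, t_max=25.0)


print(run(InMemoryLoader([18.5, 31.0])))
print(run(FakeBucketLoader({"b.tif": 31.0, "a.tif": 18.5})))
```

Adding the second loader required no change to `score`, which is the property the boundary is meant to protect.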

Audit-Friendly

The system remains audit-friendly for reproducible research with clear data provenance.
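The diagram lists a `fingerprint.json` artifact, but its contents are not described here. As a guess at what a provenance record could contain, the sketch below hashes the run configuration and each input payload; the function name `build_fingerprint` and the field names are assumptions for illustration only.

```python
import hashlib
import json


def build_fingerprint(config: dict, inputs: dict[str, bytes]) -> dict:
    """Hypothetical fingerprint: SHA-256 of the config and of each input payload."""
    # Serialize with sorted keys so equal configs always hash identically.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return {
        "config_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "inputs": {
            name: hashlib.sha256(blob).hexdigest()
            for name, blob in sorted(inputs.items())
        },
    }


fingerprint = build_fingerprint(
    {"roi": "demo", "t_max": 25},
    {"climate.tif": b"\x00\x01"},
)
print(json.dumps(fingerprint, indent=2, sort_keys=True))
```

Since the core layer receives all data from ingest, a record like this can be computed from exactly the bytes the run consumed, which is what makes the provenance claim checkable.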