
Architecture Boundaries

TerraFlow splits the pipeline into two primary layers, ingest and core:

flowchart TD
    subgraph Ingest["Ingest Layer"]
        A[File System]
        B[Cloud Storage]
        C[APIs]
        A & B & C --> D[Data Loaders]
    end

    D --> E[Validated Data]

    subgraph Core["Core Layer"]
        E --> F[Configuration Validation]
        F --> G[ROI Clipping]
        G --> H[Climate Aggregation]
        H --> I[Suitability Scoring]
        I --> J[Artifact Generation]
    end

    J --> K[results.csv]
    J --> L[fingerprint.json]
    J --> M[results.html]

    style Ingest fill:#00b0ff,stroke:#0091ea,color:#fff
    style Core fill:#2d8a55,stroke:#1e5c3a,color:#fff
    style E fill:#40a86e,stroke:#2d6a4f,color:#fff

Ingest layer

Ingest-layer details will be finalized later.

Core layer

The core layer:

  • Validates configuration.
  • Clips raster data to the ROI.
  • Aggregates climate metrics.
  • Computes suitability scores and labels.
  • Writes run artifacts.

The core layer must not contain file system discovery or remote fetch logic; it relies on the ingest layer to provide all data.
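A minimal sketch of this handoff, assuming a Python codebase. The names `ValidatedData`, `DataLoader`, `InMemoryLoader`, and `run_core`, and the temperature threshold, are hypothetical illustrations and not TerraFlow's actual API. All I/O sits behind the loader interface, and the core function only receives in-memory data:

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class ValidatedData:
    """Hypothetical in-memory payload handed from ingest to core."""
    cell_ids: Sequence[str]
    mean_temp_c: Sequence[float]


class DataLoader(Protocol):
    """Hypothetical ingest-side interface; all I/O lives behind it."""
    def load(self) -> ValidatedData: ...


class InMemoryLoader:
    """Stand-in loader; a real one would read files, buckets, or APIs."""
    def __init__(self, data: ValidatedData) -> None:
        self._data = data

    def load(self) -> ValidatedData:
        return self._data


def run_core(data: ValidatedData, max_temp_c: float) -> dict[str, bool]:
    """Core-side function: pure computation, no file or network access."""
    return {
        cell_id: temp <= max_temp_c
        for cell_id, temp in zip(data.cell_ids, data.mean_temp_c)
    }


loader = InMemoryLoader(ValidatedData(["a", "b"], [18.0, 31.5]))
result = run_core(loader.load(), max_temp_c=30.0)
```

Because `run_core` never opens a file or a socket, swapping `InMemoryLoader` for a loader that reads from disk would not require touching it.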

Why the boundary matters

Keeping ingestion and core computation separate has three benefits:

Deterministic & Testable

Pipeline logic remains deterministic and can be tested with in-memory inputs, without file system dependencies.
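As an illustration, a core function that takes plain values can be tested without fixtures on disk. The `score_cell` formula below is a toy invented for this sketch and is not TerraFlow's scoring logic:

```python
def score_cell(mean_temp_c: float, annual_precip_mm: float) -> float:
    """Toy score in [0, 1]; a hypothetical stand-in for real scoring."""
    # Temperature term peaks at 20 C and falls to 0 at 0 C or 40 C.
    temp_term = max(0.0, 1.0 - abs(mean_temp_c - 20.0) / 20.0)
    # Precipitation term saturates at 1000 mm.
    precip_term = min(annual_precip_mm / 1000.0, 1.0)
    return round(0.5 * temp_term + 0.5 * precip_term, 4)


def test_score_is_deterministic() -> None:
    inputs = [(20.0, 1000.0), (0.0, 500.0), (45.0, 0.0)]
    first = [score_cell(t, p) for t, p in inputs]
    second = [score_cell(t, p) for t, p in inputs]
    # Same inputs give the same outputs, with no files involved.
    assert first == second
    assert first == [1.0, 0.25, 0.0]


test_score_is_deterministic()
```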

Extensible Data Sources

Future data sources (e.g., cloud buckets) can be added without rewriting scoring logic.
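One way this could look in practice is a small loader registry keyed by URI scheme. Everything here is an assumption made for illustration: the `register_loader` decorator, the `memory` and `bucket` schemes, and the record layout. The bucket loader returns canned data where real code would call a storage client:

```python
from typing import Callable

Records = list[dict[str, float]]

# Hypothetical registry mapping a URI scheme to an ingest function.
LOADERS: dict[str, Callable[[str], Records]] = {}


def register_loader(scheme: str):
    """Register an ingest function under the given URI scheme."""
    def decorator(fn: Callable[[str], Records]) -> Callable[[str], Records]:
        LOADERS[scheme] = fn
        return fn
    return decorator


@register_loader("memory")
def load_from_memory(uri: str) -> Records:
    # Stand-in source with fixed records.
    return [{"mean_temp_c": 18.0}, {"mean_temp_c": 31.5}]


@register_loader("bucket")
def load_from_bucket(uri: str) -> Records:
    # Placeholder for a future cloud source; returns canned data here.
    return [{"mean_temp_c": 22.0}]


def load(uri: str) -> Records:
    """Dispatch to the loader registered for the URI's scheme."""
    scheme = uri.split("://", 1)[0]
    return LOADERS[scheme](uri)


def count_suitable(records: Records, max_temp_c: float = 30.0) -> int:
    """Core-side logic: unchanged no matter which loader produced the records."""
    return sum(1 for r in records if r["mean_temp_c"] <= max_temp_c)
```

Adding the `bucket` source meant registering one new function; `count_suitable` did not change.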

Audit-Friendly

Because all data enters through the ingest layer, data provenance stays clear and runs remain auditable for reproducible research.
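The diagram lists `fingerprint.json` among the run artifacts; its actual contents are not specified here. As a hedged sketch of how a run could be made traceable, the hypothetical `run_fingerprint` function below hashes the configuration together with digests of the input data, using only the Python standard library:

```python
import hashlib
import json


def run_fingerprint(config: dict, input_digests: dict[str, str]) -> str:
    """Return a SHA-256 hex digest over the config and input digests."""
    # sort_keys and fixed separators give a canonical string, so the same
    # logical content always hashes to the same value.
    payload = json.dumps(
        {"config": config, "inputs": input_digests},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical config keys and input names, for illustration only.
a = run_fingerprint({"roi": "demo", "max_temp_c": 30.0}, {"climate.tif": "abc123"})
b = run_fingerprint({"max_temp_c": 30.0, "roi": "demo"}, {"climate.tif": "abc123"})
c = run_fingerprint({"roi": "demo", "max_temp_c": 25.0}, {"climate.tif": "abc123"})
```

Here `a` and `b` match because key order does not affect the canonical form, while `c` differs because the configuration changed.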