
TerraFlow in 10 Minutes

Everything you need to go from zero to a working suitability map — what it is, why it exists, how it works, and a live run you can follow along with.


What is TerraFlow?

TerraFlow is a command-line tool that answers one question:

"Given a piece of land, how suitable is it for a particular crop or use — right now, given the current climate?"

It takes three inputs:

| Input | What it is | Example |
|---|---|---|
| A land-cover map (raster) | A satellite-derived map of the land, broken into pixels | USDA Cropland Data Layer (CDL) |
| A climate data file | Temperature and rainfall readings from nearby weather stations | CSV with lat, lon, mean_temp, total_rain |
| A configuration file | Your choices: which region, what crop thresholds, how many sites | config.yml |

And produces one output: results.csv — a table where every row is a sampled location with a suitability score (0–1) and a label (low / medium / high).
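The pipeline interpolates station readings to each sampled location, but this page does not say which interpolation method TerraFlow uses. As an illustration only, here is a minimal inverse distance weighting (IDW) sketch. The function name `idw_interpolate`, the `power` parameter, and the use of plain distances in degrees are assumptions, not TerraFlow's API.

```python
import math


def idw_interpolate(stations, lat, lon, power=2.0):
    """Estimate mean_temp and total_rain at (lat, lon) by inverse distance weighting.

    `stations` holds one dict per climate-CSV row with keys
    lat, lon, mean_temp, total_rain. Distances are plain Euclidean
    distances in degrees, a simplification that ignores Earth's curvature.
    """
    if not stations:
        raise ValueError("need at least one station")
    sums = {"mean_temp": 0.0, "total_rain": 0.0}
    total_weight = 0.0
    for s in stations:
        d = math.hypot(s["lat"] - lat, s["lon"] - lon)
        if d == 0.0:
            # The location sits exactly on a station: use its readings directly.
            return {"mean_temp": s["mean_temp"], "total_rain": s["total_rain"]}
        w = 1.0 / d**power  # closer stations get larger weights
        total_weight += w
        for key in sums:
            sums[key] += w * s[key]
    return {key: total / total_weight for key, total in sums.items()}


# Hypothetical stations on either side of the query point.
stations = [
    {"lat": 39.0, "lon": -100.0, "mean_temp": 20.0, "total_rain": 140.0},
    {"lat": 39.0, "lon": -98.0, "mean_temp": 22.0, "total_rain": 160.0},
]
print(idw_interpolate(stations, 39.0, -99.0))  # equidistant, so a plain average
```

Raising `power` makes the nearest station dominate; a real implementation would likely use great-circle or projected distances rather than raw degrees.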


Why does it exist?

Assessing land suitability is traditionally done by hand — an agronomist looks at soil maps, calls the local weather office, and applies expert judgment to a spreadsheet. That process is:

  • Slow — days or weeks for a single region
  • Inconsistent — different analysts reach different conclusions
  • Not reproducible — the next analyst can't trace exactly what was done

TerraFlow makes it:

  • Fast — seconds for hundreds of locations
  • Consistent — same config always gives same result
  • Fully reproducible — every run is fingerprinted; two people with the same config and data get byte-identical outputs
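One way to fingerprint a run is to hash every input byte. The sketch below is illustrative only: the function name `run_fingerprint` and the choice to hash the config file plus the raster and climate files are assumptions, since this page does not say exactly what TerraFlow includes in its fingerprint.

```python
import hashlib
from pathlib import Path


def run_fingerprint(config_path, data_paths):
    """Return a SHA-256 hex digest over the config file and its input files.

    Files are hashed in the order given; each file's byte length is
    hashed before its contents so that different splits of the same bytes
    across files cannot produce the same digest.
    """
    h = hashlib.sha256()
    for path in [config_path, *data_paths]:
        data = Path(path).read_bytes()
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Example call (paths are placeholders):
# run_fingerprint("config.yml", ["data/my_land_cover.tif", "data/weather_stations.csv"])
```

Any change to the config or to a data file changes the digest, which is what lets two people confirm they ran on identical inputs.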

How does the pipeline work?

```mermaid
flowchart TD
    A[Your config.yml] --> B[Load land-cover raster]
    B --> C[Crop to ROI]
    C --> D[Load climate CSV]
    D --> E[Interpolate climate to each pixel]
    E --> F[Calculate scores]
    F --> G[Apply weighted formula]
    G --> H[Generate labels]
    H --> I[Write results.csv]

    F --> |vegetation × w_v| G
    F --> |temperature × w_t| G
    F --> |rainfall × w_r| G

    I --> J[cell_id | lat | lon | score | label]

    style A fill:#2d8a55,stroke:#1e5c3a,color:#fff
    style I fill:#2d8a55,stroke:#1e5c3a,color:#fff
    style G fill:#40a86e,stroke:#2d6a4f,color:#fff
```
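The flowchart combines three component scores using the weights w_v, w_t and w_r. This page does not define how each component score is computed, so the sketch below assumes one plausible scheme: vegetation is scaled linearly between v_min and v_max, while temperature and rainfall score 1.0 inside their suitable range and 0.0 outside it. The function names are hypothetical, and TerraFlow's actual scoring may differ.

```python
def linear_score(value, lo, hi):
    """Scale value to 0..1 between lo and hi, clipping values outside the range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def band_score(value, lo, hi):
    """Return 1.0 if value lies inside the suitable band [lo, hi], else 0.0."""
    return 1.0 if lo <= value <= hi else 0.0


def suitability_score(v_index, mean_temp, total_rain, params):
    """Weighted sum w_v*s_v + w_t*s_t + w_r*s_r; stays in 0..1 when the weights sum to 1."""
    s_v = linear_score(v_index, params["v_min"], params["v_max"])
    s_t = band_score(mean_temp, params["t_min"], params["t_max"])
    s_r = band_score(total_rain, params["r_min"], params["r_max"])
    return params["w_v"] * s_v + params["w_t"] * s_t + params["w_r"] * s_r
```

Because each component is bounded to 0..1 and the weights sum to 1.0, the final score is a convex combination and can never leave the 0 to 1 range.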

Key Design Choices

  • WGS84 output: Coordinates are always in WGS84 degrees (lat/lon) regardless of input projection
  • Reproducible sampling: Same config + data always produces identical output via SHA-256 fingerprint seeding
  • Portable configs: Relative paths resolve relative to the config file's location, not the current working directory
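Seeding the sampler from a fingerprint can be sketched with Python's standard `random` module. Here a SHA-256 hex digest is turned into an integer seed and `max_cells` distinct pixel indices are drawn from a raster of `n_pixels` cells. `sample_cells` and its parameters are hypothetical names, not TerraFlow's API.

```python
import hashlib
import random


def sample_cells(fingerprint_hex, n_pixels, max_cells):
    """Return up to max_cells distinct, sorted pixel indices, seeded from a hex digest.

    The same digest always yields the same indices, so identical inputs
    produce an identical sample.
    """
    rng = random.Random(int(fingerprint_hex, 16))  # digest -> integer seed
    k = min(max_cells, n_pixels)  # cannot sample more cells than exist
    return sorted(rng.sample(range(n_pixels), k))


digest = hashlib.sha256(b"config and input bytes").hexdigest()
cells = sample_cells(digest, n_pixels=10_000, max_cells=500)
print(len(cells), cells[:5])
```

Python only guarantees that `random()` itself repeats across versions for a given seed, so a tool that needs byte-identical output across Python versions may prefer a sampling routine whose algorithm it controls.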

Try it now (5 commands)

```shell
# 1. Clone and install
git clone https://github.com/gmarupilla/AgroTerraFlow.git
cd AgroTerraFlow
pip install -e ".[dev]"

# 2. Run the demo
terraflow -c examples/demo_config.yml

# 3. Look at the results
head -5 outputs/demo_run/results.csv
```

Or run the demo from Python:

```python
from pathlib import Path

import pandas as pd

from terraflow.pipeline import main

# Run the pipeline from Python
config_path = Path("examples/demo_config.yml")
main(str(config_path))

# Results are written to outputs/demo_run/results.csv
df = pd.read_csv("outputs/demo_run/results.csv")
print(df.head())
```

Expected output (the exact values depend on which cells are sampled):

```text
cell_id,lat,lon,v_index,mean_temp,total_rain,score,label
0,39.14,-100.82,87.0,20.3,142.1,0.71,high
1,38.55,-99.20,42.0,19.8,138.4,0.44,medium
2,39.88,-97.61,12.0,20.1,135.9,0.23,low
...
```

What the output columns mean

| Column | Meaning |
|---|---|
| cell_id | Index of the sampled pixel within your ROI |
| lat / lon | Geographic coordinates in WGS84 degrees |
| v_index | Raw value from the land-cover raster at this pixel |
| mean_temp | Interpolated temperature (°C) at this location |
| total_rain | Interpolated rainfall (mm) at this location |
| score | Suitability score from 0 (worst) to 1 (best) |
| label | Human-readable tier: low / medium / high |
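A quick sanity check of a finished run can follow the column table directly. The sketch below uses only Python's `csv` module; `check_results` is a hypothetical helper, and it checks only what this page documents: the column names, WGS84 coordinate ranges, a score between 0 and 1, and the three labels.

```python
import csv

REQUIRED_COLUMNS = {"cell_id", "lat", "lon", "v_index", "mean_temp",
                    "total_rain", "score", "label"}
LABELS = {"low", "medium", "high"}


def check_results(path):
    """Validate a results.csv against the documented columns.

    Returns the number of rows checked; raises ValueError on the first problem.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        rows = 0
        for row in reader:
            cell = row["cell_id"]
            lat, lon = float(row["lat"]), float(row["lon"])
            if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
                raise ValueError(f"cell {cell}: ({lat}, {lon}) is not a WGS84 lat/lon")
            if not 0.0 <= float(row["score"]) <= 1.0:
                raise ValueError(f"cell {cell}: score {row['score']} is outside 0..1")
            if row["label"] not in LABELS:
                raise ValueError(f"cell {cell}: unknown label {row['label']!r}")
            rows += 1
    return rows
```

For the demo run, `check_results("outputs/demo_run/results.csv")` returns the number of sampled rows if everything looks right.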

Configuring for your crop

The config file controls everything. Here is a minimal example:

config.yml
```yaml
raster_path: "../data/my_land_cover.tif"  # (1)!
climate_csv: "../data/weather_stations.csv"  # (2)!
output_dir: "../outputs/my_run"

roi:  # (3)!
  type: bbox
  xmin: -101.0   # West boundary (longitude)
  ymin: 38.0     # South boundary (latitude)
  xmax: -94.0    # East boundary (longitude)
  ymax: 40.0     # North boundary (latitude)

model_params:  # (4)!
  v_min: 0.0     # Lowest acceptable vegetation index
  v_max: 255.0   # Highest vegetation index in your raster
  t_min: 10.0    # Minimum suitable temperature (°C)
  t_max: 35.0    # Maximum suitable temperature (°C)
  r_min: 100.0   # Minimum suitable annual rainfall (mm)
  r_max: 800.0   # Maximum suitable annual rainfall (mm)
  w_v: 0.4       # Weight for vegetation score (w_v + w_t + w_r must sum to 1.0)
  w_t: 0.3       # Weight for temperature score
  w_r: 0.3       # Weight for rainfall score

max_cells: 500  # (5)!
```

  1. Path to your land-cover GeoTIFF. Relative paths resolve from the config file's location.
  2. Weather-station CSV with columns lat, lon, mean_temp, total_rain.
  3. Region-of-interest bounding box in WGS84 degrees (longitude/latitude).
  4. Crop-specific thresholds defining optimal ranges for vegetation, temperature, and rainfall.
  5. Number of random locations to sample within the ROI.
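The config rules above (relative paths anchored at the config file, weights summing to 1.0) can be checked before a run. This is a hypothetical sketch, not TerraFlow's loader: it assumes config.yml has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`), the helper names `resolve_paths` and `check_model_params` are invented for illustration, and the requirement that each min be strictly below its max is an assumption this page implies but does not state.

```python
import math
from pathlib import Path

PATH_KEYS = ("raster_path", "climate_csv", "output_dir")


def resolve_paths(config, config_path):
    """Return a copy of config whose relative paths are anchored at the config file's folder."""
    base = Path(config_path).resolve().parent
    resolved = dict(config)
    for key in PATH_KEYS:
        p = Path(config[key])
        resolved[key] = p if p.is_absolute() else (base / p).resolve()
    return resolved


def check_model_params(params):
    """Raise ValueError if the weights don't sum to 1.0 or a min/max range is empty."""
    total = params["w_v"] + params["w_t"] + params["w_r"]
    # Compare with a tolerance: 0.4 + 0.3 + 0.3 is not exactly 1.0 in floating point.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"w_v + w_t + w_r must sum to 1.0, got {total}")
    for prefix in ("v", "t", "r"):
        lo, hi = params[f"{prefix}_min"], params[f"{prefix}_max"]
        if lo >= hi:
            raise ValueError(f"{prefix}_min ({lo}) must be less than {prefix}_max ({hi})")
```

Anchoring paths at the config file's folder is what lets the same config.yml run unchanged from any working directory.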

Save this as config.yml and run:

```shell
terraflow -c config.yml
```

What happens next?

| I want to… | Go to… |
|---|---|
| Understand the results without writing code | Field Guide |
| Customise the config in detail | Configuration Schema |
| Contribute to the codebase | Development Guide |
| Understand the architecture and design decisions | Architecture Overview |
| See the full list of known issues and improvements | AUDIT.md (git-ignored, developers only) |