TerraFlow in 10 Minutes
Everything you need to go from zero to a working suitability map — what it is, why it exists, how it works, and a live run you can follow along with.
What is TerraFlow?
TerraFlow is a command-line tool that answers one question:
"Given a piece of land, how suitable is it for a particular crop or use — right now, given the current climate?"
It takes three inputs:
| Input | What it is | Example |
|---|---|---|
| A land-cover map (raster) | A satellite-derived map of the land, broken into pixels | USDA Cropland Data Layer (CDL) |
| A climate data file | Temperature and rainfall readings from nearby weather stations | CSV with lat, lon, mean_temp, total_rain |
| A configuration file | Your choices: which region, what crop thresholds, how many sites | config.yml |
It produces one output, results.csv: a table in which every row is a sampled location with a suitability score (0–1) and a label (low / medium / high).
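Before a run, it can help to confirm that the climate file carries the four columns listed in the table above. The following is a minimal sketch using only the Python standard library; the function name `missing_climate_columns` and the sample station row are hypothetical and are not part of TerraFlow.

```python
import csv
import io

# Column names taken from the inputs table above.
REQUIRED_COLUMNS = {"lat", "lon", "mean_temp", "total_rain"}


def missing_climate_columns(csv_text: str) -> set[str]:
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = {name.strip() for name in (reader.fieldnames or [])}
    return REQUIRED_COLUMNS - header


# Illustrative data only: one made-up weather station.
sample = "lat,lon,mean_temp,total_rain\n39.1,-100.8,20.3,142.1\n"

print(sorted(missing_climate_columns(sample)))             # []
print(sorted(missing_climate_columns("lat,lon\n1,2\n")))   # ['mean_temp', 'total_rain']
```

To check a real file, read it with `open(path, newline="").read()` and pass the text to the function.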
Why does it exist?
Assessing land suitability is traditionally done by hand — an agronomist looks at soil maps, calls the local weather office, and applies expert judgment to a spreadsheet. That process is:
- Slow — days or weeks for a single region
- Inconsistent — different analysts reach different conclusions
- Not reproducible — the next analyst can't trace exactly what was done
TerraFlow makes it:
- Fast — seconds for hundreds of locations
- Consistent — same config always gives same result
- Fully reproducible — every run is fingerprinted; two people with the same config and data get byte-identical outputs
How does the pipeline work?
```mermaid
flowchart TD
    A[Your config.yml] --> B[Load land-cover raster]
    B --> C[Crop to ROI]
    C --> D[Load climate CSV]
    D --> E[Interpolate climate to each pixel]
    E --> F[Calculate scores]
    F --> G[Apply weighted formula]
    G --> H[Generate labels]
    H --> I[Write results.csv]
    F --> |vegetation × w_v| G
    F --> |temperature × w_t| G
    F --> |rainfall × w_r| G
    I --> J["cell_id | lat | lon | score | label"]
    style A fill:#2d8a55,stroke:#1e5c3a,color:#fff
    style I fill:#2d8a55,stroke:#1e5c3a,color:#fff
    style G fill:#40a86e,stroke:#2d6a4f,color:#fff
```
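The "Calculate scores" and "Apply weighted formula" steps can be sketched as follows. This page does not spell out how each component score is derived, so the sketch assumes a simple linear rescaling of each value into 0–1 between its `*_min` and `*_max` threshold, followed by the weighted sum shown in the diagram. The function names are hypothetical, and scores from this sketch are not expected to match TerraFlow's actual output.

```python
# Thresholds and weights copied from the example config on this page.
PARAMS = {
    "v_min": 0.0, "v_max": 255.0,
    "t_min": 10.0, "t_max": 35.0,
    "r_min": 100.0, "r_max": 800.0,
    "w_v": 0.4, "w_t": 0.3, "w_r": 0.3,
}


def normalise(value: float, lo: float, hi: float) -> float:
    """Assumed rescaling: map [lo, hi] linearly onto [0, 1] and clip outside it."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def weighted_score(v_index: float, mean_temp: float, total_rain: float,
                   p: dict) -> float:
    """Weighted sum of the three component scores (vegetation, temperature, rainfall)."""
    return (
        p["w_v"] * normalise(v_index, p["v_min"], p["v_max"])
        + p["w_t"] * normalise(mean_temp, p["t_min"], p["t_max"])
        + p["w_r"] * normalise(total_rain, p["r_min"], p["r_max"])
    )


# A cell sitting exactly in the middle of every range scores about 0.5.
print(round(weighted_score(127.5, 22.5, 450.0, PARAMS), 3))
```

Because the weights sum to 1.0 and each component is clipped to 0–1, the combined score stays within 0–1 under these assumptions.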
Key Design Choices
- WGS84 output: Coordinates are always in WGS84 degrees (lat/lon) regardless of input projection
- Reproducible sampling: Same config + data always produces identical output via SHA-256 fingerprint seeding
- Portable configs: Relative paths resolve relative to the config file location, not working directory
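The reproducible-sampling idea can be illustrated with a short sketch: hash the config and data bytes with SHA-256, then use the digest to seed a random number generator so the same inputs always select the same cells. Exactly which bytes TerraFlow hashes and how it draws samples is not described here, so `fingerprint` and `sample_cells` are hypothetical helpers showing the general technique only.

```python
import hashlib
import random


def fingerprint(config_bytes: bytes, data_bytes: bytes) -> str:
    """SHA-256 hex digest over the config bytes followed by the data bytes."""
    digest = hashlib.sha256()
    digest.update(config_bytes)
    digest.update(data_bytes)
    return digest.hexdigest()


def sample_cells(n_cells: int, max_cells: int, seed_hex: str) -> list[int]:
    """Pick up to max_cells distinct cell indices, seeded by the fingerprint."""
    rng = random.Random(int(seed_hex, 16))  # same digest gives the same sequence
    k = min(max_cells, n_cells)
    return sorted(rng.sample(range(n_cells), k))


# Placeholder bytes standing in for the real config and input files.
seed = fingerprint(b"max_cells: 5\n", b"example raster and climate bytes")
first = sample_cells(1000, 5, seed)
second = sample_cells(1000, 5, seed)
print(first == second)  # True: identical inputs select identical cells
```

Changing a single byte of either input changes the digest, and therefore the sample, which is what makes the fingerprint useful for tracing a run back to its inputs.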
Try it now (5 commands)
Expected output (values will vary by sampled cells):
```text
cell_id,lat,lon,v_index,mean_temp,total_rain,score,label
0,39.14,-100.82,87.0,20.3,142.1,0.71,high
1,38.55,-99.20,42.0,19.8,138.4,0.44,medium
2,39.88,-97.61,12.0,20.1,135.9,0.23,low
...
```
What the output columns mean
| Column | Meaning |
|---|---|
| `cell_id` | Index of the sampled pixel within your ROI |
| `lat` / `lon` | Geographic coordinates in WGS84 degrees |
| `v_index` | Raw value from the land-cover raster at this pixel |
| `mean_temp` | Interpolated temperature (°C) at this location |
| `total_rain` | Interpolated rainfall (mm) at this location |
| `score` | Suitability score from 0 (worst) to 1 (best) |
| `label` | Human-readable tier: low / medium / high |
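Since results.csv is a plain CSV with the columns described above, it can be summarised with the Python standard library alone. The sketch below counts rows per label and finds the highest-scoring cell, using the three example rows from the expected output; the `summarise` helper is hypothetical.

```python
import csv
import io
from collections import Counter

# The three example rows shown under "Expected output".
SAMPLE = """cell_id,lat,lon,v_index,mean_temp,total_rain,score,label
0,39.14,-100.82,87.0,20.3,142.1,0.71,high
1,38.55,-99.20,42.0,19.8,138.4,0.44,medium
2,39.88,-97.61,12.0,20.1,135.9,0.23,low
"""


def summarise(csv_text: str):
    """Return (count of rows per label, row with the highest score)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    counts = Counter(row["label"] for row in rows)
    best = max(rows, key=lambda row: float(row["score"]))
    return counts, best


counts, best = summarise(SAMPLE)
print(dict(counts))                    # {'high': 1, 'medium': 1, 'low': 1}
print(best["cell_id"], best["score"])  # 0 0.71
```

For a real run, replace `SAMPLE` with the text of the results.csv file written to your output directory.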
Configuring for your crop
The config file controls everything. Here is a minimal example:
```yaml
raster_path: "../data/my_land_cover.tif" # (1)!
climate_csv: "../data/weather_stations.csv" # (2)!
output_dir: "../outputs/my_run"
roi: # (3)!
  type: bbox
  xmin: -101.0 # West boundary (longitude)
  ymin: 38.0 # South boundary (latitude)
  xmax: -94.0 # East boundary (longitude)
  ymax: 40.0 # North boundary (latitude)
model_params: # (4)!
  v_min: 0.0 # Lowest acceptable vegetation index
  v_max: 255.0 # Highest vegetation index in your raster
  t_min: 10.0 # Minimum suitable temperature (°C)
  t_max: 35.0 # Maximum suitable temperature (°C)
  r_min: 100.0 # Minimum suitable annual rainfall (mm)
  r_max: 800.0 # Maximum suitable annual rainfall (mm)
  w_v: 0.4 # Weight for vegetation score (w_v + w_t + w_r must sum to 1.0)
  w_t: 0.3 # Weight for temperature score
  w_r: 0.3 # Weight for rainfall score
max_cells: 500 # How many locations to sample # (5)!
```
1. Path to your land-cover GeoTIFF. Relative paths resolve from config file location.
2. CSV with columns `lat`, `lon`, `mean_temp`, `total_rain` for weather stations.
3. Region of interest bounding box in WGS84 degrees (longitude/latitude).
4. Crop-specific thresholds defining optimal ranges for vegetation, temperature, and rainfall.
5. Number of random locations to sample within the ROI for analysis.
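A few of the constraints stated above (weights summing to 1.0, a bounding box in WGS84 degrees, paths resolved relative to the config file) can be checked before a run. The sketch below is a standalone illustration, not TerraFlow's own validation: `check_config` and `resolve_from_config` are hypothetical helpers, the config is written as a Python dict to avoid a YAML dependency, the min-below-max rule is an assumption, and the path helper handles POSIX-style paths only.

```python
import math
import posixpath
from pathlib import PurePosixPath

# The example config from this page, expressed as a dict.
CONFIG = {
    "raster_path": "../data/my_land_cover.tif",
    "roi": {"type": "bbox", "xmin": -101.0, "ymin": 38.0, "xmax": -94.0, "ymax": 40.0},
    "model_params": {
        "v_min": 0.0, "v_max": 255.0, "t_min": 10.0, "t_max": 35.0,
        "r_min": 100.0, "r_max": 800.0, "w_v": 0.4, "w_t": 0.3, "w_r": 0.3,
    },
    "max_cells": 500,
}


def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    p = cfg["model_params"]
    total = p["w_v"] + p["w_t"] + p["w_r"]
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"weights sum to {total}, expected 1.0")
    for name in ("v", "t", "r"):
        # Assumed rule: each minimum threshold lies strictly below its maximum.
        if p[f"{name}_min"] >= p[f"{name}_max"]:
            problems.append(f"{name}_min must be below {name}_max")
    roi = cfg["roi"]
    if roi["xmin"] >= roi["xmax"] or roi["ymin"] >= roi["ymax"]:
        problems.append("roi bbox is empty or inverted")
    if not (-180 <= roi["xmin"] and roi["xmax"] <= 180
            and -90 <= roi["ymin"] and roi["ymax"] <= 90):
        problems.append("roi bbox is outside WGS84 degree bounds")
    return problems


def resolve_from_config(config_file: str, value: str) -> str:
    """Resolve a path relative to the config file's directory, not the working directory."""
    joined = PurePosixPath(config_file).parent / value  # absolute values pass through
    return posixpath.normpath(str(joined))


print(check_config(CONFIG))  # []
print(resolve_from_config("/proj/configs/config.yml", CONFIG["raster_path"]))
# /proj/data/my_land_cover.tif
```

The `/proj/configs/config.yml` location is a made-up example; substitute the real location of your config file.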
Save this as config.yml and run:
What happens next?
| I want to… | Go to… |
|---|---|
| Understand the results without writing code | Field Guide |
| Customise the config in detail | Configuration Schema |
| Contribute to the codebase | Development Guide |
| Understand the architecture and design decisions | Architecture Overview |
| See the full list of known issues and improvements | AUDIT.md (git-ignored, developers only) |