Reproducibility Guarantees¶
TerraFlow is designed so that a researcher can cite a specific run_fingerprint
and reviewers, collaborators, or auditors can regenerate byte-identical outputs
from the same inputs and configuration. This page documents exactly what
that guarantee covers, what it does not cover, and how to cite a run.
What the run fingerprint includes¶
A run_fingerprint is a deterministic SHA-256 hex digest computed from:
- The entire YAML configuration, canonicalised as JSON (keys sorted,
  booleans/floats normalised). Every field contributes: raster_path,
  raster_band, max_cells, every model_params weight, the complete climate
  block (including interpolation_method and variogram_mode), the roi block,
  and any optional sensitivity / validation / export config.
- The SHA-256 content hash of the input raster (GeoTIFF bytes).
- The SHA-256 content hash of the climate CSV (file bytes).
- The optional reference CSV hash for terraflow validate, when present.
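For concreteness, here is a minimal sketch of that recipe. The helper names,
the canonicalisation call, and the way the parts are concatenated are
illustrative assumptions; only the list of ingredients above comes from
TerraFlow itself.

```python
import hashlib
import json

def file_sha256(path: str) -> str:
    # Only the bytes enter the digest; mtimes and absolute paths never do.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def run_fingerprint(config: dict, raster: str, climate: str,
                    reference: str | None = None) -> str:
    # Canonical JSON: sorted keys, no insignificant whitespace.
    parts = [json.dumps(config, sort_keys=True, separators=(",", ":")),
             file_sha256(raster), file_sha256(climate)]
    if reference is not None:           # only for terraflow validate runs
        parts.append(file_sha256(reference))
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()
```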
The fingerprint is the directory name under <output_dir>/runs/ where all
artifacts land. Identical inputs and config ⇒ identical fingerprint ⇒ the
cache hit path returns the previously computed features.parquet,
manifest.json, and report.json without recomputation.
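The fingerprint therefore doubles as the cache key. A lookup along these
lines (the helper and artifact check are hypothetical, but match the
behaviour described above):

```python
from pathlib import Path

ARTIFACTS = ("features.parquet", "manifest.json", "report.json")

def cached_run(output_dir: str, fingerprint: str) -> Path | None:
    # Identical inputs and config => identical fingerprint => same directory.
    run_dir = Path(output_dir) / "runs" / fingerprint
    if all((run_dir / name).exists() for name in ARTIFACTS):
        return run_dir   # cache hit: artifacts are reused, nothing recomputed
    return None          # cache miss: the pipeline runs and writes run_dir
```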
What the run fingerprint excludes (by design)¶
These are intentionally not part of the fingerprint:
- File mtimes. Copying a file across machines rewrites the mtime; only the content hash is load-bearing.
- Absolute paths. Two users with inputs at different paths but the same content get the same fingerprint.
- Wall-clock time. manifest.json records created_at_utc for audit, but it
  does not feed back into the hash.
- Host metadata. Hostname, username, and the git SHA of the working tree are
  recorded in manifest.json for provenance but never influence the hash.
- Installed package versions. See the limits section below; this is a known
  non-guarantee, not an oversight.
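The first two exclusions are easy to demonstrate: copying an input rewrites
its mtime and gives it a new absolute path, but the content hash is
unchanged. The snippet assumes some climate.csv exists in the working
directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def content_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

src = Path("climate.csv")                       # any existing input file
dst = Path(tempfile.mkdtemp()) / "copied.csv"   # new path, new mtime
shutil.copy(src, dst)
assert content_hash(src) == content_hash(dst)   # same bytes, same fingerprint
```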
What is strongly reproducible (bit-identical)¶
Under the same Python version, same numpy / scipy / pykrige wheels, and
same OS/architecture:
- The sampled set of ROI cells (rng.choice is seeded from the fingerprint).
- The per-cell lat, lon, v_index, mean_temp, total_rain, score, and label
  values.
- The per-cell Monte Carlo confidence-interval columns (score_ci_low,
  score_ci_high) when uncertainty_samples > 0.
- All Sobol' and Morris indices from terraflow sensitivity.
- All spatial-block CV, kappa, and Moran's-I values from terraflow validate.
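How a seed might be derived from the fingerprint is sketched below. The
exact recipe is an assumption, but any pure function of the digest yields
the same property: same fingerprint, same sample.

```python
import numpy as np

def roi_rng(fingerprint: str) -> np.random.Generator:
    # The digest is a pure function of inputs + config, so seeding from it
    # makes the ROI sample reproducible. SeedSequence accepts big integers,
    # so the full 256-bit digest can be used directly.
    return np.random.default_rng(int(fingerprint, 16))

fingerprint = "9f" * 32                          # stand-in 64-char hex digest
cells = roi_rng(fingerprint).choice(10_000, size=500, replace=False)
```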
The sampling path is exercised across the max_cells == n_valid_cells and
max_cells > n_valid_cells boundaries by the regression tests in
tests/test_determinism.py::TestMaxCellsBoundary.
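The rule those tests pin down can be summarised roughly as follows (an
assumed reconstruction, not TerraFlow's source):

```python
import numpy as np

def sample_roi(rng: np.random.Generator, n_valid_cells: int,
               max_cells: int) -> np.ndarray:
    if max_cells >= n_valid_cells:
        # Boundary case: max_cells covers every valid cell, so no sampling
        # happens and the full index set is returned deterministically.
        return np.arange(n_valid_cells)
    return np.sort(rng.choice(n_valid_cells, size=max_cells, replace=False))
```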
Known sources of non-determinism and their limits¶
Reproducibility is a floor, not a ceiling. These are the documented departures:
Variogram fitting (pykrige extended mode)¶
terraflow's extended variogram_mode fits nested variogram families
using scipy.optimize.curve_fit, which invokes the Levenberg–Marquardt
solver. curve_fit is deterministic given identical inputs and initial
guesses, but across different scipy versions the solver's convergence
path can shift by a few ULPs, producing very small differences in
variogram parameters (psill, range_, nugget). Standard-mode
variograms (spherical / exponential / Gaussian) are selected by LOOCV RMSE
and are not affected.
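The mechanism is easy to reproduce in isolation. The spherical model below
mirrors the psill / range_ / nugget parameters named above, but it is a
standalone illustration, not TerraFlow's fitting code:

```python
import numpy as np
from scipy.optimize import curve_fit

def spherical(h, psill, range_, nugget):
    h = np.asarray(h, dtype=float)
    inside = nugget + psill * (1.5 * h / range_ - 0.5 * (h / range_) ** 3)
    return np.where(h < range_, inside, nugget + psill)

lags = np.linspace(1.0, 50.0, 20)
gamma = spherical(lags, 2.0, 30.0, 0.1)
gamma = gamma + np.random.default_rng(0).normal(0.0, 0.02, lags.size)

# With no bounds, curve_fit dispatches to Levenberg-Marquardt. Identical
# inputs and p0 give identical popt; a different scipy release may walk a
# slightly different convergence path and land a few ULPs away.
popt, _ = curve_fit(spherical, lags, gamma, p0=[1.0, 25.0, 0.0])
```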
Delaunay triangulation tie-breaking (scipy)¶
Linear interpolation via scipy.interpolate.griddata relies on
scipy.spatial.Delaunay under the hood (nearest-neighbour interpolation goes
through a KD-tree instead and is unaffected). When four or more stations
lie on a common circle, the triangulation is ambiguous and the specific
triangulation chosen depends on the underlying qhull version. In
practice this manifests only with degenerate station layouts; real-world
networks rarely trigger it.
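A minimal reproduction of the degenerate case uses the four cocircular
corners of the unit square:

```python
import numpy as np
from scipy.spatial import Delaunay

# All four corners lie on one circle, so either diagonal yields a valid
# Delaunay triangulation; which one qhull picks is version-dependent.
stations = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
tri = Delaunay(stations)
print(tri.simplices)   # a tie-break, not a guarantee
```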
Floating-point summation order¶
Across OS / BLAS combinations (Accelerate on macOS, OpenBLAS on Linux,
MKL on Intel), numpy's internal summation order for large reductions
can differ in the last bit. Score values match to within ~1e-12 across
platforms, but they are not bit-identical.
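The practical consequence is to compare cross-platform scores with a
tolerance rather than bit equality. A small order-dependence simulation
shows the scale of the drift:

```python
import numpy as np

rng = np.random.default_rng(42)
vals = rng.normal(size=1_000_000)

# Same reduction, different summation order: last-bit drift, nothing more.
assert np.isclose(vals.sum(), vals[::-1].sum(), rtol=0.0, atol=1e-12)
```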
Cache invalidation on schema bump¶
Running an older TerraFlow against a newer features.parquet (or vice
versa) is handled: the pipeline reads the embedded
terraflow_schema_version on cache hit and invalidates the cache,
logging a WARNING, when the version does not match the current
FEATURES_SCHEMA_VERSION. See
tests/test_pipeline.py::TestCacheSchemaVersionInvalidation.
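A sketch of that check, assuming the version travels in the parquet
file-level metadata (the key name and version literal here are assumptions,
not TerraFlow's source):

```python
import logging
import pyarrow.parquet as pq

FEATURES_SCHEMA_VERSION = "3"            # illustrative current version
log = logging.getLogger("terraflow")

def cache_schema_ok(parquet_path: str) -> bool:
    meta = pq.read_schema(parquet_path).metadata or {}
    stored = meta.get(b"terraflow_schema_version", b"").decode()
    if stored != FEATURES_SCHEMA_VERSION:
        log.warning("cached schema %r != current %r; invalidating cache",
                    stored, FEATURES_SCHEMA_VERSION)
        return False
    return True
```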
Station-coordinate deduplication¶
If the climate CSV contains duplicate lat/lon rows (common with
aggregated NOAA summaries), TerraFlow averages them at
ClimateInterpolator construction. The merge is deterministic and
bit-identical, but it does mean the underlying input CSV and the
effective station set used for kriging are different objects. The
averaging step is logged at INFO level with the before/after station
counts.
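The merge is equivalent to a grouped average along these lines
(illustrative, not the actual ClimateInterpolator code):

```python
import pandas as pd

def dedupe_stations(stations: pd.DataFrame) -> pd.DataFrame:
    # Duplicate lat/lon rows collapse to their mean before kriging sees them;
    # the groupby is deterministic, so the merged station set is
    # bit-identical across runs, even though it differs from the raw CSV.
    return stations.groupby(["lat", "lon"],
                            as_index=False).mean(numeric_only=True)
```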
How to cite a specific run¶
For a publication, cite the run the way you would cite a software version:
Results produced by TerraFlow v0.2.2
(https://pypi.org/project/terraflow-agro/0.2.2),
run_fingerprint=<hex>, inputs sha256:<raster>, <climate>.
All three fields — the package version, the fingerprint, and the input
hashes — are recorded in the manifest.json of every run. The raw
payload can be pasted verbatim into supplementary materials.
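A few lines of Python can assemble that citation straight from a run
directory. The manifest key names below are assumptions based on the fields
this page says the manifest records:

```python
import json
from pathlib import Path

def citation(run_dir: str) -> str:
    m = json.loads((Path(run_dir) / "manifest.json").read_text())
    hashes = m["input_fingerprints"]["sha256"]        # assumed layout
    return (f"TerraFlow v{m['terraflow_version']}, "
            f"run_fingerprint={m['run_fingerprint']}, "
            f"inputs sha256:{', '.join(hashes.values())}")
```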
Reproducibility check-list for reviewers¶
1. Install the exact version: pip install terraflow-agro==0.2.2.
2. Recompute the input hashes: sha256sum <raster.tif> <climate.csv>.
3. Point the config at those paths and run terraflow run -c cfg.yml.
4. Verify that the resulting directory name under runs/ matches the
   published run_fingerprint.
5. Open manifest.json and confirm the input_fingerprints.sha256 values
   match your step-2 hashes.
If the directory name and both input hashes match, the artifacts are
byte-identical to the cited run (within the known floating-point limits
above).
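Steps 2, 4, and 5 can be automated in a few lines (the manifest layout is
assumed, as above):

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify_run(run_dir: str, raster: str, climate: str) -> bool:
    manifest = json.loads((Path(run_dir) / "manifest.json").read_text())
    recorded = set(manifest["input_fingerprints"]["sha256"].values())
    inputs_match = {sha256_of(raster), sha256_of(climate)} <= recorded
    name_matches = Path(run_dir).name == manifest["run_fingerprint"]
    return inputs_match and name_matches
```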