terraflow.stats

The stats module exposes the Pydantic summary models that back report.json plus helper functions for raster and climate statistics.

Public surface

  • RasterSummary, ClimateSummary, RunReport — Pydantic v2 models.
  • summarize_raster, summarize_raster_file, compare_rasters, batch_summarize — helper functions.

API Reference

stats

ClimateSummary

Bases: BaseModel

Simple summary describing climate inputs used for a pipeline run.

RasterSummary

Bases: BaseModel

Summary statistics for a single-band raster (or ROI subset).

This model is Pydantic-based so it can be safely serialized to JSON/YAML and used in downstream reporting or auditing.

from_array(arr) classmethod

Build a RasterSummary from a masked array. Entirely masked arrays yield count=0 and all other fields None.

Source code in terraflow/stats.py
@classmethod
def from_array(cls, arr: np.ma.MaskedArray) -> "RasterSummary":
    """
    Build a RasterSummary from a masked array.
    Entirely masked arrays yield count=0 and all other fields None.
    """
    if arr.count() == 0:
        return cls(count=0, mean=None, std=None, min=None, max=None)

    data = arr.compressed().astype("float64")
    if data.size == 1:
        std_val: Optional[float] = 0.0
    else:
        std_val = float(data.std(ddof=1))

    return cls(
        count=int(data.size),
        mean=float(data.mean()),
        std=std_val,
        min=float(data.min()),
        max=float(data.max()),
    )
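
As a rough illustration of the rules above, the following sketch reproduces the same arithmetic on a small masked array using NumPy only. The helper name summarize_values and the plain-dict return value are stand-ins invented for this example; the real method returns a RasterSummary.

```python
import numpy as np


def summarize_values(arr: np.ma.MaskedArray) -> dict:
    """Hypothetical stand-in mirroring the documented from_array rules."""
    if arr.count() == 0:
        # Every cell is masked: only the count is defined.
        return {"count": 0, "mean": None, "std": None, "min": None, "max": None}

    data = arr.compressed().astype("float64")  # keep unmasked cells only
    # The sample standard deviation (ddof=1) is undefined for a single value,
    # so 0.0 is reported in that case.
    std = 0.0 if data.size == 1 else float(data.std(ddof=1))
    return {
        "count": int(data.size),
        "mean": float(data.mean()),
        "std": std,
        "min": float(data.min()),
        "max": float(data.max()),
    }


# Three valid cells plus one nodata cell (-9999.0) that is masked out.
cells = np.ma.masked_array([1.0, 2.0, 3.0, -9999.0], mask=[False, False, False, True])
summary = summarize_values(cells)

# Edge cases: a fully masked array and a single valid cell.
empty_summary = summarize_values(np.ma.masked_all((2, 2)))
single_summary = summarize_values(np.ma.masked_array([7.5]))
```

The masked nodata value does not influence any statistic, and std uses ddof=1, so it is the sample rather than the population standard deviation.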

RunReport

Bases: BaseModel

Lightweight, serializable report of a TerraFlow run.

This is intentionally generic: it does not depend on the internal dataframe schema, only on aggregated summaries.

is_empty_roi property

Convenience flag: True if the ROI resulted in no valid cells.

batch_summarize(raster_paths, roi=None, roi_crs=None)

Summarize a collection of rasters and return a mapping from file stem to RasterSummary instances.

Parameters:

  • raster_paths (Iterable[str | Path], required): Iterable of paths to raster files.
  • roi (Optional[Dict[str, float]], default None): Optional ROI dict; see summarize_raster for details.
  • roi_crs (Optional[str], default None): CRS of the ROI coordinates; see summarize_raster for details.

Source code in terraflow/stats.py
def batch_summarize(
    raster_paths: Iterable[str | Path],
    roi: Optional[Dict[str, float]] = None,
    roi_crs: Optional[str] = None,
) -> Dict[str, RasterSummary]:
    """
    Summarize a collection of rasters and return a mapping from file stem
    to RasterSummary instances.

    Parameters
    ----------
    raster_paths:
        Iterable of paths to raster files.
    roi:
        Optional ROI dict; see :func:`summarize_raster` for details.
    roi_crs:
        CRS of the ROI coordinates; see :func:`summarize_raster` for details.
    """
    summaries: Dict[str, RasterSummary] = {}

    for path in raster_paths:
        path = Path(path)
        summaries[path.stem] = summarize_raster_file(path, roi=roi, roi_crs=roi_crs)

    return summaries
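
Because the result is keyed by Path.stem, two inputs that share a file name in different directories map to the same key, and the later one overwrites the earlier one. A minimal pre-flight check, assuming you want to detect this before calling batch_summarize, could look like the sketch below. The helper find_duplicate_stems and the example paths are hypothetical and not part of terraflow.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Union


def find_duplicate_stems(
    raster_paths: Iterable[Union[str, Path]],
) -> Dict[str, List[Path]]:
    """Hypothetical helper: group paths by stem and keep only the collisions."""
    by_stem: Dict[str, List[Path]] = defaultdict(list)
    for raw in raster_paths:
        path = Path(raw)
        by_stem[path.stem].append(path)
    # A stem with more than one path would be overwritten in the result dict.
    return {stem: paths for stem, paths in by_stem.items() if len(paths) > 1}


# Made-up paths: the two "ndvi" files share a stem, "dem" is unique.
paths = ["2020/ndvi.tif", "2021/ndvi.tif", "2021/dem.tif"]
duplicates = find_duplicate_stems(paths)
```

If duplicates is non-empty, renaming the files or summarizing each directory separately avoids silently losing a summary.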

compare_rasters(raster_a, raster_b, roi=None, roi_crs=None, mode='difference')

Compare two rasters over an optional ROI and return the resulting array plus summary statistics.

Parameters:

  • raster_a (DatasetReader, required): Open rasterio dataset with the same shape/transform as raster_b.
  • raster_b (DatasetReader, required): Open rasterio dataset with the same shape/transform as raster_a.
  • roi (Optional[Dict[str, float]], default None): Optional ROI dict with keys xmin, ymin, xmax, ymax. Coordinates must be in the CRS given by roi_crs.
  • roi_crs (Optional[str], default None): CRS of the ROI coordinates. When None (default) the native CRS of raster_a is used.
  • mode (Literal['difference', 'ratio'], default 'difference'): "difference" computes (a - b); "ratio" computes a / b for non-zero pixels in b.

Source code in terraflow/stats.py
def compare_rasters(
    raster_a: DatasetReader,
    raster_b: DatasetReader,
    roi: Optional[Dict[str, float]] = None,
    roi_crs: Optional[str] = None,
    mode: Literal["difference", "ratio"] = "difference",
) -> Tuple[np.ma.MaskedArray, RasterSummary]:
    """
    Compare two rasters over an optional ROI and return the resulting array
    plus summary statistics.

    Parameters
    ----------
    raster_a, raster_b:
        Open rasterio datasets with the same shape/transform.
    roi:
        Optional ROI dict with keys xmin, ymin, xmax, ymax.  Coordinates must
        be in the CRS given by *roi_crs*.
    roi_crs:
        CRS of the ROI coordinates.  When ``None`` (default) the native CRS of
        *raster_a* is used.
    mode:
        - "difference": compute (a - b)
        - "ratio": compute a / b for non-zero pixels in b
    """
    if roi is not None:
        # Resolve the ROI CRS once so both rasters are clipped to the same
        # extent; the documented fallback is raster_a's native CRS.
        _roi_crs = roi_crs if roi_crs is not None else raster_a.crs.to_string()
        arr_a, _ = clip_raster_to_roi(raster_a, roi, roi_crs=_roi_crs)
        arr_b, _ = clip_raster_to_roi(raster_b, roi, roi_crs=_roi_crs)
    else:
        arr_a = raster_a.read(1, masked=True)
        arr_b = raster_b.read(1, masked=True)

    if arr_a.shape != arr_b.shape:
        raise ValueError(
            f"Raster shapes differ: {arr_a.shape} vs {arr_b.shape}. "
            "Ensure rasters are aligned before comparison."
        )

    if mode == "difference":
        result = arr_a - arr_b
    elif mode == "ratio":
        result = np.ma.divide(arr_a, arr_b)
    else:
        raise ValueError(f"Unsupported mode: {mode!r}")

    return result, RasterSummary.from_array(result)
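
Both modes rely on NumPy masked-array arithmetic: a cell that is masked in either input stays masked in the result, and np.ma.divide additionally masks cells where the divisor is zero. The sketch below shows that behaviour on plain arrays with made-up values, without any rasterio dataset.

```python
import numpy as np

# Stand-ins for band 1 of two aligned rasters; the last cell of `a` is nodata.
a = np.ma.masked_array([10.0, 20.0, 30.0, 40.0], mask=[False, False, False, True])
b = np.ma.masked_array([2.0, 0.0, 5.0, 4.0])

# "difference" mode: plain subtraction, the mask from `a` carries over.
diff = a - b

# "ratio" mode: np.ma.divide also masks the cell where b == 0.
ratio = np.ma.divide(a, b)

diff_mask = np.ma.getmaskarray(diff).tolist()
ratio_mask = np.ma.getmaskarray(ratio).tolist()
diff_values = diff.compressed().tolist()
ratio_values = ratio.compressed().tolist()
```

Only unmasked cells feed into RasterSummary.from_array, so in this example the ratio summary would be computed from two cells and the difference summary from three.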

summarize_raster(raster, roi=None, roi_crs=None)

Compute summary statistics for band 1 of a raster, optionally clipped to an ROI.

Parameters:

  • raster (DatasetReader, required): Open rasterio dataset.
  • roi (Optional[Dict[str, float]], default None): Optional dict with keys xmin, ymin, xmax, ymax. Coordinates must be in the CRS given by roi_crs. When roi_crs is None the coordinates are assumed to be in the raster's native CRS.
  • roi_crs (Optional[str], default None): EPSG code or WKT string for the CRS of the ROI coordinates. When None (default) the raster's own CRS is used so that native-CRS coordinates are accepted without reprojection.

Source code in terraflow/stats.py
def summarize_raster(
    raster: DatasetReader,
    roi: Optional[Dict[str, float]] = None,
    roi_crs: Optional[str] = None,
) -> RasterSummary:
    """
    Compute summary statistics for band 1 of a raster, optionally clipped to an ROI.

    Parameters
    ----------
    raster:
        Open rasterio dataset.
    roi:
        Optional dict with keys xmin, ymin, xmax, ymax.  Coordinates must be
        in the CRS given by *roi_crs*.  When *roi_crs* is ``None`` the
        coordinates are assumed to be in the raster's native CRS.
    roi_crs:
        EPSG code or WKT string for the CRS of the ROI coordinates.  When
        ``None`` (default) the raster's own CRS is used so that native-CRS
        coordinates are accepted without reprojection.
    """
    if roi is not None:
        _roi_crs = roi_crs if roi_crs is not None else raster.crs.to_string()
        data, _ = clip_raster_to_roi(raster, roi, roi_crs=_roi_crs)
    else:
        data = raster.read(1, masked=True)

    return RasterSummary.from_array(data)
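
The ROI is a plain dict with four float entries. A caller-side sanity check before passing it in might look like the following sketch. The helper check_roi and the coordinate values are hypothetical, and whether terraflow performs similar validation itself is not stated in this reference.

```python
from typing import Dict

REQUIRED_ROI_KEYS = ("xmin", "ymin", "xmax", "ymax")


def check_roi(roi: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: verify the four ROI keys and their ordering."""
    missing = [key for key in REQUIRED_ROI_KEYS if key not in roi]
    if missing:
        raise ValueError(f"ROI is missing keys: {missing}")
    if roi["xmin"] >= roi["xmax"] or roi["ymin"] >= roi["ymax"]:
        raise ValueError("ROI must satisfy xmin < xmax and ymin < ymax")
    # Return a normalized copy containing only the expected keys as floats.
    return {key: float(roi[key]) for key in REQUIRED_ROI_KEYS}


# Made-up bounding box; the CRS these numbers refer to is given via roi_crs.
roi = check_roi({"xmin": -105.3, "ymin": 39.9, "xmax": -105.1, "ymax": 40.1})
```

The checked dict can then be passed as the roi argument, with roi_crs left as None only when the coordinates are already in the raster's native CRS.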

summarize_raster_file(raster_path, roi=None, roi_crs=None)

Convenience wrapper: open a raster, summarize it, and close it.

Parameters:

  • raster_path (str | Path, required): Path to the raster file.
  • roi (Optional[Dict[str, float]], default None): Optional ROI dict; see summarize_raster for details.
  • roi_crs (Optional[str], default None): CRS of the ROI coordinates; see summarize_raster for details.

Source code in terraflow/stats.py
def summarize_raster_file(
    raster_path: str | Path,
    roi: Optional[Dict[str, float]] = None,
    roi_crs: Optional[str] = None,
) -> RasterSummary:
    """
    Convenience wrapper: open a raster, summarize it, and close it.

    Parameters
    ----------
    raster_path:
        Path to the raster file.
    roi:
        Optional ROI dict; see :func:`summarize_raster` for details.
    roi_crs:
        CRS of the ROI coordinates; see :func:`summarize_raster` for details.
    """
    raster_path = Path(raster_path)

    with rasterio.open(raster_path) as ds:
        return summarize_raster(ds, roi=roi, roi_crs=roi_crs)