
Getting Started

The FeatureArrayEstimator

The FeatureArrayEstimator class wraps scikit-learn compatible estimators, allowing you to apply methods like predict, predict_proba, transform, and kneighbors directly to raster data in a variety of formats, including NumPy and Xarray arrays. The FeatureArrayEstimator handles reshaping between the n-dimensional raster coordinates and the 1-dimensional sample coordinates used by estimators, parallelizes operations across Dask chunks, and tracks metadata like band names, NoData values, and spatial references.
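The reshaping step described above can be pictured with plain NumPy. The following is a rough sketch only, not sklearn-raster's actual implementation: it assumes a band-first (band, y, x) layout, and `apply_per_pixel` is a hypothetical helper name introduced for illustration.

```python
import numpy as np


def apply_per_pixel(raster, func):
    """Apply a sample-wise function to a (band, y, x) array.

    `func` is assumed to map (n_samples, n_features) to either
    (n_samples,) or (n_samples, n_targets), like an estimator's predict.
    """
    n_bands, height, width = raster.shape

    # Flatten the spatial dimensions: (band, y, x) -> (y * x, band)
    samples = raster.reshape(n_bands, -1).T

    out = np.asarray(func(samples))
    if out.ndim == 1:
        # Treat a single target as one output band
        out = out[:, np.newaxis]

    # Restore the grid: (y * x, target) -> (target, y, x)
    return out.T.reshape(-1, height, width)


# A toy 3-band, 4 x 5 pixel raster
raster = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5)

# Stand-in for estimator.predict: the mean across bands for each pixel
band_mean = apply_per_pixel(raster, lambda s: s.mean(axis=1))
```

Each pixel becomes one sample with one feature per band, which is the tabular shape scikit-learn estimators expect. The wrapper additionally handles chunking and metadata, which this sketch leaves out.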

Example Usage

To generate predictions from a raster dataset, instantiate a scikit-learn estimator, wrap it in a FeatureArrayEstimator, then fit [1] it with tabular data. The X dataset should include predictor features that correspond to your raster [2] bands. For supervised learning, the y dataset should include one or more targets that will be predicted as output bands.

from sklearn.ensemble import RandomForestRegressor
from sklearn_raster import FeatureArrayEstimator

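# Wrap the estimator before fitting so that target metadata can be recorded
est = FeatureArrayEstimator(RandomForestRegressor(n_estimators=500))
# X: tabular predictor features, y: one or more targets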
est.fit(X, y)
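The snippet above assumes that X and y already exist. As a minimal sketch of what such tabular training data might look like, the block below builds synthetic inputs with pandas. The band names, target names, sample count, and values are all made up for illustration, and it assumes that the column order of X matches the band order of the raster used later for prediction.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_plots = 200  # hypothetical number of sampled locations

# Predictors: one column per raster band (assumed order: red, green, blue)
X = pd.DataFrame(
    rng.uniform(0, 255, size=(n_plots, 3)),
    columns=["red", "green", "blue"],
)

# Targets: two made-up variables loosely tied to the predictors
y = pd.DataFrame(
    {
        "canopy_cover": 0.3 * X["green"] + rng.normal(0, 5, n_plots),
        "biomass": 0.1 * X["red"] + 0.2 * X["blue"] + rng.normal(0, 5, n_plots),
    }
)
```

With two target columns in y, the fitted estimator would be expected to produce two output bands, one per target.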

Once fit, methods like predict can be used to generate georeferenced, gridded outputs from raster inputs.

import rioxarray

da = rioxarray.open_rasterio("rgb_image.tif")
pred = est.predict(da)

Next Steps

User Guide

The user guide contains more information about specific topics.

Tutorials

Run interactive tutorial notebooks to demo features.


  1. Estimators must be wrapped before fitting to allow sklearn-raster to access necessary metadata like the names and number of targets. Wrapping a pre-fit estimator will reset the estimator and raise a warning. 

  2. sklearn-raster works with any gridded data of arbitrary dimensionality, including geospatial rasters, climate data, and biomedical imagery. The user guide generally focuses on geospatial workflows and uses the associated terminology. Gridded input datasets are sometimes referred to generically as feature arrays.