Getting Started
The FeatureArrayEstimator
The FeatureArrayEstimator class wraps scikit-learn-compatible estimators so that methods like predict, predict_proba, transform, and kneighbors can be applied directly to raster data in a variety of formats, including NumPy and Xarray arrays. The FeatureArrayEstimator handles reshaping between the n-dimensional raster coordinates and the 1-dimensional sample coordinates used by estimators, parallelizes operations across Dask chunks, and tracks metadata like band names, NoData values, and spatial references.
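To make the reshaping step concrete, here is a minimal NumPy sketch of the round trip between raster and sample coordinates. It is an illustration of the idea only, not sklearn-raster's implementation; the (band, y, x) layout, the array sizes, and the per-pixel sum standing in for an estimator method are all assumptions made for the example.

```python
import numpy as np

# Hypothetical raster with 3 bands on a 4 x 5 grid, laid out as (band, y, x).
raster = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5)
n_bands, height, width = raster.shape

# Flatten the spatial dimensions so that each pixel becomes one sample (row)
# and each band becomes one feature (column): shape (n_samples, n_features).
samples = raster.reshape(n_bands, -1).T

# Stand-in for an estimator method such as predict: one output per sample.
output = samples.sum(axis=1)

# Restore the spatial dimensions to get a single-band (y, x) output grid.
output_grid = output.reshape(height, width)
```

The wrapper performs this kind of bookkeeping for you, along with the chunking and metadata handling that the sketch leaves out.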
Example Usage
To generate predictions from a raster dataset, instantiate a scikit-learn estimator, wrap it in a FeatureArrayEstimator, then fit [1] it with tabular data. The X dataset should include predictor features that correspond to your raster [2] bands. For supervised estimators, the y dataset should include one or more targets that will be predicted as output bands.
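from sklearn.ensemble import RandomForestRegressor
from sklearn_raster import FeatureArrayEstimator

# Wrap the estimator before fitting so that target metadata can be recorded.
est = FeatureArrayEstimator(RandomForestRegressor(n_estimators=500))

# X and y are your tabular predictor features and targets.
est.fit(X, y)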
Once fit, methods like predict can be used to generate georeferenced, gridded outputs from raster inputs.
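The following sketch shows, by hand, roughly what a raster prediction involves: fitting on tabular data, flattening the raster to samples, skipping NoData pixels, and reshaping the result into one output band per target. It is an illustration under stated assumptions, not the library's code: LeastSquaresRegressor is a hypothetical stand-in for any scikit-learn estimator (used so the example needs only NumPy), and the -9999 NoData value, the (band, y, x) layout, and the synthetic data are invented for the example.

```python
import numpy as np


class LeastSquaresRegressor:
    """Hypothetical stand-in for a scikit-learn regressor (fit/predict only)."""

    def fit(self, X, y):
        # Append an intercept column and solve the least-squares problem.
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([X, np.ones(len(X))])
        return A @ self.coef_


rng = np.random.default_rng(0)

# Synthetic tabular training data: 100 samples, 3 features, 2 targets.
X = rng.normal(size=(100, 3))
y = np.column_stack([X @ [1.0, 2.0, 3.0], X @ [0.5, -1.0, 0.0] + 4.0])
est = LeastSquaresRegressor().fit(X, y)

# Synthetic raster with one band per feature, laid out as (band, y, x).
raster = rng.normal(size=(3, 6, 8))
nodata = -9999.0  # assumed NoData value
raster[:, 0, 0] = nodata

# Flatten to (n_samples, n_features) and flag pixels valid in every band.
samples = raster.reshape(3, -1).T
valid = (samples != nodata).all(axis=1)

# Predict only the valid pixels; the rest stay filled with NoData.
flat_pred = np.full((samples.shape[0], y.shape[1]), nodata)
flat_pred[valid] = est.predict(samples[valid])

# Reshape to (target, y, x): one output band per target.
prediction = flat_pred.T.reshape(y.shape[1], 6, 8)
```

Calling predict on the wrapped estimator is meant to replace this manual bookkeeping while also carrying band names and spatial references through to the output.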
Next Steps
User Guide
The user guide contains more information about specific topics like:
- Which estimators are compatible with sklearn-raster
- Supported raster formats and their pros and cons
- How metadata like spatial references, band names, and NoData masks are handled
- Performance tips for working with large datasets and Dask
- How sklearn-raster compares to related packages like sklearn-xarray, dask-ml, and scikit-map
Tutorials
Run interactive tutorial notebooks to demo features like:
- Supervised classification and regression
- Unsupervised clustering
- Dimensionality reduction with pipelines
[1] Estimators must be wrapped before fitting to allow sklearn-raster to access necessary metadata like the names and number of targets. Wrapping a pre-fit estimator will reset the estimator and raise a warning.
[2] sklearn-raster works with any gridded data of arbitrary dimensionality, including geospatial rasters, climate data, and biomedical imagery. The user guide generally focuses on geospatial workflows and uses associated terminology. Gridded input datasets are sometimes generically referred to as feature arrays.