A game theory approach to measuring spatial effects from machine learning models. GeoShapley is built on the Shapley value and the Kernel SHAP estimator.
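Because GeoShapley builds on the Shapley value, a brute-force version may help show what is being estimated. The following minimal sketch computes exact Shapley values for one instance, assuming the model is exposed as a predict function and that a background sample supplies values for "absent" features. Grouping the coordinate columns into a single location player is an illustrative assumption, not the package's implementation. Kernel SHAP exists to approximate this sum without enumerating every coalition.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, background, players):
    """Exact Shapley values for a single instance x.

    predict    -- callable mapping a 2-D array to a 1-D array of predictions
    x          -- 1-D array, the instance to explain
    background -- 2-D array; absent players take their values from these rows
    players    -- list of column-index lists; one player may own several columns
    """
    n = len(players)

    def value(coalition):
        # Columns owned by players in the coalition come from x;
        # all other columns keep their background values. Average the predictions.
        data = background.copy()
        for p in coalition:
            data[:, players[p]] = x[players[p]]
        return predict(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi, value(())


def predict(X):
    # Toy linear model: one feature x1, then coordinates u, v as the last columns.
    return 3 * X[:, 0] + X[:, 1] + 2 * X[:, 2]


x = np.array([1.0, 2.0, 3.0])
background = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
players = [[0], [1, 2]]  # x1 alone; (u, v) grouped as one location player
phi, base = shapley_values(predict, x, background, players)
# Efficiency: the base value plus all contributions recovers the prediction.
print(phi, base, base + phi.sum())
```

The number of coalitions grows as 2 to the power of the number of players, which is why sampling-based estimators such as Kernel SHAP are used in practice.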
GeoShapley can be installed from PyPI:
$ pip install geoshapley
To install the latest version from GitHub:
$ pip install git+https://github.com/ziqi-li/geoshapley.git
GeoShapley can explain any model that takes tabular data plus spatial features (e.g., coordinates) as input. Examples of natively supported models include:
- XGBoost/CatBoost/LightGBM
- Random Forest
- MLP or other scikit-learn modules
- TabNet
- Explainable Boosting Machine
- Statistical models: OLS/Gaussian Process/GWR
Other models can be supported by defining a helper prediction function (e.g., model.predict()) that wraps the original model's prediction or inference function.
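As a rough illustration of such a helper, the sketch below wraps a hypothetical model whose inference call takes features and coordinates separately and returns a column vector. It assumes the explainer calls the prediction function on a 2-D NumPy array (the example further down passes `background` as `.values`) and expects a 1-D array back. `ToyModel`, `infer`, and `make_predict` are invented names for illustration only.

```python
import numpy as np


class ToyModel:
    """Hypothetical model whose inference call does not accept one 2-D array."""

    def __init__(self, coef, loc_coef):
        self.coef = np.asarray(coef, dtype=float)
        self.loc_coef = np.asarray(loc_coef, dtype=float)

    def infer(self, features, coords):
        # Returns an (n, 1) column, as many deep-learning frameworks do.
        return (features @ self.coef + coords @ self.loc_coef)[:, None]


def make_predict(model, n_coords=2):
    """Wrap model.infer as predict(X) -> 1-D array of length n_samples."""

    def predict(X):
        X = np.asarray(X, dtype=float)
        # Coordinates are assumed to be the last n_coords columns.
        features, coords = X[:, :-n_coords], X[:, -n_coords:]
        return model.infer(features, coords).ravel()

    return predict


model = ToyModel(coef=[1.0, -2.0], loc_coef=[0.5, 0.5])
predict = make_predict(model)
X = np.array([[1.0, 0.0, 2.0, 4.0],
              [0.0, 1.0, 0.0, 0.0]])
print(predict(X))  # expected values: 4.0 and -2.0
```

The wrapped function would then be passed where `mlp_model.predict` appears in the example, e.g. `GeoShapleyExplainer(predict, background)`.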
Currently, spatial features (e.g., coordinates) need to be placed in the last columns of your pandas.DataFrame (X_geo).
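If the coordinates are not already last, plain pandas column selection can reorder them. A small sketch, assuming a hypothetical table whose coordinate columns are named `u` and `v`:

```python
import pandas as pd

# Hypothetical raw table: coordinate columns "u" and "v" are not last.
df = pd.DataFrame({
    "u": [0.0, 1.0, 2.0],
    "income": [30.0, 45.0, 52.0],
    "v": [5.0, 6.0, 7.0],
    "age": [40, 35, 29],
})

coord_cols = ["u", "v"]  # assumed names of the spatial features
feature_cols = [c for c in df.columns if c not in coord_cols]

# Non-spatial features first, coordinates as the last columns.
X_geo = df[feature_cols + coord_cols]
print(list(X_geo.columns))  # ['income', 'age', 'u', 'v']
```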
The example below shows how to explain a trained MLP model. More examples can be found in the notebooks folder.
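The example assumes `X_geo` and `y` already exist. For a self-contained trial run, a synthetic dataset could be built as sketched here; the column names, grid size, and data-generating process are all assumptions for illustration, not data shipped with the package.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical 25 x 25 grid of locations; u and v are the coordinates.
side = 25
u, v = np.meshgrid(np.arange(side), np.arange(side))
u, v = u.ravel().astype(float), v.ravel().astype(float)
n = u.size

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Spatially varying coefficient on x1 plus an intrinsic location effect in u.
beta1 = 1 + (u + v) / (2 * (side - 1))
y = beta1 * x1 + 0.5 * x2 + np.sin(u / 4) + rng.normal(scale=0.1, size=n)

# Non-spatial features first, coordinates last.
X_geo = pd.DataFrame({"x1": x1, "x2": x2, "u": u, "v": v})
```

With 625 rows, a default `train_test_split` leaves enough training rows to draw a 100-row background sample.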
from geoshapley import GeoShapleyExplainer
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
# X_geo: features with the coordinates as the last columns; y: the target
X_train, X_test, y_train, y_test = train_test_split(X_geo, y, random_state=1)
# Fit a neural network (MLP) model on the training data
mlp_model = MLPRegressor().fit(X_train, y_train)
# Draw a small background sample (100 rows) from the training data
background = X_train.sample(100).values
# Initialize a GeoShapleyExplainer with the prediction function and the background data
mlp_explainer = GeoShapleyExplainer(mlp_model.predict, background)
# Explain the data
mlp_rslt = mlp_explainer.explain(X_geo)
# Make a SHAP-style summary plot
mlp_rslt.summary_plot()
# Make partial dependence plots of the primary (non-spatial) effects
mlp_rslt.partial_dependence_plots()
# Calculate spatially varying explanations
mlp_svc = mlp_rslt.get_svc()
Li, Z. (2024). GeoShapley: A Game Theory Approach to Measuring Spatial Effects in Machine Learning Models. Annals of the American Association of Geographers. Open access at: https://www.tandfonline.com/doi/full/10.1080/24694452.2024.2350982