Kernel Explainer

gshap.KernelExplainer

class gshap.KernelExplainer(model, data, g=lambda x: x.mean())

The Kernel Explainer is a model-agnostic method of approximating G-SHAP values.

Parameters: model : callable

Callable that takes a (# observations, # features) matrix and returns an output that is passed to g. For ordinary SHAP, the model returns a (# observations, # targets) output array.

data : numpy.array or pandas.DataFrame or pandas.Series

Background dataset from which values are randomly sampled to simulate absent features.

g : callable, default=lambda x: x.mean()

Callable that takes the model output and returns a scalar. Defaults to the mean of the output, which yields classical SHAP values.
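To make the contract between model and g concrete, here is a minimal sketch that uses only NumPy. The linear model, its coefficients, and the function names g_mean and g_share_positive are hypothetical and chosen for illustration; they are not part of gshap.

```python
import numpy as np

# Hypothetical stand-in for a fitted model's predict function: it takes a
# (# observations, # features) matrix and returns one output per row.
def model(X):
    coef = np.array([2.0, -1.0, 0.5])
    return np.asarray(X) @ coef

# Same behavior as the default g: the mean of the model output.
def g_mean(output):
    return float(np.mean(output))

# A different general function: the share of predictions above zero.
def g_share_positive(output):
    return float(np.mean(output > 0))

X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [1.0, 1.0, 1.0]])
output = model(X)  # shape (3,)
print(g_mean(output), g_share_positive(output))
```

Any callable with this shape could be passed as g, as long as it maps the model output to a single scalar.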

Attributes: model : callable

Set from the model parameter.

data : numpy.array

Set from the data parameter. If data is a pandas object, it is automatically converted to a numpy.array.

g : callable

Set from the g parameter.
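The role of the background data can be illustrated with a short NumPy sketch. It shows the general idea of simulating absent features by filling them with values from randomly drawn background rows. The helper simulate_absent is hypothetical, and the library's internal sampling may differ.

```python
import numpy as np

def simulate_absent(X, background, present, rng):
    """Keep the `present` feature columns from X and fill every other
    column with values taken from randomly drawn background rows."""
    X = np.asarray(X, dtype=float)
    background = np.asarray(background, dtype=float)
    # One random background row per observation in X.
    rows = rng.integers(0, background.shape[0], size=X.shape[0])
    hybrid = background[rows]  # integer-array indexing returns a new array
    hybrid[:, present] = X[:, present]
    return hybrid

rng = np.random.default_rng(0)
background = np.zeros((50, 3))  # toy background: all zeros
X = np.ones((4, 3))             # toy data to explain: all ones
hybrid = simulate_absent(X, background, present=[0, 2], rng=rng)
print(hybrid)
```

In this toy case, columns 0 and 2 keep their values from X, while column 1 is replaced by background values.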

Examples

This example shows how to compute classical SHAP values.

import gshap

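# Note: load_boston was removed in scikit-learn 1.2, so this example
# requires an older scikit-learn release.
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression

# Fit a linear regression and explain its mean prediction.
X, y = load_boston(return_X_y=True)
reg = LinearRegression().fit(X, y)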
explainer = gshap.KernelExplainer(
    model=reg.predict, data=X, g=lambda x: x.mean()
)
explainer.gshap_values(X, nsamples=1000)

Out:

array([-8.52873964e-04, -4.90442234e-04,  9.42836482e-05,  3.98231297e-04,
    2.03149964e-03,  3.93086231e-03, -7.38176865e-06,  3.81400727e-03,
    5.19437337e-03, -1.34661588e-03,  7.08535145e-04,  1.50486721e-03,
   -8.28480438e-03])

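As expected, all SHAP values are approximately 0 for linear regression: X serves as both the background data and the data being explained, so the small nonzero values above reflect approximation error. We can see this by comparing the mean prediction for the original data X with the mean prediction for the shuffled background data explainer.data.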

explainer.compare(X, bootstrap_samples=1000)

Out:

22.53280632411067, 22.52089950825812

Methods

compare(self, X, bootstrap_samples=1000)

Compares the background data self.data to the comparison data X in terms of the general function self.g.

Parameters: X : numpy.array or pandas.Series or pandas.DataFrame

(# samples, # features) matrix of comparison data.

bootstrap_samples : int, default=1000

Number of bootstrapped samples for computing g of the background data.

Returns: g_comparison : float

g(model(X)), where X is the comparison data.

g_background : float

g(model(X_b)), where X_b is the shuffled background data.
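The two return values can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library's implementation: the function compare_sketch is hypothetical, and it assumes that the shuffled background X_b is formed by drawing bootstrap_samples rows from the background data with replacement.

```python
import numpy as np

def compare_sketch(model, g, X, background, bootstrap_samples=1000, seed=0):
    """Return (g_comparison, g_background) for comparison data X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    background = np.asarray(background, dtype=float)
    g_comparison = float(g(model(X)))
    # Assumption: X_b is a bootstrap draw of rows from the background data.
    rows = rng.integers(0, background.shape[0], size=bootstrap_samples)
    g_background = float(g(model(background[rows])))
    return g_comparison, g_background

# Toy inputs: the model sums the features and g is the mean.
toy_model = lambda X: X.sum(axis=1)
toy_g = lambda output: output.mean()
g_cmp, g_bg = compare_sketch(
    toy_model, toy_g, X=np.ones((10, 3)), background=np.zeros((50, 3))
)
print(g_cmp, g_bg)
```

The gap between the two values is the quantity that G-SHAP values attribute to individual features.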

gshap_values(self, X, **kwargs)

Compute G-SHAP values for all features.

Parameters: X : numpy.array or pandas.DataFrame or pandas.Series

A (# samples, # features) matrix.

nsamples : scalar or 'auto', default='auto'

Number of samples to draw when approximating G-SHAP values.

Returns: gshap_values : numpy.array

(# features,) vector of G-SHAP values ordered by feature index.

gshap_value(self, j, X, **kwargs)

Compute the G-SHAP value for feature j.

Parameters: j : scalar or column name

The index or column name of the feature of interest.

X : numpy.array or pandas.DataFrame or pandas.Series

A (# samples, # features) matrix.

nsamples : scalar or 'auto', default='auto'

Number of samples to draw when approximating the G-SHAP value.

Returns: gshap_value : float

Approximated G-SHAP value for feature j.
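For intuition about what a sampled approximation of a single feature's value involves, the sketch below uses plain permutation sampling in NumPy. This is a generic Shapley-style estimator swapped in for illustration, not the kernel-based method this class uses. The function gshap_value_sketch and its arguments are hypothetical.

```python
import numpy as np

def gshap_value_sketch(model, g, j, X, background, nsamples=200, seed=0):
    """Monte Carlo estimate of feature j's contribution to g(model(X))."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    background = np.asarray(background, dtype=float)
    n_obs, n_features = X.shape
    total = 0.0
    for _ in range(nsamples):
        # Random feature ordering; features placed before j count as present.
        order = rng.permutation(n_features)
        position = int(np.where(order == j)[0][0])
        before_j = order[:position]
        # Absent features take values from randomly drawn background rows.
        rows = rng.integers(0, background.shape[0], size=n_obs)
        without_j = background[rows]  # new array, background is untouched
        without_j[:, before_j] = X[:, before_j]
        with_j = without_j.copy()
        with_j[:, j] = X[:, j]
        # Marginal effect on g of switching feature j from absent to present.
        total += g(model(with_j)) - g(model(without_j))
    return total / nsamples

coef = np.array([2.0, -1.0, 0.5])
toy_model = lambda X: X @ coef
toy_g = lambda output: float(np.mean(output))
background = np.zeros((20, 3))
X = np.zeros((5, 3))
X[:, 0] = 1.0  # only feature 0 differs from the background
v0 = gshap_value_sketch(toy_model, toy_g, 0, X, background)
v1 = gshap_value_sketch(toy_model, toy_g, 1, X, background)
print(v0, v1)
```

In this toy setup only feature 0 differs between X and the background, so the whole shift in the mean prediction is attributed to feature 0 and none to feature 1.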