Kernel Explainer
gshap.KernelExplainer
class gshap.KernelExplainer(model, data, g=lambda x: x.mean())
The Kernel Explainer is a model-agnostic method of approximating G-SHAP values.
Parameters:
model : callable
    Callable which takes a (# observations, # features) matrix and returns an output which will be fed into g.
data
    Background dataset from which values are randomly sampled to simulate absent features.
g : callable, default=lambda x: x.mean()
    Callable which takes the output of model and returns a scalar.
Attributes:
model : callable
    Set from the model parameter.
data
    Set from the data parameter.
g : callable
    Set from the g parameter.
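The parameters above imply a simple contract: model maps a (# observations, # features) matrix to one output per row, and g reduces those outputs to a single scalar, so g(model(X)) is one number. The sketch below illustrates that contract in plain Python without calling gshap. toy_model and positive_rate are hypothetical names chosen for illustration: toy_model stands in for a fitted model, mean is a plain-Python stand-in for the default g=lambda x: x.mean(), and positive_rate is an example of a different g that would explain the fraction of positive predictions rather than the mean prediction.

```python
def toy_model(X):
    """Hypothetical model: one output per row of a (# observations, # features) matrix."""
    return [2 * x[0] - x[1] for x in X]

def mean(outputs):
    """Plain-Python stand-in for the default g=lambda x: x.mean()."""
    return sum(outputs) / len(outputs)

def positive_rate(outputs):
    """Hypothetical alternative g: fraction of outputs greater than 0."""
    return sum(out > 0 for out in outputs) / len(outputs)

X = [[1.0, 0.0], [0.0, 3.0], [2.0, 1.0], [1.0, 4.0]]
outputs = toy_model(X)         # one output per row
print(outputs)                 # [2.0, -3.0, 3.0, -2.0]
print(mean(outputs))           # 0.0
print(positive_rate(outputs))  # 0.5
```

Either function could be passed as g; each summarizes the same model outputs as a different single number, and that number is what the G-SHAP values attribute to the features.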
Examples
This example shows how to compute classical SHAP values.
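import gshap
# load_boston was removed in scikit-learn 1.2, so this example needs an earlier scikit-learn version
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
# Fit an ordinary least-squares model on the Boston housing data
X, y = load_boston(return_X_y=True)
reg = LinearRegression().fit(X, y)
# Use X itself as the background data and explain the mean prediction
explainer = gshap.KernelExplainer(
    model=reg.predict, data=X, g=lambda x: x.mean()
)
explainer.gshap_values(X, nsamples=1000)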
Out:
array([-8.52873964e-04, -4.90442234e-04, 9.42836482e-05, 3.98231297e-04,
2.03149964e-03, 3.93086231e-03, -7.38176865e-06, 3.81400727e-03,
5.19437337e-03, -1.34661588e-03, 7.08535145e-04, 1.50486721e-03,
-8.28480438e-03])
As expected, all SHAP values are approximately 0 for linear regression, because the comparison data X is also the background data. We can see this when we compare the mean prediction for the original data X to the mean prediction for the shuffled background data explainer.data.
explainer.compare(X, bootstrap_samples=1000)
Out:
22.53280632411067, 22.52089950825812
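The near-zero values can be checked by hand. The derivation below assumes, as an illustration rather than a description of gshap's internals, that a coalition of features S is scored as v(S) = g(model(X_S)), where X_S keeps the features in S from X and takes the remaining features from sampled background rows, and that g is the mean. For a linear model with p features, intercept \beta_0 and coefficients \beta_k, write \bar{x}_k and \bar{b}_k for the column means of X and of the sampled background rows:

```latex
\begin{aligned}
v(S) &= \beta_0 + \sum_{k \in S} \beta_k \bar{x}_k + \sum_{k \notin S} \beta_k \bar{b}_k, \\
v(S \cup \{j\}) - v(S) &= \beta_j \left( \bar{x}_j - \bar{b}_j \right) \quad \text{for every } S \text{ with } j \notin S, \\
\phi_j &= \sum_{S \subseteq \{1,\dots,p\} \setminus \{j\}} \frac{|S|!\,(p - |S| - 1)!}{p!} \left[ v(S \cup \{j\}) - v(S) \right] = \beta_j \left( \bar{x}_j - \bar{b}_j \right).
\end{aligned}
```

The last step uses the fact that the Shapley weights over all S not containing j sum to 1. Because data=X in the example, \bar{b}_j matches \bar{x}_j up to sampling noise, so every \phi_j is close to 0; by the same linearity, the two numbers returned by compare agree closely.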
Methods
compare(self, X, bootstrap_samples=1000)
Compares the background data self.data to the comparison data X in terms of the general function self.g.
Parameters:
X : numpy.array or pandas.Series or pandas.DataFrame
    (# samples, # features) matrix of comparison data.
bootstrap_samples : int, default=1000
    Number of bootstrapped samples for computing g_background.
Returns:
g_comparison : float
    g(model(X)), where X is the comparison data.
g_background : float
    g(model(X_b)), where X_b is the shuffled background data.
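compare returns two scalars: g applied to the model's outputs on X, and g applied to the model's outputs on shuffled background data. The sketch below is one plausible reading of that description, not gshap's code. The function compare_sketch is hypothetical; it assumes each shuffled background matrix is built by resampling every background column independently with replacement (so no observation's features stay together), repeats this bootstrap_samples times, and averages the resulting g values. Both the column-wise resampling and the averaging are assumptions.

```python
import random

def compare_sketch(model, data, X, g, bootstrap_samples=1000, seed=0):
    """Hypothetical re-creation of compare: returns (g_comparison, g_background)."""
    rng = random.Random(seed)
    g_comparison = g(model(X))
    n_features = len(data[0])
    estimates = []
    for _ in range(bootstrap_samples):
        # Assumption: resample each background column independently, with replacement.
        columns = [[rng.choice(data)[k] for _ in range(len(X))] for k in range(n_features)]
        X_b = [list(row) for row in zip(*columns)]
        estimates.append(g(model(X_b)))
    # Assumption: average g over the bootstrapped shuffled matrices.
    g_background = sum(estimates) / len(estimates)
    return g_comparison, g_background

def linear_model(X):
    # Hypothetical model: f(x) = x0 + x1
    return [x[0] + x[1] for x in X]

def mean(outputs):
    return sum(outputs) / len(outputs)

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
g_comparison, g_background = compare_sketch(linear_model, X, X, mean)
print(g_comparison)  # 3.0
print(g_background)  # close to 3.0, up to bootstrap noise
```

For a linear model with g set to the mean and data equal to X, the two numbers agree up to bootstrap noise, which is the same pattern as the compare output in the example above. For a model with feature interactions, breaking the rows apart can move g_background away from g_comparison.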
gshap_values(self, X, **kwargs)
Compute G-SHAP values for all features.
Parameters:
X : numpy.array or pandas.DataFrame or pandas.Series
    A (# samples, # features) matrix.
nsamples : scalar or 'auto', default='auto'
    Number of samples to draw when approximating G-SHAP values.
Returns:
gshap_values : np.array
    (# features,) vector of G-SHAP values ordered by feature index.
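With only a handful of features, the quantity that gshap_values approximates can be computed exactly by enumerating every coalition with the Shapley weights. The sketch below does that by brute force for illustration; it is not the Kernel Explainer's approximation method, and its value function is an assumption: the hypothetical helper mask keeps the features in a coalition from X and fills the rest from background rows sampled without replacement (one per observation), and the coalition is scored as g(model(masked matrix)). exact_gshap_values is likewise a hypothetical name. Its cost grows as 2^(# features), which is why an approximation is needed in practice.

```python
import itertools
import math
import random

def mask(X, background, coalition, rng):
    """Keep features in `coalition` from X; fill the rest from sampled background rows."""
    # Assumption: one background row per observation, drawn without replacement,
    # so len(background) must be at least len(X).
    rows = rng.sample(background, len(X))
    return [[x[k] if k in coalition else b[k] for k in range(len(x))]
            for x, b in zip(X, rows)]

def exact_gshap_values(model, X, background, g, seed=0):
    """Brute-force Shapley values of v(S) = g(model(mask(X, background, S)))."""
    n = len(X[0])

    def v(coalition):
        # Reusing the seed makes every coalition see the same background rows.
        return g(model(mask(X, background, coalition, random.Random(seed))))

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                S = set(subset)
                phi[j] += weight * (v(S | {j}) - v(S))
    return phi

def linear_model(X):
    # Hypothetical model: f(x) = 2*x0 - 3*x1 + 1
    return [2 * x[0] - 3 * x[1] + 1 for x in X]

def mean(outputs):
    return sum(outputs) / len(outputs)

X = [[1, 2], [3, 4]]
background = [[0, 0], [2, 2]]
phi = exact_gshap_values(linear_model, X, background, mean)
print(phi)  # [2.0, -6.0]
print(sum(phi), mean(linear_model(X)) - mean(linear_model(background)))  # -4.0 -4.0
```

The printed values match the linear-model result beta_j * (mean of X_j - mean of background_j): 2 * (2 - 1) = 2 and -3 * (3 - 1) = -6. They also sum to the gap between g on X and g on the background, which is the efficiency property of Shapley values.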
gshap_value(self, j, X, **kwargs)
Compute the G-SHAP value for feature j.
Parameters:
j : scalar or column name
    The index or column name of the feature of interest.
X : numpy.array or pandas.DataFrame or pandas.Series
    A (# samples, # features) matrix.
nsamples : scalar or 'auto', default='auto'
    Number of samples to draw when approximating G-SHAP values.
Returns:
gshap_value : float
    Approximated G-SHAP value for feature j.
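gshap_value estimates the value of a single feature, with nsamples trading accuracy for speed. A common way to estimate one Shapley value from a fixed sample budget is permutation sampling: draw random feature orderings and average feature j's marginal contribution when it joins the features ahead of it. The sketch below uses that technique as a swapped-in estimator, not necessarily what the Kernel Explainer does internally. It assumes a value function in which a coalition is scored as g(model(masked X)), where the hypothetical helper mask keeps the coalition's features from X and fills the rest from background rows sampled without replacement; sampled_gshap_value is also a hypothetical name.

```python
import random

def mask(X, background, coalition, rng):
    """Keep features in `coalition` from X; fill the rest from sampled background rows."""
    # Assumption: one background row per observation, drawn without replacement.
    rows = rng.sample(background, len(X))
    return [[x[k] if k in coalition else b[k] for k in range(len(x))]
            for x, b in zip(X, rows)]

def sampled_gshap_value(j, model, X, background, g, nsamples=100, seed=0):
    """Permutation-sampling estimate of feature j's value under the assumed v(S)."""
    rng = random.Random(seed)
    n = len(X[0])
    total = 0.0
    for _ in range(nsamples):
        order = list(range(n))
        rng.shuffle(order)
        before_j = set(order[:order.index(j)])
        # Share the background rows between both terms to reduce variance.
        row_seed = rng.getrandbits(32)
        with_j = g(model(mask(X, background, before_j | {j}, random.Random(row_seed))))
        without_j = g(model(mask(X, background, before_j, random.Random(row_seed))))
        total += with_j - without_j
    return total / nsamples

def linear_model(X):
    # Hypothetical model: f(x) = 2*x0 - 3*x1 + 1
    return [2 * x[0] - 3 * x[1] + 1 for x in X]

def mean(outputs):
    return sum(outputs) / len(outputs)

X = [[1, 2], [3, 4]]
background = [[0, 0], [2, 2]]
print(sampled_gshap_value(0, linear_model, X, background, mean, nsamples=50))  # 2.0
```

For a linear model with g set to the mean, every sampled marginal contribution is identical, so the estimate is exact for any nsamples; for a model with feature interactions, the estimate varies with nsamples and seed, and larger nsamples reduces that variance.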