Intergroup differences
For examples and interpretation, see my notebook on intergroup difference explanations.
gshap.intergroup.IntergroupDifference
class gshap.intergroup.IntergroupDifference(group, distance='absolute_mean_distance') [source]
This class measures the distance between the distributions of predicted outcomes for two groups, an ingroup and an outgroup.
Parameters:
group : numpy.array or pandas.Series
    (# observations,) array of boolean or binary values indicating group membership.
distance : callable or str, default='absolute_mean_distance'
    Distance function that takes two arrays of model output, one for the outgroup and one for the ingroup, and returns a scalar. The arrays will usually have shape (# outgroup,) and (# ingroup,), or (# outgroup, # classes) and (# ingroup, # classes). A string, as in the example below, names a distance function defined in this module.
Attributes:
group : numpy.array
    Set from the group argument.
distance : callable
    Set from the distance argument.
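Because distance accepts a callable, a custom measure can stand in for a built-in name. The following is a minimal sketch, assuming the callable receives the outgroup output first and the ingroup output second (the same out_0, out_1 order used by the distance functions documented further down). The name median_distance is hypothetical and not part of gshap.

```python
import numpy as np

def median_distance(out_0, out_1):
    """Hypothetical custom distance: difference in median model output.

    out_0 : model output for outgroup observations
    out_1 : model output for ingroup observations
    """
    return float(np.median(out_1) - np.median(out_0))

# Toy arrays standing in for model predictions
outgroup = np.array([0.1, 0.2, 0.3])
ingroup = np.array([0.4, 0.6, 0.8])
d = median_distance(outgroup, ingroup)  # median 0.6 minus median 0.2
```

Under that assumption, such a function would be passed as IntergroupDifference(group=X['black'], distance=median_distance).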
Examples
import gshap
from gshap.datasets import load_recidivism
from gshap.intergroup import IntergroupDifference
from sklearn.svm import SVC

# Load the recidivism data and fit a classifier
recidivism = load_recidivism()
X, y = recidivism.data, recidivism.target
clf = SVC().fit(X, y)

# Distance between predicted outcomes for the two groups marked by X['black']
g = IntergroupDifference(group=X['black'], distance='relative_mean_distance')
# Attribute that intergroup difference to the individual features
explainer = gshap.KernelExplainer(clf.predict, X, g)
explainer.gshap_values(X, nsamples=10)
Out:
array([ 0.01335252, 0.24884556, 0.00132373, -0.0025238 , -0.00151837,
0.40453822, 0.01636782, 0.07666043, -0.00056414, 0.00966583])
Methods
__call__(self, output) [source]
Compute distance measure between groups.
Parameters:
output : numpy.array or pandas.Series
    Model output, usually a (# observations,) vector or a (# observations, # classes) array.
Returns:
distance : scalar
    Measure of the distance between the distributions of predicted outputs for outgroup and ingroup observations.
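To make the description concrete, here is a rough sketch of what a call like this could do: split the model output by group membership, then hand the two parts to the distance function. This is not the library's implementation. The function name intergroup_distance_sketch is hypothetical, and the sketch assumes that a true (or 1) entry in group marks the ingroup.

```python
import numpy as np

def intergroup_distance_sketch(output, group, distance):
    """Split model output by group membership and compare the two parts."""
    output = np.asarray(output)
    mask = np.asarray(group).astype(bool)  # assumption: True marks the ingroup
    out_0 = output[~mask]                  # outgroup rows
    out_1 = output[mask]                   # ingroup rows
    return distance(out_0, out_1)

group = np.array([1, 0, 1, 0, 0])
output = np.array([1.0, 0.0, 1.0, 1.0, 0.0])

# Difference in means: ingroup mean 1.0, outgroup mean 1/3
result = intergroup_distance_sketch(
    output, group, lambda out_0, out_1: out_1.mean() - out_0.mean()
)
```

Boolean indexing selects whole rows, so the same sketch also handles a (# observations, # classes) output array.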
gshap.intergroup.absolute_mean_distance
def gshap.intergroup.absolute_mean_distance(out_0, out_1) [source]
Parameters:
out_0 : np.array
    (# observations,) vector of model outputs for outgroup observations.
out_1 : np.array
    (# observations,) vector of model outputs for ingroup observations.
Returns:
distance : scalar
    out_1.mean() - out_0.mean()
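The formula in the Returns entry can be checked by hand on a small example. The sketch below applies it directly with NumPy rather than calling gshap; the toy arrays are illustrative only.

```python
import numpy as np

out_0 = np.array([0, 0, 1, 1])  # outgroup predictions, mean 0.5
out_1 = np.array([1, 1, 1, 0])  # ingroup predictions, mean 0.75

# Formula as documented: ingroup mean minus outgroup mean
abs_dist = out_1.mean() - out_0.mean()
print(abs_dist)  # 0.25
```

As written, the expression is signed: it is negative whenever the ingroup mean is lower than the outgroup mean.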
gshap.intergroup.relative_mean_distance
def gshap.intergroup.relative_mean_distance(out_0, out_1) [source]
Parameters:
out_0 : np.array
    (# observations,) vector of model outputs for outgroup observations.
out_1 : np.array
    (# observations,) vector of model outputs for ingroup observations.
Returns:
distance : scalar
    out_1.mean() / out_0.mean() - 1
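A small worked example may help in reading the relative measure. The sketch below applies the documented formula directly with NumPy rather than calling gshap, using illustrative toy arrays.

```python
import numpy as np

out_0 = np.array([0, 0, 0, 1])  # outgroup predictions, mean 0.25
out_1 = np.array([1, 1, 0, 0])  # ingroup predictions, mean 0.5

# Formula as documented: ratio of ingroup mean to outgroup mean, minus one
rel_dist = out_1.mean() / out_0.mean() - 1
print(rel_dist)  # 1.0
```

A value of 1.0 means the ingroup mean is 100% higher than the outgroup mean. The ratio is undefined when the outgroup mean is zero.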