Intergroup differences

For examples and interpretation, see my notebook on intergroup difference explanations.

gshap.intergroup.IntergroupDifference

class gshap.intergroup.IntergroupDifference(group, distance='absolute_mean_distance')

This class measures the distance between distributions of predicted outcomes for different groups.

Parameters: group : numpy.array or pandas.Series

(# observations,) array of boolean or binary values indicating group membership.

distance : callable or str, default='absolute_mean_distance'

Takes two vectors of model output, the first for the outgroup and the second for the ingroup. Output vectors will usually be (# outgroup,) and (# ingroup,), or (# outgroup, # classes) and (# ingroup, # classes). distance returns a scalar measure of intergroup difference, such as the difference between group means. If passed as a string, distance is used as a key to look up a built-in distance function, such as 'absolute_mean_distance' or 'relative_mean_distance'.
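A custom distance only needs to accept the outgroup and ingroup output vectors, in that order, and return a scalar. The following is a minimal sketch under that assumption; the function name absolute_median_distance and the sample arrays are hypothetical and are not part of gshap.

```python
import numpy as np


def absolute_median_distance(out_0, out_1):
    """Hypothetical custom distance: absolute gap between group medians.

    out_0 holds model output for the outgroup, out_1 for the ingroup.
    np.median without an axis flattens its input, so a
    (# observations, # classes) array is also reduced to one scalar.
    """
    out_0 = np.asarray(out_0, dtype=float)
    out_1 = np.asarray(out_1, dtype=float)
    return float(abs(np.median(out_1) - np.median(out_0)))


# Made-up model outputs; the two groups may differ in size.
outgroup_output = np.array([0.1, 0.2, 0.9])       # (# outgroup,)
ingroup_output = np.array([0.4, 0.5, 0.6, 0.7])   # (# ingroup,)

d = absolute_median_distance(outgroup_output, ingroup_output)
print(d)
```

Based on the signature documented above, such a callable would presumably be passed as distance=absolute_median_distance in place of a string key.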

Attributes: group : numpy.array

Set from the group parameter. If the parameter is passed as a pandas.Series, it is automatically converted to a numpy.array.

distance : callable or str

Set from the distance parameter.

Examples

import gshap
from gshap.datasets import load_recidivism
from gshap.intergroup import IntergroupDifference

from sklearn.svm import SVC

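# Load the recidivism dataset and fit a support vector classifier.
recidivism = load_recidivism()
X, y = recidivism.data, recidivism.target
clf = SVC().fit(X, y)

# Measure the relative difference in mean predictions between groups,
# with group membership given by the 'black' column.
g = IntergroupDifference(group=X['black'], distance='relative_mean_distance')
explainer = gshap.KernelExplainer(clf.predict, X, g)
explainer.gshap_values(X, nsamples=10)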

Out:

array([ 0.01335252,  0.24884556,  0.00132373, -0.0025238 , -0.00151837,
    0.40453822,  0.01636782,  0.07666043, -0.00056414,  0.00966583])

Methods

__call__(self, output)

Compute distance measure between groups.

Parameters: output : numpy.array or pandas.Series

Model output, usually a (# observations,) or (# observations, # classes) vector.

Returns: distance : scalar

Measure of the distance between the distributions of predicted outputs for outgroup and ingroup observations.
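The call can be pictured as splitting the model output by group membership and handing the two pieces to the distance function. The sketch below assumes a boolean mask is used for the split. IntergroupDifferenceSketch and mean_difference are illustrative stand-ins, not the gshap source.

```python
import numpy as np


class IntergroupDifferenceSketch:
    """Illustrative stand-in for the documented behavior (not gshap code)."""

    def __init__(self, group, distance):
        # Accept boolean or 0/1 membership flags and store them as a mask.
        self.group = np.asarray(group).astype(bool)
        self.distance = distance

    def __call__(self, output):
        output = np.asarray(output)
        out_0 = output[~self.group]  # outgroup rows
        out_1 = output[self.group]   # ingroup rows
        return self.distance(out_0, out_1)


def mean_difference(out_0, out_1):
    # Same expression the docs give for absolute_mean_distance.
    return out_1.mean() - out_0.mean()


group = np.array([0, 1, 1, 0])            # 1 marks ingroup membership
output = np.array([0.0, 1.0, 0.5, 0.5])   # (# observations,) model output

g = IntergroupDifferenceSketch(group, mean_difference)
result = g(output)
print(result)  # ingroup mean 0.75 minus outgroup mean 0.25 gives 0.5
```

Because a one-dimensional boolean mask selects rows, the same split also works for (# observations, # classes) output.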

gshap.intergroup.absolute_mean_distance

def gshap.intergroup.absolute_mean_distance(out_0, out_1)

Parameters: out_0 : np.array

(# outgroup,) vector of model outputs for outgroup observations.

out_1 : np.array

(# ingroup,) vector of model outputs for ingroup observations.

Returns: distance : scalar

out_1.mean() - out_0.mean()

gshap.intergroup.relative_mean_distance

def gshap.intergroup.relative_mean_distance(out_0, out_1)

Parameters: out_0 : np.array

(# outgroup,) vector of model outputs for outgroup observations.

out_1 : np.array

(# ingroup,) vector of model outputs for ingroup observations.

Returns: distance : scalar

out_1.mean() / out_0.mean() - 1
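To compare the two built-in measures on the same data, the sketch below re-implements both return expressions exactly as documented on this page. These are local copies written for illustration, not imports from gshap, and the sample outputs are made up.

```python
import numpy as np


def absolute_mean_distance(out_0, out_1):
    # Local copy of the documented expression: difference in group means.
    return out_1.mean() - out_0.mean()


def relative_mean_distance(out_0, out_1):
    # Local copy of the documented expression: the same gap expressed
    # as a fraction of the outgroup mean.
    return out_1.mean() / out_0.mean() - 1


out_0 = np.array([0.25, 0.25, 0.25, 0.25])  # outgroup outputs, mean 0.25
out_1 = np.array([0.5, 0.5])                # ingroup outputs, mean 0.5

abs_d = absolute_mean_distance(out_0, out_1)  # 0.5 - 0.25 = 0.25
rel_d = relative_mean_distance(out_0, out_1)  # 0.5 / 0.25 - 1 = 1.0
print(abs_d, rel_d)
```

Here the ingroup mean exceeds the outgroup mean by 0.25 in absolute terms, which is 100% of the outgroup mean. The relative measure divides by the outgroup mean, so it is not meaningful when that mean is zero.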