interpret_community.shap.kernel_explainer module

Defines the KernelExplainer for computing explanations on black box models or functions.

class interpret_community.shap.kernel_explainer.KernelExplainer(model, initialization_examples, is_function=False, explain_subset=None, nsamples='auto', features=None, classes=None, nclusters=10, show_progress=True, transformations=None, allow_all_transformations=False, model_task=<ModelTask.Unknown: 'unknown'>, **kwargs)

Bases: interpret_community.common.blackbox_explainer.BlackBoxExplainer

available_explanations = ['global', 'local']
explain_global(evaluation_examples, sampling_policy=None, include_local=True, batch_size=100)

Explain the model globally by aggregating local explanations into a global explanation.

Parameters:
  • evaluation_examples (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – A matrix of feature vector examples (# examples x # features) on which to explain the model’s output.
  • sampling_policy (interpret_community.common.policy.SamplingPolicy) – Optional policy for sampling the evaluation examples. See documentation on SamplingPolicy for more information.
  • include_local (bool) – Include the local explanations in the returned global explanation. If include_local is False, the local explanations are streamed in batches and aggregated into the global explanation instead of being kept.
  • batch_size (int) – If include_local is False, specifies the batch size for aggregating local explanations to global.
Returns:

A model explanation object. It is guaranteed to be a GlobalExplanation which also has the properties of LocalExplanation and ExpectedValuesMixin. If the model is a classifier, it will have the properties of PerClassMixin.

Return type:

DynamicGlobalExplanation
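
The streaming behaviour described for include_local=False and batch_size can be pictured with the minimal sketch below. It is not the library's implementation: the helper names (batched, stream_global_importance) are hypothetical, and the aggregation rule (mean of absolute local importance values per feature) is an assumption made for illustration, since this page does not state the rule.

```python
def batched(rows, batch_size):
    """Yield consecutive slices of `rows` with at most `batch_size` rows each."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def stream_global_importance(local_batches):
    """Aggregate batches of local importance rows into one global vector.

    Assumption: global importance is the mean absolute local importance
    per feature. Only running sums are kept, never the full local matrix.
    """
    abs_sums = None
    n_rows = 0
    for batch in local_batches:
        for row in batch:
            if abs_sums is None:
                abs_sums = [0.0] * len(row)
            for j, value in enumerate(row):
                abs_sums[j] += abs(value)
            n_rows += 1
    if n_rows == 0:
        raise ValueError("no local explanations to aggregate")
    return [total / n_rows for total in abs_sums]


# Made-up local importance values: 3 examples x 2 features.
local_values = [[0.5, -1.0], [-1.5, 2.0], [1.0, 0.0]]
global_importance = stream_global_importance(batched(local_values, batch_size=2))
print(global_importance)  # [1.0, 1.0]
```

Because only per-feature running sums are held, memory use in this sketch depends on the number of features and the batch size rather than on the number of evaluation examples.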

explain_local(evaluation_examples)

Explain the function locally by using SHAP’s KernelExplainer.

Parameters: evaluation_examples (DatasetWrapper) – A matrix of feature vector examples (# examples x # features) on which to explain the model’s output.
Returns: A model explanation object. It is guaranteed to be a LocalExplanation which also has the properties of ExpectedValuesMixin. If the model is a classifier, it will have the properties of the ClassesMixin.
Return type: DynamicLocalExplanation
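
As background for what a local explanation contains, the sketch below computes exact Shapley values for a tiny toy function by enumerating every feature coalition. This is not what SHAP's KernelExplainer does internally: Kernel SHAP approximates these values by sampling coalitions and fitting a weighted linear regression, which is why nsamples controls the variance of the result. The function exact_shapley, the toy model, and the single background row are all hypothetical and chosen only to keep the example small.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, background):
    """Brute-force Shapley values of `predict` at `x` against one background row.

    Cost grows as 2**n in the number of features, so this is only
    practical for very small n.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` come from x, all others from the background row.
        mixed = [x[i] if i in subset else background[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(row):
    """Hypothetical model with one interaction term between features 0 and 2."""
    return 2 * row[0] + 3 * row[1] + row[0] * row[2]


x = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, background)
expected_value = toy_model(background)

# Additivity: the expected value plus the local importances recovers the prediction.
print(abs(expected_value + sum(phi) - toy_model(x)) < 1e-9)  # True
```

The interaction term is split evenly between features 0 and 2, which is the kind of attribution the local importance values are meant to approximate.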
explainer_type = 'blackbox'

The Kernel Explainer for explaining black box models or functions.

Parameters:
  • model (object) – The model to explain, or a function if is_function is True. A model must implement a scikit-learn style predict or predict_proba method; a function must accept a 2d ndarray.
  • initialization_examples (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – A matrix of feature vector examples (# examples x # features) for initializing the explainer.
  • is_function (bool) – Defaults to False. Set to True when passing a function instead of a model.
  • explain_subset (list[int]) – List of feature indices. If specified, only this subset of the features in the evaluation dataset is explained, which speeds up the explanation process when the number of features is large and the features of interest are already known. The subset can be the top-k features from the model summary.
  • nsamples ('auto' or int) – Defaults to ‘auto’. Number of times to re-evaluate the model when explaining each prediction. More samples lead to lower-variance estimates of the feature importance values, but incur more computation cost. When ‘auto’ is provided, the number of samples is computed according to a heuristic rule.
  • features (list[str]) – A list of feature names.
  • classes (list[str]) – Class names as a list of strings. The order of the class names should match that of the model output. Only required if explaining classifier.
  • nclusters (int) – Number of means to use for approximation. A dataset is summarized with nclusters mean samples weighted by the number of data points they each represent. When the number of initialization examples is larger than (10 x nclusters), those examples will be summarized with k-means where k = nclusters.
  • show_progress (bool) – Defaults to True. Determines whether to display the explanation status bar when using shap_values from the KernelExplainer.
  • transformations (sklearn.compose.ColumnTransformer or list[tuple]) –

    sklearn.compose.ColumnTransformer or a list of tuples describing the column name and transformer. When transformations are provided, explanations are given in terms of the features before the transformation. The format for a list of transformations is the same as the one described here: https://github.com/scikit-learn-contrib/sklearn-pandas.

    If you are using a transformation that is not in the list of sklearn.preprocessing transformations that are supported by the interpret-community package, then this parameter cannot take a list of more than one column as input for the transformation. You can use the following sklearn.preprocessing transformations with a list of columns since these are already one to many or one to one: Binarizer, KBinsDiscretizer, KernelCenterer, LabelEncoder, MaxAbsScaler, MinMaxScaler, Normalizer, OneHotEncoder, OrdinalEncoder, PowerTransformer, QuantileTransformer, RobustScaler, StandardScaler.

    Examples for transformations that work:

    [
        (["col1", "col2"], sklearn_one_hot_encoder),
        (["col3"], None) #col3 passes as is
    ]
    [
        (["col1"], my_own_transformer),
        (["col2"], my_own_transformer),
    ]
    

    An example of a transformation that would raise an error since it cannot be interpreted as one to many:

    [
        (["col1", "col2"], my_own_transformer)
    ]
    

    The last example would not work since the interpret-community package can’t determine whether my_own_transformer gives a many to many or one to many mapping when taking a sequence of columns.

  • allow_all_transformations (bool) – Allow many to many and many to one transformations.
  • model_task (str) – Optional parameter to specify whether the model is a classification or regression model. In most cases, the type of the model can be inferred based on the shape of the output, where a classifier has a predict_proba method and outputs a 2 dimensional array, while a regressor has a predict method and outputs a 1 dimensional array.
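
The rules for the transformations list can be summarised with the sketch below. It is an illustration only, not the package's validation code: find_rejected_column_groups is a hypothetical helper, matching transformers by class name is an assumption, and the two classes are stand-ins so the sketch runs without scikit-learn installed. The set of names is copied from the list in the transformations description.

```python
# Transformer class names listed in the transformations description as safe
# to apply to several columns at once.
SUPPORTED_MULTI_COLUMN = {
    "Binarizer", "KBinsDiscretizer", "KernelCenterer", "LabelEncoder",
    "MaxAbsScaler", "MinMaxScaler", "Normalizer", "OneHotEncoder",
    "OrdinalEncoder", "PowerTransformer", "QuantileTransformer",
    "RobustScaler", "StandardScaler",
}


def find_rejected_column_groups(transformations, allow_all_transformations=False):
    """Return the column groups whose mapping cannot be interpreted.

    Hypothetical helper: a group is rejected when an unrecognised
    transformer is applied to more than one column and
    allow_all_transformations is False.
    """
    rejected = []
    for columns, transformer in transformations:
        if transformer is None or len(columns) <= 1:
            continue  # pass-through or single column: always interpretable
        if type(transformer).__name__ in SUPPORTED_MULTI_COLUMN:
            continue  # known one-to-one or one-to-many transformer
        if not allow_all_transformations:
            rejected.append(list(columns))
    return rejected


class OneHotEncoder:
    """Stand-in for sklearn.preprocessing.OneHotEncoder."""


class MyOwnTransformer:
    """Stand-in for a user-defined transformer."""


works = [(["col1", "col2"], OneHotEncoder()), (["col3"], None)]
also_works = [(["col1"], MyOwnTransformer()), (["col2"], MyOwnTransformer())]
fails = [(["col1", "col2"], MyOwnTransformer())]

print(find_rejected_column_groups(works))       # []
print(find_rejected_column_groups(also_works))  # []
print(find_rejected_column_groups(fails))       # [['col1', 'col2']]
print(find_rejected_column_groups(fails, allow_all_transformations=True))  # []
```

In this sketch, setting allow_all_transformations=True simply skips the rejection, mirroring the parameter's stated purpose of allowing many to many and many to one transformations.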