interpret_community.shap.gpu_kernel_explainer module

Defines the GPUKernelExplainer for computing explanations on black box models or functions.

class interpret_community.shap.gpu_kernel_explainer.GPUKernelExplainer(model, initialization_examples, explain_subset=None, is_function=False, nsamples='auto', features=None, classes=None, nclusters=10, show_progress=False, transformations=None, allow_all_transformations=False, model_task=ModelTask.Unknown, **kwargs)

Bases: interpret_community.common.blackbox_explainer.BlackBoxExplainer

available_explanations = ['global', 'local']
explain_global(evaluation_examples, sampling_policy=None, include_local=True, batch_size=100)

Explain the model globally by aggregating local explanations into a global explanation.

Parameters
  • evaluation_examples – A matrix of feature vector examples (# examples x # features) on which to explain the model’s output.

  • sampling_policy (interpret_community.common.policy.SamplingPolicy) – Optional policy for sampling the evaluation examples. See documentation on SamplingPolicy for more information.

  • include_local (bool) – Include the local explanations in the returned global explanation. If include_local is False, the local explanations are streamed in batches and aggregated into the global explanation instead of being kept.

  • batch_size (int) – If include_local is False, specifies the batch size for aggregating local explanations to global.

Returns

A model explanation object. It is guaranteed to be a GlobalExplanation which also has the properties of LocalExplanation and ExpectedValuesMixin. If the model is a classifier, it will have the properties of PerClassMixin.

Return type

DynamicGlobalExplanation
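The streaming behaviour described for include_local=False can be pictured with a minimal, library-free sketch. It assumes that the global importance of a feature is the mean absolute value of its local importances; that aggregation rule, and the helper names batches and stream_global_importance, are illustrative assumptions and not the package's implementation.

```python
from typing import Iterable, Iterator, List, Sequence

Row = Sequence[float]


def batches(rows: Sequence[Row], batch_size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def stream_global_importance(local_batches: Iterable[Sequence[Row]]) -> List[float]:
    """Aggregate local importances batch by batch (hypothetical helper).

    Only running per-feature totals are kept, so the full matrix of local
    explanations never has to be held in memory at once.
    """
    totals: List[float] = []
    count = 0
    for batch in local_batches:
        for row in batch:
            if not totals:
                totals = [0.0] * len(row)
            for j, value in enumerate(row):
                totals[j] += abs(value)  # assumed rule: mean absolute value
            count += 1
    if count == 0:
        raise ValueError("no local explanations were provided")
    return [total / count for total in totals]


# Three examples x three features of made-up local importance values.
local_values = [
    [0.5, -1.0, 0.0],
    [-0.5, 3.0, 0.25],
    [1.5, -2.0, 0.5],
]
global_importance = stream_global_importance(batches(local_values, batch_size=2))
```

A smaller batch_size lowers peak memory at the cost of more aggregation steps; the result does not depend on the batch size.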

explain_local(evaluation_examples)

Explain the function locally by using SHAP’s KernelExplainer.

Parameters
  • evaluation_examples – A matrix of feature vector examples (# examples x # features) on which to explain the model’s output.

Returns

A model explanation object. It is guaranteed to be a LocalExplanation which also has the properties of ExpectedValuesMixin. If the model is a classifier, it will have the properties of the ClassesMixin.

Return type

DynamicLocalExplanation
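A hedged usage sketch follows. It wraps the calls in a function so that the import only runs on a machine with interpret-community, cuML, and a CUDA GPU available. The function name explain_rows_on_gpu and its arguments are hypothetical; model is assumed to be a fitted model exposing predict or predict_proba, and background and rows are assumed to be dense tabular arrays with the same columns.

```python
def explain_rows_on_gpu(model, background, rows, feature_names):
    """Return local importance values for rows (illustrative wrapper).

    Assumptions: a GPU environment with cuML installed, a fitted model,
    and dense tabular inputs. Nothing here runs until the function is called.
    """
    # Imported lazily so that defining this function needs no GPU stack.
    from interpret_community.shap.gpu_kernel_explainer import GPUKernelExplainer

    # background plays the role of initialization_examples.
    explainer = GPUKernelExplainer(model, background, features=feature_names)

    # explain_local returns a local explanation object for the given rows.
    local_explanation = explainer.explain_local(rows)
    return local_explanation.local_importance_values
```

For a classifier, passing classes to the constructor attaches class names to the explanation.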

explainer_type = 'blackbox'

GPU version of the Kernel Explainer for explaining black box models or functions.

Uses cuML’s GPU Kernel SHAP implementation: https://docs.rapids.ai/api/cuml/stable/api.html#shap-kernel-explainer

Characteristics of the GPU version:
  • Unlike in the SHAP package, nsamples is set when the explainer is initialized, and initialization takes a small amount of time.

  • Only tabular data is supported for now, via passing the background dataset explicitly.

  • Sparse data support is planned for the near future.

  • Further optimizations are in progress. For example, if the background dataset has constant value columns and the observation has the same value in some entries, the number of evaluations of the function can be reduced.
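The constant-column optimization mentioned in the last item can be illustrated with a small sketch. It only identifies the features that would not need to be perturbed; it is not cuML's implementation, and the helper name fixed_feature_indices is hypothetical.

```python
from typing import List, Sequence


def fixed_feature_indices(
    background: Sequence[Sequence[float]],
    observation: Sequence[float],
) -> List[int]:
    """Find features whose background column is constant and equal to the observation.

    Swapping such a feature between the observation and any background row
    leaves the model input unchanged, so it contributes nothing to the
    explanation and need not be sampled.
    """
    if not background:
        raise ValueError("background must contain at least one row")
    fixed = []
    for j, observed in enumerate(observation):
        if all(row[j] == observed for row in background):
            fixed.append(j)
    return fixed


background = [
    [1.0, 0.0, 5.0],
    [2.0, 0.0, 5.0],
    [3.0, 0.0, 5.0],
]
observation = [2.0, 0.0, 7.0]

# Column 0 varies, column 1 is constant and matches, column 2 is constant but differs.
fixed = fixed_feature_indices(background, observation)
```

In this toy case only the feature at index 1 could be skipped, which reduces the number of feature subsets that have to be evaluated.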

Parameters
  • model (object) – Function that takes a matrix of samples (n_samples, n_features) and computes the output for those samples with shape (n_samples).

  • initialization_examples (numpy.ndarray or pandas.DataFrame) – A matrix of feature vector examples (# examples x # features) for initializing the explainer.

  • explain_subset (list[int]) – List of feature indices. If specified, only this subset of the features in the evaluation dataset is explained, which speeds up the explanation process when the number of features is large and the user already knows the features of interest. The subset can be the top-k features from the model summary.

  • nsamples ('auto' or int) – Number of times to re-evaluate the model when explaining each prediction. More samples lead to lower-variance estimates of the SHAP values. The default, 'auto', uses nsamples = 2 * X.shape[1] + 2048, where X.shape[1] is the number of features.

  • features (list[str]) – A list of feature names.

  • classes (list[str]) – Class names as a list of strings. The order of the class names should match that of the model output. Only required when explaining a classifier.

  • nclusters (int) – Number of means to use for approximation. A dataset is summarized with nclusters mean samples weighted by the number of data points they each represent. When the number of initialization examples is larger than (10 x nclusters), those examples will be summarized with k-means where k = nclusters.

  • show_progress (bool) – Defaults to False. Determines whether to display the explanation status bar when using shap_values from the cuML KernelExplainer.

  • transformations (sklearn.compose.ColumnTransformer or list[tuple]) –

    sklearn.compose.ColumnTransformer or a list of tuples describing the column name and transformer. When transformations are provided, explanations are of the features before the transformation. The format for a list of transformations is the same as the one here: https://github.com/scikit-learn-contrib/sklearn-pandas. If you are using a transformation that is not in the list of sklearn.preprocessing transformations supported by the interpret-community package, then this parameter cannot take a list of more than one column as input for the transformation. You can use the following sklearn.preprocessing transformations with a list of columns since these are already one to many or one to one: Binarizer, KBinsDiscretizer, KernelCenterer, LabelEncoder, MaxAbsScaler, MinMaxScaler, Normalizer, OneHotEncoder, OrdinalEncoder, PowerTransformer, QuantileTransformer, RobustScaler, StandardScaler. Examples of transformations that work:

    [
        (["col1", "col2"], sklearn_one_hot_encoder),
        (["col3"], None)  # col3 passes as is
    ]

    [
        (["col1"], my_own_transformer),
        (["col2"], my_own_transformer),
    ]

    An example of a transformation that would raise an error since it cannot be interpreted as one to many:

    [
        (["col1", "col2"], my_own_transformer)
    ]

    The last example would not work since the interpret-community package can’t determine whether my_own_transformer gives a many to many or one to many mapping when taking a sequence of columns.

  • allow_all_transformations (bool) – Allow many to many and many to one transformations.

  • model_task (str) – Optional parameter to specify whether the model is a classification or regression model. In most cases, the type of the model can be inferred based on the shape of the output, where a classifier has a predict_proba method and outputs a 2 dimensional array, while a regressor has a predict method and outputs a 1 dimensional array.
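The numeric rules stated for nsamples and nclusters can be written out as a short sketch. The helper names resolve_nsamples and needs_summarization are hypothetical and are not part of interpret-community; the sketch only restates the formulas given in the parameter descriptions above.

```python
from typing import Union


def resolve_nsamples(nsamples: Union[str, int], n_features: int) -> int:
    """Return the number of model re-evaluations per explained prediction."""
    if nsamples == "auto":
        # Documented rule for the 'auto' setting.
        return 2 * n_features + 2048
    if isinstance(nsamples, int) and not isinstance(nsamples, bool) and nsamples > 0:
        return nsamples
    raise ValueError("nsamples must be 'auto' or a positive integer")


def needs_summarization(n_examples: int, nclusters: int = 10) -> bool:
    """Return True when the initialization examples would be summarized.

    Per the nclusters description, summarization with k-means (k = nclusters)
    applies when there are more than 10 x nclusters examples.
    """
    return n_examples > 10 * nclusters


auto_samples = resolve_nsamples("auto", n_features=20)   # 2 * 20 + 2048
small_background = needs_summarization(100)              # exactly 10 x 10 examples
large_background = needs_summarization(101)              # one more than the threshold
```

With the default nclusters=10, a background set of up to 100 rows is used as-is under this reading, and anything larger is condensed to 10 weighted mean samples.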