interpret_community.common.policy module

Defines explanation policies.

class interpret_community.common.policy.SamplingPolicy(allow_eval_sampling=False, max_dim_clustering=50, sampling_method='hdbscan', **kwargs)

Bases: ChainedIdentity

Defines the sampling policy for downsampling the evaluation examples.

The policy is a set of parameters that can be tuned to speed up or improve the accuracy of the explain_model function during sampling.

Parameters:
  • allow_eval_sampling (bool) – Defaults to ‘False’. Whether to allow sampling of the evaluation data. If ‘True’, the evaluation data is clustered to determine the optimal number of points to sample. Set to ‘True’ to speed up the process when the evaluation data set is large and you only want to generate model summary information.

  • max_dim_clustering (int) – Defaults to 50 and only takes effect when ‘allow_eval_sampling’ is set to ‘True’. The number of dimensions to reduce the evaluation data to before clustering for sampling. To determine how aggressively to downsample without degrading the explanation results, sampling uses a heuristic that finds the optimal number of clusters. Because KMeans performs poorly on high-dimensional data, PCA or Truncated SVD is first run to reduce the dimensionality; the optimal k is then found by running KMeans repeatedly, reducing k each time, and keeping the k with the highest silhouette score.

  • sampling_method (str) – The method used to determine how much to downsample the evaluation data. If allow_eval_sampling is True, the evaluation data is first downsampled to a max_threshold, and this heuristic then determines how much further the evaluation data can be downsampled without losing accuracy in the calculated feature importance values. Defaults to hdbscan; kmeans can also be specified. With hdbscan, the number of clusters is determined automatically and multiplied by a threshold. With kmeans, the optimal number of clusters is found by running KMeans with k halved each time and keeping the k with the highest silhouette score.
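The kmeans heuristic described above (start with a large k, halve it each time, keep the k with the best silhouette score) can be sketched without any dependencies. This is a minimal sketch, not the library's implementation: `halving_candidates` and `pick_optimal_k` are hypothetical helpers, the upper bound of 2000 and the lower bound of 2 clusters are illustrative assumptions, and `score(k)` is a caller-supplied stand-in for "run KMeans with k clusters and compute the silhouette score".

```python
from typing import Callable, List, Optional


def halving_candidates(n_rows: int, k_upper_bound: int = 2000, k_min: int = 2) -> List[int]:
    """Hypothetical helper: candidate cluster counts, starting high and halving k each time.

    Starts at min(n_rows // 2, k_upper_bound) (an assumed bound) and stops before k
    drops below k_min, since the silhouette score needs at least two clusters.
    """
    k = min(n_rows // 2, k_upper_bound)
    candidates = []
    while k >= k_min:
        candidates.append(k)
        k //= 2
    return candidates


def pick_optimal_k(candidates: List[int], score: Callable[[int], float]) -> Optional[int]:
    """Return the candidate with the highest score, or None if there are no candidates.

    `score(k)` stands in for "run KMeans with k clusters and compute the
    silhouette score"; a higher silhouette score means better-separated clusters.
    """
    best_k: Optional[int] = None
    best_score = float("-inf")
    for k in candidates:
        s = score(k)
        if s > best_score:
            best_k, best_score = k, s
    return best_k


if __name__ == "__main__":
    ks = halving_candidates(1000)
    print(ks)  # [500, 250, 125, 62, 31, 15, 7, 3]
    # Toy score that peaks at k=31, purely for illustration.
    print(pick_optimal_k(ks, lambda k: -abs(k - 31)))  # 31
```

With scikit-learn, `score` could be `lambda k: silhouette_score(X, KMeans(n_clusters=k).fit_predict(X))`. This sketch scores every candidate and keeps the best one; keeping the scoring function pluggable separates the search over k from the clustering itself.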

Return type:

dict

Returns:

The arguments for the sampling policy.

property allow_eval_sampling

Get whether to allow sampling of evaluation data.

Returns:

Whether to allow sampling of evaluation data.

Return type:

bool

property max_dim_clustering

Get the number of dimensions to reduce the evaluation data to before clustering for sampling.

Returns:

The number of dimensions to reduce the evaluation data to before clustering for sampling.

Return type:

int
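The dimensionality-reduction step that max_dim_clustering controls can be sketched as a small decision function. `plan_dim_reduction` is hypothetical and not part of the library; it only reports which reducer would be used and with how many components. It assumes data with at most max_dim_clustering features is clustered as-is, that Truncated SVD is chosen for sparse input and PCA for dense input, and that the component count is capped at the number of rows so a PCA fit would be valid.

```python
from typing import Optional, Tuple


def plan_dim_reduction(n_rows: int, n_features: int, max_dim_clustering: int = 50,
                       is_sparse: bool = False) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: decide whether and how to reduce dimensionality before clustering.

    Returns None when no reduction is needed, otherwise (reducer_name, n_components).
    """
    if n_features <= max_dim_clustering:
        # Already low-dimensional enough: cluster the data as-is.
        return None
    # Assumption: Truncated SVD for sparse input (it works on sparse matrices directly),
    # PCA for dense input.
    reducer = "TruncatedSVD" if is_sparse else "PCA"
    # Assumption: never request more components than there are rows, which PCA requires.
    return reducer, min(max_dim_clustering, n_rows)


if __name__ == "__main__":
    print(plan_dim_reduction(1000, 20))                   # None
    print(plan_dim_reduction(1000, 300))                  # ('PCA', 50)
    print(plan_dim_reduction(1000, 300, is_sparse=True))  # ('TruncatedSVD', 50)
```

In scikit-learn, the returned pair maps onto `PCA(n_components=...)` or `TruncatedSVD(n_components=...)` followed by `fit_transform` on the evaluation data.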

property sampling_method

Get the sampling method for determining how much to downsample the evaluation data by.

Returns:

The sampling method for determining how much to downsample the evaluation data by.

Return type:

str