interpret_community.mimic.models package
Module for explainable surrogate models.
- class interpret_community.mimic.models.BaseExplainableModel(**kwargs)
Bases:
abc.ABC, interpret_community.common.chained_identity.ChainedIdentity
The base class for models that can be explained.
- abstract property expected_values
Abstract property to get the expected values.
- abstract explain_global(**kwargs)
Abstract method to get the global feature importances from the trained explainable model.
- abstract explain_local(evaluation_examples, **kwargs)
Abstract method to get the local feature importances from the trained explainable model.
- static explainable_model_type()
Retrieve the model type.
- abstract fit(**kwargs)
Abstract method to fit the explainable model.
- abstract property model
Abstract property to get the underlying model.
- abstract predict(dataset, **kwargs)
Abstract method to predict labels using the explainable model.
- abstract predict_proba(dataset, **kwargs)
Abstract method to predict probabilities using the explainable model.
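A minimal sketch of the contract above, written with Python's abc module. SketchExplainableModel and MeanSurrogate are hypothetical names; this is not the library's BaseExplainableModel (which also derives from ChainedIdentity and defines the static explainable_model_type(), both omitted here). It only shows that a subclass cannot be instantiated until every abstract member is overridden.

```python
import inspect
from abc import ABC, abstractmethod

import numpy as np


class SketchExplainableModel(ABC):
    """Hypothetical mirror of the abstract members listed above."""

    @abstractmethod
    def fit(self, dataset, labels, **kwargs): ...

    @abstractmethod
    def predict(self, dataset, **kwargs): ...

    @abstractmethod
    def predict_proba(self, dataset, **kwargs): ...

    @abstractmethod
    def explain_global(self, **kwargs): ...

    @abstractmethod
    def explain_local(self, evaluation_examples, **kwargs): ...

    @property
    @abstractmethod
    def expected_values(self): ...

    @property
    @abstractmethod
    def model(self): ...


class MeanSurrogate(SketchExplainableModel):
    """Hypothetical toy regressor that always predicts the training mean."""

    def fit(self, dataset, labels, **kwargs):
        self._n_features = np.asarray(dataset).shape[1]
        self._mean = float(np.mean(labels))
        return self

    def predict(self, dataset, **kwargs):
        return np.full(np.asarray(dataset).shape[0], self._mean)

    def predict_proba(self, dataset, **kwargs):
        raise NotImplementedError("a regression surrogate has no probabilities")

    def explain_global(self, **kwargs):
        # No feature moves a constant prediction, so every importance is zero.
        return np.zeros(self._n_features)

    def explain_local(self, evaluation_examples, **kwargs):
        return np.zeros((np.asarray(evaluation_examples).shape[0], self._n_features))

    @property
    def expected_values(self):
        return self._mean

    @property
    def model(self):
        return self  # nothing is wrapped in this toy


# Usage: the abstract base cannot be instantiated; the toy subclass can.
print(inspect.isabstract(SketchExplainableModel))  # True
surrogate = MeanSurrogate().fit(np.ones((4, 2)), [1.0, 2.0, 3.0, 4.0])
print(surrogate.expected_values)  # 2.5
```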
- class interpret_community.mimic.models.DecisionTreeExplainableModel(multiclass=False, random_state=123, shap_values_output=ShapValuesOutput.DEFAULT, classification=True, **kwargs)
Bases:
interpret_community.mimic.models.explainable_model.BaseExplainableModel
- available_explanations = ['global', 'local']
- property expected_values
Use TreeExplainer to get the expected values.
- Returns
The expected values of the decision tree model.
- Return type
- explain_global(**kwargs)
Use the tree model's feature importances to get the global feature importances from the tree surrogate model.
- Returns
The global explanation of feature importances.
- Return type
- explain_local(evaluation_examples, probabilities=None, **kwargs)
Use TreeExplainer to get the local feature importances from the trained explainable model.
- Parameters
evaluation_examples (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The evaluation examples to compute local feature importances for.
probabilities (numpy.ndarray) – If the output type is probability, the teacher model’s probabilities can be specified to scale the SHAP values.
- Returns
The local explanation of feature importances.
- Return type
Union[list, numpy.ndarray]
- static explainable_model_type()
Retrieve the model type.
- Returns
Tree explainable model type.
- Return type
- explainer_type = 'model'
Decision Tree explainable model.
- Parameters
multiclass (bool) – Set to True to generate a multiclass model.
random_state (int) – Integer used to seed the model.
shap_values_output (interpret_community.common.constants.ShapValuesOutput) – The type of the output from explain_local when using TreeExplainer. Currently only the types ‘default’, ‘probability’ and ‘teacher_probability’ are supported. If ‘probability’ is specified, the raw log-odds values from the TreeExplainer are approximately scaled to probabilities.
classification (bool) – Indicates whether this is a classification or regression explanation.
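The ‘probability’ option approximately rescales log-odds SHAP values into probability space. Below is a minimal numpy sketch of one proportional scheme, assuming binary classification: each row of attributions is multiplied by a single factor so that it sums to the predicted probability minus the sigmoid of the expected log-odds. The helper scale_logodds_shap is hypothetical and may differ from what interpret_community does internally.

```python
import numpy as np


def sigmoid(z):
    """Map log-odds to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def scale_logodds_shap(shap_values, expected_logodds, probabilities):
    """Hypothetical helper: rescale log-odds attributions into probability space.

    Each row is multiplied by one factor so that it sums to
    probabilities - sigmoid(expected_logodds). This proportional scheme is an
    assumption for illustration, not necessarily the library's exact formula.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    target = np.asarray(probabilities, dtype=float) - sigmoid(expected_logodds)
    row_sums = shap_values.sum(axis=1)
    nonzero = np.abs(row_sums) > 1e-12
    # Rows whose attributions cancel out exactly are left at zero.
    factor = np.where(nonzero, target / np.where(nonzero, row_sums, 1.0), 0.0)
    return shap_values * factor[:, np.newaxis]


# Hypothetical binary-classification attributions in log-odds space.
base_logodds = 0.0
shap_logodds = np.array([[0.8, -0.3, 0.5],
                         [-1.0, 0.2, 0.1]])
# Here the probabilities come from the same model; with 'teacher_probability'
# they would presumably be the teacher model's probabilities instead.
probabilities = sigmoid(base_logodds + shap_logodds.sum(axis=1))

scaled = scale_logodds_shap(shap_logodds, base_logodds, probabilities)
print(scaled.sum(axis=1), probabilities - sigmoid(base_logodds))
```

Because the factor is shared across a row, the relative ranking and signs of the features are preserved; only the units change.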
- fit(dataset, labels, **kwargs)
Call tree fit to fit the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to train the model on.
labels (numpy.ndarray) – The labels to train the model on.
If multiclass=True, uses the parameters for DecisionTreeClassifier: Build a decision tree classifier from the training set (X, y).
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csc_matrix.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The target values (class labels) as integers or strings.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_input : bool, default=True
Allows bypassing several input checks. Don’t use this parameter unless you know what you’re doing.
Returns
- self : DecisionTreeClassifier
Fitted estimator.
Otherwise, if multiclass=False, uses the parameters for DecisionTreeRegressor: Build a decision tree regressor from the training set (X, y).
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csc_matrix.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The target values (real numbers). Use dtype=np.float64 and order='C' for maximum efficiency.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- check_input : bool, default=True
Allows bypassing several input checks. Don’t use this parameter unless you know what you’re doing.
Returns
- self : DecisionTreeRegressor
Fitted estimator.
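The mimic idea behind this class can be sketched with scikit-learn directly: train a decision tree on a teacher model's predictions and read its global importances from feature_importances_. The factory make_tree_surrogate, the random-forest teacher, and the synthetic data are assumptions for illustration; the factory only mirrors the multiclass switch described above (DecisionTreeClassifier when multiclass=True, DecisionTreeRegressor otherwise).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


def make_tree_surrogate(multiclass=False, random_state=123, **kwargs):
    """Hypothetical factory mirroring the multiclass switch described above."""
    estimator_cls = DecisionTreeClassifier if multiclass else DecisionTreeRegressor
    return estimator_cls(random_state=random_state, **kwargs)


# Synthetic data: feature 0 matters most, feature 1 enters non-linearly,
# features 2 and 3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

# The "teacher" is the black-box model being explained.
teacher = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
teacher_predictions = teacher.predict(X)

# The surrogate is trained on the teacher's outputs, not on the true labels.
surrogate = make_tree_surrogate(multiclass=False, max_depth=4).fit(X, teacher_predictions)
global_importances = surrogate.feature_importances_

print(global_importances)
# R^2 of the surrogate against the teacher: a rough measure of fidelity.
print(surrogate.score(X, teacher_predictions))
```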
- property model
Retrieve the underlying model.
- Returns
The decision tree model, either classifier or regressor.
- Return type
Union[sklearn.tree.DecisionTreeClassifier, sklearn.tree.DecisionTreeRegressor]
- predict(dataset, **kwargs)
Call tree predict to predict labels using the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict on.
- Returns
The predictions of the model.
- Return type
list
If multiclass=True, uses the parameters for DecisionTreeClassifier: Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
- check_input : bool, default=True
Allows bypassing several input checks. Don’t use this parameter unless you know what you’re doing.
Returns
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The predicted classes, or the predicted values.
Otherwise, if multiclass=False, uses the parameters for DecisionTreeRegressor: Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
- check_input : bool, default=True
Allows bypassing several input checks. Don’t use this parameter unless you know what you’re doing.
Returns
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
The predicted classes, or the predicted values.
- predict_proba(dataset, **kwargs)
Call tree predict_proba to predict probabilities using the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict probabilities on.
- Returns
The predicted probabilities of the model.
- Return type
list
If multiclass=True, uses the parameters for DecisionTreeClassifier: Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
- check_input : bool, default=True
Allows bypassing several input checks. Don’t use this parameter unless you know what you’re doing.
Returns
- proba : ndarray of shape (n_samples, n_classes) or list of n_outputs such arrays if n_outputs > 1
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
Otherwise predict_proba is not supported for regression or binary classification.
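To make "the fraction of samples of the same class in a leaf" concrete, the sketch below recomputes scikit-learn's DecisionTreeClassifier.predict_proba by hand from the leaf each sample lands in (found with apply). The toy data is hypothetical and the samples are unweighted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical one-feature dataset with two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 1, 0, 1, 1])

tree = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
leaf_ids = tree.apply(X)  # index of the leaf each sample ends up in

# For every leaf, count the training samples of each class and normalize.
manual = np.zeros((len(X), len(tree.classes_)))
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    counts = np.array([(y[in_leaf] == c).sum() for c in tree.classes_])
    manual[in_leaf] = counts / counts.sum()

print(manual)
print(tree.predict_proba(X))
```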
- class interpret_community.mimic.models.LGBMExplainableModel(multiclass=False, random_state=123, shap_values_output=ShapValuesOutput.DEFAULT, classification=True, **kwargs)
Bases:
interpret_community.mimic.models.explainable_model.BaseExplainableModel
- available_explanations = ['global', 'local']
- property expected_values
Use TreeExplainer to get the expected values.
- Returns
The expected values of the LightGBM tree model.
- Return type
- explain_global(**kwargs)
Use the LightGBM feature importances to get the global feature importances from the explainable model.
- Returns
The global explanation of feature importances.
- Return type
- explain_local(evaluation_examples, probabilities=None, **kwargs)
Use TreeExplainer to get the local feature importances from the trained explainable model.
- Parameters
evaluation_examples (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The evaluation examples to compute local feature importances for.
probabilities (numpy.ndarray) – If the output type is probability, the teacher model’s probabilities can be specified to scale the SHAP values.
- Returns
The local explanation of feature importances.
- Return type
Union[list, numpy.ndarray]
- static explainable_model_type()
Retrieve the model type.
- Returns
Tree explainable model type.
- Return type
- explainer_type = 'model'
LightGBM (a fast, high-performance framework based on decision trees) explainable model.
Please see the documentation for more details: https://github.com/Microsoft/LightGBM
Additional arguments to LGBMClassifier and LGBMRegressor can be passed through kwargs.
- Parameters
multiclass (bool) – Set to True to generate a multiclass model.
random_state (int) – Integer used to seed the model.
shap_values_output (interpret_community.common.constants.ShapValuesOutput) – The type of the output from explain_local when using TreeExplainer. Currently only the types ‘default’, ‘probability’ and ‘teacher_probability’ are supported. If ‘probability’ is specified, the raw log-odds values from the TreeExplainer are approximately scaled to probabilities.
classification (bool) – Indicates whether this is a classification or regression explanation.
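The text above says extra arguments "can be passed through kwargs" without showing how. Assuming the wrapper simply forwards them to the underlying estimator's constructor, the pattern can be sketched as follows. ForwardingSurrogate is a hypothetical class, and a scikit-learn DecisionTreeRegressor stands in for LGBMClassifier/LGBMRegressor so that lightgbm is not required.

```python
from sklearn.tree import DecisionTreeRegressor


class ForwardingSurrogate:
    """Hypothetical wrapper illustrating kwargs forwarding only.

    A scikit-learn estimator stands in for the LightGBM estimators here.
    """

    def __init__(self, estimator_cls, random_state=123, **kwargs):
        # Everything in kwargs goes straight to the underlying estimator.
        self._model = estimator_cls(random_state=random_state, **kwargs)

    @property
    def model(self):
        return self._model


wrapped = ForwardingSurrogate(DecisionTreeRegressor, max_depth=3, min_samples_leaf=5)
print(wrapped.model.get_params()["max_depth"])  # 3
```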
- fit(dataset, labels, **kwargs)
Call lightgbm fit to fit the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to train the model on.
labels (numpy.ndarray) – The labels to train the model on.
- property model
Retrieve the underlying model.
- Returns
The lightgbm model, either classifier or regressor.
- Return type
Union[LGBMClassifier, LGBMRegressor]
- predict(dataset, **kwargs)
Call lightgbm predict to predict labels using the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict on.
- Returns
The predictions of the model.
- Return type
- predict_proba(dataset, **kwargs)
Call lightgbm predict_proba to predict probabilities using the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict probabilities on.
- Returns
The predicted probabilities of the model.
- Return type
- class interpret_community.mimic.models.LinearExplainableModel(multiclass=False, random_state=123, classification=True, sparse_data=False, **kwargs)
Bases:
interpret_community.mimic.models.explainable_model.BaseExplainableModel
- available_explanations = ['global', 'local']
- property expected_values
Use LinearExplainer to get the expected values.
- Returns
The expected values of the linear model.
- Return type
- explain_global(**kwargs)
Use the fitted coefficients to get the global feature importances from the linear surrogate model.
- Returns
The global explanation of feature importances.
- Return type
- explain_local(evaluation_examples, **kwargs)
Use LinearExplainer to get the local feature importances from the trained explainable model.
- Parameters
evaluation_examples (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The evaluation examples to compute local feature importances for.
- Returns
The local explanation of feature importances.
- Return type
Union[list, numpy.ndarray]
- static explainable_model_type()
Retrieve the model type.
- Returns
Linear explainable model type.
- Return type
- explainer_type = 'model'
Linear explainable model.
- fit(dataset, labels, **kwargs)
Call linear fit to fit the explainable model.
Store the mean and covariance of the background data for local explanation.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to train the model on.
labels (numpy.ndarray) – The labels to train the model on.
If multiclass=True, uses the parameters for LogisticRegression:
Fit the model according to the given training data.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
Target vector relative to X.
- sample_weight : array-like of shape (n_samples,), default=None
Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
New in version 0.17: sample_weight support to LogisticRegression.
Returns
- self
Fitted estimator.
Notes
The SAGA solver supports both float64 and float32 bit arrays.
Otherwise, if multiclass=False, uses the parameters for LinearRegression:
Fit linear model.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data.
- y : array-like of shape (n_samples,) or (n_samples, n_targets)
Target values. Will be cast to X’s dtype if necessary.
- sample_weight : array-like of shape (n_samples,), default=None
Individual weights for each sample.
New in version 0.17: parameter sample_weight support to LinearRegression.
Returns
- self : object
Fitted estimator.
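The stored background mean is what makes linear local explanations cheap. Assuming features are treated as independent (shap's LinearExplainer can also account for feature correlations using the covariance, which is not shown here), each feature's contribution is coef * (x - mean), and the expected value is the model's output at the mean. A minimal sketch with scikit-learn's LinearRegression on hypothetical data; local_importances is an illustrative helper, not a library function.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical background data and teacher outputs (noise-free for clarity).
rng = np.random.default_rng(0)
background = rng.normal(size=(100, 3))
targets = background @ np.array([1.5, -2.0, 0.0]) + 0.5

linear = LinearRegression().fit(background, targets)
background_mean = background.mean(axis=0)  # kept at fit time for local explanations

# Expected value: the model's prediction at the background mean.
expected_value = float(linear.intercept_ + background_mean @ linear.coef_)


def local_importances(examples):
    """Hypothetical helper: contributions coef * (x - mean), assuming independent features."""
    return (np.asarray(examples) - background_mean) * linear.coef_


examples = background[:5]
local = local_importances(examples)
# Each row of contributions plus the expected value reproduces the prediction.
print(local.sum(axis=1) + expected_value)
print(linear.predict(examples))
```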
- property model
Retrieve the underlying model.
- Returns
The linear model, either classifier or regressor.
- Return type
Union[LogisticRegression, LinearRegression]
- predict(dataset, **kwargs)
Call linear predict to predict labels using the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict on.
- Returns
The predictions of the model.
- Return type
list
If multiclass=True, uses the parameters for LogisticRegression:
Predict class labels for samples in X.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The data matrix for which we want to get the predictions.
Returns
- y_pred : ndarray of shape (n_samples,)
Vector containing the class labels for each sample.
Otherwise, if multiclass=False, uses the parameters for LinearRegression:
Predict using the linear model.
Parameters
- X : array-like or sparse matrix, shape (n_samples, n_features)
Samples.
Returns
- C : array, shape (n_samples,)
Returns predicted values.
- predict_proba(dataset, **kwargs)
Call linear predict_proba to predict probabilities using the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict probabilities on.
- Returns
The predicted probabilities of the model.
- Return type
list
If multiclass=True, uses the parameters for LogisticRegression:
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
For a multi_class problem, if multi_class is set to be “multinomial”, the softmax function is used to find the predicted probability of each class. Otherwise, a one-vs-rest approach is used, i.e., the probability of each class is calculated assuming it to be positive using the logistic function, and these values are normalized across all the classes.
Parameters
- X : array-like of shape (n_samples, n_features)
Vector to be scored, where n_samples is the number of samples and n_features is the number of features.
Returns
- T : array-like of shape (n_samples, n_classes)
Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
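The two probability rules described above can be compared directly on raw decision scores. A numpy-only sketch with hypothetical scores for two samples and three classes: softmax for the multinomial case, and per-class logistic values renormalized to sum to 1 for the one-vs-rest case. These helper functions are illustrative, not scikit-learn APIs.

```python
import numpy as np


def softmax_probabilities(scores):
    """Multinomial case: softmax over each row of decision scores."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def ovr_probabilities(scores):
    """One-vs-rest case: logistic per class, then normalize each row to sum to 1."""
    per_class = 1.0 / (1.0 + np.exp(-scores))
    return per_class / per_class.sum(axis=1, keepdims=True)


# Hypothetical decision scores for two samples and three classes.
scores = np.array([[2.0, 0.0, -1.0],
                   [0.5, 0.5, 0.5]])

print(softmax_probabilities(scores).round(3))
print(ovr_probabilities(scores).round(3))
```

Both rules give valid distributions with the same predicted class here; they differ in how sharply they separate the classes.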
Otherwise predict_proba is not supported for regression or binary classification.
- class interpret_community.mimic.models.SGDExplainableModel(multiclass=False, random_state=123, classification=True, **kwargs)
Bases:
interpret_community.mimic.models.explainable_model.BaseExplainableModel
- available_explanations = ['global', 'local']
- property expected_values
Use LinearExplainer to get the expected values.
- Returns
The expected values of the linear model.
- Return type
- explain_global(**kwargs)
Use the fitted coefficients to get the global feature importances from the SGD surrogate model.
- Returns
The global explanation of feature importances.
- Return type
- explain_local(evaluation_examples, **kwargs)
Use LinearExplainer to get the local feature importances from the trained explainable model.
- Parameters
evaluation_examples (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The evaluation examples to compute local feature importances for.
- Returns
The local explanation of feature importances.
- Return type
Union[list, numpy.ndarray]
- explainer_type = 'model'
Stochastic Gradient Descent explainable model.
- fit(dataset, labels, **kwargs)
Call linear fit to fit the explainable model.
Store the mean and covariance of the background data for local explanation.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to train the model on.
labels (numpy.ndarray) – The labels to train the model on.
If multiclass=True, uses the parameters for SGDClassifier: Fit linear model with Stochastic Gradient Descent.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
Training data.
- y : ndarray of shape (n_samples,)
Target values.
- coef_init : ndarray of shape (n_classes, n_features), default=None
The initial coefficients to warm-start the optimization.
- intercept_init : ndarray of shape (n_classes,), default=None
The initial intercept to warm-start the optimization.
- sample_weight : array-like, shape (n_samples,), default=None
Weights applied to individual samples. If not provided, uniform weights are assumed. These weights will be multiplied with class_weight (passed through the constructor) if class_weight is specified.
Returns
- self : object
Returns an instance of self.
Otherwise, if multiclass=False, uses the parameters for SGDRegressor: Fit linear model with Stochastic Gradient Descent.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
Training data.
- y : ndarray of shape (n_samples,)
Target values.
- coef_init : ndarray of shape (n_features,), default=None
The initial coefficients to warm-start the optimization.
- intercept_init : ndarray of shape (1,), default=None
The initial intercept to warm-start the optimization.
- sample_weight : array-like, shape (n_samples,), default=None
Weights applied to individual samples (1. for unweighted).
Returns
- self : object
Fitted SGDRegressor estimator.
- property model
Retrieve the underlying model.
- Returns
The SGD model, either classifier or regressor.
- Return type
Union[SGDClassifier, SGDRegressor]
- predict(dataset, **kwargs)
Call SGD predict to predict labels using the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict on.
- Returns
The predictions of the model.
- Return type
list
If multiclass=True, uses the parameters for SGDClassifier:
Predict class labels for samples in X.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
The data matrix for which we want to get the predictions.
Returns
- y_pred : ndarray of shape (n_samples,)
Vector containing the class labels for each sample.
Otherwise, if multiclass=False, uses the parameters for SGDRegressor: Predict using the linear model.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
Input data.
Returns
- ndarray of shape (n_samples,)
Predicted target values per element in X.
- predict_proba(dataset, **kwargs)
Call SGD predict_proba to predict probabilities using the explainable model.
- Parameters
dataset (numpy.ndarray or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict probabilities on.
- Returns
The predicted probabilities of the model.
- Return type
list
If multiclass=True, uses the parameters for SGDClassifier: Probability estimates.
This method is only available for log loss and modified Huber loss.
Multiclass probability estimates are derived from binary (one-vs.-rest) estimates by simple normalization, as recommended by Zadrozny and Elkan.
Binary probability estimates for loss="modified_huber" are given by (clip(decision_function(X), -1, 1) + 1) / 2. For other loss functions it is necessary to perform proper probability calibration by wrapping the classifier with CalibratedClassifierCV instead.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
Input data for prediction.
Returns
- ndarray of shape (n_samples, n_classes)
Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
References
Zadrozny and Elkan, “Transforming classifier scores into multiclass probability estimates”, SIGKDD’02, https://dl.acm.org/doi/pdf/10.1145/775047.775151
The justification for the formula in the loss="modified_huber" case is in Appendix B of: http://jmlr.csail.mit.edu/papers/volume2/zhang02c/zhang02c.pdf
Otherwise predict_proba is not supported for regression or binary classification.
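The modified Huber formula above can be checked against scikit-learn directly. The sketch uses a plain SGDClassifier on synthetic make_classification data (an assumption for illustration, not the explainable-model wrapper) and compares the positive-class column of predict_proba with (clip(decision_function(X), -1, 1) + 1) / 2.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic binary problem.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SGDClassifier(loss="modified_huber", random_state=0).fit(X, y)

# Binary case: positive-class probability from clipped decision scores.
scores = clf.decision_function(X)
manual_positive = (np.clip(scores, -1, 1) + 1) / 2

print(np.allclose(clf.predict_proba(X)[:, 1], manual_positive))
```

Scores beyond the margin (|score| >= 1) map to probabilities of exactly 0 or 1, which is one reason calibration is recommended when well-spread probabilities matter.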