interpret_community.mimic.models package
Module for explainable surrogate models.
-
class interpret_community.mimic.models.BaseExplainableModel(**kwargs)
Bases: interpret_community.common.chained_identity.ChainedIdentity
The base class for models that can be explained.
-
expected_values
Abstract property to get the expected values.
-
explain_global(**kwargs)
Abstract method to get the global feature importances from the trained explainable model.
-
explain_local(evaluation_examples, **kwargs)
Abstract method to get the local feature importances from the trained explainable model.
-
static explainable_model_type(self)
Retrieve the model type.
-
fit(**kwargs)
Abstract method to fit the explainable model.
-
model
Abstract property to get the underlying model.
-
predict(dataset, **kwargs)
Abstract method to predict labels using the explainable model.
-
predict_proba(dataset, **kwargs)
Abstract method to predict probabilities using the explainable model.
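The abstract members above form the contract that each surrogate in this package implements. The sketch below is a minimal, self-contained version of that contract. ExplainableModelInterface and OneFeatureLinearSurrogate are hypothetical names that only reuse the documented member names. The sketch does not import interpret_community, and the real base class may impose further requirements.

```python
from abc import ABC, abstractmethod


class ExplainableModelInterface(ABC):
    """Hypothetical stand-in mirroring the abstract members listed above.

    Not interpret_community's BaseExplainableModel; only the member names match.
    """

    @abstractmethod
    def fit(self, dataset, labels, **kwargs): ...

    @abstractmethod
    def predict(self, dataset, **kwargs): ...

    @abstractmethod
    def predict_proba(self, dataset, **kwargs): ...

    @abstractmethod
    def explain_global(self, **kwargs): ...

    @abstractmethod
    def explain_local(self, evaluation_examples, **kwargs): ...

    @property
    @abstractmethod
    def expected_values(self): ...

    @property
    @abstractmethod
    def model(self): ...


class OneFeatureLinearSurrogate(ExplainableModelInterface):
    """Hypothetical surrogate: least-squares line y = w * x + b on feature 0."""

    def fit(self, dataset, labels, **kwargs):
        xs = [row[0] for row in dataset]
        n = len(xs)
        self._mean_x = sum(xs) / n
        mean_y = sum(labels) / n
        sxx = sum((x - self._mean_x) ** 2 for x in xs)
        sxy = sum((x - self._mean_x) * (y - mean_y) for x, y in zip(xs, labels))
        self._w = sxy / sxx
        self._b = mean_y - self._w * self._mean_x
        return self

    def predict(self, dataset, **kwargs):
        return [self._w * row[0] + self._b for row in dataset]

    def predict_proba(self, dataset, **kwargs):
        raise NotImplementedError("a regression surrogate has no probabilities")

    def explain_global(self, **kwargs):
        # Magnitude of the single coefficient as its global importance.
        return [abs(self._w)]

    def explain_local(self, evaluation_examples, **kwargs):
        # Contribution of feature 0 relative to its training mean.
        return [[self._w * (row[0] - self._mean_x)] for row in evaluation_examples]

    @property
    def expected_values(self):
        # Prediction at the mean input.
        return [self._w * self._mean_x + self._b]

    @property
    def model(self):
        return {"coef": self._w, "intercept": self._b}


surrogate = OneFeatureLinearSurrogate().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
print(surrogate.predict([[2.0]]))  # [5.0]
```

The local contribution is measured against the prediction at the mean input. Adding expected_values to the local explanation therefore reproduces predict. SHAP-style explanations such as TreeExplainer and LinearExplainer satisfy the same additivity property.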
-
-
class interpret_community.mimic.models.LGBMExplainableModel(multiclass=False, random_state=123, shap_values_output=<ShapValuesOutput.DEFAULT: 'default'>, classification=True, **kwargs)
Bases: interpret_community.mimic.models.explainable_model.BaseExplainableModel
-
available_explanations = ['global', 'local']
-
expected_values
Use TreeExplainer to get the expected values.
Returns: The expected values of the LightGBM tree model.
Return type: list
-
explain_global(**kwargs)
Call LightGBM feature importances to get the global feature importances from the explainable model.
Returns: The global explanation of feature importances.
Return type: numpy.ndarray
-
explain_local(evaluation_examples, probabilities=None, **kwargs)
Use TreeExplainer to get the local feature importances from the trained explainable model.
Parameters:
- evaluation_examples (numpy or scipy array) – The evaluation examples to compute local feature importances for.
- probabilities (numpy.ndarray) – If shap_values_output is a probability output type, the teacher model’s probabilities can be specified for scaling the SHAP values.
Returns: The local explanation of feature importances.
Return type: Union[list, numpy.ndarray]
-
static explainable_model_type(self)
Retrieve the model type.
Returns: Tree explainable model type.
Return type: ExplainableModelType
-
explainer_type = 'model'
LightGBM (fast, high-performance framework based on decision trees) explainable model.
Please see the documentation for more details: https://github.com/Microsoft/LightGBM
Additional arguments to LGBMClassifier and LGBMRegressor can be passed through kwargs.
Parameters:
- multiclass (bool) – Set to True to generate a multiclass model.
- random_state (int) – Int to seed the model.
- shap_values_output (interpret_community.common.constants.ShapValuesOutput) – The type of the output from explain_local when using TreeExplainer. Currently only the types ‘default’, ‘probability’ and ‘teacher_probability’ are supported. If ‘probability’ is specified, the raw log-odds values from the TreeExplainer are approximately scaled to probabilities.
- classification (bool) – Indicates whether this is a classification or regression explanation.
-
fit(dataset, labels, **kwargs)
Call LightGBM fit to fit the explainable model.
Parameters:
- dataset (numpy or scipy array) – The dataset to train the model on.
- labels (numpy or scipy array) – The labels to train the model on.
If multiclass=True, uses the parameters for LGBMClassifier: Build a gradient boosting model from the training set (X, y).
Parameters
- X : array-like or sparse matrix of shape = [n_samples, n_features]
- Input feature matrix.
- y : array-like of shape = [n_samples]
- The target values (class labels in classification, real numbers in regression).
- sample_weight : array-like of shape = [n_samples] or None, optional (default=None)
- Weights of training data.
- init_score : array-like of shape = [n_samples] or None, optional (default=None)
- Init score of training data.
- eval_set : list or None, optional (default=None)
- A list of (X, y) tuple pairs to use as validation sets.
- eval_names : list of strings or None, optional (default=None)
- Names of eval_set.
- eval_sample_weight : list of arrays or None, optional (default=None)
- Weights of eval data.
- eval_class_weight : list or None, optional (default=None)
- Class weights of eval data.
- eval_init_score : list of arrays or None, optional (default=None)
- Init score of eval data.
- eval_metric : string, callable, list or None, optional (default=None)
- If string, it should be a built-in evaluation metric to use.
If callable, it should be a custom evaluation metric; see the note below for more details.
If list, it can be a list of built-in metrics, a list of custom evaluation metrics, or a mix of both.
In either case, the metric from the model parameters will be evaluated and used as well.
Default: ‘l2’ for LGBMRegressor, ‘logloss’ for LGBMClassifier, ‘ndcg’ for LGBMRanker.
- early_stopping_rounds : int or None, optional (default=None)
- Activates early stopping. The model will train until the validation score stops improving.
The validation score needs to improve at least once every early_stopping_rounds round(s) to continue training. Requires at least one validation dataset and one metric. If there is more than one metric, all of them are checked; the training data is ignored in any case. To check only the first metric, set the first_metric_only parameter to True in the additional parameters kwargs of the model constructor.
- verbose : bool or int, optional (default=True)
- Requires at least one evaluation dataset. If True, the eval metric on the eval set is printed at each boosting stage. If int, the eval metric on the eval set is printed at every verbose boosting stage. The last boosting stage, or the boosting stage found by using early_stopping_rounds, is also printed.
Example: with verbose = 4 and at least one item in eval_set, an evaluation metric is printed every 4 (instead of 1) boosting stages.
- feature_name : list of strings or ‘auto’, optional (default=’auto’)
- Feature names. If ‘auto’ and data is a pandas DataFrame, the data column names are used.
- categorical_feature : list of strings or int, or ‘auto’, optional (default=’auto’)
- Categorical features.
If list of int, interpreted as indices.
If list of strings, interpreted as feature names (feature_name must be specified as well).
If ‘auto’ and data is a pandas DataFrame, pandas unordered categorical columns are used.
All values in categorical features should be less than the int32 max value (2147483647). Large values could be memory consuming; consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values. The output cannot be monotonically constrained with respect to a categorical feature.
- callbacks : list of callback functions or None, optional (default=None)
- List of callback functions that are applied at each iteration. See Callbacks in the Python API for more information.
- init_model : string, Booster, LGBMModel or None, optional (default=None)
- Filename of a LightGBM model, Booster instance or LGBMModel instance used for continued training.
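The early_stopping_rounds rule can be read as a patience counter. The sketch below illustrates that rule on a precomputed list of validation scores and is not LightGBM's implementation. early_stopping_iteration is a hypothetical helper, and it assumes a single metric.

```python
def early_stopping_iteration(val_scores, early_stopping_rounds, higher_is_better=False):
    """Return the 1-based best iteration under a patience rule.

    Hypothetical helper: training stops once the validation score has not
    improved for `early_stopping_rounds` consecutive rounds.
    """
    best_score = None
    best_iter = 0
    for i, score in enumerate(val_scores, start=1):
        improved = best_score is None or (
            score > best_score if higher_is_better else score < best_score
        )
        if improved:
            best_score, best_iter = score, i
        elif i - best_iter >= early_stopping_rounds:
            break  # no improvement for early_stopping_rounds rounds: stop
    return best_iter


# A loss (lower is better) that stalls after round 3.
print(early_stopping_iteration([0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5], 3))  # 3
```

With early_stopping_rounds=3, training stops at round 6 and the improvement at round 7 is never seen. A larger patience value would have reached it.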
Returns
- self : object
- Returns self.
Note
A custom eval function expects a callable with one of the following signatures: func(y_true, y_pred), func(y_true, y_pred, weight) or func(y_true, y_pred, weight, group), and returns (eval_name, eval_result, is_higher_better) or a list of (eval_name, eval_result, is_higher_better):
- y_true : array-like of shape = [n_samples]
- The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multiclass task)
- The predicted values.
- weight : array-like of shape = [n_samples]
- The weight of samples.
- group : array-like
- Group/query data, used for ranking task.
- eval_name : string
- The name of the evaluation function (without whitespace).
- eval_result : float
- The eval result.
- is_higher_better : bool
- Whether a higher eval result is better, e.g. AUC is is_higher_better.
For a binary task, y_pred is the probability of the positive class (or the margin in case of a custom objective). For a multiclass task, y_pred is grouped by class_id first, then by row_id. To get the y_pred of the i-th row in the j-th class, access y_pred[j * num_data + i].
Otherwise, if multiclass=False, uses the parameters for LGBMRegressor: Build a gradient boosting model from the training set (X, y).
Parameters
- X : array-like or sparse matrix of shape = [n_samples, n_features]
- Input feature matrix.
- y : array-like of shape = [n_samples]
- The target values (class labels in classification, real numbers in regression).
- sample_weight : array-like of shape = [n_samples] or None, optional (default=None)
- Weights of training data.
- init_score : array-like of shape = [n_samples] or None, optional (default=None)
- Init score of training data.
- eval_set : list or None, optional (default=None)
- A list of (X, y) tuple pairs to use as validation sets.
- eval_names : list of strings or None, optional (default=None)
- Names of eval_set.
- eval_sample_weight : list of arrays or None, optional (default=None)
- Weights of eval data.
- eval_init_score : list of arrays or None, optional (default=None)
- Init score of eval data.
- eval_metric : string, callable, list or None, optional (default=None)
- If string, it should be a built-in evaluation metric to use.
If callable, it should be a custom evaluation metric; see the note below for more details.
If list, it can be a list of built-in metrics, a list of custom evaluation metrics, or a mix of both.
In either case, the metric from the model parameters will be evaluated and used as well.
Default: ‘l2’ for LGBMRegressor, ‘logloss’ for LGBMClassifier, ‘ndcg’ for LGBMRanker.
- early_stopping_rounds : int or None, optional (default=None)
- Activates early stopping. The model will train until the validation score stops improving.
The validation score needs to improve at least once every early_stopping_rounds round(s) to continue training. Requires at least one validation dataset and one metric. If there is more than one metric, all of them are checked; the training data is ignored in any case. To check only the first metric, set the first_metric_only parameter to True in the additional parameters kwargs of the model constructor.
- verbose : bool or int, optional (default=True)
- Requires at least one evaluation dataset. If True, the eval metric on the eval set is printed at each boosting stage. If int, the eval metric on the eval set is printed at every verbose boosting stage. The last boosting stage, or the boosting stage found by using early_stopping_rounds, is also printed.
Example: with verbose = 4 and at least one item in eval_set, an evaluation metric is printed every 4 (instead of 1) boosting stages.
- feature_name : list of strings or ‘auto’, optional (default=’auto’)
- Feature names. If ‘auto’ and data is a pandas DataFrame, the data column names are used.
- categorical_feature : list of strings or int, or ‘auto’, optional (default=’auto’)
- Categorical features.
If list of int, interpreted as indices.
If list of strings, interpreted as feature names (feature_name must be specified as well).
If ‘auto’ and data is a pandas DataFrame, pandas unordered categorical columns are used.
All values in categorical features should be less than the int32 max value (2147483647). Large values could be memory consuming; consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values. The output cannot be monotonically constrained with respect to a categorical feature.
- callbacks : list of callback functions or None, optional (default=None)
- List of callback functions that are applied at each iteration. See Callbacks in the Python API for more information.
- init_model : string, Booster, LGBMModel or None, optional (default=None)
- Filename of a LightGBM model, Booster instance or LGBMModel instance used for continued training.
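The categorical_feature advice above recommends consecutive integers starting from zero and treats negative values as missing. Following that advice, string categories could be encoded before fitting as sketched below. encode_categories is a hypothetical helper and is not part of LightGBM or interpret_community. It assigns codes in first-seen order, which is an arbitrary choice.

```python
def encode_categories(values, missing=None):
    """Map category labels to consecutive ints from 0; missing -> -1.

    Hypothetical preprocessing helper (not a LightGBM API).
    """
    codes = {}
    encoded = []
    for v in values:
        if v is missing:
            encoded.append(-1)  # negative values are treated as missing
            continue
        if v not in codes:
            codes[v] = len(codes)  # next unused consecutive integer
        encoded.append(codes[v])
    return encoded, codes


encoded, codes = encode_categories(["red", "blue", None, "red"])
print(encoded)  # [0, 1, -1, 0]
```

The same codes mapping must be reused when encoding data for prediction. Otherwise, the integers no longer refer to the same categories.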
Returns
- self : object
- Returns self.
Note
A custom eval function expects a callable with one of the following signatures: func(y_true, y_pred), func(y_true, y_pred, weight) or func(y_true, y_pred, weight, group), and returns (eval_name, eval_result, is_higher_better) or a list of (eval_name, eval_result, is_higher_better):
- y_true : array-like of shape = [n_samples]
- The target values.
- y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multiclass task)
- The predicted values.
- weight : array-like of shape = [n_samples]
- The weight of samples.
- group : array-like
- Group/query data, used for ranking task.
- eval_name : string
- The name of the evaluation function (without whitespace).
- eval_result : float
- The eval result.
- is_higher_better : bool
- Whether a higher eval result is better, e.g. AUC is is_higher_better.
For a binary task, y_pred is the probability of the positive class (or the margin in case of a custom objective). For a multiclass task, y_pred is grouped by class_id first, then by row_id. To get the y_pred of the i-th row in the j-th class, access y_pred[j * num_data + i].
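The note above describes a two-argument signature for custom evaluation metrics. The sketch below shows what such a metric might look like. multiclass_error is a hypothetical name, and the sketch returns the (eval_name, eval_result, is_higher_better) tuple. It assumes the flat, class-major y_pred layout documented above. Newer LightGBM releases may pass a 2-D array for multiclass tasks instead, so check the installed version.

```python
def multiclass_error(y_true, y_pred):
    """Custom eval metric with the func(y_true, y_pred) signature.

    Assumes the flat, class-major layout: the score of row i for
    class j is y_pred[j * num_data + i].
    """
    num_data = len(y_true)
    num_class = len(y_pred) // num_data
    wrong = 0
    for i in range(num_data):
        scores = [y_pred[j * num_data + i] for j in range(num_class)]
        predicted = max(range(num_class), key=scores.__getitem__)
        if predicted != y_true[i]:
            wrong += 1
    # Lower error is better, so is_higher_better is False.
    return "custom_error", wrong / num_data, False


# 3 rows x 3 classes, laid out class by class.
y_pred = [0.7, 0.1, 0.2,   # class 0 scores for rows 0, 1, 2
          0.2, 0.3, 0.5,   # class 1
          0.1, 0.6, 0.3]   # class 2
print(multiclass_error([0, 2, 0], y_pred))
```

Row 2 scores highest for class 1 but is labeled 0, so one of the three rows counts as an error.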
-
model
Retrieve the underlying model.
Returns: The LightGBM model, either classifier or regressor.
Return type: Union[LGBMClassifier, LGBMRegressor]
-
predict(dataset, **kwargs)
Call LightGBM predict to predict labels using the explainable model.
Parameters:
- dataset (numpy or scipy array) – The dataset to predict on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for LGBMClassifier: Return the predicted value for each sample.
Parameters
- X : array-like or sparse matrix of shape = [n_samples, n_features]
- Input features matrix.
- raw_score : bool, optional (default=False)
- Whether to predict raw scores.
- start_iteration : int, optional (default=0)
- Start index of the iteration to predict. If <= 0, starts from the first iteration.
- num_iteration : int or None, optional (default=None)
- Total number of iterations used in the prediction.
If None, the best iteration is used if it exists and start_iteration <= 0; otherwise, all iterations from start_iteration are used (no limits).
If <= 0, all iterations from start_iteration are used (no limits).
- pred_leaf : bool, optional (default=False)
- Whether to predict leaf index.
- pred_contrib : bool, optional (default=False)
- Whether to predict feature contributions.
Note
If you want more explanations for your model’s predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap). Note that unlike the shap package, with pred_contrib we return a matrix with an extra column, where the last column is the expected value.
- kwargs
- Other parameters for the prediction.
Returns
- predicted_result : array-like of shape = [n_samples] or shape = [n_samples, n_classes]
- The predicted values.
- X_leaves : array-like of shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]
- If pred_leaf=True, the predicted leaf of every tree for each sample.
- X_SHAP_values : array-like of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
- If pred_contrib=True, the feature contributions for each sample.
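The num_iteration and start_iteration rules above can be restated as a small function that reports which boosting iterations a prediction uses. iterations_used is hypothetical and does not call LightGBM. It assumes 1-based iteration numbers in its result and clips the range at the total number of trained iterations.

```python
def iterations_used(total_iterations, start_iteration=0, num_iteration=None,
                    best_iteration=None):
    """Return the (first, last) 1-based iterations a prediction would use.

    Hypothetical restatement of the documented rules; best_iteration=None
    means no best iteration was recorded.
    """
    first = max(start_iteration, 0) + 1  # start_iteration <= 0: from the first
    if num_iteration is None and best_iteration is not None and start_iteration <= 0:
        return 1, best_iteration
    if num_iteration is None or num_iteration <= 0:
        return first, total_iterations  # all iterations from start_iteration
    return first, min(first + num_iteration - 1, total_iterations)


print(iterations_used(100, best_iteration=40))                   # (1, 40)
print(iterations_used(100, start_iteration=10, best_iteration=40))  # (11, 100)
print(iterations_used(100, start_iteration=10, num_iteration=20))   # (11, 30)
```

The second call shows that a recorded best iteration is ignored once start_iteration is positive and num_iteration is left as None.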
Otherwise, if multiclass=False, uses the parameters for LGBMRegressor: Return the predicted value for each sample.
Parameters
- X : array-like or sparse matrix of shape = [n_samples, n_features]
- Input features matrix.
- raw_score : bool, optional (default=False)
- Whether to predict raw scores.
- start_iteration : int, optional (default=0)
- Start index of the iteration to predict. If <= 0, starts from the first iteration.
- num_iteration : int or None, optional (default=None)
- Total number of iterations used in the prediction.
If None, the best iteration is used if it exists and start_iteration <= 0; otherwise, all iterations from start_iteration are used (no limits).
If <= 0, all iterations from start_iteration are used (no limits).
- pred_leaf : bool, optional (default=False)
- Whether to predict leaf index.
- pred_contrib : bool, optional (default=False)
- Whether to predict feature contributions.
Note
If you want more explanations for your model’s predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap). Note that unlike the shap package, with pred_contrib we return a matrix with an extra column, where the last column is the expected value.
- kwargs
- Other parameters for the prediction.
Returns
- predicted_result : array-like of shape = [n_samples] or shape = [n_samples, n_classes]
- The predicted values.
- X_leaves : array-like of shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]
- If pred_leaf=True, the predicted leaf of every tree for each sample.
- X_SHAP_values : array-like of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
- If pred_contrib=True, the feature contributions for each sample.
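With pred_contrib=True, each row holds the feature contributions followed by the expected value. For multiclass models, the row holds one such block per class. The sketch below splits a row along that layout. split_contrib_row is a hypothetical helper, and the numbers are made up for illustration. The per-class block ordering is an assumption based on the (n_features + 1) * n_classes shape.

```python
def split_contrib_row(contrib_row, n_classes=1):
    """Split one pred_contrib row into (contributions, expected_value) per class.

    Assumes each class occupies a contiguous block of n_features + 1 columns
    whose last column is the expected value.
    """
    width = len(contrib_row) // n_classes  # n_features + 1
    blocks = [contrib_row[k * width:(k + 1) * width] for k in range(n_classes)]
    return [(block[:-1], block[-1]) for block in blocks]


# Made-up row for a single-output model with 3 features.
((contributions, expected_value),) = split_contrib_row([0.25, -0.5, 1.0, 2.0])
raw_score = sum(contributions) + expected_value
print(raw_score)  # 2.75
```

Summing the contributions and the expected value recovers the raw (untransformed) score for that row, which is SHAP's local-accuracy property.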
-
predict_proba(dataset, **kwargs)
Call LightGBM predict_proba to predict probabilities using the explainable model.
Parameters:
- dataset (numpy or scipy array) – The dataset to predict probabilities on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for LGBMClassifier: Return the predicted probability for each class for each sample.
Parameters
- X : array-like or sparse matrix of shape = [n_samples, n_features]
- Input features matrix.
- raw_score : bool, optional (default=False)
- Whether to predict raw scores.
- start_iteration : int, optional (default=0)
- Start index of the iteration to predict. If <= 0, starts from the first iteration.
- num_iteration : int or None, optional (default=None)
- Total number of iterations used in the prediction.
If None, the best iteration is used if it exists and start_iteration <= 0; otherwise, all iterations from start_iteration are used (no limits).
If <= 0, all iterations from start_iteration are used (no limits).
- pred_leaf : bool, optional (default=False)
- Whether to predict leaf index.
- pred_contrib : bool, optional (default=False)
- Whether to predict feature contributions.
Note
If you want more explanations for your model’s predictions using SHAP values, such as SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap). Note that unlike the shap package, with pred_contrib we return a matrix with an extra column, where the last column is the expected value.
- kwargs
- Other parameters for the prediction.
Returns
- predicted_probability : array-like of shape = [n_samples, n_classes]
- The predicted probability for each class for each sample.
- X_leaves : array-like of shape = [n_samples, n_trees * n_classes]
- If pred_leaf=True, the predicted leaf of every tree for each sample.
- X_SHAP_values : array-like of shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
- If pred_contrib=True, the feature contributions for each sample.
Otherwise, predict_proba is not supported for regression or binary classification.
-
-
class interpret_community.mimic.models.SGDExplainableModel(multiclass=False, random_state=123, classification=True, **kwargs)
Bases: interpret_community.mimic.models.explainable_model.BaseExplainableModel
-
available_explanations = ['global', 'local']
-
expected_values
Use LinearExplainer to get the expected values.
Returns: The expected values of the linear model.
Return type: list
-
explain_global(**kwargs)
Use the model coefficients (coef) to get the global feature importances from the SGD surrogate model.
Returns: The global explanation of feature importances.
Return type: list
-
explain_local(evaluation_examples, **kwargs)
Use LinearExplainer to get the local feature importances from the trained explainable model.
Parameters: evaluation_examples (numpy or scipy array) – The evaluation examples to compute local feature importances for.
Returns: The local explanation of feature importances.
Return type: Union[list, numpy.ndarray]
-
explainer_type = 'model'
Stochastic Gradient Descent explainable model.
Parameters:
-
fit(dataset, labels, **kwargs)
Call linear fit to fit the explainable model.
Store the mean and covariance of the background data for local explanation.
Parameters:
- dataset (numpy or scipy array) – The dataset to train the model on.
- labels (numpy or scipy array) – The labels to train the model on.
If multiclass=True, uses the parameters for SGDClassifier: Fit linear model with Stochastic Gradient Descent.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
- Training data.
- y : ndarray of shape (n_samples,)
- Target values.
- coef_init : ndarray of shape (n_classes, n_features), default=None
- The initial coefficients to warm-start the optimization.
- intercept_init : ndarray of shape (n_classes,), default=None
- The initial intercept to warm-start the optimization.
- sample_weight : array-like, shape (n_samples,), default=None
- Weights applied to individual samples. If not provided, uniform weights are assumed. These weights will be multiplied with class_weight (passed through the constructor) if class_weight is specified.
Returns
- self :
- Returns an instance of self.
Otherwise, if multiclass=False, uses the parameters for SGDRegressor: Fit linear model with Stochastic Gradient Descent.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
- Training data.
- y : ndarray of shape (n_samples,)
- Target values.
- coef_init : ndarray of shape (n_features,), default=None
- The initial coefficients to warm-start the optimization.
- intercept_init : ndarray of shape (1,), default=None
- The initial intercept to warm-start the optimization.
- sample_weight : array-like, shape (n_samples,), default=None
- Weights applied to individual samples (1. for unweighted).
Returns
- self :
- Returns an instance of self.
-
model
Retrieve the underlying model.
Returns: The SGD model, either classifier or regressor.
Return type: Union[SGDClassifier, SGDRegressor]
-
predict(dataset, **kwargs)
Call SGD predict to predict labels using the explainable model.
Parameters:
- dataset (numpy or scipy array) – The dataset to predict on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for SGDClassifier:
Predict class labels for samples in X.
Parameters
- X : array_like or sparse matrix, shape (n_samples, n_features)
- Samples.
Returns
- C : array, shape [n_samples]
- Predicted class label per sample.
Otherwise, if multiclass=False, uses the parameters for SGDRegressor: Predict using the linear model.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
Returns
- ndarray of shape (n_samples,)
- Predicted target values per element in X.
-
predict_proba(dataset, **kwargs)
Call SGD predict_proba to predict probabilities using the explainable model.
Parameters:
- dataset (numpy or scipy array) – The dataset to predict probabilities on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for SGDClassifier: Probability estimates.
This method is only available for log loss and modified Huber loss.
Multiclass probability estimates are derived from binary (one-vs.-rest) estimates by simple normalization, as recommended by Zadrozny and Elkan.
Binary probability estimates for loss="modified_huber" are given by (clip(decision_function(X), -1, 1) + 1) / 2. For other loss functions it is necessary to perform proper probability calibration by wrapping the classifier with CalibratedClassifierCV instead.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
- Input data for prediction.
Returns
- ndarray of shape (n_samples, n_classes)
- Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
References
Zadrozny and Elkan, “Transforming classifier scores into multiclass probability estimates”, SIGKDD’02, http://www.research.ibm.com/people/z/zadrozny/kdd2002Transf.pdf
The justification for the formula in the loss="modified_huber" case is in Appendix B of: http://jmlr.csail.mit.edu/papers/volume2/zhang02c/zhang02c.pdf
Otherwise, predict_proba is not supported for regression or binary classification.
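The modified Huber probability formula and the one-vs.-rest normalization described above can be combined for a single row of decision scores. modified_huber_proba is a hypothetical helper that operates on plain lists. When every clipped estimate is zero, it falls back to a uniform distribution; this handling of a zero-sum row is an assumption.

```python
def modified_huber_proba(decision_scores):
    """Probabilities from one row of one-vs.-rest decision scores.

    Each binary estimate is (clip(d, -1, 1) + 1) / 2; the row is then
    normalized so the class probabilities sum to 1.
    """
    probs = [(min(max(d, -1.0), 1.0) + 1.0) / 2.0 for d in decision_scores]
    total = sum(probs)
    if total == 0:
        # Every class clipped to 0: fall back to a uniform distribution.
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in probs]


print(modified_huber_proba([-2.0, -1.5]))  # [0.5, 0.5]
```

For scores [2.0, 0.0, -3.0], the binary estimates are [1.0, 0.5, 0.0]. Normalization turns them into roughly [0.667, 0.333, 0.0].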
-
-
class interpret_community.mimic.models.LinearExplainableModel(multiclass=False, random_state=123, classification=True, sparse_data=False, **kwargs)
Bases: interpret_community.mimic.models.explainable_model.BaseExplainableModel
-
available_explanations = ['global', 'local']
-
expected_values
Use LinearExplainer to get the expected values.
Returns: The expected values of the linear model.
Return type: list
-
explain_global(**kwargs)
Use the model coefficients (coef) to get the global feature importances from the linear surrogate model.
Returns: The global explanation of feature importances.
Return type: list
-
explain_local(evaluation_examples, **kwargs)
Use LinearExplainer to get the local feature importances from the trained explainable model.
Parameters: evaluation_examples (numpy or scipy array) – The evaluation examples to compute local feature importances for.
Returns: The local explanation of feature importances.
Return type: Union[list, numpy.ndarray]
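For a linear surrogate, local importances have a closed form when features are treated as independent: phi_i = coef_i * (x_i - mean_i). The expected value is then the model output at the feature means. The sketch below assumes that independent (interventional) setting. Because fit also stores the covariance of the background data, the LinearExplainer configuration used here may account for feature correlation and produce different values. linear_local_importances is a hypothetical helper.

```python
def linear_local_importances(coef, intercept, feature_means, x):
    """Per-feature contributions of a linear model for one example.

    Assumes independent features: phi_i = coef_i * (x_i - mean_i).
    """
    expected_value = intercept + sum(c * m for c, m in zip(coef, feature_means))
    phi = [c * (xi - m) for c, xi, m in zip(coef, x, feature_means)]
    return phi, expected_value


coef, intercept = [2.0, -1.0], 0.5
means = [1.0, 3.0]  # background (training) feature means
x = [2.0, 1.0]
phi, expected_value = linear_local_importances(coef, intercept, means, x)
prediction = intercept + sum(c * xi for c, xi in zip(coef, x))
print(phi, expected_value, prediction)  # [2.0, 2.0] -0.5 3.5
```

The contributions plus the expected value equal the model's prediction for x.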
-
static explainable_model_type(self)
Retrieve the model type.
Returns: Linear explainable model type.
Return type: ExplainableModelType
-
explainer_type = 'model'
Linear explainable model.
Parameters:
-
fit(dataset, labels, **kwargs)
Call linear fit to fit the explainable model.
Store the mean and covariance of the background data for local explanation.
Parameters:
- dataset (numpy or scipy array) – The dataset to train the model on.
- labels (numpy or scipy array) – The labels to train the model on.
If multiclass=True, uses the parameters for LogisticRegression:
Fit the model according to the given training data.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
- Target vector relative to X.
- sample_weight : array-like of shape (n_samples,), default=None
- Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
New in version 0.17: sample_weight support to LogisticRegression.
Returns
- self
- Fitted estimator.
Notes
The SAGA solver supports both float64 and float32 bit arrays.
Otherwise, if multiclass=False, uses the parameters for LinearRegression:
Fit linear model.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Training data.
- y : array-like of shape (n_samples,) or (n_samples, n_targets)
- Target values. Will be cast to X’s dtype if necessary.
- sample_weight : array-like of shape (n_samples,), default=None
- Individual weights for each sample.
New in version 0.17: parameter sample_weight support to LinearRegression.
Returns
- self :
- Returns an instance of self.
-
model
Retrieve the underlying model.
Returns: The linear model, either classifier or regressor.
Return type: Union[LogisticRegression, LinearRegression]
-
predict(dataset, **kwargs)
Call linear predict to predict labels using the explainable model.
Parameters:
- dataset (numpy or scipy array) – The dataset to predict on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for LogisticRegression:
Predict class labels for samples in X.
Parameters
- X : array_like or sparse matrix, shape (n_samples, n_features)
- Samples.
Returns
- C : array, shape [n_samples]
- Predicted class label per sample.
Otherwise, if multiclass=False, uses the parameters for LinearRegression:
Predict using the linear model.
Parameters
- X : array_like or sparse matrix, shape (n_samples, n_features)
- Samples.
Returns
- C : array, shape (n_samples,)
- Returns predicted values.
-
predict_proba(dataset, **kwargs)
Call linear predict_proba to predict probabilities using the explainable model.
Parameters:
- dataset (numpy or scipy array) – The dataset to predict probabilities on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for LogisticRegression:
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
For a multi_class problem, if multi_class is set to “multinomial”, the softmax function is used to find the predicted probability of each class. Otherwise, a one-vs-rest approach is used: the probability of each class is calculated with the logistic function, assuming that class to be positive, and these values are then normalized across all the classes.
Parameters
- X : array-like of shape (n_samples, n_features)
- Vector to be scored, where n_samples is the number of samples and n_features is the number of features.
Returns
- T : array-like of shape (n_samples, n_classes)
- Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
Otherwise, predict_proba is not supported for regression or binary classification.
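The paragraph above describes two ways of turning per-class decision scores into probabilities. The multinomial approach applies softmax, and the one-vs-rest approach applies the logistic function and then normalizes. The sketch below compares the two on a single row. multinomial_proba and ovr_proba are hypothetical helpers that follow the description above rather than scikit-learn's code.

```python
import math


def multinomial_proba(scores):
    """Softmax over one row of per-class decision scores."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ovr_proba(scores):
    """One-vs-rest: logistic function per class, then normalize the row."""
    sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoids)
    return [p / total for p in sigmoids]


scores = [2.0, 0.0, -1.0]  # made-up decision scores for three classes
soft, ovr = multinomial_proba(scores), ovr_proba(scores)
print([round(p, 3) for p in soft], [round(p, 3) for p in ovr])
```

Both rows sum to 1 and rank the classes identically, because both transforms are monotone in the score. For these scores, however, the multinomial version places more mass on the top class.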
-
-
class interpret_community.mimic.models.DecisionTreeExplainableModel(multiclass=False, random_state=123, shap_values_output=<ShapValuesOutput.DEFAULT: 'default'>, classification=True, **kwargs)
Bases: interpret_community.mimic.models.explainable_model.BaseExplainableModel
-
available_explanations = ['global', 'local']
-
expected_values
Use TreeExplainer to get the expected values.
Returns: The expected values of the decision tree model.
Return type: list
-
explain_global(**kwargs)
Call the tree model feature importances to get the global feature importances from the tree surrogate model.
Returns: The global explanation of feature importances.
Return type: list
-
explain_local(evaluation_examples, probabilities=None, **kwargs)
Use TreeExplainer to get the local feature importances from the trained explainable model.
Parameters:
- evaluation_examples (numpy or scipy array) – The evaluation examples to compute local feature importances for.
- probabilities (numpy.ndarray) – If shap_values_output is a probability output type, the teacher model’s probabilities can be specified for scaling the SHAP values.
Returns: The local explanation of feature importances.
Return type: Union[list, numpy.ndarray]
-
static explainable_model_type(self)
Retrieve the model type.
Returns: Tree explainable model type.
Return type: ExplainableModelType
-
explainer_type = 'model'
Decision Tree explainable model.
Parameters:
- multiclass (bool) – Set to True to generate a multiclass model.
- random_state (int) – Int to seed the model.
- shap_values_output (interpret_community.common.constants.ShapValuesOutput) – The type of the output from explain_local when using TreeExplainer. Currently only the types ‘default’, ‘probability’ and ‘teacher_probability’ are supported. If ‘probability’ is specified, the raw log-odds values from the TreeExplainer are approximately scaled to probabilities.
- classification (bool) – Indicates whether this is a classification or regression explanation.
-
fit(dataset, labels, **kwargs)
Call tree fit to fit the explainable model.
Parameters:
- dataset (numpy or scipy array) – The dataset to train the model on.
- labels (numpy or scipy array) – The labels to train the model on.
If multiclass=True, uses the parameters for DecisionTreeClassifier: Build a decision tree classifier from the training set (X, y).
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- The training input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csc_matrix.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
- The target values (class labels) as integers or strings.
- sample_weight : array-like of shape (n_samples,), default=None
- Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_input : bool, default=True
- Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- X_idx_sorted : array-like of shape (n_samples, n_features), default=None
- The indexes of the sorted training input samples. If many trees are grown on the same dataset, this allows the ordering to be cached between trees. If None, the data will be sorted here. Don’t use this parameter unless you know what you are doing.
Returns
- self : DecisionTreeClassifier
- Fitted estimator.
Otherwise, if multiclass=False, uses the parameters for DecisionTreeRegressor: Build a decision tree regressor from the training set (X, y).
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- The training input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csc_matrix.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
- The target values (real numbers). Use dtype=np.float64 and order='C' for maximum efficiency.
- sample_weight : array-like of shape (n_samples,), default=None
- Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
- check_input : bool, default=True
- Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- X_idx_sorted : array-like of shape (n_samples, n_features), default=None
- The indexes of the sorted training input samples. If many trees are grown on the same dataset, this allows the ordering to be cached between trees. If None, the data will be sorted here. Don’t use this parameter unless you know what you are doing.
Returns
- self : DecisionTreeRegressor
- Fitted estimator.
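The inherited scikit-learn notes recommend particular dtypes and memory layouts for fit. A minimal NumPy sketch of preparing dense inputs accordingly is shown below; sparse input is not covered, since the tree converts it to csc_matrix itself. It also shows what an X_idx_sorted-style array contains: for each feature column, the row indices ordered by that feature’s values. The helper name prepare_tree_inputs is hypothetical, and X_idx_sorted has since been deprecated in scikit-learn, so treat that part as an illustration of the concept only.

```python
import numpy as np


def prepare_tree_inputs(X, y, regression=False):
    """Hypothetical helper: cast dense inputs to the layouts the docstring recommends."""
    # The tree converts X to float32 internally; casting up front means
    # fit does not have to convert it again.
    X = np.asarray(X, dtype=np.float32)
    if regression:
        # DecisionTreeRegressor recommends float64, C-ordered targets.
        y = np.ascontiguousarray(y, dtype=np.float64)
    else:
        # Classifier targets may be integers or strings; keep them as given.
        y = np.asarray(y)
    # Column j holds the row indices of X sorted by X[:, j] (ascending, stable).
    X_idx_sorted = np.argsort(X, axis=0, kind="stable")
    return X, y, X_idx_sorted
```

Computing the per-feature ordering once and reusing it is the caching idea the X_idx_sorted description refers to; whether a given scikit-learn version still accepts the argument should be checked against its own documentation.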
-
model
¶ Retrieve the underlying model.
Returns: The decision tree model, either classifier or regressor. Return type: Union[DecisionTreeClassifier, DecisionTreeRegressor]
-
predict
(dataset, **kwargs)¶ Call tree predict to predict labels using the explainable model.
Parameters: - dataset (numpy or scipy array) – The dataset to predict on.
Returns: The predictions of the model. Return type: list
If multiclass=True, uses the parameters for DecisionTreeClassifier: Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
- check_input : bool, default=True
- Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.
Returns
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
- The predicted classes, or the predicted values.
Otherwise, if multiclass=False, uses the parameters for DecisionTreeRegressor: Predict class or regression value for X. The parameters and return value are identical to those listed above for DecisionTreeClassifier.
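In the classifier case, predict and predict_proba are linked: for a single-output tree, scikit-learn’s DecisionTreeClassifier.predict takes the column with the largest predicted value for each sample and maps that column index back through the estimator’s classes_ attribute. The sketch below reproduces that mapping on plain arrays; the function name predict_from_proba is hypothetical and the inputs stand in for a fitted model’s output.

```python
import numpy as np


def predict_from_proba(proba, classes):
    """Map per-class probabilities to labels, as a single-output tree classifier does.

    proba: array of shape (n_samples, n_classes), columns ordered like `classes`.
    classes: array of class labels (the estimator's classes_ attribute).
    """
    proba = np.asarray(proba)
    classes = np.asarray(classes)
    # np.argmax returns the first maximal column, so ties go to the earlier class.
    return classes.take(np.argmax(proba, axis=1), axis=0)
```

For instance, predict_from_proba([[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]], ["cat", "dog", "fox"]) selects "dog" for the first sample and "cat" for the second.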
-
predict_proba
(dataset, **kwargs)¶ Call tree predict_proba to predict probabilities using the explainable model.
Parameters: - dataset (numpy or scipy array) – The dataset to predict probabilities on.
Returns: The predicted class probabilities of the model. Return type: list
If multiclass=True, uses the parameters for DecisionTreeClassifier: Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- The input samples. Internally, it will be converted to dtype=np.float32 and, if a sparse matrix is provided, to a sparse csr_matrix.
- check_input : bool, default=True
- Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.
Returns
- proba : ndarray of shape (n_samples, n_classes) or list of n_outputs such arrays if n_outputs > 1
- The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
Otherwise, if multiclass=False, predict_proba is not supported, since the model is used for regression or binary classification.
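The statement that a predicted class probability is the fraction of samples of the same class in a leaf can be made concrete. The sketch below computes, for the training labels that reached a single leaf, the fraction belonging to each class, returned in classes_ order. It is an unweighted illustration with a hypothetical function name; a tree fitted with sample_weight would use weight sums per class instead of counts.

```python
from collections import Counter


def leaf_class_fractions(leaf_labels, classes):
    """Fraction of each class among the training labels that fell into one leaf.

    Unweighted illustration; a weighted fit would sum sample weights instead of counting.
    """
    counts = Counter(leaf_labels)
    total = len(leaf_labels)
    # Same column order as the estimator's classes_ attribute; classes absent
    # from the leaf get probability 0.0.
    return [counts[c] / total for c in classes]
```

For example, a leaf holding the labels ["a", "b", "a", "a"] with classes ["a", "b", "c"] yields [0.75, 0.25, 0.0], so every sample routed to that leaf receives those probabilities.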
-