interpret_community.mimic.models.lightgbm_model module

Defines an explainable lightgbm model.

class interpret_community.mimic.models.lightgbm_model.LGBMExplainableModel(multiclass=False, random_state=123, shap_values_output=<ShapValuesOutput.DEFAULT: 'default'>, classification=True, **kwargs)

Bases: interpret_community.mimic.models.explainable_model.BaseExplainableModel

available_explanations = ['global', 'local']
expected_values

Use TreeExplainer to get the expected values.

Returns: The expected values of the LightGBM tree model.
Return type: list
explain_global(**kwargs)

Call lightgbm feature importances to get the global feature importances from the explainable model.

Returns: The global explanation of feature importances.
Return type: numpy.ndarray
explain_local(evaluation_examples, probabilities=None, **kwargs)

Use TreeExplainer to get the local feature importances from the trained explainable model.

Parameters:
  • evaluation_examples (numpy or scipy array) – The evaluation examples to compute local feature importances for.
  • probabilities (numpy.ndarray) – If output_type is probability, can specify the teacher model’s probability for scaling the shap values.
Returns:

The local explanation of feature importances.

Return type:

Union[list, numpy.ndarray]
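Because explain_local can return either a list or a single array, calling code usually needs to normalize the result. A minimal sketch of that normalization, assuming the common SHAP convention of one per-class array for multiclass and a single 2-D array otherwise; the arrays below are illustrative values, not output of a real model:

```python
import numpy as np

def local_importance_matrix(local_explanation, class_index=0):
    """Normalize an explain_local result to a 2-D [n_samples, n_features] array.

    Handles both return conventions: a list of per-class arrays
    (multiclass) or a single 2-D array (regression / binary).
    """
    if isinstance(local_explanation, list):
        # Multiclass: one [n_samples, n_features] array per class.
        return np.asarray(local_explanation[class_index])
    return np.asarray(local_explanation)

# Illustrative values only: 4 samples, 3 features.
multiclass_result = [np.ones((4, 3)), np.zeros((4, 3))]  # 2 classes
single_result = np.full((4, 3), 0.5)

print(local_importance_matrix(multiclass_result, class_index=1).shape)  # (4, 3)
print(local_importance_matrix(single_result).shape)                     # (4, 3)
```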

static explainable_model_type()

Retrieve the model type.

Returns: The tree explainable model type.
Return type: ExplainableModelType
explainer_type = 'model'

LightGBM (a fast, high-performance framework based on decision trees) explainable model.

Please see documentation for more details: https://github.com/Microsoft/LightGBM

Additional arguments to LGBMClassifier and LGBMRegressor can be passed through kwargs.

Parameters:
  • multiclass (bool) – Set to True to generate a multiclass model.
  • random_state (int) – Integer used to seed the model.
  • shap_values_output (interpret_community.common.constants.ShapValuesOutput) – The type of the output from explain_local when using TreeExplainer. Currently only types ‘default’, ‘probability’ and ‘teacher_probability’ are supported. If ‘probability’ is specified, then we approximately scale the raw log-odds values from the TreeExplainer to probabilities.
  • classification (bool) – Indicates if this is a classification or regression explanation.
fit(dataset, labels, **kwargs)

Call lightgbm fit to fit the explainable model.

Parameters:
  • dataset (numpy or scipy array) – The dataset to train the model on.
  • labels (numpy or scipy array) – The labels to train the model on.

If multiclass=True, uses the parameters for LGBMClassifier: Build a gradient boosting model from the training set (X, y).

Parameters

X : array-like or sparse matrix of shape = [n_samples, n_features]
Input feature matrix.
y : array-like of shape = [n_samples]
The target values (class labels in classification, real numbers in regression).
sample_weight : array-like of shape = [n_samples] or None, optional (default=None)
Weights of training data.
init_score : array-like of shape = [n_samples] or None, optional (default=None)
Init score of training data.
eval_set : list or None, optional (default=None)
A list of (X, y) tuple pairs to use as validation sets.
eval_names : list of strings or None, optional (default=None)
Names of eval_set.
eval_sample_weight : list of arrays or None, optional (default=None)
Weights of eval data.
eval_class_weight : list or None, optional (default=None)
Class weights of eval data.
eval_init_score : list of arrays or None, optional (default=None)
Init score of eval data.
eval_metric : string, callable, list or None, optional (default=None)
If string, it should be a builtin evaluation metric to use. If callable, it should be a custom evaluation metric, see note below for more details. If list, it can be a list of builtin metrics, a list of custom evaluation metrics, or a mix of both. In either case, the metric from the model parameters will be evaluated and used as well. Default: ‘l2’ for LGBMRegressor, ‘logloss’ for LGBMClassifier, ‘ndcg’ for LGBMRanker.
early_stopping_rounds : int or None, optional (default=None)
Activates early stopping. The model will train until the validation score stops improving. Validation score needs to improve at least every early_stopping_rounds round(s) to continue training. Requires at least one validation data and one metric. If there’s more than one, will check all of them. But the training data is ignored anyway. To check only the first metric, set the first_metric_only parameter to True in additional parameters kwargs of the model constructor.
verbose : bool or int, optional (default=True)

Requires at least one evaluation data. If True, the eval metric on the eval set is printed at each boosting stage. If int, the eval metric on the eval set is printed at every verbose boosting stage. The last boosting stage or the boosting stage found by using early_stopping_rounds is also printed.

Example

With verbose = 4 and at least one item in eval_set, an evaluation metric is printed every 4 (instead of 1) boosting stages.

feature_name : list of strings or ‘auto’, optional (default=’auto’)
Feature names. If ‘auto’ and data is pandas DataFrame, data columns names are used.
categorical_feature : list of strings or int, or ‘auto’, optional (default=’auto’)
Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify feature_name as well). If ‘auto’ and data is pandas DataFrame, pandas unordered categorical columns are used. All values in categorical features should be less than int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values. The output cannot be monotonically constrained with respect to a categorical feature.
callbacks : list of callback functions or None, optional (default=None)
List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
init_model : string, Booster, LGBMModel or None, optional (default=None)
Filename of LightGBM model, Booster instance or LGBMModel instance used for continue training.

Returns

self : object
Returns self.

Note

Custom eval function expects a callable with one of the following signatures: func(y_true, y_pred), func(y_true, y_pred, weight) or func(y_true, y_pred, weight, group), returning (eval_name, eval_result, is_higher_better) or a list of (eval_name, eval_result, is_higher_better):

y_true : array-like of shape = [n_samples]
The target values.
y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multiclass task)
The predicted values.
weight : array-like of shape = [n_samples]
The weight of samples.
group : array-like
Group/query data, used for ranking task.
eval_name : string
The name of evaluation function (without whitespaces).
eval_result : float
The eval result.
is_higher_better : bool
Whether a higher eval result is better, e.g. AUC is is_higher_better.

For binary task, y_pred is the probability of the positive class (or the margin in case of a custom objective). For multiclass task, y_pred is grouped by class_id first, then by row_id. To get the i-th row's y_pred for the j-th class, use y_pred[j * num_data + i].
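The custom eval signature described in the note above can be sketched as a plain function; lightgbm would call it on each boosting round when it is passed as eval_metric to fit. The metric name and the data below are illustrative:

```python
import numpy as np

def mean_absolute_error_metric(y_true, y_pred):
    """Custom eval metric matching the func(y_true, y_pred) signature.

    Returns (eval_name, eval_result, is_higher_better); MAE is an
    error, so lower is better and is_higher_better is False.
    """
    mae = float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
    return ('custom_mae', mae, False)

# Called directly here for illustration; in real use it would be
# passed to fit(..., eval_set=[(X_val, y_val)], eval_metric=mean_absolute_error_metric).
name, result, higher_better = mean_absolute_error_metric([1.0, 2.0], [1.5, 2.5])
print(name, result, higher_better)  # custom_mae 0.5 False
```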

Otherwise, if multiclass=False, uses the parameters for LGBMRegressor: Build a gradient boosting model from the training set (X, y).

Parameters

X : array-like or sparse matrix of shape = [n_samples, n_features]
Input feature matrix.
y : array-like of shape = [n_samples]
The target values (class labels in classification, real numbers in regression).
sample_weight : array-like of shape = [n_samples] or None, optional (default=None)
Weights of training data.
init_score : array-like of shape = [n_samples] or None, optional (default=None)
Init score of training data.
eval_set : list or None, optional (default=None)
A list of (X, y) tuple pairs to use as validation sets.
eval_names : list of strings or None, optional (default=None)
Names of eval_set.
eval_sample_weight : list of arrays or None, optional (default=None)
Weights of eval data.
eval_init_score : list of arrays or None, optional (default=None)
Init score of eval data.
eval_metric : string, callable, list or None, optional (default=None)
If string, it should be a builtin evaluation metric to use. If callable, it should be a custom evaluation metric, see note below for more details. If list, it can be a list of builtin metrics, a list of custom evaluation metrics, or a mix of both. In either case, the metric from the model parameters will be evaluated and used as well. Default: ‘l2’ for LGBMRegressor, ‘logloss’ for LGBMClassifier, ‘ndcg’ for LGBMRanker.
early_stopping_rounds : int or None, optional (default=None)
Activates early stopping. The model will train until the validation score stops improving. Validation score needs to improve at least every early_stopping_rounds round(s) to continue training. Requires at least one validation data and one metric. If there’s more than one, will check all of them. But the training data is ignored anyway. To check only the first metric, set the first_metric_only parameter to True in additional parameters kwargs of the model constructor.
verbose : bool or int, optional (default=True)

Requires at least one evaluation data. If True, the eval metric on the eval set is printed at each boosting stage. If int, the eval metric on the eval set is printed at every verbose boosting stage. The last boosting stage or the boosting stage found by using early_stopping_rounds is also printed.

Example

With verbose = 4 and at least one item in eval_set, an evaluation metric is printed every 4 (instead of 1) boosting stages.

feature_name : list of strings or ‘auto’, optional (default=’auto’)
Feature names. If ‘auto’ and data is pandas DataFrame, data columns names are used.
categorical_feature : list of strings or int, or ‘auto’, optional (default=’auto’)
Categorical features. If list of int, interpreted as indices. If list of strings, interpreted as feature names (need to specify feature_name as well). If ‘auto’ and data is pandas DataFrame, pandas unordered categorical columns are used. All values in categorical features should be less than int32 max value (2147483647). Large values could be memory consuming. Consider using consecutive integers starting from zero. All negative values in categorical features will be treated as missing values. The output cannot be monotonically constrained with respect to a categorical feature.
callbacks : list of callback functions or None, optional (default=None)
List of callback functions that are applied at each iteration. See Callbacks in Python API for more information.
init_model : string, Booster, LGBMModel or None, optional (default=None)
Filename of LightGBM model, Booster instance or LGBMModel instance used for continue training.

Returns

self : object
Returns self.

Note

Custom eval function expects a callable with one of the following signatures: func(y_true, y_pred), func(y_true, y_pred, weight) or func(y_true, y_pred, weight, group), returning (eval_name, eval_result, is_higher_better) or a list of (eval_name, eval_result, is_higher_better):

y_true : array-like of shape = [n_samples]
The target values.
y_pred : array-like of shape = [n_samples] or shape = [n_samples * n_classes] (for multiclass task)
The predicted values.
weight : array-like of shape = [n_samples]
The weight of samples.
group : array-like
Group/query data, used for ranking task.
eval_name : string
The name of evaluation function (without whitespaces).
eval_result : float
The eval result.
is_higher_better : bool
Whether a higher eval result is better, e.g. AUC is is_higher_better.

For binary task, y_pred is the probability of the positive class (or the margin in case of a custom objective). For multiclass task, y_pred is grouped by class_id first, then by row_id. To get the i-th row's y_pred for the j-th class, use y_pred[j * num_data + i].
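The flat multiclass layout described above (grouped by class_id first, then row_id) can be demonstrated with a synthetic array; the values are illustrative, not real predictions:

```python
import numpy as np

num_data, num_classes = 4, 3
# Synthetic flat predictions laid out class-major: all of class 0's
# rows first, then class 1's, then class 2's.
y_pred = np.arange(num_data * num_classes, dtype=float)

def pred_for(i, j):
    """Prediction for row i in class j under the flat layout."""
    return y_pred[j * num_data + i]

# Row 2, class 1 lives at flat index 1 * 4 + 2 = 6.
print(pred_for(2, 1))  # 6.0

# Reshaping to (num_classes, num_data) and transposing recovers the
# conventional [n_samples, n_classes] matrix.
matrix = y_pred.reshape(num_classes, num_data).T
print(matrix[2, 1])  # 6.0
```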

model

Retrieve the underlying model.

Returns: The lightgbm model, either classifier or regressor.
Return type: Union[LGBMClassifier, LGBMRegressor]
predict(dataset, **kwargs)

Call lightgbm predict to predict labels using the explainable model.

Parameters:
  • dataset (numpy or scipy array) – The dataset to predict on.
Returns: The predictions of the model.
Return type: list

If multiclass=True, uses the parameters for LGBMClassifier: Return the predicted value for each sample.

Parameters

X : array-like or sparse matrix of shape = [n_samples, n_features]
Input features matrix.
raw_score : bool, optional (default=False)
Whether to predict raw scores.
start_iteration : int, optional (default=0)
Start index of the iteration to predict. If <= 0, starts from the first iteration.
num_iteration : int or None, optional (default=None)
Total number of iterations used in the prediction. If None, if the best iteration exists and start_iteration <= 0, the best iteration is used; otherwise, all iterations from start_iteration are used (no limits). If <= 0, all iterations from start_iteration are used (no limits).
pred_leaf : bool, optional (default=False)
Whether to predict leaf index.
pred_contrib : bool, optional (default=False)

Whether to predict feature contributions.

Note

If you want to get more explanations for your model’s predictions using SHAP values, like SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap). Note that unlike the shap package, with pred_contrib we return a matrix with an extra column, where the last column is the expected value.

kwargs
Other parameters for the prediction.

Returns

predicted_result : array-like of shape = [n_samples] or shape = [n_samples, n_classes]
The predicted values.
X_leaves : array-like of shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]
If pred_leaf=True, the predicted leaf of every tree for each sample.
X_SHAP_values : array-like of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
If pred_contrib=True, the feature contributions for each sample.
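The note above says the pred_contrib output carries an extra final column holding the expected value. Under SHAP's local-accuracy property, each row's contributions plus that expected value sum to the raw score, so the row sum of the full matrix recovers the prediction. A sketch with synthetic numbers, not output of a real model:

```python
import numpy as np

# Synthetic pred_contrib-style output for 2 samples, 3 features:
# columns 0..2 are per-feature contributions, the last column is the
# expected value (the model's base score).
contribs = np.array([
    [0.2, -0.1, 0.3, 1.0],
    [-0.4, 0.0, 0.1, 1.0],
])

# Raw score for each sample = sum of its feature contributions plus
# the expected value, i.e. the row sum of the full matrix.
raw_scores = contribs.sum(axis=1)
print(raw_scores)  # [1.4 0.7]
```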

Otherwise, if multiclass=False, uses the parameters for LGBMRegressor: Return the predicted value for each sample.

Parameters

X : array-like or sparse matrix of shape = [n_samples, n_features]
Input features matrix.
raw_score : bool, optional (default=False)
Whether to predict raw scores.
start_iteration : int, optional (default=0)
Start index of the iteration to predict. If <= 0, starts from the first iteration.
num_iteration : int or None, optional (default=None)
Total number of iterations used in the prediction. If None, if the best iteration exists and start_iteration <= 0, the best iteration is used; otherwise, all iterations from start_iteration are used (no limits). If <= 0, all iterations from start_iteration are used (no limits).
pred_leaf : bool, optional (default=False)
Whether to predict leaf index.
pred_contrib : bool, optional (default=False)

Whether to predict feature contributions.

Note

If you want to get more explanations for your model’s predictions using SHAP values, like SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap). Note that unlike the shap package, with pred_contrib we return a matrix with an extra column, where the last column is the expected value.

kwargs
Other parameters for the prediction.

Returns

predicted_result : array-like of shape = [n_samples] or shape = [n_samples, n_classes]
The predicted values.
X_leaves : array-like of shape = [n_samples, n_trees] or shape = [n_samples, n_trees * n_classes]
If pred_leaf=True, the predicted leaf of every tree for each sample.
X_SHAP_values : array-like of shape = [n_samples, n_features + 1] or shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
If pred_contrib=True, the feature contributions for each sample.
predict_proba(dataset, **kwargs)

Call lightgbm predict_proba to predict probabilities using the explainable model.

Parameters:
  • dataset (numpy or scipy array) – The dataset to predict probabilities on.
Returns: The predicted probabilities of the model.
Return type: list

If multiclass=True, uses the parameters for LGBMClassifier: Return the predicted probability for each class for each sample.

Parameters

X : array-like or sparse matrix of shape = [n_samples, n_features]
Input features matrix.
raw_score : bool, optional (default=False)
Whether to predict raw scores.
start_iteration : int, optional (default=0)
Start index of the iteration to predict. If <= 0, starts from the first iteration.
num_iteration : int or None, optional (default=None)
Total number of iterations used in the prediction. If None, if the best iteration exists and start_iteration <= 0, the best iteration is used; otherwise, all iterations from start_iteration are used (no limits). If <= 0, all iterations from start_iteration are used (no limits).
pred_leaf : bool, optional (default=False)
Whether to predict leaf index.
pred_contrib : bool, optional (default=False)

Whether to predict feature contributions.

Note

If you want to get more explanations for your model’s predictions using SHAP values, like SHAP interaction values, you can install the shap package (https://github.com/slundberg/shap). Note that unlike the shap package, with pred_contrib we return a matrix with an extra column, where the last column is the expected value.

kwargs
Other parameters for the prediction.

Returns

predicted_probability : array-like of shape = [n_samples, n_classes]
The predicted probability for each class for each sample.
X_leaves : array-like of shape = [n_samples, n_trees * n_classes]
If pred_leaf=True, the predicted leaf of every tree for each sample.
X_SHAP_values : array-like of shape = [n_samples, (n_features + 1) * n_classes] or list with n_classes length of such objects
If pred_contrib=True, the feature contributions for each sample.

Otherwise, if multiclass=False, predict_proba is not supported, because the underlying model is an LGBMRegressor (used for both regression and binary classification).
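In the multiclass case, LightGBM's default 'multiclass' objective maps per-class raw scores to the [n_samples, n_classes] probabilities returned by predict_proba via a row-wise softmax. A sketch of that mapping with synthetic raw scores, not a fitted model:

```python
import numpy as np

def softmax(raw_scores):
    """Row-wise softmax: converts [n_samples, n_classes] raw scores to
    probabilities that sum to 1 per sample."""
    # Subtract the row max for numerical stability; it cancels out.
    shifted = raw_scores - raw_scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

raw = np.array([[2.0, 1.0, 0.1],
                [0.0, 0.0, 0.0]])
proba = softmax(raw)
print(proba.shape)        # (2, 3)
print(proba.sum(axis=1))  # [1. 1.]
```

Equal raw scores (the second row) yield a uniform distribution over the classes, as expected.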