interpret_community.mimic.models.linear_model module
Defines an explainable linear model.
-
class interpret_community.mimic.models.linear_model.LinearExplainableModel(multiclass=False, random_state=123, classification=True, sparse_data=False, **kwargs)
Bases: interpret_community.mimic.models.explainable_model.BaseExplainableModel
-
available_explanations = ['global', 'local']
-
expected_values
Use LinearExplainer to get the expected values.
Returns: The expected values of the linear model.
Return type: list
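For a linear model, the expected output over the background data coincides with the model output at the feature means. The sketch below illustrates that property in plain Python. The weights, intercept, and background rows are made-up values, and the code is an illustration rather than the library's implementation.

```python
# Hypothetical linear model f(x) = w . x + b (illustrative values only).
weights = [2.0, -1.0, 0.5]
intercept = 0.25

# Hypothetical background data; each row is one sample.
background = [
    [1.0, 2.0, 3.0],
    [0.0, 1.0, 5.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 3.0],
]


def predict_one(row):
    """Linear model output for a single sample."""
    return sum(w * x for w, x in zip(weights, row)) + intercept


n = len(background)
feature_means = [sum(col) / n for col in zip(*background)]

# Average of the predictions versus the prediction at the mean sample.
mean_of_predictions = sum(predict_one(row) for row in background) / n
prediction_at_mean = predict_one(feature_means)
```

Both quantities agree because the model is linear in its inputs, so averaging commutes with prediction.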
-
explain_global(**kwargs)
Use the coefficients (coef_) of the linear surrogate model as the global feature importances.
Returns: The global explanation of feature importances.
Return type: list
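As a rough illustration of coefficient-based global importances, the sketch below reduces a per-class coefficient matrix to one score per feature and ranks the features. The reduction used here (mean absolute coefficient across classes), the coefficient values, and the feature names are all assumptions for the example; the library may aggregate differently.

```python
# Hypothetical coefficient matrix of shape (n_classes, n_features).
coef = [
    [0.8, -0.1, 0.3],
    [-0.4, 0.2, 0.1],
    [-0.4, -0.1, -0.4],
]
feature_names = ["age", "income", "tenure"]  # made-up names

# Assumption: one score per feature, taken as the mean absolute
# coefficient across classes.
n_classes = len(coef)
importances = [sum(abs(v) for v in col) / n_classes for col in zip(*coef)]

# Rank features from most to least important.
ranking = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
```

Coefficient magnitudes are only comparable across features when the features share a scale, so standardizing inputs before fitting is usually advisable.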
-
explain_local(evaluation_examples, **kwargs)
Use LinearExplainer to get the local feature importances from the trained explainable model.
Parameters: evaluation_examples (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The evaluation examples to compute local feature importances for.
Returns: The local explanation of feature importances.
Return type: Union[list, numpy.ndarray]
-
static explainable_model_type()
Retrieve the model type.
Returns: Linear explainable model type.
Return type: ExplainableModelType
-
explainer_type = 'model'
Linear explainable model.
Parameters:
-
fit(dataset, labels, **kwargs)
Call linear fit to fit the explainable model.
Store the mean and covariance of the background data for local explanation.
Parameters:
- dataset (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to train the model on.
- labels (numpy.array) – The labels to train the model on.
If multiclass=True, uses the parameters for LogisticRegression:
Fit the model according to the given training data.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Training vector, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
- Target vector relative to X.
- sample_weight : array-like of shape (n_samples,), default=None
- Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.
New in version 0.17: sample_weight support to LogisticRegression.
Returns
- self
- Fitted estimator.
Notes
The SAGA solver supports both float64 and float32 bit arrays.
Otherwise, if multiclass=False, uses the parameters for LinearRegression:
Fit linear model.
Parameters
- X : {array-like, sparse matrix} of shape (n_samples, n_features)
- Training data.
- y : array-like of shape (n_samples,) or (n_samples, n_targets)
- Target values. Will be cast to X’s dtype if necessary.
- sample_weight : array-like of shape (n_samples,), default=None
- Individual weights for each sample.
New in version 0.17: parameter sample_weight support to LinearRegression.
Returns
self : returns an instance of self.
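The fit method stores the mean and covariance of the background data for later local explanations. A minimal plain-Python sketch of those two statistics follows. The helper names and data are hypothetical, and the n - 1 normalization (sample covariance) is an assumption, not a statement about what the library computes.

```python
def column_means(rows):
    """Per-feature mean over all samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance between features (assumes division by n - 1)."""
    n = len(rows)
    means = column_means(rows)
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    d = len(means)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


# Hypothetical background data: the second feature is twice the first.
background = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mean = column_means(background)      # [2.0, 4.0]
cov = covariance_matrix(background)  # [[1.0, 2.0], [2.0, 4.0]]
```

The mean is what an independent-feature explanation needs; the covariance only matters when feature correlations are taken into account.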
-
model
Retrieve the underlying model.
Returns: The linear model, either classifier or regressor.
Return type: Union[LogisticRegression, LinearRegression]
-
predict(dataset, **kwargs)
Call linear predict to predict labels using the explainable model.
Parameters: dataset (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for LogisticRegression:
Predict class labels for samples in X.
Parameters
- X : array-like or sparse matrix, shape (n_samples, n_features)
- Samples.
Returns
- C : array, shape [n_samples]
- Predicted class label per sample.
Otherwise, if multiclass=False, uses the parameters for LinearRegression:
Predict using the linear model.
Parameters
- X : array-like or sparse matrix, shape (n_samples, n_features)
- Samples.
Returns
- C : array, shape (n_samples,)
- Returns predicted values.
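The two prediction paths can be sketched in plain Python: a regressor returns the linear score itself, while a multiclass classifier returns the class with the highest linear score. The function names, coefficients, and class labels below are hypothetical, and the binary case (a single score thresholded at zero) is left out.

```python
def decision_scores(coef, intercept, x):
    """One linear score per class: coef[k] . x + intercept[k]."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(coef, intercept)
    ]


def predict_regression(coef, intercept, x):
    """Regression prediction: a single linear score."""
    return sum(w * v for w, v in zip(coef, x)) + intercept


def predict_class(coef, intercept, classes, x):
    """Multiclass prediction: label of the highest-scoring class."""
    scores = decision_scores(coef, intercept, x)
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best]


x = [1.0, 2.0]
y_hat = predict_regression([0.5, -1.0], 3.0, x)  # 0.5 - 2.0 + 3.0 = 1.5
label = predict_class(
    [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],  # hypothetical per-class weights
    [0.0, 0.0, 0.0],
    ["a", "b", "c"],
    x,
)  # scores are [1.0, 2.0, -3.0], so the label is "b"
```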
-
predict_proba(dataset, **kwargs)
Call linear predict_proba to predict probabilities using the explainable model.
Parameters: dataset (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict probabilities on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for LogisticRegression:
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
For a multi_class problem, if multi_class is set to “multinomial”, the softmax function is used to find the predicted probability of each class. Otherwise, a one-vs-rest approach is used, i.e., the probability of each class is calculated assuming it to be positive using the logistic function, and these values are normalized across all the classes.
Parameters
- X : array-like of shape (n_samples, n_features)
- Vector to be scored, where n_samples is the number of samples and n_features is the number of features.
Returns
- T : array-like of shape (n_samples, n_classes)
- Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
Otherwise predict_proba is not supported for regression or binary classification.
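The difference between the multinomial (softmax) and one-vs-rest probability estimates described above can be sketched as follows. The per-class decision scores are made-up values and the helper names are hypothetical; this is an illustration of the two formulas, not scikit-learn's code.

```python
import math


def softmax(scores):
    """Multinomial estimate: exponentiate and normalize the scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def one_vs_rest(scores):
    """One-vs-rest estimate: logistic function per class, then normalize."""
    sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoids)
    return [p / total for p in sigmoids]


scores = [2.0, 0.0, -1.0]  # hypothetical per-class decision scores
p_multinomial = softmax(scores)
p_ovr = one_vs_rest(scores)
```

Both results sum to one and agree on the most likely class here, but the softmax estimate is considerably more peaked for the same scores.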
-
-
class interpret_community.mimic.models.linear_model.LinearExplainer(model, data, feature_dependence='interventional')
Bases: sphinx.ext.autodoc.importer._MockObject (a placeholder that the documentation build substitutes for a mocked import, not the runtime base class)
Linear explainer with support for sparse data and sparse output.
-
shap_values(evaluation_examples)
Estimate the SHAP values for a set of samples.
Parameters: evaluation_examples (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The evaluation examples.
Returns: For models with a single output this returns a matrix of SHAP values (# samples x # features). Each row sums to the difference between the model output for that sample and the expected value of the model output (which is stored as expected_value attribute of the explainer).
Return type: Union[list, numpy.ndarray]
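The row-sum property stated above can be checked by hand for a linear model. Under the assumption that features are treated as independent, each feature's attribution is its weight times the feature's deviation from the background mean. The sketch below uses made-up weights and means and hypothetical helper names; it illustrates the property and does not reproduce the explainer's internals.

```python
# Hypothetical linear model and background means (illustrative values).
weights = [2.0, -1.0, 0.5]
intercept = 0.25
feature_means = [1.0, 1.0, 3.0]


def predict_one(row):
    """Linear model output for a single sample."""
    return sum(w * x for w, x in zip(weights, row)) + intercept


def linear_shap_values(row):
    """Independent-feature attribution: phi_i = w_i * (x_i - mean_i)."""
    return [w * (x - m) for w, x, m in zip(weights, row, feature_means)]


expected_value = predict_one(feature_means)  # model output at the mean
sample = [2.0, 0.0, 1.0]
phi = linear_shap_values(sample)

# Additivity: attributions sum to f(sample) - expected_value.
gap = predict_one(sample) - expected_value
```

Here the attributions are 2.0, 1.0, and -1.0, and their sum matches the gap of 2.0 between the sample's prediction and the expected value.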
-
-
class interpret_community.mimic.models.linear_model.SGDExplainableModel(multiclass=False, random_state=123, classification=True, **kwargs)
Bases: interpret_community.mimic.models.explainable_model.BaseExplainableModel
-
available_explanations = ['global', 'local']
-
expected_values
Use LinearExplainer to get the expected values.
Returns: The expected values of the linear model.
Return type: list
-
explain_global(**kwargs)
Use the coefficients (coef_) of the SGD surrogate model as the global feature importances.
Returns: The global explanation of feature importances.
Return type: list
-
explain_local(evaluation_examples, **kwargs)
Use LinearExplainer to get the local feature importances from the trained explainable model.
Parameters: evaluation_examples (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The evaluation examples to compute local feature importances for.
Returns: The local explanation of feature importances.
Return type: Union[list, numpy.ndarray]
-
explainer_type = 'model'
Stochastic Gradient Descent explainable model.
Parameters:
-
fit(dataset, labels, **kwargs)
Call linear fit to fit the explainable model.
Store the mean and covariance of the background data for local explanation.
Parameters:
- dataset (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to train the model on.
- labels (numpy.array) – The labels to train the model on.
If multiclass=True, uses the parameters for SGDClassifier:
Fit linear model with Stochastic Gradient Descent.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
- Training data.
- y : ndarray of shape (n_samples,)
- Target values.
- coef_init : ndarray of shape (n_classes, n_features), default=None
- The initial coefficients to warm-start the optimization.
- intercept_init : ndarray of shape (n_classes,), default=None
- The initial intercept to warm-start the optimization.
- sample_weight : array-like, shape (n_samples,), default=None
- Weights applied to individual samples. If not provided, uniform weights are assumed. These weights will be multiplied with class_weight (passed through the constructor) if class_weight is specified.
Returns
- self : returns an instance of self.
Otherwise, if multiclass=False, uses the parameters for SGDRegressor:
Fit linear model with Stochastic Gradient Descent.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
- Training data.
- y : ndarray of shape (n_samples,)
- Target values.
- coef_init : ndarray of shape (n_features,), default=None
- The initial coefficients to warm-start the optimization.
- intercept_init : ndarray of shape (1,), default=None
- The initial intercept to warm-start the optimization.
- sample_weight : array-like, shape (n_samples,), default=None
- Weights applied to individual samples (1. for unweighted).
Returns
self : returns an instance of self.
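To make "fit linear model with Stochastic Gradient Descent" concrete, here is a minimal single-feature sketch that updates the weight and intercept after every sample using the squared-loss gradient. It assumes a constant learning rate, no regularization, and zero initial values; scikit-learn's SGD estimators use learning-rate schedules and penalties that this sketch omits, and the function name and data are hypothetical.

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.1, epochs=500, seed=123):
    """Fit y ~ w * x + b with per-sample squared-loss gradient steps."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # start from zero; a warm start would pass other values
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a new random order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * xs[i]
            b -= learning_rate * error
    return w, b


# Noise-free data generated from y = 2x + 1 (illustrative values).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
```

On this noise-free data the estimates approach the generating slope and intercept; with noisy data a constant learning rate would leave the estimates fluctuating around the least-squares solution.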
-
model
Retrieve the underlying model.
Returns: The SGD model, either classifier or regressor.
Return type: Union[SGDClassifier, SGDRegressor]
-
predict(dataset, **kwargs)
Call SGD predict to predict labels using the explainable model.
Parameters: dataset (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for SGDClassifier:
Predict class labels for samples in X.
Parameters
- X : array-like or sparse matrix, shape (n_samples, n_features)
- Samples.
Returns
- C : array, shape [n_samples]
- Predicted class label per sample.
Otherwise, if multiclass=False, uses the parameters for SGDRegressor:
Predict using the linear model.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
Returns
- ndarray of shape (n_samples,)
- Predicted target values per element in X.
-
predict_proba(dataset, **kwargs)
Call SGD predict_proba to predict probabilities using the explainable model.
Parameters: dataset (numpy.array or pandas.DataFrame or scipy.sparse.csr_matrix) – The dataset to predict probabilities on.
Returns: The predictions of the model.
Return type: list
If multiclass=True, uses the parameters for SGDClassifier:
Probability estimates.
This method is only available for log loss and modified Huber loss.
Multiclass probability estimates are derived from binary (one-vs.-rest) estimates by simple normalization, as recommended by Zadrozny and Elkan.
Binary probability estimates for loss="modified_huber" are given by (clip(decision_function(X), -1, 1) + 1) / 2. For other loss functions it is necessary to perform proper probability calibration by wrapping the classifier with CalibratedClassifierCV instead.
Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
- Input data for prediction.
Returns
- ndarray of shape (n_samples, n_classes)
- Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.
References
Zadrozny and Elkan, “Transforming classifier scores into multiclass probability estimates”, SIGKDD’02, http://www.research.ibm.com/people/z/zadrozny/kdd2002Transf.pdf
The justification for the formula in the loss=”modified_huber” case is in the appendix B in: http://jmlr.csail.mit.edu/papers/volume2/zhang02c/zhang02c.pdf
Otherwise predict_proba is not supported for regression or binary classification.
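The modified Huber probability formula and the normalization step described for SGDClassifier can be sketched in plain Python. The decision values are made up, the helper names are hypothetical, and the uniform fallback for an all-zero row is an assumption added to keep the sketch safe to run.

```python
def clip(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def modified_huber_binary_proba(decision_value):
    """Positive-class probability: (clip(d, -1, 1) + 1) / 2."""
    return (clip(decision_value, -1.0, 1.0) + 1.0) / 2.0


def normalize(binary_probas):
    """Turn per-class binary estimates into a distribution over classes."""
    total = sum(binary_probas)
    if total == 0.0:
        # Assumption: fall back to a uniform distribution when all are zero.
        return [1.0 / len(binary_probas)] * len(binary_probas)
    return [p / total for p in binary_probas]


decision_values = [1.5, 0.0, -0.5]  # hypothetical one-vs-rest scores
per_class = [modified_huber_binary_proba(d) for d in decision_values]  # [1.0, 0.5, 0.25]
probabilities = normalize(per_class)
```

Decision values at or beyond the margin (at most -1 or at least 1) saturate at probability 0 or 1 before normalization, which is why calibration is recommended for other loss functions.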
-