interpret_community.common.gpu_kmeans module

The code is based on a similar utility function from SHAP (https://github.com/slundberg/shap/blob/9411b68e8057a6c6f3621765b89b24d82bee13d4/shap/utils/_legacy.py). This version uses cuML's KMeans instead of scikit-learn's for speed.

class interpret_community.common.gpu_kmeans.DenseData(data, group_names, *args)

Bases: Data

interpret_community.common.gpu_kmeans.kmeans(X, k, round_values=True)

Summarize a dataset with k mean samples, each weighted by the number of data points it represents.

Parameters:

X (numpy.ndarray or pandas.DataFrame or any scipy.sparse matrix) – Matrix of data samples to summarize (# samples x # features)
k (int) – Number of means to use for approximation.
round_values (bool) – For all i, round the ith dimension of each mean sample to match the nearest value from X[:,i]. This ensures discrete features always get a valid value.

Returns:

DenseData object.

Return type:

DenseData
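The two post-clustering steps described above (rounding each mean to an observed value, and weighting each mean by its cluster size) can be illustrated with a minimal pure-Python sketch. It is not the module's implementation: the helper names `round_to_observed` and `cluster_weights` are hypothetical, and the cluster centers and labels are hard-coded here in place of the output of a real k-means fit.

```python
from collections import Counter


def round_to_observed(centers, X):
    """Snap each coordinate of each center to the nearest value in that column of X.

    Hypothetical helper illustrating what round_values=True is documented to do.
    """
    rounded = []
    for center in centers:
        row = []
        for j, value in enumerate(center):
            column = [sample[j] for sample in X]
            # Pick the observed value in feature j closest to the mean's value.
            row.append(min(column, key=lambda v: abs(v - value)))
        rounded.append(row)
    return rounded


def cluster_weights(labels, k):
    """Weight each of the k means by the number of samples assigned to it."""
    counts = Counter(labels)
    return [float(counts.get(i, 0)) for i in range(k)]


# Toy data: feature 0 is discrete (0 or 1), feature 1 is continuous.
X = [[0, 1.0], [0, 2.0], [1, 2.0], [1, 9.0], [0, 10.0], [1, 14.0], [1, 10.0]]
# Assumed clustering result (stand-in for a real k-means fit with k=2).
labels = [0, 0, 0, 1, 1, 1, 1]
centers = [[1 / 3, 5 / 3], [0.75, 10.75]]  # per-cluster feature means

rounded = round_to_observed(centers, X)
weights = cluster_weights(labels, 2)
print(rounded)  # [[0, 2.0], [1, 10.0]]
print(weights)  # [3.0, 4.0]
```

Without rounding, the discrete feature would take the values 1/3 and 0.75, which never occur in the data. After rounding, every coordinate of a mean sample is a value that appears in the corresponding column of X.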