rank

Methods to rank examples in standard (multi-class) classification datasets by cleanlab’s label quality score. Except for order_label_issues, which operates only on the subset of the data identified as potential label issues/errors, the methods in this module can be applied to any subset of the dataset (including the entire dataset) and provide a label quality score for every example in that subset. You can then call, for example, np.argsort(label_quality_scores) to obtain the indices of individual datapoints ranked from lowest to highest quality.
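When scores are computed on a subset only, the positions returned by np.argsort refer to the subset, not to the full dataset. The following minimal sketch shows one way to map them back; the arrays subset_indices and subset_scores are made-up illustrative values, not output from cleanlab.

```python
import numpy as np

# Hypothetical: positions (in the full dataset) of the examples that were scored.
subset_indices = np.array([4, 7, 9, 12])
# Hypothetical label quality scores for those four examples (lower = more suspect).
subset_scores = np.array([0.80, 0.15, 0.60, 0.05])

# argsort ranks positions *within the subset*, lowest score first.
order_within_subset = np.argsort(subset_scores)

# Translate subset positions back to indices in the full dataset.
ranked_dataset_indices = subset_indices[order_within_subset]
print(ranked_dataset_indices)
```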

Note: multi-label classification is not supported by most methods in this module; each example must be labeled as belonging to a single class, e.g. format: labels = np.array([1,0,2,1,1,0...]).

Note: Label quality scores are most accurate when they are computed based on out-of-sample pred_probs from your model. To obtain out-of-sample predicted probabilities for every datapoint in your dataset, you can use cross-validation. This is encouraged to get better results.
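As a rough illustration of what “out-of-sample” means here, the sketch below fills in pred_probs fold by fold, so that each row is predicted by a model that never saw that example during fitting. The nearest-centroid “model”, the helper names, and the synthetic data are all assumptions made for the example; in practice you would plug in your own classifier (scikit-learn users often rely on cross_val_predict with method="predict_proba" for this).

```python
import numpy as np

def centroid_pred_probs(X_train, y_train, X_test, num_classes):
    """Toy stand-in model (hypothetical): softmax over negative squared
    distance from each test point to each class centroid."""
    centroids = np.stack([X_train[y_train == k].mean(axis=0) for k in range(num_classes)])
    sq_dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -sq_dist
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def out_of_sample_pred_probs(X, y, num_classes, n_folds=5, seed=0):
    """K-fold cross-validation: each example is predicted by a model
    fitted only on the other folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred_probs = np.zeros((len(y), num_classes))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        pred_probs[held_out] = centroid_pred_probs(
            X[train], y[train], X[held_out], num_classes
        )
    return pred_probs

# Synthetic 3-class data: 30 points around each of three centers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in (0.0, 3.0, 6.0)])
labels = np.repeat([0, 1, 2], 30)

pred_probs = out_of_sample_pred_probs(X, labels, num_classes=3)
print(pred_probs.shape)
```

The resulting array has one row per example and one column per class, which is the shape the scoring functions in this module expect for pred_probs.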

Functions:

`find_top_issues`(quality_scores, *[, top])	Returns the sorted indices of the top issues in quality_scores, ordered from smallest to largest quality score (i.e., from most to least likely to be an issue).
`get_confidence_weighted_entropy_for_each_label`(...)	Returns the "confidence weighted entropy" label-quality score for each datapoint.
`get_label_quality_ensemble_scores`(labels, ...)	Returns label quality scores based on predictions from an ensemble of models.
`get_label_quality_scores`(labels, pred_probs, *)	Returns a label quality score for each datapoint.
`get_normalized_margin_for_each_label`(labels, ...)	Returns the "normalized margin" label-quality score for each datapoint.
`get_self_confidence_for_each_label`(labels, ...)	Returns the self-confidence label-quality score for each datapoint.
`order_label_issues`(label_issues_mask, ...[, ...])	Sorts label issues by label quality score.

cleanlab.rank.find_top_issues(quality_scores, *, top=10)

Returns the sorted indices of the top issues in quality_scores, ordered from smallest to largest quality score (i.e., from most to least likely to be an issue). For example, the first value returned is the index corresponding to the smallest value in quality_scores (most likely to be an issue). The second value in the returned array is the index corresponding to the second smallest value in quality_scores (second-most likely to be an issue), and so forth.

This method assumes that quality_scores shares an index with some dataset such that the indices returned by this method map to the examples in that dataset.

Parameters:

quality_scores (ndarray) – Array of shape (N,), where N is the number of examples, containing one quality score for each example in the dataset.
top (int) – The number of indices to return.

Return type:

ndarray

Returns:

top_issue_indices – Indices of top examples most likely to suffer from an issue (ranked by issue severity).
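The behavior described above can be approximated in plain NumPy. This is only a sketch of the documented semantics (sort ascending, keep the first top indices), not the library’s implementation; the function name top_issue_indices_sketch and the scores are hypothetical.

```python
import numpy as np

def top_issue_indices_sketch(quality_scores, top=10):
    """Indices of the `top` lowest scores, most severe (lowest score) first."""
    return np.argsort(quality_scores)[:top]

# Made-up quality scores for a five-example dataset.
quality_scores = np.array([0.91, 0.12, 0.55, 0.03, 0.78])

top_two = top_issue_indices_sketch(quality_scores, top=2)
print(top_two)  # index 3 (score 0.03) comes first, then index 1 (score 0.12)
```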

cleanlab.rank.get_confidence_weighted_entropy_for_each_label(labels, pred_probs)

Returns the “confidence weighted entropy” label-quality score for each datapoint.

This is a function to compute label-quality scores for classification datasets, where lower scores indicate labels less likely to be correct.

“confidence weighted entropy” is the normalized entropy divided by “self-confidence”.

Parameters:

labels (np.ndarray) – Labels in the same format expected by the get_label_quality_scores function.
pred_probs (np.ndarray) – Predicted-probabilities in the same format expected by the get_label_quality_scores function.

Return type:

ndarray

Returns:

label_quality_scores (np.ndarray) – Contains one score (between 0 and 1) per example. Lower scores indicate more likely mislabeled examples.
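To make the two ingredients concrete, the sketch below computes the normalized entropy of each row of pred_probs and the self-confidence (the predicted probability of the given label), then forms their raw ratio as worded above. It stops at that raw ratio: it does not attempt to reproduce whatever further rescaling the library applies to arrive at a score between 0 and 1 where lower means more suspect. The helper names and the example arrays are assumptions.

```python
import numpy as np

def normalized_entropy(pred_probs):
    """Shannon entropy of each row, divided by log(K) so it lies in [0, 1]."""
    num_classes = pred_probs.shape[1]
    clipped = np.clip(pred_probs, 1e-12, 1.0)  # avoid log(0)
    return -(pred_probs * np.log(clipped)).sum(axis=1) / np.log(num_classes)

def self_confidence(labels, pred_probs):
    """Predicted probability assigned to each example's given label."""
    return pred_probs[np.arange(len(labels)), labels]

# Made-up inputs: one confident prediction, one completely uncertain one.
labels = np.array([0, 2])
pred_probs = np.array([
    [0.90, 0.05, 0.05],
    [1 / 3, 1 / 3, 1 / 3],
])

entropy = normalized_entropy(pred_probs)
confidence = self_confidence(labels, pred_probs)
raw_ratio = entropy / confidence  # raw ingredient only, not the final library score
print(entropy, confidence, raw_ratio)
```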

cleanlab.rank.get_label_quality_ensemble_scores(labels, pred_probs_list, *, method='self_confidence', adjust_pred_probs=False, weight_ensemble_members_by='accuracy', custom_weights=None, log_loss_search_T_values=[0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 200.0], verbose=True)

Returns label quality scores based on predictions from an ensemble of models.

This is a function to compute label-quality scores for classification datasets, where lower scores indicate labels less likely to be correct.

Ensemble scoring requires a list of pred_probs from each model in the ensemble.

A label quality score is computed for each pred_probs array in the list. These per-model scores are then averaged using the weighting scheme selected by weight_ensemble_members_by.

Score is between 0 and 1:

1 — clean label (given label is likely correct).
0 — dirty label (given label is likely incorrect).

Parameters:

labels (np.ndarray) – Labels in the same format expected by the get_label_quality_scores function.
pred_probs_list (List[np.ndarray]) – Each element in this list should be an array of pred_probs in the same format expected by the get_label_quality_scores function. Each element of pred_probs_list corresponds to the predictions from one model for all examples.
method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default "self_confidence") – Label quality scoring method. See get_label_quality_scores for scenarios on when to use each method.
adjust_pred_probs (bool, optional) – Has the same meaning as the adjust_pred_probs argument of the get_label_quality_scores function.
weight_ensemble_members_by ({"uniform", "accuracy", "log_loss_search", "custom"}, default "accuracy") –
Weighting scheme used to aggregate scores from each model:
- "uniform": Take the simple average of scores.
- "accuracy": Take weighted average of scores, weighted by model accuracy.
- "log_loss_search": Take weighted average of scores, weighted by exp(t * -log_loss) where t is selected from log_loss_search_T_values parameter and log_loss is the log-loss between a model’s pred_probs and the given labels.
- "custom": Take weighted average of scores using custom weights that the user passes to the custom_weights parameter.
custom_weights (np.ndarray, default None) – Weights used to aggregate scores from each model if weight_ensemble_members_by="custom". Length of this array must match the number of models: len(pred_probs_list).
log_loss_search_T_values (List, default [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 2e2]) – List of t values considered if weight_ensemble_members_by="log_loss_search". We will choose the value of t that leads to weights which produce the best log-loss when used to form a weighted average of pred_probs from the models.
verbose (bool, default True) – Set to False to suppress all print statements.

Return type:

ndarray

Returns:

label_quality_scores (np.ndarray) – Contains one score (between 0 and 1) per example. Lower scores indicate more likely mislabeled examples.
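The weighting schemes described above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the library’s code: the per-model score used is self-confidence (the predicted probability of the given label), weights are normalized to sum to 1, and all function names and example arrays are hypothetical.

```python
import numpy as np

def self_confidence(labels, pred_probs):
    """Predicted probability assigned to each example's given label."""
    return pred_probs[np.arange(len(labels)), labels]

def log_loss(labels, pred_probs):
    """Mean negative log-probability of the given labels."""
    return -np.log(np.clip(self_confidence(labels, pred_probs), 1e-12, 1.0)).mean()

def log_loss_search_weights(labels, pred_probs_list, t_values):
    """Pick the t whose weights exp(-t * log_loss) give the blended
    pred_probs with the lowest log-loss; return those weights."""
    losses = np.array([log_loss(labels, p) for p in pred_probs_list])
    stacked = np.stack(pred_probs_list)  # shape (M, N, K)
    best_weights, best_loss = None, np.inf
    for t in t_values:
        # Subtracting the min loss avoids underflow; it cancels on normalization.
        weights = np.exp(-t * (losses - losses.min()))
        weights = weights / weights.sum()
        blended = np.tensordot(weights, stacked, axes=1)  # shape (N, K)
        blended_loss = log_loss(labels, blended)
        if blended_loss < best_loss:
            best_weights, best_loss = weights, blended_loss
    return best_weights

def ensemble_scores_sketch(labels, pred_probs_list, weights):
    """Weighted average of per-model self-confidence scores."""
    scores = np.stack([self_confidence(labels, p) for p in pred_probs_list])  # (M, N)
    weights = np.asarray(weights, dtype=float)
    return (weights / weights.sum()) @ scores

# Made-up predictions from two models on three examples with two classes.
labels = np.array([0, 1, 1])
model_a = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
model_b = np.array([[0.6, 0.4], [0.4, 0.6], [0.1, 0.9]])
pred_probs_list = [model_a, model_b]

uniform_weights = np.ones(len(pred_probs_list))
accuracy_weights = np.array([(p.argmax(axis=1) == labels).mean() for p in pred_probs_list])
search_weights = log_loss_search_weights(
    labels, pred_probs_list, t_values=[1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 2e2]
)

uniform_scores = ensemble_scores_sketch(labels, pred_probs_list, uniform_weights)
accuracy_scores = ensemble_scores_sketch(labels, pred_probs_list, accuracy_weights)
search_scores = ensemble_scores_sketch(labels, pred_probs_list, search_weights)
print(uniform_scores, accuracy_scores, search_scores)
```

In this toy setup model_b is both more accurate and has the lower log-loss, so the "accuracy" and "log_loss_search" variants pull the ensemble score toward model_b’s scores relative to the uniform average.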