rank
Methods to rank examples in standard (multi-class) classification datasets by cleanlab’s label quality score.
Except for order_label_issues, which operates only on the subset of the data identified as potential label issues/errors, the methods in this module can be used on whichever subset of the dataset you choose (including the entire dataset) and provide a label quality score for every example. You can then do something like np.argsort(label_quality_score) to obtain ranked indices of individual datapoints based on their quality.
Note: multi-label classification is not supported by most methods in this module; each example must be labeled as belonging to a single class, e.g. format: labels = np.array([1,0,2,1,1,0...]).
For multi-label classification, instead see multilabel_classification.get_label_quality_scores.
Note: Label quality scores are most accurate when they are computed based on out-of-sample pred_probs from your model.
To obtain out-of-sample predicted probabilities for every datapoint in your dataset, you can use cross-validation. This is encouraged to get better results.
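As a rough illustration of the workflow described above, the sketch below produces out-of-sample predicted probabilities with a hand-rolled K-fold loop and then ranks examples with np.argsort. The nearest-centroid model, the helper name out_of_sample_pred_probs, and the toy data are assumptions made for this example and are not part of cleanlab; in practice you would substitute your own classifier.

```python
import numpy as np

def out_of_sample_pred_probs(X, labels, n_folds=5, seed=0):
    """Hypothetical helper: K-fold predicted probabilities from a toy nearest-centroid model."""
    rng = np.random.default_rng(seed)
    n, K = len(labels), int(labels.max()) + 1
    pred_probs = np.zeros((n, K))
    for held_out in np.array_split(rng.permutation(n), n_folds):
        # Fit only on examples outside the held-out fold.
        train = np.setdiff1d(np.arange(n), held_out)
        X_tr, y_tr = X[train], labels[train]
        centroids = np.stack([X_tr[y_tr == k].mean(axis=0) for k in range(K)])
        # Softmax over negative squared distances to each class centroid.
        sq_dists = ((X[held_out][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        logits = -sq_dists
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        pred_probs[held_out] = probs / probs.sum(axis=1, keepdims=True)
    return pred_probs

# Toy two-class dataset (assumed for illustration), with one label deliberately flipped.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, size=(50, 2)),
                    rng.normal(2.0, 1.0, size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)
labels[0] = 1

pred_probs = out_of_sample_pred_probs(X, labels)
# Self-confidence used as the label quality score: probability of the given label.
label_quality_score = pred_probs[np.arange(len(labels)), labels]
# Indices ordered from lowest to highest quality.
ranked_indices = np.argsort(label_quality_score)
```

Every row of pred_probs here comes from a model that never saw that example during fitting, which is the property the note above asks for. The sketch assumes every class appears in each training split.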
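Functions:
get_label_quality_scores – Returns a label quality score for each datapoint.
get_label_quality_ensemble_scores – Returns label quality scores based on predictions from an ensemble of models.
find_top_issues – Returns the sorted indices of the top issues in quality_scores.
order_label_issues – Sorts label issues by label quality score.
get_self_confidence_for_each_label – Returns the self-confidence label quality score for each datapoint.
get_normalized_margin_for_each_label – Returns the "normalized margin" label quality score for each datapoint.
get_confidence_weighted_entropy_for_each_label – Returns the "confidence weighted entropy" label quality score for each datapoint.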
 cleanlab.rank.get_label_quality_scores(labels, pred_probs, *, method='self_confidence', adjust_pred_probs=False)
Returns a label quality score for each datapoint.
This is a function to compute label quality scores for standard (multi-class) classification datasets, where lower scores indicate labels less likely to be correct.
Score is between 0 and 1:
1 - clean label (given label is likely correct).
0 - dirty label (given label is likely incorrect).
 Parameters:
labels (np.ndarray) – A discrete vector of noisy labels, i.e. some labels may be erroneous. Format requirements: for a dataset with K classes, labels must be in 0, 1, …, K-1. Note: multi-label classification is not supported by this method; each example must belong to a single class, e.g. format: labels = np.array([1,0,2,1,1,0...]).
pred_probs (np.ndarray, optional) – An array of shape (N, K) of model-predicted probabilities, P(label=k|x). Each row of this matrix corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class, for each of the K classes. The columns must be ordered such that these probabilities correspond to class 0, 1, …, K-1.
Note: Label quality scores are most accurate when they are computed based on out-of-sample pred_probs from your model. To obtain out-of-sample predicted probabilities for every datapoint in your dataset, you can use cross-validation. This is encouraged to get better results.
method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default "self_confidence") – Label quality scoring method.
Letting k = labels[i] and P = pred_probs[i] denote the given label and predicted class-probabilities for datapoint i, its score can either be:
'normalized_margin': P[k] - max_{k' != k}[ P[k'] ]
'self_confidence': P[k]
'confidence_weighted_entropy': entropy(P) / self_confidence
Note: the actual label quality scores returned by this method may be transformed versions of the above, in order to ensure their values lie between 0 and 1 with lower values indicating more likely mislabeled data.
Let C = {0, 1, ..., K-1} be the set of classes specified for our classification task.
The normalized_margin score works better for identifying class-conditional label errors, i.e. examples for which another label in C is appropriate but the given label is not.
The self_confidence score works better for identifying alternative label issues corresponding to bad examples that are: not from any of the classes in C, well-described by 2 or more labels in C, or generally just out-of-distribution (i.e. anomalous outliers).
adjust_pred_probs (bool, optional) – Account for class imbalance in the label quality scoring by adjusting predicted probabilities via subtraction of class confident thresholds and renormalization. Set this to True if you prefer to account for class imbalance. See Northcutt et al., 2021.
 Return type:
ndarray
 Returns:
label_quality_scores (np.ndarray) – Contains one score (between 0 and 1) per example. Lower scores indicate more likely mislabeled examples.
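The adjust_pred_probs description above ("subtraction of class confident thresholds and renormalization") can be read in more than one way; the following is one plausible sketch, not a statement of cleanlab's internals. It assumes the threshold for class k is the mean predicted probability of class k over examples labeled k (in the spirit of confident learning, Northcutt et al., 2021), and that values are shifted by the largest threshold before renormalizing so they stay non-negative. The helper names are hypothetical.

```python
import numpy as np

def confident_thresholds_sketch(labels, pred_probs):
    """Assumed definition: mean P[:, k] over the examples whose given label is k."""
    K = pred_probs.shape[1]
    return np.array([pred_probs[labels == k, k].mean() for k in range(K)])

def adjust_pred_probs_sketch(labels, pred_probs):
    """Subtract per-class thresholds, shift to stay non-negative, renormalize each row."""
    thresholds = confident_thresholds_sketch(labels, pred_probs)
    shifted = pred_probs - thresholds + thresholds.max()
    return shifted / shifted.sum(axis=1, keepdims=True)

# Imbalanced toy data: the model is systematically less confident on class 1.
labels = np.array([0, 0, 0, 1])
pred_probs = np.array([[0.9, 0.1],
                       [0.8, 0.2],
                       [0.7, 0.3],
                       [0.4, 0.6]])
adjusted = adjust_pred_probs_sketch(labels, pred_probs)
```

In this toy case the class-1 probabilities are boosted relative to class 0, because the model's typical confidence on class 1 is lower. The sketch assumes every class has at least one labeled example.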
 cleanlab.rank.get_label_quality_ensemble_scores(labels, pred_probs_list, *, method='self_confidence', adjust_pred_probs=False, weight_ensemble_members_by='accuracy', custom_weights=None, log_loss_search_T_values=[0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 200.0], verbose=True)
Returns label quality scores based on predictions from an ensemble of models.
This is a function to compute label quality scores for classification datasets, where lower scores indicate labels less likely to be correct.
Ensemble scoring requires a list of pred_probs from each model in the ensemble.
For each pred_probs in the list, a label quality score is computed. The final score is the average of these per-model scores under the weighting scheme chosen via weight_ensemble_members_by.
Score is between 0 and 1:
1 — clean label (given label is likely correct).
0 — dirty label (given label is likely incorrect).
 Parameters:
labels (np.ndarray) – Labels in the same format expected by the cleanlab.rank.get_label_quality_scores function.
pred_probs_list (List[np.ndarray]) – Each element in this list should be an array of pred_probs in the same format expected by the cleanlab.rank.get_label_quality_scores function. Each element of pred_probs_list corresponds to the predictions from one model for all examples.
method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default "self_confidence") – Label quality scoring method. See cleanlab.rank.get_label_quality_scores for scenarios on when to use each method.
adjust_pred_probs (bool, optional) – adjust_pred_probs in the same format expected by the cleanlab.rank.get_label_quality_scores function.
weight_ensemble_members_by ({"uniform", "accuracy", "log_loss_search", "custom"}, default "accuracy") – Weighting scheme used to aggregate scores from each model:
"uniform": Take the simple average of scores.
"accuracy": Take weighted average of scores, weighted by model accuracy.
"log_loss_search": Take weighted average of scores, weighted by exp(t * -log_loss) where t is selected from the log_loss_search_T_values parameter and log_loss is the log-loss between a model’s pred_probs and the given labels.
"custom": Take weighted average of scores using custom weights that the user passes to the custom_weights parameter.
custom_weights (np.ndarray, default None) – Weights used to aggregate scores from each model if weight_ensemble_members_by="custom". Length of this array must match the number of models: len(pred_probs_list).
log_loss_search_T_values (List, default [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1, 1e2, 2e2]) – List of t values considered if weight_ensemble_members_by="log_loss_search". We will choose the value of t that leads to weights which produce the best log-loss when used to form a weighted average of pred_probs from the models.
verbose (bool, default True) – Set to False to suppress all print statements.
 Return type:
ndarray
 Returns:
label_quality_scores (np.ndarray) – Contains one score (between 0 and 1) per example. Lower scores indicate more likely mislabeled examples.
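The "uniform", "accuracy", and "custom" weighting schemes described above can be sketched in plain NumPy as follows. This is an illustrative approximation under stated assumptions, not cleanlab's implementation: the helper names are hypothetical, self-confidence is used as the per-model score, accuracy is measured against the given (possibly noisy) labels, and the "log_loss_search" scheme is left out.

```python
import numpy as np

def self_confidence_sketch(labels, pred_probs):
    """Per-example probability assigned to the given label."""
    return pred_probs[np.arange(len(labels)), labels]

def ensemble_scores_sketch(labels, pred_probs_list, weight_by="accuracy", custom_weights=None):
    # One row of scores per model: shape (n_models, N).
    scores = np.stack([self_confidence_sketch(labels, p) for p in pred_probs_list])
    if weight_by == "uniform":
        weights = np.ones(len(pred_probs_list))
    elif weight_by == "accuracy":
        # Accuracy of each model's argmax prediction against the given labels.
        weights = np.array([(p.argmax(axis=1) == labels).mean() for p in pred_probs_list])
    elif weight_by == "custom":
        weights = np.asarray(custom_weights, dtype=float)
    else:
        raise ValueError(f"unsupported weighting scheme: {weight_by}")
    # np.average normalizes the weights, so they need not sum to 1.
    return np.average(scores, axis=0, weights=weights)

labels = np.array([0, 1, 1])
model_a = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # accuracy 2/3 on given labels
model_b = np.array([[0.6, 0.4], [0.7, 0.3], [0.8, 0.2]])  # accuracy 1/3 on given labels

uniform_scores = ensemble_scores_sketch(labels, [model_a, model_b], weight_by="uniform")
accuracy_scores = ensemble_scores_sketch(labels, [model_a, model_b], weight_by="accuracy")
custom_scores = ensemble_scores_sketch(labels, [model_a, model_b], weight_by="custom",
                                       custom_weights=[1.0, 0.0])
```

With accuracy weighting the more accurate model dominates the average, and a custom weight vector of [1.0, 0.0] reduces to that single model's scores. If every model has zero accuracy the weights sum to zero and np.average raises an error, so a real implementation would need to guard against that case.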
 cleanlab.rank.find_top_issues(quality_scores, *, top=10)
Returns the sorted indices of the top issues in quality_scores, ordered from smallest to largest quality score (i.e., from most to least likely to be an issue). For example, the first value returned is the index corresponding to the smallest value in quality_scores (most likely to be an issue). The second value in the returned array is the index corresponding to the second smallest value in quality_scores (second-most likely to be an issue), and so forth.
This method assumes that quality_scores shares an index with some dataset such that the indices returned by this method map to the examples in that dataset.
 Parameters:
quality_scores (ndarray) – Array of shape (N,), where N is the number of examples, containing one quality score for each example in the dataset.
top (int) – The number of indices to return.
 Return type:
ndarray
 Returns:
top_issue_indices – Indices of top examples most likely to suffer from an issue (ranked by issue severity).
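The behavior described for find_top_issues amounts to an ascending argsort truncated to the first top entries. A minimal NumPy sketch of that idea follows; find_top_issues_sketch is a hypothetical stand-in written for illustration, not the library function itself.

```python
import numpy as np

def find_top_issues_sketch(quality_scores, top=10):
    """Indices of the `top` lowest scores, ordered from lowest to highest."""
    return np.argsort(quality_scores)[:top]

quality_scores = np.array([0.9, 0.1, 0.5, 0.3])
top_issue_indices = find_top_issues_sketch(quality_scores, top=2)
# top_issue_indices -> array([1, 3]): example 1 has the lowest score, example 3 the next lowest.
```

Because the returned values are positions in quality_scores, they can be used directly to index the dataset the scores were computed from.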
 cleanlab.rank.order_label_issues(label_issues_mask, labels, pred_probs, *, rank_by='self_confidence', rank_by_kwargs={})
Sorts label issues by label quality score.
Default label quality score is “self_confidence”.
 Parameters:
label_issues_mask (np.ndarray) – A boolean mask for the entire dataset where True represents a label issue and False represents an example that is accurately labeled with high confidence.
labels (np.ndarray) – Labels in the same format expected by the cleanlab.rank.get_label_quality_scores function.
pred_probs (np.ndarray of shape (N, K)) – Predicted probabilities in the same format expected by the cleanlab.rank.get_label_quality_scores function.
rank_by (str, optional) – Score by which to order label error indices (in increasing order). See the method argument of cleanlab.rank.get_label_quality_scores.
rank_by_kwargs (dict, optional) – Optional keyword arguments to pass into the cleanlab.rank.get_label_quality_scores function. Accepted args include adjust_pred_probs.
 Return type:
ndarray
 Returns:
label_issues_idx (np.ndarray) – An array of the indices of the examples with label issues, ordered by the label quality scoring method passed to rank_by.
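To make the relationship between the mask and the returned indices concrete, here is a small sketch that restricts scoring to the flagged examples and orders them by increasing score. It assumes self-confidence as the ranking score (the documented default), and order_label_issues_sketch is a hypothetical name used only for this example.

```python
import numpy as np

def order_label_issues_sketch(label_issues_mask, labels, pred_probs):
    """Indices of flagged examples, sorted from lowest to highest self-confidence."""
    issue_idx = np.flatnonzero(label_issues_mask)
    # Score only the flagged subset: probability assigned to each given label.
    scores = pred_probs[issue_idx, labels[issue_idx]]
    return issue_idx[np.argsort(scores)]

labels = np.array([0, 1, 1, 0])
pred_probs = np.array([[0.9, 0.1],
                       [0.7, 0.3],
                       [0.2, 0.8],
                       [0.1, 0.9]])
label_issues_mask = np.array([False, True, False, True])

label_issues_idx = order_label_issues_sketch(label_issues_mask, labels, pred_probs)
# label_issues_idx -> array([3, 1]): example 3 scores 0.1, example 1 scores 0.3.
```

Note that the output refers to positions in the full dataset, not positions within the flagged subset.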
 cleanlab.rank.get_self_confidence_for_each_label(labels, pred_probs)
Returns the self-confidence label quality score for each datapoint.
This is a function to compute label quality scores for classification datasets, where lower scores indicate labels less likely to be correct.
The self-confidence is the classifier’s predicted probability that an example belongs to its given class label.
Self-confidence can work better than normalized-margin for detecting label errors due to out-of-distribution (OOD) or weird examples, as opposed to label errors in which labels for random examples have been replaced by other classes.
 Parameters:
labels (np.ndarray) – Labels in the same format expected by the cleanlab.rank.get_label_quality_scores function.
pred_probs (np.ndarray) – Predicted probabilities in the same format expected by the cleanlab.rank.get_label_quality_scores function.
 Return type:
ndarray
 Returns:
label_quality_scores (np.ndarray) – Contains one score (between 0 and 1) per example. Lower scores indicate more likely mislabeled examples.
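Self-confidence as defined above is a single fancy-indexing operation: for each row of pred_probs, pick the column of the given label. The sketch below shows this directly; self_confidence_sketch is a hypothetical name for illustration.

```python
import numpy as np

def self_confidence_sketch(labels, pred_probs):
    """P[i, labels[i]] for every example i."""
    return pred_probs[np.arange(labels.shape[0]), labels]

labels = np.array([1, 0, 2])
pred_probs = np.array([[0.2, 0.7, 0.1],
                       [0.1, 0.3, 0.6],
                       [0.3, 0.3, 0.4]])

scores = self_confidence_sketch(labels, pred_probs)
# scores -> array([0.7, 0.1, 0.4]); the second example's given label looks least plausible.
```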
 cleanlab.rank.get_normalized_margin_for_each_label(labels, pred_probs)
Returns the “normalized margin” label quality score for each datapoint.
This is a function to compute label quality scores for classification datasets, where lower scores indicate labels less likely to be correct.
Letting k denote the given label for a datapoint, the margin is p(label = k) - max(p(label != k)), i.e. the probability of the given label minus the probability of the argmax label that is not the given label (margin = prob_label - max_prob_not_label). This gives you an idea of how likely an example is BOTH its given label AND not another label, and therefore scores its likelihood of being a good label or a label error.
The normalized margin is simply a transformed version of the margin, to ensure values between 0 and 1 with lower values indicating more likely mislabeled data.
Normalized margin works best for finding class-conditional label errors where there is another label in the set of classes that is clearly better than the given label.
 Parameters:
labels (np.ndarray) – Labels in the same format expected by the cleanlab.rank.get_label_quality_scores function.
pred_probs (np.ndarray) – Predicted probabilities in the same format expected by the cleanlab.rank.get_label_quality_scores function.
 Return type:
ndarray
 Returns:
label_quality_scores (np.ndarray) – Contains one score (between 0 and 1) per example. Lower scores indicate more likely mislabeled examples.
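The margin formula above can be sketched as follows. The margin itself lies in [-1, 1]; the text only says it is transformed to lie between 0 and 1, so the linear rescaling (margin + 1) / 2 used here is an assumption of this example, as is the helper name normalized_margin_sketch.

```python
import numpy as np

def normalized_margin_sketch(labels, pred_probs):
    """(prob_label - max_prob_not_label + 1) / 2, an assumed rescaling to [0, 1]."""
    n = labels.shape[0]
    prob_label = pred_probs[np.arange(n), labels]
    # Mask out the given label's column so the max runs over the other classes only.
    others = np.array(pred_probs, dtype=float)
    others[np.arange(n), labels] = -np.inf
    max_prob_not_label = others.max(axis=1)
    margin = prob_label - max_prob_not_label
    return (margin + 1) / 2

labels = np.array([0, 1])
pred_probs = np.array([[0.7, 0.2, 0.1],
                       [0.6, 0.1, 0.3]])

scores = normalized_margin_sketch(labels, pred_probs)
# scores -> array([0.75, 0.25]): margins of 0.5 and -0.5 mapped into [0, 1].
```

A score below 0.5 means some other class received more probability than the given label, which is the class-conditional error pattern this score targets.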
 cleanlab.rank.get_confidence_weighted_entropy_for_each_label(labels, pred_probs)
Returns the “confidence weighted entropy” label quality score for each datapoint.
This is a function to compute label quality scores for classification datasets, where lower scores indicate labels less likely to be correct.
“confidence weighted entropy” is defined as the normalized entropy divided by “self-confidence”. The returned values are a transformed version of this score, in order to ensure values between 0 and 1 with lower values indicating more likely mislabeled data.
 Parameters:
labels (np.ndarray) – Labels in the same format expected by the cleanlab.rank.get_label_quality_scores function.
pred_probs (np.ndarray) – Predicted probabilities in the same format expected by the cleanlab.rank.get_label_quality_scores function.
 Return type:
ndarray
 Returns:
label_quality_scores (np.ndarray) – Contains one score (between 0 and 1) per example. Lower scores indicate more likely mislabeled examples.
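The untransformed quantity described above (normalized entropy divided by self-confidence) can be sketched as below. Two assumptions are made here: "normalized entropy" is taken to mean Shannon entropy divided by log(K), and the final transformation into the 0 to 1 range is not reproduced because the text does not specify it. As a result, in this raw form larger values are the more suspicious ones, the opposite direction from the returned scores. The helper name is hypothetical.

```python
import numpy as np

def raw_confidence_weighted_entropy(labels, pred_probs):
    """Normalized entropy / self-confidence, before any rescaling (assumes K >= 2)."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    n, K = pred_probs.shape
    self_confidence = pred_probs[np.arange(n), labels]
    # Clip before the log so that zero probabilities contribute 0 instead of NaN.
    entropy = -(pred_probs * np.log(np.clip(pred_probs, 1e-12, None))).sum(axis=1)
    normalized_entropy = entropy / np.log(K)
    return normalized_entropy / self_confidence

labels = np.array([0, 0])
pred_probs = np.array([[0.5, 0.5],   # maximally uncertain prediction
                       [0.9, 0.1]])  # confident prediction that agrees with the label

raw = raw_confidence_weighted_entropy(labels, pred_probs)
# raw[0] is 2.0 (normalized entropy 1.0 divided by self-confidence 0.5);
# raw[1] is smaller, since the model is both more certain and agrees with the label.
```

The sketch does not guard against a self-confidence of exactly zero, which would produce a division by zero.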