Methods to rank the severity of label issues in multi-label classification datasets. Here each example can belong to one or more classes, or to none of the classes at all. Unlike in standard multi-class classification, the predicted class probabilities from the model need not sum to 1 for each row in multi-label classification.
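Because an example may belong to several classes or to none, it can help to view the list-of-lists labels as a multi-hot matrix whose columns line up with the columns of pred_probs. The sketch below uses a hypothetical helper, `labels_to_multihot`, which is not part of cleanlab, to show this representation. It also shows that each row of pred_probs can sum to something other than 1.

```python
import numpy as np

def labels_to_multihot(labels, num_classes):
    """Hypothetical helper: convert per-example class lists into an (N, K) 0/1 matrix."""
    multihot = np.zeros((len(labels), num_classes), dtype=int)
    for i, classes in enumerate(labels):
        for k in classes:  # an empty list leaves the row all zeros
            multihot[i, k] = 1
    return multihot

labels = [[1], [0, 2], []]  # the third example belongs to no class
pred_probs = np.array([[0.1, 0.9, 0.1],
                       [0.4, 0.1, 0.9],
                       [0.2, 0.1, 0.3]])

multihot = labels_to_multihot(labels, num_classes=3)
print(multihot)               # [[0 1 0] [1 0 1] [0 0 0]]
print(pred_probs.sum(axis=1))  # row sums are not 1 in multi-label classification
```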


get_label_quality_scores(labels, pred_probs, *)

Computes a label quality score for each example in a multi-label classification dataset.

cleanlab.multilabel_classification.get_label_quality_scores(labels, pred_probs, *, method='self_confidence', adjust_pred_probs=False, aggregator_kwargs={'alpha': 0.8, 'method': 'exponential_moving_average'})

Computes a label quality score for each example in a multi-label classification dataset.

Scores are between 0 and 1 with lower scores indicating examples whose label more likely contains an error. For each example, this method internally computes a separate score for each individual class and then aggregates these per-class scores into an overall label quality score for the example.
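The two-stage computation described above (per-class scores, then aggregation) can be sketched in plain numpy. This is an illustrative re-implementation, not cleanlab's internal code. It assumes that the one-vs-rest self-confidence for class k is P(k) when k is among the example's labels and 1 - P(k) otherwise, and that the exponential moving average runs over each row's scores sorted from highest to lowest, so the lowest (most suspicious) class scores receive the most weight. Under these assumptions it gives approximately [0.9, 0.5] on the inputs from the example at the end of this section.

```python
import numpy as np

def per_class_self_confidence(multihot, pred_probs):
    """One-vs-rest self-confidence: P(class k) if k is labeled, else 1 - P(class k)."""
    return np.where(multihot == 1, pred_probs, 1.0 - pred_probs)

def ema_aggregate(per_class_scores, alpha=0.8):
    """Exponential moving average over each row's scores, visited from highest to lowest."""
    sorted_desc = -np.sort(-per_class_scores, axis=1)
    ema = sorted_desc[:, 0]
    for column in sorted_desc[:, 1:].T:
        ema = alpha * column + (1 - alpha) * ema
    return ema

multihot = np.array([[0, 1, 0],
                     [1, 0, 1]])  # multi-hot form of labels = [[1], [0, 2]]
pred_probs = np.array([[0.1, 0.9, 0.1],
                       [0.4, 0.1, 0.9]])

per_class = per_class_self_confidence(multihot, pred_probs)
print(per_class)                 # approximately [[0.9 0.9 0.9] [0.4 0.9 0.9]]
print(ema_aggregate(per_class))  # approximately [0.9 0.5]
```

With alpha close to 1, the aggregate is dominated by the lowest per-class score. Smaller values blend in the higher scores more.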

To estimate exactly which examples are mislabeled in a multi-label classification dataset, you can also use filter.find_label_issues with argument multi_label=True.

Parameters:

  • labels (List[List[int]]) –

    Multi-label classification labels for each example, where each example may belong to multiple classes. The i-th element of labels is the list of classes that the i-th example belongs to (e.g. labels = [[1,2],[1],[0],..]).

    Format requirements: For a dataset with K classes, individual class labels must be integers in 0, 1, …, K-1.

  • pred_probs (np.ndarray) –

    A 2D array of shape (N, K) of model-predicted class probabilities P(label=k|x). Each row of this matrix corresponds to an example x and contains the predicted probability that x belongs to each of the K classes. The columns of this array must be ordered so that these probabilities correspond to classes 0, 1, …, K-1. In multi-label classification (where classes are not mutually exclusive), the rows of pred_probs need not sum to 1.

    Estimated label quality scores are most accurate when they are computed from out-of-sample pred_probs. To obtain out-of-sample predicted probabilities for every example in your dataset, use cross-validation; this is strongly encouraged.

  • method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default = "self_confidence") –

    Method used to calculate the per-class annotation scores that are subsequently aggregated into an overall label quality score. Each per-class score is calculated from the corresponding column of pred_probs in a one-vs-rest manner, using one of the standard label quality scores for multi-class classification.

    See also

    rank.get_label_quality_scores function for details about each option.

  • adjust_pred_probs (bool, default = False) – Account for class imbalance in the label quality scoring by adjusting the predicted probabilities: the class confident thresholds are subtracted and the result is renormalized. Set this to True if you want the scores to account for class imbalance. See Northcutt et al., 2021.

  • aggregator_kwargs (dict, default = {"method": "exponential_moving_average", "alpha": 0.8}) – A dictionary of hyperparameter values for aggregating the per-class scores into an overall label quality score for each example. Options for "method" include "exponential_moving_average", "softmin", or your own callable function. See internal.multilabel_scorer.Aggregator for details about each option and other possible hyperparameters.
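To illustrate how the choice of aggregator changes the overall score, here is a sketch of two alternatives applied to a matrix of per-class scores. Both are assumptions for illustration only: `softmin_aggregate` is a generic softmin (a weighted average whose weights are a softmax over 1 - score, controlled by a hypothetical `temperature` parameter), and `min_aggregate` is a hypothetical custom function. Neither is claimed to match internal.multilabel_scorer.Aggregator exactly; consult that class for the real definitions and for the signature a custom callable must have.

```python
import numpy as np

def softmin_aggregate(per_class_scores, temperature=0.1):
    """Weighted average of per-class scores; low scores get most of the weight.
    As temperature shrinks, the result approaches the row minimum."""
    logits = (1.0 - per_class_scores) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * per_class_scores).sum(axis=1)

def min_aggregate(per_class_scores):
    """Hypothetical custom aggregator: the single worst per-class score."""
    return per_class_scores.min(axis=1)

per_class = np.array([[0.9, 0.9, 0.9],
                      [0.4, 0.9, 0.9]])
print(softmin_aggregate(per_class))  # roughly [0.9, 0.41]
print(min_aggregate(per_class))      # [0.9 0.4]
```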

Return type:

ndarray[Any, dtype[np.floating[T]]]

Returns:

label_quality_scores (np.ndarray) – A 1D array of shape (N,) with a label quality score (between 0 and 1) for each example in the dataset. Lower scores indicate examples whose label is more likely to contain annotation errors.
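Since lower scores indicate likelier annotation errors, a typical next step is to sort examples so the most suspicious ones are reviewed first. A minimal sketch with made-up scores (not output from cleanlab); the 0.6 cutoff is an arbitrary assumption chosen for illustration.

```python
import numpy as np

scores = np.array([0.9, 0.5, 0.75, 0.2])  # hypothetical label quality scores

# Indices of examples ordered from most to least likely to be mislabeled.
ranked = np.argsort(scores)
print(ranked)   # [3 1 2 0]

# Optionally flag examples whose score falls below a chosen cutoff.
flagged = np.flatnonzero(scores < 0.6)
print(flagged)  # [1 3]
```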

Examples

>>> from cleanlab.multilabel_classification import get_label_quality_scores
>>> import numpy as np
>>> labels = [[1], [0,2]]
>>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
>>> scores = get_label_quality_scores(labels, pred_probs)
>>> scores
array([0.9, 0.5])