multilabel_classification
Methods to rank the severity of label issues in multilabel classification datasets. Here each example can belong to one or more classes, or to none of the classes at all. Unlike in standard multiclass classification, the predicted class probabilities from a model need not sum to 1 for each row in multilabel classification.
Functions:

get_label_quality_scores – Computes a label quality score for each example in a multilabel classification dataset.
cleanlab.multilabel_classification.get_label_quality_scores(labels, pred_probs, *, method='self_confidence', adjust_pred_probs=False, aggregator_kwargs={'alpha': 0.8, 'method': 'exponential_moving_average'})
Computes a label quality score for each example in a multilabel classification dataset.
Scores are between 0 and 1, with lower scores indicating examples whose label is more likely to contain an error. For each example, this method internally computes a separate score for each individual class and then aggregates these per-class scores into an overall label quality score for the example.
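The two-step scheme described above (score each class, then aggregate) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: it assumes the per-class "self_confidence" score is the predicted probability of the class when the class is in the given labels and one minus that probability otherwise, and that the exponential moving average runs over per-class scores sorted from highest to lowest, so the lowest scores carry the most weight. The function names are hypothetical.

```python
def self_confidence_per_class(given_classes, probs):
    """One-vs-rest self-confidence (assumed form):
    p_k if class k is among the given labels, otherwise 1 - p_k."""
    return [p if k in given_classes else 1.0 - p for k, p in enumerate(probs)]


def ema_aggregate(class_scores, alpha=0.8):
    """Exponential moving average over scores sorted from highest to lowest
    (assumed ordering), so the lowest per-class scores dominate the result."""
    ordered = sorted(class_scores, reverse=True)
    ema = ordered[0]
    for score in ordered[1:]:
        ema = alpha * score + (1.0 - alpha) * ema
    return ema


def sketch_label_quality_scores(labels, pred_probs, alpha=0.8):
    """Hypothetical stand-in that combines the two steps for every example."""
    return [
        ema_aggregate(self_confidence_per_class(set(classes), probs), alpha)
        for classes, probs in zip(labels, pred_probs)
    ]


# Same inputs as the example in the Examples section of this page.
labels = [[1], [0, 2]]
pred_probs = [[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]]
scores = sketch_label_quality_scores(labels, pred_probs)
print([round(s, 3) for s in scores])  # [0.9, 0.5]
```

For these inputs the sketch gives values that match the documented output up to floating-point rounding. In the second example the low probability assigned to the given class 0 (0.4) pulls the aggregate down to 0.5.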
To estimate exactly which examples are mislabeled in a multilabel classification dataset, you can also use filter.find_label_issues with argument multi_label=True.
Parameters:
labels (List[List[int]]) – Multilabel classification labels for each example; an example is allowed to belong to multiple classes. The i-th element of labels corresponds to the list of classes that the i-th example belongs to (e.g. labels = [[1,2],[1],[0],..]).
Important
Format requirements: For a dataset with K classes, individual class labels must be integers in 0, 1, …, K-1.
pred_probs (np.ndarray) – An array of shape (N, K) of model-predicted probabilities, P(label=k|x). Each row of this matrix corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class, for each of the K classes. The columns must be ordered such that these probabilities correspond to class 0, 1, …, K-1. In multilabel classification, the rows of pred_probs need not sum to 1.
Note
Estimated label quality scores are most accurate when they are computed based on out-of-sample pred_probs from your model. To obtain out-of-sample predicted probabilities for every example in your dataset, you can use cross-validation. This is encouraged to get better results.
method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default = "self_confidence") – Method to calculate separate per-class annotation scores that are subsequently aggregated to form an overall label quality score. These scores are calculated separately for each class based on the corresponding column of pred_probs in a one-vs-rest manner, and are standard label quality scores for multiclass classification.
See also
rank.get_label_quality_scores function for details about each option.
adjust_pred_probs (bool, default = False) – Account for class imbalance in the label quality scoring by adjusting predicted probabilities via subtraction of class confident thresholds and renormalization. Set this to True if you prefer to account for class imbalance. See Northcutt et al., 2021.
aggregator_kwargs (dict, default = {"method": "exponential_moving_average", "alpha": 0.8}) – A dictionary of hyperparameter values for aggregating per-class scores into an overall label quality score for each example. Options for "method" include "exponential_moving_average", "softmin", or your own callable function. See internal.multilabel_scorer.Aggregator for details about each option and other possible hyperparameters.
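The format requirements for labels can be checked before scoring. The helper below is a hypothetical sketch, not part of cleanlab: it verifies that every class label is an integer in 0, 1, …, K-1 and converts the list-of-lists format into an N x K matrix of 0/1 indicators, which is convenient for comparing against the columns of pred_probs. An example with no classes becomes an all-zero row.

```python
import numbers


def labels_to_indicator(labels, num_classes):
    """Hypothetical helper: validate list-of-lists labels and return an
    N x K list of 0/1 rows (row i marks the classes of example i)."""
    matrix = []
    for i, classes in enumerate(labels):
        row = [0] * num_classes
        for k in classes:
            # Class labels must be integers in 0, 1, ..., K-1.
            if not isinstance(k, numbers.Integral) or not 0 <= k < num_classes:
                raise ValueError(
                    f"example {i}: label {k!r} is not an integer in 0..{num_classes - 1}"
                )
            row[k] = 1
        matrix.append(row)
    return matrix


labels = [[1, 2], [1], [0], []]  # the last example belongs to no class
indicator = labels_to_indicator(labels, num_classes=3)
print(indicator)  # [[0, 1, 1], [0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

Passing a label outside the valid range, such as 3 when K = 3, raises a ValueError that names the offending example.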
Return type:
ndarray
Returns:
label_quality_scores (np.ndarray) – A 1D array of shape (N,) with a label quality score (between 0 and 1) for each example in the dataset. Lower scores indicate examples whose label is more likely to contain annotation errors.
Examples
>>> from cleanlab.multilabel_classification import get_label_quality_scores
>>> import numpy as np
>>> labels = [[1], [0,2]]
>>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
>>> scores = get_label_quality_scores(labels, pred_probs)
>>> scores
array([0.9, 0.5])
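Because lower scores indicate labels that are more likely to be wrong, a natural next step is to order examples from lowest to highest score and review the top of that list first. The sketch below uses the score values from the example above. The 0.6 cutoff is an arbitrary, hypothetical threshold chosen for illustration, not a recommended value.

```python
# Scores copied from the documented example output.
scores = [0.9, 0.5]

# Indices of examples ordered from most to least likely to be mislabeled.
ranked = sorted(range(len(scores)), key=lambda i: scores[i])

# Hypothetical review cutoff; a suitable value depends on the dataset.
threshold = 0.6
to_review = [i for i in ranked if scores[i] < threshold]

print(ranked)     # [1, 0]
print(to_review)  # [1]
```

When the scores are held in a NumPy array, np.argsort(scores) produces the same lowest-first ordering.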