dataset

Methods to summarize overall labeling issues across a multi-label classification dataset. Here each example can belong to one or more classes, or none of the classes at all. Unlike in standard multi-class classification, model-predicted class probabilities need not sum to 1 for each row in multi-label classification.
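For instance, the inputs shared by the functions below might look as follows (a minimal sketch with made-up values; the variable names are just illustrative):

    import numpy as np

    # Each example's label is the list of class indices it belongs to;
    # an empty list means the example belongs to none of the classes.
    labels = [[0], [0, 1], []]

    # pred_probs has shape (N, K): one independently-predicted probability
    # per class, so a row need not sum to 1.
    pred_probs = np.array([
        [0.9, 0.2, 0.1],
        [0.8, 0.7, 0.3],
        [0.1, 0.2, 0.1],
    ])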

Functions:

common_multilabel_issues([labels, ...]) – Summarizes which classes in a multi-label dataset appear most often mislabeled overall.

rank_classes_by_multilabel_quality([labels, ...]) – Returns a DataFrame with three overall label quality scores per class for a multi-label dataset.

overall_multilabel_health_score([labels, ...]) – Returns a single score between 0 and 1 measuring the overall quality of all labels in a multi-label classification dataset.

multilabel_health_summary([labels, ...]) – Prints a health summary of your multi-label dataset.

cleanlab.multilabel_classification.dataset.common_multilabel_issues(labels=list, pred_probs=None, *, class_names=None, confident_joint=None)

Summarizes which classes in a multi-label dataset appear most often mislabeled overall.

Since classes are not mutually exclusive in multi-label classification, this method summarizes the label issues for each class independently of the others.

Parameters:
  • labels (List[List[int]]) – List of noisy labels for multi-label classification where each example can belong to multiple classes. Refer to documentation for this argument in multilabel_classification.filter.find_label_issues for further details.

  • pred_probs (np.ndarray) – An array of shape (N, K) of model-predicted class probabilities. Refer to documentation for this argument in multilabel_classification.filter.find_label_issues for further details.

  • class_names (Iterable[str], optional) – A list or other iterable of the string class names. Its order must match the label indices. If class 0 is ‘dog’ and class 1 is ‘cat’, then class_names = ['dog', 'cat']. If provided, the returned DataFrame will have an extra Class Name column with this info.

  • confident_joint (np.ndarray, optional) – An array of shape (K, 2, 2) representing a one-vs-rest formatted confident joint. Refer to documentation for this argument in multilabel_classification.filter.find_label_issues for details.

Return type:

DataFrame

Returns:

common_multilabel_issues (pd.DataFrame) –

DataFrame where each row represents a single type of label issue for a particular class, summarized by the following columns:
  • Class Name: The name of the class if class_names is provided.

  • Class Index: The index of the class.

  • In Given Label: Whether the Class is originally annotated True or False in the given label.

  • In Suggested Label: Whether the Class should be True or False in the suggested label (based on model’s prediction).

  • Num Examples: Number of examples flagged as a label issue where this Class is True/False “In Given Label” but cleanlab estimates the annotation should actually be as specified in “In Suggested Label”; i.e. the number of examples in your dataset where this Class was labeled as True but likely should have been False (or vice versa).

  • Issue Probability: The Num Examples column divided by the total number of examples in the dataset; i.e. the relative overall frequency of each type of label issue in your dataset.

By default, the rows in this DataFrame are ordered by “Issue Probability” (descending).
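For example, here is a minimal usage sketch on a toy dataset (far too small for real use; in practice, pred_probs should be out-of-sample predictions, e.g. obtained via cross-validation):

    import numpy as np
    from cleanlab.multilabel_classification.dataset import common_multilabel_issues

    labels = [[0], [0, 1], [1], [2], [0, 2], []]  # toy given labels
    pred_probs = np.array([
        [0.9, 0.1, 0.1],
        [0.8, 0.8, 0.2],
        [0.2, 0.9, 0.1],
        [0.1, 0.1, 0.9],
        [0.7, 0.2, 0.8],
        [0.1, 0.8, 0.1],  # class 1 looks likely here, yet the given label is empty
    ])

    df = common_multilabel_issues(
        labels=labels, pred_probs=pred_probs, class_names=["dog", "cat", "bird"]
    )
    print(df)  # rows sorted by "Issue Probability" (descending)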

cleanlab.multilabel_classification.dataset.rank_classes_by_multilabel_quality(labels=None, pred_probs=None, *, class_names=None, joint=None, confident_joint=None)

Returns a DataFrame with three overall label quality scores per class for a multi-label dataset.

These numbers summarize all examples annotated with the class (details listed below under Returns). By default, classes are ordered by “Label Quality Score”, so the most problematic classes are reported first in the DataFrame.

Score values are unnormalized and may be very small. What matters is their relative ranking across the classes.

Parameters:

For information about the arguments to this method, see the documentation of common_multilabel_issues.

Return type:

DataFrame

Returns:

overall_label_quality (pd.DataFrame) – Pandas DataFrame with one row per class and columns: “Class Index”, “Label Issues”, “Inverse Label Issues”, “Label Noise”, “Inverse Label Noise”, “Label Quality Score”. Some entries are overall quality scores between 0 and 1, summarizing how good the labels for that class appear to be overall (lower values indicate more erroneous labels); other entries are estimated counts of annotation errors related to this class.

Here is what each column represents:
  • Class Name: The name of the class if class_names is provided.

  • Class Index: The index of the class in 0, 1, …, K-1.

  • Label Issues: Estimated number of examples in the dataset that are labeled as belonging to class k but actually should not belong to this class.

  • Inverse Label Issues: Estimated number of examples in the dataset that should actually be labeled as class k but did not receive this label.

  • Label Noise: Estimated proportion of examples in the dataset that are labeled as class k but should not be. For each class k, this is computed by dividing the estimated number of label issues among examples labeled as class k (the “Label Issues” count above) by the total number of examples labeled as class k.

  • Inverse Label Noise: Estimated proportion of examples in the dataset that should actually be labeled as class k but did not receive this label.

  • Label Quality Score: Estimated proportion of examples labeled as class k that have been labeled correctly, i.e. 1 - label_noise.

By default, the DataFrame is ordered by “Label Quality Score” (in ascending order), so the classes with the most label issues appear first.
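A minimal usage sketch (same toy data as in the earlier example; a real dataset would be much larger):

    import numpy as np
    from cleanlab.multilabel_classification.dataset import rank_classes_by_multilabel_quality

    labels = [[0], [0, 1], [1], [2], [0, 2], []]  # toy given labels
    pred_probs = np.array([[0.9, 0.1, 0.1], [0.8, 0.8, 0.2], [0.2, 0.9, 0.1],
                           [0.1, 0.1, 0.9], [0.7, 0.2, 0.8], [0.1, 0.8, 0.1]])

    df = rank_classes_by_multilabel_quality(
        labels=labels, pred_probs=pred_probs, class_names=["dog", "cat", "bird"]
    )
    print(df)  # one row per class; worst "Label Quality Score" first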

cleanlab.multilabel_classification.dataset.overall_multilabel_health_score(labels=None, pred_probs=None, *, confident_joint=None)

Returns a single score between 0 and 1 measuring the overall quality of all labels in a multi-label classification dataset. Intuitively, the score is the average correctness of the given labels across all examples in the dataset. So a score of 1 suggests your data is perfectly labeled and a score of 0.5 suggests half of the examples in the dataset may be incorrectly labeled. Thus, a higher score implies a higher quality dataset.

Parameters:

For information about the arguments to this method, see the documentation of common_multilabel_issues.

Return type:

float

Returns:

health_score (float) – An overall score between 0 and 1, where 1 implies all labels in the dataset are estimated to be correct. A score of 0.5 implies that half of the dataset’s labels are estimated to have issues.
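A minimal usage sketch (same toy data as above):

    import numpy as np
    from cleanlab.multilabel_classification.dataset import overall_multilabel_health_score

    labels = [[0], [0, 1], [1], [2], [0, 2], []]  # toy given labels
    pred_probs = np.array([[0.9, 0.1, 0.1], [0.8, 0.8, 0.2], [0.2, 0.9, 0.1],
                           [0.1, 0.1, 0.9], [0.7, 0.2, 0.8], [0.1, 0.8, 0.1]])

    score = overall_multilabel_health_score(labels=labels, pred_probs=pred_probs)
    print(f"overall health: {score:.2f}")  # closer to 1 means fewer estimated issues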

cleanlab.multilabel_classification.dataset.multilabel_health_summary(labels=None, pred_probs=None, *, class_names=None, num_examples=None, confident_joint=None, verbose=True)

Prints a health summary of your multi-label dataset.

This summary includes useful statistics like:

  • The classes with the most and least label issues.

  • Overall label quality scores, summarizing how accurate the labels appear across the entire dataset.

Parameters:

For information about the arguments to this method, see the documentation of common_multilabel_issues.

Return type:

Dict

Returns:

summary (dict) –

A dictionary whose entries correspond to the outputs of the functions above; see those functions’ documentation to understand the values.
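A minimal usage sketch (same toy data as in the earlier examples):

    import numpy as np
    from cleanlab.multilabel_classification.dataset import multilabel_health_summary

    labels = [[0], [0, 1], [1], [2], [0, 2], []]  # toy given labels
    pred_probs = np.array([[0.9, 0.1, 0.1], [0.8, 0.8, 0.2], [0.2, 0.9, 0.1],
                           [0.1, 0.1, 0.9], [0.7, 0.2, 0.8], [0.1, 0.8, 0.1]])

    # Prints the summary (verbose=True by default) and also returns it as a dict.
    summary = multilabel_health_summary(
        labels=labels, pred_probs=pred_probs, class_names=["dog", "cat", "bird"]
    )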