How to migrate to versions >= 2.0.0 from pre 1.0.1
If you previously used an older version of cleanlab, this guide helps you update your existing code to work with versions >= 2.0.0 in no time! Below we outline the major updates and code substitutions to be aware of. A detailed API change-log is available in the v2.0.0 Release Notes.
Function and class name changes
This section lists the new names of the most commonly used functions and classes from Cleanlab 1.0.
pruning.get_noise_indices(s, psx, prune_method, sorted_index_method, ...) –> filter.find_label_issues(labels, pred_probs, filter_by, return_indices_ranked_by, ...)
Note: inverse_noise_matrix is no longer a supported input argument, but confident_joint remains (you can easily convert between these two).

pruning.order_label_errors(label_errors_bool, psx, labels, sorted_index_method) –> rank.order_label_issues(label_issues_mask, labels, pred_probs, rank_by, ...)
Note: You can now alternatively use rank.get_label_quality_scores() to numerically score the labels instead of ranking them.

latent_estimation.num_label_errors(labels, psx, ...) –> count.num_label_issues(labels, pred_probs, ...)
Note: This is the most accurate way to estimate the raw number of label errors in a dataset.

classification.LearningWithNoisyLabels(..., prune_method) –> classification.CleanLearning(..., find_label_issues_kwargs)
Note: CleanLearning can now find label errors for you, neatly organizing them in a pandas.DataFrame, and it also computes the required out-of-sample predicted probabilities. You just specify the classifier; we handle the cross-validation!
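While call sites are being updated, a thin keyword-translation shim can keep legacy code running. The sketch below is not part of cleanlab: `accept_legacy_kwargs` and `find_label_issues_stub` are hypothetical names, and the stub only stands in for a function with a 2.0-style signature such as `filter.find_label_issues`. It renames the old keywords from the first substitution above and rejects `inverse_noise_matrix`, which is no longer supported. Argument values are passed through unchanged, so check the function documentation for whether the accepted values changed as well.

```python
import functools

# Old keyword -> new keyword, taken from the get_noise_indices substitution above.
LEGACY_KWARGS = {
    "s": "labels",
    "psx": "pred_probs",
    "prune_method": "filter_by",
    "sorted_index_method": "return_indices_ranked_by",
}


def accept_legacy_kwargs(func):
    """Hypothetical helper: let `func` also accept the pre-2.0 keyword names."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if "inverse_noise_matrix" in kwargs:
            raise TypeError(
                "inverse_noise_matrix is no longer supported; pass confident_joint instead"
            )
        translated = {}
        for name, value in kwargs.items():
            new_name = LEGACY_KWARGS.get(name, name)
            if new_name in translated:
                raise TypeError(f"argument {new_name!r} was given under two names")
            translated[new_name] = value  # values are forwarded untouched
        return func(*args, **translated)

    return wrapper


# Stand-in with a 2.0-style signature (an assumption for illustration only).
@accept_legacy_kwargs
def find_label_issues_stub(labels, pred_probs, filter_by=None, return_indices_ranked_by=None):
    return {
        "labels": labels,
        "pred_probs": pred_probs,
        "filter_by": filter_by,
        "return_indices_ranked_by": return_indices_ranked_by,
    }


# A legacy-style call now reaches the new-style parameters.
result = find_label_issues_stub(
    s=[0, 1, 1],
    psx=[[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    prune_method="both",
)
```

A shim like this is best treated as temporary: it hides the old names rather than removing them, so it is worth deleting once every call site uses the 2.0 spelling.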
Module name changes
Reorganized modules:
cleanlab.pruning –> cleanlab.filter
cleanlab.latent_estimation –> cleanlab.count
cleanlab.noise_generation –> cleanlab.benchmarking.noise_generation
cleanlab.baseline_methods –> incorporated into cleanlab.filter
Internal and experimental functionality, marked as such and not guaranteed to be stable between releases:
cleanlab.models –> cleanlab.experimental
cleanlab.coteaching –> cleanlab.experimental.coteaching
cleanlab.latent_algebra –> cleanlab.internal.latent_algebra
cleanlab.util –> cleanlab.internal.util
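The module renames above are mechanical enough to apply with a small script. The following is a minimal sketch using only the standard library; `rewrite_module_paths` and `needs_manual_review` are hypothetical helper names, not cleanlab functions. It rewrites fully dotted paths such as `cleanlab.pruning`, so it covers `import cleanlab.pruning` and `from cleanlab.pruning import ...`, but not `from cleanlab import pruning`, which needs a manual edit. Function names inside the modules are left alone. `cleanlab.baseline_methods` is only flagged, because it was folded into `cleanlab.filter` rather than renamed one-to-one.

```python
import re

# Old dotted path -> new dotted path, as listed above.
MODULE_RENAMES = {
    "cleanlab.pruning": "cleanlab.filter",
    "cleanlab.latent_estimation": "cleanlab.count",
    "cleanlab.noise_generation": "cleanlab.benchmarking.noise_generation",
    "cleanlab.models": "cleanlab.experimental",
    "cleanlab.coteaching": "cleanlab.experimental.coteaching",
    "cleanlab.latent_algebra": "cleanlab.internal.latent_algebra",
    "cleanlab.util": "cleanlab.internal.util",
}

# Match an old path only when it is a whole dotted name: not preceded by a
# word character or dot, and not followed by more word characters.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in sorted(MODULE_RENAMES, key=len, reverse=True))
    + r")\b"
)


def rewrite_module_paths(source: str) -> str:
    """Replace pre-2.0 module paths in `source` with their 2.0 locations."""
    return _PATTERN.sub(lambda match: MODULE_RENAMES[match.group(1)], source)


def needs_manual_review(source: str) -> list:
    """Return 1-based line numbers that mention cleanlab.baseline_methods."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if "cleanlab.baseline_methods" in line
    ]
```

Because every replacement happens in a single pass and none of the new paths matches an old one, running the rewrite twice leaves already-migrated code unchanged.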
New modules
cleanlab.dataset: New methods to print summaries of the overall types of label issues most common in a dataset.
cleanlab.rank: Moved all ranking and ordering functions from cleanlab.pruning to here. This module contains methods to score the label quality of each example and rank your data by the quality of their labels.
cleanlab.internal and cleanlab.experimental: Moved all advanced code and utility methods to these modules, including the old cleanlab.latent_algebra module. Researchers may find useful functions here.
Removed modules
cleanlab.polyplex
Common argument and variable name changes
Here are some common name and terminology changes in Cleanlab 2.0:
s –> labels (the given labels in the data, which are potentially noisy)
psx –> pred_probs (predicted probabilities output by trained classifier)
label_error –> label_issue (a label that is likely to be wrong)
See the documentation for individual functions for details on how argument names changed.
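To locate call sites that still pass the old keyword names, a short static scan can help. This is a rough sketch built on the standard-library `ast` module; `find_legacy_keywords` is a hypothetical helper, not something cleanlab provides. It reports every call that passes `s=` or `psx=` as a keyword, regardless of which function is being called, so expect false positives from unrelated code and missed cases where the arguments are passed positionally.

```python
import ast

# Old keyword -> new keyword, from the terminology changes above.
ARG_RENAMES = {"s": "labels", "psx": "pred_probs"}


def find_legacy_keywords(source: str) -> list:
    """Return sorted (line, old_name, new_name) tuples for calls using old keywords."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for keyword in node.keywords:
                # keyword.arg is None for **kwargs expansion, which never matches.
                if keyword.arg in ARG_RENAMES:
                    hits.append((node.lineno, keyword.arg, ARG_RENAMES[keyword.arg]))
    return sorted(hits)


example = (
    "issues = get_noise_indices(s=y, psx=probs)\n"
    "count = num_label_errors(labels=y, psx=probs)\n"
)
for line, old_name, new_name in find_legacy_keywords(example):
    print(f"line {line}: rename keyword {old_name!r} to {new_name!r}")
```

The scan only reports locations; renaming local variables called `s` or `psx` is a separate, optional cleanup, since only the keyword names affect how the 2.0 functions are called.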