Datalab guides

Guides for using Datalab and understanding the issues it detects.

Note

Using Datalab requires additional dependencies beyond the rest of the cleanlab package. To install them, run:

$ pip install "cleanlab[datalab]"

For the development version of the package, install from source:

$ pip install "git+https://github.com/cleanlab/cleanlab.git#egg=cleanlab[datalab]"

Types of issues

Guides to using Datalab with greater control: selecting which issues to search for and which non-default settings to use when detecting them.
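As a rough illustration of what "selecting issues and settings" can look like, the sketch below builds the kind of dictionary that `Datalab.find_issues` accepts through its `issue_types` argument. The helper `build_issue_types` and the wrapper `run_audit` are hypothetical names introduced here, and the `threshold` setting for the `outlier` check is an assumption to verify against the linked guides for your installed version.

```python
from typing import Any, Dict, Optional


def build_issue_types(
    selected: Dict[str, Optional[Dict[str, Any]]]
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: normalize a selection into a dict of dicts.

    A value of None means "check this issue type with its default settings".
    """
    return {name: dict(settings or {}) for name, settings in selected.items()}


# Only the issue types listed here would be searched for.
issue_types = build_issue_types(
    {
        "label": None,                  # default settings
        "outlier": {"threshold": 0.2},  # assumed non-default setting
    }
)


def run_audit(data, label_name, pred_probs, features):
    """Hypothetical wrapper showing where the dict is passed to Datalab."""
    # Imported lazily; requires: pip install "cleanlab[datalab]"
    from cleanlab import Datalab

    lab = Datalab(data=data, label_name=label_name)
    lab.find_issues(
        pred_probs=pred_probs,
        features=features,
        issue_types=issue_types,
    )
    return lab.get_issues()
```

Keeping the selection in a plain dictionary makes it easy to store alongside other experiment configuration and to reuse across datasets.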

Customizing issue types

Guides (for developers) on creating a custom issue type that Datalab audits alongside its built-in issue types.

Cleanlab Studio (Easy Mode)

Cleanlab Studio is a fully automated platform that can detect the same data issues as this package, as well as many more types of issues, all without you having to do any Machine Learning (or even write any code). Beyond being 100x faster to use and producing more useful results, Cleanlab Studio also provides an intelligent data correction interface for you to quickly fix the issues detected in your dataset (a single data scientist can fix millions of data points thanks to AI suggestions).

Cleanlab Studio offers a powerful AutoML system (with Foundation models) that is useful for more than improving data quality. With a few clicks, you can find and fix issues in your dataset, identify the best type of ML model and train/tune it, and deploy this model to serve accurate predictions for new data. You can also use the same AutoML system to auto-label large datasets (a single user can label millions of data points thanks to powerful Foundation models). Try Cleanlab Studio for free!

Stages of modern AI pipeline that can now be automated with Cleanlab Studio