data

Classes and methods for datasets that are loaded into Datalab.

Exceptions:

DataFormatError(data)

Exception raised when the data is not in a supported format.

DatasetDictError()

Exception raised when a DatasetDict is passed to Datalab.

DatasetLoadError(dataset_type)

Exception raised when a dataset cannot be loaded.

Classes:

Data(data, label_name)

Class that holds and validates datasets for Datalab.

exception cleanlab.datalab.data.DataFormatError(data)

Bases: ValueError

Exception raised when the data is not in a supported format.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception cleanlab.datalab.data.DatasetDictError

Bases: ValueError

Exception raised when a DatasetDict is passed to Datalab.

Usually, this means that a dataset identifier was passed to Datalab, but the dataset is a DatasetDict, which contains multiple splits of the dataset.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception cleanlab.datalab.data.DatasetLoadError(dataset_type)

Bases: ValueError

Exception raised when a dataset cannot be loaded.

Parameters:

dataset_type (type) – The type of dataset that failed to load.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
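Because all three exceptions derive from ValueError, a single `except ValueError` clause catches any of them, and `isinstance` checks can then tell them apart. The following is a minimal sketch of that pattern. To keep it runnable without cleanlab installed, it defines local stand-in classes with the same names and base class; in real code these would be imported from cleanlab.datalab.data. The helpers `describe_failure` and `try_load` and the returned messages are hypothetical and not part of cleanlab.

```python
# Stand-ins that mirror the documented class names and base class.
# In real code: from cleanlab.datalab.data import DataFormatError, ...
class DataFormatError(ValueError):
    pass


class DatasetDictError(ValueError):
    pass


class DatasetLoadError(ValueError):
    pass


def describe_failure(exc):
    """Return a short, illustrative message for each exception type."""
    if isinstance(exc, DatasetDictError):
        return "multiple splits found: select a single split first"
    if isinstance(exc, DataFormatError):
        return "data is not in a supported format"
    if isinstance(exc, DatasetLoadError):
        return "dataset could not be loaded"
    return "other ValueError"


def try_load(loader):
    """Call loader() and translate any ValueError into a message."""
    try:
        return loader()
    except ValueError as exc:  # catches all three subclasses
        return describe_failure(exc)


def failing_loader():
    # Simulates the case where a DatasetDict was passed.
    raise DatasetDictError()


print(try_load(failing_loader))
```

Checking the more specific types before falling back to a generic message keeps unrelated ValueErrors distinguishable from the Datalab-specific ones.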

class cleanlab.datalab.data.Data(data, label_name)

Bases: object

Class that holds and validates datasets for Datalab.

Internally, the data is stored as a datasets.Dataset object and the labels are integers (ranging from 0 to K-1, where K is the number of classes) stored in a numpy array.
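To illustrate what an integer encoding from 0 to K-1 looks like, here is a minimal sketch in plain Python. It assumes that classes are ordered by sorted label value, which this page does not state, and it uses a list rather than a numpy array to stay dependency-free. The function `encode_labels` is a hypothetical helper, not part of cleanlab.

```python
def encode_labels(raw_labels):
    """Map sortable, hashable labels to integers 0..K-1.

    Assumption: class index follows the sorted order of the unique labels.
    """
    class_names = sorted(set(raw_labels))
    label_to_index = {name: index for index, name in enumerate(class_names)}
    encoded = [label_to_index[label] for label in raw_labels]
    return encoded, class_names


raw = ["dog", "cat", "dog", "bird"]
encoded, class_names = encode_labels(raw)
print(encoded)      # [2, 1, 2, 0]
print(class_names)  # ['bird', 'cat', 'dog']
```

Here K is 3, so every encoded label lies in the range 0 to 2, and `class_names[i]` recovers the original label for integer `i`.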

Parameters:
  • data (Union[Dataset, DataFrame, Dict[str, Any], List[Dict[str, Any]], str]) –

    Dataset to be audited by Datalab. Several formats are supported; each is converted internally to a Dataset object.

    Supported formats:
    • datasets.Dataset

    • pandas.DataFrame

    • dict
      • keys are strings

      • values are arrays or lists of equal length

    • list
      • list of dictionaries with the same keys

    • str
      • path to a local file
        • Text (.txt)

        • CSV (.csv)

        • JSON (.json)

      • or a dataset identifier on the Hugging Face Hub

    When a string is passed, it is first checked against the local filesystem: if it is a path to an existing file, that file is loaded; otherwise the string is assumed to be a dataset identifier on the Hugging Face Hub.

  • label_name (Union[str, List[str]]) – Name of the label column in the dataset.
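The handling of string inputs described for the data parameter can be sketched with the standard library alone. This is an illustration under stated assumptions, not cleanlab's implementation: `resolve_string_source` is a hypothetical helper, the extension table only mirrors the three file types listed above, and raising ValueError for other extensions is an assumption.

```python
import os
import tempfile

# File types listed in the documentation for local paths.
LOCAL_FORMATS = {".txt": "text", ".csv": "csv", ".json": "json"}


def resolve_string_source(source):
    """Classify a string as a local file or a Hub identifier."""
    if os.path.isfile(source):
        extension = os.path.splitext(source)[1].lower()
        if extension not in LOCAL_FORMATS:
            # Assumption: unsupported local files are rejected.
            raise ValueError(f"Unsupported file extension: {extension!r}")
        return ("local", LOCAL_FORMATS[extension])
    # Not an existing file, so treat it as a Hub dataset identifier.
    return ("hub", source)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "reviews.csv")
    with open(csv_path, "w") as handle:
        handle.write("text,label\nhello,1\n")
    print(resolve_string_source(csv_path))  # ('local', 'csv')

    missing = os.path.join(tmp, "no_such_dataset")
    print(resolve_string_source(missing)[0])  # hub
```

One consequence of this order of checks is that a mistyped local path is silently treated as a Hub identifier, so a failed lookup there may simply indicate a wrong file path.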

Warning

Optional dependencies:

  • datasets:

    Dataset, DatasetDict and load_dataset are imported from datasets. This is an optional dependency of cleanlab, but is required for Datalab to work.

Attributes:

class_names

Return type: list

property class_names: list

Return type:

list