data
Classes and methods for datasets that are loaded into Datalab.
Exceptions:
| DataFormatError(data) | Exception raised when the data is not in a supported format. |
| DatasetDictError | Exception raised when a DatasetDict is passed to Datalab. |
| DatasetLoadError(dataset_type) | Exception raised when a dataset cannot be loaded. |
Classes:
| Data(data, label_name=None) | Class that holds and validates datasets for Datalab. |
| Label(*, data, label_name=None) | Class to represent labels in a dataset. |
- exception cleanlab.datalab.internal.data.DataFormatError(data)
- Bases: ValueError
- Exception raised when the data is not in a supported format.
 - args
 - with_traceback()
- Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
 
- exception cleanlab.datalab.internal.data.DatasetDictError
- Bases: ValueError
- Exception raised when a DatasetDict is passed to Datalab.
- Usually, this means that a dataset identifier was passed to Datalab, but the dataset is a DatasetDict, which contains multiple splits of the dataset.
 - args
 - with_traceback()
- Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
 
- exception cleanlab.datalab.internal.data.DatasetLoadError(dataset_type)
- Bases: ValueError
- Exception raised when a dataset cannot be loaded.
- Parameters:
- dataset_type (type) – The type of dataset that failed to load.
 - args
 - with_traceback()
- Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
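All three exceptions above list ValueError as their base, so a caller can handle one of them specifically and let a generic `except ValueError` clause catch the rest. The sketch below illustrates that pattern with stand-in classes defined locally so that it runs without cleanlab installed. The stand-ins and the `load_or_report` helper are hypothetical and are not part of cleanlab.

```python
# Stand-in classes for illustration only. The real exceptions live in
# cleanlab.datalab.internal.data; like these, they derive from ValueError.
class DataFormatError(ValueError):
    pass


class DatasetDictError(ValueError):
    pass


class DatasetLoadError(ValueError):
    pass


def load_or_report(loader):
    """Call `loader` and turn dataset errors into a short message (hypothetical helper)."""
    try:
        return loader()
    except DatasetDictError:
        # Handle the most specific case first: multiple splits were supplied.
        return "pick a single split first"
    except ValueError as err:
        # DataFormatError and DatasetLoadError also land here via the shared base.
        return f"could not use dataset: {type(err).__name__}"
```

Ordering matters: the specific clause must come before the `except ValueError` clause, otherwise the generic handler would catch everything.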
 
- class cleanlab.datalab.internal.data.Data(data, label_name=None)
- Bases: object
- Class that holds and validates datasets for Datalab.
- Internally, the data is stored as a datasets.Dataset object, and the labels are stored in a numpy array as integers ranging from 0 to K-1, where K is the number of classes.
- Parameters:
- data (Union[Dataset, DataFrame, Dict[str, Any], List[Dict[str, Any]], str]) – Dataset to be audited by Datalab. Several formats are supported; each is converted internally to a Dataset object.
- Supported formats:
- datasets.Dataset 
- pandas.DataFrame 
- dict
- keys are strings 
- values are arrays or lists of equal length 
 
 
- list
- list of dictionaries with the same keys 
 
 
- str
- path to a local file
- Text (.txt) 
- CSV (.csv) 
- JSON (.json) 
 
 
- or a dataset identifier on the Hugging Face Hub 
 
 
 - For a str input, the string is first checked against the local file system: if it is a path to a file that exists locally, that file is used; otherwise, the string is assumed to be a dataset identifier on the Hugging Face Hub.
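The format rules listed above can be sketched as a small classifier. This is only an illustration of the documented rules, not cleanlab's implementation. The function name `describe_input` and its return strings are hypothetical, and treating an empty list as unsupported is an assumption made here.

```python
import os
from typing import Any


def describe_input(data: Any) -> str:
    """Return a short description of the input format (hypothetical helper)."""
    if isinstance(data, str):
        # A string is tried as a local file path first; if no such file
        # exists, it is assumed to be a Hugging Face Hub identifier.
        if os.path.isfile(data):
            ext = os.path.splitext(data)[1].lower()
            if ext in {".txt", ".csv", ".json"}:
                return f"local file ({ext})"
            return "local file (unsupported extension)"
        return "hub identifier (assumed)"
    if isinstance(data, dict):
        # Keys must be strings and all columns must have the same length.
        if not all(isinstance(key, str) for key in data):
            return "unsupported: dict keys must be strings"
        if len({len(values) for values in data.values()}) > 1:
            return "unsupported: dict values differ in length"
        return "dict of columns"
    if isinstance(data, list):
        # Every row must be a dict, and all rows must share the same keys.
        if data and all(isinstance(row, dict) for row in data):
            keys = set(data[0])
            if all(set(row) == keys for row in data):
                return "list of records"
        return "unsupported: list rows must be dicts with the same keys"
    return "other (e.g. datasets.Dataset or pandas.DataFrame)"
```

For example, `describe_input({"text": ["a", "b"], "label": [0, 1]})` returns `"dict of columns"`, while a string that names no local file is reported as an assumed Hub identifier.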
 
- label_name (Union[str, List[str]]) – Name of the label column in the dataset.
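As described for this class, labels are held internally as integers from 0 to K-1, where K is the number of classes. The sketch below shows one way such a mapping could be built. It is not cleanlab's code: it uses plain Python lists instead of a numpy array to stay dependency-free, the helper name `encode_labels` is hypothetical, and ordering the classes by sorted name is an assumption.

```python
from typing import Hashable, List, Sequence, Tuple


def encode_labels(labels: Sequence[Hashable]) -> Tuple[List[int], List[str]]:
    """Map raw labels to integers 0..K-1 (hypothetical helper)."""
    # Assumption: classes are ordered by sorted name so the mapping is deterministic.
    class_names = sorted({str(label) for label in labels})
    index = {name: i for i, name in enumerate(class_names)}
    encoded = [index[str(label)] for label in labels]
    return encoded, class_names


encoded, class_names = encode_labels(["cat", "dog", "cat", "bird"])
print(encoded)      # [1, 2, 1, 0]
print(class_names)  # ['bird', 'cat', 'dog']
```

With an empty label sequence, the sketch returns two empty lists.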
 
 - Warning
 - Optional dependencies:
 - datasets: Dataset, DatasetDict and load_dataset are imported from datasets. This is an optional dependency of cleanlab, but is required for Datalab to work.
 
 - Attributes:
 - class_names (return type: List[str])
 - has_labels: Check if labels are available.
 - property class_names: List[str]
- Return type:
- List[str]
 
 - property has_labels: bool
- Check if labels are available.
- Return type:
- bool
 
 
- class cleanlab.datalab.internal.data.Label(*, data, label_name=None)
- Bases: object
- Class to represent labels in a dataset.
- Attributes:
- class_names: A list of class names that are present in the dataset.
- is_available: Check if labels are available.
 - property class_names: List[str]
- A list of class names that are present in the dataset. Without labels, this will return an empty list.
- Return type:
- List[str]
 
 - property is_available: bool
- Check if labels are available.
- Return type:
- bool