data#
Classes and methods for datasets that are loaded into Datalab.
Exceptions:
| DataFormatError | Exception raised when the data is not in a supported format. |
| DatasetDictError | Exception raised when a DatasetDict is passed to Datalab. |
| DatasetLoadError | Exception raised when a dataset cannot be loaded. |
Classes:
| Data | Class that holds and validates datasets for Datalab. |
| Label | Class to represent labels in a dataset. |
- exception cleanlab.datalab.internal.data.DataFormatError(data)[source]#
Bases:
ValueError
Exception raised when the data is not in a supported format.
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception cleanlab.datalab.internal.data.DatasetDictError[source]#
Bases:
ValueError
Exception raised when a DatasetDict is passed to Datalab.
This usually means that a dataset identifier was passed to Datalab, but the dataset it refers to is a DatasetDict, which contains multiple splits of the dataset rather than a single Dataset.
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception cleanlab.datalab.internal.data.DatasetLoadError(dataset_type)[source]#
Bases:
ValueError
Exception raised when a dataset cannot be loaded.
- Parameters:
dataset_type (type) – The type of dataset that failed to load.
- args#
- with_traceback()#
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class cleanlab.datalab.internal.data.Data(data, label_name=None)[source]#
Bases:
object
Class that holds and validates datasets for Datalab.
Internally, the data is stored as a datasets.Dataset object and the labels are integers (ranging from 0 to K-1, where K is the number of classes) stored in a numpy array.
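The integer label encoding described above can be illustrated with NumPy. This is a minimal sketch of one way to map arbitrary label values to integers from 0 to K-1; it assumes a sorted-order encoding via `np.unique` and is not taken from cleanlab's source. The variable names and label values are illustrative.

```python
import numpy as np

# Raw labels as they might appear in a label column (illustrative values).
raw_labels = ["cat", "dog", "cat", "bird", "dog"]

# np.unique returns the sorted unique values; return_inverse=True also returns,
# for each element, the index of its value in that sorted array.
class_names, encoded = np.unique(raw_labels, return_inverse=True)

num_classes = len(class_names)  # K, the number of classes

print(class_names.tolist())  # ['bird', 'cat', 'dog']
print(encoded.tolist())      # [1, 2, 1, 0, 2]
```

With this encoding, `class_names[encoded]` recovers the original label values, so the class names only need to be stored once.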
- Parameters:
data (Union[Dataset, DataFrame, Dict[str, Any], List[Dict[str, Any]], str]) – Dataset to be audited by Datalab. Several formats are supported, which will internally be converted to a Dataset object.
- Supported formats:
  - datasets.Dataset
  - pandas.DataFrame
  - dict
    - keys are strings
    - values are arrays or lists of equal length
  - list
    - list of dictionaries with the same keys
  - str
    - path to a local file
      - Text (.txt)
      - CSV (.csv)
      - JSON (.json)
    - or a dataset identifier on the Hugging Face Hub
    Datalab first checks whether the string is a path to a file that exists locally; if not, it treats the string as a dataset identifier on the Hugging Face Hub.
label_name (Union[str, List[str]]) – Name of the label column in the dataset.
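The string handling described for the data parameter (local file first, Hub identifier otherwise) can be sketched as follows. This is only an illustration of the documented decision, not cleanlab's implementation: `classify_data_string` and `SUPPORTED_EXTENSIONS` are hypothetical names, and raising `ValueError` for an unlisted extension is an assumption.

```python
import os

# File extensions listed in the documentation as supported local formats.
SUPPORTED_EXTENSIONS = {".txt", ".csv", ".json"}


def classify_data_string(data: str) -> str:
    """Classify a string input as 'local_file' or 'hub_identifier'.

    Hypothetical helper; it only decides how the string would be
    interpreted and does not load anything.
    """
    if os.path.isfile(data):
        extension = os.path.splitext(data)[1].lower()
        if extension not in SUPPORTED_EXTENSIONS:
            # Assumption: unlisted extensions are rejected.
            raise ValueError(f"Unsupported file extension: {extension!r}")
        return "local_file"
    # No such local file: assume a Hugging Face Hub dataset identifier.
    return "hub_identifier"
```

One consequence of this ordering is that a typo in a local path is silently treated as a Hub identifier, so a failed Hub lookup can indicate a mistyped file name.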
Warning
Optional dependencies:
- datasets: Dataset, DatasetDict and load_dataset are imported from datasets. This is an optional dependency of cleanlab, but is required for Datalab to work.
Attributes:
| class_names | Return type: List[str]. |
| has_labels | Check if labels are available. |
- property class_names: List[str]#
- Return type:
List[str]
- property has_labels: bool#
Check if labels are available.
- Return type:
bool
- class cleanlab.datalab.internal.data.Label(*, data, label_name=None)[source]#
Bases:
object
Class to represent labels in a dataset.
Attributes:
| class_names | A list of class names that are present in the dataset. |
| is_available | Check if labels are available. |
- property class_names: List[str]#
A list of class names that are present in the dataset.
Without labels, this will return an empty list.
- Return type:
List[str]
- property is_available: bool#
Check if labels are available.
- Return type:
bool
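The documented behavior of the two Label properties can be mimicked with a small stand-in class. This is a sketch under stated assumptions, not cleanlab's Label: `LabelSketch` is a hypothetical name, the data is assumed to be a plain dict of columns, and returning class names in sorted order is an assumption. It keeps the keyword-only constructor shown in the signature above.

```python
from typing import Any, Dict, List, Optional


class LabelSketch:
    """Hypothetical stand-in that mimics the documented Label interface."""

    def __init__(
        self, *, data: Dict[str, List[Any]], label_name: Optional[str] = None
    ) -> None:
        # Labels exist only when a label column is named and present in data.
        self._values = list(data[label_name]) if label_name in data else []

    @property
    def class_names(self) -> List[str]:
        # Unique label values as strings (sorted by assumption);
        # an empty list when there are no labels.
        return sorted({str(value) for value in self._values})

    @property
    def is_available(self) -> bool:
        return len(self._values) > 0


data = {"text": ["a", "b", "c"], "label": ["pos", "neg", "pos"]}
with_labels = LabelSketch(data=data, label_name="label")
without_labels = LabelSketch(data=data)

print(with_labels.class_names, with_labels.is_available)
print(without_labels.class_names, without_labels.is_available)
```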