label

Classes:

LabelIssueManager(datalab[, k, ...])

Manages label issues in a Datalab.

class cleanlab.datalab.internal.issue_manager.label.LabelIssueManager(datalab, k=10, clean_learning_kwargs=None, health_summary_parameters=None, **_)

Bases: IssueManager

Manages label issues in a Datalab.

Parameters:

datalab (Datalab) – A Datalab instance.
k (int) – The number of nearest neighbors to consider when computing pred_probs from features. Only applicable if features are provided and pred_probs are not.
clean_learning_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the CleanLearning constructor.
health_summary_parameters (Optional[Dict[str, Any]]) – Keyword arguments to pass to the health_summary function.
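
As a rough illustration of what k controls, the sketch below estimates pred_probs from features as the label frequencies among each example's k nearest neighbors. This is an assumption made for illustration only: the helper knn_pred_probs, the leave-one-out neighbor search, and the toy data are hypothetical, and the estimator that LabelIssueManager actually uses may differ.

```python
import math
from collections import Counter


def knn_pred_probs(features, labels, k, num_classes):
    """Hypothetical helper: class frequencies among the k nearest neighbors."""
    pred_probs = []
    for i, x in enumerate(features):
        # Distance from example i to every other example (itself excluded).
        dists = sorted(
            (math.dist(x, other), j)
            for j, other in enumerate(features)
            if j != i
        )
        neighbors = [j for _, j in dists[:k]]
        counts = Counter(labels[j] for j in neighbors)
        pred_probs.append(
            [counts.get(c, 0) / len(neighbors) for c in range(num_classes)]
        )
    return pred_probs


# Toy data: two well-separated clusters; example 2 carries a suspicious label.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = [0, 0, 1, 1, 1, 1]

pred_probs = knn_pred_probs(features, labels, k=2, num_classes=2)
```

In this toy setup, example 2 sits in the first cluster but is labeled 1, so its neighbors assign zero probability to its given label. That kind of disagreement is what a label-issue check looks for.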

Attributes:

`description`	Short text that summarizes the type of issues handled by this IssueManager.
`issue_name`	Returns a key that is used to store issue summary results about the assigned Lab.
`verbosity_levels`	A dictionary of verbosity levels and their corresponding lists of report items to print.
`issue_score_key`	Returns a key that is used to store issue score results about the assigned Lab.

Methods:

`find_issues`([pred_probs, features])	Find label issues in the datalab.
`get_health_summary`(pred_probs)	Returns a short summary of the health of this Lab.
`collect_info`(issues, summary_dict)	Collects data for the info attribute of the Datalab.
`make_summary`(score)	Construct a summary dataframe.
`report`(issues, summary, info[, ...])	Compose a report of the issues found by this IssueManager.

description: ClassVar[str]

Short text that summarizes the type of issues handled by this IssueManager.

issue_name: ClassVar[str] = 'label'

Returns a key that is used to store issue summary results about the assigned Lab.

verbosity_levels: ClassVar[Dict[int, List[str]]]

A dictionary that maps each verbosity level to the list of report items to print at that level.

Example

>>> verbosity_levels = {
...     0: [],
...     1: ["some_info_key"],
...     2: ["additional_info_key"],
... }
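
A minimal sketch of how such a mapping might be consumed when building a report. It assumes that a report at verbosity v includes the items of every level up to and including v. This page does not state that behavior, and info_keys_to_print is a hypothetical helper, not part of the library.

```python
from itertools import chain

verbosity_levels = {
    0: [],
    1: ["some_info_key"],
    2: ["additional_info_key"],
}


def info_keys_to_print(levels, verbosity):
    """Hypothetical helper: gather report items for all levels <= verbosity."""
    selected = (levels[level] for level in sorted(levels) if level <= verbosity)
    return list(chain.from_iterable(selected))


keys_at_0 = info_keys_to_print(verbosity_levels, 0)
keys_at_2 = info_keys_to_print(verbosity_levels, 2)
```

Under this assumption, each level only needs to list the items it adds on top of the lower levels.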

health_summary_parameters: Dict[str, Any]

find_issues(pred_probs=None, features=None, **kwargs)

Find label issues in the datalab.

Parameters:

pred_probs (Optional[ndarray]) – The predicted probabilities for each example.
features (Optional[ndarray]) – The features for each example.

Return type:

None

get_health_summary(pred_probs)

Returns a short summary of the health of this Lab.

Return type:

dict

collect_info(issues, summary_dict)

Collects data for the info attribute of the Datalab.

Note

find_issues() calls this method after it has set the issues and summary dataframes as instance attributes.

Return type:

dict

issue_score_key: ClassVar[str] = 'label_score'

Returns a key that is used to store issue score results about the assigned Lab.

classmethod make_summary(score)

Construct a summary dataframe.

Parameters:

score (float) – The overall score for this issue.

Return type:

DataFrame

Returns:

summary – A summary dataframe.
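
For orientation, a summary dataframe might look like the sketch below. The column layout ("issue_type", "score") is taken from the summary example given for report(). The function make_summary_sketch is a hypothetical stand-in and not the library's implementation, which may add validation or extra columns.

```python
import pandas as pd


def make_summary_sketch(issue_name: str, score: float) -> pd.DataFrame:
    """Hypothetical stand-in: one row with the issue type and its overall score."""
    return pd.DataFrame({"issue_type": [issue_name], "score": [score]})


summary = make_summary_sketch("label", 0.5)
```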

classmethod report(issues, summary, info, num_examples=5, verbosity=0, include_description=False, info_to_omit=None)

Compose a report of the issues found by this IssueManager.

Parameters:

issues (DataFrame) –

An issues dataframe.

Example

>>> import pandas as pd
>>> issues = pd.DataFrame(
...     {
...         "is_X_issue": [True, False, True],
...         "X_score": [0.2, 0.9, 0.4],
...     },
... )

summary (DataFrame) –

The summary dataframe.

Example

>>> summary = pd.DataFrame(
...     {
...         "issue_type": ["X"],
...         "score": [0.5],
...     },
... )

info (Dict[str, Any]) –

The info dict.

Example

>>> info = {
...     "A": "val_A",
...     "B": ["val_B1", "val_B2"],
... }

num_examples (int) – The number of examples to print.
verbosity (int) – The verbosity level of the report.
include_description (bool) – Whether to include a description of the issue in the report.

Return type:

str

Returns:

report_str – A string containing the report.
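
The sketch below shows one way num_examples could limit the rows shown from an issues dataframe that follows the is_X_issue / X_score column pattern above. It assumes that a lower score marks a more severe issue, which this page does not state, and top_issue_examples is a hypothetical helper rather than part of report().

```python
import pandas as pd

issues = pd.DataFrame(
    {
        "is_X_issue": [True, False, True],
        "X_score": [0.2, 0.9, 0.4],
    },
)


def top_issue_examples(issues, issue_name, num_examples=5):
    """Hypothetical helper: flagged rows, lowest score first, capped at num_examples."""
    flagged = issues[issues[f"is_{issue_name}_issue"]]
    return flagged.sort_values(f"{issue_name}_score").head(num_examples)


worst = top_issue_examples(issues, "X", num_examples=1)
all_flagged = top_issue_examples(issues, "X")
```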

info: Dict[str, Any]

issues: pd.DataFrame

summary: pd.DataFrame