label

Classes:

RegressionLabelIssueManager(datalab[, ...])

Manages label issues in a Datalab for regression tasks.

Functions:

find_issues_with_predictions(predictions, y, ...)

Find label issues in a regression dataset based on predictions.

find_issues_with_features(features, y, cl, ...)

Find label issues in a regression dataset based on features.

class cleanlab.datalab.internal.issue_manager.regression.label.RegressionLabelIssueManager(datalab, clean_learning_kwargs=None, threshold=0.05, health_summary_parameters=None, **_)

Bases: IssueManager

Manages label issues in a Datalab for regression tasks.

Parameters:
  • datalab (Datalab) – A Datalab instance.

  • clean_learning_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the regression.learn.CleanLearning constructor.

  • threshold (float) – Threshold used to decide whether an example has a label issue. It is a multiplier applied to the median label quality score; the product is the absolute threshold. Only used when predictions are provided to RegressionLabelIssueManager.find_issues, not when features are provided. Default is 0.05.

Attributes:

description

Short text that summarizes the type of issues handled by this IssueManager.

issue_name

Returns a key that is used to store issue summary results about the assigned Lab.

verbosity_levels

A dictionary mapping each verbosity level to the list of report items to print at that level.

issue_score_key

Returns a key that is used to store issue score results about the assigned Lab.

info

issues

summary

Methods:

find_issues([features, predictions])

Find label issues in the datalab.

collect_info(issues)

Collects data for the info attribute of the Datalab.

make_summary(score)

Construct a summary dataframe.

report(issues, summary, info[, ...])

Compose a report of the issues found by this IssueManager.

description: ClassVar[str]

Short text that summarizes the type of issues handled by this IssueManager.

issue_name: ClassVar[str] = 'label'

Returns a key that is used to store issue summary results about the assigned Lab.

verbosity_levels: ClassVar[Dict[int, List[str]]]

A dictionary mapping each verbosity level to the list of report items to print at that level.

Example

>>> verbosity_levels = {
...     0: [],
...     1: ["some_info_key"],
...     2: ["additional_info_key"],
... }
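As a rough illustration of how such a mapping might be consumed, the sketch below collects the info keys for a requested verbosity. Treating the levels as cumulative (level 2 also includes the keys from levels 0 and 1) is an assumption, and keys_for_verbosity is a hypothetical helper, not part of cleanlab.

```python
from typing import Dict, List

# Same shape as the verbosity_levels example above.
verbosity_levels: Dict[int, List[str]] = {
    0: [],
    1: ["some_info_key"],
    2: ["additional_info_key"],
}


def keys_for_verbosity(levels: Dict[int, List[str]], verbosity: int) -> List[str]:
    """Hypothetical helper: gather keys from every level <= verbosity.

    Assumption: levels are cumulative, so a higher verbosity also
    includes the keys listed under the lower levels.
    """
    keys: List[str] = []
    for level in sorted(levels):
        if level <= verbosity:
            keys.extend(levels[level])
    return keys


selected = keys_for_verbosity(verbosity_levels, 1)
```
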
find_issues(features=None, predictions=None, **kwargs)

Find label issues in the datalab.

Return type:

None

Priority Order for finding issues:

  1. Custom Model: Requires features to be passed to this method. Used if a model is set up in the constructor.

  2. Predictions: Uses predictions if provided and no model is set up in the constructor.

  3. Default Model: Defaults to a standard model using features if no model or predictions are provided.
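The priority order above can be sketched as a small decision function. This is only an illustration of the documented ordering: choose_strategy and its has_custom_model flag are hypothetical names, and raising ValueError when the required inputs are missing is an assumption, not confirmed library behavior.

```python
from typing import Any, Optional


def choose_strategy(
    has_custom_model: bool,
    features: Optional[Any] = None,
    predictions: Optional[Any] = None,
) -> str:
    """Hypothetical helper mirroring the documented priority order."""
    if has_custom_model:
        # 1. Custom model: takes priority, but features are required.
        if features is None:
            # Assumption: missing features is treated as an error.
            raise ValueError("features are required when a custom model is set")
        return "custom_model"
    if predictions is not None:
        # 2. Predictions: used when no custom model is set.
        return "predictions"
    if features is not None:
        # 3. Default model: a standard model fit on the features.
        return "default_model"
    # Assumption: with neither features nor predictions, fail loudly.
    raise ValueError("provide features or predictions")
```

Note that under this ordering, predictions passed alongside a custom model are not the deciding input; the custom model path is taken as long as features are available.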

collect_info(issues)

Collects data for the info attribute of the Datalab.

Return type:

dict

Note

This method is called by find_issues() after find_issues() has set the issues and summary dataframes as instance attributes.

issue_score_key: ClassVar[str] = 'label_score'

Returns a key that is used to store issue score results about the assigned Lab.

classmethod make_summary(score)

Construct a summary dataframe.

Parameters:

score (float) – The overall score for this issue.

Return type:

DataFrame

Returns:

summary – A summary dataframe.

classmethod report(issues, summary, info, num_examples=5, verbosity=0, include_description=False, info_to_omit=None)

Compose a report of the issues found by this IssueManager.

Parameters:
  • issues (DataFrame) –

    An issues dataframe.

    Example

    >>> import pandas as pd
    >>> issues = pd.DataFrame(
    ...     {
    ...         "is_X_issue": [True, False, True],
    ...         "X_score": [0.2, 0.9, 0.4],
    ...     },
    ... )
    

  • summary (DataFrame) –

    The summary dataframe.

    Example

    >>> summary = pd.DataFrame(
    ...     {
    ...         "issue_type": ["X"],
    ...         "score": [0.5],
    ...     },
    ... )
    

  • info (Dict[str, Any]) –

    The info dict.

    Example

    >>> info = {
    ...     "A": "val_A",
    ...     "B": ["val_B1", "val_B2"],
    ... }
    

  • num_examples (int) – The number of examples to print.

  • verbosity (int) – The verbosity level of the report.

  • include_description (bool) – Whether to include a description of the issue in the report.

Return type:

str

Returns:

report_str – A string containing the report.

info: Dict[str, Any]
issues: pd.DataFrame
summary: pd.DataFrame
cleanlab.datalab.internal.issue_manager.regression.label.find_issues_with_predictions(predictions, y, threshold, **kwargs)

Find label issues in a regression dataset based on predictions. This uses a threshold to determine if an example has a label issue based on the quality score.

Parameters:
  • predictions (ndarray) – The predictions from a regression model.

  • y (ndarray) – The given labels.

  • threshold (float) – The threshold to use to determine if an example has a label issue. It is a multiplier of the median label quality score that sets the absolute threshold.

  • **kwargs – Various keyword arguments.

Return type:

DataFrame

Returns:

issues – A dataframe of the issues. It contains the following columns:

  • is_label_issue (bool) – True if the example has a label issue.

  • label_score (float) – The quality score of the label.

  • given_label (float) – The given label. It is the same as the y parameter.

  • predicted_label (float) – The predicted label. It is the same as the predictions parameter.
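To make the threshold rule concrete, here is a minimal sketch that assumes the label quality scores have already been computed. flag_label_issues is a hypothetical helper, the scores are made up for illustration, and flagging with a strict "less than" comparison is an assumption (lower scores indicating lower-quality labels).

```python
from statistics import median
from typing import List, Sequence


def flag_label_issues(scores: Sequence[float], threshold: float = 0.05) -> List[bool]:
    """Hypothetical helper: flag scores below threshold * median(scores)."""
    # The threshold is a multiplier; the median score turns it into
    # an absolute cutoff.
    absolute_threshold = threshold * median(scores)
    # Assumption: an example is flagged when its score is strictly
    # below the absolute cutoff.
    return [score < absolute_threshold for score in scores]


# Illustrative quality scores (not real data).
label_scores = [0.9, 0.8, 0.02, 0.85, 0.7]
flags = flag_label_issues(label_scores, threshold=0.05)
# The median is 0.8, so the cutoff is about 0.04 and only the
# third example falls below it.
```

Because the cutoff scales with the median, the same threshold value adapts to datasets whose quality scores are uniformly high or uniformly low.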

cleanlab.datalab.internal.issue_manager.regression.label.find_issues_with_features(features, y, cl, **kwargs)

Find label issues in a regression dataset based on features. This delegates the work to the CleanLearning.find_label_issues method.

Parameters:
  • features (ndarray) – The numerical features from a regression dataset.

  • y (ndarray) – The given labels.

  • cl (CleanLearning) – The CleanLearning instance whose find_label_issues method is called.

  • **kwargs – Various keyword arguments.

Return type:

DataFrame

Returns:

issues – A dataframe of the issues. It contains the following columns:

  • is_label_issue (bool) – True if the example has a label issue.

  • label_score (float) – The quality score of the label.

  • given_label (float) – The given label. It is the same as the y parameter.

  • predicted_label (float) – The predicted label. It is determined by the CleanLearning.find_label_issues method.