label
Classes:
RegressionLabelIssueManager | Manages label issues in a Datalab for regression tasks. |
Functions:
find_issues_with_predictions | Find label issues in a regression dataset based on predictions. |
find_issues_with_features | Find label issues in a regression dataset based on features. |
- class cleanlab.datalab.internal.issue_manager.regression.label.RegressionLabelIssueManager(datalab, clean_learning_kwargs=None, threshold=0.05, health_summary_parameters=None, **_)
Bases: IssueManager
Manages label issues in a Datalab for regression tasks.
- Parameters:
datalab (Datalab) – A Datalab instance.
clean_learning_kwargs (Optional[Dict[str, Any]]) – Keyword arguments to pass to the regression.learn.CleanLearning constructor.
threshold (float) – The threshold to use to determine if an example has a label issue. It is a multiplier of the median label quality score that sets the absolute threshold. Only used if predictions are provided to RegressionLabelIssueManager.find_issues, not if features are provided. Default is 0.05.
Attributes:
description | Short text that summarizes the type of issues handled by this IssueManager. |
issue_name | Returns a key that is used to store issue summary results about the assigned Lab. |
verbosity_levels | A dictionary of verbosity levels and their corresponding lists of report items to print. |
issue_score_key | Returns a key that is used to store issue score results about the assigned Lab. |
Methods:
find_issues([features, predictions]) | Find label issues in the datalab. |
collect_info(issues) | Collects data for the info attribute of the Datalab. |
make_summary(score) | Construct a summary dataframe. |
report(issues, summary, info[, ...]) | Compose a report of the issues found by this IssueManager. |
- description: ClassVar[str]
Short text that summarizes the type of issues handled by this IssueManager.
- issue_name: ClassVar[str] = 'label'
Returns a key that is used to store issue summary results about the assigned Lab.
- verbosity_levels: ClassVar[Dict[int, List[str]]]
A dictionary that maps each verbosity level to the list of report items to print at that level.
Example
>>> verbosity_levels = {
...     0: [],
...     1: ["some_info_key"],
...     2: ["additional_info_key"],
... }
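To illustrate how such a mapping could be consumed, here is a minimal sketch. It assumes that levels are cumulative, so a report at verbosity 2 also prints the keys listed for levels 0 and 1; that behaviour is an assumption, not something this page states. The helper name `keys_to_print` is hypothetical and not part of cleanlab.

```python
# Hypothetical helper, not part of cleanlab: collects the info keys that a
# report would print, ASSUMING verbosity levels are cumulative.
verbosity_levels = {
    0: [],
    1: ["some_info_key"],
    2: ["additional_info_key"],
}


def keys_to_print(levels, verbosity):
    """Return the keys of every level at or below the requested verbosity."""
    keys = []
    for level in sorted(levels):
        if level <= verbosity:
            keys.extend(levels[level])
    return keys


selected = keys_to_print(verbosity_levels, verbosity=1)
```

Under that assumption, raising the verbosity only ever adds keys and never removes one.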
- find_issues(features=None, predictions=None, **kwargs)
Find label issues in the datalab.
- Return type:
None
Priority order for finding issues:
Custom Model: Requires features to be passed to this method. Used if a model is set up in the constructor.
Predictions: Uses predictions if provided and no model is set up in the constructor.
Default Model: Defaults to a standard model using features if no model or predictions are provided.
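The priority order above can be sketched as a small decision function. This is an illustrative sketch only: `choose_strategy` and its `uses_custom_model` flag are hypothetical names, and raising `ValueError` when the required inputs are missing is an assumption rather than documented behaviour.

```python
# Hypothetical sketch of the documented priority order; not cleanlab code.
def choose_strategy(uses_custom_model, features=None, predictions=None):
    """Return which of the three documented paths would be taken."""
    if uses_custom_model:
        # 1. Custom model: takes priority, but needs features to fit on.
        if features is None:
            raise ValueError("A custom model requires features.")  # assumed error
        return "custom_model"
    if predictions is not None:
        # 2. Predictions: used when no custom model was configured.
        return "predictions"
    if features is not None:
        # 3. Default model: fitted on features when nothing else is available.
        return "default_model"
    raise ValueError("Provide features or predictions.")  # assumed error


features = [[0.1], [0.2], [0.3]]
predictions = [1.0, 2.0, 3.0]
strategy = choose_strategy(False, features=features, predictions=predictions)
```

Note that when both features and predictions are passed and no custom model was configured, the predictions path wins, so the threshold parameter applies.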
- collect_info(issues)
Collects data for the info attribute of the Datalab.
- Return type:
dict
Note
This method is called by find_issues() after find_issues() has set the issues and summary dataframes as instance attributes.
- issue_score_key: ClassVar[str] = 'label_score'
Returns a key that is used to store issue score results about the assigned Lab.
- classmethod make_summary(score)
Construct a summary dataframe.
- Parameters:
score (float) – The overall score for this issue.
- Return type:
DataFrame
- Returns:
summary – A summary dataframe.
- classmethod report(issues, summary, info, num_examples=5, verbosity=0, include_description=False, info_to_omit=None)
Compose a report of the issues found by this IssueManager.
- Parameters:
issues (DataFrame) – An issues dataframe.
Example
>>> import pandas as pd
>>> issues = pd.DataFrame(
...     {
...         "is_X_issue": [True, False, True],
...         "X_score": [0.2, 0.9, 0.4],
...     },
... )
summary (DataFrame) – The summary dataframe.
Example
>>> summary = pd.DataFrame(
...     {
...         "issue_type": ["X"],
...         "score": [0.5],
...     },
... )
info (Dict[str, Any]) – The info dict.
Example
>>> info = {
...     "A": "val_A",
...     "B": ["val_B1", "val_B2"],
... }
num_examples (int) – The number of examples to print.
verbosity (int) – The verbosity level of the report.
include_description (bool) – Whether to include a description of the issue in the report.
- Return type:
str
- Returns:
report_str – A string containing the report.
- info: Dict[str, Any]
- issues: pd.DataFrame
- summary: pd.DataFrame
- cleanlab.datalab.internal.issue_manager.regression.label.find_issues_with_predictions(predictions, y, threshold, **kwargs)
Find label issues in a regression dataset based on predictions. This uses a threshold to determine if an example has a label issue based on the quality score.
- Parameters:
predictions (ndarray) – The predictions from a regression model.
y (ndarray) – The given labels.
threshold (float) – The threshold to use to determine if an example has a label issue. It is a multiplier of the median label quality score that sets the absolute threshold.
**kwargs – Various keyword arguments.
- Return type:
DataFrame
- Returns:
issues – A dataframe of the issues. It contains the following columns:
- is_label_issue : bool
True if the example has a label issue.
- label_score : float
The quality score of the label.
- given_label : float
The given label. It is the same as the y parameter.
- predicted_label : float
The predicted label. It is the same as the predictions parameter.
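The thresholding rule described above (absolute cutoff = threshold × median label quality score) can be sketched in plain Python. This is a sketch under stated assumptions, not the library's implementation: the score `exp(-|prediction - label|)` is a stand-in for cleanlab's label quality scorer, the strict `<` comparison is assumed, and `flag_label_issues` is a hypothetical name. Rows are returned as dictionaries that mirror the documented columns instead of a DataFrame.

```python
import math
from statistics import median


# Hypothetical sketch; the scoring function is a stand-in, not cleanlab's scorer.
def flag_label_issues(predictions, y, threshold=0.05):
    """Flag examples whose quality score falls below threshold * median score."""
    # Stand-in quality score in (0, 1]: larger residuals give lower scores.
    scores = [math.exp(-abs(p - t)) for p, t in zip(predictions, y)]
    # The threshold is relative: it scales the median score into a cutoff.
    cutoff = threshold * median(scores)
    return [
        {
            "is_label_issue": score < cutoff,  # strict comparison is an assumption
            "label_score": score,
            "given_label": label,
            "predicted_label": prediction,
        }
        for score, label, prediction in zip(scores, y, predictions)
    ]


rows = flag_label_issues(
    predictions=[1.0, 2.0, 3.0, 4.0],
    y=[1.1, 2.0, 9.0, 3.9],
)
flags = [row["is_label_issue"] for row in rows]  # only the third example is flagged
```

Because the cutoff scales with the median score, the same threshold value adapts to datasets whose scores are uniformly high or low.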
- cleanlab.datalab.internal.issue_manager.regression.label.find_issues_with_features(features, y, cl, **kwargs)
Find label issues in a regression dataset based on features. This delegates the work to the CleanLearning.find_label_issues method.
- Parameters:
features (ndarray) – The numerical features from a regression dataset.
y (ndarray) – The given labels.
**kwargs – Various keyword arguments.
- Return type:
DataFrame
- Returns:
issues – A dataframe of the issues. It contains the following columns:
- is_label_issue : bool
True if the example has a label issue.
- label_score : float
The quality score of the label.
- given_label : float
The given label. It is the same as the y parameter.
- predicted_label : float
The predicted label. It is determined by the CleanLearning.find_label_issues method.