issue_manager
Classes:
| IssueManager | Base class for managing data issues of a particular type in a Datalab. |
- class cleanlab.datalab.internal.issue_manager.issue_manager.IssueManager(datalab, **_)
- Bases: ABC
- Base class for managing data issues of a particular type in a Datalab.
- For each example in a dataset, the IssueManager for a particular type of issue should compute:
  - A numeric severity score between 0 and 1, with values near 0 indicating severe instances of the issue.
  - A boolean is_issue value, which is True if we believe this example suffers from the issue in question. is_issue may be determined by thresholding the severity score (with an a priori determined reasonable threshold value), or via some other means (e.g. Confident Learning for flagging label issues).
- The IssueManager should also report:
  - A global value between 0 and 1 summarizing how severe this issue is in the dataset overall (e.g. the average severity across all examples in the dataset, or the count of examples where is_issue=True).
  - Other interesting info about the issue and examples in the dataset, and statistics estimated from the current dataset that may be reused to score this issue in future data. For example, info for label issues could contain the confident_thresholds, the confident_joint, the predicted label for each example, etc. Another example is (near-)duplicate detection, where info could contain which sets of examples in the dataset are all (nearly) identical.
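The per-example outputs and the global summary described above can be illustrated with a small dependency-free sketch. The scores and the threshold of 0.5 are made-up values for illustration only; they are not defaults taken from cleanlab.

```python
from statistics import mean

# Hypothetical per-example severity scores in [0, 1].
# Values near 0 indicate the most severe instances of the issue.
scores = [0.125, 0.875, 0.25, 1.0]

# Hypothetical threshold chosen a priori (an assumption of this sketch).
THRESHOLD = 0.5

# Flag an example when its score falls below the threshold.
is_issue = [score < THRESHOLD for score in scores]

# One possible global summary: the average score across all examples.
overall_score = mean(scores)

print(is_issue)
print(overall_score)
```

Thresholding is only one of the options the description mentions; an issue type may set is_issue by other means while still reporting a score per example.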
 - Implementing a new IssueManager:
  - Define the issue_name class attribute, e.g. “label”, “duplicate”, “outlier”, etc.
  - Implement the abstract methods find_issues and collect_info.
    - find_issues is responsible for computing the issues and summary dataframes.
    - collect_info is responsible for computing the info dict. It is called by find_issues, once the manager has set the issues and summary dataframes as instance attributes.
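The steps above might look roughly like the following sketch. To stay runnable without cleanlab installed, it defines its own stand-in base class (SketchIssueManager) rather than subclassing the real IssueManager. The LowValueIssueManager class, its threshold parameter, and the contents of info are all hypothetical. The column names follow the is_X_issue / X_score and issue_type / score patterns shown in the report() examples on this page.

```python
from abc import ABC, abstractmethod
from typing import Any, ClassVar, Dict

import pandas as pd


class SketchIssueManager(ABC):
    """Stand-in for the documented base class; NOT the cleanlab implementation."""

    issue_name: ClassVar[str]

    def __init__(self) -> None:
        self.issues = pd.DataFrame()
        self.summary = pd.DataFrame()
        self.info: Dict[str, Any] = {}

    @abstractmethod
    def find_issues(self, *args, **kwargs) -> None: ...

    @abstractmethod
    def collect_info(self, *args, **kwargs) -> dict: ...


class LowValueIssueManager(SketchIssueManager):
    """Hypothetical manager that flags examples with small values."""

    issue_name = "low_value"

    def find_issues(self, values, threshold=0.5) -> None:
        # Treat the (clipped) values themselves as severity scores in [0, 1].
        scores = pd.Series(values, dtype=float).clip(0.0, 1.0)
        is_issue_column = f"is_{self.issue_name}_issue"
        score_column = f"{self.issue_name}_score"
        self.issues = pd.DataFrame(
            {is_issue_column: scores < threshold, score_column: scores}
        )
        self.summary = pd.DataFrame(
            {"issue_type": [self.issue_name], "score": [scores.mean()]}
        )
        # collect_info runs only after issues and summary have been set.
        self.info = self.collect_info(threshold=threshold)

    def collect_info(self, threshold) -> dict:
        flagged = self.issues[f"is_{self.issue_name}_issue"]
        return {"threshold": threshold, "num_flagged": int(flagged.sum())}


manager = LowValueIssueManager()
manager.find_issues([0.2, 0.9, 0.4])
print(manager.issues)
print(manager.summary)
print(manager.info)
```

A real subclass would inherit from IssueManager and receive a Datalab instance in its constructor, as the class signature above indicates.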
 - Attributes:
  - description: Short text that summarizes the type of issues handled by this IssueManager.
  - issue_name: Returns a key that is used to store issue summary results about the assigned Lab.
  - issue_score_key: Returns a key that is used to store issue score results about the assigned Lab.
  - verbosity_levels: A dictionary of verbosity levels and their corresponding lists of report items to print.
 - Methods:
  - find_issues(*args, **kwargs): Finds occurrences of this particular issue in the dataset.
  - collect_info(*args, **kwargs): Collects data for the info attribute of the Datalab.
  - make_summary(score): Construct a summary dataframe.
  - report(issues, summary, info[, ...]): Compose a report of the issues found by this IssueManager.
 - description: ClassVar[str]
- Short text that summarizes the type of issues handled by this IssueManager. 
 - issue_name: ClassVar[str]
- Returns a key that is used to store issue summary results about the assigned Lab. 
 - issue_score_key: ClassVar[str]
- Returns a key that is used to store issue score results about the assigned Lab. 
 - verbosity_levels: ClassVar[Dict[int, List[str]]]
- A dictionary of verbosity levels and their corresponding lists of report items to print.
- Example
    >>> verbosity_levels = {
    ...     0: [],
    ...     1: ["some_info_key"],
    ...     2: ["additional_info_key"],
    ... }
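One plausible reading of this mapping is that a report at a given verbosity prints the items of every level up to and including that verbosity. The page does not state this, so the cumulative behaviour and the helper keys_to_print below are assumptions made for illustration, not part of the documented API.

```python
from typing import Dict, List

verbosity_levels: Dict[int, List[str]] = {
    0: [],
    1: ["some_info_key"],
    2: ["additional_info_key"],
}


def keys_to_print(levels: Dict[int, List[str]], verbosity: int) -> List[str]:
    """Hypothetical helper: gather keys from every level up to `verbosity`."""
    keys: List[str] = []
    for level in sorted(levels):
        if level <= verbosity:
            keys.extend(levels[level])
    return keys


print(keys_to_print(verbosity_levels, 0))  # []
print(keys_to_print(verbosity_levels, 1))  # ['some_info_key']
print(keys_to_print(verbosity_levels, 2))  # ['some_info_key', 'additional_info_key']
```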
 - info: Dict[str, Any]
 - issues: DataFrame
 - summary: DataFrame
 - abstract find_issues(*args, **kwargs)
- Finds occurrences of this particular issue in the dataset.
- Computes the issues and summary dataframes. Calls collect_info to compute the info dict.
- Return type: None
 
 - collect_info(*args, **kwargs)
- Collects data for the info attribute of the Datalab.
- Note: This method is called by find_issues() after find_issues() has set the issues and summary dataframes as instance attributes.
- Return type: dict
 
 - classmethod make_summary(score)
- Construct a summary dataframe.
- Parameters:
  - score (float) – The overall score for this issue.
- Return type: DataFrame
- Returns: summary – A summary dataframe.
 
 - classmethod report(issues, summary, info, num_examples=5, verbosity=0, include_description=False, info_to_omit=None)
- Compose a report of the issues found by this IssueManager.
- Parameters:
  - issues (DataFrame) – An issues dataframe.
    Example
        >>> import pandas as pd
        >>> issues = pd.DataFrame(
        ...     {
        ...         "is_X_issue": [True, False, True],
        ...         "X_score": [0.2, 0.9, 0.4],
        ...     },
        ... )
  - summary (DataFrame) – The summary dataframe.
    Example
        >>> summary = pd.DataFrame(
        ...     {
        ...         "issue_type": ["X"],
        ...         "score": [0.5],
        ...     },
        ... )
  - info (Dict[str, Any]) – The info dict.
    Example
        >>> info = {
        ...     "A": "val_A",
        ...     "B": ["val_B1", "val_B2"],
        ... }
  - num_examples (int) – The number of examples to print.
  - verbosity (int) – The verbosity level of the report.
  - include_description (bool) – Whether to include a description of the issue in the report.
- Return type: str
- Returns: report_str – A string containing the report.
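To show how the three inputs relate, here is a rough sketch that builds a report string from the example issues, summary, and info values above. The function sketch_report is hypothetical and is not cleanlab's report(). Its layout, its keys parameter, and the choice to list the lowest-scoring examples first are assumptions; the last follows from the convention that scores near 0 are the most severe.

```python
import pandas as pd

issues = pd.DataFrame(
    {"is_X_issue": [True, False, True], "X_score": [0.2, 0.9, 0.4]}
)
summary = pd.DataFrame({"issue_type": ["X"], "score": [0.5]})
info = {"A": "val_A", "B": ["val_B1", "val_B2"]}


def sketch_report(issues, summary, info, num_examples=5, keys=()):
    """Hypothetical formatter; the real report() layout may differ."""
    issue_type = summary["issue_type"].iloc[0]
    overall_score = summary["score"].iloc[0]
    num_flagged = int(issues["is_" + issue_type + "_issue"].sum())
    # Lowest scores first, since values near 0 are the most severe.
    worst = issues.sort_values(issue_type + "_score").head(num_examples)
    lines = [
        f"Issue type: {issue_type}",
        f"Overall score: {overall_score}",
        f"Flagged examples: {num_flagged}",
        worst.to_string(),
    ]
    # Append only the requested info entries that actually exist.
    lines += [f"{key}: {info[key]}" for key in keys if key in info]
    return "\n".join(lines)


report_str = sketch_report(issues, summary, info, num_examples=2, keys=["A"])
print(report_str)
```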