factory
The factory module provides a factory class for constructing concrete issue managers and a decorator for registering new issue managers.
This module provides the register() decorator for users to register new subclasses of IssueManager in the registry. Each IssueManager detects a particular type of issue in a dataset.
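The decorator-plus-registry pattern described above can be sketched in plain Python. This is a simplified stand-in, not cleanlab's implementation. IssueManager, REGISTRY and register below are toy versions that use plain-string task keys. make_issue_manager is a hypothetical name for the factory step, which turns a task and an issue type into an issue manager instance.

```python
from typing import Dict, Type


class IssueManager:
    """Stand-in base class; each subclass detects one type of issue."""

    issue_name: str = "base"

    def find_issues(self, **kwargs) -> None:
        raise NotImplementedError


# Task name -> (issue type -> IssueManager subclass), mirroring the nesting of REGISTRY.
REGISTRY: Dict[str, Dict[str, Type[IssueManager]]] = {
    "classification": {},
    "regression": {},
    "multilabel": {},
}


def register(cls: Type[IssueManager], task: str = "classification") -> Type[IssueManager]:
    """Store cls under its issue_name for the given task and return it unchanged."""
    if not issubclass(cls, IssueManager):
        raise ValueError(f"{cls!r} must be a subclass of IssueManager")
    if task not in REGISTRY:
        raise ValueError(f"Unknown task {task!r}; expected one of {list(REGISTRY)}")
    REGISTRY[task][cls.issue_name] = cls
    return cls


def make_issue_manager(task: str, issue_type: str, **kwargs) -> IssueManager:
    """Hypothetical factory step: look up the registered class and instantiate it."""
    try:
        manager_cls = REGISTRY[task][issue_type]
    except KeyError:
        raise ValueError(
            f"No issue manager registered for {issue_type!r} in task {task!r}"
        ) from None
    return manager_cls(**kwargs)


@register  # no task given, so the default "classification" is used
class MyIssueManager(IssueManager):
    issue_name: str = "my_issue"

    def find_issues(self, **kwargs) -> None:
        pass  # Issue-detection logic would go here.


manager = make_issue_manager("classification", "my_issue")
print(type(manager).__name__)  # MyIssueManager
```

Because register() returns the class unchanged, the same function works both as a decorator and as a plain call.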
Note
The REGISTRY variable is used by the factory class to keep track of registered issue managers.
The factory class is used as an implementation detail by Datalab, which provides a simplified API for constructing concrete issue managers. Datalab is the user-facing entry point and documents how to use that API in detail.
Warning
Neither the REGISTRY variable nor the factory class should be used directly by users.
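Data:
REGISTRY | Registry of issue managers that can be constructed from a task and issue type and used in the Datalab class.
Functions:
register(cls[, task]) | Registers an issue manager class for a given task.
list_possible_issue_types(task) | Returns a list of all registered issue types.
list_default_issue_types(task) | Returns a list of the issue types that are run by default when find_issues() is called without specifying issue_types.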
- cleanlab.datalab.internal.issue_manager_factory.REGISTRY: Dict[Task, Dict[str, Type[IssueManager]]]
Registry of issue managers that can be constructed from a task and issue type and used in the Datalab class.
Currently, the following issue managers are registered by default for a given task:
Classification:
- "outlier": OutlierIssueManager
- "label": LabelIssueManager
- "near_duplicate": NearDuplicateIssueManager
- "non_iid": NonIIDIssueManager
- "class_imbalance": ClassImbalanceIssueManager
- "underperforming_group": UnderperformingGroupIssueManager
- "data_valuation": DataValuationIssueManager
- "null": NullIssueManager
Regression:
- "label": RegressionLabelIssueManager
- "outlier": OutlierIssueManager
- "near_duplicate": NearDuplicateIssueManager
- "non_iid": NonIIDIssueManager
- "null": NullIssueManager
Multilabel:
- "label": MultilabelIssueManager
- "outlier": OutlierIssueManager
- "near_duplicate": NearDuplicateIssueManager
- "non_iid": NonIIDIssueManager
- "null": NullIssueManager
Warning
This variable should not be used directly by users.
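REGISTRY is a nested mapping: task, then issue type, then manager class. Ordinary dict and set operations can therefore answer questions such as which issue types every task supports. The sketch below copies the listing above into plain dictionaries, using task-name strings and class-name strings as stand-ins for Task members and IssueManager classes so that it runs without cleanlab. It illustrates the data shape only and is not a way to access the real REGISTRY.

```python
# Class names copied from the REGISTRY listing above, stored as strings so that
# this snippet runs without cleanlab installed.
REGISTRY_NAMES = {
    "classification": {
        "outlier": "OutlierIssueManager",
        "label": "LabelIssueManager",
        "near_duplicate": "NearDuplicateIssueManager",
        "non_iid": "NonIIDIssueManager",
        "class_imbalance": "ClassImbalanceIssueManager",
        "underperforming_group": "UnderperformingGroupIssueManager",
        "data_valuation": "DataValuationIssueManager",
        "null": "NullIssueManager",
    },
    "regression": {
        "label": "RegressionLabelIssueManager",
        "outlier": "OutlierIssueManager",
        "near_duplicate": "NearDuplicateIssueManager",
        "non_iid": "NonIIDIssueManager",
        "null": "NullIssueManager",
    },
    "multilabel": {
        "label": "MultilabelIssueManager",
        "outlier": "OutlierIssueManager",
        "near_duplicate": "NearDuplicateIssueManager",
        "non_iid": "NonIIDIssueManager",
        "null": "NullIssueManager",
    },
}

# Issue types registered for every task.
shared = set.intersection(*(set(managers) for managers in REGISTRY_NAMES.values()))
print(sorted(shared))  # ['label', 'near_duplicate', 'non_iid', 'null', 'outlier']

# Issue types present for every task but handled by a task-specific manager class.
task_specific = sorted(
    name
    for name in shared
    if len({managers[name] for managers in REGISTRY_NAMES.values()}) > 1
)
print(task_specific)  # ['label']
```

The second query shows why the registry is keyed by task first. The same issue type string, "label", resolves to a different manager class depending on the task.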
- cleanlab.datalab.internal.issue_manager_factory.register(cls, task='classification')
Registers an issue manager class in the registry under the given task, so that the factory can construct it later.
- Parameters:
cls (Type[IssueManager]) – A subclass of IssueManager.
task (str) – Specific machine learning task, such as classification or regression. See Task.from_str() (cleanlab.datalab.internal.task.Task.from_str) for details on which string corresponds to which task type.
- Return type:
Type[IssueManager]
- Returns:
cls – The same class that was passed in.
Example
When defining a new subclass of IssueManager, you can register it with the decorator:

```python
from cleanlab.datalab.internal.issue_manager import IssueManager
from cleanlab.datalab.internal.issue_manager_factory import register


@register  # registers for the default task, "classification"
class MyIssueManager(IssueManager):
    issue_name: str = "my_issue"

    def find_issues(self, **kwargs):
        # Some logic to find issues
        pass
```

or with a function call:

```python
from cleanlab.datalab.internal.issue_manager import IssueManager
from cleanlab.datalab.internal.issue_manager_factory import register


class MyIssueManager(IssueManager):
    issue_name: str = "my_issue"

    def find_issues(self, **kwargs):
        # Some logic to find issues
        pass


register(MyIssueManager, task="classification")
```
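register() takes cls as its first positional parameter. As a result, a bare @register always uses the default task='classification', and @register(task="regression") fails with a TypeError because cls is missing. One way to keep decorator syntax for another task is to bind task with functools.partial. The sketch below uses a toy register with the documented signature and a stand-in REGISTRY so that it runs on its own. If cleanlab's register() behaves as documented, partial(register, task="regression") should work the same way, but that is an assumption here.

```python
from functools import partial
from typing import Dict, Type


class IssueManager:
    """Stand-in for cleanlab's IssueManager base class."""

    issue_name: str = "base"


# Stand-in registry with the same nesting as REGISTRY (task -> issue type -> class).
REGISTRY: Dict[str, Dict[str, Type[IssueManager]]] = {"classification": {}, "regression": {}}


def register(cls: Type[IssueManager], task: str = "classification") -> Type[IssueManager]:
    """Toy version of register() with the documented signature."""
    REGISTRY[task][cls.issue_name] = cls
    return cls


# register(task="regression") alone raises TypeError (cls is missing), so bind task first;
# partial then receives the class as its remaining positional argument.
@partial(register, task="regression")
class MyRegressionIssueManager(IssueManager):
    issue_name: str = "my_regression_issue"


print(sorted(REGISTRY["regression"]))  # ['my_regression_issue']
```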
- cleanlab.datalab.internal.issue_manager_factory.list_possible_issue_types(task)
Returns a list of all issue types registered for the given task.
Any issue type that is not in this list cannot be used in the find_issues() method.
- Return type:
List[str]
See also
REGISTRY: All available issue types and their corresponding issue managers can be found here.
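Any issue type missing from this list cannot be passed to find_issues(), so a caller can check a request against it up front. The sketch below uses a toy list_possible_issue_types() over a stand-in registry that holds the regression entries from the REGISTRY listing, with task-name strings as keys. check_issue_types is a hypothetical helper and is not part of cleanlab.

```python
from typing import Dict, List

# Stand-in registry: the real REGISTRY is keyed by Task and maps to IssueManager
# classes; only the issue-type keys matter for this check.
REGISTRY: Dict[str, Dict[str, str]] = {
    "regression": {
        "label": "RegressionLabelIssueManager",
        "outlier": "OutlierIssueManager",
        "near_duplicate": "NearDuplicateIssueManager",
        "non_iid": "NonIIDIssueManager",
        "null": "NullIssueManager",
    },
}


def list_possible_issue_types(task: str) -> List[str]:
    """Toy version: every issue type registered for task (empty if the task is unknown)."""
    return list(REGISTRY.get(task, {}).keys())


def check_issue_types(task: str, requested: List[str]) -> None:
    """Hypothetical pre-flight check: reject issue types with no registered manager."""
    possible = set(list_possible_issue_types(task))
    unknown = [name for name in requested if name not in possible]
    if unknown:
        raise ValueError(f"Unsupported issue types for {task!r}: {unknown}")


check_issue_types("regression", ["label", "outlier"])  # passes silently
try:
    check_issue_types("regression", ["class_imbalance"])
except ValueError as err:
    print(err)  # Unsupported issue types for 'regression': ['class_imbalance']
```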
- cleanlab.datalab.internal.issue_manager_factory.list_default_issue_types(task)
Returns a list of the issue types that are run by default when find_issues() is called without specifying issue_types.
- Parameters:
task – Specific machine learning task supported by Datalab.
- Return type:
List[str]
See also
REGISTRY: All available issue types and their corresponding issue managers can be found here.
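The sketch below shows where a default list comes in when find_issues() receives no issue_types. It assumes issue_types is a dict that maps each issue type to keyword arguments for its issue manager. resolve_issue_types, POSSIBLE and DEFAULTS are hypothetical names. The split between DEFAULTS and POSSIBLE is invented for illustration and is not taken from cleanlab's actual defaults.

```python
from typing import Any, Dict, Optional

# Hypothetical lists for illustration only; the real values come from
# list_possible_issue_types(task) and list_default_issue_types(task) and may differ.
POSSIBLE = ["label", "outlier", "near_duplicate", "non_iid", "null", "data_valuation"]
DEFAULTS = ["label", "outlier", "near_duplicate", "non_iid", "null"]


def resolve_issue_types(issue_types: Optional[Dict[str, Any]] = None) -> Dict[str, Dict[str, Any]]:
    """Pick which checks to run: the defaults when issue_types is None, else the requested ones."""
    if issue_types is None:
        # No explicit request: run each default check with empty keyword arguments.
        return {name: {} for name in DEFAULTS}
    unknown = set(issue_types) - set(POSSIBLE)
    if unknown:
        raise ValueError(f"Unknown issue types: {sorted(unknown)}")
    # Explicit request: run only what was asked for, treating None kwargs as empty.
    return {name: dict(kwargs or {}) for name, kwargs in issue_types.items()}


print(list(resolve_issue_types()))  # ['label', 'outlier', 'near_duplicate', 'non_iid', 'null']
print(resolve_issue_types({"data_valuation": {}}))  # {'data_valuation': {}}
```

In this sketch, a registered issue type outside the defaults still runs, but only when requested explicitly.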