multilabel_scorer
Helper classes and functions used internally to compute label quality scores in multilabel classification.
Classes:

ClassLabelScorer – Enum for the different methods to compute label quality scores.

Aggregator – Helper class for aggregating the label quality scores for each class into a single score for each datapoint.

MultilabelScorer – Aggregates label quality scores across different classes to produce one score per example in multilabel classification tasks.

Functions:

exponential_moving_average – Exponential moving average (EMA) score aggregation function.

softmin – Softmin score aggregation function.

get_label_quality_scores – Computes a quality score for each label in a multilabel classification problem based on out-of-sample predicted probabilities.

multilabel_py – Compute the prior probability of each label in a multilabel classification problem.

get_cross_validated_multilabel_pred_probs – Get predicted probabilities for a multilabel classifier via cross-validation.
 class cleanlab.internal.multilabel_scorer.ClassLabelScorer(value)
Bases: Enum
Enum for the different methods to compute label quality scores.
Attributes:
SELF_CONFIDENCE(*args, **kwargs) – Returns the self-confidence label-quality score for each datapoint.
NORMALIZED_MARGIN(*args, **kwargs) – Returns the "normalized margin" label-quality score for each datapoint.
CONFIDENCE_WEIGHTED_ENTROPY(*args, **kwargs) – Returns the "confidence weighted entropy" label-quality score for each datapoint.
Methods:
__call__(labels, pred_probs, **kwargs) – Returns the label-quality scores for each datapoint based on the given labels and predicted probabilities.
from_str(method) – Constructs an instance of the ClassLabelScorer enum based on the given method name.
 SELF_CONFIDENCE(*args, **kwargs) = get_self_confidence_for_each_label
Returns the self-confidence label-quality score for each datapoint.
 NORMALIZED_MARGIN(*args, **kwargs) = get_normalized_margin_for_each_label
Returns the “normalized margin” label-quality score for each datapoint.
 CONFIDENCE_WEIGHTED_ENTROPY(*args, **kwargs) = get_confidence_weighted_entropy_for_each_label
Returns the “confidence weighted entropy” label-quality score for each datapoint.
 __call__(labels, pred_probs, **kwargs)
Returns the label-quality scores for each datapoint based on the given labels and predicted probabilities.
See the documentation for each method for more details.
Example
>>> import numpy as np
>>> from cleanlab.internal.multilabel_scorer import ClassLabelScorer
>>> labels = np.array([0, 0, 0, 1, 1, 1])
>>> pred_probs = np.array([
...     [0.9, 0.1],
...     [0.8, 0.2],
...     [0.7, 0.3],
...     [0.2, 0.8],
...     [0.75, 0.25],
...     [0.1, 0.9],
... ])
>>> ClassLabelScorer.SELF_CONFIDENCE(labels, pred_probs)
array([0.9 , 0.8 , 0.7 , 0.8 , 0.25, 0.9 ])
 Return type:
ndarray
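As the example above suggests, the self-confidence score of a datapoint is the predicted probability assigned to its given label. The following is a minimal NumPy sketch of that idea, not the library's implementation; `self_confidence_sketch` is a hypothetical name introduced here for illustration only.

```python
import numpy as np


def self_confidence_sketch(labels, pred_probs):
    """Hypothetical helper: pick, for each row, the probability of the given label."""
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    # Row i contributes pred_probs[i, labels[i]].
    return pred_probs[np.arange(len(labels)), labels]


labels = np.array([0, 0, 0, 1, 1, 1])
pred_probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.75, 0.25],
    [0.1, 0.9],
])
scores = self_confidence_sketch(labels, pred_probs)
```

The fifth datapoint is labeled 1 but the model assigns that class only 0.25, so it receives the lowest score and is the most likely label issue in this toy set.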
 classmethod from_str(method)
Constructs an instance of the ClassLabelScorer enum based on the given method name.
 Parameters:
method (str) – The name of the scoring method to use.
 Return type:
ClassLabelScorer
 Returns:
scorer – An instance of the ClassLabelScorer enum.
 Raises:
ValueError – If the given method name is not valid. It must be one of the following: “self_confidence”, “normalized_margin”, or “confidence_weighted_entropy”.
Example
>>> from cleanlab.internal.multilabel_scorer import ClassLabelScorer
>>> ClassLabelScorer.from_str("self_confidence")
<ClassLabelScorer.SELF_CONFIDENCE: get_self_confidence_for_each_label>
 cleanlab.internal.multilabel_scorer.exponential_moving_average(s, *, alpha=None, axis=1, **_)
Exponential moving average (EMA) score aggregation function.
For a score vector s = (s_1, …, s_K) of K scores, the scores are sorted in descending order and the EMA is computed recursively over the sorted scores. The value reached at the last (smallest) score, denoted EMA_K, is returned; see the note below for the recursion.
Note
The recursive formula for the EMA at step $t = 2, ..., K$ is:
$\text{EMA}_t = \alpha \cdot s_t + (1 - \alpha) \cdot \text{EMA}_{t-1}, \qquad 0 \leq \alpha \leq 1$
We set $\text{EMA}_1 = s_1$, the largest score in the sorted vector s.
$\alpha$ is the “forgetting factor” that gives more weight to the most recent scores, and successively less weight to the previous scores.
 Parameters:
s (ndarray) – Scores to be transformed.
alpha (Optional[float]) – Discount factor that determines the weight of the previous EMA score. Higher alpha means that the previous EMA score has a lower weight while the current score has a higher weight.
Its value must be in the interval [0, 1].
If alpha is None, it is set to 2 / (K + 1) where K is the number of scores.
axis (int) – Axis along which the scores are sorted.
 Return type:
ndarray
 Returns:
s_ema – Exponential moving average score.
Examples
>>> from cleanlab.internal.multilabel_scorer import exponential_moving_average
>>> import numpy as np
>>> s = np.array([[0.1, 0.2, 0.3]])
>>> exponential_moving_average(s, alpha=0.5)
array([0.175])
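The recursion in the note above can be traced by hand with a short NumPy sketch. This is an illustrative re-implementation written from the formula on this page, not the library's source; `ema_sketch` is a hypothetical name.

```python
import numpy as np


def ema_sketch(s, alpha=None, axis=1):
    """Hypothetical helper following the EMA recursion described above."""
    s = np.asarray(s, dtype=float)
    K = s.shape[axis]
    if alpha is None:
        alpha = 2.0 / (K + 1)  # default stated in the parameter description
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the interval [0, 1]")
    # Sort in descending order along `axis`, then move that axis to the front
    # so that s_sorted[t] holds the (t+1)-th largest score of every row.
    s_sorted = np.moveaxis(-np.sort(-s, axis=axis), axis, 0)
    ema = s_sorted[0]  # EMA_1 = s_1, the largest score
    for t in range(1, K):
        ema = alpha * s_sorted[t] + (1 - alpha) * ema
    return ema


result = ema_sketch(np.array([[0.1, 0.2, 0.3]]), alpha=0.5)
# Steps: EMA_1 = 0.3, EMA_2 = 0.5*0.2 + 0.5*0.3 = 0.25, EMA_3 = 0.5*0.1 + 0.5*0.25 = 0.175
```

Because the scores are sorted in descending order, the smallest scores enter the recursion last and therefore carry the largest weights when alpha is high, which pulls the aggregate toward the worst per-class score.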
 cleanlab.internal.multilabel_scorer.softmin(s, *, temperature=0.1, axis=1, **_)
Softmin score aggregation function.
 Parameters:
s (ndarray) – Input array.
temperature (float) – Temperature parameter. Values that are too small may cause numerical underflow and NaN scores.
axis (int) – Axis along which to apply the function.
 Return type:
ndarray
 Returns:
Softmin score.
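This page does not state the formula behind softmin. The sketch below assumes one plausible definition: a weighted average of the scores, with weights given by a softmax over (1 - s) / temperature so that lower scores receive more weight. That definition, the name `softmin_sketch`, and the max-shift used for numerical stability are all assumptions made for illustration and may differ from what the library computes.

```python
import numpy as np


def softmin_sketch(s, temperature=0.1, axis=1):
    """Hypothetical soft-minimum: softmax-weighted average favoring low scores."""
    s = np.asarray(s, dtype=float)
    logits = (1.0 - s) / temperature
    # Subtract the per-row maximum before exponentiating to avoid overflow.
    logits = logits - logits.max(axis=axis, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=axis, keepdims=True)
    return (weights * s).sum(axis=axis)


scores = np.array([[0.9, 0.9, 0.3], [0.5, 0.5, 0.5]])
aggregated = softmin_sketch(scores)
```

Under this assumed definition, a small temperature drives the result toward the row minimum, while a large temperature drives it toward the row mean.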
 class cleanlab.internal.multilabel_scorer.Aggregator(method, **kwargs)
Bases: object
Helper class for aggregating the label quality scores for each class into a single score for each datapoint.
 Parameters:
method (Union[str, Callable]) – The method used to aggregate the label quality scores for each class. If passed as a callable, your function should take in a 1D array of K scores and return a single aggregated score. See exponential_moving_average for an example of such a function. Alternatively, this can be a str value to specify a built-in function; possible values are the keys of the Aggregator’s possible_methods attribute.
kwargs – Additional keyword arguments to pass to the aggregation function when it is called.
Attributes:
possible_methods
Methods:
__call__(scores, **kwargs) – Returns the label quality scores for each datapoint based on the given label quality scores for each class.
 possible_methods: Dict[str, Callable[[...], ndarray]] = {'exponential_moving_average': <function exponential_moving_average>, 'softmin': <function softmin>}
 __call__(scores, **kwargs)
Returns the label quality scores for each datapoint based on the given label quality scores for each class.
 Parameters:
scores (ndarray) – The label quality scores for each class.
 Return type:
ndarray
 Returns:
aggregated_scores – A single label quality score for each datapoint.
 class cleanlab.internal.multilabel_scorer.MultilabelScorer(base_scorer=ClassLabelScorer.SELF_CONFIDENCE, aggregator=Aggregator(method=exponential_moving_average, kwargs={'alpha': 0.8}), *, strict=True)
Bases: object
Aggregates label quality scores across different classes to produce one score per example in multilabel classification tasks.
 Parameters:
base_scorer (ClassLabelScorer) – The method to compute the label quality scores for each class.
See the documentation for the ClassLabelScorer enum for more details.
aggregator (Union[Aggregator, Callable]) – The method to aggregate the label quality scores for each class into a single score for each datapoint.
Defaults to the EMA (exponential moving average) aggregator with forgetting factor alpha=0.8.
See the documentation for the Aggregator class for more details.
strict (bool) – Flag for performing strict validation of the input data.
Methods:
__call__(labels, pred_probs[, ...]) – Computes a quality score for each label in a multilabel classification problem based on out-of-sample predicted probabilities.
aggregate(class_label_quality_scores, **kwargs) – Aggregates the label quality scores for each class into a single overall label quality score for each example.
get_class_label_quality_scores(labels, ...) – Computes separate label quality scores for each class.
 __call__(labels, pred_probs, base_scorer_kwargs=None, **aggregator_kwargs)
Computes a quality score for each label in a multilabel classification problem based on out-of-sample predicted probabilities. For each example, the label quality scores for each class are aggregated into a single overall label quality score.
 Parameters:
labels (ndarray) – A 2D array of shape (n_samples, n_labels) with binary labels.
pred_probs (ndarray) – A 2D array of shape (n_samples, n_labels) with predicted probabilities.
kwargs – Additional keyword arguments to pass to the base_scorer and the aggregator.
base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the base_scorer.
aggregator_kwargs – Additional keyword arguments to pass to the aggregator.
 Return type:
ndarray
 Returns:
scores – A 1D array of shape (n_samples,) with the quality scores for each datapoint.
Examples
>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer, ClassLabelScorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
>>> scorer = MultilabelScorer()
>>> scores = scorer(labels, pred_probs)
>>> scores
array([0.9, 0.5])

>>> scorer = MultilabelScorer(
...     base_scorer=ClassLabelScorer.NORMALIZED_MARGIN,
...     aggregator=np.min,  # Use the "worst" label quality score for each example.
... )
>>> scores = scorer(labels, pred_probs)
>>> scores
array([0.9, 0.4])
 aggregate(class_label_quality_scores, **kwargs)
Aggregates the label quality scores for each class into a single overall label quality score for each example.
 Parameters:
class_label_quality_scores (ndarray) – A 2D array of shape (n_samples, n_labels) with the label quality scores for each class.
kwargs – Additional keyword arguments to pass to the aggregator.
 Return type:
ndarray
 Returns:
scores – A 1D array of shape (n_samples,) with the quality scores for each datapoint.
Examples
>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer
>>> import numpy as np
>>> class_label_quality_scores = np.array([[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]])
>>> scorer = MultilabelScorer()  # Use the default aggregator (exponential moving average) with default parameters.
>>> scores = scorer.aggregate(class_label_quality_scores)
>>> scores
array([0.42, 0.452])
>>> new_scores = scorer.aggregate(class_label_quality_scores, alpha=0.5)  # Use the default aggregator with custom parameters.
>>> new_scores
array([0.6, 0.575])
Warning
Make sure that the keyword arguments correspond to the aggregation function used. For example, the exponential_moving_average function supports an alpha keyword argument, but np.min does not.
 get_class_label_quality_scores(labels, pred_probs, base_scorer_kwargs=None)
Computes separate label quality scores for each class.
 Parameters:
labels (ndarray) – A 2D array of shape (n_samples, n_labels) with binary labels.
pred_probs (ndarray) – A 2D array of shape (n_samples, n_labels) with predicted probabilities.
base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the base scoring function.
 Return type:
ndarray
 Returns:
class_label_quality_scores – A 2D array of shape (n_samples, n_labels) with the quality scores for each label.
Examples
>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.7], [0.4, 0.1, 0.6]])
>>> scorer = MultilabelScorer()  # Use the default base scorer (SELF_CONFIDENCE)
>>> class_label_quality_scores = scorer.get_class_label_quality_scores(labels, pred_probs)
>>> class_label_quality_scores
array([[0.9, 0.9, 0.3],
       [0.4, 0.9, 0.6]])
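The numbers in the example above are consistent with treating each class as its own binary problem: the per-class self-confidence is the predicted probability when the label is 1 and one minus it when the label is 0. The sketch below reproduces the output under that assumption; `per_class_self_confidence` is a hypothetical name and this is not the library's code.

```python
import numpy as np


def per_class_self_confidence(labels, pred_probs):
    """Hypothetical helper: binary self-confidence computed independently per class."""
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    # Label 1 -> probability of the class being present;
    # label 0 -> probability of the class being absent.
    return np.where(labels == 1, pred_probs, 1.0 - pred_probs)


labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.7], [0.4, 0.1, 0.6]])
class_scores = per_class_self_confidence(labels, pred_probs)
```

In the first row the third class is labeled 0 while the model predicts 0.7, so its per-class score drops to 0.3 and flags that entry as suspicious.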
 cleanlab.internal.multilabel_scorer.get_label_quality_scores(labels, pred_probs, *, method=<cleanlab.internal.multilabel_scorer.MultilabelScorer object>, base_scorer_kwargs=None, **aggregator_kwargs)
Computes a quality score for each label in a multilabel classification problem based on out-of-sample predicted probabilities.
 Parameters:
labels – A 2D array of shape (N, K) with binary labels.
pred_probs – A 2D array of shape (N, K) with predicted probabilities.
method (MultilabelScorer) – A scoring+aggregation method for computing the label quality scores of examples in a multilabel classification setting.
base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the class-label scorer.
aggregator_kwargs – Additional keyword arguments to pass to the aggregator.
 Return type:
ndarray
 Returns:
scores – A 1D array of shape (N,) with the quality scores for each datapoint.
Examples
>>> import cleanlab.internal.multilabel_scorer as ml_scorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
>>> scores = ml_scorer.get_label_quality_scores(labels, pred_probs, method=ml_scorer.MultilabelScorer())
>>> scores
array([0.9, 0.5])
See also
MultilabelScorer
See the documentation for the MultilabelScorer class for more examples of scoring methods and aggregation methods.
 cleanlab.internal.multilabel_scorer.multilabel_py(y)
Compute the prior probability of each label in a multilabel classification problem.
 Parameters:
y (ndarray) – A 2d array of binarized multilabels of shape (N, K) where N is the number of samples and K is the number of classes.
 Return type:
ndarray
 Returns:
py – A 2d array of prior probabilities of shape (K, 2) where the first column is the probability of the label being 0 and the second column is the probability of the label being 1 for each class.
Examples
>>> from cleanlab.internal.multilabel_scorer import multilabel_py
>>> import numpy as np
>>> y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
>>> multilabel_py(y)
array([[0.5, 0.5],
       [0.5, 0.5]])
>>> y = np.array([[0, 0], [0, 1], [1, 0], [1, 0], [1, 0]])
>>> multilabel_py(y)
array([[0.4, 0.6],
       [0.8, 0.2]])
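The priors in the examples above equal the per-class frequency of 0s and 1s in the label matrix. A minimal NumPy sketch of that computation follows; `label_priors_sketch` is a hypothetical name, and the code is an illustration of the described output rather than the library's implementation.

```python
import numpy as np


def label_priors_sketch(y):
    """Hypothetical helper: per-class prior of label 0 (column 0) and label 1 (column 1)."""
    y = np.asarray(y)
    p_one = y.mean(axis=0)  # fraction of samples where each class is present
    return np.column_stack([1.0 - p_one, p_one])  # shape (K, 2)


y = np.array([[0, 0], [0, 1], [1, 0], [1, 0], [1, 0]])
py = label_priors_sketch(y)
```

Each row of the result sums to 1 because every class is either present or absent in each sample.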
 cleanlab.internal.multilabel_scorer.get_cross_validated_multilabel_pred_probs(X, labels, *, clf, cv)
Get predicted probabilities for a multilabel classifier via cross-validation.
Note
The labels are reformatted to a “multiclass” format internally to support a wider range of cross-validation strategies. If you have a multilabel dataset with K classes, the labels are reformatted to a “multiclass” format with up to 2**K classes (i.e. the number of possible class-assignment configurations). It is unlikely that you’ll see all 2**K configurations in your dataset.
 Parameters:
X – A 2d array of features of shape (N, M) where N is the number of samples and M is the number of features.
labels (ndarray) – A 2d array of binarized multilabels of shape (N, K) where N is the number of samples and K is the number of classes.
clf – A multilabel classifier with a predict_proba method.
cv – A cross-validation splitter with a split method that returns a generator of train/test indices.
 Return type:
ndarray
 Returns:
pred_probs – A 2d array of predicted probabilities of shape (N, K) where N is the number of samples and K is the number of classes.
Note
The predicted probabilities are not expected to sum to 1 for each sample in the case of multilabel classification.
Examples
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from cleanlab.internal.multilabel_scorer import get_cross_validated_multilabel_pred_probs
>>> np.random.seed(0)
>>> X = np.random.rand(16, 2)
>>> labels = np.random.randint(0, 2, size=(16, 2))
>>> clf = OneVsRestClassifier(RandomForestClassifier())
>>> cv = KFold(n_splits=2)
>>> get_cross_validated_multilabel_pred_probs(X, labels, clf=clf, cv=cv)
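The note for this function says the binary label matrix is mapped to a “multiclass” format with up to 2**K classes. One way to picture such a mapping is to read each row as a binary number, so every distinct class-assignment configuration gets its own integer id. The sketch below shows that idea only; the encoding and the name `encode_label_configurations` are assumptions for illustration, and the library may encode configurations differently.

```python
import numpy as np


def encode_label_configurations(labels):
    """Hypothetical helper: map each binary label row to one integer configuration id."""
    labels = np.asarray(labels, dtype=int)
    K = labels.shape[1]
    weights = 2 ** np.arange(K)  # class k contributes 2**k when present
    return labels @ weights      # values fall in the range [0, 2**K - 1]


labels = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 0]])
codes = encode_label_configurations(labels)
n_configurations = len(np.unique(codes))
```

Integer ids of this kind can be passed as the target to splitters that stratify on a single label per sample, which is one reason a multiclass view of the labels widens the set of usable cross-validation strategies.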