# multilabel_scorer

Helper classes and functions used internally to compute label quality scores in multi-label classification.

Classes:

• Aggregator(method, **kwargs) – Helper class for aggregating the label quality scores for each class into a single score for each datapoint.
• ClassLabelScorer(value) – Enum for the different methods to compute label quality scores.
• MultilabelScorer([base_scorer, aggregator, ...]) – Aggregates label quality scores across different classes to produce one score per example in multi-label classification tasks.

Functions:

• exponential_moving_average(s, *[, alpha, axis]) – Exponential moving average (EMA) score aggregation function.
• get_cross_validated_multilabel_pred_probs(X, labels, *, clf, cv) – Get predicted probabilities for a multi-label classifier via cross-validation.
• get_label_quality_scores(labels, pred_probs, *) – Computes a quality score for each label in a multi-label classification problem based on out-of-sample predicted probabilities.
• multilabel_py(y) – Compute the prior probability of each label in a multi-label classification problem.
• softmin(s, *[, temperature, axis]) – Softmin score aggregation function.

class cleanlab.internal.multilabel_scorer.Aggregator(method, **kwargs)

Bases: object

Helper class for aggregating the label quality scores for each class into a single score for each datapoint.

Parameters:
• method (Union[str, Callable]) – The method used to aggregate the per-class label quality scores into one score. If passed as a callable, your function should take in a 1D array of K scores and return a single aggregated score; see exponential_moving_average for an example of such a function. Alternatively, pass a str to select a built-in function; valid values are the keys of the Aggregator’s possible_methods attribute.

• kwargs – Additional keyword arguments to pass to the aggregation function when it is called.

Methods:

• __call__(scores, **kwargs) – Returns the label quality scores for each datapoint based on the given label quality scores for each class.

Attributes:

• possible_methods – Built-in aggregation functions that can be selected by name.

__call__(scores, **kwargs)

Returns the label quality scores for each datapoint based on the given label quality scores for each class.

Parameters:

scores (ndarray) – A 2D array of shape (n_samples, n_labels) with the label quality scores for each class.

Return type:

ndarray

Returns:

aggregated_scores – A single label quality score for each datapoint.

possible_methods: Dict[str, Callable[[...], ndarray]] = {'exponential_moving_average': <function exponential_moving_average>, 'softmin': <function softmin>}

class cleanlab.internal.multilabel_scorer.ClassLabelScorer(value)
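The dispatch described above (a string key looked up in a table of built-ins, or any callable, plus keyword arguments stored at construction) can be sketched in plain NumPy. `SketchAggregator`, `_mean`, and `_minimum` are hypothetical names for illustration only. The sketch assumes, as the `axis=1` defaults of the built-in functions suggest, that the aggregation function receives the full 2D score array and reduces along axis 1, and that call-time kwargs override the stored ones.

```python
from typing import Callable, Dict, Union

import numpy as np


def _mean(s: np.ndarray, *, axis: int = 1, **_) -> np.ndarray:
    """Hypothetical built-in: plain average of the per-class scores."""
    return np.mean(s, axis=axis)


def _minimum(s: np.ndarray, *, axis: int = 1, **_) -> np.ndarray:
    """Hypothetical built-in: the worst per-class score."""
    return np.min(s, axis=axis)


class SketchAggregator:
    """Hypothetical stand-in for Aggregator: resolves a method, forwards kwargs."""

    possible_methods: Dict[str, Callable[..., np.ndarray]] = {
        "mean": _mean,
        "minimum": _minimum,
    }

    def __init__(self, method: Union[str, Callable[..., np.ndarray]], **kwargs):
        if isinstance(method, str):
            if method not in self.possible_methods:
                raise ValueError(
                    f"Unknown method {method!r}; expected one of {sorted(self.possible_methods)}"
                )
            method = self.possible_methods[method]
        elif not callable(method):
            raise TypeError("method must be a str or a callable")
        self.method = method
        self.kwargs = kwargs  # stored defaults, e.g. {"alpha": 0.8}

    def __call__(self, scores: np.ndarray, **kwargs) -> np.ndarray:
        merged = {**self.kwargs, **kwargs}  # call-time kwargs win
        merged.setdefault("axis", 1)        # reduce over classes: one score per row
        return self.method(scores, **merged)


scores = np.array([[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]])
print(SketchAggregator("minimum")(scores))  # [0.3 0.4]
print(SketchAggregator(np.max)(scores))     # [0.9 0.9]
```

Storing kwargs at construction lets one configured aggregator be reused, while still allowing per-call overrides.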

Bases: Enum

Enum for the different methods to compute label quality scores.

Attributes:

• CONFIDENCE_WEIGHTED_ENTROPY(*args, **kwargs) – Returns the “confidence weighted entropy” label-quality score for each datapoint.
• NORMALIZED_MARGIN(*args, **kwargs) – Returns the “normalized margin” label-quality score for each datapoint.
• SELF_CONFIDENCE(*args, **kwargs) – Returns the self-confidence label-quality score for each datapoint.

Methods:

• __call__(labels, pred_probs, **kwargs) – Returns the label-quality scores for each datapoint based on the given labels and predicted probabilities.
• from_str(method) – Constructs an instance of the ClassLabelScorer enum based on the given method name.

CONFIDENCE_WEIGHTED_ENTROPY(*args, **kwargs) = get_confidence_weighted_entropy_for_each_label

Returns the “confidence weighted entropy” label-quality score for each datapoint.

NORMALIZED_MARGIN(*args, **kwargs) = get_normalized_margin_for_each_label

Returns the “normalized margin” label-quality score for each datapoint.

SELF_CONFIDENCE(*args, **kwargs) = get_self_confidence_for_each_label

Returns the self-confidence label-quality score for each datapoint.

__call__(labels, pred_probs, **kwargs)

Returns the label-quality scores for each datapoint based on the given labels and predicted probabilities.

See the documentation for each method for more details.

Example

>>> import numpy as np
>>> from cleanlab.internal.multilabel_scorer import ClassLabelScorer
>>> labels = np.array([0, 0, 0, 1, 1, 1])
>>> pred_probs = np.array([
...     [0.9, 0.1],
...     [0.8, 0.2],
...     [0.7, 0.3],
...     [0.2, 0.8],
...     [0.75, 0.25],
...     [0.1, 0.9],
... ])
>>> ClassLabelScorer.SELF_CONFIDENCE(labels, pred_probs)
array([0.9 , 0.8 , 0.7 , 0.8 , 0.25, 0.9 ])

Return type:

ndarray
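To make the first two scores concrete, here is a minimal NumPy sketch for a single multi-class problem. It assumes the common definitions: self-confidence is the predicted probability of the given label, and normalized margin is (p_label - max over other classes + 1) / 2, rescaled into [0, 1]. The helper names are hypothetical, this is not cleanlab's implementation, and confidence-weighted entropy is left out.

```python
import numpy as np


def self_confidence(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """Predicted probability of the given label for each example."""
    return pred_probs[np.arange(len(labels)), labels]


def normalized_margin(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """(p_label - largest other-class probability + 1) / 2, in [0, 1]."""
    p_label = self_confidence(labels, pred_probs)
    others = pred_probs.copy()
    others[np.arange(len(labels)), labels] = -np.inf  # mask out the given label
    return (p_label - others.max(axis=1) + 1) / 2


labels = np.array([0, 0, 0, 1, 1, 1])
pred_probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.75, 0.25],
    [0.1, 0.9],
])
print(self_confidence(labels, pred_probs))    # same values as the example above
print(normalized_margin(labels, pred_probs))  # identical here: only two classes
```

With two classes whose probabilities sum to 1, the normalized margin reduces to (p + p - 1 + 1) / 2 = p, so both scores coincide on this example; they differ once there are three or more classes.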

classmethod from_str(method)

Constructs an instance of the ClassLabelScorer enum based on the given method name.

Parameters:

method (str) – The name of the scoring method to use.

Return type:

ClassLabelScorer

Returns:

scorer – An instance of the ClassLabelScorer enum.

Raises:

ValueError – If the given method name is not valid. It must be one of “self_confidence”, “normalized_margin”, or “confidence_weighted_entropy”.

Example

>>> from cleanlab.internal.multilabel_scorer import ClassLabelScorer
>>> ClassLabelScorer.from_str("self_confidence")
<ClassLabelScorer.SELF_CONFIDENCE: get_self_confidence_for_each_label>
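A name-to-member lookup like the one from_str performs can be sketched with the standard enum module. SketchScorer is hypothetical and stores plain strings rather than scoring functions; mapping the lower-case name to the member's upper-case name is an assumption about how the lookup works, not a description of cleanlab's code.

```python
from enum import Enum


class SketchScorer(Enum):
    """Hypothetical stand-in for ClassLabelScorer; values are placeholders."""

    SELF_CONFIDENCE = "self_confidence"
    NORMALIZED_MARGIN = "normalized_margin"
    CONFIDENCE_WEIGHTED_ENTROPY = "confidence_weighted_entropy"

    @classmethod
    def from_str(cls, method: str) -> "SketchScorer":
        try:
            # Enum members can be looked up by name with cls[...].
            return cls[method.upper()]
        except KeyError:
            valid = ", ".join(repr(m.name.lower()) for m in cls)
            raise ValueError(f"Invalid method name {method!r}; must be one of {valid}") from None


print(SketchScorer.from_str("self_confidence"))  # SketchScorer.SELF_CONFIDENCE
```

Converting the KeyError into a ValueError mirrors the documented contract: callers see one error type that lists the accepted names.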

class cleanlab.internal.multilabel_scorer.MultilabelScorer(base_scorer=ClassLabelScorer.SELF_CONFIDENCE, aggregator=Aggregator(method=exponential_moving_average, kwargs={'alpha': 0.8}), *, strict=True)

Bases: object

Aggregates label quality scores across different classes to produce one score per example in multi-label classification tasks.

Parameters:
• base_scorer (ClassLabelScorer) –

The method to compute the label quality scores for each class.

See the documentation for the ClassLabelScorer enum for more details.

• aggregator (Union[Aggregator, Callable]) –

The method to aggregate the label quality scores for each class into a single score for each datapoint.

Defaults to the EMA (exponential moving average) aggregator with forgetting factor alpha=0.8.

See the documentation for the Aggregator class for more details.

• strict (bool) – Flag for performing strict validation of the input data.

Methods:

• __call__(labels, pred_probs[, ...]) – Computes a quality score for each label in a multi-label classification problem based on out-of-sample predicted probabilities.
• aggregate(class_label_quality_scores, **kwargs) – Aggregates the label quality scores for each class into a single overall label quality score for each example.
• get_class_label_quality_scores(labels, ...) – Computes separate label quality scores for each class.

__call__(labels, pred_probs, base_scorer_kwargs=None, **aggregator_kwargs)

Computes a quality score for each label in a multi-label classification problem based on out-of-sample predicted probabilities. For each example, the label quality scores for each class are aggregated into a single overall label quality score.

Parameters:
• labels (ndarray) – A 2D array of shape (n_samples, n_labels) with binary labels.

• pred_probs (ndarray) – A 2D array of shape (n_samples, n_labels) with predicted probabilities.

• base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the base_scorer.

• aggregator_kwargs – Additional keyword arguments to pass to the aggregator.

Return type:

ndarray

Returns:

scores – A 1D array of shape (n_samples,) with the quality scores for each datapoint.

Examples

>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer, ClassLabelScorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
>>> scorer = MultilabelScorer()
>>> scores = scorer(labels, pred_probs)
>>> scores
array([0.9, 0.5])

>>> scorer = MultilabelScorer(
...     base_scorer = ClassLabelScorer.NORMALIZED_MARGIN,
...     aggregator = np.min,  # Use the "worst" label quality score for each example.
... )
>>> scores = scorer(labels, pred_probs)
>>> scores
array([0.9, 0.4])
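The second example can be reproduced by hand, which also shows why its first score matches the default scorer. Each class is treated as its own binary problem, so the probability of the given label is p when the label is 1 and 1 - p when it is 0. With two complementary probabilities, a normalized margin of the form (p_label - p_other + 1) / 2 reduces to that same value. The sketch assumes this one-vs-rest reduction and margin definition; it is an illustration, not cleanlab's implementation.

```python
import numpy as np

labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])

# Probability the model assigns to the given binary label of each class.
p_given = np.where(labels == 1, pred_probs, 1 - pred_probs)
p_other = 1 - p_given

# Binary normalized margin: (p_given - p_other + 1) / 2, which equals p_given.
margin = (p_given - p_other + 1) / 2

# Aggregate with the "worst" class, as np.min does in the example.
worst = margin.min(axis=1)
print(worst)  # [0.9 0.4]
```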

aggregate(class_label_quality_scores, **kwargs)

Aggregates the label quality scores for each class into a single overall label quality score for each example.

Parameters:
• class_label_quality_scores (ndarray) –

A 2D array of shape (n_samples, n_labels) with the label quality scores for each class.

• kwargs – Additional keyword arguments to pass to the aggregator.

Return type:

ndarray

Returns:

scores – A 1D array of shape (n_samples,) with the quality scores for each datapoint.

Examples

>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer
>>> import numpy as np
>>> class_label_quality_scores = np.array([[0.9, 0.9, 0.3],[0.4, 0.9, 0.6]])
>>> scorer = MultilabelScorer() # Use the default aggregator (exponential moving average) with default parameters.
>>> scores = scorer.aggregate(class_label_quality_scores)
>>> scores
array([0.42, 0.452])
>>> new_scores = scorer.aggregate(class_label_quality_scores, alpha=0.5) # Use the default aggregator with custom parameters.
>>> new_scores
array([0.6, 0.575])


Warning

Make sure that the keyword arguments correspond to the aggregation function in use. For example, the exponential_moving_average function supports an alpha keyword argument, but np.min does not.
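The failure mode behind this warning is easy to reproduce: forwarding alpha to np.min raises a TypeError, while a function written with a catch-all **_ parameter (as the signatures of exponential_moving_average and softmin in this module are) ignores keywords it does not use. tolerant_min is a hypothetical name for illustration.

```python
import numpy as np

scores = np.array([[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]])


def tolerant_min(s: np.ndarray, *, axis: int = 1, **_) -> np.ndarray:
    """Hypothetical aggregator that silently drops unsupported keyword arguments."""
    return np.min(s, axis=axis)


try:
    np.min(scores, axis=1, alpha=0.5)  # np.min has no `alpha` parameter
except TypeError as err:
    print("np.min rejected alpha:", err)

print(tolerant_min(scores, alpha=0.5))  # [0.3 0.4]
```

Ignoring unknown keywords trades safety for convenience: a misspelled argument such as alhpa would also be dropped without any error.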

get_class_label_quality_scores(labels, pred_probs, base_scorer_kwargs=None)

Computes separate label quality scores for each class.

Parameters:
• labels (ndarray) – A 2D array of shape (n_samples, n_labels) with binary labels.

• pred_probs (ndarray) – A 2D array of shape (n_samples, n_labels) with predicted probabilities.

• base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the base scoring-function.

Return type:

ndarray

Returns:

class_label_quality_scores – A 2D array of shape (n_samples, n_labels) with the quality scores for each label.

Examples

>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.7], [0.4, 0.1, 0.6]])
>>> scorer = MultilabelScorer() # Use the default base scorer (SELF_CONFIDENCE)
>>> class_label_quality_scores = scorer.get_class_label_quality_scores(labels, pred_probs)
>>> class_label_quality_scores
array([[0.9, 0.9, 0.3],
       [0.4, 0.9, 0.6]])

cleanlab.internal.multilabel_scorer.exponential_moving_average(s, *, alpha=None, axis=1, **_)

Exponential moving average (EMA) score aggregation function.

For a score vector s = (s_1, …, s_K) with K scores, the scores are first sorted in descending order, and the EMA is then computed recursively over the sorted values. The aggregated score is the final value, EMA_K, as defined in the note below.

Note

The recursive formula for the EMA at step $t = 2, \ldots, K$ is:

$\text{EMA}_t = \alpha \cdot s_t + (1 - \alpha) \cdot \text{EMA}_{t-1}, \qquad 0 \leq \alpha \leq 1$

We set $\text{EMA}_1 = s_1$, the largest score in the sorted vector $s$.

$\alpha$ is the “forgetting factor” that gives more weight to the most recent scores, and successively less weight to the previous scores.
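Unrolling the recursion gives a closed form that makes the weighting explicit. The first line follows by substituting each EMA step into the next; the second shows that the weights sum to 1; the third checks the result against the example further down (s = (0.1, 0.2, 0.3), which sorts to (0.3, 0.2, 0.1), with α = 0.5).

```latex
\begin{aligned}
\text{EMA}_K &= (1-\alpha)^{K-1} s_1 + \sum_{t=2}^{K} \alpha (1-\alpha)^{K-t} s_t \\
(1-\alpha)^{K-1} + \alpha \sum_{j=0}^{K-2} (1-\alpha)^{j} &= (1-\alpha)^{K-1} + \bigl(1 - (1-\alpha)^{K-1}\bigr) = 1 \\
K = 3,\ \alpha = 0.5:\quad \text{EMA}_3 &= 0.25 \cdot 0.3 + 0.25 \cdot 0.2 + 0.5 \cdot 0.1 = 0.175
\end{aligned}
```

Because the weights sum to 1, $\text{EMA}_K$ is a weighted average of the scores and always lies between the smallest and the largest score. The smallest score $s_K$ receives weight $\alpha$, so a larger $\alpha$ pulls the aggregate toward the worst class.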

Parameters:
• s (ndarray) – Scores to be transformed.

• alpha (Optional[float]) –

Discount factor that determines the weight of the previous EMA score. Higher alpha means that the previous EMA score has a lower weight while the current score has a higher weight.

Its value must be in the interval [0, 1].

If alpha is None, it is set to 2 / (K + 1) where K is the number of scores.

• axis (int) – Axis along which the scores are sorted.

Return type:

ndarray

Returns:

s_ema – Exponential moving average score.

Examples

>>> from cleanlab.internal.multilabel_scorer import exponential_moving_average
>>> import numpy as np
>>> s = np.array([[0.1, 0.2, 0.3]])
>>> exponential_moving_average(s, alpha=0.5)
array([0.175])
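The recursion in the note translates directly into a loop over the sorted columns. This is a minimal sketch, not cleanlab's code: ema_aggregate is a hypothetical name, it always reduces along axis 1 of a 2D array (promoting 1D input to a single row), and it uses the documented default alpha = 2 / (K + 1).

```python
from typing import Optional

import numpy as np


def ema_aggregate(s: np.ndarray, alpha: Optional[float] = None) -> np.ndarray:
    """EMA over each row of a 2D score array, with scores sorted in descending order."""
    s = np.atleast_2d(s)
    K = s.shape[1]
    if alpha is None:
        alpha = 2 / (K + 1)  # documented default
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be in [0, 1]")
    s_sorted = -np.sort(-s, axis=1)  # descending: column 0 holds the largest score
    ema = s_sorted[:, 0]             # EMA_1 = s_1
    for t in range(1, K):
        ema = alpha * s_sorted[:, t] + (1 - alpha) * ema
    return ema


print(ema_aggregate(np.array([0.1, 0.2, 0.3]), alpha=0.5))  # [0.175]
scores = np.array([[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]])
print(ema_aggregate(scores, alpha=0.8))  # approximately [0.42, 0.452], as in the aggregate() example
```

Setting alpha=1 returns each row's smallest score, which is the same as aggregating with np.min.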

cleanlab.internal.multilabel_scorer.get_cross_validated_multilabel_pred_probs(X, labels, *, clf, cv)

Get predicted probabilities for a multi-label classifier via cross-validation.

Note

The labels are reformatted to a “multi-class” format internally to support a wider range of cross-validation strategies. If you have a multi-label dataset with K classes, the labels are reformatted to a “multi-class” format with up to 2**K classes (i.e. the number of possible class-assignment configurations). It is unlikely that you’ll see all 2**K configurations in your dataset.
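One way to picture this reformatting is to read each row of binary labels as a K-bit integer, so that every distinct class-assignment configuration becomes one multi-class label. The bit-weight encoding below is an assumption chosen for illustration; the encoding used internally may differ. The benefit is that splitters such as scikit-learn's StratifiedKFold, which expect a 1D vector of class labels, can then be applied.

```python
import numpy as np

labels = np.array([[0, 1], [1, 1], [0, 0], [0, 1]])  # N=4 examples, K=2 classes
K = labels.shape[1]

# Read each row as a binary number: class k contributes 2**k when present.
powers = 2 ** np.arange(K)
multiclass_labels = labels @ powers  # one integer per configuration, in [0, 2**K)
print(multiclass_labels)             # [2 3 0 2]

# Only configurations that actually occur become classes.
print(np.unique(multiclass_labels).size, "of", 2 ** K, "possible configurations")

# Decoding recovers the original binary label matrix.
decoded = (multiclass_labels[:, None] // powers) % 2
assert np.array_equal(decoded, labels)
```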

Parameters:
• X – A 2d array of features of shape (N, M) where N is the number of samples and M is the number of features.

• labels (ndarray) – A 2d array of binarized multi-labels of shape (N, K) where N is the number of samples and K is the number of classes.

• clf – A multi-label classifier with a predict_proba method.

• cv – A cross-validation splitter with a split method that returns a generator of train/test indices.

Return type:

ndarray

Returns:

pred_probs – A 2d array of predicted probabilities of shape (N, K) where N is the number of samples and K is the number of classes.

Note

The predicted probabilities are not expected to sum to 1 for each sample in the case of multi-label classification.

Examples

>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from cleanlab.internal.multilabel_scorer import get_cross_validated_multilabel_pred_probs
>>> np.random.seed(0)
>>> X = np.random.rand(16, 2)
>>> labels = np.random.randint(0, 2, size=(16, 2))
>>> clf = OneVsRestClassifier(RandomForestClassifier())
>>> cv = KFold(n_splits=2)
>>> get_cross_validated_multilabel_pred_probs(X, labels, clf=clf, cv=cv)

cleanlab.internal.multilabel_scorer.get_label_quality_scores(labels, pred_probs, *, method=MultilabelScorer(), base_scorer_kwargs=None, **aggregator_kwargs)

Computes a quality score for each label in a multi-label classification problem based on out-of-sample predicted probabilities.

Parameters:
• labels – A 2D array of shape (N, K) with binary labels.

• pred_probs – A 2D array of shape (N, K) with predicted probabilities.

• method (MultilabelScorer) – A scoring+aggregation method for computing the label quality scores of examples in a multi-label classification setting.

• base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the class-label scorer.

• aggregator_kwargs – Additional keyword arguments to pass to the aggregator.

Return type:

ndarray

Returns:

scores – A 1D array of shape (N,) with the quality scores for each datapoint.

Examples

>>> import cleanlab.internal.multilabel_scorer as ml_scorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
>>> scores = ml_scorer.get_label_quality_scores(labels, pred_probs, method=ml_scorer.MultilabelScorer())
>>> scores
array([0.9, 0.5])


See also

MultilabelScorer – See the documentation for the MultilabelScorer class for more examples of scoring methods and aggregation methods.

cleanlab.internal.multilabel_scorer.multilabel_py(y)

Compute the prior probability of each label in a multi-label classification problem.

Parameters:

y (ndarray) – A 2d array of binarized multi-labels of shape (N, K) where N is the number of samples and K is the number of classes.

Return type:

ndarray

Returns:

py – A 2d array of prior probabilities of shape (K, 2), where the first column is the probability of the label being 0 and the second column is the probability of the label being 1 for each class.

Examples

>>> from cleanlab.internal.multilabel_scorer import multilabel_py
>>> import numpy as np
>>> y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
>>> multilabel_py(y)
array([[0.5, 0.5],
       [0.5, 0.5]])
>>> y = np.array([[0, 0], [0, 1], [1, 0], [1, 0], [1, 0]])
>>> multilabel_py(y)
array([[0.4, 0.6],
       [0.8, 0.2]])
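The documented outputs can be reproduced from the column means of y: the fraction of ones in column k estimates the probability that label k is 1, and its complement gives the probability that it is 0. A minimal sketch, assuming this empirical-frequency definition (label_priors is a hypothetical name):

```python
import numpy as np


def label_priors(y: np.ndarray) -> np.ndarray:
    """Hypothetical re-implementation: (K, 2) array of [P(label=0), P(label=1)] per class."""
    p1 = y.mean(axis=0)  # fraction of examples in which each class is present
    return np.column_stack([1 - p1, p1])


y = np.array([[0, 0], [0, 1], [1, 0], [1, 0], [1, 0]])
print(label_priors(y))  # rows: class 0 -> [0.4, 0.6], class 1 -> [0.8, 0.2]
```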

cleanlab.internal.multilabel_scorer.softmin(s, *, temperature=0.1, axis=1, **_)

Softmin score aggregation function.

Parameters:
• s (ndarray) – Input array.

• temperature (float) – Temperature parameter. Too small values may cause numerical underflow and NaN scores.

• axis (int) – Axis along which to apply the function.

Return type:

ndarray

Returns:

Softmin score.
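The formula is not spelled out above. A common definition of softmin aggregation is a weighted average of the scores with weights softmax(-s / temperature): as the temperature shrinks, the result approaches the minimum score, and as it grows, the result approaches the plain mean. The sketch below assumes that definition (softmin_aggregate is a hypothetical name) and is not cleanlab's implementation. It also shifts the logits by their row maximum before exponentiating, a standard stabilization that avoids the underflow described for small temperature values.

```python
import numpy as np


def softmin_aggregate(s: np.ndarray, temperature: float = 0.1, axis: int = 1) -> np.ndarray:
    """Weighted average of scores with weights softmax(-s / temperature)."""
    logits = -s / temperature
    # Subtracting the max keeps exp() in range, so tiny temperatures do not give 0/0.
    logits = logits - logits.max(axis=axis, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=axis, keepdims=True)
    return (weights * s).sum(axis=axis)


scores = np.array([[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]])
print(softmin_aggregate(scores))                     # close to each row's minimum (0.3, 0.4)
print(softmin_aggregate(scores, temperature=100.0))  # close to each row's mean
```

The temperature therefore interpolates between a strict "worst class" aggregation and an unweighted average.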