multilabel_scorer

Helper classes and functions used internally to compute label quality scores in multi-label classification.

Classes:

ClassLabelScorer(value)

Enum for the different methods to compute label quality scores.

Aggregator(method, **kwargs)

Helper class for aggregating the label quality scores for each class into a single score for each datapoint.

MultilabelScorer([base_scorer, aggregator, ...])

Aggregates label quality scores across different classes to produce one score per example in multi-label classification tasks.

Functions:

exponential_moving_average(s, *[, alpha, axis])

Exponential moving average (EMA) score aggregation function.

softmin(s, *[, temperature, axis])

Softmin score aggregation function.

get_label_quality_scores(labels, pred_probs, *)

Computes a quality score for each label in a multi-label classification problem based on out-of-sample predicted probabilities.

multilabel_py(y)

Compute the prior probability of each label in a multi-label classification problem.

get_cross_validated_multilabel_pred_probs(X, ...)

Get predicted probabilities for a multi-label classifier via cross-validation.

class cleanlab.internal.multilabel_scorer.ClassLabelScorer(value)

Bases: Enum

Enum for the different methods to compute label quality scores.

Attributes:

SELF_CONFIDENCE(*args, **kwargs)

Returns the self-confidence label-quality score for each datapoint.

NORMALIZED_MARGIN(*args, **kwargs)

Returns the "normalized margin" label-quality score for each datapoint.

CONFIDENCE_WEIGHTED_ENTROPY(*args, **kwargs)

Returns the "confidence weighted entropy" label-quality score for each datapoint.

Methods:

__call__(labels, pred_probs, **kwargs)

Returns the label-quality scores for each datapoint based on the given labels and predicted probabilities.

from_str(method)

Constructs an instance of the ClassLabelScorer enum based on the given method name.

SELF_CONFIDENCE(*args, **kwargs) = get_self_confidence_for_each_label

Returns the self-confidence label-quality score for each datapoint.

NORMALIZED_MARGIN(*args, **kwargs) = get_normalized_margin_for_each_label

Returns the “normalized margin” label-quality score for each datapoint.

CONFIDENCE_WEIGHTED_ENTROPY(*args, **kwargs) = get_confidence_weighted_entropy_for_each_label

Returns the “confidence weighted entropy” label-quality score for each datapoint.

__call__(labels, pred_probs, **kwargs)

Returns the label-quality scores for each datapoint based on the given labels and predicted probabilities.

See the documentation for each method for more details.

Example

>>> import numpy as np
>>> from cleanlab.internal.multilabel_scorer import ClassLabelScorer
>>> labels = np.array([0, 0, 0, 1, 1, 1])
>>> pred_probs = np.array([
...     [0.9, 0.1],
...     [0.8, 0.2],
...     [0.7, 0.3],
...     [0.2, 0.8],
...     [0.75, 0.25],
...     [0.1, 0.9],
... ])
>>> ClassLabelScorer.SELF_CONFIDENCE(labels, pred_probs)
array([0.9 , 0.8 , 0.7 , 0.8 , 0.25, 0.9 ])
Return type:

ndarray
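
The example output above is consistent with reading self-confidence as the predicted probability assigned to each datapoint's given label. The following is a minimal NumPy sketch under that assumption; the helper name `self_confidence_sketch` is hypothetical and is not part of cleanlab.

```python
import numpy as np


def self_confidence_sketch(labels, pred_probs):
    """Pick, for every row, the predicted probability of its given label.

    Hypothetical helper for illustration only (not a cleanlab function).
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    row_indices = np.arange(len(labels))
    # Fancy indexing selects pred_probs[i, labels[i]] for each row i.
    return pred_probs[row_indices, labels]


labels = np.array([0, 0, 0, 1, 1, 1])
pred_probs = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.75, 0.25],
    [0.1, 0.9],
])
sketch_scores = self_confidence_sketch(labels, pred_probs)
print(sketch_scores)  # matches the values shown in the example above
```

The fifth datapoint gets the lowest score (0.25) because the model assigns most of its probability to the other class.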

classmethod from_str(method)

Constructs an instance of the ClassLabelScorer enum based on the given method name.

Parameters:

method (str) – The name of the scoring method to use.

Return type:

ClassLabelScorer

Returns:

scorer – An instance of the ClassLabelScorer enum.

Raises:

ValueError – If the given method name is not valid. It must be one of the following: “self_confidence”, “normalized_margin”, or “confidence_weighted_entropy”.

Example

>>> from cleanlab.internal.multilabel_scorer import ClassLabelScorer
>>> ClassLabelScorer.from_str("self_confidence")
<ClassLabelScorer.SELF_CONFIDENCE: get_self_confidence_for_each_label>

cleanlab.internal.multilabel_scorer.exponential_moving_average(s, *, alpha=None, axis=1, **_)

Exponential moving average (EMA) score aggregation function.

For a score vector s = (s_1, …, s_K) with K scores, the values are sorted in descending order and the exponential moving average of the last score is calculated, denoted as EMA_K according to the note below.

Note

The recursive formula for the EMA at step $t = 2, \ldots, K$ is:

$$\text{EMA}_t = \alpha \cdot s_t + (1 - \alpha) \cdot \text{EMA}_{t-1}, \qquad 0 \leq \alpha \leq 1$$

We set $\text{EMA}_1 = s_1$ as the largest score in the sorted vector s.

$\alpha$ is the “forgetting factor” that gives more weight to the most recent scores, and successively less weight to the previous scores.

Parameters:
  • s (ndarray) – Scores to be transformed.

  • alpha (Optional[float]) –

    Discount factor that determines the weight of the previous EMA score. Higher alpha means that the previous EMA score has a lower weight while the current score has a higher weight.

    Its value must be in the interval [0, 1].

    If alpha is None, it is set to 2 / (K + 1) where K is the number of scores.

  • axis (int) – Axis along which the scores are sorted.

Return type:

ndarray

Returns:

s_ema – Exponential moving average score.
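
The recursion in the note, together with the default for alpha described in the parameters, can be sketched in plain NumPy. This is an illustrative re-implementation under the definitions given on this page, not the library's source; the name `ema_aggregate` is hypothetical and the sketch always aggregates along axis 1.

```python
import numpy as np


def ema_aggregate(scores, alpha=None):
    """Illustrative EMA aggregation across the columns of a 2D score array.

    Hypothetical helper for illustration only (not a cleanlab function).
    """
    scores = np.asarray(scores, dtype=float)
    n_scores = scores.shape[1]
    if alpha is None:
        # Default described in the parameter list: 2 / (K + 1).
        alpha = 2 / (n_scores + 1)
    if not 0 <= alpha <= 1:
        raise ValueError(f"alpha must be in the interval [0, 1], got {alpha}")
    # Sort each row in descending order, so the smallest score comes last.
    sorted_scores = -np.sort(-scores, axis=1)
    ema = sorted_scores[:, 0]  # EMA_1 = s_1, the largest score
    for t in range(1, n_scores):
        ema = alpha * sorted_scores[:, t] + (1 - alpha) * ema
    return ema


ema_result = ema_aggregate([[0.1, 0.2, 0.3]], alpha=0.5)
print(ema_result)  # approximately [0.175]
```

Because the scores are sorted in descending order, the lowest score enters the recursion last and receives the largest weight, so a single poorly supported class pulls the aggregated score down.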

Examples

>>> from cleanlab.internal.multilabel_scorer import exponential_moving_average
>>> import numpy as np
>>> s = np.array([[0.1, 0.2, 0.3]])
>>> exponential_moving_average(s, alpha=0.5)
array([0.175])

cleanlab.internal.multilabel_scorer.softmin(s, *, temperature=0.1, axis=1, **_)

Softmin score aggregation function.

Parameters:
  • s (ndarray) – Input array.

  • temperature (float) – Temperature parameter. Values that are too small may cause numerical underflow and NaN scores.

  • axis (int) – Axis along which to apply the function.

Return type:

ndarray

Returns:

Softmin score.
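
This page does not state the formula behind softmin. One common formulation, shown here purely as an assumption, is a weighted average of the scores in which lower scores receive exponentially larger weights, controlled by the temperature. The name `softmin_sketch` is hypothetical, and the library's implementation may differ in its details.

```python
import numpy as np


def softmin_sketch(scores, temperature=0.1):
    """Weighted average of each row, with weights softmax(-scores / temperature).

    Hypothetical helper for illustration only (not a cleanlab function).
    """
    scores = np.asarray(scores, dtype=float)
    logits = -scores / temperature
    # Subtracting the row maximum leaves the weights unchanged and avoids overflow.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return (weights * scores).sum(axis=1)


class_scores = np.array([[0.9, 0.9, 0.3], [0.5, 0.5, 0.5]])
softmin_result = softmin_sketch(class_scores)
print(softmin_result)  # first row lands close to its minimum; second row stays at 0.5
```

Under this formulation, a lower temperature moves the result toward the row minimum and a higher temperature moves it toward the row mean.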

class cleanlab.internal.multilabel_scorer.Aggregator(method, **kwargs)

Bases: object

Helper class for aggregating the label quality scores for each class into a single score for each datapoint.

Parameters:
  • method (Union[str, Callable]) – The method used to aggregate the label quality scores for each class into a single score. If passed as a callable, your function should take in a 1D array of K scores and return a single aggregated score. See exponential_moving_average for an example of such a function. Alternatively, this can be a str naming a built-in function; possible values are the keys of the Aggregator’s possible_methods attribute.

  • kwargs – Additional keyword arguments to pass to the aggregation function when it is called.
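
As a sketch of what a custom aggregation callable might look like, the function below mirrors the signature pattern of exponential_moving_average shown on this page (scores first, keyword-only options, an `axis` argument, and a catch-all for unused keywords). The name `mean_of_lowest` and its `n_lowest` option are hypothetical, and the sketch assumes the function receives a 2D array of shape (N, K).

```python
import numpy as np


def mean_of_lowest(s, *, n_lowest=2, axis=1, **_):
    """Average the `n_lowest` smallest scores along `axis`.

    Hypothetical aggregation function for illustration only.
    """
    s = np.asarray(s, dtype=float)
    # Ascending sort puts the smallest scores first along the chosen axis.
    sorted_scores = np.sort(s, axis=axis)
    lowest = sorted_scores.take(np.arange(n_lowest), axis=axis)
    return lowest.mean(axis=axis)


class_scores = np.array([[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]])
lowest_mean = mean_of_lowest(class_scores)
print(lowest_mean)  # approximately [0.6, 0.5]
```

If that calling convention holds, a function of this shape could presumably be wrapped as `Aggregator(mean_of_lowest, n_lowest=2)`, with `n_lowest` forwarded as one of the additional keyword arguments.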

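Attributes:

possible_methods

Maps the name of each built-in aggregation method to the function that implements it.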

Methods:

__call__(scores, **kwargs)

Returns the label quality scores for each datapoint based on the given label quality scores for each class.

possible_methods: Dict[str, Callable[[...], ndarray]] = {'exponential_moving_average': <function exponential_moving_average>, 'softmin': <function softmin>}

__call__(scores, **kwargs)

Returns the label quality scores for each datapoint based on the given label quality scores for each class.

Parameters:

scores (ndarray) – The label quality scores for each class.

Return type:

ndarray

Returns:

aggregated_scores – A single label quality score for each datapoint.

class cleanlab.internal.multilabel_scorer.MultilabelScorer(base_scorer=ClassLabelScorer.SELF_CONFIDENCE, aggregator=Aggregator(method=exponential_moving_average, kwargs={'alpha': 0.8}), *, strict=True)

Bases: object

Aggregates label quality scores across different classes to produce one score per example in multi-label classification tasks.

Parameters:
  • base_scorer (ClassLabelScorer) –

    The method to compute the label quality scores for each class.

    See the documentation for the ClassLabelScorer enum for more details.

  • aggregator (Union[Aggregator, Callable]) –

    The method to aggregate the label quality scores for each class into a single score for each datapoint.

    Defaults to the EMA (exponential moving average) aggregator with forgetting factor alpha=0.8.

    See the documentation for the Aggregator class for more details.

  • strict (bool) – Flag for performing strict validation of the input data.

Methods:

__call__(labels, pred_probs[, ...])

Computes a quality score for each label in a multi-label classification problem based on out-of-sample predicted probabilities.

aggregate(class_label_quality_scores, **kwargs)

Aggregates the label quality scores for each class into a single overall label quality score for each example.

get_class_label_quality_scores(labels, ...)

Computes separate label quality scores for each class.

__call__(labels, pred_probs, base_scorer_kwargs=None, **aggregator_kwargs)

Computes a quality score for each label in a multi-label classification problem based on out-of-sample predicted probabilities. For each example, the label quality scores for each class are aggregated into a single overall label quality score.

Parameters:
  • labels (ndarray) – A 2D array of shape (n_samples, n_labels) with binary labels.

  • pred_probs (ndarray) – A 2D array of shape (n_samples, n_labels) with predicted probabilities.

  • base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the base_scorer.

  • aggregator_kwargs – Additional keyword arguments to pass to the aggregator.

Return type:

ndarray

Returns:

scores – A 1D array of shape (n_samples,) with the quality scores for each datapoint.

Examples

>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer, ClassLabelScorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
>>> scorer = MultilabelScorer()
>>> scores = scorer(labels, pred_probs)
>>> scores
array([0.9, 0.5])
>>> scorer = MultilabelScorer(
...     base_scorer = ClassLabelScorer.NORMALIZED_MARGIN,
...     aggregator = np.min,  # Use the "worst" label quality score for each example.
... )
>>> scores = scorer(labels, pred_probs)
>>> scores
array([0.9, 0.4])

aggregate(class_label_quality_scores, **kwargs)

Aggregates the label quality scores for each class into a single overall label quality score for each example.

Parameters:
  • class_label_quality_scores (ndarray) –

    A 2D array of shape (n_samples, n_labels) with the label quality scores for each class.

  • kwargs – Additional keyword arguments to pass to the aggregator.

Return type:

ndarray

Returns:

scores – A 1D array of shape (n_samples,) with the quality scores for each datapoint.

Examples

>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer
>>> import numpy as np
>>> class_label_quality_scores = np.array([[0.9, 0.9, 0.3],[0.4, 0.9, 0.6]])
>>> scorer = MultilabelScorer() # Use the default aggregator (exponential moving average) with default parameters.
>>> scores = scorer.aggregate(class_label_quality_scores)
>>> scores
array([0.42, 0.452])
>>> new_scores = scorer.aggregate(class_label_quality_scores, alpha=0.5) # Use the default aggregator with custom parameters.
>>> new_scores
array([0.6, 0.575])

Warning

Make sure that the keyword arguments match the aggregation function in use. For example, the exponential_moving_average function supports an alpha keyword argument, but np.min does not.
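
The mismatch described in this warning can be reproduced with NumPy alone. The sketch below calls np.min directly rather than going through the scorer, so it only illustrates why an unsupported keyword fails. It does not show how the scorer itself reports the error.

```python
import numpy as np

class_scores = np.array([[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]])

# np.min accepts `axis`, so it can reduce each row to its worst score.
worst_scores = np.min(class_scores, axis=1)
print(worst_scores)  # [0.3 0.4]

# np.min has no `alpha` parameter, so passing one raises a TypeError.
alpha_error = None
try:
    np.min(class_scores, axis=1, alpha=0.5)
except TypeError as error:
    alpha_error = error
    print(f"TypeError: {error}")
```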

get_class_label_quality_scores(labels, pred_probs, base_scorer_kwargs=None)

Computes separate label quality scores for each class.

Parameters:
  • labels (ndarray) – A 2D array of shape (n_samples, n_labels) with binary labels.

  • pred_probs (ndarray) – A 2D array of shape (n_samples, n_labels) with predicted probabilities.

  • base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the base scoring-function.

Return type:

ndarray

Returns:

class_label_quality_scores – A 2D array of shape (n_samples, n_labels) with the quality scores for each label.
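
For the default SELF_CONFIDENCE scorer, the per-class scores can be read as follows: each class is treated as its own binary problem, and the score is the probability the model assigns to the given label (p when the label is 1, 1 - p when it is 0). The sketch below implements that reading, which is an assumption inferred from the example on this page; `binary_self_confidence` is a hypothetical name.

```python
import numpy as np


def binary_self_confidence(labels, pred_probs):
    """Per-class probability of the given binary label.

    Hypothetical helper for illustration only (not a cleanlab function).
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    # pred_probs[i, k] is taken as P(label k = 1) for datapoint i.
    return np.where(labels == 1, pred_probs, 1 - pred_probs)


labels = np.array([[0, 1, 0], [1, 0, 1]])
pred_probs = np.array([[0.1, 0.9, 0.7], [0.4, 0.1, 0.6]])
per_class_scores = binary_self_confidence(labels, pred_probs)
print(per_class_scores)  # approximately [[0.9, 0.9, 0.3], [0.4, 0.9, 0.6]]
```

The low score of 0.3 in the first row flags the third class, where the label is 0 but the model predicts a probability of 0.7 for that class.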

Examples

>>> from cleanlab.internal.multilabel_scorer import MultilabelScorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.7], [0.4, 0.1, 0.6]])
>>> scorer = MultilabelScorer() # Use the default base scorer (SELF_CONFIDENCE)
>>> class_label_quality_scores = scorer.get_class_label_quality_scores(labels, pred_probs)
>>> class_label_quality_scores
array([[0.9, 0.9, 0.3],
       [0.4, 0.9, 0.6]])

cleanlab.internal.multilabel_scorer.get_label_quality_scores(labels, pred_probs, *, method=<cleanlab.internal.multilabel_scorer.MultilabelScorer object>, base_scorer_kwargs=None, **aggregator_kwargs)

Computes a quality score for each label in a multi-label classification problem based on out-of-sample predicted probabilities.

Parameters:
  • labels – A 2D array of shape (N, K) with binary labels.

  • pred_probs – A 2D array of shape (N, K) with predicted probabilities.

  • method (MultilabelScorer) – A scoring+aggregation method for computing the label quality scores of examples in a multi-label classification setting.

  • base_scorer_kwargs (Optional[dict]) – Keyword arguments to pass to the class-label scorer.

  • aggregator_kwargs – Additional keyword arguments to pass to the aggregator.

Return type:

ndarray

Returns:

scores – A 1D array of shape (N,) with the quality scores for each datapoint.

Examples

>>> import cleanlab.internal.multilabel_scorer as ml_scorer
>>> import numpy as np
>>> labels = np.array([[0, 1, 0], [1, 0, 1]])
>>> pred_probs = np.array([[0.1, 0.9, 0.1], [0.4, 0.1, 0.9]])
>>> scores = ml_scorer.get_label_quality_scores(labels, pred_probs, method=ml_scorer.MultilabelScorer())
>>> scores
array([0.9, 0.5])

See also

MultilabelScorer

See the documentation for the MultilabelScorer class for more examples of scoring methods and aggregation methods.

cleanlab.internal.multilabel_scorer.multilabel_py(y)

Compute the prior probability of each label in a multi-label classification problem.

Parameters:

y (ndarray) – A 2d array of binarized multi-labels of shape (N, K) where N is the number of samples and K is the number of classes.

Return type:

ndarray

Returns:

py – A 2d array of prior probabilities of shape (K,2) where the first column is the probability of the label being 0 and the second column is the probability of the label being 1 for each class.
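
The (K, 2) layout described above can be sketched with NumPy: the column mean of y gives the fraction of samples in which each label is 1, and its complement gives the fraction in which it is 0. The name `label_priors_sketch` is hypothetical, and this is only an illustration of the output format.

```python
import numpy as np


def label_priors_sketch(y):
    """Return a (K, 2) array with columns [P(label = 0), P(label = 1)].

    Hypothetical helper for illustration only (not a cleanlab function).
    """
    y = np.asarray(y)
    prob_one = y.mean(axis=0)  # fraction of samples with each label set to 1
    return np.column_stack([1 - prob_one, prob_one])


y = np.array([[0, 0], [0, 1], [1, 0], [1, 0], [1, 0]])
priors = label_priors_sketch(y)
print(priors)  # approximately [[0.4, 0.6], [0.8, 0.2]]
```

Each row sums to 1 because every class is treated as an independent binary variable.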

Examples

>>> from cleanlab.internal.multilabel_scorer import multilabel_py
>>> import numpy as np
>>> y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
>>> multilabel_py(y)
array([[0.5, 0.5],
       [0.5, 0.5]])
>>> y = np.array([[0, 0], [0, 1], [1, 0], [1, 0], [1, 0]])
>>> multilabel_py(y)
array([[0.4, 0.6],
       [0.8, 0.2]])

cleanlab.internal.multilabel_scorer.get_cross_validated_multilabel_pred_probs(X, labels, *, clf, cv)

Get predicted probabilities for a multi-label classifier via cross-validation.

Note

The labels are reformatted to a “multi-class” format internally to support a wider range of cross-validation strategies. If you have a multi-label dataset with K classes, the labels are reformatted to a “multi-class” format with up to 2**K classes (i.e. the number of possible class-assignment configurations). It is unlikely that you’ll see all 2**K configurations in your dataset.
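
To make the 2**K figure concrete, the sketch below maps each row of binary labels to a single integer by reading the row as a binary number. This is only one possible encoding, chosen for illustration; the encoding the library uses internally is not shown on this page and may differ. The name `to_configuration_ids` is hypothetical.

```python
import numpy as np


def to_configuration_ids(labels):
    """Map each binary label row to one integer in the range [0, 2**K).

    Hypothetical helper for illustration only (not a cleanlab function).
    """
    labels = np.asarray(labels, dtype=int)
    n_classes = labels.shape[1]
    # Column k contributes 2**k, so every distinct row gets a distinct id.
    weights = 2 ** np.arange(n_classes)
    return labels @ weights


labels = np.array([[0, 0], [0, 1], [1, 0], [1, 0], [1, 1]])
configuration_ids = to_configuration_ids(labels)
print(configuration_ids)  # [0 2 1 1 3]
```

A single id per sample of this kind is what would let a stratified splitter treat each label configuration as one class.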

Parameters:
  • X – A 2d array of features of shape (N, M) where N is the number of samples and M is the number of features.

  • labels (ndarray) – A 2d array of binarized multi-labels of shape (N, K) where N is the number of samples and K is the number of classes.

  • clf – A multi-label classifier with a predict_proba method.

  • cv – A cross-validation splitter with a split method that returns a generator of train/test indices.

Return type:

ndarray

Returns:

pred_probs – A 2d array of predicted probabilities of shape (N, K) where N is the number of samples and K is the number of classes.

Note

The predicted probabilities are not expected to sum to 1 for each sample in the case of multi-label classification.

Examples

>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.ensemble import RandomForestClassifier
>>> from cleanlab.internal.multilabel_scorer import get_cross_validated_multilabel_pred_probs
>>> np.random.seed(0)
>>> X = np.random.rand(16, 2)
>>> labels = np.random.randint(0, 2, size=(16, 2))
>>> clf = OneVsRestClassifier(RandomForestClassifier())
>>> cv = KFold(n_splits=2)
>>> get_cross_validated_multilabel_pred_probs(X, labels, clf=clf, cv=cv)