Helper methods used internally for computing label quality scores.


get_normalized_entropy(pred_probs[, ...])

Return the normalized entropy of pred_probs.

cleanlab.internal.label_quality_utils.get_normalized_entropy(pred_probs, min_allowed_prob=None)

Return the normalized entropy of pred_probs.

Normalized entropy is between 0 and 1. Higher values of entropy indicate higher uncertainty in the model’s prediction of the correct label.
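As a rough illustration, the sketch below computes a normalized entropy with NumPy, assuming the standard definition: the Shannon entropy of each row divided by log(K), where K is the number of classes. The function name `normalized_entropy_sketch` is hypothetical, and this is not necessarily how cleanlab implements it.

```python
import numpy as np

def normalized_entropy_sketch(pred_probs):
    """Illustrative only: row-wise Shannon entropy divided by log(K)."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    num_classes = pred_probs.shape[1]
    # Treat 0 * log(0) as 0 (its limiting value), so zeros contribute nothing.
    safe = np.where(pred_probs > 0, pred_probs, 1.0)
    plogp = pred_probs * np.log(safe)
    # Dividing by log(K) rescales the entropy to the range [0, 1].
    return -plogp.sum(axis=1) / np.log(num_classes)

probs = np.array([
    [1.0, 0.0, 0.0],    # fully confident prediction: entropy near 0
    [1/3, 1/3, 1/3],    # uniform prediction: entropy near 1
    [0.7, 0.2, 0.1],    # in between
])
scores = normalized_entropy_sketch(probs)
```

Here the one-hot row scores at the bottom of the range and the uniform row at the top, which matches the reading of higher entropy as higher uncertainty.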

Read more about normalized entropy on Wikipedia.

Normalized entropy is used in active learning for uncertainty sampling, where the examples with the highest entropy are selected for labeling.

Unlike label-quality scores, entropy depends only on the model’s predictions, not on the given label.
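A minimal sketch of entropy-based uncertainty sampling follows, assuming the standard normalized-entropy definition (Shannon entropy divided by log(K)). Both helper names and the example probabilities are hypothetical. Note that no labels appear anywhere in the computation.

```python
import numpy as np

def normalized_entropy_sketch(pred_probs):
    """Illustrative only: row-wise Shannon entropy divided by log(K)."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    safe = np.where(pred_probs > 0, pred_probs, 1.0)  # 0 * log(0) -> 0
    return -(pred_probs * np.log(safe)).sum(axis=1) / np.log(pred_probs.shape[1])

def most_uncertain_indices(pred_probs, batch_size):
    """Hypothetical helper: indices of the batch_size highest-entropy rows."""
    entropy = normalized_entropy_sketch(pred_probs)
    # Sort by descending entropy; a stable sort keeps ties in original order.
    return np.argsort(-entropy, kind="stable")[:batch_size]

unlabeled_probs = np.array([
    [0.98, 0.01, 0.01],   # confident
    [0.40, 0.35, 0.25],   # uncertain
    [0.60, 0.30, 0.10],   # moderately uncertain
    [0.34, 0.33, 0.33],   # close to uniform
])
query = most_uncertain_indices(unlabeled_probs, batch_size=2)
```

With these made-up inputs, the near-uniform row (index 3) is chosen first, followed by index 1.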

Parameters:

  • pred_probs (np.ndarray (shape (N, K))) – Each row of this matrix corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class: P(label=k|x)

  • min_allowed_prob (float, default: None, deprecated) –

    Minimum allowed probability value. If set to a value other than None (the default), entries of pred_probs below this value are clipped to this value.

    Deprecated since version 2.5.0: This keyword is deprecated and should be left at its default. The entropy is well-behaved even if pred_probs contains zeros, so clipping is unnecessary and (slightly) changes the results.
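The effect of clipping can be illustrated with a small sketch. It assumes the standard normalized-entropy definition with 0 * log(0) taken as 0, and uses a hypothetical threshold of 1e-6. It is not cleanlab's own code.

```python
import numpy as np

def normalized_entropy_sketch(pred_probs):
    """Illustrative only: row-wise Shannon entropy divided by log(K)."""
    # 0 * log(0) is taken to be 0, its limiting value.
    safe = np.where(pred_probs > 0, pred_probs, 1.0)
    return -(pred_probs * np.log(safe)).sum(axis=1) / np.log(pred_probs.shape[1])

probs = np.array([[1.0, 0.0, 0.0]])

# Without clipping: a one-hot row has zero entropy, with no NaN or inf.
unclipped = normalized_entropy_sketch(probs)

# With clipping (hypothetical threshold): zeros are raised to 1e-6,
# so the entropy of the same row becomes slightly positive.
clipped = normalized_entropy_sketch(np.clip(probs, 1e-6, None))
```

In this sketch the unclipped result is finite and zero, while the clipped result is a small positive number, which is the kind of slight change the deprecation note describes.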

Return type:

np.ndarray

Returns:

entropy (np.ndarray (shape (N, ))) – Each element is the normalized entropy of the corresponding row of pred_probs.

Raises:

ValueError – Raised if any of the probabilities is not in the interval [0, 1].
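The range check described above might look like the following sketch. The function name `check_probabilities` and the error message are hypothetical, and the sketch checks only the stated condition (entries within [0, 1]), not whether rows sum to 1.

```python
import numpy as np

def check_probabilities(pred_probs):
    """Hypothetical validator: raise if any entry lies outside [0, 1]."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    if np.any(pred_probs < 0) or np.any(pred_probs > 1):
        raise ValueError("All probabilities must be in the interval [0, 1].")
    return pred_probs

check_probabilities([[0.2, 0.8]])        # valid input passes through

try:
    check_probabilities([[1.2, -0.2]])   # entries outside [0, 1]
except ValueError as err:
    message = str(err)
```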