label_quality_utils
Helper methods used internally for computing label quality scores.
Functions:
- get_normalized_entropy: Return the normalized entropy of pred_probs.
- cleanlab.internal.label_quality_utils.get_normalized_entropy(pred_probs, min_allowed_prob=None)
Return the normalized entropy of pred_probs.
Normalized entropy is between 0 and 1. Higher values of entropy indicate higher uncertainty in the model’s prediction of the correct label.
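For reference, assuming the standard definition (the text above does not spell it out), the normalized entropy of one row $p = (p_1, \dots, p_K)$ of pred_probs, with $K \ge 2$ classes, is the Shannon entropy divided by its maximum possible value $\log K$:

```latex
\tilde{H}(p) \;=\; \frac{-\sum_{k=1}^{K} p_k \log p_k}{\log K},
\qquad \text{with the convention } 0 \log 0 := 0 .

% One-hot prediction (e.g. p = (1, 0, \dots, 0)): every term is 0, so \tilde{H}(p) = 0.
% Uniform prediction p_k = 1/K:
\tilde{H}(p) \;=\; \frac{-K \cdot \tfrac{1}{K} \log \tfrac{1}{K}}{\log K}
            \;=\; \frac{\log K}{\log K} \;=\; 1 .
```

Because both numerator and denominator use the same logarithm, the base of the log cancels, which is why the score lands in $[0, 1]$ regardless of whether natural or base-2 logs are used.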
Read more about normalized entropy on Wikipedia.
Normalized entropy is used in active learning for uncertainty sampling: https://towardsdatascience.com/uncertainty-sampling-cheatsheet-ec57bc067c0b
Unlike label-quality scores, entropy only depends on the model’s predictions, not the given label.
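The description above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not cleanlab's implementation: it assumes the standard definition (Shannon entropy divided by log K, natural log, K ≥ 2), and the function name `normalized_entropy` is hypothetical. It mirrors the documented ValueError for entries outside [0, 1] and then uses the scores for uncertainty sampling by picking the k highest-entropy examples. Note that no labels appear anywhere: only pred_probs is used.

```python
import numpy as np


def normalized_entropy(pred_probs) -> np.ndarray:
    """Normalized Shannon entropy of each row of an (N, K) probability matrix.

    Hypothetical illustration (not cleanlab's source). Assumes K >= 2,
    since the normalizer log(K) is 0 when K == 1.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    # Mirror the documented ValueError for entries outside [0, 1].
    if np.any(pred_probs < 0) or np.any(pred_probs > 1):
        raise ValueError("All probabilities must lie in the interval [0, 1].")
    num_classes = pred_probs.shape[1]
    # Convention 0 * log(0) = 0: zeros are replaced by 1 inside the log,
    # so their terms contribute 0 * log(1) = 0 instead of nan.
    safe_log = np.log(np.where(pred_probs > 0, pred_probs, 1.0))
    entropy = -np.sum(pred_probs * safe_log, axis=1)
    # Dividing by log(K) rescales the entropy to the interval [0, 1].
    return entropy / np.log(num_classes)


pred_probs = np.array([
    [1.0, 0.0, 0.0],        # confident prediction
    [0.6, 0.3, 0.1],
    [1 / 3, 1 / 3, 1 / 3],  # uniform prediction
])
scores = normalized_entropy(pred_probs)  # shape (N,), approx. [0.0, 0.817, 1.0]

# Uncertainty sampling: query labels for the k most uncertain examples.
k = 2
most_uncertain = np.argsort(-scores)[:k]  # indices [2, 1]
```

Replacing zeros inside the log (rather than clipping them) keeps the computation exact and free of floating-point warnings.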
- Parameters:
pred_probs (np.ndarray (shape (N, K))) – Each row of this matrix corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class: P(label=k|x).
min_allowed_prob (float, default: None, deprecated) – Minimum allowed probability value. If not None, entries of pred_probs below this value are clipped to this value. Defaults to None (no clipping).
Deprecated since version 2.5.0: This keyword is deprecated and should be left at its default. The entropy is well-behaved even if pred_probs contains zeros, so clipping is unnecessary and (slightly) changes the results.
- Return type:
ndarray
- Returns:
entropy (np.ndarray (shape (N,))) – Each element is the normalized entropy of the corresponding row of pred_probs.
- Raises:
ValueError – An error is raised if any of the probabilities is not in the interval [0, 1].
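The deprecation note on min_allowed_prob can be illustrated with a single prediction that contains an exact zero. A plausible motivation for the old clipping (an assumption; the text does not say) is that evaluating p * log(p) literally in floating point turns 0 * log(0) into nan. The sketch below uses only NumPy and a floor of 1e-6 chosen purely for illustration; it compares the literal evaluation, the 0 * log(0) = 0 convention, and clipping entries up to the floor as the parameter description states.

```python
import numpy as np

row = np.array([[0.7, 0.3, 0.0]])  # one example, K = 3, with an exact zero
K = row.shape[1]

# Literal p * log(p): log(0) = -inf and 0 * -inf = nan, so the result is nan.
with np.errstate(divide="ignore", invalid="ignore"):
    naive = -np.sum(row * np.log(row), axis=1) / np.log(K)

# Convention 0 * log(0) = 0: finite and exact, no clipping needed.
safe_log = np.log(np.where(row > 0, row, 1.0))
exact = -np.sum(row * safe_log, axis=1) / np.log(K)

# Clipping entries below an (illustrative) floor up to that floor avoids the nan,
# but the zero now contributes a small positive term, so the result shifts.
min_allowed_prob = 1e-6
clipped = np.clip(row, min_allowed_prob, None)
approx = -np.sum(clipped * np.log(clipped), axis=1) / np.log(K)

# naive -> [nan], exact -> approx. [0.556], approx exceeds exact by about 1.3e-5
```

With the 0 * log(0) = 0 convention the score is already finite, and clipping only perturbs it (here by roughly 1e-5), which is consistent with the note that clipping is unnecessary and slightly changes the results.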