# label_quality_utils

Helper functions for computing label quality scores.

Functions:

• get_normalized_entropy(pred_probs[, ...]) – Returns the normalized entropy of pred_probs.

cleanlab.internal.label_quality_utils.get_normalized_entropy(pred_probs: numpy.array, min_allowed_prob=1e-06) → numpy.array

Returns the normalized entropy of pred_probs.

Normalized entropy lies between 0 and 1. Higher values indicate greater uncertainty in the model’s prediction, i.e. the predicted probabilities are spread more evenly across the classes.
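Assuming the standard definition (Shannon entropy divided by its maximum possible value, log K, which is what confines it to the range 0 to 1), the score for an example x with predicted probabilities p_k = P(label=k|x) over K classes is:

```latex
\tilde{H}(x) = -\frac{1}{\log K} \sum_{k=0}^{K-1} p_k \log p_k, \qquad 0 \le \tilde{H}(x) \le 1
```

With the usual convention 0 log 0 = 0, it equals 0 for a one-hot prediction and 1 for a uniform prediction (p_k = 1/K for every k).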

Normalized entropy is used in active learning for uncertainty sampling: https://towardsdatascience.com/uncertainty-sampling-cheatsheet-ec57bc067c0b

Unlike label-quality scores, entropy depends only on the model’s predictions, not on the given label.
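Because the score ignores labels, it can rank unlabeled examples for uncertainty sampling. The sketch below assumes only NumPy: it reimplements normalized entropy in a hypothetical helper `normalized_entropy` (not the library function) and picks the `n` most uncertain rows with a hypothetical `most_uncertain`. No labels are passed anywhere.

```python
import numpy as np


def normalized_entropy(pred_probs: np.ndarray, min_allowed_prob: float = 1e-6) -> np.ndarray:
    """Hypothetical stand-in for get_normalized_entropy (a sketch, not the library code)."""
    num_classes = pred_probs.shape[1]  # assumes K >= 2 so that log(K) > 0
    clipped = np.maximum(pred_probs, min_allowed_prob)  # clip from below to avoid log(0)
    return -np.sum(pred_probs * np.log(clipped), axis=1) / np.log(num_classes)


def most_uncertain(pred_probs: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical helper: indices of the n rows with the highest normalized entropy."""
    scores = normalized_entropy(pred_probs)
    # argsort is ascending, so reverse it to put the most uncertain rows first
    return np.argsort(scores)[::-1][:n]


pred_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform: most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])

print(most_uncertain(pred_probs, n=2))  # prints [1 2]
```

In an active-learning loop, the selected indices would be the examples sent for labeling next.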

Parameters
• pred_probs (np.array, shape (N, K)) – Matrix of model-predicted probabilities P(label=k|x). Each row corresponds to an example x and contains the model-predicted probabilities that x belongs to each of the K possible classes. The columns must be ordered so that these probabilities correspond to classes 0, 1, 2, …. pred_probs should have been computed using cross-validation with 3 or more folds.

• min_allowed_prob (float, default 1e-6) – Minimum allowed probability value. Entries of pred_probs below this value are clipped to this value, which keeps the entropy well-behaved (avoiding log(0)) even when pred_probs contains zeros.
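To see why min_allowed_prob matters, here is a minimal NumPy sketch comparing a naive computation with one that clips before taking the logarithm. The helper names `naive_normalized_entropy` and `clipped_normalized_entropy` are hypothetical, and using the clipped values only inside the logarithm is an assumption of this sketch, not a description of the library’s code.

```python
import numpy as np


def naive_normalized_entropy(pred_probs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: entropy / log(K) with no clipping (breaks on zeros)."""
    num_classes = pred_probs.shape[1]  # assumes K >= 2 so that log(K) > 0
    return -np.sum(pred_probs * np.log(pred_probs), axis=1) / np.log(num_classes)


def clipped_normalized_entropy(pred_probs: np.ndarray, min_allowed_prob: float = 1e-6) -> np.ndarray:
    """Hypothetical helper: clip entries to min_allowed_prob before taking the log."""
    num_classes = pred_probs.shape[1]  # assumes K >= 2 so that log(K) > 0
    clipped = np.maximum(pred_probs, min_allowed_prob)
    # Assumption: clipped values are used only inside the log, so 0 * log(1e-6) == 0
    return -np.sum(pred_probs * np.log(clipped), axis=1) / np.log(num_classes)


probs = np.array([
    [1.0, 0.0],  # one-hot row containing a zero
    [0.5, 0.5],  # uniform row
])

# log(0) = -inf and 0 * -inf = nan; silence NumPy's warnings to show the result
with np.errstate(divide="ignore", invalid="ignore"):
    naive = naive_normalized_entropy(probs)
clipped = clipped_normalized_entropy(probs)

print(naive)    # first entry is nan
print(clipped)  # finite: about 0 for the one-hot row and 1 for the uniform row
```

The single zero in the first row is enough to turn the naive score into nan, while the clipped version stays finite.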

Returns

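entropy – Normalized entropy of each example (one value per row of pred_probs), with values between 0 and 1.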

Return type

np.array (float)