label_quality_utils

Helper methods used internally for computing label quality scores.

Functions:

get_normalized_entropy(pred_probs[, ...])

Returns the normalized entropy of pred_probs.

cleanlab.internal.label_quality_utils.get_normalized_entropy(pred_probs, min_allowed_prob=1e-06)

Returns the normalized entropy of pred_probs.

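Normalized entropy lies between 0 and 1: it is 0 when the model assigns all probability to a single class and 1 when the predicted probabilities are uniform across classes. Higher values indicate greater uncertainty in the model’s prediction of the correct label.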

Read more about normalized entropy on Wikipedia.

Normalized entropy is used in active learning for uncertainty sampling: https://towardsdatascience.com/uncertainty-sampling-cheatsheet-ec57bc067c0b

Unlike label-quality scores, entropy only depends on the model’s predictions, not the given label.
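For reference, the standard definition of normalized entropy can be written as below. The symbols K (the number of classes) and p_k are notation introduced here for illustration; that the function computes exactly this quantity, apart from the clipping described under Parameters, is an assumption rather than something stated on this page.

```latex
H_{\text{norm}}(x) = -\frac{1}{\log K} \sum_{k=1}^{K} p_k \log p_k,
\qquad p_k = P(\text{label}=k \mid x)
```

Dividing by log K, the entropy of a uniform distribution over K classes, is what confines the value to the range [0, 1].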

Parameters:
  • pred_probs (ndarray) – Each row of this matrix corresponds to an example x and contains the model-predicted probabilities that x belongs to each possible class: P(label=k|x)

  • min_allowed_prob (float) – Minimum allowed probability value. Entries of pred_probs below this value are clipped to it, which keeps the entropy well-behaved even when pred_probs contains zeros.

Return type:

ndarray

Returns:

entropy – Each element is the normalized entropy of the corresponding row of pred_probs.
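
The behavior described above can be sketched with plain NumPy. This is a minimal illustration, not the library source: the name normalized_entropy_sketch is hypothetical, and applying the clipping only inside the logarithm is an assumption about how min_allowed_prob is used.

```python
import numpy as np


def normalized_entropy_sketch(pred_probs, min_allowed_prob=1e-6):
    """Illustrative re-implementation (hypothetical name, not the library code)."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    num_classes = pred_probs.shape[1]
    # Assumption: clip only the argument of the log so zeros do not yield -inf.
    clipped = np.clip(pred_probs, min_allowed_prob, None)
    # Dividing by log(num_classes) rescales the entropy to the [0, 1] range.
    return -np.sum(pred_probs * np.log(clipped), axis=1) / np.log(num_classes)


pred_probs = np.array([
    [0.25, 0.25, 0.25, 0.25],  # uniform prediction: maximal uncertainty
    [1.0, 0.0, 0.0, 0.0],      # one-hot prediction: no uncertainty
    [0.7, 0.1, 0.1, 0.1],      # somewhere in between
])
entropy = normalized_entropy_sketch(pred_probs)
```

Here entropy has one element per row of pred_probs: the uniform row gives 1 and the one-hot row gives 0 (both up to floating-point rounding), while the third row falls strictly between them. No given labels are involved at any point, which is why this quantity differs from a label-quality score.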