multiannotator_utils

Helper methods used internally in cleanlab.multiannotator

Functions:

assert_valid_inputs_multiannotator(...[, ...])

Validate format of multi-annotator labels

assert_valid_pred_probs([pred_probs, ...])

Validate format of pred_probs for multiannotator active learning functions

format_multiannotator_labels(labels)

Takes an array of labels and formats it such that labels are in the set 0, 1, ..., K-1, where K is the number of classes.

check_consensus_label_classes(...)

Check if any classes no longer appear in the set of consensus labels (established using the consensus_method stated)

compute_soft_cross_entropy(...)

Compute soft cross entropy between the annotators' empirical label distribution and model pred_probs

find_best_temp_scaler(labels_multiannotator, ...)

Find the best temperature scaling factor that minimizes the soft cross entropy between the annotators' empirical label distribution and model pred_probs

temp_scale_pred_probs(pred_probs, temp)

Scales pred_probs by the given temperature factor.

cleanlab.internal.multiannotator_utils.assert_valid_inputs_multiannotator(labels_multiannotator, pred_probs=None, ensemble=False, allow_single_label=False, annotator_ids=None)

Validate format of multi-annotator labels

Return type:

None

cleanlab.internal.multiannotator_utils.assert_valid_pred_probs(pred_probs=None, pred_probs_unlabeled=None, ensemble=False)

Validate format of pred_probs for multiannotator active learning functions

cleanlab.internal.multiannotator_utils.format_multiannotator_labels(labels)

Takes an array of labels and formats it such that labels are in the set 0, 1, ..., K-1, where K is the number of classes. The labels are assigned based on lexicographic order.

Return type:

Tuple[DataFrame, dict]

Returns:

  • formatted_labels – A pd.DataFrame of shape (N, M) whose labels are properly formatted and can be passed to cleanlab.multiannotator functions.

  • mapping – A dictionary showing the mapping of new to old labels, such that mapping[k] returns the name of the k-th class.
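The relabeling described above can be illustrated with a minimal sketch. It is not the library's implementation: `format_labels_sketch` is a hypothetical helper that works on plain nested lists (rows are examples, columns are annotators, `None` marks a missing annotation) rather than on a DataFrame, and it assumes all class names are mutually comparable so that `sorted` yields lexicographic order.

```python
def format_labels_sketch(labels):
    """Hypothetical helper: relabel class names to 0..K-1 in sorted order."""
    # Distinct class names actually used by any annotator (missing entries skipped).
    classes = sorted({lab for row in labels for lab in row if lab is not None})
    old_to_new = {name: k for k, name in enumerate(classes)}
    # Replace every given label with its integer index; keep None where no label was given.
    formatted = [
        [old_to_new[lab] if lab is not None else None for lab in row]
        for row in labels
    ]
    # mapping[k] is the original name of the k-th class.
    mapping = dict(enumerate(classes))
    return formatted, mapping


raw = [
    ["dog", "cat", None],
    ["cat", "cat", "bird"],
    [None, "dog", "dog"],
]
formatted, mapping = format_labels_sketch(raw)
print(formatted)
print(mapping)
```

Keeping `mapping` alongside the formatted labels lets downstream results be translated back to the original class names.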

cleanlab.internal.multiannotator_utils.check_consensus_label_classes(labels_multiannotator, consensus_label, consensus_method)

Check if any classes no longer appear in the set of consensus labels (established using the consensus_method stated)

Return type:

None
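The idea behind this check can be sketched as a set difference between the classes annotators used and the classes that survive in the consensus labels. The helper below, `missing_consensus_classes`, is hypothetical. It assumes integer-coded labels with NaN for missing annotations and simply returns the vanished classes; how the real function reports them is not stated here.

```python
import numpy as np


def missing_consensus_classes(labels_multiannotator, consensus_label):
    """Hypothetical helper: classes given by annotators but absent from the consensus."""
    labels = np.asarray(labels_multiannotator, dtype=float)
    # Classes that appear at least once among the non-missing annotations.
    annotated = set(np.unique(labels[~np.isnan(labels)]).astype(int).tolist())
    # Classes that appear among the consensus labels.
    consensus = set(np.unique(np.asarray(consensus_label)).astype(int).tolist())
    return sorted(annotated - consensus)


labels = np.array(
    [
        [0, 0, np.nan],
        [1, 1, 2],
        [np.nan, 1, 1],
    ]
)
consensus = np.array([0, 1, 1])  # e.g. a majority vote per example
missing = missing_consensus_classes(labels, consensus)
print(missing)
```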

cleanlab.internal.multiannotator_utils.compute_soft_cross_entropy(labels_multiannotator, pred_probs)

Compute soft cross entropy between the annotators’ empirical label distribution and model pred_probs

Return type:

float
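As a rough illustration of the quantity described above, the sketch below builds each example's empirical label distribution from its annotations and averages the cross entropy against `pred_probs`. The name `soft_cross_entropy_sketch`, the `eps` clipping constant, and the choice to average over examples are assumptions for illustration; the library's exact normalization may differ. It also assumes integer labels in 0..K-1, NaN for missing annotations, and at least one annotation per example.

```python
import numpy as np


def soft_cross_entropy_sketch(labels_multiannotator, pred_probs, eps=1e-8):
    """Hypothetical helper: mean cross entropy between annotator vote shares and pred_probs."""
    labels = np.asarray(labels_multiannotator, dtype=float)
    pred_probs = np.asarray(pred_probs, dtype=float)
    num_examples, num_classes = pred_probs.shape

    # Empirical label distribution: fraction of annotators choosing each class.
    empirical = np.zeros((num_examples, num_classes))
    for i, row in enumerate(labels):
        given = row[~np.isnan(row)].astype(int)
        empirical[i] = np.bincount(given, minlength=num_classes) / len(given)

    # Clip to avoid log(0) for classes the model assigns zero probability.
    clipped = np.clip(pred_probs, eps, None)
    return float(-np.sum(empirical * np.log(clipped)) / num_examples)


labels = np.array([[0, 0, np.nan], [1, 0, 1]])
pred_probs = np.array([[0.9, 0.1], [0.3, 0.7]])
loss = soft_cross_entropy_sketch(labels, pred_probs)
print(loss)
```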

cleanlab.internal.multiannotator_utils.find_best_temp_scaler(labels_multiannotator, pred_probs, coarse_search_range=[0.1, 0.2, 0.5, 0.8, 1, 2, 3, 5, 8], fine_search_size=4)

Find the best temperature scaling factor that minimizes the soft cross entropy between the annotators’ empirical label distribution and model pred_probs

Return type:

float

cleanlab.internal.multiannotator_utils.temp_scale_pred_probs(pred_probs, temp)

Scales pred_probs by the given temperature factor. A temperature below 1 sharpens the pred_probs (pushes them toward the most likely class), while a temperature above 1 smooths them (pushes them toward uniform).

Return type:

ndarray
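The sharpening and smoothing effect can be seen in a small sketch. It assumes the common formulation of temperature scaling (divide the log-probabilities by `temp`, then apply a softmax); the exact formula the library uses is not stated here, and `temp_scale_sketch` and its `eps` clipping constant are hypothetical.

```python
import numpy as np


def temp_scale_sketch(pred_probs, temp, eps=1e-8):
    """Hypothetical helper: softmax of log-probabilities divided by temp."""
    # Clip to avoid log(0), then move to log space.
    pred_probs = np.clip(np.asarray(pred_probs, dtype=float), eps, None)
    logits = np.log(pred_probs) / temp
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    scaled = np.exp(logits)
    # Renormalize so each row sums to 1 again.
    return scaled / scaled.sum(axis=1, keepdims=True)


probs = np.array([[0.7, 0.2, 0.1]])
sharp = temp_scale_sketch(probs, 0.5)   # temp < 1: top class gains probability
same = temp_scale_sketch(probs, 1.0)    # temp = 1: unchanged up to rounding
smooth = temp_scale_sketch(probs, 2.0)  # temp > 1: closer to uniform
print(sharp, same, smooth, sep="\n")
```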