multiannotator_utils

Helper methods used internally in cleanlab.multiannotator

Functions:

`assert_valid_inputs_multiannotator`(...[, ...])	Validate format of multi-annotator labels
`assert_valid_pred_probs`([pred_probs, ...])	Validate format of pred_probs for multiannotator active learning functions
`format_multiannotator_labels`(labels)	Takes an array of labels and formats it such that labels are in the set `0, 1, ..., K-1`, where `K` is the number of classes.
`check_consensus_label_classes`(...)	Check if any classes no longer appear in the set of consensus labels (established using the consensus_method stated)
`compute_soft_cross_entropy`(...)	Compute soft cross entropy between the annotators' empirical label distribution and model pred_probs
`find_best_temp_scaler`(labels_multiannotator, ...)	Find the best temperature scaling factor that minimizes the soft cross entropy between the annotators' empirical label distribution and model pred_probs
`temp_scale_pred_probs`(pred_probs, temp)	Scales pred_probs by the given temperature factor.

cleanlab.internal.multiannotator_utils.assert_valid_inputs_multiannotator(labels_multiannotator, pred_probs=None, ensemble=False, allow_single_label=False, annotator_ids=None)

Validate format of multi-annotator labels

Return type: None

cleanlab.internal.multiannotator_utils.assert_valid_pred_probs(pred_probs=None, pred_probs_unlabeled=None, ensemble=False)

Validate format of pred_probs for multiannotator active learning functions

cleanlab.internal.multiannotator_utils.format_multiannotator_labels(labels)

Takes an array of labels and formats it such that labels are in the set 0, 1, ..., K-1, where K is the number of classes. The labels are assigned based on lexicographic order.

Return type: Tuple[DataFrame, dict]

Returns:

formatted_labels – pd.DataFrame of shape (N, M). The returned labels are properly formatted and can be passed to cleanlab.multiannotator functions.
mapping – A dictionary showing the mapping of new to old labels, such that mapping[k] returns the name of the k-th class.
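The relabeling described above can be sketched roughly as follows. This is an illustrative reimplementation, not the library's code: the function name `format_labels_sketch` is hypothetical, missing annotations are assumed to be encoded as NaN, and class names are ordered with Python's built-in `sorted`, which may differ from the library's ordering for mixed or numeric label types.

```python
import numpy as np
import pandas as pd


def format_labels_sketch(labels: pd.DataFrame):
    """Hypothetical sketch: map class names to 0..K-1 in sorted order."""
    # Collect distinct class names across all annotators, skipping missing entries.
    values = labels.to_numpy().ravel()
    classes = sorted({v for v in values if not pd.isna(v)})
    old_to_new = {name: k for k, name in enumerate(classes)}
    # Replace each class name by its index; missing annotations stay NaN.
    formatted = labels.apply(lambda col: col.map(old_to_new))
    # mapping[k] gives the original name of the k-th class.
    mapping = {k: name for name, k in old_to_new.items()}
    return formatted, mapping


# Three examples (rows) labeled by two annotators (columns).
labels = pd.DataFrame(
    {"a1": ["cat", "dog", np.nan], "a2": ["dog", np.nan, "bird"]}
)
formatted, mapping = format_labels_sketch(labels)
```

Here `mapping` recovers the original class names, so `mapping[0]` is the name that was assigned the new label 0.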

cleanlab.internal.multiannotator_utils.check_consensus_label_classes(labels_multiannotator, consensus_label, consensus_method)

Check if any classes no longer appear in the set of consensus labels (established using the consensus_method stated)

Return type: None
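A minimal sketch of this kind of check, under stated assumptions: the name `missing_consensus_classes_sketch` is hypothetical, labels are assumed to be integers 0..K-1 with NaN for missing annotations, and the sketch returns the missing classes for illustration, whereas the documented function returns None. Emitting a warning is also an assumption about how such a check might report its result.

```python
import warnings

import numpy as np


def missing_consensus_classes_sketch(labels_multiannotator, consensus_label):
    """Hypothetical sketch: list classes that annotators used but consensus dropped."""
    labels = np.asarray(labels_multiannotator, dtype=float)
    # Classes that appear anywhere in the annotators' labels (NaN = not annotated).
    annotated = set(labels[~np.isnan(labels)].astype(int).tolist())
    # Classes that survive in the consensus labels.
    kept = set(np.asarray(consensus_label).astype(int).tolist())
    missing = sorted(annotated - kept)
    if missing:
        warnings.warn(f"Classes {missing} do not appear in the consensus labels.")
    return missing


labels = [[0, 1, np.nan], [0, 0, 2], [1, 1, np.nan]]
consensus = [0, 0, 1]
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep the demo output quiet
    missing = missing_consensus_classes_sketch(labels, consensus)
```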

cleanlab.internal.multiannotator_utils.compute_soft_cross_entropy(labels_multiannotator, pred_probs)

Compute soft cross entropy between the annotators’ empirical label distribution and model pred_probs

Return type: float
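To make the quantity concrete, here is a rough sketch of one way to compute a soft cross entropy. It is not the library's implementation: the name `soft_cross_entropy_sketch`, the clipping constant `eps`, and averaging over examples are all assumptions, and every example is assumed to have at least one annotation. The empirical label distribution of an example is taken to be the fraction of its annotators who chose each class.

```python
import numpy as np


def soft_cross_entropy_sketch(labels_multiannotator, pred_probs, eps=1e-8):
    """Hypothetical sketch: mean cross entropy between annotator vote shares and pred_probs."""
    labels = np.asarray(labels_multiannotator, dtype=float)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n, k = pred_probs.shape
    empirical = np.zeros((n, k))
    for i, row in enumerate(labels):
        given = row[~np.isnan(row)].astype(int)  # drop missing annotations
        empirical[i] = np.bincount(given, minlength=k) / len(given)
    clipped = np.clip(pred_probs, eps, None)  # avoid log(0)
    per_example = -np.sum(empirical * np.log(clipped), axis=1)
    return float(np.mean(per_example))


labels = [[0, 0, np.nan], [1, 0, 1]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = soft_cross_entropy_sketch(labels, uniform)
```

With uniform predictions over two classes, every example contributes log 2 regardless of how the annotators voted.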

cleanlab.internal.multiannotator_utils.find_best_temp_scaler(labels_multiannotator, pred_probs, coarse_search_range=[0.1, 0.2, 0.5, 0.8, 1, 2, 3, 5, 8], fine_search_size=4)

Find the best temperature scaling factor that minimizes the soft cross entropy between the annotators’ empirical label distribution and model pred_probs

Return type: float

cleanlab.internal.multiannotator_utils.temp_scale_pred_probs(pred_probs, temp)

Scales pred_probs by the given temperature factor. A temperature below 1 sharpens pred_probs, while a temperature above 1 smooths them.

Return type: ndarray
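The effect of the temperature can be illustrated with a small sketch. This assumes the common formulation in which log-probabilities are divided by the temperature and renormalized with a softmax; the name `temp_scale_sketch` and the clipping constant `eps` are hypothetical, and the library's own implementation may differ in detail.

```python
import numpy as np


def temp_scale_sketch(pred_probs, temp, eps=1e-8):
    """Hypothetical sketch: softmax of log(pred_probs) / temp, applied row by row."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    log_probs = np.log(np.clip(pred_probs, eps, None)) / temp
    log_probs -= log_probs.max(axis=1, keepdims=True)  # numerical stability
    scaled = np.exp(log_probs)
    return scaled / scaled.sum(axis=1, keepdims=True)


probs = np.array([[0.7, 0.2, 0.1]])
sharp = temp_scale_sketch(probs, 0.5)   # temperature below 1: more peaked
smooth = temp_scale_sketch(probs, 2.0)  # temperature above 1: flatter
same = temp_scale_sketch(probs, 1.0)    # temperature of 1: unchanged
```

In this example the top class gains probability mass at temperature 0.5 and loses it at temperature 2.0, while each row still sums to 1.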