multiannotator#
Methods for analysis of classification data labeled by multiple annotators.
To analyze a fixed dataset labeled by multiple annotators, use the get_label_quality_multiannotator function, which estimates:

A consensus label for each example that aggregates the individual annotations more accurately than alternatives such as majority vote or other crowdsourcing algorithms like Dawid-Skene.
A quality score for each consensus label, which measures our confidence that this label is correct.
An analogous label quality score for each individual label chosen by one annotator for a particular example.
An overall quality score for each annotator, which measures our confidence in the overall correctness of the labels obtained from this annotator.
The algorithms to compute these estimates are described in the CROWDLAB paper.
If you have some labeled and unlabeled data (with multiple annotators for some labeled examples) and want to decide which data to collect additional labels for, use the get_active_learning_scores function, which is intended for active learning. This function estimates an ActiveLab quality score for each example, which can be used to prioritize which examples are most informative to collect additional labels for. It is effective in settings where some examples have been labeled by one or more annotators while other examples have no labels at all so far, and in settings where new labels are collected either in batches of examples or one at a time.
Here is an example notebook showcasing the use of this ActiveLab method for active learning with data re-labeling.
The algorithms to compute these active learning scores are described in the ActiveLab paper.
Each of the main functions in this module utilizes any trained classifier model. Variants of these functions are provided for settings where you have trained an ensemble of multiple models.
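For concreteness, here is a rough sketch (with toy data and a scikit-learn classifier; none of this comes from the documentation above) of how the out-of-sample pred_probs consumed by these functions might be produced via cross-validation:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    from cleanlab.multiannotator import get_majority_vote_label

    # Toy data: N=6 examples with 2 features, M=3 annotators, K=2 classes.
    X = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9], [0.15, 0.25], [0.85, 0.75]])
    labels_multiannotator = pd.DataFrame(
        {
            "annotator_1": [0, 1, 0, 1, np.nan, 1],
            "annotator_2": [0, 1, np.nan, 1, 0, 1],
            "annotator_3": [0, np.nan, 0, 0, 0, 1],
        }
    )

    # Fit on an initial consensus (here: majority vote) and predict probabilities
    # out-of-sample via cross-validation, so no example is scored by a model trained on it.
    initial_labels = get_majority_vote_label(labels_multiannotator)
    pred_probs = cross_val_predict(
        LogisticRegression(), X, initial_labels, cv=2, method="predict_proba"
    )  # shape (N, K); pass this to the functions documented below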
Functions:

get_label_quality_multiannotator: Returns label quality scores for each example and for each annotator in a dataset labeled by multiple annotators.
get_label_quality_multiannotator_ensemble: Returns label quality scores for each example and for each annotator, based on predictions from an ensemble of models.
get_active_learning_scores: Returns an ActiveLab quality score for each example in the dataset, to estimate which examples are most informative to (re)label next in active learning.
get_active_learning_scores_ensemble: Returns an ActiveLab quality score for each example in the dataset, based on predictions from an ensemble of models.
get_majority_vote_label: Returns the majority vote label for each example, aggregated from the labels given by multiple annotators.
convert_long_to_wide_dataset: Converts a long format dataset to wide format which is suitable for passing into get_label_quality_multiannotator.
- cleanlab.multiannotator.get_label_quality_multiannotator(labels_multiannotator, pred_probs, *, consensus_method='best_quality', quality_method='crowdlab', calibrate_probs=False, return_detailed_quality=True, return_annotator_stats=True, return_weights=False, verbose=True, label_quality_score_kwargs={})[source]#
Returns label quality scores for each example and for each annotator in a dataset labeled by multiple annotators.
This function is for multiclass classification datasets where examples have been labeled by multiple annotators (not necessarily the same number of annotators per example).
It computes one consensus label for each example that best accounts for the labels chosen by each annotator (and their quality), as well as a consensus quality score for how confident we are that this consensus label is actually correct. It also computes similar quality scores for each annotator’s individual labels, and the quality of each annotator. Scores are between 0 and 1 (estimated via methods like CROWDLAB); lower scores indicate labels/annotators less likely to be correct.
To decide what data to collect additional labels for, try the get_active_learning_scores (ActiveLab) function, which is intended for active learning with multiple annotators.

- Parameters:

labels_multiannotator (pd.DataFrame or np.ndarray) – 2D pandas DataFrame or array of multiple given labels for each example with shape (N, M), where N is the number of examples and M is the number of annotators. labels_multiannotator[n][m] = label for n-th example given by m-th annotator. For a dataset with K classes, each given label must be an integer in 0, 1, …, K-1 or NaN if this annotator did not label a particular example. If you have string or other differently formatted labels, you can convert them to the proper format using format_multiannotator_labels. If a pd.DataFrame is passed, its column names should correspond to each annotator's ID.

pred_probs (np.ndarray) – An array of shape (N, K) of predicted class probabilities from a trained classifier model, in the same format expected by get_label_quality_scores.

consensus_method (str or List[str], default = "best_quality") – Specifies the method used to aggregate labels from multiple annotators into a single consensus label. Options include:
majority_vote: consensus obtained using a simple majority vote among annotators, with ties broken via pred_probs.
best_quality: consensus obtained by selecting the label with highest label quality (quality determined by the method specified in quality_method).
A List may be passed if you want to consider multiple methods for producing consensus labels. If a List is passed, the 0th element is the method used to produce the columns consensus_label, consensus_quality_score, and annotator_agreement in the returned DataFrame. The remaining (1st, 2nd, 3rd, etc.) elements of this list are output as extra columns in the returned pandas DataFrame with names formatted as consensus_label_SUFFIX and consensus_quality_score_SUFFIX, where SUFFIX is each element of the list and must correspond to a valid method for computing consensus labels.

quality_method (str, default = "crowdlab") – Specifies the method used to calculate the quality of the consensus label. Options include:
crowdlab: an ensemble method that weighs both the annotators' labels as well as the model's prediction.
agreement: the fraction of annotators that agree with the consensus label.

calibrate_probs (bool, default = False) – Whether the provided pred_probs should be re-calibrated to better match the annotators' empirical label distribution. We recommend setting this to True in active learning applications, in order to prevent overconfident models from suggesting the wrong examples to collect labels for.

return_detailed_quality (bool, default = True) – Whether detailed_label_quality is returned.

return_annotator_stats (bool, default = True) – Whether annotator_stats is returned.

return_weights (bool, default = False) – Whether model_weight and annotator_weight are returned. Model and annotator weights are only applicable for quality_method == "crowdlab"; None is returned for any other quality method.

verbose (bool, default = True) – Important warnings and other printed statements may be suppressed if verbose is set to False.

label_quality_score_kwargs (dict, optional) – Keyword arguments to pass into get_label_quality_scores.
- Return type: Dict[str, Any]
- Returns:

labels_info (dict) – Dictionary containing up to 5 pandas DataFrames with keys as below:

label_quality : pandas.DataFrame
pandas DataFrame in which each row corresponds to one example, with columns:
num_annotations: the number of annotators that have labeled each example.
consensus_label: the single label that is best for each example (you can control how it is derived from all annotators' labels via the consensus_method argument).
annotator_agreement: the fraction of annotators that agree with the consensus label (only considering the annotators that labeled that particular example).
consensus_quality_score: label quality score for the consensus label, calculated by the method specified in quality_method.

detailed_label_quality : pandas.DataFrame
Only returned if return_detailed_quality=True. Returns a pandas DataFrame with columns quality_annotator_1, quality_annotator_2, …, quality_annotator_M, where each entry is the label quality score for the labels provided by that annotator (NaN for examples which this annotator did not label).

annotator_stats : pandas.DataFrame
Only returned if return_annotator_stats=True. Returns overall statistics about each annotator, sorted by lowest annotator_quality first. pandas DataFrame in which each row corresponds to one annotator (the row IDs correspond to annotator IDs), with columns:
annotator_quality: overall quality of a given annotator's labels, calculated by the method specified in quality_method.
num_examples_labeled: number of examples annotated by a given annotator.
agreement_with_consensus: fraction of examples where a given annotator agrees with the consensus label.
worst_class: the class that is most frequently mislabeled by a given annotator.

model_weight : float
Only returned if return_weights=True. It is only applicable for quality_method == "crowdlab". The model weight specifies the weight of the classifier model in the weighted averages used to estimate label quality. This number is an estimate of how trustworthy the model is relative to the annotators.

annotator_weight : np.ndarray
Only returned if return_weights=True. It is only applicable for quality_method == "crowdlab". An array of shape (M,) where M is the number of annotators, specifying the weight of each annotator in the weighted averages used to estimate label quality. These weights are estimates of how trustworthy each annotator is relative to the other annotators.
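For illustration, here is a minimal usage sketch (not taken from the official docs): the annotator labels and predicted probabilities below are small made-up arrays, and in practice pred_probs should be out-of-sample predictions from your own trained classifier.

    import numpy as np
    import pandas as pd

    from cleanlab.multiannotator import get_label_quality_multiannotator

    # Toy dataset: N=6 examples, M=3 annotators, K=2 classes (values are illustrative).
    # NaN marks examples a given annotator did not label.
    labels_multiannotator = pd.DataFrame(
        {
            "annotator_1": [0, 1, 0, 1, np.nan, 1],
            "annotator_2": [0, 1, np.nan, 1, 0, 1],
            "annotator_3": [0, np.nan, 0, 0, 0, 1],
        }
    )

    # Out-of-sample predicted class probabilities from a trained model, shape (N, K).
    pred_probs = np.array(
        [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6], [0.8, 0.2], [0.1, 0.9]]
    )

    results = get_label_quality_multiannotator(
        labels_multiannotator,
        pred_probs,
        consensus_method=["best_quality", "majority_vote"],  # extra methods add extra columns
        return_weights=True,  # also return the CROWDLAB model/annotator weights
    )

    results["label_quality"]     # per-example consensus labels and quality scores
    results["annotator_stats"]   # per-annotator quality summary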
- cleanlab.multiannotator.get_label_quality_multiannotator_ensemble(labels_multiannotator, pred_probs, *, calibrate_probs=False, return_detailed_quality=True, return_annotator_stats=True, return_weights=False, verbose=True, label_quality_score_kwargs={})[source]#
Returns label quality scores for each example and for each annotator, based on predictions from an ensemble of models.
This function is similar to get_label_quality_multiannotator but is for settings where you have trained an ensemble of multiple classifier models rather than a single model.

- Parameters:

labels_multiannotator (pd.DataFrame or np.ndarray) – Multiannotator labels in the same format expected by get_label_quality_multiannotator.

pred_probs (np.ndarray) – An array of shape (P, N, K) where P is the number of models, consisting of predicted class probabilities from the ensemble models. Each set of predicted probabilities with shape (N, K) is in the same format expected by get_label_quality_scores.

calibrate_probs (bool, default = False) – Boolean value as expected by get_label_quality_multiannotator.

return_detailed_quality (bool, default = True) – Boolean value as expected by get_label_quality_multiannotator.

return_annotator_stats (bool, default = True) – Boolean value as expected by get_label_quality_multiannotator.

return_weights (bool, default = False) – Boolean value as expected by get_label_quality_multiannotator.

verbose (bool, default = True) – Boolean value as expected by get_label_quality_multiannotator.

label_quality_score_kwargs (dict, optional) – Keyword arguments in the same format expected by get_label_quality_multiannotator.

- Return type: Dict[str, Any]
- Returns:

labels_info (dict) – Dictionary containing up to 5 pandas DataFrames with keys as below:

label_quality : pandas.DataFrame
Similar to the output of get_label_quality_multiannotator.

detailed_label_quality : pandas.DataFrame
Similar to the output of get_label_quality_multiannotator.

annotator_stats : pandas.DataFrame
Similar to the output of get_label_quality_multiannotator.

model_weight : np.ndarray
Only returned if return_weights=True. An array of shape (P,) where P is the number of models in the ensemble, specifying the weight of each classifier model in the weighted averages used to estimate label quality. These weights are estimates of how trustworthy each model is relative to the annotators.

annotator_weight : np.ndarray
Only returned if return_weights=True. Similar to the output of get_label_quality_multiannotator.
- cleanlab.multiannotator.get_active_learning_scores(labels_multiannotator=None, pred_probs=None, pred_probs_unlabeled=None)[source]#
Returns an ActiveLab quality score for each example in the dataset, to estimate which examples are most informative to (re)label next in active learning.
We consider settings where one example can be labeled by one or more annotators and some examples have no labels at all so far.
The score is in between 0 and 1, and can be used to prioritize what data to collect additional labels for. Lower scores indicate examples whose true label we are least confident about based on the current data; collecting additional labels for these low-scoring examples will be more informative than collecting labels for other examples. To use an annotation budget most efficiently, select a batch of examples with the lowest scores and collect one additional label for each example, and repeat this process after retraining your classifier.
You can use this function to get active learning scores for: examples that already have one or more labels (specify labels_multiannotator and pred_probs as arguments), or for unlabeled examples (specify pred_probs_unlabeled), or for both types of examples (specify all of the above arguments).

To analyze a fixed dataset labeled by multiple annotators rather than collecting additional labels, try the get_label_quality_multiannotator (CROWDLAB) function instead.

- Parameters:

labels_multiannotator (pd.DataFrame or np.ndarray, optional) – 2D pandas DataFrame or array of multiple given labels for each example with shape (N, M), where N is the number of examples and M is the number of annotators. Note that this function also works with datasets where there is only one annotator (M=1). For details, see the labels format expected by get_label_quality_multiannotator. Note that examples that have no annotator labels should not be included in this DataFrame/array. This argument is optional if pred_probs is not provided (you might only provide pred_probs_unlabeled to only get active learning scores for the unlabeled examples).

pred_probs (np.ndarray, optional) – An array of shape (N, K) of predicted class probabilities from a trained classifier model, in the same format expected by get_label_quality_scores. This argument is optional if you only want to get active learning scores for unlabeled examples (specify only pred_probs_unlabeled instead).

pred_probs_unlabeled (np.ndarray, optional) – An array of shape (N, K) of predicted class probabilities from a trained classifier model for examples that have no annotator labels, in the same format expected by get_label_quality_scores. This argument is optional if you only want to get active learning scores for already-labeled examples (specify only pred_probs instead).

- Return type: Tuple[ndarray, ndarray]
- Returns:

active_learning_scores (np.ndarray) – Array of shape (N,) indicating the ActiveLab quality scores for each example. This array is empty if no already-labeled data was provided via labels_multiannotator. Examples with the lowest scores are those we should label next in order to maximally improve our classifier model.

active_learning_scores_unlabeled (np.ndarray) – Array of shape (N,) indicating the active learning quality scores for each unlabeled example. Returns an empty array if no unlabeled data is provided. Examples with the lowest scores are those we should label next in order to maximally improve our classifier model (scores for unlabeled data are directly comparable with the active_learning_scores for labeled data).
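Here is a minimal sketch of one round of active learning with this function; all arrays below are illustrative stand-ins for real annotator labels and model predictions.

    import numpy as np
    import pandas as pd

    from cleanlab.multiannotator import get_active_learning_scores

    # Labeled pool: N=4 examples, M=2 annotators (every example has at least one label).
    labels_multiannotator = pd.DataFrame(
        {"a1": [0, 1, np.nan, 1], "a2": [0, np.nan, 1, 1]}
    )
    pred_probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.55, 0.45], [0.2, 0.8]])

    # Unlabeled pool: 3 examples with no annotator labels yet.
    pred_probs_unlabeled = np.array([[0.5, 0.5], [0.8, 0.2], [0.45, 0.55]])

    scores_labeled, scores_unlabeled = get_active_learning_scores(
        labels_multiannotator, pred_probs, pred_probs_unlabeled
    )

    # Scores for labeled and unlabeled examples are directly comparable, so pick the
    # lowest-scoring examples across both pools as the next batch to (re)label.
    all_scores = np.concatenate([scores_labeled, scores_unlabeled])
    next_batch = np.argsort(all_scores)[:2]  # indices into the combined pool

After collecting the new labels and retraining the classifier, the same call is repeated for the next round.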
- cleanlab.multiannotator.get_active_learning_scores_ensemble(labels_multiannotator=None, pred_probs=None, pred_probs_unlabeled=None)[source]#
Returns an ActiveLab quality score for each example in the dataset, based on predictions from an ensemble of models.
This function is similar to get_active_learning_scores but allows for an ensemble of multiple classifier models to be trained, and it will aggregate predictions from the models to compute the ActiveLab quality score.

- Parameters:

labels_multiannotator (pd.DataFrame or np.ndarray) – Multiannotator labels in the same format expected by get_active_learning_scores. This argument is optional if pred_probs is not provided (in cases where you only provide pred_probs_unlabeled to get active learning scores for unlabeled examples).

pred_probs (np.ndarray) – An array of shape (P, N, K) where P is the number of models, consisting of predicted class probabilities from the ensemble models. Note that this function also works with datasets where there is only one annotator (M=1). Each set of predicted probabilities with shape (N, K) is in the same format expected by get_label_quality_scores. This argument is optional if you only want to get active learning scores for unlabeled examples (pass in pred_probs_unlabeled instead).

pred_probs_unlabeled (np.ndarray, optional) – An array of shape (P, N, K) where P is the number of models, consisting of predicted class probabilities from a trained classifier model for examples that have no annotated labels so far (but which we may want to label in the future, and hence compute active learning quality scores for). Each set of predicted probabilities with shape (N, K) is in the same format expected by get_label_quality_scores. This argument is optional if you only want to get active learning scores for labeled examples (pass in pred_probs instead).

- Return type: Tuple[ndarray, ndarray]
- Returns:

active_learning_scores (np.ndarray) – Similar to the output of get_active_learning_scores.

active_learning_scores_unlabeled (np.ndarray) – Similar to the output of get_active_learning_scores.
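A brief sketch of the ensemble variant (again with illustrative arrays); both probability inputs gain a leading model dimension P:

    import numpy as np
    import pandas as pd

    from cleanlab.multiannotator import get_active_learning_scores_ensemble

    labels_multiannotator = pd.DataFrame({"a1": [0, 1, 1, 0], "a2": [0, 1, np.nan, 0]})

    # Predictions from P=2 models for 4 labeled and 3 unlabeled examples (K=2 classes).
    pred_probs = np.stack([
        np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.7, 0.3]]),
        np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5], [0.6, 0.4]]),
    ])  # shape (2, 4, 2)
    pred_probs_unlabeled = np.stack([
        np.array([[0.5, 0.5], [0.9, 0.1], [0.45, 0.55]]),
        np.array([[0.6, 0.4], [0.8, 0.2], [0.5, 0.5]]),
    ])  # shape (2, 3, 2)

    scores_labeled, scores_unlabeled = get_active_learning_scores_ensemble(
        labels_multiannotator, pred_probs, pred_probs_unlabeled
    )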
- cleanlab.multiannotator.get_majority_vote_label(labels_multiannotator, pred_probs=None, verbose=True)[source]#
Returns the majority vote label for each example, aggregated from the labels given by multiple annotators.
- Parameters:
labels_multiannotator (pd.DataFrame or np.ndarray) – 2D pandas DataFrame or array of multiple given labels for each example with shape (N, M), where N is the number of examples and M is the number of annotators. For details, see the labels format expected by get_label_quality_multiannotator.

pred_probs (np.ndarray, optional) – An array of shape (N, K) of model-predicted probabilities, P(label=k|x). For details, see the predicted probabilities format expected by get_label_quality_multiannotator.

verbose (bool, optional) – Important warnings and other printed statements may be suppressed if verbose is set to False.

- Return type: ndarray
- Returns:

consensus_label (np.ndarray) – An array of shape (N,) with the majority vote label aggregated from all annotators.

In the event of majority vote ties, ties are broken in the following order: using the model pred_probs (if provided) and selecting the class with highest predicted probability, then using the empirical class frequencies and selecting the class with highest frequency, then using an initial annotator quality score and selecting the class that has been labeled by annotators with higher quality, and lastly by random selection.
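For example, here is a small sketch (with illustrative values) showing how a one-vs-one tie is broken by pred_probs:

    import numpy as np
    import pandas as pd

    from cleanlab.multiannotator import get_majority_vote_label

    labels_multiannotator = pd.DataFrame(
        {"a1": [0, 1, 0], "a2": [0, 0, np.nan], "a3": [1, np.nan, 0]}
    )
    # Example 1 is a one-vs-one tie between classes 0 and 1; with pred_probs provided,
    # the tie is broken toward the class with the higher predicted probability (class 0).
    pred_probs = np.array([[0.8, 0.2], [0.6, 0.4], [0.9, 0.1]])

    consensus_label = get_majority_vote_label(labels_multiannotator, pred_probs)
    # expected here: array([0, 0, 0])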
- cleanlab.multiannotator.convert_long_to_wide_dataset(labels_multiannotator_long)[source]#
Converts a long format dataset to wide format which is suitable for passing into get_label_quality_multiannotator.

The DataFrame must contain three columns named:
task: representing each example labeled by the annotators
annotator: representing each annotator
label: representing the label given by an annotator for the corresponding task (i.e. example)

- Parameters:

labels_multiannotator_long (pd.DataFrame) – pandas DataFrame in long format with three columns named task, annotator, and label.

- Return type: DataFrame
- Returns:

labels_multiannotator_wide (pd.DataFrame) – pandas DataFrame of the proper format to be passed as labels_multiannotator for the other cleanlab.multiannotator functions.
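A minimal sketch (with a made-up annotation table) of the conversion:

    import pandas as pd

    from cleanlab.multiannotator import convert_long_to_wide_dataset

    # Long format: one row per (task, annotator, label) annotation.
    labels_long = pd.DataFrame(
        {
            "task": [0, 0, 1, 1, 2],
            "annotator": ["a1", "a2", "a1", "a3", "a2"],
            "label": [0, 0, 1, 1, 0],
        }
    )

    labels_wide = convert_long_to_wide_dataset(labels_long)
    # labels_wide has one row per task and one column per annotator, with NaN wherever
    # an annotator did not label that task; it can be passed directly as
    # labels_multiannotator to the other functions in this module.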