rank#
Methods to rank and score sentences in a token classification dataset (text data), based on how likely they are to contain label errors.
The underlying algorithms are described in this paper.
Functions:
- get_label_quality_scores(labels, pred_probs, ...) – Returns overall quality scores for the labels in each sentence, as well as for the individual tokens' labels in a token classification dataset.
- issues_from_scores(sentence_scores, ...) – Converts scores output by get_label_quality_scores to a list of issues in a format similar to the output of token_classification.filter.find_label_issues.
- cleanlab.token_classification.rank.get_label_quality_scores(labels, pred_probs, *, tokens=None, token_score_method='self_confidence', sentence_score_method='min', sentence_score_kwargs={})[source]#
Returns overall quality scores for the labels in each sentence, as well as for the individual tokens’ labels in a token classification dataset.
Each score is between 0 and 1.
Lower scores indicate token labels that are less likely to be correct, or sentences that are more likely to contain a mislabeled token.
- Parameters:
  - labels (list) – Nested list of given labels for all tokens, such that labels[i] is a list of labels, one for each token in the i-th sentence. For a dataset with K classes, each label must be in 0, 1, …, K-1.
  - pred_probs (list) – List of np arrays, such that pred_probs[i] has shape (T, K) if the i-th sentence contains T tokens. Each row of pred_probs[i] corresponds to a token t in the i-th sentence, and contains model-predicted probabilities that t belongs to each of the K possible classes. Columns of each pred_probs[i] should be ordered such that the probabilities correspond to class 0, 1, …, K-1.
  - tokens (Optional[list]) – Nested list such that tokens[i] is a list of tokens (strings/words) that comprise the i-th sentence. These strings are used to annotate the returned token_scores object; see its documentation for more information.
  - sentence_score_method ({"min", "softmin"}, default "min") – Method to aggregate individual token label quality scores into a single score for the sentence (see the sketch after the Examples below).
    - min: sentence score = minimum of the token scores in the sentence.
    - softmin: sentence score = <s, softmax(1-s, t)>, where s denotes the token label scores of the sentence, and <a, b> == np.dot(a, b). Here the parameter t controls the softmax temperature, such that the score converges toward min as t -> 0. Unlike min, softmin is affected by the scores of all tokens in the sentence.
  - token_score_method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default "self_confidence") – Label quality scoring method for each token. See the cleanlab.rank.get_label_quality_scores documentation for more info.
  - sentence_score_kwargs (dict) – Optional keyword arguments for the sentence_score_method function (for advanced users only). See cleanlab.token_classification.rank._softmin_sentence_score for more info about keyword arguments supported for that scoring method.
- Return type:
  Tuple[ndarray, list]
- Returns:
  - sentence_scores – Array of shape (N,) of scores between 0 and 1, one per sentence in the dataset. Lower scores indicate sentences more likely to contain a label issue.
  - token_scores – List of pd.Series, such that token_scores[i] contains the label quality scores for individual tokens in the i-th sentence. If tokens strings were provided, they are used as the index for each Series.
Examples
>>> import numpy as np
>>> from cleanlab.token_classification.rank import get_label_quality_scores
>>> labels = [[0, 0, 1], [0, 1]]
>>> pred_probs = [
...     np.array([[0.9, 0.1], [0.7, 0.3], [0.05, 0.95]]),
...     np.array([[0.8, 0.2], [0.8, 0.2]]),
... ]
>>> sentence_scores, token_scores = get_label_quality_scores(labels, pred_probs)
>>> sentence_scores
array([0.7, 0.2])
>>> token_scores
[0    0.90
1    0.70
2    0.95
dtype: float64, 0    0.8
1    0.2
dtype: float64]
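To make the scoring concrete, here is a minimal sketch of "self_confidence" token scoring and the "min"/"softmin" sentence aggregation described above. It is an illustration, not cleanlab's exact implementation; in particular, the helper names and the temperature default are assumptions (the real temperature is passed via sentence_score_kwargs).

import numpy as np

def self_confidence_scores(labels_i, pred_probs_i):
    # Token score = model-predicted probability of the token's given label
    return pred_probs_i[np.arange(len(labels_i)), labels_i]

def aggregate(token_scores, method="min", temperature=0.05):
    # Aggregate per-token scores into one sentence score
    if method == "min":
        return token_scores.min()
    # softmin: <s, softmax((1 - s) / t)>, which converges to min(s) as t -> 0
    weights = np.exp((1 - token_scores) / temperature)
    return np.dot(token_scores, weights / weights.sum())

labels = [[0, 0, 1], [0, 1]]
pred_probs = [
    np.array([[0.9, 0.1], [0.7, 0.3], [0.05, 0.95]]),
    np.array([[0.8, 0.2], [0.8, 0.2]]),
]
for labels_i, pred_probs_i in zip(labels, pred_probs):
    s = self_confidence_scores(labels_i, pred_probs_i)
    print(s, aggregate(s, "min"), aggregate(s, "softmin"))
# Token scores are [0.9, 0.7, 0.95] and [0.8, 0.2]; "min" gives 0.7 and 0.2,
# matching sentence_scores in the doctest above. "softmin" lands near the
# minimum but is nudged by the other tokens' scores.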
- cleanlab.token_classification.rank.issues_from_scores(sentence_scores, *, token_scores=None, threshold=0.1)[source]#
Converts scores output by token_classification.rank.get_label_quality_scores to a list of issues in a format similar to the output of token_classification.filter.find_label_issues. Issues are sorted by label quality score, from most to least severe.
Only tokens with a label quality score lower than threshold are considered issues, so this parameter determines the number of issues that are returned. This method is intended for converting the most severely mislabeled examples to a format compatible with summary methods like token_classification.summary.display_issues. It does not estimate the number of label errors, since the threshold is arbitrary; for that, instead use token_classification.filter.find_label_issues, which estimates the label errors via Confident Learning rather than score thresholding.
- Parameters:
  - sentence_scores (ndarray) – Array of shape (N,) of overall sentence scores, where N is the number of sentences in the dataset. Same format as the sentence_scores returned by token_classification.rank.get_label_quality_scores.
  - token_scores (Optional[list]) – Optional list such that token_scores[i] contains the individual token scores for the i-th sentence. Same format as the token_scores returned by token_classification.rank.get_label_quality_scores.
  - threshold (float) – Tokens (or sentences, if token_scores is not provided) with quality scores at or above the threshold are not included in the result.
- Return type:
  Union[list, ndarray]
- Returns:
  issues – List of label issues identified by comparing quality scores to the threshold, such that each element is a tuple (i, j), indicating that the j-th token of the i-th sentence has a label issue. These tuples are ordered in the issues list based on the token label quality score. Use token_classification.summary.display_issues to view these issues within the original sentences. If token_scores is not provided, returns an array of integer indices (rather than tuples) of the sentences whose label quality score falls below the threshold (also sorted by the overall label quality score of each sentence).
Examples
>>> import numpy as np
>>> from cleanlab.token_classification.rank import issues_from_scores
>>> sentence_scores = np.array([0.1, 0.3, 0.6, 0.2, 0.05, 0.9, 0.8, 0.0125, 0.5, 0.6])
>>> issues_from_scores(sentence_scores)
array([7, 4])
Changing the score threshold
>>> issues_from_scores(sentence_scores, threshold=0.5)
array([7, 4, 0, 3, 1])
Providing token scores along with sentence scores finds issues at the token level
>>> token_scores = [
...     [0.9, 0.6],
...     [0.0, 0.8, 0.8],
...     [0.8, 0.8],
...     [0.1, 0.02, 0.3, 0.4],
...     [0.1, 0.2, 0.03, 0.4],
...     [0.1, 0.2, 0.3, 0.04],
...     [0.1, 0.2, 0.4],
...     [0.3, 0.4],
...     [0.08, 0.2, 0.5, 0.4],
...     [0.1, 0.2, 0.3, 0.4],
... ]
>>> issues_from_scores(sentence_scores, token_scores=token_scores)
[(1, 0), (3, 1), (4, 2), (5, 3), (8, 0)]
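For intuition, the selection rule behind these examples can be sketched as follows (an illustration, not cleanlab's exact implementation): keep only entries whose score falls strictly below threshold, ordered from lowest to highest score.

import numpy as np

def issues_from_scores_sketch(sentence_scores, token_scores=None, threshold=0.1):
    if token_scores is None:
        # Sentence-level: indices of sentences scoring below threshold,
        # most severe (lowest score) first
        order = np.argsort(sentence_scores)
        return order[sentence_scores[order] < threshold]
    # Token-level: (sentence, token) pairs scoring below threshold, sorted by score
    issues = [
        (i, j)
        for i, scores in enumerate(token_scores)
        for j, score in enumerate(scores)
        if score < threshold
    ]
    return sorted(issues, key=lambda ij: token_scores[ij[0]][ij[1]])

sentence_scores = np.array([0.1, 0.3, 0.6, 0.2, 0.05, 0.9, 0.8, 0.0125, 0.5, 0.6])
print(issues_from_scores_sketch(sentence_scores))  # [7 4], matching the doctest above

Note that a score exactly equal to the threshold is excluded: in the first example above, sentence 0 (score 0.1) does not appear in the result at the default threshold=0.1.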