rank
Methods to rank and score sentences in a token classification dataset (text data), based on how likely they are to contain label errors.
The underlying algorithms are described in this paper.
Functions:

get_label_quality_scores
    Returns overall quality scores for the labels in each sentence, as well as for the individual tokens' labels in a token classification dataset.

issues_from_scores
    Converts scores output by get_label_quality_scores into a list of issues, in the same format as output by token_classification.filter.find_label_issues.
cleanlab.token_classification.rank.get_label_quality_scores(labels, pred_probs, *, tokens=None, token_score_method='self_confidence', sentence_score_method='min', sentence_score_kwargs={})
Returns overall quality scores for the labels in each sentence, as well as for the individual tokens' labels in a token classification dataset.

Each score is between 0 and 1. Lower scores indicate token labels that are less likely to be correct, or sentences that are more likely to contain a mislabeled token.

Parameters:
labels (list) – Nested list of given labels for all tokens, such that labels[i] is a list of labels, one for each token in the i-th sentence. For a dataset with K classes, each label must be in 0, 1, …, K-1.

pred_probs (list) – List of np arrays, such that pred_probs[i] has shape (T, K) if the i-th sentence contains T tokens. Each row of pred_probs[i] corresponds to a token t in the i-th sentence, and contains model-predicted probabilities that t belongs to each of the K possible classes. Columns of each pred_probs[i] should be ordered such that the probabilities correspond to class 0, 1, …, K-1.

tokens (Optional[list]) – Nested list such that tokens[i] is a list of tokens (strings/words) that comprise the i-th sentence. These strings are used to annotate the returned token_scores object; see its documentation for more information.

sentence_score_method ({"min", "softmin"}, default "min") – Method to aggregate individual token label quality scores into a single score for the sentence.

- min: sentence score = minimum of the token scores in the sentence.
- softmin: sentence score = <s, softmax(1-s, t)>, where s denotes the token label scores of the sentence, and <a, b> == np.dot(a, b). Here the parameter t controls the softmax temperature, such that the score converges toward min as t -> 0. Unlike min, softmin is affected by the scores of all tokens in the sentence (see the sketch after the Examples below).

token_score_method ({"self_confidence", "normalized_margin", "confidence_weighted_entropy"}, default "self_confidence") – Label quality scoring method for each token. See the cleanlab.rank.get_label_quality_scores documentation for more info.

sentence_score_kwargs (dict) – Optional keyword arguments for the sentence_score_method function (for advanced users only). See cleanlab.token_classification.rank._softmin_sentence_score for more info about keyword arguments supported by that scoring method.
 
Return type:
    Tuple[ndarray, list]

Returns:

- sentence_scores – Array of shape (N,) of scores between 0 and 1, one per sentence in the dataset. Lower scores indicate sentences more likely to contain a label issue.
- token_scores – List of pd.Series, such that token_scores[i] contains the label quality scores for individual tokens in the i-th sentence. If tokens strings were provided, they are used as the index for each Series.
 
Examples

>>> import numpy as np
>>> from cleanlab.token_classification.rank import get_label_quality_scores
>>> labels = [[0, 0, 1], [0, 1]]
>>> pred_probs = [
...     np.array([[0.9, 0.1], [0.7, 0.3], [0.05, 0.95]]),
...     np.array([[0.8, 0.2], [0.8, 0.2]]),
... ]
>>> sentence_scores, token_scores = get_label_quality_scores(labels, pred_probs)
>>> sentence_scores
array([0.7, 0.2])
>>> token_scores
[0    0.90
1    0.70
2    0.95
dtype: float64, 0    0.8
1    0.2
dtype: float64]
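In the example above, the default token_score_method="self_confidence" takes each token's score to be the model-predicted probability of its given label (e.g. pred_probs[0][1, 0] == 0.7 for the second token of the first sentence), and the default sentence_score_method="min" takes the minimum over each sentence. To make the softmin alternative concrete, here is a minimal sketch of that aggregation in NumPy. It is an illustrative reading of the formula <s, softmax(1-s, t)> above, not cleanlab's internal _softmin_sentence_score, and the temperature values are arbitrary choices for demonstration:

import numpy as np

def softmin_score(token_scores, temperature=0.05):
    # Sketch of the softmin aggregation: <s, softmax((1 - s) / t)>.
    # Each token is weighted by a softmax over (1 - score), so the
    # lowest-scoring (most suspicious) tokens dominate the average.
    s = np.asarray(token_scores, dtype=float)
    w = np.exp((1.0 - s) / temperature)
    w /= w.sum()
    return float(np.dot(s, w))

scores = [0.9, 0.7, 0.95]
softmin_score(scores, temperature=1.0)   # ~0.84, near the mean of the scores
softmin_score(scores, temperature=0.01)  # ~0.70, converges toward min(scores)

As t -> 0 the softmax weight concentrates entirely on the lowest token score, recovering the min method; larger t lets every token's score influence the sentence score.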
cleanlab.token_classification.rank.issues_from_scores(sentence_scores, *, token_scores=None, threshold=0.1)
Converts scores output by token_classification.rank.get_label_quality_scores into a list of issues, in the same format as output by token_classification.filter.find_label_issues.

Issues are sorted by label quality score, from most to least severe.

Only tokens with a label quality score lower than threshold are considered issues, so this parameter determines the number of issues that are returned. This method is intended for converting the most severely mislabeled examples into a format compatible with summary methods like token_classification.summary.display_issues. It does not estimate the number of label errors, since the threshold is arbitrary; to estimate the label errors via Confident Learning rather than score thresholding, instead use token_classification.filter.find_label_issues.

Parameters:
sentence_scores (ndarray) – Array of shape (N,) of overall sentence scores, where N is the number of sentences in the dataset. Same format as the sentence_scores returned by token_classification.rank.get_label_quality_scores.

token_scores (Optional[list]) – Optional list such that token_scores[i] contains the individual token scores for the i-th sentence. Same format as the token_scores returned by token_classification.rank.get_label_quality_scores.

threshold (float) – Tokens (or sentences, if token_scores is not provided) with quality scores above the threshold are not included in the result.
 
Return type:
    Union[list, ndarray]

Returns:

issues – List of label issues identified by comparing quality scores to the threshold, such that each element is a tuple (i, j), indicating that the j-th token of the i-th sentence has a label issue. These tuples are ordered in the issues list by token label quality score. Use token_classification.summary.display_issues to view these issues within the original sentences. If token_scores is not provided, instead returns an array of integer indices (rather than tuples) of the sentences whose label quality scores fall below the threshold (also sorted by the overall label quality score of each sentence).
Examples

>>> import numpy as np
>>> from cleanlab.token_classification.rank import issues_from_scores
>>> sentence_scores = np.array([0.1, 0.3, 0.6, 0.2, 0.05, 0.9, 0.8, 0.0125, 0.5, 0.6])
>>> issues_from_scores(sentence_scores)
array([7, 4])

Changing the score threshold

>>> issues_from_scores(sentence_scores, threshold=0.5)
array([7, 4, 0, 3, 1])

Providing token scores along with sentence scores finds issues at the token level

>>> token_scores = [
...     [0.9, 0.6],
...     [0.0, 0.8, 0.8],
...     [0.8, 0.8],
...     [0.1, 0.02, 0.3, 0.4],
...     [0.1, 0.2, 0.03, 0.4],
...     [0.1, 0.2, 0.3, 0.04],
...     [0.1, 0.2, 0.4],
...     [0.3, 0.4],
...     [0.08, 0.2, 0.5, 0.4],
...     [0.1, 0.2, 0.3, 0.4],
... ]
>>> issues_from_scores(sentence_scores, token_scores=token_scores)
[(1, 0), (3, 1), (4, 2), (5, 3), (8, 0)]
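The two functions in this module are designed to compose: score the dataset with get_label_quality_scores, then convert the lowest scores into a reviewable issue list with issues_from_scores. A minimal end-to-end sketch, reusing the toy labels and pred_probs from the first example (the threshold of 0.5 here is an arbitrary choice for illustration):

import numpy as np
from cleanlab.token_classification.rank import (
    get_label_quality_scores,
    issues_from_scores,
)

# Toy dataset: 2 sentences, K=2 classes (values are illustrative only).
labels = [[0, 0, 1], [0, 1]]
pred_probs = [
    np.array([[0.9, 0.1], [0.7, 0.3], [0.05, 0.95]]),
    np.array([[0.8, 0.2], [0.8, 0.2]]),
]

# Step 1: per-sentence and per-token label quality scores.
sentence_scores, token_scores = get_label_quality_scores(labels, pred_probs)

# Step 2: threshold the scores into (sentence_index, token_index) issues.
# Passing token_scores yields token-level issues, sorted most severe first.
issues = issues_from_scores(sentence_scores, token_scores=token_scores, threshold=0.5)
print(issues)  # expected: [(1, 1)], since token 1 of sentence 1 scores 0.2 < 0.5

To inspect such issues in context, pass them to token_classification.summary.display_issues as noted above.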