span_classification
Methods to find label issues in span classification datasets (text data), where each token in a sentence receives one or more class labels.
The underlying label error detection algorithms are in cleanlab.token_classification.
Functions:
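find_label_issues | Identifies tokens with label issues in a span classification dataset. |
display_issues | See documentation of token_classification.summary.display_issues for description. |
get_label_quality_scores | See documentation of token_classification.rank.get_label_quality_scores for description. |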
- cleanlab.experimental.span_classification.find_label_issues(labels, pred_probs)
Identifies tokens with label issues in a span classification dataset.
Tokens identified with issues will be ranked by their individual label quality score.
To rank the sentences based on their overall label quality, use experimental.span_classification.get_label_quality_scores.
- Parameters:
labels (list) – Nested list of given labels for all tokens. Refer to documentation for this argument in token_classification.filter.find_label_issues for further details.
Note: Currently, only a single span class is supported.
pred_probs (list) – An array of shape (T, K) of model-predicted class probabilities. Refer to documentation for this argument in token_classification.filter.find_label_issues for further details.
- Returns:
issues – List of label issues identified by cleanlab, such that each element is a tuple (i, j), which indicates that the j-th token of the i-th sentence has a label issue.
These tuples are ordered in the issues list by the likelihood that the corresponding token is mislabeled.
Use experimental.span_classification.display_issues to view these issues within the original sentences.
Examples
>>> import numpy as np
>>> from cleanlab.experimental.span_classification import find_label_issues
>>> labels = [[0, 0, 1, 1], [1, 1, 0]]
>>> pred_probs = [
...     np.array([0.9, 0.9, 0.9, 0.1]),
...     np.array([0.1, 0.1, 0.9]),
... ]
>>> find_label_issues(labels, pred_probs)
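The (i, j) tuples returned by find_label_issues index into the nested labels list. As a minimal sketch, assuming a hypothetical tokens list and a hand-written issues list (illustrative values, not actual cleanlab output), the flagged tokens can be looked up as follows. The helper describe_issues is not part of cleanlab.

```python
# Hypothetical inputs for illustration only. `issues` is hand-written here,
# not the actual output of find_label_issues.
tokens = [["the", "red", "fox", "ran"], ["a", "dog", "sat"]]
labels = [[0, 0, 1, 1], [1, 1, 0]]
issues = [(0, 3), (1, 2)]  # assumed (sentence index, token index) pairs


def describe_issues(issues, tokens, labels):
    """Return one (sentence index, token index, token, given label) row per issue."""
    rows = []
    for i, j in issues:
        # i selects the sentence, j selects the token within that sentence.
        rows.append((i, j, tokens[i][j], labels[i][j]))
    return rows


for row in describe_issues(issues, tokens, labels):
    print(row)
```

Because the tuples are ordered by likelihood of being mislabeled, iterating over them in order visits the most suspicious tokens first.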
- cleanlab.experimental.span_classification.display_issues(issues, tokens, *, labels=None, pred_probs=None, exclude=[], class_names=None, top=20)
See documentation of token_classification.summary.display_issues for description.
- Return type:
None
- cleanlab.experimental.span_classification.get_label_quality_scores(labels, pred_probs, **kwargs)
See documentation of token_classification.rank.get_label_quality_scores for description.
- Return type:
Tuple[ndarray, list]
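The return type suggests a pair: an array of per-sentence scores and a list of per-token scores. As a rough sketch, assuming that lower scores indicate lower label quality and using made-up score values rather than real output, sentences could be ranked like this. The helper rank_sentences is hypothetical and not part of cleanlab.

```python
# Made-up scores standing in for the per-sentence element of the returned tuple.
sentence_scores = [0.82, 0.10, 0.47]


def rank_sentences(scores):
    """Return sentence indices ordered from lowest to highest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Under the stated assumption, the first index is the sentence to review first.
print(rank_sentences(sentence_scores))
```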