span_classification

Methods to find label issues in span classification datasets (text data), in which each token in a sentence receives one or more class labels.

The underlying label error detection algorithms are in cleanlab.token_classification.
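Because the span functions delegate to cleanlab.token_classification, span predictions must be expressed as per-token class probabilities. The following is a minimal sketch of that conversion. It assumes, as the find_label_issues example suggests, that each sentence's array holds one probability per token, interpreted here as the probability that the token belongs to the span. The helper name `span_to_token_pred_probs` is hypothetical and is not part of cleanlab's public API.

```python
import numpy as np
from typing import List


def span_to_token_pred_probs(pred_probs: List[np.ndarray]) -> List[np.ndarray]:
    """Hypothetical helper: turn per-token span probabilities p into
    two-column arrays whose rows are [1 - p, p]."""
    token_pred_probs = []
    for p in pred_probs:
        p = np.asarray(p, dtype=float)
        # Column 0: token is outside the span; column 1: token is inside it.
        token_pred_probs.append(np.stack([1.0 - p, p], axis=1))
    return token_pred_probs


converted = span_to_token_pred_probs(
    [np.array([0.9, 0.9, 0.9, 0.1]), np.array([0.1, 0.1, 0.9])]
)
print(converted[0].shape)  # (4, 2): one row per token, one column per class
```

Each row sums to 1, so the result has the (tokens × classes) layout that token-level label quality methods typically expect.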

Functions:

find_label_issues(labels, pred_probs)

Identifies tokens with label issues in a span classification dataset.

display_issues(issues, tokens, *[, labels, ...])

See the documentation of token_classification.summary.display_issues for a description.

get_label_quality_scores(labels, pred_probs, ...)

See the documentation of token_classification.rank.get_label_quality_scores for a description.

cleanlab.experimental.span_classification.find_label_issues(labels, pred_probs)

Identifies tokens with label issues in a span classification dataset.

Tokens identified with issues will be ranked by their individual label quality score.
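One way to picture this ranking is to score each token by its self-confidence (the predicted probability of its given label) and sort tokens from lowest to highest score. The sketch below assumes 1-D per-token span probabilities, as in the example further down, and uses self-confidence as the score. It ranks every token rather than first filtering to flagged issues, so it illustrates only the ordering, not cleanlab's issue-detection step. The function name `rank_tokens_by_self_confidence` is hypothetical.

```python
import numpy as np
from typing import List, Tuple


def rank_tokens_by_self_confidence(
    labels: List[List[int]], pred_probs: List[np.ndarray]
) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (sentence, token) pairs sorted so the token
    whose given label the model finds least likely comes first."""
    scored = []
    for i, (sent_labels, sent_probs) in enumerate(zip(labels, pred_probs)):
        p_span = np.asarray(sent_probs, dtype=float)
        for j, label in enumerate(sent_labels):
            # Self-confidence: probability of the given label (1 = in span, 0 = outside).
            score = p_span[j] if label == 1 else 1.0 - p_span[j]
            scored.append((score, i, j))
    scored.sort(key=lambda t: t[0])  # stable sort: ties keep sentence/token order
    return [(i, j) for _, i, j in scored]


labels = [[0, 1], [1, 0]]
pred_probs = [np.array([0.8, 0.7]), np.array([0.4, 0.05])]
ranking = rank_tokens_by_self_confidence(labels, pred_probs)
print(ranking)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Here token (0, 0) ranks first because it is labeled 0 (outside the span) while the model gives it a 0.8 probability of being inside.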

To rank sentences by their overall label quality, use experimental.span_classification.get_label_quality_scores.

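Parameters:

labels – Nested list of given labels for all tokens, such that labels[i][j] is the given label of the j-th token of the i-th sentence. See token_classification.filter.find_label_issues for further details.

pred_probs – List of model-predicted probabilities, one NumPy array per sentence. In the example below, each array holds one probability per token.

Returns: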

issues – List of label issues identified by cleanlab, such that each element is a tuple (i, j), which indicates that the j-th token of the i-th sentence has a label issue.

These tuples are ordered in the issues list by the likelihood that the corresponding token is mislabeled.

Use experimental.span_classification.display_issues to view these issues within the original sentences.
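The issues list is just (i, j) index pairs, so mapping them back to the text means indexing into the tokenized sentences. The following is a minimal sketch. It assumes `tokens` is a list of token lists aligned with `labels`. The helper `show_issues_in_context` and its bracket markup are hypothetical, and this is not how display_issues formats its output.

```python
from typing import List, Tuple


def show_issues_in_context(
    issues: List[Tuple[int, int]], tokens: List[List[str]]
) -> List[str]:
    """Hypothetical helper: render each flagged token wrapped in brackets
    inside its original sentence, in the order given by `issues`."""
    rendered = []
    for i, j in issues:
        words = list(tokens[i])  # copy so the caller's tokens are not modified
        words[j] = f"[{words[j]}]"
        rendered.append(f"sentence {i}, token {j}: " + " ".join(words))
    return rendered


tokens = [["Alice", "met", "Bob", "today"], ["Paris", "is", "big"]]
for line in show_issues_in_context([(0, 3), (1, 0)], tokens):
    print(line)
# sentence 0, token 3: Alice met Bob [today]
# sentence 1, token 0: [Paris] is big
```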

Examples

>>> import numpy as np
>>> from cleanlab.experimental.span_classification import find_label_issues
>>> labels = [[0, 0, 1, 1], [1, 1, 0]]
>>> pred_probs = [
...     np.array([0.9, 0.9, 0.9, 0.1]),
...     np.array([0.1, 0.1, 0.9]),
... ]
>>> find_label_issues(labels, pred_probs)

cleanlab.experimental.span_classification.display_issues(issues, tokens, *, labels=None, pred_probs=None, exclude=[], class_names=None, top=20)

See the documentation of token_classification.summary.display_issues for a description.

Return type:

None

cleanlab.experimental.span_classification.get_label_quality_scores(labels, pred_probs, **kwargs)

See the documentation of token_classification.rank.get_label_quality_scores for a description.

Return type:

Tuple[ndarray, list]
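The return type pairs an array of per-sentence scores with a list of per-token scores. The following minimal sketch illustrates that shape. It makes three assumptions: 1-D span probabilities as in the find_label_issues example, self-confidence as the token score, and the minimum token score as the sentence score. token_classification.rank.get_label_quality_scores exposes options for both scoring choices, so check its documentation for the defaults. The function name `sketch_label_quality_scores` is hypothetical. It returns plain NumPy arrays for the token scores, which may differ from the objects cleanlab returns.

```python
import numpy as np
from typing import List, Tuple


def sketch_label_quality_scores(
    labels: List[List[int]], pred_probs: List[np.ndarray]
) -> Tuple[np.ndarray, List[np.ndarray]]:
    """Hypothetical helper: per-token self-confidence scores and
    per-sentence scores taken as the minimum token score."""
    token_scores = []
    for sent_labels, sent_probs in zip(labels, pred_probs):
        p_span = np.asarray(sent_probs, dtype=float)
        given = np.asarray(sent_labels)
        # Probability of the given label for every token in the sentence.
        token_scores.append(np.where(given == 1, p_span, 1.0 - p_span))
    # A sentence is scored by its worst-scoring token.
    sentence_scores = np.array([scores.min() for scores in token_scores])
    return sentence_scores, token_scores


labels = [[0, 1, 1], [1, 0]]
pred_probs = [np.array([0.2, 0.9, 0.6]), np.array([0.3, 0.1])]
sentence_scores, token_scores = sketch_label_quality_scores(labels, pred_probs)
ranked_sentences = np.argsort(sentence_scores)  # lowest-quality sentence first
```

With these inputs, sentence 1 scores about 0.3 (its first token is labeled as in the span but predicted at 0.3) and ranks ahead of sentence 0, which scores about 0.6.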