span_classification

Methods to find label issues in span classification datasets (text data), where each token in a sentence receives one or more class labels.

The underlying label error detection algorithms are in cleanlab.token_classification.

Functions:

`find_label_issues`(labels, pred_probs)	Identifies tokens with label issues in a span classification dataset.
`display_issues`(issues, tokens, *[, labels, ...])	See documentation of `token_classification.summary.display_issues` for description.
`get_label_quality_scores`(labels, pred_probs, ...)	See documentation of `token_classification.rank.get_label_quality_scores` for description.

cleanlab.experimental.span_classification.find_label_issues(labels, pred_probs)

Identifies tokens with label issues in a span classification dataset.

Tokens identified with issues will be ranked by their individual label quality score.

To rank sentences by their overall label quality, use experimental.span_classification.get_label_quality_scores.

Parameters:

labels (list) –

Nested list of given labels for all tokens.
Refer to documentation for this argument in token_classification.filter.find_label_issues for further details.

Note: Currently, only a single span class is supported.

pred_probs (list) – An array of shape (T, K) of model-predicted class probabilities. Refer to documentation for this argument in token_classification.filter.find_label_issues for further details.
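The relationship between per-token span probabilities and a (T, K) array can be illustrated with a minimal sketch. It assumes that, with a single span class, each sentence's predictions are one probability per token (as in the Examples section) and that K = 2 columns mean [P(not span), P(span)]. The helper `to_two_class` is hypothetical and not part of cleanlab, and plain lists stand in for NumPy arrays.

```python
# Hypothetical helper, not part of cleanlab: expands per-token span
# probabilities into a two-column [P(not span), P(span)] layout.
def to_two_class(span_probs):
    return [[1.0 - p, p] for p in span_probs]


# Assumed input layout: one probability per token, one list per sentence.
pred_probs = [
    [0.9, 0.9, 0.9, 0.1],  # sentence 0, four tokens
    [0.1, 0.1, 0.9],       # sentence 1, three tokens
]

two_class = [to_two_class(sentence) for sentence in pred_probs]

# Each sentence now has T rows (tokens) and K = 2 columns (classes).
shapes = [(len(rows), len(rows[0])) for rows in two_class]
print(shapes)
```

Under these assumptions, every row sums to 1 and the second column carries the original span probability.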

Returns:

issues – List of label issues identified by cleanlab, such that each element is a tuple (i, j), which indicates that the j-th token of the i-th sentence has a label issue.

These tuples are ordered in the issues list by the likelihood that the corresponding token is mislabeled.

Use experimental.span_classification.display_issues to view these issues within the original sentences.

Examples

>>> import numpy as np
>>> from cleanlab.experimental.span_classification import find_label_issues
>>> labels = [[0, 0, 1, 1], [1, 1, 0]]
>>> pred_probs = [
...     np.array([0.9, 0.9, 0.9, 0.1]),
...     np.array([0.1, 0.1, 0.9]),
... ]
>>> find_label_issues(labels, pred_probs)
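As a rough sketch of how the returned (i, j) tuples can be read, the snippet below maps each tuple back to its token and given label. The `issues` and `tokens` values are made up for illustration and are not actual output of `find_label_issues`; `describe_issues` is a hypothetical helper, not a cleanlab function.

```python
# Illustrative values only, not actual output of find_label_issues.
issues = [(1, 2), (0, 3)]

# Hypothetical tokens aligned with the labels from the example above.
tokens = [
    ["She", "visited", "New", "York"],
    ["Paris", "is", "lovely"],
]
labels = [[0, 0, 1, 1], [1, 1, 0]]


def describe_issues(issues, tokens, labels):
    """Return (sentence index, token index, token text, given label) per issue."""
    return [(i, j, tokens[i][j], labels[i][j]) for i, j in issues]


for row in describe_issues(issues, tokens, labels):
    print(row)
```

Here the tuple (1, 2) refers to the third token of the second sentence, since both indices are zero-based.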

cleanlab.experimental.span_classification.display_issues(issues, tokens, *, labels=None, pred_probs=None, exclude=[], class_names=None, top=20)

See documentation of token_classification.summary.display_issues for description.

Return type: None

cleanlab.experimental.span_classification.get_label_quality_scores(labels, pred_probs, **kwargs)

See documentation of token_classification.rank.get_label_quality_scores for description.

Return type: Tuple[ndarray, list]