summary

Methods to display images and their label issues in a semantic segmentation dataset, and to summarize the overall types of issues identified.

Functions:

display_issues(issues, *[, labels, ...])

Display semantic segmentation label issues, showing images with problematic pixels highlighted.

common_label_issues(issues, labels, ...[, ...])

Display the frequency with which labels are swapped in the dataset.

filter_by_class(class_index, issues, labels, ...)

Return label issues involving a particular class.

cleanlab.segmentation.summary.display_issues(issues, *, labels=None, pred_probs=None, class_names=None, exclude=None, top=None)

Display semantic segmentation label issues, showing images with problematic pixels highlighted.

Can also show the given and predicted masks for each image identified as having a label issue.

Parameters:
  • issues (ndarray) –

    Boolean mask of shape (N,H,W) for the entire dataset, where True represents a pixel label issue and False represents a pixel that is accurately labeled.

    Same format as the output of segmentation.filter.find_label_issues or segmentation.rank.issues_from_scores.

  • labels (Optional[ndarray]) – Optional discrete array of noisy labels for a semantic segmentation dataset, in the shape (N,H,W,), where each pixel must be an integer in 0, 1, …, K-1. If labels is provided, this function also displays the given label of each pixel identified as having an issue. Refer to documentation for this argument in find_label_issues for more information.

  • pred_probs (Optional[ndarray]) –

    Optional array of shape (N,K,H,W,) of model-predicted class probabilities. If pred_probs is provided, this function also displays the predicted label of each pixel identified as having an issue. Refer to documentation for this argument in find_label_issues for more information.

    Tip

    If your labels are one-hot encoded, convert them to integer labels with np.argmax(labels_one_hot, axis=1) before passing them to this function, assuming labels_one_hot has shape (N,K,H,W).

  • class_names (Optional[List[str]]) –

    Optional list of strings, where each string represents the name of a class in the semantic segmentation problem. The order of the names should correspond to the numerical order of the classes. The list length should be equal to the number of unique classes present in the labels. If provided, this function will generate a legend showing the color mapping of each class in the provided colormap.

    Example: If there are three classes in your labels, represented by 0, 1, 2, then class_names might look like this:

    class_names = ['background', 'person', 'dog']
    

  • top (Optional[int]) – Optional maximum number of issues to be printed. If not provided, a good default is used.

  • exclude (Optional[List[int]]) – Optional list of label classes to ignore when reporting errors; each element must be in 0, 1, …, K-1.

Return type:

None
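The Tip above and the issues mask can be illustrated with plain NumPy. The sketch below assumes one-hot labels of shape (N,K,H,W) and a boolean issues mask of shape (N,H,W); one_hot_to_labels and images_with_issues are hypothetical helpers, not part of cleanlab. The second helper only lists images containing at least one flagged pixel; it is an assumption about which images are worth inspecting, not a description of how display_issues selects or orders them.

```python
import numpy as np


def one_hot_to_labels(labels_one_hot: np.ndarray) -> np.ndarray:
    """Collapse one-hot masks of shape (N, K, H, W) into integer labels of shape (N, H, W)."""
    if labels_one_hot.ndim != 4:
        raise ValueError("expected one-hot labels of shape (N, K, H, W)")
    # Axis 1 is the class axis, so argmax over it gives the class index of each pixel.
    return np.argmax(labels_one_hot, axis=1)


def images_with_issues(issues: np.ndarray) -> np.ndarray:
    """Hypothetical helper: indices of images that contain at least one flagged pixel."""
    return np.flatnonzero(issues.any(axis=(1, 2)))


# Toy dataset: N=2 images, K=3 classes, H=W=2 pixels.
labels_int = np.array([[[0, 1], [2, 0]],
                       [[1, 1], [0, 2]]])
# np.eye gives (N, H, W, K); move the class axis to position 1 to get (N, K, H, W).
labels_one_hot = np.eye(3, dtype=int)[labels_int].transpose(0, 3, 1, 2)
labels = one_hot_to_labels(labels_one_hot)

issues = np.zeros((2, 2, 2), dtype=bool)
issues[1, 0, 1] = True  # flag one pixel in the second image

print(labels.shape)                # (2, 2, 2)
print(images_with_issues(issues))  # [1]
```

The resulting integer labels array can then be passed as display_issues(issues, labels=labels, pred_probs=pred_probs, class_names=class_names).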

cleanlab.segmentation.summary.common_label_issues(issues, labels, pred_probs, *, class_names=None, exclude=None, top=None, verbose=True)

Display the frequency with which labels are swapped in the dataset.

These may correspond to pixels that are ambiguous or systematically misunderstood by the data annotators.

  • N - Number of images in the dataset

  • K - Number of classes in the dataset

  • H - Height of each image

  • W - Width of each image
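The N, K, H, W notation fixes how the three array arguments must line up. The following is a minimal sketch of a shape check, assuming integer labels and probabilities laid out as (N,K,H,W); infer_dims is a hypothetical helper, and cleanlab may perform its own, different validation.

```python
import numpy as np


def infer_dims(issues, labels, pred_probs):
    """Hypothetical helper: return (N, K, H, W) after checking the three arrays agree."""
    N, K, H, W = pred_probs.shape
    if labels.shape != (N, H, W):
        raise ValueError(f"labels must have shape {(N, H, W)}, got {labels.shape}")
    if issues.shape != (N, H, W) or issues.dtype != bool:
        raise ValueError(f"issues must be a boolean mask of shape {(N, H, W)}")
    if labels.min() < 0 or labels.max() >= K:
        raise ValueError("labels must be integers in 0, 1, ..., K-1")
    return N, K, H, W


rng = np.random.default_rng(0)
# Dirichlet samples sum to 1 over the last axis; move that class axis to position 1.
pred_probs = rng.dirichlet(np.ones(3), size=(4, 5, 6)).transpose(0, 3, 1, 2)
labels = pred_probs.argmax(axis=1)            # (N, H, W) = (4, 5, 6)
issues = np.zeros(labels.shape, dtype=bool)   # no issues flagged in this toy example

print(infer_dims(issues, labels, pred_probs))  # (4, 3, 5, 6)
```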

Parameters:
  • issues (ndarray) –

    Boolean mask of shape (N,H,W) for the entire dataset, where True represents a pixel label issue and False represents a pixel that is accurately labeled.

    Same format as the output of segmentation.filter.find_label_issues or segmentation.rank.issues_from_scores.

  • labels (ndarray) – A discrete array of noisy labels for a semantic segmentation dataset, in the shape (N,H,W,), where each pixel must be an integer in 0, 1, …, K-1. Refer to documentation for this argument in find_label_issues for more information.

  • pred_probs (ndarray) –

    An array of shape (N,K,H,W,) of model-predicted class probabilities. Refer to documentation for this argument in find_label_issues for more information.

    Tip

    If your labels are one-hot encoded, convert them to integer labels with np.argmax(labels_one_hot, axis=1) before passing them to this function, assuming labels_one_hot has shape (N,K,H,W).

  • class_names (Optional[List[str]]) – Optional length K list of names of each class, such that class_names[i] is the string name of the class corresponding to labels with value i. If class_names is provided, display these string names for predicted and given labels, otherwise display the integer index of classes.

  • exclude (Optional[List[int]]) – Optional list of label classes that can be ignored in the errors, each element must be in 0, 1, …, K-1.

  • top (Optional[int]) – Optional maximum number of label swaps to print information for. If not provided, a good default is used.

  • verbose (bool) – Set to False to suppress all print statements.

Return type:

DataFrame

Returns:

issues_df – DataFrame with columns ['given_label', 'predicted_label', 'num_label_issues'] where each row contains information about a particular given/predicted label swap. Rows are ordered by the number of label issues inferred to exhibit this type of label swap.
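To make the returned table concrete, here is a minimal sketch of a tally of (given, predicted) label pairs over flagged pixels. It assumes the predicted label of a pixel is the argmax of pred_probs over the class axis and ignores exclude; count_label_swaps is hypothetical and is not cleanlab's implementation. The toy issues mask simply flags every pixel where the given and predicted labels disagree, standing in for the output of find_label_issues.

```python
from collections import Counter

import numpy as np


def count_label_swaps(issues, labels, pred_probs, class_names=None, top=None):
    """Hypothetical sketch: tally (given, predicted) label pairs over flagged pixels.

    Returns a list of dicts keyed like the documented issues_df columns,
    sorted by num_label_issues in descending order.
    """
    predicted = pred_probs.argmax(axis=1)  # (N, H, W): most likely class per pixel
    # Boolean indexing with the (N, H, W) mask keeps only flagged pixels.
    pairs = zip(labels[issues].tolist(), predicted[issues].tolist())
    counts = Counter(pairs)

    def name(i):
        return class_names[i] if class_names is not None else i

    # most_common(None) returns every pair; an int keeps only the top rows.
    return [
        {"given_label": name(g), "predicted_label": name(p), "num_label_issues": n}
        for (g, p), n in counts.most_common(top)
    ]


labels = np.array([[[0, 1], [2, 0]]])      # N=1, H=W=2
predicted = np.array([[[1, 1], [0, 1]]])
pred_probs = np.eye(3)[predicted].transpose(0, 3, 1, 2)  # (N, K, H, W) with K=3
issues = labels != predicted               # toy mask: flag every disagreeing pixel

rows = count_label_swaps(issues, labels, pred_probs,
                         class_names=["background", "person", "dog"])
for row in rows:
    print(row)
# {'given_label': 'background', 'predicted_label': 'person', 'num_label_issues': 2}
# {'given_label': 'dog', 'predicted_label': 'background', 'num_label_issues': 1}
```

Passing rows to pandas.DataFrame gives a table with the same three columns as issues_df.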

cleanlab.segmentation.summary.filter_by_class(class_index, issues, labels, pred_probs)

Return label issues involving a particular class. Note that this includes errors where the given label is the class of interest and the predicted label is any other class.

Parameters:
  • class_index (int) – The specific class you are interested in.

  • issues (ndarray) –

    Boolean mask of shape (N,H,W) for the entire dataset, where True represents a pixel label issue and False represents a pixel that is accurately labeled.

    Same format as the output of segmentation.filter.find_label_issues or segmentation.rank.issues_from_scores.

  • labels (ndarray) – A discrete array of noisy labels for a semantic segmentation dataset, in the shape (N,H,W,), where each pixel must be an integer in 0, 1, …, K-1. Refer to documentation for this argument in find_label_issues for further details.

  • pred_probs (ndarray) – An array of shape (N,K,H,W,) of model-predicted class probabilities. Refer to documentation for this argument in find_label_issues for further details.

Return type:

ndarray

Returns:

issues_subset – Boolean mask where True represents a pixel label issue involving the class of interest and False represents all other pixels.

Returned mask shows all instances that involve the particular class of interest.
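A minimal sketch of such a filter, assuming that a pixel "involves" a class when either its given label or its predicted label (argmax of pred_probs over the class axis) equals class_index. The description above only states the given-label case explicitly, so treat the predicted-label branch as an assumption; filter_by_class_sketch is a hypothetical name, not cleanlab's function.

```python
import numpy as np


def filter_by_class_sketch(class_index, issues, labels, pred_probs):
    """Hypothetical sketch: keep flagged pixels whose given or predicted label is class_index."""
    predicted = pred_probs.argmax(axis=1)  # (N, H, W) predicted class per pixel
    involves_class = (labels == class_index) | (predicted == class_index)
    # Same shape as `issues`; True only where a flagged pixel involves the class.
    return issues & involves_class


labels = np.array([[[0, 1], [2, 0]]])
predicted = np.array([[[1, 1], [0, 1]]])
pred_probs = np.eye(3)[predicted].transpose(0, 3, 1, 2)  # (N, K, H, W) with K=3
issues = labels != predicted  # toy mask: flag every disagreeing pixel

print(filter_by_class_sketch(2, issues, labels, pred_probs).astype(int).tolist())
# [[[0, 0], [1, 0]]]
print(filter_by_class_sketch(1, issues, labels, pred_probs).astype(int).tolist())
# [[[1, 0], [0, 1]]]
```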