rank

Methods to rank and score images in a semantic segmentation dataset based on how likely they are to contain mislabeled pixels.

Functions:

get_label_quality_scores(labels, pred_probs, *)

Returns a label quality score for each image.

issues_from_scores(image_scores[, ...])

Converts scores output by get_label_quality_scores to a list of issues of similar format as output by segmentation.filter.find_label_issues.

cleanlab.segmentation.rank.get_label_quality_scores(labels, pred_probs, *, method='softmin', batch_size=None, n_jobs=None, verbose=True, **kwargs)

Returns a label quality score for each image.

Computes label quality scores for semantic segmentation datasets, where lower scores indicate labels that are less likely to be correct.

  • N - Number of images in the dataset

  • K - Number of classes in the dataset

  • H - Height of each image

  • W - Width of each image

Parameters:
  • labels (ndarray) – A discrete array of noisy labels for a semantic segmentation dataset, of shape (N, H, W), where each pixel must be an integer in 0, 1, …, K-1. Refer to documentation for this argument in find_label_issues for further details.

  • pred_probs (ndarray) – An array of shape (N, K, H, W) of model-predicted class probabilities. Refer to documentation for this argument in find_label_issues for further details.

  • method ({"softmin", "num_pixel_issues"}, default "softmin") –

    Label quality scoring method.

    • "softmin" - Calculates the inner product between scores and softmax(1-scores). Prefer this method over "num_pixel_issues" for efficiency.

    • "num_pixel_issues" - Uses the number of pixels with label issues in each image, as identified by find_label_issues.

  • batch_size (Optional[int]) – Optional size of mini-batches to use for estimating the label issues for ‘num_pixel_issues’ only, not ‘softmin’. To maximize efficiency, try to use the largest batch_size your memory allows. If not provided, a good default is used.

  • n_jobs (Optional[int]) – Optional number of processes for multiprocessing (default value = 1). Only used on Linux, and only for ‘num_pixel_issues’, not ‘softmin’. If n_jobs=None, uses the number of physical cores if psutil is installed, or the number of logical cores otherwise.

  • verbose (bool) – Set to False to suppress all print statements.

  • **kwargs

    • downsample (int) –

      Factor to shrink labels and pred_probs by, for ‘num_pixel_issues’ only, not ‘softmin’. Default: 16. Must evenly divide the dimensions of both labels and pred_probs. Larger values of downsample produce faster runtimes but potentially less accurate results due to over-compression. Set to 1 to avoid any downsampling.

    • temperature (float) –

      Temperature for softmin. Default: 0.1.
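The "softmin" description above can be sketched in plain NumPy. This is a minimal illustration and not cleanlab's source: the helper name softmin_image_score is hypothetical, and the way temperature enters (dividing the softmax logits) is an assumption based on the parameter description.

```python
import numpy as np


def softmin_image_score(pixel_scores, temperature=0.1):
    """Illustrative softmin aggregation of one image's per-pixel scores."""
    flat = np.asarray(pixel_scores, dtype=float).ravel()
    # softmax(1 - scores) with a temperature (assumed placement):
    # low-scoring pixels receive large weights.
    logits = (1.0 - flat) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    # Inner product between the scores and their softmax weights.
    return float(np.dot(flat, weights))


clean = np.full((4, 4), 0.9)  # every pixel looks correctly labeled
noisy = clean.copy()
noisy[0, 0] = 0.1             # one suspicious pixel

print(softmin_image_score(clean))
print(softmin_image_score(noisy))
```

With a small temperature, the aggregate is pulled toward the lowest pixel score, so a single badly scored pixel lowers the image score far more than a plain average would.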

Return type:

Tuple[ndarray, ndarray]

Returns:

  • image_scores – Array of shape (N,) of scores between 0 and 1, one per image in the dataset. Lower scores indicate images more likely to contain a label issue.

  • pixel_scores – Array of shape (N,H,W) of scores between 0 and 1, one per pixel in the dataset.
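Because lower scores flag likelier label issues, the two returned arrays are typically consumed by sorting. The sketch below uses made-up score values in place of real output from get_label_quality_scores; only the shapes follow the description above.

```python
import numpy as np

# Made-up scores for N=3 images of size H=2, W=2 (illustrative values only).
image_scores = np.array([0.82, 0.31, 0.95])
pixel_scores = np.array([
    [[0.90, 0.80], [0.70, 0.90]],
    [[0.20, 0.90], [0.05, 0.60]],
    [[0.99, 0.97], [0.90, 0.95]],
])

# Ascending sort puts the images most likely to contain label issues first.
ranked_images = np.argsort(image_scores)

# Locate the lowest-scoring pixel inside the worst-ranked image.
worst_image = ranked_images[0]
worst_pixels = pixel_scores[worst_image]
row, col = np.unravel_index(np.argmin(worst_pixels), worst_pixels.shape)

print(ranked_images, worst_image, (row, col))
```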

cleanlab.segmentation.rank.issues_from_scores(image_scores, pixel_scores=None, threshold=0.1)

Converts scores output by cleanlab.segmentation.rank.get_label_quality_scores to a list of issues of similar format as output by segmentation.filter.find_label_issues.

Only pixels (or images, if pixel_scores is not provided) with a label quality score lower than threshold are considered issues, so this parameter determines the number of issues that are returned.

Note

  • This method is intended for converting the most severely mislabeled examples into a format compatible with summary methods like segmentation.summary.display_issues.

  • This method does not estimate the number of label errors, since the threshold is arbitrary. For that, use segmentation.filter.find_label_issues instead, which estimates the label errors via Confident Learning rather than score thresholding.

Parameters:
  • image_scores (ndarray) – Array of shape (N,) of overall image scores, where N is the number of images in the dataset. Same format as the image_scores returned by cleanlab.segmentation.rank.get_label_quality_scores.

  • pixel_scores (Optional[ndarray]) – Optional array of shape (N, H, W) of scores between 0 and 1, one per pixel in the dataset. Same format as the pixel_scores returned by cleanlab.segmentation.rank.get_label_quality_scores.

  • threshold (float) – Optional quality score threshold that determines which pixels are included in the result. Pixels with quality scores above the threshold are not included in the result. If not provided, all pixels are included in the result.

Return type:

ndarray

Returns:

issues – Boolean mask for the entire dataset, where True marks a pixel with a label issue and False marks a pixel considered accurately labeled under the user-provided threshold. Use segmentation.summary.display_issues to view these issues within the original images.

If pixel_scores is not provided, returns array of integer indices (rather than boolean mask) of the images whose label quality score falls below the threshold (sorted by overall label quality score of each image).
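The two return modes described above can be mimicked with a few lines of NumPy. This is a sketch of the documented behavior under stated assumptions, not the library's implementation: threshold_scores_sketch is a hypothetical name, and the strict "<" comparison is assumed from the phrase "lower than threshold".

```python
import numpy as np


def threshold_scores_sketch(image_scores, pixel_scores=None, threshold=0.1):
    """Mimic the documented outputs of issues_from_scores (illustrative only)."""
    image_scores = np.asarray(image_scores, dtype=float)
    if pixel_scores is not None:
        # Boolean mask of shape (N, H, W): True where a pixel scores below threshold.
        return np.asarray(pixel_scores, dtype=float) < threshold
    # Otherwise: indices of low-scoring images, worst (lowest score) first.
    order = np.argsort(image_scores)
    return order[image_scores[order] < threshold]


scores = np.array([0.5, 0.05, 0.02, 0.9])
print(threshold_scores_sketch(scores))  # indices of images 2 and 1, in that order

pixels = np.array([[[0.05, 0.5], [0.2, 0.01]]])
print(threshold_scores_sketch(np.array([0.1]), pixels))
```

Raising the threshold flags more pixels or images, which is why the result should be read as a severity-ordered shortlist and not as an estimate of how many labels are wrong.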