rank
Methods to rank and score images in a semantic segmentation dataset based on how likely they are to contain mislabeled pixels.
Functions:
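| get_label_quality_scores | Returns a label quality score for each image. |
| issues_from_scores | Converts scores output by get_label_quality_scores to a list of issues of similar format as output by segmentation.filter.find_label_issues. |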
- cleanlab.segmentation.rank.get_label_quality_scores(labels, pred_probs, *, method='softmin', batch_size=None, n_jobs=None, verbose=True, **kwargs)
Returns a label quality score for each image.
This function computes label quality scores for semantic segmentation datasets, where lower scores indicate labels that are less likely to be correct. Array shapes use the following notation:
N - Number of images in the dataset
K - Number of classes in the dataset
H - Height of each image
W - Width of each image
- Parameters:
labels (ndarray) – A discrete array of noisy labels for a semantic segmentation dataset, of shape (N, H, W), where each pixel must be an integer in 0, 1, …, K-1. Refer to the documentation for this argument in find_label_issues for further details.
pred_probs (ndarray) – An array of shape (N, K, H, W) of model-predicted class probabilities. Refer to the documentation for this argument in find_label_issues for further details.
method ({"softmin", "num_pixel_issues"}, default "softmin") – Label quality scoring method.
"softmin" - Calculates the inner product between scores and softmax(1-scores). For efficiency, use this instead of "num_pixel_issues".
"num_pixel_issues" - Uses the number of pixels with label issues in each image, as identified by find_label_issues.
batch_size (Optional[int]) – Optional size of mini-batches to use for estimating the label issues. Used for "num_pixel_issues" only, not "softmin". To maximize efficiency, try to use the largest batch_size your memory allows. If not provided, a good default is used.
n_jobs (Optional[int]) – Optional number of processes for multiprocessing (default value = 1). Only used on Linux, and for "num_pixel_issues" only, not "softmin". If n_jobs=None, uses the number of physical cores if psutil is installed, or the number of logical cores otherwise.
verbose (bool) – Set to False to suppress all print statements.
**kwargs –
- downsample (int) – Factor to shrink labels and pred_probs by. Used for "num_pixel_issues" only, not "softmin". Default 16. Must be a factor that evenly divides the dimensions of both labels and pred_probs. Larger values of downsample produce faster runtimes but potentially less accurate results due to over-compression. Set to 1 to avoid any downsampling.
- temperature (float) – Temperature for softmin. Default 0.1.
- Return type:
Tuple[ndarray, ndarray]
- Returns:
image_scores – Array of shape (N,) of scores between 0 and 1, one per image in the dataset. Lower scores indicate images more likely to contain a label issue.
pixel_scores – Array of shape (N, H, W) of scores between 0 and 1, one per pixel in the dataset.
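To make the "softmin" description concrete, here is a minimal NumPy sketch of an inner product between per-pixel scores and softmax(1-scores). It is an illustration under stated assumptions, not cleanlab's implementation: the helper name `sketch_softmin_image_scores` is hypothetical, the per-pixel score is assumed to be the predicted probability of the given label, and the temperature is assumed to divide `1 - scores` before the softmax.

```python
import numpy as np

def sketch_softmin_image_scores(labels, pred_probs, temperature=0.1):
    """Illustrative softmin scoring (hypothetical helper, not cleanlab's code)."""
    n, k, h, w = pred_probs.shape
    assert labels.shape == (n, h, w)
    # Assumption: a pixel's score is the predicted probability of its given label.
    pixel_scores = np.take_along_axis(pred_probs, labels[:, None, :, :], axis=1)[:, 0]
    flat = pixel_scores.reshape(n, -1)
    # Assumption: the temperature scales (1 - scores) inside the softmax.
    logits = (1.0 - flat) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Inner product of scores and softmax weights: low-scoring pixels dominate.
    image_scores = (flat * weights).sum(axis=1)
    return image_scores, pixel_scores

# Synthetic inputs with N images, K classes, and H x W pixels.
rng = np.random.default_rng(0)
N, K, H, W = 3, 4, 8, 8
raw = rng.normal(size=(N, K, H, W))
pred_probs = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)
labels = pred_probs.argmax(axis=1)                 # labels agree with the model
labels[0, 0, 0] = pred_probs[0, :, 0, 0].argmin()  # corrupt one pixel in image 0

image_scores, pixel_scores = sketch_softmin_image_scores(labels, pred_probs)
```

Because the softmax weights concentrate on the lowest-scoring pixels, each image score in this sketch lies between that image's minimum and mean pixel score, so a few badly labeled pixels can pull an image's score down sharply.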
- cleanlab.segmentation.rank.issues_from_scores(image_scores, pixel_scores=None, threshold=0.1)
Converts scores output by get_label_quality_scores to a list of issues in a format similar to the output of segmentation.filter.find_label_issues.
Only pixels (or images, when pixel_scores is not provided) with a label quality score lower than threshold are considered issues, so this parameter determines the number of issues that are returned.
Note
This method is intended for converting the most severely mislabeled examples into a format compatible with summary methods like segmentation.summary.display_issues. It does not estimate the number of label errors, since the threshold is arbitrary. For that, use segmentation.filter.find_label_issues instead, which estimates the label errors via Confident Learning rather than score thresholding.
- Parameters:
image_scores (ndarray) – Array of shape (N,) of overall image scores, where N is the number of images in the dataset. Same format as the image_scores returned by get_label_quality_scores.
pixel_scores (Optional[ndarray]) – Optional array of shape (N, H, W) of scores between 0 and 1, one per pixel in the dataset. Same format as the pixel_scores returned by get_label_quality_scores.
threshold (float) – Optional quality score threshold that determines which pixels are included in the result. Pixels with quality scores above the threshold are not included in the result. If not provided, all pixels are included in the result.
- Return type:
ndarray
- Returns:
issues – A boolean mask for the entire dataset, where True represents a pixel label issue and False represents a pixel considered accurately labeled under the threshold provided by the user. Use segmentation.summary.display_issues to view these issues within the original images.
If pixel_scores is not provided, returns an array of integer indices (rather than a boolean mask) of the images whose label quality score falls below the threshold, sorted by the overall label quality score of each image.
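The two return modes described above can be sketched in plain NumPy. This is a hedged illustration of the documented behavior, not the library's source: the helper name `sketch_issues_from_scores` is hypothetical, and the strict `<` comparison is an assumption based on the wording "lower than threshold".

```python
import numpy as np

def sketch_issues_from_scores(image_scores, pixel_scores=None, threshold=0.1):
    """Illustrative thresholding (hypothetical helper, not cleanlab's code)."""
    if pixel_scores is not None:
        # Boolean mask of shape (N, H, W): True marks a suspected pixel issue.
        return pixel_scores < threshold
    # No pixel scores: return image indices below the threshold, worst first.
    order = np.argsort(image_scores)
    return order[image_scores[order] < threshold]

# Toy scores for N=3 images of 2 x 2 pixels.
image_scores = np.array([0.40, 0.05, 0.02])
pixel_scores = np.array([
    [[0.90, 0.60], [0.70, 0.30]],
    [[0.05, 0.80], [0.02, 0.90]],
    [[0.01, 0.90], [0.90, 0.90]],
])

mask = sketch_issues_from_scores(image_scores, pixel_scores)  # boolean, shape (3, 2, 2)
worst_images = sketch_issues_from_scores(image_scores)        # array([2, 1])
```

In the toy data, image 0 has no pixel below the threshold, so its slice of the mask is all False, while images 2 and 1 are returned in that order because image 2 has the lower overall score.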