filter
Methods to find label issues in image semantic segmentation datasets, where each pixel in an image receives its own class label.
Functions:
find_label_issues: Returns a boolean mask for the entire dataset, per pixel, marking which pixels have label issues.
cleanlab.segmentation.filter.find_label_issues(labels, pred_probs, *, batch_size=None, n_jobs=None, verbose=True, **kwargs)
Returns a boolean mask for the entire dataset, per pixel, where True represents a pixel identified as having a label issue and False represents a pixel that is correctly labeled.
Notation:
N - Number of images in the dataset
K - Number of classes in the dataset
H - Height of each image
W - Width of each image
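As a rough illustration of these shape conventions, the sketch below builds a toy dataset with NumPy and checks it against the format requirements described under Parameters. The helpers validate_segmentation_inputs and valid_downsample_factors are hypothetical (not part of cleanlab), the tolerance atol is an arbitrary choice, and valid_downsample_factors assumes, based on our reading of the downsample description, that the factor must evenly divide both H and W.

```python
import numpy as np


def validate_segmentation_inputs(labels, pred_probs, atol=1e-4):
    """Hypothetical helper: check labels/pred_probs against the documented format."""
    # labels: one integer class index per pixel, shape (N, H, W)
    if labels.ndim != 3:
        raise ValueError(f"labels must have shape (N, H, W), got {labels.shape}")
    # pred_probs: per-pixel class probabilities, shape (N, K, H, W)
    if pred_probs.ndim != 4:
        raise ValueError(f"pred_probs must have shape (N, K, H, W), got {pred_probs.shape}")
    n, k, h, w = pred_probs.shape
    if labels.shape != (n, h, w):
        raise ValueError(f"labels shape {labels.shape} does not match (N, H, W) = {(n, h, w)}")
    if not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be integers in 0, 1, ..., K-1")
    if labels.min() < 0 or labels.max() > k - 1:
        raise ValueError(f"labels must lie in 0..{k - 1}")
    # Probabilities over the K classes (axis 1) should sum to 1 for every pixel
    if not np.allclose(pred_probs.sum(axis=1), 1.0, atol=atol):
        raise ValueError("pred_probs must sum to 1 over the class axis (axis=1)")
    return n, k, h, w


def valid_downsample_factors(h, w):
    """Hypothetical helper: factors that evenly divide both image height and width."""
    return [f for f in range(1, min(h, w) + 1) if h % f == 0 and w % f == 0]


# Toy dataset: N=2 images, K=3 classes, H=4, W=6
rng = np.random.default_rng(0)
N, K, H, W = 2, 3, 4, 6
logits = rng.normal(size=(N, K, H, W))
# Softmax over the class axis so each pixel's K probabilities sum to 1
pred_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Build one-hot labels of shape (N, K, H, W), then recover integer labels (N, H, W)
labels_int = rng.integers(0, K, size=(N, H, W))
labels_one_hot = np.moveaxis(np.eye(K, dtype=int)[labels_int], -1, 1)
labels = np.argmax(labels_one_hot, axis=1)

print(validate_segmentation_inputs(labels, pred_probs))  # (2, 3, 4, 6)
print(valid_downsample_factors(H, W))  # [1, 2]
```

Running checks like these before calling find_label_issues can surface shape or class-ordering mistakes early; the function may do its own validation, which this sketch does not try to reproduce.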
Tip
If you encounter the error “pred_probs is not defined”, try setting n_jobs=1.
- Parameters:
labels (ndarray) – A discrete array of shape (N, H, W) of noisy labels for a semantic segmentation dataset, i.e. some labels may be erroneous.
Format requirements: For a dataset with K classes, each pixel must be labeled using an integer in 0, 1, …, K-1.
Tip
If your labels are one-hot encoded, you can do:
labels = np.argmax(labels_one_hot, axis=1)
assuming that labels_one_hot has shape (N, K, H, W), in order to get properly formatted labels.
pred_probs (ndarray) – An array of shape (N, K, H, W) of model-predicted class probabilities, P(label=k|x), for each pixel x. The prediction for each pixel is an array corresponding to the estimated likelihood that this pixel belongs to each of the K classes. The second dimension (axis 1) of pred_probs must be ordered such that these probabilities correspond to class 0, 1, …, K-1.
batch_size (Optional[int]) – Optional size of image mini-batches used for computing the label issues in a streaming fashion (does not affect results, only the runtime and memory requirements). To maximize efficiency, use the largest batch_size your memory allows. If not provided, a good default is used.
n_jobs (Optional[int]) – Optional number of processes for multiprocessing (default value = 1). Only used on Linux. If n_jobs=None, the number of physical cores is used if psutil is installed, or the number of logical cores otherwise.
verbose (bool) – Set to False to suppress all print statements.
**kwargs –
downsample (int, optional) – Factor by which to shrink labels and pred_probs. Default: 1. Must evenly divide the image height H and width W of both labels and pred_probs. Larger values of downsample produce faster runtimes but potentially less accurate results due to over-compression. Set to 1 to avoid any downsampling.
- Return type:
ndarray
- Returns:
label_issues (np.ndarray) – A boolean mask for the entire dataset, of shape (N, H, W), where True represents a pixel label issue and False represents a pixel that is correctly labeled.
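The returned mask is often easier to act on once it is aggregated per image. The sketch below assumes only NumPy and a boolean array of shape (N, H, W) like the one described above, and ranks images by the fraction of their pixels that are flagged. summarize_label_issues is a hypothetical helper, and the hand-made toy mask stands in for real output of find_label_issues.

```python
import numpy as np


def summarize_label_issues(label_issues):
    """Hypothetical helper: rank images by the fraction of flagged pixels."""
    label_issues = np.asarray(label_issues, dtype=bool)
    if label_issues.ndim != 3:
        raise ValueError(f"expected shape (N, H, W), got {label_issues.shape}")
    # Fraction of flagged pixels in each image: mean over the H and W axes
    frac_flagged = label_issues.mean(axis=(1, 2))
    # Image indices sorted from most to least flagged (stable order for ties)
    ranking = np.argsort(-frac_flagged, kind="stable")
    return frac_flagged, ranking


# In practice the mask would come from:
#   from cleanlab.segmentation.filter import find_label_issues
#   label_issues = find_label_issues(labels, pred_probs)
# Here a toy mask of N=3 images, each 2x2 pixels, stands in for it.
label_issues = np.array([
    [[False, False], [False, False]],  # image 0: no flagged pixels
    [[True, True], [True, False]],     # image 1: 3 of 4 pixels flagged
    [[True, False], [False, False]],   # image 2: 1 of 4 pixels flagged
])

frac_flagged, ranking = summarize_label_issues(label_issues)
print(frac_flagged.tolist())     # [0.0, 0.75, 0.25]
print(ranking.tolist())          # [1, 2, 0]
print(int(label_issues.sum()))   # 4
```

Ranking by the fraction of flagged pixels is just one possible way to prioritize images for review; other aggregates (for example, a raw count of flagged pixels) may suit datasets whose images differ in size.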