filter
Methods to find label issues in image semantic segmentation datasets, where each pixel in an image receives its own class label.
Functions:
find_label_issues | Returns a boolean mask for the entire dataset, marking each pixel that has a label issue.
- cleanlab.segmentation.filter.find_label_issues(labels, pred_probs, *, batch_size=None, n_jobs=None, verbose=True, **kwargs)
Returns a per-pixel boolean mask for the entire dataset, where True represents a pixel identified with a label issue and False represents a pixel that is correctly labeled.
N - Number of images in the dataset
K - Number of classes in the dataset
H - Height of each image
W - Width of each image
Tip
If you encounter the error “pred_probs is not defined”, try setting n_jobs=1.
- Parameters:
labels (ndarray) – A discrete array of shape (N,H,W,) of noisy labels for a semantic segmentation dataset, i.e. some labels may be erroneous.
Format requirements: For a dataset with K classes, each pixel must be labeled using an integer in 0, 1, …, K-1.
Tip
If your labels are one-hot encoded, you can obtain properly formatted labels with labels = np.argmax(labels_one_hot, axis=1), assuming that labels_one_hot has shape (N,K,H,W).
pred_probs (ndarray) – An array of shape (N,K,H,W,) of model-predicted class probabilities, P(label=k|x) for each pixel x. The prediction for each pixel is an array corresponding to the estimated likelihood that this pixel belongs to each of the K classes. The 2nd dimension of pred_probs must be ordered such that these probabilities correspond to class 0, 1, …, K-1.
batch_size (Optional[int]) – Optional size of image mini-batches used for computing the label issues in a streaming fashion (does not affect results, just the runtime and memory requirements). To maximize efficiency, try to use the largest batch_size your memory allows. If not provided, a good default is used.
n_jobs (Optional[int]) – Optional number of processes for multiprocessing (default value = 1). Only used on Linux. If n_jobs=None, will use either the number of: physical cores if psutil is installed, or logical cores otherwise.
verbose (bool) – Set to False to suppress all print statements.
**kwargs –
downsample: int, optional factor to shrink labels and pred_probs by. Default: 1. Must be a factor that evenly divides the dimensions of both labels and pred_probs. Larger values of downsample produce faster runtimes but potentially less accurate results due to over-compression. Set to 1 to avoid any downsampling.
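The input formats described above can be checked before calling the function. The following is a minimal sketch using only NumPy on toy data. The helper check_inputs is hypothetical (it is not part of cleanlab), the array sizes are arbitrary, and the divisibility test reflects an assumed reading of the downsample requirement, namely that the factor must evenly divide H and W.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, H, W = 2, 3, 4, 6  # toy sizes, chosen only for illustration

# Toy integer labels in 0..K-1, then a one-hot copy of shape (N, K, H, W).
true_labels = rng.integers(0, K, size=(N, H, W))
labels_one_hot = np.moveaxis(np.eye(K, dtype=int)[true_labels], -1, 1)

# Recover the (N, H, W) integer labels, as in the one-hot tip.
labels = np.argmax(labels_one_hot, axis=1)

# Toy pred_probs of shape (N, K, H, W), normalized over the class axis.
logits = rng.normal(size=(N, K, H, W))
pred_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)


def check_inputs(labels, pred_probs, downsample=1):
    """Hypothetical helper (not part of cleanlab): basic sanity checks."""
    n, k, h, w = pred_probs.shape
    assert labels.shape == (n, h, w), "labels must have shape (N, H, W)"
    assert np.issubdtype(labels.dtype, np.integer), "labels must be integers"
    assert labels.min() >= 0 and labels.max() < k, "labels must lie in 0..K-1"
    assert np.allclose(pred_probs.sum(axis=1), 1.0), "probabilities should sum to 1 per pixel"
    # Assumed reading of the downsample requirement: it divides H and W evenly.
    assert h % downsample == 0 and w % downsample == 0, "downsample must divide H and W"
    return True


check_inputs(labels, pred_probs, downsample=2)
```

Running such checks first tends to surface shape or ordering mistakes (for example, a channels-last pred_probs array) with a clearer message than an error raised deep inside the library.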
- Return type: ndarray
- Returns: label_issues (np.ndarray) – A boolean mask for the entire dataset of shape (N,H,W), where True represents a pixel label issue and False represents a pixel that is correctly labeled.
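As an illustration of the call described above, here is a hedged sketch on synthetic data. The array sizes, the probability values, and the single flipped pixel are assumptions made for the example, not part of the documented API. The import is guarded so that the data-preparation part still runs when cleanlab is not installed.

```python
import numpy as np

try:
    from cleanlab.segmentation.filter import find_label_issues
except ImportError:  # cleanlab not installed; the call below is skipped
    find_label_issues = None

rng = np.random.default_rng(0)
N, K, H, W = 4, 3, 8, 8  # toy sizes, chosen only for illustration

# Synthetic integer labels of shape (N, H, W) with values in 0..K-1.
labels = rng.integers(0, K, size=(N, H, W))

# Synthetic pred_probs of shape (N, K, H, W): 0.9 on the labeled class,
# 0.05 on each of the other two classes, so every pixel sums to 1.
pred_probs = np.full((N, K, H, W), 0.05)
np.put_along_axis(pred_probs, labels[:, None, :, :], 0.9, axis=1)

# Corrupt one pixel so its label disagrees with the confident prediction.
labels[0, 0, 0] = (labels[0, 0, 0] + 1) % K

issues = None
if find_label_issues is not None:
    # n_jobs=1 follows the tip about the "pred_probs is not defined" error.
    issues = find_label_issues(labels, pred_probs, n_jobs=1, verbose=False)
    print(issues.shape, issues.dtype, int(issues.sum()))
```

The returned mask has the same (N,H,W) layout as labels, so issues[i] can be overlaid directly on image i to inspect the flagged pixels.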