filter
Methods to find label issues in image semantic segmentation datasets, where each pixel in an image receives its own class label.
Functions:
| find_label_issues(labels, pred_probs, *, ...) | Returns a boolean mask for the entire dataset, per pixel, where True represents a label issue. |
- cleanlab.segmentation.filter.find_label_issues(labels, pred_probs, *, batch_size=None, n_jobs=None, verbose=True, **kwargs)
- Returns a boolean mask for the entire dataset, per pixel, where True represents a pixel identified with a label issue and False represents a pixel that is correctly labeled.
- N - Number of images in the dataset 
- K - Number of classes in the dataset 
- H - Height of each image 
- W - Width of each image 
- Tip - If you encounter the error “pred_probs is not defined”, try setting n_jobs=1.
- Parameters:
- labels (ndarray) – A discrete array of shape (N,H,W) of noisy labels for a semantic segmentation dataset, i.e. some labels may be erroneous.
  Format requirements: For a dataset with K classes, each pixel must be labeled using an integer in 0, 1, …, K-1.
  Tip - If your labels are one-hot encoded, you can convert them with labels = np.argmax(labels_one_hot, axis=1), assuming that labels_one_hot has shape (N,K,H,W), to get properly formatted labels.
- pred_probs (ndarray) – An array of shape (N,K,H,W) of model-predicted class probabilities, P(label=k|x), for each pixel x. The prediction for each pixel is an array of the estimated likelihoods that this pixel belongs to each of the K classes. The 2nd dimension of pred_probs must be ordered such that these probabilities correspond to class 0, 1, …, K-1.
- batch_size (Optional[int]) – Optional size of image mini-batches used for computing the label issues in a streaming fashion (does not affect results, just the runtime and memory requirements). To maximize efficiency, try to use the largest batch_size your memory allows. If not provided, a good default is used.
- n_jobs (Optional[int]) – Optional number of processes for multiprocessing (default value = 1). Only used on Linux. If n_jobs=None, uses the number of physical cores if psutil is installed, or the number of logical cores otherwise.
- verbose (bool) – Set to False to suppress all print statements.
- **kwargs – downsample: int, optional factor to shrink labels and pred_probs by. Default: 1. The dimensions of both labels and pred_probs must be divisible by this factor. Larger values of downsample produce faster runtimes but potentially less accurate results due to over-compression. Set to 1 to avoid any downsampling.
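The input requirements above can be checked before calling the function. The following is a minimal NumPy-only sketch, not part of cleanlab: the synthetic data, the softmax step, and the helper `is_valid_downsample` are illustrative assumptions, and the helper assumes that "divisible" refers to the image height and width.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, H, W = 2, 3, 4, 6  # images, classes, height, width

# Hypothetical one-hot labels of shape (N, K, H, W), built from random class ids.
class_ids = rng.integers(0, K, size=(N, H, W))
labels_one_hot = np.moveaxis(np.eye(K, dtype=int)[class_ids], -1, 1)

# Collapse the class axis to get integer labels of shape (N, H, W).
labels = np.argmax(labels_one_hot, axis=1)

# Hypothetical raw model outputs (logits); a softmax over the class axis
# turns them into per-pixel probabilities of shape (N, K, H, W).
logits = rng.normal(size=(N, K, H, W))
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
pred_probs = exp / exp.sum(axis=1, keepdims=True)


def is_valid_downsample(factor, height, width):
    """Assumed check: the factor must evenly divide both height and width."""
    return height % factor == 0 and width % factor == 0


print(labels.shape, pred_probs.shape)
print(is_valid_downsample(2, H, W), is_valid_downsample(4, H, W))
```

With these shapes, a factor of 2 passes the assumed check, while a factor of 4 does not because it does not divide the width of 6.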
 
 
- Return type:
- ndarray
- Returns:
- label_issues (np.ndarray) – A boolean mask for the entire dataset of shape (N,H,W), where True represents a pixel label issue and False represents a pixel that is correctly labeled.
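As a rough illustration of how the call and its return value fit together, the sketch below builds a small synthetic dataset, calls find_label_issues with the signature documented above, and summarizes the mask. The data, the softmax step, and the variable names are assumptions made for this example. If cleanlab is not installed, the sketch substitutes a simple argmax-disagreement mask, which is only a placeholder and not how cleanlab identifies issues.

```python
import numpy as np

try:
    from cleanlab.segmentation.filter import find_label_issues
except ImportError:
    find_label_issues = None  # cleanlab not installed; placeholder used below

rng = np.random.default_rng(0)
N, K, H, W = 4, 3, 8, 8  # images, classes, height, width

# Synthetic "true" classes and noisy model outputs that favor them.
true_classes = rng.integers(0, K, size=(N, H, W))
one_hot = np.moveaxis(np.eye(K)[true_classes], -1, 1)  # (N, K, H, W)
logits = rng.normal(size=(N, K, H, W)) + 4.0 * one_hot
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
pred_probs = exp / exp.sum(axis=1, keepdims=True)  # softmax over the class axis

# Given labels: the true classes, with a 2x2 patch of image 0 shifted to a wrong class.
labels = true_classes.copy()
labels[0, :2, :2] = (labels[0, :2, :2] + 1) % K

if find_label_issues is not None:
    # n_jobs=1 follows the tip above; verbose=False suppresses print statements.
    issues = find_label_issues(labels, pred_probs, n_jobs=1, verbose=False)
else:
    # Placeholder only, NOT cleanlab's method: flag pixels whose label
    # differs from the most likely predicted class.
    issues = labels != pred_probs.argmax(axis=1)

# The mask has one entry per pixel, so it can be summarized per image
# or converted to (image, row, column) coordinates.
issues_per_image = issues.sum(axis=(1, 2))
flagged_pixels = np.argwhere(issues)

print(issues.shape, issues_per_image)
```

Summing over the last two axes gives a quick way to rank images by how many pixels were flagged, which can help decide which images to review first.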