Finding Label Errors in Object Detection Datasets#

This 5-minute quickstart tutorial demonstrates how to find potential label errors in object detection datasets. In object detection data, each image is annotated with multiple bounding boxes. Each bounding box surrounds a physical object within an image scene, and is annotated with a given class label.

Using such labeled data, we train a model to predict the locations and classes of objects in an image. An example notebook to train the object detection model whose predictions we rely on in this tutorial is available here. These predictions can subsequently be input to cleanlab in order to identify mislabeled images and compute a quality score quantifying our confidence in the overall annotations for each image.

After correcting these label issues, you can train an even better version of your model without changing your training code!

This tutorial uses a subset of the COCO (Common Objects in Context) dataset which has images of everyday scenes and considers objects from the 5 most popular classes: car, chair, cup, person, traffic light.

Overview of what we’ll do in this tutorial

  • Score images based on their overall label quality (i.e. our confidence each image is correctly labeled) using cleanlab.object_detection.rank.get_label_quality_scores

  • Estimate which images have label issues using cleanlab.object_detection.filter.find_label_issues

  • Visually review images + labels using cleanlab.object_detection.summary.visualize

Quickstart

Already have labels and predictions in the proper format? Just run the code below to find label issues in your object detection dataset.

from cleanlab.object_detection.filter import find_label_issues
from cleanlab.object_detection.rank import get_label_quality_scores

# To get boolean vector of label issues for all images
is_label_issue = find_label_issues(labels, predictions)

# To get label quality scores for all images
label_quality_scores = get_label_quality_scores(labels, predictions)

1. Install required dependencies and download data#

You can use pip to install all packages required for this tutorial as follows:

!pip install matplotlib
!pip install cleanlab
# Make sure to install the version corresponding to this tutorial
# E.g. if viewing master branch documentation:
#     !pip install git+https://github.com/cleanlab/cleanlab.git
[2]:
%%capture

!wget -nc 'https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/tutorial_obj/predictions.pkl'
!wget -nc 'https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/tutorial_obj/labels.pkl'
!wget -nc 'https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/tutorial_obj/example_images.zip' && unzip -q -o example_images.zip
[3]:
import pickle

from cleanlab.object_detection.rank import get_label_quality_scores, issues_from_scores
from cleanlab.object_detection.filter import find_label_issues
from cleanlab.object_detection.summary import visualize

2. Format data, labels, and model predictions#

We begin by loading labels and predictions for our dataset, which are the only inputs required to find label issues with cleanlab. Note that the predictions should be out-of-sample, which can be obtained for every image in a dataset via K-fold cross-validation.

In a separate example notebook (link), we trained a Detectron2 object detection model and used it to obtain predictions on a held-out validation dataset whose labels we audit here.

Note: If you want to find all the mislabeled images across the entire COCO dataset, you can first execute our other example notebook that uses K-fold cross-validation to produce out-of-sample predictions for every image, then use those labels and predictions below.
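For reference, a minimal sketch of what such a K-fold workflow could look like for your own dataset is shown below; here image_paths, train_detector, and predict_detector are hypothetical stand-ins for your own data and training/inference code:

from sklearn.model_selection import KFold

# Hypothetical sketch: obtain out-of-sample predictions for every image via 5-fold cross-validation.
predictions = [None] * len(image_paths)  # image_paths: list of paths covering your full dataset
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, holdout_idx in kf.split(image_paths):
    # Train on 4 folds, then predict on the held-out fold so its predictions are out-of-sample.
    model = train_detector([image_paths[i] for i in train_idx], [labels[i] for i in train_idx])
    for i in holdout_idx:
        predictions[i] = predict_detector(model, image_paths[i])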

[4]:
IMAGE_PATH = './example_images/'  # path to raw image files downloaded above
predictions = pickle.load(open("predictions.pkl", "rb"))
labels = pickle.load(open("labels.pkl", "rb"))

In object detection datasets, each given label is made up of bounding box coordinates and a class label. A model prediction is also made up of a bounding box and predicted class label, as well as the model’s confidence (probability estimate) in its prediction. To detect label issues, cleanlab requires the given labels for each image and the corresponding model predictions for that image (but not the image itself).

Here’s what an example looks like in our dataset. We visualize the given and predicted labels (in red and blue) for this image using the cleanlab.object_detection.summary.visualize method.

[5]:
image_to_visualize = 8  # change this to view other images
image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']
visualize(image_path, label=labels[image_to_visualize], prediction=predictions[image_to_visualize], overlay=False)
../_images/tutorials_object_detection_8_0.png

The required format of these labels and predictions matches what popular object detection frameworks like MMDetection and Detectron2 expect. Recall the 5 possible class labels in our dataset are: car, chair, cup, person, traffic light. These classes are represented as (zero-indexed) integers 0,1,…,4.

labels is a list in which the i-th entry corresponds to the i-th image in our dataset. labels[i] is a dictionary containing: key labels – a list of class labels for each bounding box in this image – and key bboxes – a numpy array of the bounding boxes’ coordinates. Each bounding box in labels[i]['bboxes'] is in the format [x1,y1,x2,y2], where (x1,y1) corresponds to the bottom-left corner of the box and (x2,y2) the top-right.
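For reference, here is a minimal sketch of how one entry of labels could be constructed for your own data; the filename and coordinate values below are made up purely for illustration:

import numpy as np

# Hypothetical example entry of `labels` (values are made up):
example_label = {
    'bboxes': np.array([[10.0, 20.0, 110.0, 220.0]], dtype=np.float32),  # one box in [x1,y1,x2,y2] format
    'labels': np.array([3]),  # class index for each box (3 = "person" in this tutorial)
    'seg_map': 'my_image.jpg',  # image filename, used in this tutorial to locate the raw image file
}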

Let’s see what labels[i] looks like for our previous example image:

[6]:
labels[image_to_visualize]
[6]:
{'bboxes': array([[201.96, 101.71, 334.78, 334.68]], dtype=float32),
 'labels': array([3]),
 'bboxes_ignore': array([], shape=(0, 4), dtype=float32),
 'masks': [[[290.44,
    200.04,
    286.59,
    213.5,
    285.63,
    224.08,
    290.44,
    231.77,
    293.32,
    235.62,
    289.48,
    251.97,
    282.74,
    266.39,
    281.78,
    271.2,
    280.82,
    277.93,
    279.86,
    287.55,
    277.93,
    299.09,
    276.97,
    307.75,
    276.97,
    321.21,
    281.78,
    326.02,
    290.44,
    330.83,
    286.59,
    333.71,
    263.51,
    334.68,
    261.59,
    319.29,
    257.74,
    295.25,
    251.97,
    290.44,
    251.97,
    283.7,
    250.05,
    283.7,
    243.31,
    303.9,
    243.31,
    316.4,
    243.31,
    319.29,
    247.16,
    323.14,
    251.01,
    326.02,
    249.08,
    328.91,
    227.93,
    327.94,
    226.0,
    323.14,
    226.96,
    313.52,
    226.96,
    303.9,
    226.0,
    293.32,
    216.39,
    283.7,
    226.0,
    236.58,
    228.89,
    226.96,
    232.73,
    219.27,
    239.47,
    216.39,
    240.43,
    209.65,
    242.35,
    202.92,
    240.43,
    185.61,
    230.81,
    198.11,
    219.27,
    215.42,
    218.31,
    224.08,
    220.23,
    229.85,
    217.35,
    237.54,
    213.5,
    238.5,
    207.73,
    239.47,
    204.84,
    239.47,
    201.96,
    237.54,
    201.96,
    228.89,
    205.81,
    224.08,
    206.77,
    220.23,
    218.31,
    191.38,
    219.27,
    185.61,
    223.12,
    180.8,
    226.0,
    175.03,
    229.85,
    167.34,
    231.77,
    159.64,
    236.86,
    153.25,
    240.46,
    151.71,
    253.35,
    149.13,
    254.9,
    147.07,
    250.26,
    143.46,
    247.16,
    140.88,
    244.59,
    124.39,
    244.59,
    115.11,
    246.65,
    109.44,
    249.74,
    104.81,
    256.44,
    102.23,
    262.11,
    101.71,
    268.29,
    101.71,
    273.96,
    101.71,
    277.06,
    101.71,
    283.76,
    108.41,
    284.79,
    110.48,
    287.88,
    119.24,
    286.85,
    122.33,
    286.85,
    126.97,
    286.85,
    132.64,
    286.85,
    136.76,
    285.82,
    145.52,
    284.27,
    150.16,
    286.33,
    151.71,
    290.97,
    155.83,
    293.03,
    173.35,
    297.67,
    180.05,
    317.25,
    190.87,
    319.32,
    191.9,
    326.53,
    192.42,
    329.62,
    192.93,
    332.2,
    196.03,
    334.26,
    201.18,
    334.78,
    207.88,
    329.11,
    209.94,
    326.53,
    205.82,
    324.47,
    203.24,
    323.44,
    202.21,
    320.86,
    202.21,
    316.22,
    203.76,
    314.68,
    203.24,
    313.65,
    200.67,
    307.46,
    199.63,
    297.67,
    198.6,
    294.58,
    197.06,
    291.49,
    197.06,
    290.97,
    196.03]]],
 'seg_map': '000000481413.jpg'}

predictions is a list whose i-th entry contains the predictions output by our model for the i-th image: predictions[i] is a list/array of shape (K,). Here K is the number of classes in the dataset (the same for every image) and predictions[i][k] is of shape (M,5), where M is the number of bounding boxes predicted to contain objects of class k (this differs between images). The five columns of predictions[i][k] correspond to [x1,y1,x2,y2,pred_prob] for each bounding box predicted by the model, where pred_prob is the model’s confidence in its predicted label for that box. Since our dataset has K = 5 classes, we have: predictions[i].shape = (5,).
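To make this format concrete, here is a hedged sketch of how one image’s raw detections might be packaged into this structure; boxes_for_class and probs_for_class stand in for your own model’s outputs:

import numpy as np

K = 5  # number of classes in the dataset
prediction_i = np.empty(K, dtype=object)
for k in range(K):
    boxes_for_class = np.zeros((0, 4), dtype=np.float32)  # replace with your model's (M, 4) boxes for class k
    probs_for_class = np.zeros((0,), dtype=np.float32)    # replace with the corresponding (M,) confidences
    # Stack into the expected (M, 5) array of [x1, y1, x2, y2, pred_prob] rows:
    prediction_i[k] = np.hstack([boxes_for_class, probs_for_class.reshape(-1, 1)]).astype(np.float32)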

Let’s see what predictions[i] looks like for our previous example image:

[7]:
predictions[image_to_visualize]
[7]:
array([array([], shape=(0, 5), dtype=float32),
       array([], shape=(0, 5), dtype=float32),
       array([], shape=(0, 5), dtype=float32),
       array([[204.42398  , 103.44503  , 337.29968  , 336.21005  ,   0.9978472]],
             dtype=float32)                                                      ,
       array([], shape=(0, 5), dtype=float32)], dtype=object)

Once you have labels and predictions in the appropriate formats, you can find label issues with cleanlab for any object detection dataset!

3. Use cleanlab to find label issues#

Given the labels and predictions from our trained model, cleanlab can automatically find mislabeled images in the dataset. In object detection, we consider an image mislabeled if any of its bounding boxes or their class labels are incorrect (including if the image contains any overlooked objects which should’ve been annotated with a box).

Images may be mislabeled because annotators:

  • overlooked an object (forgot to annotate a bounding box around a depicted object)

  • chose the wrong class label for an annotated box in the correct location

  • imperfectly drew the bounding box such that its location is incorrect

Cleanlab is expected to flag images that exhibit any of these annotation errors as having label issues. More severe annotation errors are expected to produce lower cleanlab label quality scores closer to 0. Let’s first estimate which images have label issues:

[8]:
label_issue_idx = find_label_issues(labels, predictions, return_indices_ranked_by_score=True)

num_examples_to_show = 5 # view this many images flagged with the most severe label issues
label_issue_idx[:num_examples_to_show]
Pruning 0 predictions out of 113 using threshold==0.0. These predictions are no longer considered as potential candidates for identifying label issues as their similarity with the given labels is no longer considered.
[8]:
array([50, 16, 31, 29, 45])

The above code identifies which images have label issues, returning a list of their indices. This is because we specified the return_indices_ranked_by_score argument which sorts these indices by the estimated label quality of each image. Below we describe how to directly estimate the label quality scores of each image.

Note: You can omit the return_indices_ranked_by_score argument for find_label_issues() to instead return a Boolean mask for the entire dataset (True entries in this mask correspond to images with label issues).
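For example, a brief sketch of the Boolean-mask form (using the same labels and predictions loaded above):

import numpy as np

is_label_issue = find_label_issues(labels, predictions)  # Boolean mask over the entire dataset
issue_indices = np.where(is_label_issue)[0]  # optionally convert the mask to (unranked) image indices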

Get label quality scores#

Cleanlab can also compute scores for each image to estimate our confidence that it has been correctly labeled. These label quality scores range between 0 and 1, with smaller values indicating examples whose annotation is more likely to be wrong in some way.

Each image in the dataset receives a label quality score. These scores are useful for prioritizing which images to review; if you have too little time, first review the images with the lowest label quality scores.

[9]:
scores = get_label_quality_scores(labels, predictions)
scores[:num_examples_to_show]
Pruning 0 predictions out of 113 using threshold==0.0. These predictions are no longer considered as potential candidates for identifying label issues as their similarity with the given labels is no longer considered.
[9]:
array([0.97489622, 0.70610878, 0.98764951, 0.88899237, 0.99085805])
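With these scores in hand, a simple way to prioritize manual review is to sort images from lowest to highest label quality, as in this brief sketch:

import numpy as np

ranked_by_quality = np.argsort(scores)  # image indices ordered from lowest to highest label quality
images_to_review_first = ranked_by_quality[:10]  # e.g. review the 10 most suspect images first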

We can also use the label quality scores to flag which images have label issues based on a threshold. Here we convert these per-image scores into an array of indices corresponding to images flagged with label issues, sorted by label quality score, in the same format returned by find_label_issues().

[10]:
issue_idx = issues_from_scores(scores, threshold=0.5)  # lower threshold will return fewer (but more confident) label issues
issue_idx[:num_examples_to_show], scores[issue_idx][:num_examples_to_show]
[10]:
(array([50, 16, 31, 29, 45]),
 array([6.95569726e-05, 9.03354841e-05, 8.57510169e-04, 1.58447666e-03,
        2.39755858e-01]))

4. Use ObjectLab to visualize label issues#

Finally, we can visualize images with potential label errors via cleanlab’s visualize() function. To enhance the visualization, you can supply a class_names dictionary to include as a legend and turn off overlay to see the given and predicted labels side by side.

[11]:
issue_to_visualize = issue_idx[0]  # change this to view other images
class_names = {"0": "car", "1": "chair", "2": "cup", "3":"person", "4": "traffic light"}

label = labels[issue_to_visualize]
prediction = predictions[issue_to_visualize]
score = scores[issue_to_visualize]
image_path = IMAGE_PATH + label['seg_map']

print(image_path, '| idx', issue_to_visualize , '| label quality score:', score, '| is issue: True')
visualize(image_path, label=label, prediction=prediction, class_names=class_names, overlay=False)
./example_images/000000009483.jpg | idx 50 | label quality score: 6.95569726168054e-05 | is issue: True
../_images/tutorials_object_detection_22_1.png

The visualization depicts the given label (original image annotation which cleanlab identified as problematic) in red on the left and the model-predicted label in blue on the right. Each bounding box contains a class-index number in the top corner indicating which object class that bounding box was annotated/predicted to contain.

This image has a low label quality score and is marked as an error. On closer inspection we notice the annotator missed the reflection of the person in the mirror that the model identified. Additionally, the chairs visible in the reflection were not annotated.

Notice that examples where the predictions and labels are more similar have higher quality scores than those where they are mismatched, and are less likely to be marked as issues; the score is agnostic to the number of boxes in an image.

Better-trained models will lead to better label error detection, but you don’t need a near-perfect model to identify label issues.

Different kinds of label issues identified by ObjectLab#

Now let’s view the first few images in our validation dataset that are clearly marked as issues and see what inconsistencies between the given and predicted labels we can spot.

[12]:
issue_to_visualize = issue_idx[1]
label = labels[issue_to_visualize]
prediction = predictions[issue_to_visualize]
score = scores[issue_to_visualize]

image_path = IMAGE_PATH + label['seg_map']
print(image_path, '| idx', issue_to_visualize , '| label quality score:', score, '| is issue: True')
visualize(image_path, label=label, prediction=prediction, class_names=class_names, overlay=False)
./example_images/000000395701.jpg | idx 16 | label quality score: 9.033548411774308e-05 | is issue: True
../_images/tutorials_object_detection_24_1.png

Notice the armchair to the left of the TV is missing an annotation.

[13]:
issue_to_visualize = issue_idx[6]
label = labels[issue_to_visualize]
prediction = predictions[issue_to_visualize]
score = scores[issue_to_visualize]

image_path = IMAGE_PATH + label['seg_map']
print(image_path, '| idx', issue_to_visualize , '| label quality score:', score, '| is issue: True')
visualize(image_path, label=label, prediction=prediction, class_names=class_names, overlay=False)
./example_images/000000154004.jpg | idx 62 | label quality score: 0.38300759625496356 | is issue: True
../_images/tutorials_object_detection_26_1.png

Similarly, the woman in a red jacket in the foreground is missing an annotation.

[14]:
issue_to_visualize = issue_idx[2]
label = labels[issue_to_visualize]
prediction = predictions[issue_to_visualize]
score = scores[issue_to_visualize]

image_path = IMAGE_PATH + label['seg_map']
print(image_path, '| idx', issue_to_visualize , '| label quality score:', score, '| is issue: True')
visualize(image_path, label=label, prediction=prediction, class_names=class_names, overlay=False)
./example_images/000000448410.jpg | idx 31 | label quality score: 0.0008575101690203273 | is issue: True
../_images/tutorials_object_detection_28_1.png

The people in this image should have had individual bounding boxes around each person (the COCO guidelines state only groups with 10+ objects of the same type can be a “crowd” bounded by a single box). Individuals in the back are also missing annotations.

All of these examples received low label quality scores reflecting their low annotation quality in the original dataset.

Other uses of visualize#

The visualize() function can also depict non-issue images, labels or predictions alone, or just the image itself. Let’s explore this with a few images in our dataset.

We can save a visualization to file via the save_path argument. Note the label quality score is high for this example and it is marked as a non-issue; the given and predicted labels closely resemble each other, contributing to the high score.

[15]:
image_to_visualize = 0
image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']
print(image_path, '| idx', image_to_visualize , '| label quality score:', scores[image_to_visualize], '| is issue:', image_to_visualize in issue_idx)
visualize(image_path, label=labels[image_to_visualize], prediction=predictions[image_to_visualize], class_names=class_names, save_path='./example_image.png')
./example_images/000000499768.jpg | idx 0 | label quality score: 0.9748962231208227 | is issue: False
../_images/tutorials_object_detection_31_1.png

For the next example, notice how we are only passing the given labels into visualize. We can limit the visualization to just the labels, just the predictions, or neither.

[16]:
image_to_visualize = 3
image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']
print(image_path, '| idx', image_to_visualize , '| label quality score:', scores[image_to_visualize], '| is issue:', image_to_visualize in issue_idx)
visualize(image_path, label=labels[image_to_visualize], class_names=class_names)
./example_images/000000521141.jpg | idx 3 | label quality score: 0.8889923658893665 | is issue: False
../_images/tutorials_object_detection_33_1.png

For completeness, let’s just look at an image alone.

[17]:
image_to_visualize = 2
image_path = IMAGE_PATH + labels[image_to_visualize]['seg_map']
print(image_path, '| idx', image_to_visualize , '| label quality score:', scores[image_to_visualize], '| is issue:', image_to_visualize in issue_idx)
visualize(image_path)
./example_images/000000143931.jpg | idx 2 | label quality score: 0.9876495074395956 | is issue: False
../_images/tutorials_object_detection_35_1.png