filter

Methods to find label issues in an object detection dataset, where each annotated bounding box in an image receives its own class label.

Functions:

find_label_issues(labels, predictions, *[, ...])

Identifies potentially mislabeled images in an object detection dataset.

cleanlab.object_detection.filter.find_label_issues(labels, predictions, *, return_indices_ranked_by_score=False, overlapping_label_check=True)

Identifies potentially mislabeled images in an object detection dataset. An image is flagged with a label issue if any of its bounding boxes appear incorrectly annotated. This includes images where a bounding box should have been annotated but is missing, has been annotated with the wrong class, or has been annotated in a suboptimal location.

Suppose the dataset has N images and K possible class labels. If return_indices_ranked_by_score is False, a boolean mask of length N is returned, indicating whether each image has a label issue (True) or not (False). If return_indices_ranked_by_score is True, the indices of images flagged with label issues are returned instead, sorted so that the most likely mislabeled images appear first.

Parameters:
  • labels (List[Dict[str, Any]]) –

    Annotated boxes and class labels in the original dataset, which may contain some errors.

    This is a list of N dictionaries such that labels[i] contains the given labels for the i-th image in the following format: {'bboxes': np.ndarray((L,4)), 'labels': np.ndarray((L,)), 'image_name': str} where L is the number of annotated bounding boxes for the i-th image, bboxes[l] is a bounding box with coordinates in [x1,y1,x2,y2] format, and labels[l] is its given class label. image_name is an optional field that can be used to refer back to specific images later. (A sketch of this format follows the parameter list below.)

    For more information on formatting labels correctly, see the MMDetection library.

  • predictions (List[ndarray]) –

    Predictions output by a trained object detection model. For the most accurate results, predictions should be out-of-sample to avoid overfitting, e.g. obtained via cross-validation. This is a list of N np.ndarray such that predictions[i] corresponds to the model prediction for the i-th image. For each possible class k in 0, 1, …, K-1: predictions[i][k] is an np.ndarray of shape (M,5), where M is the number of predicted bounding boxes for class k. The five columns correspond to [x1,y1,x2,y2,pred_prob], where [x1,y1,x2,y2] are the coordinates of the bounding box predicted by the model and pred_prob is the model’s confidence in the predicted class label for this bounding box. (A sketch of this format also follows the parameter list below.)

    Note: Here, [x1,y1] corresponds to the coordinates of the bottom-left corner of the bounding box, while [x2,y2] corresponds to the coordinates of the top-right corner of the bounding box. The last column, pred_prob, represents the predicted probability that the bounding box contains an object of class k.

    See the MMDetection package for an example of an object detection library that outputs predictions in this format.

  • return_indices_ranked_by_score (Optional[bool]) – Determines what is returned by this method (see description of return value for details).

  • overlapping_label_check (bool, default = True) – If True, boxes annotated with more than one class label have their swap score penalized. Set this to False if you are not concerned when two very similar boxes exist with different class labels in the given annotations.
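
To make the labels format concrete, here is a minimal sketch that builds the labels list for a hypothetical dataset with N = 2 images and K = 2 classes. All box coordinates, class labels, and file names below are made up purely for illustration:

    import numpy as np

    # Hypothetical example: N = 2 images, K = 2 classes.
    # Each dict holds one image's annotated boxes (in [x1,y1,x2,y2] format)
    # and the class label of each box.
    labels = [
        {
            "bboxes": np.array([[10.0, 20.0, 50.0, 60.0]]),  # one annotated box
            "labels": np.array([0]),                          # its class label
            "image_name": "image_0.png",                      # optional identifier
        },
        {
            "bboxes": np.array([[5.0, 5.0, 40.0, 30.0],
                                [60.0, 15.0, 90.0, 70.0]]),   # two annotated boxes
            "labels": np.array([1, 0]),
            "image_name": "image_1.png",
        },
    ]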
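Similarly, a sketch of the predictions format for the same hypothetical two-image, two-class dataset. One way to assemble per-image arrays where predictions[i][k] is the (M, 5) array of boxes predicted for class k (again, all values are made up):

    # predictions[i] has one entry per class; each entry is an (M, 5) array of
    # [x1, y1, x2, y2, pred_prob] rows for the M boxes predicted for that class.
    preds_img0 = np.empty(2, dtype=object)                      # K = 2 classes
    preds_img0[0] = np.array([[12.0, 22.0, 51.0, 58.0, 0.90]])  # class 0: one box
    preds_img0[1] = np.zeros((0, 5))                            # class 1: no boxes

    preds_img1 = np.empty(2, dtype=object)
    preds_img1[0] = np.array([[58.0, 14.0, 92.0, 72.0, 0.85]])
    preds_img1[1] = np.array([[6.0, 4.0, 39.0, 31.0, 0.80]])

    predictions = [preds_img0, preds_img1]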

Return type:

ndarray

Returns:

label_issues (np.ndarray) – Specifies which images are identified to have a label issue. If return_indices_ranked_by_score = False, this function returns a boolean mask of length N (True entries indicate which images have a label issue). If return_indices_ranked_by_score = True, this function returns a (shorter) array of indices of images with label issues, sorted by how likely each image is to be mislabeled.

More precisely, indices are sorted by image label quality score calculated via object_detection.rank.get_label_quality_scores.
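
A minimal usage sketch, assuming labels and predictions have already been assembled in the format described above (variable names are illustrative):

    from cleanlab.object_detection.filter import find_label_issues

    # Boolean mask of length N: True marks images whose annotations look problematic.
    is_issue = find_label_issues(labels, predictions)

    # Alternatively, return indices of flagged images, most likely mislabeled first.
    issue_indices = find_label_issues(
        labels, predictions, return_indices_ranked_by_score=True
    )

    # The optional image_name field can map ranked indices back to specific images.
    flagged_images = [labels[i]["image_name"] for i in issue_indices]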