filter
Methods to find label issues in an object detection dataset, where each annotated bounding box in an image receives its own class label.
Functions:
| find_label_issues | Identifies potentially mislabeled images in an object detection dataset. |
- cleanlab.object_detection.filter.find_label_issues(labels, predictions, *, return_indices_ranked_by_score=False, overlapping_label_check=True)
Identifies potentially mislabeled images in an object detection dataset. An image is flagged with a label issue if any of its bounding boxes appear incorrectly annotated. This includes images for which a bounding box: should have been annotated but is missing, has been annotated with the wrong class, or has been annotated in a suboptimal location.
Suppose the dataset has N images and K possible class labels. If return_indices_ranked_by_score is False, a boolean mask of length N is returned, indicating whether each image has a label issue (True) or not (False). If return_indices_ranked_by_score is True, the indices of images flagged with label issues are returned, sorted so that the images most likely to be mislabeled come first.
- Parameters:
labels (List[Dict[str, Any]]) – Annotated boxes and class labels in the original dataset, which may contain some errors. This is a list of N dictionaries such that labels[i] contains the given labels for the i-th image in the following format: {'bboxes': np.ndarray((L,4)), 'labels': np.ndarray((L,)), 'image_name': str}, where L is the number of annotated bounding boxes for the i-th image, bboxes[l] is a bounding box with coordinates in [x1,y1,x2,y2] format, and labels[l] is the class label given to that box. image_name is an optional entry that can be used to refer to specific images later.
Note: Here, (x1,y1) corresponds to the top-left corner and (x2,y2) to the bottom-right corner of the bounding box with respect to the image matrix [e.g. XYXY in Keras <https://keras.io/api/keras_cv/bounding_box/formats/>, Detectron 2 <https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box>].
For more information on proper labels formatting, check out the MMDetection library.
predictions (List[ndarray]) – Predictions output by a trained object detection model. For the most accurate results, predictions should be out-of-sample to avoid overfitting, e.g. obtained via cross-validation. This is a list of N np.ndarray such that predictions[i] corresponds to the model prediction for the i-th image. For each possible class k in 0, 1, …, K-1: predictions[i][k] is an np.ndarray of shape (M,5), where M is the number of predicted bounding boxes for class k. Here the five columns correspond to [x1,y1,x2,y2,pred_prob], where [x1,y1,x2,y2] are coordinates of the bounding box predicted by the model and pred_prob is the model’s confidence in the predicted class label for this bounding box.
Note: Here, (x1,y1) corresponds to the top-left corner and (x2,y2) to the bottom-right corner of the bounding box with respect to the image matrix [e.g. XYXY in Keras <https://keras.io/api/keras_cv/bounding_box/formats/>, Detectron 2 <https://detectron2.readthedocs.io/en/latest/modules/utils.html#detectron2.utils.visualizer.Visualizer.draw_box>]. The last column, pred_prob, represents the predicted probability that the bounding box contains an object of class k.
For more information, see the MMDetection package for an example of an object detection library that outputs predictions in the correct format.
return_indices_ranked_by_score (Optional[bool]) – Determines what is returned by this method (see description of return value for details).
overlapping_label_check (bool, default = True) – If True, boxes annotated with more than one class label have their swap score penalized. Set this to False if you are not concerned when two very similar boxes exist with different class labels in the given annotations.
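The input formats described above can be sketched with a small toy dataset. This is only an illustration: the box coordinates, probabilities, and image names are made up, and the helpers per_class_predictions and check_inputs are hypothetical conveniences, not part of cleanlab. The sketch also assumes that each predictions[i] may be stored as an object array holding K arrays of shape (M,5), which is one way to represent a different number of boxes per class.

```python
import numpy as np

K = 2  # number of classes in this toy dataset (assumed for illustration)

# labels[i] holds the annotations of image i; boxes are [x1, y1, x2, y2].
labels = [
    {
        "bboxes": np.array([[10.0, 20.0, 50.0, 80.0], [60.0, 30.0, 90.0, 70.0]]),
        "labels": np.array([0, 1]),
        "image_name": "img_0.jpg",  # optional entry
    },
    {
        "bboxes": np.array([[5.0, 5.0, 40.0, 40.0]]),
        "labels": np.array([1]),
        "image_name": "img_1.jpg",
    },
]


def per_class_predictions(boxes_by_class):
    """Hypothetical helper: pack K arrays of shape (M_k, 5) into one object array."""
    out = np.empty(len(boxes_by_class), dtype=object)
    for k, boxes in enumerate(boxes_by_class):
        # reshape(-1, 5) turns an empty list into an array of shape (0, 5)
        out[k] = np.asarray(boxes, dtype=float).reshape(-1, 5)
    return out


# predictions[i][k] has columns [x1, y1, x2, y2, pred_prob] for class k.
predictions = [
    per_class_predictions([
        [[11, 19, 52, 79, 0.93]],
        [[58, 31, 91, 72, 0.88], [0, 0, 20, 20, 0.10]],
    ]),
    per_class_predictions([
        [],  # no boxes predicted for class 0 in this image
        [[6, 4, 41, 39, 0.97]],
    ]),
]


def check_inputs(labels, predictions, num_classes):
    """Hypothetical sanity check of the documented shapes; raises ValueError."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must both have length N")
    for i, (lab, pred) in enumerate(zip(labels, predictions)):
        n_boxes = lab["bboxes"].shape[0]
        if lab["bboxes"].shape != (n_boxes, 4):
            raise ValueError(f"image {i}: bboxes must have shape (L, 4)")
        if lab["labels"].shape != (n_boxes,):
            raise ValueError(f"image {i}: labels must have shape (L,)")
        if len(pred) != num_classes:
            raise ValueError(f"image {i}: expected one array per class")
        for k, class_boxes in enumerate(pred):
            if class_boxes.ndim != 2 or class_boxes.shape[1] != 5:
                raise ValueError(f"image {i}, class {k}: expected shape (M, 5)")


check_inputs(labels, predictions, K)
```

With inputs shaped like this, the call documented on this page would be find_label_issues(labels, predictions), optionally with return_indices_ranked_by_score=True.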
- Return type:
ndarray
- Returns:
label_issues (np.ndarray) – Specifies which images are identified to have a label issue. If return_indices_ranked_by_score = False, this function returns a boolean mask of length N (True entries indicate which images have a label issue). If return_indices_ranked_by_score = True, this function returns a (shorter) array of indices of images with label issues, sorted by how likely the image is mislabeled.
More precisely, indices are sorted by image label quality score calculated via object_detection.rank.get_label_quality_scores.
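The relationship between the two return forms can be illustrated with plain NumPy. This is a sketch under stated assumptions, not cleanlab's implementation: the mask and the quality scores below are made-up values, the helpers ranked_indices_from_mask and mask_from_indices are hypothetical, and the sketch assumes that a lower label quality score means the image is more likely mislabeled.

```python
import numpy as np


def ranked_indices_from_mask(issue_mask, quality_scores):
    """Hypothetical helper: indices of flagged images, lowest score first."""
    issue_mask = np.asarray(issue_mask, dtype=bool)
    quality_scores = np.asarray(quality_scores, dtype=float)
    flagged = np.flatnonzero(issue_mask)  # positions where the mask is True
    order = np.argsort(quality_scores[flagged], kind="stable")
    return flagged[order]


def mask_from_indices(issue_indices, num_images):
    """Hypothetical helper: rebuild a length-N boolean mask from indices."""
    mask = np.zeros(num_images, dtype=bool)
    mask[np.asarray(issue_indices, dtype=int)] = True
    return mask


# Made-up example: 5 images, 3 of them flagged.
issue_mask = np.array([False, True, False, True, True])
quality_scores = np.array([0.95, 0.40, 0.88, 0.05, 0.22])

ranked = ranked_indices_from_mask(issue_mask, quality_scores)
print(ranked)  # [3 4 1]
```

The boolean mask is convenient for filtering a dataset in place, while the ranked indices are convenient when only the most suspicious images will be reviewed by hand.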