When training models for object detection, most people use axis-aligned bounding boxes defined by two points, ((x_min, y_min), (x_max, y_max)). Other encodings of the same box exist, and it does not matter which one is used: they all describe exactly the same thing. A fundamental problem with these bounding boxes is their lack of accuracy, especially for rotated objects in close proximity on high-resolution images. An object segmentation approach would give a better ground truth, but it might not be feasible. An alternative is the so-called rotated or oriented bounding box, or simply a rectangular bounding box described by all four corner coordinates. This kind of bounding box is more common in remote sensing applications.

Let’s have a look at a few example visualizations of how both bbox types look on top of each other when rotated around a center point.

[Figures: rotation examples for aspect ratios 1.0, 2.0, and 5.0]
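The two box types in the rotation examples can also be computed directly. The following is a minimal sketch, not the code used to produce the figures: it assumes the rotated box is given by a center, width, height, and angle in degrees, and that the two-point box is the tightest axis-aligned box around the four rotated corners. The function names `rotated_corners` and `enclosing_bbox` are hypothetical.

```python
import math


def rotated_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w x h rectangle rotated about (cx, cy).

    Assumption: the angle is in degrees and follows the standard
    rotation matrix (counter-clockwise when the y axis points up).
    """
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Corner offsets from the center before rotation.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


def enclosing_bbox(corners):
    """Return ((x_min, y_min), (x_max, y_max)) enclosing all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys)), (max(xs), max(ys))


# Example: a 4 x 2 rectangle rotated by 30 degrees around the origin.
corners = rotated_corners(0.0, 0.0, 4.0, 2.0, 30.0)
print(corners)
print(enclosing_bbox(corners))
```

The four-corner representation keeps the object's orientation, while the two-point box discards it and grows as the object rotates away from the axes.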

There are many applications where the standard two-point bounding box causes more harm than good during training. To illustrate this a bit more, we can plot the IoU between both polygons and the ratio of their areas as a function of rotation angle for various aspect ratios.

[Figures: bounding box area ratio vs. rotation angle; IoU vs. rotation angle]
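The trend in these plots can be sketched numerically. This example rests on an assumption the text does not spell out: the two polygons being compared are the rotated rectangle and the tightest axis-aligned box around it. Under that assumption the rotated rectangle lies entirely inside the axis-aligned box, so the intersection is the rotated area, the union is the enclosing area, and the IoU is simply the inverse of the area ratio. The function name `area_ratio_and_iou` is hypothetical.

```python
import math


def area_ratio_and_iou(aspect_ratio, angle_deg):
    """Compare a rotated rectangle with its enclosing axis-aligned box.

    Assumption: the rectangle has height 1 and width `aspect_ratio`,
    and the axis-aligned box is the tightest box around it.
    Returns (enclosing_area / rotated_area, IoU).
    """
    w, h = aspect_ratio, 1.0
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    rotated_area = w * h
    # Extent of the rotated rectangle along the x and y axes.
    enclosing_area = (w * c + h * s) * (w * s + h * c)
    area_ratio = enclosing_area / rotated_area
    # The rotated box is contained in the enclosing box, so
    # intersection = rotated_area and union = enclosing_area.
    iou = rotated_area / enclosing_area
    return area_ratio, iou


for aspect_ratio in (1.0, 2.0, 5.0):
    for angle in range(0, 91, 15):
        ratio, iou = area_ratio_and_iou(aspect_ratio, angle)
        print(f"aspect={aspect_ratio:.1f} angle={angle:3d} "
              f"area_ratio={ratio:.3f} IoU={iou:.3f}")
```

Under this model the mismatch is largest at 45 degrees and grows with the aspect ratio: a square's enclosing box has twice its area there, while a 5:1 rectangle's enclosing box has 3.6 times its area.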