Machine learning already handles many computer vision tasks well. Understanding objects in 3D, however, remains challenging, in part because 3D datasets are scarce. Recently, Google researchers published the Objectron dataset, a collection of short object-centered video clips that also includes camera metadata and manually annotated 3D bounding boxes.
Example images in the Objectron dataset. Image credit: Google
The dataset can be used for 3D object detection, and the authors propose a two-stage model for this task. First, an object detector finds the 2D crop of an object. Then, a second stage estimates the 3D bounding box from that crop and computes the 2D crop for the next frame. As a result, the object detector does not have to run on every frame.
Diagram of a reference 3D object detection solution. Image credit: Google
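The control flow of the two-stage pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Crop`, `crop_from_keypoints`, and `run_pipeline` are hypothetical, the two models are passed in as plain callables, and the estimated 3D box is assumed to be available as its vertices projected into the image. The sketch only shows how the detector is skipped while a crop is carried over from the previous frame.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]


@dataclass
class Crop:
    """Axis-aligned 2D image region (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def crop_from_keypoints(keypoints: Sequence[Point2D], margin: float = 0.1) -> Crop:
    """Bound the projected box vertices and pad by a relative margin."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    pad_x = margin * (max(xs) - min(xs))
    pad_y = margin * (max(ys) - min(ys))
    return Crop(min(xs) - pad_x, min(ys) - pad_y, max(xs) + pad_x, max(ys) + pad_y)


def run_pipeline(
    frames: Iterable[object],
    detect_2d: Callable[[object], Optional[Crop]],
    estimate_box: Callable[[object, Crop], Optional[Sequence[Point2D]]],
) -> List[Optional[Sequence[Point2D]]]:
    """Run stage 1 only when no crop was carried over from the previous frame."""
    results: List[Optional[Sequence[Point2D]]] = []
    crop: Optional[Crop] = None
    for frame in frames:
        if crop is None:
            crop = detect_2d(frame)  # stage 1: 2D object detector
        if crop is None:
            results.append(None)  # nothing found in this frame
            continue
        keypoints = estimate_box(frame, crop)  # stage 2: box estimate from the crop
        results.append(keypoints)
        # Reuse the estimate to predict the crop for the next frame;
        # fall back to the detector if the estimate failed.
        crop = crop_from_keypoints(keypoints) if keypoints else None
    return results
```

With a second stage that succeeds on every frame, `detect_2d` is called only once for the whole clip; whenever the estimate fails, the loop falls back to the detector on the following frame.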
The approach is suitable for detecting shoes, chairs, mugs, and cameras. The authors also describe an algorithm for computing evaluation metrics for 3D object detection. This work paves the way for further applications such as view synthesis and 3D representation.
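To give a feel for what such a metric measures, here is a minimal Python sketch of 3D intersection-over-union (IoU), the overlap measure that the linked announcement describes for comparing a predicted box with the ground truth. This is a simplification under a stated assumption: both boxes are axis-aligned and given as `(x_min, y_min, z_min, x_max, y_max, z_max)`, whereas the authors' algorithm handles arbitrarily oriented boxes. The function name `iou_3d_axis_aligned` is hypothetical.

```python
from typing import Sequence


def iou_3d_axis_aligned(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """IoU of two axis-aligned 3D boxes (x_min, y_min, z_min, x_max, y_max, z_max).

    Simplifying assumption: no rotation. Oriented boxes need a more
    involved intersection computation than this per-axis overlap.
    """
    intersection = 1.0
    for axis in range(3):
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis, so no shared volume
        intersection *= high - low

    def volume(box: Sequence[float]) -> float:
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union
```

For example, two unit cubes offset by half a unit along one axis share a volume of 0.5 out of a combined 1.5, giving an IoU of one third.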
Link to the article: https://ai.googleblog.com/2020/11/announcing-objectron-dataset.html