Predicting the future poses of road users is a crucial challenge in autonomous driving systems. An orthographic bird's-eye-view perspective is commonly used for LiDAR-based prediction. Using cameras instead of LiDAR would yield a leaner, cheaper, and higher-resolution visual recognition system.
Traffic at an intersection. Image credit: 綾小路 葵 via Flickr, CC BY-SA 2.0
Operating a camera-based system in a bird's-eye-view frame would be beneficial for planning and control. A recent paper proposes the first future prediction model that operates in the bird's-eye view using only monocular camera videos.
The network models future stochasticity directly from driving data. The system predicts temporally consistent future instance segmentation and motion. It can also reason about the probabilistic nature of the future and predict plausible, multimodal future trajectories. The results show that the proposed system outperforms current prediction baselines in autonomous driving.
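To make the idea of multimodal prediction concrete, here is a minimal conceptual sketch (not the paper's actual architecture): models like FIERY capture future uncertainty with a latent variable, and sampling different latents at inference time decodes into distinct, plausible futures. All names and the toy "decoder" below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_futures(current_state, n_samples=3, horizon=4, latent_dim=8):
    """Sample multiple plausible future trajectories for one agent.

    current_state: (2,) agent position in the bird's-eye-view frame.
    Returns an array of shape (n_samples, horizon, 2).
    """
    # Toy linear "decoder" standing in for a learned network.
    W = rng.normal(scale=0.1, size=(latent_dim, 2))
    futures = []
    for _ in range(n_samples):
        z = rng.normal(size=latent_dim)      # latent sample ~ learned prior
        step = current_state / 10.0 + z @ W  # per-step displacement
        # Roll the displacement out over the prediction horizon.
        traj = current_state + np.cumsum(np.tile(step, (horizon, 1)), axis=0)
        futures.append(traj)
    return np.stack(futures)

futures = sample_futures(np.array([5.0, 2.0]))
print(futures.shape)  # (3, 4, 2): three sampled futures, four steps each
```

Each latent sample produces a different trajectory, which is what "multimodal" means here: the model does not commit to a single future but represents a distribution over them.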
Driving requires interacting with road agents and predicting their future behaviour in order to navigate safely. We present FIERY: a probabilistic future prediction model in bird’s-eye view from monocular cameras. Our model predicts future instance segmentation and motion of dynamic agents that can be transformed into non-parametric future trajectories. Our approach combines the perception, sensor fusion and prediction components of a traditional autonomous driving stack by estimating bird’s-eye-view prediction directly from surround RGB monocular camera inputs. FIERY learns to model the inherent stochastic nature of the future directly from camera driving data in an end-to-end manner, without relying on HD maps, and predicts multimodal future trajectories. We show that our model outperforms previous prediction baselines on the NuScenes and Lyft datasets. Code is available at this https URL
Research paper: Hu, A., "FIERY: Future Instance Prediction in Bird's-Eye View from Surround Monocular Cameras", 2021. Link: https://arxiv.org/abs/2104.10490v1