
CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data

Because satellite positioning is vulnerable to signal loss and interference, alternative methods are in demand for absolute large-scale localization of autonomous vehicles. Since small, low-cost cameras are widely available, features of a known environment can be recognized in captured images and used to determine the absolute camera pose. However, such an approach lacks suitable open-source datasets.

Image credit: Eschenzweig via Wikimedia, CC-BY-SA-4.0

A recent paper on arXiv.org proposes a synthetic data generation scheme. It takes geographic camera poses as input and renders simulated RGB images accompanied by 2D and 3D modalities such as semantics, geographic coordinates, depth, and surface normals.
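These modalities are geometrically linked through the camera pose: given a rendered depth map and the pose, the per-pixel 3D geographic (scene) coordinates follow by back-projecting pixels through the camera model. The sketch below illustrates this relationship; it is not code from the paper, and the intrinsics and resolution are made-up values.

```python
import numpy as np

def depth_to_scene_coords(depth, K, R, t):
    """Back-project a depth map to world-frame scene coordinates.

    depth : (H, W) depth along the camera z-axis
    K     : (3, 3) pinhole intrinsics
    R, t  : camera-to-world rotation (3, 3) and translation (3,)
    Returns an (H, W, 3) array of per-pixel 3D points in world coordinates.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T          # camera-frame rays with z = 1
    cam = rays * depth[..., None]            # scale rays by depth -> camera frame
    return cam @ R.T + t                     # rigid transform to world frame

# Example with illustrative values: identity pose, flat depth of 5 m.
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0,   0.0,  1.0]])
coords = depth_to_scene_coords(np.full((64, 64), 5.0), K, np.eye(3), np.zeros(3))
```

The pixel at the principal point (32, 32) maps straight down the optical axis, so its scene coordinate is (0, 0, 5) for this identity pose. This back-projection is exactly why rendered depth plus pose suffices to generate dense scene-coordinate ground truth.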

Two large-scale benchmark datasets for sim-to-real visual localization were curated using the proposed workflow. In addition, a cross-modal visual representation learning approach was introduced for absolute localization.

We present a visual localization system that learns to estimate camera poses in the real world with the help of synthetic data. Despite significant progress in recent years, most learning-based approaches to visual localization target at a single domain and require a dense database of geo-tagged images to function well. To mitigate the data scarcity issue and improve the scalability of the neural localization models, we introduce TOPO-DataGen, a versatile synthetic data generation tool that traverses smoothly between the real and virtual world, hinged on the geographic camera viewpoint. New large-scale sim-to-real benchmark datasets are proposed to showcase and evaluate the utility of the said synthetic data. Our experiments reveal that synthetic data generically enhances the neural network performance on real data. Furthermore, we introduce CrossLoc, a cross-modal visual representation learning approach to pose estimation that makes full use of the scene coordinate ground truth via self-supervision. Without any extra data, CrossLoc significantly outperforms the state-of-the-art methods and achieves substantially higher real-data sample efficiency. Our code is available at this https URL.
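The "scene coordinate ground truth" mentioned in the abstract ties each pixel to an absolute 3D point, which is what makes the camera pose recoverable. As a simplified, hypothetical illustration (CrossLoc trains a neural network, and in practice a PnP/RANSAC-style solver operates on 2D–3D matches), the sketch below recovers a camera-to-world pose from corresponding camera-frame and world-frame 3D points with the Kabsch algorithm; all names and values here are illustrative.

```python
import numpy as np

def kabsch_pose(cam_pts, world_pts):
    """Recover the camera-to-world rigid transform (R, t) that best maps
    camera-frame points onto their world-frame scene coordinates, via the
    Kabsch / Procrustes least-squares solution over rotations.

    cam_pts, world_pts : (N, 3) arrays of corresponding 3D points.
    """
    mu_c, mu_w = cam_pts.mean(axis=0), world_pts.mean(axis=0)
    H = (cam_pts - mu_c).T @ (world_pts - mu_w)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_w - R @ mu_c
    return R, t

# Example: synthesize a ground-truth pose, then check it is recovered.
rng = np.random.default_rng(0)
cam = rng.normal(size=(100, 3))                          # camera-frame points
R_true = np.array([[0.0, -1.0, 0.0],                     # 90-degree yaw
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_true = np.array([10.0, 2.0, -3.0])
world = cam @ R_true.T + t_true                          # scene coordinates
R_est, t_est = kabsch_pose(cam, world)
```

With noise-free correspondences the estimate matches the synthesized pose exactly (up to floating-point precision); with noisy network predictions, a robust solver such as RANSAC over PnP would be used instead, which is the standard practice in scene-coordinate regression pipelines.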

Research paper: Yan, Q., Zheng, J., Reding, S., Li, S., and Doytchinov, I., “CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data”, arXiv preprint, 2021. Link: https://arxiv.org/abs/2112.09081

