Adaptive-Attentive Geolocalization from few queries: a hybrid approach

Visual place recognition is important in 3D reconstruction, consumer photography, and augmented reality. Given a query image, an algorithm should retrieve from a geotagged gallery the images that depict the same place.
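The retrieval setting described above can be illustrated with a minimal sketch. It assumes each image has already been reduced to a fixed-length descriptor by some feature extractor (not shown), and it ranks gallery entries by cosine similarity. The function names, toy descriptors, and coordinates are hypothetical and serve only as illustration; they are not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_descriptor, gallery, k=2):
    """Return the geotags of the k gallery images most similar to the query.

    gallery is a list of (descriptor, (latitude, longitude)) pairs.
    """
    ranked = sorted(
        gallery,
        key=lambda item: cosine_similarity(query_descriptor, item[0]),
        reverse=True,
    )
    return [geotag for _, geotag in ranked[:k]]

# Toy gallery: hand-made descriptors with made-up coordinates.
gallery = [
    ([1.0, 0.0, 0.0], (51.7520, -1.2577)),
    ([0.0, 1.0, 0.0], (51.7600, -1.2600)),
    ([0.9, 0.1, 0.0], (51.7521, -1.2578)),
]
query = [1.0, 0.05, 0.0]
top = retrieve(query, gallery, k=2)
print(top)
```

In practice the geotags of the top-ranked gallery images serve as the position estimate for the query, so the quality of the descriptors determines the quality of the localization.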

However, most current methods assume that the queries and the gallery images belong to the same domain. A recent study proposes a novel two-block architecture that handles queries and galleries from different domains.

In the first block, the mapping from the source to the target domain is learned. The second block produces a domain-invariant representation of the input data suitable for the retrieval task. Just a few images from the target domain are enough for a significant improvement in localization. An additional attention mechanism weights the features during both training and testing. The study also introduces a new large-scale dataset, which consists of Google Street View images and queries from the Oxford RobotCar dataset.
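To make the idea of weighting features with attention more concrete, here is a generic sketch of attention-weighted pooling. The text above does not specify how the attention scores are computed, so this example assumes a simple linear scoring vector followed by a softmax over spatial locations. The names `attentive_pool` and `scoring_vector` are hypothetical, and the sketch should not be read as the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_pool(local_features, scoring_vector):
    """Pool local features into one descriptor using attention weights.

    local_features: one feature vector per spatial location.
    scoring_vector: assumed linear scorer (learned in a real model).
    """
    # One scalar score per location: dot product with the scoring vector.
    scores = [
        sum(f * w for f, w in zip(feat, scoring_vector))
        for feat in local_features
    ]
    weights = softmax(scores)
    dim = len(local_features[0])
    # Weighted sum of the local features, dimension by dimension.
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, local_features))
        for d in range(dim)
    ]
    return pooled, weights

local_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scoring_vector = [2.0, 0.0]  # favors locations with a strong first channel
descriptor, weights = attentive_pool(local_features, scoring_vector)
print(descriptor, weights)
```

Locations that score higher contribute more to the pooled descriptor, which is how an attention mechanism can down-weight regions that are uninformative for recognizing a place.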

We address the task of cross-domain visual place recognition, where the goal is to geolocalize a given query image against a labeled gallery, in the case where the query and the gallery belong to different visual domains. To achieve this, we focus on building a domain-robust deep network by leveraging an attention mechanism combined with few-shot unsupervised domain adaptation techniques, where we use a small number of unlabeled target domain images to learn about the target distribution. With our method, we are able to outperform the current state of the art while using two orders of magnitude fewer target domain images. Finally, we propose a new large-scale dataset for cross-domain visual place recognition, called SVOX. Upon acceptance of the paper, code and dataset will be released.