Robot Perception enables Complex Navigation Behavior via Self-Supervised Learning

Humans are quite capable of coordinating their physical movements with visual perception. For robots, this task is much harder, especially when the goal is a system that can operate autonomously over long periods of time. Computer vision systems and motion perception systems, when implemented separately, often specialize in relatively narrow tasks and lack integration with each other.

In a new article, researchers from the Queensland University of Technology propose an architecture for building unified robotic visuomotor control systems for active target-driven navigation tasks using principles of reinforcement learning.

Overview of the proposed unified robot learning framework for navigation tasks. Image credit: Marvin Chancán and Michael Milford, QUT Centre for Robotics, Queensland University of Technology

In their work, the authors used self-supervised machine learning to create motion estimates from visual odometry data and ‘localization representations’ from visual location recognition data. These two types of visuomotor signals are then temporally combined so that the machine learning system can automatically “learn” control policies and make complex navigation decisions. The proposed technique generalizes effectively to extreme environmental changes, with a success rate of up to 80% compared to 30% for a solely vision-based navigation system:

Our method temporally incorporates compact motion and visual perception data – directly obtained using self-supervision from a single image sequence – to enable complex goal-oriented navigation skills. We demonstrate our approach on two real-world driving datasets, KITTI and Oxford RobotCar, using the new interactive CityLearn framework. The results show that our method can accurately generalize to extreme environmental changes such as day to night cycles with up to an 80% success rate, compared to 30% for a vision-only navigation system.

We have shown that combining self-supervised learning for visuomotor perception and RL for decision-making considerably improves the ability to deploy robotic systems capable of solving complex navigation tasks from raw image sequences only. We proposed a method, including a new neural network architecture, that temporally integrates two fundamental sensor modalities such as motion and vision for large-scale target-driven navigation tasks using real data via RL. Our approach was demonstrated to be robust to drastic visual changing conditions, where typical vision-only navigation pipelines fail. This suggests that odometry-based data can be used to improve the overall performance and robustness of conventional vision-based systems for learning complex navigation tasks. In future work, we seek to extend this approach by using unsupervised learning for both decision-making and perception.
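
To make the idea of temporally combining motion and visual signals more concrete, here is a minimal sketch in Python. It is not the authors' architecture: the class name FusionPolicy, the feature sizes, the simple tanh recurrence and the four-action output are all assumptions chosen for illustration, and the weights are random rather than trained with reinforcement learning. It only shows the general pattern of concatenating a motion estimate and a visual representation at each time step and carrying a hidden state forward through the sequence.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Turn raw scores into a probability distribution over actions."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FusionPolicy:
    """Hypothetical recurrent policy over fused motion + visual features."""

    def __init__(self, motion_dim, visual_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Untrained weights; a real system would learn these with RL.
        self.w_in = make_matrix(hidden_dim, motion_dim + visual_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = make_matrix(num_actions, hidden_dim, rng)

    def step(self, motion, visual, hidden):
        """Fuse one time step of both signals and update the hidden state."""
        fused = list(motion) + list(visual)  # per-step concatenation
        from_input = matvec(self.w_in, fused)
        from_memory = matvec(self.w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_memory)]
        return softmax(matvec(self.w_out, hidden)), hidden

    def rollout(self, motion_seq, visual_seq):
        """Run the policy over a whole sequence, one step at a time."""
        hidden = [0.0] * self.hidden_dim
        probs_per_step = []
        for motion, visual in zip(motion_seq, visual_seq):
            probs, hidden = self.step(motion, visual, hidden)
            probs_per_step.append(probs)
        return probs_per_step


# Synthetic stand-ins for odometry estimates and place-recognition features.
policy = FusionPolicy(motion_dim=3, visual_dim=8, hidden_dim=16, num_actions=4)
motion_seq = [[0.1 * t, 0.0, 0.05] for t in range(5)]
visual_seq = [[math.sin(t + k) for k in range(8)] for t in range(5)]
action_probs = policy.rollout(motion_seq, visual_seq)
print(len(action_probs), len(action_probs[0]))
```

Because the hidden state is passed from one step to the next, the action distribution at each step depends on the history of both signals rather than on the current image alone. That is the intuition behind using motion information to compensate when appearance changes sharply, for example between day and night.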

Link to research article: