Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning to Segment Driveability in Egocentric Images

While much research on robotic autonomy targets controlled environments, many practical applications require venturing into unknown territory and navigating it successfully. Autonomous navigation for robots in unfamiliar environments is an area of broad interest, with applications ranging from rover navigation on Mars to self-driving cars.

Image credit: Steve Jurvetson via Wikimedia (CC BY 2.0)

A recent study published on arXiv by Galadrielle Humblot-Renaux, Letizia Marchegiani, Thomas B. Moeslund, and Rikke Gade addresses scene understanding for outdoor robotic navigation using on-board cameras.

Importance of this research:

In the words of the researchers,

This navigation-oriented learning scheme is, to the best of our knowledge, the first attempt at incorporating an inter-class ranking in a scene understanding task, taking into account both the type and location of mistakes during learning to improve task-oriented pixel classification. Our contributions can be summarized as follows:

  • a driveability affordance which does not make any assumptions about the types of obstacles in the scene or the presence of an explicit path, widening the learning problem beyond structured, predictable landscapes;
  • a soft ordinal label encoding which incorporates the ambiguity and order between levels of driveability during learning, with some areas in the image being more driveable than others;
  • a loss weighting scheme which, rather than treating all pixels as equally important for navigation, concentrates learning in safety-critical areas while allowing leniency around object outlines and distant scene background;
  • a challenging experimental procedure: beyond same-dataset testing, we evaluate the generalizability of our approach on three unseen datasets, including the WildDash benchmark which captures a large variety of difficult driving scenarios across 100+ countries.

Proposed Solution

The researchers use the following 3-step process to segment driveability for robotic navigation:

  • Step-1: Pixel annotations are mapped from existing semantic segmentation datasets to a 3-level driveability affordance. In this step, pixels are classified into one of three driveability levels, namely:
      • Preferable: Where the robot is expected to drive (e.g., roads and clear paths)
      • Possible, but not preferable: Technically navigable but challenging terrain
      • Impossible: Undrivable or unreachable regions (e.g., sky, obstacles, hazardous terrain)
  • Step-2: Affordance labels are softened to model the ambiguity and ordering between driveability levels. Here, each pixel's hard label is replaced by a probability distribution over the three levels, with probability mass assigned inversely to a level's distance from the true level. This is referred to as the Soft Ordinal (SORD) labelling scheme.
  • Step-3: A loss weighting scheme is proposed which selectively emphasizes the areas most relevant to navigation. In this step, the researchers formulate a pixel-wise loss weighting scheme that assigns higher importance to the pixels most relevant for driving decisions. It serves as a naive placeholder for depth data, under the simple assumption that the lower a pixel is in the image, the closer it is to the camera.
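Step-1 can be sketched as a simple label remapping. This is a minimal illustration, not the paper's implementation: the semantic class IDs and their grouping into driveability levels here are hypothetical, since the actual groupings are defined per source dataset.

```python
import numpy as np

# Driveability affordance levels (ordering matters for the ordinal encoding).
LEVELS = {"impossible": 0, "possible": 1, "preferable": 2}

# Hypothetical mapping from semantic class IDs to driveability levels;
# the real class groupings depend on the source dataset's label set.
CLASS_TO_LEVEL = {
    0: LEVELS["preferable"],   # e.g. road
    1: LEVELS["possible"],     # e.g. grass / terrain
    2: LEVELS["impossible"],   # e.g. sky
    3: LEVELS["impossible"],   # e.g. person (obstacle)
}

def remap_labels(semantic_mask: np.ndarray) -> np.ndarray:
    """Map an HxW semantic label mask to 3-level driveability labels."""
    lut = np.array([CLASS_TO_LEVEL[c] for c in sorted(CLASS_TO_LEVEL)])
    return lut[semantic_mask]

mask = np.array([[0, 1],
                 [2, 3]])
print(remap_labels(mask))  # [[2 1]
                           #  [0 0]]
```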
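For Step-2, a SORD-style soft encoding can be sketched as a softmax over negative squared rank distances, so that levels closer to the true level receive more probability mass. The sharpness parameter `alpha` is an illustrative assumption, not a value from the paper.

```python
import numpy as np

def sord_encode(true_level: int, num_levels: int = 3, alpha: float = 2.0) -> np.ndarray:
    """SORD-style soft label: a softmax over negative squared distances
    between each level's rank and the true level's rank. `alpha` is a
    hypothetical sharpness parameter (larger = closer to a hard label)."""
    ranks = np.arange(num_levels)
    logits = -alpha * (ranks - true_level) ** 2
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()
```

For a "preferable" pixel (level 2), `sord_encode(2)` peaks at level 2 but still assigns some probability to "possible", reflecting the ordering between levels; training can then minimize a distribution-matching loss (e.g. KL divergence) against these soft targets.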
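Step-3's "lower pixels matter more" assumption can be sketched as a weight map that grows linearly from the top row to the bottom row. The `w_min`/`w_max` values and the linear profile are illustrative assumptions; the paper's actual weighting also accounts for factors such as object outlines.

```python
import numpy as np

def vertical_loss_weights(height: int, width: int,
                          w_min: float = 0.5, w_max: float = 2.0) -> np.ndarray:
    """HxW pixel-wise loss weights increasing linearly from top to bottom,
    a naive stand-in for depth: lower rows are assumed closer to the camera
    and hence more safety-critical. w_min/w_max are illustrative values."""
    rows = np.linspace(w_min, w_max, height)[:, None]  # one weight per row
    return np.broadcast_to(rows, (height, width)).copy()
```

The resulting map would typically be multiplied element-wise with the per-pixel loss before averaging, so mistakes near the bottom of the image are penalized more heavily.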

Image credit: arXiv:2109.07245

The image above shows how the input image (top right) is transformed through Steps 1, 2 and 3 (top left, bottom left and bottom right, respectively).

Image credit: arXiv:2109.07245

Results and Conclusion

The researchers propose a simple yet effective method for open-world robotic navigation. For scene segmentation in unseen environments, they demonstrate that the approach is quantitatively and qualitatively superior to comparable existing techniques. The model also adapts readily to different training data, and a lightweight variant of the segmentation architecture can be substituted for resource-constrained platforms. Since the method takes a single RGB image as input, is easy to train, and extends to arbitrary scenes, it is versatile and applicable to a wide range of robotic tasks.

Source: Galadrielle Humblot-Renaux, Letizia Marchegiani, Thomas B. Moeslund, and Rikke Gade “Navigation-Oriented Scene Understanding for Robotic Autonomy: Learning to Segment Driveability in Egocentric Images”
