Robotic exploration is the task of autonomously navigating an unknown environment to gather sufficient information to represent it. Typically, deep reinforcement learning-based algorithms are employed, trained with extrinsic rewards based on occupancy anticipation. However, such rewards require precise knowledge of the layout of the training environments, which is expensive to gather. A recent paper therefore proposes training the model with a purely intrinsic reward signal.
A robot. Image credit: Pxfuel, free licence
The reward is based on the impact of the agent's actions on the environment, measured as the difference between two consecutive observations. To avoid rewarding repeated visits, the reward is discounted by a pseudo-count of previously visited states, which an additional neural module is designed to keep track of. The devised algorithm outperforms state-of-the-art baselines in simulated experiments, and the researchers have also deployed it on a real robot.
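The idea can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the impact is taken as the L2 difference between consecutive observations, and the neural density model is replaced by a hypothetical hash-based visit counter that serves as a crude pseudo-count stand-in.

```python
import numpy as np
from collections import defaultdict

def impact_reward(obs_prev, obs_curr, pseudo_count):
    """Intrinsic reward: change caused by the agent's action,
    discounted by how often the resulting state has been visited."""
    impact = np.linalg.norm(obs_curr - obs_prev)  # L2 difference of observations
    return impact / np.sqrt(pseudo_count)

class VisitCounter:
    """Crude stand-in for the paper's neural density model:
    counts visits to discretized (rounded) observations."""
    def __init__(self, precision=1):
        self.counts = defaultdict(int)
        self.precision = precision

    def update(self, obs):
        # Round the observation so nearby states share a count,
        # then return the updated visit count for that state.
        key = tuple(np.round(obs, self.precision).ravel())
        self.counts[key] += 1
        return self.counts[key]
```

In use, the agent would call `counter.update(obs_curr)` each step and feed the returned count into `impact_reward`, so that revisiting the same state yields a progressively smaller reward even when the observation change is large.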
Exploration of indoor environments has recently attracted significant interest, thanks in part to the introduction of deep neural agents built in a hierarchical fashion and trained with Deep Reinforcement Learning (DRL) in simulated environments. Current state-of-the-art methods employ a dense extrinsic reward that requires complete a priori knowledge of the layout of the training environment to learn an effective exploration policy. However, such information is expensive to gather in terms of time and resources. In this work, we propose to train the model with a purely intrinsic reward signal to guide exploration, based on the impact of the robot’s actions on the environment. So far, impact-based rewards have been employed only for simple tasks and in procedurally generated synthetic environments with countable states. Since the number of states observable by the agent in realistic indoor environments is uncountable, we include a neural-based density model and replace the traditional count-based regularization with an estimated pseudo-count of previously visited states. The proposed exploration approach outperforms DRL-based competitors relying on intrinsic rewards and surpasses agents trained with a dense extrinsic reward computed from the environment layouts. We also show that a robot equipped with the proposed approach seamlessly adapts to point-goal navigation and real-world deployment.