
Memory-based gaze prediction in deep imitation learning for robot manipulation

Deep imitation learning has enabled robots to perform manipulation tasks without predefined rules. However, current architectures infer a reactive action from the current state alone, while robots in the real world may be required to utilize memory.

Industrial robot. Image credit: Humanrobo via Wikimedia, CC-BY-SA-3.0

Therefore, a recent paper published on arXiv.org proposes gaze control based on sequential data to achieve memory-based robot manipulation.

When humans recall the location of an object in a closed cupboard, they first gaze at the remembered location and then attempt to manipulate the object. Similarly, the researchers state that a memory-based gaze generation system enables the robot to determine the correct location, which can only be inferred from data of previous time steps. A Transformer-based self-attention architecture for gaze prediction is proposed.

Experiments on a multi-object manipulation task show that Transformer’s self-attention is a promising approach for such tasks.
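To illustrate the idea, the sketch below shows single-head self-attention applied to a sequence of per-frame visual features, with the attended feature of the final time step projected to a 2-D gaze coordinate. This is a minimal toy in NumPy with random weights; the feature dimension, sequence length, and projection matrices are illustrative assumptions, not the architecture from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_gaze(features, Wq, Wk, Wv, Wout):
    """Single-head self-attention over a (T, d) sequence of frame
    features; the attended feature of the last time step is
    projected to a 2-D gaze coordinate (x, y)."""
    Q, K, V = features @ Wq, features @ Wk, features @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T) attention logits
    attended = softmax(scores) @ V           # each step attends over the history
    return attended[-1] @ Wout               # gaze predicted from the final step

# Toy dimensions: 5 frames, 8-dimensional features (assumed values).
T, d = 5, 8
features = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wout = rng.normal(size=(d, 2))

gaze = self_attention_gaze(features, Wq, Wk, Wv, Wout)
print(gaze.shape)  # a 2-D gaze coordinate
```

Because every output step can attend to all previous frames, information observed earlier in the sequence (e.g., where an object was last seen) can influence the current gaze prediction, which is the memory mechanism the paper exploits.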

Deep imitation learning is a promising approach that does not require hard-coded control rules in autonomous robot manipulation. The current applications of deep imitation learning to robot manipulation have been limited to reactive control based on the states at the current time step. However, future robots will also be required to solve tasks utilizing their memory obtained by experience in complicated environments (e.g., when the robot is asked to find a previously used object on a shelf). In such a situation, simple deep imitation learning may fail because of distractions caused by complicated environments. We propose that gaze prediction from sequential visual input enables the robot to perform a manipulation task that requires memory. The proposed algorithm uses a Transformer-based self-attention architecture for the gaze estimation based on sequential data to implement memory. The proposed method was evaluated with a real robot multi-object manipulation task that requires memory of the previous states.

Research paper: Kim, H., Ohmura, Y., and Kuniyoshi, Y., “Memory-based gaze prediction in deep imitation learning for robot manipulation”, 2022. Link: https://arxiv.org/abs/2202.04877

