Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning

Deep reinforcement learning (DRL) has been used successfully to solve robotics tasks such as locomotion, manipulation, and navigation. However, complex tasks require long training times.

A recent paper on arXiv.org explores how massive parallelism can improve the quality of DRL policies and shorten their time to deployment.

The researchers examine how the standard RL formulation and the most commonly used hyperparameters should be adapted to learn efficiently in the highly parallel regime. They also introduce a novel game-inspired curriculum that automatically adapts the task difficulty to the performance of the policy.
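The idea of a curriculum that adapts difficulty to policy performance can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `update_terrain_levels`, its parameters, and the promotion and demotion thresholds (half the terrain length and half the commanded distance) are assumptions chosen for the example. Each simulated robot holds a difficulty level that rises when it walks far enough, falls when it covers too little of the distance it was asked to cover, and is reset to a random level once the hardest level is cleared.

```python
import numpy as np


def update_terrain_levels(levels, distance_walked, commanded_distance,
                          terrain_length, max_level, rng):
    """Return new per-robot difficulty levels after an episode.

    All thresholds below are illustrative assumptions, not values
    confirmed by the article.
    """
    levels = np.asarray(levels).copy()
    distance_walked = np.asarray(distance_walked, dtype=float)
    commanded_distance = np.asarray(commanded_distance, dtype=float)

    # Promote robots that walked more than half the terrain length (assumed rule).
    move_up = distance_walked > 0.5 * terrain_length
    # Demote robots that covered less than half the commanded distance (assumed rule).
    move_down = (distance_walked < 0.5 * commanded_distance) & ~move_up

    levels[move_up] += 1
    levels[move_down] -= 1

    # The easiest level is 0; robots cannot drop below it.
    levels = np.maximum(levels, 0)

    # Robots that clear the hardest level are sent back to a random level,
    # so every difficulty keeps being practiced.
    solved = levels > max_level
    levels[solved] = rng.integers(0, max_level + 1, size=int(solved.sum()))
    return levels
```

Because the update is a handful of array operations, it applies to thousands of robots at once: calling it with arrays of length 4096 updates every robot's level in a single step, which is what makes this kind of curriculum a natural fit for massively parallel simulation.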

The proposed approach can train a perceptive policy in minutes on a single GPU while retaining the complexity required for sim-to-real transfer to the hardware. The authors show that the task can be solved using simple observation and action spaces and relatively straightforward rewards.

In this work, we present and study a training set-up that achieves fast policy generation for real-world robotic tasks by using massive parallelism on a single workstation GPU. We analyze and discuss the impact of different training algorithm components in the massively parallel regime on the final policy performance and training times. In addition, we present a novel game-inspired curriculum that is well suited for training with thousands of simulated robots in parallel. We evaluate the approach by training the quadrupedal robot ANYmal to walk on challenging terrain. The parallel approach allows training policies for flat terrain in under four minutes, and in twenty minutes for uneven terrain. This represents a speedup of multiple orders of magnitude compared to previous work. Finally, we transfer the policies to the real robot to validate the approach. We open-source our training code to help accelerate further research in the field of learned legged locomotion.

Research paper: Rudin, N., Hoeller, D., Reist, P., and Hutter, M., “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning”, 2021. Link: https://arxiv.org/abs/2109.11978

