Deep reinforcement learning offers a more flexible and potentially more efficient alternative to preprogrammed movements and can enable robots to perform more challenging tasks. However, learning on a physical system must not damage the robot.
A recent paper focuses on the task of juggling two balls. The task is hard to learn in simulation, because it demands high accelerations and involves non-linear effects and dynamic contacts that are difficult to model accurately. Moreover, the optimal policy does not transfer between robots and must be learned on each individual machine.
The study shows how currently available learning approaches and engineering practice can be combined to solve this task. The system learned to juggle within 56 minutes of experience and, after gradual improvement, achieves more than 4500 consecutive catches. For comparison, untrained human jugglers typically manage about 20 catches after a few hours of practice.
Robots that can learn in the physical world will be important to enable robots to escape their stiff and pre-programmed movements. For dynamic high-acceleration tasks, such as juggling, learning in the real world is particularly challenging as one must push the limits of the robot and its actuation without harming the system, amplifying the necessity of sample efficiency and safety for robot learning algorithms. In contrast to prior work which mainly focuses on the learning algorithm, we propose a learning system that directly incorporates these requirements in the design of the policy representation, initialization, and optimization. We demonstrate that this system enables the high-speed Barrett WAM manipulator to learn juggling two balls from 56 minutes of experience with a binary reward signal. The final policy juggles continuously for up to 33 minutes or about 4500 repeated catches. The videos documenting the learning process and the evaluation can be found at this https URL
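To give a flavor of what "episodic learning from a binary reward signal" means in practice, the toy sketch below optimizes a low-dimensional policy-parameter vector with a cross-entropy-method search. This is a generic illustration, not the authors' algorithm: the `binary_reward` function, the 4-dimensional parameter space, and all hyperparameters are hypothetical stand-ins for the robot's actual motion parameters and success criterion.

```python
import numpy as np

def binary_reward(params, target):
    # Toy stand-in for the robot's binary reward: 1.0 if the
    # parameterized motion lands close enough to a target, else 0.0.
    return 1.0 if np.linalg.norm(params - target) < 0.5 else 0.0

def cem_optimize(target, dim=4, pop=64, elites=8, iters=50, seed=0):
    # Episodic cross-entropy-method search over policy parameters:
    # sample a population, keep the elite fraction, refit the sampling
    # distribution. No gradients needed, which suits sparse binary rewards.
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.full(dim, 2.0)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        rewards = np.array([binary_reward(s, target) for s in samples])
        # Break ties by distance so early all-zero-reward rounds still
        # pull the distribution in a useful direction.
        dists = np.linalg.norm(samples - target, axis=1)
        order = np.lexsort((dists, -rewards))
        elite = samples[order[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean

target = np.array([1.0, -0.5, 0.3, 0.8])  # hypothetical "successful" parameters
learned = cem_optimize(target)
print(binary_reward(learned, target))
```

In the real system the sample budget is what matters: each "sample" is a physical juggling attempt, which is why the paper stresses sample efficiency and a safe policy parameterization rather than the raw optimizer.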