Demonstration-Guided Reinforcement Learning with Learned Skills

Humans are remarkably efficient at acquiring new skills from demonstrations: often, a single demonstration of the desired behavior and a few trials of the task are sufficient to master it.

Can machines replicate the same learning methodology?

Industrial robots. Image credit: ISAPUT via Wikimedia, CC-BY-SA-4.0

Yes, they can! Here are the currently available techniques for demonstration-guided machine learning:

Imitation learning: Learning by imitation, or learning from demonstrations, where complex behavior is acquired by leveraging a set of demonstrations. Its limitations include difficulty in learning robust policies and unstable training.
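For intuition, the simplest instance of imitation learning is behavioral cloning, where a policy network is fit to demonstration state-action pairs with a supervised loss. The sketch below is purely illustrative; the network size, dimensions, and data are hypothetical placeholders, not from the paper:

```python
# Minimal behavioral-cloning sketch (illustrative; dimensions and data are
# placeholders): fit a policy network to demonstration (state, action) pairs
# with a supervised regression loss.
import torch
import torch.nn as nn

state_dim, action_dim = 30, 8              # hypothetical dimensions
policy = nn.Sequential(
    nn.Linear(state_dim, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, action_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# In practice these would come from the demonstration set.
demo_states = torch.randn(1024, state_dim)
demo_actions = torch.randn(1024, action_dim)

for step in range(1000):
    predicted_actions = policy(demo_states)
    loss = nn.functional.mse_loss(predicted_actions, demo_actions)  # imitate the demonstrator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```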

Demonstration-guided RL: Reinforcement learning is combined with imitation learning to overcome the limitations of imitation learning alone. However, many demonstrations are required to learn effectively, and every new task is treated as an independent learning problem, so training is expensive. What can we do about it?
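One common way to combine the two (sketched below, not SkiLD itself) is to add a behavioral-cloning term on the demonstrations to the usual RL policy objective. The Gaussian policy, batch formats, and weighting are assumptions for illustration:

```python
# Sketch of a common demonstration-guided RL recipe (illustrative, not SkiLD):
# the policy is trained with an RL objective plus a behavioral-cloning term
# on the task-specific demonstrations, so the demonstrations guide exploration.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Minimal Gaussian policy; all dimensions are hypothetical."""
    def __init__(self, state_dim=30, action_dim=8):
        super().__init__()
        self.mean_net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                      nn.Linear(128, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def log_prob(self, states, actions):
        dist = torch.distributions.Normal(self.mean_net(states), self.log_std.exp())
        return dist.log_prob(actions).sum(-1)

def demo_guided_loss(policy, rl_batch, demo_batch, bc_weight=0.1):
    # RL term: vanilla policy gradient on the agent's own rollouts.
    states, actions, advantages = rl_batch
    rl_loss = -(policy.log_prob(states, actions) * advantages).mean()
    # Imitation term: behavioral cloning on the task-specific demonstrations.
    demo_states, demo_actions = demo_batch
    bc_loss = -policy.log_prob(demo_states, demo_actions).mean()
    return rl_loss + bc_weight * bc_loss

# Example call with dummy batches (state_dim=30, action_dim=8).
policy = GaussianPolicy()
rl_batch = (torch.randn(64, 30), torch.randn(64, 8), torch.randn(64))
demo_batch = (torch.randn(32, 30), torch.randn(32, 8))
loss = demo_guided_loss(policy, rl_batch, demo_batch)
```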

Online RL with offline datasets: Reinforcement learning is accelerated by leveraging task-agnostic experience, i.e., offline datasets collected across many tasks.

Skill-based RL: New tasks are learned by recombining skills extracted from task-agnostic datasets.
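As a toy illustration of the idea (not taken from the paper), a skill-based agent can be seen as a high-level policy that picks among previously learned low-level skills and lets the chosen skill act for a short horizon; the environment interface below is an assumed gym-style API:

```python
# Toy sketch of skill-based RL (illustrative, not from the paper): a high-level
# policy selects one of several pre-trained low-level skills extracted from
# task-agnostic data, and the chosen skill controls the agent for a fixed horizon.
class SkillBasedAgent:
    def __init__(self, high_level_policy, skills, skill_horizon=10):
        self.high_level_policy = high_level_policy  # maps state -> skill index
        self.skills = skills                        # list of low-level skill policies
        self.skill_horizon = skill_horizon

    def act(self, env, state):
        # env is assumed to follow the classic gym step() interface.
        skill = self.skills[self.high_level_policy(state)]
        total_reward = 0.0
        for _ in range(self.skill_horizon):
            state, reward, done, _ = env.step(skill(state))
            total_reward += reward
            if done:
                break
        return state, total_reward
```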

This post is based on the research paper by Karl Pertsch, Youngwoon Lee, Yue Wu, and Joseph J. Lim. In the researchers' words, the contributions of their work are threefold:

(1) we introduce the problem of leveraging task-agnostic offline datasets for accelerating demonstration-guided RL on unseen tasks,

(2) we propose SkiLD, a skill-based algorithm for efficient demonstration-guided RL and

(3) we show the effectiveness of our approach on a maze navigation and two complex robotic manipulation tasks.

SkiLD

SkiLD is described as a new method for demonstration-guided reinforcement learning that leverages task-agnostic experience datasets and task-specific demonstrations for accelerated learning of unseen tasks. SkiLD speeds up the learning of long-horizon tasks while reducing the number of required demonstrations. The research outlines:

Given a large task-agnostic dataset, our approach extracts reusable skills: robust short-horizon behaviors that can be recombined to learn new tasks. Just as a human imitates complex behaviors by chaining known skills, complex tasks can be learned faster. Concretely, we propose Skill-based Learning with Demonstrations (SkiLD), a demonstration-guided RL algorithm that learns short-horizon skills from offline datasets and then learns new tasks efficiently by leveraging these skills to follow a given set of demonstrations. Across challenging navigation and robotic manipulation tasks, our approach significantly improves the learning efficiency over prior demonstration-guided RL approaches.

Researchers’ Approach

The researchers extract reusable skills from the task-agnostic experience data. These extracted skills are then leveraged to improve the efficiency of demonstration-guided RL on unseen tasks.

SkiLD combines task-agnostic experience and task-specific demonstrations to efficiently learn target tasks in three steps: (1) extract a skill representation from the task-agnostic offline data, (2) learn a task-agnostic skill prior from that data and a task-specific skill posterior from the demonstrations, and (3) learn a high-level skill policy for the target task using prior knowledge from both the task-agnostic offline data and the task-specific demonstrations.
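To make the three steps concrete, here is a minimal, hypothetical sketch of how they could fit together; it is not the authors' implementation. The skill prior, skill posterior, and high-level policy are all assumed to be state-conditioned Gaussians over a latent skill variable z, and the 0/1 demonstration-support weights stand in for however support is actually estimated:

```python
# Hypothetical sketch of the three SkiLD steps fitting together (not the
# authors' code). Step 1's skill representation is abstracted into a latent
# skill variable z; steps 2 and 3 are shown as state-conditioned Gaussians.
import torch
import torch.nn as nn

class SkillDistribution(nn.Module):
    """Maps a state to a Gaussian over the latent skill z (shapes assumed)."""
    def __init__(self, state_dim=30, skill_dim=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * skill_dim))

    def forward(self, states):
        mean, log_std = self.net(states).chunk(2, dim=-1)
        return torch.distributions.Normal(mean, log_std.exp())

# Step 2: the prior would be fit on task-agnostic data, the posterior on the
# task-specific demonstrations (their training loops are omitted here).
skill_prior = SkillDistribution()
skill_posterior = SkillDistribution()
# Step 3: a high-level policy over skills for the target task.
high_level_policy = SkillDistribution()

def regularized_policy_loss(states, q_values, in_demo_support):
    """q_values: critic estimates of task return for skills sampled from the
    current policy (critic omitted). in_demo_support: 0/1 weight per state,
    standing in for however demonstration support is estimated."""
    pi = high_level_policy(states)
    kl_to_posterior = torch.distributions.kl_divergence(pi, skill_posterior(states)).sum(-1)
    kl_to_prior = torch.distributions.kl_divergence(pi, skill_prior(states)).sum(-1)
    # Follow the demonstration posterior where demonstrations cover the state,
    # fall back to the task-agnostic prior everywhere else.
    kl_term = in_demo_support * kl_to_posterior + (1 - in_demo_support) * kl_to_prior
    return (-q_values + kl_term).mean()

# Example call with dummy tensors.
states = torch.randn(64, 30)
loss = regularized_policy_loss(states, q_values=torch.randn(64),
                               in_demo_support=torch.randint(0, 2, (64,)).float())
```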

Experimental setups

The setups below were used to measure the effectiveness of SkiLD against popular learning techniques:

  • Maze Navigation: Navigate a 2D maze.
  • Robot Kitchen Environment: Perform a sequence of four sub-tasks, such as opening the microwave or switching on the light, in the correct order.
  • Robot Office Environment: Clean an office environment by placing objects in their target bins or putting them in a drawer.

Image courtesy of the researchers, arXiv:2107.10253v1

Conclusion

SkiLD uses large, task-agnostic datasets and a small number of task-specific demonstrations for learning. It is proposed as an efficient demonstration-guided reinforcement learning method for complex tasks: previously learned skills from other tasks are leveraged and recombined for faster learning. In the experiments, the proposed technique reached the learning goals faster than other popular techniques in this field on the maze navigation, robot kitchen, and robot office tasks.

Research paper: Karl Pertsch, Youngwoon Lee, Yue Wu, Joseph J. Lim, “Demonstration-Guided Reinforcement Learning with Learned Skills,” arXiv:2107.10253.

