Deep reinforcement learning (RL) has helped solve many tasks, including mastering the game of Go, playing video games, and learning basic robotic control. Nevertheless, manual human effort is still required to engineer rewards for each task.
A recent paper on arXiv.org looks into human-in-the-loop RL. It proposes that humans provide feedback interactively to the agent while it is training.
Industrial robots. Image credit: Auledas via Wikimedia, CC-BY-SA-4.0
The researchers suggest using human feedback not only for RL but also for extracting human-aligned skills. A human preference function is used to weight the likelihood of trajectories in the dataset according to how well they align with human intent.
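To make the idea of preference-weighted likelihood concrete, here is a minimal sketch. It is not the paper's implementation: the skill model is reduced to a one-dimensional Gaussian over a single summary value per trajectory, and the preference scores are made-up numbers in [0, 1] standing in for the output of a preference function. All function and variable names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Mean that maximises the weighted Gaussian log-likelihood below."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * v for w, v in zip(weights, values)) / total


def weighted_gaussian_log_likelihood(values, weights, mean, std=1.0):
    """Sum of per-trajectory Gaussian log-likelihoods, each scaled by its weight."""
    log_norm = -0.5 * math.log(2.0 * math.pi * std * std)
    return sum(
        w * (log_norm - (v - mean) ** 2 / (2.0 * std * std))
        for w, v in zip(weights, values)
    )


# Toy data (assumed): one summary value per trajectory; the last one is an outlier.
trajectory_values = [1.0, 1.2, 0.9, 5.0]
# Hypothetical preference scores: the outlier is judged poorly aligned.
preference_scores = [0.9, 0.8, 0.9, 0.1]

# Plain maximum likelihood treats every trajectory equally.
plain_fit = weighted_mean(trajectory_values, [1.0] * len(trajectory_values))
# Preference weighting pulls the fit toward the preferred trajectories.
preferred_fit = weighted_mean(trajectory_values, preference_scores)

print("unweighted fit:", plain_fit)
print("preference-weighted fit:", preferred_fit)
```

In this toy setting the low-scored outlier contributes little to the weighted objective, so the fitted mean stays close to the cluster of preferred trajectories instead of being dragged toward the noisy one.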
The method is shown to extract structured skills from noisy offline datasets. In experiments, it solved complex manipulation tasks in a simulated robotic kitchen environment more efficiently than prior leading human-in-the-loop and skill extraction baselines.
A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations. However, such generative models inherit the biases of the underlying data and result in poor and unusable skills when trained on imperfect demonstration data. To better align skill extraction with human intent we present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data. After extracting human-preferred skills, SkiP also utilizes human feedback to solve down-stream tasks with RL. We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks and substantially outperforms prior leading RL algorithms with human preferences as well as leading skill extraction algorithms without human preferences.
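The abstract's first step, learning a model over human preferences, can be illustrated with a small sketch. The abstract does not specify the form of that model, so the code below assumes the simplest option: a logistic classifier trained on binary human labels (1 for preferred, 0 for not preferred) over a single hand-picked trajectory feature. The feature, the labels, and every function name here are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_preference_model(features, labels, lr=0.5, steps=2000):
    """Fit p(preferred | feature) = sigmoid(w * feature + b) by gradient descent
    on the mean binary cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(w * x + b) - y  # gradient of the loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def preference_score(model, feature):
    """Predicted probability that a trajectory with this feature is preferred."""
    w, b = model
    return sigmoid(w * feature + b)


# Toy data (assumed): one scalar feature per trajectory, e.g. a task-progress measure.
features = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
# Hypothetical human labels: 1 = preferred, 0 = not preferred.
labels = [0, 0, 0, 1, 1, 1]

model = fit_preference_model(features, labels)
print("score for a high-progress trajectory:", preference_score(model, 0.95))
print("score for a low-progress trajectory:", preference_score(model, 0.15))
```

Scores like these could then serve as per-trajectory weights when fitting a skill model to an offline dataset, so that trajectories judged closer to human intent count for more.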
Research paper: Wang, X., Lee, K., Hakhamaneshi, K., Abbeel, P., and Laskin, M., “Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback”, 2021. Link: https://arxiv.org/abs/2108.05382