AIR-Act2Act: Human-human interaction dataset for teaching non-verbal social behaviors to robots

To interact with humans, social robots should respond appropriately to human behavior. Many attempts to improve the social intelligence of robots rely on predefined behaviors. A recent paper on arXiv.org instead suggests using machine learning to teach social behaviors to robots that provide services to the elderly.

The interactions include bowing as a greeting or farewell, shaking hands, hugging a crying person, high-fiving, and scratching the head in awkward moments. The data was collected with the help of 100 seniors, and several cameras captured the behaviors from different points of view. The dataset consists of depth maps, body indexes, and 3D skeletal data. The human behavior is also converted into joint angles of a humanoid robot. Furthermore, the dataset can serve as training or benchmark data for human action recognition algorithms.
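To give a feel for what turning skeletal data into joint angles involves, here is a minimal Python sketch that computes the angle at one joint from three 3D joint positions. It is only an illustration and not the conversion used by the authors: the function name `joint_angle` and the sample coordinates are hypothetical, and a real mapping to a humanoid robot would also need to handle the robot's joint axes and limits.

```python
import math

def joint_angle(parent, joint, child):
    """Angle in degrees at `joint` between the segments to `parent` and `child`."""
    # Vectors pointing from the joint to its two neighbouring joints.
    u = [p - j for p, j in zip(parent, joint)]
    v = [c - j for c, j in zip(child, joint)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("two joints coincide, angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical (x, y, z) joint positions in metres, not taken from the dataset.
shoulder = (0.0, 0.4, 2.0)
elbow = (0.0, 0.1, 2.0)
wrist = (0.3, 0.1, 2.0)

elbow_angle = joint_angle(shoulder, elbow, wrist)
print(f"elbow angle: {elbow_angle:.1f} degrees")
```

With these sample points the upper arm and forearm are perpendicular, so the sketch reports a right angle at the elbow.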

To better interact with users, a social robot should understand the users’ behavior, infer the intention, and respond appropriately. Machine learning is one way of implementing robot intelligence. It provides the ability to automatically learn and improve from experience instead of explicitly telling the robot what to do. Social skills can also be learned through watching human-human interaction videos. However, human-human interaction datasets are too scarce to learn interactions that occur in various situations. Moreover, we aim to use service robots in the elderly-care domain; however, there has been no interaction dataset collected for this domain. For this reason, we introduce a human-human interaction dataset for teaching non-verbal social behaviors to robots. It is the only interaction dataset that elderly people have participated in as performers. We recruited 100 elderly people and two college students to perform 10 interactions in an indoor environment. The entire dataset has 5,000 interaction samples, each of which contains depth maps, body indexes and 3D skeletal data that are captured with three Microsoft Kinect v2 cameras. In addition, we provide the joint angles of a humanoid NAO robot which are converted from the human behavior that robots need to learn. The dataset and useful Python scripts are available for download. It can be used to not only teach social skills to robots but also benchmark action recognition algorithms.

Link: https://arxiv.org/abs/2009.02041
