How to select and use tools? Active Perception of Target Objects Using Multimodal Deep Learning

In order to perform a lot of everyday actions, it is necessary to handle and operate various tools. Robots can usually repeat specific tool-use motions for specific objects. However, they have difficulties when determining which tool should be used and adjusting how to handle it depending on the object.

A recent study tries to approach the problem using active perception. The robot is allowed to interact with an object to recognize its characteristics.

Image credit: ponce_photography via Pixabay, free licence

The researchers used transferring food ingredients as an example task. The robot had to recognize what ingredients are in a pot, select a ladle or turner depending on the ingredient characteristics, and transfer the ingredient to a bowl.

As a result, the robot successfully transferred untrained ingredients. It was confirmed that a neural network could recognize the characteristics of unknown objects in its latent space.

Selection of appropriate tools and use of them when performing daily tasks is a critical function for introducing robots for domestic applications. In previous studies, however, adaptability to target objects was limited, making it difficult to accordingly change tools and adjust actions. To manipulate various objects with tools, robots must both understand tool functions and recognize object characteristics to discern a tool-object-action relation. We focus on active perception using multimodal sensorimotor data while a robot interacts with objects, and allow the robot to recognize their extrinsic and intrinsic characteristics. We construct a deep neural networks (DNN) model that learns to recognize object characteristics, acquires tool-object-action relations, and generates motions for tool selection and handling. As an example tool-use situation, the robot performs an ingredients transfer task, using a turner or ladle to transfer an ingredient from a pot to a bowl. The results confirm that the robot recognizes object characteristics and servings even when the target ingredients are unknown. We also examine the contributions of images, force, and tactile data and show that learning a variety of multimodal information results in rich perception for tool use.

Research paper: Saito, N., Ogata, T., Funabashi, S., Mori, H., and Sugano, S., “How to select and use tools? : Active Perception of Target Objects Using Multimodal Deep Learning”, 2021. Link:


Notify of
Inline Feedbacks
View all comments
Would love your thoughts, please comment.x