Reconstructing the occluded contents of containers is an important task in assistive technology and robotics. Previous approaches to this task relied on laser reflections or radio-frequency sensing.
Industrial robots. Image credit: Auledas via Wikimedia, CC-BY-SA-4.0
A recent study proposes using another modality: acoustic vibrations. When an object interacts with a container, it creates a vibration. The researchers introduce a smart container that uses its own vibrations to reconstruct an image of its contents.
The approach relies on passively and naturally occurring vibrations whose characteristics depend on the physical properties of the box and the objects it contains. Four contact microphones detect the container's vibrations, and a convolutional network uses these signals to predict the visual scene inside. The cheap acoustic sensors were sufficient to reconstruct the position and shape of objects inside the container to within centimeters.
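The paper does not spell out its preprocessing in this summary, but a common way to feed multi-microphone vibration recordings into a convolutional network is to convert each channel into a spectrogram and stack the channels as the network's input planes. The sketch below is an illustration of that idea, not the authors' exact pipeline: the signals, sampling rate, and FFT parameters are all hypothetical.

```python
import numpy as np

def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude short-time Fourier transform (hypothetical parameters)."""
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        window = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(window)))
    return np.log1p(np.array(frames).T)  # shape: (freq_bins, time_frames)

# Simulate a short impact recorded by four contact microphones: each mic
# hears a decaying resonance at a slightly different frequency.
rng = np.random.default_rng(0)
sample_rate, duration = 16000, 0.5
t = np.arange(int(sample_rate * duration)) / sample_rate
channels = [
    np.exp(-8 * t) * np.sin(2 * np.pi * f0 * t) + 0.01 * rng.standard_normal(t.size)
    for f0 in (220, 330, 440, 550)
]

# Stack the four spectrograms as input channels for a CNN,
# analogous to the RGB planes of an image.
features = np.stack([log_spectrogram(c) for c in channels])
print(features.shape)  # (4, 129, 61)
```

A CNN trained on such stacked spectrograms, with interior camera images as supervision, could then be used at test time when the camera cannot see inside the box.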
We introduce The Boombox, a container that uses acoustic vibrations to reconstruct an image of its inside contents. When an object interacts with the container, it produces small acoustic vibrations. The exact vibration characteristics depend on the physical properties of the box and the object. We demonstrate how to use this incidental signal in order to predict visual structure. After learning, our approach remains effective even when a camera cannot view inside the box. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multi-modal data enables us to transform cheap acoustic sensors into rich visual sensors. Due to the ubiquity of containers, we believe integrating perception capabilities into them will enable new applications in human-computer interaction and robotics. Our project website is at: this http URL
Research paper: Chen, B., Chiquier, M., Lipson, H., and Vondrick, C., “The Boombox: Visual Reconstruction from Acoustic Vibrations”, 2021. Link: https://arxiv.org/abs/2105.08052