Before autonomous vehicles can be deployed in everyday life, it must be verified that the system can detect pedestrians and adjust its trajectory accordingly. However, real-world experiments with pedestrians would be unethical, and the common alternative of artist-designed human meshes requires considerable manual effort.
Therefore, a recent paper on arXiv.org suggests using real-world sensor data captured by an autonomous driving fleet to recover the 3D motion and shapes of pedestrians, and then reusing these reconstructions in simulation.
Photo credit: Florida Department of Transportation / NHTSA
The method uses LiDAR point clouds and camera images. The shape and pose of each pedestrian are recovered by combining deep learning with energy minimization. A realistic LiDAR simulation system then poses and deforms a single artist-created mesh using the real-world data, generating a large collection of meshes across different scenes. The experimental evaluation shows that training with the simulated data improves pedestrian detection performance.
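To illustrate what "posing and deforming a single artist-created mesh" can mean in practice, here is a minimal linear blend skinning (LBS) sketch: each vertex of a template mesh is moved by a weighted combination of per-bone rigid transforms recovered from the pedestrian's skeleton motion. The bone count, skinning weights, and transforms below are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def skin_vertices(vertices, weights, bone_transforms):
    """Pose a template mesh with linear blend skinning.

    vertices:        (V, 3) rest-pose vertex positions
    weights:         (V, B) skinning weights, rows sum to 1
    bone_transforms: (B, 4, 4) homogeneous rigid transform per bone
    Returns (V, 3) posed vertex positions.
    """
    # Homogeneous coordinates so translations apply uniformly.
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    # Transform every vertex by every bone: (V, B, 4).
    per_bone = np.einsum('bij,vj->vbi', bone_transforms, homo)
    # Blend the per-bone results by the skinning weights: (V, 4).
    blended = np.einsum('vb,vbi->vi', weights, per_bone)
    return blended[:, :3]
```

Driving such a function with per-frame skeleton poses recovered from sensor data would yield a new deformed mesh per frame, which is one plausible way to populate many simulated scenes from a single artist asset.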
Sensor simulation is a key component for testing the performance of self-driving vehicles and for data augmentation to better train perception systems. Typical approaches rely on artists to create both 3D assets and their animations to generate a new scenario. This, however, does not scale. In contrast, we propose to recover the shape and motion of pedestrians from sensor readings captured in the wild by a self-driving car driving around. Towards this goal, we formulate the problem as energy minimization in a deep structured model that exploits human shape priors, reprojection consistency with 2D poses extracted from images, and a ray-caster that encourages the reconstructed mesh to agree with the LiDAR readings. Importantly, we do not require any ground-truth 3D scans or 3D pose annotations. We then incorporate the reconstructed pedestrian assets bank in a realistic LiDAR simulation system by performing motion retargeting, and show that the simulated LiDAR data can be used to significantly reduce the amount of annotated real-world data required for visual perception tasks.
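The abstract describes the reconstruction as energy minimization over three ingredients: a human shape prior, reprojection consistency with 2D poses, and agreement with LiDAR returns. The toy objective below sketches how such terms might be combined into one energy; every term, weight, and shape here is an illustrative assumption (e.g. a simple L2 shape prior and a one-sided point-to-mesh distance standing in for the ray-caster), not the authors' actual deep structured model.

```python
import numpy as np

def shape_prior(beta):
    # Gaussian prior on a latent body-shape vector reduces to an L2 penalty.
    return np.sum(beta ** 2)

def reprojection_term(joints_3d, joints_2d, project):
    # Projected 3D joints should match 2D pose detections from images.
    return np.sum((project(joints_3d) - joints_2d) ** 2)

def lidar_term(mesh_points, lidar_points):
    # Encourage the mesh to explain LiDAR returns: squared distance from
    # each LiDAR point to its nearest mesh point (one-sided Chamfer).
    d = np.linalg.norm(
        lidar_points[:, None, :] - mesh_points[None, :, :], axis=-1)
    return np.sum(d.min(axis=1) ** 2)

def total_energy(beta, joints_3d, mesh_points, joints_2d, lidar_points,
                 project, w=(1.0, 1.0, 1.0)):
    # Weighted sum of the three terms; minimizing it over shape and pose
    # parameters would fit the pedestrian model to the sensor evidence.
    return (w[0] * shape_prior(beta)
            + w[1] * reprojection_term(joints_3d, joints_2d, project)
            + w[2] * lidar_term(mesh_points, lidar_points))
```

In the paper's setting the minimization runs over the parameters of a deformable human model and is aided by learned components; the point of the sketch is only the structure of the objective, where a candidate mesh that better matches the LiDAR points and 2D poses receives a lower energy.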