Motion forecasting is essential for autonomous systems. However, current methods struggle to discover the physical laws from data and are inefficient for knowledge transfer. A recent study on arXiv.org proposes to tackle these challenges from a causal representation perspective.
Motion forecasting can be applied in different fields. Image credit: Bicanski via Pixnio, CC0 Public Domain
A new formalism describes human behaviors with three groups of variables: domain-invariant causal variables accounting for physical laws, domain-specific confounders associated with motion styles, and non-causal spurious features. Seeking the commonalities across multiple domains promotes causal invariance of the learned representations. A modular architecture factorizes the representations of invariant mechanisms and style confounders. Moreover, a style consistency loss is introduced to strengthen the structure of motion styles.
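To make the idea of a style consistency loss concrete, here is a minimal sketch in plain Python. The article does not give the exact form of the loss, so this assumes a pairwise contrastive formulation: style embeddings from the same domain are pulled together, and embeddings from different domains are pushed at least a margin apart. The function name `style_consistency_loss`, the `margin` parameter, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def style_consistency_loss(embeddings, domains, margin=1.0):
    """Assumed pairwise contrastive loss over style embeddings.

    Same-domain pairs pay their squared distance; different-domain pairs
    pay a squared hinge penalty when they are closer than `margin`.
    The result is averaged over all pairs.
    """
    total, pairs = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(embeddings[i], embeddings[j])
            if domains[i] == domains[j]:
                total += d ** 2                      # pull same style together
            else:
                total += max(0.0, margin - d) ** 2   # push different styles apart
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy usage: two trajectories from domain "a", one from domain "b".
embeddings = [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
well_structured = style_consistency_loss(embeddings, ["a", "a", "b"])
badly_structured = style_consistency_loss(embeddings, ["a", "b", "a"])
```

In this toy case the loss is zero when embeddings cluster by domain and positive when they do not, which is the structure such a loss is meant to encourage.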
Experiments on synthetic and real datasets show that the method outperforms prior state-of-the-art motion forecasting models in out-of-distribution generalization and low-shot transfer.
Learning behavioral patterns from observational data has been a de-facto approach to motion forecasting. Yet, the current paradigm suffers from two shortcomings: it is brittle under covariate shift and inefficient for knowledge transfer. In this work, we propose to address these challenges from a causal representation perspective. We first introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables, namely invariant mechanisms, style confounders, and spurious features. We then introduce a learning framework that treats each group separately: (i) unlike the common practice of merging datasets collected from different locations, we exploit their subtle distinctions by means of an invariance loss encouraging the model to suppress spurious correlations; (ii) we devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph; (iii) we introduce a style consistency loss that not only enforces the structure of style representations but also serves as a self-supervisory signal for test-time refinement on the fly. Experimental results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations, outperforming prior state-of-the-art motion forecasting models for out-of-distribution generalization and low-shot transfer.
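The contrast in point (i) between merging datasets and exploiting their distinctions can be illustrated with a short sketch. The abstract does not specify the form of the invariance loss, so the code below assumes one simple option: the average per-domain risk plus a penalty on the variance of risks across domains. The names `invariance_objective` and `penalty_weight`, and the toy error values, are hypothetical.

```python
from statistics import mean, pvariance


def invariance_objective(per_domain_errors, penalty_weight=1.0):
    """Assumed invariance objective over several data domains.

    per_domain_errors maps a domain name to a list of per-sample
    forecasting errors. Instead of pooling all samples, the risk is
    computed per domain, and the spread of those risks is penalized so
    that the model cannot do well in one location by relying on
    features that fail in another.
    """
    risks = [mean(errors) for errors in per_domain_errors.values()]
    return mean(risks) + penalty_weight * pvariance(risks)


# Toy usage: both settings have the same pooled average error of 1.0,
# but only the second has uneven performance across locations.
balanced = invariance_objective({"site_a": [1.0, 1.0], "site_b": [1.0, 1.0]})
uneven = invariance_objective({"site_a": [0.0, 0.0], "site_b": [2.0, 2.0]})
```

A pooled loss would score both toy settings identically. The variance term is what separates them, which is the sense in which per-domain distinctions carry a training signal that merged data hides.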
Research paper: Liu, Y., Cadei, R., Schweizer, J., Bahmani, S., and Alahi, A., “Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective”, 2021. Link: https://arxiv.org/abs/2111.14820