DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer

Dance generation from music is an interesting but challenging task. While most previous research treats it as generating a sequence of per-frame motion state parameters, a curve representation can make the animation look more fluent. Hence, a recent research paper proposes to model dance generation as two steps: key pose generation and motion curve generation.

Dance – abstract artistic impression. Image credit: Gerd Altmann via Pixabay, free licence

The framework is based on an encoder-decoder architecture and an adversarial training scheme. For the decoder, a novel architecture tailored for dance generation is introduced. It uses kinematic chain networks to model the spatial correlation between body parts. Including physics constraints in the model leads to more natural results.

A learned local attention module introduces temporal locality and keeps the output from degrading into averaged actions. In a user study, the suggested method generated dances that were rated higher in performance quality than those of other works.
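To give a feel for what temporal locality means in an attention layer, here is a minimal sketch in plain Python. It is not the paper's Learned Local Attention module, whose exact formulation is not given in this summary. It assumes a simple Gaussian distance penalty added to the attention scores before the softmax, so that each frame attends mostly to its temporal neighbours. The function name `local_attention_weights` and the parameter `sigma` are hypothetical; in a trained model the window width would be learned, while here it is passed in by hand.

```python
import math


def local_attention_weights(scores, sigma):
    """Row-wise softmax over scores with a Gaussian penalty on frame distance.

    scores[i][j] is the raw attention score of query frame i for key frame j.
    sigma is the window width in frames (an assumed stand-in for a learned value).
    """
    weights = []
    for i, row in enumerate(scores):
        # Penalize keys that are far from the query frame in time.
        biased = [s - (i - j) ** 2 / (2.0 * sigma ** 2) for j, s in enumerate(row)]
        # Subtract the maximum for a numerically stable softmax.
        peak = max(biased)
        exps = [math.exp(b - peak) for b in biased]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# With identical raw scores, a narrow window concentrates weight near the
# query frame, while a very wide window falls back to plain averaging.
flat_scores = [[0.0] * 5 for _ in range(5)]
narrow = local_attention_weights(flat_scores, sigma=1.0)
wide = local_attention_weights(flat_scores, sigma=1e6)
```

With a very wide window every frame gets nearly the same weight, which is the averaging behaviour the module is meant to avoid; a narrow window keeps each output frame tied to nearby motion.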

In this work, we propose a novel deep learning framework that can generate a vivid dance from a whole piece of music. In contrast to previous works that define the problem as generation of frames of motion state parameters, we formulate the task as a prediction of motion curves between key poses, which is inspired by the animation industry practice. The proposed framework, named DanceNet3D, first generates key poses on beats of the given music and then predicts the in-between motion curves. DanceNet3D adopts the encoder-decoder architecture and the adversarial schemes for training. The decoders in DanceNet3D are constructed on MoTrans, a transformer tailored for motion generation. In MoTrans we introduce the kinematic correlation by the Kinematic Chain Networks, and we also propose the Learned Local Attention module to take the temporal local correlation of human motion into consideration. Furthermore, we propose PhantomDance, the first large-scale dance dataset produced by professional animators, with accurate synchronization with music. Extensive experiments demonstrate that the proposed approach can generate fluent, elegant, performative and beat-synchronized 3D dances, which significantly surpasses previous works quantitatively and qualitatively.
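The two-stage idea in the abstract, key poses on beats followed by in-between motion curves, can be illustrated with a small sketch. The curve parameterization used by DanceNet3D is not stated in this summary, so the code below assumes cubic Hermite curves, a common choice in keyframe animation, and handles a single scalar joint parameter. The names `hermite` and `sample_motion`, and the hand-picked tangents, are hypothetical; in the paper's setting a network would predict the curve parameters.

```python
def hermite(p0, p1, m0, m1, u):
    """Cubic Hermite interpolation from p0 to p1 with tangents m0, m1, u in [0, 1]."""
    u2 = u * u
    u3 = u2 * u
    h00 = 2 * u3 - 3 * u2 + 1
    h10 = u3 - 2 * u2 + u
    h01 = -2 * u3 + 3 * u2
    h11 = u3 - u2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1


def sample_motion(beat_times, key_values, tangents, fps):
    """Sample one joint parameter at a fixed frame rate.

    beat_times: times (seconds) of the beats where key poses sit.
    key_values: value of the joint parameter at each beat.
    tangents:   slope (units per second) of the curve at each beat.
    """
    frames = []
    for i in range(len(beat_times) - 1):
        span = beat_times[i + 1] - beat_times[i]
        steps = max(1, round(span * fps))
        for k in range(steps):
            u = k / steps
            # Tangents are scaled by the segment length because u runs over [0, 1].
            frames.append(hermite(key_values[i], key_values[i + 1],
                                  tangents[i] * span, tangents[i + 1] * span, u))
    frames.append(key_values[-1])  # close the curve on the last key pose
    return frames


# Three key poses on beats half a second apart, sampled at 4 frames per second.
frames = sample_motion([0.0, 0.5, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0], fps=4)
```

Because every sampled frame lies on a smooth curve that passes exactly through the key poses, the motion hits each beat while the in-between frames stay fluent, which is the motivation the abstract borrows from animation practice.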

Research paper: Li, B., Zhao, Y., and Sheng, L., “DanceNet3D: Music Based Dance Generation with Parametric Motion Transformer”, 2021. Link:

