EMo-Mask: Emotional Controllable Motion Generation

CS348I: Computer Graphics in the Era of AI (2024 Winter) Final Project
Stanford University

This project introduces EMo-Mask, an extension of the motion generation model MoMask that incorporates emotion understanding to generate diverse and expressive human motions. By integrating an EmotionEmbedder module, EMo-Mask learns meaningful representations of emotions and uses them to guide the motion generation process. Experiments show that injecting emotion embeddings at the input of MoMask's M-transformer yields the best performance. The project also identifies limitations, such as the sensitivity of the averaged Mean Squared Error loss, and proposes future directions: enhancing the EmotionEmbedder architecture and exploring perceptually relevant loss functions. EMo-Mask is a step toward generating emotionally expressive motions, and thus toward more engaging and believable animated characters.
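For concreteness, the sketch below shows one plausible shape for an EmotionEmbedder in PyTorch: a learnable embedding table over discrete emotion labels, followed by a small MLP that projects into the transformer's latent space. The module name follows the project, but the layer choices, dimensions, and the assumption of four categorical labels (happy, anger, fear, sad) are illustrative, not the actual implementation.

```python
import torch
import torch.nn as nn

class EmotionEmbedder(nn.Module):
    """Maps discrete emotion labels to vectors in the motion latent space.

    A minimal sketch: the real EMo-Mask module may differ in depth,
    dimensions, and how labels are encoded.
    """

    def __init__(self, num_emotions: int = 4, latent_dim: int = 384):
        super().__init__()
        # Learnable table: one vector per discrete emotion label.
        self.embedding = nn.Embedding(num_emotions, latent_dim)
        # Small MLP so the embedding can adapt to the motion latent space.
        self.proj = nn.Sequential(
            nn.Linear(latent_dim, latent_dim),
            nn.ReLU(),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, emotion_ids: torch.LongTensor) -> torch.Tensor:
        # emotion_ids: (batch,) integer labels -> (batch, latent_dim) vectors.
        return self.proj(self.embedding(emotion_ids))

# Example: embed a batch of labels (0=happy, 1=anger, 2=fear, 3=sad).
embedder = EmotionEmbedder()
emotion_vecs = embedder(torch.tensor([0, 2]))  # shape: (2, 384)
```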


* This video contains audio.

Approach Overview


Gallery of Happy Running

Gallery of Angry Running

Gallery of Fearful Running

Gallery of Sad Running

Gallery of Happy Walking

Gallery of Angry Walking

Gallery of Fearful Walking

Gallery of Sad Walking

Impact of Different Integration Points

We explore incorporating emotion embeddings at three stages of the motion generation pipeline: (a) the R-transformer input, influencing the refinement of motion tokens; (b) the M-transformer input, guiding the generation of base motion tokens; and (c) the VQ-VAE input, conditioning the encoding and decoding of motion sequences.
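As a hedged illustration of option (b), the best-performing variant, the snippet below shows one common way to condition a masked transformer: prepending the emotion embedding as an extra token alongside the text condition, so self-attention can route the emotion signal to every masked motion-token prediction. The function name and tensor interfaces are hypothetical; MoMask's actual conditioning mechanism may differ.

```python
import torch

def condition_m_transformer_input(
    motion_tok_emb: torch.Tensor,  # (batch, seq_len, dim) embedded motion tokens
    text_emb: torch.Tensor,        # (batch, dim) text condition (e.g. from CLIP)
    emotion_emb: torch.Tensor,     # (batch, dim) output of EmotionEmbedder
) -> torch.Tensor:
    """Prepends text and emotion condition vectors as extra input tokens."""
    # Stack the two condition vectors into a (batch, 2, dim) prefix.
    cond = torch.stack([text_emb, emotion_emb], dim=1)
    # Concatenate along the sequence axis: (batch, 2 + seq_len, dim).
    return torch.cat([cond, motion_tok_emb], dim=1)
```

Injecting the condition as a prepended token, rather than adding it to every token embedding, keeps the motion token sequence intact and mirrors how text conditioning is commonly handled in token-based generative transformers.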