EMo-Mask: Emotional Controllable Motion Generation

CS348I: Computer Graphics in the Era of AI (2024 Winter) Final Project
Stanford University

Abstract

This project extends the motion generation model MoMask with emotion understanding capabilities. We introduce an EmotionEmbedder module that conditions the motion generation process on emotional context. Through experiments, we found that injecting the emotion embedding at the M-Transformer input works best. We also explored different training strategies and identified several challenges, such as the over-sensitivity of mean squared error when comparing motions.


Overview

Framework Overview: our approach combines emotion understanding with motion generation via the EmotionEmbedder module.

Method

Emotion Embedding

We focused on four basic emotions from Plutchik's wheel: Joy, Sadness, Fear, and Anger. The EmotionEmbedder maps each emotion to an embedding vector that conditions the motion generation process.
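The embedding step can be sketched as a simple lookup table with one learnable vector per emotion. The names, dimensions, and random initialization below are illustrative assumptions, not the project's actual code.

```python
import numpy as np

# Minimal sketch of the EmotionEmbedder idea: each of the four basic
# emotions maps to an embedding vector. In the real model this table
# would be a trained layer; here it is randomly initialized.
EMOTIONS = ["Joy", "Sadness", "Fear", "Anger"]
EMBED_DIM = 64  # assumed embedding size

rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((len(EMOTIONS), EMBED_DIM))

def embed_emotion(name: str) -> np.ndarray:
    """Look up the embedding vector for a named emotion."""
    return embedding_table[EMOTIONS.index(name)]

joy_vec = embed_emotion("Joy")
```

The resulting vector is what gets injected into the generation pipeline at one of the integration points described below.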

Plutchik's Wheel

Data Collection Pipeline

Our data collection pipeline: (a) identify the target emotion; (b) collect motion descriptions from an LLM; (c) generate motions using MoMask.

Integration Approaches

Integration Points
Three integration points tested: (A) the M-Transformer, (B) the R-Transformer, and (C) the VQ-VAE.

Results

Integration Results

Results of integrating the emotion embedding at different points: (a) M-Transformer, (b) R-Transformer, (c) VQ-VAE.

Integration Point Comparison

Comparing the three integration points: injecting the embedding at the M-Transformer input worked best, producing the most stable and emotionally expressive motions.
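The best-performing variant can be sketched as additive conditioning at the M-Transformer input: the emotion vector is broadcast over every time step of the motion-token embeddings. The shapes and the additive scheme below are assumptions for illustration.

```python
import numpy as np

# Sketch of M-Transformer input integration: add the emotion embedding
# to the motion-token embeddings before the transformer sees them.
T, D = 16, 64                                    # sequence length, model dim
rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((T, D))   # motion-token embeddings
emotion_vec = rng.standard_normal(D)             # output of the EmotionEmbedder

# Broadcast the emotion vector over every time step.
conditioned_input = token_embeddings + emotion_vec
```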

Motion Examples

Key Findings

What Worked Well

  • M-Transformer integration produced the most stable results
  • Successfully generated different emotional styles for both walking and running
  • Combined loss functions helped maintain motion quality while adding emotional expression
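A combined objective of this kind can be sketched as an MSE reconstruction term for motion quality plus a weighted emotion term. The cross-entropy term on hypothetical classifier logits, and the weight, are assumptions, not the project's actual formulation.

```python
import numpy as np

# Sketch of a combined objective: reconstruction MSE plus a weighted
# emotion term (here a cross-entropy over assumed classifier logits).
def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))

def cross_entropy(logits: np.ndarray, label: int) -> float:
    logits = logits - logits.max()            # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return float(-log_probs[label])

def combined_loss(pred_motion, target_motion, emotion_logits, emotion_label,
                  weight: float = 0.1) -> float:
    return mse_loss(pred_motion, target_motion) + weight * cross_entropy(
        emotion_logits, emotion_label)
```

The weight trades off motion fidelity against emotional expressiveness; a small value keeps the reconstruction term dominant.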

Challenges

  • MSE loss was sometimes overly sensitive to small, perceptually unimportant motion differences
  • VQ-VAE integration produced less realistic movements
  • Some emotions were harder to express clearly in motion

Full Gallery