OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers

doi:10.1109/CVPR52733.2024.00053

	Liang, Han¹; Bao, Jiacheng¹; Zhang, Ruichi¹; Ren, Sihan¹; Xu, Yuecheng¹; Yang, Sibei¹; Chen, Xin²; Yu, Jingyi¹; Xu, Lan¹
	2024
Proceedings title	2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024
ISSN	1063-6919
Abstract	We have recently seen tremendous progress in realistic text-to-motion generation. Yet existing methods often fail or produce implausible motions for unseen text inputs, which limits their applications. In this paper, we present OMG, a novel framework that enables compelling motion generation from zero-shot open-vocabulary text prompts. Our key idea is to carefully tailor the pretrain-then-finetune paradigm to text-to-motion generation. At the pre-training stage, our model improves its generation ability by learning rich out-of-domain inherent motion traits. To this end, we scale up a large unconditional diffusion model to 1B parameters, so as to utilize massive unlabeled motion data of over 20M motion instances. At the subsequent fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information through a trainable copy of the pre-trained model and the proposed novel Mixture-of-Controllers (MoC) block. The MoC block adaptively recognizes various ranges of sub-motions with a cross-attention mechanism and processes them separately with text-token-specific experts. Such a design effectively aligns the CLIP token embeddings of text prompts to various ranges of compact and expressive motion features. Extensive experiments demonstrate that OMG achieves significant improvements over state-of-the-art methods on zero-shot text-to-motion generation. Project page: https://tr3e.github.io/omg-page.
Conference name	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Place of publication	10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
Conference location	Seattle, WA
Conference dates	June 16-22, 2024
Indexed in	CPCI-S
Language	English
Funding	National Key R&D Program of China [2022YFF0902301]; Shanghai Local College Capacity Building Program [22010502800]
WOS research area	Computer Science
WOS categories	Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
WOS accession number	WOS:001322555900044
Publisher	IEEE COMPUTER SOC
Document type	Conference paper
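Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372956
Collections	School of Information Science and Technology_PI Research Groups_Yang Sibei Group; School of Information Science and Technology_PI Research Groups_Yu Jingyi Group; School of Information Science and Technology_Undergraduates; School of Information Science and Technology_PhD Students; School of Information Science and Technology_PI Research Groups_Xu Lan Group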
Corresponding author	Liang, Han
Author affiliations	1. ShanghaiTech Univ, Shanghai, Peoples R China; 2. Tencent PCG, Shenzhen, Peoples R China
First author affiliation	ShanghaiTech University
Corresponding author affiliation	ShanghaiTech University
First affiliation of first author	ShanghaiTech University
Recommended citation (GB/T 7714)	Liang, Han, Bao, Jiacheng, Zhang, Ruichi, et al. OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers[C]. 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA: IEEE COMPUTER SOC, 2024.
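The abstract describes the MoC block only at a high level: cross-attention determines which text tokens each part of the motion relates to, and text-token-specific experts process those parts separately. The Python sketch below illustrates that routing idea under stated assumptions; it is not the authors' implementation. The function name `mixture_of_controllers`, the use of raw features as queries and keys (no learned projections), the single-linear-map experts, the soft (weighted) blending of expert outputs, and the assumption that motion features and text-token embeddings share one dimension are all hypothetical simplifications.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a square d x d matrix by a length-d vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_controllers(
    motion: List[Vector],
    text_tokens: List[Vector],
    experts: List[Matrix],
) -> List[Vector]:
    """Toy MoC-style block (hypothetical, not the OMG code).

    motion:      T frames, each a d-dim motion feature.
    text_tokens: N text-token embeddings, assumed already projected to d dims.
    experts:     N hypothetical d x d linear experts, one per text token.
    Returns T output frames, each a d-dim vector.
    """
    if len(experts) != len(text_tokens):
        raise ValueError("need exactly one expert per text token")
    d = len(motion[0])
    scale = 1.0 / math.sqrt(d)  # standard scaled dot-product attention factor
    outputs: List[Vector] = []
    for frame in motion:
        # Cross-attention: the frame is the query, the text tokens are the keys.
        scores = [scale * sum(a * b for a, b in zip(frame, tok)) for tok in text_tokens]
        weights = softmax(scores)
        # Every token-specific expert transforms the frame; the attention
        # weights decide how much each expert contributes (soft routing).
        mixed = [0.0] * d
        for w, expert in zip(weights, experts):
            expert_out = matvec(expert, frame)
            mixed = [acc + w * e for acc, e in zip(mixed, expert_out)]
        outputs.append(mixed)
    return outputs


if __name__ == "__main__":
    motion = [[1.0, 0.0], [0.0, 1.0]]    # two toy frames
    tokens = [[10.0, 0.0], [0.0, 10.0]]  # two toy token embeddings
    identity = [[1.0, 0.0], [0.0, 1.0]]  # expert for token 0
    doubling = [[2.0, 0.0], [0.0, 2.0]]  # expert for token 1
    for t, frame in enumerate(mixture_of_controllers(motion, tokens, [identity, doubling])):
        print(t, [round(x, 3) for x in frame])
```

In the toy run, frame 0 aligns with token 0 and is handled almost entirely by the identity expert, while frame 1 aligns with token 1 and is routed to the doubling expert. Soft routing keeps the block differentiable; a hard top-1 assignment would be an alternative design, and the abstract does not say which one OMG uses.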