ShanghaiTech University Knowledge Management System
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers | |
2024 | |
Proceedings title | 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024 |
ISSN | 1063-6919 |
DOI | 10.1109/CVPR52733.2024.00053 |
Abstract | We have recently seen tremendous progress in realistic text-to-motion generation. Yet, the existing methods often fail or produce implausible motions with unseen text inputs, which limits the applications. In this paper, we present OMG, a novel framework, which enables compelling motion generation from zero-shot open-vocabulary text prompts. Our key idea is to carefully tailor the pretrain-then-finetune paradigm into the text-to-motion generation. At the pre-training stage, our model improves the generation ability by learning the rich out-of-domain inherent motion traits. To this end, we scale up a large unconditional diffusion model up to 1B parameters, so as to utilize the massive unlabeled motion data up to over 20M motion instances. At the subsequent fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information, through a trainable copy of the pre-trained model and the proposed novel Mixture-of-Controllers (MoC) block. MoC block adaptively recognizes various ranges of the sub-motions with a cross-attention mechanism and processes them separately with the text-token-specific experts. Such a design effectively aligns the CLIP token embeddings of text prompts to various ranges of compact and expressive motion features. Extensive experiments demonstrate that our OMG achieves significant improvements over the state-of-the-art methods on zero-shot text-to-motion generation. Project page: https://tr3e.github.io/omg-page. |
Conference name | IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
Place of publication | 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA |
Conference location | Seattle, WA |
Conference date | JUN 16-22, 2024 |
Indexed by | CPCI-S |
Language | English |
Funding | National Key R&D Program of China[2022YFF0902301] ; Shanghai Local college capacity building program[22010502800] |
WOS research area | Computer Science |
WOS categories | Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications ; Computer Science, Theory & Methods |
WOS accession number | WOS:001322555900044 |
Publisher | IEEE COMPUTER SOC |
Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372956 |
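Collections | School of Information Science and Technology_PI Research Groups_Sibei Yang Group ; School of Information Science and Technology_PI Research Groups_Jingyi Yu Group ; School of Information Science and Technology_Undergraduates ; School of Information Science and Technology_PhD Students ; School of Information Science and Technology_PI Research Groups_Lan Xu Group |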
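Corresponding author | Liang, Han |
Author affiliations | 1. ShanghaiTech Univ, Shanghai, Peoples R China 2. Tencent PCG, Shenzhen, Peoples R China |
First author's affiliation | ShanghaiTech University |
Corresponding author's affiliation | ShanghaiTech University |
First author's primary affiliation | ShanghaiTech University |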
Recommended citation (GB/T 7714) | Liang, Han,Bao, Jiacheng,Zhang, Ruichi,et al. OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers[C]. 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA:IEEE COMPUTER SOC,2024. |