OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Year: 2024
Proceedings title: 2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024
ISSN: 1063-6919
DOI: 10.1109/CVPR52733.2024.00053
Abstract: We have recently seen tremendous progress in realistic text-to-motion generation. Yet existing methods often fail or produce implausible motions for unseen text inputs, which limits their applications. In this paper, we present OMG, a novel framework that enables compelling motion generation from zero-shot open-vocabulary text prompts. Our key idea is to carefully tailor the pretrain-then-finetune paradigm to text-to-motion generation. At the pre-training stage, our model improves its generation ability by learning rich out-of-domain inherent motion traits. To this end, we scale up an unconditional diffusion model to 1B parameters so as to utilize massive unlabeled motion data comprising over 20M motion instances. At the subsequent fine-tuning stage, we introduce motion ControlNet, which incorporates text prompts as conditioning information through a trainable copy of the pre-trained model and the proposed novel Mixture-of-Controllers (MoC) block. The MoC block adaptively recognizes various ranges of sub-motions with a cross-attention mechanism and processes them separately with text-token-specific experts. Such a design effectively aligns the CLIP token embeddings of text prompts to various ranges of compact and expressive motion features. Extensive experiments demonstrate that OMG achieves significant improvements over state-of-the-art methods on zero-shot text-to-motion generation. Project page: https://tr3e.github.io/omg-page.
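To make the Mixture-of-Controllers idea in the abstract concrete, here is a minimal, hypothetical sketch in plain Python. It assumes that each motion frame attends over the text-token embeddings with scaled dot-product cross-attention, that every text token owns one linear "expert", and that a frame's output is the attention-weighted mix of expert outputs added back as a residual. The names `routing_weights` and `moc_block`, the lack of learned query/key projections, the linear experts, the soft (rather than hard) routing and the residual connection are all illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    # Apply a d x d weight matrix to a length-d vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def routing_weights(motion: Matrix, tokens: Matrix) -> Matrix:
    # Hypothetical cross-attention: each motion frame (query) attends over
    # the text-token embeddings (keys) with scaled dot products.
    # No learned query/key projections are used in this sketch.
    scale = 1.0 / math.sqrt(len(tokens[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(frame, tok)) for tok in tokens])
        for frame in motion
    ]


def moc_block(motion: Matrix, tokens: Matrix, experts: List[Matrix]) -> Matrix:
    # Hypothetical MoC step: every text token owns one linear expert; each
    # frame's output is the attention-weighted sum of the expert outputs,
    # added back to the frame as a residual.
    weights = routing_weights(motion, tokens)
    out = []
    for frame, w_row in zip(motion, weights):
        mixed = [0.0] * len(frame)
        for w, expert in zip(w_row, experts):
            mixed = [m + w * e for m, e in zip(mixed, matvec(expert, frame))]
        out.append([f + m for f, m in zip(frame, mixed)])
    return out


if __name__ == "__main__":
    motion = [[1.0, 0.0], [0.0, 1.0]]  # two frames, feature size d = 2
    tokens = [[5.0, 0.0], [0.0, 5.0]]  # two text-token embeddings
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zero = [[0.0, 0.0], [0.0, 0.0]]
    print(routing_weights(motion, tokens))
    print(moc_block(motion, tokens, [identity, zero]))
```

In this toy run, the first frame is routed mostly to the first token's expert and the second frame mostly to the second token's, which mirrors the abstract's description of sub-motion ranges being handled by text-token-specific experts. Soft weighting was chosen here only because it keeps the sketch differentiable and short; a hard top-1 assignment would be an equally plausible reading.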
Conference name: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Place of publication: 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
Conference location: Seattle, WA
Conference dates: JUN 16-22, 2024
Indexed in: CPCI-S
Language: English
Funding: National Key R&D Program of China [2022YFF0902301]; Shanghai Local College Capacity Building Program [22010502800]
WOS research area: Computer Science
WOS categories: Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
WOS accession number: WOS:001322555900044
Publisher: IEEE COMPUTER SOC
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372956
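Collections: School of Information Science and Technology_PI Research Groups_Sibei Yang Group
School of Information Science and Technology_PI Research Groups_Jingyi Yu Group
School of Information Science and Technology_Undergraduate Students
School of Information Science and Technology_Doctoral Students
School of Information Science and Technology_PI Research Groups_Lan Xu Group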
Corresponding author: Liang, Han
Author affiliations
1.ShanghaiTech Univ, Shanghai, Peoples R China
2.Tencent PCG, Shenzhen, Peoples R China
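First author affiliation: ShanghaiTech University
Corresponding author affiliation: ShanghaiTech University
First affiliation of first author: ShanghaiTech University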
Recommended citation
GB/T 7714
Liang, Han,Bao, Jiacheng,Zhang, Ruichi,et al. OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers[C]. 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA:IEEE COMPUTER SOC,2024.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.