Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
2024-03-01
Proceedings title: ARXIV
ISSN: 1063-6919
Pages: 1596-1605
Publication status: Published
DOI: arXiv:2403.00691
Abstract

Information retrieval is an ever-evolving and crucial research domain. The substantial demand for high-quality human motion data, especially for online acquisition, has led to a surge in human motion research. Prior works have mainly concentrated on dual-modality learning, such as text and motion tasks, but three-modality learning has rarely been explored. Intuitively, an additional modality can enrich a model's application scenarios, and more importantly, an adequate choice of the extra modality can also act as an intermediary and enhance the alignment between the other two disparate modalities. In this work, we introduce LAVIMO (LAnguage-VIdeo-MOtion alignment), a novel framework for three-modality learning that integrates human-centric videos as an additional modality, thereby effectively bridging the gap between text and motion. Moreover, our approach leverages a specially designed attention mechanism to foster enhanced alignment and synergistic effects among the text, video, and motion modalities. Empirically, our results on the HumanML3D and KIT-ML datasets show that LAVIMO achieves state-of-the-art performance in various motion-related cross-modal retrieval tasks, including text-to-motion, motion-to-text, video-to-motion, and motion-to-video.
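
The abstract does not spell out LAVIMO's training objective or its attention mechanism, so the following is only a minimal sketch of the general idea of a joint embedding space, not the authors' implementation. It assumes a symmetric InfoNCE-style contrastive loss (the record's keywords list contrastive learning) summed over the three modality pairs, and cosine-similarity ranking for retrieval. All function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(a, b):
    """Pairwise cosine similarities between two lists of embeddings."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]

def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss; a[i] and b[i] are assumed to be a matched pair."""
    sim = cosine_matrix(a, b)
    n = len(sim)

    def one_direction(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for a stable log-sum-exp
            log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_denom - logits[i]  # negative log-probability of the match
        return total / n

    cols = [list(c) for c in zip(*sim)]  # transpose for the b-to-a direction
    return 0.5 * (one_direction(sim) + one_direction(cols))

def tri_modal_loss(text, video, motion, temperature=0.1):
    """Assumed objective: sum of pairwise losses over the three modality pairs."""
    return (info_nce(text, motion, temperature)
            + info_nce(video, motion, temperature)
            + info_nce(text, video, temperature))

def retrieve(query, gallery, k=1):
    """Indices of the k gallery embeddings most similar to the query."""
    sims = cosine_matrix([query], gallery)[0]
    return sorted(range(len(gallery)), key=lambda j: -sims[j])[:k]

# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
text = [[1.0, 0.0], [0.0, 1.0]]
video = [[1.0, 0.2], [0.2, 1.0]]
motion = [[0.9, 0.1], [0.1, 0.9]]
best_motion_for_text0 = retrieve(text[0], motion)
```

In this sketch the same `retrieve` function covers every retrieval direction named in the abstract (text-to-motion, motion-to-text, video-to-motion, motion-to-video), because all three modalities are assumed to share one embedding space; only the choice of query and gallery changes.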

Keywords: Contrastive Learning; Human engineering; Metadata; Text processing; Cross-modal; Cross-modal retrieval; Embeddings; High quality; Human motion data; Human motions; Motion alignment; Motion retrieval; Multi-modal learning; Video motion
Conference location: Seattle, WA, USA
Conference date: 16-22 June 2024
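Indexed by: EI
Language: English
WOS categories: Computer Science, Artificial Intelligence; Computer Science, Software Engineering
WOS accession number: PPRN:88004002
Publisher: IEEE Computer Society
EI accession number: 20250917941978
EI main heading: Information retrieval
EI classification codes: 101.5 Ergonomics and Human Factors Engineering; 1101.2 Machine Learning; 1106.2 Data Handling and Data Processing; 903.1 Information Sources and Analysis; 903.3 Information Retrieval and Use
Original document type: Conference article (CA)
Source database: IEEE
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372891
Collections: School of Information Science and Technology_Master's Students
School of Creativity and Art_PI Research Group (P)_Tian Zheng Group
Corresponding author: Tian, Zheng
Author affiliations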
1.ShanghaiTech Univ, Shanghai, Peoples R China
2.Chinese Acad Sci, Shenzhen Inst Adv Technol, Beijing, Peoples R China
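First author affiliation: ShanghaiTech University
Corresponding author affiliation: ShanghaiTech University
First affiliation of the first author: ShanghaiTech University
Recommended citation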
GB/T 7714
Yin, Kangning,Zou, Shihao,Ge, Yuxuan,et al. Tri-Modal Motion Retrieval by Learning a Joint Embedding Space[C]:IEEE Computer Society,2024:1596-1605.