EA-VTR: Event-Aware Video-Text Retrieval
2025
Proceedings title: COMPUTER VISION - ECCV 2024, PT LII (IF: 0.402 [JCR-2005], 0.000 [5-Year])
ISSN: 0302-9743
Volume: 15110
Publication status: Published
DOI: 10.1007/978-3-031-72943-0_5
Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improvements from both data and model perspectives. In terms of pre-training data, we focus on supplementing the missing specific event content and event temporal transitions with the proposed event augmentation strategies. Based on the event-augmented data, we construct a novel Event-Aware Video-Text Retrieval model, i.e., EA-VTR, which achieves powerful video-text retrieval ability through superior video event awareness. EA-VTR can efficiently encode frame-level and video-level visual representations simultaneously, enabling detailed event content and complex event temporal cross-modal alignment, ultimately enhancing the comprehensive understanding of video events. Our method not only significantly outperforms existing approaches on multiple datasets for Text-to-Video Retrieval and Video Action Recognition tasks, but also demonstrates superior event content perception on Multi-event Video-Text Retrieval and Video Moment Retrieval tasks, as well as outstanding understanding of event temporal logic on the Test of Time task.
Conference name: 18th European Conference on Computer Vision (ECCV)
Place of publication: GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND
Conference location: Milan, ITALY
Conference dates: SEP 29-OCT 04, 2024
Indexed in: CPCI-S
Language: English
Funding: Key Research and Development Program of Xinjiang Urumqi Autonomous Region [2023B01005]; Natural Science Foundation of China [62302501, 62036011, 62122086, 62192782, 61721004, U2033210, 62372451]; Beijing Natural Science Foundation [JQ21017, JQ24022, L243015]
WOS research area: Computer Science
WOS categories: Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
WOS accession number: WOS:001401189300005
Publisher: SPRINGER INTERNATIONAL PUBLISHING AG
EISSN: 1611-3349
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/496890
Collection: School of Information Science and Technology
Corresponding author: Zhang, Ziqi
Author affiliations:
1.Chinese Acad Sci, Inst Automat, MAIS, Beijing, Peoples R China
2.Tencent PCG, ARC Lab, Shenzhen, Peoples R China
3.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
4.ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
5.Univ Hong Kong, Pokfulam, Hong Kong, Peoples R China
Recommended citation
GB/T 7714
Ma, Zongyang,Zhang, Ziqi,Chen, Yuxin,et al. EA-VTR: Event-Aware Video-Text Retrieval[C]. GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND:SPRINGER INTERNATIONAL PUBLISHING AG,2025.
 
