ShanghaiTech University Knowledge Management System
EA-VTR: Event-Aware Video-Text Retrieval | |
2025 | |
Proceedings title | COMPUTER VISION - ECCV 2024, PT LII (IF:0.402[JCR-2005],0.000[5-Year]) |
ISSN | 0302-9743 |
Volume | 15110 |
Publication status | Published |
DOI | 10.1007/978-3-031-72943-0_5 |
Abstract | Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improvements from both data and model perspectives. In terms of pre-training data, we focus on supplementing the missing specific event content and event temporal transitions with the proposed event augmentation strategies. Based on the event-augmented data, we construct a novel Event-Aware Video-Text Retrieval model, i.e., EA-VTR, which achieves powerful video-text retrieval ability through superior video event awareness. EA-VTR can efficiently encode frame-level and video-level visual representations simultaneously, enabling detailed event content and complex event temporal cross-modal alignment, ultimately enhancing the comprehensive understanding of video events. Our method not only significantly outperforms existing approaches on multiple datasets for Text-to-Video Retrieval and Video Action Recognition tasks, but also demonstrates superior event content perception ability on Multi-event Video-Text Retrieval and Video Moment Retrieval tasks, as well as outstanding event temporal logic understanding ability on the Test of Time task. |
Conference name | 18th European Conference on Computer Vision (ECCV) |
Place of publication | GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND |
Conference location | Milan, ITALY |
Conference date | SEP 29-OCT 04, 2024 |
URL | View original |
Indexed by | CPCI-S |
Language | English |
Funding | Key Research and Development Program of Xinjiang Urumqi Autonomous Region[2023B01005] ; Natural Science Foundation of China["62302501","62036011","62122086","62192782","61721004","U2033210","62372451"] ; Beijing Natural Science Foundation["JQ21017","JQ24022","L243015"] |
WOS research area | Computer Science |
WOS category | Computer Science, Artificial Intelligence ; Computer Science, Interdisciplinary Applications ; Computer Science, Theory & Methods |
WOS accession number | WOS:001401189300005 |
Publisher | SPRINGER INTERNATIONAL PUBLISHING AG |
EISSN | 1611-3349 |
Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/496890 |
Collection | School of Information Science and Technology |
Corresponding author | Zhang, Ziqi |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, MAIS, Beijing, Peoples R China; 2. Tencent PCG, ARC Lab, Shenzhen, Peoples R China; 3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China; 4. ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China; 5. Univ Hong Kong, Pokfulam, Hong Kong, Peoples R China |
Recommended citation (GB/T 7714) | Ma, Zongyang,Zhang, Ziqi,Chen, Yuxin,et al. EA-VTR: Event-Aware Video-Text Retrieval[C]. GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND:SPRINGER INTERNATIONAL PUBLISHING AG,2025. |