RelVid: Relational Learning with Vision-Language Models for Weakly Video Anomaly Detection
2025-03-25
Journal: SENSORS (IF: 3.4 [JCR-2023], 3.7 [5-Year])
EISSN: 1424-8220
Volume: 25, Issue: 7
DOI: 10.3390/s25072037
Abstract: Weakly supervised video anomaly detection aims to identify abnormal events in video sequences without requiring frame-level supervision, which is a challenging task in computer vision. Traditional methods typically rely on low-level visual features with weak supervision from a single backbone branch, which often struggles to capture the distinctive characteristics of different categories. This limitation reduces their adaptability to real-world scenarios. In real-world situations, the boundary between normal and abnormal events is often unclear and context-dependent. For example, running on a track may be considered normal, but running on a busy road could be deemed abnormal. To address these challenges, RelVid is introduced as a novel framework that improves anomaly detection by expanding the relative feature gap between classes extracted from a single backbone branch. The key innovation of RelVid lies in the integration of auxiliary tasks, which guide the model to learn more discriminative features and significantly boost the model's performance. These auxiliary tasks, including text-based anomaly detection and feature reconstruction learning, act as additional supervision, helping the model capture subtle differences and anomalies that are often difficult to detect in weakly supervised settings. In addition, RelVid incorporates two further components: class activation feature learning for improved feature discrimination and a temporal attention module for capturing sequential dependencies. This approach enhances the model's robustness and accuracy, enabling it to better handle complex and ambiguous scenarios. Evaluations on two widely used benchmark datasets, UCF-Crime and XD-Violence, demonstrate the effectiveness of RelVid. Compared to state-of-the-art methods, RelVid achieves superior performance in both detection accuracy and robustness.
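The record contains no code, and the abstract names the architectural components only at a high level. As a rough illustration of two of them, a temporal attention module over per-snippet features and weak (video-level) supervision via top-k multiple-instance pooling, the PyTorch sketch below may be helpful; the feature dimension, number of attention heads, top-k value, and the pooling choice are all assumptions and this is not the authors' implementation.

# Illustrative sketch only (not the authors' code): multi-head temporal
# self-attention over per-snippet features, followed by a top-k
# multiple-instance pooling head that turns snippet scores into a
# video-level score for weakly supervised training.
import torch
import torch.nn as nn


class TemporalAttentionHead(nn.Module):
    def __init__(self, feat_dim: int = 512, num_heads: int = 4, topk: int = 3):
        super().__init__()
        # Self-attention over the time axis captures sequential dependencies
        # between snippets (batch_first=True -> input shape [B, T, D]).
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.scorer = nn.Linear(feat_dim, 1)  # per-snippet anomaly logit
        self.topk = topk

    def forward(self, x: torch.Tensor):
        # x: [B, T, D] snippet features (e.g., from a frozen vision-language backbone)
        h, _ = self.attn(x, x, x)
        h = self.norm(x + h)                          # residual + layer norm
        snippet_scores = self.scorer(h).squeeze(-1)   # [B, T]
        # Top-k mean pooling: a video is scored by its most anomalous snippets,
        # the usual way to supervise with only video-level labels.
        k = min(self.topk, snippet_scores.size(1))
        video_score = snippet_scores.topk(k, dim=1).values.mean(dim=1)  # [B]
        return snippet_scores, video_score


if __name__ == "__main__":
    feats = torch.randn(2, 32, 512)        # 2 videos, 32 snippets, 512-d features
    model = TemporalAttentionHead()
    snippet_scores, video_score = model(feats)
    labels = torch.tensor([0.0, 1.0])      # weak video-level labels only
    loss = nn.functional.binary_cross_entropy_with_logits(video_score, labels)
    print(snippet_scores.shape, video_score.shape, loss.item())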
Keywords: vision-language model; Adapter; weakly video anomaly detection; feature learning
Indexed in: SCI
Language: English
Funding: Guangxi Key Research and Development Plan [AB22080054]; [2021289]
WOS Research Areas: Chemistry; Engineering; Instruments & Instrumentation
WOS Categories: Chemistry, Analytical; Engineering, Electrical & Electronic; Instruments & Instrumentation
WOS Accession Number: WOS:001465633100001
Publisher: MDPI
Document Type: Journal article
Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/507079
Collection: School of Information Science and Technology
School of Information Science and Technology_Master's Students
Corresponding Authors: Xu, Zhengyi; Chen, Xinrong
Affiliations:
1.Chinese Acad Sci, Shanghai Adv Res Inst, Shanghai 201210, Peoples R China
2.ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
3.Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
4.Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
First Author Affiliation: School of Information Science and Technology
Recommended Citation:
GB/T 7714: Wang, Jingxin, Li, Guohan, Liu, Jiaqi, et al. RelVid: Relational Learning with Vision-Language Models for Weakly Video Anomaly Detection[J]. SENSORS, 2025, 25(7).
APA: Wang, Jingxin, Li, Guohan, Liu, Jiaqi, Xu, Zhengyi, Chen, Xinrong, & Wei, Jianming. (2025). RelVid: Relational Learning with Vision-Language Models for Weakly Video Anomaly Detection. SENSORS, 25(7).
MLA: Wang, Jingxin, et al. "RelVid: Relational Learning with Vision-Language Models for Weakly Video Anomaly Detection". SENSORS 25.7 (2025).