MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models
2024-08-16
Status: Published
Abstract

As deep learning advances, Large Language Models (LLMs) and their multimodal counterparts, Vision-Language Models (VLMs), have shown exceptional performance in many real-world tasks. However, VLMs face significant security challenges, such as jailbreak attacks, in which attackers attempt to bypass the model's safety alignment to elicit harmful responses. The threat of jailbreak attacks on VLMs arises from both the inherent vulnerabilities of LLMs and the multiple information channels that VLMs process. While various attacks and defenses have been proposed, there is a notable gap in unified and comprehensive evaluations: each method is evaluated on different datasets and metrics, making it impossible to compare the effectiveness of the methods. To address this gap, we introduce MMJ-Bench, a unified pipeline for evaluating jailbreak attacks and defense techniques for VLMs. Through extensive experiments, we assess the effectiveness of various attack methods against state-of-the-art (SoTA) VLMs and evaluate the impact of defense mechanisms on both defense effectiveness and model utility for normal tasks. Our comprehensive evaluation contributes to the field by offering a unified and systematic evaluation framework and the first publicly available benchmark for VLM jailbreak research. We also present several insightful findings that highlight directions for future studies.

DOI: arXiv:2408.08464
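Source: arXiv
WOS Accession Number: PPRN:91463304
WOS Category: Computer Science, Information Systems
Document Type: Preprint
Item Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/415564
Collection: School of Information Science and Technology_PhD Students
School of Information Science and Technology_Master's Students
School of Information Science and Technology_PI Research Groups_Wang Wenjie Group
Corresponding Author: Wang, Wenjie
Author Affiliation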
ShanghaiTech Univ, Shanghai, Peoples R China
Recommended Citation
GB/T 7714
Weng, Fenghua,Xu, Yue,Fu, Chengyan,et al. MMJ-Bench: A Comprehensive Study on Jailbreak Attacks and Defenses for Vision Language Models. 2024.