LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models

doi:10.1109/ICCV51070.2023.00274

Authors	Shi, Cheng; Yang, Sibei
Publication year	2023
Proceedings title	PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION
ISSN	1550-5499
Pages	2920-2929
Publication status	Published
DOI	10.1109/ICCV51070.2023.00274
Abstract	Prompt engineering is a powerful tool for enhancing the performance of pre-trained models on downstream tasks. For example, providing the prompt "Let's think step by step" improved GPT-3's reasoning accuracy to 63% on MultiArith, while the prompt "a photo of" filled with a class name enables CLIP to achieve 80% zero-shot accuracy on ImageNet. While previous research has explored prompt learning for the visual modality, analysis of what constitutes a good visual prompt specifically for image recognition remains limited. In addition, existing visual prompt tuning methods generalize worse than text-only prompt tuning. This paper explores our key insight: synthetic text images are good visual prompts for vision-language models! To that end, we propose LoGoPrompt, which reformulates the classification objective as visual prompt selection and addresses the chicken-and-egg challenge of whether to first add synthetic text images as class-wise visual prompts or to first predict the class. Without any trainable visual prompt parameters, experimental results on 16 datasets demonstrate that our method consistently outperforms state-of-the-art methods in few-shot learning, base-to-new generalization, and domain generalization. © 2023 IEEE.
Keywords	Visualization; Computer vision; Image recognition; Self-supervised learning; Cognition; Task analysis; Tuning
Conference name	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Conference location	Paris, France
Conference date	October 2, 2023 - October 6, 2023
Indexed by	EI
Language	English
Publisher	Institute of Electrical and Electronics Engineers Inc.
EI accession number	20241215794325
Original document type	Conference article (CA)
Source database	IEEE
Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/359965
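Collections	School of Information Science and Technology; School of Information Science and Technology_Master's Students; School of Information Science and Technology_PI Research Groups_Sibei Yang Group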
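Corresponding author	Yang, Sibei
Author affiliation	ShanghaiTech University, School of Information Science and Technology, China
First author affiliation	School of Information Science and Technology
Corresponding author affiliation	School of Information Science and Technology
First affiliation of the first author	School of Information Science and Technology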
Recommended citation (GB/T 7714)	Shi, Cheng,Yang, Sibei. LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models[C]:Institute of Electrical and Electronics Engineers Inc.,2023:2920-2929.