CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention
2023-06-27
Proceedings title: PROCEEDINGS OF THE 37TH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, AAAI 2023
ISSN: 2159-5399
Volume: 37
Pages: 746-754
Publication status: Published
Abstract

Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with promising zero-shot performance. To further improve its downstream accuracy, existing works propose additional learnable modules upon CLIP and fine-tune them on few-shot training sets. However, the resulting extra training cost and data requirement severely hinder the efficiency of model deployment and knowledge transfer. In this paper, we introduce a free-lunch enhancement method, CALIP, to boost CLIP's zero-shot performance via a parameter-free Attention module. Specifically, we guide visual and textual representations to interact with each other and explore cross-modal informative features via attention. As the pre-training has largely reduced the embedding distances between the two modalities, we discard all learnable parameters in the attention and bidirectionally update the multi-modal features, making the whole process parameter-free and training-free. In this way, the images are blended with textual-aware signals and the text representations become visual-guided for better adaptive zero-shot alignment. We evaluate CALIP on benchmarks covering 14 datasets for both 2D image and 3D point cloud few-shot classification, showing consistent zero-shot performance improvement over CLIP. Based on that, we further insert a small number of linear layers into CALIP's attention module and verify its robustness under few-shot settings, where it also achieves leading performance compared to existing methods. These extensive experiments demonstrate the superiority of our approach for efficient enhancement of CLIP. Code is available at https://github.com/ZiyuGuo99/CALIP. Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
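The parameter-free, bidirectional cross-modal attention described in the abstract can be sketched roughly as below. This is an illustrative sketch only: the variable names F_v, F_t, the temperature beta, and the function name are assumptions for exposition, not the paper's exact formulation, and the final mixing of updated features into CLIP's zero-shot logits is omitted (see the paper for the precise weighting).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def parameter_free_attention(F_v, F_t, beta=1.0):
    """Bidirectional, parameter-free cross-modal attention (illustrative sketch).

    F_v: (N_v, C) spatial visual features from CLIP's image encoder.
    F_t: (N_t, C) textual features from CLIP's text encoder (one row per class prompt).
    beta: softmax temperature (hypothetical default).
    No learnable projections are used: attention weights come directly from
    the similarity of the pre-aligned CLIP features.
    """
    # Cross-modal similarity between every visual token and every text embedding.
    A = F_v @ F_t.T                                  # (N_v, N_t)

    # Text-aware update of visual features: each visual token aggregates text features.
    F_v_new = softmax(A * beta, axis=-1) @ F_t       # (N_v, C)

    # Visual-guided update of textual features: each text embedding aggregates visual tokens.
    F_t_new = softmax(A.T * beta, axis=-1) @ F_v     # (N_t, C)

    return F_v_new, F_t_new
```

Because the attention reuses CLIP's own embeddings with no extra weights, the enhancement requires neither training data nor fine-tuning, which is the "free-lunch" property the abstract emphasizes.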

Conference organizer / proceedings editor: Association for the Advancement of Artificial Intelligence
Keywords: Benchmarking; Classification (of information); Knowledge management; Visual languages; Zero-shot learning; Cost requirements; Data requirements; Down-stream; Learn+; Performance; Pre-training; Training costs; Training data; Training sets; Visual representations
Conference name: 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Place of publication: 2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303, USA
Conference location: Washington, DC, United States
Conference dates: February 7-14, 2023
Indexed by: EI; CPCI-S
Language: English
Funding: NSFC (61832001, U22B2037)
WOS research area: Computer Science
WOS categories: Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods
WOS accession number: WOS:001243759700083
Publisher: AAAI Press
EI accession number: 20233314552792
EI controlled terms: Image enhancement
EISSN: 2374-3468
EI classification codes: 716.1 Information Theory and Signal Processing; 723.1.1 Computer Programming Languages; 723.5 Computer Applications; 903.1 Information Sources and Analysis; 903.3 Information Retrieval and Use
Original document type: Conference article (CA)
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/325817
Collections: School of Information Science and Technology_Master's Students
School of Information Science and Technology_PI Research Groups_Xuming He Group
School of Information Science and Technology_PhD Students
Corresponding author: Zhang, Renrui
Author affiliations:
1.School of CS and Key Lab of HCST, Peking University, China;
2.The Chinese University of Hong Kong, Hong Kong;
3.Shanghai AI Laboratory, China;
4.ShanghaiTech University, China;
5.Carnegie Mellon University, United States
Recommended citation (GB/T 7714):
Guo, Ziyu,Zhang, Renrui,Qiu, Longtian,et al. CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention[C]//Association for the Advancement of Artificial Intelligence. 2275 E BAYSHORE RD, STE 160, PALO ALTO, CA 94303 USA:AAAI Press,2023:746-754.