NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

doi:arXiv:2412.01256

	Pan, Bikang 1; Li, Qun 1; Tang, Xiaoying 2; Huang, Wei 3; Fang, Zhen 4; Liu, Feng 5; Wang, Jingya 1; Yu, Jingyi 1; Shi, Ye 1
	2024-12-02
Status	Published
Abstract	The emergence of vision-language foundation models, such as CLIP, has revolutionized image-text representation, enabling a broad range of applications via prompt learning. Despite its promise, real-world datasets often contain noisy labels that can degrade prompt learning performance. In this paper, we demonstrate that using mean absolute error (MAE) loss in prompt learning, named PromptMAE, significantly enhances robustness against noisy labels while maintaining high accuracy. Though MAE is straightforward and recognized for its robustness, it is rarely used in noisy-label learning due to its slow convergence and poor performance outside prompt learning scenarios. To elucidate the robustness of PromptMAE, we leverage feature learning theory to show that MAE can suppress the influence of noisy samples, thereby improving the signal-to-noise ratio and enhancing overall robustness. Additionally, we introduce PromptOT, a prompt-based optimal transport data purification method to enhance the robustness further. PromptOT employs text encoder representations in vision-language models as prototypes to construct an optimal transportation matrix. This matrix effectively partitions datasets into clean and noisy subsets, allowing for the application of cross-entropy loss to the clean subset and MAE loss to the noisy subset. Our Noise-Label Prompt Learning method, named NLPrompt, offers a simple and efficient approach that leverages the expressive representation and precise alignment capabilities of vision-language models for robust prompt learning. We validate NLPrompt through extensive experiments across various noise settings, demonstrating significant performance improvements.
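The partition-then-mix idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the logits stand in for image-to-text-prototype similarities, the transport cost is taken as the negative log of the softmax probability, both marginals are uniform, the entropic regularization weight is 1, and all function names are hypothetical rather than taken from the authors' code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Standard cross-entropy against a hard label."""
    return -math.log(max(probs[label], 1e-12))


def mae(probs, label):
    """L1 distance to the one-hot label: (1 - p_y) + sum_{k != y} p_k = 2 * (1 - p_y)."""
    return 2.0 * (1.0 - probs[label])


def sinkhorn_plan(probs, n_iters=50):
    """Entropic OT plan between samples (rows) and classes (columns).

    Assumption: cost = -log p and regularization eps = 1, so the Gibbs
    kernel exp(-cost / eps) is just p. Marginals are uniform.
    """
    n, k = len(probs), len(probs[0])
    plan = [row[:] for row in probs]
    for _ in range(n_iters):
        # Rescale each column so it carries mass 1 / k.
        for j in range(k):
            col = sum(plan[i][j] for i in range(n))
            for i in range(n):
                plan[i][j] *= (1.0 / k) / col
        # Rescale each row so it carries mass 1 / n.
        for i in range(n):
            row = sum(plan[i])
            plan[i] = [v * (1.0 / n) / row for v in plan[i]]
    return plan


def mixed_loss(logits_batch, labels):
    """Cross-entropy on samples judged clean, MAE on samples judged noisy.

    A sample is treated as clean when the OT pseudo-label (argmax of its
    row in the plan) agrees with its given label; this rule is an assumption.
    """
    probs = [softmax(z) for z in logits_batch]
    plan = sinkhorn_plan(probs)
    pseudo = [max(range(len(row)), key=row.__getitem__) for row in plan]
    clean = [p == y for p, y in zip(pseudo, labels)]
    losses = [
        cross_entropy(p, y) if c else mae(p, y)
        for p, y, c in zip(probs, labels, clean)
    ]
    return sum(losses) / len(losses), clean


if __name__ == "__main__":
    # Four samples, two classes; the third label is deliberately wrong.
    logits_batch = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0], [0.0, 4.0]]
    labels = [0, 1, 1, 1]
    loss, clean = mixed_loss(logits_batch, labels)
    print("clean mask:", clean)
    print("mean loss:", loss)
```

Because the MAE term is bounded by 2 per sample while cross-entropy is unbounded, a mislabeled sample routed to the MAE branch contributes a limited penalty, which matches the robustness argument the abstract makes.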
Language	English
DOI	arXiv:2412.01256
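Source	arXiv
Indexed in	PPRN.PPRN
WOS Accession Number	PPRN:119639141
WOS Categories	Computer Science, Artificial Intelligence; Computer Science, Software Engineering
Document Type	Preprint
Item Identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/471031
Collections	School of Information Science and Technology_PI Research Groups_Shi Ye Group; School of Information Science and Technology_PI Research Groups_Yu Jingyi Group; School of Information Science and Technology_Master's Students; School of Information Science and Technology_PI Research Groups_Wang Jingya Group
Author Affiliations	1. ShanghaiTech Univ, Shanghai, Peoples R China; 2. Chinese Univ Hong Kong, Shenzhen, Peoples R China; 3. RIKEN Ctr Adv Intelligence Project, Tokyo, Japan; 4. Univ Technol Sydney, Sydney, Australia; 5. Univ Melbourne, Melbourne, Australia
Recommended Citation (GB/T 7714)	Pan, Bikang, Li, Qun, Tang, Xiaoying, et al. NLPrompt: Noise-Label Prompt Learning for Vision-Language Models. 2024.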