How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
2024-07-10
Proceedings title: ARXIV
ISSN: 1063-6919
Publication status: Published
DOI: arXiv:2407.07479
Abstract

Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy, while cross-encoder models offer higher accuracy at the expense of efficiency. Distilling cross-modal matching knowledge from the cross-encoder into the dual-encoder is a natural way to combine their strengths. We therefore investigate the following question: how can a cross-encoder be made a good teacher for a dual-encoder? Our findings are threefold: (1) The cross-modal similarity score distribution of the cross-encoder is more concentrated, while that of the dual-encoder is nearly normal, which makes vanilla logit distillation less effective. Ranking distillation, however, remains practical because it is not affected by the score distribution. (2) Only the relative order among hard negatives conveys valid knowledge; the order among easy negatives carries little significance. (3) Keeping the distillation loss coordinated with the dual-encoder's training loss benefits knowledge transfer. Based on these findings, we propose a novel Contrastive Partial Ranking Distillation (CPRD) method, which implements the objective of mimicking the relative order among hard negative samples through contrastive learning. This approach coordinates with the training of the dual-encoder, effectively transferring valid knowledge from the cross-encoder to the dual-encoder. Extensive experiments on image-text retrieval and ranking tasks show that our method surpasses other distillation methods and significantly improves the accuracy of the dual-encoder.
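The abstract describes CPRD only at a high level. To make findings (1) and (2) concrete, here is a minimal sketch of a *hypothetical* partial ranking distillation loss; it is not the authors' CPRD formulation. It assumes that "hard negatives" are the top-k negatives under the cross-encoder (teacher) scores, and it uses a Plackett-Luce (ListMLE-style) chain of softmax contrasts over those negatives only. The function names `hard_negative_order` and `partial_ranking_loss` and the parameters `k` and `tau` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def hard_negative_order(teacher_scores: Sequence[float], pos_idx: int, k: int) -> List[int]:
    # Hypothetical helper: rank all negatives by the cross-encoder (teacher) score
    # and keep only the k hardest ones, ordered from most to least similar.
    negatives = [i for i in range(len(teacher_scores)) if i != pos_idx]
    negatives.sort(key=lambda i: teacher_scores[i], reverse=True)
    return negatives[:k]


def partial_ranking_loss(
    student_scores: Sequence[float],
    teacher_scores: Sequence[float],
    pos_idx: int,
    k: int = 3,
    tau: float = 0.05,
) -> float:
    # Assumed loss: a Plackett-Luce (ListMLE-style) chain over the teacher's
    # hard-negative order. At step i, the i-th hardest negative acts as the
    # "positive" of a softmax contrast against all negatives ranked below it.
    # Only the teacher's ORDER is used, never its raw score values.
    order = hard_negative_order(teacher_scores, pos_idx, k)
    steps = len(order) - 1  # the last candidate has nothing left to contrast with
    if steps <= 0:
        return 0.0
    total = 0.0
    for i in range(steps):
        logits = [student_scores[j] / tau for j in order[i:]]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[0]  # -log softmax of the current anchor
    return total / steps


# Toy example: index 0 is the positive pair; the other entries are negatives.
teacher = [0.9, 0.1, 0.8, 0.7, 0.05, 0.6]
student_agree = [0.7, 0.0, 0.5, 0.4, 0.2, 0.3]     # same hard-negative order as teacher
student_disagree = [0.7, 0.0, 0.3, 0.4, 0.2, 0.5]  # reversed hard-negative order
print(hard_negative_order(teacher, pos_idx=0, k=3))  # [2, 3, 5]
print(partial_ranking_loss(student_agree, teacher, 0)
      < partial_ranking_loss(student_disagree, teacher, 0))  # True
```

Because this sketch reads only the teacher's ranking, any order-preserving rescaling of the cross-encoder scores leaves the loss unchanged (the property finding (1) attributes to ranking distillation), and negatives outside the top-k contribute nothing (finding (2)). The positive pair is deliberately excluded here on the assumption that the dual-encoder's own contrastive training loss already handles it, which is one plausible reading of the coordination described in finding (3).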

Conference location: Seattle, WA, USA
Conference dates: 16-22 June 2024
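WOS category: Computer Science, Software Engineering
WOS record number: PPRN:90762130
Source database: IEEE
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408348
Collection: School of Information Science and Technology
Corresponding author: Yuan, Chunfeng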
Author affiliations
1.Chinese Acad Sci, Inst Automation, State Key Lab Multimodal Artificial Intelligence Syst, Beijing, Peoples R China
2.ARC Lab, Tencent PCG, Shenzhen, Peoples R China
3.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
4.ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
5.Univ Hong Kong, Hong Kong, Peoples R China
Recommended citation
GB/T 7714
Chen, Yuxin,Ma, Zongyang,Zhang, Ziqi,et al. How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?[C],2024.
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.