Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
2023-10-06
Proceedings title: 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
ISSN: 1550-5499
Publication status: Published
DOI: 10.1109/ICCV51070.2023.02030
Abstract: While the field of multi-modal learning keeps growing fast, the deficiency of the standard joint training paradigm has become clear through recent studies. They attribute the sub-optimal performance of the jointly trained model to the modality competition phenomenon. Existing works attempt to improve the jointly trained model by modulating the training process. Despite their effectiveness, those methods can only apply to late fusion models. More importantly, the mechanism of the modality competition remains unexplored. In this paper, we first propose an adaptive gradient modulation method that can boost the performance of multi-modal models with various fusion strategies. Extensive experiments show that our method surpasses all existing modulation methods. Furthermore, to have a quantitative understanding of the modality competition and the mechanism behind the effectiveness of our modulation method, we introduce a novel metric to measure the competition strength. This metric is built on the mono-modal concept, a function that is designed to represent the competition-less state of a modality. Through systematic investigation, our results confirm the intuition that the modulation encourages the model to rely on the more informative modality. In addition, we find that the jointly trained model typically has a preferred modality on which the competition is weaker than other modalities. However, this preferred modality need not dominate others. Our code will be available at https://github.com/lihong2303/AGM_ICCV2023.
Keywords: Measurement; Training; Adaptation models; Computer vision; Systematics; Codes; Computational modeling
Conference location: Paris, France
Conference dates: 1-6 Oct. 2023
Indexed by: EI
Source database: IEEE
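Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/350265
Collection: School of Information Science and Technology_Master's students
Corresponding author: Yi Zhou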
Author affiliations
1. School of Information Science and Technology, ShanghaiTech University
2. Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai, China
3. SIST, University of Science and Technology of China, Hefei, China
4. NEL-BITA, University of Science and Technology of China, Hefei, China
5. Key Laboratory of Brain-inspired Intelligent Perception and Cognition (University of Science and Technology of China), Ministry of Education
6. School of Management, University of Science and Technology of China, Hefei, China
7. Shanghai Innovation Center for Processor Technologies
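First author affiliation: School of Information Science and Technology
First affiliation of the first author: School of Information Science and Technology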
Recommended citation
GB/T 7714
Hong Li, Xingyu Li, Pengbo Hu, et al. Boosting Multi-modal Model Performance with Adaptive Gradient Modulation[C], 2023.