Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
Year: 2024
Proceedings: Proceedings of Machine Learning Research
Volume: 235
Pages: 56357-56381
Publication Status: Published
Abstract

Fine-tuning pretrained large models on downstream tasks is an important problem, but it suffers from huge memory overhead due to the large number of parameters. This work strives to reduce the memory overhead of fine-tuning from the perspectives of the activation function and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which establishes the theoretical feasibility of decoupling the forward and backward passes. Applying our Approx-BP theory to backpropagation training, we derive memory-efficient alternatives to the GELU and SiLU activation functions, which use the derivative functions of ReLUs in the backward pass while keeping their forward pass unchanged. In addition, we introduce a Memory-Sharing Backpropagation strategy, which allows the activation memory to be shared by two adjacent layers, thereby removing redundant activation memory usage. Our method neither induces extra computation nor reduces training efficiency. Extensive experiments with pretrained vision and language models show that our proposal reduces peak memory usage by up to ~30%. Our code is released on GitHub.
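The activation-function side of the proposal can be illustrated with a short, hypothetical PyTorch sketch (not the authors' released code): an autograd function whose forward pass is the ordinary GELU, while its backward pass uses the ReLU derivative, so only a per-element sign mask has to be kept for backpropagation instead of the full-precision input. The authors' actual implementation and storage format may differ.

import torch
import torch.nn.functional as F

class ApproxGELU(torch.autograd.Function):
    """GELU in the forward pass, ReLU derivative in the backward pass.

    Only a boolean mask (x > 0) is saved for backward, rather than the
    full-precision input tensor that the exact GELU gradient would need.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x > 0)   # per-element sign mask
        return F.gelu(x)               # forward pass is unchanged

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        return grad_output * mask      # d/dx ReLU(x) = 1 for x > 0, else 0

def approx_gelu(x):
    return ApproxGELU.apply(x)

# quick check that gradients flow through the stored mask only
x = torch.randn(4, 8, requires_grad=True)
approx_gelu(x).sum().backward()
print(x.grad)

The same pattern would apply to SiLU by replacing F.gelu with F.silu in the forward pass; the memory-sharing strategy between adjacent layers is a separate mechanism not shown here.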

Keywords: Memory architecture; Memory management; Activation functions; Activation memory; Downstream; Fine-tuning; Large models; Large-scale; Memory overheads; Memory usage; Memory-sharing; Scale parameter
Conference: 41st International Conference on Machine Learning, ICML 2024
Conference Venue: Vienna, Austria
Conference Dates: July 21, 2024 - July 27, 2024
Indexed By: EI
Language: English
Publisher: ML Research Press
EI Accession Number: 20243817052881
EI Subject Terms: Visual languages
EISSN: 2640-3498
EI Classification Codes: 1103; 1104; 1106.1.1
Original Document Type: Conference article (CA)
Document Type: Conference paper
Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/430549
Collection: School of Information Science and Technology
Collection: School of Information Science and Technology_Master's Students
Corresponding Author: Xu, Jun
Author Affiliations:
1. School of Statistics and Data Science, Nankai University, Tianjin, China
2. School of Information Science and Technology, ShanghaiTech University, Shanghai, China
3. Department of Automation, Tsinghua University, Beijing, China
4. Central Research Institute, United Imaging Healthcare Co., Ltd., China
Recommended Citation
GB/T 7714
Yang, Yuchen, Shi, Yingdong, Wang, Cheems, et al. Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation[C]. ML Research Press, 2024: 56357-56381.