ShanghaiTech University Knowledge Management System
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
2024
Conference Proceedings | PROCEEDINGS OF MACHINE LEARNING RESEARCH
Volume | 235
Pages | 56357-56381
Publication Status | Published
Abstract | Fine-tuning pretrained large models to downstream tasks is an important problem, which however suffers from huge memory overhead due to large-scale parameters. This work strives to reduce memory overhead in fine-tuning from the perspectives of the activation function and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which provides the theoretical feasibility of decoupling the forward and backward passes. We apply our Approx-BP theory to backpropagation training and derive memory-efficient alternatives of the GELU and SiLU activation functions, which use derivative functions of ReLUs in the backward pass while keeping their forward pass unchanged. In addition, we introduce a Memory-Sharing Backpropagation strategy, which enables the activation memory to be shared by two adjacent layers, thereby removing activation memory usage redundancy. Our method neither induces extra computation nor reduces training efficiency. We conduct extensive experiments with pretrained vision and language models, and the results demonstrate that our proposal can reduce up to ∼30% of the peak memory usage. Our code is released on GitHub. (See the illustrative sketch after the citation record below.)
Keywords | Memory architecture; Memory management; Activation functions; Activation memory; Downstream; Fine-tuning; Large models; Large-scales; Memory overheads; Memory usage; Memory-sharing; Scale parameter
Conference Name | 41st International Conference on Machine Learning, ICML 2024
Conference Venue | Vienna, Austria
Conference Dates | July 21, 2024 - July 27, 2024
Indexed By | EI
Language | English
Publisher | ML Research Press
EI Accession Number | 20243817052881
EI Controlled Terms | Visual languages
EISSN | 2640-3498 |
EI Classification Codes | 1103; 1104; 1106.1.1
Original Document Type | Conference article (CA)
Document Type | Conference Paper
Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/430549
Collections | School of Information Science and Technology; School of Information Science and Technology_Master's Students
Corresponding Author | Xu, Jun
Author Affiliations | 1. School of Statistics and Data Science, Nankai University, Tianjin, China 2. School of Information Science and Technology, ShanghaiTech University, Shanghai, China 3. Department of Automation, Tsinghua University, Beijing, China 4. Central Research Institute, United Imaging Healthcare Co., Ltd., China
Recommended Citation (GB/T 7714) | Yang, Yuchen, Shi, Yingdong, Wang, Cheems, et al. Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation[C]. ML Research Press, 2024: 56357-56381.
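
Illustrative sketch: the abstract states that the memory-efficient alternatives keep the exact GELU/SiLU forward pass but use derivative functions of ReLUs in the backward pass. The PyTorch snippet below is a hypothetical sketch written from the abstract alone, not the authors' released code; the names ReLUBackwardGELU and relu_backward_gelu are invented here. Because ReLU's derivative depends only on the sign of the input, the backward pass can store a boolean mask instead of the full-precision activation input.

```python
# Minimal, hypothetical sketch of the Approx-BP idea described in the abstract:
# exact GELU in the forward pass, ReLU-style (0/1 step) derivative in the
# backward pass, so only a boolean sign mask is saved for backpropagation.
import torch
import torch.nn.functional as F


class ReLUBackwardGELU(torch.autograd.Function):
    """GELU forward, ReLU derivative backward (illustration only)."""

    @staticmethod
    def forward(ctx, x):
        # Save only a boolean mask instead of x itself -> less activation memory.
        ctx.save_for_backward(x > 0)
        return F.gelu(x)

    @staticmethod
    def backward(ctx, grad_output):
        (positive_mask,) = ctx.saved_tensors
        # ReLU'(x) = 1 for x > 0 and 0 otherwise, used as a cheap surrogate
        # for the true GELU derivative.
        return grad_output * positive_mask.to(grad_output.dtype)


def relu_backward_gelu(x):
    return ReLUBackwardGELU.apply(x)


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    relu_backward_gelu(x).sum().backward()
    print(x.grad)  # gradients flow only where x > 0
```

Saving the mask `x > 0` rather than `x` itself is where the activation-memory saving in this sketch comes from; the Memory-Sharing Backpropagation strategy mentioned in the abstract is a separate mechanism and is not reproduced here.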