ShanghaiTech University Knowledge Management System
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
2024
Conference Proceedings | PROCEEDINGS OF MACHINE LEARNING RESEARCH
Volume | 235
Pages | 56357-56381
Publication Status | Published
Abstract | Fine-tuning pretrained large models to downstream tasks is an important problem, which however suffers from huge memory overhead due to large-scale parameters. This work strives to reduce memory overhead in fine-tuning from the perspectives of the activation function and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which provides the theoretical feasibility of decoupling the forward and backward passes. We apply our Approx-BP theory to backpropagation training and derive memory-efficient alternatives of the GELU and SiLU activation functions, which use derivative functions of ReLUs in the backward pass while keeping their forward pass unchanged. In addition, we introduce a Memory-Sharing Backpropagation strategy, which enables the activation memory to be shared by two adjacent layers, thereby removing activation memory usage redundancy. Our method neither induces extra computation nor reduces training efficiency. We conduct extensive experiments with pretrained vision and language models, and the results demonstrate that our proposal can reduce up to ∼30% of the peak memory usage. Our code is released on GitHub. (See the illustrative sketch after the citation record below.)
Keywords | Memory architecture; Memory management; Activation functions; Activation memory; Downstream; Fine-tuning; Large models; Large-scales; Memory overheads; Memory usage; Memory-sharing; Scale parameter
Conference Name | 41st International Conference on Machine Learning, ICML 2024
Conference Venue | Vienna, Austria
Conference Dates | July 21, 2024 - July 27, 2024
Indexed By | EI
Language | English
Publisher | ML Research Press
EI Accession Number | 20243817052881
EI Controlled Terms | Visual languages
EISSN | 2640-3498 |
EI Classification Codes | 1103; 1104; 1106.1.1
Original Document Type | Conference article (CA)
Document Type | Conference Paper
Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/430549
Collections | School of Information Science and Technology; School of Information Science and Technology_Master's Students
Corresponding Author | Xu, Jun
Author Affiliations | 1. School of Statistics and Data Science, Nankai University, Tianjin, China 2. School of Information Science and Technology, ShanghaiTech University, Shanghai, China 3. Department of Automation, Tsinghua University, Beijing, China 4. Central Research Institute, United Imaging Healthcare Co., Ltd., China
Recommended Citation (GB/T 7714) | Yang, Yuchen, Shi, Yingdong, Wang, Cheems, et al. Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation[C]. ML Research Press, 2024: 56357-56381.
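
Illustrative sketch: the abstract states that the memory-efficient alternatives keep the exact GELU/SiLU forward pass but use derivative functions of ReLUs in the backward pass. The PyTorch snippet below is a hypothetical sketch written from the abstract alone, not the authors' released code; the names ReLUBackwardGELU and relu_backward_gelu are invented here. Because ReLU's derivative depends only on the sign of the input, the backward pass can store a boolean mask instead of the full-precision activation input.

```python
# Minimal, hypothetical sketch of the Approx-BP idea described in the abstract:
# exact GELU in the forward pass, ReLU-style (0/1 step) derivative in the
# backward pass, so only a boolean sign mask is saved for backpropagation.
import torch
import torch.nn.functional as F


class ReLUBackwardGELU(torch.autograd.Function):
    """GELU forward, ReLU derivative backward (illustration only)."""

    @staticmethod
    def forward(ctx, x):
        # Save only a boolean mask instead of x itself -> less activation memory.
        ctx.save_for_backward(x > 0)
        return F.gelu(x)

    @staticmethod
    def backward(ctx, grad_output):
        (positive_mask,) = ctx.saved_tensors
        # ReLU'(x) = 1 for x > 0 and 0 otherwise, used as a cheap surrogate
        # for the true GELU derivative.
        return grad_output * positive_mask.to(grad_output.dtype)


def relu_backward_gelu(x):
    return ReLUBackwardGELU.apply(x)


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    relu_backward_gelu(x).sum().backward()
    print(x.grad)  # gradients flow only where x > 0
```

Saving the mask `x > 0` rather than `x` itself is where the activation-memory saving in this sketch comes from; the Memory-Sharing Backpropagation strategy mentioned in the abstract is a separate mechanism and is not reproduced here.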