Accelerating Mini-batch HGNN Training by Reducing CUDA Kernels
Year: 2025
Proceedings: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
ISSN: 0302-9743
Volume: 15253 LNCS
Pages: 283-303
DOI: 10.1007/978-981-96-1542-1_17
Abstract: Heterogeneous graph neural networks (HGNNs) are essential for capturing the structure and semantic information in heterogeneous graphs. However, existing GPU-based solutions, such as PyTorch Geometric, suffer from low GPU utilization due to numerous short-execution-time and memory-bound CUDA kernels during HGNN training. To address this issue, we introduce HiFuse, an enhancement for PyTorch Geometric designed to accelerate mini-batch HGNN training on CPU-GPU systems. From the data perspective, we reorganize and merge multiple smaller vertex feature matrices into larger ones, enabling a single kernel to process larger data chunks. This efficiently exploits data locality, reduces kernel launch overhead, and improves overall GPU utilization. From the workflow perspective, we strategically offload the construction of semantic graphs from the GPU to the CPU to reduce the number of CUDA kernels. To meet the parallelism requirements on the CPU and ensure seamless execution between CPU and GPU, we employ parallelization techniques including multi-threading and asynchronous pipelining. This allows different stages of the process to overlap, enhancing GPU utilization and reducing end-to-end execution latency, leading to a more efficient and balanced use of computational resources. Through extensive experiments, HiFuse demonstrates an average 2.38× speedup compared to a state-of-the-art solution. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
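The data-side idea described in the abstract (merging many small per-type vertex feature matrices so a single kernel processes one larger chunk) can be illustrated with a minimal NumPy sketch. This is a hypothetical stand-in, not code from the paper: NumPy operations play the role of CUDA kernels, and the matrix sizes, vertex types, and shared projection weight are illustrative assumptions.

```python
import numpy as np

# Sketch of the kernel-reduction idea: rather than launching one short
# kernel per vertex-type feature matrix, merge the small matrices into
# one larger matrix so a single larger operation handles all of them.

rng = np.random.default_rng(0)

# Hypothetical per-vertex-type feature matrices (e.g., "author", "paper",
# "venue"), all projected with the same weight for illustration.
feats = [rng.standard_normal((n, 8)) for n in (3, 5, 2)]
weight = rng.standard_normal((8, 4))

# Baseline: one small operation per type -- many short kernels on a GPU.
per_type = [f @ weight for f in feats]

# Fused: concatenate into one matrix, run a single larger operation,
# then split the result back into per-type chunks.
merged = np.concatenate(feats, axis=0)
fused = merged @ weight
splits = np.split(fused, np.cumsum([f.shape[0] for f in feats])[:-1])

# The fused result matches the per-type results exactly.
for a, b in zip(per_type, splits):
    assert np.allclose(a, b)
```

On a GPU, the fused variant replaces several kernel launches with one, amortizing launch overhead and giving the device a larger, more locality-friendly workload, which is the effect the paper attributes to its feature-matrix merging.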
Keywords: Computer graphics equipment; Digital storage; Graphics processing unit; Heterogeneous networks; Feature matrices; Graph neural networks; Heterogeneous graph; Heterogeneous graph neural network; Memory bounds; Neural networks trainings; Semantics Information; Single kernel; Structure information; Time bound
Conference: 24th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2024
Conference Location: Macau, China
Conference Dates: October 29, 2024 - October 31, 2024
Indexed by: EI
Language: English
Publisher: Springer Science and Business Media Deutschland GmbH
EI Accession Number: 20250917965122
EI Main Heading: Graph neural networks
EISSN: 1611-3349
EI Classification Codes: 1101 Artificial Intelligence; 1102.3.1 Computer Circuits; 1103.1 Data Storage, Equipment and Techniques; 1103.2 Computer Peripheral Equipment; 1105 Computer Networks; 714.2 Semiconductor Devices and Integrated Circuits
Original Document Type: Conference article (CA)
Document Type: Conference paper
Item Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/497029
Collection: School of Information Science and Technology (Master's students)
Corresponding Author: Yan, Mingyu
Author Affiliations:
1.SKLP, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;
2.University of Chinese Academy of Sciences, Beijing, China;
3.ShanghaiTech University, Shanghai, China;
4.Yancheng Zhongke High Throughput Computing Research Institute Co., Ltd., Suzhou, Jiangsu, China
Recommended Citation (GB/T 7714):
Wu, Meng, Qiu, Jingkai, Yan, Mingyu, et al. Accelerating Mini-batch HGNN Training by Reducing CUDA Kernels[C]. Springer Science and Business Media Deutschland GmbH, 2025: 283-303.
Files in This Item:
No files are associated with this item.