Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
Date: 2020-11
Proceedings: 14th USENIX Symposium on Operating Systems Design and Implementation
Pages: 881-897
Publication status: Published
DOI: N/A
Abstract

Performing Deep Neural Network (DNN) computation on hardware accelerators efficiently is challenging. Existing DNN frameworks and compilers often treat the DNN operators in a data flow graph (DFG) as opaque library functions and schedule them onto accelerators to be executed individually. They rely on another layer of scheduler, often implemented in hardware, to exploit the parallelism available in the operators. Such a two-layered approach incurs significant scheduling overhead and often cannot fully utilize the available hardware resources. In this paper, we propose RAMMER, a DNN compiler design that optimizes the execution of DNN workloads on massively parallel accelerators. RAMMER generates an efficient static spatio-temporal schedule for a DNN at compile time to minimize scheduling overhead. It maximizes hardware utilization by holistically exploiting parallelism through inter- and intra-operator co-scheduling. RAMMER achieves this by proposing several novel, hardware-neutral, and clean abstractions for the computation tasks and the hardware accelerators. These abstractions expose a much richer scheduling space to RAMMER, which employs several heuristics to explore this space and find efficient schedules. We implement RAMMER for multiple hardware backends such as NVIDIA GPUs, AMD GPUs, and Graphcore IPU. Experiments show RAMMER significantly outperforms state-of-the-art compilers such as TensorFlow XLA and TVM by up to 20.1×. It also outperforms TensorRT, a vendor-optimized proprietary DNN inference library from NVIDIA, by up to 3.1×. © 2020 Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020. All rights reserved.
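To illustrate the idea the abstract describes, the sketch below mimics a compile-time spatio-temporal schedule: each operator is split into small tasks (playing the role of rTasks) and placed onto virtualized execution units, so tasks from independent operators interleave on the same accelerator rather than running one operator at a time. This is a minimal conceptual sketch; the function names, the `dfg` encoding, and the least-loaded placement heuristic are assumptions for illustration, not Rammer's actual abstractions or API.

```python
from collections import defaultdict

def topo_order(dfg):
    """Return operators of dfg = {op: (deps, n_rtasks, rtask_time)} in dependency order."""
    seen, order = set(), []
    def visit(op):
        if op in seen:
            return
        seen.add(op)
        for dep in dfg[op][0]:
            visit(dep)
        order.append(op)
    for op in dfg:
        visit(op)
    return order

def schedule(dfg, num_veus):
    """Build a static schedule: {op: [(veu_index, start_time), ...]} per rTask."""
    ready_at = {}                  # time at which each operator's output is ready
    veu_free = [0.0] * num_veus    # next free time on each virtual execution unit
    placement = defaultdict(list)
    for op in topo_order(dfg):
        deps, n_rtasks, rtask_time = dfg[op]
        earliest = max((ready_at[d] for d in deps), default=0.0)
        finish = earliest
        for _ in range(n_rtasks):  # place each rTask on the least-loaded vEU
            u = min(range(num_veus), key=lambda i: veu_free[i])
            start = max(veu_free[u], earliest)
            veu_free[u] = start + rtask_time
            placement[op].append((u, start))
            finish = max(finish, start + rtask_time)
        ready_at[op] = finish
    return dict(placement)
```

With two independent operators A and B (two rTasks each) and four units, all four rTasks start at time 0 side by side, while a dependent operator C starts only after both finish; a run-time operator-at-a-time scheduler would instead serialize A and B.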

Proceedings editors / conference sponsors: Alibaba Group; Alipay; Amazon; Ant Group; et al.; USENIX
Keywords: Deep neural networks; Graphic methods; Optimization; Abstracting; Systems analysis; Program compilers; Data flow analysis; Data flow graphs; Compiler optimizations; Computation tasks; Hardware accelerators; Hardware resources; Hardware utilization; Layered approaches; Library functions; Massively parallels
Conference: 14th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2020
Conference location: Virtual, Online
Conference dates: November 4, 2020 - November 6, 2020
Indexed in: SCI; CPCI; CPCI-S; EI
Language: English
Publisher: USENIX Association
EI accession number: N/A
EI main heading: Scheduling
EI classification codes: 461.4 Ergonomics and Human Factors Engineering; 903.1 Information Sources and Analysis; 912.2 Management; 912.3 Operations Research; 921.4 Combinatorial Mathematics, Includes Graph Theory, Set Theory; 921.5 Optimization Techniques; 961 Systems Science
Original document type: Conference article (CA)
Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/183458
Collection: School of Information Science and Technology_Master's students
Co-first author: Xie, Zhiqiang
Corresponding author: Ma, Lingxiao
Author affiliations:
1. Peking University
2. ShanghaiTech University
3. Microsoft Research
Recommended citation (GB/T 7714):
Ma, Lingxiao, Xie, Zhiqiang, Yang, Zhi, et al. Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks[C]//Alibaba Group, Alipay, Amazon, Ant Group, et al., USENIX: USENIX Association, 2020: 881-897.

Unless otherwise noted, all content in this system is protected by copyright, with all rights reserved.