AKG: Automatic Kernel Generation for Neural Processing Units using Polyhedral Transformations

AKG (Auto Kernel Generator) is a tensor compiler for neural processing units (NPUs). Rather than relying on manually written schedules, AKG leverages polyhedral schedulers to perform a much wider class of transformations, and it extends the semantics of the polyhedral representation to combine complex tiling techniques with hierarchical fusion strategies. The work was published at MICRO’20 and PLDI’21.

Abstract

Existing tensor compilers have proven effective at deploying deep neural networks on general-purpose hardware such as CPUs and GPUs, but optimizing for neural processing units (NPUs) remains challenging because of their heterogeneous compute units and complicated memory hierarchy.

In this paper, we present AKG, a tensor compiler for NPUs. AKG first lowers the tensor expression language to a polyhedral representation, which is used to automate the memory management of NPUs. Unlike existing approaches that resort to manually written schedules, AKG leverages polyhedral schedulers to perform a much wider class of transformations, and it extends the semantics of the polyhedral representation to combine complex tiling techniques with hierarchical fusion strategies. We also implement domain-specific optimizations for convolution in AKG. Moreover, to achieve optimal performance, we introduce complementary optimizations in code generation, followed by an auto-tuner.
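To make "tiling" and "fusion" concrete, the sketch below hand-writes the kind of loop transformation that a polyhedral scheduler is meant to derive automatically. It is not AKG's API or generated code: the two operators (a ReLU producer followed by an affine consumer), the function names, and the tile sizes are all hypothetical choices for illustration. The iteration domain {S[i, j] : 0 <= i < n, 0 <= j < m} is walked in tiled order, and each tile of the intermediate result is staged in a small local buffer (standing in for an NPU's on-chip buffer) and consumed immediately, so the full-size intermediate is never materialized. Convolution-specific optimizations and the auto-tuner are not modeled.

```python
# Hypothetical illustration only: not AKG's API or output. It shows by hand
# the tiling + producer/consumer fusion a polyhedral scheduler automates.
from typing import Iterator, List, Tuple

Matrix = List[List[float]]


def unfused(a: Matrix) -> Matrix:
    """Reference: B = relu(A) is fully materialized, then C = 2 * B + 1."""
    n, m = len(a), len(a[0])
    b = [[max(a[i][j], 0.0) for j in range(m)] for i in range(n)]  # n*m intermediate
    return [[2.0 * b[i][j] + 1.0 for j in range(m)] for i in range(n)]


def tiled_schedule(n: int, m: int, ti: int, tj: int) -> Iterator[Tuple[int, int, int, int]]:
    """Enumerate the domain {S[i, j] : 0 <= i < n, 0 <= j < m} tile by tile.

    Mirrors the tiling map (i, j) -> (i // ti, j // tj, i % ti, j % tj):
    each yielded tuple gives the bounds [i0, i1) x [j0, j1) of one tile;
    boundary tiles are clipped to the domain.
    """
    for i0 in range(0, n, ti):
        for j0 in range(0, m, tj):
            yield i0, j0, min(i0 + ti, n), min(j0 + tj, m)


def fused_tiled(a: Matrix, ti: int, tj: int) -> Matrix:
    """Fused version: each tile of B lives only in a tile-sized local buffer."""
    n, m = len(a), len(a[0])
    c = [[0.0] * m for _ in range(n)]
    for i0, j0, i1, j1 in tiled_schedule(n, m, ti, tj):
        # Producer: stage one tile of B = relu(A) locally (at most ti*tj values).
        local = [[max(a[i][j], 0.0) for j in range(j0, j1)] for i in range(i0, i1)]
        # Consumer: use the staged tile right away, before the next tile.
        for i in range(i0, i1):
            for j in range(j0, j1):
                c[i][j] = 2.0 * local[i - i0][j - j0] + 1.0
    return c


if __name__ == "__main__":
    a = [[(i * 7 + j * 3) % 11 - 5.0 for j in range(10)] for i in range(9)]
    # The transformation must preserve semantics, including clipped edge tiles.
    assert fused_tiled(a, 4, 3) == unfused(a)
    print("fused and tiled result matches the reference")
```

The design point the sketch illustrates is the trade-off a scheduler has to reason about: the intermediate footprint shrinks from n*m values to at most ti*tj, at the cost of choosing tile sizes that fit the local buffer. In a compiler such as AKG that choice is left to automated transformations and tuning rather than fixed by hand as it is here.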

We conduct extensive experiments on benchmarks ranging from single operators to end-to-end networks. The experimental results show that AKG outperforms both manual scheduling approaches and vendor-provided libraries. We believe AKG will shed light on follow-up compiler work for NPUs.

Publications

  • Jie Zhao, Bojie Li, Wang Nie, Zhen Geng, Renwei Zhang, Xiong Gao, Bin Cheng, Chen Wu, Yun Cheng, Zheng Li, Peng Di, Kun Zhang, Xuefeng Jin. AKG: Automatic Kernel Generation for Neural Processing Units using Polyhedral Transformations. 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI’21). Virtual, Canada, June 20-25, 2021. pp.1233-1248. [Paper PDF] [Slides by Jie Zhao]
  • Jie Zhao, Peng Di. Optimizing the Memory Hierarchy by Compositing Automatic Transformations on Computations and Data. 53rd IEEE/ACM International Symposium on Microarchitecture (MICRO 2020). Global Online Event, October 17-21, 2020. pp.427-441. [Paper PDF] [Slides by Jie Zhao] (one of four best paper candidates)