High productivity in scientific software development requires developing and maintaining a single codebase that runs efficiently across a range of accelerator-based supercomputing platforms. While prior work has investigated the performance portability of a few selected proxy applications or programming models, this presentation discusses our comprehensive study of a range of proxy applications implemented in the major programming models suitable for GPU-based platforms. We present and analyze performance results on NVIDIA and AMD GPU hardware currently deployed in leadership-class computing facilities, using a representative range of scientific codes and seven programming models: CUDA, HIP, Kokkos, RAJA, OpenMP, OpenACC, and SYCL. Based on the specific characteristics of the applications tested, we offer recommendations to developers on choosing the right programming model for their code. Empirically, we find that Kokkos, RAJA, and SYCL offer the most promise as performance-portable programming models. These results provide a comprehensive evaluation of the extent to which each programming model for heterogeneous systems provides true performance portability in real-world usage.
Slides will be available for download here after the presentation.
Josh Davis is a fourth-year computer science Ph.D. student at the University of Maryland in College Park, MD, advised by Prof. Abhinav Bhatele. His primary research interests are performance-portable GPU programming models, tools for productive and portable GPU performance analysis, and automatic verification of correctness in parallel programs. He received bachelor's degrees in computer science and philosophy from the University of Delaware in 2017.