CPU-GPU hybrid computing is becoming a common configuration for supercomputers. The wide deployment of GPUs (as well as other hardware accelerators) raises a big question for the HPC community: are we using them effectively? Inappropriate use of GPUs can produce incorrect results in certain cases, but more often it slows the program down instead of speeding it up. This paper describes a tool that lets programmers analyze the runtime performance of GPU kernels and obtain insights for better GPU utilization. Compared to existing GPU performance tools, ours provides two unique features: data-centric profiling and the generation of complete GPU call stacks. With the guidance of the tool, we were able to improve the kernel performance of three widely studied GPU benchmarks by a factor of up to 46.6x with minor code modifications.
Hui Zhang is a senior research engineer in the Memory Solutions Lab of Samsung Semiconductor Inc., where he works on advanced data-center solutions. His current research focuses on the hyper-acceleration of big-data infrastructures (e.g., Spark) and on using distributed and heterogeneous architectures (CPU/GPU/FPGA) to accelerate highly intensive data-analytics and machine-learning workloads.
He received his Ph.D. in Computer Engineering from the University of Maryland under Dr. Jeffrey K. Hollingsworth, and his B.S. in Electrical Engineering from Beihang University (BUAA). His Ph.D. research was in the area of high-performance computing (HPC), building performance tools for emerging highly parallel programming models.