Implementation and optimization techniques of linear algebra kernels for modern HPC systems
Speaker: Daichi Mukunoki (Tokyo Woman's Christian University / RIKEN Advanced Institute for Computational Science)
In recent years, performance improvements in HPC systems have relied on increasing parallelism. At the system level, the number of nodes is increasing, and at the node level, the number of cores is increasing. Software must therefore support both a high degree of parallelism and multiple levels of parallelism. In this talk, I will introduce our recent studies that address these issues for linear algebra computations on highly parallel architectures: (1) model-based performance tuning of level-2 BLAS kernels on multiple GPU architectures and (2) the implementation and performance analysis of parallel matrix multiplication (PDGEMM) using the 2.5D algorithm on the K computer. In addition, I'd like to introduce our research plans to provide accurate computation environments on future HPC systems.
Marc.Mezzarobba (at) nulllip6.fr