Reproducible and Accurate BLAS for ExaScale Computing
Speaker: Roman Iakymchuk (Pequan)

As Exascale computing (10^18 operations per second) is likely to be reached within a decade, getting accurate results in floating-point arithmetic on such computers will be a challenge. A second challenge will be the reproducibility of the results -- meaning getting a bitwise identical floating-point result from multiple runs of the same code. Reproducibility is hard to guarantee because floating-point operations are non-associative and dynamic scheduling on parallel computers can change the order in which they are performed from one run to the next.
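The effect of non-associativity can be seen on a single core by changing the summation order by hand. The following is a minimal sketch, not taken from the talk: the helper `naive_sum` and the input values are chosen purely for illustration, and the reordering stands in for what a dynamic scheduler might do to a parallel reduction.

```python
def naive_sum(values):
    """Left-to-right floating-point sum (illustrative helper, not from the talk)."""
    total = 0.0
    for v in values:
        total += v
    return total

# The same four numbers in two different orders.
forward = naive_sum([1.0, 1e100, 1.0, -1e100])    # the 1.0 terms are absorbed by 1e100
reordered = naive_sum([1e100, -1e100, 1.0, 1.0])  # the large terms cancel first

print(forward)    # 0.0
print(reordered)  # 2.0

# Even a three-term sum depends on how it is parenthesised.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
```

The exact sum of the four numbers is 2, so one ordering is not only different from the other but also wrong in every digit.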
In this talk, I will present reproducible and accurate (rounding-to-nearest) algorithms for the fundamental linear algebra operations -- like the ones included in the BLAS library -- in parallel environments such as Intel server CPUs, Intel Xeon Phi, and both NVIDIA and AMD GPUs. I will show that the performance of our algorithms is comparable to that of the standard non-deterministic BLAS routines.
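To make the goal concrete, here is a small sketch of a dot product that is both reproducible and rounded to nearest. It is not the algorithm presented in the talk, and it makes no claim about performance: it simply accumulates exactly with Python's `fractions.Fraction` and rounds once at the end, which is far too slow for real BLAS use. The function name `exact_dot` and the test vectors are assumptions made for illustration.

```python
from fractions import Fraction

def exact_dot(x, y):
    """Dot product accumulated exactly, then rounded once to the nearest double.

    Illustrative stand-in only: every finite float is an exact rational, so the
    accumulator holds the true result regardless of the order of the terms.
    """
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)
    return float(acc)  # single, correctly rounded conversion

def naive_dot(x, y):
    """Ordinary left-to-right floating-point dot product, for comparison."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total

x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
x_perm = [1e16, -1e16, 1.0]  # same terms, different order

print(naive_dot(x, y), naive_dot(x_perm, y))  # 0.0 1.0
print(exact_dot(x, y), exact_dot(x_perm, y))  # 1.0 1.0
```

Because the only rounding happens in the final conversion, any ordering of the terms yields the same bits, which is the property a reproducible BLAS routine has to deliver at much lower cost.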