02.07.2010

Beteiligte : Sid Touati (INRIA)

ABSTRACT

Code optimisation methods are usually experimented with by repeatedly measuring the execution times of the initial and optimised code versions. Given a fixed input for the program, speedup is defined as the ratio of the initial and optimised execution times. However, even with a fixed execution environment, running times vary, especially for toy/kernel benchmarks. With the introduction of multi-core architectures, this variability is becoming increasingly unstable. So different kinds of speedups may be reported, based on average execution time, minimal execution time, median, etc. Many published speedups in the literature are observations of a set of experiments that do not guarantee reproducibility. To improve upon that, this tutorial presents a rigorous statistical methodology regarding program performance analysis. We rely on well-known statistical tests (Shapiro-Wilk's test, Fisher's F-test, Student's t-test, Kolmogorov-Smirnov's test, Wilcoxon-Mann-Whitney's test) to study whether observed speedups are statistically significant. By fixing a desired risk level, we are able to analyse the statistical significance for both average and median execution times.

Link to the document and to the software: http://hal.archives-ouvertes.fr/inria-00443839

REFERENCES

[1]Lawrence D Brown, Tony Cai, and Anirban DasGupta. Interval Estimation for a Binomial Proportion. Statistical Science, 16(2):101{133, 2001.

[2] William Feller. An introduction to probability theory and its applications, volume 1. John Wiley and Sons, Inc., 1968. Third edition.

[3] Gilbert Saporta. Probabilités, analyse des données et statistique. Editions Technip, Paris, France, 1990. ISBN 978-2-7108-0814-5.

[4] Mann H.B. and Whitney D.R. On a test of whether one of two random variables is stochastically larger than the other. Ann. of Math. Statistics, 18(1):50{60, 1947.

[5] Myles Hollanderand and Douglas A. Wolfe. Nonparametric Statistical Methods. Wiley-Interscience, 1973. ISBN: 0-471-40635-X.

[6] R Development Core Team. R: A Language and Environment for Statistical Computing. 2008.

[7] Raj Jain. The Art of Computer Systems Performance Analysis : Techniques for Experimental Design, Measurement, Simulation, and Modelling. John Wiley and Sons, inc., New York, 1991.

[8] D Van Dantzig. On the consistency and the power of wilcoxon's two-sample test. In Koninklijke Nederlandse Akademie Van Wetenschappen,