Prolonging the Life of Linux: Is it possible or not?
Speaker(s): Kenji KONO (Keio University, Japan)
Linux is far from bug-free, even though high availability is crucially important for computer systems ranging from smartphones to enterprise servers. When a failure is detected inside the kernel, Linux simply crashes without trying to continue its service. In this talk, we explore the possibility of prolonging the life of Linux, i.e., continuing the service even after a failure is detected inside the kernel. To this end, we conducted a fault-injection campaign on Linux 2.6.38 using a kernel-level fault injector widely used in the OS community. Our findings are threefold. First, most errors propagate only within the context of the failing process. This implies that Linux can be recovered simply by revoking the context of the failing process. Second, even when an error propagates to global data structures, other processes rarely access the corrupted data, thanks to the synchronization primitives that protect those structures. Finally, the life of Linux can be prolonged with high probability.
Gilles.Muller (at) null