In this talk, I describe an integrated approach for dynamic scale-out and recovery of stateful stream processing operators. The idea is to expose internal operator state explicitly to the stream processing system through a set of state management primitives. Externalised operator state is checkpointed periodically and backed up by the system. In addition, the system identifies bottleneck operators and automatically scales them out by allocating new VMs. We evaluate this approach as part of the SEEP experimental stream processing system on the Amazon EC2 cloud platform and show that it scales automatically while recovering quickly from failures.
(This work was presented at SIGMOD'13.)
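To make the idea of externalised operator state concrete, here is a minimal Python sketch. It is not SEEP's API: the class `WordCountOperator`, the functions `route` and `partition_state`, and the choice of hash-partitioning state by key are all assumptions made purely for illustration. It shows a toy operator whose state can be checkpointed, restored after a failure, and split across several new operator instances.

```python
import copy
import zlib
from dataclasses import dataclass, field


@dataclass
class WordCountOperator:
    """Toy stateful operator (hypothetical); its state is a plain dict."""
    counts: dict = field(default_factory=dict)

    def process(self, word):
        # Update the per-key state for one input tuple.
        self.counts[word] = self.counts.get(word, 0) + 1

    def checkpoint(self):
        # Hand a copy of the state to the system for backup.
        return copy.deepcopy(self.counts)

    def restore(self, snapshot):
        # Re-create the state from a backed-up checkpoint.
        self.counts = copy.deepcopy(snapshot)


def route(key, n):
    # Deterministic key-to-partition mapping (assumed: CRC32 modulo n).
    return zlib.crc32(key.encode("utf-8")) % n


def partition_state(snapshot, n):
    # Split a checkpoint into n disjoint pieces, one per new instance.
    parts = [{} for _ in range(n)]
    for key, value in snapshot.items():
        parts[route(key, n)][key] = value
    return parts


def scale_out(snapshot, n):
    # Start n operator instances, each restored from its own partition.
    instances = []
    for part in partition_state(snapshot, n):
        op = WordCountOperator()
        op.restore(part)
        instances.append(op)
    return instances
```

In this sketch, recovery amounts to calling `restore` on a fresh instance with the last checkpoint and replaying any tuples received since then, while scale-out reuses the same checkpoint but divides it so that each new instance owns only the keys that `route` sends to it.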
Bio: Peter Pietzuch is a Senior Lecturer at Imperial College London, leading the Large-scale Distributed Systems (LSDS) group in the Department of Computing. His research focuses on the design and engineering of scalable, reliable and secure large-scale software systems, with a particular interest in data management and networking issues. He has published over sixty research papers in international venues, including USENIX ATC, NSDI, SIGMOD, VLDB, ICDE, ICDCS, Middleware and DEBS. He has co-authored a book on Distributed Event-based Systems published by Springer. Before joining Imperial College, he was a post-doctoral fellow at Harvard University. He holds PhD and MA degrees from the University of Cambridge.