Colloquium

Colloquium d’Informatique de Sorbonne Université

Willy Zwaenepoel, École Polytechnique Fédérale de Lausanne

Mardi 22 mars 2016 18 h
Amphi 25, Sorbonne Université - Faculté des Sciences

Really Big Data
Analytics on Graphs with Trillions of Edges

Willy Zwaenepoel received his BS/MS from the University of Gent, Belgium, and his PhD from Stanford University. He is currently a Professor of Computer Science at EPFL. Before he has held appointments as Professor of Computer Science and Electrical Engineering at Rice University, and as Dean of the School of Computer and Communication Sciences at EPFL. His interests are in operating systems and distributed systems. He is a Fellow of the ACM and the IEEE, he has received the IEEE Kanai Award and several best paper awards, and is a member of the Belgian and European Academies. He has also been involved in a number of startups, including BugBuster (acquired by AppDynamics), iMimic (acquired by Ironport/Cisco), Midokura and Nutanix.

Big graphs occur naturally in many applications, most obviously in social networks, but also in many other areas such as biology and forensics. Current approaches to processing large graphs use either supercomputers or very large clusters. In both cases the entire graph must reside in memory before it can be processed. We are pursuing an alternative approach, processing graphs from secondary storage. While this comes with a performance penalty, it makes analytics on very large graphs feasible on a small number of commodity machines.

We have developed two systems, one for a single machine and one for a cluster of machines. X-Stream, the single machine solution, aims to make all secondary storage access sequential. It uses two techniques to achieve this goal, edge-centric processing and streaming partitions. Chaos, the cluster solution, starts from the observation that there is little benefit to locality when accessing data from secondary storage over a high-speed network. As a result, Chaos spreads graph data uniformly randomly over storage devices, and uses randomized access to achieve I/O balance. Chaos furthermore uses work stealing to achieve computational load balance. By using these techniques, it avoids the need for expensive partitioning during pre-processing, while still achieving good scaling behavior. With Chaos we have been able to process an 8-trillion-edge graph on 32 machines, a new milestone for graph size on a small cluster. I will describe both systems and their performance on a number of benchmarks and in comparison to state-of-the-art alternatives.

This is joint work with Laurent Bindschaedler (EPFL), Jasmina Malicevic (EPFL) and Amitabha Roy (Intel Labs).

Master Class

L'un des moment particulièrement apprécié lors du colloquium est la « Masterclass » au cours de laquelle quelques doctorants du laboratoires ont l'opportunité de présenter leurs travaux à l'invité(e). Chaque présentation est suivie d'une discussion approfondie. Le programme complet est donné dans le document suivant.

Informations en ligne

Transparents

Enregistrement vidéo

https://sorbonne-universite.cloud.panopto.eu/Panopto/Pages/Embed.aspx?id=4dfa0641-3d1c-4c73-b3fc-aec80124c397

À propos

Initié en 2012, le Colloquium d’Informatique de Sorbonne Université est un évènement régulier ayant pour but d'inviter des personnalités majeures du domaine de l’informatique à donner une conférence sur le campus de la faculté des sciences et ingénierie de Sorbonne Université. Il vise un public large, divers mais techniquement averti, et notamment les chercheurs en informatique de toutes spécialités, les doctorants et les étudiants en informatique de niveau Master.

L’évènement principal du Colloquium est l’exposé de l’orateur, d’environ 45 minutes, suivi d’une séance de questions et d’interactions avec l’auditoire. Il est généralement associé à l’organisation d’une masterclass à destination des doctorants du LIP6 et/ou d’autres laboratoires.

Principal participant au comité d’organisation, le LIP6 assure l’organisation du Colloquium et reçoit occasionnellement le soutien de l’ISIR.

Comité de Pilotage

Contact: Marc Shapiro

Annonce des Colloquium

Si vous souhaitez être informé des prochains événements, vous pouvez souscrire à la liste de diffusion.
Si vous ne souhaitez plus être informé des événements, vous pouvez vous désinscrire de la liste de diffusion