LE GUILLOU Ève

PhD Student at Sorbonne University
Team : APR
    Sorbonne Université - LIP6
    Boîte courrier 169
    Couloir 25-26, Étage 3, Bureau 302
    4 place Jussieu
    75252 PARIS CEDEX 05
    FRANCE

+33 1 44 27 88 79
Eve.Le-Guillou (at) lip6.fr
https://lip6.fr/Eve.Le-Guillou

Supervision : Julien TIERNY, Pierre FORTIN

Topological Analysis of Distributed Data

Topological Data Analysis (TDA) tackles the complexity of large-scale data by capturing its structural characteristics in a concise encoding for analysis and visualization. As datasets grow, a single dataset increasingly exceeds the memory capacity of one machine, making distributed-memory systems, with their much larger capacities, a necessary solution. However, adapting an algorithm for distributed-memory systems requires substantial changes to ensure correctness and performance. In particular, TDA algorithms face challenges in this context, as they rely on global data accesses and multiple traversals with minimal computation, a combination that often scales poorly in a distributed-memory context. Furthermore, existing distributed-memory implementations are each tailored to one particular topological representation, which induces practical drawbacks. The Topology ToolKit (TTK) aims at providing a unified framework for TDA algorithms with a reusable and efficient data structure. However, TTK has until now been limited to shared-memory parallelism. In this thesis, we add distributed support to TTK using the Message Passing Interface (MPI). First, we adapt TTK’s core data structure and add distributed-memory support to several existing algorithms, both to demonstrate the new features and to highlight their performance. Performance tests showcase the efficiency of each algorithm as well as of the overall software infrastructure. Additionally, we apply a real-life topological analysis pipeline to two massive datasets to demonstrate our software’s effectiveness at scale. Then, we focus our effort on a much more complex abstraction: the persistence diagram. Its robustness and reliability make it one of the most used topological representations. The Discrete Morse Sandwich (DMS) is currently the most efficient algorithm for computing the diagram on one node.
Our new method, the Distributed Discrete Morse Sandwich (DDMS), builds upon DMS and introduces tailored step-specific modifications, resulting in a hybrid MPI+thread implementation. Performance tests demonstrate the gain of our approach over the original DMS method as well as over Dipha, the reference method for persistence diagram computation in a distributed-memory context. Our method successfully computes persistence diagrams on datasets containing up to 6 billion vertices.


PhD defence : 10/10/2025

Jury members :

Tom PETERKA, Argonne National Laboratory [Reviewer]
David COEURJOLLY, CNRS [Reviewer]
Julien TIERNY, CNRS
Isabelle BLOCH, Sorbonne Université
Bruno RAFFIN, Inria
Federico IURICICH, Clemson University
Christophe CALVIN, CEA
Pierre FORTIN, Université de Lille

2024-2025 Publications