PhD graduated
Team : RO
Departure date : 09/30/2022

Supervision : Christoph DĂśRR

Co-supervision : COHEN-ADDAD Vincent

Approximation Algorithms and Sketches for Clustering

This thesis presents contributions to the theoretical study of clustering problems. The broad objective of these problems is to partition a data set into groups, such that data in the same group are similar. The k-medians and k-means problems are common ways of formally modeling this problem, which we focus on in this thesis.
In the first part of the thesis, we show the first linear time approximation scheme when the input is in a constant dimensional Euclidean space (or, more generally, doubling space), i.e., a very fast algorithm that computes a very good approximation of the optimal solution. We extend the techniques used to treat the problem from the point of view of differential privacy.
In the second part, we aim at designing algorithm to compute simplified representations of the input, which preserve the structure of the problem: we introduce several techniques to reduce the number of input data, while ensuring that solving the problem after the reduction is almost equivalent to solving it on the original set. We also show that in several cases, our techniques are optimal. In the particular case of Euclidean spaces, a different way to simplify the input is to reduce the dimension (preserving the structure of the input in the same way). We present the first deterministic algorithm to achieve near-optimal dimension. Finally, we show how to use those compression schemes in order to answer fast statistical queries.

Defence : 09/13/2022

Jury members :

Robert Krauthgamer, Weizmann Institute [Rapporteur]
Laurent Viennot, INRIA [Rapporteur]
Cristina Bazgan, Université Paris Dauphine
Monika Henzinger, Universität Wien
Chris Schwiegelshohn, Aarhus University
Vincent Cohen-Addad, Google
Christoph Dürr, Sorbonne Université

Departure date : 09/30/2022

2019-2022 Publications