Supervision : Michèle SORIA
Co-supervision : FLAJOLET Philippe
Probabilistic Algorithms for Data Streaming and Random Generation
This thesis examines two types of problems---that of analyzing large quantities of real data, and the complimentary problem of creating large quantities of (random) data---using a set of common tools: analytic combinatorics (and generating functions), enumeration, probabilities, probabilistic algorithms, and in particular the Boltzmann method for random generation.
First, we study several data streaming algorithms: algorithms which are able to extract information from large streams of data using very limited ressources (in particular, memory and processing time per element of the stream). One of our main contributions is to provide a full analysis of an optimal algorithm to estimate the number of distinct elements in a stream, a problem which has garnered a lot of research in the past. Our second contribution, a work in common with researchers from UPC in Barcelona, is to introduce a completely novel type of estimator for the number of distinct elements, which uses statistics on permutations.
The second part focuses on the random generation both of laws and combinatorial object. We introduce the first optimal algorithm for the random generation of the discrete uniform law, which is one of the most wildly used building blocks in computational simulations. We also, with Olivier Bodini, introduce an extension of the Boltzmann method to randomly generate a new kind of objects belonging to multiplicative combinatorics, which are an underexplored part of combinatorics with ties to analytic number theory. Finally we present ongoing work with Olivier Bodini on improving the practicality of the Boltzmann method.
Defence : 12/13/2012 - 10h - Site Jussieu - Salle Jean-Louis Laurière - 25-26/101
Jury members :
BODINI Olivier (Université Paris Nord)
MAGNIEN Clémence (Université Pierre et Marie Curie)
MARCKERT Jean-François (Université Bordeaux 1)
NICAUD Cyril (Université Marne-La-Vallée) [Rapporteur]
ROSSIN Dominique (École Polytechnique) [Rapporteur]
SEDGEWICK Robert (Princeton University)
SORIA Michèle (Université Pierre et Marie Curie)
VALLÉE Brigitte (Université de Caen)
- J. Lumbroso : “Algorithmes probabilistes pour la fouille quantitative de données et la génération aléatoire”, thesis, defence 12/13/2012, supervision Soria, Michèle, rapporteurs : FLAJOLET Philippe (2012)
- Ah. Helmi, J. Lumbroso, C. Martínez, A. Viola : “Data Streams as Random Permutations: the Distinct Element Problem”, 23rd International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'12), vol. DMTCS Proceedings vol. AQ, 23rd Intern. Meeting on Probabilistic, Combinatorial, and Asymptotic Methods for the Analysis of Algorithms (AofA'12), DMTCS Proceedings, Montreal, Canada, pp. 323-338, (Discrete Mathematics and Theoretical Computer Science) (2012)
- O. Bodini, J. Lumbroso : “Dirichlet Random Sampling for Multiplicative Combinatorial Structures”, Meeting on Analytic Algorithmics and Combinatorics (ANALCO) 2012, Kyoto, Japan, pp. 92-106, (SIAM) (2012)
- J. Lumbroso : “An optimal cardinality estimation algorithm based on order statistics and its full analysis”, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10), vol. DMTCS Proceedings vol. AM, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA'10), DMTCS Proceedings, Vienna, Austria, pp. 489-504, (Discrete Mathematics and Theoretical Computer Science) (2010)
- O. Bodini, J. Lumbroso : “Optimal Partial Tiling of Manhattan Polyominoes”, DGCI 2009 - 15th IAPR International Conference on Discrete Geometry for Computer Imagery, vol. 5810, Lecture Notes in Computer Science, Montreal, Canada, pp. 79-91, (Springer) (2009)