Monitoring information flows: building, modelling and simulating citation graphs, with an application to buzz detection.
The adoption of the Web as a mass medium for information diffusion has considerably changed the media landscape. Information circulating on the Web is conducive to the emergence of new phenomena that may influence political, strategic or economic decisions. These phenomena can be observed through information flows, the subject of this thesis.
We study information flows through citation graphs between information websites.
The work addresses three main problems: building, analysing and generating a citation graph. An application of this work to buzz detection is then presented.
To build a citation graph, we propose a crawling method suited to extracting citation relations between web sources. The strategy relies on a thorough extraction of each source's publications, combined with a cleaning of the web pages so that only useful hyperlinks are retained.
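The link-extraction step can be illustrated with a minimal Python sketch. It is not the crawler developed in the thesis: identifying a source by its host name, and treating links to the publishing source itself as noise to be cleaned, are assumptions made for the example, and the names `LinkExtractor`, `source_of` and `citation_edges` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def source_of(url):
    """Identify a source by its host name (an assumption of this sketch)."""
    return urlparse(url).netloc.lower()


def citation_edges(page_url, html):
    """Count the links from one article to other sources.

    Links without a host (mailto:, javascript:) and links back to the
    publishing source itself are dropped as navigation noise.
    """
    parser = LinkExtractor()
    parser.feed(html)
    origin = source_of(page_url)
    edges = Counter()
    for href in parser.links:
        # Resolve relative links against the article URL before comparing hosts.
        target = source_of(urljoin(page_url, href))
        if target and target != origin:
            edges[(origin, target)] += 1
    return edges
```

Summing the counters returned for every crawled article would give a weighted, directed citation graph between sources. A real cleaning step would also have to discard advertising and template links, which this sketch does not attempt.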
The analysis of the citation graph consists in characterising its nodes as information sources with distinct behaviours.
It identifies four publishing behaviours among the web sources, differentiated mainly by the frequency of publication, the diversity of the cited sources and the ability to exploit the specific features of web publishing.
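Two of the criteria named above, publication frequency and diversity of cited sources, can be sketched as per-source descriptors. The choices below are assumptions made for illustration, not the measures used in the thesis: frequency is taken as articles per day over the observed span, diversity as the Shannon entropy of the cited sources, and the input format `(source, day, cited_sources)` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def source_features(articles):
    """Compute per-source descriptors from (source, day, cited_sources) records.

    `day` is an integer day index. Returns a dictionary mapping each source
    to (articles_per_day, citation_entropy_in_bits).
    """
    days = defaultdict(set)
    n_articles = Counter()
    cited = defaultdict(Counter)
    for source, day, cited_sources in articles:
        days[source].add(day)
        n_articles[source] += 1
        cited[source].update(cited_sources)

    features = {}
    for source in n_articles:
        # Observed activity span in days, first and last day included.
        span = max(days[source]) - min(days[source]) + 1
        frequency = n_articles[source] / span
        total = sum(cited[source].values())
        entropy = 0.0
        for count in cited[source].values():
            p = count / total
            entropy -= p * math.log2(p)
        features[source] = (frequency, entropy)
    return features
```

Descriptors of this kind could then be passed to a clustering method to group sources by behaviour. The abstract does not say which method the thesis uses, so that step is left out.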
Graph generation is motivated by the need to run experiments on varied corpora: the objective is to generate realistic citation graphs, that is, graphs that reproduce the publishing behaviours identified on real data.
We therefore propose a flexible and adaptive generation model for citation graphs that imitates the process of publishing an article on a website. This model is implemented in a simulation tool designed for the study of information flows.
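The idea of generating a graph by imitating the publishing process can be sketched with a toy simulation. This is not the model proposed in the thesis: the two profile parameters (a publication probability per time step and a maximum number of citations per article), the uniform choice of cited sources and the function name `generate_citation_graph` are all assumptions made for the example.

```python
import random


def generate_citation_graph(profiles, n_steps, seed=0):
    """Simulate publication step by step and return weighted citation edges.

    `profiles` maps each source to (publish_probability, max_citations).
    At each step a source publishes an article with the given probability,
    and the article cites between 0 and max_citations distinct other sources.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    sources = sorted(profiles)
    edges = {}
    for _ in range(n_steps):
        for source in sources:
            publish_probability, max_citations = profiles[source]
            if rng.random() >= publish_probability:
                continue  # no article from this source at this step
            others = [s for s in sources if s != source]
            k = rng.randint(0, min(max_citations, len(others)))
            for target in rng.sample(others, k):
                edges[(source, target)] = edges.get((source, target), 0) + 1
    return edges
```

For example, `generate_citation_graph({"agency": (0.9, 1), "blog": (0.5, 2)}, n_steps=100)` returns a dictionary of `(citing, cited)` pairs with their citation counts. Giving different profiles to different sources is one simple way to encode distinct publishing behaviours.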
Finally, we apply the generation method and the simulation tool to buzz detection. To this end, we analyse the notion of buzz and propose a definition, from which we derive several formalisations adapted to the available data. Interpreting the experiments leads us to associate each proposed detection method with specific applications according to its semantics.
Defence: 12/07/2011, 09:30, Site Jussieu 25-26/105
Jury members:
Marie-Aude Aufaure, Professor, MAS - Ecole Centrale Paris [reviewer]
Djamel Zighed, Professor, ERIC - Université Lumière Lyon 2 [reviewer]
Bernd Amann, Professor, LIP6 - UPMC - Sorbonne Universités
Bernadette Bouchon-Meunier, Research Director, LIP6 - UPMC - Sorbonne Universités
Thomas Delavallade, Engineer, Thales Communications
Marie-Jeanne Lesot, Associate Professor, LIP6 - UPMC - Sorbonne Universités
Camille Roth, Research Scientist, CAMS - CNRS/EHESS