PhD
Team : BD
Contract end date : 30.06.2014
Supervisor(s) : Anne DOUCET
Co-supervisors : GANÇARSKI Stéphane, PIWOWARSKI Benjamin

Access to web archives: Querying, Navigating and Optimizing

A significant share of the world’s cultural and intellectual knowledge is created on the web every day. However, the web is ephemeral by nature: new information constantly replaces older information without any notification, leaving a significant gap in our knowledge. Archiving the web has therefore become a cultural necessity to preserve this knowledge for future generations. However, the success of any web archive will be measured by the means of access it provides, as is the case today on the real web. Our research is placed in the context of access to web archives and studies several research problems related to this issue. These problems are grouped into two main topics: Access Methods and Optimization of Access. For access methods, we first propose a conceptual model, together with operators to manipulate it, as the basis of a query language for web archives that better satisfies user information needs. Next, we introduce a new navigation method for web archives that takes the coherence of pages into account. In the context of access optimization, we propose a change detection algorithm to understand and quantify what happened (and thus changed) between two versions of a web page. Then, we study the behavior of different static index pruning methods with temporal queries, before proposing a new diversification-based static index pruning method and showing that its application to temporal collections yields a substantial gain in performance.
Thesis defence : 11.10.2013 - 11h30 - Site Jussieu 25-26/105
Jury members :
Sihem AMER-YAHIA CNRS / LIG [Rapporteur]
Arjen P. DE VRIES Université Delft [Rapporteur]
François BANCILHON DataPublica
Matthieu CORD UPMC Paris 6
David GROSS-AMBLARD Université de Rennes 1
Pierre SENELLART Télécom ParisTech
Anne DOUCET UPMC Paris 6
Stéphane GANÇARSKI UPMC Paris 6
Benjamin PIWOWARSKI CNRS / LIP6 [Supervisor]

Publications 2010-2013

