Refresh Strategies and Online Change Estimation for Highly Dynamic Web Content
With the rapidly increasing number of sources and devices connected to the Internet and the growing success of Web 2.0 services, web content available online is becoming increasingly diverse and dynamic. To facilitate the efficient dissemination of evolving and often temporary information streams (news, messages, announcements), many web applications publish their most recent information items as RSS and Atom documents, which are then collected and transformed by RSS aggregators such as Google Reader or Yahoo! News.
Our research is set in the context of content-based feed aggregation systems and focuses on the design of optimal refresh strategies for highly dynamic RSS feed sources. First, we introduce two quality measures specific to aggregation feeds, which reflect the information completeness and average freshness of the result feeds. We then propose a best-effort feed refresh strategy that achieves maximum aggregation quality compared with all other existing policies that use the same average number of refreshes. We analyse the characteristics of a representative collection of real-world RSS feeds, focusing on their temporal dimension. We also study different online change estimation models and techniques and their integration with our refresh strategy. The presented methods have been implemented and tested against synthetic and real-world RSS feed data sets.
Defence: 09/20/2012 - 15h30 - Site Jussieu - Salle Jean-Louis Laurière - 25-26/101
Jury members:
M. LAMARRE Philippe (INSA Lyon) [Rapporteur]
M. GROSS-AMBLARD David (Université de Rennes 1) [Rapporteur]
Mme. BERTI-EQUILLE Laure (IRD, Aix-Marseille Université)
M. CORD Matthieu (UPMC Paris 6)
M. AMANN Bernd (UPMC Paris 6)
M. ARTIERES Thierry (UPMC Paris 6)