CREUS Jordi
Supervision: Bernd AMANN
Co-supervision: VODISLAV Dan
ROSES: A Continuous Query Processor for Large-Scale Content-Based RSS Feed Aggregation
RSS and Atom are less widely known than HTML, but they are omnipresent in modern web applications for publishing highly dynamic web content. Nowadays, news sites publish thousands of RSS/Atom feeds, often organized into general topics like politics, economy, sports, culture, etc. Weblog and microblogging systems like Twitter use the RSS publication format, and even more general social media like Facebook produce an RSS feed for every user and trending topic. These numerous continuous data sources can be accessed through general-purpose feed aggregator applications like Google Reader, desktop clients like Firefox or Thunderbird, and RSS mash-up applications like Yahoo! Pipes, Netvibes or Google News. Today, RSS and Atom feeds represent a huge stream of structured text data whose potential is still not fully exploited. In this thesis, we first present ROSES (Really Open Simple and Efficient Syndication), a data model and continuous query language for RSS/Atom feeds. ROSES allows users to create new personalized feeds from existing real-world feeds through a simple, yet complete, declarative query language and algebra. The ROSES algebra has been implemented in a complete, scalable prototype system capable of handling and processing ROSES feed aggregation queries. The query engine has been designed to scale in terms of the number of queries. In particular, it implements a new cost-based multi-query optimization approach based on query normalization and shared filter factorization. We propose two different factorization algorithms: (i) STA, an adaptation of an existing approximate algorithm for finding minimal directed Steiner trees [CCC+98a], and (ii) VCA, a greedy approximation algorithm based on efficient heuristics that outperforms STA with respect to optimization cost. Our optimization approach has been validated by extensive experimental evaluation on real-world data collections.
Keywords: RSS, Atom, Data Stream Management Systems, publish/subscribe, continuous query processing, multi-query optimization, shared filter factorization, Steiner tree problem
Defence: 07/12/2012
Jury members:
Ms. Ioana MANOLESCU, Research Director at Inria, [Reviewer]
Mr. Jean-Marc PETIT, Professor at INSA Lyon, [Reviewer]
Ms. Anne DOUCET, Professor at UPMC
Ms. Béatrice FINANCE, Associate Professor at UVSQ (HDR)
Mr. Bernd AMANN, Professor at UPMC
Mr. Dan VODISLAV, Professor at UCP
Publications 2009-2012
2012
- J. Creus : “ROSES : Un moteur de requêtes continues pour l’aggrégation de flux RSS à large échelle”, thesis, defence 07/12/2012, supervision Amann, Bernd, co-supervision : Vodislav, Dan (2012)
- J. Creus, B. Amann, V. Christophides, N. Travers, D. Vodislav : “RoSeS, un moteur de requêtes continues pour la syndication RSS à large échelle”, Revue des Sciences et Technologies de l'Information - Série ISI : Ingénierie des Systèmes d'Information, vol. 17 (5), pp. 57-85, (Lavoisier) (2012)
2011
- J. Creus, B. Amann, V. Christophides, N. Travers, D. Vodislav : “Optimizing large collections of continuous content-based RSS aggregation queries”, 27es journées Bases de Données Avancées (BDA 2011), Rabat, Morocco, pp. 1-21 (2011)
- J. Creus, B. Amann, N. Travers, D. Vodislav : “RoSeS: a continuous query processor for large-scale RSS filtering and aggregation”, CIKM '11 - 20th ACM international conference on Information and knowledge management, Glasgow, United Kingdom, pp. 2549-2552, (ACM) (2011)
- J. Creus, B. Amann, N. Travers, D. Vodislav : “RoSeS: A Continuous Content-Based Query Engine for RSS Feeds”, DEXA - Database and Expert Systems Applications, vol. 6861, Lecture Notes in Computer Science, Toulouse, France, pp. 203-218, (Springer) (2011)
2010
- G. Hochard, Z. Lacroix, B. Amann, J. Creus : “A Semantic Map of RSS Feeds to support Discovery”, 3rd International Workshop on REsource Discovery, vol. 6799, Lecture Notes in Computer Science, Paris, France, pp. 122-133, (Springer) (2010)
- J. Creus, B. Amann, N. Travers, D. Vodislav : “RoSeS : Un agrégateur de flux avancé”, BDA'10 - Bases de Données Avancées, Toulouse, France, pp. 1-6 (2010)
2009
- C. Constantin, J. Creus, C. Du Mouza, R. Horincar, N. Travers : “D2.1 State of the art of XML data stream models, Livrable 2.1 ANR RoSeS”, (2009)
- D. Vodislav, B. Amann, J. Creus, N. Travers : “Modèle et Algèbre ROSES. Livrable D2.2 ANR RoSeS”, (2009)