BD Projects

Team: BD

experimaestro - Planning and management of computer experiments

Experimaestro is an experiment manager built around a server that contains a job scheduler (handling job dependencies and locking mechanisms) and a framework for describing experiments in JavaScript or Java.
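To illustrate what dependency-aware job scheduling involves, here is a minimal sketch that groups jobs into "waves": each wave contains the jobs whose dependencies have all finished, so jobs within a wave could run in parallel. This is not experimaestro's API; the job names and the `run_in_waves` helper are hypothetical, and locking is not modelled. It relies only on Python's standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical experiment: each job maps to the set of jobs it depends on.
# These names are illustrative only and are not part of experimaestro.
jobs = {
    "download": set(),
    "tokenize": {"download"},
    "train_a": {"tokenize"},
    "train_b": {"tokenize"},
    "compare": {"train_a", "train_b"},
}

def run_in_waves(dependencies):
    """Return successive batches of jobs; a job appears only after all its dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # also raises CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        # A real scheduler would launch these jobs here and wait for completion.
        sorter.done(*ready)  # mark them finished so their dependents become ready
    return waves

waves = run_in_waves(jobs)
print(waves)  # [['download'], ['tokenize'], ['train_a', 'train_b'], ['compare']]
```

Both training jobs land in the same wave because they depend only on `tokenize`, while `compare` waits for both.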

Project Leader: Benjamin PIWOWARSKI

01/01/2016

https://github.com/bpiwowar/experimaestro


SPARQL on Spark - SPARQL query processing with Apache Spark

A common way to achieve scalability when processing SPARQL queries over large RDF data sets is to use map-reduce frameworks such as Hadoop or Spark. In these shared-nothing architectures, a major challenge is processing complex SPARQL queries that generate large join plans over distributed data partitions. In this article we focus on two representative distributed join algorithms, partitioned join and broadcast join, which are deployed in map-reduce frameworks to evaluate complex distributed graph pattern join plans. We compare five SPARQL graph pattern evaluation implementations on top of Apache Spark to show the importance of carefully choosing the physical data storage layer, and of being able to use both join algorithms so that existing data partitioning can be taken into account. Our experiments with different SPARQL benchmarks over real-world and synthetic workloads show that hybrid join plans offer more flexibility and can often achieve better performance than join plans that use a single kind of join implementation.
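The difference between the two join algorithms can be sketched in plain Python by simulating worker partitions as lists. The example joins the bindings of two triple patterns on their shared variable `?y`. This is a toy model under stated assumptions, not the project's Spark implementation: the data, the round-robin initial distribution, and the helper names (`split`, `partitioned_join`, `broadcast_join`) are hypothetical.

```python
from collections import defaultdict

# Toy bindings for two triple patterns sharing ?y (illustrative data only):
#   ?x <worksFor> ?y   -> (x, y)
#   ?y <locatedIn> ?z  -> (y, z)
works_for = [("alice", "lab1"), ("bob", "lab1"), ("carol", "lab2")]
located_in = [("lab1", "city1"), ("lab2", "city2")]

def split(rows, n):
    """Simulate an arbitrary initial distribution of rows over n workers (round-robin)."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def local_join(left_rows, right_rows):
    """Hash join on ?y within a single worker."""
    index = defaultdict(list)
    for y, z in right_rows:
        index[y].append(z)
    return [(x, y, z) for x, y in left_rows for z in index[y]]

def partitioned_join(left_parts, right_parts, n):
    """Shuffle both inputs so equal ?y values meet on the same worker, then join locally."""
    left_by_key = [[] for _ in range(n)]
    right_by_key = [[] for _ in range(n)]
    for part in left_parts:
        for x, y in part:
            left_by_key[hash(y) % n].append((x, y))
    for part in right_parts:
        for y, z in part:
            right_by_key[hash(y) % n].append((y, z))
    result = []
    for lpart, rpart in zip(left_by_key, right_by_key):
        result.extend(local_join(lpart, rpart))
    return result

def broadcast_join(left_parts, small_rows):
    """Copy the small input to every worker; the large input is never reshuffled."""
    result = []
    for part in left_parts:
        result.extend(local_join(part, small_rows))
    return result

n = 2
left_parts = split(works_for, n)
right_parts = split(located_in, n)
p = sorted(partitioned_join(left_parts, right_parts, n))
b = sorted(broadcast_join(left_parts, located_in))
print(p == b)  # True: both strategies produce the same bindings
print(p)  # [('alice', 'lab1', 'city1'), ('bob', 'lab1', 'city1'), ('carol', 'lab2', 'city2')]
```

The trade-off is visible in the code: the partitioned join moves both inputs across workers, while the broadcast join moves only the smaller one. If an input is already partitioned on the join key, the shuffle for that side can be skipped, which is the kind of existing partitioning a hybrid plan can exploit.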

Project Leader: Hubert NAACKE

01/01/2015

http://www-bd.lip6.fr/wiki/en/site/recherche/logiciels/sparqlwithspark

BOM - Block-o-Matic!

Block-o-Matic is a Web page segmentation algorithm based on a hybrid approach that combines scanned-document segmentation with visual content-based segmentation. A Web page is associated with three structures: the DOM tree, the content structure and the logical structure. The DOM tree represents the HTML elements of a page, the content (geometric) structure organizes the content according to its category and geometry, and the logical structure is the result of mapping the content structure according to how a human perceives its meaning. The segmentation process is divided into three phases: analysis, understanding and reconstruction of a Web page. An evaluation method is proposed for assessing Web page segmentations against a ground truth of 400 pages classified into 16 categories. A set of measures based on the geometric properties of the blocks is presented. Satisfactory results are obtained compared with other algorithms following the same approach.
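The specific geometric measures used by Block-o-Matic are not given here, so the following is only a hypothetical illustration of how segmented blocks can be compared geometrically with a ground truth. It assumes blocks are axis-aligned rectangles and uses intersection-over-union with an assumed threshold of 0.5; the `Block` type and the `overlap_ratio` and `matched_fraction` helpers are illustrative names, not part of BOM.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    """Axis-aligned rectangle in page coordinates (x, y = top-left corner)."""
    x: float
    y: float
    w: float
    h: float

    def area(self) -> float:
        return self.w * self.h

def overlap_ratio(a: Block, b: Block) -> float:
    """Intersection area divided by union area (0 = disjoint, 1 = identical)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0

def matched_fraction(predicted, ground_truth, threshold=0.5):
    """Fraction of ground-truth blocks overlapped by some predicted block at >= threshold."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for g in ground_truth
        if any(overlap_ratio(p, g) >= threshold for p in predicted)
    )
    return hits / len(ground_truth)

# Hypothetical page: two ground-truth blocks, one of which the segmenter found.
ground_truth = [Block(0, 0, 100, 50), Block(0, 60, 100, 40)]
predicted = [Block(0, 0, 100, 55), Block(200, 200, 10, 10)]
print(matched_fraction(predicted, ground_truth))  # 0.5
```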

Project Leader: Andrès SANOJA

01/01/2012

http://www-poleia.lip6.fr/~sanojaa/BOM/