CARON Clément

PhD student at Sorbonne University
Team : BD
https://lip6.fr/Clement.Caron

Supervision : Bernd AMANN

Co-supervision : CONSTANTIN Camelia

Provenance and Quality in Data Oriented Workflows : Application to the WebLab Platform

The WebLab platform is an application for defining and executing media-mining workflows. It is an open-source platform, developed by the IPCC section of Airbus Defence and Space, for the integration of external components. A designer can build complex media-mining workflows from components whose internal operation is not always known (black-box services). Such workflows can suffer from data quality problems, and before this work no tool existed to analyse and improve the quality of WebLab workflows.
To deal with black-box services, we tackle this quality problem with a non-intrusive approach: we enhance the definition of a WebLab workflow with provenance rules and quality propagation rules. After a WebLab workflow has been executed, the provenance rules generate fine-grained data dependency links between data and services. The quality propagation rules then use these links to reason about how the quality of the data used by a component influences the quality of its output data.
The contributions of this thesis are:

  1. a provenance link generation model based on data dependency rules;
  2. a propagation model for quality values over a provenance graph;
  3. an extension of the WebLab architecture with an implementation of our two models and a user interface.
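
The two models can be illustrated with a minimal sketch. It is not the rule language or the implementation of the thesis: the names ProvenanceLink, generate_links, propagate_quality and all_to_all, the service and data names, and the choice of min as the aggregation function are all assumptions made for illustration. The sketch only shows the general idea of deriving dependency links from recorded service calls, then pushing quality values along those links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceLink:
    source: str   # input data item
    service: str  # service call that consumed it (treated as a black box)
    target: str   # output data item that depends on the source


def all_to_all(inputs, outputs):
    """Hypothetical dependency rule: every output depends on every input."""
    return [(i, o) for i in inputs for o in outputs]


def generate_links(calls, dependency_rules):
    """Apply each service's dependency rule to its recorded call."""
    links = []
    for call in calls:
        rule = dependency_rules[call["service"]]
        for src, tgt in rule(call["inputs"], call["outputs"]):
            links.append(ProvenanceLink(src, call["service"], tgt))
    return links


def propagate_quality(links, known_quality, combine=min):
    """Propagate quality values along the links.

    Assumes the links are listed in execution order, so the quality of
    every source is known before its targets are visited.
    """
    quality = dict(known_quality)
    sources_of = {}
    for link in links:
        sources_of.setdefault(link.target, []).append(link.source)
    for target, sources in sources_of.items():
        values = [quality[s] for s in sources if s in quality]
        if values and target not in quality:
            quality[target] = combine(values)
    return quality


# Recorded execution of a two-step workflow (all names are made up).
calls = [
    {"service": "normaliser", "inputs": ["doc"], "outputs": ["text"]},
    {"service": "extractor", "inputs": ["text", "lexicon"], "outputs": ["entities"]},
]
rules = {"normaliser": all_to_all, "extractor": all_to_all}

links = generate_links(calls, rules)
quality = propagate_quality(links, {"doc": 0.8, "lexicon": 0.6})
```

Taking the minimum of the input qualities is a pessimistic choice in which an output is never considered better than its weakest input; other aggregation functions can be passed through the combine parameter.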

Defence : 11/03/2015

Jury members :

VIDAL Maria Esther, Université Simon Bolivar, Venezuela (PR, CV attaché) [Rapporteur]
GRIGORI Daniela, PR/HDR Université de Dauphine [Rapporteur]
VARGAS-SOLAR Genoveva, CR CNRS/HDR LIG Grenoble
MARSALA Christophe, PR UPMC (EDITE)
AMANN Bernd, PR UPMC (EDITE)
CONSTANTIN Camelia, MCF UPMC (EDITE)

Departure date : 12/31/2015

2013-2015 Publications