Supervision : Bernd AMANN
Co-supervision : Camelia CONSTANTIN
Provenance and Quality in Data Oriented Workflows : Application to the WebLab Platform
The WebLab platform is an application used to define and execute media-mining workflows. It is an open-source platform, developed by the IPCC section of Airbus Defence and Space, for the integration of external components. A designer can build complex media-mining workflows from components whose internal operation is not always known (black-box services). Such workflows can suffer from data quality problems, and before this work no tool existed to analyse and improve the quality of WebLab workflows.
To deal with black-box services, we tackle this quality problem with a non-intrusive approach: we extend the definition of a WebLab workflow with provenance rules and quality propagation rules. Provenance rules generate fine-grained dependency links between data and services after the execution of a WebLab workflow. The quality propagation rules then use these links to reason about how the quality of the data consumed by a component influences the quality of its output data.
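As an illustration only, the idea of propagating quality values along provenance links can be sketched in Python. This is a minimal sketch under stated assumptions, not the model defined in the thesis: the function name `propagate_quality`, the data identifiers, the quality scale in [0, 1], and the "weakest input" rule (an output is only as good as its worst input) are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set


def propagate_quality(
    links: Dict[str, List[str]],
    known: Dict[str, float],
) -> Dict[str, float]:
    """Infer quality values for derived data from provenance links.

    links: maps each derived data item to the items it depends on
           (hypothetical stand-in for fine-grained provenance links).
    known: quality values already measured for some items.
    Assumed rule: a derived item takes the minimum quality of its inputs.
    """
    quality = dict(known)
    in_progress: Set[str] = set()

    def resolve(item: str) -> float:
        if item in quality:
            return quality[item]
        if item in in_progress:
            raise ValueError(f"cyclic provenance link at {item!r}")
        inputs = links.get(item)
        if not inputs:
            raise KeyError(f"no quality value and no provenance for {item!r}")
        in_progress.add(item)
        value = min(resolve(source) for source in inputs)
        in_progress.discard(item)
        quality[item] = value
        return value

    for derived in links:
        resolve(derived)
    return quality


# Hypothetical workflow: a text is translated, then entities are extracted
# from the translation with the help of a gazetteer.
example_links = {
    "translated_text": ["source_text"],
    "entities": ["translated_text", "gazetteer"],
}
example_known = {"source_text": 0.9, "gazetteer": 0.6}
example_result = propagate_quality(example_links, example_known)
```

The services themselves are never inspected here, which mirrors the non-intrusive intent: only the links between data items and the quality values attached to them are used. A different propagation rule (an average, or a rule specific to one service) would replace the `min` call.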
The contributions of this thesis are:
- a provenance links generation model based on data dependency rules;
- a propagation model for quality values over a provenance graph;
- an extension of the WebLab architecture that implements both models, together with a user interface.
Defence : 11/03/2015
Jury members :
VIDAL Maria Esther, Université Simon Bolivar, Venezuela (PR, CV attached) [Rapporteur]
GRIGORI Daniela, PR/HDR Université de Dauphine [Rapporteur]
VARGAS-SOLAR Genoveva, CR CNRS/HDR LIG Grenoble
MARSALA Christophe, PR UPMC (EDITE)
AMANN Bernd, PR UPMC (EDITE)
CONSTANTIN Camelia, MCF UPMC (EDITE)
- C. Caron : “Provenance et Qualité dans les Workflows Orientés Données : Application à la Plateforme WebLab”, thesis, defence 11/03/2015, supervision Amann, Bernd, co-supervision : Constantin, Camelia (2015)
- C. Caron, B. Amann, C. Constantin, P. Giroux, A. Santanchè : “Provenance-Based Quality Assessment and Inference in Data-Centric Workflow Executions”, OTM 2014 Conferences - Confederated International Conferences: CoopIS, and ODBASE 2014, vol. 8841, Lecture Notes in Computer Science, Amantea, Italy, pp. 130-147 (2014)
- C. Caron, B. Amann, C. Constantin, P. Giroux : “WePIGE: The WebLab Provenance Information Generator and Explorer”, 17th International Conference on Extending Database Technology, EDBT 2014, Athens, Greece, pp. 664-667 (2014)
- B. Amann, C. Constantin, C. Caron, P. Giroux : “WebLab PROV: Computing fine-grained provenance links for XML artifacts”, BIGProv'13 Workshop (in conjunction with EDBT/ICDT), Genoa, Italy, pp. 298-306 (ACM) (2013)