Semantic Services for Assisting Users to Augment Data in the Context of Analytic Data Sources
The production of analytic datasets is a significant big data trend and has gone well beyond the scope of traditional IT-governed dataset development. Analytic datasets are now created by data scientists and data analysts using big data frameworks and agile data preparation tools. However, it remains difficult for a data analyst to start from a dataset at hand and customize it with additional attributes coming from other existing datasets. In this thesis, we aim to assist users who want to augment the schema of analytic datasets with attributes coming from other semantically related datasets. We introduce attribute graphs as a novel, concise, and natural way to represent literal functional dependencies over hierarchical dimension-level types. We extend the notion of schema complement to analytic tables. We introduce several reduction operations to enforce schema complements when schema augmentation results in row multiplication in the augmented dataset. We define formal quality criteria and algorithms to control the correctness, non-ambiguity, and completeness of the generated schema augmentations and schema complements. We describe the implementation of our solution as a REST service within the SAP HANA platform and provide a detailed description of our algorithms. Finally, we evaluate the performance of our algorithms for computing unique identifiers in dimensions and analyze the effectiveness of our REST service on two application scenarios.
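The abstract mentions that schema augmentation can "result in row multiplication". The sketch below only illustrates that phenomenon and one generic way to undo it. It is not the thesis's algorithm. The tables (`sales`, `stores`), the helpers `augment` and `reduce_to_key`, and the choice of counting as the reduction are all hypothetical. Adding a `store` attribute through a join on `city` duplicates every row whose city has several stores, and this inflates the `amount` measure. Collapsing the rows back to one per original key `(city, year)` restores the original grain.

```python
from collections import defaultdict

# Hypothetical analytic table: one row per (city, year); "amount" is a measure.
sales = [
    {"city": "Paris", "year": 2019, "amount": 100},
    {"city": "Lyon", "year": 2019, "amount": 80},
]

# Hypothetical related dataset: a city may have several stores, so the
# functional dependency city -> store does NOT hold.
stores = [
    {"city": "Paris", "store": "S1"},
    {"city": "Paris", "store": "S2"},
    {"city": "Lyon", "store": "S3"},
]

def augment(table, other, on, attr):
    """Naive augmentation (hypothetical): equi-join `table` with `other` on
    column `on` and add column `attr`. Rows are multiplied whenever one value
    of `on` matches several values of `attr`."""
    index = defaultdict(list)
    for row in other:
        index[row[on]].append(row[attr])
    return [dict(row, **{attr: value}) for row in table for value in index[row[on]]]

def reduce_to_key(table, key, attr, reducer):
    """Hypothetical reduction: collapse rows back to one per `key` by applying
    `reducer` to the list of `attr` values; other columns come from the first row."""
    groups = {}
    for row in table:
        k = tuple(row[c] for c in key)
        if k not in groups:
            groups[k] = dict(row, **{attr: [row[attr]]})
        else:
            groups[k][attr].append(row[attr])
    for row in groups.values():
        row[attr] = reducer(row[attr])
    return list(groups.values())

augmented = augment(sales, stores, "city", "store")
print(len(sales), len(augmented))  # 2 3: the Paris row was duplicated

# Summing the measure over the augmented table double-counts Paris.
print(sum(r["amount"] for r in augmented))  # 280 instead of 180

# Reduce back to the original key; "store" now holds the number of stores.
reduced = reduce_to_key(augmented, ["city", "year"], "store", len)
print(reduced)
```

Counting is only one possible reducer. Passing `sorted` or a string-joining function instead would keep the store names while still restoring one row per `(city, year)`.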
Defence: 06/24/2020, 11:00, by videoconference
Jury members:
Ms. Bonifati Angela, Professor, LIRIS, Université Lyon 1 [Reviewer]
Mr. Maabout Sofian, Associate Professor, LaBRI, Université de Bordeaux [Reviewer]
Mr. Darmont Jérôme, Professor, Laboratoire ERIC, Université Lyon 2
Ms. Lesot Marie-Jeanne, Associate Professor, LIP6, Sorbonne Université
Mr. Amann Bernd, Professor, LIP6, Sorbonne Université
Mr. Gançarski Stéphane, Associate Professor, LIP6, Sorbonne Université