LIU Rutian
Supervision: Bernd AMANN
Co-supervision: Stéphane GANÇARSKI
Semantic Services for Assisting Users to Augment Data in the Context of Analytic Data Sources
The production of analytic datasets is a significant big data trend and has gone well beyond the scope of traditional IT-governed dataset development. Analytic datasets are now created by data scientists and data analysts using big data frameworks and agile data preparation tools. However, it remains difficult for a data analyst to start from a dataset at hand and customize it with additional attributes coming from other existing datasets.
In this thesis, we aim to assist users who want to augment the schema of analytic datasets with attributes coming from other semantically related datasets. We introduce attribute graphs as a novel, concise, and natural way to represent literal functional dependencies over hierarchical dimension-level types. We extend the notion of schema complement to the context of analytic tables. We introduce several reduction operations to enforce schema complements when schema augmentation yields row multiplication in the augmented dataset. We define formal quality criteria and algorithms to control the correctness, non-ambiguity, and completeness of generated schema augmentations and schema complements. We describe the implementation of our solution as a REST service within the SAP HANA platform and provide a detailed description of our algorithms. Finally, we evaluate the performance of our algorithms for computing unique identifiers in dimensions and analyze the effectiveness of our REST service using two application scenarios.
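The row multiplication mentioned in the abstract can be illustrated with a small sketch. It is not the thesis's algorithm or its SAP HANA implementation: the tables, attribute names, and the helper functions `left_join` and `reduce_by_aggregation` are hypothetical, and counting is only one possible reduction. The sketch shows how joining on a key that does not functionally determine the new attribute duplicates rows and inflates a measure, and how aggregating the new attribute per key beforehand avoids this.

```python
from collections import defaultdict

# Hypothetical analytic table: one row per store with a revenue measure.
sales = [
    {"store": "S1", "revenue": 100},
    {"store": "S2", "revenue": 250},
]

# Hypothetical related dataset: a store may have several managers,
# so "store" does not functionally determine "manager".
managers = [
    {"store": "S1", "manager": "Ana"},
    {"store": "S1", "manager": "Bo"},
    {"store": "S2", "manager": "Cy"},
]

def left_join(left, right, key):
    """Naive left outer join on a single key attribute."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    out = []
    for row in left:
        # Keep the left row even when no match exists on the right.
        matches = index.get(row[key]) or [{}]
        for match in matches:
            out.append({**row, **match})
    return out

def reduce_by_aggregation(rows, key, attr, out_attr, agg):
    """Collapse rows to one row per key value by aggregating attr."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[attr])
    return [{key: k, out_attr: agg(vals)} for k, vals in groups.items()]

# Augmenting directly multiplies the rows of store S1.
naive = left_join(sales, managers, "store")
naive_total = sum(r["revenue"] for r in naive)

# Reducing first yields exactly one row per store, so the join is safe.
reduced = reduce_by_aggregation(managers, "store", "manager", "manager_count", len)
safe = left_join(sales, reduced, "store")
safe_total = sum(r["revenue"] for r in safe)
```

In this toy data the direct join produces three rows for two stores, so summing `revenue` over the result double-counts store S1, whereas the reduced join keeps one row per store and preserves the original total.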
Defence: 06/24/2020
Jury members:
Ms. Bonifati Angela, Professor, LIRIS, Université Lyon 1 [Reviewer]
Mr. Maabout Sofian, Associate Professor, LaBRI, Université de Bordeaux [Reviewer]
Mr. Darmont Jérôme, Professor, Laboratoire ERIC, Université Lyon 2
Ms. Lesot Marie-Jeanne, Associate Professor, LIP6, Sorbonne Université
Mr. Amann Bernd, Professor, LIP6, Sorbonne Université
Mr. Gançarski Stéphane, Associate Professor, LIP6, Sorbonne Université
2019-2023 Publications
2023
- E. Simon, B. Amann, R. Liu, S. Gançarski: “Controlling the Correctness of Aggregation Operations During Sessions of Interactive Analytic Queries”, Journal of Data and Information Quality, (ACM) (2023)
2022
- E. Simon, B. Amann, R. Liu, S. Gançarski: “Controlling the Correctness of Aggregation Operations During Sessions of Interactive Analytic Queries”, (2022)
2020
- R. Liu: “Semantic Services for Assisting Users to Augment Data in the Context of Analytic Data Sources”, thesis, PhD defence 06/24/2020, supervision: Amann, Bernd, co-supervision: Gançarski, Stéphane (2020)
- R. Liu, E. Simon, B. Amann, S. Gançarski: “Discovering and merging related analytic datasets”, Information Systems, vol. 91, pp. 101495, (Elsevier) (2020)
2019
- R. Liu, E. Simon, B. Amann, S. Gançarski: “Augmenting Analytic Datasets Using Natural and Aggregate-based Schema Complements”, Post-actes BDA 2019 - Gestion de Données Principes Technologies et Applications, Lyon, France (2019)