Team : LFI
Departure date : 08/31/2015
Supervision : Marie-Jeanne LESOT
Co-supervision : Maria RIFQI
Data mining based on gradual itemsets extraction : contextualization and enrichment
This thesis belongs to the framework of knowledge extraction and data mining applied to numerical or fuzzy data, with the aim of extracting linguistic summaries in the form of gradual itemsets: the latter express correlations between attribute values, of the form « the more the temperature increases, the more the pressure increases ». Our goal is to contextualize and enrich these gradual itemsets by proposing different types of additional information, so as to increase their quality and provide a better interpretation. We propose four types of new itemsets. First, reinforced gradual itemsets, in the case of fuzzy data, perform a contextualization by integrating additional attributes, linguistically introduced by the expression « all the more ». They can be illustrated by the example « the more the temperature decreases, the more the volume of air decreases, all the more its density increases ». Reinforcement is interpreted as increased validity of the gradual itemset. In addition, we study the extension of the concept of reinforcement to association rules, discussing their possible interpretations and showing their limited contribution. We then propose to process the contradictory itemsets that arise, for example, in the case of simultaneous extraction of « the more the temperature increases, the more the humidity increases »
and « the more the temperature increases, the more the humidity decreases ». To manage these contradictions, we define a constrained variant of the gradual itemset support which depends not only on the considered itemset, but also on its potential contradictors. We also propose two extraction methods: the first one filters the itemsets after all of them have been generated, and the second one integrates the filtering process within the generation step.
We introduce characterized gradual itemsets, defined by adding a clause linguistically introduced by the expression « especially if », which can be illustrated by a sentence such as « the more the temperature decreases, the more the humidity decreases, especially if the temperature varies in [0, 10] °C »: the additional clause specifies value ranges on which the validity of the itemset is increased. We formalize the quality of this enrichment as a trade-off between two constraints imposed on the identified interval, namely high validity and large size, as well as an extension taking into account the data density. We propose a method to automatically extract characterized gradual itemsets, based on mathematical morphology tools and the definition of an appropriate filter and transcription.
We also define accelerated gradual itemsets, which quantify the correlations between the attribute values and contextualize the gradual itemset through the linguistic expression « quickly », for example « the more the temperature increases, the more quickly the humidity increases ». We propose an interpretation as a convexity constraint imposed on the relation between the attributes composing the considered gradual itemset, which we model as an additional covariation constraint expressed in the same formalism as the constraints of classical gradual itemsets.
We propose and study two extraction methods, based respectively on a posteriori filtering and on integration of the constraint into the generation process. For each of the four proposed contextualizations, we study and formalize the semantics and the desired interpretation. We then propose quality measures to evaluate the validity of a given enriched itemset. We also propose and implement efficient algorithms for the automatic extraction of the itemsets that maximize the proposed quality criteria. Finally, we carry out experimental studies both on artificial data, to analyze the behavior of the proposed approaches, and on real data, to show their relevance and the interest of the extracted enriched itemsets.
The experimental results obtained for each approach validate the contribution of the different proposed gradual itemsets and their associated interpretations.
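As an illustration of the notions of support and contradiction mentioned in this abstract, the following minimal Python sketch may help. It is not the method of the thesis: it assumes a pair-based support (the fraction of object pairs that respect all the variations of an itemset), which is only one of several definitions found in the gradual itemset literature, and it assumes a simple a posteriori filter that keeps an itemset only when its support reaches a threshold and exceeds that of its contradictor. The names pair_support, contradictor and filter_contradictions, as well as the toy data, are hypothetical.

```python
from itertools import combinations

# A gradual item is a pair (attribute, direction):
# +1 stands for "the more", -1 stands for "the less".
# A gradual itemset is a tuple of such items.


def pair_support(rows, itemset):
    """Pair-based support (an assumed definition, not necessarily the thesis's).

    Returns the fraction of unordered object pairs that can be ordered so
    that every attribute strictly varies in the direction the itemset states.
    """
    def respects(a, b):
        # Going from object a to object b, each attribute must vary
        # strictly in its stated direction.
        return all((b[attr] - a[attr]) * d > 0 for attr, d in itemset)

    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    concordant = sum(1 for a, b in pairs if respects(a, b) or respects(b, a))
    return concordant / len(pairs)


def contradictor(itemset):
    """Same itemset with the direction of its last item reversed."""
    *head, (attr, d) = itemset
    return tuple(head) + ((attr, -d),)


def filter_contradictions(rows, candidates, min_support):
    """Hypothetical a posteriori filter.

    Keeps an itemset only if it is frequent and strictly better
    supported than its contradictor.
    """
    kept = []
    for itemset in candidates:
        s = pair_support(rows, itemset)
        s_contra = pair_support(rows, contradictor(itemset))
        if s >= min_support and s > s_contra:
            kept.append(itemset)
    return kept


# Toy data, invented for illustration only.
rows = [
    {"temp": 10, "hum": 30},
    {"temp": 15, "hum": 40},
    {"temp": 20, "hum": 55},
    {"temp": 25, "hum": 50},
]
up = (("temp", +1), ("hum", +1))  # the more temp, the more hum
print(pair_support(rows, up))
print(filter_contradictions(rows, [up, contradictor(up)], min_support=0.5))
```

On this toy data, five of the six object pairs follow « the more the temperature increases, the more the humidity increases » and one follows its contradictor, so only the first itemset passes the filter. A filter integrated into the generation step would apply the same test while candidates are being built, rather than after all of them have been generated.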
Defence : 07/09/2014 - 14h - Site Jussieu 25-26/105
Jury members :
Anne Laurent, Professeur LIRMM - Université Montpellier 2, [Rapporteur]
Olivier Pivert, Professeur ENSSAT - Université Rennes 1, [Rapporteur]
Bernd Amann, Professeur LIP6-UPMC
Sadok Ben Yahia, Professeur URPAH-Université des Sciences de Tunis
Marie-Jeanne Lesot, Maître de Conférences [HDR] LIP6-UPMC
Maria Rifqi, Maître de Conférences [HDR] LEMMA, Université Paris 2