Team : ACASA
Departure date : 10/30/2014
Supervision : Jean-Gabriel GANASCIA
On the Enumeration of Pseudo-Intents: Choosing the Order and Extending to Partial Implications
This thesis deals with the computation of implications, which are regularities of the form "when there is A there is B", in datasets composed of objects described by attributes. Computing a complete, non-redundant set of implications (the Duquenne-Guigues basis) amounts to enumerating sets of attributes called pseudo-intents. It is known that, unless P = NP, pseudo-intents cannot be enumerated with polynomial delay in the lectic order, but no such result exists for other orders. Some current algorithms do not enumerate in the lectic order, yet none of them has polynomial delay. Because so little is known about other orders, a polynomial-delay algorithm may still exist, and finding one would be an important and useful step. Unfortunately, current algorithms do not let the user choose the order, which makes studying its influence on the complexity of the enumeration harder than necessary. We therefore take a first step towards a better understanding of the role of the order in the enumeration of pseudo-intents by providing an algorithm that can enumerate pseudo-intents in any order that respects the inclusion relation.
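To make the definition concrete, here is a minimal brute-force sketch in Python on a hypothetical three-object context; it is not the algorithm of the thesis and is exponential in the number of attributes. It uses the standard definition: a set P is a pseudo-intent if P ≠ P'' and Q'' ⊆ P for every pseudo-intent Q strictly contained in P. Visiting subsets by increasing size is one order that respects inclusion, so every smaller pseudo-intent is already known when P is tested. The names `CONTEXT`, `extent`, `closure` and `pseudo_intents` are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy context: each object mapped to the attributes it has.
CONTEXT = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"c"},
}
ATTRIBUTES = sorted(set().union(*CONTEXT.values()))


def extent(attrs):
    """A': the objects that have every attribute in attrs."""
    return [o for o, row in CONTEXT.items() if attrs <= row]


def closure(attrs):
    """A'': attributes shared by all objects in A' (all attributes if A' is empty)."""
    objs = extent(attrs)
    if not objs:
        return frozenset(ATTRIBUTES)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def pseudo_intents():
    """Brute-force enumeration of pseudo-intents by increasing size.

    Increasing size is a linear extension of set inclusion, so every
    pseudo-intent strictly contained in P is already in `found` when P is
    tested.
    """
    found = []
    for k in range(len(ATTRIBUTES) + 1):
        for combo in combinations(ATTRIBUTES, k):
            p = frozenset(combo)
            if p == closure(p):
                continue  # P is an intent, hence not a pseudo-intent
            if all(closure(q) <= p for q in found if q < p):
                found.append(p)
    return found


for p in pseudo_intents():
    print(sorted(p), "->", sorted(closure(p) - p))
```

Each pseudo-intent P yields the implication P → P'' \ P; on this context the sketch finds the two pseudo-intents {a} and {b}, i.e. the implications a → b and b → a.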
In the first part, we present our algorithm and study its properties. As with any enumeration algorithm, the first problem is to construct each set exactly once. We propose to use a spanning tree, itself based on the lectic order, to avoid constructing the same set several times. Using this spanning tree instead of the classic lectic order increases the space complexity but offers much more flexibility in the choice of the enumeration order. We show that, compared to the well-known Next Closure algorithm, ours performs fewer logical closures on sparse contexts and more once the average number of attributes per object exceeds 30% of the total number of attributes. We also study the space complexity of the algorithm empirically and show that different orders behave differently, the lectic order being the most efficient. We postulate that the efficiency of an order is a function of its distance to the order used in the canonicity test.
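For comparison, the Next Closure baseline can be sketched as Ganter's classic algorithm computing the Duquenne-Guigues basis in lectic order. This is a minimal Python sketch of that well-known algorithm, not the spanning-tree algorithm of the thesis; the toy context and the helper names are hypothetical. `pseudo_closure` is the logical closure operator L• (it saturates a set with the conclusions of implications whose premise is a proper subset of it), the `all(...)` line is the canonicity test, and `stats` counts the logical closures performed, which is the quantity used in the comparison above.

```python
# Hypothetical toy context: each object mapped to the attributes it has.
CONTEXT = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"c"},
}
ATTRS = sorted(set().union(*CONTEXT.values()))  # fixed order a < b < c


def closure(attrs):
    """Galois closure A'' computed from the context."""
    objs = [o for o, row in CONTEXT.items() if attrs <= row]
    if not objs:
        return frozenset(ATTRS)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def pseudo_closure(x, basis, stats):
    """Logical closure L-bullet: add the conclusion of every implication
    whose premise is a *proper* subset of the current set, until stable."""
    stats["logical_closures"] += 1
    x = set(x)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in basis:
            if premise < x and not conclusion <= x:
                x |= conclusion
                changed = True
    return frozenset(x)


def next_closure(a, basis, stats):
    """Lectically next L-bullet-closed set after a, or None after the last one."""
    a = set(a)
    for i in range(len(ATTRS) - 1, -1, -1):
        m = ATTRS[i]
        if m in a:
            a.remove(m)
        else:
            b = pseudo_closure(a | {m}, basis, stats)
            # Canonicity test: b may not add any attribute smaller than m.
            if all(ATTRS.index(x) >= i for x in b - a):
                return b
    return None


def duquenne_guigues():
    """Next Closure for the Duquenne-Guigues basis; returns (basis, #closures)."""
    stats = {"logical_closures": 0}
    basis = []
    a = frozenset()  # the empty set is always L-bullet-closed
    while a is not None:
        closed = closure(a)
        if a != closed:
            basis.append((a, closed))  # a is a pseudo-intent: a -> a''
        a = next_closure(a, basis, stats)
    return basis, stats["logical_closures"]


basis, n = duquenne_guigues()
for premise, conclusion in basis:
    print(sorted(premise), "->", sorted(conclusion - premise))
print("logical closures:", n)
```

Each call to `next_closure` may test several candidates before one passes the canonicity test, which is why the number of logical closures (7 here) exceeds the number of sets visited.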
In the second part, we turn to the computation of implications in a more complex setting: relational data. In these contexts, objects are described both by attributes and by binary relations with other objects. Representing relational information causes an exponential increase in the number of attributes, so naive algorithms quickly become unusable. We propose a modification of our algorithm that enumerates the pseudo-intents of contexts in which relational information is represented by attributes. We also briefly study which types of relational information can be handled this way. Description logics serve as the framework for expressing relational data.
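The attribute blow-up can be illustrated with a small Python sketch of one possible encoding, assumed here for illustration only: each binary relation r is turned into description-logic-style attributes "exists r.C", one per conjunction C of base attributes, and an object receives "exists r.C" when one of its r-successors has every attribute of C. The data, the label syntax and the function name `scale_relation` are hypothetical, and the encoding used in the thesis may differ.

```python
from itertools import combinations

# Hypothetical relational context: base attributes per object, plus one
# binary relation "r" given as a set of (source, target) pairs.
ATTRIBUTES = {
    "x": {"A"},
    "y": {"B"},
    "z": {"A", "B"},
}
RELATION_R = {("x", "y"), ("y", "z")}


def scale_relation(attributes, relation, role="r"):
    """Add one attribute 'exists role.C' per conjunction C of base attributes.

    An object gets 'exists r.C' when some r-successor has every attribute
    in C. With n base attributes this creates 2**n candidate attributes
    (the empty conjunction, written TOP, included), before any nesting.
    """
    base = sorted(set().union(*attributes.values()))
    conjunctions = [frozenset(c) for k in range(len(base) + 1)
                    for c in combinations(base, k)]
    scaled = {o: set(attrs) for o, attrs in attributes.items()}
    for source, target in relation:
        for conj in conjunctions:
            if conj <= attributes[target]:
                label = f"exists {role}." + ("&".join(sorted(conj)) or "TOP")
                scaled[source].add(label)
    return scaled, len(conjunctions)


scaled, n = scale_relation(ATTRIBUTES, RELATION_R)
print("relational attributes created:", n)
for obj in sorted(scaled):
    print(obj, sorted(scaled[obj]))
```

With two base attributes the sketch already creates four relational attributes for a single relation; with n base attributes it creates 2^n, and nesting restrictions (exists r.exists r.C, and so on) makes the count grow further.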
In the third part, we extend our work to the more general setting of association rules. Association rules are regularities of the form "when there is A there is B with x% confidence", so implications are association rules with 100% confidence. Since our algorithm already computes a basis for implications, the Duquenne-Guigues basis, we propose a very simple modification and show that it can compute the Luxenburger basis of a context along with the Duquenne-Guigues basis. Our algorithm can thus compute a basis for association rules of minimal cardinality.
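A minimal Python sketch of the approximate rules, assuming the usual definitions: the confidence of A → B is |(A ∪ B)'| / |A'|, and the Luxenburger basis is taken here in its transitively reduced form, with one rule B1 → B2 \ B1 for each pair of intents where B2 covers B1 (no intent lies strictly between them). Intents are found by brute force on a hypothetical context, and rules with zero support are skipped; this is not the modification proposed in the thesis, which obtains these rules during the enumeration. Exact rules (confidence 1) are left out because the Duquenne-Guigues basis already covers them.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy context: each object mapped to the attributes it has.
CONTEXT = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"c"},
}
ATTRS = sorted(set().union(*CONTEXT.values()))


def extent(attrs):
    """A': the objects that have every attribute in attrs."""
    return [o for o, row in CONTEXT.items() if attrs <= row]


def closure(attrs):
    """A'': attributes shared by all objects in A' (all attributes if A' is empty)."""
    objs = extent(attrs)
    if not objs:
        return frozenset(ATTRS)
    return frozenset(set.intersection(*(CONTEXT[o] for o in objs)))


def confidence(premise, conclusion):
    """conf(A -> B) = |(A u B)'| / |A'| as an exact fraction."""
    return Fraction(len(extent(premise | conclusion)), len(extent(premise)))


def luxenburger_basis():
    """Rules B1 -> B2 \\ B1 for every covering pair of intents B1 < B2."""
    intents = {closure(frozenset(c))
               for k in range(len(ATTRS) + 1) for c in combinations(ATTRS, k)}
    rules = []
    for low in intents:
        for high in intents:
            if not low < high:
                continue
            # Keep covering pairs only: no intent strictly between low and high.
            if any(low < mid < high for mid in intents):
                continue
            if not extent(high):
                continue  # skip rules with zero support
            rules.append((low, high - low, confidence(low, high - low)))
    return sorted(rules, key=lambda r: (len(r[0]), sorted(r[0]), sorted(r[1])))


for premise, conclusion, conf in luxenburger_basis():
    print(sorted(premise), "->", sorted(conclusion), conf)
```

On this context the intents are {}, {c}, {a, b} and {a, b, c}, giving four rules with confidences 2/3, 2/3, 1/2 and 1/2.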
Defense : 09/30/2014 - 14h30 - Site Jussieu 25-26/105
Jury members :
Karell Bertet, Associate Professor, Université de La Rochelle [Reviewer]
Henry Soldano, Associate Professor, Université Paris-Nord [Reviewer]
Jean-Gabriel Ganascia, Professor, UPMC
Alain Gély, Associate Professor, Université de Lorraine
Annick Valibouze, Professor, UPMC