Graphical Representations of Functions and Factored Markovian Decision Processes
In decision theoretic planning, the factored framework (Factored Markovian Decision Process, FMDP) has produced several efficient algorithms in order to resolve large sequential decision making under uncertainty problems. The efficiency of this algorithms relies on data structures such as decision trees or algebraïc decision diagrams (ADDs). These planification technics are exploited in Reinforcement Learning by the architecture SDyna in order to resolve large and unknown problems. However, state-of-the-art learning and planning algorithms used in SDyna require the problem to be specified uniquely using binary variables and/or to use improvable data structure in term of compactness. In this book, we present our research works that seek to elaborate and to use a new data structure more efficient and less restrictive, and to integrate it in a new instance of the SDyna architecture. In a first part, we present the state-of-the-art modeling tools used in the algorithms that tackle large sequential decision making under uncertainty problems. We detail the modeling using decision trees and ADDs. Then we introduce the Ordered and Reduced Graphical Representation of Function, a new data structure that we propose in this thesis to deal with the various problems concerning the ADDs. We demonstrate that ORGRFs improve on ADDs to model large problems. In a second part, we go over the resolution of large sequential decision under uncertainty problems using Dynamic Programming. After the introduction of the main algorithms, we see in details the factored alternative. We indicate the improvable points of these factored versions. We describe our new algorithm that improve on these points and exploit the ORGRFs previously introduced. In a last part, we speak about the use of FMDPs in Reinforcement Learning. Then we introduce a new algorithm to learn the new datastrcture we propose. Thanks to this new algorithm, a new instance of the SDyna architecture is proposed, based on the ORGRFs : the SPIMDDI instance. We test its efficiency on several standard problems from the litterature. Finally, we present some works around this new instance. We detail a new algorithm for efficient exploration-exploitation compromise management, aiming to simplify F-RMax. Then we speak about an application of SPIMDDI to the managements of units in a strategic real time video game.
Defence : 02/02/2016 - 10h - Site Jussieu 55-65/211 Jury members : M. Régis Sabbadin, Directrice de recherche, INRIA, Toulouse [Rapporteur]
M. Abdel-Illah Mouaddib, Professeur, GREYC, Université de Caen [Rapporteur]
M. Olivier Sigaud, Professeur, ISIR, UPMC
M. Florent Teichteil-Koenigsbuch, Docteur, Airbus Group Innovations, Toulouse
M. Christophe Gonzales, Professeur, LIP6, UPMC
M. Pierre-Henri Wuillemin, Maitre de Conférence, LIP6, UPMC