LIP6 1997/034

Reports
Système d'Apprentissage par Auto-Observation. Application au Jeu de Go (A System Learning by Self-Observation. Application to the Game of Go)
T. Cazenave
249 pages - 12/15/1997- document en - http://www.lip6.fr/lip6/reports/1997/lip6.1997.034.ps.tar.gz - 1,221 Ko
Contact : Tristan.Cazenave (at) lip6.fr
Former theme : APA

This thesis describes a system learning by self-observation, Introspect.
This system automatically creates, for a given domain, the knowledge needed to prune the search trees of that domain. It has mainly been applied to the game of Go, where it learns to prove tactical Go theorems. Gogol, the Go program whose tactical part was written by Introspect, ranks in the group of Go programs just behind the best four. It is the best learning Go program. The mechanisms described in this thesis made it possible to write, in one year, a Go program that performs well in international competitions, whereas the best Go programs have been refined over the last 15 or 20 years.
Introspect starts from a short, simple description of the goals it has to achieve and a set of rules describing the direct consequences of its actions. Using a few examples, it automatically specializes itself into another program that efficiently forecasts the long-term consequences of its actions on the achievement of the predefined goals.
Introspect uses a knowledge representation based on first-order logic. It represents its knowledge one way while learning and another way while using its learned rules.
When learning, it uses a general representation so that it can learn general rules from few examples. A logic compilation mechanism lets it match the learned rules rapidly. Moreover, in order to observe itself, it solves problems with a knowledge representation it can manipulate: it interprets the rules and memorizes their firings. Using this problem-solving trace, it can explain why it deduced an interesting fact, and it obtains a list of explaining facts. This list is then generalized into new rules by replacing facts containing instantiated variables with predicates containing variables.
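The generalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Introspect's actual code: the tuple encoding of facts, the function name `generalize`, the `keep` parameter, and the Go-flavoured predicate names (`stone`, `liberties`, `liberty_of`, `capturable`) are all hypothetical.

```python
def generalize(explaining_facts, conclusion, keep=frozenset()):
    """Turn ground facts into a rule by replacing constants with variables.

    Hypothetical encoding: a fact is a tuple (predicate, arg1, arg2, ...).
    Each distinct constant maps to one variable, so a constant shared by
    several facts stays linked in the rule. Constants listed in `keep`
    are left untouched.
    """
    mapping = {}

    def lift(term):
        if term in keep:
            return term
        if term not in mapping:
            mapping[term] = f"?v{len(mapping)}"
        return mapping[term]

    def lift_fact(fact):
        predicate, *args = fact
        return (predicate, *(lift(a) for a in args))

    premises = [lift_fact(f) for f in explaining_facts]
    return premises, lift_fact(conclusion)


# Explaining facts taken from an (invented) problem-solving trace.
facts = [
    ("stone", "s1", "black"),
    ("liberties", "s1", 1),
    ("liberty_of", "p5", "s1"),
]
premises, head = generalize(facts, ("capturable", "s1", "p5"), keep={1})
```

Here the string `s1` becomes the same variable `?v0` in every premise and in the conclusion, while the liberty count `1` is kept as a constant because it is assumed to matter to the deduction.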
When using the learned knowledge, Introspect no longer needs a general but costly knowledge representation: it partially evaluates some premises of the learned rules so as to match them more rapidly. It also compiles its rules into C++ programs for greater efficiency.
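As a rough illustration of partially evaluating rule premises, the Python sketch below resolves ahead of time every premise whose predicate is known statically, leaving specialized rules with fewer premises to match at run time. The abstract does not say which premises Introspect evaluates; the assumption here is that static knowledge such as board adjacency is a natural candidate. The names `match`, `substitute`, `partially_evaluate` and the predicates are hypothetical, and the sketch stays in Python rather than generating C++.

```python
def substitute(fact, binding):
    """Replace bound variables in a fact by their values."""
    return tuple(binding.get(t, t) for t in fact)


def match(premise, fact, binding):
    """Extend `binding` so that `premise` equals `fact`, or return None."""
    if len(premise) != len(fact):
        return None
    new = dict(binding)
    for p, f in zip(premise, fact):
        if isinstance(p, str) and p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def partially_evaluate(premises, conclusion, static_facts, static_predicates):
    """Resolve premises over static predicates in advance.

    Returns one specialized (premises, conclusion) rule per consistent
    way of satisfying the static premises.
    """
    bindings = [{}]
    dynamic = []
    for premise in premises:
        if premise[0] in static_predicates:
            bindings = [
                b2
                for b in bindings
                for fact in static_facts
                if (b2 := match(substitute(premise, b), fact, b)) is not None
            ]
        else:
            dynamic.append(premise)
    return [
        ([substitute(p, b) for p in dynamic], substitute(conclusion, b))
        for b in bindings
    ]


# Invented example: adjacency never changes during a game, so it is static.
static_facts = [("adjacent", "a1", "a2"), ("adjacent", "a1", "b1")]
rule_premises = [("adjacent", "?p", "?q"), ("empty", "?q"), ("stone_at", "?p")]
specialized = partially_evaluate(
    rule_premises, ("can_extend", "?p", "?q"), static_facts, {"adjacent"}
)
```

The trade-off is the usual one for specialization: one general rule becomes several ground rules, each cheaper to match because the static premise has disappeared.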
In the domain of games, an extension of Combinatorial Game Theory to unknown values is defined. It makes it possible to represent partial information about complex games, as well as information about subtrees of the total search tree.
The programs written by Introspect can be easily parallelized. The learning method is general and can be applied to domains other than the game of Go. I give some examples of applications to the game of Abalone and to the management of a firm. In these domains too, Introspect replaces part of the tree search with the firing of learned rules.

Keywords : Machine Learning, Self-observation, Generalization, Explanation, Compilation, Combinatorial Game Theory, Game of Go, Management
Publisher : Valerie.Mangin (at) lip6.fr
