LIP6 1998/048

  Thesis
    Construction d'ontologies à partir de textes techniques - application aux systèmes documentaires
  H. Assadi
  286 pages - 10/19/1998
  • Our thesis deals with the problem of acquiring domain ontologies from technical texts. We define the "annotated regional ontology": a conceptual network describing a particular domain, in which concepts are connected to linguistic expressions and to the corpus from which they were built. We propose a methodology and tools for constructing regional ontologies from technical documentation. Our proposal is based on principles from F. Rastier's theory of differential semantics.
    Our methodology, called "Interactive Conceptual Analysis" (ICA), puts the technical documentation at the core of the knowledge acquisition process and uses text analysis tools. The ICA takes place in two stages: a preliminary elicitation stage, called "macroscopic analysis", and an iterative refinement stage, called "microscopic analysis". The ICA effectively takes into account the human factor, represented by the team formed by the domain expert and the knowledge engineer. Our methodology is fully corpus-based: it does not need any external conceptual resource.
    We developed support tools for the ICA: (1) lexiclass, which automatically clusters linguistic expressions according to the syntactic relations they hold in the text; (2) the "conceptual structures generation" tools, which use both the results of the preliminary morpho-syntactic analysis and the current version of the ontology to propose new candidate conceptual structures to be added to the ontology.
    Our thesis work was carried out at the Research and Development Division of Electricité de France, within a project dealing with "Technical Documentation Consultation Systems" (TDCS). A TDCS is presented as a hypertext allowing context-based access to the technical documentation of a given domain, via two structured indexes, one representing the domain and the other the tasks. A preliminary knowledge engineering process is needed to build the conceptual models from which the indexes are derived. Our methodology and tools have been used in a project to build a TDCS in the domain of electrical network planning.
  • Keywords: knowledge engineering, natural language processing, knowledge representation, hypertext, semantics
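
  To illustrate two ideas from the abstract (concepts annotated with linguistic expressions and corpus occurrences, and the clustering of expressions by the syntactic relations they hold), here is a minimal Python sketch. It is not the thesis's implementation: the `Concept` class, the `group_by_shared_context` function, the (expression, relation, governor) triple format, and the sample data are all hypothetical, and the exact-match grouping is a naive stand-in for the clustering that lexiclass performs.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical node of the conceptual network, annotated with its textual evidence."""
    label: str
    expressions: set = field(default_factory=set)    # linguistic expressions denoting the concept
    occurrences: list = field(default_factory=list)  # (document id, sentence index) pairs in the corpus
    related: dict = field(default_factory=dict)      # relation name -> set of related concept labels


def group_by_shared_context(relations):
    """Group expressions that occur in the same syntactic context.

    `relations` is an iterable of (expression, relation, governor) triples,
    assumed here to come from a prior morpho-syntactic analysis.
    """
    groups = defaultdict(set)
    for expression, relation, governor in relations:
        groups[(relation, governor)].add(expression)
    # Keep only the contexts shared by at least two expressions.
    return {ctx: exprs for ctx, exprs in groups.items() if len(exprs) > 1}


# Invented sample data from an electrical network planning vocabulary.
triples = [
    ("overhead line", "object_of", "reinforce"),
    ("underground cable", "object_of", "reinforce"),
    ("transformer", "object_of", "replace"),
]
candidates = group_by_shared_context(triples)

# A candidate group is attached to a concept, together with a corpus reference.
line = Concept(label="Line")
line.expressions.update(candidates[("object_of", "reinforce")])
line.occurrences.append(("doc-1", 12))
```

  In this sketch the grouping only proposes candidates; deciding whether a group really corresponds to a concept remains an interactive step, in keeping with the role the abstract gives to the expert and the knowledge engineer.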
