LIP6 1997/003

  • Thesis
    Une approche de la catégorisation de textes par l'apprentissage symbolique
  • I. Moulinier
  • 192 pages - 04/30/1997 - document en - http://www.lip6.fr/lip6/reports/1997/lip6.1997.003.ps.gz - 544 KB
  • Contact : Isabelle.Moulinier (at) lip6.fr
  • Former theme : APA
  • Our aim in this dissertation is to assess whether classification, especially symbolic machine learning, may be applied to the text categorization task, i.e. the content-based assignment of categories to documents. We focus on two complementary aspects. First, we investigate the extent to which learning techniques provide solutions for Information Retrieval problems, with an emphasis on document filtering. Then, we stress that textual data possess specificities outside the usual scope of machine learning applications. Indeed, such data involve thousands of examples and tens of thousands of features. For the sake of computational efficiency, we introduce a feature selection stage prior to the learning process. We thus propose the SCAR reduction method, which takes into account the specificities of textual data. We compare the SCAR method with two state-of-the-art approaches. Evaluation is carried out on a large collection: the Reuters-22,173 corpus. Finally, we study the relationships between learning bias and textual data, in the context of filtering applications. We observe an overall equivalence between all learners. However, learning bias turns out to have a real impact on effectiveness, depending on the class of problems we identified.
  • Keywords : Machine learning, text categorization, feature reduction, information retrieval
  • Publisher : Valerie.Mangin (at) lip6.fr