Team: MALIRE
Departure date: 09/30/2012
Supervision: Bernadette BOUCHON-MEUNIER
Co-supervision: Nicolas LABROCHE
Metadata for personalization and knowledge access
In recent years, several institutions and projects have focused on developing repositories of educational resources. One method proposed to improve search in these repositories, or on the web in general, is to enrich documents with metadata. Metadata provide information that helps identify a resource: author, publication date, title, etc. They can describe and locate resources and facilitate their discovery and use. Because producing metadata by hand is costly, there is a need for methods that are more efficient and less expensive than human annotation. Our objective in this thesis is therefore to provide methods for automatically extracting metadata from educational resources in order to minimize the human annotation effort.
In a first study, we explore the relationships between the different metadata fields, using supervised learning methods as well as association rule mining. This study confirms the hypothesis that some metadata fields can help annotate other fields. This approach is important because it is independent of the type and representation of the resource.
We then focus on extracting metadata such as the title and the author from the content of the resources. The extraction methods rely on statistical learning, text analysis, and the extraction of text properties such as style and layout. The proposed methods can give better results than methods relying on the META tags in the HTML source code of web pages. In our experiments, we also evaluate the influence of class imbalance on the classification results. To that end, we compare the results obtained after applying resampling techniques such as ENN (Edited Nearest Neighbours), NCL (Neighbourhood Cleaning Rule), and SMOTE (Synthetic Minority Over-sampling Technique).
In a final study, we propose a method to automatically describe educational resources with specific concepts. We distinguish two types of concepts: defined concepts and prerequisite concepts. This work was subsequently used to achieve automatic scheduling of educational resources.
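SMOTE, one of the resampling techniques named above, rebalances a training set by synthesizing new minority-class examples instead of duplicating existing ones. The following is a minimal Python sketch of the idea, not code from the thesis. It assumes feature vectors are plain lists of floats and that distances are Euclidean; the function name `smote` and its parameters are hypothetical. Unlike the original formulation, which generates a fixed number of synthetic points for each minority sample in turn, this sketch picks base samples at random.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Hypothetical SMOTE-style oversampler (illustrative sketch).

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns n_synthetic new vectors, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Precompute the indices of the k nearest minority neighbours of each sample
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))          # random base sample
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]  # one of its k nearest neighbours
        gap = rng.random()                        # interpolation factor in [0, 1)
        # New point: x + gap * (nn - x), i.e. somewhere between x and nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

For example, `smote([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]], n_synthetic=6, k=2, seed=0)` returns six new points, each lying between two of the given minority samples. ENN and NCL, by contrast, are cleaning methods: they remove (typically majority-class) examples whose label disagrees with most of their nearest neighbours.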
Defence: 05/03/2011, 14:00, Jussieu campus, Room Jean-Louis Laurière (25-26/101)
Jury members:
Ms. Bernadette Bouchon-Meunier, Research Director, CNRS
Mr. Nicolas Labroche, Associate Professor, UPMC
Mr. Bernd Amann, Professor, UPMC
Ms. Florence Sèdes, Professor, Université Paul Sabatier, Toulouse (Reviewer)
Mr. Bruno Crémilleux, Professor, Université de Caen (Reviewer)
Mr. Charles Tijus, Professor, Université Paris 8 (Examiner)
Ms. Monique Baron, Associate Professor, UPMC
S. Changuel, N. Labroche, B. Bouchon-Meunier: “Automatic Web Pages Author Extraction”, FQAS 2009, 8th International Conference on Flexible Query Answering Systems, Roskilde, Denmark, Lecture Notes in Computer Science, vol. 5822, pp. 300-311, Springer, 2009.