Supervision : Bernadette BOUCHON-MEUNIER
Co-supervision : LABROCHE Nicolas
Metadata for personalization and knowledge access
In recent years, several institutions and projects have focused on developing repositories of educational resources. One method proposed to improve search in these repositories, and on the web in general, is to enrich documents with metadata. Metadata provide information that helps identify a resource: author, date of publication, title, etc. They can describe and locate resources and facilitate their discovery and use. This creates a need for metadata production methods that are more efficient and less expensive than those involving humans. Hence, our objective in this thesis is to provide methods for automatically extracting metadata from educational resources in order to minimize the human annotation effort.
In a first study, we explore the relationships between the different metadata fields, using supervised learning methods as well as association rule generation methods. This study confirms the hypothesis that some metadata fields can contribute to the annotation of other fields. This approach is important because it is independent of the type and representation of the resource.
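As an illustration of how association rules can relate metadata fields, here is a minimal sketch that computes support and confidence for rules of the form "field A = x implies field B = y". The records, field names, thresholds and the function name `mine_rules` are hypothetical, and the sketch is not the method used in the thesis.

```python
from collections import Counter
from itertools import permutations

# Hypothetical metadata records; fields and values are illustrative only.
records = [
    {"type": "exercise", "level": "beginner"},
    {"type": "exercise", "level": "beginner"},
    {"type": "lecture", "level": "advanced"},
    {"type": "exercise", "level": "beginner"},
    {"type": "lecture", "level": "beginner"},
]


def mine_rules(records, min_support=0.3, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) tuples.

    Each antecedent and consequent is a single (field, value) pair.
    """
    n = len(records)
    item_counts = Counter()   # how often each (field, value) occurs
    pair_counts = Counter()   # how often an ordered pair of items co-occurs
    for rec in records:
        items = sorted(rec.items())
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))

    rules = []
    for (antecedent, consequent), count in pair_counts.items():
        support = count / n                          # P(A and B)
        confidence = count / item_counts[antecedent]  # P(B | A)
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


rules = mine_rules(records)
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "=>", consequent, support, confidence)
```

A rule that passes both thresholds suggests that the value of one field could help annotate another, which is the kind of dependency the study investigates.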
We then focus on extracting metadata, such as the title and author, from the content of the resources. The extraction methods are based on statistical learning, text analysis, and the extraction of text properties such as style and layout. The proposed methods can give better results than those relying on the META tags in the source code of HTML pages. In our experiments, we also evaluate the influence of class imbalance on the classification results. To that end, we compare the results obtained after applying resampling techniques such as ENN, NCL and SMOTE.
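To give an idea of what resampling does, the sketch below shows the core idea of SMOTE only: each synthetic minority example is an interpolation between a minority sample and one of its nearest minority neighbours. The function name, parameters and data are assumptions made for illustration; this is not the thesis implementation, and ENN and NCL (which remove samples rather than create them) are not shown.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples.

    Simplified sketch: needs at least two minority points, uses squared
    Euclidean distance, and picks the base sample uniformly at random.
    """
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist2(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Hypothetical 2-D feature vectors for the minority class.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority points, oversampling in this way densifies the minority region without duplicating samples exactly.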
In a final study, we propose a method to automatically describe educational resources with specific concepts. We distinguish two types of concepts: defined concepts and prerequisite concepts. This work was subsequently used for the automatic sequencing of educational resources.
Defence : 05/03/2011 - 14h - Site Jussieu - Salle Jean-Louis Laurière - 25-26/101
Jury members :
Ms Bernadette Bouchon-Meunier, Research Director, CNRS
Mr Nicolas Labroche, Associate Professor at UPMC
Mr Bernt Aman, Professor at UPMC
Ms Florence Sèdes, Professor at Université Paul Sabatier, Toulouse [Reviewer]
Mr Bruno Crémilleux, Professor at Université de Caen [Reviewer]
Mr Charles Tijus, Professor at Université Paris 8 [Examiner]
Ms Monique Baron, Associate Professor at UPMC
- S. Changuel, N. Labroche, B. Bouchon‑Meunier : “Resources Sequencing Using Automatic Prerequisite-Outcome Annotation”, ACM Transactions on Intelligent Systems and Technology, vol. 6 (1), pp. 6:1-6:30, (ACM) (2015)
- S. Changuel : “Métadonnées pour la personnalisation et l’accès à la connaissance”, thesis, defence 05/03/2011, supervision Bernadette Bouchon-Meunier, co-supervision Nicolas Labroche (2011)
- S. Changuel, N. Labroche : “Distinguishing defined concepts from prerequisite concepts in learning resources”, IEEE Symposium on Computational Intelligence and Data Mining, SSCI 2011 Conference, Paris, France, pp. 22-29, (IEEE) (2011)
- S. Changuel, N. Labroche, B. Bouchon‑Meunier : “Automatic Concept Type Identification from learning Resources”, 2010 International Joint Conference on Neural Networks, IJCNN, Barcelona, Spain, pp. 1-6, (IEEE) (2010)
- S. Changuel, N. Labroche, B. Bouchon‑Meunier : “Automatic Web Pages Author Extraction”, FQAS 2009 - 8th International Conference on Flexible Query Answering Systems, vol. 5822, Lecture Notes in Computer Science, Roskilde, Denmark, pp. 300-311, (Springer) (2009)
- S. Changuel, N. Labroche, B. Bouchon‑Meunier : “A General Learning Method for Automatic Title Extraction from HTML Pages”, MLDM 2009 - 6th International Conference on Machine Learning and Data Mining in Pattern Recognition, vol. 5632, Lecture Notes in Computer Science, Leipzig, Germany, pp. 704-718, (Springer) (2009)