PhD graduate
Team : MLIA
    Sorbonne Université - LIP6
    Boîte courrier 169
    Couloir 26-00, Étage 5, Bureau 524
    4 place Jussieu
    75252 PARIS CEDEX 05

Tel: +33 1 44 27 48 44, Thomas.Gerald (at) lip6.fr

Supervision : Patrick GALLINARI

Co-supervision : BASKIOTIS Nicolas

Representation Learning for large scale classification

The past decades have seen the rise of new technologies that simplify information sharing. Today, a large share of data is accessible to a large number of users. In this thesis, we study the problem of document annotation, which underlies many applications such as information retrieval. We focus on extreme classification, the task of automatic classification when the number of labels is very large. Many difficulties arise from the size and complexity of the data considered: prediction time, storage, and relevance of the predictions are the most representative. Recent research on this issue relies on three types of approaches: ensemble approaches, which learn a large set of simple classifiers; “hierarchical” methods, which organize simple classifiers into a structure; and representation-based approaches, which embed documents into low-dimensional spaces. In this thesis, we study classification by representation. Through our contributions, we propose different approaches to address the problems of prediction time and of the structure of the representation space. First, we study discrete representations, with the objective of finding the best possible representations while ensuring a low inference time. Second, we consider hyperbolic representations in order to take advantage of the properties of this space for structured data.

Defence : 11/17/2020 - 14h - https://zoom.us/j/95444893316?pwd=d0tIVVJzM3k5am5KQ2hHNXAyRGRadz09

Jury members :

Massih Reza Amini (Professor at Université Grenoble Alpes, AMA) [Reviewer]
Pascale Kuntz-Cosperec (Professor at Polytech Nantes, Laboratoire des Sciences du Numérique de Nantes) [Reviewer]
Patrick Gallinari (LIP6, MLIA)
Nicolas Baskiotis (LIP6, MLIA)
Julien Tierny (Research Scientist at Sorbonne Université, LIP6, APR team)
Xiangliang Zhang (Associate Professor at King Abdullah University of Science and Technology, CEMSE)

2020 Publications

  • 2020
    • Th. Gérald : “Apprentissage de Représentation pour la classification large échelle”, thesis, defence 11/17/2020, supervision Gallinari, Patrick, co-supervision Baskiotis, Nicolas (2020)
    • N. Miolane, N. Guigui, A. Le Brigant, J. Mathe, B. Hou, Y. Thanwerdas, S. Heyder, O. Peltre, N. Koep, H. Zaatiti, H. Hajri, Y. Cabanes, Th. Gerald, P. Chauchat, Ch. Shewmake, D. Brooks, B. Kainz, C. Donnat, S. Holmes, X. Pennec : “Geomstats: A Python Package for Riemannian Geometry in Machine Learning”, Journal of Machine Learning Research, vol. 21 (223), pp. 1-9, (Microtome Publishing) (2020)
    • N. Miolane, N. Guigui, H. Zaatiti, Ch. Shewmake, H. Hajri, D. Brooks, A. Le Brigant, J. Mathe, B. Hou, Y. Thanwerdas, S. Heyder, O. Peltre, N. Koep, Y. Cabanes, Th. Gerald, P. Chauchat, B. Kainz, C. Donnat, S. Holmes, X. Pennec : “Introduction to Geometric Learning in Python with Geomstats”, SciPy 2020 - 19th Python in Science Conference, Austin, Texas, United States, pp. 48-57 (2020)