Computer Science Laboratory

PIWOWARSKI Benjamin

Habilitation
Team: MLIA

Representation Learning and Information Access (Apprentissage de Représentations et Accès à l'Information)

Information access is of vital importance in modern societies: most information is now available in digital form, and its volume is growing fast. Techniques from this field make it possible to query this information and to access it in an appropriate form (e.g., a summary or a list of documents). The representation of many different entities, such as a user query, the text of a document, or an image, is key to the success of information access models based on machine learning. Over the years, hand-crafted models of data have given way to automated methods that learn an appropriate representation of the data. This problem of how to represent raw data has undergone a revolution in the last ten years, driven by deep learning. This line of work has produced a series of models and techniques that represent complex data as vectors in a vector space, so that distances and angles in that space encode semantic relationships between entities. The work I present in this manuscript focuses on data representation in the context of information access. In particular, I present work on (1) probabilistic representations of textual and graph data, and (2) grounding textual representations in the "real" world.


Habilitation defence: 10/23/2020

Jury members:

Éric Gaussier, Université Grenoble Alpes, France [rapporteur]
Jian-Yun Nie, University of Montreal, Canada [rapporteur]
Fabrizio Sebastiani, Italian National Research Council, Italy [rapporteur]
Eneko Agirre, University of the Basque Country, Spain
Matthieu Cord, Sorbonne Université, France
Mounia Lalmas, Spotify Research, UK