PhD graduated
Team : MLIA
Departure date : 04/04/2021

Supervision : Matthieu CORD

Co-supervision : Nicolas THOME

Deep multimodal learning for vision and language processing

Digital technologies have become instrumental in transforming our society. Recent statistical methods have been successfully deployed to automate the processing of the growing volume of images, videos, and texts we produce daily. In particular, deep neural networks have been adopted by the computer vision and natural language processing communities for their ability to perform accurate image recognition and text understanding once trained on large datasets. Advances in both communities have laid the groundwork for new research problems at the intersection of vision and language. Integrating language into visual recognition could have a significant impact on daily life through real-world applications such as next-generation search engines or AI assistants.
In the first part of this thesis, we focus on systems for cross-modal text-image retrieval. We propose a learning strategy to efficiently align the two modalities while structuring the retrieval space with semantic information. In the second part, we focus on systems able to answer questions about an image. We propose a multimodal architecture that iteratively fuses the visual and textual modalities using a factorized bilinear model while modeling pairwise relationships between image regions. In the last part, we address modeling biases. We propose a learning strategy to reduce the language biases that are commonly present in visual question answering systems.

Defence : 07/08/2020 - 13:00 - Campus Pierre et Marie Curie, Jacques Pitrat room (25-26/105)

Jury members :

Ms. Gabriela Csurka, Naver LABS Europe [reviewer]
Mr. Ivan Laptev, INRIA Paris [reviewer]
Mr. Patrick Gallinari, Sorbonne Université - LIP6
Mr. Thomas Serre, Brown University
Mr. Eduardo Valle, Campinas University - RECOD
Mr. Nicolas Thome, CNAM - CEDRIC
Mr. Matthieu Cord, Sorbonne Université - LIP6

2017-2021 Publications