

PhD graduate
Team : MLIA
Location : Campus Pierre et Marie Curie
    Sorbonne Université - LIP6
    Boîte courrier 169
    Couloir 26-00, Étage 5, Bureau 525
    4 place Jussieu
    75252 PARIS CEDEX 05
Tel: +33 1 44 27 51 29, Remi.Cadene (at)
Supervision : Matthieu CORD
Co-supervision : Nicolas THOME

Deep multimodal learning for vision and language processing

Digital technologies have become instrumental in transforming our society. Recent statistical methods have been successfully deployed to automate the processing of the growing volume of images, videos, and texts we produce daily. In particular, deep neural networks have been adopted by the computer vision and natural language processing communities for their ability to perform accurate image recognition and text understanding once trained on large datasets. Advances in both communities laid the groundwork for new research problems at the intersection of vision and language. Integrating language into visual recognition could have an important impact on human life through real-world applications such as next-generation search engines or AI assistants.
In the first part of this thesis, we focus on systems for cross-modal text-image retrieval. We propose a learning strategy to efficiently align both modalities while structuring the retrieval space with semantic information. In the second part, we focus on systems able to answer questions about an image. We propose a multimodal architecture that iteratively fuses the visual and textual modalities using a factorized bilinear model while modeling pairwise relationships between each region of the image. In the last part, we address issues related to modeling biases. We propose a learning strategy to reduce the language biases that are commonly present in visual question answering systems.
Defence : 07/08/2020 - 13:00 - Campus Jussieu, Salle Jacques Pitrat (25-26/105) + videoconference
Jury members :
Ms. Gabriela Csurka, Naver LABS Europe [reviewer]
Mr. Ivan Laptev, INRIA Paris [reviewer]
Mr. Patrick Gallinari, Sorbonne Université - LIP6
Mr. Thomas Serre, Brown University
Mr. Eduardo Valle, Campinas University - RECOD
Mr. Nicolas Thome, CNAM - CEDRIC
Mr. Matthieu Cord, Sorbonne Université - LIP6

2017-2019 Publications
