
BORDES Patrick

PhD graduated
Team : MLIA
Location : Campus Pierre et Marie Curie
    Sorbonne Université - LIP6
    Boîte courrier 169
    Couloir 26-00, Étage 5, Bureau 511
    4 place Jussieu
    75252 PARIS CEDEX 05
Tel: +33 1 44 27 87 33, Patrick.Bordes (at)
Supervision : Patrick GALLINARI
Co-supervision : Benjamin PIWOWARSKI

Deep Multimodal Learning for Joint Textual and Visual Reasoning

In the last decade, the evolution of Deep Learning techniques for learning meaningful data representations of text and images, combined with a significant increase in multimodal data, mainly from social networks and e-commerce websites, has triggered a growing interest in the research community in the joint understanding of language and vision. The challenge at the heart of Multimodal Machine Learning is the intrinsic difference in semantics between language and vision: while vision faithfully represents reality and conveys low-level semantics, language is a human construction carrying high-level reasoning.
On the one hand, language can enhance the performance of vision models. The underlying hypothesis is that textual representations contain visual information. We apply this principle to two Zero-Shot Learning (ZSL) tasks. In the first contribution, we extend a common assumption in ZSL, which states that textual representations encode information about the visual appearance of objects, by showing that they also encode information about their visual surroundings and their real-world frequency. In a second contribution, we consider the transductive setting in ZSL. We propose a solution to the limitations of current transductive approaches, which assume that the visual space is well-clustered, an assumption that does not hold when the number of unknown classes is high.
On the other hand, vision can expand the capacities of language models. We demonstrate this by tackling Visual Question Generation (VQG), which extends the standard Question Generation task by taking an image as complementary input, using visual representations derived from Computer Vision.
Defence : 11/26/2020 - 09h
Jury members :
Mr Yannis Avrithis (INRIA Rennes-Bretagne Atlantique) [Rapporteur]
Mr Loic Barrault (University of Sheffield) [Rapporteur]
Mr Patrick Gallinari (LIP6, MLIA)
Mr Benjamin Piwowarski (LIP6, MLIA, CNRS)
Mrs Diane Bouchacourt (FAIR)
Mrs Catherine Pelachaud (ISIR)

2017-2019 Publications
