Supervision : Matthieu CORD
Co-supervision : Nicolas THOME
Designing Deep Architectures for Visual Understanding.
Images are now ubiquitous thanks to smartphones and social media, so automatic means of processing them are needed to analyze and interpret the large amount of available data. In this thesis, we are interested in object detection, i.e. the problem of identifying and localizing all objects present in an image. This can be seen as a first step toward a complete visual understanding of scenes. We tackle it with deep convolutional neural networks, under the Deep Learning paradigm. One drawback of this approach is the need for large amounts of labeled data to learn from. Since precise annotations are time-consuming to produce, we first rely on larger datasets built with cheaper image-level labels. We design a global pooling function to work with them and to recover latent information about the spatial localization of objects. We then turn to the usual object-level annotations and introduce several new modules to learn part-based representations. By being more flexible than standard bounding boxes and exploiting latent object structure, these representations yield finer descriptions. We address the efficiency of end-to-end learning for both of these latent representations by leveraging fully convolutional networks. Moreover, exploiting additional annotations on available images can be an alternative to collecting more images, especially when these are difficult to obtain. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to learn effectively from this auxiliary supervision within this framework. All models are thoroughly evaluated experimentally on standard datasets and achieve results competitive with the literature.
Defence : 20 November 2018 - 10:30 - Site Jussieu 25-26/105
Mr. Florent Perronnin, Naver Labs Europe [Reviewer]
Mr. Josef Sivic, INRIA – ENS [Reviewer]
Mr. Alexandre Alahi, EPFL – VITA Lab
Mr. Matthieu Cord, Sorbonne Université – LIP6
Mr. Gilles Henaff, Thales LAS France S.A.S.
Ms. Natalia Neverova, Facebook AI Research
Mr. Nicolas Thome, CNAM – CEDRIC