Supervision : Matthieu CORD
Co-supervision : Nicolas THOME
Apprentissage d'architectures profondes pour la détection et la reconnaissance de cibles en imagerie optronique
Designing Deep Architectures for Visual Understanding.
Images are now ubiquitous thanks to smartphones and social media. Automatic means of processing them have become necessary in order to analyze and interpret the large amount of available data. In this thesis, we are interested in object detection, i.e. the problem of identifying and localizing all objects present in an image. This can be seen as a first step toward a complete visual understanding of scenes. We tackle it with deep convolutional neural networks, under the Deep Learning paradigm. One drawback of this approach is the need for large amounts of labeled data to learn from. Since precise annotations are time-consuming to produce, we first rely on bigger datasets built with cheaper image-level labels. We design a global pooling function to work with them and to recover latent information about the spatial localization of objects. We then deal with the usual object-level annotations and introduce several new modules to learn part-based representations. By being more flexible than standard bounding boxes and exploiting latent object structure, they yield finer descriptions. We address the issue of efficiency in the end-to-end learning of both of these latent representations by leveraging fully convolutional networks. Furthermore, exploiting additional annotations on available images can be an alternative to collecting more images, especially when these are difficult to obtain. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to effectively learn from this auxiliary supervision under this framework. All models are thoroughly evaluated experimentally on standard datasets and achieve results competitive with the literature.
Defence : 11/20/2018 - 10h30 - Site Jussieu 25-26/105
Jury members :
M. Florent Perronnin, Naver Labs Europe [Rapporteur]
M. Josef Sivic, INRIA – ENS [Rapporteur]
M. Alexandre Alahi, EPFL – VITA Lab
M. Matthieu Cord, Sorbonne Université – LIP6
M. Gilles Henaff, Thales LAS France S.A.S.
Mme Natalia Neverova, Facebook AI Research
M. Nicolas Thome, CNAM – CEDRIC
- T. Mordan : “Apprentissage d’architectures profondes pour la détection et la reconnaissance de cibles en imagerie optronique”, thesis, defence 11/20/2018, supervision : Matthieu CORD, co-supervision : Nicolas THOME (2018)
- T. Mordan, N. Thome, G. Henaff, M. Cord : “Revisiting Multi-Task Learning with ROCK: a Deep Residual Auxiliary Block for Visual Detection”, Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montréal, Canada (2018)
- T. Mordan, N. Thome, G. Henaff, M. Cord : “End-to-End Learning of Latent Deformable Part-Based Representations for Object Detection”, International Journal of Computer Vision (Springer Verlag) (2018)
- T. Mordan, N. Thome, M. Cord, G. Henaff : “Deformable Part-based Fully Convolutional Network for Object Detection”, British Machine Vision Conference (BMVC), London, United Kingdom (2017)
- Th. Durand, T. Mordan, N. Thome, M. Cord : “WILDCAT: Weakly Supervised Learning of Deep ConvNets for Image Classification, Pointwise Localization and Segmentation”, IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, United States, pp. 5957-5966 (2017)