PANTIN Jérémie

PhD graduate (ATER, Sorbonne Université)
Team: LFI
    Sorbonne Université - LIP6
    Boîte courrier 169
    Couloir 26-00, Étage 5, Bureau 504
    4 place Jussieu
    75252 PARIS CEDEX 05

Tel: +33 1 44 27 88 87, Jeremie.Pantin (at)

Supervision: Christophe MARSALA

Detection and semantic characterisation of textual outliers

Outlier detection is a recurring problem in machine learning: it involves identifying data points that differ significantly from the rest of the dataset. In this context, we focus on identifying such outliers in textual data, a task that faces several challenges, including the formalisation and definition of textual outliers. In particular, syntactic and semantic outliers are distinct, yet the boundary between them is often left ambiguous. To address this ambiguity, we propose a new taxonomy for identifying these outliers.
Within this framework, we identify various types of outliers and associated levels of difficulty, and we introduce a novel method to study them. This method makes it possible to leverage a wide range of datasets and to highlight the strengths and weaknesses of anomaly detection and outlier detection approaches. Outlier detection can be performed using ensemble methods, in which multiple text representations are combined with various detection techniques, enhancing efficiency and robustness against challenging outliers.
We introduce a novel approach that leverages robust learning and ensemble learning. We connect this work with XAI and data representation studies. Lastly, we present an application of our work in the domain of unsupervised abstractive summarisation. In this scenario, outlier analysis helps filter out irrelevant sentences, which improves the quality of the summary.

Defence: 09/11/2023

Jury members:

LAURENT Anne (Université de Montpellier) [Reviewer]
SMITS Gregory (IMT Atlantique) [Reviewer]
AMANN Bernd (Sorbonne Université)
MARSALA Christophe (Sorbonne Université)

2022-2024 Publications