PhD graduate
Team : MoVe
Departure date : 08/31/2021

Supervision : Jean-François PRADAT-PEYRE

Data mining and modeling of poorly structured or unstructured data

Supervised learning models are usually trained on data with limited constraints. Unfortunately, data are generally scarce, incomplete and biased in real-world use cases, which hampers efficient model design. Such data can and should still be leveraged to discover relevant patterns, glean insight and draw meaningful conclusions. In this thesis, we investigate an unsupervised learning approach to isolate minority samples embedded within a larger population. We study two use cases: Amyotrophic Lateral Sclerosis prognosis and identification of potential innovation funding recipients. Despite their different purposes, these contexts face similar issues: limited data availability and partial, unrepresentative samples. In both cases, the aim is to detect samples from a minority population: patients with a poorer 1-year prognosis and companies that are more likely to be successful funding applicants. Data are projected into a lower-dimensional space using Uniform Manifold Approximation and Projection (UMAP), a nonlinear dimension reduction technique. Differences in data distributions are then used to isolate the target minority population, using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and alpha shapes. Correlations between input and target variables become visible within the projection space, and minority samples are isolated from the remaining data. As a result, in spite of poor data quality, we provide additional insight with regard to recently diagnosed patients and potential funding applicants.
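The density-based isolation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the 2-D points below are synthetic stand-ins for a UMAP projection, the brute-force `dbscan` function is a simplified pure-Python version of the algorithm, and the values of `eps` and `min_samples` are illustrative assumptions. The UMAP projection and the alpha-shape step are omitted.

```python
import math
import random

NOISE = -1


def dbscan(points, eps, min_samples):
    """Label each 2-D point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    # Brute-force eps-neighbourhoods; each point counts as its own neighbour.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also core: keep expanding
        cluster += 1
    return labels


# Synthetic stand-in for a 2-D projection (assumption, not thesis data):
# a broad majority population and a small, dense minority group.
rng = random.Random(0)
majority = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]
minority = [(rng.gauss(6.0, 0.1), rng.gauss(6.0, 0.1)) for _ in range(30)]
points = majority + minority

labels = dbscan(points, eps=0.5, min_samples=5)

# Cluster ids that contain the known minority samples (indices 300 onward).
minority_labels = set(labels[len(majority):])
for label in sorted(minority_labels):
    print("minority cluster", label, "size", labels.count(label))
```

In this toy setting the minority group forms its own cluster because it is both dense and far from the majority. On real projected data the separation is rarely this clean, so the parameters would need tuning.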

Defence : 06/25/2021 - 2 pm - videoconference / hybrid

Jury members :

Mme Hélène BLASCO, PU-PH, Université de Tours [rapporteur]
M Patrice BERTAIL, PU, Université de Nanterre [rapporteur]
Mme Emmanuelle ENCRENAZ, MCF-HDR, Sorbonne Université
M François DELBOT, MCF, Sorbonne Université
M Pierre-François PRADAT, PU-PH, Sorbonne Université
M Gaétan LE CHAT, Dr, FRS Consulting
M Jean-François PRADAT-PEYRE, Sorbonne Université

2019-2021 Publications