Computer Science Laboratory

JARRAD Sara

PhD Student at Sorbonne University (ATER, Sorbonne Université)
Team : BD
    Sorbonne Université - LIP6
    Boîte courrier 169
    Couloir 25-26, Étage 5, Bureau 518
    4 place Jussieu
    75252 PARIS CEDEX 05
    FRANCE

+33 1 44 27 87 56
Sara.Jarrad (at) lip6.fr
https://lip6.fr/Sara.Jarrad

Supervision : Stéphane GANÇARSKI
Co-supervision : Hubert NAACKE

Search, classification and recommendation of similar sequences : Application to mobility trajectories

Sequential data corresponds to a series of ordered events. In the field of human mobility, such data can be effectively modeled as trajectories composed of Points of Interest (POIs). Trajectories represent sequences of specific locations visited by users in chronological order. User-generated content on the Web constitutes a rich source for analyzing human behavior. Data extracted from shared photos, tags, and other digital interactions can be used to reconstruct the mobility trajectories of individuals. In this context, our work focuses on three main tasks: POI and trajectory recommendation, sequence similarity search, and top-k ranking of the most similar sequences to a given query sequence. Our contributions are twofold: first, to provide a comprehensive overview of the fundamental concepts and existing approaches in these domains; and second, to propose novel solutions that address key limitations in the state of the art.

We start by examining recommendation tasks, particularly the prediction of the next POI to visit. Many existing methods struggle to capture the semantic relationships between POIs, or rely on spatio-temporal contexts that are not relevant for our sequential-only data. To address these issues, we introduce a method based on vector representations (embeddings) generated by language models. This approach exploits the contextual dependencies between POIs while relying exclusively on sequential data, thereby improving the quality of POI recommendations.

Next, we extend our study to the task of sequence similarity search, which aims to quantify the similarity between two sequences based on their shared elements. This problem goes beyond mobility trajectories and applies to other types of sequential data. To overcome the limitations of existing methods that are often computationally expensive or poorly suited to the structure of our data, we propose SISIS, an efficient indexing-based approach to retrieve all sequences similar to a query sequence, according to a user-defined threshold (measured by the number of common elements in the same order). We also present SISIS*, an extension of SISIS that integrates embeddings to enrich contextual information and enhance search performance.

Finally, we address the ranking task by introducing a new scoring function and an efficient algorithm for top-k sequence retrieval. Our scoring function assigns higher scores to candidate sequences that share a greater number of subsequences with the query, while reducing computational overhead through efficient management of sequence sets.
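
The similarity criterion mentioned above (the number of common elements in the same order) can be read as the length of the longest common subsequence (LCS) of two sequences. The sketch below is a brute-force baseline under that assumption, not the SISIS index itself: it scans every sequence and keeps those that meet the threshold. The function names and the sample trajectories are hypothetical and chosen only for illustration.

```python
from typing import Hashable, List, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best match that ends just before both elements.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one element from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similar_sequences(
    query: Sequence[Hashable],
    dataset: List[Sequence[Hashable]],
    threshold: int,
) -> List[Sequence[Hashable]]:
    """Keep sequences sharing at least `threshold` elements, in order, with the query."""
    return [s for s in dataset if lcs_length(query, s) >= threshold]


# Hypothetical POI trajectories, for illustration only.
trajectories = [
    ["louvre", "tuileries", "orsay", "eiffel"],
    ["eiffel", "louvre", "notre_dame"],
    ["louvre", "orsay", "pantheon", "eiffel"],
]
query = ["louvre", "orsay", "eiffel"]
matches = similar_sequences(query, trajectories, threshold=3)
```

Each comparison in this baseline costs time proportional to the product of the two sequence lengths, and every sequence in the dataset is visited, which is the kind of overhead an indexing-based approach aims to avoid.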


PhD defence : 07/08/2025

Jury members :

Maude Manouvrier MCF-HDR, Université Paris-Dauphine-PSL [Rapporteur]
Reza Akbarinia CR-HDR, Inria [Rapporteur]
Benjamin Piwowarski DR, Sorbonne Université
Olivier Curé PR, Université Gustave Eiffel
Stéphane Gançarski MCF-HDR, Sorbonne Université
Hubert Naacke PR, Sorbonne Université

2022-2025 Publications