Software has become ubiquitous in nearly every aspect of modern life, from critical infrastructure to daily applications.
Despite significant advances in development methodologies and testing practices, software systems continue to suffer from bugs. Locating and fixing these faults remains expensive in both time and financial resources.
To address this challenge, the research field of Fault Localization (FL) has emerged, aiming to automatically identify the faulty elements of a program.
Depending on the type of data leveraged for localization, several families of approaches have been developed.
Among the most prominent are Spectrum-Based Fault Localization (SBFL), which relies on test executions, and Information Retrieval Fault Localization (IRFL), which leverages textual artifacts such as bug reports.
Granularity is a key dimension of Fault Localization.
Existing techniques often operate at coarse-grained levels such as files or methods, leaving the statement level relatively underexplored. This gap is particularly critical for IRFL, because a single statement contains little textual information, which reduces the effectiveness of Information Retrieval (IR) techniques.
In this thesis, we address this limitation by investigating both SBFL and IRFL at the statement level.
Our first contribution introduces a hybrid approach that combines SBFL and IRFL in order to overcome their respective limitations. Specifically, we integrate Ochiai, from SBFL, and Latent Dirichlet Allocation (LDA), from IRFL.
This combination allows the localization process to benefit from the complementary strengths of both techniques, thus improving Fault Localization precision at the statement level.
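As an illustration of how such a combination can work, the sketch below computes the standard Ochiai suspiciousness of each statement from its test spectrum and merges it with a text-based score. The weighted sum, the `alpha` parameter, and the names `combine` and `rank_statements` are assumptions made for this example; the text-based scores are taken as given (for instance, a topic-model similarity between a statement and a bug report) and are not computed here.

```python
import math

def ochiai(ef, ep, nf):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)).

    ef: failing tests that execute the statement
    ep: passing tests that execute the statement
    nf: failing tests that do not execute the statement
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom > 0 else 0.0

def combine(sbfl_score, irfl_score, alpha=0.5):
    # Hypothetical weighted sum; alpha is an illustrative parameter,
    # not a value taken from the thesis.
    return alpha * sbfl_score + (1 - alpha) * irfl_score

def rank_statements(spectra, text_scores, alpha=0.5):
    # spectra: statement id -> (ef, ep, nf)
    # text_scores: statement id -> precomputed text-based score in [0, 1]
    scores = {
        stmt: combine(ochiai(*counts), text_scores.get(stmt, 0.0), alpha)
        for stmt, counts in spectra.items()
    }
    # Most suspicious statement first.
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch, a statement with a moderate Ochiai score can still be ranked first when its text-based score is high, which is one simple way the two sources of evidence can complement each other.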
Building upon this foundation, our second contribution tackles the challenge of multi-fault localization.
We propose grouping statements into code fragments and applying an Evolutionary Algorithm (EA) to explore the search space of possible fragments. Each fragment is evaluated using a fitness function that combines both Ochiai and LDA scores.
The best-performing fragment is then transformed into a ranking consistent with standard Fault Localization outputs.
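The search over fragments can be pictured with the minimal sketch below. It is not the algorithm of the thesis: the fragment representation (a set of statement indices), the mean-based fitness, the single-flip mutation, the truncation selection, and every function and parameter name are assumptions chosen to keep the example short. The Ochiai and LDA scores are assumed to be precomputed lists indexed by statement.

```python
import random

def combined(s, ochiai_scores, lda_scores, alpha=0.5):
    # Assumed per-statement score: weighted sum of the two precomputed scores.
    return alpha * ochiai_scores[s] + (1 - alpha) * lda_scores[s]

def fitness(fragment, ochiai_scores, lda_scores):
    # Assumed fitness: mean combined score of the statements in the fragment.
    if not fragment:
        return 0.0
    return sum(combined(s, ochiai_scores, lda_scores) for s in fragment) / len(fragment)

def mutate(fragment, n_statements, rng):
    # Toggle the membership of one random statement; never return an empty fragment.
    child = set(fragment) ^ {rng.randrange(n_statements)}
    return frozenset(child) if child else fragment

def evolve(ochiai_scores, lda_scores, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    n = len(ochiai_scores)
    population = [
        frozenset(rng.sample(range(n), rng.randint(1, n))) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the better half unchanged and refill with mutated copies.
        ranked = sorted(
            population,
            key=lambda f: fitness(f, ochiai_scores, lda_scores),
            reverse=True,
        )
        parents = ranked[: pop_size // 2]
        children = [
            mutate(rng.choice(parents), n, rng) for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda f: fitness(f, ochiai_scores, lda_scores))

def fragment_to_ranking(fragment, ochiai_scores, lda_scores):
    # Statements of the best fragment come first, then all remaining statements,
    # each group ordered by decreasing combined score.
    def key(s):
        return combined(s, ochiai_scores, lda_scores)
    inside = sorted(fragment, key=key, reverse=True)
    outside = sorted(set(range(len(ochiai_scores))) - set(fragment), key=key, reverse=True)
    return inside + outside
```

The last function shows one possible way to turn a fragment into a conventional ranked list, so that the output can be evaluated with the usual Fault Localization metrics.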
Our third contribution addresses the lack of contextual information at the statement level, a particularly critical issue for IRFL approaches.
Although statements are often treated as independent entities during localization, they naturally belong to larger semantic blocks defined by their structural and dataflow context. To exploit this context, we introduce a document expansion strategy composed of two complementary components.
The first leverages structural context by expanding each statement with the preprocessed names of its enclosing class and method, as well as an associated comment when one exists. The second incorporates terms from related lines identified by a graph of variable relations.
Experimental results show that this expansion improves localization quality compared to IRFL approaches that do not use an expansion process.
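The two components can be sketched as follows. This is a simplified illustration under stated assumptions: preprocessing is reduced to identifier splitting and lowercasing, and the graph of variable relations is approximated by linking two lines whenever they share a variable name. The helper names (`split_identifier`, `tokenize`, `related_lines`, `expand_statement`) and the input dictionaries are hypothetical.

```python
import re

def split_identifier(name):
    # Split camelCase and snake_case identifiers into lowercase terms,
    # e.g. "parseHttpRequest" -> ["parse", "http", "request"].
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def tokenize(text):
    # Extract identifier-like words and split each one into terms.
    terms = []
    for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        terms.extend(split_identifier(word))
    return terms

def related_lines(line, variables_by_line):
    # Assumed relation: two lines are related if they share a variable name.
    own = variables_by_line[line]
    return [
        other
        for other, variables in variables_by_line.items()
        if other != line and own & variables
    ]

def expand_statement(line, statements, variables_by_line,
                     class_name, method_name, comment=None):
    # Start from the statement's own terms.
    terms = tokenize(statements[line])
    # Structural context: enclosing class and method names, optional comment.
    terms += split_identifier(class_name) + split_identifier(method_name)
    if comment:
        terms += tokenize(comment)
    # Dataflow-like context: terms of lines related through shared variables.
    for other in related_lines(line, variables_by_line):
        terms += tokenize(statements[other])
    return terms
```

The expanded term list can then be used as the document that represents the statement in any IR model, in place of the few terms the statement contains on its own.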
Together, these three contributions address fundamental challenges of statement-level Fault Localization: the limitations of individual data sources, the complexity of multiple faults, and the lack of contextual information in short statements.
This work therefore provides new insights for enhancing statement-level localization and suggests additional directions for future research in automated Fault Localization.