The rise of free and open-source software encourages development through the reuse of software libraries available in ecosystems such as Java/Maven. This reuse offers clear advantages, particularly in reducing development time.
However, it also raises maintenance issues, which are amplified by the dependencies between the libraries themselves. Beyond a project’s direct dependencies, it is therefore crucial to analyze its entire software supply chain. This thesis, carried out in collaboration with an industrial partner, focuses on analyzing the quality and maintenance of projects with respect to their supply chain, viewed through the prism of their direct and indirect software dependencies. More broadly, the study addresses dependency ecosystems at a global scale.
The first contribution of this thesis is a systematic mapping study of software dependency quality metrics. This review reveals the richness of existing metrics, but also the need for efficient means of associating these metrics with project or ecosystem dependency graphs, which are often very large. To address this need, the second contribution proposes tool-based approaches for mining ecosystem-scale dependency graphs, enriching them with dependency-related quality metrics, and querying them efficiently.
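To make the distinction between direct and indirect dependencies concrete, the following minimal Python sketch walks a toy dependency graph breadth-first to collect a project's whole supply chain, then runs one query against a quality annotation attached to the graph. The artifact coordinates, the `VULNERABLE` set, and the helper names `transitive_dependencies` and `vulnerable_in_supply_chain` are hypothetical and do not come from the thesis tooling; the sketch also ignores Maven's version mediation and dependency scopes.

```python
from collections import deque

# Hypothetical toy graph: each artifact maps to its direct dependencies.
# Keys follow the Maven "groupId:artifactId:version" convention, but the artifacts are invented.
DEPENDENCIES: dict[str, list[str]] = {
    "org.example:app:1.0": ["org.example:web:2.1", "org.example:log:1.4"],
    "org.example:web:2.1": ["org.example:json:3.0", "org.example:xml:0.9"],
    "org.example:json:3.0": [],
    "org.example:log:1.4": [],
    "org.example:xml:0.9": [],
}

# Hypothetical quality annotation used to enrich the graph: releases flagged as vulnerable.
VULNERABLE: set[str] = {"org.example:xml:0.9"}


def transitive_dependencies(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every artifact reachable from root through one or more dependency edges."""
    seen: set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        artifact = queue.popleft()
        if artifact in seen:
            continue  # already visited; this also prevents looping on cycles
        seen.add(artifact)
        queue.extend(graph.get(artifact, []))
    return seen


def vulnerable_in_supply_chain(
    root: str, graph: dict[str, list[str]], vulnerable: set[str]
) -> set[str]:
    """Query: which artifacts in root's supply chain carry the vulnerability flag?"""
    return transitive_dependencies(root, graph) & vulnerable
```

Here `vulnerable_in_supply_chain("org.example:app:1.0", DEPENDENCIES, VULNERABLE)` returns `{"org.example:xml:0.9"}`: the flagged release is reached only through `org.example:web:2.1`, so an analysis limited to direct dependencies would miss it.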
Beyond measuring a project’s quality through its dependencies, it is crucial to be able to react to issues such as obsolete libraries or the presence of vulnerabilities.
To this end, the third contribution of this thesis is an approach based on linear programming for generating software dependency update plans that integrate user preferences in terms of quality while minimizing incompatibilities. All the tools and datasets developed during this thesis are free and open source, and some of them served as the basis for the “Mining Challenge” of a conference in the field.
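The update-plan problem can be illustrated at toy scale. The Python sketch below is a simplified stand-in, not the thesis's linear-programming formulation: it enumerates every combination of candidate versions (which grows exponentially with the number of dependencies), scores each plan by user-weighted quality criteria, and subtracts a fixed penalty for each incompatible pair of choices. The dependency names, versions, criteria (`freshness`, `security`), weights, penalty value, and the functions `score` and `best_update_plan` are all hypothetical. A solver-based formulation would typically encode one 0-1 variable per (dependency, version) pair, constrained so that exactly one version is chosen per dependency.

```python
from itertools import combinations, product

# Hypothetical toy data: candidate versions per dependency, each scored on two
# illustrative quality criteria in [0, 1] (higher is better).
CANDIDATES: dict[str, dict[str, dict[str, float]]] = {
    "web": {
        "2.1": {"freshness": 0.2, "security": 0.5},
        "3.0": {"freshness": 1.0, "security": 1.0},
    },
    "json": {
        "3.0": {"freshness": 0.4, "security": 1.0},
        "3.2": {"freshness": 1.0, "security": 1.0},
    },
}

# Hypothetical incompatibility: web 3.0 cannot be combined with json 3.2.
INCOMPATIBLE: set[frozenset[tuple[str, str]]] = {
    frozenset({("web", "3.0"), ("json", "3.2")}),
}


def score(plan, candidates, incompatible, weights, penalty) -> float:
    """User-weighted quality of the chosen versions minus a penalty per incompatible pair."""
    quality = sum(
        weight * candidates[dep][version][criterion]
        for dep, version in plan.items()
        for criterion, weight in weights.items()
    )
    clashes = sum(
        1
        for a, b in combinations(plan.items(), 2)
        if frozenset({a, b}) in incompatible
    )
    return quality - penalty * clashes


def best_update_plan(candidates, incompatible, weights, penalty=10.0) -> dict[str, str]:
    """Pick one version per dependency maximizing score().

    Exhaustive enumeration stands in for a linear-programming solver here and is
    only practical for a handful of dependencies.
    """
    deps = sorted(candidates)
    best_plan: dict[str, str] = {}
    best_value = float("-inf")
    # Versions are sorted only to make the enumeration order deterministic.
    for choice in product(*(sorted(candidates[d]) for d in deps)):
        plan = dict(zip(deps, choice))
        value = score(plan, candidates, incompatible, weights, penalty)
        if value > best_value:
            best_plan, best_value = plan, value
    return best_plan
```

With equal weights, `best_update_plan(CANDIDATES, INCOMPATIBLE, {"freshness": 0.5, "security": 0.5})` returns `{'json': '3.0', 'web': '3.0'}`: because the newest versions of both libraries are incompatible, the plan moves `web` to 3.0 and settles for `json` 3.0 rather than 3.2.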