Supervision : Antoine MINÉ
Static Type and Value Analysis by Abstract Interpretation of Python Programs with Native C Libraries
In this thesis, we aim at designing both theoretically and experimentally methods for the automatic detection of potential bugs in software – or the proof of the absence thereof. This detection is done statically by analyzing programs' source code without running them. We rely on the abstract interpretation framework to derive sound, computable semantics. In particular, we focus on analyzing dynamic programming languages. The target of this work is the analysis of Python programs combined with native C libraries.
The abstract interpretation workflow requires a concrete semantics, which will be approximated to create sound computable analyses. Our first step is thus to define the uncomputable, collecting semantics of Python formally. This semantics relies on previous work and retro-engineering from the source code of the reference interpreter. We took special care to make the semantics explainable by providing references to the actual implementation's source code for each case of the semantics.
We have implemented all the analyses presented in this thesis within the Mopsa framework. From a static analysis developer's perspective, Mopsa has a double goal. It aims at reducing the cost of lifting an abstract domain from a toy language to a real-world one. It also aspires to define relational analyses composed of loosely coupled abstract domains that can cooperate. We take particular care to define all our abstract domains independently from each other. Mopsa's core is language agnostic, making Mopsa suitable to analyze different programming languages. The analyses rely on some general use domains, such as numerical domains, the recency abstraction of dynamic memory allocation, and the loop iterator.
We present a first analysis aiming at detecting type-related errors in Python programs. It does so by inferring both the nominal and structural types of objects. Python's type annotations can be reused to support libraries. This analysis is refined into a value analysis which is more precise and infers numerical invariants for the manipulated builtin numeric values. These analyses rely on abstractions of dynamic memory allocation and data structures that we have adapted to the case of dynamic programming languages. Both analyses scale to real-world benchmarks a few thousand lines long used by Python developers.
Most Python programs rely on libraries written in C for the sake of performance or code reuse. We define a multilanguage analysis that can detect runtime errors in every part of a multilanguage program: in the Python part, in the C part, and at the boundary between the languages. Before defining this analysis, we provide an overview of the different interoperability mechanisms available between Python and C, and we define a collecting semantics for the interoperability mechanism provided by the reference interpreter. This analysis can tackle tests of popular, real-world libraries a few thousand lines of C and Python long in a few minutes.
Defence : 11/22/2021
Jury members :
MASTROENI Isabella (Università di Verona) [Rapporteur]
MØLLER Anders (Aarhus Universitet) [Rapporteur]
CHAILLOUX Emmanuel (Sorbonne Université)
MINÉ Antoine (Sorbonne Université)
LOGOZZO Francesco (Facebook Seattle)
MÜLLER Peter (ETH Zürich)
SCHMITT Alan (INRIA Rennes)
- R. Monat, A. Ouadjaout, A. Miné : “Mopsa-C: Modular Domains and Relational Abstract Interpretation for C Programs (Competition Contribution)”, Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2023), vol. 13994, Lecture Notes in Computer Science, Paris, France, pp. 565-570, (Springer, Cham), (ISBN: 978-3-031-30820-8) (2023)
- R. Monat : “Static Type and Value Analysis by Abstract Interpretation of Python Programs with Native C Libraries”, thesis, defence 11/22/2021, supervision Miné, Antoine (2021)
- R. Monat, A. Ouadjaout, A. Miné : “A Multilanguage Static Analysis of Python Programs with Native C Extensions”, Static Analysis Symposium (SAS), Chicago, Illinois, United States (2021)
- D. Merigoux, R. Monat, J. Protzenko : “A Modern Compiler for the French Tax Code”, CC '21: 30th ACM SIGPLAN International Conference on Compiler Construction, CC 2021: Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction, Virtual, Korea, Republic of, pp. 71-82, (ACM) (2021)
- R. Monat, A. Ouadjaout, A. Miné : “Static Type Analysis by Abstract Interpretation of Python Programs”, 34th European Conference on Object-Oriented Programming (ECOOP 2020), vol. 166, Leibniz International Proceedings in Informatics (LIPIcs), Berlin (Virtual / Covid), Germany, (Schloss Dagstuhl--Leibniz-Zentrum für Informatik) (2020)
- R. Monat, A. Ouadjaout, A. Miné : “Value and Allocation Sensitivity in Static Python Analyses”, Proceedings of the 9th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis, London, United Kingdom, pp. 8-13, (ACM) (2020)
- D. Merigoux, R. Monat, Ch. Gaie : “Étude formelle de l’implémentation du code des impôts”, JFLA 2020 - 31e Journées Francophones des Langages Applicatifs, Gruissan, France (2020)
- M. Journault, A. Miné, R. Monat, A. Ouadjaout : “Combinations of Reusable Abstract Domains for a Multilingual Static Analyzer”, Verified Software. Theories, Tools, and Experiments, vol. 12031, Lecture Notes in Computer Science, New York, United States, pp. 1-18, (Springer) (2019)
- R. Monat : “Static Analysis by Abstract Interpretation Collecting Types of Python Programs”, (2018)
- R. Monat, A. Miné : “Precise Thread-Modular Abstract Interpretation of Concurrent Programs Using Relational Interference Abstractions”, Verification, Model Checking, and Abstract Interpretation, vol. 10145, Lecture Notes in Computer Science, Paris, France, pp. 386-404, (Springer) (2017)