Statistical Similarity of Binaries; At Scale

Lundi 18 septembre 2017
Nimrod PARTUSH, Technion, Israel

This talk covers a new statistical, semantic-based approach for establishing similarity between stripped Binary executables (with no debug information). The main challenge in binary similarity, is to establish similarity even when the code has been compiled using different compilers, with different optimization levels, or targeting different architectures. Overcoming this challenge, while avoiding false positives, is invaluable to the process of reverse engineering and the process of locating vulnerable code.

The main idea is to use similarity by composition: decompose the code into smaller comparable fragments, define semantic similarity between fragments, and use statistical reasoning to lift fragment similarity into similarity between procedures by quantifying fragment significance.

The talk will cover two approaches for semantic fragment similarity:

* A theorem-prover based approach which find and proves (partial) semantic equivalence over two segments of code, aimed towards precision.

* A canonical-form based approach, which uses the compiler optimizer as a tool for finding semantic equivalence, aimed towards scale.

The talk will also include a study of the statistical framework which defines fragment significance, e.g., what is the size and composition of the sample space and how it affect precision.

Both approaches were implemented and extensively evaluated showing they can accurately find various prominent vulnerabilities across compilers, optimization levels and architectures, including Heartbleed, Shellshock and Venom.

Antoine.Mine (at) nulllip6.fr