11:30 AM - 12:00 PM (PST)
On the Challenges of Scaling Static Vulnerability Discovery

Today, research on the automatic discovery of vulnerabilities focuses primarily on fuzzing. Given fuzzing's impressive track record of uncovering critical memory-corruption vulnerabilities, this comes as no surprise. Unfortunately, as a dynamic approach, fuzzing has inherent limitations that further research cannot address. In particular, it is ill-suited to two settings that are becoming increasingly relevant. First, because fuzzing requires running the program, it is unsuited to discovering variants of a vulnerability across large bodies of code for which configuring execution environments is expensive, e.g., when scanning all hardware drivers supported by an operating system kernel or a family of firmware images. Second, fuzzing is only effective when run over long periods of time and with simulated external events, which prevents its integration into modern CI/CD pipelines, where the time needed to complete a run would severely delay feature delivery. As much as we might wish to avoid it, these settings call for static program analysis.
In this talk, we share the lessons we have learned over the past years building a scalable system for vulnerability discovery in these two settings, a system used daily by several large companies today. We focus on the challenge of designing a single system that supports vulnerability discovery through a flexible, extensible query language, across programming languages, and for both system code and high-level web and cloud applications. We discuss bringing static data-flow tracking into production when a scan must finish within 10 minutes, the many small tweaks you will not read about in a research paper, and the largely ignored topic of scaling static analysis horizontally.
We compare code at different levels of abstraction and show how a single analysis engine can be built to process low-level binary code and high-level JavaScript alike. Finally, we present open-source implementations of this research that we hope will serve as a starting point for anyone interested in the topic.