Bounded Model Checking for C Programs in an Enterprise Environment
Michael Tautschnig

Amazon Web Services & Queen Mary University of London
Customer: I would like to get a guarantee that there are no security bugs in this software.
“Software”
“Software” ecosystem of […] can’t be published, but …
Ample Open-Source Software “out there”
• Debian (http://sources.debian.net/stats/, 21st October 2016)
  • 26,900 source packages
  • 13,736,903 individual source files
  • 1,276,743,654 lines of source code (any programming language)
  • 45.5% (approx. 500M) C code, 22.2% C++, 5.6% shell, 4.7% Java
• SourceForge, GitHub, CodePlex, ...: how to automate any kind of analysis?
• Distributions (Red Hat, Ubuntu/Debian, SUSE, … - but also industrial setups)!
  • Software organised in source packages
  • Uniform interface to access/download packages
  • Uniform build interface, dependency management
How?
Building one Source Package: Compiler Tool-chain
• For now: C source code only
• goto-cc (part of CBMC distribution)
  • Uses compiler’s (here: GCC’s) preprocessor
  • Own C parser/front end (no CIL, LLVM, EDG, ...)
  • Supports GCC, Visual Studio, CodeWarrior, ARM-CC dialects and command line options
  • Builds intermediate representation understood by CBMC/CProver tools
  • Linking of compiled files/archives/libraries
Supporting arbitrary Build Systems
• Builds are performed in chroot environments
• /usr/bin/gcc and /usr/bin/ld replaced by scripts invoking goto-cc (+ more work)
• Key procedure:
1. Run real compiler/linker (gcc/ld)
2. Compile/link using goto-cc
3. Add result as additional ELF section
• Resulting file remains executable
• Stable under file renaming, archiving, etc.
• Linking stage extracts intermediate representation from extra ELF section
[Figure: x86 binary + CProver IR]
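The three-step key procedure above can be sketched as a small planning function. This is only an illustration, not the scripts from the talk: the location of the real compiler (`/usr/bin/gcc.real`), the `.goto` temporary-file suffix, and the section name `goto-cc` are assumptions made for the example, and the sketch only handles an explicit `-o <file>` argument pair. It relies on goto-cc accepting GCC-style options and on GNU objcopy's `--add-section name=file` option.

```python
# Sketch only: paths, the temporary-file suffix, and the section name are assumptions.
REAL_GCC = "/usr/bin/gcc.real"  # hypothetical location of the original compiler
GOTO_CC = "goto-cc"
IR_SECTION = "goto-cc"          # assumed name of the additional ELF section


def output_index(args):
    """Return the index of the file name that follows a separate '-o' argument."""
    for i in range(len(args) - 1):
        if args[i] == "-o":
            return i + 1
    raise ValueError("this sketch only handles an explicit '-o <file>'")


def plan_wrapper(args):
    """Return the three commands a gcc replacement script could run, in order."""
    idx = output_index(args)
    out = args[idx]
    ir_file = out + ".goto"  # hypothetical temporary file holding the IR

    # Same command line as the real compiler, but writing to the IR file.
    goto_args = list(args)
    goto_args[idx] = ir_file

    return [
        [REAL_GCC, *args],                                           # 1. real compiler
        [GOTO_CC, *goto_args],                                       # 2. goto-cc
        ["objcopy", "--add-section", f"{IR_SECTION}={ir_file}", out],  # 3. attach IR
    ]


if __name__ == "__main__":
    for command in plan_wrapper(["-c", "foo.c", "-o", "foo.o"]):
        print(" ".join(command))
```

A real wrapper would execute each command in turn (for example with `subprocess.run(command, check=True)`) and would also need to cover the cases this sketch leaves out, such as a missing `-o`.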
Building Thousands of Packages
Infrastructure: (Ab-)using Jenkins
Scripts, notes, configuration: https://github.com/tautschnig/cprover-debian
• Jenkins master: 4 cores, 64 GB
• 5 slave nodes: 64 cores and 256 GB memory each
• Ultimate Debian Database: package versions, bugs
• Debian mirror: source archives
• Connections labelled in the diagram: SQL, SSH, FTP
Current per-package Work Flow
• Compile, link
• Store archive of all object files/executables
• dump-c: create human-readable C code from IR
• Add generic assertions (pointer checks, arithmetic overflow, no-NaN, ...)
• Run CBMC with unwinding bound 1, Z3/Minisat (DAC’03, TACAS’04, CAV’13)
• Loop acceleration (CAV’13)
• Re-compile using goto-cc
• Static weak memory cycles (TOPLAS/PLDI’14)
• Re-compile using gcc (errors not fatal)
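One way to read "errors not fatal" in the workflow above is that a failing stage is recorded for the package while the remaining stages still run. The following is a minimal sketch of such a stage runner. The function name `run_stages`, the stage names, and the `fatal` flag are hypothetical, and the sketch makes no claim about the actual order or wiring of the stages in the Jenkins set-up.

```python
def run_stages(package, stages):
    """Run (name, func, fatal) stages in order and collect (name, message) errors.

    A failing non-fatal stage is recorded and the run continues;
    a failing fatal stage is recorded and stops the run for this package.
    """
    errors = []
    for name, func, fatal in stages:
        try:
            func(package)
        except Exception as exc:  # sketch: treat any exception as a stage failure
            errors.append((name, str(exc)))
            if fatal:
                break
    return errors


if __name__ == "__main__":
    def compile_and_link(package):
        print(f"{package}: compiled and linked")

    def recompile_with_gcc(package):
        raise RuntimeError("gcc rejected the dumped C code")

    def run_checker(package):
        print(f"{package}: checker ran")

    # Hypothetical stage list for illustration only.
    stages = [
        ("compile", compile_and_link, True),
        ("recompile-gcc", recompile_with_gcc, False),
        ("check", run_checker, True),
    ]
    print(run_stages("hello", stages))
```

Collecting errors per stage rather than stopping at the first one is what makes a per-package tally of failure causes possible afterwards.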
Results?
Exercising Language Front Ends
• Many bug fixes and improvements to the parser, type checker
• Re-engineering of parts of the linker
• Bug fixes in IR construction
• Compilation (without further analysis steps) of entire archive: ~2 days
• > 250 GB of compressed archives of IR object files/executables
• 10,314 archives available: http://theory.eecs.qmul.ac.uk/debian+mole/pkgs/
Results Relevant to Practitioners: Bug Reports
• Key feature: type checking at link time
• 844 bugs reported, 530 already fixed by developers
• Hundreds still to be reported
• http://bugs.debian.org/cgi-bin/pkgreport.cgi?users=mt@debian.org&tag=goto-cc&archive=both
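A conventional linker matches symbols by name only, so two files can disagree about a symbol's type without any diagnostic. The toy model below illustrates the idea of checking types at link time. It is not goto-cc's implementation: types are represented as plain strings, and the function name `link_units` and the example declarations are made up for illustration.

```python
def link_units(units):
    """Merge per-file symbol tables and report symbols declared with different types.

    `units` maps a file name to a dict of {symbol name: type as a string}.
    Returns (merged, conflicts); each conflict is
    (symbol, (first file, first type), (other file, other type)).
    """
    merged = {}
    conflicts = []
    for filename, symbols in units.items():
        for name, ctype in symbols.items():
            if name not in merged:
                merged[name] = (filename, ctype)
            elif merged[name][1] != ctype:
                conflicts.append((name, merged[name], (filename, ctype)))
    return merged, conflicts


if __name__ == "__main__":
    # Hypothetical declarations as two translation units might see them.
    units = {
        "a.c": {"buf": "char [16]", "f": "int (int)"},
        "b.c": {"buf": "char [32]", "f": "int (int, int)"},
    }
    _, conflicts = link_units(units)
    for name, first, other in conflicts:
        print(f"{name}: {first[1]} in {first[0]} vs. {other[1]} in {other[0]}")
```

In this example both `buf` (differing array sizes) and `f` (differing parameter counts) would be flagged, although each file compiles cleanly on its own.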
Reporting bugs
Where are the cats?
• CAV’14: J. Alglave, D. Kroening, V. Nimal, D. Poetzl: Don’t sit on the fence: A static analysis approach to automatic fence insertion
• PLDI’14/TOPLAS: J. Alglave, L. Maranget, M. Tautschnig: Herding Cats - Modelling, simulation, testing, and data-mining for weak memory (cited in Linux Weekly News and C/C++ WG21/N4036)
Focus on improving/developing Methods
TOPLAS/PLDI’14: analysing 200 million LOC for potential weak memory susceptibility
Automated Information Leak Detection
Analysing the Patched Version
Overall Analysis Status (preliminary!)
• In addition to 314 bugs reported and not yet fixed: 4915 packages with error reports. Top causes:
  1789  CBMC counterexamples (including several using loop acceleration)
  1711  Loop acceleration bugs
   200  Floating point support in Z3 back end
   198  Type-inconsistent access to heap with symbolic offset
   129  CBMC Out-of-memory
    54  Parameter counts differ
    48  Conflicting array sizes
    46  Conflicting types
    42  Conflicting struct types
    32  Conflicting return types (byte size)
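As a quick consistency check on the figures above, the snippet below sums the listed top causes and compares them with the 4915 packages that have error reports. It assumes, as a simplification that the slides do not state, that each package is counted under exactly one cause. It also confirms that the 314 open bugs match the 844 reported and 530 fixed quoted earlier.

```python
# Counts copied from the slide; the one-cause-per-package reading is an assumption.
top_causes = {
    "CBMC counterexamples": 1789,
    "Loop acceleration bugs": 1711,
    "Floating point support in Z3 back end": 200,
    "Type-inconsistent access to heap with symbolic offset": 198,
    "CBMC Out-of-memory": 129,
    "Parameter counts differ": 54,
    "Conflicting array sizes": 48,
    "Conflicting types": 46,
    "Conflicting struct types": 42,
    "Conflicting return types (byte size)": 32,
}
packages_with_errors = 4915
bugs_reported, bugs_fixed = 844, 530

covered = sum(top_causes.values())
share = covered / packages_with_errors
open_bugs = bugs_reported - bugs_fixed

if __name__ == "__main__":
    print(f"top causes cover {covered} of {packages_with_errors} packages ({share:.1%})")
    print(f"bugs reported and not yet fixed: {open_bugs}")
```

Under that assumption the ten listed causes account for 4249 packages, so a few hundred packages fail for reasons outside the top ten.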
Questions
Software? Yes.
Guarantees? Sometimes.
