Bounded Model Checking for C Programs in an Enterprise Environment

Michael Tautschnig

Amazon Web Services & Queen Mary University of London

Customer: I would like to get a guarantee that there are no security bugs in this software.

“Software”

“Software” ecosystem of [image omitted in transcript] can’t be published, but …

Ample Open-Source Software “out there”
• Debian (http://sources.debian.net/stats/, 21 October 2016)
  • 26,900 source packages
  • 13,736,903 individual source files
  • 1,276,743,654 lines of source code (any programming language)
  • 45.5% (approx. 500M) C code, 22.2% C++, 5.6% shell, 4.7% Java
• SourceForge, GitHub, CodePlex, ...: how to automate any kind of analysis?
• Distributions (RedHat, Ubuntu/Debian, SuSE, …, but also industrial setups)!
  • Software organised in source packages
  • Uniform interface to access/download packages
  • Uniform build interface, dependency management

How?

Building one Source Package: Compiler Tool-chain
• For now: C source code only
• goto-cc (part of CBMC distribution)
  • Uses compiler’s (here: GCC’s) preprocessor
  • Own C parser/front end (no Cil, LLVM, EDG, ...)
  • Supports GCC, Visual Studio, CodeWarrior, ARM-CC dialects and command line options
  • Builds intermediate representation understood by CBMC/CProver tools
  • Linking of compiled files/archives/libraries

Supporting arbitrary Build Systems
• Builds are performed in chroot environments
• /usr/bin/gcc and /usr/bin/ld replaced by scripts invoking goto-cc (+ more work)
• Key procedure:
  1. Run real compiler/linker (gcc/ld)
  2. Compile/link using goto-cc
  3. Add result as additional ELF section
• Resulting file remains executable
• Stable under file renaming, archiving, etc.
• Linking stage extracts intermediate representation from extra ELF section
[Diagram: a single file containing both the x86 binary and the CProver IR]
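
As a rough illustration of the extraction step, the sketch below locates a named section in an in-memory ELF64 image and returns a pointer to its contents. It assumes a Linux system that provides <elf.h> and an image in host byte order. The helper name find_section is hypothetical, and the section name is whatever the caller passes in, since the slides do not state it. This is not goto-cc's own code, which the slides do not show.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the contents of the section
 * called `name` inside a 64-bit ELF image held in memory (host byte
 * order assumed). Returns NULL if the section is absent or the image
 * looks malformed. On success, *size_out receives the section size. */
static const unsigned char *find_section(const unsigned char *img, size_t len,
                                         const char *name, size_t *size_out)
{
  assert(img != NULL && name != NULL && size_out != NULL);

  Elf64_Ehdr eh;
  if (len < sizeof eh)
    return NULL;
  memcpy(&eh, img, sizeof eh);
  if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
      eh.e_ident[EI_CLASS] != ELFCLASS64)
    return NULL;

  /* The section header table must lie entirely inside the image. */
  if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shstrndx >= eh.e_shnum ||
      eh.e_shoff > len ||
      (len - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
    return NULL;

  /* Section names are stored in the section-header string table. */
  Elf64_Shdr str;
  memcpy(&str, img + eh.e_shoff + eh.e_shstrndx * sizeof str, sizeof str);
  if (str.sh_offset > len || str.sh_size > len - str.sh_offset)
    return NULL;

  for (size_t i = 0; i < eh.e_shnum; ++i) {
    Elf64_Shdr sh;
    memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
    if (sh.sh_name >= str.sh_size)
      continue;
    const char *n = (const char *)img + str.sh_offset + sh.sh_name;
    if (memchr(n, '\0', str.sh_size - sh.sh_name) == NULL)
      continue; /* name not terminated inside the string table */
    if (strcmp(n, name) != 0)
      continue;
    if (sh.sh_offset > len || sh.sh_size > len - sh.sh_offset)
      return NULL;
    *size_out = (size_t)sh.sh_size;
    return img + sh.sh_offset;
  }
  return NULL;
}
```

Because the extra section is ordinary ELF data, it stays with the file when the file is renamed or archived, which is the property the bullets above rely on.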

Building Thousands of Packages

Infrastructure: (Ab-)using Jenkins
Scripts, notes, configuration: https://github.com/tautschnig/cprover-debian
• Jenkins master: 4 cores, 64 GB
• 5 slave nodes: each 64 cores, 256 GB memory
• Ultimate Debian Database: package versions, bugs (accessed via SQL)
• Debian mirror: source archives (accessed via FTP)
• SSH (connection label in the diagram)

Current per-package Work Flow
• Compile, link
• Store archive of all object files/executables
• dump-c: create human-readable C code from IR
• Add generic assertions (pointer checks, arithmetic overflow, no-NaN, ...)
• Run CBMC w/ unwinding bound 1, Z3/Minisat (DAC’03, TACAS’04, CAV’13)
• Loop acceleration (CAV’13)
• Re-compile using goto-cc
• Static weak memory cycles (TOPLAS/PLDI’14)
• Re-compile using gcc (errors not fatal)
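
To make the "generic assertions" stage more concrete, here is a minimal sketch of what a pointer check and a signed-overflow check amount to when written out by hand. The names add_is_safe and checked_sum are hypothetical. In the pipeline the checks are added automatically rather than by hand, and the slides do not say which tool options were used for that.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Condition a signed-overflow check on `a + b` has to establish,
 * expressed without ever computing an overflowing sum. */
static bool add_is_safe(int a, int b)
{
  return (b >= 0) ? (a <= INT_MAX - b) : (a >= INT_MIN - b);
}

/* Sum an array, with the generic properties spelled out as assertions. */
static int checked_sum(const int *buf, size_t n)
{
  assert(buf != NULL || n == 0); /* pointer check: no NULL dereference */
  int sum = 0;
  for (size_t i = 0; i < n; ++i) {
    assert(add_is_safe(sum, buf[i])); /* arithmetic overflow check */
    sum += buf[i];
  }
  return sum;
}
```

A model checker then searches for inputs that violate one of these assertions; any input it finds is a counterexample of the kind counted in the status slide.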

Results?

Exercising Language Front Ends
• Many bug fixes and improvements to the parser, type checker
• Re-engineering of parts of the linker
• Bug fixes in IR construction
• Compilation (without further analysis steps) of entire archive: ~2 days
• > 250 GB of compressed archives of IR object files/executables
• 10314 archives available: http://theory.eecs.qmul.ac.uk/debian+mole/pkgs/

Results Relevant to Practitioners: Bug Reports
• Key feature: type checking at link time
• 844 bugs reported, 530 already fixed by developers
• Hundreds still to be reported
• http://bugs.debian.org/cgi-bin/pkgreport.cgi?users=mt@debian.org&tag=goto-cc&archive=both

Reporting bugs

Where are the cats?
• CAV’14: J. Alglave, D. Kroening, V. Nimal, D. Poetzl: Don't sit on the fence: A static analysis approach to automatic fence insertion
• PLDI’14/TOPLAS: J. Alglave, L. Maranget, M. Tautschnig: Herding Cats - Modelling, simulation, testing, and data-mining for weak memory (cited in Linux Weekly News and C/C++ WG21/N4036)

Focus on improving/developing Methods
TOPLAS/PLDI’14: analysing 200 million LOC for potential weak memory susceptibility

Automated Information Leak Detection

Analysing the Patched Version

Overall Analysis Status (preliminary!)
• In addition to 314 bugs reported and not yet fixed: 4915 packages with error reports. Top causes:
  • 1789 CBMC counterexamples (including several using loop acceleration)
  • 1711 Loop acceleration bugs
  • 200 Floating point support in Z3 back end
  • 198 Type-inconsistent access to heap with symbolic offset
  • 129 CBMC Out-of-memory
  • 54 Parameter counts differ
  • 48 Conflicting array sizes
  • 46 Conflicting types
  • 42 Conflicting struct types
  • 32 Conflicting return types (byte size)
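
The last five categories are type conflicts that only become visible when translation units are linked. The toy model below sketches the idea for two of them: it compares how one translation unit declares a function with how another defines it. The struct fn_sig, the enum link_conflict and the function check_link are all hypothetical simplifications for illustration and do not reflect how goto-cc represents or compares types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified summary of one function symbol as seen
 * in a single translation unit. */
struct fn_sig {
  const char *name;
  size_t return_size; /* size of the return type in bytes */
  size_t param_count; /* number of declared parameters */
};

enum link_conflict { LINK_OK, LINK_PARAM_COUNT, LINK_RETURN_SIZE };

/* Compare the declaration used by a caller with the definition found
 * in another translation unit and report the first mismatch. */
static enum link_conflict check_link(const struct fn_sig *decl,
                                     const struct fn_sig *def)
{
  assert(decl != NULL && def != NULL);
  if (strcmp(decl->name, def->name) != 0)
    return LINK_OK; /* unrelated symbols, nothing to compare */
  if (decl->param_count != def->param_count)
    return LINK_PARAM_COUNT; /* "Parameter counts differ" */
  if (decl->return_size != def->return_size)
    return LINK_RETURN_SIZE; /* "Conflicting return types (byte size)" */
  return LINK_OK;
}
```

An ordinary C linker matches symbols by name only, so mismatches like these pass unnoticed in a normal build; checking types at link time is what exposes them.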

Questions
Software? Yes.
Guarantees? Sometimes.
