1.
Motivation 1960s Proof 1970s 1980s 1990s 2000s Conclusions
Formal Verification of Programming Language Implementations
Ph.D. Literature Seminar
Jason S. Reich
<jason@cs.york.ac.uk>
University of York
11th January 2010
4.
Compiling an arithmetic language
Compile from a simple arithmetic language to machine code for a
simple register machine.
Source language
Numeric constants
Variables
Addition
e.g. (x + 3) + (x + (y + 2))
Target language
LI: Load Immediate value into the accumulator (ac)
LOAD into ac from an address/register
STO: STOre the ac value to an address/register
ADD a register value to ac
Example taken from [McCart67]
5.
Compiling an arithmetic language
Arithmetic expression compiler in Haskell
compile :: Int -> Source -> Target
compile t (Const v)   = [Li v]
compile t (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) = compile t e1
                     ++ [Sto (Reg t)]
                     ++ compile (t + 1) e2
                     ++ [Add (Reg t)]
When compiled and executed, is the value in the accumulator the
result of the source arithmetic expression?
6.
Compiling an arithmetic language
(x + 3) + (x + (y + 2)) compiled to machine code?
1 LOAD M[x]
2 STO R[t + 0]
3 LI 3
4 ADD R[t + 0]
5 STO R[t + 0]
6 LOAD M[x]
7 STO R[t + 1]
8 LOAD M[y]
9 STO R[t + 2]
10 LI 2
11 ADD R[t + 2]
12 ADD R[t + 1]
13 ADD R[t + 0]
n.b. Where M is a mapping of variable names to memory locations and R is an
indexing of registers.
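A quick way to check the listing above is to run the compiler. The following is a minimal, self-contained sketch: the Haskell declarations for Source, Address and Instr, and the helpers renderAddr, renderInstr, listing and example, are assumptions (the slides do not give them). Only compile follows the slide, using Sum for source additions and Add for the ADD instruction.

```haskell
-- Assumed declarations: the slides do not define these types.
type Name  = String
type Value = Int

data Source = Const Value | Var Name | Sum Source Source
  deriving (Show, Eq)

-- Ac is the accumulator, Reg i a scratch register, Map x the memory cell for x.
data Address = Ac | Reg Int | Map Name
  deriving (Show, Eq)

data Instr = Li Value | Load Address | Sto Address | Add Address
  deriving (Show, Eq)

type Target = [Instr]

-- The compiler from the slide.
compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) = compile t e1
                     ++ [Sto (Reg t)]
                     ++ compile (t + 1) e2
                     ++ [Add (Reg t)]

-- Hypothetical pretty-printer: registers are shown relative to the
-- base register t0, as "R[t + k]".
renderAddr :: Int -> Address -> String
renderAddr _  Ac      = "ac"
renderAddr t0 (Reg r) = "R[t + " ++ show (r - t0) ++ "]"
renderAddr _  (Map x) = "M[" ++ x ++ "]"

renderInstr :: Int -> Instr -> String
renderInstr _  (Li n)   = "LI " ++ show n
renderInstr t0 (Load a) = "LOAD " ++ renderAddr t0 a
renderInstr t0 (Sto a)  = "STO " ++ renderAddr t0 a
renderInstr t0 (Add a)  = "ADD " ++ renderAddr t0 a

-- (x + 3) + (x + (y + 2))
example :: Source
example = Sum (Sum (Var "x") (Const 3))
              (Sum (Var "x") (Sum (Var "y") (Const 2)))

-- Numbered listing of the code for e, compiled with base register t0.
listing :: Int -> Source -> [String]
listing t0 e = zipWith line [1 :: Int ..] (compile t0 e)
  where line n i = show n ++ " " ++ renderInstr t0 i
```

Loading this into GHCi and evaluating `mapM_ putStrLn (listing 0 example)` prints the thirteen instructions of the listing, one per line.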
7.
Why use high-level languages?
Rapid development
Easier to understand, maintain and modify
Less likely to make mistakes
Easier to reason about and infer properties
Architecture portability
But...
8.
Can you trust your compiler?
Use a compiler to translate from a high-level language to a
low-level one
Compilers are programs (generally) written by people
People make mistakes
A buggy compiler can silently turn “a correct program into an incorrect
executable” [Leroy09]
GHC 6.10.x is ≈ 800,000 lines of code and has had 737 bugs
reported in its bug tracker as of 04/12/2009 [GHC]
Can we formally verify a compiler?
9.
McCarthy and Painter, 1967
“Correctness of a compiler for arithmetic expressions”
[McCart67]
Describe, in first-order predicate logic:
Source language semantics
Target language semantics
A compilation process
Reason that the compiler maintains semantic equivalence
10.
McCarthy and Painter, 1967
Semantic equivalence in [McCart67]
∀e ∈ Expressions, ∀µ ∈ Variable Mappings •
source(e, µ) ≡ acValue(target(compile(e), construct(µ)))
Very limited: small toy source and target languages
Proof performed by hand
Logical framework and proof presented in under ten pages
Shows that proving a compiler correct is possible
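The quantifier ranges over infinitely many expressions and mappings, so a machine can only sample it. A cheap sanity check before attempting a proof is bounded exhaustive testing: evaluate both sides for every small expression. This sketch uses assumed Haskell renderings of the definitions given on the following slides; the type names, the depth bound, the value ranges and the base registers tried are all assumptions. Testing like this can find a counterexample but cannot establish the equivalence; that is what the proof is for.

```haskell
-- Assumed declarations: the slides do not define these types.
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Sum Source Source deriving Show
data Address = Ac | Reg Int | Map Name deriving (Show, Eq)
data Instr   = Li Value | Load Address | Sto Address | Add Address

type Target   = [Instr]
type Abstract = Name -> Value     -- variable mapping (mu on the slide)
type Concrete = Address -> Value  -- machine state

compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

source :: Source -> Abstract -> Value
source (Const n) _ = n
source (Var v)   s = s v
source (Sum x y) s = source x s + source y s

-- Initial machine state: only variable cells are defined.
construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ a       = error ("read of uninitialised " ++ show a)

write :: Address -> Value -> Concrete -> Concrete
write k v s k' = if k == k' then v else s k'

target :: Target -> Concrete -> Concrete
target []       s = s
target (i : is) s = target is $ case i of
  Li n   -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r  -> write r (s Ac) s
  Add r  -> write Ac (s Ac + s r) s

-- Every expression of nesting depth <= d over constants 0..2 and variables x, y.
exprs :: Int -> [Source]
exprs d
  | d <= 0    = leaves
  | otherwise = leaves ++ [Sum a b | a <- smaller, b <- smaller]
  where
    leaves  = map Const [0 .. 2] ++ map Var ["x", "y"]
    smaller = exprs (d - 1)

-- Counterexamples to  source(e, mu) == acValue(target(compile(e), construct(mu)))
-- over all small e, all mu with x, y in 0..2, and base registers 0 and 5.
counterexamples :: Int -> [(Source, Value, Value, Int)]
counterexamples d =
  [ (e, vx, vy, t)
  | e  <- exprs d
  , vx <- [0 .. 2]
  , vy <- [0 .. 2]
  , t  <- [0, 5]
  , let mu v = if v == "x" then vx else vy
  , target (compile t e) (construct mu) Ac /= source e mu
  ]
```

With depth 2 this checks 905 expressions under nine mappings and two base registers; `counterexamples 2` should be the empty list.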
11.
Proving the [McCart67] compiler
target (compile t x) (construct s) Ac ≡ source x s
type Abstract = Name -> Value
type Concrete = Address -> Value
construct s = \(Map v) -> s v
write k v s = \k' -> if k == k' then v else s k'
-- Semantics for the source language
source :: Source -> Abstract -> Value
source (Const n) _ = n
source (Var v)   s = s v
source (Sum x y) s = source x s + source y s
-- Semantics for the target language
target :: Target -> Concrete -> Concrete
target []       s = s
target (i : is) s = target is $ case i of
    Li n   -> write Ac n s
    Load r -> write Ac (s r) s
    Sto r  -> write r (s Ac) s
    Add r  -> write Ac (s Ac + s r) s
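A runnable sketch of the semantics above, with the missing declarations filled in by assumption (Name = String, Value = Int, and the Source, Address and Instr types are guesses consistent with the slides; it uses Sum for source addition and Add for the ADD instruction, as in the compile function). Running the example with x = 5 and y = 7 shows both sides of the correctness statement, and also what the scratch registers hold afterwards. The helper names mu, finalState and observe are hypothetical.

```haskell
-- Assumed declarations: the slides do not define these types.
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Show, Eq)
data Instr   = Li Value | Load Address | Sto Address | Add Address

type Target   = [Instr]
type Abstract = Name -> Value
type Concrete = Address -> Value

compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

-- As on the slide, only variable cells are defined initially; reading any
-- other address before writing it is reported as an error.
construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ a       = error ("read of uninitialised " ++ show a)

write :: Address -> Value -> Concrete -> Concrete
write k v s k' = if k == k' then v else s k'

source :: Source -> Abstract -> Value
source (Const n) _ = n
source (Var v)   s = s v
source (Sum x y) s = source x s + source y s

target :: Target -> Concrete -> Concrete
target []       s = s
target (i : is) s = target is $ case i of
  Li n   -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r  -> write r (s Ac) s
  Add r  -> write Ac (s Ac + s r) s

-- (x + 3) + (x + (y + 2))
example :: Source
example = Sum (Sum (Var "x") (Const 3))
              (Sum (Var "x") (Sum (Var "y") (Const 2)))

-- Hypothetical variable mapping: x = 5, every other variable = 7.
mu :: Abstract
mu v = if v == "x" then 5 else 7

finalState :: Concrete
finalState = target (compile 0 example) (construct mu)

-- Read a list of addresses from a machine state.
observe :: [Address] -> Concrete -> [(Address, Value)]
observe addrs st = [ (a, st a) | a <- addrs ]
```

Here `source example mu` is 22, and `observe [Ac, Reg 0, Reg 1, Reg 2] finalState` evaluates to `[(Ac,22),(Reg 0,8),(Reg 1,5),(Reg 2,7)]`: the accumulator agrees with the source semantics, while the registers keep the intermediate values x + 3, x and y.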
12.
Proving the [McCart67] compiler
Proof of correctness for constants
{ case where ‘x = Const n’ }
target (compile t (Const n)) (construct s) Ac
= { inline ‘compile’ }
target [Li n] (construct s) Ac
= { inline ‘target’ }
write Ac n (construct s) Ac
= { inline ‘write’ }
n
= { definition of ‘source’ }
source (Const n) s
13.
Proving the [McCart67] compiler
Proof of correctness for variables
{ case where ‘x = Var v’ }
target (compile t (Var v)) (construct s) Ac
= { inline ‘compile’ }
target [Load (Map v)] (construct s) Ac
= { inline ‘target’ }
write Ac (construct s (Map v)) (construct s) Ac
= { inline ‘write’ }
construct s (Map v)
= { inline ‘construct’ }
s v
= { definition of ‘source’ }
source (Var v) s
14.
Assumed lemmas
Untouched Registers lemma
Any expression x, compiled to use registers t and above, will not
write to a register less than t. Therefore:
r < t ⇒ target (compile t x) s (Reg r) ≡ s (Reg r)
Untouched Variables lemma
The compiled form of expression x will never write to a memory
location mapped to a variable. Therefore:
target (compile t x) s (Map v) ≡ s (Map v)
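The two lemmas are assumed rather than proved here. Before relying on them, they can at least be exercised on every small expression. In this sketch the declarations, the fully defined start state `initial`, the enumeration `exprs`, and the choice t = 3 are all assumptions; passing the check is evidence, not proof.

```haskell
-- Assumed declarations: the slides do not define these types.
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Sum Source Source deriving Show
data Address = Ac | Reg Int | Map Name deriving (Show, Eq)
data Instr   = Li Value | Load Address | Sto Address | Add Address

type Target   = [Instr]
type Concrete = Address -> Value

compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

write :: Address -> Value -> Concrete -> Concrete
write k v s k' = if k == k' then v else s k'

target :: Target -> Concrete -> Concrete
target []       s = s
target (i : is) s = target is $ case i of
  Li n   -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r  -> write r (s Ac) s
  Add r  -> write Ac (s Ac + s r) s

-- A fully defined start state (the values are arbitrary), so that
-- registers below t can be read after running the code.
initial :: Concrete
initial Ac      = -1
initial (Reg r) = 100 + r
initial (Map v) = if v == "x" then 5 else 7

-- Every expression of depth <= d over constants 0..2 and variables x, y.
exprs :: Int -> [Source]
exprs d
  | d <= 0    = leaves
  | otherwise = leaves ++ [Sum a b | a <- smaller, b <- smaller]
  where
    leaves  = map Const [0 .. 2] ++ map Var ["x", "y"]
    smaller = exprs (d - 1)

-- Untouched Registers: registers below t keep their values.
registerViolations :: Int -> [(Source, Int)]
registerViolations t =
  [ (e, r) | e <- exprs 2, r <- [0 .. t - 1]
           , target (compile t e) initial (Reg r) /= initial (Reg r) ]

-- Untouched Variables: variable cells keep their values.
variableViolations :: Int -> [(Source, Name)]
variableViolations t =
  [ (e, v) | e <- exprs 2, v <- ["x", "y", "z"]
           , target (compile t e) initial (Map v) /= initial (Map v) ]
```

Both `registerViolations 3` and `variableViolations 3` should be empty. Register t itself is not protected: compiling x + 3 with t = 3 leaves the value of x in Reg 3.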
15.
Proving the [McCart67] compiler
Proof of correctness for addition
{ case where the expression is ‘Sum x y’ }
target (compile t (Sum x y)) (construct s) Ac
= { inline ‘compile’ and ‘target’ }
let s1 = target (compile t x) (construct s)
    s2 = write (Reg t) (s1 Ac) s1
    s3 = target (compile (t + 1) y) s2
in write Ac (s3 Ac + s3 (Reg t)) s3 Ac
= { Untouched Registers and Untouched Variables lemmas, commutativity of +; inline ‘write’ }
target (compile t x) (construct s) Ac +
target (compile (t + 1) y) (construct s) Ac
= { inductive hypothesis (structural induction) }
source x s + source y s
= { definition of ‘source’ }
source (Sum x y) s
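The `let` in the proof names three intermediate machine states. Evaluating them for one concrete expression makes the role of the lemmas visible: s3 (Reg t) still holds the value of x because the code for y only writes registers from t + 1 upwards, and s3 Ac matches what y's code computes from the initial state because that code reads only variable cells and registers it has written itself. The declarations and the helper name additionCase are assumptions for this sketch.

```haskell
-- Assumed declarations: the slides do not define these types.
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Show, Eq)
data Instr   = Li Value | Load Address | Sto Address | Add Address

type Target   = [Instr]
type Abstract = Name -> Value
type Concrete = Address -> Value

compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ a       = error ("read of uninitialised " ++ show a)

write :: Address -> Value -> Concrete -> Concrete
write k v s k' = if k == k' then v else s k'

source :: Source -> Abstract -> Value
source (Const n) _ = n
source (Var v)   s = s v
source (Sum x y) s = source x s + source y s

target :: Target -> Concrete -> Concrete
target []       s = s
target (i : is) s = target is $ case i of
  Li n   -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r  -> write r (s Ac) s
  Add r  -> write Ac (s Ac + s r) s

-- The states s1, s2, s3 from the addition case, for concrete t, x, y, s.
-- Returns (s1 Ac, s3 (Reg t), s3 Ac, final accumulator value).
additionCase :: Int -> Source -> Source -> Abstract -> (Value, Value, Value, Value)
additionCase t x y s = (s1 Ac, s3 (Reg t), s3 Ac, result)
  where
    s1     = target (compile t x) (construct s)
    s2     = write (Reg t) (s1 Ac) s1
    s3     = target (compile (t + 1) y) s2
    result = write Ac (s3 Ac + s3 (Reg t)) s3 Ac
```

For x = x + 3, y = x + (y + 2), t = 0 and the mapping x ↦ 5, y ↦ 7, additionCase returns (8, 8, 14, 22): the second component equals the first, as Untouched Registers predicts, and 14 + 8 is the source value 22.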
16.
Milner and Weyhrauch, 1972
“Proving compiler correctness in a mechanised logic”
[Milner72]
Provide an LCF machine-checked proof of the
McCarthy-Painter example
Proceed towards mechanically proving a compiler for a more
complex language to a stack machine
Claim to have “no significant doubt that the remainder of the
proof can be done on machine” [Milner72]
17.
Morris, 1973
“Advice on structuring compilers and proving them correct”
[Morris73]
Proves by hand the correctness of a compiler for a source
language that contains assignment, conditionals, loops,
arithmetic, boolean operations and local definitions
“Essence” of the advice presented in [Morris73]
                      compile
Source language  ------------->  Target language
       |                                |
Source semantics                 Target semantics
       |                                |
       v                                v
Source meanings  <-------------  Target meanings
                      decode
18.
Thatcher, Wagner and Wright, 1980
Advice presented in [Thatch80]
                      compile
Source language  ------------->  Target language
       |                                |
Source semantics                 Target semantics
       |                                |
       v                                v
Source meanings  ------------->  Target meanings
                      encode
“More on advice on structuring compilers and proving them
correct” [Thatch80]
Provides a different encoding of the target language from that of
[Morris73]
Claim that mechanised theorem proving tools required further
development
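[Thatch80] work with algebras over a much richer language; the following is only a toy analogue of the encode square for the arithmetic compiler. The `encode` function is an assumed definition, not theirs: it runs the source meaning on the variables held in memory and stores the result in the accumulator. Because the compiled code also writes scratch registers, the square here commutes only up to what is observable (the accumulator and the variable cells). All declarations and helper names are hypothetical.

```haskell
-- Assumed declarations: the slides do not define these types.
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Show, Eq)
data Instr   = Li Value | Load Address | Sto Address | Add Address

type Target   = [Instr]
type Abstract = Name -> Value
type Concrete = Address -> Value

-- Source meanings map variable environments to values;
-- target meanings are machine state transformers.
type SourceMeaning = Abstract -> Value
type TargetMeaning = Concrete -> Concrete

compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

source :: Source -> SourceMeaning
source (Const n) _ = n
source (Var v)   s = s v
source (Sum x y) s = source x s + source y s

write :: Address -> Value -> Concrete -> Concrete
write k v s k' = if k == k' then v else s k'

target :: Target -> TargetMeaning
target []       s = s
target (i : is) s = target is $ case i of
  Li n   -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r  -> write r (s Ac) s
  Add r  -> write Ac (s Ac + s r) s

-- The variable environment held in a machine state's memory cells.
abstractOf :: Concrete -> Abstract
abstractOf st v = st (Map v)

-- Hypothetical encode: run the source meaning on the variables in memory
-- and leave the result in the accumulator, changing nothing else.
encode :: SourceMeaning -> TargetMeaning
encode m st = write Ac (m (abstractOf st)) st

-- Compare two states on the accumulator and the given variable cells only.
agreeOn :: [Name] -> Concrete -> Concrete -> Bool
agreeOn vs a b = a Ac == b Ac && all (\v -> a (Map v) == b (Map v)) vs

-- target . compile t  versus  encode . source, observed on Ac, x and y.
commutes :: Int -> Source -> Concrete -> Bool
commutes t e st = agreeOn ["x", "y"] (target (compile t e) st) (encode (source e) st)

-- Start states with arbitrary register contents and x, y in memory.
startState :: Value -> Value -> Concrete
startState vx vy a = case a of
  Map "x" -> vx
  Map "y" -> vy
  Map _   -> 0
  Reg r   -> 1000 + r
  Ac      -> -1

-- Every expression of depth <= d over constants 0..2 and variables x, y.
exprs :: Int -> [Source]
exprs d
  | d <= 0    = leaves
  | otherwise = leaves ++ [Sum a b | a <- smaller, b <- smaller]
  where
    leaves  = map Const [0 .. 2] ++ map Var ["x", "y"]
    smaller = exprs (d - 1)
```

Choosing observational equality instead of equality of whole states is the design decision that makes this version of the square hold; comparing registers as well would make it fail for every Sum expression.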
19.
Syntax of source language in [Thatch80]
ae ::= integer constant
     | variable
     | - ae
     | Pr ae
     | Su ae
     | ae + ae
     | ae − ae
     | ae × ae
     | if be then ae else ae
     | st result ae
     | let variable be ae in ae
be ::= boolean constant
     | even ae
     | ae ≤ ae
     | ae ≥ ae
     | ae = ae
     | ¬ be
     | be ∧ be
     | be ∨ be
st ::= continue
     | variable := ae
     | if be then st else st
     | st ; st
     | while be do st
n.b. Similar to [Milner72] and [Morris73] but with more operators and sequential
composition. Struggling to fit this onto one slide.
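One way to read this grammar is as three mutually recursive Haskell data types, one per syntactic category. The constructor names are assumptions, and the slide does not say what Pr and Su denote, so they are kept as opaque unary operators. The hypothetical size functions show how a traversal has to follow the mutual recursion between expressions and statements (for example through `st result ae`).

```haskell
type Variable = String

-- Arithmetic expressions (ae).
data AExp
  = IntConst Integer
  | AVar Variable
  | Neg AExp                -- - ae
  | Pr AExp                 -- Pr ae (meaning not given on the slide)
  | Su AExp                 -- Su ae (meaning not given on the slide)
  | Plus AExp AExp          -- ae + ae
  | Minus AExp AExp         -- ae - ae
  | Times AExp AExp         -- ae * ae
  | AIf BExp AExp AExp      -- if be then ae else ae
  | Result Stmt AExp        -- st result ae
  | Let Variable AExp AExp  -- let variable be ae in ae
  deriving Show

-- Boolean expressions (be).
data BExp
  = BoolConst Bool
  | Even AExp
  | Leq AExp AExp
  | Geq AExp AExp
  | Equal AExp AExp
  | Not BExp
  | And BExp BExp
  | Or BExp BExp
  deriving Show

-- Statements (st).
data Stmt
  = Continue
  | Assign Variable AExp
  | SIf BExp Stmt Stmt
  | Seq Stmt Stmt
  | While BExp Stmt
  deriving Show

-- Number of syntax-tree nodes, counting one per constructor.
sizeA :: AExp -> Int
sizeA e = 1 + case e of
  IntConst _ -> 0
  AVar _     -> 0
  Neg a      -> sizeA a
  Pr a       -> sizeA a
  Su a       -> sizeA a
  Plus a b   -> sizeA a + sizeA b
  Minus a b  -> sizeA a + sizeA b
  Times a b  -> sizeA a + sizeA b
  AIf c a b  -> sizeB c + sizeA a + sizeA b
  Result s a -> sizeS s + sizeA a
  Let _ a b  -> sizeA a + sizeA b

sizeB :: BExp -> Int
sizeB b = 1 + case b of
  BoolConst _ -> 0
  Even a      -> sizeA a
  Leq a c     -> sizeA a + sizeA c
  Geq a c     -> sizeA a + sizeA c
  Equal a c   -> sizeA a + sizeA c
  Not c       -> sizeB c
  And c d     -> sizeB c + sizeB d
  Or c d      -> sizeB c + sizeB d

sizeS :: Stmt -> Int
sizeS s = 1 + case s of
  Continue     -> 0
  Assign _ a   -> sizeA a
  SIf c t f    -> sizeB c + sizeS t + sizeS f
  Seq t u      -> sizeS t + sizeS u
  While c body -> sizeB c + sizeS body

-- x := 10 ; while ¬ (x ≤ 0) do x := x − 1
countdown :: Stmt
countdown =
  Seq (Assign "x" (IntConst 10))
      (While (Not (Leq (AVar "x") (IntConst 0)))
             (Assign "x" (Minus (AVar "x") (IntConst 1))))
```

Here `sizeS countdown` is 12, and wrapping it as `Result countdown (AVar "x")` gives an expression of size 14.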
20.
The “structuring compilers” series
Discuss constructing algebras to describe language syntax and
meaning
The abstract syntaxes of the languages are initial algebras
The semantics are the unique homomorphisms from syntaxes to
meanings
The compiler is the unique homomorphism between source
and target syntaxes
“... reduces to a proof that encode is a homomorphism ...”
[Thatch80]
“No structural induction is required ...” [Thatch80]
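In Haskell terms, the initial-algebra view says that the syntax type comes with a fold, and any semantics given as one operation per constructor is the unique homomorphism obtained by folding with it. The sketch below applies this to the small arithmetic language, not to the [Thatch80] language; ExprAlg, foldExpr, sourceAlg and compileAlg are hypothetical names. It also simplifies the compiler to a fold into "code parameterised by the first free register" rather than a homomorphism into a target-syntax algebra.

```haskell
-- Assumed declarations: the slides do not define these types.
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Show, Eq)
data Instr   = Li Value | Load Address | Sto Address | Add Address
  deriving (Show, Eq)

type Target   = [Instr]
type Abstract = Name -> Value

-- An algebra for the signature of Source: one operation per constructor.
data ExprAlg a = ExprAlg
  { onConst :: Value -> a
  , onVar   :: Name -> a
  , onSum   :: a -> a -> a
  }

-- Reading Source as the initial algebra, foldExpr alg is the unique
-- homomorphism from Source into alg.
foldExpr :: ExprAlg a -> Source -> a
foldExpr alg (Const n) = onConst alg n
foldExpr alg (Var v)   = onVar alg v
foldExpr alg (Sum x y) = onSum alg (foldExpr alg x) (foldExpr alg y)

-- Source semantics: an algebra whose carrier is Abstract -> Value.
sourceAlg :: ExprAlg (Abstract -> Value)
sourceAlg = ExprAlg
  { onConst = \n _ -> n
  , onVar   = \v s -> s v
  , onSum   = \f g s -> f s + g s
  }

-- Compiler: an algebra whose carrier is code parameterised by the
-- first free register t.
compileAlg :: ExprAlg (Int -> Target)
compileAlg = ExprAlg
  { onConst = \n _ -> [Li n]
  , onVar   = \v _ -> [Load (Map v)]
  , onSum   = \c1 c2 t -> c1 t ++ [Sto (Reg t)] ++ c2 (t + 1) ++ [Add (Reg t)]
  }

-- The directly recursive compiler, for comparison.
compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

-- (x + 3) + (x + (y + 2))
example :: Source
example = Sum (Sum (Var "x") (Const 3))
              (Sum (Var "x") (Sum (Var "y") (Const 2)))
```

No induction appears in the definitions of sourceAlg and compileAlg: all recursion lives once, in foldExpr, which is the practical payoff of the algebraic presentation.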
21.
Meijer, 1994
“More advice on proving a compiler correct: Improve a correct
compiler” [Meijer94]
Given an interpreter for a source language, can we transform
it into a compiler to, and a residual interpreter for, the target
language?
A functional decomposition problem (i.e.
interpreter = emulator ◦ compiler)
Demonstrates this technique for a first-order imperative
language compiling to a three-address code machine
While quite feasible for first-order languages, the technique becomes far
more difficult to apply to higher-order languages
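The decomposition can be illustrated on the small arithmetic language. [Meijer94] derives the compiler by transforming the interpreter; this sketch does not perform that derivation, it only checks the factorisation interpreter = emulator ◦ compiler on a few programs. The point of the split is visible in the types: the compiler sees only the program, while the emulator does all the work that depends on the variable mapping. All declarations and function names here are assumptions.

```haskell
-- Assumed declarations: the slides do not define these types.
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Show, Eq)
data Instr   = Li Value | Load Address | Sto Address | Add Address

type Target   = [Instr]
type Abstract = Name -> Value
type Concrete = Address -> Value

-- The interpreter: evaluates source programs directly.
interpreter :: Source -> Abstract -> Value
interpreter (Const n) _ = n
interpreter (Var v)   s = s v
interpreter (Sum x y) s = interpreter x s + interpreter y s

-- The compiler: depends only on the program, never on the mapping.
compiler :: Source -> Target
compiler = compile 0

compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

-- The emulator (residual interpreter): runs target code on the machine
-- state built from the mapping and reads the accumulator.
emulator :: Target -> Abstract -> Value
emulator code s = target code (construct s) Ac

construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ a       = error ("read of uninitialised " ++ show a)

write :: Address -> Value -> Concrete -> Concrete
write k v s k' = if k == k' then v else s k'

target :: Target -> Concrete -> Concrete
target []       s = s
target (i : is) s = target is $ case i of
  Li n   -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r  -> write r (s Ac) s
  Add r  -> write Ac (s Ac + s r) s

-- interpreter = emulator . compiler, checked for one program and mapping.
factorises :: Source -> Abstract -> Bool
factorises e s = interpreter e s == (emulator . compiler) e s
```

Because compiler does not take the mapping, its output can be computed once and reused by the emulator for every input, which is the staging that distinguishes a compiler from an interpreter.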
22.
Berghofer and Strecker, 2003
“Extracting a formally verified, fully executable compiler from
a proof assistant” [Bergho03]
Proves a compiler for a subset of the Java source language to
Java bytecode
Includes typechecking, abstract syntax tree annotation and
bytecode translation
Isabelle/HOL used to prove properties about an abstract
compiler
Isabelle code extraction to produce an executable compiler
23.
Dave, 2003
(Figure: papers listed against decade of publication)
Maulik A. Dave’s bibliography for “Compiler
Verification” [Dave03]
Ninety-nine papers listed
Ninety-one of those listed were published after 1990
Interestingly, neither the Milner and Weyhrauch paper
nor the Meijer paper is included
24.
Recent work
Leroy’s “A formally verified compiler back-end” [Leroy09]
Proves a compiler for Cminor to PowerPC assembler
Chlipala’s “A verified compiler for an impure functional
language” [Chlipa10]
For a toy (but still quite feature-rich) functional source
language to instructions for a register-based machine
Both use the Coq proof assistant and code extraction
Both decompose the problem into compilation to several
intermediate languages
Both express worries that the proof assistant itself may contain
bugs that would invalidate correctness
25.
Conclusions
Compilers have been proved correct for progressively larger
source languages
A variety of different techniques are available for ensuring
semantic equivalence
It rapidly became apparent that some kind of proof assistant is
required
Decomposition of large compilers is a key factor for success
Programs are only verified when all surrounding elements are
verified
26.
Open questions
What about compilers for larger target languages and more
advanced compilation facilities?
Are our mechanised assistants producing valid proofs?
Are there other ways to decompose the compiler verification problem?
Are particular language paradigms more amenable to compiler
verification?
Why haven’t the concepts of [Meijer94] been more widely
used?
27.
More information
Slides and bibliography will be made available at:
http://www-users.cs.york.ac.uk/~jason/
Jason S. Reich
<jason@cs.york.ac.uk>