# Formal Verification of Programming Languages

Now with more Haskell and proof!


### Formal Verification of Programming Language Implementations

Ph.D. Literature Seminar · Jason S. Reich <jason@cs.york.ac.uk> · University of York · 11th January 2010

Outline: Motivation · 1960s · Proof · 1970s · 1980s · 1990s · 2000s · Conclusions

### Compiling an arithmetic language

Compile from a simple arithmetic language to machine code for a simple register machine. Example taken from [McCart67].

Source language:

- Numeric constants
- Variables
- Addition, e.g. (x + 3) + (x + (y + 2))

Target language:

- LI: load an immediate value into the accumulator (ac)
- LOAD: load into ac from an address or register
- STO: store the value of ac to an address or register
- ADD: add a register's value to ac

### Compiling an arithmetic language: the compiler in Haskell

    -- t is the first register the generated code may use for temporaries
    compile :: Int -> Source -> Target
    compile t (Const v)   = [Li v]
    compile t (Var x)     = [Load (Map x)]
    compile t (Sum e1 e2) = compile t e1 ++ [Sto (Reg t)]
                         ++ compile (t + 1) e2 ++ [Add (Reg t)]

When compiled and executed, is the value in the accumulator the result of the source arithmetic expression?

### Compiling an arithmetic language: an example

How is (x + 3) + (x + (y + 2)) compiled to machine code?

1. `LOAD M[x]`
2. `STO R[t + 0]`
3. `LI 3`
4. `ADD R[t + 0]`
5. `STO R[t + 0]`
6. `LOAD M[x]`
7. `STO R[t + 1]`
8. `LOAD M[y]`
9. `STO R[t + 2]`
10. `LI 2`
11. `ADD R[t + 2]`
12. `ADD R[t + 1]`
13. `ADD R[t]`

n.b. M is a mapping of variable names to memory locations, and R is an indexing of registers.

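The listing above can be reproduced mechanically. The sketch below makes the compiler from the previous slide runnable and adds a printer in the slide's notation. The data types are assumptions, since the slides never show them: `Value` is `Integer`, `Name` is `String`, and an `Address` is the accumulator `Ac`, a register `Reg t` or a variable location `Map x`. `showAddr`, `showInstr`, `listing` and `example` are hypothetical helpers.

```haskell
-- Assumed representations (not shown on the slides).
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source
  deriving (Show, Eq)

-- Locations: the accumulator, numbered registers, and variable locations.
data Address = Ac | Reg Int | Map Name
  deriving (Show, Eq)

data Instr = Li Value | Load Address | Sto Address | Add Address
  deriving (Show, Eq)

type Target = [Instr]

-- The compiler from the slide; t is the first register free for temporaries.
compile :: Int -> Source -> Target
compile _ (Const v) = [Li v]
compile _ (Var x) = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

-- Print an address in the slide's notation, with registers relative to base t.
showAddr :: Int -> Address -> String
showAddr _ Ac = "AC"
showAddr t (Reg r) = "R[t + " ++ show (r - t) ++ "]"
showAddr _ (Map x) = "M[" ++ x ++ "]"

showInstr :: Int -> Instr -> String
showInstr _ (Li v) = "LI " ++ show v
showInstr t (Load a) = "LOAD " ++ showAddr t a
showInstr t (Sto a) = "STO " ++ showAddr t a
showInstr t (Add a) = "ADD " ++ showAddr t a

-- Number the compiled instructions from 1, one string per instruction.
listing :: Int -> Source -> [String]
listing t e = zipWith line [1 :: Int ..] (compile t e)
  where
    line n i = show n ++ " " ++ showInstr t i

-- (x + 3) + (x + (y + 2))
example :: Source
example = Sum (Sum (Var "x") (Const 3)) (Sum (Var "x") (Sum (Var "y") (Const 2)))
```

In GHCi, `mapM_ putStrLn (listing 0 example)` prints the thirteen instructions in the order shown on the slide, with the final `ADD R[t]` rendered as `ADD R[t + 0]`. Because registers are printed relative to `t`, the listing is the same for every base register.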
### Why use high-level languages?

- Rapid development
- Easier to understand, maintain and modify
- Less likely to make mistakes
- Easier to reason about and infer properties
- Architecture portability

But...

### Can you trust your compiler?

- We use a compiler to translate from a high-level language to a low-level one.
- Compilers are programs, (generally) written by people.
- People make mistakes.
- A buggy compiler can silently turn “a correct program into an incorrect executable” [Leroy09].
- GHC 6.10.x is ≈ 800,000 lines of code and has had 737 bugs reported in its bug tracker as of 04/12/2009 [GHC].

Can we formally verify a compiler?

### McCarthy and Painter, 1967

“Correctness of a compiler for arithmetic expressions” [McCart67]

- Describe, in first-order predicate logic:
  - the source language semantics,
  - the target language semantics, and
  - a compilation process.
- Reason that the compiler maintains semantic equivalence.

### McCarthy and Painter, 1967 (continued)

Semantic equivalence in [McCart67]:

∀e ∈ Expressions, ∀µ ∈ Variable Mappings • source(e, µ) ≡ acValue(target(compile(e), construct(µ)))

- A very limited, small toy source and target language
- Proof performed by hand
- Logical framework and proof presented in under ten pages
- Shows that proving a compiler correct is possible

### Proving the [McCart67] compiler

The property to prove is `target (compile t x) (construct s) Ac ≡ source x s`, where:

    type Abstract = Name -> Value
    type Concrete = Address -> Value

    -- Build a machine state from a source-level store
    construct s = \(Map v) -> s v

    -- Functional update of state s at location k
    write k v s = \k' -> if k == k' then v else s k'

    -- Semantics for the source language
    source :: Source -> Abstract -> Value
    source (Const n) _ = n
    source (Var v)   s = s v
    source (Sum x y) s = source x s + source y s

    -- Semantics for the target language
    target :: Target -> Concrete -> Concrete
    target []       s = s
    target (i : is) s = target is $ case i of
      Li n   -> write Ac n s
      Load r -> write Ac (s r) s
      Sto r  -> write r (s Ac) s
      Add r  -> write Ac (s Ac + s r) s

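Once the data types are supplied, these definitions can be run. The types are assumptions, as in the listing sketch: `Integer` values, `String` names and an `Address` type with `Ac`, `Reg` and `Map`. In this sketch `construct` raises an explicit error when a non-variable location is read, rather than relying on the slide's partial lambda, and `example` and `store` are hypothetical inputs used only to exercise the semantics.

```haskell
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Eq, Show)
data Instr = Li Value | Load Address | Sto Address | Add Address
type Target = [Instr]

type Abstract = Name -> Value    -- source-level store
type Concrete = Address -> Value -- machine state

compile :: Int -> Source -> Target
compile _ (Const v) = [Li v]
compile _ (Var x) = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

-- Only variable locations are initialised; reading anything else is an error.
construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ a = error ("read of uninitialised location " ++ show a)

-- Functional update of state s at location k.
write :: Address -> Value -> Concrete -> Concrete
write k v s = \k' -> if k == k' then v else s k'

source :: Source -> Abstract -> Value
source (Const n) _ = n
source (Var v) s = s v
source (Sum x y) s = source x s + source y s

-- Run the instructions left to right, threading the machine state.
target :: Target -> Concrete -> Concrete
target [] s = s
target (i : is) s = target is $ case i of
  Li n -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r -> write r (s Ac) s
  Add r -> write Ac (s Ac + s r) s

-- (x + 3) + (x + (y + 2))
example :: Source
example = Sum (Sum (Var "x") (Const 3)) (Sum (Var "x") (Sum (Var "y") (Const 2)))

-- Hypothetical store: x = 5, y = 7, every other variable 0.
store :: Abstract
store "x" = 5
store "y" = 7
store _ = 0
```

Here `source example store` and `target (compile 0 example) (construct store) Ac` both evaluate to 22. Agreement on one input is evidence for the theorem, not a proof of it.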
### Proving the [McCart67] compiler: correctness for constants

Case `x = Const n`:

      target (compile t (Const n)) (construct s) Ac
    = { inline compile }
      target [Li n] (construct s) Ac
    = { inline target }
      write Ac n (construct s) Ac
    = { inline write }
      n
    = { definition of source }
      source (Const n) s

### Proving the [McCart67] compiler: correctness for variables

Case `x = Var v`:

      target (compile t (Var v)) (construct s) Ac
    = { inline compile }
      target [Load (Map v)] (construct s) Ac
    = { inline target }
      write Ac (construct s (Map v)) (construct s) Ac
    = { inline write }
      construct s (Map v)
    = { inline construct }
      s v
    = { definition of source }
      source (Var v) s

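Testing is no substitute for these derivations, but evaluating every line of them on concrete inputs is a cheap way to catch a mis-transcribed step. The sketch below reuses the assumed definitions from the semantics sketch; `constSteps`, `varSteps`, `allEqual` and `store` are hypothetical names, and each list holds the successive lines of one derivation.

```haskell
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Eq, Show)
data Instr = Li Value | Load Address | Sto Address | Add Address
type Target = [Instr]
type Abstract = Name -> Value
type Concrete = Address -> Value

compile :: Int -> Source -> Target
compile _ (Const v) = [Li v]
compile _ (Var x) = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ a = error ("read of uninitialised location " ++ show a)

write :: Address -> Value -> Concrete -> Concrete
write k v s = \k' -> if k == k' then v else s k'

source :: Source -> Abstract -> Value
source (Const n) _ = n
source (Var v) s = s v
source (Sum x y) s = source x s + source y s

target :: Target -> Concrete -> Concrete
target [] s = s
target (i : is) s = target is $ case i of
  Li n -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r -> write r (s Ac) s
  Add r -> write Ac (s Ac + s r) s

-- Every line of the constant-case derivation, in order.
constSteps :: Int -> Value -> Abstract -> [Value]
constSteps t n s =
  [ target (compile t (Const n)) (construct s) Ac -- starting point
  , target [Li n] (construct s) Ac                -- inline compile
  , write Ac n (construct s) Ac                   -- inline target
  , n                                             -- inline write
  , source (Const n) s                            -- definition of source
  ]

-- Every line of the variable-case derivation, in order.
varSteps :: Int -> Name -> Abstract -> [Value]
varSteps t v s =
  [ target (compile t (Var v)) (construct s) Ac      -- starting point
  , target [Load (Map v)] (construct s) Ac            -- inline compile
  , write Ac (construct s (Map v)) (construct s) Ac   -- inline target
  , construct s (Map v)                               -- inline write
  , s v                                               -- inline construct
  , source (Var v) s                                  -- definition of source
  ]

-- True when consecutive lines of a derivation agree.
allEqual :: Eq a => [a] -> Bool
allEqual xs = and (zipWith (==) xs (drop 1 xs))

-- Hypothetical store: x = 5, y = 7, every other variable 0.
store :: Abstract
store "x" = 5
store "y" = 7
store _ = 0
```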
### Assumed lemmas

**Untouched Registers lemma.** Any expression x, compiled to use registers t and above, will not write to a register less than t. Therefore, `r < t ⇒ target (compile t x) s (Reg r) ≡ s (Reg r)`.

**Untouched Variables lemma.** The compiled form of expression x will never write to a memory location mapped to a variable. Therefore, `target (compile t x) s (Map v) ≡ s (Map v)`.

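Both lemmas are only assumed on the slide. The sketch below tests them exhaustively on small inputs rather than proving them: it enumerates every expression up to nesting depth 2 over the leaves `Const 1`, `Var "x"` and `Var "y"` (an arbitrary choice), runs the compiled code from a hypothetical total `initial` state, and compares locations before and after. The data types are the same assumptions as before; `initial`, `exprs`, `untouchedRegisters` and `untouchedVariables` are our own names.

```haskell
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving Eq
data Instr = Li Value | Load Address | Sto Address | Add Address
type Target = [Instr]
type Concrete = Address -> Value

compile :: Int -> Source -> Target
compile _ (Const v) = [Li v]
compile _ (Var x) = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

write :: Address -> Value -> Concrete -> Concrete
write k v s = \k' -> if k == k' then v else s k'

target :: Target -> Concrete -> Concrete
target [] s = s
target (i : is) s = target is $ case i of
  Li n -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r -> write r (s Ac) s
  Add r -> write Ac (s Ac + s r) s

-- A hypothetical total initial state, so that any location can be inspected.
initial :: Concrete
initial Ac = -1
initial (Reg r) = 1000 + fromIntegral r
initial (Map v) = fromIntegral (length v)

-- All expressions up to the given nesting depth over three sample leaves.
exprs :: Int -> [Source]
exprs 0 = [Const 1, Var "x", Var "y"]
exprs d = smaller ++ [Sum a b | a <- smaller, b <- smaller]
  where
    smaller = exprs (d - 1)

-- Untouched Registers lemma at one (t, x): registers below t are unchanged.
untouchedRegisters :: Int -> Source -> Bool
untouchedRegisters t x =
  and [final (Reg r) == initial (Reg r) | r <- [0 .. t - 1]]
  where
    final = target (compile t x) initial

-- Untouched Variables lemma at one (t, x), for a few sample variable names.
untouchedVariables :: Int -> Source -> Bool
untouchedVariables t x =
  and [final (Map v) == initial (Map v) | v <- ["x", "y", "z"]]
  where
    final = target (compile t x) initial
```

Register `t` itself is not protected: compiling `Sum (Const 1) (Const 2)` with `t = 3` stores 1 into `Reg 3`, which is why the lemma only speaks about registers below `t`.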
### Proving the [McCart67] compiler: correctness for addition

Case for an expression of the form `Sum x y`:

      target (compile t (Sum x y)) (construct s) Ac
    = { inline compile and target }
      let s1 = target (compile t x) (construct s)
          s2 = write (Reg t) (s1 Ac) s1
          s3 = target (compile (t + 1) y) s2
      in write Ac (s3 Ac + s3 (Reg t)) s3 Ac
    = { state lemmas, inline writes, commutativity of + }
      target (compile t x) (construct s) Ac + target (compile (t + 1) y) (construct s) Ac
    = { inductive hypothesis (structural induction) }
      source x s + source y s
    = { definition of source }
      source (Sum x y) s

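The key step in this case is the one justified by the state lemmas: after the code for `y` has run, register `t` still holds the value of `x` and the accumulator holds the value of `y`. The sketch below computes the proof's states `s1`, `s2` and `s3` directly and checks those facts on concrete inputs. As before, the data types are assumptions, and `states`, `additionFacts` and `store` are illustrative names.

```haskell
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Eq, Show)
data Instr = Li Value | Load Address | Sto Address | Add Address
type Target = [Instr]
type Abstract = Name -> Value
type Concrete = Address -> Value

compile :: Int -> Source -> Target
compile _ (Const v) = [Li v]
compile _ (Var x) = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ a = error ("read of uninitialised location " ++ show a)

write :: Address -> Value -> Concrete -> Concrete
write k v s = \k' -> if k == k' then v else s k'

source :: Source -> Abstract -> Value
source (Const n) _ = n
source (Var v) s = s v
source (Sum x y) s = source x s + source y s

target :: Target -> Concrete -> Concrete
target [] s = s
target (i : is) s = target is $ case i of
  Li n -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r -> write r (s Ac) s
  Add r -> write Ac (s Ac + s r) s

-- The intermediate states named in the proof of the addition case.
states :: Int -> Source -> Source -> Abstract -> (Concrete, Concrete, Concrete)
states t x y s = (s1, s2, s3)
  where
    s1 = target (compile t x) (construct s)
    s2 = write (Reg t) (s1 Ac) s1
    s3 = target (compile (t + 1) y) s2

-- The facts the proof relies on, evaluated for one instance.
additionFacts :: Int -> Source -> Source -> Abstract -> [Bool]
additionFacts t x y s =
  [ s3 (Reg t) == source x s                    -- x's value survives in Reg t
  , s3 Ac == source y s                         -- y's code leaves its value in Ac
  , write Ac (s3 Ac + s3 (Reg t)) s3 Ac == source (Sum x y) s
  , target (compile t (Sum x y)) (construct s) Ac
      == write Ac (s3 Ac + s3 (Reg t)) s3 Ac    -- the "inline" step
  ]
  where
    (_, _, s3) = states t x y s

-- Hypothetical store: x = 5, y = 7, every other variable 0.
store :: Abstract
store "x" = 5
store "y" = 7
store _ = 0
```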
### Milner and Weyhrauch, 1972

“Proving compiler correctness in a mechanised logic” [Milner72]

- Provide an LCF machine-checked proof of the McCarthy-Painter example
- Proceed towards mechanically proving a compiler for a more complex language to a stack machine
- Claim to have “no significant doubt that the remainder of the proof can be done on machine” [Milner72]

### Morris, 1973

“Advice on structuring compilers and proving them correct” [Morris73]

- Proves by hand the correctness of a compiler for a source language that contains assignment, conditionals, loops, arithmetic, boolean operations and local definitions.

The “essence” of the advice presented in [Morris73] is a commuting square: compile maps the source language to the target language, the source and target semantics map each language to its meanings, and decode maps target meanings back to source meanings. Correctness states that both paths around the square agree:

$$\mathit{decode} \circ \mathit{targetSemantics} \circ \mathit{compile} = \mathit{sourceSemantics}$$

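For the McCarthy-Painter example the square can be instantiated directly. In this sketch `decode` is our own choice: it turns a target meaning (a state transformer) into a source meaning by starting the machine in `construct s` and reading the accumulator. [Morris73] defines decode for its own, richer languages, which is not reproduced here. The check runs over every expression up to depth 2 and three hypothetical stores; the data types and all helper names are assumptions.

```haskell
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source
data Address = Ac | Reg Int | Map Name deriving (Eq, Show)
data Instr = Li Value | Load Address | Sto Address | Add Address
type Target = [Instr]
type Abstract = Name -> Value
type Concrete = Address -> Value

-- Top edge of the square.
compile :: Int -> Source -> Target
compile _ (Const v) = [Li v]
compile _ (Var x) = [Load (Map x)]
compile t (Sum e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Add (Reg t)]

-- Left edge: source language to source meanings.
sourceSemantics :: Source -> Abstract -> Value
sourceSemantics (Const n) _ = n
sourceSemantics (Var v) s = s v
sourceSemantics (Sum x y) s = sourceSemantics x s + sourceSemantics y s

write :: Address -> Value -> Concrete -> Concrete
write k v s = \k' -> if k == k' then v else s k'

-- Right edge: target language to target meanings (state transformers).
targetSemantics :: Target -> Concrete -> Concrete
targetSemantics [] s = s
targetSemantics (i : is) s = targetSemantics is $ case i of
  Li n -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r -> write r (s Ac) s
  Add r -> write Ac (s Ac + s r) s

construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ a = error ("read of uninitialised location " ++ show a)

-- Bottom edge: target meanings back to source meanings.
decode :: (Concrete -> Concrete) -> (Abstract -> Value)
decode m s = m (construct s) Ac

-- Do both paths around the square agree for this expression and store?
commutes :: Source -> Abstract -> Bool
commutes e s = decode (targetSemantics (compile 0 e)) s == sourceSemantics e s

exprs :: Int -> [Source]
exprs 0 = [Const 1, Var "x", Var "y"]
exprs d = smaller ++ [Sum a b | a <- smaller, b <- smaller]
  where
    smaller = exprs (d - 1)

stores :: [Abstract]
stores = [const 0, \v -> if v == "x" then 5 else 7, \v -> fromIntegral (length v) - 4]
```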
### Thatcher, Wagner and Wright, 1980

“More on advice on structuring compilers and proving them correct” [Thatch80]

- Provides a different encoding of the target language from that of [Morris73]
- Claim that mechanised theorem proving tools required further development

The advice presented in [Thatch80] reverses the bottom edge of the square: encode maps source meanings to target meanings, and correctness becomes

$$\mathit{encode} \circ \mathit{sourceSemantics} = \mathit{targetSemantics} \circ \mathit{compile}$$

### Syntax of source language in [Thatch80]

- `ae ::= integer constant | variable | - ae | Pr ae | Su ae | ae + ae | ae − ae | ae × ae | if be then ae else ae | st result ae | let variable be ae in ae`
- `be ::= boolean constant | even ae | ae ≤ ae | ae ≥ ae | ae = ae | ¬ be | be ∧ be | be ∨ be`
- `st ::= continue | variable := ae | if be then st else st | st ; st | while be do st`

n.b. Similar to [Milner72] and [Morris73], but with more operators and sequential composition.

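The grammar transcribes naturally into mutually recursive Haskell data types, one per syntactic category. The constructor names below are our own; only `Pr` and `Su` are kept as written, since the slide does not expand them. `assigned` and `countdown` are hypothetical examples of a traversal and a program.

```haskell
type Variable = String

-- ae: arithmetic expressions
data AExp
  = IntConst Integer
  | AVar Variable
  | Neg AExp
  | Pr AExp
  | Su AExp
  | Plus AExp AExp
  | Minus AExp AExp
  | Times AExp AExp
  | AIf BExp AExp AExp
  | Result Stmt AExp        -- st result ae
  | Let Variable AExp AExp  -- let variable be ae in ae
  deriving Show

-- be: boolean expressions
data BExp
  = BoolConst Bool
  | Even AExp
  | Leq AExp AExp
  | Geq AExp AExp
  | Equal AExp AExp
  | Not BExp
  | And BExp BExp
  | Or BExp BExp
  deriving Show

-- st: statements
data Stmt
  = Continue
  | Assign Variable AExp
  | SIf BExp Stmt Stmt
  | Seq Stmt Stmt
  | While BExp Stmt
  deriving Show

-- Variables assigned by a statement. Only statement structure is traversed,
-- so assignments inside `st result ae` expressions are not collected.
assigned :: Stmt -> [Variable]
assigned Continue = []
assigned (Assign v _) = [v]
assigned (SIf _ s1 s2) = assigned s1 ++ assigned s2
assigned (Seq s1 s2) = assigned s1 ++ assigned s2
assigned (While _ s) = assigned s

-- x := 10; while ¬(x ≤ 0) do (y := y + x; x := x − 1)
countdown :: Stmt
countdown =
  Seq (Assign "x" (IntConst 10))
      (While (Not (Leq (AVar "x") (IntConst 0)))
             (Seq (Assign "y" (Plus (AVar "y") (AVar "x")))
                  (Assign "x" (Minus (AVar "x") (IntConst 1)))))
```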
### The “structuring compilers” series

- Discuss constructing algebras to describe language syntax and meaning:
  - the language abstract syntaxes are initial algebras;
  - the semantics is the unique homomorphism from a syntax to its meanings;
  - the compiler is the unique homomorphism between source and target syntaxes.
- “... reduces to a proof that encode is a homomorphism ...” [Thatch80]
- “No structural induction is required ...” [Thatch80]

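A one-sorted Haskell simplification of this idea, using the arithmetic language: its syntax is the initial algebra, `foldSource` is the unique homomorphism from it into any other algebra, and both the semantics and the compiler arise as folds. [Thatch80] works with many-sorted algebras and maps into a target-syntax algebra; here, as an assumption made for brevity, the compiler folds into an algebra of code generators instead. `Algebra`, `foldSource`, `meanings` and `codeGenerators` are illustrative names.

```haskell
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source

-- An algebra for the signature of Source: one operation per constructor.
data Algebra a = Algebra
  { onConst :: Value -> a
  , onVar :: Name -> a
  , onSum :: a -> a -> a
  }

-- Source is the initial algebra: for any algebra there is exactly one
-- homomorphism out of the syntax, and foldSource computes it.
foldSource :: Algebra a -> Source -> a
foldSource alg (Const n) = onConst alg n
foldSource alg (Var v) = onVar alg v
foldSource alg (Sum x y) = onSum alg (foldSource alg x) (foldSource alg y)

-- Semantics: a homomorphism into an algebra of meanings (store -> value).
meanings :: Algebra ((Name -> Value) -> Value)
meanings = Algebra
  { onConst = \n _ -> n
  , onVar = \v s -> s v
  , onSum = \f g s -> f s + g s
  }

data Address = Ac | Reg Int | Map Name deriving (Eq, Show)
data Instr = Li Value | Load Address | Sto Address | Add Address deriving (Eq, Show)

-- Compiler: a homomorphism into an algebra of code generators, i.e.
-- functions from the first free register to instruction sequences.
codeGenerators :: Algebra (Int -> [Instr])
codeGenerators = Algebra
  { onConst = \n _ -> [Li n]
  , onVar = \v _ -> [Load (Map v)]
  , onSum = \c1 c2 t -> c1 t ++ [Sto (Reg t)] ++ c2 (t + 1) ++ [Add (Reg t)]
  }

-- (x + 3) + (x + (y + 2))
example :: Source
example = Sum (Sum (Var "x") (Const 3)) (Sum (Var "x") (Sum (Var "y") (Const 2)))
```

With a store mapping x to 5 and y to 7, `foldSource meanings example` yields 22, and `foldSource codeGenerators example 0` is the same thirteen-instruction sequence that `compile 0 example` produces.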
### Meijer, 1994

“More advice on proving a compiler correct: Improve a correct compiler” [Meijer94]

- Given an interpreter for a source language, can we transform it into a compiler to, and a residual interpreter for, the target language?
- A functional decomposition problem (i.e. interpreter = emulator ∘ compiler)
- Demonstrates this technique for a first-order imperative language compiling to a three-address code machine
- While quite feasible for first-order languages, it becomes far more difficult for higher-order languages

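The equation interpreter = emulator ∘ compiler can be illustrated on the arithmetic language. The sketch below does not perform Meijer's calculational derivation; it simply writes down an interpreter, a compiler to a hypothetical stack machine (not the three-address machine used in [Meijer94]), and an emulator for that machine, then checks that the decomposition holds pointwise. `StackInstr`, `compileStack`, `emulate` and `decomposes` are our own names.

```haskell
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source

-- The interpreter we start from.
interpret :: Source -> (Name -> Value) -> Value
interpret (Const n) _ = n
interpret (Var v) s = s v
interpret (Sum x y) s = interpret x s + interpret y s

-- A hypothetical stack-machine target, chosen for brevity.
data StackInstr = Push Value | Fetch Name | AddTop
  deriving Show

-- The compiler half of the decomposition.
compileStack :: Source -> [StackInstr]
compileStack (Const n) = [Push n]
compileStack (Var v) = [Fetch v]
compileStack (Sum x y) = compileStack x ++ compileStack y ++ [AddTop]

-- The residual interpreter (emulator) half.
emulate :: [StackInstr] -> (Name -> Value) -> Value
emulate code s = go code []
  where
    go [] [v] = v
    go [] stack = error ("ill-formed code; final stack " ++ show stack)
    go (Push n : is) stack = go is (n : stack)
    go (Fetch v : is) stack = go is (s v : stack)
    go (AddTop : is) (b : a : stack) = go is (a + b : stack)
    go (AddTop : _) _ = error "stack underflow"

-- interpret e agrees with emulate (compileStack e) at store s.
decomposes :: Source -> (Name -> Value) -> Bool
decomposes e s = interpret e s == emulate (compileStack e) s

-- (x + 3) + (x + (y + 2))
example :: Source
example = Sum (Sum (Var "x") (Const 3)) (Sum (Var "x") (Sum (Var "y") (Const 2)))

-- Hypothetical store: x = 5, y = 7, every other variable 0.
store :: Name -> Value
store "x" = 5
store "y" = 7
store _ = 0
```

For first-order languages like this one the emulator stays simple; the slide's point is that finding such a split becomes much harder once the source language has higher-order features.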
### Berghofer and Strecker, 2003

“Extracting a formally verified, fully executable compiler from a proof assistant” [Bergho03]

- Proves a compiler for a subset of the Java source language to Java bytecode
- Includes typechecking, abstract syntax tree annotation and bytecode translation
- Isabelle/HOL is used to prove properties about an abstract compiler
- Isabelle code extraction produces an executable compiler

### Dave, 2003

Maulik A. Dave’s bibliography for “Compiler Verification” [Dave03]:

- Ninety-nine papers listed
- Ninety-one of those listed were published after 1990
- Interestingly, neither the Milner and Weyhrauch paper nor the Meijer paper is included

(Figure: papers listed against decade published.)

### Recent work

- Leroy’s “A formally verified compiler back-end” [Leroy09] proves a compiler from Cminor to PowerPC assembler.
- Chlipala’s “A verified compiler for an impure functional language” [Chlipa10] covers a toy (but still quite feature-rich) functional source language, compiled to instructions for a register-based machine.
- Both use the Coq proof assistant and code extraction.
- Both decompose the problem into compilation through several intermediate languages.
- Both express worries that the proof assistant itself may contain bugs that would invalidate correctness.

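A toy illustration of decomposition, in no way resembling the actual passes of either compiler: translate the arithmetic language to a hypothetical stack-code intermediate language, then translate stack code to the register machine, keeping stack slot d in register d. Each pass gets its own check against the semantics of its input and output languages, and the end-to-end check is the two per-pass checks chained by transitivity. All names and both translations are assumptions made for this example.

```haskell
type Name = String
type Value = Integer

data Source = Const Value | Var Name | Sum Source Source

-- Intermediate language (hypothetical): stack code.
data Stack = Push Value | Fetch Name | AddTop

-- Target language: the register machine from the McCarthy-Painter example.
data Address = Ac | Reg Int | Map Name deriving (Eq, Show)
data Instr = Li Value | Load Address | Sto Address | Add Address

-- Pass 1: source to stack code.
toStack :: Source -> [Stack]
toStack (Const n) = [Push n]
toStack (Var v) = [Fetch v]
toStack (Sum x y) = toStack x ++ toStack y ++ [AddTop]

-- Pass 2: stack code to register code. Stack slot d lives in register d,
-- and the final Load leaves the result (slot 0) in the accumulator.
toRegisters :: [Stack] -> [Instr]
toRegisters code = go 0 code ++ [Load (Reg 0)]
  where
    go :: Int -> [Stack] -> [Instr]
    go _ [] = []
    go d (Push n : is) = [Li n, Sto (Reg d)] ++ go (d + 1) is
    go d (Fetch v : is) = [Load (Map v), Sto (Reg d)] ++ go (d + 1) is
    go d (AddTop : is) =
      [Load (Reg (d - 2)), Add (Reg (d - 1)), Sto (Reg (d - 2))] ++ go (d - 1) is

-- One semantics per language.
source :: Source -> (Name -> Value) -> Value
source (Const n) _ = n
source (Var v) s = s v
source (Sum x y) s = source x s + source y s

runStack :: [Stack] -> (Name -> Value) -> Value
runStack code s = go code []
  where
    go [] (v : _) = v
    go [] [] = error "empty stack"
    go (Push n : is) st = go is (n : st)
    go (Fetch v : is) st = go is (s v : st)
    go (AddTop : is) (b : a : st) = go is (a + b : st)
    go (AddTop : _) _ = error "stack underflow"

runRegisters :: [Instr] -> (Name -> Value) -> Value
runRegisters code s = foldl step start code Ac
  where
    start (Map v) = s v
    start a = error ("read of uninitialised location " ++ show a)
    write k v st k' = if k == k' then v else st k'
    step st i = case i of
      Li n -> write Ac n st
      Load r -> write Ac (st r) st
      Sto r -> write r (st Ac) st
      Add r -> write Ac (st Ac + st r) st

-- Per-pass checks and the end-to-end property they combine into.
pass1Ok, pass2Ok, endToEndOk :: Source -> (Name -> Value) -> Bool
pass1Ok e s = runStack (toStack e) s == source e s
pass2Ok e s = runRegisters (toRegisters (toStack e)) s == runStack (toStack e) s
endToEndOk e s = runRegisters (toRegisters (toStack e)) s == source e s

-- (x + 3) + (x + (y + 2))
example :: Source
example = Sum (Sum (Var "x") (Const 3)) (Sum (Var "x") (Sum (Var "y") (Const 2)))

-- Hypothetical store: x = 5, y = 7, every other variable 0.
store :: Name -> Value
store "x" = 5
store "y" = 7
store _ = 0
```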
### Conclusions

- Compilers have been proved correct for progressively larger source languages.
- A variety of different techniques are available for ensuring semantic equivalence.
- It rapidly became apparent that some kind of proof assistant is required.
- Decomposition of large compilers is a key factor for success.
- Programs are only verified when all surrounding elements are verified.

### Open questions

- What about compilers for larger target languages and more advanced compilation facilities?
- Are our mechanised assistants producing valid proofs?
- What other ways are there of decomposing the compiler verification problem?
- Are particular language paradigms more amenable to compiler verification?
- Why haven’t the concepts of [Meijer94] been more widely used?

### More information

Slides and bibliography will be made available at: http://www-users.cs.york.ac.uk/~jason/

Jason S. Reich <jason@cs.york.ac.uk>