Formal Verification of Programming Languages

Literature Seminar — Presented to PLASMA research group on 8th December 2009

Published in: Technology

1. Formal Verification of Programming Language Implementations
Ph.D. Literature Seminar
Jason S. Reich <jason@cs.york.ac.uk>, University of York
December 8, 2009
Outline: Motivation, 1960s, 1970s, 1980s, 1990s, 2000s, Conclusions
2. Compiling an arithmetic language
Compile from a simple arithmetic language to machine code for a simple register machine. Example taken from [McCart67].
Source language:
• Numeric constants
• Variables
• Addition, e.g. (x + 3) + (x + (y + 2))
Target language:
• Load Immediate into ac
• LOAD into ac from address/register
• STOre ac value to address/register
• ADD register value to ac
5. Compiling an arithmetic language
Arithmetic expression compiler in Haskell:

    -- Data types assumed here; the slide shows only compile.
    data Source = Const Int | Var String | Sum Source Source
    data Instr  = Li Int | Load String | Sto String | Add String
    type Target = [Instr]

    -- t is the index of the next free temporary register.
    compile :: Source -> Int -> Target
    compile (Const v)   t = [Li v]
    compile (Var x)     t = [Load x]
    compile (Sum e1 e2) t = compile e1 t ++ [Sto ("t + " ++ show t)]
                         ++ compile e2 (t + 1) ++ [Add ("t + " ++ show t)]
6. Compiling an arithmetic language
When compiled and executed, is the value in the accumulator the result of the source arithmetic expression?
(x + 3) + (x + (y + 2)) compiled to machine code:

    LOAD x
    STO t
    LI 3
    ADD t
    STO t
    LOAD x
    STO t + 1
    LOAD y
    STO t + 2
    LI 2
    ADD t + 2
    ADD t + 1
    ADD t

N.B. x and y are known memory locations and t + k are registers.
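The thirteen-instruction listing above can be reproduced mechanically. The following is a minimal sketch, assuming data type definitions that the slides do not show; `temp`, `render`, `example` and `listing` are hypothetical helpers introduced only for this illustration. The only deviation from the slide's compiler is that temporary 0 is printed as `t` rather than `t + 0`, to match the listing.

```haskell
-- Assumed data types; the slides show only the compile function.
data Source = Const Int | Var String | Sum Source Source

data Instr = Li Int | Load String | Sto String | Add String
  deriving (Eq, Show)

type Target = [Instr]

-- Name of the k-th temporary register (hypothetical helper).
temp :: Int -> String
temp 0 = "t"
temp k = "t + " ++ show k

-- Same structure as the compiler on the slide.
compile :: Source -> Int -> Target
compile (Const v)   _ = [Li v]
compile (Var x)     _ = [Load x]
compile (Sum e1 e2) t =
  compile e1 t ++ [Sto (temp t)] ++ compile e2 (t + 1) ++ [Add (temp t)]

-- Render one instruction using the mnemonics of the listing.
render :: Instr -> String
render (Li v)   = "LI " ++ show v
render (Load x) = "LOAD " ++ x
render (Sto r)  = "STO " ++ r
render (Add r)  = "ADD " ++ r

-- (x + 3) + (x + (y + 2))
example :: Source
example =
  Sum (Sum (Var "x") (Const 3))
      (Sum (Var "x") (Sum (Var "y") (Const 2)))

listing :: [String]
listing = map render (compile example 0)
```

Loading this in GHCi and evaluating `mapM_ putStrLn listing` prints the instructions in the order shown on the slide.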
7. Why use high-level languages?
• Rapid development
• Easier to understand, maintain and modify
• Less likely to make mistakes
• Easier to reason about and infer properties
• Architecture portability
But...
8. Can you trust your compiler?
• Use a compiler to translate from a high-level language to a low-level one
• Compilers are programs (generally) written by people
• People make mistakes
• Can silently turn “a correct program into an incorrect executable” [Leroy09]
• GHC 6.10.x is ≈ 800,000 lines of code and has had 737 bugs reported in the bug tracker as of 04/12/2009 [GHC]
Can we formally verify a compiler?
11. McCarthy and Painter, 1967
“Correctness of a compiler for arithmetic expressions” [McCart67]
Describe, in first-order predicate logic:
• Source language semantics
• Target language semantics
• A compilation process
Reason that the compiler maintains semantic equivalence
12. McCarthy and Painter, 1967
Semantic equivalence in [McCart67]:
∀e ∈ Expressions, ∀µ : Variable Mappings • interpret(e, µ) ≡ acValue(emulate(compile(e), mkState(µ)))
• Very limited, small toy source and target language
• Proof performed by hand
• Logical framework and proof presented in under ten pages
• Shows that proving a compiler correct is possible
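The equivalence statement above can be turned into an executable check. This is a minimal sketch, not the formalisation in [McCart67]: the data types, the association-list representation of µ and of machine memory, and the convention that unbound names read as 0 are all assumptions made for illustration. The names `interpret`, `emulate`, `compile`, `mkState` and `acValue` follow the formula on the slide.

```haskell
import Data.Maybe (fromMaybe)

-- Data types are assumptions; the slides do not show them.
data Source = Const Int | Var String | Sum Source Source
data Instr  = Li Int | Load String | Sto String | Add String
type Target = [Instr]

-- mu: values of the source variables (hypothetical representation).
type Mapping = [(String, Int)]

-- Machine state: the accumulator plus a memory of variables and temporaries.
data State = State { acValue :: Int, memory :: [(String, Int)] }

compile :: Source -> Int -> Target
compile (Const v)   _ = [Li v]
compile (Var x)     _ = [Load x]
compile (Sum e1 e2) t =
  compile e1 t ++ [Sto tmp] ++ compile e2 (t + 1) ++ [Add tmp]
  where tmp = "t + " ++ show t

-- Source semantics: evaluate directly. Unbound names read as 0 (assumption).
interpret :: Source -> Mapping -> Int
interpret (Const v)   _  = v
interpret (Var x)     mu = fromMaybe 0 (lookup x mu)
interpret (Sum e1 e2) mu = interpret e1 mu + interpret e2 mu

mkState :: Mapping -> State
mkState mu = State { acValue = 0, memory = mu }

-- Target semantics: run each instruction in order.
emulate :: Target -> State -> State
emulate prog s0 = foldl step s0 prog
  where
    fetch a s       = fromMaybe 0 (lookup a (memory s))
    step s (Li v)   = s { acValue = v }
    step s (Load a) = s { acValue = fetch a s }
    step s (Sto a)  = s { memory = (a, acValue s) : memory s }
    step s (Add a)  = s { acValue = acValue s + fetch a s }

-- The equivalence from the slide, for one expression and one mapping.
equivalent :: Source -> Mapping -> Bool
equivalent e mu =
  interpret e mu == acValue (emulate (compile e 0) (mkState mu))

-- (x + 3) + (x + (y + 2))
example :: Source
example =
  Sum (Sum (Var "x") (Const 3))
      (Sum (Var "x") (Sum (Var "y") (Const 2)))
```

With x = 5 and y = 7, `equivalent example [("x", 5), ("y", 7)]` holds, both sides being 22. A check like this only samples the property for particular inputs; the point of [McCart67] is a proof for all expressions and mappings. The sketch also relies on temporary names such as `t + 0` never clashing with variable names.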
13. Milner and Weyhrauch, 1972
“Proving compiler correctness in a mechanised logic” [Milner72]
• Provide an LCF machine-checked proof of the McCarthy-Painter example
• Proceed towards mechanically proving a compiler for a more complex language to a stack machine
• Claim to have “no significant doubt that the remainder of the proof can be done on machine” [Milner72]
14. Morris, 1973
“Advice on structuring compilers and proving them correct” [Morris73]
• Proves by hand the correctness of a compiler for a source language that contains assignment, conditionals, loops, arithmetic, boolean operations and local definitions
“Essence” of the advice presented in [Morris73]:

                        compile
    Source language  ------------>  Target language
          |                               |
    Source semantics               Target semantics
          v                               v
    Source meanings  <------------  Target meanings
                        decode
15. Thatcher, Wagner and Wright, 1980
“More on advice on structuring compilers and proving them correct” [Thatch80]
Advice presented in [Thatch80]:

                        compile
    Source language  ------------>  Target language
          |                               |
    Source semantics               Target semantics
          v                               v
    Source meanings  ------------>  Target meanings
                        encode

• Provides a correct compiler for a more advanced target language than [Morris73]
• Claim that mechanised theorem proving tools required further development
16. The “structuring compilers” series
• Discuss constructing algebras to describe languages
• How to move from one algebra to another
• Encode abstract state to concrete or decode to abstract?
  “there is not enough information in the [abstract] state to recover the [concrete] state completely” [Moore89]
• Further paper: “Even more on advice on structuring compilers and proving them correct: changing an arrow” [Orejas81]
• [Moore89] discusses this issue from a practical perspective
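The encode-or-decode question can be made concrete on a toy example. The sketch below is only an illustration and does not reproduce the algebraic constructions of the papers: it assumes a cut-down, variable-free variant of the arithmetic language, takes the whole final machine state as the target meaning, and all names (`Machine`, `sourceSemantics`, `targetSemantics`, `decode`, `commutes`, `p1`, `p2`) are hypothetical.

```haskell
-- Cut-down variant of the arithmetic example: no variables (assumption).
data Source = Const Int | Sum Source Source
data Instr  = Li Int | Sto Int | Add Int   -- operands are temporary indices

-- Concrete target meaning: accumulator plus contents of the temporaries.
data Machine = Machine { ac :: Int, temps :: [(Int, Int)] }
  deriving (Eq, Show)

compile :: Source -> Int -> [Instr]
compile (Const v) _ = [Li v]
compile (Sum a b) t = compile a t ++ [Sto t] ++ compile b (t + 1) ++ [Add t]

-- Abstract source meaning: just the value of the expression.
sourceSemantics :: Source -> Int
sourceSemantics (Const v) = v
sourceSemantics (Sum a b) = sourceSemantics a + sourceSemantics b

-- Run the program from an empty machine and return the final state.
targetSemantics :: [Instr] -> Machine
targetSemantics = foldl step (Machine 0 [])
  where
    step m (Li v)  = m { ac = v }
    step m (Sto r) = m { temps = (r, ac m) : filter ((/= r) . fst) (temps m) }
    step m (Add r) = m { ac = ac m + maybe 0 id (lookup r (temps m)) }

-- decode: forget the temporaries and keep only the accumulator.
decode :: Machine -> Int
decode = ac

-- The square with a decode arrow: decode . targetSemantics . compile
-- agrees with sourceSemantics.
commutes :: Source -> Bool
commutes e = decode (targetSemantics (compile e 0)) == sourceSemantics e

-- Two programs with the same source meaning but different target meanings.
p1, p2 :: Source
p1 = Const 5
p2 = Sum (Const 2) (Const 3)
```

Here `p1` and `p2` both mean 5, yet their final machine states differ in the temporaries. With target meanings this concrete, no function from source meanings to target meanings can map 5 to both states, whereas `decode` simply discards the extra detail. This is the sense in which the abstract state lacks the information needed to recover the concrete one.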
19. Meijer, 1994
“More advice on proving a compiler correct: Improve a correct compiler” [Meijer94]
• Given an interpreter for a source language, can we transform it into a compiler to, and a residual interpreter for, the target language?
• A functional decomposition problem (i.e. interpreter = emulator ◦ compiler)
• Demonstrates this technique for a first-order imperative language compiling to a three-address code machine
• While quite feasible for first-order languages, this becomes far more difficult for higher-order languages
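The decomposition interpreter = emulator ◦ compiler can be illustrated on a far smaller language than the one treated in [Meijer94]. The following is a minimal sketch under stated assumptions: a hypothetical addition-only expression language and a hypothetical stack machine stand in for the first-order imperative language and three-address code machine of the paper, and the factorisation is written down directly instead of being derived by transformation.

```haskell
-- Hypothetical toy source language and stack-machine target.
data Expr = Lit Int | Plus Expr Expr
data Op   = Push Int | AddOp

-- The starting point: a direct interpreter for the source language.
interpreter :: Expr -> Int
interpreter (Lit n)    = n
interpreter (Plus a b) = interpreter a + interpreter b

-- One half of the decomposition: translate expressions to stack code.
compiler :: Expr -> [Op]
compiler (Lit n)    = [Push n]
compiler (Plus a b) = compiler a ++ compiler b ++ [AddOp]

-- The other half: the residual interpreter for the target language.
emulator :: [Op] -> Int
emulator = top . foldl step []
  where
    step stack (Push n)        = n : stack
    step (y : x : rest) AddOp  = (x + y) : rest
    step _ AddOp               = error "stack underflow"
    top (v : _)                = v
    top []                     = error "empty stack"

-- The factored form that should agree with the original interpreter.
factored :: Expr -> Int
factored = emulator . compiler
```

For example, `factored (Plus (Lit 1) (Plus (Lit 2) (Lit 3)))` and `interpreter` applied to the same expression both give 6. Agreement on sample inputs is only evidence; establishing it for every expression is where the proof effort lies.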
20. Berghofer and Stecker, 2003
“Extracting a formally verified, fully executable compiler from a proof assistant” [Bergho03]
• Proves correct a compiler from a subset of the Java source language to Java bytecode
• Includes typechecking, abstract syntax tree annotation and bytecode translation
• Isabelle/HOL used to prove properties about an abstract compiler
• Isabelle code extraction used to produce an executable compiler
21. Dave, 2003
Maulik A. Dave’s bibliography for “Compiler Verification” [Dave03]
[Figure: papers listed against decade published]
• Ninety-nine papers listed
• Ninety-one of those listed were published after 1990
• Interestingly, neither the Milner and Weyhrauch paper nor the Meijer paper is included
24. Recent work
• Leroy’s “A formally verified compiler back-end” [Leroy09]: proves a compiler from Cminor to PowerPC assembler
• Chlipala’s “A verified compiler for an impure functional language” [Chlipa10]: from a toy (but still quite feature-rich) functional source language to instructions for a register-based machine
• Both use the Coq proof assistant and code extraction
• Both decompose the problem into compilation to several intermediate languages
• Both express worries that the proof assistant itself may contain bugs that would invalidate correctness
25. Conclusions
• Compilers have been proved correct for progressively larger source languages
• It rapidly became apparent that some kind of proof assistant is required
• Decomposition of large compilers is a key factor for success
• Programs are only verified when all surrounding elements are verified
26. Open questions
• What about compilers for larger target languages and more advanced compilation facilities?
• Are our mechanised assistants producing valid proofs?
• Are particular language paradigms more amenable to compiler verification?
• Why haven’t the concepts of [Meijer94] been more widely used?
• What other ways are there of decomposing the compiler verification problem?
27. More information
Slides and bibliography will be made available at:
http://www-users.cs.york.ac.uk/~jason/
Jason S. Reich <jason@cs.york.ac.uk>