Formal Verification of Programming Languages
Now with more Haskell and proof!

Presentation Transcript

  • Motivation 1960s Proof 1970s 1980s 1990s 2000s Conclusions Formal Verification of Programming Language Implementations Ph.D. Literature Seminar Jason S. Reich <jason@cs.york.ac.uk> University of York 11th January 2010
  • Compiling an arithmetic language
    Compile from a simple arithmetic language to machine code for a simple register machine.
    Source language:
      Numeric constants
      Variables
      Addition
      e.g. (x + 3) + (x + (y + 2))
    Target language:
      LI: Load Immediate into ac
      LOAD: load into ac from address/register
      STO: STOre ac value to address/register
      ADD: add register value to ac
    Example taken from [McCart67]
  • Compiling an arithmetic language: arithmetic expression compiler in Haskell
        compile :: Int -> Source -> Target
        compile t (Const v)   = [Li v]
        compile t (Var x)     = [Load (Map x)]
        compile t (Add e1 e2) = compile t e1 ++ [Sto (Reg t)]
                             ++ compile (t + 1) e2 ++ [Sum (Reg t)]
    Here Add is the source-language addition and Sum is the constructor for the ADD instruction, matching the names used in the semantics later in the talk.
    When compiled and executed, is the value in the accumulator the result of the source arithmetic expression?
  • Compiling an arithmetic language: (x + 3) + (x + (y + 2)) compiled to machine code?
        LOAD M[x]
        STO  R[t + 0]
        LI   3
        ADD  R[t + 0]
        STO  R[t + 0]
        LOAD M[x]
        STO  R[t + 1]
        LOAD M[y]
        STO  R[t + 2]
        LI   2
        ADD  R[t + 2]
        ADD  R[t + 1]
        ADD  R[t]
    n.b. M is a mapping of variable names to memory locations and R is an indexing of registers.
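The listing above can be reproduced with a small, self-contained version of the compiler. This is only a sketch: the slides never show the data type declarations, so `Name`, `Address`, `Instr` and the use of `Int` for values are assumptions, and the constructor names follow the semantics slide (`Add` for source addition, `Sum` for the ADD instruction).

```haskell
-- Assumed declarations; the slides only show the functions that use them.
type Name = String

-- Source language: constants, variables and addition.
data Source = Const Int | Var Name | Add Source Source
  deriving (Eq, Show)

-- Addresses: the accumulator, numbered registers and variable slots.
data Address = Ac | Reg Int | Map Name
  deriving (Eq, Show)

-- Target instructions (Sum is the ADD instruction).
data Instr = Li Int | Load Address | Sto Address | Sum Address
  deriving (Eq, Show)

type Target = [Instr]

-- The compiler from the slides; t is the first free register.
compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Add e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Sum (Reg t)]

-- (x + 3) + (x + (y + 2))
example :: Source
example = Add (Add (Var "x") (Const 3))
              (Add (Var "x") (Add (Var "y") (Const 2)))

main :: IO ()
main = mapM_ print (compile 0 example)
```

Run with t = 0, this prints thirteen instructions in the same order as the listing, with `Reg 0`, `Reg 1` and `Reg 2` in place of R[t + 0], R[t + 1] and R[t + 2].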
  • Why use high-level languages?
    Rapid development
    Easier to understand, maintain and modify
    Less likely to make mistakes
    Easier to reason about and infer properties
    Architecture portability
    But...
  • Can you trust your compiler?
    We use a compiler to translate from a high-level language to a low-level language
    Compilers are programs (generally) written by people
    People make mistakes
    A compiler bug can silently turn “a correct program into an incorrect executable” [Leroy09]
    GHC 6.10.x is ≈ 800,000 lines of code and has had 737 bugs reported in the bug tracker as of 04/12/2009 [GHC]
    Can we formally verify a compiler?
  • McCarthy and Painter, 1967
    “Correctness of a compiler for arithmetic expressions” [McCart67]
    Describe, in first-order predicate logic:
      Source language semantics
      Target language semantics
      A compilation process
    Reason that the compiler maintains semantic equivalence
  • McCarthy and Painter, 1967
    Semantic equivalence in [McCart67]:
        ∀e ∈ Expressions, ∀μ ∈ Variable Mappings • source(e, μ) ≡ acValue(target(compile(e), construct(μ)))
    Very limited: small toy source and target languages
    Proof performed by hand
    Logical framework and proof presented in under ten pages
    Shows that proving a compiler correct is possible
  • Proving the [McCart67] compiler
    Goal: target (compile t x) (construct s) Ac ≡ source x s
        type Abstract = Name -> Value
        type Concrete = Address -> Value

        construct s = \(Map v) -> s v
        write k v s = \k' -> if k == k' then v else s k'

        -- Semantics for the source language
        source :: Source -> Abstract -> Value
        source (Const n) s = n
        source (Var v)   s = s v
        source (Add x y) s = source x s + source y s

        -- Semantics for the target language
        target :: Target -> Concrete -> Concrete
        target []       s = s
        target (i : is) s = target is $ case i of
          Li n   -> write Ac n s
          Load r -> write Ac (s r) s
          Sto r  -> write r (s Ac) s
          Sum r  -> write Ac (s Ac + s r) s
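The definitions above can be assembled into a runnable program that evaluates both sides of the goal on one example. This is a sketch under stated assumptions: the type declarations, the choice of `Int` for values, the sample environment `env`, and the default value 0 that `construct` gives to the accumulator and registers are not in the slides (the slide's `construct` is only defined on `Map` addresses).

```haskell
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Add Source Source
data Address = Ac | Reg Int | Map Name deriving Eq
data Instr   = Li Value | Load Address | Sto Address | Sum Address
type Target  = [Instr]

type Abstract = Name -> Value
type Concrete = Address -> Value

-- Build a machine state from a variable mapping.
construct :: Abstract -> Concrete
construct s (Map v) = s v
construct _ _       = 0  -- assumption: Ac and registers start at 0

-- Update one address of a state.
write :: Address -> Value -> Concrete -> Concrete
write k v s = \k' -> if k == k' then v else s k'

-- Semantics for the source language.
source :: Source -> Abstract -> Value
source (Const n) _ = n
source (Var v)   s = s v
source (Add x y) s = source x s + source y s

-- Semantics for the target language.
target :: Target -> Concrete -> Concrete
target []       s = s
target (i : is) s = target is $ case i of
  Li n   -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r  -> write r (s Ac) s
  Sum r  -> write Ac (s Ac + s r) s

compile :: Int -> Source -> Target
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Add e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Sum (Reg t)]

-- Hypothetical environment: x = 5, y = 7.
env :: Abstract
env "x" = 5
env "y" = 7
env _   = 0

-- (x + 3) + (x + (y + 2))
example :: Source
example = Add (Add (Var "x") (Const 3))
              (Add (Var "x") (Add (Var "y") (Const 2)))

main :: IO ()
main = do
  print (source example env)                             -- source meaning
  print (target (compile 0 example) (construct env) Ac)  -- accumulator after running
```

Both lines print 22 for this environment. One agreeing example is of course not a proof; it only makes the statement being proved concrete.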
  • Proving the [McCart67] compiler: proof of correctness for constants
    { case where ‘x = Const n’ }
        target (compile t (Const n)) (construct s) Ac
      = { inline ‘compile’ }
        target [Li n] (construct s) Ac
      = { inline ‘target’ }
        write Ac n (construct s) Ac
      = { inline ‘write’ }
        n
      = { equivalent to }
        source (Const n) s
  • Proving the [McCart67] compiler: proof of correctness for variables
    { case where ‘x = Var v’ }
        target (compile t (Var v)) (construct s) Ac
      = { inline ‘compile’ }
        target [Load (Map v)] (construct s) Ac
      = { inline ‘target’ }
        write Ac (construct s (Map v)) (construct s) Ac
      = { inline ‘write’ }
        (construct s) (Map v)
      = { inline ‘construct’ }
        s v
      = { equivalent to }
        source (Var v) s
  • Assumed lemmas
    Untouched Registers lemma: any expression x, compiled to use registers t and above, will not write to a register less than t. Therefore:
        r < t ⇒ target (compile t x) s (Reg r) ≡ s (Reg r)
    Untouched Variables lemma: the compiled form of expression x will never write to a memory location mapped to a variable. Therefore:
        target (compile t x) s (Map v) ≡ s (Map v)
  • Proving the [McCart67] compiler: proof of correctness for addition
    { case where ‘x = Add x y’ }
        target (compile t (Add x y)) (construct s) Ac
      = { inline ‘compile’ and ‘target’ }
        let s1 = target (compile t x) (construct s)
            s2 = write (Reg t) (s1 Ac) s1
            s3 = target (compile (t + 1) y) s2
        in  write Ac (s3 Ac + s3 (Reg t)) s3 Ac
      = { state lemmas, inline ‘write’s and commutativity of + }
        target (compile t x) (construct s) Ac + target (compile (t + 1) y) (construct s) Ac
      = { inductive hypothesis, structural induction }
        source x s + source y s
      = { equivalent to }
        source (Add x y) s
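The two assumed lemmas and the correctness statement can also be tested, rather than proved, by enumerating small expressions. The following is a sketch only: `env`, `start`, `exprs` and the three property names are hypothetical helpers, and the declarations are assumed as before. The state `start` agrees with `env` on variables, as `construct env` would, but gives each register a distinct value so that an unwanted overwrite would be visible.

```haskell
type Name  = String
type Value = Int

data Source  = Const Value | Var Name | Add Source Source
data Address = Ac | Reg Int | Map Name deriving Eq
data Instr   = Li Value | Load Address | Sto Address | Sum Address

type Concrete = Address -> Value

write :: Address -> Value -> Concrete -> Concrete
write k v s = \k' -> if k == k' then v else s k'

source :: Source -> (Name -> Value) -> Value
source (Const n) _ = n
source (Var v)   s = s v
source (Add x y) s = source x s + source y s

target :: [Instr] -> Concrete -> Concrete
target []       s = s
target (i : is) s = target is $ case i of
  Li n   -> write Ac n s
  Load r -> write Ac (s r) s
  Sto r  -> write r (s Ac) s
  Sum r  -> write Ac (s Ac + s r) s

compile :: Int -> Source -> [Instr]
compile _ (Const v)   = [Li v]
compile _ (Var x)     = [Load (Map x)]
compile t (Add e1 e2) =
  compile t e1 ++ [Sto (Reg t)] ++ compile (t + 1) e2 ++ [Sum (Reg t)]

-- Hypothetical fixtures: an environment and an initial machine state.
env :: Name -> Value
env "x" = 5
env "y" = 7
env _   = 0

start :: Concrete
start Ac      = 0
start (Reg r) = 100 + r
start (Map v) = env v

-- All expressions of nesting depth at most n over a small vocabulary.
exprs :: Int -> [Source]
exprs 0 = [Const 1, Var "x", Var "y"]
exprs n = smaller ++ [Add a b | a <- smaller, b <- smaller]
  where smaller = exprs (n - 1)

-- Untouched Registers: registers below t keep their initial values.
untouchedRegisters :: Int -> Source -> Bool
untouchedRegisters t e = and [final (Reg r) == start (Reg r) | r <- [0 .. t - 1]]
  where final = target (compile t e) start

-- Untouched Variables: variable slots keep their initial values.
untouchedVariables :: Int -> Source -> Bool
untouchedVariables t e = and [final (Map v) == start (Map v) | v <- ["x", "y"]]
  where final = target (compile t e) start

-- Correctness: the accumulator ends up holding the source meaning.
correct :: Int -> Source -> Bool
correct t e = target (compile t e) start Ac == source e env

main :: IO ()
main = print (and [ p t e | p <- [untouchedRegisters, untouchedVariables, correct]
                          , t <- [0, 3], e <- exprs 2 ])
```

Exhaustive testing over 156 small expressions gives some confidence in the lemmas but does not replace the inductive argument.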
  • Milner and Weyhrauch, 1972
    “Proving compiler correctness in a mechanised logic” [Milner72]
    Provide an LCF machine-checked proof of the McCarthy-Painter example
    Proceed towards mechanically proving a compiler for a more complex language to a stack machine
    Claim to have “no significant doubt that the remainder of the proof can be done on machine” [Milner72]
  • Morris, 1973
    “Advice on structuring compilers and proving them correct” [Morris73]
    Proves by hand the correctness of a compiler for a source language that contains assignment, conditionals, loops, arithmetic, boolean operations and local definitions
    “Essence” of the advice presented in [Morris73]:
        Source language  --- compile --->  Target language
              |                                  |
        source semantics                  target semantics
              |                                  |
              v                                  v
        Source meanings  <--- decode ---   Target meanings
  • Thatcher, Wagner and Wright, 1980
    “More on advice on structuring compilers and proving them correct” [Thatch80]
    Advice presented in [Thatch80]:
        Source language  --- compile --->  Target language
              |                                  |
        source semantics                  target semantics
              |                                  |
              v                                  v
        Source meanings  --- encode --->   Target meanings
    Provide a different encoding of the target language to [Morris73]
    Claim that mechanised theorem proving tools required further development
  • Syntax of source language in [Thatch80]
        ae ::= integer constant | variable | - ae | Pr ae | Su ae
             | ae + ae | ae − ae | ae × ae
             | if be then ae else ae | st result ae
             | let variable be ae in ae
        st ::= continue | variable := ae | if be then st else st
             | st ; st | while be do st
        be ::= boolean constant | even ae | ae ≤ ae | ae ≥ ae | ae = ae
             | ¬ be | be ∧ be | be ∨ be
    n.b. Similar to [Milner72] and [Morris73] but with more operators and sequential composition.
    Struggling to fit this onto one slide.
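The grammar above translates directly into three mutually recursive Haskell data types. This is a transcription sketch: the constructor names are invented here for illustration, `Variable` is assumed to be a string, and no meaning is assigned to `Pr` and `Su` beyond their appearance in the grammar.

```haskell
type Variable = String

-- Arithmetic expressions (ae).
data AE
  = IntConst Integer
  | AVar Variable
  | Neg AE
  | Pr AE
  | Su AE
  | Plus AE AE
  | Minus AE AE
  | Times AE AE
  | IfAE BE AE AE          -- if be then ae else ae
  | Result St AE           -- st result ae
  | Let Variable AE AE     -- let variable be ae in ae
  deriving (Eq, Show)

-- Statements (st).
data St
  = Continue
  | Assign Variable AE     -- variable := ae
  | IfSt BE St St          -- if be then st else st
  | Seq St St              -- st ; st
  | While BE St            -- while be do st
  deriving (Eq, Show)

-- Boolean expressions (be).
data BE
  = BoolConst Bool
  | Even AE
  | Leq AE AE
  | Geq AE AE
  | Equal AE AE
  | Not BE
  | And BE BE
  | Or BE BE
  deriving (Eq, Show)

-- while ¬ (x = 0) do x := Pr x
example :: St
example = While (Not (Equal (AVar "x") (IntConst 0)))
                (Assign "x" (Pr (AVar "x")))

main :: IO ()
main = print example
```

The mutual recursion is visible in the types: `Result` embeds a statement in an arithmetic expression, and `IfAE` and `While` embed boolean expressions, which in turn contain arithmetic expressions.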
  • The “structuring compilers” series
    Discuss constructing algebras to describe language syntax and meaning
    The languages’ abstract syntaxes are initial algebras
    The semantics is the unique homomorphism from syntax to meanings
    The compiler is the unique homomorphism between source and target syntaxes
    “... reduces to a proof that encode is a homomorphism ...” [Thatch80]
    “No structural induction is required ...” [Thatch80]
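The algebraic view can be sketched in Haskell for the arithmetic language used earlier in the talk. Everything here is an illustrative assumption rather than the construction in [Thatch80]: the `Algebra` record, the toy stack machine (`Push`, `Fetch`, `Plus`) and the definition of `encode` are invented for brevity. The syntax is the initial algebra, `foldExpr` is the unique homomorphism out of it, and both the semantics and the compiler are obtained by choosing an algebra.

```haskell
type Env = String -> Int

data Expr = Const Int | Var String | Add Expr Expr

-- An algebra for the expression signature: one operation per constructor.
data Algebra a = Algebra
  { constA :: Int -> a
  , varA   :: String -> a
  , addA   :: a -> a -> a
  }

-- The unique homomorphism from the syntax to any other algebra.
foldExpr :: Algebra a -> Expr -> a
foldExpr alg (Const n) = constA alg n
foldExpr alg (Var v)   = varA alg v
foldExpr alg (Add x y) = addA alg (foldExpr alg x) (foldExpr alg y)

-- Source meanings: functions from environments to values.
semantics :: Expr -> Env -> Int
semantics = foldExpr (Algebra (\n _ -> n) (\v env -> env v) (\f g env -> f env + g env))

-- Hypothetical target language: code for a small stack machine.
data Instr = Push Int | Fetch String | Plus deriving (Eq, Show)

compile :: Expr -> [Instr]
compile = foldExpr (Algebra (\n -> [Push n]) (\v -> [Fetch v]) (\c d -> c ++ d ++ [Plus]))

-- Target meanings: stack transformers.
exec :: [Instr] -> Env -> [Int] -> [Int]
exec []             _   st           = st
exec (Push n  : is) env st           = exec is env (n : st)
exec (Fetch v : is) env st           = exec is env (env v : st)
exec (Plus    : is) env (b : a : st) = exec is env (a + b : st)
exec (Plus    : _)  _   _            = error "stack underflow"

-- Encode a source meaning as a target meaning: push the value.
encode :: (Env -> Int) -> Env -> [Int] -> [Int]
encode m env st = m env : st

-- Hypothetical environment: x = 5, everything else 7.
env0 :: Env
env0 v = if v == "x" then 5 else 7

example :: Expr
example = Add (Var "x") (Add (Const 3) (Var "y"))

main :: IO ()
main = do
  print (encode (semantics example) env0 [])  -- down, then across the square
  print (exec (compile example) env0 [])      -- across, then down the square
```

Both lines print [15], so the square commutes on this example. In the algebraic proof the general result follows from showing that `encode` respects each of the three operations, instead of from an explicit induction over expressions.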
  • Meijer, 1994
    “More advice on proving a compiler correct: Improve a correct compiler” [Meijer94]
    Given an interpreter for a source language, can we transform it into a compiler to, and a residual interpreter for, the target language?
    A functional decomposition problem (i.e. interpreter = emulator ◦ compiler)
    Demonstrates this technique for a first-order imperative language compiling to a three-address code machine
    While quite feasible for first-order languages, this becomes far more difficult for higher-order languages
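The equation interpreter = emulator ◦ compiler can be illustrated on arithmetic expressions with a toy three-address code. This sketch only states and checks the equation; it does not reproduce the derivation in [Meijer94], where the compiler is calculated from the interpreter. The types `Operand`, `Instr` and `Code`, and all function names, are assumptions made for the example.

```haskell
type Env = String -> Int

data Expr = Const Int | Var String | Add Expr Expr

-- The interpreter: the specification we start from.
interpret :: Expr -> Env -> Int
interpret (Const n) _   = n
interpret (Var v)   env = env v
interpret (Add a b) env = interpret a env + interpret b env

-- Hypothetical three-address code: each instruction adds two operands
-- and stores the result in a fresh temporary.
data Operand = Lit Int | Named String | Temp Int
data Instr   = AddTo Int Operand Operand   -- Temp d := a + b

-- Compiled code: the instructions and the operand holding the result.
type Code = ([Instr], Operand)

compile :: Expr -> Code
compile e = let (is, r, _) = go 0 e in (is, r)
  where
    -- The Int threaded through is the next unused temporary.
    go :: Int -> Expr -> ([Instr], Operand, Int)
    go n (Const k) = ([], Lit k, n)
    go n (Var v)   = ([], Named v, n)
    go n (Add a b) =
      let (ia, ra, n1) = go n a
          (ib, rb, n2) = go n1 b
      in (ia ++ ib ++ [AddTo n2 ra rb], Temp n2, n2 + 1)

-- The emulator: the residual interpreter for the target code.
emulate :: Code -> Env -> Int
emulate (is, r) env = value (foldl step [] is) r
  where
    value _     (Lit k)   = k
    value _     (Named v) = env v
    value temps (Temp t)  = maybe 0 id (lookup t temps)
    step temps (AddTo d a b) = (d, value temps a + value temps b) : temps

-- Hypothetical environment: x = 5, y = 7.
env0 :: Env
env0 "x" = 5
env0 "y" = 7
env0 _   = 0

-- (x + 3) + (x + (y + 2))
example :: Expr
example = Add (Add (Var "x") (Const 3))
              (Add (Var "x") (Add (Var "y") (Const 2)))

main :: IO ()
main = do
  print (interpret example env0)
  print (emulate (compile example) env0)
```

Both lines print 22. The decomposition moves all traversal of the syntax tree into `compile`, leaving `emulate` with a flat list of instructions to run.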
  • Berghofer and Strecker, 2003
    “Extracting a formally verified, fully executable compiler from a proof assistant” [Bergho03]
    Prove correct a compiler from a subset of the Java source language to Java bytecode
    Includes typechecking, abstract syntax tree annotation and bytecode translation
    Isabelle/HOL is used to prove properties of an abstract compiler
    Isabelle code extraction is used to produce an executable compiler
  • Dave, 2003
    Maulik A. Dave’s bibliography for “Compiler Verification” [Dave03]
    [Chart: papers listed against decade published]
    Ninety-nine papers listed
    Ninety-one of those listed were published after 1990
    Interestingly, neither the Milner and Weyhrauch paper nor the Meijer paper is included
  • Recent work
    Leroy’s “A formally verified compiler back-end” [Leroy09]
      Proves correct a compiler from Cminor to PowerPC assembler
    Chlipala’s “A verified compiler for an impure functional language” [Chlipa10]
      For a toy (but still quite feature-rich) functional source language, compiled to instructions for a register-based machine
    Both use the Coq proof assistant and code extraction
    Both decompose the problem into compilation to several intermediate languages
    Both express worries that the proof assistant itself may contain bugs that would invalidate correctness
  • Conclusions
    Compilers have been proved correct for progressively larger source languages
    A variety of different techniques are available for ensuring semantic equivalence
    It rapidly became apparent that some kind of proof assistant is required
    Decomposition of large compilers is a key factor for success
    Programs are only verified when all surrounding elements are verified
  • Open questions
    What about compilers for larger target languages and more advanced compilation facilities?
    Are our mechanised assistants producing valid proofs?
    What other ways are there of decomposing the compiler verification problem?
    Are particular language paradigms more amenable to compiler verification?
    Why haven’t the concepts of [Meijer94] been more widely used?
  • More information
    Slides and bibliography will be made available at: http://www-users.cs.york.ac.uk/~jason/
    Jason S. Reich <jason@cs.york.ac.uk>