NICTA Copyright 2014 
Productivity for Proof Engineering
M. Staples, R. Jeffery, J. Andronick, T. Murray, G. Klein, R. Kolanski
In the beginning - 
• Empirical software engineering – 
• Formal methods/verification – 
• Operating systems – 
• seL4 and L4.verified projects at UNSW/NICTA 
• Goal – “An implementation correctness proof for 
seL4 with the kernel running on a mainstream 
embedded processor within 10% of the 
performance of L4.” Klein 2009. 
History 
• seL4 concluded successfully by end 2007 
• 10,000 lines of C code 
• 2.2 person years of effort 
• L4.verified > 20 person years 
• For cost-effective proof engineering, a key 
consideration is proof productivity. 
This study - Specs 
• Retrospective study of 9 projects from L4.verified. 
• All used Isabelle theorem prover. 
• Three formal specifications of seL4 – 
– Exec – models an executable representation of 
seL4’s design 
– Abstract – complete functional specification 
– CapDL – capabilities (access rights) between 
components 
This study - Proofs 
• Six proofs: 
• Three refinement proofs – 
– Code-to-exec, 
– Exec-to-abstract, 
– Abstract-to-CapDL. 
• Two security proofs – 
– Info.flow and 
– Integrity 
• CapDL policy proof. 
Measures 
• Effort – in person weeks 
• Output – Lines of proof 
• Other variables – maximum team size, schedule 
pressure, overall difficulty, and years of experience 
with Isabelle, with formal methods or theorem proving, 
and with the domain (operating systems). 
The data 
Project                       Final Size      Total Effort     Sched.     Overall   Max Team
                              (Kilo Lines     (Person weeks)   Pressure   Diffic.   (Headcount)
                              of proof)
CapDL Spec                     2.14            27.5            AV         LO        5
CapDL-policy proof             0.85            11.3            LO         AV        1
Abstract-to-CapDL Refinement  20.4             66              AV         AV        5
Integrity                      7.05            28.5            V.HI       HI        4
Info.Flow                     27.1             75.9            V.HI       V.HI      8
Exec-to-Abstract Refinement   96.6            368              HI         V.HI      6
Code-to-Exec Refinement       53.34           138              V.HI       HI        6
Exec Spec Haskell              6.01            92              AV         HI        1
Abstract Spec                  4.9             15.3            AV         AV        3
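As a rough illustration of the two measures, the sketch below divides output by effort for the six proof projects in the table. Treating "lines of proof per person week" as the productivity figure is an assumption made here for illustration; the slides only name effort and lines of proof as the measures. The names `PROOFS` and `lines_per_person_week` are hypothetical.

```python
# (final size in kilo lines of proof, total effort in person weeks),
# copied from the data table for the six proof projects only.
PROOFS = {
    "CapDL-policy proof": (0.85, 11.3),
    "Abstract-to-CapDL Refinement": (20.4, 66),
    "Integrity": (7.05, 28.5),
    "Info.Flow": (27.1, 75.9),
    "Exec-to-Abstract Refinement": (96.6, 368),
    "Code-to-Exec Refinement": (53.34, 138),
}


def lines_per_person_week(size_kilo_lines, effort_weeks):
    """Assumed productivity measure: lines of proof per person week."""
    return size_kilo_lines * 1000 / effort_weeks


for name, (size, effort) in PROOFS.items():
    rate = lines_per_person_week(size, effort)
    print(f"{name:30s} {rate:7.1f} lines of proof per person week")
```

A ratio like this ignores rework and discarded proof text, so it is at best a coarse summary of the table.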
Effort – Size Plot for projects 
[Plot not reproduced: effort against size for the nine projects] 
Project relationships 
• Total Project Effort = 9.98 + 3.35 × Final Size 
R² = 0.914, p < 0.001 
• Possible outliers – the large Exec-to-Abstract 
refinement and the Exec spec. 
• Weak evidence that schedule pressure is 
associated with decreased effort, and overall 
difficulty and maximum team size with increased 
effort. But small sample size and not significant 
at 0.05. Experience not significant. 
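The project-level relationship can be checked against the data table with a short sketch. It assumes the slide's line is an ordinary least-squares fit of total effort (person weeks) on final size (kilo lines) over all nine projects; the slides do not state the fitting method. The function names `fit_line` and `predict_effort` are hypothetical. Because the table values are rounded, the fitted coefficients may differ slightly from those on the slide.

```python
# (final size in kilo lines, total effort in person weeks) for the nine
# projects, copied from the data table.
PROJECTS = {
    "CapDL Spec": (2.14, 27.5),
    "CapDL-policy proof": (0.85, 11.3),
    "Abstract-to-CapDL Refinement": (20.4, 66),
    "Integrity": (7.05, 28.5),
    "Info.Flow": (27.1, 75.9),
    "Exec-to-Abstract Refinement": (96.6, 368),
    "Code-to-Exec Refinement": (53.34, 138),
    "Exec Spec Haskell": (6.01, 92),
    "Abstract Spec": (4.9, 15.3),
}


def fit_line(points):
    """Ordinary least squares for y = a + b*x.

    Returns (intercept, slope, r_squared).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


def predict_effort(size_kilo_lines, intercept=9.98, slope=3.35):
    """Effort in person weeks predicted by the slide's equation."""
    return intercept + slope * size_kilo_lines


intercept, slope, r_squared = fit_line(list(PROJECTS.values()))
print(f"fitted: effort = {intercept:.2f} + {slope:.2f} * size, R^2 = {r_squared:.3f}")

# Residuals against the slide's equation show which projects sit far from the line.
for name, (size, effort) in PROJECTS.items():
    print(f"{name:30s} actual {effort:6.1f}  predicted {predict_effort(size):6.1f}")
```

Comparing actual and predicted effort per row is one simple way to see why a project might be flagged as a possible outlier.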
Effort – Size plot for individuals 
[Plot not reproduced: effort against size for individual contributions] 
Individual relationships 
• 24 Individual contributions to five projects 
• R² = 0.93, p < 0.001 
Threats 
• Construct validity 
– Limitations of lines of proof as a size measure (?) 
– Subjective measures carefully defined 
• External validity 
– seL4 only, therefore limited, but this aids internal validity 
– Generalization not known 
• Internal validity 
– Wherever possible measures were carefully defined 
and reviewed by multiple persons 
– Factors not measured? 
Conclusions 
• Proof engineering can bring the benefits of 
formal verification to more software engineering 
projects, but understanding cost effectiveness is 
an issue. 
• We find proof size and effort are strongly related 
for projects and individuals in L4.verified. 
• Significant opportunity for the empirical 
community to help understand rework, tools and 
techniques, proof patterns, reuse and so on in 
proof engineering. 
