
Repair dagstuhl jan2017

Overview talk given at the Dagstuhl Seminar on Automated Program Repair, January 2017.

  1. General Summary of Program Repair, and Semantic Repair
     Abhik Roychoudhury, National University of Singapore
     Dagstuhl seminar, 2017
  2. Bug Fixing
       o Most software has many bugs.
       o Security-related bugs should be fixed before they are exploited by malicious users.
       o Often, bugs are not fixed even a few months after they are reported.
       o E.g. Bug 18665 of glibc
           • Reported and responded to in July 2015
           • Patched in Feb 2016
           • CVSS score: 8.1 / 10 (buffer overflow)
       o “Thanks for the bug report. Do you have a test case that triggers this scenario? Do you have a patch or suggested fix?”
  3. (Why) Program Repair
       1. “Patches as better bug reports” [Weimer 2006].
       2. Automating simple one-line fixes as patch suggestions
           • Working with companies that have commercial testing tools.
           • Automating targeted repair techniques with template fixes, e.g. for overflows.
       3. Grading and understanding of programming assignments
           • … if only the education business takes off
       4. …
       Note: 2 and 3 are very different businesses.
  4. DARPA CGC
       A team of hackers won $2 million by building a machine that could hack better than they could.
       Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99
       DARPA Cyber Grand Challenge -> automation of security (detecting and fixing vulnerabilities automatically)
  5. (Troubles with) Repair
       • Weak description of intended behavior / correctness criterion, e.g. tests
       • Possibility to use the “bugs as deviant behavior” philosophy
       • Weak applicability of repair techniques, e.g. only overflow errors
       • Large search space of candidate patches for general-purpose repair tools
       • Patch suggestions and interactive repair
  6. Correctness Criterion
       • Assertions or specifications
           o May be suitable for targeted repair, e.g. access control policies
       • Bugs as deviant behavior
           o A property that is rarely violated: dynamic invariants!
           o Make sure that it is never violated [ClearView paper, SOSP 2009]
       • Test-driven repair
           o Repair based on test cases, so that the repaired program passes them.
           o Most works we talk about use this criterion.
           o Brings us to issues like the strength of the test oracle and the quality of the test suite …
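A minimal sketch of the “bugs as deviant behavior” idea, assuming the simplest possible invariant: a value range lo ≤ v ≤ hi learned from values observed in normal runs. The names range_invariant, learn_range, violates and enforce are hypothetical. ClearView infers richer invariants and patches the running program; clamping a value back into the learned range is only one possible way of making sure the invariant is never violated.

```c
#include <limits.h>

/* Hypothetical range invariant lo <= v <= hi, learned from normal runs. */
typedef struct {
    int lo;
    int hi;
} range_invariant;

/* Learn the tightest range covering all observed values (assumes n >= 1). */
range_invariant learn_range(const int *observed, int n) {
    range_invariant inv = { INT_MAX, INT_MIN };
    for (int i = 0; i < n; i++) {
        if (observed[i] < inv.lo) inv.lo = observed[i];
        if (observed[i] > inv.hi) inv.hi = observed[i];
    }
    return inv;
}

/* Returns 1 if v deviates from the learned invariant. */
int violates(range_invariant inv, int v) {
    return v < inv.lo || v > inv.hi;
}

/* One possible enforcement strategy: clamp v into the learned range,
   so the invariant can never be violated afterwards. */
int enforce(range_invariant inv, int v) {
    if (v < inv.lo) return inv.lo;
    if (v > inv.hi) return inv.hi;
    return v;
}
```

Learning from the observations {3, 7, 5} gives the range [3, 7]; the value 42 is then flagged as deviant and clamped to 7.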
  7. Large search space: syntax-directed view
       1. Where to fix: in which line?
       2. Generate the candidate patches for this line.
       3. Validate the candidate patches.
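The syntax-directed loop above can be sketched in C on the is_upward example and test suite from slide 20. Assumptions: the fix location is already known (line 4), and the candidate patches come from a single hypothetical template, bias = var + c, with var ∈ {down_sep, up_sep} and c in a bounded range. Each candidate is validated by running the whole test suite; generate_and_validate and is_upward_patched are illustrative names.

```c
#include <stddef.h>

/* Test suite from the is_upward example: inputs and expected output. */
typedef struct { int inhibit, up_sep, down_sep, expected; } test_case;

static const test_case TESTS[] = {
    { 1,   0, 100, 0 },
    { 1,  11, 110, 1 },
    { 0, 100,  50, 1 },
    { 1, -20,  60, 1 },
    { 0,   0,  10, 0 },
};
#define NUM_TESTS (sizeof TESTS / sizeof TESTS[0])

/* is_upward with line 4 replaced by the candidate "bias = var + c",
   where var is up_sep (use_up == 1) or down_sep (use_up == 0). */
static int is_upward_patched(int inhibit, int up_sep, int down_sep,
                             int use_up, int c) {
    int bias;
    if (inhibit)
        bias = (use_up ? up_sep : down_sep) + c;  /* candidate patch */
    else
        bias = up_sep;
    return bias > down_sep ? 1 : 0;
}

/* Validate one candidate by running the whole test suite. */
static int passes_all(int use_up, int c) {
    for (size_t i = 0; i < NUM_TESTS; i++) {
        const test_case *t = &TESTS[i];
        if (is_upward_patched(t->inhibit, t->up_sep, t->down_sep, use_up, c)
            != t->expected)
            return 0;
    }
    return 1;
}

/* Generate-and-validate: enumerate var in {down_sep, up_sep} and c in
   [lo, hi]; stop at the first candidate that passes every test.
   Returns 1 and fills *use_up and *c on success, 0 otherwise. */
int generate_and_validate(int lo, int hi, int *use_up, int *c) {
    for (int u = 0; u <= 1; u++)
        for (int k = lo; k <= hi; k++)
            if (passes_all(u, k)) {
                *use_up = u;
                *c = k;
                return 1;
            }
    return 0;
}
```

With c ranging over [-200, 200] the search should stop at up_sep + 100, the fix given in the comment on line 4 of slide 20; no down_sep + c candidate can pass, since the first two tests would need c ≤ 0 and c ≥ 1 at the same time. Every candidate costs a full test-suite run, which is the “long loop” of the syntax-based schematic.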
  8. Large search space: semantic view
       1. Where to fix: in which line?
       2. What values should be returned by these lines? e.g. <inp=1, ret=0>
       3. What are the expressions that will return these values?
  9. High-level view
       [Figure: the program is run on a test input under concrete execution (concrete values) and under symbolic execution; together with the expected output of the program, this yields the output: a value-set or a constraint.]
  10. General-purpose repair
       • … given a test suite [conceptual characterization]
           o Generate-and-test patches (GenProg)
           o Specification inference and patch synthesis
               • Infer a specification or properties of the patch to be synthesized.
               • Meet the specification by enumeration, or by solving constraints.
               • Various works: SemFix, Nopol, SPR, …
           o Ordering of the search space of patches
               • Use minimality to prioritize the search space.
               • Use learning approaches to prioritize the search space.
           o Patch templates can be learnt from human fixes.
  11. General-purpose repair
       • … given a test suite [technical characterization]
           o Generate-and-test patches (heuristic search)
               • Use a well-known search framework, genetic programming (GP), for program repair.
           o Specification inference and patch synthesis
               • Infer a specification or properties of the patch to be synthesized.
               • Meet the specification by searching in a space, or by solving constraints.
               • Develop a customized search algorithm for each of the repair sub-problems, or use symbolic execution to infer specifications about the patch.
           o Embed a patch quality criterion in repair.
               • Use minimality to prioritize the search space.
               • Learn patch templates from human fixes, or favor small fixes.
               • Use machine learning to re-order the search space.
  12. Specification Inference
       • Infer a specification or properties of the patch to be synthesized.
           o Meet the specification by searching in a space, or by solving constraints.
           o Develop a customized search algorithm for each of the repair sub-problems, or use symbolic execution to infer specifications about the patch.
       1. Where to fix: in which line?
       2. What values should be returned by these lines? e.g. <inp=1, ret=0>
       3. What are the expressions that will return these values?
           a. Enumerate values within a restricted domain, e.g. T/F values for conditions [SPR]
           b. Use symbolic execution to get sample values [Angelix]
           c. Use symbolic execution to infer all possible values as a constraint [SemFix]
  13. Interactive Repair
       RQ1: Can users help the tool improve the accuracy of fix localization?
       RQ2: Can users help the tool quickly and effectively find a correct patch?
       ● Interactive fault localization using test information
           ○ Recommend checkpoints or breakpoints
           ○ Patch suggestions at or around breakpoints
       ● Iterative bug isolation
  14. Interactive Repair: multiple buggy locations
          void getLargest(int a, int b, int c) {
              if (a > b && b > a)          // line 2
                  printf("%d", b);         // line 3
              else if (b >= a && b >= c)
                  printf("%d", b);
              else if (c >= a && c >= b)
                  printf("%d", c);
          }
       • Automatic breakpoint insertion reports “Branch is never executed” at lines 2 and 3.
       • Suggested fixes:
           o Change the condition to a > b && a > c
           o Remove b > a
           o Remove the branch
       • Anti-patterns as fault explanations in natural language:
           o a > b && b > a is a trivial condition (it can never hold)
  15. Interactive Repair: interactive and iterative fault localization
          void getLargest(int a, int b, int c) {
              if (a > b && a > c)
                  printf("%d", b);         // line 3
              else if (b >= a && b >= c)
                  printf("%d", b);
              else if (c >= a && c >= b)
                  printf("%d", c);
          }
       • Feedback at line 3: “Expected c but got b”
       • Suggested fix: change b to a
       • Iterative bug isolation
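For reference, a sketch of getLargest after both suggested fixes: the first condition becomes a > b && a > c, and the first branch uses a instead of b. As an assumption made for testability, this version returns the largest value instead of printing it. The final else-if is replaced by a plain else, since c must be at least as large as a and b whenever neither earlier condition holds.

```c
/* getLargest after the interactive fixes; returns instead of printing. */
int getLargest(int a, int b, int c) {
    if (a > b && a > c)
        return a;                 /* fixed: previously printed b */
    else if (b >= a && b >= c)
        return b;
    else                          /* here c >= a && c >= b always holds */
        return c;
}
```

Ties are handled by the non-strict comparisons, e.g. getLargest(2, 2, 1) takes the second branch and returns 2.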
  16. Syntax-based and semantics-based repair
       Syntax-based schematic:
           for e ∈ SearchSpace do
               validate e
           done
       Semantics-based schematic:
           for partition π : ∃α. π(α) do
               synthesize e
           done
  17. Comparison
       Syntax-based schematic:
           for e ∈ SearchSpace do              // long loop
               validate e                      // break if possible
           done
       Semantics-based schematic:
           for partition π : ∃α. π(α) do       // efficient grouping
               synthesize e                    // cannot break
           done
  18. Expand the schematic
       Semantics-based schematic:
           for partition π : ∃α. π(α) do
               synthesize e
           done
       Semantics-based schematic, partitioning by paths:
           for path π : ∃α. π(α) do
               synthesize e
           done
       Semantics-based schematic, per path:
           for each path do
               get repair constraint φ and solve φ to construct e
           done
       Semantics-based schematic: get the repair constraint from the tests by conjoining the repair constraint from each test.
  19. Conjure up a function
       • Buggy program: … var = a + b - c; … with the suspicious expression replaced by an unknown x = f(live vars).
       • Run the failing test input under concrete execution, then under symbolic execution with x as the only unknown, collecting path conditions and output expressions.
       • Get properties of the function f via symbolic execution.
       • Construct a function f that satisfies these properties!
  20. Example
          1  int is_upward(int inhibit, int up_sep, int down_sep) {
          2      int bias;
          3      if (inhibit)
          4          bias = down_sep;  // fix: bias = up_sep + 100
          5      else bias = up_sep;
          6      if (bias > down_sep)
          7          return 1;
          8      else return 0;
          9  }

          inhibit | up_sep | down_sep | Observed output | Expected output | Result
          1       | 0      | 100      | 0               | 0               | pass
          1       | 11     | 110      | 0               | 1               | fail
          0       | 100    | 50       | 1               | 1               | pass
          1       | -20    | 60       | 0               | 1               | fail
          0       | 0      | 10       | 0               | 0               | pass
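The example can be replayed directly. This sketch assumes two versions of the function: is_upward (buggy, as on the slide) and is_upward_fixed (line 4 replaced by the commented fix bias = up_sep + 100); count_failures is a hypothetical helper that runs the five tests from the table.

```c
/* Buggy is_upward from the slide: line 4 assigns down_sep to bias. */
int is_upward(int inhibit, int up_sep, int down_sep) {
    int bias;
    if (inhibit)
        bias = down_sep;          /* buggy line 4 */
    else
        bias = up_sep;
    if (bias > down_sep)
        return 1;
    else
        return 0;
}

/* The same function with line 4 replaced by the commented fix. */
int is_upward_fixed(int inhibit, int up_sep, int down_sep) {
    int bias;
    if (inhibit)
        bias = up_sep + 100;      /* repaired line 4 */
    else
        bias = up_sep;
    if (bias > down_sep)
        return 1;
    else
        return 0;
}

typedef struct { int inhibit, up_sep, down_sep, expected; } test_case;

/* The five tests from the table: inputs and expected output. */
static const test_case TESTS[5] = {
    { 1,   0, 100, 0 },
    { 1,  11, 110, 1 },
    { 0, 100,  50, 1 },
    { 1, -20,  60, 1 },
    { 0,   0,  10, 0 },
};

/* Count the tests whose observed output differs from the expected one. */
int count_failures(int (*fn)(int, int, int)) {
    int failures = 0;
    for (int i = 0; i < 5; i++)
        if (fn(TESTS[i].inhibit, TESTS[i].up_sep, TESTS[i].down_sep)
            != TESTS[i].expected)
            failures++;
    return failures;
}
```

count_failures(is_upward) should be 2, matching the two failing rows of the table, and count_failures(is_upward_fixed) should be 0.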
  21. Repair Constraint
       Program: is_upward from slide 20. Failing test: inhibit = 1, up_sep = 11, down_sep = 110 (observed output 0, expected output 1).
       Symbolic execution of this test, with bias on line 4 set to the unknown X:
           • Line 4: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = true
           • Line 7: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = X > 110
           • Line 8: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = X ≤ 110
  22. Repair Constraint
          1  int is_upward(int inhibit, int up_sep, int down_sep) {
          2      int bias;
          3      if (inhibit)
          4          bias = f(inhibit, up_sep, down_sep);
          5      else bias = up_sep;
          6      if (bias > down_sep)
          7          return 1;
          8      else return 0;
          9  }
       Symbolic execution with inhibit == 1, up_sep == 11, down_sep == 110 yields: f(1, 11, 110) > 110
  23. Function synthesis
       • Instead of solving …
       • Select primitive components to be used by the synthesized program, based on complexity.
       • Look for a program that uses only these primitive components and satisfies the repair constraint.
           o Done via another constraint-solving problem: program synthesis.
       • Solving the repair constraint is the key, not how it is solved.
       • Enumerate expressions over a given set of components / operators.
           o Enforce axioms of the operators.
           o If a candidate repair contains a constant, solve for it using SMT.
       Repair constraint: f(1, 11, 110) > 110 ∧ f(1, 0, 100) ≤ 100 ∧ f(1, -20, 60) > 60
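A small sketch of enumerating expressions over components against the repair constraint above. Assumptions: the components are just the three live variables, the operators are + and -, and a bounded scan over the constant c replaces the SMT query; synthesize and its output format are hypothetical. Two operator axioms prune the space: + is commutative (only v1 + v2 with v1 not after v2 in component order), and v - v is always 0.

```c
#include <stdio.h>

/* Hypothetical component set: the live variables at line 4 of is_upward. */
enum { INHIBIT, UP_SEP, DOWN_SEP, NUM_VARS };
static const char *const VAR_NAMES[NUM_VARS] = { "inhibit", "up_sep", "down_sep" };

/* Inputs of the three tests that reach line 4 (inhibit == 1). */
#define NUM_TESTS 3
static const int INPUTS[NUM_TESTS][NUM_VARS] = {
    { 1,  11, 110 },
    { 1,   0, 100 },
    { 1, -20,  60 },
};

/* Repair constraint: f(1,11,110) > 110, f(1,0,100) <= 100, f(1,-20,60) > 60. */
static int satisfies(const int out[NUM_TESTS]) {
    return out[0] > 110 && out[1] <= 100 && out[2] > 60;
}

/* Enumerate candidates from simplest to more complex and write the first
   one satisfying the repair constraint into buf. Shapes tried: v, then
   v1 + v2 / v1 - v2, then v + c with c in [lo, hi].
   Returns 1 on success, 0 if nothing in this space works. */
int synthesize(int lo, int hi, char *buf, size_t len) {
    int out[NUM_TESTS];
    /* Shape 1: a single variable. */
    for (int v = 0; v < NUM_VARS; v++) {
        for (int t = 0; t < NUM_TESTS; t++) out[t] = INPUTS[t][v];
        if (satisfies(out)) { snprintf(buf, len, "%s", VAR_NAMES[v]); return 1; }
    }
    /* Shape 2: v1 op v2, pruned by the operator axioms. */
    for (int i = 0; i < NUM_VARS; i++)
        for (int j = 0; j < NUM_VARS; j++)
            for (int op = 0; op < 2; op++) {
                if (op == 0 && j < i) continue;   /* + is commutative */
                if (op == 1 && j == i) continue;  /* v - v == 0 */
                for (int t = 0; t < NUM_TESTS; t++)
                    out[t] = op == 0 ? INPUTS[t][i] + INPUTS[t][j]
                                     : INPUTS[t][i] - INPUTS[t][j];
                if (satisfies(out)) {
                    snprintf(buf, len, "%s %c %s", VAR_NAMES[i],
                             op == 0 ? '+' : '-', VAR_NAMES[j]);
                    return 1;
                }
            }
    /* Shape 3: v + c; a bounded scan stands in for the SMT query on c. */
    for (int v = 0; v < NUM_VARS; v++)
        for (int c = lo; c <= hi; c++) {
            for (int t = 0; t < NUM_TESTS; t++) out[t] = INPUTS[t][v] + c;
            if (satisfies(out)) { snprintf(buf, len, "%s + %d", VAR_NAMES[v], c); return 1; }
        }
    return 0;
}
```

With c in [-200, 200], no single variable or two-variable sum or difference satisfies the constraint, and the first constant template that does is up_sep + 100 (its three conjuncts require c > 99, c ≤ 100 and c > 80). Candidates are checked against the constraint collected by symbolic execution, not by re-running the program.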
  24. Patch as minimal change
       [Figure: failing tests → debugging → DSE → synthesis, versus failing tests → MaxSMT solver]
       • Conjure a function that represents a minimal change to the buggy program.
  25. Example (test cases: all possible orderings of x, y, z)
       Buggy program:
          if (x > y)
              if (x > z) out = 10; else out = 20;
          else out = 30;
          return out;
       DirectFix patch:
          if (x >= y)
              if (x >= z) out = 10; else out = 20;
          else out = 30;
          return out;
       SemFix patch:
          if (x > y)
              if (x > z) out = 10; else out = 20;
          else out = 30;
          return ((x == y) ? ((x == z) ? 10 : 20) : out);
  26. No fault localization
          int foo(int x, int y) {
              if (x > y)
                  y = y + 1;
              else
                  y = y - 1;
              return y + 2;
          }
       Test: foo(0, 0) == 3?
       x = 0 ∧ y = 0 ∧ result = 3 ∧ (if (x1 > y1) then (y2 = y1 + 1) else (y2 = y1 - 1)) ∧ (result = y2 + 2) is UNSAT
  27. Constraint = whole program
       Buggy:    x = 0 ∧ y = 0 ∧ result = 3 ∧ (if (x1 > y1) then (y2 = y1 + 1) else (y2 = y1 - 1)) ∧ (result = y2 + 2) is UNSAT
       Repaired: (if (x1 >= y1) then (y2 = y1 + 1) else (y2 = y1 - 1)) ∧ (result = y2 + 2) ∧ x = 0 ∧ y = 0 ∧ result = 3 is SAT
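The whole-program view can be imitated by brute force on this tiny example. The sketch assumes a hypothetical mutation space of single relational-operator changes to foo’s condition and tries the unchanged program first, so the first operator that makes the test pass is a minimal change within that space. It is a stand-in for the MaxSMT formulation, not DirectFix itself; foo_with and minimal_operator_fix are illustrative names.

```c
/* Hypothetical mutation space: the relational operator of foo's condition. */
typedef enum { OP_GT, OP_GE, OP_LT, OP_LE, OP_EQ, OP_NE, NUM_OPS } rel_op;
static const char *const OP_NAMES[NUM_OPS] = { ">", ">=", "<", "<=", "==", "!=" };

static int apply(rel_op op, int a, int b) {
    switch (op) {
    case OP_GT: return a > b;
    case OP_GE: return a >= b;
    case OP_LT: return a < b;
    case OP_LE: return a <= b;
    case OP_EQ: return a == b;
    default:    return a != b;
    }
}

/* foo from the slide, with its condition operator as a parameter. */
static int foo_with(rel_op op, int x, int y) {
    if (apply(op, x, y))
        y = y + 1;
    else
        y = y - 1;
    return y + 2;
}

/* Try the original operator (>) first, then each single-operator mutation;
   return the first operator for which foo(x, y) == expected, or -1. */
int minimal_operator_fix(int x, int y, int expected) {
    for (int op = OP_GT; op < NUM_OPS; op++)
        if (foo_with((rel_op)op, x, y) == expected)
            return op;
    return -1;
}
```

For foo(0, 0) == 3 the search should return >=, as on the slide; <= and == would also satisfy the test but come later in this fixed order.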
  28. Comparison with SemFix
       [Figure: bar chart comparing SemFix and DirectFix]
          Tool       | #Pgm | Equiv | Same Loc | Diff | Regression
          SemFix     | 44   | 17%   | 46%      | 6.36 | 54%
          DirectFix  | 44   | 53%   | 95%      | 2.31 | 31%
  29. Need concise constraints
       [Figure: failing tests → MaxSMT solver → minimized mutations for repair, versus failing tests → DSE → concise semantics signature → MaxSMT solver]
  30. Remember the schematic?
       Semantics-based schematic:
           for partition π : ∃α. π(α) do
               synthesize e
           done
       Semantics-based schematic, by paths:
           for path π : ∃α. π(α) do
               synthesize e
           done
       Semantics-based schematic, with per-test constraints:
           for path π : ∃α. π(α) do
               for all tests t: get constraint φₜ
               solve ⋀ₜ φₜ to construct e
           done
       Semantics-based schematic, with a repair constraint:
           for path π : ∃α. π(α) do
               get repair constraint φ
               solve φ to construct e
           done
  31. Value-based “constraint”
       Semantics-based schematic:
           for path π : ∃α. π(α) do
               for all tests t: get constraint φₜ
               solve ⋀ₜ φₜ to construct e
           done
       • Instead of representing φₜ as an SMT constraint, represent it using values.
       • Angelic value: a value that is arbitrarily set during execution for a selected expression and that makes the program pass.
       • It can be found by solving the path condition of a failing test case (I, O):
           pathcondition(α) ∧ input = I ∧ output = O
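Angelic values for the is_upward example can be found with a sketch like the one below. It assumes the selected expression is the value of bias on line 4, and it scans a bounded range [lo, hi] instead of solving pathcondition(α) ∧ input = I ∧ output = O with a solver; is_upward_with_bias and find_angelic_value are hypothetical names. Only tests with inhibit = 1 reach line 4.

```c
/* is_upward with the value of bias at line 4 supplied from outside: this
   models arbitrarily setting the selected expression during execution. */
static int is_upward_with_bias(int inhibit, int up_sep, int down_sep,
                               int angelic) {
    int bias;
    if (inhibit)
        bias = angelic;           /* selected expression overridden */
    else
        bias = up_sep;
    return bias > down_sep ? 1 : 0;
}

/* Scan [lo, hi] for an angelic value that makes test (I, O) pass.
   Returns 1 and stores the first such value in *value, or 0 if none. */
int find_angelic_value(int inhibit, int up_sep, int down_sep, int expected,
                       int lo, int hi, int *value) {
    for (int v = lo; v <= hi; v++)
        if (is_upward_with_bias(inhibit, up_sep, down_sep, v) == expected) {
            *value = v;
            return 1;
        }
    return 0;
}
```

For the failing test (1, 11, 110) with expected output 1, the scan should report 111, the smallest value satisfying X > 110; for (1, -20, 60) it should report 61.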
  32. Angelic forest
       [Figure: failing test; suspicious expressions E1, E2, E3; angelic paths with values angelic1, angelic2, angelic3: SAT]
  33. Angelic forest
       [Figure: failing test; suspicious expressions E1, E2, E3; angelic paths with values angelic1, angelic2, angelic3: UNSAT; angelic1, angelic3]
  34. Repair Constraint
       • SemFix work (ICSE 2013)
           o Example: for an identified expression e to be fixed
               • [e > 0] ∧ f(t) == e for each test t
       • DirectFix work (ICSE 2015)
           o The whole program as the repair constraint
           o Use the principle of minimality to synthesize a minimal patch.
       • Angelix work (ICSE 2016)
           o Example: for identified expressions e1, e2, … to be fixed
           o [(e == 1) ∨ (e == 2) ∨ (e == 3)] ∧ f(t) == e for each test t
           o [(e1 == 0 ∧ e2 == 1) ∨ (e1 == 1 ∧ e2 == 0)] ∧ f(t) == e1 ∧ g(t) == e2 for each test t
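The Angelix-style constraint can be checked test by test by matching a candidate’s outputs against the angelic forest. This sketch assumes the forest for each test is given as an explicit list of (e1, e2) tuples, such as {(0, 1), (1, 0)} from the example above, and that a candidate patch (f, g) is represented only by the values it produces on each test; angelic_pair, angelic_forest, matches_forest and satisfies_all are hypothetical names.

```c
/* One angelic value tuple (e1, e2) observed for a single test. */
typedef struct { int e1, e2; } angelic_pair;

/* Angelic forest of one test: the disjunction of its allowed tuples. */
typedef struct {
    const angelic_pair *paths;
    int num_paths;
} angelic_forest;

/* Check [(e1 == a1 ∧ e2 == b1) ∨ ...] ∧ f(t) == e1 ∧ g(t) == e2 for one
   test t: the candidate's values must coincide with some angelic tuple. */
int matches_forest(angelic_forest forest, int f_t, int g_t) {
    for (int i = 0; i < forest.num_paths; i++)
        if (forest.paths[i].e1 == f_t && forest.paths[i].e2 == g_t)
            return 1;
    return 0;
}

/* A candidate (f, g) satisfies the repair constraint only if its values
   match the angelic forest of every test. */
int satisfies_all(const angelic_forest *forests, const int *f_vals,
                  const int *g_vals, int num_tests) {
    for (int t = 0; t < num_tests; t++)
        if (!matches_forest(forests[t], f_vals[t], g_vals[t]))
            return 0;
    return 1;
}
```

A candidate producing (1, 1) on a test whose forest is {(0, 1), (1, 0)} is rejected, even though each of its values appears in some tuple on its own.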
  35. Implementation
       [Figure: tool pipeline. Components: debugger, Clang, KLEE, runtime, synthesis, Z3. Artifacts: buggy source, instrumented source, suspicious locations, angelic forest, patch.]
  36. Results
       [Figure: bar chart per subject (wireshark, php, gzip, gmp, libtiff) and overall, for Angelix, SPR, and GenProg]
          Tool     | #Fixes | Del | Del (%)
          Angelix  | 28     | 5   | 18%
          SPR      | 31     | 13  | 42%

          Subject   | LoC
          wireshark | 2814K
          php       | 1046K
          gzip      | 491K
          gmp       | 145K
          libtiff   | 77K
  37. Multiline Results
          Defect                        | Fixed expressions
          Libtiff-4a24508-cc79c2b       | 2
          Libtiff-829d8c4-036d7bb       | 2
          CoreUtils-00743a1f-ec48bead   | 3
          CoreUtils-1dd8a331-d461bfd2   | 2
          CoreUtils-c5ccf29b-a04ddb8d   | 3
  38. “Latest” Results
       (a) The buggy part of the Heartbleed-vulnerable OpenSSL:
          if (hbtype == TLS1_HB_REQUEST) {
              ...
              memcpy(bp, pl, payload);
              ...
          }
       (b) A fix generated automatically:
          if (hbtype == TLS1_HB_REQUEST
              && payload + 18 < s->s3->rrec.length) {
              ...
          }
       (c) The developer-provided repair:
          if (1 + 2 + payload + 16 > s->s3->rrec.length)
              return 0;
          ...
          if (hbtype == TLS1_HB_REQUEST) {
              ...
          }
          else if (hbtype == TLS1_HB_RESPONSE) {
              ...
          }
          return 0;
       The Heartbleed Bug is a serious vulnerability in the popular OpenSSL cryptographic software library. This weakness allows stealing the information protected, under normal conditions, by the SSL/TLS encryption used to secure the Internet. SSL/TLS provides communication security and privacy over the Internet for applications such as web, email, instant messaging (IM) and some virtual private networks (VPNs). (Source: heartbleed.com)
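The automatically generated guard (b) and the developer’s check (c) can be compared directly. This sketch assumes unsigned lengths small enough that payload + 19 does not wrap around, and uses a hypothetical parameter rec_len as a stand-in for s->s3->rrec.length. Since 1 + 2 + payload + 16 = payload + 19, and payload + 18 < L holds exactly when payload + 19 ≤ L, the two guards should accept the same requests; guards_agree checks this exhaustively on a small range.

```c
/* Guard of the automatically generated fix (b): accept the request
   only if payload + 18 < rec_len. */
int auto_fix_accepts(unsigned payload, unsigned rec_len) {
    return payload + 18 < rec_len;
}

/* Developer fix (c): reject if 1 + 2 + payload + 16 > rec_len, i.e. accept
   only if the 1-byte type, the 2-byte length field, the payload and
   16 bytes of padding all fit in the record. */
int developer_fix_accepts(unsigned payload, unsigned rec_len) {
    return !(1 + 2 + payload + 16 > rec_len);
}

/* Compare both guards on every (payload, rec_len) pair in [0, max].
   max must be small enough that payload + 19 cannot overflow. */
int guards_agree(unsigned max) {
    for (unsigned p = 0; p <= max; p++)
        for (unsigned len = 0; len <= max; len++)
            if (auto_fix_accepts(p, len) != developer_fix_accepts(p, len))
                return 0;
    return 1;
}
```

For example, an empty payload needs a record of at least 19 bytes under both guards.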
