Automated Program Repair, Distinguished lecture at MPI-SWS

MPI-SWS Distinguished Lecture 2019. The talk covers fuzzing and symbolic execution as background technologies and compares their relative power. It then investigates how these technologies can be used for automated program repair.

Published in: Education

Automated Program Repair, Distinguished lecture at MPI-SWS

  1. 1. Trustworthy Software & Automated Program Repair. Abhik Roychoudhury, Professor, National University of Singapore. MPI Distinguished Lecture 2019
  2. 2. Working in Program Analysis and Software Security (2001-) • Singapore Cybersecurity Consortium (2016) • National Satellite of Excellence in Trustworthy Software Systems (2019) • Comprehensive public university with Science, Arts, Engineering, Medicine, Law, Business, Public Policy, Music, Computing, … • 30K undergraduate students and 10K+ graduate students overall • 2500+ faculty members overall, 100+ in Computing (two departments, CS and IS) • http://www.nus.edu.sg/about#corporate-information
  3. 3. Snapshot of the talk • [PRELUDE] Search problems in software error detection. • Fuzzing and symbolic execution: random search and logical analysis. • Random search techniques are becoming more effective. • [MAIN PART of the TALK] The problem of program repair, as opposed to error detection. • Symbolic techniques produce higher-quality patches than random or biased random search. • Novel view of symbolic execution for specification inference.
  4. 4. Trustworthy SW: Space of Problems (Search Problems?) • Fuzz testing – Feed semi-random inputs to find hangs and crashes • Continuous fuzzing – Incrementally find new “problems” in software • Crash reproduction – Reconstruct a reported crash when the crashing input is not included due to privacy • Reaching nooks and corners • Localizing reported observable errors • Patching reported errors from input-output examples
  5. 5. Trustworthy SW: Search Problems • Random Search – Less systematic – Easy set-up, execute up to a time budget – Use objective function to steer search. • Symbolic Execution – Systematic – More involved set-up, solver calls. – Use logical formula to steer search.
  6. 6. Use of Random Search - Fuzzing
     Input: Seed inputs S
       T✗ = ∅
       T = S
       if T = ∅ then
         add empty file to T
       end if
       repeat
         t = chooseNext(T)
         p = assignEnergy(t)
         for i from 1 to p do
           t' = mutate_input(t)
           if t' crashes then
             add t' to T✗
           else if isInteresting(t') then
             add t' to T
           end if
         end for
       until timeout reached or abort-signal
     Output: Crashing inputs T✗
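The loop on slide 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_target`, the four-byte `ALPHABET`, the constant energy and the "new branch seen" notion of interestingness are all hypothetical stand-ins, not part of AFL or of the talk.

```python
import random

ALPHABET = b"bug!"  # assumption: tiny byte alphabet so the toy search converges quickly


def toy_target(data: bytes) -> set:
    """Hypothetical program under test: returns covered branch ids, raises on a crash."""
    covered = {"entry"}
    if data[:1] == b"b":
        covered.add("b")
        if data[1:2] == b"u":
            covered.add("bu")
            if data[2:3] == b"g":
                raise RuntimeError("crash")
    return covered


def mutate_input(t: bytes, rng: random.Random) -> bytes:
    """Insert, overwrite or delete one byte at a random position."""
    data = bytearray(t)
    op = rng.randrange(3)
    if op == 0 or not data:
        data.insert(rng.randrange(len(data) + 1), rng.choice(ALPHABET))
    elif op == 1:
        data[rng.randrange(len(data))] = rng.choice(ALPHABET)
    else:
        del data[rng.randrange(len(data))]
    return bytes(data)


def fuzz(seeds, target, rounds=500, energy=8, seed=0):
    rng = random.Random(seed)
    crashing = []                    # T✗
    queue = list(seeds) or [b""]     # T, with the empty file as fallback seed
    seen = set()                     # branches covered so far
    for _ in range(rounds):          # stands in for "until timeout reached"
        t = rng.choice(queue)        # chooseNext(T)
        for _ in range(energy):      # assignEnergy(t): a constant in this sketch
            t2 = mutate_input(t, rng)
            try:
                cov = target(t2)
            except Exception:
                crashing.append(t2)  # t' crashes
                continue
            if not cov <= seen:      # isInteresting(t'): covers a new branch
                seen |= cov
                queue.append(t2)
    return crashing, queue
```

Calling `fuzz([], toy_target)` returns the crashing inputs and the final queue. The constant energy is the part that the power schedules discussed on the following slides replace.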
  7. 7. Intuition [CCS16, and its adoption]
       if (condition1)
         return   // short path, frequented by many, many inputs
       else if (condition2)
         exit     // short path, frequented by many inputs
       else ….
  8. 8. Results
     $$p(i) = \begin{cases} 0 & \text{if } f(i) > \mu \\ \min\left(\frac{\alpha(i)}{\beta} \cdot 2^{s(i)},\ M\right) & \text{otherwise} \end{cases}$$
     • α(i): energy that AFL's original schedule assigns to the input exercising path i
     • β: a constant
     • s(i): # times the input exercising path i has been chosen for fuzzing
     • f(i): # fuzz exercising path i (path-frequency)
     • µ: mean # fuzz exercising a discovered path (avg. path-frequency)
     • M: maximum energy expendable on a state
     Integrated into the main line of the AFL fuzzer within a year of publication (CCS16).
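As a minimal sketch, the schedule on slide 8 translates directly into a function. The parameter names and default values below are illustrative assumptions; `alpha_i` stands for the energy the baseline fuzzer would have assigned.

```python
def energy(alpha_i, s_i, f_i, mu, beta=1.0, max_energy=1024):
    """Energy p(i) for the input exercising path i.

    alpha_i    -- baseline energy for this input
    s_i        -- number of times this input has been chosen for fuzzing
    f_i        -- number of generated inputs that exercised path i
    mu         -- mean path frequency over all discovered paths
    beta       -- constant divisor (assumed default)
    max_energy -- upper bound M (assumed default)
    """
    if f_i > mu:
        return 0  # path is already exercised more often than average: skip it
    return min((alpha_i / beta) * 2 ** s_i, max_energy)
```

Energy doubles each time an input on a rarely exercised path is picked again, until the cap `max_energy` is reached. Inputs on over-exercised paths get no energy at all.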
  9. 9. Blurring the lines: Testing? Comprehension?? Verification???
       SEARCH(A, L, U, X, found, j) {
         int j, found = 0;
         while (L <= U && found == 0) {
           j = (L + U) / 2;
           if (X == A[j]) { found = 1; }
           else if (X < A[j]) { U = j - 1; }
           else { L = j + 1; }
         }
         if (found == 0) { j = L - 1; }
       }
     Example calls:
       SEARCH(A, 1, 5, 20, found, j)
       SEARCH(A, 1, 5, X, found, j)
       SEARCH(A, N, N+4, X, found, j)
       SEARCH(A, 1, M, X, found, j)
     “Program testing and program proving can be considered as extreme alternatives. …. This paper describes a practical approach between these two extremes … Each symbolic execution result may be equivalent to a large number of normal tests”
  10. 10. Symbolic Execution
       int test_me(int Climb, int Up) {
         int sep, upward;
         if (Climb > 0) { sep = Up; }
         else { sep = add100(Up); }
         if (sep > 150) { upward = 1; }
         else { upward = 0; }
         if (upward < 0) { abort(); }
         else return upward;
       }
  11. 11. Symbolic Execution (same test_me program as on slide 10)
     Execute IF(r) when r is unresolved: FORK
       Then: PC := PC ∧ r
       Else: PC := PC ∧ ¬r
     Execute IF(r) when the branch condition r is resolved using concrete values:
       Suppose true, PC := PC ∧ r, OR
       Suppose false, PC := PC ∧ ¬r
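To illustrate the two IF rules, here is a sketch that unrolls them by hand for `test_me`. It is not a general symbolic executor: path conditions are kept as plain strings, and `add100(Up)` is assumed to return `Up + 100`.

```python
def explore_test_me():
    """Enumerate the paths of test_me as (path condition, outcome) pairs."""
    paths = []
    # IF (Climb > 0): Climb is symbolic, so the branch is unresolved -> FORK.
    for climb_positive in (True, False):
        pc = ["Climb > 0" if climb_positive else "!(Climb > 0)"]
        sep = "Up" if climb_positive else "Up + 100"  # symbolic value of sep
        # IF (sep > 150): sep depends on symbolic Up -> FORK again.
        for sep_large in (True, False):
            cond = f"{sep} > 150"
            pc2 = pc + [cond if sep_large else f"!({cond})"]
            upward = 1 if sep_large else 0  # concrete value on both sides
            # IF (upward < 0): upward is concrete, so the branch is resolved
            # without forking and the path condition stays unchanged.
            outcome = "abort" if upward < 0 else f"return {upward}"
            paths.append((" && ".join(pc2), outcome))
    return paths


for condition, outcome in explore_test_me():
    print(f"{condition:35} -> {outcome}")
```

Two forks give four paths. The third branch never forks because `upward` is concrete on every path, so `abort` is unreachable in this program.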
  12. 12. Fuzzing vs. Symbolic Execution: Bug Finding • Symbolic execution tree construction, e.g. KLEE [modeling the system environment] • Grey-box fuzz testing for systematic path exploration, inspired by concolic execution: AFLFast
  13. 13. Fuzzing vs. Symbolic Execution: Reachability Analysis [CCS17] • Reachability of a location in the program • Traverse the symbolic execution tree using search strategies, e.g. Hercules • Encode reachability as an optimization problem inside the genetic search of grey-box fuzzing: AFLGo
  14. 14. Fuzzing vs Symbolic Execution • φ1 = (x>y)∧(x+y>10), φ2 = ¬(x>y)∧(x+y>10) • Directed fuzzing as an optimization problem! 1. Instrumentation time: instrument the program to aggregate distance values. 2. Runtime, for each input: decide how long it is fuzzed based on its distance; an input closer to the targets is fuzzed for longer.
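The runtime step on slide 14 can be illustrated with a deliberately simple rule. The linear scaling below is an assumption chosen for readability, not the schedule used by AFLGo. The `distances` values are hypothetical aggregated distances to the target locations.

```python
def directed_energy(distances, base_energy=16, min_energy=1):
    """Map each input's distance to the targets to a fuzzing budget.

    distances -- dict: input name -> aggregated distance (smaller is closer)
    Closer inputs receive more energy; the farthest receives min_energy.
    """
    lo, hi = min(distances.values()), max(distances.values())
    span = hi - lo
    result = {}
    for name, d in distances.items():
        # Normalise the distance to [0, 1] and invert it into a closeness score.
        closeness = 1.0 if span == 0 else 1.0 - (d - lo) / span
        result[name] = max(min_energy, round(base_energy * closeness))
    return result
```

For example, `directed_energy({"a": 0, "b": 5, "c": 10})` gives input `a` the full budget, `b` half of it, and `c` the minimum.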
  15. 15. Digression: Fuzzing vs Symbolic Execution • Neuro-symbolic execution [NDSS19] (Ack: figure from P Saxena and co-authors)
  16. 16. Trustworthy SW: Search Problems • Random Search – Less systematic – Easy set-up, execute up to a time budget – Use objective function to steer search. – Enhance the effectiveness of search, with symbolic execution as inspiration • Symbolic Execution – More involved set-up, solver calls. – Use logical formula to steer search. • Novel view of symbolic execution for specification inference – Beyond error detection, self-healing
  17. 17. Beyond Error Detection: Specification Inference (application: self-healing) • In the absence of formal specifications, analyze the buggy program and its artifacts, such as execution traces, via various heuristics to glean a specification of how it can pass tests and what could have gone wrong! [Figure labels: Buggy Program, Tests.]
  18. 18. Program Repair [Figure labels: Buggy Program, Tests.]
  19. 19. Repair: Why? • Education • Productivity • Security
  20. 20. Search • Applicability • Scalability • Over-fitting • Large program? Large search space? (Ack: figure from C Le Goues)
  21. 21. Over-fitting [Figure labels: Tests with oracles, Buggy Program, Symbolic Formulae, Program Repair, Patched Program.]
  22. 22. Example
       Test id   a   b   c   oracle        Pass
       1        -1  -1  -1   INVALID       yes
       2         1   1   1   EQUILATERAL   yes
       3         2   2   3   ISOSCELES     yes
       4         2   3   2   ISOSCELES     yes
       5         3   2   2   ISOSCELES     no
       6         2   3   4   SCALENE       no

       1 int triangle(int a, int b, int c){
       2   if (a <= 0 || b <= 0 || c <= 0)
       3     return INVALID;
       4   if (a == b && b == c)
       5     return EQUILATERAL;
       6   if (a == b || b != c) // bug!
       7     return ISOSCELES;
       8   return SCALENE;
       9 }
     Correct fix: (a == b || b == c || a == c)
     Traverse all mutations of line 6?? It is hard to generate the fix, since (a == c) or (c == a) never appears anywhere else in the program!
  23. 23. Example (same tests and program as on slide 22)
     Correct fix: (a == b || b == c || a == c)
     Automatically generate the constraint f(2,2,3) ∧ f(2,3,2) ∧ f(3,2,2) ∧ ¬f(2,3,4)
     Solution: f(a,b,c) = (a == b || b == c || a == c)
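The constraint on slide 23 can be solved by plain enumeration once the operator set is fixed. The sketch below assumes the candidate space is limited to disjunctions of the three equality atoms. That restriction is ours, made to keep the search tiny.

```python
from itertools import combinations

# Candidate atoms for the condition on line 6 (assumed operator set: == only).
ATOMS = {
    "a == b": lambda a, b, c: a == b,
    "b == c": lambda a, b, c: b == c,
    "a == c": lambda a, b, c: a == c,
}

# Repair constraint: f(2,2,3) ∧ f(2,3,2) ∧ f(3,2,2) ∧ ¬f(2,3,4)
REPAIR_CONSTRAINT = [
    ((2, 2, 3), True),
    ((2, 3, 2), True),
    ((3, 2, 2), True),
    ((2, 3, 4), False),
]


def synthesize_condition():
    """Return the smallest disjunction of atoms satisfying the constraint."""
    for size in range(1, len(ATOMS) + 1):
        for names in combinations(ATOMS, size):
            def f(a, b, c, names=names):
                return any(ATOMS[n](a, b, c) for n in names)
            if all(f(*inp) == want for inp, want in REPAIR_CONSTRAINT):
                return " || ".join(names)
    return None


print(synthesize_condition())
```

No one-atom or two-atom disjunction satisfies all four conjuncts, so the search settles on the three-way disjunction. With a richer operator set (for example `>=`), smaller expressions can also pass these tests, which is the over-fitting risk the talk discusses.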
  24. 24. Comparison
     Syntax-based: 1. Where to fix, which line? 2. Generate patches in the candidate line. 3. Validate the candidate patches against the correctness criterion.
       Schematic: for e in Search-space { Validate e against Tests }
     Semantics-based: 1. Where to fix, which line(s)? 2. What values should be returned by those lines, e.g. <inp == 1, ret == 0>? 3. Which expressions will return such values?
       Schematic: for t in Tests { generate repair constraint Ψt }; synthesize e from ∧t Ψt
  25. 25. Specification Inference [ICSE13] • Statement under repair: var = f(live_vars) // X [Figure labels: Test input t, Concrete values, Concrete Execution, Symbolic execution, Program, Oracle (expected output); Output: Value-set or Constraint.]
  26. 26. Example
       inhibit   up_sep   down_sep   Observed o/p   Oracle   Pass
       1           0      100        0              0        yes
       1          11      110        0              1        no
       0         100       50        1              1        yes
       1         -20       60        0              1        no
       0           0       10        0              0        yes

       int is_upward(int inhibit, int up_sep, int down_sep) {
         int bias;
         if (inhibit)
           bias = down_sep;   // should be: bias = up_sep + 100
         else bias = up_sep;
         if (bias > down_sep)
           return 1;
         else return 0;
       }
  27. 27. Debugging • Given a test-suite T – fail(s) ≡ # of failing executions in which s occurs – pass(s) ≡ # of passing executions in which s occurs – allfail ≡ total # of failing executions – allpass ≡ total # of passing executions • allfail + allpass = |T| • Suspiciousness score of statement s:
     $$\text{Score}(s) = \frac{\frac{\text{fail}(s)}{\text{allfail}}}{\frac{\text{fail}(s)}{\text{allfail}} + \frac{\text{pass}(s)}{\text{allpass}}}$$
     • Other metrics, such as Ochiai, can also be used.
     [Figure: flowchart with boxes Buggy Program, Test Suite, "Investigate what this statement should be; generate a fixed statement", a YES/NO decision, and Fixed Program.]
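As a worked example, the score can be computed for the `is_upward` program and its five tests from slide 26. The coverage counts below were worked out by hand from that program and are our own derivation, not data from the slides.

```python
def score(fail_s, pass_s, allfail, allpass):
    """Suspiciousness of a statement from its failing/passing execution counts."""
    fail_ratio = fail_s / allfail if allfail else 0.0
    pass_ratio = pass_s / allpass if allpass else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


# statement -> (fail(s), pass(s)); 2 failing and 3 passing tests overall
COVERAGE = {
    "if (inhibit)":         (2, 3),
    "bias = down_sep":      (2, 1),
    "bias = up_sep":        (0, 2),
    "if (bias > down_sep)": (2, 3),
    "return 1":             (0, 1),
    "return 0":             (2, 2),
}
ALLFAIL, ALLPASS = 2, 3

ranking = sorted(
    COVERAGE,
    key=lambda stmt: score(*COVERAGE[stmt], ALLFAIL, ALLPASS),
    reverse=True,
)
for stmt in ranking:
    print(f"{score(*COVERAGE[stmt], ALLFAIL, ALLPASS):.2f}  {stmt}")
```

The assignment `bias = down_sep` is executed by both failing tests and by only one passing test, so it ranks first and becomes the statement to repair.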
  28. 28. Example
  29. 29. Example
  30. 30. Example • Accumulated constraints – f(1,11,110) > 110 ∧ f(1,0,100) ≤ 100 ∧ … • Find an f satisfying this constraint – By fixing the set of operators appearing in f • Candidate methods – Search over the space of expressions – Program synthesis with a fixed set of operators – Can also be achieved by second-order constraint solving • Generated fix – f(inhibit,up_sep,down_sep) = up_sep + 100
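A minimal sketch of "search over the space of expressions" for this example follows. It assumes a single hypothetical template, `variable + constant`, and takes its constraints from the three slide-26 tests that reach the faulty assignment (those with `inhibit = 1`).

```python
# ((inhibit, up_sep, down_sep), expected return value of is_upward)
TESTS = [
    ((1, 0, 100), 0),    # f(1,0,100) <= 100
    ((1, 11, 110), 1),   # f(1,11,110) > 110
    ((1, -20, 60), 1),   # f(1,-20,60) > 60
]
VARS = ("inhibit", "up_sep", "down_sep")


def satisfies(f, tests=TESTS):
    # With bias = f(...), is_upward returns 1 exactly when bias > down_sep.
    return all((1 if f(*inp) > inp[2] else 0) == want for inp, want in tests)


def synthesize(max_const=200):
    """Enumerate candidates of the form <variable> + <constant>."""
    for idx, name in enumerate(VARS):
        for c in range(-max_const, max_const + 1):
            f = lambda *args, idx=idx, c=c: args[idx] + c
            if satisfies(f):
                return f"{name} + {c}"
    return None


print(synthesize())
```

Within this template the constraints pin the answer down: `inhibit + c` and `down_sep + c` are contradictory for every `c`, and `up_sep + c` forces `c = 100`.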
  31. 31. Repair Workflow
  32. 32. Simplified Workflow, but • Applicability • Over-fitting • Scalability [ICSE15]
  33. 33. Comparison
       Tool                 #Programs   Equivalent   Same Loc.   Diff.
       SemFix [ICSE13]      44          17%          46%         6.36
       DirectFix [ICSE15]   44          53%          95%         2.31
  34. 34. Workflows • Applicability • Over-fitting • Scalability
  35. 35. Angelix
  36. 36. Repair Constraint • SemFix work (ICSE 2013) – Example: for an identified expression e to be fixed – [ X > 0 ] ∧ f(t) == X for each test t • DirectFix work (ICSE 2015) – Whole program as repair constraint – Use the principle of minimality to synthesize a minimal patch. • Angelix work (ICSE 2016) – Example: for identified expressions e1, e2, … to be fixed – [ (X == 1) ∨ (X == 2) ∨ (X == 3) ] ∧ f(t) == X for each test t. – [ (X == 1 ∧ Y == 1) ∨ (X == 2 ∧ Y == 2) ] ∧ f(t) == X ∧ g(t) == Y for each test t.
  37. 37. Scalability
       Subject     LoC
       wireshark   2814K
       php         1046K
       gzip        491K
       gmp         145K
       libtiff     77K
     Average time == 32 minutes
     [Bar chart: Angelix, SPR and GenProg compared on wireshark, php, gzip, gmp, libtiff and overall; axis from 0 to 35.]
  38. 38. Patch Quality
  39. 39. Experience of others [ICSE16 and its usage] • “The core technique in Angelix using symbolic execution and program synthesis works well”. • “It can potentially suffer from poor fault localization”. • “With better fault localization, the patch synthesis seems hard to improve in effectiveness” – Can still be improved in terms of efficiency • [Anecdotal comments only, from user groups]
  40. 40. Specification Inference • Two approaches – Get properties of function f via symbolic execution, and synthesize a function f satisfying these properties. – Directly solve for function f by building a second-order symbolic execution engine. • Allow for existentially quantified second-order variables. • Restrict their interpretation to a language, e.g. linear integer arithmetic: Term = Var | Constant | Term + Term | Term - Term | Constant * Term • Example (SAT) – ρ(0) > 0 ∧ ρ(1) ≤ 0 – Satisfying solution: ρ = λx. 1 - x
  41. 41. Second order Program Repair
     Term = Var | Constant | Term + Term | Term - Term | Constant * Term
     Buggy program:
       scanf("%d", &x);
       for (i = 0; i < 10; i++) {
         int t = ρ(i, x);
         if (t > 0) printf("1");
         else printf("0");
       }
     Sample test: P(5) → "1110000000", expected "1111111000"
     Synthesis specification: ∃ρ. … output = expected
     Solve for ρ directly.
  42. 42. Second order Program Repair (same program, test and specification as on slide 41) [Figure: tree branching Yes/No on ρ(0,5) > 0, ρ(1,5) > 0 and ρ(2,5) > 0, with one branch marked UNSAT.]
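To make "solve for ρ directly" concrete, the sketch below replaces the second-order solver with brute-force enumeration over linear terms `a*i + b*x + c`. Both the enumeration and the coefficient bounds are assumptions made for illustration, not the technique used in the talk.

```python
from itertools import product


def run(rho, x, n=10):
    """Execute the loop from the slide with a candidate rho; return the output."""
    return "".join("1" if rho(i, x) > 0 else "0" for i in range(n))


def solve(tests, coeff_range=range(-2, 3), const_range=range(-10, 11)):
    """Find (a, b, c) such that rho(i, x) = a*i + b*x + c passes every test."""
    for a, b, c in product(coeff_range, coeff_range, const_range):
        rho = lambda i, x, a=a, b=b, c=c: a * i + b * x + c
        if all(run(rho, x) == expected for x, expected in tests):
            return (a, b, c), rho
    return None


coeffs, rho = solve([(5, "1111111000")])
print(coeffs, run(rho, 5))
```

Several coefficient triples satisfy this one test, so the first one found is not necessarily the intended repair. More tests, or a minimality criterion, would be needed to narrow the choice.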
  43. 43. Encoding for Synthesis
       … error_severity(1); return; }
       // r(ent->fts_info, ent->fts_errno, prev_depth)
       else if (ent->fts_info == FTS_SLNONE) {
         if (symlink_loop(ent->fts_accpath)) …
  44. 44. Digression: Library Synthesis
  45. 45. (Test-based) Program Repair • Syntax-based schematic: for e in Search-space { Validate e against Tests } • Semantic schematic: for t in Tests { generate repair constraint Ψt }; synthesize e from ∧t Ψt
  46. 46. Middle Road 中道
  47. 47. Test-equivalence based repair
       scanf("%d", &x);
       for (i = 0; i < 10; i++)
         if (x - i > 0) printf("1");
         else printf("0");
     Consider all inequalities αx ± βi [> ≥ = ≠] γ
       Sequence of values                 Equivalence class (x = 4)
       {T, T, T, T, T, T, T, T, T, T}     {x > 0, x > 1, …}
       {T, T, T, T, T, T, T, T, T, F}     {x - i > -5, …}
       {T, T, T, T, T, T, T, T, F, T}     EMPTY
       {T, T, T, T, T, T, T, T, F, F}     {x - i > -4, …}
       {T, T, T, T, T, T, T, F, T, T}     EMPTY
       {T, T, T, T, T, T, T, F, T, F}     EMPTY
       {T, T, T, T, T, T, T, F, F, T}     EMPTY
       …
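The partition on slide 47 can be reproduced with a short sketch: candidates that produce the same sequence of branch values on a test fall into one class, so only one representative per class needs to be validated. The candidate space below (`x`, `x - i`, `x + i` compared with `>` or `>=` against a small constant) is a narrowed, assumed subset of the inequalities on the slide.

```python
from collections import defaultdict
import operator

OPS = {">": operator.gt, ">=": operator.ge}  # assumed subset of comparison operators
LHS = {-1: "x - i", 0: "x", 1: "x + i"}      # left-hand sides x + beta * i


def partition(x, n=10, gammas=range(-10, 11)):
    """Group candidate conditions by the value sequence they yield for input x."""
    classes = defaultdict(list)
    for beta, lhs in LHS.items():
        for op_name, op in OPS.items():
            for gamma in gammas:
                signature = tuple(op(x + beta * i, gamma) for i in range(n))
                classes[signature].append(f"{lhs} {op_name} {gamma}")
    return classes


classes = partition(4)
print(len(classes), "classes for", sum(len(v) for v in classes.values()), "candidates")
```

For `x = 4`, candidates such as `x - i > -5` and `x - i >= -4` land in the same class, while sequences like T×8, F, T have no member in this candidate space.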
  48. 48. Efficiency [TOSEM18]
  49. 49. Repair with Fuzzing
  50. 50. Fix2Fit [ISSTA19] • Integration of repair into programming environments? • The number of plausible patches can be reduced if the tests are empowered with more oracles
  51. 51. Provably Correct • From Tests? • From Programs?
  52. 52. Analyzing Linux Busybox [ICSE18]
  53. 53. Other Applications: Education [FSE17, and ongoing] • Education, Productivity, Security • Intelligent tutoring systems: automated grading and hint generation via program repair • Detailed study at IIT-Kanpur, India
  54. 54. Repair in steps
  55. 55. Most Relevant Results • Semantic Program Repair Using a Reference Implementation, ICSE 2018. • Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis, ICSE 2016. • DirectFix: Looking for Simple Program Repairs, ICSE 2015. • SemFix: Program Repair via Semantic Analysis, ICSE 2013. • Symbolic execution with second order existential constraints, ESEC-FSE 2018. • Crash-Avoiding Program Repair, ISSTA 2019. • A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments, ESEC-FSE 2017. • ACKNOWLEDGEMENT: National Cyber Security Research program from NRF Singapore, http://www.comp.nus.edu.sg/~tsunami/ • ACKNOWLEDGEMENT: Sergey Mechtaev, Semantic Program Repair, ACM SIGSOFT Outstanding Doctoral Dissertation.
  56. 56. Other Results (mostly background technology) • Coverage-based Greybox Fuzzing as Markov Chain, CCS 2016. • Directed Greybox Fuzzing, CCS 2017. • Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints, NDSS 2019. • Hercules: Reproducing Crashes in Real-World Application Binaries, ICSE 2015. • ACKNOWLEDGEMENT: National Cyber Security Research program from NRF Singapore; DSO National Laboratories, Singapore.
  57. 57. Relevant Projects in Singapore • https://www.comp.nus.edu.sg/~nsoe-tss/ • https://www.comp.nus.edu.sg/~tsunami/ • Consortium: 47 companies
  58. 58. Summary • Figure taken from: Automated Program Repair, C. Le Goues, M. Pradel, A. Roychoudhury, Review Article, Communications of the ACM, 2019. • Selectively HIRING at VARIOUS LEVELS: Post Docs, … + Open Positions for ASST PROFs in NUS CS Department
