# Issta13 workshop on debugging


### 1. HOW SYMBOLIC REASONING CAN HELP PROGRAM (DEBUGGING AND) REPAIR

Abhik Roychoudhury, National University of Singapore

ISSTA 2013 Workshop, July 2013
### 2. DEBUGGING VS. BUG HUNTING

- (a) Debugging: program P, run on input = 0, produces output = 0, but we should have output > input.
- (b) Model checking: a model checker checks P against the property $\mathbf{G}(pc = \mathit{end} \Rightarrow output > input)$ and returns the counter-example input = 0, output = 0.
### 3. EXECUTION WITH SYMBOLIC INPUTS

- Program P: `out = in + 1`; program Q: `out = in * 2`.
- With symbolic input in == α, P produces the output out == α + 1 and Q produces the output out == 2 * α.
- To expose a difference between P and Q, try to find α such that α + 1 ≠ 2 * α.
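The search for such an α can be sketched in C as follows. This is a hedged illustration, not the tool used in the talk: the names `prog_p`, `prog_q`, and `find_difference` are hypothetical, and a brute-force scan over a small range stands in for the constraint solver that would normally decide α + 1 ≠ 2 * α.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Program P and program Q from the slide. */
static int prog_p(int in) { return in + 1; }
static int prog_q(int in) { return in * 2; }

/* Looks for a witness alpha in [lo, hi] with P(alpha) != Q(alpha),
   i.e. alpha + 1 != 2 * alpha. A real tool would hand this formula to a
   solver; scanning a small range (well inside int) suffices for a sketch. */
bool find_difference(int lo, int hi, int *witness) {
    assert(witness != NULL);
    for (int a = lo; a <= hi; a++) {
        if (prog_p(a) != prog_q(a)) {
            *witness = a;
            return true;
        }
    }
    return false;
}
```

Because α + 1 = 2 * α holds only for α = 1, almost every input exposes the difference: scanning [0, 10] finds the witness 0, while the range [1, 1] contains none.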
### 4. PATH CONDITION COMPUTATION

```
1 input in;
2 z = 0; x = 0;
3 if (in > 0){
4   z = in * 2;
5   x = in + 2;
6   x = x + 2;
7 }
8 else …
9 if (z > x){ return error; }
```

Symbolic execution along the path taken by in == 5:

| Line# | Assignment store | Path condition |
|---|---|---|
| 1 | {} | true |
| 2 | {(z,0), (x,0)} | true |
| 3 | {(z,0), (x,0)} | in > 0 |
| 4 | {(z,2*in), (x,0)} | in > 0 |
| 5 | {(z,2*in), (x,in+2)} | in > 0 |
| 6 | {(z,2*in), (x,in+4)} | in > 0 |
| 7 | {(z,2*in), (x,in+4)} | in > 0 |
| 9 | {(z,2*in), (x,in+4)} | in > 0 ∧ (2*in > in+4) |

Using the assignment store, we can also compute the symbolic expression for the output along each path.
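The table can be reproduced mechanically. The sketch below is a minimal illustration under stated assumptions: every symbolic value is kept in the linear form coef * in + k (enough for this example), and the hypothetical names `Lin`, `Constraint`, and `path_condition` are not from the talk. The slide elides the else branch, so the sketch simply leaves z and x at 0 on that side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Symbolic value of the form coef * in + k. */
typedef struct { int coef; int k; } Lin;

/* Branch constraint "lhs > 0", plus the outcome the concrete run took. */
typedef struct { Lin lhs; bool taken; } Constraint;

static int eval(Lin e, int in) { return e.coef * in + e.k; }
static Lin sub(Lin a, Lin b) { return (Lin){ a.coef - b.coef, a.k - b.k }; }

/* Runs the example concretely on `in`, keeping a symbolic store for z and x
   and recording each branch as a constraint (pc needs room for 2 entries).
   Sets *error if line 9 returns error; returns the number of constraints. */
size_t path_condition(int in, Constraint pc[2], bool *error) {
    assert(pc != NULL && error != NULL);
    Lin sym_in = { 1, 0 };
    Lin z = { 0, 0 }, x = { 0, 0 };          /* line 2 */
    size_t n = 0;

    bool b1 = eval(sym_in, in) > 0;          /* line 3: in > 0 */
    pc[n++] = (Constraint){ sym_in, b1 };
    if (b1) {
        z = (Lin){ 2, 0 };                   /* line 4: z = 2*in */
        x = (Lin){ 1, 2 };                   /* line 5: x = in + 2 */
        x.k += 2;                            /* line 6: x = in + 4 */
    }                                        /* line 8: else branch elided */

    Lin diff = sub(z, x);                    /* line 9: z > x, i.e. z - x > 0 */
    bool b2 = eval(diff, in) > 0;
    pc[n++] = (Constraint){ diff, b2 };
    *error = b2;
    return n;
}
```

For in == 5 this records in > 0 and in − 4 > 0 (both taken), which is the last row's in > 0 ∧ 2*in > in + 4, and sets the error flag; for in == 3 the second branch is not taken.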
### 5. USAGE OF DSE

Dynamic symbolic execution (DSE) on the program:

```
input x, y;
a = 0; b = 0;
if (x > y) a = x; else a = y;
if (x + y > 10) b = a;
return b;
```

- The path condition of the input x == 0, y == 0 is x ≤ y ∧ x + y ≤ 10.
- Starting from passing inputs, continue the search for failing inputs: those which do not go through the "same" path.
- Cover more paths by negating conjuncts of this path condition.

[Figure: execution tree with the branches x > y (a = x) and x ≤ y (a = y), then x + y > 10 (b = a), then return b.]
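The DSE loop on this program can be sketched as follows. It assumes a brute-force search over a small input box in place of the constraint solver a real DSE tool would call; `prog`, `solve`, and `dse_paths` are hypothetical names, and each path is identified by the outcomes of its two branches.

```c
#include <assert.h>
#include <stdbool.h>

/* The slide's program; br[0] and br[1] receive the two branch outcomes. */
static int prog(int x, int y, bool br[2]) {
    int a = 0, b = 0;
    br[0] = x > y;
    if (br[0]) a = x; else a = y;
    br[1] = x + y > 10;
    if (br[1]) b = a;
    return b;
}

/* Solver stand-in: find (x, y) in [-20, 20]^2 whose first `depth`
   branch outcomes equal want[0 .. depth-1]. */
static bool solve(const bool want[2], int depth, int *x, int *y) {
    for (int i = -20; i <= 20; i++)
        for (int j = -20; j <= 20; j++) {
            bool br[2];
            prog(i, j, br);
            bool ok = true;
            for (int d = 0; d < depth; d++) ok = ok && br[d] == want[d];
            if (ok) { *x = i; *y = j; return true; }
        }
    return false;
}

/* DSE from (x0, y0): run an input and record its path; then, for each
   branch d, keep the prefix br[0 .. d-1], negate br[d], and solve for a
   new input. Returns the number of distinct paths covered. */
int dse_paths(int x0, int y0) {
    bool seen[4] = { false };
    int wx[16], wy[16], top = 0, covered = 0;
    wx[top] = x0; wy[top] = y0; top++;
    while (top > 0) {
        top--;
        bool br[2];
        prog(wx[top], wy[top], br);
        int id = (br[0] ? 2 : 0) + (br[1] ? 1 : 0);
        if (seen[id]) continue;
        seen[id] = true;
        covered++;
        for (int d = 0; d < 2; d++) {
            bool want[2] = { br[0], br[1] };
            want[d] = !want[d];               /* negate branch d */
            int nx, ny;
            if (top < 16 && solve(want, d + 1, &nx, &ny)) {
                wx[top] = nx; wy[top] = ny; top++;
            }
        }
    }
    return covered;
}
```

Starting from the passing input (0, 0), whose path condition is x ≤ y ∧ x + y ≤ 10, the loop reaches all four combinations of x ≤ y or x > y with x + y ≤ 10 or x + y > 10.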
### 6. IMPLICIT ASSUMPTION IN DSE

- Inputs executing the same path are "similar".
- If we test one of them, there is no need to test the others.
- Use "similarity" to skip over parts of a large search space; DSE is a tool to achieve this goal.
- Testing is a search over paths, not a search over inputs.
- Is there a coarser-grained notion of similarity?
### 7. THE SEARCH FOR "SIMILARITY"

- Testing
  - No need to test "similar" inputs.
  - Can look for "similarity" beyond paths.
- Debugging
  - Given a failing input, find "similar" inputs that pass.
  - Logical comparison detects "deviations", which form the bug report.
- Repair
  - Find "similar" inputs showing the "same" error.
  - Group all executions through which a failing execution is rescued.
  - Symbolic execution is used to capture the "intended behavior".
### 8. "SIMILARITY" BEYOND PATHS

```
1  int x, y, z;   // input variables
2  int out;       // output variable
3  int a;
4  int b = 2;
5  if (x - y > 0) //b1
6      a = x;
7  else
8      a = y;
9  if (x + y > 10) //b2
10     b = a;
11 if (z * z > 3) //b3
12     printf("square(z) > 3\n");
13 else
14     printf("square(z) <= 3\n");
15 out = b;       //slicing criterion
```

Eight paths yield only three symbolic outputs:

- If x − y > 0 and x + y > 10, then out == x. Paths: 1,2,3,4,5,6,9,10,11,12,15 and 1,2,3,4,5,6,9,10,11,13,14,15.
- If x − y ≤ 0 and x + y > 10, then out == y. Paths: 1,2,3,4,5,7,8,9,10,11,12,15 and 1,2,3,4,5,7,8,9,10,11,13,14,15.
- If x + y ≤ 10, then out == 2. Paths: 1,2,3,4,5,6,9,11,12,15; 1,2,3,4,5,6,9,11,13,14,15; 1,2,3,4,5,7,8,9,11,12,15; and 1,2,3,4,5,7,8,9,11,13,14,15.
### 9. PROGRAM SUMMARY

$$
\begin{aligned}
&(\neg(x+y > 10) \wedge out = 2)\\
\vee\ &((x-y > 0) \wedge (x+y > 10) \wedge out = x)\\
\vee\ &(\neg(x-y > 0) \wedge (x+y > 10) \wedge out = y)
\end{aligned}
$$

Group inputs which produce the same symbolic output. This enables efficient testing and debugging.
### 10. RELEVANT SLICE CONDITION

Relevant slicing with potential dependence, applied to the program from slide 8 (slicing criterion `out = b` at line 15):

- The path condition is computed over the relevant slice.
- Backward dynamic slicing follows control, data, and potential dependence.
- The result precisely captures the input-output relationship.
- It groups several paths together.
### 11. PROPERTIES

Notation: t and t′ are program inputs; π(t) is the execution trace of input t; RSC(π(t)) is the relevant slice condition computed on π(t).

- Same symbolic output: given a path π(t), if an input t′ satisfies RSC(π(t)), then RSC(π(t′)) is the same as RSC(π(t)), and π(t) and π(t′) compute the same symbolic output.
- Complete RSC coverage: path exploration based on reordered RSCs can explore all symbolic outputs.

Property for path conditions: suppose $f = \psi_1 \wedge \psi_2 \wedge \cdots \wedge \psi_m$ is a path condition. If t′ satisfies $\psi_1 \wedge \psi_2 \wedge \cdots \wedge \psi_i$ for some $i \le m$, then the path condition of t′ contains $\psi_1 \wedge \psi_2 \wedge \cdots \wedge \psi_i$ as a prefix. However, this does NOT hold for relevant slice conditions, which makes the exploration completely out of order.
### 12. (EXPECTED) VALIDATION

[Figure: two charts, "Paths explored" and "Average formula size", for exploration based on the Relevant Slice Condition.]
### 13. REGRESSION DEBUGGING

[Figure: an old, stable program P; a test input t; a new, buggy program P′.]
### 14. ADAPTING TRACE COMPARISON

- For test input t, the old stable program P follows path σ and the new buggy program P′ follows path π.
- Directly compare σ and π?
- Alternatively, use a new input t′.
### 15. THE SEARCH FOR "SIMILARITY"

[Figure: the old program P and the new program P′, with the buggy input and the new test input.]
### 16. DARWIN

- Inputs: the old stable program P, the new buggy program P′, and a test input t.
- Concrete and symbolic execution compute f, the path condition of t in P, and f′, the path condition of t in P′.
- The STP solver finds satisfiable sub-formulae of f ∧ ¬f′; each yields an alternative input t′, which is then validated.
- Output: a bug report at assembly level, mapped to a bug report at source level.
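### 17. CHOOSING ALTERNATIVE INPUTS

[Figure: a path through branches b1 to b6.]

Let $f' = \psi_1 \wedge \psi_2 \wedge \cdots \wedge \psi_m$. Since $\neg f'$ is true exactly when some $\psi_i$ is the first conjunct to fail, solving $f \wedge \neg f'$ amounts to checking the satisfiability of

$$
f \wedge \neg\psi_1, \quad f \wedge \psi_1 \wedge \neg\psi_2, \quad f \wedge \psi_1 \wedge \psi_2 \wedge \neg\psi_3, \quad \ldots
$$

At most m alternate inputs!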
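The decomposition can be sketched on a made-up example. Everything concrete here is an assumption for illustration, not taken from DARWIN: f (`f_old`) is the path condition in > 10 of a failing input in the old program, f′ in the new program has the conjuncts ψ1: in ≥ 0 and ψ2: in ≤ 20, and a scan over a small range replaces the STP solver. `Pred` and `alternate_inputs` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*Pred)(int);

/* Hypothetical example: f is the failing input's path condition in P,
   psi1 and psi2 are the conjuncts of its path condition f' in P'. */
static bool f_old(int in) { return in > 10; }
static bool psi1(int in) { return in >= 0; }
static bool psi2(int in) { return in <= 20; }

/* For i = 1..m, looks for an input satisfying
   f && psi_1 && ... && psi_{i-1} && !psi_i by scanning [lo, hi]
   (a stand-in for an SMT solver). Stores at most one input per i in alt,
   hence at most m alternate inputs; returns how many were found. */
size_t alternate_inputs(Pred f, const Pred psi[], size_t m,
                        int lo, int hi, int alt[]) {
    assert(psi != NULL && alt != NULL);
    size_t found = 0;
    for (size_t i = 0; i < m; i++) {
        for (int in = lo; in <= hi; in++) {
            bool ok = f(in) && !psi[i](in);
            for (size_t j = 0; ok && j < i; j++)
                ok = psi[j](in);
            if (ok) {
                alt[found++] = in;
                break;
            }
        }
    }
    return found;
}
```

In this example f ∧ ¬ψ1 is unsatisfiable, while f ∧ ψ1 ∧ ¬ψ2 yields the single alternate input 21.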
### 18. BUG REPORT COMPUTATION

[Figure: a path through branches b1 to b6.]

- Let t_new be the input obtained by solving $f \wedge \psi_1 \wedge \psi_2 \wedge \neg\psi_3$.
- The bug report, obtained by comparing the traces of t_bug and t_new, should be the branch b3!
- At most m alternate inputs means at most m lines in the bug report.
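Trace comparison itself is simple once t_new is known. The sketch below assumes a hypothetical new program P′ with two branches (branch 1 tests in ≥ 0; branch 2, reached only when branch 1 is taken, tests in ≤ 20); `run_new` and `bug_report` are illustrative names, not DARWIN's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical new program P': records the id and outcome of every branch
   it executes; returns the number of branches recorded (at most 2). */
static size_t run_new(int in, int ids[2], bool outcomes[2]) {
    size_t n = 0;
    bool c1 = in >= 0;
    ids[n] = 1; outcomes[n] = c1; n++;
    if (c1) {
        bool c2 = in <= 20;
        ids[n] = 2; outcomes[n] = c2; n++;
    }
    return n;
}

/* Compares the branch traces of t_bug and t_new on P' and returns the id of
   the first branch where they take different outcomes, or -1 if they agree.
   In this example, an input from f && psi_1 && ... && !psi_i agrees with
   t_bug up to branch i and splits there. */
int bug_report(int t_bug, int t_new) {
    int ids_bug[2], ids_new[2];
    bool out_bug[2], out_new[2];
    size_t nb = run_new(t_bug, ids_bug, out_bug);
    size_t nn = run_new(t_new, ids_new, out_new);
    for (size_t k = 0; k < nb && k < nn; k++) {
        assert(ids_bug[k] == ids_new[k]);   /* same prefix of branches */
        if (out_bug[k] != out_new[k])
            return ids_bug[k];
    }
    return -1;
}
```

With the failing input 15 and the alternate input 21 (which satisfies ψ1 but not ψ2), the report is branch 2, the check `in <= 20`.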
### 19. COARSER-GRAINED "SIMILARITY"

[Figure: the old program P and the new program P′, with the buggy input and the new test input.]

Solve $rsc \wedge \neg rsc'$ instead of $f \wedge \neg f'$, where rsc and rsc′ are relevant slice conditions.
### 20. RESULTS ON DARWIN

| Program | Time (path condition) | Time (relevant slice condition) | Bug report (path condition) | Bug report (relevant slice condition) |
|---|---|---|---|---|
| JLex | 543 min | 15 min | 50 LOC | 3 LOC |
| Jtopas | 81 min | 5 min | 4 LOC | 4 LOC |
| NanoXML | 3 min | 43 s | 8 LOC | 6 LOC |

Less time, better results: smaller formulas, and fewer of them, to solve lead to a more accurate bug report, obtained faster.
### 21. IF WE ARE INTERESTED IN STATISTICS

| Program | Size | Versions compared | Diff |
|---|---|---|---|
| JLex | ~7290 LoC | v1.2.1 vs. v1.1.1 | 518 LoC |
| Jtopas | ~5754 LoC | v0.7 vs. v0.8 | 2489 LoC |
| NanoXML | ~5244 LoC | v2.1 vs. v2.2 | 2496 LoC |

Other results in the DARWIN paper:

- The first implementation was built on top of BitBlaze (thanks to the BitBlaze team).
- Results on libPNG (36K LoC) and TCPflow (1000 LoC).
- Different implementations of web servers: Miniweb and Savant against Apache.
### 22. PROGRAM REPAIR

- Correctness specification: a test suite.
- Program repair: passing all tests.
- Repair strategy: rescue failing executions.
- Use of symbolic execution: group together all executions through which a failing execution could be rescued. This is a new notion of "similarity".
### 23. (0) THE PROBLEM

```
1 int is_upward(int inhibit, int up_sep, int down_sep){
2   int bias;
3   if (inhibit)
4     bias = down_sep; // bias = up_sep + 100
5   else bias = up_sep;
6   if (bias > down_sep)
7     return 1;
8   else return 0;
9 }
```

| inhibit | up_sep | down_sep | Observed output | Expected output | Result |
|---|---|---|---|---|---|
| 1 | 0 | 100 | 0 | 0 | pass |
| 1 | 11 | 110 | 0 | 1 | fail |
| 0 | 100 | 50 | 1 | 1 | pass |
| 1 | -20 | 60 | 0 | 1 | fail |
| 0 | 0 | 10 | 0 | 0 | pass |
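The table can be replayed with a small harness. This sketch uses hypothetical names (`UpwardFn`, `is_upward_buggy`, `is_upward_fixed`, `count_failures`) and takes the statement in the line-4 comment, `bias = up_sep + 100`, as the corrected version.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*UpwardFn)(int inhibit, int up_sep, int down_sep);

/* The buggy program from the slide. */
static int is_upward_buggy(int inhibit, int up_sep, int down_sep) {
    int bias;
    if (inhibit)
        bias = down_sep;            /* line 4: the bug */
    else
        bias = up_sep;
    return bias > down_sep ? 1 : 0;
}

/* The same program with line 4 replaced by the commented statement. */
static int is_upward_fixed(int inhibit, int up_sep, int down_sep) {
    int bias;
    if (inhibit)
        bias = up_sep + 100;
    else
        bias = up_sep;
    return bias > down_sep ? 1 : 0;
}

/* The five tests from the table: inhibit, up_sep, down_sep, expected output. */
static const int tests[5][4] = {
    { 1,   0, 100, 0 },
    { 1,  11, 110, 1 },
    { 0, 100,  50, 1 },
    { 1, -20,  60, 1 },
    { 0,   0,  10, 0 },
};

/* Number of tests on which fn's output differs from the expected output. */
int count_failures(UpwardFn fn) {
    assert(fn != NULL);
    int failures = 0;
    for (size_t i = 0; i < 5; i++)
        if (fn(tests[i][0], tests[i][1], tests[i][2]) != tests[i][3])
            failures++;
    return failures;
}
```

The buggy version fails the two tests marked "fail" in the table; the corrected version passes all five.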
### 24. (1) FIND A SUSPECT

Suspiciousness ranking of the statements of `is_upward` (from the previous slide), based on the passing and failing tests:

| Line | Score | Rank |
|---|---|---|
| 4 | 0.75 | 1 |
| 8 | 0.6 | 2 |
| 3 | 0.5 | 3 |
| 6 | 0.5 | 3 |
| 5 | 0 | 5 |
| 7 | 0 | 5 |
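The slide does not name its scoring formula. The Tarantula formula, susp(s) = (failed(s)/totalfailed) / (failed(s)/totalfailed + passed(s)/totalpassed), reproduces every score in the table, so the sketch below assumes it. `is_upward` is instrumented by hand to record which slide lines each test executes; `is_upward_cov` and `tarantula` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* is_upward instrumented by hand: hit[l] is set when slide line l runs. */
static int is_upward_cov(int inhibit, int up_sep, int down_sep, bool hit[10]) {
    int bias;
    hit[3] = true;
    if (inhibit) { hit[4] = true; bias = down_sep; }
    else         { hit[5] = true; bias = up_sep; }
    hit[6] = true;
    if (bias > down_sep) { hit[7] = true; return 1; }
    hit[8] = true;
    return 0;
}

/* Tarantula score of each line over the five tests of the problem slide
   (inhibit, up_sep, down_sep, expected output); score[l] is for line l. */
void tarantula(double score[10]) {
    static const int tests[5][4] = {
        { 1, 0, 100, 0 }, { 1, 11, 110, 1 }, { 0, 100, 50, 1 },
        { 1, -20, 60, 1 }, { 0, 0, 10, 0 },
    };
    int passed[10] = { 0 }, failed[10] = { 0 };
    int total_passed = 0, total_failed = 0;
    for (int t = 0; t < 5; t++) {
        bool hit[10] = { false };
        int out = is_upward_cov(tests[t][0], tests[t][1], tests[t][2], hit);
        bool fail = out != tests[t][3];
        if (fail) total_failed++; else total_passed++;
        for (int l = 0; l < 10; l++)
            if (hit[l]) { if (fail) failed[l]++; else passed[l]++; }
    }
    for (int l = 0; l < 10; l++) {
        double f = total_failed ? (double)failed[l] / total_failed : 0.0;
        double p = total_passed ? (double)passed[l] / total_passed : 0.0;
        score[l] = (f + p > 0.0) ? f / (f + p) : 0.0;
    }
}
```

Line 4 runs in both failing tests and in one of three passing tests, giving 1 / (1 + 1/3) = 0.75; line 8 runs in both failing tests and in two passing ones, giving 0.6.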
### 25. (2) WHAT IT SHOULD HAVE BEEN

Take the failing test inhibit = 1, up_sep = 11, down_sep = 110 (observed output 0, expected output 1), and treat the value assigned to bias at line 4 of `is_upward` as a symbolic value X:

| Line | State | Path condition |
|---|---|---|
| 4 | inhibit = 1, up_sep = 11, down_sep = 110, bias = X | true |
| 7 | inhibit = 1, up_sep = 11, down_sep = 110, bias = X | X > 110 |
| 8 | inhibit = 1, up_sep = 11, down_sep = 110, bias = X | X ≤ 110 |
### 26. (2) WHAT IT SHOULD HAVE BEEN

```
1 int is_upward(int inhibit, int up_sep, int down_sep){
2   int bias;
3   if (inhibit)
4     bias = f(inhibit, up_sep, down_sep);
5   else bias = up_sep;
6   if (bias > down_sep)
7     return 1;
8   else return 0;
9 }
```

With inhibit == 1, up_sep == 11, down_sep == 110, symbolic execution yields f(1, 11, 110) > 110.
### 27. (3) FIX THE SUSPECT

- Accumulated constraints:
  - f(1, 11, 110) > 110
  - f(1, 0, 100) ≤ 100
  - …
- Find an f satisfying these constraints, by fixing the set of operators appearing in f.
- Candidate methods:
  - Search over the space of expressions.
  - Program synthesis with a fixed set of operators: more efficient!
- Generated fix: f(inhibit, up_sep, down_sep) = up_sep + 100
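The accumulated constraints can be kept as data and checked against any candidate f. This is a minimal sketch with hypothetical names (`Candidate`, `RepairConstraint`, `satisfies_all`); the third constraint, f(1, −20, 60) > 60, is the one listed on the "Why program synthesis" slide, and the slide's "…" may stand for further ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*Candidate)(int inhibit, int up_sep, int down_sep);

/* One accumulated constraint: f(inhibit, up_sep, down_sep) > bound when
   greater is true, and f(...) <= bound otherwise. */
typedef struct {
    int inhibit, up_sep, down_sep;
    bool greater;
    int bound;
} RepairConstraint;

static const RepairConstraint constraints[] = {
    { 1,  11, 110, true,  110 },   /* f(1, 11, 110) > 110 */
    { 1,   0, 100, false, 100 },   /* f(1, 0, 100) <= 100 */
    { 1, -20,  60, true,   60 },   /* f(1, -20, 60) > 60  */
};

/* True if the candidate expression for line 4 meets every constraint. */
bool satisfies_all(Candidate f) {
    assert(f != NULL);
    size_t n = sizeof constraints / sizeof constraints[0];
    for (size_t i = 0; i < n; i++) {
        const RepairConstraint *c = &constraints[i];
        int v = f(c->inhibit, c->up_sep, c->down_sep);
        if (c->greater ? !(v > c->bound) : !(v <= c->bound))
            return false;
    }
    return true;
}

/* The buggy right-hand side and the generated fix, as candidates. */
static int original(int inhibit, int up_sep, int down_sep) {
    (void)inhibit; (void)up_sep;
    return down_sep;
}
static int generated_fix(int inhibit, int up_sep, int down_sep) {
    (void)inhibit; (void)down_sep;
    return up_sep + 100;
}
```

The original right-hand side `down_sep` violates the first constraint (110 > 110 is false), whereas `up_sep + 100` gives 111, 100, and 80 and meets all three.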
### 28. TO RECAPITULATE

- Ranked bug report
  - Hypothesize the error cause: the suspect.
- Symbolic execution
  - Specification of the suspicious statement.
  - Input-output requirements from each test: the repair constraint.
- Program synthesis
  - Decide which operators can appear in the fix.
  - Generate a fixed statement by solving the repair constraint.
### 29. WHAT IT SHOULD HAVE BEEN

- Buggy program: … `var = a + b - c;`, where the right-hand side of the suspect statement is replaced by an unknown x.
- A concrete test input drives concrete execution, together with symbolic execution in which x is the only unknown.
- The result: path conditions and output expressions.
### 30. EXAMPLE

In `is_upward`, line 4 becomes `bias = f(inhibit, up_sep, down_sep); // X`. For the test t with inhibit == 1, up_sep == 11, down_sep == 110, symbolic execution over all paths j gives the repair constraint

$$
f(t) = X \wedge \bigvee_{j \in \mathit{Paths}} \left(pc_j \wedge out_j = \mathit{expected\_out}(t)\right)
$$

which for this test is

$$
f(1, 11, 110) = X \wedge \left((X > 110 \wedge 1 = 1) \vee (X \le 110 \wedge 0 = 1)\right)
$$
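For this example the repair constraint can be computed mechanically. The sketch below is a simplification under stated assumptions: it hard-codes the two paths that symbolic execution of lines 6 to 8 finds once bias = X, represents each path condition as an interval on X, and only handles tests with inhibit == 1, where line 4 executes. `SymPath`, `paths_after_line4`, and `repair_constraint` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* A path after line 4 with bias = X symbolic: its path condition is
   lo <= X <= hi, and it returns out. */
typedef struct { long lo, hi; int out; } SymPath;

/* Symbolic execution of lines 6-8 for a given down_sep: two paths. */
static size_t paths_after_line4(int down_sep, SymPath paths[2]) {
    paths[0] = (SymPath){ (long)down_sep + 1, INT_MAX, 1 };  /* X > down_sep: line 7 */
    paths[1] = (SymPath){ INT_MIN, down_sep, 0 };            /* X <= down_sep: line 8 */
    return 2;
}

/* Repair constraint for one test with inhibit == 1: the disjunction over
   paths j of (pc_j && out_j == expected). Here each output value comes from
   exactly one path, so the result is a single interval [*lo, *hi]; returns
   false if no path produces the expected output (unsatisfiable). */
bool repair_constraint(int down_sep, int expected, long *lo, long *hi) {
    assert(lo != NULL && hi != NULL);
    SymPath paths[2];
    size_t n = paths_after_line4(down_sep, paths);
    for (size_t j = 0; j < n; j++) {
        if (paths[j].out == expected) {
            *lo = paths[j].lo;
            *hi = paths[j].hi;
            return true;
        }
    }
    return false;
}
```

For the failing test (1, 11, 110) with expected output 1 this yields X ≥ 111, i.e. f(1, 11, 110) > 110; for the passing test (1, 0, 100) with expected output 0 it yields f(1, 0, 100) ≤ 100.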
### 31. TO RECAPITULATE

- Ranked bug report
  - Hypothesize the error cause: the suspect.
- Symbolic execution
  - Specification of the suspicious statement.
  - Input-output requirements from each test: the repair constraint.
- Program synthesis
  - Decide which operators can appear in the fix.
  - Generate a fix by solving the repair constraint.
### 32. WHY PROGRAM SYNTHESIS

- Instead of solving for f directly:
  - Select the primitive components to be used by the synthesized program, based on complexity.
  - Look for a program that uses only these primitive components and satisfies the repair constraint.
  - Where to place each component? What are its parameters?
- Repair constraint: f(1, 11, 110) > 110 ∧ f(1, 0, 100) ≤ 100 ∧ f(1, −20, 60) > 60
- Components (from the figure): +, +, inhibit, up_sep. Candidate programs:
  - `int tmp = down_sep - 1; return up_sep + tmp;`
  - `int tmp = down_sep + 1; return tmp - inhibit;`
  - `int tmp = down_sep - 1; return tmp + inhibit;`
### 33. LOCATION VARIABLES

- Define a location variable for each component.
- Constraints on the location variables are solved by SMT:
  - Well-formedness, e.g., defined before being used.
  - Output constraint from each test (the repair constraint).
  - Meaning of the components.
  - Lines determine the values: $L_x = L_y \Rightarrow x = y$.
- Once the locations are found, the program is constructed.

Example with Components = {+}: $L_{in} = 0$, $L_{out} = 1$, $L^{+}_{out} = 1$, $L^{+}_{in1} = 0$, $L^{+}_{in2} = 0$ yields

```
0 r0 = input;
1 r = r0 + r0;
2 return r;
```
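The location-variable search can be sketched by enumeration instead of SMT. Everything here is an assumption for illustration: the component library (the three inputs, one constant, and one `+`), the fixed line layout, and the names `Program`, `run`, `meets_spec`, and `synthesize`. Only the operand locations and the constant are searched; well-formedness is enforced by letting the `+` at line 4 read only from lines 0 to 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: lines 0-2 hold inhibit, up_sep, down_sep; line 3 holds a
   constant component with value k; line 4 holds a '+' component reading
   locations l1 and l2, and is the output. */
typedef struct { int l1, l2, k; } Program;

static long run(Program p, int inhibit, int up_sep, int down_sep) {
    long line[4] = { inhibit, up_sep, down_sep, p.k };
    return line[p.l1] + line[p.l2];
}

/* Repair constraint from the "Why program synthesis" slide. */
static bool meets_spec(Program p) {
    return run(p, 1, 11, 110) > 110 &&
           run(p, 1, 0, 100) <= 100 &&
           run(p, 1, -20, 60) > 60;
}

/* Enumerates the location variables l1, l2 (well-formed: only lines defined
   before line 4) and the constant k in [kmin, kmax]. A real implementation
   would encode these choices as constraints for an SMT solver. */
bool synthesize(int kmin, int kmax, Program *out) {
    assert(out != NULL);
    for (int l1 = 0; l1 < 4; l1++)
        for (int l2 = 0; l2 < 4; l2++)
            for (int k = kmin; k <= kmax; k++) {
                Program p = { l1, l2, k };
                if (meets_spec(p)) {
                    *out = p;
                    return true;
                }
            }
    return false;
}
```

Scanning operand locations in order and constants from −200 to 200, the first program that meets the repair constraint reads line 1 (up_sep) and line 3 (the constant 100), i.e. `up_sep + 100`, the same fix as the one generated on the "Fix the suspect" slide.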
### 34. SUBJECTS USED

SIR programs:

| Subject | LoC | # Versions | Description |
|---|---|---|---|
| TCAS | 135 | 41 | Air traffic control |
| Schedule | 304 | 9 | Process scheduler |
| Schedule2 | 262 | 9 | Process scheduler |
| Replace | 518 | 29 | Text processing |
| Grep | 9366 | 2 | Text search engine |

GNU CoreUtils:

| Subject | LoC |
|---|---|
| mknod | 183 |
| mkdir | 159 |
| mkfifo | 107 |
| cp | 2272 |

Repaired by both GenProg (GP) and SemFix. Ours/GP = 0.63 (time).
### 35. WHY IS SEMFIX MORE STABLE?

[Figure: number of TCAS programs repaired (y axis, 0 to 45) versus number of tests (x axis, 10 to 50), for Total, SemFix, and GenProg.]

- Overall, 90 programs from SIR: SemFix repaired 48/90 and GenProg repaired 16/90 with 50 tests.
- GenProg's running time is more than 3 times that of SemFix.
- Time bound: 4 minutes.
### 36. TYPE OF BUGS (SIR)

| Bug type | Total | SemFix | GenProg |
|---|---|---|---|
| Constant | 14 | 10 | 3 |
| Arithmetic | 14 | 6 | 0 |
| Comparison | 16 | 12 | 5 |
| Logic | 10 | 10 | 3 |
| Code missing | 27 | 5 | 3 |
| Redundant code | 9 | 5 | 2 |
| All | 90 | 48 | 16 |
### 37. EXAMPLE FIXES

SemFix synthesizes the missing code shown in the comment:

```c
enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV); /* && (Cur_Vertical_Sep > MAXALTDIFF); missing code */
```

For the buggy statement `tmp = Up_Separation;`, SemFix synthesizes:

```c
tmp = ((OtherCapability < Alt_Layer_Value) ? Two_of_Three_Reports_Valid : Cur_Vertical_Sep);
```
### 38. STEPPING BACK, PERSPECTIVE

- [Obvious] Level of automation
  - Never completely automatic; closer to programming environments!
  - Program synthesis is likely to play a useful role.
- Is debugging required?
  - Combine test search and repair.
  - Avoid statistical fault localization.
  - Find the location to fix via symbolic reasoning and MAXSAT; it is not yet clear what quality of repair this produces.
- Can we generate suggestions instead of repairs?
  - What is a repair (or not) may depend on context.
### 39. SPECIFIC APPLICATIONS OF REPAIR

- Role-based sanitization of HTML output
  - XSS attacks insert scripts into web pages.
  - Role-based XSS sanitization reduces false positives.
- WordPress, a popular blogging application, groups users into roles (Weinberger et al., ESORICS 2011):
  - A user in the author role can create a new post in the blog, with most non-code tags permitted.
  - An anonymous commenter can use only a few text-formatting tags. A commenter cannot insert images, but authors can.
  - Neither can insert a `<script>` tag or …
  - Untrusted input flows into an HTML tag context, but the sanitizer applies changes as a function of the user role.
- Given the policy, could test generation go hand in hand with (context-sensitive) repair?
### 40. REFERENCES

- Dawei Qi, Hoang D.T. Nguyen, Abhik Roychoudhury. Path Exploration based on Symbolic Output. ESEC/FSE 2011; to appear in TOSEM.
- Dawei Qi, Abhik Roychoudhury, Zhenkai Liang, Kapil Vaswani. DARWIN: An Approach for Debugging Evolving Programs. ESEC/FSE 2009; TOSEM 21(3), 2012.
- Hoang D.T. Nguyen, Dawei Qi, Abhik Roychoudhury, Satish Chandra. SemFix: Program Repair via Semantic Analysis. ICSE 2013.

Co-authors: Dawei Qi, Zhenkai Liang, HDT Nguyen (NUS); Satish Chandra (IBM); Kapil Vaswani (MSR).

Collaborators (ongoing): Prateek Saxena (NUS), Mattia Fazzini (visiting).