Abhik Roychoudhury
National University of Singapore
ISSTA 2013 Workshop - July 2013
HOW SYMBOLIC REASONING CAN HELP PROGRAM (DEBUGGING AND) REPAIR
DEBUGGING VS. BUG HUNTING
(a) Debugging: program P runs on input = 0 and yields output = 0. We should have (output > input).
(b) Model checking: program P and the property G(pc = end ⇒ output > input) are given to the model checker. Counter-example: input = 0, output = 0.
EXECUTION WITH SYMBOLIC INPUTS
Program P: out = in + 1
  Symbolic input: in == α
  Concrete output: out == α + 1
Program Q: out = in * 2
  Symbolic input: in == α
  Concrete output: out == 2 * α
To expose the difference, try to find α such that α + 1 ≠ 2 * α.
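The search for a differentiating input can be sketched in C. This is only an illustration: `find_difference` is a hypothetical helper that scans a range of concrete values, whereas a symbolic tool would hand the constraint α + 1 ≠ 2 * α to a solver.

```c
#include <stdbool.h>

/* Program P from the slide. */
int prog_p(int in) { return in + 1; }

/* Program Q from the slide. */
int prog_q(int in) { return in * 2; }

/* Searches [lo, hi] for a witness of alpha + 1 != 2 * alpha.
   Stores the first witness in *alpha and returns true, or returns
   false if P and Q agree on the whole range. */
bool find_difference(int lo, int hi, int *alpha) {
    for (int a = lo; a <= hi; a++) {
        if (prog_p(a) != prog_q(a)) {
            *alpha = a;
            return true;
        }
    }
    return false;
}
```

The only input on which P and Q agree is in == 1, so almost any range exposes the difference.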
PATH CONDITION COMPUTATION
1 input in;
2 z = 0; x = 0;
3 if (in > 0) {
4     z = in * 2;
5     x = in + 2;
6     x = x + 2;
7 }
8 else …
9 if (z > x) {
      return error;
  }

Concrete input: in == 5

Line#  Assignment store            Path condition
1      {}                          true
2      {(z, 0), (x, 0)}            true
3      {(z, 0), (x, 0)}            in > 0
4      {(z, 2*in), (x, 0)}         in > 0
5      {(z, 2*in), (x, in+2)}      in > 0
6      {(z, 2*in), (x, in+4)}      in > 0
7      {(z, 2*in), (x, in+4)}      in > 0
9      {(z, 2*in), (x, in+4)}      in > 0 ∧ (2*in > in+4)

Using the assignment store, we can also compute a symbolic expression for the output along each path.
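The table above can be reproduced with a small C sketch that runs the program concretely while keeping a symbolic store. The names `Lin`, `Trace`, `run` and `path_condition` are illustrative assumptions, symbolic values are restricted to the linear form coeff * in + konst, and the elided else branch is assumed to leave z and x unchanged.

```c
#include <stdbool.h>

/* A symbolic value restricted to the linear form coeff * in + konst. */
typedef struct { int coeff; int konst; } Lin;

/* Assignment store plus the branch outcomes that form the path condition. */
typedef struct {
    Lin z, x;
    bool in_positive;   /* outcome of line 3: in > 0 */
    bool z_exceeds_x;   /* outcome of line 9: z > x  */
} Trace;

/* Evaluates a symbolic value for a concrete input. */
int eval(Lin e, int in) { return e.coeff * in + e.konst; }

/* Runs the slide's program on a concrete input while updating the
   symbolic store. Assumption: the elided else branch changes nothing. */
Trace run(int in) {
    Trace t = { {0, 0}, {0, 0}, false, false };    /* line 2: z = 0; x = 0 */
    if (in > 0) {                                  /* line 3 */
        t.in_positive = true;
        t.z = (Lin){2, 0};                         /* line 4: z = in * 2 */
        t.x = (Lin){1, 2};                         /* line 5: x = in + 2 */
        t.x.konst += 2;                            /* line 6: x = x + 2  */
    }
    t.z_exceeds_x = eval(t.z, in) > eval(t.x, in); /* line 9 */
    return t;
}

/* Path condition of the run with in == 5, as in the last table row. */
bool path_condition(int in) { return in > 0 && 2 * in > in + 4; }
```

For in == 5 the store ends as z = 2*in and x = in + 4, and the error branch is taken because 10 > 9.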
USAGE OF DSE
input x, y;
a = 0; b = 0;
if (x > y)
    a = x;
else
    a = y;
if (x + y > 10)
    b = a;
return b;

Passing inputs: continue the search for failing inputs, namely those which do not go through the “same” path.
Path condition of (x == 0, y == 0): x ≤ y ∧ x + y ≤ 10
Cover more paths by negating one branch condition at a time: x ≤ y ∧ ¬(x + y ≤ 10), then ¬(x ≤ y).
[Figure: flowchart of the program: branch x > y (a = x or a = y), branch x + y > 10 (b = a), return b.]
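The exploration idea on this slide can be sketched in C. In this sketch `solve` is a hypothetical stand-in for a constraint solver: it scans a small input box by brute force instead of reasoning symbolically. `explore` and the box bounds are illustrative choices, not taken from the slides.

```c
#include <stdbool.h>

/* Program under test from the slide; records each branch outcome. */
int prog(int x, int y, bool taken[2]) {
    int a = 0, b = 0;
    taken[0] = (x > y);
    if (taken[0]) a = x; else a = y;
    taken[1] = (x + y > 10);
    if (taken[1]) b = a;
    return b;
}

/* Stand-in for a solver: finds an input in [-20, 20]^2 whose first
   `len` branch outcomes equal `want`. */
bool solve(const bool want[2], int len, int *x, int *y) {
    for (int i = -20; i <= 20; i++)
        for (int j = -20; j <= 20; j++) {
            bool got[2];
            bool ok = true;
            prog(i, j, got);
            for (int k = 0; k < len; k++) ok = ok && (got[k] == want[k]);
            if (ok) { *x = i; *y = j; return true; }
        }
    return false;
}

/* Starting from a seed input, negates each branch of every newly seen
   path (keeping the prefix) and returns the number of paths covered. */
int explore(int seed_x, int seed_y) {
    bool seen[4] = { false, false, false, false };
    int queue_x[16], queue_y[16];
    int head = 0, tail = 0, count = 0;
    queue_x[tail] = seed_x; queue_y[tail] = seed_y; tail++;
    while (head < tail) {
        bool path[2];
        prog(queue_x[head], queue_y[head], path);
        head++;
        int id = path[0] * 2 + path[1];
        if (seen[id]) continue;      /* a "similar" input: skip it */
        seen[id] = true;
        count++;
        for (int k = 0; k < 2; k++) {
            bool want[2] = { path[0], path[1] };
            int nx, ny;
            want[k] = !want[k];      /* negate branch k, keep the prefix */
            if (solve(want, k + 1, &nx, &ny)) {
                queue_x[tail] = nx; queue_y[tail] = ny; tail++;
            }
        }
    }
    return count;
}
```

Seeding with (x == 0, y == 0) covers all four paths of the program, one representative input per path.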
IMPLICIT ASSUMPTION IN DSE
• Inputs executing a path are “similar”.
• If we test one of them, no need to test the others.
• Use “similarity” to skip over parts of a large search space.
• DSE is a tool to achieve this goal.
• Testing is search over paths, not search over inputs.
• Coarser-grained notion of similarity?
THE SEARCH FOR “SIMILARITY”
• Testing
  - No need to test “similar” inputs.
  - Can look for “similarity” beyond paths.
• Debugging
  - Given a failing input, find “similar” inputs that pass.
  - Logical comparison to detect “deviations”: bug report.
• Repair
  - Find “similar” inputs showing the “same” error.
  - Group all executions through which a failure is rescued.
  - Symbolic execution used to capture “intended behavior”.
“SIMILARITY” BEYOND PATHS
1  int x, y, z;    // input variables
2  int out;        // output variable
3  int a;
4  int b = 2;
5  if (x - y > 0)              // b1
6      a = x;
7  else
8      a = y;
9  if (x + y > 10)             // b2
10     b = a;
11 if (z*z > 3)                // b3
12     printf("square(z) > 3 \n");
13 else
14     printf("square(z) <= 3 \n");
15 out = b;                    // slicing criterion

If x − y > 0 and x + y > 10, then out == x
  Paths: 1,2,3,4,5,6,9,10,11,12,15
         1,2,3,4,5,6,9,10,11,13,14,15
If x − y ≤ 0 and x + y > 10, then out == y
  Paths: 1,2,3,4,5,7,8,9,10,11,12,15
         1,2,3,4,5,7,8,9,10,11,13,14,15
If x + y ≤ 10, then out == 2
  Paths: 1,2,3,4,5,6,9,11,12,15
         1,2,3,4,5,6,9,11,13,14,15
         1,2,3,4,5,7,8,9,11,12,15
         1,2,3,4,5,7,8,9,11,13,14,15
PROGRAM SUMMARY
¬(x+y > 10) ⇒ (out == 2)
(x-y > 0) ∧ (x+y > 10) ⇒ (out == x)
¬(x-y > 0) ∧ (x+y > 10) ⇒ (out == y)
Group inputs which produce the same symbolic output.
- Efficient testing and debugging
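As a sanity check, the three-line summary can be compared against the program itself. The sketch below is an assumption-laden illustration: `program`, `summary` and `mismatches` are hypothetical names, the printing in branch b3 is dropped because it does not affect out, and the comparison is exhaustive only over a small input box.

```c
/* The slide's program; returns out. Branch b3 only prints, so it is
   reduced to a comment here. */
int program(int x, int y, int z) {
    int a;
    int b = 2;
    if (x - y > 0) a = x; else a = y;   /* b1 */
    if (x + y > 10) b = a;              /* b2 */
    (void)z;                            /* b3 does not influence out */
    return b;
}

/* Output predicted by the three-line program summary. */
int summary(int x, int y) {
    if (!(x + y > 10)) return 2;
    if (x - y > 0) return x;
    return y;
}

/* Counts inputs in [lo, hi]^3 on which program and summary disagree. */
int mismatches(int lo, int hi) {
    int count = 0;
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int z = lo; z <= hi; z++)
                if (program(x, y, z) != summary(x, y)) count++;
    return count;
}
```

Because z never appears in the summary, every value of z falls into the same group as its (x, y) pair.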
RELEVANT SLICE CONDITION
1  int x, y, z;    // input variables
2  int out;        // output variable
3  int a;
4  int b = 2;
5  if (x - y > 0)              // b1
6      a = x;
7  else
8      a = y;
9  if (x + y > 10)             // b2
10     b = a;
11 if (z*z > 3)                // b3
12     printf("square(z) > 3 \n");
13 else
14     printf("square(z) <= 3 \n");
15 out = b;                    // slicing criterion

Relevant Slicing, Potential Dependence
• Path condition computed over relevant slice
• Backward dynamic slicing over
  - control,
  - data and
  - potential dependence.
• Precisely captures i-o relationship
• Groups several paths together
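The grouping effect can be illustrated by counting, over a small input box, how many distinct path conditions and how many distinct relevant slice conditions the program above produces. The identifiers `path_id`, `rsc_id` and `count_distinct` are hypothetical, and `rsc_id` hard-codes the slice for this one program (b3 never affects out, and b1 matters only when b2 is taken) rather than computing a slice.

```c
/* One bit per branch outcome (b1, b2, b3): identifies the path. */
int path_id(int x, int y, int z) {
    return ((x - y > 0) << 2) | ((x + y > 10) << 1) | (z * z > 3);
}

/* Identifies the relevant slice condition for out. b3 never affects
   out; b1 matters only when b2 is taken, since only then a flows to b. */
int rsc_id(int x, int y, int z) {
    (void)z;
    if (!(x + y > 10)) return 0;    /* RSC: not (x + y > 10)          */
    return (x - y > 0) ? 1 : 2;     /* RSC: b1 outcome and x + y > 10 */
}

/* Counts distinct ids (all below 8) produced over the box [lo, hi]^3. */
int count_distinct(int (*id)(int, int, int), int lo, int hi) {
    int seen[8] = {0};
    int count = 0;
    for (int x = lo; x <= hi; x++)
        for (int y = lo; y <= hi; y++)
            for (int z = lo; z <= hi; z++) {
                int k = id(x, y, z);
                if (!seen[k]) { seen[k] = 1; count++; }
            }
    return count;
}
```

On the box [-8, 8]^3 this gives eight paths but only three relevant slice conditions, matching the three symbolic outputs 2, x and y.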
PROPERTIES
t, t’: program inputs
π(t): execution trace of input t
RSC(π(t)): relevant slice condition computed on π(t)
• Same symbolic output:
  - Given a path π(t), if an input t’ satisfies RSC(π(t)), then RSC(π(t’)) is the same as RSC(π(t)). π(t) and π(t’) compute the same symbolic output.
• Complete RSC coverage:
  - Path exploration (based on reordered RSC) can explore all symbolic outputs.
Property for path condition:
Suppose f = (ψ_1 ∧ ψ_2 ∧ … ∧ ψ_m) is a path condition. If t’ satisfies (ψ_1 ∧ ψ_2 ∧ … ∧ ¬ψ_i) with i ≤ m, then the path condition for π(t’) contains (ψ_1 ∧ ψ_2 ∧ … ∧ ¬ψ_i) as a prefix.
However, this does NOT hold for the relevant slice condition, making the exploration completely out of order.
(EXPECTED) VALIDATION
[Figure: two bar charts, “Paths explored” and “Average formula size”; the recoverable series label is “Relevant Slice Condition”.]
REGRESSION DEBUGGING
[Diagram: test input t is fed to the old stable program P and to the new buggy program P’.]
ADAPTING TRACE COMPARISON
Directly compare σ and π
[Diagram: test input t yields path σ in the old stable program P and path π in the new buggy program P’; a new input t’ is also shown.]
THE SEARCH FOR “SIMILARITY”
[Diagram: old program P and new program P’, with the buggy input and the new test input marked.]
DARWIN
[Diagram: DARWIN workflow]
• Inputs: old stable program P, new buggy program P’, test input t.
• Concrete and symbolic execution yields f, the path condition of t in P, and f’, the path condition of t in P’.
• Satisfiable sub-formulae from f ∧ ¬f’ go to the STP solver and input validation, giving an alternative input t’.
• Output: bug report (assembly level), then bug report (source level).
CHOOSING ALTERNATIVE INPUTS
f’ = (ψ_1 ∧ ψ_2 ∧ … ∧ ψ_m)
Solve f ∧ ¬f’: check for satisfiability of
  f ∧ ¬ψ_1
  f ∧ ψ_1 ∧ ¬ψ_2
  f ∧ ψ_1 ∧ ψ_2 ∧ ¬ψ_3
  …
At most m alternate inputs!
[Figure: control-flow graph with branches b1 to b6 and edges labeled ψ_1, ψ_2, ψ_3, …]
BUG REPORT COMPUTATION
f’ = (ψ_1 ∧ ψ_2 ∧ … ∧ ψ_m)
Solve f ∧ ¬f’
t_new = input obtained by solving f ∧ ψ_1 ∧ ψ_2 ∧ ¬ψ_3
Bug report by comparing traces of t_bug and t_new should be the branch b3!
At most m alternate inputs ⇒ at most m lines in bug report.
[Figure: control-flow graph with branches b1 to b6, showing the paths of t_bug and t_new.]
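The selection of alternative inputs and the resulting bug report can be sketched on a toy pair of programs. Everything concrete here is an assumption made for illustration: the two programs (an old one branching on x > 10 and y > 0, a new one where the first condition became x > 5), the brute-force scan that stands in for the STP solver, and the names `trace_old`, `trace_new` and `darwin`.

```c
#include <stdbool.h>

#define M 2   /* number of branches on each path in this toy example */

/* Branch outcomes of the hypothetical old program P: (x > 10), (y > 0). */
void trace_old(int x, int y, bool b[M]) { b[0] = x > 10; b[1] = y > 0; }

/* Branch outcomes of the hypothetical new program P': the first
   condition was changed to (x > 5). */
void trace_new(int x, int y, bool b[M]) { b[0] = x > 5; b[1] = y > 0; }

/* For each i, looks for an input satisfying f and psi_1..psi_{i-1} and
   not psi_i, where f is the path condition of (tx, ty) in P and
   psi_1..psi_M are the branch constraints of (tx, ty) in P'. A scan of
   [-20, 20]^2 stands in for the solver. Sets report[i] when such an
   input exists and returns the number of alternative inputs found. */
int darwin(int tx, int ty, bool report[M]) {
    bool f[M], fp[M];
    int found = 0;
    trace_old(tx, ty, f);
    trace_new(tx, ty, fp);
    for (int i = 0; i < M; i++) {
        report[i] = false;
        for (int x = -20; x <= 20 && !report[i]; x++) {
            for (int y = -20; y <= 20 && !report[i]; y++) {
                bool g[M], gp[M];
                bool ok = true;
                trace_old(x, y, g);
                trace_new(x, y, gp);
                for (int k = 0; k < M; k++) ok = ok && (g[k] == f[k]);   /* f */
                for (int k = 0; k < i; k++) ok = ok && (gp[k] == fp[k]); /* prefix */
                ok = ok && (gp[i] != fp[i]);                             /* not psi_i */
                if (ok) { report[i] = true; found++; }
            }
        }
    }
    return found;
}
```

For the test input (x == 7, y == 1), only the first sub-formula is satisfiable, so the report contains the single changed branch. At most M alternative inputs are ever generated, one per sub-formula.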
COARSER-GRAINED “SIMILARITY”
Solve rsc ∧ ¬rsc’ instead of f ∧ ¬f’
rsc, rsc’: relevant slice conditions
[Diagram: old program P and new program P’, with the buggy input and the new test input marked.]
RESULTS ON DARWIN
Time
Program   Path Condition   Relevant Slice Condition
JLex      543 min          15 min
Jtopas    81 min           5 min
NanoXML   3 min            43 s
Less time

Results
Program   Path Condition   Relevant Slice Condition
JLex      50 LOC           3 LOC
Jtopas    4 LOC            4 LOC
NanoXML   8 LOC            6 LOC
Better result

Smaller formulae and fewer formulae to solve -> more accurate bug report, obtained faster.
IF WE ARE INTERESTED IN STATISTICS
• JLex
  - ~7290 LoC
  - v1.2.1 vs. v1.1.1
  - Diff == 518 LoC
• Jtopas
  - ~5754 LoC
  - v0.7 vs. v0.8
  - Diff == 2489 LoC
• NanoXML
  - ~5244 LoC
  - v2.1 vs. v2.2
  - Diff == 2496 LoC
Other results in DARWIN paper
First implementation on top of BitBlaze (thanks to BitBlaze team)
Results on
• libPNG: 36K LoC
• TCPflow: 1000 LoC
Different implementations of web-servers
• Miniweb, Savant against Apache.
PROGRAM REPAIR
• Correctness specification: test suite
• Program repair: passing all tests
• Repair strategy: rescue failing executions
• Use of symbolic execution
  - Group together all executions through which a failing execution could be rescued.
  - New notion of “similarity”
0. THE PROBLEM
1 int is_upward(int inhibit, int up_sep, int down_sep) {
2     int bias;
3     if (inhibit)
4         bias = down_sep;   // bias = up_sep + 100
5     else bias = up_sep;
6     if (bias > down_sep)
7         return 1;
8     else return 0;
9 }

inhibit  up_sep  down_sep  Observed output  Expected output  Result
1        0       100       0                0                pass
1        11      110       0                1                fail
0        100     50        1                1                pass
1        -20     60        0                1                fail
0        0       10        0                0                pass
1. FIND A SUSPECT
Line  Score  Rank
4     0.75   1
8     0.6    2
3     0.5    3
6     0.5    3
5     0      5
7     0      5
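The scores in this table are reproduced by the Tarantula suspiciousness formula applied to the five tests of the is_upward example. The slide does not name the metric, so treating it as Tarantula is an assumption. The coverage function below is hand-written for this one program, and all identifiers are illustrative.

```c
typedef struct { int inhibit, up_sep, down_sep, expected; } Test;

/* The five tests from the table of slide "0. THE PROBLEM". */
static const Test TESTS[5] = {
    {1,   0, 100, 0},
    {1,  11, 110, 1},
    {0, 100,  50, 1},
    {1, -20,  60, 1},
    {0,   0,  10, 0},
};

/* bias as computed by the buggy program (line 4 assigns down_sep). */
static int buggy_bias(const Test *t) {
    return t->inhibit ? t->down_sep : t->up_sep;
}

/* Returns 1 if the buggy is_upward executes `line` (3..8) on test t. */
static int covers(const Test *t, int line) {
    int upward = buggy_bias(t) > t->down_sep;
    switch (line) {
    case 3: case 6: return 1;
    case 4: return t->inhibit != 0;
    case 5: return t->inhibit == 0;
    case 7: return upward;
    case 8: return !upward;
    default: return 0;
    }
}

/* Tarantula score: fail_ratio / (fail_ratio + pass_ratio), where each
   ratio is the fraction of failing (passing) tests covering the line. */
double suspiciousness(int line) {
    int pass_cov = 0, fail_cov = 0, pass_tot = 0, fail_tot = 0;
    for (int i = 0; i < 5; i++) {
        const Test *t = &TESTS[i];
        int failed = (buggy_bias(t) > t->down_sep) != t->expected;
        if (failed) { fail_tot++; fail_cov += covers(t, line); }
        else        { pass_tot++; pass_cov += covers(t, line); }
    }
    if (fail_cov == 0) return 0.0;
    double fail_ratio = (double)fail_cov / fail_tot;
    double pass_ratio = pass_tot ? (double)pass_cov / pass_tot : 0.0;
    return fail_ratio / (fail_ratio + pass_ratio);
}
```

Line 4 is covered by both failing tests and by one of the three passing tests, which gives 1 / (1 + 1/3) = 0.75 and the top rank.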
2. WHAT IT SHOULD HAVE BEEN
Failing test:
inhibit  up_sep  down_sep  Observed output  Expected output  Result
1        11      110       0                1                fail

Symbolic execution with bias = X at line 4:
• Line 4: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = true
• Line 7: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = X > 110
• Line 8: inhibit = 1, up_sep = 11, down_sep = 110, bias = X, path condition = X ≤ 110
2. WHAT IT SHOULD HAVE BEEN
1 int is_upward(int inhibit, int up_sep, int down_sep) {
2     int bias;
3     if (inhibit)
4         bias = f(inhibit, up_sep, down_sep);
5     else bias = up_sep;
6     if (bias > down_sep)
7         return 1;
8     else return 0;
9 }

Inputs: inhibit == 1, up_sep == 11, down_sep == 110
Symbolic execution yields: f(1,11,110) > 110
3. FIX THE SUSPECT
• Accumulated constraints
  - f(1,11,110) > 110
  - f(1,0,100) ≤ 100
  - …
• Find an f satisfying these constraints
  - By fixing the set of operators appearing in f
• Candidate methods
  - Search over the space of expressions
  - Program synthesis with fixed set of operators (more efficient!)
• Generated fix
  - f(inhibit, up_sep, down_sep) = up_sep + 100
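The first candidate method, search over the space of expressions, can be sketched as a plain enumeration. The sketch assumes a deliberately tiny expression space (one variable plus one constant) and an arbitrary constant range. The third conjunct, f(1,-20,60) > 60, is taken from the repair constraint shown on the later WHY PROGRAM SYNTHESIS slide. All identifiers are hypothetical.

```c
#include <stdbool.h>

/* One conjunct of the repair constraint: f(in) > bound, or f(in) <= bound. */
typedef struct { int in[3]; int bound; bool greater; } Conjunct;

/* f(1,11,110) > 110, f(1,0,100) <= 100, f(1,-20,60) > 60. */
static const Conjunct REPAIR[3] = {
    {{1,  11, 110}, 110, true},
    {{1,   0, 100}, 100, false},
    {{1, -20,  60},  60, true},
};

/* Candidate fix of the form f(args) = args[var] + c, where var indexes
   (inhibit, up_sep, down_sep). */
typedef struct { int var; int c; } Fix;

/* Checks a candidate against every conjunct of the repair constraint. */
bool satisfies(Fix f) {
    for (int i = 0; i < 3; i++) {
        int value = REPAIR[i].in[f.var] + f.c;
        bool holds = REPAIR[i].greater ? value > REPAIR[i].bound
                                       : value <= REPAIR[i].bound;
        if (!holds) return false;
    }
    return true;
}

/* Enumerates variable + constant candidates; the constant range is an
   arbitrary bound chosen for this sketch. */
bool search(Fix *out) {
    for (int var = 0; var < 3; var++)
        for (int c = -200; c <= 200; c++) {
            Fix f = { var, c };
            if (satisfies(f)) { *out = f; return true; }
        }
    return false;
}
```

Within this space the only candidate that satisfies all three conjuncts is up_sep + 100, which agrees with the generated fix on the slide.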
TO RECAPITULATE
• Ranked bug report
  - Hypothesize the error causes: suspect
• Symbolic execution
  - Specification of the suspicious statement
  - Input-output requirements from each test
  - Repair constraint
• Program synthesis
  - Decide operators which can appear in the fix
  - Generate a fixed statement by solving repair constraint.
WHAT IT SHOULD HAVE BEEN
• Buggy program: … var = a + b - c; with the suspect expression replaced by an unknown x
• Concrete test input: concrete execution, then symbolic execution with x as the only unknown
• Result: path conditions, output expressions
EXAMPLE
1 int is_upward(int inhibit, int up_sep, int down_sep) {
2     int bias;
3     if (inhibit)
4         bias = f(inhibit, up_sep, down_sep);   // X
5     else bias = up_sep;
6     if (bias > down_sep)
7         return 1;
8     else return 0;
9 }

Inputs: inhibit == 1, up_sep == 11, down_sep == 110
Symbolic execution yields the repair constraint over the explored paths:
( ⋁_{j ∈ Paths} (pc_j ∧ out_j == expected_out(t)) ) ∧ f(t) == X
For this test:
( (X > 110 ∧ 1 == 1) ∨ (X ≤ 110 ∧ 0 == 1) ) ∧ f(1,11,110) == X
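The instance above simplifies to the constraint quoted on the earlier slides. A short derivation, written in LaTeX and assuming only propositional simplification followed by substituting X:

```latex
\begin{align*}
&\bigl((X > 110 \land 1 = 1) \lor (X \le 110 \land 0 = 1)\bigr) \land f(1,11,110) = X \\
\iff\ &\bigl((X > 110 \land \mathit{true}) \lor (X \le 110 \land \mathit{false})\bigr) \land f(1,11,110) = X \\
\iff\ &X > 110 \land f(1,11,110) = X \\
\implies\ &f(1,11,110) > 110
\end{align*}
```

The path through line 8 contributes nothing because its output 0 never equals the expected output 1.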
TO RECAPITULATE
• Ranked bug report
  - Hypothesize the error causes: suspect
• Symbolic execution
  - Specification of the suspicious statement
  - Input-output requirements from each test
  - Repair constraint
• Program synthesis
  - Decide operators which can appear in the fix
  - Generate a fix by solving repair constraint.
WHY PROGRAM SYNTHESIS
• Instead of solving the repair constraint directly:
  - Select primitive components to be used by the synthesized program, based on complexity
  - Look for a program that uses only these primitive components and satisfies the repair constraint
    - Where to place each component?
    - What are the parameters?
Repair constraint:
f(1,11,110) > 110 ∧ f(1,0,100) ≤ 100 ∧ f(1,-20,60) > 60
Example candidate programs:
  int tmp = down_sep - 1; return up_sep + tmp;
  int tmp = down_sep + 1; return tmp - inhibit;
  int tmp = down_sep - 1; return tmp + inhibit;
[Figure: expression tree built from + components over inhibit and up_sep.]
LOCATION VARIABLES
• Define location variables for each component
• Constraint on location variables solved by SMT.
  - Well-formed, e.g. defined before being used
  - Output constraint from each test (repair constraint)
  - Meaning of the components
  - Lines determine the value: Lx == Ly ⇒ x == y
• Once locations are found, program is constructed.
Components = {+}
Lin == 0, Lout == 1, Lout+ == 1, Lin1+ == 0, Lin2+ == 0
0 r0 = input;
1 r = r0 + r0;
2 return r;
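How an assignment to the location variables determines a program can be sketched in C for the single-component case on this slide. The `Locations` struct, `well_formed` and `run` are hypothetical names, and the well-formedness check is simplified to this one-input, one-component setting rather than being a general encoding.

```c
#include <stdbool.h>

/* Location variables for one input and one '+' component. */
typedef struct {
    int l_in;        /* line defining the program input        */
    int l_out_plus;  /* line defining the output of '+'        */
    int l_in1_plus;  /* line whose value is the first operand  */
    int l_in2_plus;  /* line whose value is the second operand */
    int l_out;       /* line whose value is returned           */
} Locations;

/* Well-formedness: the input sits on line 0, '+' on line 1, and every
   value is defined before it is used. */
bool well_formed(Locations loc) {
    return loc.l_in == 0 && loc.l_out_plus == 1 &&
           loc.l_in1_plus >= 0 && loc.l_in1_plus < loc.l_out_plus &&
           loc.l_in2_plus >= 0 && loc.l_in2_plus < loc.l_out_plus &&
           loc.l_out >= 0 && loc.l_out <= 1;
}

/* Runs the program that a well-formed assignment encodes. */
int run(Locations loc, int input) {
    int value[2];                       /* value[k]: result of line k */
    value[loc.l_in] = input;
    value[loc.l_out_plus] = value[loc.l_in1_plus] + value[loc.l_in2_plus];
    return value[loc.l_out];
}
```

With the assignment from the slide (Lin == 0, Lout+ == 1, Lin1+ == 0, Lin2+ == 0, Lout == 1), `run` computes input + input, which is the three-line program shown above.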
SUBJECTS USED
SIR programs
Subject    LoC   # Versions  Description
TCAS       135   41          Air Traffic Control
Schedule   304   9           Process scheduler
Schedule2  262   9           Process scheduler
Replace    518   29          Text processing
Grep       9366  2           Text search engine

GNU CoreUtils
Subject  LoC
mknod    183
mkdir    159
mkfifo   107
cp       2272
Repaired by both GP and SemFix. Ours/GP = 0.63 (time)
WHY IS SEMFIX MORE STABLE?
[Figure: number of programs repaired (0 to 45) against number of tests (10 to 50) for TCAS; series: Total, SemFix, GenProg.]
Overall 90 programs from SIR.
SemFix repaired 48/90, GenProg repaired 16/90 for 50 tests.
GenProg running time is more than 3 times that of SemFix.
Time bound = 4 mins.
TYPE OF BUGS (SIR)
Bug type        Total  SemFix  GenProg
Constant        14     10      3
Arithmetic      14     6       0
Comparison      16     12      5
Logic           10     10      3
Code Missing    27     5       3
Redundant Code  9      5       2
ALL             90     48      16
EXAMPLE FIXES
• enabled = High_Confidence && (Own_Tracked_Alt_Rate <= OLEV); /* && (Cur_Vertical_Sep > MAXALTDIFF); missing code */
  - Synthesizes missing code
• tmp = Up_Separation;
  - Synthesizes
    tmp = ((OtherCapability < Alt_Layer_Value) ? Two_of_Three_Reports_Valid : Cur_Vertical_Sep);
STEPPING BACK, PERSPECTIVE
• [Obvious] Level of automation
  - Never completely ~ Programming environments!
  - Program synthesis likely to play a useful role.
• Is debugging required?
  - Testing search and repair combined.
  - Avoid statistical fault localization.
  - Find the location to fix via symbolic reasoning and MAXSAT: not clear about quality of repair produced.
• Can generate suggestions instead of repairs?
  - What is a repair (or not) may depend on context.
SPECIFIC APPLICATIONS OF REPAIR
• Role-based sanitization of HTML output
• XSS attacks: insert scripts into web-pages
• Role-based XSS sanitization: reduce false positives
WordPress, a popular blogging application, groups users into roles.
• A user in the author role can create a new post in the blog with most non-code tags permitted.
• Anonymous commenter can use only few text formatting tags. (S)he cannot insert images, but authors can.
• Neither can insert <script> tag or …
Un-trusted input flows into HTML tag context, but sanitizer applies changes as function of the user role.
- Weinberger et al, ESORICS 2011.
Given the policy, a hand-in-hand test generation followed by (context-sensitive) repair?
REFERENCES
• Path Exploration based on Symbolic Output. Dawei Qi, Hoang D.T. Nguyen, Abhik Roychoudhury. ESEC-FSE 2011; to appear in TOSEM.
• DARWIN: An Approach for Debugging Evolving Programs. Dawei Qi, Abhik Roychoudhury, Zhenkai Liang, Kapil Vaswani. ESEC-FSE 2009; TOSEM 21(3), 2012.
• SemFix: Program Repair via Semantic Analysis. Hoang D.T. Nguyen, Dawei Qi, Abhik Roychoudhury, Satish Chandra. ICSE 2013.
• Co-authors
  - Dawei Qi, Zhenkai Liang, HDT Nguyen: NUS.
  - Satish Chandra: IBM.
  - Kapil Vaswani: MSR.
• Collaborator (ongoing)
  - Prateek Saxena: NUS, Mattia Fazzini (visiting)
