Keynote given at the Asia Pacific Software Engineering Conference (APSEC), December 2020, on Automated Program Repair technologies and their applications.
1. Automated Program Repair
Abhik Roychoudhury
National University of Singapore
abhik@comp.nus.edu.sg
APSEC 2020 Keynote
1
ACKNOWLEDGEMENT: National Cyber Security Research program from NRF Singapore
2. SUPPOSE I AM UNWELL TODAY
2
APSEC 2020 Keynote
Which one is more
manageable?
3. Beyond Error Detection
APSEC 2020 Keynote
In the absence of formal specifications, analyze the
buggy program and its artifacts such as execution
traces via various heuristics to glean a specification
about how it can pass tests and what could have gone
wrong!
Specification Inference
(application: self-healing)
3
Buggy
Program
Tests
7. Over-fitting
APSEC 2020 Keynote
Tests with
oracles
Buggy
Program
Symbolic
Formulae
Program
Repair
Patched
Program
7
Tests: (ip1,op1), (ip2,op2), (ip3,op3), …
AVOID
if (ip1) return op1
else if (ip2) return op2
else …
8. Example
APSEC 2020 Keynote
Test id a b c oracle Pass
1 -1 -1 -1 INVALID
2 1 1 1 EQUILATERAL
3 2 2 3 ISOSCELES
4 2 3 2 ISOSCELES
5 3 2 2 ISOSCELES
6 2 3 4 SCALANE
1 int triangle(int a, int b, int c){
2 if (a <= 0 || b <= 0 || c <= 0)
3 return INVALID;
4 if (a == b && b == c)
5 return EQUILATERAL;
6 if (a == b || b != c) // bug!
7 return ISOSCELES;
8 return SCALENE;
9 }
Correct fix
(a == b || b== c || a == c)
Traverse all mutations of line 6 ??
Hard to generate fix since (a ==c) or (c ==a) never
appear anywhere else in the program !
8
9. Example
APSEC 2020 Keynote
Test id a b c oracle Pass
1 -1 -1 -1 INVALID
2 1 1 1 EQUILATERAL
3 2 2 3 ISOSCELES
4 2 3 2 ISOSCELES
5 3 2 2 ISOSCELES
6 2 3 4 SCALANE
1 int triangle(int a, int b, int c){
2 if (a <= 0 || b <= 0 || c <= 0)
3 return INVALID;
4 if (a == b && b == c)
5 return EQUILATERAL;
6 if (a == b || b != c) // bug!
7 return ISOSCELES;
8 return SCALENE;
9 }
Correct fix
(a == b || b== c || a == c)
Automatically generate the constraint
f(2,2,3) f(2,3,2) f(3,2,2) f(2,3,4)
Solution
f(a,b,c) = (a == b || b == c || a == c)
9
10. Comparison
Where to fix, which
line?
Generate patches in
the candidate line
Validate the candidate
patches against
correctness criterion.
Where to fix, which
line(s)?
What values should be
returned by those lines,
• e.g. <inp ==1, ret== 0>
What are the
expressions which will
return such values?
APSEC 2020 Keynote
Syntax-based Schematic
for e in Search-space{
Validate e againstTests
}
Semantics-basedSchematic
for t inTests {
generate repair constraintΨt
}
Synthesize e from ∧tΨt
10
11. Specification
Inference
APSEC 2020 Keynote
var = f(live_vars) // X
Test input t
Concrete
values
Oracle (expected output)
Output:
Value-set or Constraint
Symbolic
execution
Program
Concrete Execution
[ICSE13] 11
13. Debugging
• Given a test-suiteT
– fail(s) º # of failing executions in which s occurs
– pass(s) º # of passing executions in which s occurs
– allfail ºTotal # of failing executions
– allpass º Total # of passing executions
• allfail+ allpass = |T|
• Can also use other metric likeOchiai.
Score(s) =
fail(s)
allfail
fail(s)
allfail
pass(s)
allpass
+
Buggy
Program
Test Suite
-Investigate what
this statement
should be.
- Generate a fixed
statement
Fixed
Program
YES
NO
APSEC 2020 Keynote
13
17. Example
APSEC 2020 Keynote
• Accumulated constraints
– f(1,11, 110) > 110
– f(1,0,100) ≤ 100
– …
• Find a f satisfying this constraint
– By fixing the set of operators appearing in f
• Candidate methods
• Search over the space of expressions
• Program synthesis with fixed set of operators
– Can also be achieved by second-order constraint solving
• Generated fix
– f(inhibit,up_sep,down_sep) = up_sep + 100
17
18. Second-order
Reasoning
APSEC 2020 Keynote
18
• Two approaches
– Get property of function f via symbolic execution, and
synthesize a function f satisfying these properties.
– Directly solve for function f by building a second-order
symbolic execution engine.
• Allow for existentially quantified second order variables.
• Restrict their interpretation to a language e.g. linear
integer arithmetic
Term =Var |Constant |Term +Term |Term –Term |Constant *Term
• Example SAT
– (0) > 0 (1) ≤ 0
– Satisfying solution = x. 1 – x
25. Repair Constraint
APSEC 2020 Keynote
• SemFix work (ICSE 2013)
– Example: for an identified expression e to be fixed
• [ X > 0 ] ∧ f(t) == X for each test t
• DirectFix work (ICSE 2015)
– Whole Program as repair constraint
– Use the principle of minimality to synthesize a minimal patch.
• Angelix work (ICSE 2016)
– Example: for identified expressions e1, e2, … to be fixed
– [ (X == 1) ∨ (X == 2) ∨ (X== 3)] ∧ f(t) ==X for each test t.
– [ (X== 1 ∧Y == 1) ∨ (X==2 ∧Y ==2)] ∧ f(t) ==X ∧g(t)==Y for each test t.
25
31. Combat over-fitting: Fuzz Testing
APSEC 2020 Keynote
31
Crashing patches
Search space Crash-free patches
Distinguish crashing and crash-free patches (practical)
Correct patches
Crashing patches may (1) partially fix the crash or (2) unexpectedly introduce new crash
Test
generation
Test cases Repair
Buggy
program
Patched program
Auto-generate
tests
P
P
32. APSEC 2020 Keynote
32
Fix2Fit char* strncpy(char* s,char* t, int n) {
for(int i=0; i<n;i++) // buffer overflow or data leakage
t[i]=s[i];
}
copy the first n characters of s to t.
{p1, p2,p3}
{p1, p3} {p2}
{p1} {p3}
ID Plausible patch
P1 i <n && i!=3
p2 i <5
p3 i <n && i<strlen(s)
correct patch
crashing patch
s=“foo”, n=5
s=“fo”, n=5
mutate
crashing patch
33. Fix2Fit
APSEC 2020 Keynote
33
Integration of repair into programming environments?
Number of plausible patches that can be reduced if the tests are
empowered with more oracles
34. Applications
of
Repair
34
APSEC 2020 Keynote
Repair of security vulnerability
Repair of embedded software
Repair as feedback
for programming
education
Automated
grading
Feedback to
students for
making
progress.
35. Application: Security
APSEC 2020 Keynote
35
“The C and C++ programming languages are notoriously insecure yet remain indispensable. Developers
therefore resort to a multi-pronged approach to find security issues before adversaries. These include
manual, static, and dynamic program analysis. Dynamic bug finding tools or "sanitizers" --- can find bugs
that elude other types of analysis because they observe the actual execution of a program, and can
therefore directly observe incorrect program behavior as it happens.” Song et al 2018.
Time to Fix
Number of vulnerabilities in 2019
overall number of new vulnerabilities: (20,362)
36. Combat
Overfitting:
Constraint
Extraction
APSEC 2020 Keynote
36
Repair
Buggy program
Patched program
P
P
Constraints
• Program vulnerability can be formalized as violations of
constraints, e.g. buffer overflow
access(buffer) < base(buffer) + size(buffer)
char getValue(char[] arr,int index){
intlen =size(arr);
if (index <= len) // errorlocation
return arr[index];
return 0;
}
failing input: arr={1, 2, 3}, index=3
additional specifications to fix the bug for all tests
Concrete Buggy
state: arr[3]
Abstracted constraint
violation: index > len
37. Constraint
Propagation
APSEC 2020 Keynote
37
𝜑’ {P} 𝜑
crashing locationfix location
• Propagate crash-free constraint 𝜑 from crash location
to fix location by calculating weakest precondition
[e ⟼ e’]𝜑’ {P} 𝜑
• The goal of repair is to ensure 𝜑’ is satisfied at the fix
location.
ExtractFix
43. Reference Solution Incorrect Student Program
def search(x, seq):
for i in range(len(seq)):
if x <= seq[i]:
return i
return len(seq)
def search(e, lst):
for j in range(len(lst)):
if e < lst[j]:
return j
else:
j = j + 1
return len(lst) + 1
Repair Incorrect Student Program
def search(e, lst):
for j in range(len(lst)):
if e <= lst[j]:
return j
else:
pass
return len(lst)
def search(e, lst):
for j in range(len(lst)):
if e < lst[j]:
return j
else:
j = j + 1
return len(lst) + 1
Refactored Correct Solution Incorrect Student Program
def search(x, seq):
for i in range(len(seq)):
if x <= seq[i]:
return i
else:
pass
return len(seq)
def search(e, lst):
for j in range(len(lst)):
if e < lst[j]:
return j
else:
j = j + 1
return len(lst) + 1
Example:
Write a Python program which
* Given a sorted sequence seq
* Counts the number of elements smaller than x
43
44. Most Relevant Results
Semantic Program Repair Using a Reference Implementation ( PDF )
ICSE 2018.
Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis ( pdf )
ICSE 2016.
DirectFix: Looking for Simple Program Repairs ( PDF )
ICSE 2015.
SemFix: Program Repair via Semantic Analysis ( pdf )
ICSE 2013.
Symbolic execution with second order existential constraints
ESEC-FSE 2018.
ACKNOWLEDGEMENT: National Cyber Security Research program from NRF Singapore
http://www.comp.nus.edu.sg/~tsunami/ https://www.comp.nus.edu.sg/~nsoe-tss/
Crash-Avoiding Program Repair
ISSTA 2019.
A Feasibility Study of Using Automated Program Repair for Introductory Programming Assignments
ESEC-FSE 2017.
44
45. Perspective
APSEC 2020 Keynote
Automated Program Repair
C. Le Goues, M. Pradel, A. Roychoudhury
Review Article,Communications of the ACM, 2019.
45
abhik@comp.nus.edu.sg
https://www.comp.nus.edu.sg/~abhik