3. Automatic Software Evolution
• An activity or a technique to evolve software
automatically.
• Supports the software development process and increases
the productivity of human developers.
4.
Area | Techniques
Refactoring | Henkel et al. (2005), Murphy-Hill et al. (2007), Higo et al. (2008), Tsantalis et al. (2009), Tsantalis et al. (2010), Tsantalis et al. (2011), Dijkman et al. (2011)
Automatic Patch Generation | Arcuri (2008), Arcuri et al. (2008), Dallmeier et al. (2009), Weimer et al. (2009), Wei et al. (2010), Orlov and Sipper (2011), Le Goues et al. (2012), Kim et al. (2013), Nguyen et al. (2013), Long et al. (2015)
Automatic Runtime Recovery | Rinard et al. (2004), Elkarablieh and Khurshid (2008), Dobolyi et al. (2008), Nagarajan et al. (2009), Perkins et al. (2009), Carbin et al. (2011), Kling et al. (2012), Carzaniga et al. (2013), Long et al. (2014)
Performance Improvement | White et al. (2008), Langdon et al. (2010), Orlov et al. (2011), White et al. (2011), Harman et al. (2012), Langdon et al. (2013), Petke et al. (2014)
14. Categorization of Generate and Validate Approaches
Search Method | Variant Generation: Simple Mutations | Variant Generation: Mutations with Existing Source Code | Variant Generation: Pre-defined Templates
Genetic Programming (GP) | Arcuri et al. 2008, FINCH (Orlov et al. 2011) | GenProg (Le Goues et al. 2012), Petke et al. 2014 | PAR (Kim et al. 2013)
Random/Heuristic Search | Debroy and Wong 2010, SemFix (Nguyen et al. 2013) | AE (Weimer et al. 2013), TrpAutoRepair (Qi et al. 2013), RSRepair (Qi et al. 2014) | SPR (Long et al. 2015), Prophet (Long et al. 2015)
17. Co-Evolutionary Method
• Co-evolution of source code and a test suite.
• Uses 34 basic primitives.
• Evaluated on eight seeded faults; five of them were fixed.
21. SemFix
• Program Repair via Semantic Analysis.
• Selects a target statement based on fault localization.
• Generates a repair constraint from a test suite.
• Synthesizes a new statement satisfying the constraint.
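The localize, constrain, and synthesize steps can be sketched on a toy program. This is a simplification under stated assumptions: the repair constraint is "every test passes", and synthesis is brute-force enumeration over a hypothetical component list, whereas the real tool derives constraints by semantic analysis. The program, tests, and components are invented for illustration.

```python
from typing import Callable, List, Optional, Tuple

Expr = Callable[[int, int], int]

def program(x: int, y: int, expr: Expr) -> str:
    """Program under repair; `expr` stands in for the suspicious expression."""
    total = expr(x, y)  # target statement, assumed chosen by fault localization
    return "big" if total > 10 else "small"

# Hypothetical test suite: ((x, y), expected output).
TESTS = [((5, 7), "big"), ((2, 3), "small"), ((10, 1), "big"), ((6, 4), "small")]

# Hypothetical component library, tried in this order.
COMPONENTS: List[Tuple[str, Expr]] = [
    ("x", lambda x, y: x),
    ("y", lambda x, y: y),
    ("x - y", lambda x, y: x - y),  # the original, buggy expression
    ("x * y", lambda x, y: x * y),
    ("x + y", lambda x, y: x + y),
]

def satisfies_constraint(expr: Expr) -> bool:
    """Repair constraint (simplified): every test must pass with `expr` plugged in."""
    return all(program(x, y, expr) == expected for (x, y), expected in TESTS)

def synthesize() -> Optional[str]:
    """Return the first component expression that satisfies the constraint."""
    for name, expr in COMPONENTS:
        if satisfies_constraint(expr):
            return name
    return None

repair = synthesize()
```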
23. SemFix
• SemFix repaired 48 out of 90 bugs from SIR and GNU
Coreutils.
• SemFix also generated repairs faster than a GP-based
technique.
24.
Techniques | Description | Limitation
Co-evolutionary Method | Evolving source code and a test suite together. Using simple primitives. | Evaluated on seeded faults for a simple program. Primitives are too small.
SemFix | Derives repair constraints from source code and test cases. Synthesizes a statement satisfying the constraints. | Components used in statement synthesis are simple. Only evaluated on small programs.
27. GenProg
• Automatic program repair technique.
• Mutates programs by statement insertion, deletion, and replacement.
• Reuses source code from the same revision.
• Fixed 55 out of 105 bugs.
• Assumes the code needed for a patch already exists in the source code.
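The three statement-level edits can be sketched on a program modelled as a list of statement strings. This is a rough illustration only: the real tool works on program ASTs and adds fault localization, fitness evaluation, and crossover, none of which appear here. The `mutate` helper and the sample program are hypothetical.

```python
import random
from typing import List

def mutate(program: List[str], rng: random.Random) -> List[str]:
    """Apply one statement-level edit; new statements are only copied from `program`."""
    op = rng.choice(["delete", "insert", "replace"])
    target = rng.randrange(len(program))
    donor = program[rng.randrange(len(program))]  # code reused from the same program
    if op == "delete":
        return program[:target] + program[target + 1:]
    if op == "insert":
        return program[:target + 1] + [donor] + program[target + 1:]
    return program[:target] + [donor] + program[target + 1:]  # replace

original = ["a = read()", "b = a + 1", "print(b)"]
rng = random.Random(0)
variants = [mutate(original, rng) for _ in range(20)]
```

Every variant is built only from statements already present in `original`, which is why a fix that needs a genuinely new statement lies outside this search space.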
28. Genetic Improvement
• Petke et al. evolved the MiniSAT solver for Combinatorial
Interaction Testing (CIT).
• Used multiple human-written variations of the MiniSAT solver
as the code base.
• The evolved MiniSAT is even faster than the human-written
versions on CIT.
30. AE
• Generates all possible variants one by one.
• Uses the same mutations as GenProg.
• Selects a fix location based on fault localization.
• Detects equivalent variants and skips their validation.
• Generates first-order mutants only.
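Deterministic enumeration of first-order variants with duplicate skipping might look like the sketch below. One simplifying assumption is stated loudly: equivalence is approximated by syntactic identity, whereas an actual equivalence check would be semantic. The `first_order_variants` function and the `suspicious` ordering are hypothetical.

```python
from typing import Iterator, List, Set, Tuple

def first_order_variants(program: List[str], suspicious: List[int]) -> Iterator[List[str]]:
    """Yield each distinct single-edit variant once, most suspicious location first."""
    seen: Set[Tuple[str, ...]] = {tuple(program)}  # the original needs no validation
    for loc in suspicious:
        edits = [program[:loc] + program[loc + 1:]]                        # delete
        for donor in program:
            edits.append(program[:loc] + [donor] + program[loc + 1:])      # replace
            edits.append(program[:loc + 1] + [donor] + program[loc + 1:])  # insert after
        for variant in edits:
            key = tuple(variant)
            if key in seen:
                continue  # treated as equivalent to an earlier variant: skip it
            seen.add(key)
            yield variant

program = ["x = 0", "y = x", "x = 0"]
variants = list(first_order_variants(program, suspicious=[2, 0]))
```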
31. RSRepair
• RSRepair has the same search space as GenProg.
• Generates variants one by one, which requires fewer patch trials.
• Randomly selects a change location.
• Outperforms GenProg in 23 out of 24 cases.
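Random search over the same edit operators can be sketched as a loop that stops at the first variant passing all tests. This is a minimal illustration, not the tool's implementation: statements are modelled as Python functions, and the `double`/`increment` program, the tests, and the trial limit are hypothetical.

```python
import random
from typing import Callable, List, Optional, Tuple

Stmt = Callable[[int], int]

def double(v: int) -> int:
    return v * 2

def increment(v: int) -> int:
    return v + 1

def run(program: List[Stmt], x: int) -> int:
    """Execute the statements in order on the value x."""
    for stmt in program:
        x = stmt(x)
    return x

def passes(program: List[Stmt], tests: List[Tuple[int, int]]) -> bool:
    return all(run(program, x) == expected for x, expected in tests)

def random_search(program: List[Stmt], tests: List[Tuple[int, int]],
                  rng: random.Random, max_trials: int = 200
                  ) -> Tuple[Optional[List[Stmt]], int]:
    """Try one random single-edit variant per trial; return the first that passes."""
    for trial in range(1, max_trials + 1):
        loc = rng.randrange(len(program))  # random change location
        donor = rng.choice(program)        # statement reused from the program
        op = rng.choice(["delete", "insert", "replace"])
        if op == "delete":
            variant = program[:loc] + program[loc + 1:]
        elif op == "insert":
            variant = program[:loc + 1] + [donor] + program[loc + 1:]
        else:
            variant = program[:loc] + [donor] + program[loc + 1:]
        if passes(variant, tests):
            return variant, trial
    return None, max_trials

buggy = [double, increment]  # intended behaviour: return 2 * x
tests = [(1, 2), (3, 6), (0, 0)]
patch, trials = random_search(buggy, tests, random.Random(1))
```

No fitness function or population is kept between trials. Each variant is validated on its own, which is the contrast with GP-based search.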
32.
Techniques | Description | Limitation
GenProg | Statement-level mutations. Inserts/replaces statements using existing code. | Lacks the ability to create new statements. Fitness-guided search is not effective.
Genetic Improvement | Statement-level mutations. Uses multiple variations of the same program as the code base. | Same as GenProg.
AE | Deterministic search. Equivalent variant detection. | Lacks the ability to create new statements. Search space is limited to variants with only one mutation.
RSRepair | Random search. | Same as AE.
35. PAR
• PAR uses 10 pre-defined fix templates.
• Fix templates are drawn from manual inspection of
human patches.
• Fix templates include null checkers, parameter changes,
and expression changes.
• Patches requiring new code can be generated.
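A null-checker template can be sketched as a function that wraps a faulty statement in a guard. This is a Python analogue for illustration only; the `null_checker` helper, the `order` variable, and the faulty statement are hypothetical and are not taken from the tool.

```python
from types import SimpleNamespace

def null_checker(stmt: str, var: str) -> str:
    """Fix template: run `stmt` only when `var` is not None."""
    return f"if {var} is not None:\n    {stmt}"

faulty_stmt = "total = order.price * 2"  # fails when order is None
patched = null_checker(faulty_stmt, "order")

def run(source: str, order) -> dict:
    """Execute the patched source with `order` bound, and return its variables."""
    env = {"order": order}
    exec(source, env)
    return env

safe = run(patched, None)                         # guard skips the statement
normal = run(patched, SimpleNamespace(price=5))   # guard lets it run
```

The guard itself is new code that exists nowhere in the original program, which is how templates reach patches that pure statement reuse cannot.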
41. Prophet
• Uses the same transformation schemas as SPR.
• Learns a model from successful patches.
• Ranks candidate patches based on the trained model.
• Prophet generated correct patches for 15 defects, while
SPR generated 11.
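Ranking candidate patches with a trained model can be sketched as scoring each candidate's features and sorting. This is an assumption-laden illustration: the feature names, the weights, and the softmax-over-linear-scores form are hypothetical stand-ins, and the learning step that would produce the weights is omitted.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]

def score(features: Features, weights: Features) -> float:
    """Linear score: weighted sum of the candidate's features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def rank(candidates: Dict[str, Features], weights: Features) -> List[Tuple[str, float]]:
    """Return (patch, probability) pairs, most probable first (softmax over scores)."""
    scores = {patch: score(f, weights) for patch, f in candidates.items()}
    z = sum(math.exp(s) for s in scores.values())
    probs = [(patch, math.exp(s) / z) for patch, s in scores.items()]
    return sorted(probs, key=lambda item: item[1], reverse=True)

# Hypothetical weights, standing in for a model learned from successful patches.
weights = {"adds_guard": 1.5, "deletes_stmt": -2.0, "touches_condition": 0.5}
candidates = {
    "add null guard": {"adds_guard": 1.0},
    "delete statement": {"deletes_stmt": 1.0},
    "tighten condition": {"touches_condition": 1.0},
}
ranking = rank(candidates, weights)
```

Candidates would then be validated against the test suite in this order, so likely-correct patches are tried first.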
45.
Techniques | Description | Limitation
PAR | 10 fix templates from manual inspection. | Only some of the fix templates are useful. Template instantiation uses existing code.
SPR | Seven transformation schemas. Condition synthesis for schema instantiation. | Hard-coded heuristic search. Search space is limited by schemas.
Prophet | Same transformation schemas as SPR. Ranks variants based on a probabilistic model. | Search space is limited by schemas.
47. Search Space Explosion
• Pre-defined templates limit search space.
• SPR has correct variants for only 19 out of 69 defects in
its search space.
• 35 out of 69 defects can be fixed with extended search
space (Long et al. 2015).
• What about the additional costs of a larger search space?
48. Search Method
• Random application of mutations may generate
plausible but incorrect variants (Qi et al. 2015).
• Prophet can find four more correct patches than SPR.
• The only difference is the search method.
• Extending the search space makes the search even harder.
• An effective and efficient search method is necessary.
49. How to address the issues?
• Avoid error-prone changes by learning from existing
changes.
• PAR and SPR show that the template approach works.
• Identify frequent changes from software repositories,
then use them as templates.
• Mine usage patterns of such changes to assist the search.
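Identifying frequent changes could start from a simple frequency count over abstracted change records, as in the sketch below. The change records, the pattern labels, and the `min_support` threshold are all hypothetical; real mining would first need to extract and abstract changes from repository history.

```python
from collections import Counter
from typing import List, Tuple

Change = Tuple[str, str]  # (statement kind before, statement kind after)

# Hypothetical, already-abstracted change records from a repository history.
changes: List[Change] = [
    ("call", "guarded call"),
    ("condition", "extended condition"),
    ("call", "guarded call"),
    ("literal", "literal"),
    ("call", "guarded call"),
    ("condition", "extended condition"),
]

def frequent_templates(changes: List[Change], min_support: int) -> List[Tuple[Change, int]]:
    """Return change patterns seen at least `min_support` times, most frequent first."""
    counts = Counter(changes)
    return [(pattern, n) for pattern, n in counts.most_common() if n >= min_support]

templates = frequent_templates(changes, min_support=2)
```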
50. Summary
• Automatic software evolution has been used in many
areas.
• Generate and Validate systems have advanced in
two major directions: program variant generation and
search methods.
• Current challenges are search space explosion and
the need for an effective search method.
51.
Category | Approach | Limitation
Variant Generation | Simple Mutations | Applied modifications are very simple. Scalability issue: only works for small programs.
Variant Generation | Mutations with Existing Source Code | Existing code restricts possible program variants. Low possibility that necessary code fragments exist.
Variant Generation | Pre-defined Templates | Pre-defined templates restrict search space. Only a small number of templates are used.
Search Method | Genetic Programming | Additional costs for fitness evaluation. Fitness-guided search is not effective.
Search Method | Random/Heuristic Search | Search space is limited based on the number of mutations. Mostly consider only one mutation.