It Does What You Say, Not What You Mean:
Lessons From A Decade of Program Repair
Westley Weimer
ThanhVu Nguyen
Claire Le Goues
Stephanie Forrest
2
Survivor Bias
Flashback to ICSE 2009
3
(…And then extended for TSE)
4
The Once And Future Problem: Bugs
• 2002 NIST survey: software bugs cost 0.6% of US GDP.
• 2003 textbooks: bemoaned that up to 90% of a software
project's cost was dedicated to maintenance and bug repair.
• 2005 Mozilla developers: complained about 300 bugs
appearing per day: "far too many" to handle.
• As a graduate student, Wes had worked on SLAM and BLAST
and envied Dawson Engler.
– Only to hear “we already have tens of thousands of unfixed bug
reports; don't bother finding more bugs”.
5
The Cunning Plan
• Automatically, efficiently
repair certain classes of
bugs in off-the-shelf,
unannotated legacy
programs.
• Basic idea: biased, random
search through the space of
all programs for a variant
that repairs the problem.
6
https://upload.wikimedia.org/wikipedia/commons/a/a4/13-02-27-spielbank-wiesbaden-by-RalfR-093.jpg
Genetic programming: the application
of evolutionary or genetic algorithms
to program source code.
7
[Diagram: repair loop (INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT)]
8
9
Original Secret Sauces
• Use existing test cases to evaluate candidate
repairs.
• Search by perturbing parts of the program likely
to contain the error.
• Existing program code and behavior contains the
seeds of many repairs.
– Leverage existing developer expertise rather than
inventing new code!
10
Candidate Repairs:
Modified Abstract Syntax Trees
• Program statements are manipulated.
– Reduces search space compared to changing expressions.
• Custom spectrum fault localization.
– Called the "weighted path" in the papers: it wasn't very good.
– Reduces search space compared to changing the whole program.
• Simple mutation (e.g., in the style of introductory students).
– Choose a statement based on fault localization weights.
– Delete S, Replace S with S1, or Insert S2 after S.
• Choose S1 and S2 from the entire AST.
• Reduces search space compared to inventing statements (synthesis).
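The mutation step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GenProg's implementation: the program is modeled as a flat list of statement strings rather than an AST, and the names `mutate`, `weights`, and `rng` are hypothetical.

```python
import random

def mutate(stmts, weights, rng):
    """Apply one statement-level edit: delete, replace, or insert.

    stmts   -- the program as a flat list of statement strings (stand-in for an AST)
    weights -- one non-negative fault-localization weight per statement
    rng     -- a random.Random instance, passed in so runs are reproducible
    """
    # Pick the target statement S in proportion to its fault-localization weight.
    target = rng.choices(range(len(stmts)), weights=weights, k=1)[0]
    op = rng.choice(["delete", "replace", "insert"])
    # S1 / S2 are copied from elsewhere in the same program, never invented.
    donor = stmts[rng.randrange(len(stmts))]

    child = list(stmts)
    if op == "delete":
        del child[target]
    elif op == "replace":
        child[target] = donor
    else:  # insert the donor statement after S
        child.insert(target + 1, donor)
    return op, target, child


program = ["a = read()", "b = read()", "while b != 0:", "a, b = b, a % b", "print(a)"]
weights = [0.1, 0.1, 1.0, 1.0, 0.1]
op, target, child = mutate(program, weights, random.Random(0))
```

Because the donor statement always comes from the program itself, every variant is built only from code the developers already wrote, which is the search-space reduction the last bullet refers to.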
11
12
[Diagram: repair loop (INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT), with the annotation below]
Minimization using
delta debugging to
“mitigate the risk of
breaking untested
functionality.”
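The minimization step can be pictured as follows. This is a simplified, one-edit-at-a-time variant of delta debugging rather than the full algorithm, and `minimize`, `passes_all_tests`, and the toy oracle are hypothetical names introduced for illustration.

```python
def minimize(edits, passes_all_tests):
    """Drop edits one at a time while the remaining patch still passes.

    The result is 1-minimal: removing any single remaining edit makes the
    patch fail the oracle.
    """
    current = list(edits)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if passes_all_tests(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the smaller patch
    return current


# Toy oracle (an assumption for illustration): the patch "works" as long as
# it still contains both edits that matter.
needed = {"insert bounds check", "delete stale free"}
patch = ["insert bounds check", "replace log call", "delete stale free", "insert no-op"]
minimal = minimize(patch, lambda es: needed <= set(es))
```

The oracle here is still the test suite, so minimization can only remove edits the tests do not care about; it says nothing about behavior the tests never exercise.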
Search-Based Software Engineering
• This search-based approach admits other
(multi-objective) fitness functions.
– Energy reduction, graphical fidelity, readability,
execution time, etc.
• In 2009: bugs via pass/fail tests only.
[ Harman and Jones. Search-based software engineering. Information &
Software Technology (2001) ]
13
The original!
14
Inspired By Fuzzing
The original!
Note the overachievement! We were aiming for a minimum of 50k LOC.
15
Inspired By Fuzzing
Fast Forward to the 2019 ICSE SEIP Track…
(Presented just two hours ago!)
16
“Results from repair
applied to 6 multi-
million line systems.”
“Facebook, Inc”
“one widely-studied
[repair] approach uses
software testing to guide
the repair process, as
typified by GenProg.”
How did we get here?
17
An incomplete list of acknowledgements
• As a group, we saw an emphasis on taking risks and allowing junior
researchers to rise to the occasion.
• Mark Harman and collaborators at CREST and SBST
• John Knight, Jack Davidson, Anh Nguyen-Tuong, Eric Schulte, Zak
Fry, Ethan Fast, and Michael Dewey-Vogt from UVA/UNM
• Tom Ball and collaborators from Microsoft Research
• Pat Hurley and Sol Greenspan for initial funding support
• … and many more!
18
Another possibility: We were right
about everything! All we needed was
for Facebook to take an interest and
implement it.
19
… Not quite.
20
[Diagram: repair loop (INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT)]
21
Storing full programs does not scale.
Storing patches instead would help
make genetic improvement feasible.
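A patch-based representation can be sketched like this. The `Edit` record and `apply_patch` helper are hypothetical; the point is only that each variant is a short edit list against one shared original instead of a full copy of the program.

```python
from typing import List, NamedTuple, Optional

class Edit(NamedTuple):
    op: str                 # "delete", "replace", or "insert"
    target: int             # statement index in the ORIGINAL program
    donor: Optional[int]    # index of the statement to copy; None for delete

def apply_patch(original: List[str], patch: List[Edit]) -> List[str]:
    """Materialize a variant from the shared original plus a short edit list."""
    # Each original statement becomes a slot, so indices stay stable
    # no matter how many edits are applied.
    slots = [[s] for s in original]
    for e in patch:
        if e.op == "delete":
            slots[e.target] = []
        elif e.op == "replace":
            slots[e.target] = [original[e.donor]]
        elif e.op == "insert":
            slots[e.target] = slots[e.target] + [original[e.donor]]
        else:
            raise ValueError(f"unknown edit: {e.op}")
    return [s for slot in slots for s in slot]


original = ["s0", "s1", "s2", "s3"]
variant = apply_patch(original, [Edit("delete", 2, None), Edit("insert", 0, 3)])
```

A population of such variants costs a few tuples each, while the original program is stored once.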
Candidate Repairs:
Modified Abstract Syntax Trees
• Program statements are manipulated.
– Reduces search space compared to changing expressions.
• Custom spectrum fault localization.
– High (1.0) if S is visited on the failing test but not on a passed test.
– Low (0.1, 0.0) if S is also visited on a passed test.
• Simple mutation.
– Choose a statement based on fault localization weights.
– Delete S, Replace S with S1, or Insert S2 after S.
• Choose S1 and S2 from the entire AST.
• Reduces search space compared to inventing statements (synthesis).
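The weighting rule above can be written down directly. In this sketch the function name, the trace sets, and the choice of 0.1 for shared statements are illustrative; assigning 0.0 to statements the failing test never visits is an assumption of the sketch, not something the slide states.

```python
def localization_weights(statements, failing_path, passing_path, shared_weight=0.1):
    """Weight each statement by where it appears in the test traces.

    1.0           -- visited by the failing test only
    shared_weight -- visited by the failing test and also by a passed test
    0.0           -- never visited by the failing test (assumption of this sketch)
    """
    weights = {}
    for s in statements:
        if s not in failing_path:
            weights[s] = 0.0
        elif s in passing_path:
            weights[s] = shared_weight
        else:
            weights[s] = 1.0
    return weights


w = localization_weights(
    statements=[1, 2, 3, 4, 5],
    failing_path={1, 2, 3},
    passing_path={1, 4},
)
```

Statements 2 and 3 are the only ones exercised exclusively by the failing test, so they receive most of the mutation effort.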
22
23
[Diagram: repair loop (INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT), with the annotation below]
Arbitrary weightings
for the failing test
did not guide us well.
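To see why such weightings are fragile, consider a fitness function that is a weighted count of passing tests. The function name and the weights 1 and 10 are placeholders chosen for this sketch.

```python
def fitness(passed_positive, passed_negative, w_pos=1.0, w_neg=10.0):
    """Weighted count of passing tests; the weights are arbitrary knobs."""
    return w_pos * passed_positive + w_neg * passed_negative


# Variant A keeps all 5 regression tests but does not fix the bug.
# Variant B passes the failing test but breaks 4 of the 5 regression tests.
a = fitness(passed_positive=5, passed_negative=0)
b = fitness(passed_positive=1, passed_negative=1)
# Same two variants with a smaller weight on the failing test.
b_low = fitness(passed_positive=1, passed_negative=1, w_neg=2.0)
```

With the larger weight, B outranks A even though it breaks most existing behavior; with the smaller weight the ranking flips. The search is steered by the choice of constant rather than by anything measured about the program.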
24
[Diagram: repair loop (INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT), with the annotation below]
Minimization does not
affect semantic patch
quality. ML experts have
known since the 1970s
that model size can be
independent of degree
of overfitting.
Selection and Maintenance Debt
• If there are multiple high-fitness candidates, how do you pick
one?
– Used stochastic universal sampling (roulette wheel selection).
– Better approaches (e.g., tournament selection) were known.
– SUS was first on the list on Wikipedia (no, really …).
• First GenProg prototype was made for a Multidisciplinary University
Research Initiative (MURI) grant meeting.
– Fixed GCD infinite loop and Nullhttpd buffer overrun.
– Victim of its own success: no refactoring for years.
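The two selection schemes named above differ in how strongly raw fitness values drive the choice. The sketch below uses hypothetical helper names and shows plain fitness-proportional (roulette wheel) selection rather than full stochastic universal sampling, next to tournament selection.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Fitness-proportional selection: one spin of a weighted wheel."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def tournament_select(population, fitnesses, rng, k=2):
    """Sample k distinct candidates uniformly and keep the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]


rng = random.Random(1)
population = ["v0", "v1", "v2", "v3"]
fitnesses = [1.0, 1.0, 1.0, 12.0]
picked_roulette = roulette_select(population, fitnesses, rng)
# With k equal to the population size, the tournament always returns the fittest.
picked_tournament = tournament_select(population, fitnesses, rng, k=4)
```

Tournament selection only compares fitness values, so it is insensitive to their scale; roulette selection depends on the absolute numbers, which ties it to whatever weights the fitness function happens to use.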
25
Lessons from a decade of program repair
• Lesson 1: Don’t let the perfect be the enemy of the good.
26
Another possibility: We were right
about everything! All we needed was
for Facebook to take an interest and
implement it.
27
The Three
Major
Challenges
28
Scalability Repair quality
Expressive power
The
original!
29
“If I gave you the last 100 bugs
from <my project>, how many
could <your technique> fix?”
– Real Engineers
30
Systematic Benchmark Construction
31
• Approach: use historical data to
approximate discovery and repair
of bugs in the wild.
• Mine program versions (going
back in time on SourceForge,
Google Code, Fedora SRPM, etc.)
where test case behavior changes.
• Corresponds to a human-written
repair for the bug tested by the
failing test case(s).
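The mining step can be sketched as a scan over a project's history for versions where a test flips from failing to passing. The data layout and the name `find_repair_commits` are assumptions made for illustration; real histories also need builds, flaky-test filtering, and so on.

```python
def find_repair_commits(history):
    """Return (version, fixed_tests) pairs where a test flips from fail to pass.

    history -- list of (version_id, {test_name: passed}) in chronological order
    """
    found = []
    for (_, before), (version, after) in zip(history, history[1:]):
        fixed = sorted(t for t, ok in after.items() if ok and before.get(t) is False)
        if fixed:
            found.append((version, fixed))
    return found


history = [
    ("r100", {"t_parse": True, "t_post": False}),
    ("r101", {"t_parse": True, "t_post": False}),
    ("r102", {"t_parse": True, "t_post": True}),
]
scenarios = find_repair_commits(history)
```

Each hit yields one benchmark scenario: the earlier version is the buggy program, the flipped tests are the failing tests, and the human change in that version is the reference repair.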
ManyBugs drove algorithmic changes
Program         LOC   Tests  Bugs  Description
fbc          97,000     773     3  Language (legacy)
gmp         145,000     146     2  Multiple precision math
gzip        491,000      12     5  Data compression
libtiff      77,000      78    24  Image manipulation
lighttpd     62,000     295     9  Web server
php       1,046,000   8,471    44  Language (web)
python      407,000     355    11  Language (general)
wireshark 2,814,000      63     7  Network packet analyzer
Total     5,139,000  10,193   105
32
35
Scalability
• Both search- and
synthesis-based repair
techniques underwent
fundamental
reconfigurations to enable
scalability.
• A shared dataset of
indicative, real-world bugs
drove innovation in test-
driven program repair.
[ Nguyen et al. SemFix: program repair via semantic analysis. ICSE
2013. ]
[ Mechtaev et al. Angelix: scalable multiline program patch
synthesis via symbolic analysis. ICSE 2016. ]
[ Sim et al. Using benchmarking to advance research: A challenge
to software engineering. ICSE 2003. ]
The Three
Major
Challenges
One aspect of impactful research?
Release your code!
• This is still much less common than you’d expect.
– I am not and have not been perfect about this, but I try.
• Releasing code takes time and energy. Why clean up your code
and write documentation when it won’t turn into another
paper?
• Releasing code and data supports extension and comparison, a
cornerstone of the scientific process.
36
Lessons from a decade of program repair
• Lesson 1: Don’t let the perfect be the enemy of the good.
• Lesson 2: Shared metrics, benchmarks, and code drive research
progress and innovation.
• Lesson 3: Impact arises from more than just a single paper.
37
38
Scalability Repair quality
The Three
Major
Challenges
39
[Diagram: repair loop (INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT)]
Using tests was controversial
40
Tests make for a dangerous fitness function.
41-42
• nullhttpd: a webserver with basic GET + POST functionality.
– Version 0.5.0: remote-exploitable heap-based buffer overflow in
handling of POST.
• Failing test case: run exploit, see if webserver is still running.
• Easy passing test cases:
1. “GET index.html”
2. “GET image.jpg”
3. “GET notfound.html”
4. “POST /cgi-bin/hello.pl”
• Failing test + passing tests = repair. The two builds of this slide
show two outcomes:
– “reuse a content-length check from elsewhere in the code”
– “delete handling of POST requests”
Test suite quality definitely matters.
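A toy model makes the danger concrete. Everything below is hypothetical (the `handle` function, the 32-byte limit, the test predicates); it only illustrates how a functionality-deleting candidate can satisfy a suite that lacks a POST test, which is an assumption of this sketch.

```python
def handle(request, post_enabled=True):
    """Toy server: returns a status string; an oversized POST 'crashes' it."""
    method, _, body = request.partition(" ")
    if method == "POST" and post_enabled:
        if len(body) > 32:          # stand-in for the buffer overflow
            return "CRASH"
        return "200 OK"
    if method == "GET":
        return "200 OK"
    return "501 Not Implemented"

def original(request):
    return handle(request, post_enabled=True)

def post_deleted(request):          # candidate repair: delete POST handling
    return handle(request, post_enabled=False)

def still_running(response):
    return response != "CRASH"

def ok(response):
    return response == "200 OK"

def passes(server, tests):
    return all(check(server(request)) for request, check in tests)


failing_test = ("POST " + "A" * 64, still_running)
get_tests = [("GET index.html", ok), ("GET image.jpg", ok)]
post_test = ("POST /cgi-bin/hello.pl", ok)

weak_suite = [failing_test] + get_tests
strong_suite = weak_suite + [post_test]

results = {
    "original on weak suite": passes(original, weak_suite),
    "post_deleted on weak suite": passes(post_deleted, weak_suite),
    "post_deleted on strong suite": passes(post_deleted, strong_suite),
}
```

The deletion candidate passes the weak suite because nothing in it requires POST to work; adding a single legitimate POST test is enough to reject it.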
43
How much, practically, is a different question…
Wes
Claire
GRAD SCHOOL
Swords
44-46
The journal extension!
• Scenario: Long-running servers + IDS + generate repairs for
detected anomalies.
• Workloads: a day/week of unfiltered requests to the UVA CS
web server / PHP application.
• Figure annotation: THIS PATCH DELETED CODE
Even a functionality-reducing repair
had little practical impact.
[ Rinard et al. Enhancing server availability and security through
failure-oblivious computing. OSDI 2004. ]
Lessons from a decade of program repair
• Lesson 1: Don’t let the perfect be the enemy of the good.
• Lesson 2: Shared metrics, benchmarks, and code can drive
research progress and innovation.
• Lesson 3: Impact arises from more than just a single paper.
47
2019: “We follow the standard practice of using test
cases to evaluate patch correctness.” –many papers
• Majority of 10-12 repair papers at ICSE 2019 use tests.
• Research efforts to understand, characterize, measure, and
promote patch quality are ongoing.
• Tests are good for many reasons!
– Developers understand them.
– They can be used to check a wide array of properties.
• Tests support a general repair paradigm that can (in principle)
integrate into existing QA practice.
48
We were wrong about tests at the time.
[ Urli et al. How to design a program repair bot?:
insights from the Repairnator project. ICSE (SEIP)
2018 ]
• The rise/dominance of continuous
integration, a natural extension
point for automatic repair, is a
fairly recent phenomenon.
• Modern workflows make it easy to
include a human in the patch
review loop, further reducing risk
of low-quality patches.
• But: the actual use case we
proposed in 2009 was unready for
prime time.
49
Lessons from a decade of program repair
• Lesson 1: Don’t let the perfect be the enemy of the good.
• Lesson 2: Shared metrics, benchmarks, and code drive research
progress and innovation.
• Lesson 3: Impact arises from more than just a single paper.
• Lesson 4: It is OK for SE research to lead SE practice.
50
Scientific Foundations of Repair
• On the one hand, it is surprising that GenProg has worked well.
• On the other hand, why hasn't it worked better?
• If evolution worked as well in software as it does in biology, we
would have seen many more advances.
51
Software = Engineering + Evolution ?
• There are already many evident analogues between Darwinian
processes and software engineering.
– Successful code is copied and reused (clones, COTS, interfaces vs.
inheritance).
– Programmers make small modifications (localization, churn vs.
variation).
• This suggests new directions for understanding and improving
software and software engineering.
52
Repair Theory: Evolutionary Computation
• Genetic Programming was introduced in 1985.
– Rarely scaled beyond polynomials.
– Steph thought repair using GP sounded fun, but would not work!
• The open question is to bridge the gap between results and
techniques in evolutionary biology and software engineering.
• One direction is to understand why it works at all.
– Transition insights from evolutionary computing to software.
[ Arcuri. On the automation of fixing software bugs. ICSE Companion 2008 ]
53
Taking Evolution Seriously
• Potential biological properties of software:
- e.g.: Mutational robustness vs. environmental robustness.
• Understanding the search space can help us search effectively.
- Many bugs are small. Software is not fragile.
- Other potentially interesting analogues:
• Neutrality and epistasis.
• Fitness distributions.
• Neutral network topology.
• Call to arms: be open to insights from other fields.
54
Lessons from a decade of program repair
• Lesson 1: Don’t let the perfect be the enemy of the good.
• Lesson 2: Shared metrics, benchmarks, and code drive research
progress and innovation.
• Lesson 3: Impact arises from more than just a single paper.
• Lesson 4: It is OK for research to lead industrial practice.
• Lesson 5: Be open to insights from other fields.
55
How did we get here?
56
10 years
of
progress
57
Scalability Repair quality
Expressive power
Big leaps in
scalability.
Incremental
leaps,
fundamental
research.
10 years
of
progress
58
Scalability Repair quality
Expressive power
Big leaps in
scalability.
Incremental
leaps,
fundamental
research.
Relaxing of
concerns in
practice.
10 years
of
progress
59
Scalability
Future
work: the
ongoing
challenges
60
Repair quality
Expressive power
Future work: the ongoing challenges
Expressive power
• More complicated, compositional,
multi-part patches.
• New ways to use machine
learning over prior human edits to
inform patch construction.
• Integration into existing QA
processes.
Repair quality
• Better understanding of partial
correctness, hostile fitness
landscapes, intermediate quality.
• Are human-informed repairs more
acceptable/trustworthy?
• Use of additional, non-test signals
available from the development
process.
61
[ Saha et al. Harnessing Evolution for Multi-Hunk Program Repair. ICSE 2019 ]
[ Long. Automatic Patch Generation via Learning from Successful Human Patches. PhD Thesis ]
[ Monperrus. A critical review of "automatic patch generation learned from human-written patches":
essay on the problem statement and the evaluation of automatic software repair. ICSE 2014 ]
Stepping back: A challenge about the future of SE work
62
Automation
Humans have not
traditionally been
expected to read,
understand, or modify
generated code.
Human
development
effort
Compilers/code
generators raise the
level of abstraction at
which humans can
operate.
Software Bots
Automated synthesis +
transformation is integrating
into the [complex socio
technical | natural Darwinian]
process of SE. How?
Lessons from a decade of program repair
• We made a dozen mistakes in
algorithmic design.
• Many of those mistakes were
discovered/addressed via shared
code and indicative benchmarks.
• …that was a lot of work.
• Test cases effectively capture key
aspects of acceptability for
deployment.
• The success/failure of evolutionary
approaches surfaces fundamental
properties of software.
• Lesson 1: Don’t let the perfect be
the enemy of the good.
• Lesson 2: Shared metrics,
benchmarks, and code drive
research progress and innovation.
• Lesson 3: Impact arises from more
than just a single paper.
• Lesson 4: It is OK for research to
lead industrial practice.
• Lesson 5: Be open to insights from
other fields.
63

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 

Recently uploaded (20)

chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 

It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair

  • 1. It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair Westley Weimer ThanhVu Nguyen Claire Le Goues Stephanie Forrest
  • 5. The Once And Future Problem: Bugs • 2002 NIST survey: software bugs cost 0.6% of the US GDP. • 2003 textbooks: bemoaned that up to 90% of a software project's cost was dedicated to maintenance and bug repair. • 2005 Mozilla developers: complained about 300 bugs appearing per day: "far too many" to handle. • As a graduate student, Wes had worked on SLAM and BLAST and envied Dawson Engler. – Only to hear "we already have tens of thousands of un-fixed bug reports; don't bother finding more bugs".
  • 6. The Cunning Plan • Automatically, efficiently repair certain classes of bugs in off-the-shelf, unannotated legacy programs. • Basic idea: biased, random search through the space of all programs for a variant that repairs the problem. Image: https://upload.wikimedia.org/wikipedia/commons/a/a4/13-02-27-spielbank-wiesbaden-by-RalfR-093.jpg
  • 7. Genetic programming: the application of evolutionary or genetic algorithms to program source code.
  • 10. Original Secret Sauces • Use existing test cases to evaluate candidate repairs. • Search by perturbing parts of the program likely to contain the error. • Existing program code and behavior contain the seeds of many repairs. – Leverage existing developer expertise rather than inventing new code!
  • 11. Candidate Repairs: Modified Abstract Syntax Trees • Program statements are manipulated. – Reduces search space compared to changing expressions. • Custom spectrum fault localization. – Called the "weighted path" in the papers: it wasn't very good. – Reduces search space compared to changing the whole program. • Simple mutation (e.g., in the style of introductory students). – Choose a statement based on fault localization weights. – Delete S, Replace S with S1, or Insert S2 after S. • Choose S1 and S2 from the entire AST. • Reduces search space compared to inventing statements (synthesis).
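
The three mutation operators on slide 11 can be sketched as below. This is a minimal illustration, not GenProg's code: the program is modeled as a flat list of statement strings rather than an AST, and the names `mutate`, `statements`, `weights`, and `rng` are assumptions introduced here.

```python
import random

def mutate(statements, weights, rng):
    """Return (op, target, variant) for one random edit of `statements`."""
    # Choose the target statement S with probability proportional to its
    # fault localization weight (at least one weight must be positive).
    target = rng.choices(range(len(statements)), weights=weights, k=1)[0]
    op = rng.choice(["delete", "replace", "insert"])
    variant = list(statements)  # work on a copy; the original stays intact
    if op == "delete":
        del variant[target]
    elif op == "replace":
        # S1 is drawn from the existing program, never invented.
        variant[target] = rng.choice(statements)
    else:
        # S2 is also drawn from the existing program and placed after S.
        variant.insert(target + 1, rng.choice(statements))
    return op, target, variant
```

Because replacement and insertion only reuse statements already in the program, every variant is built from existing code, which is the search-space reduction the slide describes.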
  • 12. [Diagram: search loop with stages INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT] Minimization using delta debugging to “mitigate the risk of breaking untested functionality.”
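
The minimization step on slide 12 can be sketched as a simplified ddmin-style loop. This is an illustration under stated assumptions, not the original implementation: a patch is modeled as a list of edits, and `passes` is a hypothetical callback that applies a subset of edits and reports whether all tests still pass.

```python
def minimize_patch(edits, passes):
    """Shrink `edits` to a subset that still passes and has no removable edit."""
    assert passes(edits), "the full patch must pass before minimizing"
    n = 2  # current granularity: number of chunks to split into
    while len(edits) >= 2:
        chunk = max(1, len(edits) // n)
        subsets = [edits[i:i + chunk] for i in range(0, len(edits), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [e for j, s in enumerate(subsets) if j != i for e in s]
            if passes(subset):            # one chunk alone is enough
                edits, n, reduced = subset, 2, True
                break
            if passes(complement):        # this chunk is unnecessary
                edits, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(edits):           # already tried single-edit removals
                break
            n = min(len(edits), n * 2)    # retry with finer chunks
    return edits
```

For example, if only edits 2 and 5 out of eight are needed, `minimize_patch(list(range(8)), lambda s: {2, 5} <= set(s))` returns `[2, 5]`.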
  • 13. Search-Based Software Engineering • This search-based approach admits other (multi-objective) fitness functions. – Energy reduction, graphical fidelity, readability, execution time, etc. • In 2009: bugs via pass/fail tests only. [ Harman and Jones. Search-based software engineering. Information & Software Technology (2001) ]
  • 15. The original! Note the overachievement! We were aiming for a minimum of 50k LOC. Inspired By Fuzzing
  • 16. Fast Forward to the 2019 ICSE SEIP Track… (Presented just two hours ago!) “Results from repair applied to 6 multi-million line systems.” “Facebook, Inc” “one widely-studied [repair] approach uses software testing to guide the repair process, as typified by GenProg.”
  • 17. How did we get here?
  • 18. An incomplete list of acknowledgements • As a group, we saw an emphasis on taking risks and allowing junior researchers to rise to the occasion. • Mark Harman and collaborators at CREST and SBST • John Knight, Jack Davidson, Anh Nguyen-Tuong, Eric Schulte, Zak Fry, Ethan Fast, and Michael Dewey-Vogt from UVA/UNM • Tom Ball and collaborators from Microsoft Research • Pat Hurley and Sol Greenspan for initial funding support • … and many more!
  • 19. Another possibility: We were right about everything! All we needed was for Facebook to take an interest and implement it.
  • 21. [Diagram: search loop with stages INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT] Storing full programs does not scale. Storing patches instead would help make genetic improvement feasible.
  • 22. Candidate Repairs: Modified Abstract Syntax Trees • Program statements are manipulated. – Reduces search space compared to changing expressions. • Custom spectrum fault localization. – High (1.0) if S is not visited on a passed test. – Low (0.1, 0.0) if S is also visited on a passed test. • Simple mutation. – Choose a statement based on fault localization weights. – Delete S, Replace S with S1, or Insert S2 after S. • Choose S1 and S2 from the entire AST. • Reduces search space compared to inventing statements (synthesis).
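
The weighting rule on slide 22 can be written down as a small function. This is a sketch under assumptions: coverage is given as sets of statement ids, the function and parameter names are hypothetical, and giving weight 0.0 to statements the failing test never reaches is an assumption added here (the slide only covers statements on the failing path).

```python
def localization_weights(statements, failing_cov, passing_cov, shared=0.1):
    """Map each statement id to a mutation weight from test coverage."""
    weights = {}
    for s in statements:
        if s not in failing_cov:
            weights[s] = 0.0      # assumption: never mutate off the failing path
        elif s in passing_cov:
            weights[s] = shared   # low: also visited on a passed test
        else:
            weights[s] = 1.0      # high: visited only on the failing test
    return weights
```

The slide lists both 0.1 and 0.0 as "low" values; passing `shared=0.0` gives the stricter variant.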
  • 24. [Diagram: search loop with stages INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT] Minimization does not affect semantic patch quality. ML experts have known since the 70's that model size can be independent of degree of overfitting.
  • 25. Selection and Maintenance Debt • If there are multiple high-fitness candidates, how do you pick one? – Used stochastic universal sampling (roulette wheel selection). – Better approaches (e.g., tournament selection) were known. – SUS was first on the list on Wikipedia (no, really …). • First GenProg prototype was made for a Multidisciplinary University Research Initiative (MURI) grant meeting. – Fixed GCD infinite loop and Nullhttpd buffer overrun. – Victim of its own success: no refactoring for years.
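
The two selection schemes named on slide 25 can be compared side by side. Both functions below are textbook-style sketches, not GenProg's code; the function names, the `rng` parameter, and the tournament size `k` are assumptions for illustration, and fitness values are assumed non-negative with a positive sum.

```python
import random

def stochastic_universal_sampling(population, fitnesses, n, rng):
    """Pick n individuals using n evenly spaced pointers over cumulative fitness."""
    total = sum(fitnesses)            # assumed to be > 0
    step = total / n
    start = rng.random() * step       # single random offset in [0, step)
    chosen, index, cumulative = [], 0, fitnesses[0]
    for k in range(n):
        pointer = start + k * step
        # Advance to the individual whose fitness interval contains the pointer.
        while cumulative <= pointer and index < len(fitnesses) - 1:
            index += 1
            cumulative += fitnesses[index]
        chosen.append(population[index])
    return chosen

def tournament_selection(population, fitnesses, n, rng, k=2):
    """Pick n individuals; each is the fittest of k distinct random contenders."""
    chosen = []
    for _ in range(n):
        contenders = rng.sample(range(len(population)), k)
        best = max(contenders, key=lambda j: fitnesses[j])
        chosen.append(population[best])
    return chosen
```

SUS depends on the absolute fitness values, while tournament selection depends only on their ranking, which is one commonly cited reason to prefer it when fitness scales are uneven.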
  • 26. Lessons from a decade of program repair • Lesson 1: Don’t let the perfect be the enemy of the good.
  • 27. Another possibility: We were right about everything! All we needed was for Facebook to take an interest and implement it.
  • 30. “If I gave you the last 100 bugs from <my project>, how many could <your technique> fix?” – Real Engineers
  • 31. Systematic Benchmark Construction • Approach: use historical data to approximate discovery and repair of bugs in the wild. • Mine program versions (going back in time on SourceForge, Google Code, Fedora SRPM, etc.) where test case behavior changes. • Corresponds to a human-written repair for the bug tested by the failing test case(s).
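
The mining idea on slide 31 can be sketched as a scan over consecutive versions. This is a simplified illustration, not the ManyBugs tooling: the `history` format, the function name, and the extra requirement that no previously passing test breaks are all assumptions made here.

```python
def find_bug_fix_pairs(history):
    """Find adjacent versions where some test flips from failing to passing.

    `history` is a chronological list of (version_id, {test_name: passed}).
    """
    pairs = []
    for (old_id, old), (new_id, new) in zip(history, history[1:]):
        # Tests that failed on the old version and pass on the new one.
        fixed = sorted(t for t in new if new[t] and old.get(t) is False)
        # Tests that passed on the old version and fail on the new one.
        broken = [t for t in old if old[t] and new.get(t) is False]
        if fixed and not broken:
            pairs.append((old_id, new_id, fixed))
    return pairs
```

Each returned pair is a candidate benchmark scenario: the old version is the buggy program, the flipped tests are the failing tests, and the new version stands in for the human-written repair.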
  • 32. ManyBugs drove algorithmic changes
      Program    LOC        Tests   Bugs  Description
      fbc           97,000     773     3  Language (legacy)
      gmp          145,000     146     2  Multiple precision math
      gzip         491,000      12     5  Data compression
      libtiff       77,000      78    24  Image manipulation
      lighttpd      62,000     295     9  Web server
      php        1,046,000   8,471    44  Language (web)
      python       407,000     355    11  Language (general)
      wireshark  2,814,000      63     7  Network packet analyzer
      Total      5,139,000  10,193   105
  • 35. The Three Major Challenges: Scalability • Both search- and synthesis-based repair techniques underwent fundamental reconfigurations to enable scalability. • A shared dataset of indicative, real-world bugs drove innovation in test-driven program repair. [ Nguyen et al. SemFix: program repair via semantic analysis. ICSE 2013. ] [ Mechtaev et al. Angelix: scalable multiline program patch synthesis via symbolic analysis. ICSE 2016. ] [ Sim et al. Using benchmarking to advance research: A challenge to software engineering. ICSE 2003. ]
  • 36. One aspect of impactful research? Release your code! • This is still much less common than you’d expect. – I am not and have not been perfect about this, but I try. • Releasing code takes time and energy. Why clean up your code and write documentation when it won’t turn into another paper? • Releasing code and data supports extension and comparison, a cornerstone of the scientific process.
  • 37. Lessons from a decade of program repair • Lesson 1: Don’t let the perfect be the enemy of the good. • Lesson 2: Shared metrics, benchmarks, and code drive research progress and innovation. • Lesson 3: Impact arises from more than just a single paper.
  • 38. The Three Major Challenges: Scalability, Repair quality
  • 40. Using tests was controversial
  • 41. Tests make for a dangerous fitness function. • nullhttpd: a webserver with basic GET + POST functionality. Version 0.5.0: remote-exploitable heap-based buffer overflow in handling of POST. • Failing test case: run exploit, see if webserver is still running. • Easy passing test cases: 1. “GET index.html” 2. “GET image.jpg” 3. “GET notfound.html” 4. “POST /cgi-bin/hello.pl” • Failing test + passing tests = “reuse a content-length check from elsewhere in the code”
  • 42. Tests make for a dangerous fitness function. • nullhttpd: a webserver with basic GET + POST functionality. Version 0.5.0: remote-exploitable heap-based buffer overflow in handling of POST. • Failing test case: run exploit, see if webserver is still running. • Easy passing test cases: 1. “GET index.html” 2. “GET image.jpg” 3. “GET notfound.html” 4. “POST /cgi-bin/hello.pl” • Failing test + passing tests = “delete handling of POST requests”
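
Why a test suite is a "dangerous" fitness function can be shown with a toy model of the nullhttpd example. Everything here is an illustrative assumption: variants are modeled as the set of request kinds they serve safely, fitness is a plain count of passed tests, and the slides do not state which tests were in the suite for each outcome, so the POST-free suite below is a guess at the scenario.

```python
def fitness(handled, tests):
    """Count passed tests; `handled` is the set of request kinds served safely."""
    return sum(1 for required in tests if required in handled)

# Toy variants of the server (hypothetical labels, not real nullhttpd builds).
ORIGINAL = {"GET", "POST"}                 # still crashes on the exploit
REUSE_CHECK = {"GET", "POST", "EXPLOIT"}   # content-length check reused
DELETE_POST = {"GET", "EXPLOIT"}           # POST handling removed entirely

# Three GET tests plus the exploit test, with and without the POST test.
GET_ONLY_SUITE = ["GET", "GET", "GET", "EXPLOIT"]
FULL_SUITE = GET_ONLY_SUITE + ["POST"]
```

With `GET_ONLY_SUITE`, both `REUSE_CHECK` and `DELETE_POST` score 4 out of 4, so the search cannot tell them apart; only `FULL_SUITE` separates them (5 versus 4).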
  • 43. Test suite quality definitely matters. How much, practically, is a different question… [Image labels: Wes, Claire, GRAD SCHOOL, Swords]
  • 44. The journal extension! • Scenario: Long-running servers + IDS + generate repairs for detected anomalies. • Workloads: a day/week of unfiltered requests to the UVA CS webserver / php application. [ Rinard et al. Enhancing server availability and security through failure-oblivious computing. OSDI '04. ]
  • 45. The journal extension! • Scenario: Long-running servers + IDS + generate repairs for detected anomalies. • Workloads: a day/week of unfiltered requests to the UVA CS webserver / php application. THIS PATCH DELETED CODE [ Rinard et al. Enhancing server availability and security through failure-oblivious computing. OSDI '04. ]
  • 46. The journal extension! Even a functionality-reducing repair had little practical impact. [ Rinard et al. Enhancing server availability and security through failure-oblivious computing. OSDI '04. ]
  • 47. Lessons from a decade of program repair • Lesson 1: Don’t let the perfect be the enemy of the good. • Lesson 2: Shared metrics, benchmarks, and code can drive research progress and innovation. • Lesson 3: Impact arises from more than just a single paper.
  • 48. 2019: “We follow the standard practice of using test cases to evaluate patch correctness.” –many papers • Majority of 10-12 repair papers at ICSE 2019 use tests. • Research efforts to understand, characterize, measure, and promote patch quality are ongoing. • Tests are good for many reasons! – Developers understand them. – They can be used to check a wide array of properties. • Tests support a general repair paradigm that can (in principle) integrate into existing QA practice.
  • 49. We were wrong about tests at the time. [ Urli et al. How to design a program repair bot?: insights from the Repairnator project. ICSE (SEIP) 2018 ] • The rise/dominance of continuous integration, a natural extension point for automatic repair, is a fairly recent phenomenon. • Modern workflows make it easy to include a human in the patch review loop, further reducing risk of low-quality patches. • But: the actual use case we proposed in 2009 was unready for prime time.
  • 50. Lessons from a decade of program repair • Lesson 1: Don’t let the perfect be the enemy of the good. • Lesson 2: Shared metrics, benchmarks, and code drive research progress and innovation. • Lesson 3: Impact arises from more than just a single paper. • Lesson 4: It is OK for SE research to lead SE practice.
  • 51. Scientific Foundations of Repair • On the one hand, it is surprising that GenProg has worked well. • On the other hand, why hasn't it worked better? • If evolution worked as well in software as it does in biology, we would have seen many more advances.
  • 52. Software = Engineering + Evolution ? • There are already many evident analogues between Darwinian processes and software engineering. – Successful code is copied and reused (clones, COTS, interfaces vs. inheritance). – Programmers make small modifications (localization, churn vs. variation). • This suggests new directions for understanding and improving software and software engineering.
  • 53. Repair Theory: Evolutionary Computation • Genetic Programming was introduced in 1985. – Rarely scaled beyond polynomials. – Steph thought repair using GP sounded fun, but would not work! • The open question is to bridge the gap between results and techniques in evolutionary biology and software engineering. • One direction is to understand why it works at all. – Transition insights from evolutionary computing to software. [ Arcuri. On the automation of fixing software bugs. ICSE Companion 2008 ]
  • 54. Taking Evolution Seriously • Potential biological properties of software: - e.g.: Mutational robustness vs. environmental robustness. • Understanding the search space can help us search effectively. - Many bugs are small. Software is not fragile. - Other potentially interesting analogues: • Neutrality and epistasis. • Fitness distributions. • Neutral network topology. • Call to arms: be open to insights from other fields.
  • 55. Lessons from a decade of program repair • Lesson 1: Don’t let the perfect be the enemy of the good. • Lesson 2: Shared metrics, benchmarks, and code drive research progress and innovation. • Lesson 3: Impact arises from more than just a single paper. • Lesson 4: It is OK for research to lead industrial practice. • Lesson 5: Be open to insights from other fields.
  • 56. How did we get here?
  • 57. 10 years of progress [Diagram axes: Scalability, Repair quality, Expressive power] Big leaps in scalability. Incremental leaps, fundamental research.
  • 58. 10 years of progress [Diagram axes: Scalability, Repair quality, Expressive power] Big leaps in scalability. Incremental leaps, fundamental research. Relaxing of concerns in practice.
  • 61. Future work: the ongoing challenges Expressive power • More complicated, compositional, multi-part patches. • New ways to use machine learning over prior human edits to inform patch construction. • Integration into existing QA processes. Repair quality • Better understanding of partial correctness, hostile fitness landscapes, intermediate quality. • Are human-informed repairs more acceptable/trustworthy? • Use of additional, non-test signals available from the development process. [ Saha et al. Harnessing Evolution for Multi-Hunk Program Repair. ICSE 2019 ] [ Long. Automatic Patch Generation via Learning from Successful Human Patches. PhD Thesis ] [ Monperrus. A critical review of "automatic patch generation learned from human-written patches": essay on the problem statement and the evaluation of automatic software repair. ICSE 2014 ]
  • 62. Stepping back: A challenge about the future of SE work [Diagram labels: Automation, Human development effort, Software Bots] Humans have not traditionally been expected to read, understand, or modify generated code. Compilers/code generators raise the level of abstraction at which humans can operate. Automated synthesis + transformation is integrating into the [complex socio technical | natural Darwinian] process of SE. How?
  • 63. Lessons from a decade of program repair • We made a dozen mistakes in algorithmic design. • Many of those mistakes were discovered/addressed via shared code and indicative benchmarks. • …that was a lot of work. • Test cases effectively capture key aspects of acceptability for deployment. • The success/failure of evolutionary approaches surfaces fundamental properties of software. • Lesson 1: Don’t let the perfect be the enemy of the good. • Lesson 2: Shared metrics, benchmarks, and code drive research progress and innovation. • Lesson 3: Impact arises from more than just a single paper. • Lesson 4: It is OK for research to lead industrial practice. • Lesson 5: Be open to insights from other fields.

Editor's Notes

  1. 17:46
  2. 3D growth: https://www.flickr.com/photos/86530412@N02/7935377706, www.stockmonkeys.com, labeled CC BY 2.0 quality: https://pixabay.com/en/approved-control-quality-stamp-147677/, public domain diversity: CC BY-NC-SA 2.0 https://www.flickr.com/photos/cimmyt/5219256862, Photo credit: Xochiquetzal Fonseca/CIMMYT.
  3. Figure from: https://cloud.google.com/solutions/continuous-integration/