Wcre2010 shihab
1. Predicting Re-opened Bugs
A Case Study on the Eclipse Project
Emad Shihab, A. Ihara, Y. Kamei, W. Ibrahim,
M. Ohira, B. Adams, A. E. Hassan and K. Matsumoto
emads@cs.queensu.ca
SAIL, Queen’s University, Canada
NAIST, Japan
2. When you discover a bug …
Report bug → Fix bug → Verify fix → Close bug (→ Re-opened)
6. Research questions …
1. Which attributes indicate re-opened bugs?
2. Can we accurately predict if a bug will be re-opened using the extracted attributes?
13. Research question 1
Which attributes indicate re-opened bugs?
Comment text, description text and fix location
(component) are the best indicators
14. Top node analysis setup
1. Build 10 decision trees for each attribute set
2. Record the frequency and level of each attribute
3. Repeat using all attributes
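15. Decision tree prediction model
[Figure: example decision tree with three levels (Level 1, Level 2, Level 3). The root splits on No. files (>= 5 / < 5); lower nodes split on Dev exp (>= 3 / < 3), Month, Time (>= 12 / < 12) and Time to resolve; other thresholds shown are 6 and 24. Leaves are labelled Re-opened or Not Re-opened.]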
16. Top node analysis example with 3 trees
Tree 1: Comment (Level 1); Time, No. comments (Level 2)
Tree 2: Comment (Level 1); Time, No. files (Level 2)
Tree 3: No. files (Level 1); Time, Description size (Level 2)

Level    Frequency  Attribute
Level 1  2          Comment
         1          No. files
Level 2  3          Time
         1          No. comments
         1          No. files
         1          Description size
...
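The counting step in the three-tree example can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the nested-tuple tree encoding and the `top_node_counts` helper are assumptions made here, and the three trees are hand-copied from the slide rather than learned from data.

```python
from collections import Counter

# Assumed encoding: each node is (attribute, [child nodes]); leaves are omitted.
# The three trees mirror the example on the slide.
trees = [
    ("Comment", [("Time", []), ("No. comments", [])]),
    ("Comment", [("Time", []), ("No. files", [])]),
    ("No. files", [("Time", []), ("Description size", [])]),
]

def top_node_counts(trees, max_level=2):
    """Count how often each attribute appears at each level (root = Level 1)."""
    counts = {level: Counter() for level in range(1, max_level + 1)}

    def walk(node, level):
        if level > max_level:
            return
        attribute, children = node
        counts[level][attribute] += 1
        for child in children:
            walk(child, level + 1)

    for tree in trees:
        walk(tree, 1)
    return counts

counts = top_node_counts(trees)
for level, counter in counts.items():
    for attribute, frequency in counter.most_common():
        print(f"Level {level}: {frequency} x {attribute}")
```

Attributes that show up often near the root are the ones the trees rely on most, which is why the analysis records both frequency and level.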
17. Which attributes best indicate re-opened bugs?
Work habit attributes (N × = top node in N of the 10 trees)
9 × Month
1 × Time (hour of day)
Weekday
Day of month
18. Which attributes best indicate re-opened bugs?
Bug report attributes
Metadata:
Component
Platform
Severity
Priority
CC list
Priority changed
Textual data:
Description size
Description text
Number of comments
Comment size
10 × Comment text
19. Which attributes best indicate re-opened bugs?
Bug fix attributes
7 × Time to resolve
3 × Last status
Number of files in fix
20. Which attributes best indicate re-opened bugs?
People attributes
5 × Reporter name
5 × Fixer name
Reporter experience
Fixer experience
21. Combining all attributes
Level    Frequency  Attribute
Level 1  10         Comment text
Level 2  19         Description text
         1          Component
22. Research question 2
Can we accurately predict if a bug will be re-opened using the extracted attributes?
Our models can correctly predict re-opened bugs with
63% precision and 85% recall
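23. Decision tree prediction model
[Figure: example decision tree with three levels (Level 1, Level 2, Level 3). The root splits on No. files (>= 5 / < 5); lower nodes split on Dev exp (>= 3 / < 3), Month, Time (>= 12 / < 12) and Time to resolve; other thresholds shown are 6 and 24. Leaves are labelled Re-opened or Not Re-opened.]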
24. Performance measures
                        Actual
Predicted        Re-opened   Not re-opened
Re-opened        TP          FP
Not re-opened    FN          TN

Re-opened precision: $\frac{TP}{TP + FP}$
Re-opened recall: $\frac{TP}{TP + FN}$
Not re-opened precision: $\frac{TN}{TN + FN}$
Not re-opened recall: $\frac{TN}{TN + FP}$
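The four measures follow directly from the confusion-matrix counts. A minimal sketch, assuming a hypothetical helper `performance_measures`; the counts in the usage example are invented for illustration and are not taken from the study.

```python
def performance_measures(tp, fp, fn, tn):
    """Return precision and recall for both classes from confusion-matrix counts."""
    return {
        "reopened_precision": tp / (tp + fp),
        "reopened_recall": tp / (tp + fn),
        "not_reopened_precision": tn / (tn + fn),
        "not_reopened_recall": tn / (tn + fp),
    }

# Hypothetical counts (not from the study): 100 bugs were actually re-opened,
# 85 of them were flagged, and 50 other bugs were flagged by mistake.
measures = performance_measures(tp=85, fp=50, fn=15, tn=850)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, re-opened precision is 85 / 135 (about 0.63) and re-opened recall is 85 / 100 = 0.85, which shows how a model can catch most re-opened bugs while still raising a noticeable share of false alarms.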
28. Bug comments are important …
Bug report is the most important attribute set
Comment text is the most important bug report attribute
Which words are important?
29. Important words
Re-opened: control, background, debugging, breakpoint, blocked, platforms
Not Re-opened: verified, duplicate, screenshot, important, testing, warning
36. Predicting re-opened bugs
[Figure: precision and recall for each attribute set: Work habits, Bug report, Bug fix, People.]
37. Which attributes best indicate re-opened bugs?
Work habits: Month (9), Time (1)
Bug report: Comment text (10)
Bug fix: Time to fix (7), Last status (3)
People: Fixer (5), Reporter (5)