Wcre2010 shihab

Predicting Re-opened Bugs
A Case Study on the Eclipse Project
Emad Shihab, A. Ihara, Y. Kamei, W. Ibrahim,
M. Ohira, B. Adams, A. E. Hassan and K. Matsumoto
emads@cs.queensu.ca
SAIL, Queen’s University, Canada
NAIST, Japan

When you discover a bug …
Report bug → Fix bug → Verify fix → Close bug → (Re-opened)

Re-opened bugs increase maintenance costs …

Research questions …
1. Which attributes indicate re-opened bugs?
2. Can we accurately predict if a bug will be re-opened using the extracted attributes?

Approach overview
Mine code and bug repositories → Extract attributes → Determine best attributes → Predict re-opened bugs

Our dimensions …
1. Work habit
2. Bug report
3. Bug fix
4. People

Work habit attributes
1. Time (Hour of day)
2. Weekday
3. Day of month
4. Month
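A minimal sketch of how the four work habit attributes could be derived from a single timestamp. The slides do not say which timestamp is used or how the values are encoded, so the function name `work_habit_attributes`, the dictionary keys, and the example date are all assumptions for illustration.

```python
from datetime import datetime

# Fixed English names so the result does not depend on the system locale.
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def work_habit_attributes(timestamp: datetime) -> dict:
    """Derive the four work habit attributes from one timestamp (hypothetical encoding)."""
    return {
        "time": timestamp.hour,                          # hour of day, 0-23
        "weekday": WEEKDAY_NAMES[timestamp.weekday()],   # Monday is index 0
        "day_of_month": timestamp.day,                   # 1-31
        "month": timestamp.month,                        # 1-12
    }


# Arbitrary example timestamp, not taken from the study's data.
example = work_habit_attributes(datetime(2010, 10, 13, 14, 30))
print(example)
```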

Bug report attributes
1. Component
2. Platform
3. Severity
4. Priority
5. CC list
6. Priority changed
7. Description size
8. Description text
9. Number of comments
10. Comment size
11. Comment text
Attribute groups: Metadata, Textual data

Bug fix attributes
1. Time to resolve (in days)
2. Last status
3. Number of edited files

People attributes
1. Reporter Name
2. Reporter experience
3. Fixer name
4. Fixer experience

Research question 1
Which attributes indicate re-opened bugs?
Comment text, description text and fix location
(component) are the best indicators

Top node analysis setup
1. Build 10 decision trees for each attribute set
2. Record the frequency and level of each attribute
3. Repeat using all attributes

Decision tree prediction model
[Figure: example decision tree spanning Level 1 to Level 3. Internal nodes: No. files (>= 5 / < 5), Dev exp (>= 3 / < 3), Month, Time (>= 12 / < 12), Time to resolve, with further splits at >= 6 / < 6 and >= 24 / < 24. Leaves: Re-opened / Not Re-opened.]

Top node analysis example with 3 trees
Tree 1: Comment at level 1; Time and No. comments at level 2
Tree 2: Comment at level 1; Time and No. files at level 2
Tree 3: No. files at level 1; Time and Description size at level 2
Level    Frequency  Attribute
Level 1  2          Comment
Level 1  1          No. files
Level 2  3          Time
Level 2  1          No. comments
Level 2  1          No. files
Level 2  1          Description size
...
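The tally in this example can be sketched in a few lines. This is an illustration only: the nested-tuple tree representation and the name `top_node_analysis` are assumptions, and the three trees are the toy trees from the slide, not trees learned from the Eclipse data.

```python
from collections import Counter, defaultdict

# Each tree is (attribute, [child trees]); these mirror the slide's three example trees.
TREES = [
    ("Comment", [("Time", []), ("No. comments", [])]),
    ("Comment", [("Time", []), ("No. files", [])]),
    ("No. files", [("Time", []), ("Description size", [])]),
]


def top_node_analysis(trees, max_level=2):
    """Count how often each attribute appears at each level, down to max_level."""
    counts = defaultdict(Counter)
    for tree in trees:
        frontier = [tree]  # nodes at the current level, starting with the root
        for level in range(1, max_level + 1):
            next_frontier = []
            for attribute, children in frontier:
                counts[level][attribute] += 1
                next_frontier.extend(children)
            frontier = next_frontier
    return counts


result = top_node_analysis(TREES)
for level in sorted(result):
    print(f"Level {level}:", dict(result[level]))
```

Attributes that appear often at level 1 are the ones the trees split on first, which is why the analysis reads them as the strongest indicators.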

Which attributes best indicate re-opened bugs?
Work habit attributes (top-node count out of 10 trees)
9 x Month
1 x Time (Hour of day)
Weekday
Day of month

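Which attributes best indicate re-opened bugs?
Bug report attributes (top-node count out of 10 trees)
10 x Comment text
Component
Platform
Severity
Priority
CC list
Priority changed
Description size
Description text
Number of comments
Comment size
Attribute groups: Metadata, Textual data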

Which attributes best indicate re-opened bugs?
Bug fix attributes (top-node count out of 10 trees)
7 x Time to resolve
3 x Last status
Number of files in fix

Which attributes best indicate re-opened bugs?
People attributes (top-node count out of 10 trees)
5 x Reporter name
5 x Fixer name
Reporter experience
Fixer experience

Combining all attributes
Level    Frequency  Attribute
Level 1  10         Comment text
Level 2  19         Description text
Level 2  1          Component

Research question 2
Can we accurately predict if a bug will be
re-opened using the extracted attributes?
Our models can correctly predict re-opened bugs with
63% precision and 85% recall

Performance measures
Confusion matrix (rows: predicted, columns: actual)
                         Actual re-opened  Actual not re-opened
Predicted re-opened      TP                FP
Predicted not re-opened  FN                TN
Re-opened precision: $\frac{TP}{TP + FP}$
Re-opened recall: $\frac{TP}{TP + FN}$
Not re-opened precision: $\frac{TN}{TN + FN}$
Not re-opened recall: $\frac{TN}{TN + FP}$
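The four measures on this slide can be computed directly from the confusion matrix cells. The sketch below is illustrative: the function name `class_metrics`, the zero-denominator convention, and the counts in the example call are made up and are not results from the study.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def class_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision and recall for both classes, treating 're-opened' as the positive class."""
    return {
        "reopened_precision": _ratio(tp, tp + fp),
        "reopened_recall": _ratio(tp, tp + fn),
        "not_reopened_precision": _ratio(tn, tn + fn),
        "not_reopened_recall": _ratio(tn, tn + fp),
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = class_metrics(tp=30, fp=10, fn=20, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting both classes matters here because re-opened bugs are the minority class, so a model can score well on "not re-opened" while still missing most re-opened bugs.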

Predicting re-opened bugs: precision and recall (%)
Attribute set  Precision  Recall
Work habits    33         74
Bug report     63         83
Bug fix        21         83
People         27         67

Predicting NOT re-opened bugs: precision and recall (%)
Attribute set  Precision  Recall
Work habits    93         71
Bug report     97         91
Bug fix        93         39
People         91         66

Combining all attributes: precision and recall (%)
Class          Precision  Recall
Re-opened      63         85
NOT re-opened  97         90

Bug comments are important …
Bug report is the most important attribute set
Comment text is the most important bug report attribute
What words are important?

Important words
Re-opened   Not Re-opened
control     verified
background  duplicate
debugging   screenshot
breakpoint  important
blocked     testing
platforms   warning

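Precision (Pr) and recall (Re) per attribute set
Attribute set  Re-opened         Not re-opened
Work habits    Pr: 33%, Re: 74%  Pr: 93%, Re: 71%
Bug report     Pr: 63%, Re: 83%  Pr: 97%, Re: 91%
Bug fix        Pr: 21%, Re: 83%  Pr: 93%, Re: 39%
People         Pr: 27%, Re: 67%  Pr: 91%, Re: 66%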

Predicting NOT re-opened bugs
Work habits: Pr: 93%, Re: 71%
Bug report: Pr: 97%, Re: 91%
Bug fix: Pr: 93%, Re: 39%
People: Pr: 91%, Re: 66%

Combining all attributes
Re-opened: Pr: 63%, Re: 85%
Not Re-opened: Pr: 97%, Re: 90%

Approach overview
Mine code and bug repositories → Attributes of re-opened bugs → Predict re-opened bugs → Measure performance

[Figure: chart with "Precision" and "Recall" series; axis label "Precision and recall quantity"]

Which attributes best indicate re-opened bugs?
Work habits: Month (9), Time (1)
Bug report: Comment text (10)
Bug fix: Time to fix (7), Last status (3)
People: Fixer (5), Reporter (5)
