1. Bug Prediction Based on Fine-Grained Module Histories
Hideaki Hata
Osamu Mizuno
Tohru Kikuno
2. Overview
Background
Historical metrics are useful for bug prediction
Problem
For method-level prediction, it is difficult to collect historical metrics
Solution & Results
Historage: fine-grained version control system
First study of method-level bug prediction with well-known historical metrics
5. Mining Version Control Repository
[Figure: a sequence of commits ... n-3, n-2, n-1, n, n+1, n+2, n+3 ...; each commit carries a commit message (e.g., "Fix bug #32528"), a date (e.g., July 2007), and a code delta]
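A commit is usually linked to a bug through its message, as with "Fix bug #32528" in the figure. The sketch below is a minimal illustration of that linking step, assuming fixes contain a keyword such as "fix" or "bug" together with a "#<id>" reference. Real projects use other conventions, and the function names and sample commit IDs are hypothetical, not part of the study's tooling.

```python
import re
from typing import List, Tuple

# Assumed convention: a fix keyword ("fix", "fixes", "fixed", or "bug")
# plus one or more bug IDs written as "#<digits>".
FIX_KEYWORD = re.compile(r"\b(?:fix(?:es|ed)?|bug)\b", re.IGNORECASE)
BUG_ID = re.compile(r"#(\d+)")


def bug_ids(message: str) -> List[int]:
    """Return the bug IDs cited by a commit message that looks like a fix."""
    if not FIX_KEYWORD.search(message):
        return []
    return [int(bug_id) for bug_id in BUG_ID.findall(message)]


def bug_fix_commits(history: List[Tuple[str, str]]) -> List[Tuple[str, List[int]]]:
    """Keep (commit ID, bug IDs) for every commit whose message cites a fixed bug."""
    fixes = []
    for commit_id, message in history:
        ids = bug_ids(message)
        if ids:
            fixes.append((commit_id, ids))
    return fixes


if __name__ == "__main__":
    history = [
        ("n-1", "Refactor logging"),
        ("n", "Fix bug #32528"),
        ("n+1", "Update docs for #12"),  # cites an ID but has no fix keyword
    ]
    print(bug_fix_commits(history))  # [('n', [32528])]
```

Keyword matching is cheap but noisy; studies often cross-check the extracted IDs against the issue tracker.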
6. What We Have Learned
Prediction accuracy
Historical metrics ≥ Static code metrics
[Moser et al. ’08, Kamei et al. ’10]
Required effort
File-level ≤ Package-level
[Kamei et al. ’10, Nguyen et al. ’10, Posnett et al. ’11]
7. State of the Art
Papers (TSE, EMSE, ICSE, ESEC/FSE, FSE, ICSM, MSR)
[Figure: number of papers in these venues by prediction level (axis 0 to 15): package-level, e.g., cache model [Kim et al. ’07]; file-level, e.g., spam filtering model [Mizuno et al. ’07]; method-level]
No method-level prediction with well-known historical metrics
8. Method-Level Prediction
Requirement
Method-level historical metrics
Problem
Analysis of method histories is difficult
9. Difficulties
1. Tracking methods is troublesome
Matching methods must be found between consecutive snapshots
2. Method-level metadata are not easily available
Metadata (who, when, how, etc.) are associated with files, not with methods
[Figure: consecutive snapshots n-2, n-1, n]
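The first difficulty shows up in a naive approach that matches methods between two consecutive snapshots by signature only. A minimal sketch, assuming each snapshot has already been reduced to a hypothetical mapping from a signature string such as "Foo.size()" to the method's source text: a renamed method appears as one deletion plus one addition, so its history is cut.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot layout: method signature -> method source text.
Snapshot = Dict[str, str]


def match_by_signature(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return (matched, deleted, added) signatures between two consecutive snapshots."""
    matched = sorted(old.keys() & new.keys())
    deleted = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    return matched, deleted, added


if __name__ == "__main__":
    snapshot_n1 = {"Foo.run()": "void run() { step(); }",
                   "Foo.size()": "int size() { return items.length; }"}
    snapshot_n = {"Foo.run()": "void run() { step(); log(); }",
                  "Foo.length()": "int length() { return items.length; }"}
    # The renamed method size() -> length() is reported as a deletion plus an addition.
    print(match_by_signature(snapshot_n1, snapshot_n))
    # (['Foo.run()'], ['Foo.size()'], ['Foo.length()'])
```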
10. Historage
Fine-grained version control system [1]
is created on top of a Git repository
stores methods as files
detects renames/moves with the Git mechanism
[Figure: commits com1 and com2, with each source file broken down into per-method files]
[1] Hata et al., “Historage: Fine-Grained Version Control System for Java,” IWPSE-EVOL ’11.
Tool: git2historage (https://github.com/hdrky/git2historage)
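Historage relies on Git to detect renames and moves: Git pairs a deleted file with an added file when their contents are similar enough (50% similarity by default for `git diff -M`). Because Historage stores each method as a file, the same mechanism can follow a renamed method. The sketch below only illustrates the idea under stated assumptions: `difflib.SequenceMatcher.ratio()` stands in for Git's own similarity computation, which works differently, and the greedy pairing and all names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Git's default rename threshold for `git diff -M` is 50% similarity.
RENAME_THRESHOLD = 0.5


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; Git computes its own, different score."""
    return SequenceMatcher(None, a, b).ratio()


def detect_renames(deleted: Dict[str, str], added: Dict[str, str],
                   threshold: float = RENAME_THRESHOLD) -> List[Tuple[str, str, float]]:
    """Greedily pair deleted and added method files, most similar pairs first."""
    candidates = sorted(
        ((similarity(old_text, new_text), old, new)
         for old, old_text in deleted.items()
         for new, new_text in added.items()),
        reverse=True,
    )
    used_old, used_new = set(), set()
    renames = []
    for score, old, new in candidates:
        if score < threshold:
            break  # remaining candidates are even less similar
        if old in used_old or new in used_new:
            continue  # each file takes part in at most one rename
        used_old.add(old)
        used_new.add(new)
        renames.append((old, new, score))
    return renames


if __name__ == "__main__":
    deleted = {"Foo.size()": "int size() { return items.length; }"}
    added = {"Foo.length()": "int length() { return items.length; }",
             "Bar.x()": "void x() {}"}
    for old, new, score in detect_renames(deleted, added):
        print(f"{old} -> {new} (similarity {score:.2f})")
```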
11. Visualization of Repository History
[Figure: repository history visualized as Git file histories vs. Historage method histories; tree: directory, white node: method]
13. Study
Comparison
Prediction level: package, file, and method
Same metrics and the same prediction algorithm (random forest)
Buggy modules: identified with the SZZ algorithm [2]
Evaluation
10-fold cross-validation
Effort-based evaluation
[2] Śliwerski et al., “When Do Changes Induce Fixes?” MSR ’05.
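SZZ [2] traces a bug fix back to the changes that introduced the bug: the lines a fix removes or modifies are annotated (blamed) to find the commits that last touched them, and candidates committed after the bug was reported are discarded. A simplified sketch, assuming the blame data (for example, from `git blame` on the parent of the fix commit) is already available as a dictionary; the data layout and the name `simplified_szz` are hypothetical.

```python
from datetime import date
from typing import Dict, List, Set, Tuple

Line = Tuple[str, int]                 # (module, line number) before the fix
Blame = Dict[Line, Tuple[str, date]]   # line -> (last commit touching it, commit date)


def simplified_szz(fixed_lines: List[Line], blame: Blame,
                   bug_reported: date) -> Dict[str, Set[str]]:
    """Map each bug-introducing commit to the modules whose fixed lines it last touched."""
    introducing: Dict[str, Set[str]] = {}
    for module, line_no in fixed_lines:
        commit_id, committed = blame[(module, line_no)]
        # A commit made after the bug was reported cannot have introduced it.
        if committed < bug_reported:
            introducing.setdefault(commit_id, set()).add(module)
    return introducing


if __name__ == "__main__":
    blame = {
        ("Foo.run()", 3): ("c1", date(2007, 7, 2)),
        ("Foo.run()", 4): ("c2", date(2007, 7, 20)),
        ("Bar.x()", 1): ("c1", date(2007, 7, 2)),
    }
    fix = [("Foo.run()", 3), ("Foo.run()", 4), ("Bar.x()", 1)]
    # c1 is reported as bug-introducing; c2 is discarded because it postdates the report.
    print(simplified_szz(fix, blame, bug_reported=date(2007, 7, 10)))
```

With Historage, the blamed "modules" can be method files, which is what makes method-level labels available.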
15. Collected Metrics
LOC: Lines of code
Add/DelLOC: Added/deleted LOC
Chg/FixChgNum: # of changes/bug-fix changes
PastBugNum: # of fixed bug IDs
Period: # of days the module has existed
BugIntroNum: # of bug-introducing changes
LogCoupNum: # of logical-coupling changes
Avg/Max/MinInterval: Average/maximum/minimum interval between changes
HCM: Process complexity metric
DevTotal/Major/Minor: # of total/major/minor developers
Ownership: Highest proportion of ownership
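Most of these metrics are simple aggregates over one module's change history, so the same code works whether the module is a package, a file, or a Historage method file. A minimal sketch computing a subset (ChgNum, FixChgNum, Add/DelLOC, Avg/Max/MinInterval, DevTotal, Ownership), assuming intervals are measured in days and that Ownership is the share of changes made by the most active developer. The `Change` record is hypothetical; HCM, LogCoupNum, and DevMajor/Minor are omitted.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, List


@dataclass
class Change:
    """One commit touching a module (method, file, or package)."""
    day: date
    author: str
    added: int
    deleted: int
    is_fix: bool


def history_metrics(changes: List[Change]) -> Dict[str, float]:
    """Compute a subset of the historical metrics for one module."""
    if not changes:
        raise ValueError("module has no recorded changes")
    changes = sorted(changes, key=lambda c: c.day)
    # Days between consecutive changes.
    intervals = [(later.day - earlier.day).days
                 for earlier, later in zip(changes, changes[1:])]
    changes_per_author = Counter(c.author for c in changes)
    return {
        "ChgNum": len(changes),
        "FixChgNum": sum(c.is_fix for c in changes),
        "AddLOC": sum(c.added for c in changes),
        "DelLOC": sum(c.deleted for c in changes),
        "AvgInterval": mean(intervals) if intervals else 0.0,
        "MaxInterval": max(intervals, default=0),
        "MinInterval": min(intervals, default=0),
        "DevTotal": len(changes_per_author),
        # Assumed definition: share of changes made by the most active developer.
        "Ownership": max(changes_per_author.values()) / len(changes),
    }


if __name__ == "__main__":
    history = [
        Change(date(2007, 7, 1), "alice", 10, 0, False),
        Change(date(2007, 7, 5), "bob", 3, 1, True),
        Change(date(2007, 7, 15), "alice", 2, 2, True),
    ]
    print(history_metrics(history))
```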
16. Effort-Based Evaluation
[Figure: sample curve; x-axis: percent of LOC (0 to 100), y-axis: percent of bugs found (0 to 100)]
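An effort-based curve is obtained by inspecting modules in order of predicted risk and plotting the cumulative percentage of bugs found against the cumulative percentage of LOC inspected. A minimal sketch, assuming modules are ranked by predicted score alone (some studies rank by score per LOC instead) and that "bugs found in 20% LOC" counts only modules that fit entirely within the LOC budget. The tuple layout and function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical module record: (predicted score, LOC, number of bugs).
Module = Tuple[float, int, int]


def effort_curve(modules: List[Module]) -> List[Tuple[float, float]]:
    """Cumulative (percent of LOC, percent of bugs) when inspecting by descending score."""
    ranked = sorted(modules, key=lambda m: m[0], reverse=True)
    total_loc = sum(loc for _, loc, _ in ranked)
    total_bugs = sum(bugs for _, _, bugs in ranked)
    if total_loc == 0 or total_bugs == 0:
        raise ValueError("need at least one line of code and one bug")
    curve = [(0.0, 0.0)]
    seen_loc = seen_bugs = 0
    for _, loc, bugs in ranked:
        seen_loc += loc
        seen_bugs += bugs
        curve.append((100 * seen_loc / total_loc, 100 * seen_bugs / total_bugs))
    return curve


def bugs_found_within(modules: List[Module], loc_percent: float = 20.0) -> float:
    """Percent of bugs in modules that fit entirely within the first loc_percent of LOC."""
    found = 0.0
    for loc_pct, bug_pct in effort_curve(modules):
        if loc_pct > loc_percent:
            break
        found = bug_pct
    return found


if __name__ == "__main__":
    modules = [(0.9, 10, 2), (0.8, 10, 1), (0.3, 50, 1), (0.1, 30, 0)]
    print(effort_curve(modules))       # [(0.0, 0.0), (10.0, 50.0), (20.0, 75.0), (70.0, 100.0), (100.0, 100.0)]
    print(bugs_found_within(modules))  # 75.0
```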
17. Result (ECF)
[Figure: effort-based evaluation curves for package-, file-, and method-level prediction; x-axis: percent of lines (0 to 100), y-axis: percent of bugs found (0 to 100)]
18. 1,000 Runs (ECF)
[Figure: percentages of bugs found in 20% of LOC over 1,000 runs for package-, file-, and method-level prediction; y-axis: percent of bugs found (0 to 80)]
19. 1,000 Runs (All)
[Figure: median percentage of bugs found in 20% of LOC for package-, file-, and method-level prediction on Xpand, WTP Incubator, Ant, Lucene/Solr, OpenJPA, Cassandra, ECF, and Wicket; y-axis: percent of bugs found (0 to 100)]
20. Why Is Method-Level Prediction Effective?
[Figure: size in LOC (0 to 800) of package, file, and method modules, all vs. buggy; number of methods in a file (0 to 60)]
Even when models correctly predict buggy packages or files, most of the code in those modules is not buggy.
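A small worked example with made-up sizes illustrates the point: if a file's only buggy method is small, a correct file-level prediction still asks the developer to read mostly non-buggy code. The method sizes below are hypothetical, not data from the study.

```python
from typing import List, Tuple

# Hypothetical file: (method LOC, is the method buggy?). Numbers are illustrative only.
FILE_METHODS: List[Tuple[int, bool]] = [(20, True), (120, False), (60, False), (200, False)]


def buggy_share(methods: List[Tuple[int, bool]]) -> float:
    """Fraction of the inspected LOC that belongs to buggy methods."""
    total = sum(loc for loc, _ in methods)
    buggy = sum(loc for loc, is_buggy in methods if is_buggy)
    return buggy / total


if __name__ == "__main__":
    # Correct file-level prediction: the whole 400-LOC file must be inspected.
    print(buggy_share(FILE_METHODS))  # 0.05
    # Correct method-level prediction: only the buggy method is flagged.
    print(buggy_share([m for m in FILE_METHODS if m[1]]))  # 1.0
```

Both predictions find the same bug, but the file-level one costs 20 times as many inspected lines in this example.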
21. Observations from Correlation Analysis
Are there differences between method-level and package-/file-level prediction models?
Same
Large changes tend to be buggy
Frequent changes tend to be buggy
Different
Bugs do not occur repeatedly
Organizational metrics may not contribute to method-level prediction
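The slide does not name the correlation measure used. A minimal sketch, assuming Spearman's rank correlation between a metric (here a hypothetical ChgNum column) and the number of bugs per method, implemented with the standard library only: rank both columns (ties get average ranks), then take the Pearson correlation of the ranks.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    chg_num = [1, 4, 2, 8, 5]  # hypothetical ChgNum per method
    bug_num = [0, 1, 0, 3, 1]  # hypothetical bug count per method
    print(round(spearman(chg_num, bug_num), 3))  # 0.949
```

A rank-based measure suits these metrics, which tend to be skewed counts rather than normally distributed values.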
22. Threats to Validity
Targets are limited to open-source projects written in Java
Buggy modules were identified without manual inspection
Effort-based evaluation may not reflect actual effort
23. Fine-Grained Study Is Big Data Analysis
Need scalable techniques for
preparing fine-grained data (making Historage)
analyzing histories (collecting metrics)
building prediction models
[Figure: number of modules in one snapshot, files vs. methods, for Xpand, Ant, ECF, and Wicket (axis 0 to 30,000)]
24. Conclusions
Summary
Method-level bug prediction with well-known historical metrics
Future work
Empirical studies of actual effort using method-level prediction
More metrics and more projects (including industrial projects)