Icsm2010 kamei

  1. Revisiting Common Bug Prediction Findings Using Effort-Aware Models. Yasutaka Kamei, Shinsuke Matsumoto, Akito Monden, Ken-ichi Matsumoto, Bram Adams, Ahmed E. Hassan
  2. Bugs are Everywhere
  3. These Bugs … • Have expensive consequences for a company • Affect its reputation
  4. OCTOPUS “Paul”: correctly predicted the outcome of all 8 World Cup 2010 matches he was asked about (Germany's 7 games and the final)
  5. Measure the Source Code… • Complexity • Cohesion • Churn • . . . • # Previous bugs
  6. …and Build a Prediction Model • Statistical or • Machine learning techniques
  7-10. Predict a Bug • # BUGS: 2 • # BUGS: 7 • # BUGS: 0
  11-14. Do We Consider Effort? • File A: # BUGS: 5, Effort: 1 • File B: # BUGS: 6, Effort: 20
  15-19. Effort-aware Models* • File A: # BUGS: 5, Effort: 1, so 5 = 5 / 1 • File B: # BUGS: 6, Effort: 20, so 0.30 = 6 / 20 • Effort is measured in SLOC. * T. Mende and R. Koschke, “Effort-aware defect prediction models,” in Proc. of European Conference on Software Maintenance and Reengineering (CSMR’10), 2010, pp. 109–118.
  20-21. Major Findings in Prediction Studies • RQ1: Process metrics are better defect predictors than product metrics [Graves2000TSE], [Nagappan2006ICSE], [Moser2008ICSE], … • RQ2: Package-level prediction has higher precision and recall than file-level prediction [Schroter2006ISESE], [Zimmermann2007PROMISE], … • …
  22-23. Case Study • Platform (5,000 files) • JDT (3,000 files) • PDE (1,000 files) • Ver. 3.0, 3.1 and 3.2
  24-26. Cross-release Prediction (time axis: Platform 3.1, then Platform 3.2) • Training data, Platform 3.1: Measure Metrics, Bugs → Build a prediction model • Testing data, Platform 3.2: Measure Metrics, Bugs → Predict a bug (BUGS: 5, BUGS: 7, BUGS: 0)
  27-28. Cumulative Lift Chart: All modules are ordered by decreasing predicted RR(x). [Figure: cumulative lift chart; x-axis: KSLOC (= Effort); y-axis: #Bugs; annotation: 20% of the effort, 54% of the bugs]
  29-30. Research Questions • RQ1: Are process metrics still more effective than product metrics in effort-aware models? • RQ2: Are package-level predictions still more effective than file-level predictions?
  31. RQ1: Process vs. Product Metrics: Compare prediction models based on process and product metrics at the file-level
  32. Process Metrics
  33. Product Metrics
  34-35. Model Building Approach • Linear model • Regression tree • Random forest
  36-37. Process Metrics are still more effective than Product Metrics. [Figure: cumulative lift chart; x-axis: KSLOC (= Effort); y-axis: #Bugs] At 20% of the effort: Process 74%, Product 29%, a ratio of 2.6 (= 74/29)
  38. Impact of Process and Product Metrics: The top five metrics are all process metrics. [Figure: random forest (rf1) variable importance, IncNodePurity from 0.00 to 0.20. Axis labels in extraction order, ending with the most important: NSM, NSC, Refactorings, NORM, LCOM, SIX, DIT, NSF, NOF, NOM, WMC, VG, PAR, MLOC, NBD, SLOC, LOCAdded, Codechurn, LOCDeleted, Age, BugFixes, Revisions. Process Metrics: Revisions, BugFixes, LOCDeleted, Age, Codechurn, LOCAdded, Refactorings. All others are Product Metrics.]
  39. Research Questions • RQ1: Are process metrics still more effective than product metrics in effort-aware models? YES • RQ2: Are package-level predictions still more effective than file-level predictions?
  40. RQ2: Package-level vs. File-level
  41. Model Building Approach • B1 Package-level metrics • B2 Lift file-level metrics to package-level • B3 Lift file-level predictions to package-level
  42. B1 Package-level Metrics (RQ2: Model Building Approach): Martin metrics → Build a model → Package-level predictions
  43. Martin’s Package Metrics
  44. B1 Package-level Metrics (RQ2: Model Building Approach): Martin metrics → Build a model → Package-level predictions
  45. Model Building Approach • B1 Package-level metrics • B2 Lift file-level metrics to package-level • B3 Lift file-level predictions to package-level
  46-50. B2 Lift File-level Metrics to Package-level (RQ2: Model Building Approach): File-level metrics → Lift Metrics → Package-level metrics → Build a model → Package-level predictions. Example: Package A contains File a, File b and File c with Complexity 9, 4 and 5; the lifted package-level value is 6 = (9 + 4 + 5) / 3.
  51. Model Building Approach • B1 Package-level metrics • B2 Lift file-level metrics to package-level • B3 Lift file-level predictions to package-level
  52-57. B3 Lift File-level Predictions to Package-level (RQ2: Model Building Approach): File-level metrics → Build a model → File-level predictions → Lift Predictions → Package-level predictions. Example: Package A contains File a, File b and File c with #bugs 6, 3 and 2 and KSLOC 1.0, 0.5 and 1.5; the slide combines the sums 6+3+2 and 1.0+0.5+1.5 into a package-level value shown as 2.9.
  58-60. Summary of Model Building Approaches (RQ2: Model Building Approach) • B1 Martin Metrics: Package-level metrics → Build a model at Package-level → Package-level predictions • B2 LiftUp(Input): File-level metrics → Lift Metrics → Build a model at Package-level → Package-level predictions • B3 LiftUp(Pred): File-level metrics → Build a model at File-level → File-level predictions → Lift Predictions → Package-level predictions
  61-63. Lifting Predictions yields the Best Performance at Package-level. [Figure: cumulative lift chart; x-axis: KSLOC (= Effort); y-axis: #Bugs] At 20% of the effort: B3 LiftUp(Prediction) 62%, B2 LiftUp(Input) 57%, B1 Martin Metrics 19%
  64. Impact of Martin Metrics. [Figure: random forest (rf3) variable importance, from 0.000 to 0.012. Axis labels in extraction order: Refactorings, A, NA, LCOM, Ca, NSC, NC, NSM, I, NOF, WMC, NORM, D, PAR, VG, DIT, NSF, NOM, SIX, Ce, LOCAdded, Codechurn, LOCDeleted, Age, MLOC, NBD, BugFixes, SLOC, Revisions. Martin Metrics: Ce, D, I, NC, Ca, NA, A. Process Metrics: Refactorings, Revisions, BugFixes, LOCDeleted, Codechurn, LOCAdded, Age. All others are Product Metrics.]
  65. File-level Predictions are more effective than Package-level. [Figure: cumulative lift chart; x-axis: KSLOC (= Effort); y-axis: #Bugs] At 20% of the effort: File 74%, Package B3 LiftUp(Pred.) 62%
  66. Research Questions • RQ1: Are process metrics still more effective than product metrics in effort-aware models? YES • RQ2: Are package-level predictions still more effective than file-level predictions? NO
  67-68. Why is RQ2 not Supported? The larger the package, the more likely a bug is introduced. • Package-level: # BUGS: 8, SLOC: 20 • File-level: # BUGS: 2, SLOC: 0.5
  73. Reference
  74. Studied Projects
  75. SZZ Algorithm
  76. Example of Counting the Number of Bugs: two timelines over the v3.0, v3.1 and v3.2 releases, one for bug A and one for bug B, each marking the bug introduction and the bug fix
  77. Martin’s Package Metrics
  78. Product Metrics
  79. Used Metrics: Process metrics, Product metrics, Martin’s Package Metrics
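
The effort-aware ranking on slides 13 to 19 and the cumulative lift chart on slides 27 and 28 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Module`, `relative_risk`, `rank_by_relative_risk` and `bugs_found_within_budget` are hypothetical. Reading RR(x) as "bugs divided by effort" follows the slide's arithmetic (5 = 5 / 1, 0.30 = 6 / 20). Two further assumptions are made only for illustration: the predicted bug counts equal the actual ones, and a module counts toward the budget only if it fits in full.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    predicted_bugs: float  # model output, used only for ranking
    actual_bugs: float     # ground truth, used only for evaluation
    effort: float          # inspection effort (> 0); the slides use SLOC/KSLOC


def relative_risk(module: Module) -> float:
    """RR(x) = predicted #bugs / effort, as in the slide's 5 = 5 / 1."""
    return module.predicted_bugs / module.effort


def rank_by_relative_risk(modules):
    """Order modules by decreasing predicted RR(x)."""
    return sorted(modules, key=relative_risk, reverse=True)


def bugs_found_within_budget(modules, effort_fraction):
    """Share of actual bugs in the modules inspected within the effort budget.

    Assumption: a module is counted only if it fits into the budget as a whole.
    """
    total_effort = sum(m.effort for m in modules)
    total_bugs = sum(m.actual_bugs for m in modules)
    budget = effort_fraction * total_effort
    spent = 0.0
    found = 0.0
    for m in rank_by_relative_risk(modules):
        if spent + m.effort > budget:
            break
        spent += m.effort
        found += m.actual_bugs
    return found / total_bugs if total_bugs else 0.0


# The two files from the "Do We Consider Effort?" slides.
files = [
    Module("File B", predicted_bugs=6, actual_bugs=6, effort=20),
    Module("File A", predicted_bugs=5, actual_bugs=5, effort=1),
]
ranked = rank_by_relative_risk(files)
share = bugs_found_within_budget(files, effort_fraction=0.20)
print([(m.name, relative_risk(m)) for m in ranked])
print(f"{share:.0%} of the bugs within 20% of the effort")
```

File B has more bugs in absolute terms, but File A comes first because it yields more bugs per unit of effort. A ranking by raw bug count would miss this, which is the point of the effort-aware view.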
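
Approach B2 (slides 46 to 50) lifts file-level metrics to the package level before a model is built. The sketch below reproduces only the slide's worked example, where the complexities 9, 4 and 5 become 6 = (9 + 4 + 5) / 3. Averaging is taken from that example; whether the study also uses other aggregations is not stated in the transcript. The function name `lift_metrics` and the dictionary layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def lift_metrics(file_metrics, package_of):
    """Average every file-level metric over the files of each package.

    file_metrics: {file name: {metric name: value}}
    package_of:   {file name: package name}
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for file_name, metrics in file_metrics.items():
        for metric, value in metrics.items():
            grouped[package_of[file_name]][metric].append(value)
    return {
        package: {metric: mean(values) for metric, values in per_metric.items()}
        for package, per_metric in grouped.items()
    }


# Package A with its three files, as on the slide.
file_metrics = {
    "File a": {"complexity": 9},
    "File b": {"complexity": 4},
    "File c": {"complexity": 5},
}
package_of = {file_name: "Package A" for file_name in file_metrics}
package_metrics = lift_metrics(file_metrics, package_of)
print(package_metrics)
```

The lifted values would then serve as inputs to a package-level model. In approach B3 the model is instead built at file level and only its predictions are lifted.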
