Personalized Defect Prediction

Tian's ASE 2013 Presentation

Personalized Defect Prediction
Tian Jiang (University of Waterloo)
Lin Tan (University of Waterloo)
Sunghun Kim (Hong Kong University of Science and Technology)

How to Find Bugs?
• Code Review
• Testing
• Static Analysis
• Dynamic Analysis
• Verification
• Defect Prediction

Defect Prediction
Software History → Predictor → Future Defect

Developers are Different
[Chart: % of buggy changes (y-axis, 0-80) for developers A, B, C, D and the average, with series Modulo %, FOR, Bitwise OR, and CONTINUE; Linux kernel, 2005-2010]
Personalized models can improve performance.

Successes in Other Fields
• Google personalized search
• Facebook personalized ad placement

Contributions
• Personalized Change Classification (PCC)
  ✦ One model for each developer
• Confidence-based Hybrid PCC (PCC+)
  ✦ Picks the prediction with the highest confidence
• Evaluation on six C and Java projects
  ✦ Finds up to 155 more bugs by inspecting 20% of LOC
  ✦ Improves F1 by up to 0.08

What is a Change?
Commit: 09a02f...
Author: John Smith
Message: I submitted some code.
  file1.c  + + + -   (Change 1)
  file2.c  + -       (Change 2)
  file3.c  + + -     (Change 3)
A commit is split into changes, one per modified file.
Change-level prediction: inspect less code to locate a bug.

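A minimal Python sketch of splitting one commit into file-level changes, assuming the commit is available as a git-style unified diff in which every file section starts with a `diff --git` line (as `git show` prints by default). The `Change` class and `split_commit` function are hypothetical names for illustration, not the authors' tooling; the commit id and author mirror the slide.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """One file-level change within a commit (hypothetical representation)."""
    commit: str
    author: str
    path: str
    added: int = 0
    deleted: int = 0


def split_commit(commit_id: str, author: str, diff_text: str) -> list[Change]:
    """Split a git-style unified diff into one Change per modified file."""
    changes: list[Change] = []
    current = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            current, in_hunk = None, False  # a new file section begins
        elif not in_hunk and line.startswith("+++ "):
            path = line[4:]
            # Drop git's "b/" prefix from the new-file path.
            current = Change(commit_id, author, path[2:] if path.startswith("b/") else path)
            changes.append(current)
        elif line.startswith("@@"):
            in_hunk = True  # lines after a hunk header are diff content
        elif in_hunk and current is not None:
            if line.startswith("+"):
                current.added += 1
            elif line.startswith("-"):
                current.deleted += 1
    return changes


SAMPLE_DIFF = """\
diff --git a/file1.c b/file1.c
--- a/file1.c
+++ b/file1.c
@@ -1 +1,3 @@
+int a;
+int b;
+int c;
-int d;
diff --git a/file2.c b/file2.c
--- a/file2.c
+++ b/file2.c
@@ -5 +5 @@
+x = 1;
-x = 0;
"""

for change in split_commit("09a02f", "John Smith", SAMPLE_DIFF):
    print(change.path, change.added, change.deleted)
# file1.c 3 1
# file2.c 1 1
```

Each returned `Change` can then be labeled and classified on its own, which is what lets a change-level prediction point a reviewer at one file's edits instead of the whole commit.
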
Change Classification (CC)
Training phase:
1. Label changes in the software history as clean or buggy (training instances).
2. Extract features.
3. Build a prediction model with a classification algorithm.
Prediction phase:
4. Predict future instances with the model.

Label Clean or Buggy [Sliwerski et al. ’05]
Revision history:
  Buggy change
    Commit: 7a3bc...
    Message: new feature
    fileA.c
      +...
      +if (i < 128)
      +...
  Bug-fixing change
    Commit: 1da57...
    Message: I fixed a bug
    fileA.c
      -if (i < 128)
      +if (i <= 128)
The buggy change is fixed by a later change; running git blame on the lines the fix modifies leads back to the buggy change.
A bug-fixing change contains the keyword “fix” or the ID of a manually verified bug report [Herzig et al. ’13].

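The labeling step can be sketched in Python as below. It is a simplified take on the Sliwerski et al. idea, not a faithful reimplementation: it assumes the history is already parsed into (commit id, message, removed lines) tuples, and that a `blame` callback answers what `git blame -L n,n <fix>^ -- path` would report. The names `is_bug_fix` and `label_buggy_changes`, the `#1234` bug-ID syntax, and the line position in the example are hypothetical; the abbreviated commit ids come from the slide.

```python
import re
from typing import Callable, Iterable

# "fix", "fixed", "fixes", ... at the start of a word (case-insensitive).
FIX_KEYWORD = re.compile(r"\bfix", re.IGNORECASE)
# Hypothetical bug-report reference syntax such as "#1234"; real projects differ.
BUG_ID = re.compile(r"#(\d+)")


def is_bug_fix(message: str, verified_bug_ids: frozenset = frozenset()) -> bool:
    """Bug-fixing if the message says 'fix' or cites a manually verified bug report."""
    if FIX_KEYWORD.search(message):
        return True
    return any(int(bug_id) in verified_bug_ids for bug_id in BUG_ID.findall(message))


def label_buggy_changes(
    commits: Iterable[tuple[str, str, list[tuple[str, int]]]],
    blame: Callable[[str, str, int], str],
    verified_bug_ids: frozenset = frozenset(),
) -> set[tuple[str, str]]:
    """Return the (commit_id, path) changes labeled buggy; all others are clean.

    commits: (commit_id, message, removed_lines), where removed_lines are the
        (path, line_no) pairs of the parent revision that the commit deletes or edits.
    blame: (fix_commit_id, path, line_no) -> id of the commit that last touched
        that line before the fix.
    """
    buggy = set()
    for commit_id, message, removed_lines in commits:
        if not is_bug_fix(message, verified_bug_ids):
            continue
        for path, line_no in removed_lines:
            # The change that introduced a line the fix had to touch is labeled buggy.
            buggy.add((blame(commit_id, path, line_no), path))
    return buggy


# The slide's example: 7a3bc adds `if (i < 128)`, 1da57 rewrites it to `<=`.
history = [
    ("7a3bc", "new feature", []),
    ("1da57", "I fixed a bug", [("fileA.c", 2)]),  # line 2 is a made-up position
]
blame_answers = {("1da57", "fileA.c", 2): "7a3bc"}  # stand-in for git blame output
print(label_buggy_changes(history, lambda c, p, n: blame_answers[(c, p, n)]))
# {('7a3bc', 'fileA.c')}
```

A keyword match alone can mislabel commits (for example, a message that mentions a "fixture"), which is one reason the slide also accepts IDs of manually verified bug reports.
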
Three Types of Features
• Metadata
• Bag-of-Words
• Characteristic Vector

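As a rough illustration of the first two feature types, the Python sketch below turns a change into a feature dictionary. The metadata fields (added and deleted line counts), the identifier-only tokenizer, and the `meta:`/`bow:` key prefixes are assumptions chosen for brevity; the slides do not list the exact metadata or tokenization used.

```python
import re
from collections import Counter

# Identifier-like tokens only; numbers and punctuation are dropped (a simplification).
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def bag_of_words(text: str) -> Counter:
    """Count lower-cased tokens in a commit message or diff text."""
    return Counter(token.lower() for token in TOKEN.findall(text))


def change_features(message: str, diff_text: str, added: int, deleted: int) -> dict:
    """Combine hypothetical metadata features with bag-of-words counts."""
    features = {"meta:added": added, "meta:deleted": deleted}
    for token, count in bag_of_words(message + "\n" + diff_text).items():
        features["bow:" + token] = count
    return features


print(change_features("I fixed a bug", "-if (i < 128)\n+if (i <= 128)", 1, 1))
```

Prefixing the keys keeps the feature families apart, so a token such as `added` appearing in a diff cannot collide with the `added` metadata count.
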
Characteristic Vector
Count Abstract Syntax Tree (AST) nodes:

    for (...; ...; ...) {
        for (...; ...; ...) {
            if (...) ...;
        }
    }

    for: 2
    if: 1
    while: 0
    ...

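The counting idea can be tried with Python's built-in `ast` module, used here as a stand-in for a C or Java parser (an assumption; the slide's example is C). The Python loop nest below mirrors the slide's two nested `for` loops around one `if`, and `characteristic_vector` is an illustrative name.

```python
import ast
from collections import Counter


def characteristic_vector(source: str, node_types=("For", "If", "While")) -> dict:
    """Count the AST nodes of each listed type in `source`."""
    counts = Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))
    # Counter returns 0 for node types that never occur.
    return {name: counts[name] for name in node_types}


# Python analogue of the slide's C snippet: two nested loops around one `if`.
SAMPLE = """\
for i in range(3):
    for j in range(3):
        if i == j:
            pass
"""

print(characteristic_vector(SAMPLE))  # {'For': 2, 'If': 1, 'While': 0}
```

The result matches the slide's vector (for: 2, if: 1, while: 0).
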
CC: Training
Training Instances → Model

CC: Prediction
Unlabeled Changes → Model → Predicted Changes

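A compact Python sketch of CC's train-then-predict flow: one model is fit on all labeled changes and then scores every unlabeled change. `RateModel` is a hypothetical toy learner (Laplace-smoothed per-feature bug rates, averaged over a change's features), used only so the sketch runs without third-party libraries; the evaluation in these slides uses decision trees, naive Bayes, and logistic regression instead. Features are assumed to be sets of strings, and the training data is made up.

```python
from collections import defaultdict


class RateModel:
    """Hypothetical toy learner: P(buggy) is the mean, over a change's known
    features, of the Laplace-smoothed bug rate of that feature in training."""

    def __init__(self):
        self.buggy = defaultdict(int)  # feature -> number of buggy changes with it
        self.total = defaultdict(int)  # feature -> number of changes with it
        self.prior = 0.5               # smoothed overall bug rate

    def fit(self, instances):
        """instances: list of (feature_set, is_buggy) training pairs."""
        for features, is_buggy in instances:
            for f in features:
                self.total[f] += 1
                self.buggy[f] += int(is_buggy)
        n_buggy = sum(int(b) for _, b in instances)
        self.prior = (n_buggy + 1) / (len(instances) + 2)
        return self

    def predict_proba(self, features):
        """Bugginess: estimated probability that the change is buggy."""
        known = [f for f in features if f in self.total]
        if not known:
            return self.prior  # no familiar features: fall back to the base rate
        rates = [(self.buggy[f] + 1) / (self.total[f] + 2) for f in known]
        return sum(rates) / len(rates)

    def predict(self, features, threshold=0.5):
        return self.predict_proba(features) >= threshold


# Training phase: labeled changes from the history (toy data).
history = [
    ({"for", "modulo"}, True),
    ({"for"}, False),
    ({"if"}, False),
    ({"modulo"}, True),
]
cc = RateModel().fit(history)

# Prediction phase: the single model scores every unlabeled change.
print(cc.predict_proba({"modulo"}))  # 0.75
print(cc.predict({"if"}))            # False
```
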
PCC: Training
1. Group the training instances (changes) by developer: Dev 1, Dev 2, Dev 3.
2. Train one model per developer: Model 1, Model 2, Model 3.

PCC: Prediction
Choose a model by developer: a change by Dev 2 is predicted with Model 2.

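PCC's grouping and model selection can be sketched in Python as follows. The learner is a hypothetical toy model (smoothed per-feature bug rates) standing in for the decision trees and other classifiers in the slides, and `train_pcc` and `predict_pcc` are illustrative names. The slides do not say what happens for a developer with no history, so this sketch simply raises `KeyError` in that case.

```python
from collections import defaultdict


class RateModel:
    """Hypothetical toy learner: mean Laplace-smoothed bug rate of a change's
    known features, falling back to the smoothed base rate."""

    def __init__(self):
        self.buggy, self.total, self.prior = defaultdict(int), defaultdict(int), 0.5

    def fit(self, instances):
        for features, is_buggy in instances:
            for f in features:
                self.total[f] += 1
                self.buggy[f] += int(is_buggy)
        self.prior = (sum(int(b) for _, b in instances) + 1) / (len(instances) + 2)
        return self

    def predict(self, features, threshold=0.5):
        known = [f for f in features if f in self.total]
        if not known:
            return self.prior >= threshold
        rate = sum((self.buggy[f] + 1) / (self.total[f] + 2) for f in known) / len(known)
        return rate >= threshold


def train_pcc(instances, make_model=RateModel):
    """instances: (developer, feature_set, is_buggy). Returns {developer: model}."""
    by_developer = defaultdict(list)
    for developer, features, is_buggy in instances:
        by_developer[developer].append((features, is_buggy))  # group by developer
    return {dev: make_model().fit(rows) for dev, rows in by_developer.items()}


def predict_pcc(models, developer, features):
    """Pick the model of the change's author (KeyError if the author is unseen)."""
    return models[developer].predict(features)


training = [
    ("A", {"modulo"}, True), ("A", {"if"}, False),
    ("B", {"modulo"}, False), ("B", {"if"}, True),
]
pcc = train_pcc(training)
print(predict_pcc(pcc, "A", {"modulo"}))  # True
print(predict_pcc(pcc, "B", {"modulo"}))  # False
```

In this toy data the same `modulo` feature is risky for developer A but not for developer B; a single pooled model of the same kind would score it 0.5 for both, which is the kind of difference the "Developers are Different" slide motivates.
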
PCC+: Prediction
Feed changes to all models (CC and PCC); a combiner produces the final prediction.

Confidence Measure
• Bugginess
  ✦ Probability of a change being buggy
• Confidence Measure
  ✦ Comparable measure of confidence
• Select the prediction with the highest confidence.

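The combiner can be sketched in Python as below. The slides only say the confidence measure must be comparable across models; this sketch assumes confidence = max(p, 1 - p), where p is a model's bugginess, and that a change is predicted buggy when p >= 0.5. Both choices, and the names `confidence` and `pcc_plus`, are illustrative assumptions rather than the exact PCC+ definition.

```python
def confidence(bugginess: float) -> float:
    """Assumed confidence measure: probability of the predicted class,
    max(p, 1 - p), so 0.5 means 'no idea' and 1.0 means 'certain'."""
    return max(bugginess, 1.0 - bugginess)


def pcc_plus(bugginess_by_model: dict[str, float]) -> tuple[str, bool]:
    """Combiner: every model scored the same change; keep the most confident one.

    bugginess_by_model maps a model name (e.g. "CC", "PCC") to its P(buggy).
    Returns (name of the chosen model, predicted buggy?).
    """
    name, p = max(bugginess_by_model.items(), key=lambda item: confidence(item[1]))
    return name, p >= 0.5


print(pcc_plus({"CC": 0.55, "PCC": 0.10}))  # ('PCC', False)
print(pcc_plus({"CC": 0.80, "PCC": 0.60}))  # ('CC', True)
```

In the first call the PCC model is far more certain that the change is clean (confidence 0.9 versus 0.55), so its prediction wins even though CC leans slightly toward buggy.
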
Research Questions
• RQ1: Do PCC and PCC+ outperform CC?
• RQ2: Does PCC outperform CC in other setups?
  ✦ Classification algorithms
  ✦ Sizes of training sets

Two Metrics
• F1-Score
  ✦ Harmonic mean of precision and recall
• Cost Effectiveness
  ✦ Relevant in cost-sensitive scenarios
  ✦ NofB20: number of bugs discovered by inspecting the top 20% of lines of code

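For reference, F1 for the buggy class follows from true positives (tp), false positives (fp), and false negatives (fn): precision = tp / (tp + fp), recall = tp / (tp + fn), and F1 = 2 · precision · recall / (precision + recall). A small Python version, with `f1_score` as an illustrative name and made-up labels:

```python
def f1_score(predicted: list[bool], actual: list[bool]) -> float:
    """F1 of the 'buggy' class: harmonic mean of precision and recall."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # predicted buggy, truly buggy
    fp = sum(1 for p, a in pairs if p and not a)  # predicted buggy, actually clean
    fn = sum(1 for p, a in pairs if a and not p)  # buggy change that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


predicted = [True, True, True, True, False, False, False]
actual = [True, True, True, False, True, True, False]
print(round(f1_score(predicted, actual), 3))  # 0.667
```

Here tp = 3, fp = 1, and fn = 2, so precision is 0.75, recall is 0.6, and F1 is 2/3.
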
Cost Effectiveness
Cumulative LOC   Changes    LOC   True bug
10%              Buggy #1   10    yes
15%              Buggy #2   5
19%              Buggy #3   4     yes
27%              Buggy #4   8     yes
                 Buggy #5   12
                 ...        ...
Total LOC: 100
NofB20 = 3

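NofB20 can be computed as in the Python sketch below, assuming changes are inspected in the order the predictor ranks them. The slide's example only yields NofB20 = 3 if Buggy #4, which starts at 19% and ends at 27% of the LOC, is still inspected, so this sketch assumes the change that crosses the 20% budget counts. Buggy #5 is unmarked on the slide and the remaining rows are elided, so a hypothetical 61-LOC clean entry stands in for them to bring the total to 100 LOC; `nofb` is an illustrative name.

```python
def nofb(ranked_changes: list[tuple[int, bool]], budget: float = 0.20) -> int:
    """Number of true bugs found when inspecting ranked changes until
    `budget` (a fraction of all LOC) has been reviewed.

    ranked_changes: (loc, is_true_bug) pairs, highest-ranked first.
    Assumption: the change that crosses the budget is still inspected.
    """
    total_loc = sum(loc for loc, _ in ranked_changes)
    inspected_loc = 0
    bugs = 0
    for loc, is_true_bug in ranked_changes:
        if inspected_loc >= budget * total_loc:
            break  # budget used up before this change starts
        inspected_loc += loc
        bugs += int(is_true_bug)
    return bugs


# Rows from the slide; the last entry is a hypothetical stand-in for the
# elided rows so that the total is 100 LOC.
example = [(10, True), (5, False), (4, True), (8, True), (12, False), (61, False)]
print(nofb(example))  # 3
```

Because the metric charges by lines inspected, a ranking that puts small, truly buggy changes first scores higher than one that finds the same bugs inside large changes.
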
Test Subjects
Project        Language   LOC    # of Changes
Linux kernel   C          7.3M   429K
PostgreSQL     C          289K   89K
Xorg           C          1.1M   46K
Eclipse        Java       1.5M   73K
Lucene*        Java       828K   76K
Jackrabbit*    Java       589K   61K
* With manually labeled bug report data [Herzig et al. ’13]

PCC/PCC+ vs. CC (Decision Tree, NofB20)
Project       CC    PCC   Delta   PCC+   Delta
Linux         160   179   +19     172    +12
PostgreSQL    55    210   +155    175    +120
Xorg          96    159   +63     161    +65
Eclipse       116   207   +91     200    +84
Lucene        177   254   +77     257    +80
Jackrabbit    411   449   +38     459    +48
Average       -     -     +74     -      +68
Statistically significant deltas were marked in bold on the slide.

PCC/PCC+ outperform CC.

Different Classification Algorithms (NofB20)
Project       Naive Bayes           Logistic Regression
              CC    PCC   Delta     CC    PCC   Delta
Linux         138   147   +9        102   137   +35
PostgreSQL    89    113   +24       46    56    +10
Xorg          84    101   +17       52    29    -23
Eclipse       65    108   +43       54    55    +1
Lucene        152   139   -13       30    200   +170
Jackrabbit    420   414   -6        261   370   +109
Average       -     -     +12       -     -     +59
Statistically significant deltas were marked in bold on the slide.

Different Training Set Sizes
[Chart: NofB20 (y-axis, 100-300) of PCC and CC versus training set size per developer (x-axis, 10-90)]

The improvement holds in other setups.

Related Work
• Kim et al., Classifying software changes: Clean or buggy?, TSE ’08
• Bettenburg et al., Think locally, act globally: Improving defect and effort prediction models, MSR ’12

Conclusions & Future Work
• PCC and PCC+ improve prediction performance.
• The improvement holds in other setups.
• The personalized approach can be applied to other fields:
  ✦ Recommendation systems
  ✦ Vulnerability prediction
  ✦ Top crashes prediction