DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis

This is my presentation at ICSM 2013.

  1. DRONE: Predicting Priority of Reported Bugs by Multi-Factor Analysis. Yuan Tian (1), David Lo (1), Chengnian Sun (2). (1) Singapore Management University, (2) National University of Singapore.
  2. What is priority and when is it assigned? Bug tracking systems allow developers to prioritize which bugs are to be fixed first.
     - Manual process
     - Depends on other bugs
     - Time consuming
     A bug triager faces about 300 new reports to triage daily: validity check, duplicate check, priority assignment, and developer assignment (New -> Assigned).
  3. Priority vs. Severity. "Severity is assigned by customers [users] while priority is provided by developers . . . customer [user] reported severity does impact the developer when they assign a priority level to a bug report, but it's not the only consideration. For example, it may be a critical issue for a particular reporter that a bug is fixed but it still may not be the right thing for the eclipse team to fix." (Eclipse PMC member.) Example importance field: P5 (lowest priority level), major (high severity).
  4. 4. q  Background q  Approach Overall Framework Features Classification Module q  Experiment Dataset Research Questions Results q Conclusion Outline 4
  5. 5. q  Background q  Approach Overall Framework Features Classification Module q  Experiment Dataset Research Questions Results q Conclusion Outline 5
  6. Bug Report. A report consists of: (1) summary, (2) description, (3) product, (4) component, (5) author, (6) severity, (7) priority, plus time-related info. (A sketch of these fields follows below.)
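For concreteness, these fields can be held in a small record type. This is an illustrative sketch, not the DRONE implementation; the class name, field types, and the numeric encodings are assumptions.

```python
# Hypothetical container for the bug-report fields listed on this slide;
# field types and the numeric encodings are assumptions, not DRONE's own.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class BugReport:
    summary: str
    description: str
    product: str
    component: str
    author: str
    severity: int          # higher = more severe (encoding assumed)
    priority: int          # P1..P5 encoded as 1..5; the label to predict
    reported_at: datetime  # time-related info
```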
  7. 7. q  Background q  Approach Overall Framework Features Classification Module q  Experiment Dataset Research Questions Results q Conclusion Outline 7
  8. Overall Framework. Training phase: the feature extraction module derives temporal, textual, author, severity, product, and related-report features from the training reports, and the model builder learns a model. Testing phase: the same features are extracted from the testing reports and the model is applied to predict a priority.
  9. Temporal factor:
     - TMP1: number of bugs reported within 7 days before the reporting of BR.
     - TMP2: number of bugs reported with the same severity within 7 days before the reporting of BR.
     - TMP3: number of bugs reported with the same or higher severity within 7 days before the reporting of BR.
     - TMP4-6, TMP7-9, TMP10-12: the same as TMP1-3 except the time window is 1, 3, and 30 days respectively.
     Textual factor: TXT1-n, stemmed words from the description field of BR excluding stop words.
     Severity factor: SEV, BR's severity field.
     (A sketch of the temporal features follows below.)
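A minimal sketch of how the temporal features could be computed, assuming the hypothetical BugReport record above; the 1-, 3-, and 30-day variants only change the `days` argument.

```python
# Sketch of TMP1-TMP3 over a 7-day window; TMP4-12 reuse this with
# days=1, 3, and 30. `earlier_reports` holds reports filed before BR.
from datetime import timedelta

def temporal_features(br, earlier_reports, days=7):
    start = br.reported_at - timedelta(days=days)
    window = [r for r in earlier_reports
              if start <= r.reported_at < br.reported_at]
    tmp1 = len(window)                                     # TMP1: all bugs
    tmp2 = sum(r.severity == br.severity for r in window)  # TMP2: same severity
    tmp3 = sum(r.severity >= br.severity for r in window)  # TMP3: same or higher
    return tmp1, tmp2, tmp3
```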
  10. Author factor:
     - AUT1: mean priority of all bug reports made by the author of BR prior to the reporting of BR.
     - AUT2: median priority of all bug reports made by the author of BR prior to the reporting of BR.
     - AUT3: number of bug reports made by the author of BR prior to the reporting of BR.
     Related-reports factor [REP, Sun et al.]:
     - REP1: mean priority of the top-20 most similar bug reports to BR, as measured using REP, prior to the reporting of BR.
     - REP2: median priority of the top-20 most similar bug reports to BR, as measured using REP, prior to the reporting of BR.
     - REP3-4, REP5-6, REP7-8, REP9-10: the same as REP1-2 except only the top 10, 5, 3, and 1 most similar reports are considered, respectively.
     (See the sketch below.)
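A sketch of the author and related-report features under the same assumptions. `most_similar` stands in for the REP similarity measure of Sun et al., which is not reimplemented here (a simplified stand-in appears in the appendix); the zero defaults for authors with no history are also an assumption.

```python
# AUT1-AUT3 and REP1-REP2 sketches over the hypothetical BugReport record.
from statistics import mean, median

def author_features(br, earlier_reports):
    prior = [r.priority for r in earlier_reports
             if r.author == br.author and r.reported_at < br.reported_at]
    if not prior:
        return 0.0, 0.0, 0                                # assumed default
    return mean(prior), median(prior), len(prior)         # AUT1, AUT2, AUT3

def related_report_features(br, earlier_reports, most_similar, k=20):
    top_k = most_similar(br, earlier_reports, k)          # top-k by REP similarity
    prios = [r.priority for r in top_k]
    return mean(prios), median(prios)                     # REP1, REP2 (k=20)
```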
  11. Product factor:
     - PRO1: BR's product field (categorical feature).
     - PRO2: number of bug reports made for the same product as that of BR prior to the reporting of BR.
     - PRO3: number of bug reports made for the same product with the same severity as that of BR prior to the reporting of BR.
     - PRO4: number of bug reports made for the same product with the same or higher severity as that of BR prior to the reporting of BR.
     - PRO5: proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P1.
     - PRO6-9: the same as PRO5 except for priorities P2-P5 respectively.
     - PRO10: mean priority of bug reports made for the same product as that of BR prior to the reporting of BR.
     - PRO11: median priority of bug reports made for the same product as that of BR prior to the reporting of BR.
     - PRO12-22: the same as PRO1-11 except for the component field of BR.
     (A sketch of a few of these follows below.)
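A sketch of three representative product-factor features; the component variants (PRO12-22) repeat the same computations over the component field.

```python
# PRO2, PRO5, and PRO10 sketches; priority is assumed encoded 1..5, so
# "assigned priority P1" is r.priority == 1.
from statistics import mean

def product_features(br, earlier_reports):
    same = [r for r in earlier_reports
            if r.product == br.product and r.reported_at < br.reported_at]
    pro2 = len(same)                                                   # PRO2
    pro5 = sum(r.priority == 1 for r in same) / pro2 if pro2 else 0.0  # PRO5
    pro10 = mean(r.priority for r in same) if pro2 else 0.0            # PRO10
    return pro2, pro5, pro10
```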
  12. Overall Framework (repeated; the classifier module is detailed next).
  13. Classification module: GRAY, thresholding and linear regression to classify imbalanced data. Training phase, model building: a linear regression model maps feature values to real numbers (see the sketch below).
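A minimal sketch of the regression step. The slides do not name a library, so scikit-learn is assumed here, with P1..P5 treated as the real-valued targets 1.0..5.0.

```python
# Fit a linear model that maps feature vectors to real-valued scores.
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_regression(train_features, train_priorities):
    X = np.asarray(train_features, dtype=float)    # one row per bug report
    y = np.asarray(train_priorities, dtype=float)  # P1..P5 as 1.0..5.0 (assumed)
    return LinearRegression().fit(X, y)
```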
  14. Training phase, continued: the learned regression model is applied to a held-out validation set, and a thresholding step is run on the validation scores to determine threshold values.
  15. Training phase, continued: the thresholding process maps the real-valued scores to priority levels.
  16. Thresholding process, example. Validation reports receive regression scores (BR1: 1.2, BR2: 1.4, BR3: 3.1, BR4: 3.5, BR5: 2.1, BR6: 3.2, BR7: 3.4, BR8: 3.7, BR9: 1.3, BR10: 4.5) and are sorted by score: BR1 (1.2), BR9 (1.3), BR2 (1.4), BR5 (2.1), BR3 (3.1), BR6 (3.2), BR7 (3.4), BR4 (3.5), BR8 (3.7), BR10 (4.5). Thresholds 1.2, 1.4, 3.4, and 3.7 then split the sorted list into P1 through P5 (see the sketch below).
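A sketch of the mapping from a regression score to a priority level, treating the four thresholds as inclusive upper bounds of P1..P4, which reproduces the slide's example.

```python
# Map a real score to P1..P5 using four sorted thresholds.
import bisect

def score_to_priority(score, thresholds):
    # Index of the first threshold >= score, so a score equal to a
    # threshold stays in the lower class (inclusive upper bound).
    return bisect.bisect_left(thresholds, score) + 1

# BR5's score of 2.1 falls between 1.4 and 3.4, hence P3:
assert score_to_priority(2.1, [1.2, 1.4, 3.4, 3.7]) == 3
```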
  17. Testing phase: testing features are fed to the linear regression model, and the learned thresholds convert the resulting scores into predicted priorities.
  18. 18. q  Background q  Approach Overall Framework Features Classification Module q  Experiment Dataset Research Questions Results q Conclusion Outline 18
  19. 19. q Eclipse Project §  2001-10-10 to 2007-12-14, §  178,609 bug reports. Dataset DRONE TestingDRONE Training Model Building Validation REP- 4.50% 6.89% 85.45% 1.95% 1.21% P1 P2 P3 P4 P5 19
  20. Research questions and measurements:
     - Accuracy (precision, recall, F-measure), compared with SEVERISprio [Menzies & Marcus] and SEVERISprio+.
     - Efficiency (run time).
     - Top features (Fisher score).
  21. RQ1: How accurate? (Bar chart of per-class F-measure for P1-P5, comparing DRONE, SEVERISprio, and SEVERISprio+.) Findings: (1) the baselines predict everything as P3; (2) the average F-measure improves from 18.75% to 29.47%; (3) that is a relative improvement of 57.17%. (A sketch of the F-measure computation follows below.)
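For reference, a sketch of the per-class F-measure behind these numbers, in its standard form; the paper's exact averaging may differ.

```python
# Precision, recall, and F-measure for one priority class.
def f_measure(truth, pred, cls):
    tp = sum(t == cls and p == cls for t, p in zip(truth, pred))
    fp = sum(t != cls and p == cls for t, p in zip(truth, pred))
    fn = sum(t == cls and p != cls for t, p in zip(truth, pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def avg_f_measure(truth, pred, classes=(1, 2, 3, 4, 5)):
    return sum(f_measure(truth, pred, c) for c in classes) / len(classes)
```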
  22. RQ2: How efficient? Run time in seconds:

      Approach      | Feature extraction (train) | Model building | Feature extraction (test) | Model application
      SEVERISprio   | <0.01                      | 812.18         | <0.01                     | <0.01
      SEVERISprio+  | <0.01                      | 773.62         | <0.01                     | <0.01
      DRONE         | 0.01                       | 69.25          | <0.01                     | <0.01

      DRONE is much faster in model building.
  23. RQ3: What are the top features? The top 10, ranked by Fisher score: PRO5, PRO16, REP1, REP3, PRO18, PRO10, PRO21, PRO7, REP5, and the textual feature "1663".
     - 6 of the top-10 features belong to the product factor family.
     - 3 of the top-10 features come from the related-report factor family.
     - "1663" comes from the stack frame org.eclipse.ui.internal.Workbench.run(Workbench.java:1663), which appears in 15% of P5 reports.
     (A sketch of the Fisher score follows below.)
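A sketch of the Fisher score used for the ranking, in its standard form (the slides do not show the paper's exact variant): between-class scatter of the feature means over within-class variance.

```python
# Fisher score of one feature column against the class labels.
import numpy as np

def fisher_score(feature, labels):
    feature, labels = np.asarray(feature, float), np.asarray(labels)
    mu = feature.mean()
    num = den = 0.0
    for c in np.unique(labels):
        vals = feature[labels == c]
        num += len(vals) * (vals.mean() - mu) ** 2   # between-class scatter
        den += len(vals) * vals.var()                # within-class variance
    return num / den if den else 0.0
```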
  27. Conclusion (contact: yuan.tian.2012@smu.edu.sg):
     - Priority prediction is an ordinal, imbalanced classification problem; linear regression plus thresholding is one option.
     - DRONE improves the average F-measure of the baselines from 18.75% to 29.47%, a relative improvement of 57.17%.
     - Product factor features are the most discriminative, followed by related-report factor features.
  30. Acknowledgment: I acknowledge the support of Google and the ICSM organizers in the form of a Female Student Travel Grant, which enabled me to attend this conference. Thank you!
  32. APPENDIX
  34. Appendix: threshold initialization. Proportions of the priority levels in the validation data: P1 10%, P2 20%, P3 40%, P4 20%, P5 10%. After applying the linear regression model to the validation data, the scores (BR1: 1.2, BR2: 1.4, BR3: 3.1, BR4: 3.5, BR5: 2.1, BR6: 3.2, BR7: 3.4, BR8: 3.7, BR9: 1.3, BR10: 4.5) are sorted, and the initial thresholds 1.2, 1.4, 3.4, and 3.7 are placed so that the predicted classes match these proportions (see the sketch below).
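A sketch of this initialization, assuming each threshold is placed at the score that closes off that class's share of the sorted validation data; the exact rounding rule is an assumption.

```python
# Initialize the P1..P4 upper-bound thresholds from class proportions.
def init_thresholds(scores, proportions=(0.10, 0.20, 0.40, 0.20, 0.10)):
    ranked = sorted(scores)
    thresholds, cum = [], 0.0
    for p in proportions[:-1]:                 # P5 takes whatever remains
        cum += p
        idx = max(0, round(cum * len(ranked)) - 1)
        thresholds.append(ranked[idx])         # inclusive upper bound
    return thresholds

scores = [1.2, 1.4, 3.1, 3.5, 2.1, 3.2, 3.4, 3.7, 1.3, 4.5]
assert init_thresholds(scores) == [1.2, 1.4, 3.4, 3.7]  # the slide's example
```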
  35. Appendix: threshold tuning, step 1. Starting from the initialized thresholds (1.2, 1.4, 3.4, 3.7), one threshold is tuned at a time: candidate values (here 1.1 and 1.3 for the first threshold) are tried, and the F-measure is computed for each.
  36. Appendix: threshold tuning, step 2. If a candidate yields a higher F-measure, the threshold value is updated (here, the first threshold moves from 1.2 to 1.3).
  37. Appendix: threshold tuning, step 3. Threshold 1 is then fixed at 1.3, and the process moves on to tune the next threshold (see the sketch below).
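A hedged sketch of the greedy loop in slides 35-37: each threshold in turn is perturbed over candidate values, keeping any value that raises the average F-measure on the validation data. It reuses `score_to_priority` and `avg_f_measure` from the sketches above; the candidate set is an assumption (the slides only show 1.1 and 1.3).

```python
# Greedily tune thresholds one at a time against validation F-measure.
def tune_thresholds(thresholds, scores, truth, candidates):
    best = list(thresholds)
    best_f = avg_f_measure(truth, [score_to_priority(s, best) for s in scores])
    for i in range(len(best)):                 # earlier thresholds stay fixed
        for cand in candidates:
            trial = best[:i] + [cand] + best[i + 1:]
            if trial != sorted(trial):         # thresholds must stay ordered
                continue
            f = avg_f_measure(truth,
                              [score_to_priority(s, trial) for s in scores])
            if f > best_f:                     # keep the improvement
                best, best_f = trial, f
    return best
```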
  38. 38. q  Menzies and Marcus (ICSM 2008) ¡  Analyze reports in NASA ¡  Textual features +feature selection+ RIPPER q  Lamkanfi et al. (MSR 2010, CSMR 2011) ¡  Predict coarse-grained severity labels ¡  Severe vs. non-severe ¡  Analyze reports in open-source systems ¡  Compare and contrast various algorithms q  Tian et al.(WCRE 2012) ¡  Information retrieval + k nearest neighbour Previous Research Work: Severity Prediciton 38
  39. 39. q Tokenization Spliting document into tokens according to delimiters. q Stop-word Removal eg: are, is, I, he q Stemming eg: woking, works, worked->work Text Pre-processing 39
  40. 40. 40 q Textual Features §  Compute BM25Fext scores §  Feature1: Extract unigram §  Feature2: Extract bigrams q Non-Textual Features §  Feature3: Product field §  Feature4: Component Field Appendix: Similarity Between Bug Reports (REP-) Note: Weights are learned from duplicate bug reports.
  41. Appendix: descriptions of the top-ranked features.
     - PRO5: proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P1.
     - PRO16: proportion of bug reports made for the same component as that of BR prior to the reporting of BR that are assigned priority P1.
     - REP1: mean priority of the top-20 most similar bug reports to BR as measured using REP prior to the reporting of BR.
     - REP3: mean priority of the top-10 most similar bug reports to BR as measured using REP prior to the reporting of BR.
     - PRO18: proportion of bug reports made for the same component as that of BR prior to the reporting of BR that are assigned priority P3.
     - PRO10: mean priority of bug reports made for the same product as that of BR prior to the reporting of BR.
     - PRO21: mean priority of bug reports made for the same component as that of BR prior to the reporting of BR.
     - PRO7: proportion of bug reports made for the same product as that of BR prior to the reporting of BR that are assigned priority P3.
     - REP5: mean priority of the top-5 most similar bug reports to BR as measured using REP prior to the reporting of BR.
     - Text "1663": the textual feature discussed under RQ3.
