Automatic Fine-Grained Issue Report Reclassification
Pavneet Singh Kochhar, Ferdian Thung, David Lo
Singapore Management University
{kochharps.2012, ferdiant.2013, davidlo}@smu.edu.sg
2/24
Misclassification of Issue Reports
Herzig et al. *
• 40% of issue reports are misclassified.
• About 1/3 of the issue reports classified as bugs are not actually bugs.
[Figure: BUG shown alongside the categories DOCUMENTATION, IMPROVEMENT, REFACTORING, BACKPORT, CLEANUP, DESIGN DEFECT, TASK, TEST]
* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,
K. Herzig, S. Just, A. Zeller, ICSE 2013
Impact of Misclassification
• Well-known projects receive a large number of issue reports
• The volume of bug reports can overwhelm the available developers
• Mozilla developer: “Every day, almost 300 bugs appear that need triaging.” *
• Manual process
• Misclassified reports take more time to fix +
* J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in ETX, pp. 35–39, 2005.
+ X. Xia, D. Lo, M. Wen, E. Shihab, and B. Zhou, “An empirical study of bug report field reassignment,” in CSMR-WCRE, pp. 174–183, 2014.
3/24
Related Work
• Herzig et al. [1]
  • Manually classified over 7000 issue reports
  • 14 different categories
  → We use the same dataset
  → We use 13 categories (merging UNKNOWN & OTHERS)
• Antoniol et al. [2]
  • Classify issue reports as either “bug” or “enhancement”
  → We consider the “reclassification” problem
  → We use 13 different categories
[1] K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.
[2] G. Antoniol, K. Ayari, M. D. Penta, F. Khomh, and Y.-G. Gueheneuc, “Is it a bug or an enhancement? A text-based approach to classify change requests,” in CASCON, pp. 23:304–23:318, 2008.
4/24
Our Study
Fine-Grained Issue Report Reclassification
13 Categories*
BUG, RFE, IMPROVEMENT, DOCUMENTATION, TASK, BUILD, REFACTORING,
DESIGN DEFECT, TEST, CLEANUP, BACKPORT, SPECIFICATION, OTHERS
* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,
K. Herzig, S. Just, A. Zeller, ICSE 2013
5/24
[Figure annotations: Adaptive Maintenance; Perfective Maintenance; Deallocating memory; Removing duplicate methods]
Overall Framework
[Figure: Training Phase: Training Issue Reports and Ground Truth Categories* → Feature Extraction → Model Building → Model. Deployment Phase: New Issue Reports → Feature Extraction → Model → Predicted Reclassified Categories.]
*Herzig et al.
6/24
Pre-Processing
• Text Pre-Processing
  • Summary & Description fields
• Stop-word removal
  • e.g., “is”, “are”, “if”
• Stemming (reducing words to their root form)
  • e.g., “reads” and “reading” → “read”
  • Uses the Porter Stemmer*
*http://tartarus.org/martin/PorterStemmer/
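The pre-processing steps above can be sketched roughly as follows. This is only an illustration: the stop-word list is a tiny hypothetical sample, and `toy_stem` is a crude suffix stripper standing in for the Porter stemmer, not an implementation of it.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"is", "are", "if", "the", "a", "an", "and", "to", "of"}


def toy_stem(word: str) -> str:
    """Crude suffix stripper used as a stand-in for the Porter stemmer (NOT Porter)."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(summary: str, description: str) -> List[str]:
    """Lowercase, tokenize, drop stop words, and stem the two text fields."""
    text = f"{summary} {description}".lower()
    tokens = re.findall(r"[a-z]+", text)
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("Parser reads if", "reading is slow"))
```

In practice one would swap `toy_stem` for a real Porter implementation; the surrounding pipeline would stay the same.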
7/24
Feature Extraction
1. TF-IDF
TF: Term Frequency, IDF: Inverse Document Frequency
2. Reported Category (C1-C13)
Cn = 1 if the issue was reported under category n (n = 1 to 13), otherwise 0
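A minimal sketch of these two feature groups, under stated assumptions: the slides do not give the exact TF-IDF weighting, so the code uses the common `tf * log(N / df)` form, and the mapping of C1-C13 to the category order below is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

# Category order is an assumption; the slides only say there are 13 indicators.
CATEGORIES = [
    "BUG", "RFE", "IMPROVEMENT", "DOCUMENTATION", "TASK", "BUILD", "REFACTORING",
    "DESIGN DEFECT", "TEST", "CLEANUP", "BACKPORT", "SPECIFICATION", "OTHERS",
]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by raw term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def reported_category_features(reported: str) -> List[int]:
    """One indicator per category: 1 for the reported category, 0 elsewhere."""
    return [1 if reported == c else 0 for c in CATEGORIES]


vectors = tf_idf([["test", "fail"], ["test", "pass"]])
print(vectors[0])
print(reported_category_features("TASK"))
```

A term that occurs in every report (here "test") gets weight 0, which is the usual effect of the IDF factor.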
8/24
Feature Extraction
3. Exception Trace (S)
a) Phrase: “Exception in thread”
b) Regex: [A-Za-z0-9$.]+Exception
   e.g., java.lang.NullPointerException
c) Regex: [A-Za-z0-9$.]+[A-Za-z0-9]+([A-Za-z0-9]+(java:[0-9]+)?)
   e.g., oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)
4. Issue Reporter (R1-RM)
   where M is the total number of reporters
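The exception-trace cues could be checked along these lines. Two assumptions are made here. First, pattern (c) as printed on the slide seems to have lost its escape characters, so the sketch uses a variant with literal parentheses and a literal ".java" that matches the slide's own example. Second, S is treated as a single 0/1 indicator, which the slides do not state.

```python
import re

PHRASE = "Exception in thread"
EXCEPTION_NAME = re.compile(r"[A-Za-z0-9$.]+Exception")
# Assumed escaped form of pattern (c): literal "(" ")" and a literal ".java".
STACK_FRAME = re.compile(r"[A-Za-z0-9$.]+[A-Za-z0-9]+\([A-Za-z0-9]+(\.java:[0-9]+)?\)")


def has_exception_trace(text: str) -> int:
    """Return 1 if any of the three cues appears in the text, otherwise 0."""
    found = (
        PHRASE in text
        or EXCEPTION_NAME.search(text) is not None
        or STACK_FRAME.search(text) is not None
    )
    return int(found)


print(has_exception_trace("at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)"))
print(has_exception_trace("The parser drops trailing commas"))
```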
9/24
Model Building
• LibSVM (Support Vector Machine)*
• Multi-class classification
• Inputs
  • L, a learner (training algorithm)
  • X, the set of training data, i.e., issue reports
  • y, the labels, where y_i ∈ {1, …, K}, i.e., the 13 categories
• Output
  • A list of classifiers f_k for k ∈ {1, …, K}
  • The classifiers are applied to unseen data to predict a label k
*http://www.csie.ntu.edu.tw/~cjlin/libsvm/
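One way to read the inputs and output listed above is as a one-vs-rest scheme: train one classifier f_k per label and predict the label whose classifier scores highest. That reading is an assumption, since the slides do not say which multi-class strategy was configured. The sketch below uses a toy perceptron as the learner L so that it runs without dependencies; it is not LibSVM.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]
Learner = Callable[[List[Vector], List[int]], Scorer]


def perceptron(X: List[Vector], y: List[int], epochs: int = 20) -> Scorer:
    """Toy binary learner (a stand-in for L); y holds +1 / -1 targets."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified or on the boundary: update
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def train_one_vs_rest(L: Learner, X: List[Vector], y: List[int], K: int) -> List[Scorer]:
    """Train one scorer f_k per label k in 1..K (label k versus all others)."""
    return [L(X, [1 if yi == k else -1 for yi in y]) for k in range(1, K + 1)]


def predict(classifiers: List[Scorer], x: Vector) -> int:
    """Return the label k whose classifier f_k gives the highest score."""
    scores = [f(x) for f in classifiers]
    return scores.index(max(scores)) + 1


X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [1, 2, 3]
classifiers = train_one_vs_rest(perceptron, X, y, K=3)
print([predict(classifiers, x) for x in X])
```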
10/24
Dataset
Project       Organization   Tracker    Number of Issue Reports
HTTPClient    Apache         JIRA        746
Jackrabbit    Apache         JIRA       2402
Lucene-Java   Apache         JIRA       2443
Rhino         Mozilla        Bugzilla   1226
Tomcat5       Apache         Bugzilla    584
Total = 7401 Issue Reports *
* It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,
K. Herzig, S. Just, A. Zeller, ICSE 2013
11/24
Evaluation Metrics
$$Pre_{category} = \frac{\#TP_{category}}{\#TP_{category} + \#FP_{category}} \quad \text{(Precision)}$$
$$Rec_{category} = \frac{\#TP_{category}}{\#TP_{category} + \#FN_{category}} \quad \text{(Recall)}$$
$$F1_{category} = \frac{2 \times Pre_{category} \times Rec_{category}}{Pre_{category} + Rec_{category}} \quad \text{(F-Measure)}$$
$$WF1 = \frac{1}{N} \sum_{category=1}^{\#category} n_{category} \times F1_{category} \quad \text{(Weighted F-Measure)}$$
We use Weighted Precision, Recall & F-Measure
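The metrics above can be computed as in the following sketch. It assumes that N is the total number of issue reports and that n_category is the number of reports whose ground-truth label is that category; the slides do not define either symbol. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_category_scores(truth: List[str], predicted: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall, and F1 for every category present in the ground truth."""
    scores = {}
    for c in set(truth):
        tp = sum(1 for t, p in zip(truth, predicted) if t == c and p == c)
        fp = sum(1 for t, p in zip(truth, predicted) if t != c and p == c)
        fn = sum(1 for t, p in zip(truth, predicted) if t == c and p != c)
        pre = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
        scores[c] = (pre, rec, f1)
    return scores


def weighted_scores(truth: List[str], predicted: List[str]) -> Tuple[float, float, float]:
    """Average each metric over categories, weighted by ground-truth counts."""
    support = Counter(truth)  # n_category (assumed meaning)
    n = len(truth)            # N (assumed meaning)
    scores = per_category_scores(truth, predicted)
    w_pre, w_rec, w_f1 = (
        sum(support[c] * scores[c][i] for c in scores) / n for i in range(3)
    )
    return w_pre, w_rec, w_f1


truth = ["BUG", "BUG", "BUG", "RFE"]
predicted = ["BUG", "BUG", "RFE", "RFE"]
print(weighted_scores(truth, predicted))
```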
12/24
Baselines
• Baseline-1
  Predicts that the reclassified category is the same as the originally assigned category
• Baseline-2
  Predicts the reclassified category as “BUG” for every report
  (the majority of the issues are bugs)
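Both baselines are simple enough to write down directly. A minimal sketch, with hypothetical function names:

```python
from typing import List


def baseline_1(reported: List[str]) -> List[str]:
    """Keep whatever category each report was originally assigned."""
    return list(reported)


def baseline_2(reported: List[str]) -> List[str]:
    """Predict the majority category, BUG, for every report."""
    return ["BUG"] * len(reported)


reported = ["BUG", "RFE", "TASK"]
print(baseline_1(reported))
print(baseline_2(reported))
```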
13/24
Research Questions
RQ1: Effectiveness of Our Approach
RQ2: Varying the Amount of Training Data
RQ3: Most Discriminative Features
RQ4: Analysis of Correctly & Wrongly Classified Issue Reports
RQ5: Comparison to Other Classification Algorithms
14/24
RQ1: Effectiveness of Our Approach
Approach            HTTPClient           Jackrabbit           Lucene-Java
                    Prec   Rec    WF1    Prec   Rec    WF1    Prec   Rec    WF1
Ours                0.61   0.63   0.60   0.71   0.72   0.71   0.62   0.63   0.62
Baseline-1          0.54   0.52   0.43   0.61   0.62   0.54   0.50   0.50   0.43
Baseline-2          0.16   0.40   0.23   0.15   0.39   0.21   0.08   0.28   0.12
Improvement-1 (%)   12.96  21.15  39.53  16.39  16.12  31.48  24.00  26.00  44.18
Improvement-2 (%)   281.2  57.4   160.8  373.3  84.6   238.0  675.0  125.0  416.6

Approach            Rhino                Tomcat5
                    Prec   Rec    WF1    Prec   Rec    WF1
Ours                0.58   0.61   0.57   0.58   0.62   0.58
Baseline-1          0.35   0.57   0.43   0.36   0.58   0.45
Baseline-2          0.26   0.51   0.35   0.30   0.54   0.38
Improvement-1 (%)   65.71  7.01   32.55  61.11  6.89   28.88
Improvement-2 (%)   123.0  19.6   62.85  93.3   14.8   52.63
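The Improvement rows appear to be relative gains over each baseline, i.e. (ours - baseline) / baseline × 100. That formula is inferred from the numbers rather than stated on the slide. A quick check against the HTTPClient columns for Baseline-1:

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Relative gain of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# HTTPClient, ours vs. Baseline-1: precision and weighted F1.
print(round(relative_improvement(0.61, 0.54), 2))
print(round(relative_improvement(0.60, 0.43), 2))
```

Some reported values look truncated rather than rounded, so small differences in the last digit are expected.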
15/24
RQ2: Varying Training Data
% of Issue   HTTPClient           Jackrabbit           Lucene-Java
Reports      Prec   Rec    WF1    Prec   Rec    WF1    Prec   Rec    WF1
10           0.49   0.56   0.47   0.63   0.65   0.60   0.55   0.57   0.53
20           0.54   0.55   0.46   0.64   0.66   0.61   0.57   0.57   0.54
30           0.58   0.60   0.54   0.68   0.70   0.67   0.59   0.60   0.58
40           0.54   0.53   0.48   0.69   0.71   0.68   0.59   0.58   0.56
50           0.58   0.61   0.57   0.69   0.71   0.69   0.62   0.63   0.61
60           0.59   0.62   0.58   0.64   0.65   0.62   0.61   0.62   0.61
70           0.60   0.62   0.58   0.70   0.72   0.70   0.62   0.63   0.62
80           0.62   0.68   0.61   0.70   0.72   0.70   0.63   0.64   0.63
90           0.61   0.64   0.60   0.71   0.73   0.71   0.62   0.63   0.62
16/24
RQ2: Varying Training Data
% of Issue   Rhino                Tomcat5
Reports      Prec   Rec    WF1    Prec   Rec    WF1
10           0.45   0.52   0.40   0.47   0.54   0.43
20           0.46   0.50   0.39   0.50   0.55   0.45
30           0.46   0.50   0.40   0.54   0.60   0.53
40           0.47   0.48   0.40   0.56   0.62   0.56
50           0.52   0.58   0.50   0.56   0.61   0.56
60           0.55   0.59   0.53   0.50   0.48   0.42
70           0.56   0.60   0.54   0.49   0.44   0.38
80           0.58   0.61   0.56   0.57   0.62   0.58
90           0.59   0.61   0.56   0.54   0.59   0.55
17/24
RQ3: Most Discriminative Features
HTTPClient
Feature                       Fisher Score
Stemmed word “test”           1.73
Reported Category (TASK)      0.58
Stemmed word “privat”         0.56
Reported Category (BUG)       0.54
Stemmed word “cleanup”        0.50

Jackrabbit
Feature                       Fisher Score
Reported Category (BUG)       0.72
Stemmed word “test”           0.55
Stemmed word “maven”          0.51
Stemmed word “backport”       0.46
Reported Category (IMPR)      0.43
18/24
RQ3: Most Discriminative Features
Lucene-Java
Feature                       Fisher Score
Stemmed word “test”           0.94
Reported Category (BUG)       0.61
Reported Category (TEST)      0.50
Stemmed word “backport”       0.45
Stemmed word “remov”          0.38

Rhino
Feature                       Fisher Score
Stemmed word “test”           3.84
Stemmed word “suit”           0.43
Stemmed word “patch”          0.32
Stemmed word “driver”         0.29
Stemmed word “regress”        0.27

Tomcat5
Feature                       Fisher Score
Stemmed word “longer”         1.15
Issue Reporter “starksm”      0.71
Stemmed word “class”          0.64
Stemmed word “ant”            0.62
Reported Category (BUG)       0.56
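The slides do not spell out how the Fisher score is computed. The sketch below uses one common definition, the ratio of between-class to within-class variance of a single feature, as an assumption; the exact variant used in the study may differ.

```python
from collections import defaultdict
from typing import Hashable, List


def fisher_score(values: List[float], labels: List[Hashable]) -> float:
    """Between-class scatter divided by within-class scatter for one feature."""
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for members in groups.values():
        n = len(members)
        mean = sum(members) / n
        variance = sum((v - mean) ** 2 for v in members) / n  # population variance
        between += n * (mean - overall_mean) ** 2
        within += n * variance
    return between / within if within > 0 else float("inf")


# A feature that separates the two classes scores high; one that does not scores 0.
print(fisher_score([1.0, 3.0, 5.0, 7.0], ["a", "a", "b", "b"]))
print(fisher_score([1.0, 3.0, 1.0, 3.0], ["a", "a", "b", "b"]))
```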
19/24
RQ4: Correctly & Wrongly Classified Reports
Rows: ground-truth labels; columns: predicted labels
           BUG    RFE   IMPR   TEST    DOC  BUILD  CLEANUP  REFAC
BUG       2631     48    119     26     23      8        8      1
RFE        139    765    223      6     13      7       13     31
IMPR       320    214    658      8     12     13       16     19
TEST        84     12     15    220      1      8        4      3
DOC         95     39     37      0    209     13       17      2
BUILD       29     17     19     11     10    127        5      1
CLEANUP     58     30     42      6     11      5      104     12
REFAC       20     51     61      1      2      0       16     91
The table shows 8 of the 13 categories
BUG – 2631/2914 (90.3%)
TEST – 220/349 (63%)
RFE – 765/1221 (62.7%)
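The three fractions above read as correctly classified reports divided by the ground-truth total for that category. The totals are larger than the visible row sums, presumably because five of the 13 categories are not shown. A small check of the percentages, using only the numbers quoted on the slide:

```python
from typing import Dict, Tuple

# (correctly classified, ground-truth total) as quoted on the slide.
QUOTED: Dict[str, Tuple[int, int]] = {
    "BUG": (2631, 2914),
    "TEST": (220, 349),
    "RFE": (765, 1221),
}


def correct_rate(correct: int, total: int) -> float:
    """Share of a category's reports that were classified correctly, in percent."""
    return 100.0 * correct / total


for category, (correct, total) in QUOTED.items():
    print(category, round(correct_rate(correct, total), 1))
```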
20/24
RQ5: Comparison with Other Algorithms
Approach             HTTPClient           Jackrabbit           Lucene-Java
                     Prec   Rec    WF1    Prec   Rec    WF1    Prec   Rec    WF1
Ours (LibSVM)        0.61   0.63   0.60   0.71   0.72   0.71   0.62   0.63   0.62
Naïve Bayes          0.49   0.47   0.48   0.51   0.39   0.43   0.46   0.37   0.40
NB Multinomial       0.53   0.60   0.54   0.64   0.66   0.61   0.60   0.59   0.56
K-Nearest Neighbors  0.47   0.29   0.34   0.60   0.58   0.59   0.46   0.40   0.42
Random Forest        0.45   0.56   0.46   0.54   0.58   0.53   0.45   0.48   0.43
RBF Network          0.37   0.39   0.37   0.39   0.41   0.40   0.31   0.31   0.30
22/24
RQ5: Comparison with Other Algorithms
Approach             Rhino                Tomcat5
                     Prec   Rec    WF1    Prec   Rec    WF1
Ours (LibSVM)        0.58   0.61   0.57   0.58   0.62   0.58
Naïve Bayes          0.51   0.51   0.51   0.48   0.40   0.42
NB Multinomial       0.52   0.58   0.49   0.51   0.58   0.47
K-Nearest Neighbors  0.50   0.43   0.43   0.43   0.43   0.42
Random Forest        0.51   0.56   0.47   0.45   0.56   0.46
RBF Network          0.40   0.43   0.41   0.33   0.54   0.39
23/24
Conclusion & Future Work
• Automated approach to reclassify issue reports
• Evaluated on over 7000 issue reports
• Extracts features such as TF-IDF, reported category, exception trace, and issue reporter
• Performs multi-class classification (13 categories)
• F-Measure score of 0.57-0.71
• Improvement of 28.88% - 414.66% over baselines
Future Work:
→ Analyse more issue reports
→ Design an advanced multi-class solution
24/24
Thank You!
Email: kochharps.2012@smu.edu.sg

Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 

Recently uploaded (20)

E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
Why Mobile App Regression Testing is Critical for Sustained Success_ A Detail...
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 

Automatic Fine-Grained Issue Report Reclassification

  • 1. Automatic Fine-Grained Issue Report Reclassification
      Pavneet Singh Kochhar, Ferdian Thung, David Lo
      Singapore Management University
      {kochharps.2012, ferdiant.2013, davidlo}@smu.edu.sg
  • 2. Misclassification of Issue Reports
      Herzig et al. *
      • 40% of issue reports are misclassified.
      • About 1/3 of the issue reports classified as bugs are not actually bugs.
      Category labels shown in the figure: BUG, DOCUMENTATION, IMPROVEMENT, REFACTORING, BACKPORT, CLEANUP, DESIGN DEFECT, TASK, TEST
      * K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.
  • 3. Impact of Misclassification
      • Well-known projects receive a large number of issue reports.
      • A large number of bug reports can overwhelm the available developers.
      • Mozilla developer: “Everyday, almost 300 bugs appear that need triaging.” *
      • Classification is a manual process.
      • Misclassified reports take more time to fix. +
      * J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” in ETX, pp. 35–39, 2005.
      + X. Xia, D. Lo, M. Wen, E. Shihab, and B. Zhou, “An empirical study of bug report field reassignment,” in CSMR-WCRE, pp. 174–183, 2014.
  • 4. Related Work
      • Herzig et al. [1]
          • Manually classified over 7000 issue reports.
          • 14 different categories.
          → We use the same dataset.
          → We use 13 categories (merging UNKNOWN & OTHERS).
      • Antoniol et al. [2]
          • Classify issue reports as either “bug” or “enhancement”.
          → We consider the “reclassification” problem.
          → We use 13 different categories.
      [1] K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.
      [2] G. Antoniol, K. Ayari, M. D. Penta, F. Khomh, and Y.-G. Gueheneuc, “Is it a bug or an enhancement? A text-based approach to classify change requests,” in CASCON, pp. 23:304–23:318, 2008.
  • 5. Our Study: Fine-Grained Issue Report Reclassification
      13 Categories*: BUG, RFE, IMPROVEMENT, DOCUMENTATION, TASK, BUILD, REFACTORING, DESIGN DEFECT, TEST, CLEANUP, BACKPORT, SPECIFICATION, OTHERS
      Example annotations on the slide: (Adaptive Maintenance), (Perfective Maintenance), (Deallocating memory), (Removing Duplicate methods)
      * K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.
  • 6. Overall Framework
      Training Phase: Training Issue Reports + Ground Truth Categories* → Feature Extraction → Model Building → Model
      Deployment Phase: New Issue Reports → Feature Extraction → Model → Predicted Reclassified Categories
      * Herzig et al.
  • 7. Pre-Processing
      • Text pre-processing of the Summary & Description fields
      • Stop-word removal
          • e.g., “is”, “are”, “if”
      • Stemming (reducing words to their root form)
          • e.g., “reads” and “reading” → “read”
          • Uses the Porter Stemmer*
      * http://tartarus.org/martin/PorterStemmer/
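As a rough illustration of the two pre-processing steps, the sketch below tokenizes the Summary and Description text, drops stop words, and strips a few suffixes. The stop-word list and the names `toy_stem` and `preprocess` are hypothetical and exist only for this example. The suffix stripper is a deliberately simple stand-in: it is not the Porter Stemmer cited on the slide.

```python
import re

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"is", "are", "if", "the", "a", "an", "and", "of", "to", "in"}

# Suffixes removed by the toy stemmer. This is NOT the Porter algorithm,
# only a stand-in that handles simple cases such as "reads" and "reading".
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(summary, description):
    """Tokenize summary + description, drop stop words, stem the rest."""
    text = f"{summary} {description}".lower()
    tokens = re.findall(r"[a-z]+", text)
    return [toy_stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("Parser reads empty file", "It is failing if the file is empty")
```

With a real stemmer the token forms would differ in places (the RQ3 tables list Porter-style stems such as “privat” and “remov”), so this sketch only conveys the shape of the pipeline.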
  • 8. Feature Extraction
      1. TF-IDF (TF: Term Frequency, IDF: Inverse Document Frequency)
      2. Reported Category (C1-C13): Cn = 1 for the reported category n, where n = 1 to 13
  • 9. Feature Extraction
      3. Exception Trace (S)
          a) Phrase: “Exception in thread”
          b) Regex: [A-Za-z0-9$.]+Exception
             e.g., java.lang.NullPointerException
          c) Regex: [A-Za-z0-9$.]+[A-Za-z0-9]+([A-Za-z0-9]+(java:[0-9]+)?)
             e.g., oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)
      4. Issue Reporter (R1-RM), where M is the total number of reporters
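The four feature groups on slides 8 and 9 could be assembled roughly as follows. This is a sketch under several assumptions, not the authors' implementation: TF-IDF uses raw counts times log(N/df), which is only one common weighting; the category and reporter features are treated as one-hot indicators; the exception-trace feature S is treated as a single 0/1 flag; and regex (c) is written with its dots and parentheses escaped, because the transcript shows them unescaped and in that form it would match almost any word. All function names and the toy inputs are hypothetical.

```python
import math
import re
from collections import Counter

# Pattern (b) from the slide, unchanged.
EXCEPTION_NAME = re.compile(r"[A-Za-z0-9$.]+Exception")
# Pattern (c) with literal dots and parentheses escaped (an assumption;
# the transcript shows them unescaped).
STACK_FRAME = re.compile(
    r"[A-Za-z0-9$.]+\.[A-Za-z0-9]+\([A-Za-z0-9]+(\.java:[0-9]+)?\)"
)


def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per tokenized document)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(doc_freq)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [counts[t] * math.log(n_docs / doc_freq[t]) for t in vocab]
        )
    return vocab, vectors


def one_hot(value, choices):
    """Binary indicators: 1 at the position of `value`, 0 elsewhere."""
    return [1 if choice == value else 0 for choice in choices]


def has_exception_trace(text):
    """Feature S: 1 if any of the three trace patterns occurs, else 0."""
    return int(
        "Exception in thread" in text
        or EXCEPTION_NAME.search(text) is not None
        or STACK_FRAME.search(text) is not None
    )


# Toy inputs: a shortened category list and made-up reporter ids.
categories = ["BUG", "RFE", "TASK"]
reporters = ["alice", "bob"]
vocab, vectors = tfidf([["test", "fail"], ["test", "doc"]])
features = (
    vectors[0]
    + one_hot("BUG", categories)
    + [has_exception_trace("java.lang.NullPointerException at startup")]
    + one_hot("bob", reporters)
)
```

A term that occurs in every document (here “test”) receives a TF-IDF weight of zero under this weighting, which is why smoothed variants are often preferred in practice.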
  • 10. Model Building
      • LibSVM (Support Vector Machine)*
      • Multi-class classification
      • Inputs
          • L, the learner (training algorithm)
          • X, the set of training data, i.e., issue reports
          • y, the labels, where $y_i \in \{1, \ldots, K\}$, i.e., the 13 categories
      • Output
          • A list of classifiers $f_k$ for $k \in \{1, \ldots, K\}$
          • The classifiers are applied to unseen data to predict a label $k$
      * http://www.csie.ntu.edu.tw/~cjlin/libsvm/
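One way to read "a list of classifiers f_k" is a one-vs-rest scheme: train one binary classifier per category and predict the category whose classifier scores highest. The sketch below follows that reading, which is an assumption; the slide does not say how LibSVM was configured. To stay dependency-free it also swaps the SVM learner for a plain perceptron, so it illustrates the multi-class wrapper only, not the authors' model. All names and the toy data are hypothetical.

```python
def train_perceptron(X, y, epochs=50):
    """Learner L (stand-in for an SVM): binary perceptron, y in {+1, -1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified: move the hyperplane
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
    return w, b


def train_one_vs_rest(X, y, labels):
    """Return one binary classifier f_k per label k."""
    return {
        k: train_perceptron(X, [1 if yi == k else -1 for yi in y])
        for k in labels
    }


def predict(classifiers, x):
    """Pick the label whose classifier gives the highest score."""
    def score(k):
        w, b = classifiers[k]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(classifiers, key=score)


# Toy training set: three feature vectors, one per category 1..3.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [1, 2, 3]
classifiers = train_one_vs_rest(X, y, labels=[1, 2, 3])
predicted = predict(classifiers, [0.9, 0.1, 0.0])
```

In the study's setting, X would hold the extracted feature vectors and the labels would be the 13 ground-truth categories.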
  • 11. Dataset*
      Project       Organization   Tracker    Number of Issue Reports
      HTTPClient    Apache         JIRA       746
      Jackrabbit    Apache         JIRA       2402
      Lucene-Java   Apache         JIRA       2443
      Rhino         Mozilla        Bugzilla   1226
      Tomcat5       Apache         Bugzilla   584
      Total                                   7401
      * K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, it’s a Feature: How Misclassification Impacts Bug Prediction,” in ICSE, 2013.
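  • 12. Evaluation Metrics
      $\mathit{Pre}_{\mathit{category}} = \frac{\#\mathit{TP}_{\mathit{category}}}{\#\mathit{TP}_{\mathit{category}} + \#\mathit{FP}_{\mathit{category}}}$ (Precision)
      $\mathit{Rec}_{\mathit{category}} = \frac{\#\mathit{TP}_{\mathit{category}}}{\#\mathit{TP}_{\mathit{category}} + \#\mathit{FN}_{\mathit{category}}}$ (Recall)
      $\mathit{F1}_{\mathit{category}} = \frac{2 \times \mathit{Pre}_{\mathit{category}} \times \mathit{Rec}_{\mathit{category}}}{\mathit{Pre}_{\mathit{category}} + \mathit{Rec}_{\mathit{category}}}$ (F-Measure)
      $\mathit{WF1} = \frac{1}{N} \sum_{\mathit{category}=1}^{\#\mathit{category}} n_{\mathit{category}} \times \mathit{F1}_{\mathit{category}}$ (Weighted F-Measure)
      We use Weighted Precision, Recall & F-Measure.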
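The metrics can be computed from two parallel label lists, as in the sketch below. It assumes that N is the total number of issue reports, that n_category is the number of ground-truth reports in a category, that weighted precision and recall use the same per-category weights as WF1, and that a ratio with a zero denominator counts as 0. None of these details is spelled out on the slide, and `weighted_scores` is a hypothetical name.

```python
from collections import Counter


def weighted_scores(truth, predicted):
    """Return (weighted precision, weighted recall, weighted F1)."""
    n_total = len(truth)
    support = Counter(truth)              # n_category: reports per category
    predicted_count = Counter(predicted)  # TP + FP per category
    true_pos = Counter(t for t, p in zip(truth, predicted) if t == p)

    w_pre = w_rec = w_f1 = 0.0
    for category, n_category in support.items():
        tp = true_pos[category]
        pre = tp / predicted_count[category] if predicted_count[category] else 0.0
        rec = tp / n_category             # TP + FN equals n_category
        f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
        weight = n_category / n_total
        w_pre += weight * pre
        w_rec += weight * rec
        w_f1 += weight * f1
    return w_pre, w_rec, w_f1


# Made-up labels for four reports.
truth = ["BUG", "BUG", "RFE", "TEST"]
predicted = ["BUG", "RFE", "RFE", "BUG"]
w_pre, w_rec, w_f1 = weighted_scores(truth, predicted)
```

Under this weighting, weighted recall equals plain accuracy: each category's recall is multiplied by its share of the reports, so the sum is simply the fraction of correctly labelled reports.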
  • 13. Baselines
      • Baseline-1: predicts the reclassified category to be the same as the assigned category.
      • Baseline-2: predicts the reclassified category as “BUG” (the majority of the issues are bugs).
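Both baselines are simple enough to state in a few lines. The sketch below uses made-up labels and hypothetical function names; the `accuracy` helper is included only to show how a baseline's output would be compared against the manually assigned ground truth.

```python
def baseline_1(reported):
    """Baseline-1: keep the category that was originally assigned."""
    return list(reported)


def baseline_2(reported, majority="BUG"):
    """Baseline-2: always predict the majority category."""
    return [majority for _ in reported]


def accuracy(truth, predicted):
    """Fraction of reports whose predicted category matches the truth."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


# Made-up example: categories as filed vs. manually verified categories.
reported = ["BUG", "BUG", "RFE", "BUG"]
ground_truth = ["BUG", "IMPROVEMENT", "RFE", "TEST"]

acc_1 = accuracy(ground_truth, baseline_1(reported))
acc_2 = accuracy(ground_truth, baseline_2(reported))
```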
  • 14. Research Questions
      RQ1: Effectiveness of Our Approach
      RQ2: Varying the Amount of Training Data
      RQ3: Most Discriminative Features
      RQ4: Analysis of Correctly & Wrongly Classified Issue Reports
      RQ5: Comparison to Other Classification Algorithms
  • 15. RQ1: Effectiveness of Our Approach
                         HTTPClient              Jackrabbit              Lucene-Java
                         Prec    Rec     WF1     Prec    Rec     WF1     Prec    Rec     WF1
      Ours               0.61    0.63    0.60    0.71    0.72    0.71    0.63    0.62    0.63
      Baseline-1         0.54    0.52    0.43    0.61    0.62    0.54    0.50    0.50    0.43
      Baseline-2         0.16    0.40    0.23    0.15    0.39    0.21    0.08    0.28    0.12
      Improvement-1 (%)  12.96   21.15   39.53   16.39   16.12   31.48   24.00   26.00   44.18
      Improvement-2 (%)  281.2   57.4    160.8   373.3   84.6    238.0   675.0   125.0   416.6

                         Rhino                   Tomcat5
                         Prec    Rec     WF1     Prec    Rec     WF1
      Ours               0.58    0.61    0.57    0.58    0.62    0.58
      Baseline-1         0.35    0.57    0.43    0.36    0.58    0.45
      Baseline-2         0.26    0.51    0.35    0.30    0.54    0.38
      Improvement-1 (%)  65.71   7.01    32.55   61.11   6.89    28.88
      Improvement-2 (%)  123.0   19.6    62.85   93.3    14.8    52.63
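The improvement rows appear to be relative gains over each baseline, expressed in percent. The slide does not state the formula, so the sketch below is an inference, but it reproduces the HTTPClient and Tomcat5 values checked here. `relative_improvement` is a hypothetical name.

```python
def relative_improvement(ours, baseline):
    """Percentage gain of `ours` over `baseline`."""
    return (ours - baseline) / baseline * 100.0


# HTTPClient, ours vs. Baseline-1: precision, recall, weighted F1.
httpclient_gains = [
    relative_improvement(0.61, 0.54),
    relative_improvement(0.63, 0.52),
    relative_improvement(0.60, 0.43),
]

# Tomcat5, weighted F1, ours vs. Baseline-1.
tomcat5_wf1_gain = relative_improvement(0.58, 0.45)
```

Because the gains are relative, a very weak baseline inflates them: Baseline-2's precision of 0.08 on Lucene-Java is what produces the 675% figure.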
  • 16. RQ2: Varying Training Data
                          HTTPClient              Jackrabbit              Lucene-Java
      % of Issue Reports  Prec    Rec     WF1     Prec    Rec     WF1     Prec    Rec     WF1
      10                  0.49    0.56    0.47    0.63    0.65    0.60    0.55    0.57    0.53
      20                  0.54    0.55    0.46    0.64    0.66    0.61    0.57    0.57    0.54
      30                  0.58    0.60    0.54    0.68    0.70    0.67    0.59    0.60    0.58
      40                  0.54    0.53    0.48    0.69    0.71    0.68    0.59    0.58    0.56
      50                  0.58    0.61    0.57    0.69    0.71    0.69    0.62    0.63    0.61
      60                  0.59    0.62    0.58    0.64    0.65    0.62    0.61    0.62    0.61
      70                  0.60    0.62    0.58    0.70    0.72    0.70    0.62    0.63    0.62
      80                  0.62    0.68    0.61    0.70    0.72    0.70    0.63    0.64    0.63
      90                  0.61    0.64    0.60    0.71    0.73    0.71    0.62    0.63    0.62
  • 17. RQ2: Varying Training Data
                          Rhino                   Tomcat5
      % of Issue Reports  Prec    Rec     WF1     Prec    Rec     WF1
      10                  0.45    0.52    0.40    0.47    0.54    0.43
      20                  0.46    0.50    0.39    0.50    0.55    0.45
      30                  0.46    0.50    0.40    0.54    0.60    0.53
      40                  0.47    0.48    0.40    0.56    0.62    0.56
      50                  0.52    0.58    0.50    0.56    0.61    0.56
      60                  0.55    0.59    0.53    0.50    0.48    0.42
      70                  0.56    0.60    0.54    0.49    0.44    0.38
      80                  0.58    0.61    0.56    0.57    0.62    0.58
      90                  0.59    0.61    0.56    0.54    0.59    0.55
  • 18. RQ3: Most Discriminative Features
      HTTPClient
          Feature                       Fisher Score
          Stemmed word “test”           1.73
          Reported Category (TASK)      0.58
          Stemmed word “privat”         0.56
          Reported Category (BUG)       0.54
          Stemmed word “cleanup”        0.50
      Jackrabbit
          Feature                       Fisher Score
          Reported Category (BUG)       0.72
          Stemmed word “test”           0.55
          Stemmed word “maven”          0.51
          Stemmed word “backport”       0.46
          Reported Category (IMPR)      0.43
  • 19. RQ3: Most Discriminative Features
      Lucene-Java
          Feature                       Fisher Score
          Stemmed word “test”           0.94
          Reported Category (BUG)       0.61
          Reported Category (TEST)      0.50
          Stemmed word “backport”       0.45
          Stemmed word “remov”          0.38
      Rhino
          Feature                       Fisher Score
          Stemmed word “test”           3.84
          Stemmed word “suit”           0.43
          Stemmed word “patch”          0.32
          Stemmed word “driver”         0.29
          Stemmed word “regress”        0.27
      Tomcat5
          Feature                       Fisher Score
          Stemmed word “longer”         1.15
          Issue Reporter “starksm”      0.71
          Stemmed word “class”          0.64
          Stemmed word “ant”            0.62
          Reported Category (BUG)       0.56
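The slides do not define the Fisher score. A commonly used definition for a single feature is the between-class scatter divided by the within-class scatter: the sum over classes of n_k (mean_k - mean)^2, divided by the sum over classes of n_k var_k. The sketch below assumes that definition, which may differ in detail from the one used in the study; `fisher_score` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature given one class label per sample."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    overall_mean = fmean(values)
    between = sum(
        len(g) * (fmean(g) - overall_mean) ** 2 for g in groups.values()
    )
    within = sum(len(g) * pvariance(g) for g in groups.values())
    if within == 0:
        # Perfectly separated (or constant) feature: no within-class spread.
        return float("inf") if between > 0 else 0.0
    return between / within


# Toy feature: weight of the stemmed word "test" in four reports.
values = [1.0, 0.0, 0.0, 0.0]
labels = ["TEST", "TEST", "BUG", "BUG"]
score = fisher_score(values, labels)
```

A higher score means the feature's values differ more between categories than within them, which is the sense in which the tables above rank features as discriminative.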
  • 20. RQ4: Correctly & Wrongly Classified Reports
      Rows are ground-truth labels; columns are predicted labels. The table shows 8 of the 13 categories.
               BUG    RFE    IMPR   TEST   DOC    BUILD  CLEANUP  REFAC
      BUG      2631   48     119    26     23     8      8        1
      RFE      139    765    223    6      13     7      13       31
      IMPR     320    214    658    8      12     13     16       19
      TEST     84     12     15     220    1      8      4        3
      DOC      95     39     37     0      209    13     17       2
      BUILD    29     17     19     11     10     127    5        1
      CLEANUP  58     30     42     6      11     5      104      12
      REFAC    20     51     61     1      2      0      16       91
      Correctly classified: BUG 2631/2914 (90.3%), TEST 220/349 (63%), RFE 765/1221 (62.7%)
  • 21. RQ4: Correctly & Wrongly Classified Reports (repeats the confusion matrix of slide 20, without the per-category summary)
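The percentages quoted for RQ4 are the diagonal entry of a category's row divided by the number of ground-truth reports in that category. The sketch below recomputes them from the figures on slide 20. It assumes that the quoted totals (e.g., 2914 for BUG) count predictions across all 13 categories, which would explain why the eight visible columns of the BUG row sum to less than the total.

```python
# (correctly classified, total ground-truth reports), as quoted on slide 20.
QUOTED = {"BUG": (2631, 2914), "TEST": (220, 349), "RFE": (765, 1221)}


def share_correct(correct, total):
    """Percentage of a category's reports that were predicted correctly."""
    return 100.0 * correct / total


percentages = {
    category: round(share_correct(correct, total), 1)
    for category, (correct, total) in QUOTED.items()
}

# BUG row of the confusion matrix (the 8 categories shown on the slide).
bug_row = [2631, 48, 119, 26, 23, 8, 8, 1]
# BUG reports predicted as one of the 5 categories not shown (assumption).
bug_not_shown = QUOTED["BUG"][1] - sum(bug_row)
```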
  • 22. RQ5: Comparison with Other Algorithms
                           HTTPClient              Jackrabbit              Lucene-Java
      Approach             Prec    Rec     WF1     Prec    Rec     WF1     Prec    Rec     WF1
      Ours (LibSVM)        0.61    0.63    0.60    0.71    0.72    0.71    0.62    0.63    0.62
      Naïve Bayes          0.49    0.47    0.48    0.51    0.39    0.43    0.46    0.37    0.40
      NB Multinomial       0.53    0.60    0.54    0.64    0.66    0.61    0.60    0.59    0.56
      K-Nearest Neighbors  0.47    0.29    0.34    0.60    0.58    0.59    0.46    0.40    0.42
      Random Forest        0.45    0.56    0.46    0.54    0.58    0.53    0.45    0.48    0.43
      RBF Network          0.37    0.39    0.37    0.39    0.41    0.40    0.31    0.31    0.30
  • 23. RQ5: Comparison with Other Algorithms
                           Rhino                   Tomcat5
      Approach             Prec    Rec     WF1     Prec    Rec     WF1
      Ours (LibSVM)        0.58    0.61    0.57    0.58    0.62    0.58
      Naïve Bayes          0.51    0.51    0.51    0.48    0.40    0.42
      NB Multinomial       0.52    0.58    0.49    0.51    0.58    0.47
      K-Nearest Neighbors  0.50    0.43    0.43    0.43    0.43    0.42
      Random Forest        0.51    0.56    0.47    0.45    0.56    0.46
      RBF Network          0.40    0.43    0.41    0.33    0.54    0.39
  • 24. Conclusion & Future Work
      • Automated approach to reclassify issue reports
      • Evaluated on over 7000 issue reports
      • Extracts features such as TF-IDF, reported category, exception trace, and issue reporter
      • Performs multi-class classification (13 categories)
      • F-Measure score of 0.57-0.71
      • Improvement of 28.88%-416.6% over baselines
      Future Work:
      → Analyse more issue reports
      → Design an advanced multi-class solution