BUG OR NOT? BUG REPORT CLASSIFICATION USING N-GRAM IDF
Pannavat Terdchanakul, Hideaki Hata, Passakorn Phannachitta, Kenichi Matsumoto
Software Engineering Laboratory, Nara Institute of Science and Technology
Bug report misclassification is a problem in which an issue report is classified as a bug but is actually a non-bug [1].
§ An issue is classified as a bug, but its description implies that it is not related to a defect in the software (e.g., it is a request for a new feature).
§ Example: Lucene-2074 (https://issues.apache.org/jira/browse/LUCENE-2074)
[1]. Antoniol et al. "Is it a bug or an enhancement?: a text-based approach to classify change requests." CASCON 2008
Bug report misclassification is considered a serious problem.
§ It can lead bug report research to produce unreliable results.
§ Manual inspection is difficult and requires a lot of effort [2].
[2]. Herzig et al. "It's Not a Bug, It's a Feature: How Misclassification Impacts Bug Prediction." ICSE 2013
Several studies have attempted to tackle the bug report misclassification problem.
§ Herzig et al. [2] spent 90 days manually classifying over 7,000 bug reports. They found that about one-third of the reports are actually not bugs.
§ Antoniol et al. [1] proposed a word-based automatic classification technique.
§ Pingclasai et al. [3] proposed classification models based on a topic modeling technique, Latent Dirichlet Allocation (LDA).
§ Limsettho et al. [4] proposed classification models based on another topic modeling technique, Hierarchical Dirichlet Process (HDP).
§ Zhou et al. [5] proposed a hybrid approach that combines text mining and data mining techniques via a technique called data grafting.
In this work, we propose applying an alternative technique to classify bug reports: N-Gram IDF, a theoretical extension of Inverse Document Frequency (IDF).
[3]. Pingclasai et al. "Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling." APSEC 2013
[4]. Limsettho et al. "Comparing Hierarchical Dirichlet Process With Latent Dirichlet Allocation in Bug Report Multiclass Classification." SNPD 2014
[5]. Zhou et al. "Combining Text Mining and Data Mining for Bug Report Classification." Journal of Software: Evolution and Process 28.3 (2016)
N-Gram IDF is a theoretical extension of IDF for handling words or phrases of any length [6].
§ N-gram IDF utilizes an enhanced suffix array [7] to enumerate all valid N-grams.
§ In this study, we apply N-gram IDF to extract useful N-grams, along with their document frequencies, from the corpus of bug reports. The useful N-grams are used as features for the classification model.
[6]. Shirakawa et al. "N-gram IDF: A Global Term Weighting Scheme Based on Information Distance." WWW 2015
[7]. Abouelhoda et al. "Replacing Suffix Trees with Enhanced Suffix Arrays." JDA, vol. 2, Mar 2004
N-Gram IDF utilizes an enhanced suffix array to get N-grams (figure based on [6]).
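To give a feel for the weighting, here is a minimal sketch of IDF applied to n-grams over a tiny, made-up corpus. Note the caveats: this uses the plain IDF formula and naive n-gram enumeration; the actual N-gram IDF scheme [6] additionally weights terms by information distance and enumerates candidate phrases with an enhanced suffix array, both omitted here.

```python
# Sketch: plain IDF over n-grams (illustrative corpus, not the full
# information-distance weighting of Shirakawa et al. [6]).
import math
from collections import defaultdict

def ngrams(tokens, n):
    """All contiguous n-grams of length n from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_idf_table(docs, max_n=3):
    """Map each n-gram (up to max_n tokens) to its IDF over the corpus."""
    df = defaultdict(int)  # document frequency per n-gram
    for doc in docs:
        tokens = doc.lower().split()
        seen = set()
        for n in range(1, max_n + 1):
            seen.update(ngrams(tokens, n))
        for g in seen:          # count each n-gram once per document
            df[g] += 1
    num_docs = len(docs)
    return {g: math.log(num_docs / f) for g, f in df.items()}

corpus = [
    "null pointer exception when closing index",
    "add support for new query syntax",
    "null pointer exception in query parser",
]
idf = ngram_idf_table(corpus)
print(idf[("null", "pointer", "exception")])  # appears in 2 of 3 reports
print(idf[("add",)])                          # appears in 1 of 3 reports
```

Phrases that occur in fewer reports get higher weights, which is what makes rare multi-word terms useful as classification features.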
To construct the classification model, we first pre-process the bug reports. We then apply N-gram IDF to the corpus. The output is a list of valid N-gram terms.
Pipeline: Bug Reports → Pre-Processing → Pre-processed bug reports → Applying N-Gram IDF → N-grams Dictionary
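The pre-processing step can be sketched as follows. The slides do not list the exact operations, so this assumes typical text cleaning (lowercasing, punctuation stripping, stop-word removal); the stop-word list here is a tiny illustrative one.

```python
# Sketch of the pre-processing step, under assumed (typical) cleaning
# operations: lowercase, strip punctuation, drop common stop words.
import re

STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "in", "and", "when"}

def preprocess(report):
    """Turn a raw bug report into a cleaned token string."""
    text = report.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # strip punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

reports = [
    "NullPointerException when closing the index!",
    "Add support for a new query syntax.",
]
cleaned = [preprocess(r) for r in reports]
print(cleaned[0])  # "nullpointerexception closing index"
```

The cleaned reports are then fed to N-gram IDF, which returns the dictionary of valid N-gram terms.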
We then count the raw frequency of each N-gram term and keep the values as feature vectors. We use these vectors as inputs to train our classification model.
Pipeline: Pre-processed bug reports + N-grams Dictionary → Feature Extraction → Feature Vectors → Building Classifier Models
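The feature-extraction step above can be sketched as follows: given the dictionary of selected N-grams, each pre-processed report becomes a vector of raw N-gram frequencies. The dictionary and report here are illustrative.

```python
# Sketch of feature extraction: raw N-gram frequencies per report
# (dictionary and report text are hypothetical examples).
from collections import Counter

def ngrams_upto(tokens, max_n):
    """Yield all contiguous n-grams of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def to_feature_vector(report, dictionary, max_n=3):
    """Count how often each dictionary N-gram occurs in the report."""
    counts = Counter(ngrams_upto(report.split(), max_n))
    return [counts[g] for g in dictionary]

# A tiny dictionary of N-grams (would come from the N-gram IDF step):
dictionary = [("null", "pointer"), ("query",), ("new", "feature")]
vec = to_feature_vector("null pointer in query null pointer", dictionary)
print(vec)  # [2, 1, 0]
```

These vectors are then used to train the classifier (the slides use random forest).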
The dataset consists of three open-source software projects that use JIRA as an issue tracking system.
§ The dataset comes from a previous study [2].

Project        Bug#   Non-Bug#   Total#
HTTPClient      305        440      745
Jackrabbit      938       1464     2402
Lucene          697       1746     2443
All Projects   1964       3650     5590
We evaluate the performance of the proposed model by comparing it with a classification model built using a topic modeling technique.

Text Processing Techniques
1. N-Gram IDF
2. Topic Modeling (LDA) [3]

Testing Environments
1. 10-fold cross validation
2. Training & testing dataset

Evaluation Metric
F-measure score
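The F-measure (the harmonic mean of precision and recall) can be computed as follows for the binary bug / non-bug case; the counts in the example are made up.

```python
# F-measure from confusion-matrix counts (binary bug / non-bug case).
def f_measure(tp, fp, fn):
    """F1 score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: 80 bugs correctly found, 20 false alarms, 40 bugs missed.
print(round(f_measure(80, 20, 40), 3))  # 0.727
```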
On the 10-fold cross validation setup, the N-Gram IDF model outperforms the topic modeling model in all evaluated cases.
Environment: 10-fold cross validation, Random Forest

F-measure score   Topic Modeling   N-Gram IDF
HTTPClient             0.717          0.721
Jackrabbit             0.712          0.756
Lucene                 0.771          0.814
All Projects           0.792          0.823
N-Gram IDF also outperforms topic modeling in all evaluated cases on the training-testing setup.
Environment: Training-Testing Dataset, Random Forest

F-measure score   Topic Modeling   N-Gram IDF
HTTPClient             0.494          0.514
Jackrabbit             0.542          0.566
Lucene                 0.628          0.673
All Projects           0.684          0.685
To assess the results, we conducted 1,000 runs of random forest for both N-Gram IDF and topic modeling.
The differences are statistically significant (p-value < 0.001).
(Score distributions shown per project: HTTPClient, Jackrabbit, Lucene, All Projects.)
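The slides do not name the statistical test used; one common non-parametric choice for comparing two sets of F-measure scores is the Mann-Whitney U test, sketched below with a normal approximation (an assumption, not necessarily the authors' test; ties are ignored for simplicity).

```python
# Sketch: two-sided Mann-Whitney U test via the normal approximation.
# (Assumed test; ties ignored -- fine for illustration, not small samples.)
import math

def mann_whitney_u(xs, ys):
    """Return (U statistic of xs, two-sided approximate p-value)."""
    n1, n2 = len(xs), len(ys)
    combined = sorted((v, 0 if i < n1 else 1) for i, v in enumerate(xs + ys))
    # Rank-sum of the first sample (ranks start at 1).
    r1 = sum(rank for rank, (_, grp) in enumerate(combined, start=1) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u1, p

# Illustrative score sets (not the paper's data): one method consistently
# scores higher than the other across runs.
better = [0.80 + 0.001 * i for i in range(20)]
worse = [0.70 + 0.001 * i for i in range(20)]
u, p = mann_whitney_u(better, worse)
print(p < 0.001)  # True
```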
We found that N-Gram IDF can extract terms that vary in both context and length. These terms can be used as features that contribute to the classification model.
§ HTTPClient-512 (Bug)
§ JCR-1437 (Non-Bug)
§ JCR-119 (Bug)
