BUG OR NOT? BUG REPORT CLASSIFICATION USING N-GRAM IDF
Pannavat Terdchanakul, Hideaki Hata, Passakorn Phannachitta, Kenichi Matsumoto
Software Engineering Laboratory, Nara Institute of Science and Technology
Bug report misclassification is a problem in which an issue report is classified as a bug but is actually a non-bug [1].
§ An issue is classified as a bug, but its description implies that it is not related to a defect in the software (e.g., it is a request for a new feature).
§ Example: Lucene-2074 (https://issues.apache.org/jira/browse/LUCENE-2074)
[1]. Antoniol et al. "Is it a bug or an enhancement?: a text-based approach to classify change requests." CASCON 2008
Bug report misclassification is considered a serious problem.
§ It can lead bug report research to produce unreliable results.
§ Manual inspection is difficult and requires a lot of effort [2].
[2]. Herzig et al. "It's Not a Bug, It's a Feature: How Misclassification Impacts Bug Prediction." ICSE 2013
Several studies have attempted to tackle the bug report misclassification problem.
§ Herzig et al. [2] spent 90 days manually classifying over 7,000 bug reports. They found that about one-third of the reports are actually not bugs.
§ Antoniol et al. [1] proposed a word-based automatic classification technique.
§ Pingclasai et al. [3] proposed classification models based on a topic modeling technique, Latent Dirichlet Allocation (LDA).
§ Limsettho et al. [4] proposed classification models based on another topic modeling technique, Hierarchical Dirichlet Process (HDP).
§ Zhou et al. [5] proposed a hybrid approach that combines text mining and data mining techniques via a technique called data grafting.
In this work, we propose applying an alternative technique to classify bug reports: N-Gram IDF, a theoretical extension of Inverse Document Frequency (IDF).
[3]. Pingclasai et al. "Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling." APSEC 2013
[4]. Limsettho et al. "Comparing Hierarchical Dirichlet Process With Latent Dirichlet Allocation in Bug Report Multiclass Classification." SNPD 2014
[5]. Zhou et al. "Combining Text Mining and Data Mining for Bug Report Classification." Journal of Software: Evolution and Process 28.3 (2016)
N-Gram IDF is a theoretical extension of IDF for handling words or phrases of any length [6].
§ N-gram IDF utilizes an enhanced suffix array [7] to enumerate all valid N-grams.
§ In this study, we apply N-gram IDF to extract useful N-grams, along with their document frequencies, from the corpus of bug reports. The useful N-grams are used as features for the classification model.
[6]. Shirakawa et al. "N-gram IDF: A Global Term Weighting Scheme Based on Information Distance." WWW 2015
[7]. Abouelhoda et al. "Replacing Suffix Trees with Enhanced Suffix Arrays." JDA, vol. 2, Mar 2004
N-Gram IDF utilizes an enhanced suffix array to get N-grams (figure based on [6]).
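To give a feel for the weighting, here is a minimal sketch of IDF applied to n-grams over a tiny, made-up corpus. Note the caveats: this uses the plain IDF formula and naive n-gram enumeration; the actual N-gram IDF scheme [6] additionally weights terms by information distance and enumerates candidate phrases with an enhanced suffix array, both omitted here.

```python
# Sketch: plain IDF over n-grams (illustrative corpus, not the full
# information-distance weighting of Shirakawa et al. [6]).
import math
from collections import defaultdict

def ngrams(tokens, n):
    """All contiguous n-grams of length n from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_idf_table(docs, max_n=3):
    """Map each n-gram (up to max_n tokens) to its IDF over the corpus."""
    df = defaultdict(int)  # document frequency per n-gram
    for doc in docs:
        tokens = doc.lower().split()
        seen = set()
        for n in range(1, max_n + 1):
            seen.update(ngrams(tokens, n))
        for g in seen:          # count each n-gram once per document
            df[g] += 1
    num_docs = len(docs)
    return {g: math.log(num_docs / f) for g, f in df.items()}

corpus = [
    "null pointer exception when closing index",
    "add support for new query syntax",
    "null pointer exception in query parser",
]
idf = ngram_idf_table(corpus)
print(idf[("null", "pointer", "exception")])  # appears in 2 of 3 reports
print(idf[("add",)])                          # appears in 1 of 3 reports
```

Phrases that occur in fewer reports get higher weights, which is what makes rare multi-word terms useful as classification features.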
To construct the classification model, we first pre-process the bug reports. We then apply N-gram IDF to the corpus. The output is a list of valid N-gram terms.
Pipeline: Bug Reports → Pre-Processing → Pre-processed bug reports → Applying N-Gram IDF → N-grams Dictionary
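The pre-processing step can be sketched as follows. The slides do not list the exact operations, so this assumes typical text cleaning (lowercasing, punctuation stripping, stop-word removal); the stop-word list here is a tiny illustrative one.

```python
# Sketch of the pre-processing step, under assumed (typical) cleaning
# operations: lowercase, strip punctuation, drop common stop words.
import re

STOP_WORDS = {"a", "an", "the", "is", "it", "to", "of", "in", "and", "when"}

def preprocess(report):
    """Turn a raw bug report into a cleaned token string."""
    text = report.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # strip punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

reports = [
    "NullPointerException when closing the index!",
    "Add support for a new query syntax.",
]
cleaned = [preprocess(r) for r in reports]
print(cleaned[0])  # "nullpointerexception closing index"
```

The cleaned reports are then fed to N-gram IDF, which returns the dictionary of valid N-gram terms.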
We then count the raw frequency of each N-gram term and keep the values as feature vectors. We use these vectors as inputs to train our classification model.
Pipeline: Pre-processed bug reports + N-grams Dictionary → Feature Extraction → Feature Vectors → Building Classifier Models
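The feature-extraction step above can be sketched as follows: given the dictionary of selected N-grams, each pre-processed report becomes a vector of raw N-gram frequencies. The dictionary and report here are illustrative.

```python
# Sketch of feature extraction: raw N-gram frequencies per report
# (dictionary and report text are hypothetical examples).
from collections import Counter

def ngrams_upto(tokens, max_n):
    """Yield all contiguous n-grams of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def to_feature_vector(report, dictionary, max_n=3):
    """Count how often each dictionary N-gram occurs in the report."""
    counts = Counter(ngrams_upto(report.split(), max_n))
    return [counts[g] for g in dictionary]

# A tiny dictionary of N-grams (would come from the N-gram IDF step):
dictionary = [("null", "pointer"), ("query",), ("new", "feature")]
vec = to_feature_vector("null pointer in query null pointer", dictionary)
print(vec)  # [2, 1, 0]
```

These vectors are then used to train the classifier (the slides use random forest).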
The dataset consists of three open-source software projects that use JIRA as an issue tracking system.
§ The dataset comes from a previous study [2].

Project        Bug#   Non-Bug#   Total#
HTTPClient      305        440      745
Jackrabbit      938       1464     2402
Lucene          697       1746     2443
All Projects   1964       3650     5590
We evaluate the performance of the proposed model by comparing it with a classification model built using a topic modeling technique.

Text Processing Techniques
1. N-Gram IDF
2. Topic Modeling (LDA) [3]

Testing Environments
1. 10-fold cross validation
2. Training & testing dataset

Evaluation Metric
F-measure score
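The F-measure (the harmonic mean of precision and recall) can be computed as follows for the binary bug / non-bug case; the counts in the example are made up.

```python
# F-measure from confusion-matrix counts (binary bug / non-bug case).
def f_measure(tp, fp, fn):
    """F1 score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Example: 80 bugs correctly found, 20 false alarms, 40 bugs missed.
print(round(f_measure(80, 20, 40), 3))  # 0.727
```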
On the 10-fold cross validation setup, the N-Gram IDF model outperforms the topic modeling model in all evaluated cases.
Environment: 10-fold cross validation, Random Forest

F-measure score   Topic Modeling   N-Gram IDF
HTTPClient             0.717          0.721
Jackrabbit             0.712          0.756
Lucene                 0.771          0.814
All Projects           0.792          0.823
N-Gram IDF also outperforms topic modeling in all evaluated cases on the training-testing setup.
Environment: Training-Testing Dataset, Random Forest

F-measure score   Topic Modeling   N-Gram IDF
HTTPClient             0.494          0.514
Jackrabbit             0.542          0.566
Lucene                 0.628          0.673
All Projects           0.684          0.685
To assess the results, we conducted 1,000 runs of random forest for both N-Gram IDF and topic modeling.
The differences are statistically significant (p-value < 0.001).
(Score distributions shown per project: HTTPClient, Jackrabbit, Lucene, All Projects.)
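The slides do not name the statistical test used; one common non-parametric choice for comparing two sets of F-measure scores is the Mann-Whitney U test, sketched below with a normal approximation (an assumption, not necessarily the authors' test; ties are ignored for simplicity).

```python
# Sketch: two-sided Mann-Whitney U test via the normal approximation.
# (Assumed test; ties ignored -- fine for illustration, not small samples.)
import math

def mann_whitney_u(xs, ys):
    """Return (U statistic of xs, two-sided approximate p-value)."""
    n1, n2 = len(xs), len(ys)
    combined = sorted((v, 0 if i < n1 else 1) for i, v in enumerate(xs + ys))
    # Rank-sum of the first sample (ranks start at 1).
    r1 = sum(rank for rank, (_, grp) in enumerate(combined, start=1) if grp == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u1, p

# Illustrative score sets (not the paper's data): one method consistently
# scores higher than the other across runs.
better = [0.80 + 0.001 * i for i in range(20)]
worse = [0.70 + 0.001 * i for i in range(20)]
u, p = mann_whitney_u(better, worse)
print(p < 0.001)  # True
```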
We found that N-Gram IDF can extract terms that vary in both context and length. These terms can be used as features that contribute to the classification model.
§ HTTPClient-512 (Bug)
§ JCR-1437 (Non-Bug)
§ JCR-119 (Bug)
