SlideShare a Scribd company logo
BushraDBR: An Automatic Approach
to Retrieving Duplicate Bug Reports
Ra'Fat Al-Msie'deen
Department of Software Engineering, Faculty of IT, Mutah University, Mutah 61710, Karak,
Jordan
E-mail address: rafatalmsiedeen@mutah.edu.jo
https://rafat66.github.io/Al-Msie-Deen/
Citation
• R. Al-Msie’deen, “BushraDBR: An Automatic Approach to Retrieving
Duplicate Bug Reports,” International Journal of Computing and
Digital Systems, vol. 14, no. 2, pp. 1-18, 2023.
Keywords
Keywords: Software engineering, Software maintenance, Duplicate bug
reports, Formal concept analysis, Latent semantic indexing, Bug
tracking system, Bug report.
Abstract
A Bug Tracking System (BTS), such as Bugzilla,
is generally utilized to track submitted Bug
Reports (BRs) for a particular software system.
Duplicate Bug Report (DBR) retrieval is the
process of obtaining a DBR in the BTS. This
process is important to avoid needless work
from engineers on DBRs. To prevent wasting
engineer resources, such as effort and time, on
previously submitted (or duplicate) BRs, it is
essential to find and retrieve DBRs as soon as
they are submitted by software users.
Abstract …
Thus, this paper proposes an automatic approach
(called BushraDBR) that aims to assist an
engineer (called a triager) to retrieve DBRs and
stop the duplicates before they start. Where
BushraDBR stands for Bushra Duplicate Bug
Reports retrieval process. Therefore, when a new
BR is sent to the Bug Repository (BRE), an
engineer checks whether it is a duplicate of an
existing BR in BRE or not via BushraDBR approach.
If it is, the engineer marks it as DBR, and the BR is
excluded from consideration for any additional
work; otherwise, the BR is added to the BRE.
BushraDBR approach relies on Textual Similarity
(TS) between the newly submitted BR and the rest
of the BRs in BRE to retrieve DBRs.
Abstract …
BushraDBR exploits unstructured data from BRs
to apply Information Retrieval (IR) methods in
an efficient way. BushraDBR approach uses two
techniques to retrieve DBRs: Latent Semantic
Indexing (LSI) and Formal Concept Analysis
(FCA). The originality of BushraDBR is to stop
DBRs before they occur by comparing the newly
reported BR with the rest of the BRs in the BTS,
thus saving time and effort during the Software
Maintenance (SM) process. BushraDBR also
uniquely retrieves DBR through the use of LSI and
FCA techniques. BushraDBR approach had been
validated and evaluated on several publicly
available data sets from Bugzilla. Experiments
show the ability of BushraDBR approach to
retrieve DBRs in an efficient and accurate
manner.
BushraDBR
Approach
List of Abbreviations
• Bug Tracking System (BTS)
• Bug Reports (BRs)
• Duplicate Bug Report (DBR)
• Bug Repository (BRE)
• Information Retrieval (IR)
• Latent Semantic Indexing (LSI)
• Software Maintenance (SM)
• Formal Concept Analysis (FCA)
• Master BRs set (MRs)
Vector Space Model (VSM)
Deep Learning (DL)
Machine learning (ML)
Convolutional Neural Network (CNN)
New BR (Nr)
Natural Language Processing (NLP)
Quality Assurance (QA)
Drawing Shapes Application (DSA)
Bushra Duplicate Bug Reports retrieval process (BushraDBR)
Cosine Similarity Matrix (CSM)
Cosine Similarity (CS)
Term-Query Matrix (TQM)
Term Document Matrix (TDM)
Textual Similarity (TS)
List of DBRs (LDRs)
Binary Formal Context (BFC)
Singular Value Decomposition (SVD)
Introduction
 In this work, reports that are textually similar to each other are called DBRs.
 TS between BRs is good evidence that they describe the same (or similar) issue.
 Frequently, DBRs are reported by multiple software users.
 These users come from different backgrounds and use different vocabularies to describe
the same (or similar) software bug.
 The important role of the triager is to check the reported BRs for any possible duplicates
before sending them to the BRE.
 A manual check of submitted BRs is a difficult task due to the huge number of BRs
reported every day.
 On the other hand, retrieving DBR after storing it in the BRE is a tedious and expensive task
for software developers.
 So, the process of stopping DBRs before they start is important and very useful.
Introduction …
• The novelty of this paper is that it proposes a textual-based
approach (i.e., an IR-based solution) to stopping DBRs before
they start.
• BushraDBR prevents DBRs by continuously checking the
recently reported or submitted BR against the BRs stored in the
BTS.
• In the event that the submitted BR is textually similar to any of
the BRs inside the BTS, the triager will exclude this BR from any
further work and not include it in the BTS.
Figure 1. An
example of
a BR from
drawing
shapes
application.
“Bug details”
Bug ID: 000001.
Product: Drawing shapes App.
Submitted on: Sept. 12, 2022.
Summary: The PDF viewer is very slow for line drawings.
Component: Layout.
Type: defect.
Severity: S4.
Priority: P5.
Status: Assigned.
“Bug description”
Actual results:
PDF is very slow to load. The text parts of the PDF appear fast, but
every line drawing is very slow.
Expected results:
PDFs with line drawings should show up a lot faster without any delay.
TABLE I. A pair
of DBRs from
Bugzilla (i.e.,
core product).
Figure 2. The workflow
for retrieving DBRs via
BushraDBR approach.
BR_001: …
Bug repository
BushraDBR
approach
Similarity
BR documents
BR_002: …
BR_003: …
…
BR_n: …
BR_001: 0.02
BR_002: 0.58
BR_003: 0.98
…
BR_n: 0.45
Duplicate
Nr_Bug-ID
BR _003
Duplicate or
non-duplicate
New bug report (Nr)
Nr_Bug-ID
Query
LSI
Test duplicate bug reports (DBRs)

Nr
FCA
Figure 3. The key
elements of
BurshraDBR
approach.
New bug report (Nr)
BR documents from BTS
Duplicate BR
Non-duplicate BR
Threshold = 80
LSI
FCA
OUTPUT
DATA SETS
EVALUATION METRICS Precision
F-measure
Recall
WebPayments
UI
Firefo
x
Symbolication &
Eliot
DOM: Editor
Audio/Video
Core
Drawing Shapes App
TECHNIQUES
BurshraDBR
approach
TABLE II. Structured
and unstructured
information that is
leveraged by the
selected approaches
(i.e., survey).
TABLE III. Data
sets that are
utilized by the
selected
approaches
(i.e., survey).
TABLE IV. Review and classification of current studies relevant for DBR retrieval approaches.
Figure 4. DBRs describe the
same software failure and use
similar vocabularies.
Bug Report X
Bug Report Y
Bug X Bug Tracking
System (BTS)
Triager or
developer
BushraDBR
approach
Non-duplicate
Duplicate
Master (or original) bug report
Software
user
Submit
Software
failure X
Nr
Similar
vocabularies
Figure 5. The DBR
retrieval process -
BushraDBR approach.
Master bug reports
set (MRs)
List of duplicate bug reports (LDRs)
Repository
BushraDBR approach
Non-duplicate
Duplicate
No
Yes
BR documents
Similarity
value >=
threshold
(Thr)
Insert into
MRs
Insert into
LDRs
New bug
report (Nr)
Query (Nr)
document
Inputs
Latent Semantic Indexing
Formal Concept Analysis
Preprocessing
Figure 6. An example of a bug
document generated by BushraDBR
approach.
Bug Document
BushraDBR
Bug ID: 000007
Incapable of using the
mouse wheel to scroll on an
art site.
Scrolling the contents of
the art site by clicking
the scroll wheel of the
mouse device does not
always work well.
Algorithm 1:
Preprocessing
of BRs.
Figure 7. An
example of a
pre-processed
bug
document.
Bug Document
BushraDBR
Bug ID: 000007
incapable
use
mouse
wheel
scroll
art
site
scroll
content
art
site
click
scroll
wheel
mouse
device
always
work
well
1. Extracting
bug content
2. Splitting
words or
tokenization
3. Removing
stop words
4. Stemming
Pre-processed document
TABLE V. Bug reports from the drawing shapes application data set.
TABLE VI. BushraDBR preprocessing steps, explanation, and a real-world example
from the DSA data set.
Figure 8. Retrieving DBRs
using LSI and FCA
techniques — the
BushraDBR approach.
BushraDBR approach
Latent Semantic Indexing
2
Formal Concept Analysis
3
Extracting
Similarity matrix (SM)
SM as binary formal context
Nr
BR-1 BR-2 BR-N
0.01 0.93 0.07
Concept_0
BR-2
Nr
Nr
BRs
Nr
BR-1 BR-2 BR-N
0 1 0
Nr
BR-1 BR-2 BR-N
x
The AOC-poset
Formal context
DBRs
Formal
concept
..., of the myOval()
and … displayed, …
…, of, the, my, oval,
and, ..., displayed, ..
.., oval, .., displayed,
…
…, oval, …, display, ..
Splitting
Preprocessing
1
The intent
of concept
The extent of concept
Stop words Stemming
Algorithm 2:
Measuring
TS between
BRs via LSI.
Measuring TS between BRs via LSI
1
2
3
4
Figure 9. The TS
values between
Nr and BRs as a
directed graph.
Algorithm 3: Retrieving DBRs using FCA.
Retrieving DBRs using FCA
Figure 10. The AOC-poset for the
DSA data set.
1
2
3
4
Figure 10. The AOC-
poset for DSA data
set.
Formal
concept
analysis
Concept_0
1819151
1819152
Drawing Shapes
Application
(DSA)
Symbolication
& General -
Eliot product
Concept_0
1490824
1510066
WebPayments UI
– Firefox product
(partial)
Concept_0
1545235
1545237
Audio/Video –
Core product
(partial)
DOM editor –
Core product
(partial)
Concept_0
176525
1841744
Concept_0
000006
000007
The extent of
Concept_0
The intent of
Concept_0
The
concept ID
Formal concept
Concept_0
000006
000007
Formal concept analysis
Master bug reports set (MRs)
List of duplicate
bug reports
(LDRs)
Repository
BushraDBR approach
Non-duplicate
Duplicate
No
Yes
Preprocessing
Latent Semantic Indexing
1
2
Formal Concept Analysis
3
Extracting Splitting
Stop words Stemming
Similarity matrix (SM)
SM as binary formal context
BR documents
Similarity
value >=
threshold
(Thr)
Nr
BR-1 BR-2 BR-N
0.01 0.93 0.07
Nr
BR-1 BR-2 BR-N
0 1 0
Insert into MRs
Insert into
LDRs
Concept_0
BR-2
Nr
New bug
report (Nr)
Query (Nr)
document
Figure 9. The TS
values between Nr
and BRs as a directed
graph.
000007
000002
000003
000004
000005
000001
000006
New bug report
(Nr)
List of duplicate
bug reports (LDRs)
BushraDBR approach Non-duplicate
Duplicate
No
Yes
Preprocessing
Latent Semantic Indexing
1
2
Formal Concept Analysis
3
Extracting Splitting
Stop words Stemming
Similarity matrix (SM)
SM as binary formal context
Query (Nr) document
Similarity
value >=
threshold
(Thr)
Insert into LDRs
BushraDBR
approach
Experimentation
Master bug
reports set (MRs)
Repository
BR documents
data sets
Cosine similarity matrices
Figure 10. The AOC-poset for each data set in
the experiments: (A) DSA, (B) Eliot, (C)
WebPayments UI [partial], (D) Audio / Video
[partial], and (E) DOM editor [partial].
Concept_0
1819151
1819152
Concept_1
1768863
1814509
1729698
1741434
1801212
1812345
1815981
1707879
1815982
1811299
1801169
1811236
1475334
1649535
1745533
Concept_1
000005
000003
000004
000001
000002
Concept_0
1490824
1510066
Concept_1
1476344
1498447
1446577
1498225
1438784
1494439
1501447
1499837
1507623
1464356
1494559
1470197
1495151
.
.
Concept_0
1545235
1545237
Concept_1
1582074
1673285
1532646
1732199
1816175
1776641
1680362
1743870
1676015
1691996
1545237
1757124
1799132
.
.
Concept_1
1088194
201410
1036856
1276391
1327934
1723853
1840784
1710784
458524
460903
377297
1462368
1567160
.
.
Concept_0
176525
1841744
Concept_0
000006
000007
(A) (B) (C) (D) (E)
Conclusion
• This paper has introduced a novel
approach called BushraDBR targeted
at automatically retrieving DBRs
using LSI and FCA. BushraDBR aimed
to prevent developers from wasting
their resources, such as effort and
time, on previously submitted BRs.
The novelty of BushraDBR is that it
exploits textual data in BRs to apply
LSI and FCA techniques in an efficient
way to retrieve DBRs.
Preprocessing
Latent Semantic Indexing
1
2
Formal Concept Analysis
3
Extracting Splitting
Stop words Stemming
Similarity matrix (SM)
SM as binary formal context
Conclusion …
• BushraDBR prevents DBRs
before they occur by
comparing the newly reported
BR with the rest of the BRs in
the repository. The suggested
approach had been validated
and evaluated on different data
sets from Bugzilla. Experiments
show the capacity of
BushraDBR approach to
retrieve DBRs in an efficient
and accurate manner.
Future work
• Regarding BushraDBR's future work, the author
plans to extend the current approach by
developing an ML-based solution to retrieve DBRs
and prevent duplicates before they start. Also, he
plans to compare BushraDBR (i.e., an IR-based
approach) with current ML-based approaches.
Furthermore, additional empirical tests can be
conducted to verify BushraDBR approach using
open-source and industrial data sets. There is also
a necessary need to conduct a comprehensive
survey and make comparisons between all current
approaches relevant to DBR retrieval.
New bug
report (Nr)
List of duplicate
bug reports
(LDRs)
Non-duplicate
Duplicate
No
Yes
Query (Nr) document
Similarity value >= threshold (Thr)
Insert into
LDRs
References
[3] N. Jalbert and W. Weimer, “Automated duplicate detection for bug tracking systems,” in The 38th Annual IEEE / IFIP
International Conference on Dependable Systems and Networks, DSN 2008, June 24-27, 2008, Anchorage, Alaska, USA, Proceedings.
IEEE Computer Society, 2008, pp. 52–61. [Online]. Available: https://doi.org/10.1109/DSN.2008.4630070
[4] A. Hindle and C. Onuczko, “Preventing duplicate bug reports by continuously querying bug reports,” Empir. Softw. Eng., vol.
24, no. 2, pp. 902–936, 2019. [Online]. Available: https://doi.org/10.1007/s10664-018-9643-4
[11]A. Kukkar, R. Mohana, Y. Kumar, A. Nayyar, M. Bilal, and K. Kwak, “Duplicate bug report detection and classification system
based on deep learning technique,” IEEE Access, vol. 8, pp. 200 749–200 763, 2020. [Online]. Available:
https://doi.org/10.1109/ACCESS.2020.3033045
[17] X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun, “An approach to detecting duplicate bug reports using natural language and
execution information,” in 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany, May 10-18,
2008, W. Sch¨afer, M. B. Dwyer, and V. Gruhn, Eds. ACM, 2008, pp. 461–470. [Online]. Available:
https://doi.org/10.1145/1368088.1368151
[36] C. Sun, D. Lo, X. Wang, J. Jiang, and S. Khoo, “A discriminative model approach for accurate duplicate bug report
retrieval,” in Proceedings of the 32nd ACM / IEEE International Conference on Software Engineering - Volume 1, ICSE 2010, Cape
Town, South Africa, 1-8 May 2010, J. Kramer, J. Bishop, P. T. Devanbu, and S. Uchitel, Eds. ACM, 2010, pp. 45–54. [Online].
Available: https://doi.org/10.1145/1806799.1806811
[37] C. Sun, D. Lo, S. Khoo, and J. Jiang, “Towards more accurate retrieval of duplicate bug reports,” in 26th IEEE / ACM
International Conference on Automated Software Engineering (ASE 2011), Lawrence, KS, USA, November 6-10, 2011, P. Alexander,
C. S. Pasareanu, and J. G. Hosking, Eds. IEEE Computer Society, 2011, pp. 253–262. [Online]. Available:
https://doi.org/10.1109/ASE.2011.6100061
[38] F. Thung, P. S. Kochhar, and D. Lo, “Dupfinder: integrated tool support for duplicate bug report detection,” in ACM / IEEE
International Conference on Automated Software Engineering, ASE ’14, Vasteras, Sweden - September 15 - 19, 2014, I. Crnkovic, M.
Chechik, and P. Gr¨unbacher, Eds. ACM, 2014, pp. 871–874. [Online]. Available: https://doi.org/10.1145/2642937.2648627
[39] J. He, L. Xu, M. Yan, X. Xia, and Y. Lei, “Duplicate bug report detection using dual-channel convolutional neural
networks,” in ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13-15, 2020.
ACM, 2020, pp. 117–127. [Online]. Available: https://doi.org/10.1145 /3387904.3389263
[40] P. Runeson, M. Alexandersson, and O. Nyholm, “Detection of duplicate defect reports using natural language processing,” in
29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, May 20-26, 2007. IEEE Computer
Society, 2007, pp. 499–510. [Online]. Available: https://doi.org/10.1109/ICSE.2007.32
[44] M. S. Rakha, C. Bezemer, and A. E. Hassan, “Revisiting the performance evaluation of automated approaches for the
retrieval of duplicate issue reports,” IEEE Trans. Software Eng., vol. 44, no. 12, pp. 1245–1268, 2018. [Online]. Available:
https://doi.org/10.1109/TSE.2017.2755005
[47] B. S. Neysiani and S. Morteza Babamir, “Automatic duplicate bug report detection using information retrieval-based versus
machine learning-based approaches,” in 2020 6th International Conference on Web Research (ICWR), April 2020, pp. 288–293.
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug
Reports
BushraDBR: An Automatic Approach
to Retrieving Duplicate Bug Reports
Ra'Fat Al-Msie'deen
Department of Software Engineering, Faculty of IT, Mutah University, Mutah 61710, Karak,
Jordan
E-mail address: rafatalmsiedeen@mutah.edu.jo
https://rafat66.github.io/Al-Msie-Deen/

More Related Content

Similar to BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports

Mi0034 – database management system
Mi0034 – database management systemMi0034 – database management system
Mi0034 – database management system
smumbahelp
 
realtime system.docx
realtime system.docxrealtime system.docx
realtime system.docx
Pankaj Mudriwar
 
Software Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction TechniquesSoftware Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction Techniquesaliraza786
 
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSUSING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
HCL Technologies
 
Named Entity Recognition (NER) Using Automatic Summarization of Resumes
Named Entity Recognition (NER) Using Automatic Summarization of ResumesNamed Entity Recognition (NER) Using Automatic Summarization of Resumes
Named Entity Recognition (NER) Using Automatic Summarization of Resumes
IRJET Journal
 
Machine Learning Applications in Grid Computing
Machine Learning Applications in Grid ComputingMachine Learning Applications in Grid Computing
Machine Learning Applications in Grid Computingbutest
 
CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)
CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)
CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)Sung Kim
 
Program for the 3rd IC Applications Show & Tell
Program for the 3rd IC Applications Show & TellProgram for the 3rd IC Applications Show & Tell
Program for the 3rd IC Applications Show & Tellnihshowandtell
 
IJET-V2I6P28
IJET-V2I6P28IJET-V2I6P28
As35252256
As35252256As35252256
As35252256
IJERA Editor
 
Optimizing content based image retrieval in p2 p systems
Optimizing content based image retrieval in p2 p systemsOptimizing content based image retrieval in p2 p systems
Optimizing content based image retrieval in p2 p systems
eSAT Publishing House
 
Bird binary interpretation using runtime disassembly
Bird binary interpretation using runtime disassemblyBird binary interpretation using runtime disassembly
Bird binary interpretation using runtime disassemblyUltraUploader
 
Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...
eSAT Journals
 
I0935053
I0935053I0935053
I0935053
IOSR Journals
 
Multibiometric Secure Index Value Code Generation for Authentication and Retr...
Multibiometric Secure Index Value Code Generation for Authentication and Retr...Multibiometric Secure Index Value Code Generation for Authentication and Retr...
Multibiometric Secure Index Value Code Generation for Authentication and Retr...
ijsrd.com
 
Lecture 02 architecture of dbms
Lecture 02 architecture of dbmsLecture 02 architecture of dbms
Lecture 02 architecture of dbmsrupalidhir
 
A study on cloud computing ppt n_24-12-2017
A study on cloud computing ppt n_24-12-2017A study on cloud computing ppt n_24-12-2017
A study on cloud computing ppt n_24-12-2017
Manish K Patel
 

Similar to BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports (20)

Mi0034 – database management system
Mi0034 – database management systemMi0034 – database management system
Mi0034 – database management system
 
realtime system.docx
realtime system.docxrealtime system.docx
realtime system.docx
 
Software Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction TechniquesSoftware Effort Measurement Using Abstraction Techniques
Software Effort Measurement Using Abstraction Techniques
 
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSUSING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
 
Named Entity Recognition (NER) Using Automatic Summarization of Resumes
Named Entity Recognition (NER) Using Automatic Summarization of ResumesNamed Entity Recognition (NER) Using Automatic Summarization of Resumes
Named Entity Recognition (NER) Using Automatic Summarization of Resumes
 
Machine Learning Applications in Grid Computing
Machine Learning Applications in Grid ComputingMachine Learning Applications in Grid Computing
Machine Learning Applications in Grid Computing
 
Bapi programming
Bapi programmingBapi programming
Bapi programming
 
CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)
CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)
CosTriage: A Cost-Aware Algorithm for Bug Reporting Systems (AAAI 2011)
 
Program for the 3rd IC Applications Show & Tell
Program for the 3rd IC Applications Show & TellProgram for the 3rd IC Applications Show & Tell
Program for the 3rd IC Applications Show & Tell
 
IJET-V2I6P28
IJET-V2I6P28IJET-V2I6P28
IJET-V2I6P28
 
As35252256
As35252256As35252256
As35252256
 
Optimizing content based image retrieval in p2 p systems
Optimizing content based image retrieval in p2 p systemsOptimizing content based image retrieval in p2 p systems
Optimizing content based image retrieval in p2 p systems
 
Bird binary interpretation using runtime disassembly
Bird binary interpretation using runtime disassemblyBird binary interpretation using runtime disassembly
Bird binary interpretation using runtime disassembly
 
Bd25328332
Bd25328332Bd25328332
Bd25328332
 
Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...
 
I0935053
I0935053I0935053
I0935053
 
Multibiometric Secure Index Value Code Generation for Authentication and Retr...
Multibiometric Secure Index Value Code Generation for Authentication and Retr...Multibiometric Secure Index Value Code Generation for Authentication and Retr...
Multibiometric Secure Index Value Code Generation for Authentication and Retr...
 
Lecture 02 architecture of dbms
Lecture 02 architecture of dbmsLecture 02 architecture of dbms
Lecture 02 architecture of dbms
 
A study on cloud computing ppt n_24-12-2017
A study on cloud computing ppt n_24-12-2017A study on cloud computing ppt n_24-12-2017
A study on cloud computing ppt n_24-12-2017
 
disertation
disertationdisertation
disertation
 

More from Ra'Fat Al-Msie'deen

SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdf
SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdfSoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdf
SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdf
Ra'Fat Al-Msie'deen
 
Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
Ra'Fat Al-Msie'deen
 
Source Code Summarization
Source Code SummarizationSource Code Summarization
Source Code Summarization
Ra'Fat Al-Msie'deen
 
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...
Ra'Fat Al-Msie'deen
 
YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...
YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...
YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...
Ra'Fat Al-Msie'deen
 
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
Ra'Fat Al-Msie'deen
 
Automatic Labeling of the Object-oriented Source Code: The Lotus Approach
Automatic Labeling of the Object-oriented Source Code: The Lotus ApproachAutomatic Labeling of the Object-oriented Source Code: The Lotus Approach
Automatic Labeling of the Object-oriented Source Code: The Lotus Approach
Ra'Fat Al-Msie'deen
 
Constructing a software requirements specification and design for electronic ...
Constructing a software requirements specification and design for electronic ...Constructing a software requirements specification and design for electronic ...
Constructing a software requirements specification and design for electronic ...
Ra'Fat Al-Msie'deen
 
Detecting commonality and variability in use-case diagram variants
Detecting commonality and variability in use-case diagram variantsDetecting commonality and variability in use-case diagram variants
Detecting commonality and variability in use-case diagram variants
Ra'Fat Al-Msie'deen
 
Naming the Identified Feature Implementation Blocks from Software Source Code
Naming the Identified Feature Implementation Blocks from Software Source CodeNaming the Identified Feature Implementation Blocks from Software Source Code
Naming the Identified Feature Implementation Blocks from Software Source Code
Ra'Fat Al-Msie'deen
 
Application architectures - Software Architecture and Design
Application architectures - Software Architecture and DesignApplication architectures - Software Architecture and Design
Application architectures - Software Architecture and Design
Ra'Fat Al-Msie'deen
 
Planning and writing your documents - Software documentation
Planning and writing your documents - Software documentationPlanning and writing your documents - Software documentation
Planning and writing your documents - Software documentation
Ra'Fat Al-Msie'deen
 
Requirements management planning & Requirements change management
Requirements management planning & Requirements change managementRequirements management planning & Requirements change management
Requirements management planning & Requirements change management
Ra'Fat Al-Msie'deen
 
Requirements change - requirements engineering
Requirements change - requirements engineeringRequirements change - requirements engineering
Requirements change - requirements engineering
Ra'Fat Al-Msie'deen
 
Requirements validation - requirements engineering
Requirements validation - requirements engineeringRequirements validation - requirements engineering
Requirements validation - requirements engineering
Ra'Fat Al-Msie'deen
 
Software Documentation - writing to support - references
Software Documentation - writing to support - referencesSoftware Documentation - writing to support - references
Software Documentation - writing to support - references
Ra'Fat Al-Msie'deen
 
Algorithms - "heap sort"
Algorithms - "heap sort"Algorithms - "heap sort"
Algorithms - "heap sort"
Ra'Fat Al-Msie'deen
 
Algorithms - "quicksort"
Algorithms - "quicksort"Algorithms - "quicksort"
Algorithms - "quicksort"
Ra'Fat Al-Msie'deen
 
Software Architecture and Design
Software Architecture and DesignSoftware Architecture and Design
Software Architecture and Design
Ra'Fat Al-Msie'deen
 
Software Architecture and Design
Software Architecture and DesignSoftware Architecture and Design
Software Architecture and Design
Ra'Fat Al-Msie'deen
 

More from Ra'Fat Al-Msie'deen (20)

SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdf
SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdfSoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdf
SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdf
 
Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
 
Source Code Summarization
Source Code SummarizationSource Code Summarization
Source Code Summarization
 
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...
 
YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...
YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...
YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...
 
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
Requirements Traceability: Recovering and Visualizing Traceability Links Betw...
 
Automatic Labeling of the Object-oriented Source Code: The Lotus Approach
Automatic Labeling of the Object-oriented Source Code: The Lotus ApproachAutomatic Labeling of the Object-oriented Source Code: The Lotus Approach
Automatic Labeling of the Object-oriented Source Code: The Lotus Approach
 
Constructing a software requirements specification and design for electronic ...
Constructing a software requirements specification and design for electronic ...Constructing a software requirements specification and design for electronic ...
Constructing a software requirements specification and design for electronic ...
 
Detecting commonality and variability in use-case diagram variants
Detecting commonality and variability in use-case diagram variantsDetecting commonality and variability in use-case diagram variants
Detecting commonality and variability in use-case diagram variants
 
Naming the Identified Feature Implementation Blocks from Software Source Code
Naming the Identified Feature Implementation Blocks from Software Source CodeNaming the Identified Feature Implementation Blocks from Software Source Code
Naming the Identified Feature Implementation Blocks from Software Source Code
 
Application architectures - Software Architecture and Design
Application architectures - Software Architecture and DesignApplication architectures - Software Architecture and Design
Application architectures - Software Architecture and Design
 
Planning and writing your documents - Software documentation
Planning and writing your documents - Software documentationPlanning and writing your documents - Software documentation
Planning and writing your documents - Software documentation
 
Requirements management planning & Requirements change management
Requirements management planning & Requirements change managementRequirements management planning & Requirements change management
Requirements management planning & Requirements change management
 
Requirements change - requirements engineering
Requirements change - requirements engineeringRequirements change - requirements engineering
Requirements change - requirements engineering
 
Requirements validation - requirements engineering
Requirements validation - requirements engineeringRequirements validation - requirements engineering
Requirements validation - requirements engineering
 
Software Documentation - writing to support - references
Software Documentation - writing to support - referencesSoftware Documentation - writing to support - references
Software Documentation - writing to support - references
 
Algorithms - "heap sort"
Algorithms - "heap sort"Algorithms - "heap sort"
Algorithms - "heap sort"
 
Algorithms - "quicksort"
Algorithms - "quicksort"Algorithms - "quicksort"
Algorithms - "quicksort"
 
Software Architecture and Design
Software Architecture and DesignSoftware Architecture and Design
Software Architecture and Design
 
Software Architecture and Design
Software Architecture and DesignSoftware Architecture and Design
Software Architecture and Design
 

Recently uploaded

Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 

Recently uploaded (20)

Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 

BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports

  • 1. BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports Ra'Fat Al-Msie'deen Department of Software Engineering, Faculty of IT, Mutah University, Mutah 61710, Karak, Jordan E-mail address: rafatalmsiedeen@mutah.edu.jo https://rafat66.github.io/Al-Msie-Deen/
  • 2. Citation • R. Al-Msie’deen, “BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports,” International Journal of Computing and Digital Systems, vol. 14, no. 2, pp. 1-18, 2023.
  • 3. Keywords Keywords: Software engineering, Software maintenance, Duplicate bug reports, Formal concept analysis, Latent semantic indexing, Bug tracking system, Bug report.
  • 4. Abstract A Bug Tracking System (BTS), such as Bugzilla, is generally utilized to track submitted Bug Reports (BRs) for a particular software system. Duplicate Bug Report (DBR) retrieval is the process of obtaining a DBR in the BTS. This process is important to avoid needless work from engineers on DBRs. To prevent wasting engineer resources, such as effort and time, on previously submitted (or duplicate) BRs, it is essential to find and retrieve DBRs as soon as they are submitted by software users.
  • 5. Abstract … Thus, this paper proposes an automatic approach (called BushraDBR) that aims to assist an engineer (called a triager) to retrieve DBRs and stop the duplicates before they start. Where BushraDBR stands for Bushra Duplicate Bug Reports retrieval process. Therefore, when a new BR is sent to the Bug Repository (BRE), an engineer checks whether it is a duplicate of an existing BR in BRE or not via BushraDBR approach. If it is, the engineer marks it as DBR, and the BR is excluded from consideration for any additional work; otherwise, the BR is added to the BRE. BushraDBR approach relies on Textual Similarity (TS) between the newly submitted BR and the rest of the BRs in BRE to retrieve DBRs.
  • 6. Abstract … BushraDBR exploits unstructured data from BRs to apply Information Retrieval (IR) methods in an efficient way. BushraDBR approach uses two techniques to retrieve DBRs: Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of BushraDBR is to stop DBRs before they occur by comparing the newly reported BR with the rest of the BRs in the BTS, thus saving time and effort during the Software Maintenance (SM) process. BushraDBR also uniquely retrieves DBR through the use of LSI and FCA techniques. BushraDBR approach had been validated and evaluated on several publicly available data sets from Bugzilla. Experiments show the ability of BushraDBR approach to retrieve DBRs in an efficient and accurate manner.
  • 8. List of Abbreviations • Bug Tracking System (BTS) • Bug Reports (BRs) • Duplicate Bug Report (DBR) • Bug Repository (BRE) • Information Retrieval (IR) • Latent Semantic Indexing (LSI) • Software Maintenance (SM) • Formal Concept Analysis (FCA) • Master BRs set (MRs) Vector Space Model (VSM) Deep Learning (DL) Machine learning (ML) Convolutional Neural Network (CNN) New BR (Nr) Natural Language Processing (NLP) Quality Assurance (QA) Drawing Shapes Application (DSA) Bushra Duplicate Bug Reports retrieval process (BushraDBR) Cosine Similarity Matrix (CSM) Cosine Similarity (CS) Term-Query Matrix (TQM) Term Document Matrix (TDM) Textual Similarity (TS) List of DBRs (LDRs) Binary Formal Context (BFC) Singular Value Decomposition (SVD)
  • 9. Introduction  In this work, reports that are textually similar to each other are called DBRs.  TS between BRs is good evidence that they describe the same (or similar) issue.  Frequently, DBRs are reported by multiple software users.  These users come from different backgrounds and use different vocabularies to describe the same (or similar) software bug.  The important role of the triager is to check the reported BRs for any possible duplicates before sending them to the BRE.  A manual check of submitted BRs is a difficult task due to the huge number of BRs reported every day.  On the other hand, retrieving DBR after storing it in the BRE is a tedious and expensive task for software developers.  So, the process of stopping DBRs before they start is important and very useful.
  • 10. Introduction … • The novelty of this paper is that it proposes a textual-based approach (i.e., an IR-based solution) to stopping DBRs before they start. • BushraDBR prevents DBRs by continuously checking the recently reported or submitted BR against the BRs stored in the BTS. • In the event that the submitted BR is textually similar to any of the BRs inside the BTS, the triager will exclude this BR from any further work and not include it in the BTS.
  • 11. Figure 1. An example of a BR from drawing shapes application. “Bug details” Bug ID: 000001. Product: Drawing shapes App. Submitted on: Sept. 12, 2022. Summary: The PDF viewer is very slow for line drawings. Component: Layout. Type: defect. Severity: S4. Priority: P5. Status: Assigned. “Bug description” Actual results: PDF is very slow to load. The text parts of the PDF appear fast, but every line drawing is very slow. Expected results: PDFs with line drawings should show up a lot faster without any delay.
  • 12. TABLE I. A pair of DBRs from Bugzilla (i.e., core product).
  • 13. Figure 2. The workflow for retrieving DBRs via BushraDBR approach. BR_001: … Bug repository BushraDBR approach Similarity BR documents BR_002: … BR_003: … … BR_n: … BR_001: 0.02 BR_002: 0.58 BR_003: 0.98 … BR_n: 0.45 Duplicate Nr_Bug-ID BR _003 Duplicate or non-duplicate New bug report (Nr) Nr_Bug-ID Query LSI Test duplicate bug reports (DBRs)  Nr FCA
  • 14. Figure 3. The key elements of BurshraDBR approach. New bug report (Nr) BR documents from BTS Duplicate BR Non-duplicate BR Threshold = 80 LSI FCA OUTPUT DATA SETS EVALUATION METRICS Precision F-measure Recall WebPayments UI Firefo x Symbolication & Eliot DOM: Editor Audio/Video Core Drawing Shapes App TECHNIQUES BurshraDBR approach
  • 15. TABLE II. Structured and unstructured information that is leveraged by the selected approaches (i.e., survey).
  • 16. TABLE III. Data sets that are utilized by the selected approaches (i.e., survey).
  • 17. TABLE IV. Review and classification of current studies relevant for DBR retrieval approaches.
  • 18. Figure 4. DBRs describe the same software failure and use similar vocabularies. Bug Report X Bug Report Y Bug X Bug Tracking System (BTS) Triager or developer BushraDBR approach Non-duplicate Duplicate Master (or original) bug report Software user Submit Software failure X Nr Similar vocabularies
  • 19. Figure 5. The DBR retrieval process - BushraDBR approach. Master bug reports set (MRs) List of duplicate bug reports (LDRs) Repository BushraDBR approach Non-duplicate Duplicate No Yes BR documents Similarity value >= threshold (Thr) Insert into MRs Insert into LDRs New bug report (Nr) Query (Nr) document Inputs Latent Semantic Indexing Formal Concept Analysis Preprocessing
  • 20. Figure 6. An example of a bug document generated by BushraDBR approach. Bug Document BushraDBR Bug ID: 000007 Incapable of using the mouse wheel to scroll on an art site. Scrolling the contents of the art site by clicking the scroll wheel of the mouse device does not always work well.
  • 22. Figure 7. An example of a pre-processed bug document. Bug Document BushraDBR Bug ID: 000007 incapable use mouse wheel scroll art site scroll content art site click scroll wheel mouse device always work well 1. Extracting bug content 2. Splitting words or tokenization 3. Removing stop words 4. Stemming Pre-processed document
  • 23. TABLE V. Bug reports from the drawing shapes application data set.
  • 24. TABLE VI. BushraDBR preprocessing steps, explanation, and a real-world example from the DSA data set.
  • 25. Figure 8. Retrieving DBRs using LSI and FCA techniques — the BushraDBR approach. BushraDBR approach Latent Semantic Indexing 2 Formal Concept Analysis 3 Extracting Similarity matrix (SM) SM as binary formal context Nr BR-1 BR-2 BR-N 0.01 0.93 0.07 Concept_0 BR-2 Nr Nr BRs Nr BR-1 BR-2 BR-N 0 1 0 Nr BR-1 BR-2 BR-N x The AOC-poset Formal context DBRs Formal concept ..., of the myOval() and … displayed, … …, of, the, my, oval, and, ..., displayed, .. .., oval, .., displayed, … …, oval, …, display, .. Splitting Preprocessing 1 The intent of concept The extent of concept Stop words Stemming
  • 27. Measuring TS between BRs via LSI 1 2 3 4
  • 28.
  • 29. Figure 9. The TS values between Nr and BRs as a directed graph.
  • 30. Algorithm 3: Retrieving DBRs using FCA.
  • 31. Retrieving DBRs using FCA Figure 10. The AOC-poset for the DSA data set. 1 2 3 4
  • 32. Figure 10. The AOC- poset for DSA data set.
  • 34. Concept_0 1819151 1819152 Drawing Shapes Application (DSA) Symbolication & General - Eliot product Concept_0 1490824 1510066 WebPayments UI – Firefox product (partial) Concept_0 1545235 1545237 Audio/Video – Core product (partial) DOM editor – Core product (partial) Concept_0 176525 1841744 Concept_0 000006 000007 The extent of Concept_0 The intent of Concept_0 The concept ID Formal concept Concept_0 000006 000007 Formal concept analysis
  • 35. Master bug reports set (MRs) List of duplicate bug reports (LDRs) Repository BushraDBR approach Non-duplicate Duplicate No Yes Preprocessing Latent Semantic Indexing 1 2 Formal Concept Analysis 3 Extracting Splitting Stop words Stemming Similarity matrix (SM) SM as binary formal context BR documents Similarity value >= threshold (Thr) Nr BR-1 BR-2 BR-N 0.01 0.93 0.07 Nr BR-1 BR-2 BR-N 0 1 0 Insert into MRs Insert into LDRs Concept_0 BR-2 Nr New bug report (Nr) Query (Nr) document
  • 36. Figure 9. The TS values between Nr and BRs as a directed graph. 000007 000002 000003 000004 000005 000001 000006
  • 37. New bug report (Nr) List of duplicate bug reports (LDRs) BushraDBR approach Non-duplicate Duplicate No Yes Preprocessing Latent Semantic Indexing 1 2 Formal Concept Analysis 3 Extracting Splitting Stop words Stemming Similarity matrix (SM) SM as binary formal context Query (Nr) document Similarity value >= threshold (Thr) Insert into LDRs BushraDBR approach
  • 38. Experimentation Master bug reports set (MRs) Repository BR documents
  • 41. Figure 10. The AOC-poset for each data set in the experiments: (A) DSA, (B) Eliot, (C) WebPayments UI [partial], (D) Audio / Video [partial], and (E) DOM editor [partial]. Concept_0 1819151 1819152 Concept_1 1768863 1814509 1729698 1741434 1801212 1812345 1815981 1707879 1815982 1811299 1801169 1811236 1475334 1649535 1745533 Concept_1 000005 000003 000004 000001 000002 Concept_0 1490824 1510066 Concept_1 1476344 1498447 1446577 1498225 1438784 1494439 1501447 1499837 1507623 1464356 1494559 1470197 1495151 . . Concept_0 1545235 1545237 Concept_1 1582074 1673285 1532646 1732199 1816175 1776641 1680362 1743870 1676015 1691996 1545237 1757124 1799132 . . Concept_1 1088194 201410 1036856 1276391 1327934 1723853 1840784 1710784 458524 460903 377297 1462368 1567160 . . Concept_0 176525 1841744 Concept_0 000006 000007 (A) (B) (C) (D) (E)
  • 42. Conclusion • This paper has introduced a novel approach called BushraDBR targeted at automatically retrieving DBRs using LSI and FCA. BushraDBR aimed to prevent developers from wasting their resources, such as effort and time, on previously submitted BRs. The novelty of BushraDBR is that it exploits textual data in BRs to apply LSI and FCA techniques in an efficient way to retrieve DBRs. Preprocessing Latent Semantic Indexing 1 2 Formal Concept Analysis 3 Extracting Splitting Stop words Stemming Similarity matrix (SM) SM as binary formal context
  • 43. Conclusion … • BushraDBR prevents DBRs before they occur by comparing the newly reported BR with the rest of the BRs in the repository. The suggested approach had been validated and evaluated on different data sets from Bugzilla. Experiments show the capacity of BushraDBR approach to retrieve DBRs in an efficient and accurate manner.
  • 44. Future work • Regarding BushraDBR's future work, the author plans to extend the current approach by developing an ML-based solution to retrieve DBRs and prevent duplicates before they start. Also, he plans to compare BushraDBR (i.e., an IR-based approach) with current ML-based approaches. Furthermore, additional empirical tests can be conducted to verify BushraDBR approach using open-source and industrial data sets. There is also a necessary need to conduct a comprehensive survey and make comparisons between all current approaches relevant to DBR retrieval. New bug report (Nr) List of duplicate bug reports (LDRs) Non-duplicate Duplicate No Yes Query (Nr) document Similarity value >= threshold (Thr) Insert into LDRs
  • 46. [3] N. Jalbert and W. Weimer, “Automated duplicate detection for bug tracking systems,” in The 38th Annual IEEE / IFIP International Conference on Dependable Systems and Networks, DSN 2008, June 24-27, 2008, Anchorage, Alaska, USA, Proceedings. IEEE Computer Society, 2008, pp. 52–61. [Online]. Available: https://doi.org/10.1109/DSN.2008.4630070 [4] A. Hindle and C. Onuczko, “Preventing duplicate bug reports by continuously querying bug reports,” Empir. Softw. Eng., vol. 24, no. 2, pp. 902–936, 2019. [Online]. Available: https://doi.org/10.1007/s10664-018-9643-4 [11]A. Kukkar, R. Mohana, Y. Kumar, A. Nayyar, M. Bilal, and K. Kwak, “Duplicate bug report detection and classification system based on deep learning technique,” IEEE Access, vol. 8, pp. 200 749–200 763, 2020. [Online]. Available: https://doi.org/10.1109/ACCESS.2020.3033045 [17] X. Wang, L. Zhang, T. Xie, J. Anvik, and J. Sun, “An approach to detecting duplicate bug reports using natural language and execution information,” in 30th International Conference on Software Engineering (ICSE 2008), Leipzig, Germany, May 10-18, 2008, W. Sch¨afer, M. B. Dwyer, and V. Gruhn, Eds. ACM, 2008, pp. 461–470. [Online]. Available: https://doi.org/10.1145/1368088.1368151 [36] C. Sun, D. Lo, X. Wang, J. Jiang, and S. Khoo, “A discriminative model approach for accurate duplicate bug report retrieval,” in Proceedings of the 32nd ACM / IEEE International Conference on Software Engineering - Volume 1, ICSE 2010, Cape Town, South Africa, 1-8 May 2010, J. Kramer, J. Bishop, P. T. Devanbu, and S. Uchitel, Eds. ACM, 2010, pp. 45–54. [Online]. Available: https://doi.org/10.1145/1806799.1806811 [37] C. Sun, D. Lo, S. Khoo, and J. Jiang, “Towards more accurate retrieval of duplicate bug reports,” in 26th IEEE / ACM International Conference on Automated Software Engineering (ASE 2011), Lawrence, KS, USA, November 6-10, 2011, P. Alexander, C. S. Pasareanu, and J. G. Hosking, Eds. IEEE Computer Society, 2011, pp. 253–262. [Online]. Available: https://doi.org/10.1109/ASE.2011.6100061
  • 47. [38] F. Thung, P. S. Kochhar, and D. Lo, “Dupfinder: integrated tool support for duplicate bug report detection,” in ACM / IEEE International Conference on Automated Software Engineering, ASE ’14, Vasteras, Sweden - September 15 - 19, 2014, I. Crnkovic, M. Chechik, and P. Gr¨unbacher, Eds. ACM, 2014, pp. 871–874. [Online]. Available: https://doi.org/10.1145/2642937.2648627 [39] J. He, L. Xu, M. Yan, X. Xia, and Y. Lei, “Duplicate bug report detection using dual-channel convolutional neural networks,” in ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13-15, 2020. ACM, 2020, pp. 117–127. [Online]. Available: https://doi.org/10.1145 /3387904.3389263 [40] P. Runeson, M. Alexandersson, and O. Nyholm, “Detection of duplicate defect reports using natural language processing,” in 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, USA, May 20-26, 2007. IEEE Computer Society, 2007, pp. 499–510. [Online]. Available: https://doi.org/10.1109/ICSE.2007.32 [44] M. S. Rakha, C. Bezemer, and A. E. Hassan, “Revisiting the performance evaluation of automated approaches for the retrieval of duplicate issue reports,” IEEE Trans. Software Eng., vol. 44, no. 12, pp. 1245–1268, 2018. [Online]. Available: https://doi.org/10.1109/TSE.2017.2755005 [47] B. S. Neysiani and S. Morteza Babamir, “Automatic duplicate bug report detection using information retrieval-based versus machine learning-based approaches,” in 2020 6th International Conference on Web Research (ICWR), April 2020, pp. 288–293.
  • 48. BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports
  • 49. BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports Ra'Fat Al-Msie'deen Department of Software Engineering, Faculty of IT, Mutah University, Mutah 61710, Karak, Jordan E-mail address: rafatalmsiedeen@mutah.edu.jo https://rafat66.github.io/Al-Msie-Deen/