Revisiting the Experimental Design Choices
for Approaches for the Automated Retrieval
of Duplicate Issue Reports
Ph.D. Candidate: Mohamed Sami Rakha
Supervisor: Prof. Ahmed E. Hassan
Ph.D. Thesis Defense
Software issues are common
There are many issue tracking systems
Issue tracking systems receive a large number of new issue reports (over 1.2 million issue reports).
Different persons (User #1 and User #2) may report the same issue: one report is kept as the Master and the other is marked as a Duplicate.
The identification of duplicate issue reports is an annoying and time-wasting task.
20-30% of issue reports are duplicates.
Bettenburg et al., "Duplicate bug reports considered harmful… really?" [ICSM 2008]
Information retrieval is leveraged by prior research: given a newly reported duplicate, an automated approach searches the previously reported issues and returns a ranked list of the most similar candidates (potential masters).
The developer then chooses the correct candidate for the newly reported duplicate from the ranked list… if there is any.
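To make the retrieval step concrete, here is a minimal sketch in Python that ranks previously reported issues by textual similarity to a new report. TF-IDF cosine similarity is used only as a simple stand-in for the BM25F/REP-style similarity of prior research; the issue texts and the top_k parameter are illustrative assumptions.

# Minimal sketch: rank previously reported issues by textual similarity
# to a newly reported issue. TF-IDF cosine similarity is a simple
# stand-in for BM25F/REP-style similarity from prior research.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(new_report: str, previous_reports: list[str], top_k: int = 10):
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the previously reported issues plus the new report.
    matrix = vectorizer.fit_transform(previous_reports + [new_report])
    similarities = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    # Return indices of the most similar previous reports, best first.
    ranked = similarities.argsort()[::-1][:top_k]
    return [(int(i), float(similarities[i])) for i in ranked]

# Example usage (hypothetical issue texts):
candidates = rank_candidates(
    "Crash when opening a large attachment",
    ["Crash on opening big attachments", "Toolbar icons are misaligned"],
)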
Several automated approaches for duplicate retrieval have been proposed by prior research
Prior Research
2007: Runeson et al.
2008: Wang et al.
2010: Sun et al.
2010: Sureka et al.
2011: Sun et al.
2012: Zhou et al.
2012: Nguyen et al.
2015: Aggarwal et al.
2016: Hindle et al.
2016: Jie et al.
Prior research focuses on improving the performance (e.g., the recall rate) of the automated approaches.
The goal of a newly proposed approach is to achieve higher performance than prior ones.
Prior Research
2007: Runeson et al.
2008: Wang et al.
2010: Sun et al.
2010: Sureka et al.
2011: Sun et al.
2012: Zhou et al.
2012: Nguyen et al.
2015: Aggarwal et al.
2016: Hindle et al.
2016: Jie et al.
In my thesis, I revisit the experimental design choices from four perspectives.
The goal of a newly proposed approach is to achieve better performance than prior ones.
Example: Approach A (Recall Rate = 0.6) > Approach B (Recall Rate = 0.5)
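As a reference point, a minimal sketch of how such a recall rate is typically computed for duplicate retrieval: a duplicate counts as a hit when its true master appears among the top-k suggestions. The report identifiers and k below are illustrative assumptions.

# Minimal sketch: Recall Rate@k for duplicate retrieval.
# A duplicate counts as a hit when its true master appears within
# the top-k suggestions returned by the automated approach.
def recall_rate_at_k(suggestions: dict[str, list[str]],
                     true_master: dict[str, str],
                     k: int = 10) -> float:
    hits = sum(
        1 for dup, candidates in suggestions.items()
        if true_master[dup] in candidates[:k]
    )
    return hits / len(suggestions) if suggestions else 0.0

# Example usage (hypothetical report ids):
suggestions = {"dup-1": ["m-7", "m-3"], "dup-2": ["m-9"]}
true_master = {"dup-1": "m-3", "dup-2": "m-4"}
print(recall_rate_at_k(suggestions, true_master, k=5))  # 0.5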
Prior Research
2007: Runeson et al.
2008: Wang et al.
2010: Sun et al.
2010: Sureka et al.
2011: Sun et al.
2012: Zhou et al.
2012: Nguyen et al.
2015: Aggarwal et al.
2016: Hindle et al.
2016: Jie et al.
Revisiting the experimental design choices can help better understand the benefits and limitations of the evaluation of prior approaches.
The four perspectives: Needed Effort, Evaluation Process, Data Filtration, and Data Changes.
Perspective #1: Needed Effort
Prior research treats all duplicate reports equally.
Duplicate reports differ based on the effort needed for their manual identification.
Real Example: Duplicate (B) needs much more effort to be identified than Duplicate (A).
(A) was identified 15 minutes after being reported, while (B) was identified 44 days after being reported.
A duplicate can be EASY or HARD based on the effort needed for its identification.
In the studied systems, about 50% of the duplicates are EASY (identified within 1 day, without discussions, and by one person) and about 50% are HARD.
A Random Forest model (binary classification: EASY vs. HARD) achieves AUCs of 0.68 – 0.8.
This model is used to explain the most important factors for the effort needed for identification.
Factors such as Description Similarity are among the most important ones.
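A minimal sketch of such a classifier, assuming a per-duplicate feature table. The feature names, the EASY/HARD labelling rule, and the data loading are illustrative assumptions rather than the exact setup of the thesis.

# Minimal sketch: classify duplicates as EASY vs. HARD and report AUC.
# Feature names and the labelling rule below are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

reports = pd.read_csv("duplicates.csv")  # hypothetical exported data

# EASY = identified within 1 day, without discussions, by one person.
reports["easy"] = (
    (reports["identification_delay_days"] <= 1)
    & (reports["num_comments"] == 0)
    & (reports["num_people_involved"] == 1)
)

features = reports[["description_similarity", "title_similarity",
                    "same_component", "reporter_experience"]]
X_train, X_test, y_train, y_test = train_test_split(
    features, reports["easy"], test_size=0.3, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC: {auc:.2f}")

# Feature importances indicate which factors matter most for the needed effort.
for name, importance in zip(features.columns, model.feature_importances_):
    print(name, round(importance, 3))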
Thesis Contribution #1:
Show the importance of considering the needed
effort for retrieving duplicate reports in the
performance measurement of automated duplicate
retrieval approaches
[Published in EMSE 2015]
Perspective #2: Evaluation Process
Much prior research evaluates its approaches on a short testing period (e.g., one year); issue reports outside the testing period are not considered in the evaluation.
As a result, a duplicate (B) whose matching master lies outside the testing period is ignored.
I refer to this evaluation as the "Classical Evaluation".
Instead, I propose the "Realistic Evaluation": all duplicates reported within the testing period (1 year) are considered, even when their masters were reported before the testing period.
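A minimal sketch of the difference between the two evaluations when selecting which duplicates to evaluate; the report fields and dates are illustrative assumptions.

# Minimal sketch: which duplicates are evaluated under each setup.
# Classical: both the duplicate and its master must fall inside the
# testing period. Realistic: every duplicate reported inside the testing
# period is kept, regardless of when its master was reported.
from datetime import date

def classical_duplicates(duplicates, start: date, end: date):
    return [d for d in duplicates
            if start <= d["reported_on"] <= end
            and start <= d["master_reported_on"] <= end]

def realistic_duplicates(duplicates, start: date, end: date):
    return [d for d in duplicates if start <= d["reported_on"] <= end]

# Example usage (hypothetical reports):
dups = [
    {"id": "A", "reported_on": date(2010, 6, 1),
     "master_reported_on": date(2010, 3, 1)},
    {"id": "B", "reported_on": date(2010, 7, 1),
     "master_reported_on": date(2008, 5, 1)},  # master outside the period
]
period = (date(2010, 1, 1), date(2010, 12, 31))
print(len(classical_duplicates(dups, *period)))  # 1
print(len(realistic_duplicates(dups, *period)))  # 2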
I compare the two evaluations on frequently used testing data in prior research: Mozilla 2010, Eclipse 2008, and OpenOffice 2008 - 2010.
Problem #1: Number of considered duplicates.
Problem #2: Performance overestimation.
Problem #1 (Mozilla 2010): the classical evaluation is missing 23.5% of the duplicates (7,551 duplicates considered, versus 10,043 duplicates under the realistic evaluation).
Problem #2 (Mozilla 2010): the recall rate of the REP approach is overestimated by 39% under the classical evaluation (Recall: 0.58, versus Recall: 0.43 under the realistic evaluation).
Does the difference in performance between the two evaluations generalize to other testing periods within the tested ITSs?
I randomly select 100 one-year chunks as "Testing Periods" and study the statistical difference between the classical and realistic evaluations.
Each chunk (Chunk 1: Dec 2004 – Dec 2005, Chunk 2: Nov 2003 – Nov 2004, …, Chunk 100: Sep 2007 – Sep 2008) is evaluated with the automated approach under both the classical and the realistic evaluation, producing one result per chunk per evaluation.
Classical evaluation
significantly
overestimates the
performance of the
automated approaches
(p-value<0.001)
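A minimal sketch of such a paired comparison across the 100 chunks; the Wilcoxon signed-rank test is used here as one common choice for paired samples, and the recall values are made up for illustration.

# Minimal sketch: paired comparison of the classical vs. realistic recall
# rates over the 100 testing-period chunks. The numbers below are made up.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
realistic = rng.uniform(0.35, 0.55, size=100)               # hypothetical recalls
classical = realistic + rng.uniform(0.05, 0.20, size=100)   # overestimated

statistic, p_value = wilcoxon(classical, realistic)
print(f"p-value: {p_value:.2e}")
if p_value < 0.001:
    print("Classical evaluation significantly overestimates the performance.")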
Thesis Contribution #2:
Propose a realistic evaluation for the automated
retrieval of duplicate reports, and analyze prior
findings using such realistic evaluation
[Published in TSE 2017]
Perspective #3: Data Filtration
Under the realistic evaluation, no duplicates are ignored, but do we need to search in all these issue reports?
Filtering out old issue reports should lead to reduced search time.
A possible solution is to filter out the old issue reports using a time window based on the report date.
Example (1): filtering with a 2-year time window.
Example (2): filtering with a 4-month time window.
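A minimal sketch of such a report-date time window; the field names and window lengths are illustrative assumptions.

# Minimal sketch: keep only issue reports whose report date falls within
# a time window of `window_days` before the query report's date.
from datetime import date, timedelta

def within_report_date_window(query_date: date, issues, window_days: int):
    cutoff = query_date - timedelta(days=window_days)
    return [i for i in issues if i["reported_on"] >= cutoff]

# Example usage (hypothetical issues): a 2-year window vs. a 4-month window.
issues = [{"id": "A", "reported_on": date(2009, 12, 1)},
          {"id": "B", "reported_on": date(2010, 4, 1)}]
print(len(within_report_date_window(date(2010, 6, 1), issues, 2 * 365)))  # 2
print(len(within_report_date_window(date(2010, 6, 1), issues, 4 * 30)))   # 1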
Results of limiting the searched issue reports to the last (n) months: it is not possible to limit the issue reports that are searched without decreasing the performance of the automated approaches.
However, not all old issue reports are equally likely matches. Consider two previously reported issues: (A) was resolved 1 month ago, while (B) was resolved 2 years ago. A new duplicate is more likely to be a duplicate of (A).
I propose a filtration approach based on a time threshold from the resolution date, with one threshold per resolution value and rank.
The thresholds are optimized using a genetic algorithm (NSGA-II).
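A minimal sketch of such resolution-date-based filtering, keyed by (resolution, rank). The threshold values shown are placeholders that the actual approach would tune with NSGA-II, and the field names are illustrative assumptions.

# Minimal sketch: keep a previously reported issue only if its resolution
# date is within the threshold (in days) for its (resolution, rank) pair.
# The threshold values below are placeholders; the proposed approach tunes
# them with a multi-objective genetic algorithm (NSGA-II).
from datetime import date, timedelta

thresholds = {  # hypothetical (resolution, rank) -> days
    ("FIXED", "high"): 730,
    ("FIXED", "low"): 365,
    ("WORKSFORME", "low"): 90,
}

def keep_issue(issue, query_date: date) -> bool:
    key = (issue["resolution"], issue["rank"])
    days = thresholds.get(key, 365)  # default window for unseen pairs
    return issue["resolved_on"] >= query_date - timedelta(days=days)

# Example usage: an old WORKSFORME issue is filtered out, a recent FIXED one kept.
today = date(2010, 6, 1)
print(keep_issue({"resolution": "FIXED", "rank": "high",
                  "resolved_on": date(2009, 9, 1)}, today))   # True
print(keep_issue({"resolution": "WORKSFORME", "rank": "low",
                  "resolved_on": date(2009, 9, 1)}, today))   # False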
The proposed
filtration approach
improves the
performance by up
to 60%
Thesis Contribution #3:
Propose a genetic algorithm based approach for
filtering out the old issue reports to improve the
performance of duplicate retrieval approaches.
[Published in TSE 2017]
Perspective #4: Data Changes
In 2011, the just-in-time (JIT) duplicate retrieval feature was introduced.
After the introduction of the JIT feature, there are significantly fewer duplicates, but more identification effort is needed.
How does the JIT duplicate retrieval feature impact the performance evaluation of the automated approaches?
Firefox: comparing Approach (1) BM25F and Approach (2) REP, the performance of both approaches is significantly lower for after-JIT duplicates.
Firefox: the relative improvement in the performance of REP over BM25F is significantly larger after the activation of the JIT feature.
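A minimal sketch of splitting the evaluation by the JIT activation date and comparing the relative improvement of the two approaches; the activation date, field names, and recall values are illustrative assumptions.

# Minimal sketch: evaluate each approach separately on duplicates reported
# before and after the JIT feature was activated, then compare.
from datetime import date

JIT_ACTIVATION = date(2011, 1, 1)  # illustrative activation date

def split_by_jit(duplicates):
    before = [d for d in duplicates if d["reported_on"] < JIT_ACTIVATION]
    after = [d for d in duplicates if d["reported_on"] >= JIT_ACTIVATION]
    return before, after

def relative_improvement(recall_rep: float, recall_bm25f: float) -> float:
    return (recall_rep - recall_bm25f) / recall_bm25f

# Hypothetical recall rates per period:
print(relative_improvement(0.50, 0.45))  # before JIT
print(relative_improvement(0.40, 0.33))  # after JIT: larger relative gain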
Thesis Contribution #4:
Highlight the impact of the “JIT” feature on the
performance evaluation of duplicate retrieval
approaches
[Submitted to EMSE] – under review
Summary
Prior research focuses on improving the performance of automated duplicate retrieval approaches: the goal of a newly proposed approach is to achieve higher performance (e.g., recall rate) than prior ones.
In my thesis, I revisit the experimental design choices from four perspectives: Needed Effort, Evaluation Process, Data Filtration, and Data Changes.
Revisiting the experimental design choices can help better understand the benefits and limitations of the evaluation of prior approaches.
Thesis Contribution #1:
Show the importance of considering the needed
effort for retrieving duplicate reports in the
performance measurement of automated duplicate
retrieval approaches
[Published in EMSE 2015]
Thesis Contribution #2:
Propose a realistic evaluation for the automated
retrieval of duplicate reports, and analyze prior
findings using such realistic evaluation
[Published in TSE 2017]
Thesis Contribution #3:
Propose a genetic algorithm based approach for
filtering out the old issue reports to improve the
performance of duplicate retrieval approaches
[Published in TSE 2017]
Thesis Contribution #4:
Highlight the impact of the “JIT” feature on the
performance evaluation of duplicate retrieval
approaches
[Submitted to EMSE] – under review
