Prior research on automated duplicate issue report retrieval has focused on improving performance metrics such as recall rate. The author revisits the experimental design choices of that work from four perspectives: the effort required, changes in the data over time, data filtration, and the evaluation process.
The thesis makes four contributions: 1) showing the importance of accounting for the required effort when measuring performance; 2) proposing a "realistic evaluation" approach and re-analyzing prior findings under it; 3) developing a genetic algorithm that filters out old issue reports to improve retrieval performance; 4) highlighting the impact of "just-in-time" features on evaluation. Together, these findings give a better understanding of the benefits and limitations of prior work in this area.
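To make contribution 3 concrete, the following is a minimal, hedged sketch of how a genetic algorithm could search for an age cutoff beyond which old issue reports are dropped from the candidate pool; it is not the thesis's actual implementation. The fitness function `evaluate_recall`, the single-gene encoding (cutoff in days), and all parameter values are hypothetical placeholders; in practice the fitness would be the recall rate of the retrieval system on a validation set.

```python
import random

# Hypothetical placeholder: in a real setting this would run the duplicate
# retrieval system with only issue reports newer than `cutoff_days` in the
# candidate pool and return the resulting recall rate.
def evaluate_recall(cutoff_days: float) -> float:
    # Toy stand-in fitness surface that peaks around a 400-day window.
    return max(0.0, 1.0 - abs(cutoff_days - 400.0) / 1000.0)

def genetic_filter_search(pop_size=20, generations=30,
                          min_days=30, max_days=2000,
                          mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Each individual is a single gene: the age cutoff in days.
    population = [rng.uniform(min_days, max_days) for _ in range(pop_size)]

    for _ in range(generations):
        ranked = sorted(population, key=evaluate_recall, reverse=True)
        # Elitism: carry the best individual over unchanged.
        next_gen = [ranked[0]]
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(ranked, 3), key=evaluate_recall)
            p2 = max(rng.sample(ranked, 3), key=evaluate_recall)
            # Arithmetic crossover followed by optional Gaussian mutation.
            child = (p1 + p2) / 2.0
            if rng.random() < mutation_rate:
                child += rng.gauss(0, (max_days - min_days) * 0.05)
            next_gen.append(min(max(child, min_days), max_days))
        population = next_gen

    best = max(population, key=evaluate_recall)
    return best, evaluate_recall(best)

if __name__ == "__main__":
    cutoff, recall = genetic_filter_search()
    print(f"Best cutoff: {cutoff:.0f} days, estimated recall: {recall:.3f}")
```

A richer encoding (for example, a binary inclusion mask over time buckets or per-project cutoffs) would follow the same select-crossover-mutate loop; only the gene representation and fitness evaluation would change.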