
Assisting Code Search with Automatic Query Reformulation for Bug Localization


Published at MSR 2013, San Francisco, CA


  1. Assisting Code Search with Automatic Query Reformulation for Bug Localization
     Bunyamin Sisman & Avinash Kak | kak@purdue.edu
     Purdue University, MSR'13
  2. Outline
     I. Motivation
     II. Past Work on Automatic Query Reformulation
     III. Proposed Approach: Spatial Code Proximity (SCP)
     IV. Data Preparation
     V. Results
     VI. Conclusions
  3. Main Motivation
     It is extremely common for software developers to use arbitrary abbreviations & concatenations in software. These are generally difficult to predict when searching the code base of a project.
     The question is: "Is there some way to automatically reformulate a user's query so that all such relevant terms are also used in retrieval?"
  4. Summary of Our Contribution
     • We show how a query can be automatically reformulated for superior retrieval accuracy
     • We propose a new framework for Query Reformulation, which leverages the spatial proximity of the terms in files
     • The approach leads to significant improvements over the baseline and the competing Query Reformulation approaches
  5. Summary of Our Contribution (Cont'd)
     • Our approach preserves or improves the retrieval accuracy for 76% of the 4,393 bugs we analyzed for the Eclipse and Chrome projects
     • Our approach improves the retrieval accuracy for 42% of the 4,393 bugs
     • Improvements are 66% for Eclipse and 90% for Chrome in terms of MAP (Mean Average Precision)
     • We also describe the conditions under which Query Reformulation may perform poorly
  6. Query Reformulation with Relevance Feedback
     1. Perform an initial retrieval with the original query
     2. Analyze the set of top retrieved documents vis-à-vis the query
     3. Reformulate the query
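The three steps on this slide can be sketched as pseudo relevance feedback: treat the top-k documents of the initial retrieval as relevant and expand the query with their most frequent new terms. This is a minimal illustrative sketch, not the paper's exact reformulation model; the ranked document list is assumed to come from some existing retrieval function.

```python
from collections import Counter

def pseudo_relevance_feedback(query_terms, ranked_docs, top_k=10, n_expansion=5):
    """Expand `query_terms` using the top-k documents of an initial retrieval.

    Steps mirror the slide: (1) the caller performs the initial retrieval,
    (2) we analyze the top retrieved documents, (3) we reformulate the query.
    """
    feedback_docs = ranked_docs[:top_k]          # step 2: top retrieved docs
    counts = Counter()
    for doc in feedback_docs:
        counts.update(t.lower() for t in doc.split())
    # Most frequent terms not already in the query become expansion terms.
    expansion = [t for t, _ in counts.most_common()
                 if t not in query_terms][:n_expansion]
    return list(query_terms) + expansion         # step 3: reformulated query
```

In practice the expansion terms would be weighted rather than simply appended; the weighting is exactly where the approaches compared in this talk differ.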
  7. Acquiring Relevance Feedback
     • Implicitly: infer feedback from user interactions
     • Explicitly: user provides feedback [Gay et al. 2009]
     • Pseudo Relevance Feedback (PRF): automatic QR (this is our work!)
  8. Data Flow in the Proposed Retrieval Framework
     (Diagram slide)
  9. Automatic Query Reformulation
     • No user involvement!
     • It takes less than a second to reformulate a query on ordinary desktop hardware!
     • It is cheap! It is effective!
  10. Previous Work on Automatic QR (for Text Retrieval)
     • Rocchio's Formula (ROCC)
     • Relevance Model (RM)
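The slide names the two baselines without stating them. For reference, the classical Rocchio update from text retrieval (the usual starting point for ROCC-style QR) moves the query vector toward relevant documents and away from non-relevant ones:

```latex
\vec{q}_{\text{new}} \;=\; \alpha\,\vec{q}_0
  \;+\; \frac{\beta}{|D_r|} \sum_{\vec{d} \in D_r} \vec{d}
  \;-\; \frac{\gamma}{|D_{nr}|} \sum_{\vec{d} \in D_{nr}} \vec{d}
```

Here \(\vec{q}_0\) is the original query vector, \(D_r\) and \(D_{nr}\) are the relevant and non-relevant document sets, and \(\alpha, \beta, \gamma\) are tuning weights. In the pseudo-relevance-feedback setting of this talk, \(D_r\) is simply the top-ranked documents from the initial retrieval and the \(\gamma\) term is commonly dropped; the exact variant used in the paper's experiments may differ in its parameter choices.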
  11. The Proposed Approach to QR: Spatial Code Proximity (SCP)
     • Spatial Code Proximity is an elegant approach to giving greater weights to terms in source code that occur in the vicinity of the terms in a user's query
     • Proximities may be created through commonly used concatenations (punctuation characters, camel casing, etc.):
       – Underscores: tab_strip_gtk
       – Camel casing: kPinnedTabAnimationDurationMs
  12. Spatial Code Proximity (SCP) (Cont'd)
     • Tokenize source files and index the positions of the terms in each source file
     • Use the distance between terms to find relevant terms vis-à-vis a query!
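A positional index as described on the slide maps each term to the files it occurs in and its token positions there. A minimal sketch, assuming a plain whitespace tokenizer (the paper's pipeline would use the identifier splitter as well):

```python
from collections import defaultdict

def build_positional_index(files):
    """Build term -> {filename: [positions]} from {filename: text}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in files.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][name].append(pos)
    return index
```

With positions recorded, the distance between any query term and any candidate expansion term within a file is a simple subtraction, which makes the proximity scoring on the next slide cheap to compute.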
  13. SCP: Bringing the Query into the Picture
     • Example query: "Browser Animation"
     • First perform an initial retrieval with the original query
     • Increase the weights of those nearby terms!
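The "increase the weights of nearby terms" step can be illustrated with a simplified stand-in for the SCP scoring (the actual weighting function in the paper may differ): terms get credit inversely proportional to their distance from each query-term occurrence in a feedback file.

```python
from collections import defaultdict

def proximity_weights(query_terms, file_tokens, window=5):
    """Weight each non-query term by its closeness to query-term occurrences.

    Illustrative: each occurrence within `window` positions of a query term
    contributes 1 / (1 + distance), so closer terms accumulate larger weights.
    """
    query = {t.lower() for t in query_terms}
    tokens = [t.lower() for t in file_tokens]
    weights = defaultdict(float)
    query_positions = [i for i, t in enumerate(tokens) if t in query]
    for qpos in query_positions:
        lo, hi = max(0, qpos - window), min(len(tokens), qpos + window + 1)
        for i in range(lo, hi):
            t = tokens[i]
            if t not in query:
                weights[t] += 1.0 / (1 + abs(i - qpos))
    return dict(weights)
```

For the slide's query "Browser Animation", a term like `duration` that sits right next to `animation` inside `kPinnedTabAnimationDurationMs` would receive a high weight and become a strong expansion candidate.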
  14. Research Questions
     • Question 1: Does the proposed QR approach improve the accuracy of source code retrieval? If so, to what extent?
     • Question 2: How do the QR techniques that are currently in the literature perform for source code retrieval?
     • Question 3: How does the initial retrieval performance affect the performance of QR?
     • Question 4: What are the conditions under which QR may perform poorly?
  15. Data Preparation
     • For evaluation, we need a set of queries and the relevant files
     • We use the titles of the bug reports as queries
     • We have to link the repository commits to the bug tracking database!
     • Used regular expressions to detect bug-fix commits based on commit messages
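Linking commits to bug reports via commit messages typically looks like the sketch below. The pattern here is illustrative only; the slide does not give the exact expressions used to build the dataset.

```python
import re

# Match messages that mention "fix"/"bug"/"issue" followed by a numeric id
# (assumed id length 3-7 digits; an optional "#" prefix is tolerated).
BUG_FIX_RE = re.compile(r"(?i)\b(?:fix(?:es|ed)?|bug|issue)\b[^\n]*?#?(\d{3,7})")

def extract_bug_id(commit_message):
    """Return the referenced bug id if the message looks like a bug fix, else None."""
    m = BUG_FIX_RE.search(commit_message)
    return m.group(1) if m else None
```

Commits matched this way are treated as bug-fix commits, and the files they touch become the relevant files for the corresponding bug report's query.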
  16. Data Preparation (Cont'd): resulting dataset, BUGLinks

                              Eclipse v3.1   Chrome v4.0
     #Bugs                    4,035          358
     Avg. #Relevant Files     2.76           3.82
     Avg. #Commits            1.36           1.23
  17. Evaluation Framework
     • We use Precision- and Recall-based metrics to evaluate the retrieval accuracy
     • Determine the query sets for which the proposed QR approaches lead to
       1. improvements in the retrieval accuracy
       2. degradation in the retrieval accuracy
       3. no change in the retrieval accuracy
     • Analyze these sets to understand the characteristics of the queries each set contains
  18. Evaluation Framework (Cont'd)
     • For comparison of these sets, we used the following Query Performance Prediction (QPP) metrics [Haiduc et al. 2012, He et al. 2004]:
       – Average Inverse Document Frequency (avgIDF)
       – Average Inverse Collection Term Frequency (avgICTF)
       – Query Scope (QS)
       – Simplified Clarity Score (SCS)
     • Additionally, we analyzed:
       – Query lengths
       – Number of relevant files per bug
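As an example of these QPP metrics, avgIDF averages the inverse document frequency of the query's terms: rare terms predict an easier, more discriminative query. This is one common smoothed formulation; the papers cited on the slide define the precise variants used.

```python
import math

def avg_idf(query_terms, documents):
    """Average inverse document frequency of the query terms over a corpus."""
    n = len(documents)
    doc_terms = [set(d.lower().split()) for d in documents]
    idfs = []
    for term in query_terms:
        df = sum(1 for terms in doc_terms if term.lower() in terms)
        # Add-one smoothing avoids division by zero for unseen terms.
        idfs.append(math.log((n + 1) / (df + 1)))
    return sum(idfs) / len(idfs)
```

avgICTF is analogous but counts term occurrences over the whole collection instead of document frequencies.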
  19. QR with Bug Report Titles
     (Bar chart: number of bugs, 0 to 2,000, for ROCC, RM, and SCP (Proposed))
  20. Improvements in Retrieval Accuracy (% Increase in MAP)
     (Bar chart: 0% to 100% increase in MAP on Eclipse and Chrome for ROCC, RM, and SCP (Proposed))
  21. Conclusions & Future Work
     • Our framework can use a weak initial query as a jumping-off point for a better query; no user input is necessary
     • We obtained significant improvements over the baseline and the well-known automatic QR methods
     • Future work includes evaluation of different term-proximity metrics in source code for QR
  22. References
     [1] B. Sisman and A. Kak, "Incorporating version histories in information retrieval based bug localization," in Proceedings of the 9th Working Conference on Mining Software Repositories (MSR'12). IEEE, 2012, pp. 50–59.
     [2] G. Gay, S. Haiduc, A. Marcus, and T. Menzies, "On the use of relevance feedback in IR-based concept location," in International Conference on Software Maintenance (ICSM'09), Sept. 2009, pp. 351–360.
     [3] A. Marcus, A. Sergeyev, V. Rajlich, and J. I. Maletic, "An information retrieval approach to concept location in source code," in Proceedings of the 11th Working Conference on Reverse Engineering (WCRE'04). IEEE Computer Society, 2004, pp. 214–223.
  23. References (Cont'd)
     [4] S. Haiduc, G. Bavota, R. Oliveto, A. De Lucia, and A. Marcus, "Automatic query performance assessment during the retrieval of software artifacts," in Proceedings of the 27th International Conference on Automated Software Engineering (ASE'12). ACM, 2012, pp. 90–99.
     [5] B. He and I. Ounis, "Inferring query performance using pre-retrieval predictors," in Proc. Symposium on String Processing and Information Retrieval. Springer Verlag, 2004, pp. 43–54.