MSR presentation



MSR 2011 Talk slides

MSR presentation

  1. Comparative Study. Retrieval from Software Libraries for Bug Localization: A Comparative Study of Generic and Composite Text Models. Shivani Rao and Avinash Kak, School of ECE, Purdue University. May 21, 2011, MSR, Hawaii. Mining Software Repositories, Hawaii, 2011
  2. Outline
     1. Bug localization
     2. IR (Information Retrieval)-based bug localization
     3. Text Models
     4. Preprocessing of the source files
     5. Evaluation Metrics
     6. Results
     7. Conclusion
  3. Bug localization
     • Bug localization means locating the files, methods, classes, etc., that are directly related to the problem causing abnormal execution behavior of the software.
     • IR-based bug localization means locating a bug from its textual description.
  4. Background: A typical bug localization process
  5. Background: A typical bug report (JEdit)
  6. Background: Past work on IR-based bug localization
     Authors/Paper       Model            Software dataset
     ------------------  ---------------  ---------------------------------
     Marcus et al. [1]   VSM              JEdit
     Cleary et al. [2]   LM, LSA and CA   Eclipse JDT
     Lukins et al. [3]   LDA              Mozilla, Eclipse, Rhino and JEdit
     Drawbacks:
     1. None of the reported work has been evaluated on a standard dataset.
     2. The results cannot be compared with static and dynamic techniques.
     3. The number of bugs is of the order of 5-30.
  7. Background: iBUGS
     Created by Dallmeier and Zimmermann [4], iBUGS contains a large number of real bugs with corresponding test suites in order to generate failing and passing test runs.
     ASPECTJ software
     ---------------------------------------  -------
     Software library size (number of files)  6546
     Lines of code                            75 KLOC
     Vocabulary size                          7553
     Number of bugs                           291
     Table: The iBUGS dataset after preprocessing
  8. Background: A typical bug report in the iBUGS repository
  9. Text Models
     • VSM: Vector Space Model
     • LSA: Latent Semantic Analysis Model
     • UM: Unigram Model
     • LDA: Latent Dirichlet Allocation Model
     • CBDM: Cluster-Based Document Model
  10. Text Models: Vector Space Model
      • If $V$ is the vocabulary, then queries and documents are $|V|$-dimensional vectors.
        \[ sim(q, d_m) = \frac{w_q \cdot w_m}{|w_q|\,|w_m|} \]
      • The space is sparse yet high-dimensional.
  11. Text Models: Latent semantic analysis: Eigen decomposition
      \[ A = U \Sigma V^{T} \]
  12. Text Models: LSA based models
      • Topic-based representation: $w_K(m)$ is a $K$-dimensional vector in the eigenspace that represents the $m$th document $w_m$.
        \[ w_K(m) = \Sigma_K^{-1} U_K^{T} w_m \]
        \[ q_K = \Sigma_K^{-1} U_K^{T} q \]
        \[ sim(q, d_m) = \frac{q_K \cdot w_K(m)}{|q_K|\,|w_K(m)|} \]
      • LSA2: Fold the $K$-dimensional representation back to a smoothed $|V|$-dimensional representation and compare it directly with the query $q$.
        \[ \tilde{w} = U_K \Sigma_K w_K \]
      • Combined representation: combines the LSA2 representation with the VSM representation using the mixture parameter $\lambda$.
        \[ A_{combined} = \lambda \tilde{A} + (1 - \lambda) A \]
  13. Text Models: Unigram model to represent documents using a probability distribution [5]
      • The term frequencies in a document are considered to be its probability distribution.
      • The term frequencies in a query become the query's probability distribution.
      • The similarities are established by comparing the probability distributions using KL divergence.
      • For smoothing, we add the probability distribution over the entire source library.
        \[ p_{uni}(w \mid D_m) = \mu \frac{c(w, d_m)}{|d_m|} + (1 - \mu) \frac{\sum_{m=1}^{|D|} c(w, d_m)}{\sum_{m=1}^{|D|} |d_m|} \]
        \[ p_{uni}(w \mid q) = \mu \frac{c(w, q)}{|q|} + (1 - \mu) \frac{\sum_{m=1}^{|D|} c(w, d_m)}{\sum_{m=1}^{|D|} |d_m|} \]
  14. Text Models: LDA: A mixture model to represent documents using topics/concepts [6]
  15. Text Models: LDA based models [7]
      • Topic-based representation: $\theta_m$ is a $K$-dimensional probability vector that indicates the topic proportions present in the $m$th document.
      • Maximum likelihood representation: folds back to the $|V|$-dimensional term space.
        \[ p_{lda}(w \mid D_m) = \sum_{t=1}^{K} p(w \mid z = t)\, p(z = t \mid D_m) = \sum_{t=1}^{K} \phi(t, w)\, \theta_m(t) \]
      • Combined representation: combines the Unigram representation of the document and the MLE-LDA representation of the document.
        \[ p_{combined}(w \mid D_m) = \lambda\, p_{lda}(w \mid D_m) + (1 - \lambda)\, p_{uni}(w \mid D_m) \]
  16. Text Models: Cluster Based Document Model (CBDM) [8]
      • Cluster the documents into $K$ clusters using deterministic algorithms like K-means, hierarchical, agglomerative clustering and so on.
      • Represent each of the clusters using a multinomial distribution over the terms in the vocabulary. This distribution is commonly denoted by $p_{ML}(w \mid Cluster_j)$, and we can express the probability distribution for the words in a document $d_m \in Cluster_j$ as:
        \[ p_{cbdm}(w \mid w_m) = \lambda_1 \times \frac{w_m(n)}{\sum_{n=1}^{|V|} w_m(n)} + \lambda_2 \times p_c(w) + \lambda_3 \times p_{ML}(w \mid Cluster_j) \qquad (1) \]
  17. Text Models: Summary of Text Models used in the comparative study
  18. Text Models: Summary of Text Models used in the comparative study (cont.)
      Model    Representation                                 Similarity metric
      -------  ---------------------------------------------  ---------------------------
      VSM      Frequency vector                               Cosine similarity
      LSA      K-dimensional vector in the eigenspace         Cosine similarity
      Unigram  |V|-dimensional probability vector (smoothed)  KL divergence
      LDA      K-dimensional probability vector               KL divergence
      CBDM     |V|-dimensional combined probability vector    KL divergence or likelihood
      Table: Generic models used in the comparative evaluation
  19. Text Models: Summary of Text Models used in the comparative study (cont.)
      Model    Representation                              Similarity metric
      -------  ------------------------------------------  ---------------------------
      LSA2     |V|-dimensional representation in term space  Cosine similarity
      MLE-LDA  |V|-dimensional MLE-LDA probability vector    KL divergence or likelihood
      Table: The variations on two of the generic models used in the comparative evaluation
  20. Text Models: Summary of Text Models used in the comparative study (cont.)
      Model          Representation                                   Similarity metric
      -------------  -----------------------------------------------  ---------------------------
      Unigram + LDA  |V|-dimensional combined probability vector      KL divergence or likelihood
      VSM + LSA      |V|-dimensional combined VSM and LSA representation  Cosine similarity
      Table: The two composite models used
  21. Preprocessing of the source files
      • If a patch file does not exist in /trunk, it is searched for in the other branches/tags of ASPECTJ and added to the source library.
      • The source library consists of ".java" files only. After this step, our library ended up with 6546 Java files.
      • The repository.xml file documents all the information related to a bug. This includes the BugID, the bug description, the relevant source files, and so on. We shall call this ground-truth information the relevance judgements.
      • The bugs that are documented in iBUGS but do not have any relevant source files in the library that results from the previous step are eliminated. After this step, we are left with 291 bugs.
  22. Preprocessing of the source files (contd)
      • Hard-words, camel-case words and soft-words are handled by using popular identifier-splitting methods [9, 10].
      • The stop-list consists of the most commonly occurring words. Examples: "for," "else," "while," "int," "double," "long," "public," "void," etc. There are 375 such words in the iBUGS ASPECTJ software.
      • We also drop all Unicode strings from the vocabulary.
      • The vocabulary is pruned further by calculating the relative importance of terms and eliminating ubiquitous and rarely occurring terms.
  23. Evaluation Metrics: Mean Average Precision (MAP)
      Calculated using the following two sets:
      • The retrieved($N_r$) set consists of the top $N_r$ documents from a ranked list of documents retrieved vis-a-vis the query.
      • The relevant set is extracted from the relevance judgements available from repository.xml.
      Precision and Recall:
        \[ Precision(P@N_r) = \frac{|\{relevant\} \cap \{retrieved\}|}{|\{retrieved\}|} \]
        \[ Recall(R@N_r) = \frac{|\{relevant\} \cap \{retrieved\}|}{|\{relevant\}|} \]
  24. Evaluation Metrics: Mean Average Precision (MAP) (cont.)
      1. If we were to plot a typical P-R curve from the values for $P@N_r$ and $R@N_r$, we would get a monotonically decreasing curve that has high values of Precision for low values of Recall and vice versa.
      2. The area under the P-R curve is called the Average Precision.
      3. Taking the mean of the Average Precision over all the queries gives the Mean Average Precision (MAP).
      4. Physical significance of MAP: same as that of Precision.
  25. Evaluation Metrics: Rank of Retrieved Files [3]
      • The number of queries/bugs for which relevant source files were retrieved with ranks $r_{low} \le R \le r_{high}$ is reported.
      • For the retrieval performance reported in [3], the ranks used are $R = 1$, $2 \le R \le 5$, $6 \le R \le 10$ and $R > 10$.
  26. Evaluation Metrics: SCORE [11]
      1. Indicates the proportion of the program that needs to be examined in order to locate or localize a fault.
      2. For each range of this proportion (for example, 10-20%), the number of test runs (bugs) is reported.
  27. Results: Models using LDA
      Figure: MAP using the three LDA models for different values of $K$. The experimental parameters for the LDA+Unigram model are $\lambda = 0.9$, $\mu = 0.5$, $\beta = 0.01$ and $\alpha = 50/K$.
  28. Results: The combined LDA+Unigram model
      Figure: MAP plotted for different values of the mixture proportions ($\lambda$ and $\mu$) of the LDA+Unigram combined model.
  29. Results: Models using LSA
      Figure: MAP using the LSA model and its variations and combinations for different values of $K$. The experimental parameter for the LSA+VSM combined model is $\lambda = 0.5$.
  30. Results: CBDM
      Model parameters        MAP for K clusters
      λ1     λ2     λ3        K = 100   K = 250  K = 500  K = 1000
      -----  -----  ----      --------  -------  -------  --------
      0.25   0.25   0.5       0.093144  0.0914   0.08666  0.07664
      0.15   0.35   0.5       0.0883    0.0897   0.0963   0.0932
      0.81   0.09   0.1       0.143     0.102    0.108    0.09952
      0.27   0.63   0.1       0.1306    0.117    0.111    0.0998
      0.495  0.495  0.01      0.141     0.141    0.141    0.141
      0.05   0.05   0.99      0.069     0.075    0.072    0.065
      Table: Retrieval performance using MAP with the CBDM. λ1 + λ2 + λ3 = 1, where λ1 weights the unigram model, λ2 the collection model, and λ3 the cluster model.
  31. Results: Rank based metric
      Figure: The height of the bars shows the number of queries (bugs) for which at least one relevant source file was retrieved at rank 1.
  32. Results: SCORE: IR based bug localization tools
  33. Results: SCORE: Compare with AMPLE and FINDBUGS
      • SCORE with FINDBUGS: None of the bugs were localized correctly.
      Figure: SCORE values calculated over 44 bugs in iBUGS ASPECTJ using AMPLE [12]
  34. Conclusion
      • IR based bug localization techniques are equally or more effective compared to static or dynamic bug localization tools.
      • Sophisticated models like LDA, LSA or CBDM do not outperform simpler models like Unigram or VSM for IR based bug localization on large software systems.
      • An analysis of the spread of the word distributions over the source files with the help of measures such as tf and idf can give useful insights into the usability of topic and cluster based models for localization.
  35. Conclusion: End of Presentation Thanks to Questions?
  36. Conclusion: Threats to validity
      • We have tested on a single database, iBUGS. How does this generalize?
      • We have eliminated XML files from those that are indexed and queried. Maybe not a valid assumption?
  37. Conclusion: References
      [1] A. Marcus, A. Sergeyev, V. Rajlich, and J. I. Maletic, "An Information Retrieval Approach to Concept Location in Source Code," in Proceedings of the 11th Working Conference on Reverse Engineering (WCRE 2004), pp. 214-223, IEEE Computer Society, 2004.
      [2] B. Cleary, C. Exton, J. Buckley, and M. English, "An Empirical Analysis of Information Retrieval based Concept Location Techniques in Software Comprehension," Empirical Softw. Engg., vol. 14, no. 1, pp. 93-130, 2009.
      [3] S. K. Lukins, N. A. Kraft, and L. H. Etzkorn, "Source Code Retrieval for Bug Localization using Latent Dirichlet Allocation," in 15th Working Conference on Reverse Engineering, 2008.
  38. Conclusion: References (cont.)
      [4] V. Dallmeier and T. Zimmermann, "Extraction of Bug Localization Benchmarks from History," in ASE '07: Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering, (New York, NY, USA), pp. 433-436, ACM, 2007.
      [5] C. Zhai and J. Lafferty, "A Study of Smoothing Methods for Language Models Applied to Information Retrieval," ACM Transactions on Information Systems, pp. 179-214, 2004.
      [6] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, pp. 993-1022, 2003.
  39. Conclusion: References (cont.)
      [7] X. Wei and W. B. Croft, "LDA-Based Document Models for Ad-hoc Retrieval," in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, 2006.
      [8] X. Liu and W. B. Croft, "Cluster-Based Retrieval Using Language Models," in ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2004.
      [9] H. Feild, D. Binkley, and D. Lawrie, "An Empirical Comparison of Techniques for Extracting Concept Abbreviations from Identifiers," in Proceedings of IASTED International Conference on Software Engineering and Applications, 2006.
  40. Conclusion: References (cont.)
      [10] E. Enslen, E. Hill, L. Pollock, and K. Vijay-Shanker, "Mining Source Code to Automatically Split Identifiers for Software Analysis," in Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories, MSR '09, (Washington, DC, USA), pp. 71-80, IEEE Computer Society, 2009.
      [11] J. A. Jones and M. J. Harrold, "Empirical Evaluation of the Tarantula Automatic Fault-Localization Technique," in Automated Software Engineering, 2005.
      [12] V. Dallmeier and T. Zimmermann, "Automatic Extraction of Bug Localization Benchmarks from History," tech. rep., Universität des Saarlandes, Saarbrücken, Germany, June 2007.
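
The Vector Space Model slide scores a bug report against a source file with the cosine of their term-frequency vectors. The following is a minimal sketch of that formula, assuming raw term counts as vector entries and already-tokenized input; the function names and the sample file names are hypothetical and do not come from the talk.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Sparse term-frequency vector: term -> raw count."""
    return Counter(tokens)


def cosine_similarity(w_q, w_m):
    """sim(q, d_m) = (w_q . w_m) / (|w_q| |w_m|); 0.0 if either vector is empty."""
    dot = sum(count * w_m.get(term, 0) for term, count in w_q.items())
    norm_q = math.sqrt(sum(c * c for c in w_q.values()))
    norm_m = math.sqrt(sum(c * c for c in w_m.values()))
    if norm_q == 0 or norm_m == 0:
        return 0.0
    return dot / (norm_q * norm_m)


def rank_files(query_tokens, files):
    """Return file names sorted from most to least similar to the query."""
    w_q = term_vector(query_tokens)
    scores = {name: cosine_similarity(w_q, term_vector(tokens))
              for name, tokens in files.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    files = {  # hypothetical, already preprocessed source files
        "Parser.java": ["parse", "token", "parse"],
        "Widget.java": ["render", "widget"],
    }
    print(rank_files(["parse", "error"], files))
```

Storing the vectors sparsely matters here because, as the slide notes, the term space is high-dimensional but each file touches only a small part of the vocabulary.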
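
The Unigram model slide smooths each document's term distribution with the distribution over the whole source library and compares distributions with KL divergence. The sketch below illustrates that idea under stated assumptions: the divergence is taken as KL(query || document), query terms outside the library vocabulary are dropped, and all function names are hypothetical. The slide itself does not fix the direction of the divergence.

```python
import math
from collections import Counter


def collection_model(docs):
    """p(w | library): summed term counts over summed document lengths."""
    totals = Counter()
    for tokens in docs.values():
        totals.update(tokens)
    n = sum(totals.values())
    return {w: c / n for w, c in totals.items()}


def smoothed_unigram(tokens, p_coll, mu):
    """mu * c(w, x)/|x| + (1 - mu) * p(w | library), over the library vocabulary."""
    tokens = [t for t in tokens if t in p_coll]  # assumption: ignore unseen terms
    if not tokens:
        return dict(p_coll)  # nothing observed: fall back to the library model
    counts = Counter(tokens)
    return {w: mu * counts[w] / len(tokens) + (1 - mu) * p
            for w, p in p_coll.items()}


def kl_divergence(p, q):
    """KL(p || q); both distributions share the same keys and q is strictly positive."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def rank_by_kl(query_tokens, docs, mu=0.5):
    """File names ordered by increasing divergence from the query model."""
    p_coll = collection_model(docs)
    p_q = smoothed_unigram(query_tokens, p_coll, mu)
    scores = {name: kl_divergence(p_q, smoothed_unigram(tokens, p_coll, mu))
              for name, tokens in docs.items()}
    return sorted(scores, key=scores.get)
```

Because every smoothed distribution gives nonzero mass to each library term whenever mu < 1, the logarithm in the divergence is always defined.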
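
The evaluation slides define P@N_r and R@N_r as set overlaps and Average Precision as the area under the P-R curve. A minimal sketch follows, assuming the common discrete approximation of that area (the mean of the precision values at the ranks where relevant files appear) and a ranked list without duplicates; the talk does not say which approximation was used, and the function names are hypothetical.

```python
def precision_at(ranked, relevant, n):
    """|relevant ∩ retrieved| / |retrieved| for the top-n retrieved files."""
    retrieved = ranked[:n]
    if not retrieved:
        return 0.0
    return len(set(retrieved) & relevant) / len(retrieved)


def recall_at(ranked, relevant, n):
    """|relevant ∩ retrieved| / |relevant| for the top-n retrieved files."""
    if not relevant:
        return 0.0
    return len(set(ranked[:n]) & relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of P@k over the ranks k that hold a relevant file."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per bug report."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For a ranking A, B, C, D with relevant files A and C, the relevant files sit at ranks 1 and 3, so the Average Precision is (1/1 + 2/3) / 2.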
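
The preprocessing slides mention identifier splitting and a stop-list. As a rough illustration only, the sketch below splits identifiers at underscores and camel-case boundaries with a regular expression and filters the eight stop words quoted on the slide. It is not the method of [9] or [10]: it does not split soft-words (concatenations without any marker), and the names `split_identifier` and `tokenize` are hypothetical.

```python
import re

# Order matters: an acronym followed by a capitalized word ("XMLFile"),
# then an ordinary (possibly capitalized) word, then a trailing acronym, then digits.
_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

# The examples quoted on the slide; the full list in the talk has 375 entries.
STOP_WORDS = {"for", "else", "while", "int", "double", "long", "public", "void"}


def split_identifier(text):
    """Split on underscores/non-word characters, then on camel-case boundaries."""
    parts = []
    for chunk in re.split(r"[_\W]+", text):
        parts.extend(m.group(0).lower() for m in _PART.finditer(chunk))
    return parts


def tokenize(text, stop_words=STOP_WORDS):
    """Lower-cased identifier parts with stop words removed."""
    return [t for t in split_identifier(text) if t not in stop_words]
```

The same tokenizer would need to be applied to both the source files and the bug reports so that queries and documents share one vocabulary.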