Mining Version Histories to Guide Software Changes

Presented at ICSE 2004.



  1. Mining Version Histories to Guide Software Changes
     26th International Conference on Software Engineering (ICSE), Edinburgh, 28.05.2004
     Thomas Zimmermann (with Peter Weißgerber, Stephan Diehl, and Andreas Zeller)
     Lehrstuhl Softwaretechnik, Universität des Saarlandes, Saarbrücken
  3. Extending ECLIPSE Preferences
     Your task: Extend ECLIPSE with a new preference.
     Preferences are stored in the field fKeys[].
  6. Extending ECLIPSE Preferences
     What else do you need to change? Which of the 27,000 files, 20,000 classes, and 200,000 methods of ECLIPSE?
     Program analysis. fKeys[] and initDefaults() use the same variables.
       – Usage does not induce change.
       – Usage can be detected only within program code. ECLIPSE has 12,000 non-JAVA files.
     Learning from history. Programmers who changed fKeys[] also changed…
  7. Guiding the Programmer
     A) The user inserts a new preference into the field fKeys[].
     B) ROSE suggests locations for further changes, e.g. the function initDefaults().
  9. From CVS to Transactions
     The ECLIPSE CVS archive has more than 47,000 transactions.
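The slide does not say how transactions are recovered from CVS, which records check-ins per file. A minimal sketch of one common heuristic follows: check-ins by the same author with the same log message are merged as long as each follows the previous one within a sliding time window. The `Checkin` record, the function name, and the 200-second window are assumptions made for illustration, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkin:
    """One per-file CVS check-in (hypothetical record layout)."""
    author: str
    message: str
    timestamp: int  # seconds since some epoch
    entity: str     # changed file or finer-grained entity


def group_into_transactions(checkins, window=200):
    """Merge check-ins with equal author and log message into transactions.

    A check-in joins the open transaction for its (author, message) pair
    if it follows that transaction's latest check-in within `window`
    seconds; otherwise it starts a new transaction.
    """
    transactions = []
    open_tx = {}  # (author, message) -> {"last": timestamp, "entities": set}
    for c in sorted(checkins, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        current = open_tx.get(key)
        if current is not None and c.timestamp - current["last"] <= window:
            current["entities"].add(c.entity)
            current["last"] = c.timestamp  # the window slides forward
        else:
            current = {"last": c.timestamp, "entities": {c.entity}}
            open_tx[key] = current
            transactions.append(current["entities"])
    return transactions


# Made-up example data.
example_checkins = [
    Checkin("tz", "Add preference", 0, "fKeys[]"),
    Checkin("tz", "Add preference", 50, "initDefaults()"),
    Checkin("az", "Fix typo", 100, "README"),
    Checkin("tz", "Add preference", 240, "plugin.properties"),
    Checkin("tz", "Add preference", 1000, "fKeys[]"),
]
example_transactions = group_into_transactions(example_checkins)
```

In the example, the check-in at second 240 still joins the first transaction because it is within 200 seconds of the previous check-in at second 50, while the one at second 1000 starts a new transaction.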
  11. Mining Association Rules
      ROSE takes all transactions as input:
        T42    = { fKeys[], initDefaults(), …, plugin.properties, … }
        T752   = { fKeys[], initDefaults(), …, plugin.properties, … }
        T9872  = { fKeys[], initDefaults(), …, plugin.properties, … }
        T11386 = { fKeys[], initDefaults(), … }
        T20814 = { fKeys[], initDefaults(), …, plugin.properties, … }
        T30989 = { fKeys[], initDefaults(), …, plugin.properties, … }
        T41999 = { fKeys[], initDefaults(), …, plugin.properties, … }
        T47423 = { fKeys[], initDefaults(), …, plugin.properties, … }
        . . .
      ROSE mines association rules from these transactions:
        { fKeys[], initDefaults() } ⇒ { plugin.properties }   [Support 7, Confidence 7/8 = 0.875]
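The support and confidence figures on this slide can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative, and the elided "…" members of each transaction are left out because the slide does not name them.

```python
def support(transactions, items):
    """Number of transactions that contain every element of `items`."""
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# The eight transactions shown on the slide (elided members omitted).
with_properties = frozenset({"fKeys[]", "initDefaults()", "plugin.properties"})
without_properties = frozenset({"fKeys[]", "initDefaults()"})
slide_transactions = (
    [with_properties] * 3      # T42, T752, T9872
    + [without_properties]     # T11386
    + [with_properties] * 4    # T20814, T30989, T41999, T47423
)

antecedent = frozenset({"fKeys[]", "initDefaults()"})
consequent = frozenset({"plugin.properties"})

rule_support = support(slide_transactions, antecedent | consequent)
rule_confidence = confidence(slide_transactions, antecedent, consequent)
print(rule_support, rule_confidence)  # 7 0.875
```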
  13. Effective Mining
      The classical association mining approach is to mine all rules:
        – Helpful in understanding general patterns.
        – Requires high support thresholds (more than 2^n possible rules).
        – Takes time to compute (3 days and more).
      Alternative: mine only matching rules on demand.
        – Constraints on antecedent. Mine only rules which are related to the situation Σ, e.g. Σ ⇒ X.
        – Single consequent rules. Mine only rules which have a singleton as consequent, e.g. Σ ⇒ {x}.
      Average runtime of a query: 0.5 seconds.
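The two restrictions above can be sketched as a single pass over the transactions: fix the antecedent to the situation Σ, then count each other entity x as a candidate rule Σ ⇒ {x}. This is an illustration under stated assumptions, not ROSE's actual implementation; the function name, the threshold parameters, and the ranking order are hypothetical.

```python
from collections import Counter


def mine_on_demand(transactions, situation, min_support=1, min_confidence=0.0):
    """Return rules situation => {x} as (x, support, confidence) tuples.

    Only transactions that contain the whole situation are inspected,
    and every consequent is a single entity.
    """
    situation = frozenset(situation)
    matching = [t for t in transactions if situation <= t]
    if not matching:
        return []
    # Count how often each other entity co-occurs with the situation.
    counts = Counter(x for t in matching for x in t - situation)
    rules = [
        (x, n, n / len(matching))
        for x, n in counts.items()
        if n >= min_support and n / len(matching) >= min_confidence
    ]
    # Rank by confidence, then support, then name for a stable order.
    rules.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rules


# Same eight transactions as on the "Mining Association Rules" slide.
full = frozenset({"fKeys[]", "initDefaults()", "plugin.properties"})
partial = frozenset({"fKeys[]", "initDefaults()"})
history = [full] * 3 + [partial] + [full] * 4

suggestions = mine_on_demand(history, {"fKeys[]"})
for entity, supp, conf in suggestions:
    print(f"{{fKeys[]}} => {{{entity}}}  support={supp} confidence={conf}")
```

Because the antecedent is fixed, the work is linear in the number of matching transactions rather than exponential in the number of entities, which is consistent with the sub-second query times the slide reports.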
  14. Precision vs. Recall
      [Diagram: two overlapping sets, "What ROSE finds" and "What it should find". The overlap is the correct predictions; the rest of the first set is the false positives, the rest of the second set the false negatives.]
      Precision: How many of the returned entities are relevant? High precision = few false positives.
      Recall: How many relevant entities are returned? High recall = few false negatives.
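As a small worked example, both measures follow directly from the two sets in the diagram. The entity names other than initDefaults() and plugin.properties are made up, and treating an empty set as a score of 1.0 is an assumed convention.

```python
def precision_recall(suggested, actual):
    """Compute precision and recall of a set of suggestions.

    suggested: entities the tool returned ("what ROSE finds")
    actual:    entities that really changed ("what it should find")
    """
    suggested, actual = set(suggested), set(actual)
    correct = suggested & actual
    # Assumed convention: an empty set yields 1.0 (nothing wrong, nothing missed).
    precision = len(correct) / len(suggested) if suggested else 1.0
    recall = len(correct) / len(actual) if actual else 1.0
    return precision, recall


suggested = {"initDefaults()", "plugin.properties", "build.xml"}
actual = {"initDefaults()", "plugin.properties", "PreferencePage.java", "plugin.xml"}

precision, recall = precision_recall(suggested, actual)
# 2 of 3 suggestions are correct; 2 of 4 changed entities were found.
print(round(precision, 2), recall)  # 0.67 0.5
```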
  16. Evaluation
      The programmer has changed one single entity. Can ROSE suggest other entities that should be changed?

      Granularity   Entities                      Files
      Project       Recall  Precision  Top3       Recall  Precision  Top3
      ECLIPSE       0.15    0.26       0.53       0.17    0.26       0.54
      GCC           0.28    0.39       0.89       0.44    0.42       0.87
      GIMP          0.12    0.25       0.91       0.27    0.26       0.90
      JBOSS         0.16    0.38       0.69       0.25    0.37       0.64
      JEDIT         0.07    0.16       0.52       0.25    0.22       0.68
      KOFFICE       0.08    0.17       0.46       0.24    0.26       0.67
      POSTGRES      0.13    0.23       0.59       0.23    0.24       0.68
      PYTHON        0.14    0.24       0.51       0.24    0.36       0.60
      Average       0.15    0.26       0.64       0.26    0.30       0.70

      ROSE predicts 15% of all changed entities (files: 26%).
      In 64% of all transactions, ROSE’s topmost three suggestions contain a correct entity (files: 70%).
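The Top3 column can be read as a hit rate: the share of queries whose three highest-ranked suggestions include at least one entity that really changed. The sketch below illustrates that reading with made-up data; the function name and the input layout are assumptions, and the slides do not spell out the exact evaluation procedure.

```python
def top_k_hit_rate(queries, k=3):
    """Fraction of queries whose first k suggestions contain a correct entity.

    queries: list of (ranked_suggestions, actually_changed) pairs, where
    ranked_suggestions is ordered best first.
    """
    if not queries:
        return 0.0
    hits = sum(
        1 for ranked, changed in queries if set(ranked[:k]) & set(changed)
    )
    return hits / len(queries)


# Made-up evaluation data: four queries, two of them hit within the top three.
example_queries = [
    (["initDefaults()", "plugin.properties", "a"], {"plugin.properties"}),
    (["a", "b", "c", "initDefaults()"], {"initDefaults()"}),  # correct, but ranked 4th
    ([], {"x"}),                                              # no suggestions at all
    (["x"], {"x", "y"}),
]
hit_rate = top_k_hit_rate(example_queries)
print(hit_rate)  # 0.5
```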
  19. Challenges
      Further Data Sources. Test outcomes, mailing lists, newsgroups, chat logs. How do we leverage these sources?
      Further Analyses. Program analysis, sequence analysis, clustering. How do we integrate different analyses?
      From Locations to Actions. You have extended fKeys[] with UI_SPLINES; ROSE suggests: Insert store.setDefaults(UI_SPLINES, false); in function initDefaults(). The user can accept this at the touch of one button. How much can we learn from history?
  20. Conclusion
      • ROSE detects coupling between non-program entities (e.g. programs and documentation).
      • ROSE effectively guides users along related changes.
      • In 64% of all transactions, ROSE’s topmost three suggestions contain a correct entity (files: 70%).
      • Research has just begun to exploit non-program artefacts:
        – Similar results by A. Ying (2004); A. Hassan (2004); and J. Sayyad-Shirabad (2003).
        – ICSE Workshop on Mining Software Repositories, 2004.
      • ROSE will be available as an ECLIPSE plug-in in Fall 2004: http://www.st.cs.uni-sb.de/softevo/
