Mining Version Histories to Guide Software Changes
1. Mining Version Histories to Guide Software Changes 0/10
26th International Conference on Software Engineering (ICSE), Edinburgh, 28.05.2004
Mining Version Histories
to Guide Software Changes
Thomas Zimmermann
(with Peter Weißgerber, Stephan Diehl, and Andreas Zeller)
Software Engineering Chair (Lehrstuhl Softwaretechnik)
Universität des Saarlandes, Saarbrücken
3. Extending ECLIPSE Preferences 1/10
Your task: Extend ECLIPSE with a new preference.
Preferences are stored in field fKeys[]:
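The slide originally shows the corresponding source; as a minimal sketch (the real ECLIPSE code differs in detail, and every name except fKeys[] is illustrative), the field might look roughly like this:

    // Illustrative sketch only, not the actual ECLIPSE source.
    // Each preference handled by the preference page is registered in fKeys[].
    public class PreferencePageSketch {
        private final String[] fKeys = {
            "UI_LINE_NUMBERS",
            "UI_HIGHLIGHT_MATCHES",
            // a new preference key would be added here ...
        };
    }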
6. Extending ECLIPSE Preferences 2/10
What else do you need to change?
Which of the 27,000 files, 20,000 classes, and 200,000 methods of ECLIPSE?
Program analysis.
fKeys[] and initDefaults() use the same variables.
– Usage does not induce change.
– Usage can be detected only within program code, but ECLIPSE has 12,000 non-JAVA files.
Learning from history.
Programmers who changed fKeys[] also changed…
7. Guiding the Programmer 3/10
A) The user inserts a new preference into the field fKeys[].
B) ROSE suggests locations for further changes, e.g. the function initDefaults().
8. From CVS to Transactions 4/10
The ECLIPSE CVS archive has more than 47,000 transactions.
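CVS itself records only per-file check-ins, so transactions have to be reconstructed. A common scheme in CVS mining, and roughly the approach used here, is to group check-ins by the same author with the same log message that lie within a sliding time window. A minimal sketch, assuming a simple Checkin record and a caller-supplied window size (both are illustrative choices, not the exact ROSE implementation):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Sketch: reconstruct transactions from per-file CVS check-ins by grouping
    // check-ins with the same author and log message inside a sliding window.
    public class TransactionBuilder {

        record Checkin(String file, String author, String logMessage, long timestamp) {}

        static List<List<Checkin>> groupIntoTransactions(List<Checkin> checkins,
                                                         long windowSeconds) {
            List<Checkin> sorted = new ArrayList<>(checkins);
            sorted.sort(Comparator.comparingLong(Checkin::timestamp));

            List<List<Checkin>> transactions = new ArrayList<>();
            List<Checkin> current = new ArrayList<>();

            for (Checkin c : sorted) {
                boolean sameTransaction = !current.isEmpty()
                        && c.author().equals(current.get(0).author())
                        && c.logMessage().equals(current.get(0).logMessage())
                        && c.timestamp() - current.get(current.size() - 1).timestamp() <= windowSeconds;
                if (!current.isEmpty() && !sameTransaction) {
                    transactions.add(current);
                    current = new ArrayList<>();
                }
                current.add(c);
            }
            if (!current.isEmpty()) {
                transactions.add(current);
            }
            return transactions;
        }
    }

Applied to the ECLIPSE archive, such a grouping yields the 47,000+ transactions mentioned above.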
13. Effective Mining 6/10
The classical association mining approach is to mine all rules:
– Helpful in understanding general patterns.
– Requires high support thresholds (there are more than 2^n possible rules).
– Takes time to compute (3 days and more).
Alternative — mine only matching rules on demand:
Constraints on antecedent. Mine only rules which are related to the situation Σ, e.g. Σ ⇒ X.
Single consequent rules. Mine only rules which have a singleton as consequent, e.g. Σ ⇒ {x}.
Average runtime of a query: 0.5 seconds.
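A minimal sketch of this on-demand scheme: given the current change set Σ, scan the transactions once, count how often each other entity was changed together with Σ (support), and divide by the number of transactions containing Σ (confidence). All names and thresholds below are illustrative, not the ROSE implementation:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Sketch: mine only single-consequent rules Σ ⇒ {x} whose antecedent
    // matches the current situation Σ, instead of mining all rules up front.
    public class OnDemandMiner {

        record Suggestion(String entity, int support, double confidence) {}

        static List<Suggestion> suggest(List<Set<String>> transactions, Set<String> sigma,
                                        int minSupport, double minConfidence) {
            int sigmaCount = 0;                                // transactions containing Σ
            Map<String, Integer> together = new HashMap<>();   // counts for Σ ∪ {x}

            for (Set<String> t : transactions) {
                if (!t.containsAll(sigma)) continue;
                sigmaCount++;
                for (String entity : t) {
                    if (!sigma.contains(entity)) together.merge(entity, 1, Integer::sum);
                }
            }

            List<Suggestion> result = new ArrayList<>();
            for (Map.Entry<String, Integer> e : together.entrySet()) {
                double confidence = (double) e.getValue() / sigmaCount;
                if (e.getValue() >= minSupport && confidence >= minConfidence) {
                    result.add(new Suggestion(e.getKey(), e.getValue(), confidence));
                }
            }
            result.sort(Comparator.comparingDouble(Suggestion::confidence).reversed());
            return result;
        }
    }

For the running example, suggest(transactions, Set.of("fKeys[]"), 1, 0.1) would rank initDefaults() highly if the two were frequently changed together.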
14. Precision vs. Recall 7/10
[Diagram: the entities ROSE finds overlap the entities it should find; the overlap is the correct prediction, the remainder are false positives and false negatives.]
Precision: How many of the returned entities are relevant? High precision = few false positives.
Recall: How many relevant entities are returned? High recall = few false negatives.
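In symbols, writing R for the set of entities ROSE returns and S for the set it should return (notation introduced here, matching the definitions above):

    \text{precision} = \frac{|R \cap S|}{|R|}
    \qquad
    \text{recall} = \frac{|R \cap S|}{|S|}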
16. Evaluation 8/10
The programmer has changed one single entity.
Can ROSE suggest other entities that should be changed?
Granularity           Entities                     Files
Project       Recall  Precision  Top3      Recall  Precision  Top3
ECLIPSE       0.15    0.26       0.53      0.17    0.26       0.54
GCC           0.28    0.39       0.89      0.44    0.42       0.87
GIMP          0.12    0.25       0.91      0.27    0.26       0.90
JBOSS         0.16    0.38       0.69      0.25    0.37       0.64
JEDIT         0.07    0.16       0.52      0.25    0.22       0.68
KOFFICE       0.08    0.17       0.46      0.24    0.26       0.67
POSTGRES      0.13    0.23       0.59      0.23    0.24       0.68
PYTHON        0.14    0.24       0.51      0.24    0.36       0.60
Average       0.15    0.26       0.64      0.26    0.30       0.70
ROSE predicts 15% of all changed entities (files: 26%).
In 64% of all transactions, ROSE’s topmost three suggestions
contain a correct entity (files: 70%).
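One way to read this setup as an algorithm: for every transaction, query with a single changed entity and compare the suggestions against the remaining entities of that transaction. The sketch below simplifies ROSE's actual procedure (for instance, it mines from the full history rather than only from transactions preceding the query), and its interfaces are illustrative:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.function.BiFunction;

    // Sketch of a leave-one-out style evaluation: query with one entity per
    // transaction and score the suggestions against the rest of the transaction.
    public class EvaluationSketch {

        static double[] averagePrecisionRecall(
                List<Set<String>> transactions,
                BiFunction<List<Set<String>>, Set<String>, List<String>> suggest) {
            double precisionSum = 0, recallSum = 0;
            int queries = 0;

            for (Set<String> transaction : transactions) {
                if (transaction.size() < 2) continue;          // nothing left to predict
                for (String query : transaction) {
                    Set<String> expected = new HashSet<>(transaction);
                    expected.remove(query);

                    List<String> returned = suggest.apply(transactions, Set.of(query));
                    long correct = returned.stream().filter(expected::contains).count();

                    precisionSum += returned.isEmpty() ? 0 : (double) correct / returned.size();
                    recallSum += (double) correct / expected.size();
                    queries++;
                }
            }
            if (queries == 0) return new double[] { 0, 0 };
            return new double[] { precisionSum / queries, recallSum / queries };
        }
    }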
19. Challenges 9/10
Further Data Sources.
Test outcomes, Mailing lists, Newsgroups, Chat logs
How do we leverage these sources?
Further Analyses.
Program analysis, Sequence analysis, Clustering
How do we integrate different analyses?
From Locations to Actions.
You have extended fKeys[] with UI_SPLINES; ROSE suggests:
Insert store.setDefaults(UI_SPLINES, false); in function initDefaults().
The user can accept this at the touch of one button.
How much can we learn from history?
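Concretely, the accepted action would amount to an edit like the following sketch; UI_SPLINES and the suggested call come from the slide, while the surrounding scaffolding is illustrative and uses the JFace IPreferenceStore API (requires the Eclipse JFace libraries):

    import org.eclipse.jface.preference.IPreferenceStore;

    // Sketch of the edit ROSE proposes once fKeys[] has been extended.
    public class RoseActionSketch {
        static final String UI_LINE_NUMBERS = "UI_LINE_NUMBERS";
        static final String UI_SPLINES = "UI_SPLINES";

        void initDefaults(IPreferenceStore store) {
            store.setDefault(UI_LINE_NUMBERS, true);
            store.setDefault(UI_SPLINES, false);   // <- the line inserted on the user's behalf
        }
    }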
20. Conclusion 10/10
– ROSE detects coupling between non-program entities (e.g. programs and documentation).
– ROSE effectively guides users along related changes.
– In 64% of all transactions, ROSE’s topmost three suggestions contain a correct entity (files: 70%).
– Research has just begun to exploit non-program artefacts:
  – Similar results by A. Ying (2004), A. Hassan (2004), and J. Sayyad-Shirabad (2003).
  – ICSE Workshop on Mining Software Repositories, 2004.
– ROSE will be available as an ECLIPSE plug-in in Fall 2004:
  http://www.st.cs.uni-sb.de/softevo/