Mining Version Histories to Guide Software Changes

0/10
26th International Conference on Software Engineering (ICSE), Edinburgh, 28.05.2004

Thomas Zimmermann
(with Peter Weißgerber, Stephan Diehl, and Andreas Zeller)
Lehrstuhl Softwaretechnik
Universität des Saarlandes, Saarbrücken

Extending ECLIPSE Preferences 1/10

Your task: Extend ECLIPSE with a new preference.

Preferences are stored in the field fKeys[].

What else do you need to change?

Which of the 27,000 files, 20,000 classes, and 200,000 methods of ECLIPSE?

Program analysis.
fKeys[] and initDefaults() use the same variables.

– Usage does not induce change.
– Usage can be detected only within program code.
  ECLIPSE has 12,000 non-JAVA files.



Learning from history.
Programmers who changed fKeys[] also changed…

Guiding the Programmer 3/10

A) The user inserts a new preference into the field fKeys[].

B) ROSE suggests locations for further changes, e.g. the function initDefaults().

From CVS to Transactions 4/10

The ECLIPSE CVS archive has more than 47,000 transactions.
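
CVS records commits per file, so transactions have to be reconstructed. The slide does not say how; the sketch below assumes a common heuristic: commits by the same author with the same log message that follow each other within a short time window belong to one transaction. The Commit type, the function name, and the 200-second window are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """One per-file CVS commit (hypothetical record layout)."""
    author: str
    message: str
    timestamp: int  # seconds since some epoch
    entity: str     # changed file or finer-grained entity


def group_into_transactions(commits, window=200):
    """Group commits into transactions with a sliding time window.

    A commit joins an open group if author and message match and it
    follows the group's most recent commit within `window` seconds.
    The window size is an assumption.
    """
    transactions = []
    open_groups = {}  # (author, message) -> {"last": int, "entities": set}
    for c in sorted(commits, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        group = open_groups.get(key)
        if group is not None and c.timestamp - group["last"] <= window:
            group["entities"].add(c.entity)
            group["last"] = c.timestamp  # the window slides forward
        else:
            group = {"last": c.timestamp, "entities": {c.entity}}
            open_groups[key] = group
            transactions.append(group["entities"])
    return transactions


# Made-up commits for illustration only.
example_commits = [
    Commit("alice", "add preference", 0, "fKeys[]"),
    Commit("alice", "add preference", 50, "initDefaults()"),
    Commit("bob", "fix typo", 60, "Foo.java"),
    Commit("alice", "add preference", 1000, "plugin.properties"),
]
example_transactions = group_into_transactions(example_commits)
```

With the made-up commits above, the first two commits by alice form one transaction, while her third commit falls outside the window and starts a new one.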

Mining Association Rules 5/10

ROSE takes all transactions as input:

T42    = { fKeys[], initDefaults(), …, plugin.properties, … }
T11386 = { fKeys[], initDefaults(), … }
…

ROSE mines association rules from these transactions:

{ fKeys[], initDefaults() } ⇒ { plugin.properties }
[Support 7, Confidence 7/8 = 0.875]
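
To make the two numbers concrete: support counts the transactions that contain both sides of the rule, and confidence divides that count by the number of transactions that contain the antecedent. The sketch below uses a made-up history of eight transactions chosen to reproduce the figures above; the function names and the data are illustrative assumptions, not ROSE's implementation.

```python
def support_count(transactions, items):
    """Number of transactions that contain every entity in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical history: 7 transactions changed all three entities,
# 1 transaction changed only fKeys[] and initDefaults().
history = (
    [{"fKeys[]", "initDefaults()", "plugin.properties"}] * 7
    + [{"fKeys[]", "initDefaults()"}]
)

antecedent = {"fKeys[]", "initDefaults()"}
consequent = {"plugin.properties"}

rule_support = support_count(history, antecedent | consequent)
rule_confidence = confidence(history, antecedent, consequent)
print(rule_support, rule_confidence)  # prints "7 0.875"
```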

Effective Mining 6/10

The classical association mining approach is to mine all rules:

– Helpful in understanding general patterns.
– Requires high support thresholds (> 2^n possible rules).
– Takes time to compute (3 days and more).

Alternative: mine only matching rules on demand.

Constraints on antecedent. Mine only rules whose antecedent is the current situation Σ, e.g. Σ ⇒ X.
Single-consequent rules. Mine only rules whose consequent is a singleton, e.g. Σ ⇒ {x}.

Average runtime of a query: 0.5 seconds.
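
The two constraints can be sketched as a single pass over the transactions: keep only those that contain the situation Σ, then count every other entity x that occurs in them, which yields one rule Σ ⇒ {x} per entity. This is a minimal sketch under stated assumptions; the function name, the threshold parameters and their defaults, and the toy history are hypothetical and not ROSE's actual code.

```python
from collections import Counter


def suggest(transactions, situation, min_support=1, min_confidence=0.1):
    """Mine rules `situation => {x}` on demand.

    Returns (entity, support, confidence) triples, best first.
    Thresholds are illustrative defaults, not values from the slides.
    """
    situation = frozenset(situation)
    # Constraint on antecedent: only transactions containing the situation.
    matching = [t for t in transactions if situation <= t]
    if not matching:
        return []
    # Single consequent: count each co-changed entity separately.
    counts = Counter(x for t in matching for x in t - situation)
    rules = [
        (x, n, n / len(matching))
        for x, n in counts.items()
        if n >= min_support and n / len(matching) >= min_confidence
    ]
    # Rank by confidence, then support, then name for a stable order.
    return sorted(rules, key=lambda r: (-r[2], -r[1], r[0]))


# Made-up history for illustration only.
toy_history = [
    {"fKeys[]", "initDefaults()", "plugin.properties"},
    {"fKeys[]", "initDefaults()"},
    {"fKeys[]", "initDefaults()", "plugin.properties"},
    {"Foo.java"},
]
suggestions = suggest(toy_history, {"fKeys[]"})
```

Because only transactions that match the situation are examined, no global rule set has to be precomputed.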

Precision vs. Recall 7/10

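[Figure: two overlapping sets, "What ROSE finds" and "What it should find". The overlap holds the correct predictions; the rest of the first set holds the false positives, the rest of the second set the false negatives.]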

Precision: How many of the returned entities are relevant?
  High precision = few false positives.
Recall: How many relevant entities are returned?
  High recall = few false negatives.
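
Both measures can be computed from two sets, the entities a tool suggests and the entities that were actually changed. The sketch below is illustrative; the function name and the convention of returning 1.0 for an empty set are assumptions, not taken from the slides.

```python
def precision_recall(suggested, expected):
    """Return (precision, recall) for one query.

    precision = correct / suggested, recall = correct / expected.
    An empty `suggested` or `expected` set yields 1.0 here; that
    convention is an assumption of this sketch.
    """
    suggested, expected = set(suggested), set(expected)
    correct = suggested & expected
    precision = len(correct) / len(suggested) if suggested else 1.0
    recall = len(correct) / len(expected) if expected else 1.0
    return precision, recall


# Made-up example: 4 suggestions, 3 entities actually changed, 2 in common.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```

Here two of the four suggestions are correct (precision 0.5), and two of the three changed entities are found (recall 2/3).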

Evaluation 8/10

The programmer has changed one single entity.
Can ROSE suggest other entities that should be changed?

            Granularity: Entities       Granularity: Files
Project     Recall  Precision  Top3     Recall  Precision  Top3
ECLIPSE     0.15    0.26       0.53     0.17    0.26       0.54
GCC         0.28    0.39       0.89     0.44    0.42       0.87
GIMP        0.12    0.25       0.91     0.27    0.26       0.90
JBOSS       0.16    0.38       0.69     0.25    0.37       0.64
JEDIT       0.07    0.16       0.52     0.25    0.22       0.68
KOFFICE     0.08    0.17       0.46     0.24    0.26       0.67
POSTGRES    0.13    0.23       0.59     0.23    0.24       0.68
PYTHON      0.14    0.24       0.51     0.24    0.36       0.60
Average     0.15    0.26       0.64     0.26    0.30       0.70

ROSE predicts 15% of all changed entities (files: 26%).
In 64% of all transactions, ROSE’s topmost three suggestions contain a correct entity (files: 70%).
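
The slide does not spell out how the Top3 column is computed. One plausible reading, sketched below, is the share of queries whose three highest-ranked suggestions include at least one entity that was actually changed. The function name and the sample queries are hypothetical.

```python
def top3_hit_rate(queries):
    """Share of queries with a correct entity among the top three suggestions.

    `queries` is a list of (ranked_suggestions, expected_entities) pairs.
    This definition is an assumed reading of the Top3 column.
    """
    if not queries:
        return 0.0
    hits = sum(
        1 for ranked, expected in queries
        if set(ranked[:3]) & set(expected)
    )
    return hits / len(queries)


# Made-up queries for illustration only.
sample_queries = [
    (["a", "b", "c", "d"], {"c"}),  # hit: "c" is ranked third
    (["a", "b", "c", "d"], {"d"}),  # miss: "d" is ranked fourth
    ([], {"x"}),                    # miss: no suggestions at all
    (["x"], {"x"}),                 # hit
]
rate = top3_hit_rate(sample_queries)
```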

Challenges 9/10

Further Data Sources.
Test outcomes, Mailing lists, Newsgroups, Chat logs
How do we leverage these sources?

Further Analyses.
Program analysis, Sequence analysis, Clustering
How do we integrate different analyses?

From Locations to Actions.
You have extended fKeys[] with UI_SPLINES; ROSE suggests:
    Insert store.setDefaults(UI_SPLINES, false);
    in function initDefaults();
The user can accept this at the touch of one button.
How much can we learn from history?

Conclusion 10/10

– ROSE detects coupling between non-program entities
  (e.g. programs and documentation).
– ROSE effectively guides users along related changes.
– In 64% of all transactions, ROSE’s topmost three
  suggestions contain a correct entity (files: 70%).
– Research has just begun to exploit non-program artefacts:
    – Similar results by A. Ying (2004); A. Hassan (2004);
      and J. Sayyad-Shirabad (2003).
    – ICSE Workshop on Mining Software Repositories, 2004.
– ROSE will be available as an ECLIPSE plug-in in Fall 2004:
http://www.st.cs.uni-sb.de/softevo/
