# Benevol 2010

I used these slides during my presentation at BeNeVol 2010 in Lille, France.

Paper:
Vasilescu B, Serebrenik A and van den Brand MGJ (2010), "Comparative study of software metrics' aggregation techniques", In Proceedings of the 9th Belgian-Netherlands Software Evolution Seminar, pp. 80-84.

### Benevol 2010

1. Software metrics are usually right-skewed. [Figure: histogram of SLOC(org.argouml.ui); x-axis: SLOC for classes in org.argouml.ui (0 to 500); y-axis: frequency (0 to 25).]
2. Aggregation of software metrics using the “softnometric” index. Bogdan Vasilescu (b.n.vasilescu@student.tue.nl), Eindhoven University of Technology, The Netherlands. March 9, 2011.
3. Aggregation techniques
   - Classical: mean, sum, cardinality
   - Distribution fitting: log-normal, exponential, negative binomial
   - Inequality indices: Theil, Gini, Kolm, Atkinson
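5. Gini index. The Gini index is based on the Lorenz curve, which plots the proportion of the total income of the population (y-axis) that is cumulatively earned by the bottom x% of the people.
   - 0 means perfect equality: every person receives the same income.
   - 1 means perfect inequality: one person receives all the income.

   $I_{\mathrm{Gini}}(X) = \frac{A}{A + B}$, where $A$ is the area between the line of perfect equality and the Lorenz curve, and $B$ is the area under the Lorenz curve.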
7. Theoretical comparison. Criteria:
   - Domain → determines applicability.
   - Range → determines interpretation.
   - Invariance:
     - w.r.t. addition → LOC, ignore headers;
     - w.r.t. multiplication → LOC, percentages vs. absolute values.
   - Decomposability → explain inequality by partitioning the population into groups.
8. Theoretical comparison

   | Agg. technique | Domain | Range | Invariance | Decomposability |
   |---|---|---|---|---|
   | Mean | R | R | - | N/A |
   | Sum | R | R | - | N/A |
   | Cardinality | R | N | - | N/A |
   | Gini Index | R+ | [0, 1] | mult. | - |
   | | R | R | mult. | - |
   | Theil Index | R+ | [0, log n] | mult. | yes |
   | Kolm Index | R | R+ | add. | yes |
   | Atkinson Index | R+ | [0, 1 − 1/n] | mult. | - |
9. Empirical comparison. Research questions:
   - Does LOC relate to bugs?
   - Do the aggregation techniques influence the presence or strength of this relation?
   - Is there any difference between the aggregation techniques? Do they express the same thing?
11. Empirical comparison
    - Case study: ArgoUML, open-source, ∼1200 Java classes, ∼100 packages.
    - Methodology:
      - Tool chain to automatically process issue tracker and version control system data.
      - Mapped defects to Java classes and then to packages.
      - Measured SLOC of each class and aggregated it to package level.
      - For each aggregation technique, statistically studied the correlation with bugs.
12. Results

    | | I_Gini | I_Theil | I_Kolm | I_Atkinson | defects |
    |---|---|---|---|---|---|
    | mean | 0.170 | 0.192 | 0.676¹ | 0.203 | 0.0096 |
    | I_Gini | | 0.908 | 0.467 | 0.903 | 0.27 |
    | I_Theil | | | 0.488 | 0.918 | 0.273 |
    | I_Kolm | | | | 0.501 | 0.119 |
    | I_Atkinson | | | | | 0.229 |

    ¹ Statistically significant correlations, with two-sided p-values not exceeding 0.01, are typeset in boldface on the original slide (the boldface is not preserved in this transcript).

    - I_Gini, I_Theil and I_Atkinson show the strongest, and also statistically significant, correlation with the number of defects. However, these three indices are also highly and significantly correlated with one another.
    - The mean shows the lowest correlation with the number of defects.
13. Threats to validity
    - No control over the issue tracker → mapping of defects to classes:
      - bugs missing from the issue tracker;
      - bug fixes not showing up in the commit log.
    - How representative is the case? How about the version? Replicate on more systems and more versions.
    - Is LOC the most suitable metric? Replicate with more metrics.
14. Conclusions
    - Software metrics are not distributed normally. [Figure: histogram of SLOC(org.argouml.ui); x-axis: SLOC for classes in org.argouml.ui; y-axis: frequency.]
    - Theoretical comparison: the table of domain, range, invariance and decomposability from the earlier "Theoretical comparison" slide.
    - Empirical comparison: the correlation table from the "Results" slide.
    - Classical aggregation techniques have problems when distributions are skewed.
    - Inequality indices look more promising.
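
The Gini and Theil indices named on the slides can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' tool chain: the rank-based Gini formula is the standard discrete equivalent of A / (A + B), the Theil formula is the usual textbook definition, and the function names and the sample SLOC values are hypothetical.

```python
import math
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini index of non-negative values; 0 means perfect equality."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or xs[0] < 0 or total <= 0:
        raise ValueError("need non-negative values with a positive total")
    # Discrete form of A / (A + B): a rank-weighted sum over the sorted values.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def theil(values: Sequence[float]) -> float:
    """Theil index of non-negative values; between 0 and log(n)."""
    n = len(values)
    total = sum(values)
    if n == 0 or min(values) < 0 or total <= 0:
        raise ValueError("need non-negative values with a positive total")
    mean = total / n
    # A zero value contributes nothing (the limit of r * log(r) as r -> 0 is 0).
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n


if __name__ == "__main__":
    # Hypothetical SLOC values for the classes of one package (right-skewed).
    sloc = [12, 15, 20, 22, 30, 35, 40, 480]
    print("mean :", sum(sloc) / len(sloc))
    print("Gini :", round(gini(sloc), 3))
    print("Theil:", round(theil(sloc), 3))

    # Invariance w.r.t. multiplication: rescaling changes the mean,
    # but leaves both inequality indices unchanged (up to rounding).
    scaled = [10 * x for x in sloc]
    print(math.isclose(gini(sloc), gini(scaled)))
    print(math.isclose(theil(sloc), theil(scaled)))
```

In this sketch a single large class dominates the mean, whereas the two indices describe how unevenly SLOC is spread over the classes, which is the property the "Invariance" column of the theoretical comparison refers to for multiplication.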