Benevol 2010
I used these slides during my presentation at BeNeVol 2010 in Lille, France.

Paper:
Vasilescu B, Serebrenik A and van den Brand MGJ (2010), "Comparative study of software metrics' aggregation techniques", In Proceedings of the 9th Belgian-Netherlands Software Evolution Seminar, pp. 80-84.

Benevol 2010 Presentation Transcript

  • 1. Software metrics are usually right-skewed
    [Figure: histogram of SLOC(org.argouml.ui). x-axis: SLOC for classes in org.argouml.ui (0 to 500); y-axis: frequency (0 to 25).]
  • 2. Aggregation of software metrics using the “softnometric” index
    Bogdan Vasilescu, b.n.vasilescu@student.tue.nl
    Eindhoven University of Technology, The Netherlands
    March 9, 2011
  • 3. Aggregation techniques
    Classical: mean, sum, cardinality.
    Distribution fitting: log-normal, exponential, negative binomial.
    Inequality indices: Theil, Gini, Kolm, Atkinson.
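  • 5. Gini index
    The Gini index is based on the Lorenz curve: the proportion of the total income of the population (y-axis) that is cumulatively earned by the bottom x% of the people (x-axis).
    0: perfect equality, every person receives the same income.
    1: perfect inequality, one person receives all the income.
    $I_{\mathrm{Gini}}(X) = \frac{A}{A + B}$, where A is the area between the line of perfect equality and the Lorenz curve, and B is the area under the Lorenz curve.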
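The ratio A / (A + B) on the Gini slide can be illustrated with a minimal Python sketch. The function names `lorenz_curve` and `gini` are introduced here for illustration only and are not taken from the slides or the paper; the sketch assumes non-negative values with a positive total, and approximates the area under the Lorenz curve with trapezoids.

```python
def lorenz_curve(values):
    """Return Lorenz curve points as (population share, cumulative value share)."""
    ordered = sorted(values)          # poorest (smallest) first
    total = sum(ordered)              # assumed > 0
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini index as A / (A + B), with B the area under the Lorenz curve."""
    points = lorenz_curve(values)
    # Trapezoid rule over consecutive Lorenz points gives B.
    area_b = sum((x2 - x1) * (y1 + y2) / 2
                 for (x1, y1), (x2, y2) in zip(points, points[1:]))
    # The line of perfect equality encloses a triangle of area 0.5,
    # so A is what is left of that triangle above the Lorenz curve.
    area_a = 0.5 - area_b
    return area_a / (area_a + area_b)


equal_sloc = [5, 5, 5, 5]        # made-up values: every class has the same size
one_big_class = [0, 0, 0, 10]    # made-up values: one class holds everything
print(gini(equal_sloc), gini(one_big_class))
```

With a finite population of n items, the "one item holds everything" case yields 1 − 1/n rather than exactly 1; the value approaches 1 as n grows.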
  • 7. Theoretical comparison
    Criteria:
    • Domain → determines applicability.
    • Range → determines interpretation.
    • Invariance:
        • w.r.t. addition → LOC, ignoring headers;
        • w.r.t. multiplication → LOC, percentages vs. absolute values.
    • Decomposability → explains inequality by partitioning the population into groups.
  • 8. Theoretical comparison

    Agg. technique   Domain   Range            Invariance   Decomposability
    Mean             ℝ        ℝ                -            N/A
    Sum              ℝ        ℝ                -            N/A
    Cardinality      ℝ        ℕ                -            N/A
    Gini index       ℝ+       [0, 1]           mult.        -
                     ℝ        ℝ                mult.        -
    Theil index      ℝ+       [0, log n]       mult.        yes
    Kolm index       ℝ        ℝ+               add.         yes
    Atkinson index   ℝ+       [0, 1 − 1/n]     mult.        -
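The invariance column of the table above can be checked numerically. The sketch below uses common textbook forms of the Theil, Atkinson and Kolm indices; the slides do not give the formulas or parameters, so the Atkinson parameter `epsilon = 0.5` and the Kolm parameter `alpha = 0.05` are assumptions, and the sample values are made up.

```python
import math


def mean(values):
    return sum(values) / len(values)


def theil(values):
    """Theil index; terms with v == 0 contribute 0 by convention."""
    mu = mean(values)
    return sum((v / mu) * math.log(v / mu) for v in values if v > 0) / len(values)


def atkinson(values, epsilon=0.5):
    """Atkinson index; epsilon = 0.5 is an assumed parameter (0 < epsilon < 1)."""
    mu = mean(values)
    power = 1 - epsilon
    generalized_mean = mean([v ** power for v in values]) ** (1 / power)
    return 1 - generalized_mean / mu


def kolm(values, alpha=0.05):
    """Kolm index; alpha = 0.05 is an assumed parameter (alpha > 0)."""
    mu = mean(values)
    return math.log(mean([math.exp(alpha * (mu - v)) for v in values])) / alpha


values = [10, 20, 30, 140]             # made-up, right-skewed sample
scaled = [3 * v for v in values]       # multiplication by a constant
shifted = [v + 10 for v in values]     # addition of a constant

# Multiplication-invariant indices give the same result for `scaled`;
# the addition-invariant Kolm index gives the same result for `shifted`.
print("Theil   :", theil(values), theil(scaled), theil(shifted))
print("Atkinson:", atkinson(values), atkinson(scaled), atkinson(shifted))
print("Kolm    :", kolm(values), kolm(scaled), kolm(shifted))
print("Mean    :", mean(values), mean(scaled), mean(shifted))
```

Under these assumptions the mean changes under both transformations, which matches the "-" entry in the invariance column for the classical techniques.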
  • 9. Empirical comparison
    Research questions:
    • Does LOC relate to bugs?
    • Do the aggregation techniques influence the presence or strength of this relation?
    • Is there any difference between the aggregation techniques? Do they express the same thing?
  • 11. Empirical comparison
    Case study: ArgoUML, open-source, ∼ 1200 Java classes, ∼ 100 packages.
    Methodology:
    • Tool chain to automatically process issue tracker and version control system data.
    • Mapped defects to Java classes and then to packages.
    • Measured SLOC of each class, aggregated to package level.
    • For each aggregation technique, statistically studied the correlation with bugs.
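The last two methodology steps can be sketched as follows. This is only an illustration, not the authors' tool chain: the class names, SLOC values and defect counts are invented, the helper names are hypothetical, and since the slides do not say which correlation coefficient was used, Kendall's tau-a is assumed here.

```python
from collections import defaultdict
from itertools import combinations


def package_of(class_name):
    """Package of a fully qualified class name (everything before the last dot)."""
    return class_name.rsplit(".", 1)[0]


def aggregate_by_package(class_sloc, aggregate):
    """Group class-level SLOC by package and apply an aggregation function."""
    groups = defaultdict(list)
    for class_name, sloc in class_sloc.items():
        groups[package_of(class_name)].append(sloc)
    return {pkg: aggregate(slocs) for pkg, slocs in groups.items()}


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs


# Invented example data, not taken from ArgoUML.
class_sloc = {
    "app.ui.A": 120, "app.ui.B": 30,
    "app.model.C": 400, "app.model.D": 20, "app.model.E": 10,
    "app.util.F": 50,
}
defects = {"app.ui": 2, "app.model": 5, "app.util": 0}

mean_sloc = aggregate_by_package(class_sloc, lambda v: sum(v) / len(v))
packages = sorted(defects)
tau = kendall_tau_a([mean_sloc[p] for p in packages],
                    [defects[p] for p in packages])
print(mean_sloc, tau)
```

Any other aggregation technique (sum, Gini, Theil, and so on) can be passed as the `aggregate` argument, which is how one would compare the techniques on the same data.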
  • 12. Results

                   mean   I_Gini   I_Theil   I_Kolm    I_Atkinson   defects
    mean                  0.170    0.192     0.676¹    0.203        0.0096
    I_Gini                         0.908     0.467     0.903        0.27
    I_Theil                                  0.488     0.918        0.273
    I_Kolm                                             0.501        0.119
    I_Atkinson                                                      0.229

    • I_Gini, I_Theil and I_Atkinson show the strongest, and also statistically significant, correlation with the number of defects. However, they are also highly and significantly correlated with one another.
    • Mean shows the lowest correlation with the number of defects.
    ¹ Statistically significant correlations, with two-sided p-values not exceeding 0.01, are typeset in boldface on the slide.
  • 13. Threats to validity
    • No control over the issue tracker → affects the mapping of defects to classes:
        • bugs missing from the issue tracker;
        • bug fixes not showing up in the commit log.
    • How representative is the case? How about the version? → Replicate on more systems and more versions.
    • Is LOC the most suitable metric? → Replicate with more metrics.
  • 14. Conclusions
    • Software metrics are not distributed normally (histogram of SLOC for classes in org.argouml.ui, as on slide 1).
    • Theoretical comparison (table of domain, range, invariance and decomposability, as on slide 8).
    • Empirical comparison (correlation table, as on slide 12).
    • Classical aggregation techniques have problems when distributions are skewed.
    • Inequality indices look more promising.
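The closing claim about skewed distributions can be illustrated with a small sketch. The two "packages" below are made-up data, and the `gini` helper uses the mean-absolute-difference form of the index, chosen here for brevity; it is not taken from the slides.

```python
def gini(values):
    """Gini index via the mean absolute difference between all pairs."""
    n = len(values)
    mu = sum(values) / n   # assumed > 0
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2 * n * n * mu)


# Two hypothetical packages with the same mean SLOC per class.
balanced = [100, 100, 100, 100]   # code spread evenly over the classes
skewed = [10, 10, 10, 370]        # one very large class dominates

mean_balanced = sum(balanced) / len(balanced)
mean_skewed = sum(skewed) / len(skewed)

# The mean cannot tell the two packages apart; the Gini index can.
print(mean_balanced, mean_skewed)
print(gini(balanced), gini(skewed))
```

In this toy case the mean reports the same value for both packages, while the inequality index separates the evenly sized package from the one dominated by a single large class.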