Benevol 2010

I used these slides during my presentation at BeNeVol 2010 in Lille, France.

Paper:
Vasilescu B, Serebrenik A and van den Brand MGJ (2010), "Comparative study of software metrics' aggregation techniques", In Proceedings of the 9th Belgian-Netherlands Software Evolution Seminar, pp. 80-84.

Transcript

  • 1. Software metrics are usually right-skewed. [Figure: histogram of SLOC for classes in org.argouml.ui; x-axis: SLOC (0 to 500), y-axis: frequency (0 to 25).]
  • 2. Aggregation of software metrics using the “softnometric” index. Bogdan Vasilescu (b.n.vasilescu@student.tue.nl), Eindhoven University of Technology, The Netherlands. March 9, 2011.
  • 3. Aggregation techniques
    • Classical: mean, sum, cardinality.
    • Distribution fitting: log-normal, exponential, negative binomial.
    • Inequality indices: Theil, Gini, Kolm, Atkinson.
  • 5. Gini index
    The Gini index is based on the Lorenz curve: the proportion of the total income of the population (y-axis) cumulatively earned by the bottom x% of the people.
    • 0: perfect equality, every person receives the same income.
    • 1: perfect inequality, one person receives all the income.
    $I_{\mathrm{Gini}}(X) = \frac{A}{A + B}$, where A is the area between the line of perfect equality and the Lorenz curve, and B is the area under the Lorenz curve.
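The ratio A / (A + B) on the Gini slide can be computed directly from a sample. The following is a minimal sketch, not the authors' implementation: it assumes the standard discrete Lorenz curve and the trapezoid rule, and the function names `lorenz_shares` and `gini` are made up for illustration.

```python
def lorenz_shares(values):
    """Cumulative share of the total held by the bottom k items, for k = 0..n."""
    xs = sorted(values)
    total = sum(xs)
    if total <= 0 or xs[0] < 0:
        raise ValueError("values must be non-negative with a positive sum")
    shares, running = [0.0], 0.0
    for x in xs:
        running += x
        shares.append(running / total)
    return shares


def gini(values):
    """Gini index as A / (A + B), with B the area under the Lorenz curve."""
    shares = lorenz_shares(values)
    n = len(shares) - 1
    # Trapezoid rule: n strips of width 1/n under the Lorenz curve.
    b = sum((shares[i] + shares[i + 1]) / 2 for i in range(n)) / n
    # The area under the line of perfect equality is 1/2, so A = 1/2 - B.
    a = 0.5 - b
    return a / (a + b)


print(gini([10, 10, 10, 10]))  # every item has the same value
print(gini([0, 0, 0, 100]))    # one item holds everything
```

With a finite sample of n items, this discrete estimate reaches at most 1 - 1/n when one item holds everything; it approaches 1 only as n grows.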
  • 7. Theoretical comparison
    Criteria:
    • Domain → determines applicability.
    • Range → determines interpretation.
    • Invariance
      • w.r.t. addition → LOC, ignore headers.
      • w.r.t. multiplication → LOC, percentages vs. absolute values.
    • Decomposability → explain inequality by partitioning the population into groups.
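To illustrate the decomposability criterion, here is a small sketch that splits the Theil index into a within-group and a between-group part. It assumes the common form T = (1/n) Σ (x_i/μ) ln(x_i/μ) for strictly positive values; the package names and SLOC numbers are invented for the example and do not come from the study.

```python
import math


def theil(values):
    """Theil index of strictly positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def theil_decomposition(groups):
    """Return (within, between) such that within + between == theil(all values)."""
    all_values = [x for g in groups.values() for x in g]
    total = sum(all_values)
    mu = total / len(all_values)
    within = between = 0.0
    for g in groups.values():
        share = sum(g) / total          # group's share of the total
        mu_g = sum(g) / len(g)          # group mean
        within += share * theil(g)      # inequality inside the group
        between += share * math.log(mu_g / mu)  # inequality between group means
    return within, between


# Hypothetical SLOC values per class, grouped by (made-up) package.
groups = {"ui": [20, 40, 400], "model": [50, 60, 70]}
within, between = theil_decomposition(groups)
overall = theil([x for g in groups.values() for x in g])
print(within, between, overall)
```

The two parts sum to the overall index, which is what makes it possible to say how much of the inequality is explained by the grouping.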
  • 8. Theoretical comparison
    Agg. technique    Domain   Range            Invariance   Decomposability
    Mean              R        R                -            N/A
    Sum               R        R                -            N/A
    Cardinality       R        N                -            N/A
    Gini Index        R+       [0, 1]           mult.        -
                      R        R                mult.        -
    Theil Index       R+       [0, log n]       mult.        yes
    Kolm Index        R        R+               add.         yes
    Atkinson Index    R+       [0, 1 − 1/n]     mult.        -
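The invariance column of the table can be checked numerically. The sketch below uses textbook forms of the four indices; the parameter values (`alpha=1.0` for Kolm, `epsilon=0.5` for Atkinson) and the sample SLOC list are assumptions made for illustration, since the slides do not state them.

```python
import math


def gini(values):
    """Gini index via the mean absolute difference (non-negative values)."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(x - y) for x in values for y in values) / (2 * n * n * mu)


def theil(values):
    """Theil index of strictly positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n


def kolm(values, alpha=1.0):
    """Kolm index; alpha > 0 is an assumed parameter value."""
    n = len(values)
    mu = sum(values) / n
    return math.log(sum(math.exp(alpha * (mu - x)) for x in values) / n) / alpha


def atkinson(values, epsilon=0.5):
    """Atkinson index for epsilon != 1; epsilon is an assumed parameter value."""
    n = len(values)
    mu = sum(values) / n
    ede = (sum(x ** (1 - epsilon) for x in values) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mu


sloc = [12, 30, 45, 80, 400]          # made-up class sizes
scaled = [3 * x for x in sloc]        # multiplication by a constant
shifted = [x + 20 for x in sloc]      # addition of a constant (e.g. a header)

for name, index in [("Gini", gini), ("Theil", theil),
                    ("Kolm", kolm), ("Atkinson", atkinson)]:
    mult_invariant = math.isclose(index(scaled), index(sloc))
    add_invariant = math.isclose(index(shifted), index(sloc))
    print(f"{name}: mult. invariant={mult_invariant}, add. invariant={add_invariant}")
```

On this sample, Gini, Theil and Atkinson are unchanged by scaling but change under shifting, while Kolm behaves the other way round, matching the table.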
  • 9. Empirical comparison
    Research questions:
    • Does LOC relate to bugs? Do the aggregation techniques influence the presence/strength of this relation?
    • Is there any difference between the aggregation techniques? Do they express the same thing?
  • 11. Empirical comparison
    Case study: ArgoUML. Open-source, ∼ 1200 Java classes, ∼ 100 packages.
    Methodology:
    • Tool chain to automatically process issue tracker and version control system data.
    • Mapped defects to Java classes and then packages.
    • Measured SLOC of each class, aggregated to package level.
    • For each aggregation technique, statistically studied correlation with bugs.
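The aggregation and correlation steps of the methodology can be sketched as follows. This is not the authors' tool chain: the class names and numbers are invented, the mean stands in for any aggregation technique, and Kendall's tau-a is used only as an assumed example of a correlation measure, since the slides do not name the one used.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def package_of(class_name):
    """Package part of a fully qualified class name."""
    return class_name.rsplit(".", 1)[0]


def aggregate_by_package(class_sloc, aggregate):
    """Apply an aggregation function to the class-level SLOC of each package."""
    per_package = defaultdict(list)
    for cls, sloc in class_sloc.items():
        per_package[package_of(cls)].append(sloc)
    return {pkg: aggregate(vals) for pkg, vals in per_package.items()}


def defects_by_package(class_defects):
    """Sum class-level defect counts per package."""
    totals = defaultdict(int)
    for cls, count in class_defects.items():
        totals[package_of(cls)] += count
    return dict(totals)


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical input data; not taken from ArgoUML.
class_sloc = {
    "org.example.ui.Dialog": 400, "org.example.ui.Button": 40,
    "org.example.model.Node": 60, "org.example.model.Edge": 50,
    "org.example.util.Strings": 20,
}
class_defects = {
    "org.example.ui.Dialog": 5, "org.example.ui.Button": 1,
    "org.example.model.Node": 2, "org.example.model.Edge": 0,
    "org.example.util.Strings": 0,
}

sloc_per_pkg = aggregate_by_package(class_sloc, mean)
defects_per_pkg = defects_by_package(class_defects)
packages = sorted(sloc_per_pkg)
tau = kendall_tau_a([sloc_per_pkg[p] for p in packages],
                    [defects_per_pkg[p] for p in packages])
print(sloc_per_pkg, defects_per_pkg, tau)
```

Swapping `mean` for another function, such as an inequality index, changes only the `aggregate` argument, which is how the techniques can be compared on equal footing.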
  • 12. Results
                  I_Gini   I_Theil   I_Kolm   I_Atkinson   defects
    mean          0.170    0.192     0.676    0.203        0.0096
    I_Gini                 0.908     0.467    0.903        0.27
    I_Theil                          0.488    0.918        0.273
    I_Kolm                                    0.501        0.119
    I_Atkinson                                             0.229
    Statistically significant correlations, with two-sided p-values not exceeding 0.01, are typeset in boldface in the original slides.
    • I_Gini, I_Theil and I_Atkinson show the strongest, and also statistically significant, correlation with the number of defects.
    • However, these three indices are also highly and significantly correlated with one another.
    • The mean shows the lowest correlation with the number of defects.
  • 13. Threats to validity
    • No control over the issue tracker → mapping of defects to classes.
      • Bugs missing from the issue tracker.
      • Bug fixes not showing up in the commit log.
    • How representative is the case? How about the version? Replicate on more systems and more versions.
    • Is LOC the most suitable metric? Replicate with more metrics.
  • 14. Conclusions
    • Software metrics are not distributed normally. [Figure: histogram of SLOC for classes in org.argouml.ui, repeated from slide 1.]
    • Theoretical comparison. [Table repeated from slide 8.]
    • Empirical comparison. [Table repeated from slide 12.]
    • Classical aggregation techniques have problems when distributions are skewed.
    • Inequality indices look more promising.
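A toy example of why a classical aggregate can hide skew: two hypothetical packages with the same mean SLOC but very different distributions. The numbers are invented, and the Gini index is computed here with the mean-absolute-difference form as an illustrative assumption.

```python
def gini(values):
    """Gini index via the mean absolute difference (non-negative values)."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(x - y) for x in values for y in values) / (2 * n * n * mu)


balanced = [100, 100, 100, 100]   # four equally sized classes
skewed = [10, 10, 10, 370]        # three tiny classes and one very large one

for name, sloc in [("balanced", balanced), ("skewed", skewed)]:
    print(name, "mean =", sum(sloc) / len(sloc), "gini =", gini(sloc))
```

Both packages have a mean of 100 SLOC per class, so the mean cannot tell them apart, whereas the Gini index is 0 for the first and well above 0 for the second.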