Seeing the forest for the trees, UMons 2011

Slides used during the talk at UMons in November 2011.

    Presentation Transcript

    • Seeing the forest for the trees. Bogdan Vasilescu; b.n.vasilescu@tue.nl; http://www.win.tue.nl/∼bvasiles/; Software Engineering and Technology group, Eindhoven University of Technology; November 23, 2011
    • Eindhoven / department of mathematics and computer science
    • Computer Science @TU/e: Section Model Driven Software Engineering (MDSE). Group Software Engineering and Technology (SET). Mark van den Brand, Alexander Serebrenik.
    • Interested in: software evolution; aggregation of code metrics; activity in open-source projects; computational geometry.
    • Aggregation of software metrics: Maintaining a software system is like renovating a house. Maintainability assessment precedes changing the software. Metrics are often applied to measure maintainability. But metrics are defined at a low level (method, class). We need aggregation techniques.
    • Traditional aggregation techniques: Standard summary statistics: mean, median, ... [Figure legend: red line, mean; blue line, median.]
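As a minimal sketch of how the two summary statistics named on this slide can disagree on skewed metric data, the following Python snippet compares them on made-up SLOC values. The numbers are purely illustrative assumptions and are not taken from the talk.

```python
from statistics import mean, median

# Hypothetical SLOC-per-class values: most classes are small, one is very large.
sloc = [12, 15, 18, 20, 22, 25, 30, 35, 40, 1200]

sloc_mean = mean(sloc)      # pulled upwards by the single large class
sloc_median = median(sloc)  # stays close to the size of a typical class

print(f"mean = {sloc_mean}, median = {sloc_median}")
```

With data like this, neither number alone says how unevenly the code is spread over the classes.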
    • Recent trend: Inequality indices. Econometrics: measure/explain the inequality of income or wealth. Software metrics and econometric variables have distributions with similar shapes. [Figure: two histograms. Left: Source Lines of Code, freecol-0.9.4 (x-axis: SLOC per class; y-axis: Frequency). Right: Household income in Ilocos, Philippines, 1998 (x-axis: Income; y-axis: Frequency).]
    • Degree of concentration of functionality: Lorenz curve for SLOC in Hibernate 3.6.0-beta4 (x-axis: % Classes; y-axis: % SLOC). With $A$ the area between the line of equality and the Lorenz curve, and $B$ the area under the Lorenz curve, $I_{\mathrm{Gini}} = \frac{A}{A+B} = 2A$; the figure also marks $I_{\mathrm{Hoover}}$. Measure inequality between individuals (e.g., classes) and between groups (e.g., components). Often desirable to assess the contribution of the inequality between the groups: decomposable indices, root-cause analysis.
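The Lorenz-curve reading of the Gini index can be sketched in a few lines of Python. This is an illustrative sketch, not the speaker's implementation: the function names are hypothetical, the area under the curve is computed with the trapezoid rule, and the Hoover index is taken in its standard form as the largest vertical gap between the line of equality and the Lorenz curve.

```python
def lorenz_points(values):
    """Cumulative (share of population, share of total value) points, smallest values first."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini index as 2A = 1 - 2B, where B is the area under the Lorenz curve."""
    pts = lorenz_points(values)
    area_b = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1 - 2 * area_b


def hoover(values):
    """Hoover index: largest vertical distance between the diagonal and the Lorenz curve."""
    return max(x - y for x, y in lorenz_points(values))


# Hypothetical SLOC values for the classes of a small system.
sloc = [10, 20, 30, 40, 400]
print(f"Gini = {gini(sloc):.3f}, Hoover = {hoover(sloc):.3f}")
```

Both functions return 0 when all classes have the same size and grow as more of the code concentrates in a few classes.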
    • Traceability via decomposability: Which individuals (classes in package) contribute to 80% of the inequality (of SLOC)? Which class contributes the most to the inequality?
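One way to illustrate decomposability is the textbook split of the Theil index into within-group and between-group parts. The sketch below assumes that form of the index (mean of (x/μ)·ln(x/μ)); the package names and SLOC values are hypothetical, and the code is not claimed to be the procedure used in the talk.

```python
import math


def theil(values):
    """Theil index: mean of (x/mu) * ln(x/mu); values equal to 0 contribute 0."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values if x > 0) / len(values)


def theil_decomposition(groups):
    """Per-group within and between contributions to the Theil index of all values."""
    all_values = [x for vs in groups.values() for x in vs]
    total = sum(all_values)
    mu = total / len(all_values)
    within, between = {}, {}
    for name, vs in groups.items():
        share = sum(vs) / total           # group's share of the total metric value
        mu_g = sum(vs) / len(vs)          # group mean
        within[name] = share * theil(vs)
        between[name] = share * math.log(mu_g / mu)
    return within, between


# Hypothetical SLOC values per class, grouped by made-up package names.
packages = {
    "core": [120, 150, 900],
    "util": [30, 40, 50, 60],
    "ui": [200, 220],
}
within, between = theil_decomposition(packages)
overall = theil([x for vs in packages.values() for x in vs])

for name in packages:
    print(f"{name}: within = {within[name]:.4f}, between = {between[name]:.4f}")
print(f"overall Theil = {overall:.4f}")
```

The within and between contributions add up to the overall index, which is what makes it possible to trace the inequality back to specific packages.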
    • Other properties of inequality indices: Symmetry. Inequality stays the same for any permutation of the population.
    • Other properties of inequality indices: Population principle. Inequality does not change if the population is replicated any number of times.
    • Other properties of inequality indices: Transfers principle. A transfer from a rich man to a poor man (without reversing their position) should decrease inequality.
    • Other properties of inequality indices: Scale invariance (Gini, Theil, Atkinson, Hoover). Inequality does not change if all values are multiplied by the same constant.
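The four properties listed on these slides can be checked numerically for one index. The sketch below uses the Gini index in its pairwise-difference form; the metric values and the size of the transfer are hypothetical and chosen only for illustration.

```python
import math
import random


def gini(values):
    """Gini index: mean absolute difference over all ordered pairs, divided by twice the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mu)


sloc = [10, 25, 30, 100]  # hypothetical metric values
base = gini(sloc)

# Symmetry: any permutation of the population gives the same value.
shuffled = sloc[:]
random.Random(0).shuffle(shuffled)
symmetry_holds = math.isclose(gini(shuffled), base)

# Population principle: replicating the population does not change the value.
population_holds = math.isclose(gini(sloc * 3), base)

# Scale invariance: multiplying all values by the same constant does not change the value.
scale_holds = math.isclose(gini([7 * v for v in sloc]), base)

# Transfers principle: moving 10 units from the largest to the smallest value,
# without changing their ranking, lowers the inequality.
after_transfer = [20, 25, 30, 90]
transfer_holds = gini(after_transfer) < base

print(symmetry_holds, population_holds, scale_holds, transfer_holds)
```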
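    • Summary: properties per inequality index. Columns: Sym., Inv., Dec., Pop., Tra. Marks that survive in this transcript, one per row: $I_{\mathrm{Gini}}$ ×; $I_{\mathrm{Theil}}$ ×; $I_{\mathrm{MLD}}$ ×; $I_{\mathrm{Hoover}}$ ×; $I^{\alpha}_{\mathrm{Atkinson}}$ ×; $I^{\beta}_{\mathrm{Kolm}}$ +. Problems include: Domain not always $\mathbb{R}^n$. No distinction between all values equal but low, and all values equal but high.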
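The two problems named on this slide can be made concrete with the Theil index, assuming its textbook form (mean of (x/μ)·ln(x/μ)). The input values below are hypothetical.

```python
import math


def theil(values):
    """Theil index in its textbook form; the logarithm requires strictly positive ratios."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)


# All values equal but low, and all values equal but high: the index cannot tell them apart.
low = theil([1, 1, 1, 1])
high = theil([1000, 1000, 1000, 1000])

# A negative value falls outside the domain: the logarithm is undefined.
try:
    theil([5, -3, 10])
    negative_accepted = True
except ValueError:
    negative_accepted = False

print(low, high, negative_accepted)
```

Both equal-valued systems get an index of 0, so an inequality index on its own says nothing about whether the values are uniformly low or uniformly high.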
    • Our research
    • Which are redundant? $I_{\mathrm{Gini}}$, $I_{\mathrm{Theil}}$, $I_{\mathrm{MLD}}$, $I_{\mathrm{Atkinson}}$, and $I_{\mathrm{Hoover}}$ always convey the same information. [Figure: two panels, SLOC and DIT, each with a vertical axis from -1.0 to 1.0 and one entry per pair of indices. SLOC panel, labels in order: MLD-Hoo, Gin-MLD, The-MLD, Gin-Hoo, Atk-Hoo, The-Hoo, Gin-Atk, MLD-Atk, Gin-The, The-Atk; percentages in order: 91%, 89%, 91%, 90%, 92%, 92%, 90%, 91%, 91%, 92%. DIT panel, labels in order: MLD-Hoo, Atk-Hoo, Gin-MLD, The-Hoo, Gin-Atk, Gin-Hoo, Gin-The, The-MLD, The-Atk, MLD-Atk; percentages in order: 85%, 87%, 87%, 88%, 88%, 89%, 88%, 88%, 88%, 89%.]
    • Is the correlation meaningful? Superlinear (e.g., $I_{\mathrm{Theil}}$ vs. $I_{\mathrm{Gini}}$) and chaotic (e.g., $I_{\mathrm{Theil}}$ vs. $I_{\mathrm{Kolm}}$) patterns can be observed in the scatter plots. [Figure: two scatter plots for compiere. Left: Theil (SLOC) against Gini (SLOC), Kendall: 0.94, p-val: 0.00. Right: Theil (SLOC) against Kolm (SLOC), Kendall: 0.25, p-val: 0.01.]
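A rank correlation such as Kendall's tau stays high for a superlinear but monotone relation, while a linear coefficient drops. The following sketch shows this on synthetic data. It assumes the tau-a variant (ties count as neither concordant nor discordant), which may differ from the variant used in the study, and the cubic relation is an arbitrary stand-in for a superlinear pattern.

```python
import math


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / number of pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


def pearson(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic example: ys grows superlinearly (cubically) but monotonically with xs.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
ys = [x ** 3 for x in xs]

tau = kendall_tau(xs, ys)
r = pearson(xs, ys)
print(f"Kendall tau = {tau:.3f}, Pearson r = {r:.3f}")
```

Because every pair of points is ordered the same way in both variables, tau reaches its maximum even though the points do not lie on a straight line.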
    • Does the aggregation level matter? Changing the aggregation level to class level does not affect the correlation between various aggregation techniques as measured at package level. [Figure: three panels, each with vertical axis "Kendall correlation coefficient" from -1.0 to 1.0. Panel titles: Kendall: Gini - Theil (SLOC) (100%); Kendall: Theil - Atkinson (SLOC) (100%); Kendall: Theil - MLD (SLOC) (100%).]
    • Does system size matter? System size does influence the correlation between aggregation techniques, e.g., the correlation between $I_{\mathrm{Theil}}$ and $I_{\mathrm{Kolm}}$ increases with system size. [Figure: hibernate, Kendall(Theil(SLOC), Kolm(SLOC)) over 86 releases, from 0.8.1 to 3.6.0-beta4. Vertical axis: Cor. coeff. Theil(SLOC) − Kolm(SLOC), from 0.0 to 1.0.]
    • References
      • A. Serebrenik and M. G. J. van den Brand. Theil index for aggregation of software metrics values. In Int. Conf. on Software Maintenance, pages 1–9. IEEE, 2010.
      • B. Vasilescu. Analysis of advanced aggregation techniques for software metrics. Master's thesis, Eindhoven, The Netherlands, July 2011.
      • B. Vasilescu, A. Serebrenik, and M. G. J. van den Brand. By no means: A study on aggregating software metrics. In 2nd International Workshop on Emerging Trends in Software Metrics, Honolulu, Hawaii, USA, 2011.
      • B. Vasilescu, A. Serebrenik, and M. G. J. van den Brand. You can't control the unfamiliar: A study on the relations between aggregation techniques for software metrics. In Int. Conf. on Software Maintenance. IEEE, 2011.
    • Correlation: Linear correlation can be misleading. [Figure: four scatter plots with the same Pearson coefficient but different rank correlations. Panel 1: Pea: 0.816; Ken: 0.963; Spe: 0.990. Panel 2: Pea: 0.816; Ken: 0.636; Spe: 0.818. Panel 3: Pea: 0.816; Ken: 0.563; Spe: 0.690. Panel 4: Pea: 0.816; Ken: 0.426; Spe: 0.5.] [Vas11, VSvdB11a, SvdB10, VSvdB11b]
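The coefficients reported on this slide appear to match those of Anscombe's quartet, four small data sets with nearly identical Pearson correlation but very different shapes. That identification is an assumption, since the slide does not name its data. The sketch below computes the Pearson coefficient for the four classic Anscombe data sets; the labels I to IV follow Anscombe's numbering, not the panel order on the slide.

```python
import math


def pearson(xs, ys):
    """Pearson's linear correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Anscombe's quartet (assumed here to be the data behind the slide's figure).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

coefficients = {name: pearson(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in coefficients.items():
    print(f"{name}: Pearson = {r:.3f}")
```

All four data sets yield practically the same linear coefficient, which is why the slide reports rank correlations alongside it.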