Master Thesis presentation

I used these slides for my final presentation and Master Thesis defence.

  1. Analysis of Advanced Aggregation Techniques for Software Metrics. Final presentation. Bogdan Vasilescu, b.n.vasilescu@student.tue.nl. Supervisor: Dr. Alexander Serebrenik. July 20, 2011.
  2. Analysis of advanced aggregation techniques for software metrics: most metrics do not have a definition at system level. / department of mathematics and computer science
  3. “Designing a sound aggregation of software metrics is not obvious and it is still an open issue.” [CSS09]
     Goal: derive requirements for aggregation techniques for software metrics.
  4. Aggregation of software metrics:
     - many-to-one: same artifact, different metrics (example: Maintainability Index);
     - one-to-many: same metric, different artifacts (example: Weighted Methods per Class).
  5. Approach: study existing aggregation techniques through theoretical and empirical analysis, and derive requirements for one-to-many aggregation techniques for software metrics. Techniques studied:
     - traditional (e.g., mean, median)
     - inequality indices (e.g., Gini, Theil)
     - threshold-based (e.g., SIG, Squale)
  6. Inequality indices. In econometrics they are used to measure and explain the inequality of income or wealth. Software metrics and econometric variables have distributions with similar shapes.
     [Figure: histograms of SLOC per class in freecol-0.9.4 and of household income in Ilocos, Philippines (1998); y-axis: frequency.]
  7. Degree of concentration of functionality.
     [Figure: Lorenz curve for SLOC in Hibernate 3.6.0-beta4; x-axis: % classes, y-axis: % SLOC.]
     With A the area between the line of equality and the Lorenz curve and B the area under the Lorenz curve, A + B = 1/2, so $I_{Gini} = \frac{A}{A + B} = 2A$; $I_{Hoover}$ is the largest vertical distance between the line of equality and the Lorenz curve.
     Inequality can be measured between individuals (e.g., classes) or between groups (e.g., components). When computing the inequality within the entire population, it is often desirable to assess the contribution of the inequality between the groups.
     Decomposability: $I(X) = I_{within} + I_{between} = \sum_{j=1}^{m} \omega_j I(X_j) + I_{between}$
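The Lorenz-curve reading of $I_{Gini}$ and $I_{Hoover}$ can be sketched in Python as follows. This is a minimal sketch, not the thesis tooling: it assumes non-negative metric values with a positive total, and the function names are illustrative. The Gini index is computed as 2A from the trapezoid area under the piecewise-linear Lorenz curve, and the Hoover index as the largest vertical gap to the line of equality.

```python
from typing import List, Sequence, Tuple


def lorenz_curve(values: Sequence[float]) -> List[Tuple[float, float]]:
    """Lorenz curve points (share of classes, share of SLOC), classes sorted by size."""
    xs = sorted(values)
    total = sum(xs)  # assumed to be > 0
    n = len(xs)
    points = [(0.0, 0.0)]
    cumulative = 0.0
    for i, x in enumerate(xs, start=1):
        cumulative += x
        points.append((i / n, cumulative / total))
    return points


def gini(values: Sequence[float]) -> float:
    """Gini index as 2A, where A = 1/2 - B and B is the area under the Lorenz curve."""
    pts = lorenz_curve(values)
    # The trapezoid rule is exact for the piecewise-linear Lorenz curve.
    b = sum((p1 - p0) * (l0 + l1) / 2 for (p0, l0), (p1, l1) in zip(pts, pts[1:]))
    a = 0.5 - b  # area between the line of equality and the Lorenz curve
    return 2 * a


def hoover(values: Sequence[float]) -> float:
    """Hoover index as the largest vertical gap between the line of equality and the Lorenz curve."""
    return max(p - l for p, l in lorenz_curve(values))


sloc_per_class = [0, 0, 0, 1]  # hypothetical: one class holds all the code
print(gini(sloc_per_class), hoover(sloc_per_class))
```

For `[0, 0, 0, 1]`, a single class holding all the SLOC, both indices evaluate to 0.75.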
  8. Traceability via decomposability. Share of inequality explained by the partitioning $G = \{G_1, \ldots, G_m\}$: $R(G) = \frac{I_{between}(G)}{I(X)}$.
     Which individuals (classes in a package) contribute to 80% of the inequality of SLOC? Which class contributes the most to the inequality?
     Lemma. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a collection of values such that $x_1 \le x_i \le x_n$ for all $i$. Then either $x_1$ or $x_n$ contributes the most to the inequality measured using $I_{Theil}$, i.e., either the partitioning $(\{x_1\}, X \setminus \{x_1\})$ or the partitioning $(\{x_n\}, X \setminus \{x_n\})$ provides the best explanation for the inequality measured using $I_{Theil}$.
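A sketch of decomposability-based traceability for $I_{Theil}$, assuming strictly positive metric values (the logarithm is undefined at 0). The weights $\omega_j$ are taken as each group's share of the total metric value, the standard weighting for the Theil decomposition; the helper names are hypothetical. `most_contributing` tries every singleton partitioning $(\{x\}, X \setminus \{x\})$, so it can be used to check the lemma on small inputs.

```python
import math
from typing import List, Sequence, Tuple


def theil(values: Sequence[float]) -> float:
    """Theil index (1/n) * sum((x/mean) * ln(x/mean)); assumes every value is > 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


def theil_decomposition(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Split I_Theil(X) into (within, between) for a partitioning of X into groups."""
    total = sum(sum(g) for g in groups)
    n = sum(len(g) for g in groups)
    mean = total / n
    within = between = 0.0
    for g in groups:
        weight = sum(g) / total  # omega_j: the group's share of the total metric value
        group_mean = sum(g) / len(g)
        within += weight * theil(g)
        between += weight * math.log(group_mean / mean)
    return within, between


def explained_share(groups: Sequence[Sequence[float]]) -> float:
    """R(G) = I_between(G) / I(X); defined as 0 when X has no inequality at all."""
    within, between = theil_decomposition(groups)
    total = within + between
    return between / total if total > 0 else 0.0


def most_contributing(values: Sequence[float]) -> float:
    """Value x whose partitioning ({x}, X minus {x}) gives the largest R(G)."""
    xs: List[float] = list(values)
    best = max(range(len(xs)),
               key=lambda i: explained_share([[xs[i]], xs[:i] + xs[i + 1:]]))
    return xs[best]
```

For the values 1, 2, 3, 4 it returns 1: isolating the smallest value explains a larger share of the Theil inequality than isolating the largest one.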
  9. Other properties of inequality indices: symmetry. Inequality stays the same for any permutation of the population.
  10. Other properties of inequality indices: population principle. Inequality does not change if the population is replicated any number of times.
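Symmetry and the population principle can be checked numerically. The sketch below assumes the Gini index in its mean-absolute-difference form, $\sum_i \sum_j |x_i - x_j| / (2 n^2 \bar{x})$, and hypothetical SLOC values; both checks should agree up to floating-point rounding.

```python
import random
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini index: sum of |x_i - x_j| over all ordered pairs, divided by 2 * n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - y) for x in values for y in values) / (2 * n * n * mean)


sloc = [12, 40, 7, 230, 55, 18]               # hypothetical SLOC per class
permuted = random.sample(sloc, k=len(sloc))   # symmetry: reorder the classes
replicated = sloc * 3                         # population principle: three copies
print(gini(sloc), gini(permuted), gini(replicated))
```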
  11. Other properties of inequality indices: transfers principle. A transfer from a rich man to a poor man (without reversing their positions) should decrease inequality. [Figure: example with the values 20, 36, 45 and 30, 36.]
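A worked check of the transfers principle, assuming $I_{Theil}$ as the index and a hypothetical transfer of 5 from the largest to the smallest of the values 20, 36, 45. The amount is chosen for illustration only, and `transfer` is a hypothetical helper that rejects transfers which would reverse the two positions.

```python
import math
from typing import List, Sequence


def theil(values: Sequence[float]) -> float:
    """Theil index; assumes every value is > 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


def transfer(values: Sequence[float], rich: int, poor: int, amount: float) -> List[float]:
    """Move `amount` from values[rich] to values[poor] without reversing their order."""
    if values[rich] - amount < values[poor] + amount:
        raise ValueError("transfer would reverse the positions of the two values")
    result = list(values)
    result[rich] -= amount
    result[poor] += amount
    return result


before = [20.0, 36.0, 45.0]
after = transfer(before, rich=2, poor=0, amount=5.0)  # 45 -> 40 and 20 -> 25
print(after, theil(before) > theil(after))  # [25.0, 36.0, 40.0] True
```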
  12. Other properties of inequality indices: scale invariance. Inequality does not change if all values are multiplied by the same constant.
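Scale invariance can be illustrated the same way. In this sketch (hypothetical values, $I_{Theil}$ assumed), multiplying every value by 3 leaves the index unchanged, whereas adding the same constant to every value does not: scale invariance says nothing about translations.

```python
import math
from typing import Sequence


def theil(values: Sequence[float]) -> float:
    """Theil index; assumes every value is > 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


sloc = [12.0, 40.0, 7.0, 230.0, 55.0]
scaled = [3 * x for x in sloc]     # every value multiplied by the same constant
shifted = [x + 100 for x in sloc]  # every value increased by the same constant
print(theil(sloc), theil(scaled), theil(shifted))
```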
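  13. Summary. Table: properties of $I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Hoover}$, $I^{\alpha}_{Atkinson}$ and $I^{\beta}_{Kolm}$ with respect to symmetry (Sym.), invariance (Inv.), decomposability (Dec.), population principle (Pop.) and transfers principle (Tra.). In the Inv. column every index is marked × (invariant under multiplication) except $I^{\beta}_{Kolm}$, which is marked + (invariant under addition).
      Problems include:
      - the domain is not always $\mathbb{R}^n$;
      - there is no distinction between all values equal but low, and all values equal but high.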
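The + entry and the second problem can both be seen with the Kolm index. The sketch assumes the common definition $I^{\beta}_{Kolm}(x) = \frac{1}{\beta} \ln\left(\frac{1}{n} \sum_i e^{\beta(\bar{x} - x_i)}\right)$ with $\beta > 0$ (1 by default here); large metric values combined with a large $\beta$ may overflow `math.exp`. Adding a constant leaves the index unchanged, doubling the values does not, and two populations whose values are all equal both score 0, whether those values are low or high.

```python
import math
from typing import Sequence


def kolm(values: Sequence[float], beta: float = 1.0) -> float:
    """Kolm index (1/beta) * ln(mean of exp(beta * (mean - x))), for beta > 0."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum(math.exp(beta * (mean - x)) for x in values) / n) / beta


marks = [1.0, 2.0, 6.0]
print(kolm(marks), kolm([x + 10 for x in marks]), kolm([2 * x for x in marks]))
print(kolm([1.0, 1.0, 1.0]), kolm([3.0, 3.0, 3.0]))  # 0.0 0.0
```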
  14. Threshold-based aggregation techniques. Two types:
      - hard thresholds: improvements in quality are not reflected as long as the metric values stay within certain boundaries (e.g., SIG);
      - soft thresholds: do not exhibit such staircase effects (e.g., Squale).
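To make the staircase effect concrete, here is a sketch with two hypothetical mark functions for SLOC per method. The thresholds (30, 10 and 100 SLOC) are invented for illustration and are not the SIG or Squale calibrations.

```python
def hard_mark(sloc: float) -> float:
    """Hypothetical hard threshold: full mark up to 30 SLOC per method, zero above it."""
    return 3.0 if sloc <= 30 else 0.0


def soft_mark(sloc: float) -> float:
    """Hypothetical soft threshold: mark falls linearly from 3 at 10 SLOC to 0 at 100 SLOC."""
    return max(0.0, min(3.0, 3.0 * (100 - sloc) / 90))


# Refactoring a method from 80 to 50 SLOC: invisible to the hard threshold, visible to the soft one.
for sloc in (80, 50):
    print(sloc, hard_mark(sloc), round(soft_mark(sloc), 2))
```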
  15. The Squale Quality Model: metrics → individual marks (IMs) in [0, 3] → global mark in [0, 3].
      [Figure: individual mark (IM, from 0.0 to 3.0) as a function of SLOC per method (from 0 to 170).]
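The step from individual marks to the global mark can be sketched as below, assuming the Squale formulation $GM = -\log_{\lambda}\left(\frac{1}{n}\sum_i \lambda^{-IM_i}\right)$, which is consistent with the Kolm lemma stated for Squale aggregation. The value $\lambda = 9$ is an illustrative default, not necessarily the weighting used in the thesis.

```python
import math
from typing import Sequence


def squale_global_mark(marks: Sequence[float], lam: float = 9.0) -> float:
    """Global mark in [0, 3] from individual marks in [0, 3].

    Each mark m is mapped to lam ** (-m), the results are averaged, and the average
    is mapped back with -log_lam; the larger lam, the more low marks dominate.
    """
    average = sum(lam ** (-m) for m in marks) / len(marks)
    return -math.log(average, lam)


individual_marks = [3.0, 3.0, 3.0, 0.0]  # hypothetical: one method is very poor
print(squale_global_mark(individual_marks), sum(individual_marks) / len(individual_marks))
```

With three perfect marks and one mark of 0, the arithmetic mean is 2.25, while the global mark drops below 1, so a single very poor method remains clearly visible.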
  16. Properties of Squale aggregation: symmetry, population principle, and anti-transfers principle. [Figure: example with the values 20, 36, 45 and 30, 36.]
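A small numerical check of these properties, assuming the Squale-style aggregate $-\log_{\lambda}\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right)$ with $\lambda = 9$, and reading the anti-transfers principle as: a transfer from a higher mark to a lower one raises the aggregate, so it moves in the opposite direction to an inequality index such as $I_{Theil}$. That reading is an assumption; the marks are hypothetical.

```python
import math
from typing import Sequence


def squale(marks: Sequence[float], lam: float = 9.0) -> float:
    """Squale-style global mark: -log_lam(mean(lam ** -m))."""
    return -math.log(sum(lam ** (-m) for m in marks) / len(marks), lam)


def theil(values: Sequence[float]) -> float:
    """Theil index; assumes every value is > 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


before = [1.0, 2.0, 3.0]
after = [1.5, 2.0, 2.5]  # 0.5 moved from the highest mark to the lowest one
print("Theil:", theil(before), "->", theil(after))    # decreases
print("Squale:", squale(before), "->", squale(after))  # increases
```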
  17. Properties of Squale aggregation.
      Lemma. $I^{\log \lambda}_{Kolm}(x_1, \ldots, x_n) + I^{\lambda}_{Squale}(x_1, \ldots, x_n) = \bar{x}$
      Lemma. For all $c \in \mathbb{R}$ it holds that $I^{\lambda}_{Squale}$ is “unit translatable”, i.e., $I^{\lambda}_{Squale}(x_1 + c, \ldots, x_n + c) = I^{\lambda}_{Squale}(x_1, \ldots, x_n) + c$.
      Inequality indices are invariant with respect to either multiplication or addition.
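Both lemmas can be verified numerically. The sketch assumes the definitions $I^{\beta}_{Kolm}(x) = \frac{1}{\beta}\ln\left(\frac{1}{n}\sum_i e^{\beta(\bar{x}-x_i)}\right)$ and $I^{\lambda}_{Squale}(x) = -\log_{\lambda}\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right)$ with $\lambda > 1$. Under them, setting $\beta = \ln\lambda$ turns $e^{\beta(\bar{x}-x_i)}$ into $\lambda^{\bar{x}-x_i}$, so the Kolm term equals $\bar{x} + \log_{\lambda}\left(\frac{1}{n}\sum_i \lambda^{-x_i}\right)$ and the two terms sum to $\bar{x}$. The marks are hypothetical.

```python
import math
from typing import Sequence


def kolm(values: Sequence[float], beta: float) -> float:
    """Kolm index (1/beta) * ln(mean of exp(beta * (mean - x))), for beta > 0."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum(math.exp(beta * (mean - x)) for x in values) / n) / beta


def squale(values: Sequence[float], lam: float) -> float:
    """Squale-style aggregate -log_lam(mean(lam ** -x)), for lam > 1."""
    return -math.log(sum(lam ** (-x) for x in values) / len(values), lam)


marks = [0.5, 1.2, 2.9, 3.0, 0.0]
lam = 9.0
mean = sum(marks) / len(marks)

# First lemma: Kolm with beta = ln(lam) plus Squale with lam equals the arithmetic mean.
print(kolm(marks, math.log(lam)) + squale(marks, lam), mean)

# Second lemma (unit translatability): adding c to every value adds c to the aggregate.
c = 0.7
print(squale([x + c for x in marks], lam), squale(marks, lam) + c)
```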
  18. Summary. We distill one requirement: highlighting undesirable values in the aggregated result. However, problems include:
      - thresholds should be derived and validated;
      - a high rating is not necessarily an indication of good software engineering practices;
      - threshold-based aggregation is not decomposable.
  19. Approach: study existing aggregation techniques through theoretical and empirical analysis, and derive requirements for one-to-many aggregation techniques for software metrics. Techniques studied:
      - traditional (e.g., mean, median)
      - inequality indices (e.g., Gini, Theil)
      - threshold-based (e.g., SIG, Squale)
  20. Empirical evaluation
  21. Pilot study. Aggregate SLOC from class to package level, and study the statistical correlation (a) between aggregation techniques and the number of defects per package, and (b) between pairs of aggregation techniques. Case studies: ArgoUML, Adempiere, Mogwai.
      - Does the aggregation technique influence the correlation with bugs? The correlation between SLOC and defects is not strong, and is influenced by the aggregation technique.
      - Which aggregation techniques convey the same information? $I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Hoover}$ and $I_{Atkinson}$ convey the same information.
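The pilot-study pipeline can be sketched as: aggregate class-level SLOC per package with several techniques, then rank-correlate each aggregate with the defect count per package. All package names, SLOC values and defect counts below are hypothetical, and Kendall's $\tau_b$ is implemented naively to keep the block dependency-free (`scipy.stats.kendalltau` computes tau-b by default).

```python
import math
from typing import Dict, List, Sequence


def kendall_tau_b(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-b rank correlation, computed naively over all pairs (O(n^2))."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counted in neither term
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denominator = math.sqrt((concordant + discordant + ties_x_only)
                            * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denominator if denominator else float("nan")


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def gini(values: Sequence[float]) -> float:
    """Gini index via the mean absolute difference."""
    n = len(values)
    return sum(abs(x - y) for x in values for y in values) / (2 * n * n * mean(values))


# Hypothetical data: SLOC of each class grouped by package, and defects per package.
class_sloc: Dict[str, List[int]] = {
    "pkg.ui": [120, 45, 300, 80],
    "pkg.model": [60, 65, 70],
    "pkg.io": [15, 900, 40, 22, 35],
    "pkg.util": [10, 12, 8],
}
defects = {"pkg.ui": 7, "pkg.model": 2, "pkg.io": 9, "pkg.util": 1}

packages = sorted(class_sloc)
bugs = [defects[p] for p in packages]
for name, aggregate in (("mean", mean), ("gini", gini)):
    package_values = [aggregate(class_sloc[p]) for p in packages]
    print(name, kendall_tau_b(package_values, bugs))
```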
  22. Threats to validity, pilot study vs. subsequent studies:
      - Metric: SLOC (pilot); SLOC, LOC, NOS, NOSt, DIT, NOC, PBS, PLwC (subsequent).
      - System: ArgoUML, Adempiere, Mogwai (pilot); the Qualitas Corpus, 106 Java open-source systems, 430K files, 57 MSLOC (subsequent).
      - Version: single (pilot); 414 versions of the 13 out of 106 systems with more than 10 versions (subsequent).
      - Technique: traditional, inequality indices (pilot); traditional, inequality indices, threshold-based (subsequent).
      - Aggregation level: class–package (pilot); class–package, method–class (subsequent).
  23. Results (1): $I_{Gini}$, $I_{Theil}$, $I_{MLD}$, $I_{Atkinson}$ and $I_{Hoover}$ always convey the same information.
      [Figure: Kendall correlation coefficients (from −1.0 to 1.0) for each pair of these indices; SLOC panel with per-pair percentages between 89% and 92%, DIT panel with per-pair percentages between 85% and 89%.]
  24. Results (2): $I_{Kolm}$ shows high correlation with the mean for size metrics.
      [Figure: Kendall correlation coefficients between mean and Kolm for SLOC, DIT and PLwC.]
  25. Results (3): superlinear (e.g., $I_{Theil}$–$I_{Gini}$) and chaotic (e.g., $I_{Theil}$–$I_{Kolm}$) patterns can be observed in the scatter plots.
      [Figure: scatter plots for compiere: Theil(SLOC) vs. Gini(SLOC), Kendall 0.94, p-value 0.00; Theil(SLOC) vs. Kolm(SLOC), Kendall 0.25, p-value 0.01.]
  26. Results (4): changing the aggregation level to class level (aggregating from method to class) does not affect the correlation between the various aggregation techniques as measured at package level.
      [Figure: Kendall correlation coefficients for Gini–Theil, Theil–Atkinson and Theil–MLD (SLOC), each at 100%.]
  27. Results (5): system size does influence the correlation between aggregation techniques; e.g., the correlation between $I_{Theil}$ and $I_{Kolm}$ increases with system size.
      [Figure: hibernate, Kendall(Theil(SLOC), Kolm(SLOC)) correlation coefficient (from 0.0 to 1.0) for each of 86 releases, from 0.8.1 to 3.6.0-beta4.]
  28. Results (6): SIG and Squale correlate positively with each other and negatively with all other aggregation techniques.
      [Figure: Kendall correlation coefficients for Squale(3)–SIGd, Gini–Squale(3) and Theil–Squale(3) (SLOC), each at 95%.]
  29. Results (7): inequality indices are less appropriate for highlighting undesirable values unless assumptions about their number can be made.
      [Figure: average Squale (weight = 3), Theil and Kolm aggregates as a function of the percentage of imperfect individual marks (0–100%), one curve per range of the mean: (0, 0.1), [0.1, 0.5), [0.5, 1), [1, 2), [2, 3).]
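The observation can be reproduced in miniature. The sketch assumes 100 individual marks, each either perfect (3.0) or imperfect with a hypothetical value of 1.0, together with $I_{Theil}$ and a Squale-style aggregate with $\lambda = 9$; it is not the thesis's experimental setup. The Theil index is 0 both when no mark and when every mark is imperfect, so its value alone cannot tell these situations apart, while the Squale-style aggregate falls monotonically as imperfect marks are added.

```python
import math
from typing import List, Sequence


def theil(values: Sequence[float]) -> float:
    """Theil index; assumes every value is > 0."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


def squale(values: Sequence[float], lam: float = 9.0) -> float:
    """Squale-style aggregate -log_lam(mean(lam ** -x))."""
    return -math.log(sum(lam ** (-x) for x in values) / len(values), lam)


def marks_with_imperfect(percent: int, n: int = 100, imperfect: float = 1.0) -> List[float]:
    """n individual marks: `percent`% equal to `imperfect`, the rest perfect (3.0)."""
    k = n * percent // 100
    return [imperfect] * k + [3.0] * (n - k)


for percent in (0, 10, 50, 90, 100):
    marks = marks_with_imperfect(percent)
    print(f"{percent:3d}%  Theil={theil(marks):.3f}  Squale={squale(marks):.3f}")
```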
  30. Summary. We distill:
      - correlation with Squale or SIG for aggregation techniques that should satisfy the requirement of highlighting problems;
      - correlation with $I_{Theil}$, $I_{MLD}$ or $I_{Atkinson}$ for aggregation techniques that should satisfy the symmetry and decomposability requirements.
  31. Conclusions. Existing aggregation techniques were studied through:
      - theoretical analysis: mathematical properties of aggregation techniques; root-cause analysis;
      - empirical analysis: methodology and tooling; correlation studies with different objectives, metrics, systems, versions and aggregation levels;
      resulting in requirements for one-to-many aggregation techniques for software metrics.
      Further directions: social organization; determining an optimal partitioning of software projects; extensions to other software metrics and to non-software domains; applying the same techniques to the aggregation of combined metrics data; new one-to-many aggregation techniques for software metrics.
  32. Publications (Technische Universiteit Eindhoven):
      - Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. Comparative Study of Software Metrics’ Aggregation Techniques. BeNeVol 2010.
      - Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. By No Means: A Study on Aggregating Software Metrics. WETSoM 2011 (ICSE ’11, Waikiki, Honolulu, HI, USA).
      - Bogdan Vasilescu, Alexander Serebrenik, Mark van den Brand. You Can’t Control the Unfamiliar: A Study on the Relations Between Aggregation Techniques for Software Metrics. ICSM 2011.
      - Karine Mordal, Nicolas Anquetil, Jannik Laval, Alexander Serebrenik, Bogdan Vasilescu, Stéphane Ducasse. Practical Software Quality Metrics Aggregation. Journal of Software Maintenance and Evolution: Research and Practice (JSME).
