A. Gunes Koru and Donsong Zhang and Hongfang Liu


- 1. PROMISE at ICSE’07: Modeling the Effect of Size on Defect Proneness for Open-Source Software • A. Güneş Koru (1), Donsong Zhang (1), and Hongfang Liu (2) • (1) Department of Information Systems, UMBC, Baltimore, MD, USA • (2) Department of Bioinformatics, Biostatistics, and Biomathematics, Georgetown Medical Center, Georgetown University, Washington, D.C., USA • E-mails: gkoru@umbc.edu, zhangd@umbc.edu, hl224@georgetown.edu
- 2. UMBC • UMBC, University of Maryland, Baltimore County (http://umbc.edu/~gkoru) • Public research university with a focus on graduate education. • Theoretically, all campuses belong to the University of Maryland, but practically they look like different universities. • UMBC is located in Baltimore, in a small suburban neighborhood called Catonsville. • UMBC is not: • University of Maryland, College Park • University of Baltimore (Business school) • University of Maryland Baltimore (Medical School) • Hongfang Liu is with Georgetown University, located in Washington, D.C., and is interested in bioinformatics and health care.
- 3. Size--Defect Relationship • Size is perhaps the oldest measure. It is mostly measured by lines of code (sometimes function points). • Several studies found size to be associated with defect count. Earliest: a linear model in [Akiyama 71]. • Many other measures (e.g. cyclomatic complexity [McCabe 76], software science measures [Halstead 77]) are also correlated with size. There is some consensus that these are also size measures [Fenton and Pfleeger 96]. “Maybe size does not explain everything, but it explains a lot.” Bojan Cukic, PROMISE 2007 • The functional form of this relationship is still not well understood. • Commonly, practitioners assume a linear relationship [El Emam 05]. • The only general conclusion is that there is a continuously increasing relationship between the two [Fenton and Ohlsson 00, El Emam et al. 01].
- 4. Size--Defect Relationship: Alternative Forms [Figure: three panels (a), (b), and (c), each plotting defects against size] “Things are linear is open to questions” Tim Menzies, PROMISE 2007 • Implications: • (a) Linear: Smaller and larger modules are proportionally equally problematic • (b) Quadratic: Larger modules are proportionally more problematic • (c) Logarithmic: Smaller modules are proportionally more problematic • Theoretical and Practical Importance: • Decomposition • Focused quality assurance • Functional Enhancements
- 5. Why the relationship is still unclear... • Many earlier studies did not fully explore alternative functional forms or test whether the deviation from linearity was significant. • Linear models [Akiyama 71] or correlations [Andersson and Runeson 07] were found sufficient. • One study stated that linear models could be good as first approximations and that there was better tool support for them [Shen 85]. • The number of data points was very limited in the earlier studies (e.g. Akiyama 71). • Deriving models analytically and then fitting data to validate those models [Lipow 82]. • Accepting correlations as a sign of a linear relationship [Schneidewind and Hoffman 78]. Correlations do not imply proportionality. • Focus shift to defect density. Observations of an optimal module size that minimizes defect density. U-shaped curve (Goldilocks conjecture) [Withrow 90, Hatton 97, Hatton 98, etc.]. See [El Emam 02] for a detailed review. • This approach can mask the plain size--defect relationship and mislead us [El Emam 02, Fenton and Neil 00, and Rosenberg 97]. • The relationship gets more difficult to understand from multivariate and sophisticated machine learning models (e.g. from Neural Networks in [Khoshgoftaar 97]).
- 6. Conventional Approach to Investigate Size--Defect Relationship • All these studies share a common characteristic • A software system is measured at a snapshot time, then the obtained measurements are associated with the future defect count (note that this might be pre-release or post-release). For example: [Koru and Tian 03], [Khoshgoftaar 96] • Usually, measurement and analysis are performed at module level. • A common problem is the availability of data [Fenton and Ohlsson 02]. • Publicly available Open Source Software (OSS) repositories: Source code, change data, and defect data [Koru and Tian 04].
- 7. Challenges with Using Conventional Method in OSS Context • Evolutionary aspects of OSS. Continuous and concurrent functional enhancements, defect fixes, and all other changes (perfective, adaptive, etc.). Bazaar model rather than cathedral model [Raymond 99]. • OSS is usually developed by volunteers, with not too much planning and no requirements or design documents; source code is the main artifact [Mockus et al. 00, Mockus et al. 02]. • Quality assurance activities are not systematic in OSS (see [Zhao and Elbaum 03, Koru et al. 07]). • So far, research using the conventional approach has focused on relatively better planned, analyzed, designed, and tested closed source products. • Internal validity problems caused by the dynamic OSS context: • Deleted classes • Size changes • There might be closed source products developed in an evolutionary manner and vice versa. Such comparisons are outside of the scope here (see [Paulson et al. 04]).
- 8. In this study... “If developers play with a file, it can change its defect proneness” Elaine Weyuker, PROMISE 2007 • To gain a better understanding of the size--defect relationship, we used both • Novel approach that adopts Cox Proportional Hazards Modeling with Recurrent Events (Cox Modeling) [Cox 72]. • The data comes from a large-scale, long-lived OSS product, Mozilla (http://www.mozilla.org). • The evolutionary aspects of the Mozilla project were shown in other studies: • Gyimóthy et al. [04] found that the size of Mozilla increased significantly during successive releases. • Mockus et al. [02] found that there was no particular development process in Mozilla.
- 9. In the rest of this presentation... • Methodology • Demonstrating the evolutionary aspects of Mozilla • Cox Modeling • Data Collection • Modeling and Results • Future Work • Conclusion
- 10. Results: Demonstrating Evolutionary Aspects of Mozilla • For only Mozilla 1.0 classes • (a) Cumulative number of deleted classes • (b) Cumulative ... [Figure: panel (a) plots the cumulative number of deleted classes (y-axis, 0 to 1000) against years (x-axis, 2003 to 2006); panel (b) is cut off, with only a y-axis tick of 1500000 surviving]
- 11. Cox Modeling (A non-parametric approach) • The instantaneous relative risk (hazard) of defect fix, also called event, becomes the response variable. Note that it can recur. • A complete size history is obtained for each class by measuring size at each change, and corrective changes are marked. • Time of change is also noted. At each unique time, the hazard is calculated by dividing the events at that time by the classes at risk at that time. • Hazard function: $\lambda_i(t) = \lambda_0(t)\, e^{\beta x_i(t)}$ (1), where $\beta$ is the regression coefficient for $x_i(t)$ and $\lambda_0(t)$ is an unspecified non-negative function of time called the baseline hazard function. It is the instantaneous hazard of having an event without any covariate effect (i.e., when $\beta = 0$). • Relative hazard: $e^{\beta (x_j(t) - x_k(t))}$ • Note that the relative hazard depends only on the difference in covariate values, not on the baseline hazard. This is called the proportional hazards assumption and needs to be checked.
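
The relative hazard formula on this slide can be illustrated with a few lines of Python. This is only a sketch: it assumes a single covariate x = log(size) and borrows the coefficient 0.368 from the modeling results slide of this deck; the function name `relative_hazard` is hypothetical and is not taken from the authors' scripts.

```python
import math


def relative_hazard(beta: float, x_j: float, x_k: float) -> float:
    """Relative hazard exp(beta * (x_j - x_k)) between two covariate values.

    The baseline hazard cancels out, so only the covariate difference matters.
    """
    return math.exp(beta * (x_j - x_k))


# Coefficient for log(size) as reported on the modeling results slide.
BETA_LOG_SIZE = 0.368

# A one-unit increase in log(size) multiplies the hazard by exp(beta).
one_unit_increase = relative_hazard(BETA_LOG_SIZE, 1.0, 0.0)
print(round(one_unit_increase, 2))
```

Equal covariate values give a relative hazard of exactly 1, which is a quick sanity check on any implementation of the formula.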
- 12. Methodology • Relative log risk is denoted by f(size) (for median size, it is set to zero). • Examine the functional model with cubic spline functions using four knots: $f(\mathrm{size}) = \beta_0 + \beta_1\,\mathrm{size} + \beta_2(\mathrm{size}-k_1)_+^3 + \beta_3(\mathrm{size}-k_2)_+^3 + \beta_4(\mathrm{size}-k_3)_+^3 + \beta_5(\mathrm{size}-k_4)_+^3$ (1), where $(\mathrm{size}-k_n)_+ = \mathrm{size}-k_n$ if $\mathrm{size}-k_n > 0$, and $0$ otherwise (2). • Examined the alternative model visually • Tested whether the deviation from linearity was statistically significant: $H_0: \beta_2 = \beta_3 = \beta_4 = 0$
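
To make the spline formula on this slide concrete, here is a minimal Python sketch that evaluates f(size) for given coefficients and knots. The knot positions and coefficient values below are made up for illustration; the slide does not report them. When all the cubic coefficients are zero, the function reduces to a straight line, which is the situation the linearity test examines.

```python
from typing import Sequence


def truncated_cubic(size: float, knot: float) -> float:
    """(size - knot)_+^3: the cubed excess over the knot, or zero below it."""
    diff = size - knot
    return diff ** 3 if diff > 0 else 0.0


def spline_f(size: float, betas: Sequence[float], knots: Sequence[float]) -> float:
    """f(size) = b0 + b1*size + sum over n of b_(n+1) * (size - k_n)_+^3."""
    if len(betas) != len(knots) + 2:
        raise ValueError("need one intercept, one slope, and one coefficient per knot")
    value = betas[0] + betas[1] * size
    for coef, knot in zip(betas[2:], knots):
        value += coef * truncated_cubic(size, knot)
    return value


# Hypothetical knots (in LOC) and coefficients, chosen only for the demo.
knots = [100.0, 300.0, 600.0, 1000.0]
linear_only = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
with_curvature = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]

print(spline_f(200.0, linear_only, knots))     # straight line: equals size
print(spline_f(200.0, with_curvature, knots))  # adds (200 - 100)^3 past the first knot
```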
- 13. Methodology - Data Layout and Collection • We developed PERL scripts to extract source code, analyze CVS changes, and find whether a class is affected or not. • (A) What the data would look like if the conventional approach was used. • (B) Novel approach: Classes added to the system after the Mozilla 1.0 release date were measured until Feb 22, 2006. • Each change resulted in an observation • 15,545 observations • Events were identified by searching the CVS logs for the words ‘bug’, ‘defect’, and ‘fix’. When we sampled 100 logs randomly, we saw that this automated approach was correct for 98 of them.

  (A)

  | class name | size | defect count |
  |---|---|---|
  | A | 75 | 0 |
  | B | 250 | 2 |
  | C | 300 | 2 |
  | D | 600 | 2 |
  | E | 800 | 3 |
  | F | 220 | 0 |
  | G | 300 | 0 |
  | ... | ... | ... |

  (B)

  | class name | start | end | event | size | state |
  |---|---|---|---|---|---|
  | Y | 0 | 50 | 0 | 75 | 0 |
  | Y | 50 | 100 | 1 | 200 | 1 |
  | Y | 100 | 200 | 0 | 300 | 1 |
  | Z | 0 | 200 | 1 | 250 | 0 |
  | Z | 200 | 800 | 0 | 180 | 1 |
  | Z | 800 | 1400 | 1 | 400 | 1 |
  | Z | 1400 | 1800 | 0 | 300 | 1 |
  | ... | ... | ... | ... | ... | ... |
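
The keyword-based event identification described on this slide could look roughly like the following Python sketch. The authors used PERL scripts that are not shown in the deck, so this is a stand-in, not their code. The exact matching rule (case-insensitive, anchored at the start of a word) is an assumption, and the sample log messages are invented.

```python
import re

# Assumption: case-insensitive match at the start of a word, so "fixes" and
# "bugs" count while "prefix" does not. The slide only says that the logs
# were searched for the words 'bug', 'defect', and 'fix'.
CORRECTIVE_PATTERN = re.compile(r"\b(bug|defect|fix)", re.IGNORECASE)


def is_corrective(log_message: str) -> bool:
    """Return True when a change log message looks like a defect fix."""
    return CORRECTIVE_PATTERN.search(log_message) is not None


# Invented log messages, for illustration only.
sample_logs = [
    "Fix crash in layout code",
    "Add new preference panel",
    "Update prefix handling",
    "Resolved defects in the parser",
]

# 1 marks an event (corrective change), 0 marks any other change.
events = [int(is_corrective(message)) for message in sample_logs]
print(events)
```

A rule like this will misclassify some messages, which is consistent with the slide's manual check finding 98 correct classifications in a random sample of 100 logs.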
- 14. Results - Functional Form [Figure: instantaneous relative risk of defect fix (y-axis, −1.0 to 2.0) plotted against size in LOC (x-axis, 0 to 12000)] • When we use cubic spline functions, the logarithmic form is also obvious. The downward curve at the end covers less than 0.3% of the data points. • We can use log(size) directly in the Cox model.
- 15. Results -- Modeling results (from the manuscript submitted to TSE)

  | | coef | exp(coef) | se(coef) | robust se | z | p |
  |---|---|---|---|---|---|---|
  | log(size) | 0.368 | 1.44 | 0.00732 | 0.018 | 20.4 | 0 |

  Rsquare = 0.152 (max possible = 1) • Likelihood ratio test = 2565 on 1 df, p=0 • Wald test = 416 on 1 df, p=0 • Score (logrank) test = 2565 on 1 df, p=0 • Robust Score = 142, p=0

  Fig. 5. Modeling results using logarithmic transform of size
- 16. Outlier Analysis - Checking for overly influential data points [Figure: "Plotted Martingale Residuals"; influence (y-axis, −0.0025 to 0.0005) plotted against observation id (x-axis, 0 to 15000), with the outliers marked] • Outliers are still valid observations • Removing them only brings the unit effect of log size to 45% (small change) • Decided to keep them.
- 17. Test of Proportional Hazards • Commonly, interaction with time is tested • Example: A drug only effective in the first hour. • Note: This test can also become significant when a wrong functional form is used. • Result: p = 0.835, far from significant. • A smooth plot of Schoenfeld residuals shows an almost perfectly straight line. [Figure: Beta(t) for log(size) (y-axis, −20 to 20) plotted against Time (x-axis, 0 to 2000000)]
- 18. Model Fitness - Arjas plot • Arjas plot shows cumulative sum of expected versus cumulative sum of actual events. • Should follow the 45-degree line. • Ours follows the line closely. • Spearman correlation between actual vs. expected: 0.79 • The model passes all the tests [Figure: Cumulative Sum of Expected Events (y-axis, 0 to 8000) plotted against Cumulative Sum of Actual Events (x-axis, 0 to 8000)]
- 19. Interpretation of the results - bootstrapping • For Mozilla classes, one unit of increase in the natural log of size caused the rate of defect fix to increase by 44%. • We ran bootstrapping to derive an estimate of the 95% confidence interval. • Sampled 1,000 classes 1,000 times. • For each sample, a different Cox model was produced using all observations in the sample, and a point estimate was made for the log(size) effect. • The point estimate was 44% and the 95% confidence interval calculated via this procedure was [39%, 49%].
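
The resampling procedure on this slide can be sketched in Python as a class-level (cluster) bootstrap. Several details here are assumptions, since the slide does not state them: classes are drawn with replacement, the interval is taken from the percentiles of the resampled estimates, and a simple event rate stands in for the Cox model fit, which is too long to reproduce here. All names and the toy data are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# (event flag, size) for one change; this layout is illustrative only.
Observation = Tuple[int, float]


def event_rate(observations: List[Observation]) -> float:
    """Placeholder estimator standing in for a Cox model fit."""
    return sum(event for event, _ in observations) / len(observations)


def cluster_bootstrap_ci(
    data_by_class: Dict[str, List[Observation]],
    estimator: Callable[[List[Observation]], float],
    n_classes: int,
    n_resamples: int,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile interval from resampling whole classes with replacement."""
    rng = random.Random(seed)
    class_names = sorted(data_by_class)
    estimates = []
    for _ in range(n_resamples):
        sampled = rng.choices(class_names, k=n_classes)
        # Keep every observation of each sampled class together.
        pooled = [obs for name in sampled for obs in data_by_class[name]]
        estimates.append(estimator(pooled))
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_resamples - 1))]
    upper = estimates[int((1.0 - tail) * (n_resamples - 1))]
    return lower, upper


# Toy data: 20 hypothetical classes with three observations each.
toy = {
    f"class_{i}": [(1 if (i + j) % 4 == 0 else 0, 100.0 * (j + 1)) for j in range(3)]
    for i in range(20)
}
low, high = cluster_bootstrap_ci(toy, event_rate, n_classes=20, n_resamples=200)
print(low, high)
```

Resampling whole classes rather than individual observations keeps the repeated measurements of a class together, which matters because observations from the same class are not independent.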
- 20. Overall Results • Note that the results using both approaches showed that the functional form of the relationship was close to a logarithmic form. • Implies that smaller modules are proportionally more troublesome. • Note that they do not imply that smaller modules have fewer defects or a lower probability of having defects. • There is no threshold effect of size on defect proneness. The plot curves down for very large classes, but the confidence band gets larger too.
- 21. Implications • Practical: • A 1,000 LOC class, although 10 times bigger than a 100 LOC class, is estimated to be only 2.33 times more defect-prone (95% CI [2.13, 2.54]). • If one has resources to inspect 10,000 LOC, it is better to pick 100 classes of 100 LOC as opposed to picking 10 classes of 1,000 LOC. The first approach would be estimated to be 329% more effective (95% CI [293.70%, 369.48%]). • Theoretical: • Decomposition might have side effects • If interface defects are responsible, as suggested in [Basili and Perricone 84], the extent of decomposition needs to be reconsidered.
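
The two practical figures on this slide can be reproduced from the log(size) coefficient of 0.368 reported earlier in the deck. The sketch below assumes that "effectiveness" means the summed relative hazard of the inspected classes; the slide does not spell out the derivation, so this reading is an assumption that happens to match the reported numbers.

```python
# Coefficient for log(size) from the modeling results slide.
BETA = 0.368


def hazard_ratio(size_a: float, size_b: float, beta: float = BETA) -> float:
    """Relative hazard of a class of size_a versus one of size_b.

    With hazard proportional to exp(beta * log(size)), the ratio is
    (size_a / size_b) ** beta.
    """
    return (size_a / size_b) ** beta


# A 1,000 LOC class compared with a 100 LOC class.
ratio_1000_vs_100 = hazard_ratio(1000.0, 100.0)

# Two ways to spend a 10,000 LOC inspection budget, measured in units of
# the hazard of a single 100 LOC class (assumed effectiveness measure).
small_class_plan = 100 * 1.0               # 100 classes of 100 LOC
large_class_plan = 10 * ratio_1000_vs_100  # 10 classes of 1,000 LOC

percent_more_effective = (small_class_plan / large_class_plan - 1.0) * 100.0

print(round(ratio_1000_vs_100, 2))    # about 2.33, as on the slide
print(round(percent_more_effective))  # about 329, as on the slide
```

Applying the same arithmetic to the interval ends 2.13 and 2.54 gives roughly 369% and 294%, in line with the interval quoted on the slide.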
- 22. Related Work • Similar earlier observations were reported, however, by focusing on the size-defect density relationship [Hatton 97], [Hatton 98], and [Withrow 90]. • Such studies observed a U-shaped curve and identified an optimum size that minimized defect density. • Later, El Emam et al. [El Emam 02] reported that such an approach can mask the true relationship between size and defects and mislead us by showing some threshold effects. • Indeed, the same points were made earlier in [Fenton and Neil 99] and even earlier in [Rosenberg 98]. • In our study, we focused on the basic size--defect relationship.
- 23. Directions for Future Research • Replicated studies for validation • Studying modularity: Are interface defects the reason, as suggested earlier [Basili and Perricone 84]? What is the interplay between coupling and size? • Studying people aspects: Do experts develop larger modules? • Studying process aspects: Are larger modules inspected and tested better? Systematic or non-systematic reuse? Copy-paste into larger classes?
- 24. Conclusion • Our empirical results using a large-scale product that offered thousands of data points showed that: • The size--defect relationship took a logarithmic form • Defect-proneness increased as size increased • There is no threshold value; the relationship is continuously increasing • Smaller modules are proportionally more troublesome • Results can be immediately useful for the Mozilla project • Results also trigger many interesting research questions for the future.
