A. Gunes Koru and Donsong Zhang and Hongfang Liu

- 1. PROMISE at ICSE’07: Modeling the Effect of Size on Defect Proneness for Open-Source Software • A. Güneş Koru (1), Donsong Zhang (1), and Hongfang Liu (2) • (1) Department of Information Systems, UMBC, Baltimore, MD, USA • (2) Georgetown Medical Center, Department of Bioinformatics, Biostatistics, and Biomathematics, Georgetown University, Washington, D.C., USA • E-mails: gkoru@umbc.edu, zhangd@umbc.edu, hl224@georgetown.edu
- 2. UMBC • UMBC: University of Maryland, Baltimore County (http://umbc.edu/~gkoru) • A public research university with a focus on graduate education. • In theory, all campuses belong to the University of Maryland, but in practice they function as separate universities. • UMBC is located in Catonsville, a small suburban neighborhood in Baltimore. UMBC is not: • University of Maryland, College Park • University of Baltimore (business school) • University of Maryland, Baltimore (medical school) • Hongfang Liu is with Georgetown University in Washington, D.C., and is interested in bioinformatics and health care.
- 3. Size--Defect Relationship • Size is perhaps the oldest software measure, mostly measured in lines of code (sometimes in function points). • Several studies found size to be associated with defect count. The earliest was a linear model [Akiyama 71]. • Many other measures (e.g., cyclomatic complexity [McCabe 76], software science measures [Halstead 77]) are also correlated with size; there is some consensus that these are also size measures [Fenton and Pfleeger 96]. “Maybe size does not explain everything, but it explains a lot.” Bojan Cukic, PROMISE 2007 • The functional form of this relationship is still not well understood. • Practitioners commonly assume a linear relationship [El Emam 05]. • The only general conclusion is that the relationship is continuously increasing [Fenton and Ohlsson 00, El Emam et al. 01].
- 4. Size--Defect Relationship: Alternative Forms [Figure: defects vs. size in three panels, (a), (b), and (c)] • (a) Linear: smaller and larger modules are proportionally equally problematic. • (b) Quadratic: larger modules are proportionally more problematic. • (c) Logarithmic: smaller modules are proportionally more problematic. • Implications: “That things are linear is open to question.” Tim Menzies, PROMISE 2007 • Theoretical and practical importance for: • Decomposition • Focused quality assurance • Functional enhancements
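The three forms differ in how defect density (defects per LOC) changes with size. A minimal sketch, using hypothetical curves whose constants are arbitrary and not estimated from any study, compares the density of a 100-LOC module with that of a 10,000-LOC module under each form:

```python
import math

# Hypothetical expected-defect curves for the three forms on this slide.
# The constants are arbitrary illustrations, not estimates from any study.
FORMS = {
    "linear": lambda size: 0.01 * size,
    "quadratic": lambda size: 0.00001 * size ** 2,
    "logarithmic": lambda size: 2.0 * math.log(size),
}


def defect_density(form: str, size: float) -> float:
    """Expected defects per line of code for a module of `size` LOC."""
    return FORMS[form](size) / size


for name in FORMS:
    # Density of a 100-LOC module divided by density of a 10,000-LOC module:
    # 1 -> proportionally equal, < 1 -> large modules worse, > 1 -> small modules worse.
    ratio = defect_density(name, 100) / defect_density(name, 10_000)
    print(f"{name:12s} {ratio:.2f}")
```

The ratios come out as 1.00 (linear), 0.01 (quadratic), and 50.00 (logarithmic), matching the descriptions (a), (b), and (c).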
- 5. Why the relationship is still unclear... • Many earlier studies did not fully explore alternative functional forms or test whether the deviation from linearity was significant. • Linear models [Akiyama 71] or correlations [Andersson and Runeson 07] were found sufficient. • One study stated that linear models could be good first approximations and had better tool support [Shen 85]. • The number of data points was very limited in the earlier studies (e.g., [Akiyama 71]). • Some studies derived models analytically and then fitted data to validate them [Lipow 82]. • Some accepted correlation as a sign of a linear relationship [Schneidewind and Hoffman 78]. Correlation does not imply proportionality. • Focus shifted to defect density, with observations of an optimal module size that minimizes defect density: a U-shaped curve (the Goldilocks conjecture) [Withrow 90, Hatton 97, Hatton 98, etc.]. See [El Emam 02] for a detailed review. • This approach can mask the plain size--defect relationship and mislead us [El Emam 02, Fenton and Neil 00, Rosenberg 97]. • The relationship becomes harder to see in multivariate and sophisticated machine learning models (e.g., neural networks in [Khoshgoftaar 97]).
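To see why correlation does not imply proportionality, here is a small sketch on hypothetical, noise-free data in which defects grow logarithmically with size. The Pearson correlation between size and defects is still high, even though defect density differs by a large factor between the smallest and largest modules:

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, noise-free data: defects grow logarithmically with size.
sizes = list(range(10, 10_001, 10))
defects = [3.0 * math.log(s) for s in sizes]

r = pearson(sizes, defects)
# Defects per LOC in the smallest module divided by that in the largest.
density_ratio = (defects[0] / sizes[0]) / (defects[-1] / sizes[-1])
print(f"r = {r:.2f}, density ratio = {density_ratio:.0f}")
```

The correlation is well above 0.8, yet a 10-LOC module has 250 times the defect density of a 10,000-LOC module, so a linear reading of the correlation would be misleading.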
- 6. Conventional Approach to Investigate the Size--Defect Relationship • All these studies share a common characteristic: • A software system is measured at a snapshot time, and the measurements are then associated with the future defect count (pre-release or post-release), e.g., [Koru and Tian 03], [Khoshgoftaar 96]. • Measurement and analysis are usually performed at the module level. • A common problem is the availability of data [Fenton and Ohlsson 02]. • Publicly available Open Source Software (OSS) repositories provide source code, change data, and defect data [Koru and Tian 04].
- 7. Challenges with Using the Conventional Method in the OSS Context • Evolutionary aspects of OSS: continuous and concurrent functional enhancements, defect fixes, and all other changes (perfective, adaptive, etc.); a bazaar model rather than a cathedral model [Raymond 99]. • OSS is usually developed by volunteers, with little planning and no requirements or design documents; source code is the main artifact [Mockus et al. 00, Mockus et al. 02]. • Quality assurance activities are not systematic in OSS (see [Zhao and Elbaum 03, Koru et al. 07]). • So far, research using the conventional approach has focused on relatively better planned, analyzed, designed, and tested closed-source products. • Internal validity problems caused by the dynamic OSS context: • Deleted classes • Size changes • There might be closed-source products developed in an evolutionary manner, and vice versa. Such comparisons are outside the scope here (see [Paulson et al. 04]).
- 8. In this study... “If developers play with a file, it can change its defect proneness.” Elaine Weyuker, PROMISE 2007 • To gain a better understanding of the size--defect relationship, we used a novel approach that adopts Cox proportional hazards modeling with recurrent events (Cox modeling) [Cox 72]. • The data come from a large-scale, long-lived OSS product, Mozilla (http://www.mozilla.org). • The evolutionary aspects of the Mozilla project were shown in other studies: • Gyimóthy et al. [04] found that the size of Mozilla increased significantly over successive releases. • Mockus et al. [02] found that there was no particular development process in Mozilla.
- 9. In the rest of this presentation... •Methodology •Demonstrating the evolutionary aspects of Mozilla •Cox Modeling •Data Collection •Modeling and Results •Future Work •Conclusion
- 10. Results: Demonstrating Evolutionary Aspects of Mozilla • For Mozilla 1.0 only • (a) Cumulative number of deleted classes [Figure (a): cumulative number of deleted classes (0 to 1,000) by year, 2003 to 2006] • (b) Cumulative ... [Figure (b)]
- 11. Cox Modeling (A semi-parametric approach) • The response variable is the instantaneous relative risk (hazard) of a defect fix, also called an event. Note that events can recur. • A complete size history is obtained for each class by measuring its size at each change, and corrective changes are marked. • The time of each change is also recorded. At each unique event time, the hazard is estimated by dividing the number of events at that time by the number of classes at risk at that time. • Hazard function: $\lambda_i(t) = \lambda_0(t)\, e^{\beta x_i(t)}$ (1), where $\beta$ is the regression coefficient for $x_i(t)$ and $\lambda_0(t)$ is an unspecified non-negative function of time called the baseline hazard function. It is the instantaneous hazard of having an event without any covariate effect (i.e., when $\beta = 0$). • Relative hazard of classes $j$ and $k$: $e^{\beta (x_j(t) - x_k(t))}$ • Note that the log relative hazard is proportional to the difference in covariate values and does not otherwise depend on time. This is called the proportional hazards assumption and needs to be checked.
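The estimation described on this slide can be sketched in plain Python for a single time-varying covariate. This is a minimal illustration, not the authors' code: it assumes each observation is an interval `(start, stop, event, x)` (the interval layout of slide 13), computes the crude hazard (events divided by classes at risk) at each event time, and maximizes the Cox partial likelihood by Newton-Raphson with the Breslow approximation for ties. The toy rows are hypothetical, and the robust (sandwich) standard errors reported on the results slide are not computed here.

```python
import math

# Each row is one interval (start, stop] of a class's history:
# (start, stop, event, x), where event = 1 if the interval ends in a defect fix
# and x is the covariate (e.g. log(size)) during the interval.


def score_and_info(rows, beta):
    """Score and observed information of the Cox partial log-likelihood.
    Ties use the Breslow approximation (each event sees the full risk set)."""
    score = info = 0.0
    for _, stop, event, x in rows:
        if not event:
            continue
        s0 = s1 = s2 = 0.0
        for a, b, _, xj in rows:
            if a < stop <= b:  # row j is at risk at this event time
                w = math.exp(beta * xj)
                s0 += w
                s1 += w * xj
                s2 += w * xj * xj
        mean = s1 / s0
        score += x - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_cox(rows, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of beta for a single covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_info(rows, beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def crude_hazard(rows):
    """Events divided by rows at risk, at each unique event time."""
    times = sorted({stop for _, stop, event, _ in rows if event})
    return {
        t: sum(1 for _, stop, e, _ in rows if e and stop == t)
        / sum(1 for a, b, _, _ in rows if a < t <= b)
        for t in times
    }


# Hypothetical toy rows, not data from the study.
rows = [(0, 10, 1, 1.0), (0, 20, 1, 0.0), (0, 30, 0, 2.0)]
print(crude_hazard(rows))  # {10: 0.3333333333333333, 20: 0.5}
print(round(fit_cox(rows), 3))
```

Newton-Raphson is used because the partial log-likelihood is concave in $\beta$; if every event had the smallest (or largest) covariate in its risk set, the estimate would diverge, so real data should be checked for that case.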
- 12. Methodology • The relative log risk is denoted by f(size); it is set to zero at the median size. • We examined the functional form with cubic spline functions using four knots $k_1, \dots, k_4$: $f(\mathrm{size}) = \beta_0 + \beta_1\,\mathrm{size} + \beta_2 (\mathrm{size}-k_1)_+^3 + \beta_3 (\mathrm{size}-k_2)_+^3 + \beta_4 (\mathrm{size}-k_3)_+^3 + \beta_5 (\mathrm{size}-k_4)_+^3$ (1), where $(\mathrm{size}-k_n)_+ = \mathrm{size}-k_n$ if $\mathrm{size}-k_n > 0$, and $0$ otherwise (2). • We examined the alternative model visually. • We tested whether the deviation from linearity was statistically significant, i.e., whether all nonlinear spline terms vanish: $H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$.
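The spline formula on this slide can be evaluated with a truncated power basis. A minimal sketch under stated assumptions: the knot values and quantile-based placement below are hypothetical (the slide does not give the knots), and in a Cox model the intercept $\beta_0$ is absorbed into the baseline hazard, so only the `size` and `(size - k)_+^3` terms would enter as covariates.

```python
def plus(u: float) -> float:
    """Truncation (u)_+ : u if u > 0, else 0."""
    return u if u > 0 else 0.0


def spline_terms(size: float, knots: list[float]) -> list[float]:
    """Covariates multiplied by beta_1..beta_5: size, then (size - k)_+^3 per knot."""
    return [size] + [plus(size - k) ** 3 for k in knots]


def f(size: float, betas: list[float], knots: list[float]) -> float:
    """beta_0 + beta_1*size + sum over knots of beta_{n+1} * (size - k_n)_+^3."""
    b0, *rest = betas
    return b0 + sum(b * z for b, z in zip(rest, spline_terms(size, knots)))


def quantile_knots(sizes: list[float], probs=(0.05, 0.35, 0.65, 0.95)) -> list[float]:
    """Hypothetical knot placement at sample quantiles (nearest rank)."""
    s = sorted(sizes)
    return [s[int(p * (len(s) - 1))] for p in probs]


knots = [100, 200, 300, 400]  # hypothetical knots, in LOC
# Under H0 all cubic coefficients are zero and f reduces to a straight line.
print(f(500, [0, 2, 0, 0, 0, 0], knots))
```

With the cubic coefficients set to zero the call returns 1000, i.e. exactly $\beta_1 \cdot \mathrm{size}$; a significant test of $H_0$ means the data need at least one of the truncated cubic terms.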
- 13. Methodology - Data Layout and Collection • We developed Perl scripts to extract source code, analyze CVS changes, and determine whether a class was affected or not. • (a) What the data would look like if the conventional approach were used:
    (A)
    class name   size   defect count
    A              75              0
    B             250              2
    C             300              2
    D             600              2
    E             800              3
    F             220              0
    G             300              0
    ...           ...            ...
• (b) Novel approach: classes added to the system after the Mozilla 1.0 release date were measured until Feb 22, 2006. • Each change resulted in an observation, giving 15,545 observations:
    (B)
    class name   start    end   event   size   state
    Y                0     50       0     75       0
    Y               50    100       1    200       1
    Y              100    200       0    300       1
    Z                0    200       1    250       0
    Z              200    800       0    180       1
    Z              800   1400       1    400       1
    Z             1400   1800       0    300       1
    ...            ...    ...     ...    ...     ...
• Events were identified by searching the CVS logs for the words ‘bug’, ‘defect’, and ‘fix’. When we randomly sampled 100 logs, this automated approach was correct for 98 of them.
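The keyword search and the interval layout in table (B) can be sketched as follows. This is not the authors' Perl scripts; it assumes (hypothetically) that each class history is a time-ordered list of `(time, size_after_change, log_message)` tuples, that an interval carries the size set by the change that opens it, and that its event flag comes from the log message of the change that closes it. The keyword regex and the commit messages are illustrative, and the `state` column is omitted because its definition is not given here.

```python
import re

# Hypothetical keyword matcher: a word starting with bug, defect, or fix.
FIX_KEYWORDS = re.compile(r"\b(?:bug|defect|fix)", re.IGNORECASE)


def is_corrective(log_message: str) -> bool:
    """True if the CVS log message looks like a defect fix."""
    return FIX_KEYWORDS.search(log_message) is not None


def to_intervals(name, changes, end_time=None):
    """Turn (time, size, message) changes into (name, start, end, event, size) rows."""
    rows = []
    for (t0, size0, _), (t1, _, msg1) in zip(changes, changes[1:]):
        rows.append((name, t0, t1, int(is_corrective(msg1)), size0))
    last_t, last_size, _ = changes[-1]
    if end_time is not None and end_time > last_t:
        # Censored tail: observed until end_time with no further event.
        rows.append((name, last_t, end_time, 0, last_size))
    return rows


# Hypothetical history chosen to reproduce class Z in table (B).
changes_z = [
    (0, 250, "Initial import"),
    (200, 180, "Fix crash when closing tab"),
    (800, 400, "Add preference panel"),
    (1400, 300, "Bug 12345: add null check"),
    (1800, 300, "Code cleanup"),
]
for row in to_intervals("Z", changes_z):
    print(row)
```

A plain keyword search like this can produce false positives (e.g. a message that mentions a bug tracker without fixing anything), which is why manually checking a random sample, as done on this slide, is worthwhile.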
- 14. Results - Functional Form [Figure: instantaneous relative risk of defect fix (y-axis, −1.0 to 2.0) vs. size in LOC (x-axis, 0 to 12,000)] • When we use cubic spline functions, the logarithmic form is clearly visible. The downturn at the right end of the curve is supported by less than 0.3% of the data points. Therefore, we can use log(size) directly in the Cox model.
- 15. Results -- Modeling Results (MANUSCRIPT SUBMITTED TO TSE)
                coef    exp(coef)   se(coef)   robust se   z      p
    log(size)   0.368   1.44        0.00732    0.018       20.4   0
    Rsquare = 0.152 (max possible = 1)
    Likelihood ratio test = 2565 on 1 df, p = 0
    Wald test = 416 on 1 df, p = 0
    Score (logrank) test = 2565 on 1 df, p = 0; Robust score = 142, p = 0
    Fig. 5. Modeling results using logarithmic transform of size
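Because the model has the form $\lambda_i(t) = \lambda_0(t)\, e^{\beta \log(\mathrm{size}_i)}$, the relative hazard of two classes is $(\mathrm{size}_a/\mathrm{size}_b)^\beta$. A small worked sketch using the reported coefficient 0.368, assuming `log` is the natural logarithm (the slide does not state the base):

```python
import math

# Coefficient of log(size) from Fig. 5; natural log is an assumption.
BETA_LOG_SIZE = 0.368


def hazard_ratio(size_a: float, size_b: float, beta: float = BETA_LOG_SIZE) -> float:
    """exp(beta * (log a - log b)): relative hazard of defect fix, class a vs. class b."""
    return math.exp(beta * (math.log(size_a) - math.log(size_b)))


print(f"per unit of log(size): {math.exp(BETA_LOG_SIZE):.2f}")   # 1.44
print(f"doubling the size:     {hazard_ratio(2000, 1000):.2f}")  # 1.29
print(f"10x the size:          {hazard_ratio(10000, 1000):.2f}")  # 2.33
# Per-LOC hazard after doubling, relative to the smaller class.
print(f"per-LOC ratio:         {hazard_ratio(2000, 1000) / 2:.2f}")  # 0.65
```

Under this model, doubling a class's size raises its defect-fix hazard by only about 29%, so the hazard per line of code drops to roughly 0.65 of the smaller class's: smaller classes are proportionally more defect-prone, consistent with the logarithmic form in slide 4 (c).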
- 16. Outlier Analysis - Checking for Overly Influential Data Points [Figure: plotted martingale residuals; y-axis: influence; x-axis: observation id, 0 to 15,000; outliers labeled] • Outliers are still valid observations. • Removing them only brings the unit effect of log size to 45% (a small change). • We decided to keep them.
- 17. Test of Proportional Hazards [Figure: smoothed Beta(t) for log(size) (−20 to 20) vs. time (0 to 2,000,000)] • Commonly, an interaction with time is tested. • Example: a drug that is effective only in the first hour. • Note: this test can also become significant when a wrong functional form is used. • Result: p = 0.835, far from significant. • A smoothed plot of the Schoenfeld residuals shows an almost perfectly straight line.
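The residuals behind this check can be sketched in plain Python. For each event, the Schoenfeld residual is the event's covariate minus the risk-weighted mean covariate of the classes at risk at that time; under proportional hazards these residuals show no trend over time. This is a simplified illustration only: it uses unscaled residuals and a plain Pearson correlation with event time, not the exact scaled-residual test that produced p = 0.835. The intervals `(start, stop, event, x)` and the coefficient are hypothetical.

```python
import math


def schoenfeld_residuals(rows, beta):
    """(event time, x_i minus weighted risk-set mean of x) for each event.
    rows: (start, stop, event, x) intervals; weights are exp(beta * x)."""
    out = []
    for _, stop, event, x in rows:
        if not event:
            continue
        s0 = s1 = 0.0
        for a, b, _, xj in rows:
            if a < stop <= b:  # row j is at risk at this event time
                w = math.exp(beta * xj)
                s0 += w
                s1 += w * xj
        out.append((stop, x - s1 / s0))
    return out


def pearson(xs, ys):
    """Pearson correlation; requires variation in both samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy intervals and coefficient, not the study's data.
rows = [(0, 10, 1, 1.0), (0, 20, 1, 0.0), (0, 30, 0, 2.0), (0, 40, 1, 2.5), (10, 50, 1, 1.5)]
res = schoenfeld_residuals(rows, beta=0.3)
times, resid = zip(*res)
print(res)
print(f"corr(time, residual) = {pearson(times, resid):.2f}")
```

A correlation near zero (a flat smoothed line) is what the proportional hazards assumption predicts; a clear trend would suggest that the effect of log(size) changes over time.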
