Slideshare.net (beta)

 

Modeling the Effect of Size on Defect Proneness for Open-Source Software

From promise07, 2 years ago

A. Güneş Koru, Dongsong Zhang, and Hongfang Liu


Slideshow transcript

Slide 1: PROMISE at ICSE’07
Modeling the Effect of Size on Defect Proneness for Open-Source Software
A. Güneş Koru (1), Dongsong Zhang (1), and Hongfang Liu (2)
(1) Department of Information Systems, UMBC, Baltimore, MD, USA
(2) Georgetown Medical Center, Department of Bioinformatics, Biostatistics, and Biomathematics, Georgetown University, Washington, D.C., USA
E-mails: gkoru@umbc.edu, zhangd@umbc.edu, hl224@georgetown.edu

Slide 2: UMBC
• UMBC: University of Maryland, Baltimore County (http://umbc.edu/~gkoru)
• Public research university with a focus on graduate education.
• In theory, all campuses belong to the University of Maryland, but in practice they operate like different universities.
• UMBC is located in Catonsville, a small suburban neighborhood of Baltimore.
• UMBC is not:
  • University of Maryland, College Park
  • University of Baltimore (Business School)
  • University of Maryland, Baltimore (Medical School)
• Hongfang Liu is with Georgetown University in Washington, D.C., and is interested in bioinformatics and health care.

Slide 3: Size--Defect Relationship
• Size is perhaps the oldest software measure. It is mostly measured in lines of code (sometimes in function points).
• Several studies found size to be associated with defect count. The earliest is a linear model in [Akiyama 71].
• Many other measures (e.g., cyclomatic complexity [McCabe 76], software science measures [Halstead 77]) are also correlated with size. There is some consensus that these are also size measures [Fenton and Pfleeger 96].
“Maybe size does not explain everything, but it explains a lot.” Bojan Cukic, PROMISE 2007
• The functional form of this relationship is still not well understood.
• Commonly, practitioners assume a linear relationship [El Emam 05].
• The only general conclusion is that there is a continuously increasing relationship between the two [Fenton and Ohlsson 00, El Emam et al. 01].

Slide 4: Size--Defect Relationship: Alternative Forms
[Figure: three plots of defects (y-axis) against size (x-axis), panels (a), (b), and (c)]
“Things are linear is open to questions” Tim Menzies, PROMISE 2007
• Implications:
  • (a) Linear: Smaller and larger modules are proportionally equally problematic
  • (b) Quadratic: Larger modules are proportionally more problematic
  • (c) Logarithmic: Smaller modules are proportionally more problematic
• Theoretical and Practical Importance
  • Decomposition
  • Focused quality assurance
  • Functional Enhancements
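
The three alternative forms can be compared with a minimal sketch. The coefficients and module sizes below are hypothetical and chosen only for illustration; they do not come from the slides. Dividing the expected defect count by size shows what "proportionally more problematic" means for each form.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
def linear(size, a=0.01):
    """(a) Defects grow in proportion to size."""
    return a * size

def quadratic(size, a=0.00001):
    """(b) Defects grow with the square of size."""
    return a * size ** 2

def logarithmic(size, a=2.0):
    """(c) Defects grow with the logarithm of size."""
    return a * math.log(size)

def defects_per_line(form, size):
    """Expected defects per unit of size under a given functional form."""
    return form(size) / size

small, large = 100, 1000  # hypothetical module sizes in lines of code
for name, form in [("linear", linear),
                   ("quadratic", quadratic),
                   ("logarithmic", logarithmic)]:
    print(name,
          defects_per_line(form, small),
          defects_per_line(form, large))
```

Under the linear form the per-line rate is the same for both modules, under the quadratic form it is higher for the large module, and under the logarithmic form it is higher for the small module.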

Slide 5: Why the relationship is still unclear...
• Many earlier studies did not fully explore alternative functional forms or test whether deviations from linearity were significant.
• Linear models [Akiyama 71] or correlations [Andersson and Runeson 07] were found sufficient.
• One study stated that linear models could be good first approximations and that there was better tool support for them [Shen 85].
• The number of data points was very limited in the earlier studies (e.g., [Akiyama 71]).
• Some studies derived models analytically and then fitted data to validate those models [Lipow 82].
• Correlations were accepted as a sign of a linear relationship [Schneidewind and Hoffman 78]. Correlations do not imply proportionality.
• The focus shifted to defect density, with observations of an optimal module size that minimizes defect density: a U-shaped curve (the Goldilocks conjecture) [Withrow 90, Hatton 97, Hatton 98, etc.]. See [El Emam 02] for a detailed review.
• This approach can mask the plain size--defect relationship and mislead us [El Emam 02, Fenton and Neil 00, Rosenberg 97].
• The relationship gets more difficult to understand from multivariate and sophisticated machine learning models (e.g., neural networks in [Khoshgoftaar 97]).
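
The point that correlations do not imply proportionality can be illustrated with a small sketch. The data are synthetic: defect counts are generated from an assumed logarithmic form, and the coefficient and size range are hypothetical. The Pearson correlation with size is still high, even though defects per line fall steeply as size grows.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic modules: sizes from 50 to 2000 lines of code (hypothetical).
sizes = list(range(50, 2001, 50))
# Assumed logarithmic size--defect relationship with a made-up coefficient.
defects = [3.0 * math.log(s) for s in sizes]

r = pearson(sizes, defects)
density = [d / s for d, s in zip(defects, sizes)]

print("correlation between size and defects:", r)
print("defects per line, smallest module:", density[0])
print("defects per line, largest module:", density[-1])
```

A strong correlation here coexists with a defect rate per line that is many times higher for the smallest module than for the largest, so the correlation alone says little about the functional form.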

Slide 6: Conventional Approach to Investigate Size--Defect Relationship
• All these studies share a common characteristic:
  • A software system is measured at a snapshot time, then the obtained measurements are associated with the future defect count (note that this might be pre-release or post-release), e.g., [Koru and Tian 03], [Khoshgoftaar 96].
• Usually, measurement and analysis are performed at the module level.
• A common problem is the availability of data [Fenton and Ohlsson 02].
• Publicly available Open Source Software (OSS) repositories provide source code, change data, and defect data [Koru and Tian 04].
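
The conventional snapshot approach can be sketched as follows. All module names, sizes, dates, and defect records are hypothetical, and the ordinary least-squares fit is only one possible way to associate the snapshot measurements with later defect counts; the cited studies may have used other models.

```python
from datetime import date

# Hypothetical snapshot: module sizes (lines of code) measured on one date.
snapshot = date(2003, 1, 1)
sizes_at_snapshot = {"A": 120, "B": 450, "C": 900}

# Hypothetical defect-fix records: (module, date of the fix).
defect_fixes = [
    ("A", date(2003, 3, 1)),
    ("B", date(2002, 11, 5)),   # before the snapshot, so not counted
    ("B", date(2003, 6, 9)),
    ("C", date(2003, 2, 2)),
    ("C", date(2003, 8, 20)),
    ("C", date(2004, 1, 15)),
]

def future_defect_counts(sizes, fixes, snapshot_date):
    """Count defects recorded after the snapshot for each measured module."""
    counts = {module: 0 for module in sizes}
    for module, when in fixes:
        if module in counts and when > snapshot_date:
            counts[module] += 1
    return counts

def least_squares(xs, ys):
    """Slope and intercept of a simple linear fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

counts = future_defect_counts(sizes_at_snapshot, defect_fixes, snapshot)
modules = sorted(sizes_at_snapshot)
slope, intercept = least_squares(
    [sizes_at_snapshot[m] for m in modules],
    [counts[m] for m in modules],
)
print(counts, slope, intercept)
```

Note that the sketch treats each module's size as fixed at the snapshot value, which is exactly the simplification that becomes questionable when modules keep changing.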

Slide 7: Challenges with Using Conventional Method in OSS Context
• Evolutionary aspects of OSS: continuous and concurrent functional enhancements, defect fixes, and all other changes (perfective, adaptive, etc.). Bazaar model rather than cathedral model [Raymond 99].
• OSS is usually developed by volunteers, with little planning and no requirements or design documents; source code is the main artifact [Mockus et al. 00, Mockus et al. 02].
• Quality assurance activities are not systematic in OSS (see [Zhao and Elbaum 03, Koru et al. 07]).
• So far, research using the conventional approach has focused on closed-source products that are relatively better planned, analyzed, designed, and tested.
• Internal validity problems caused by the dynamic OSS context:
  • Deleted classes
  • Size changes
• There might be closed-source products developed in an evolutionary manner and vice versa. Such comparisons are outside the scope here (see [Paulson et al. 04]).

Slide 8: In this study...
“If developers play with a file, it can change its defect proneness” Elaine Weyuker, PROMISE 2007
• To gain a better understanding of the size--defect relationship, we used both:
  • A novel approach that adopts Cox Proportional Hazards Modeling with Recurrent Events (Cox Modeling) [Cox 72].
  • Data from a large-scale, long-lived OSS product, Mozilla (http://www.mozilla.org).
• The evolutionary aspects of the Mozilla project were shown in other studies:
  • Gyimothy et al. [04] found that the size of Mozilla increased significantly during successive releases.
  • Mockus et al. [02] found that there was no particular development process in Mozilla.
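
For orientation, the general form of the Cox proportional hazards model [Cox 72] can be written as below. The notation is an assumption made here for illustration: $\lambda_0(t)$ is an unspecified baseline hazard, $x_i(t)$ is a covariate of module $i$ at time $t$ (for example, some measure of its size), and $\beta$ is the coefficient to be estimated. The slides shown so far do not state which covariates or transformations the authors used.

```latex
% Hazard of module i experiencing a defect at time t
\lambda_i(t) = \lambda_0(t)\,\exp\bigl(\beta\, x_i(t)\bigr)

% Hazard ratio between two modules i and j at the same time t;
% the baseline hazard cancels out
\frac{\lambda_i(t)}{\lambda_j(t)} = \exp\bigl(\beta\,(x_i(t) - x_j(t))\bigr)
```

Because the covariate may vary with time and a module can remain under observation after each event, this form can accommodate both size changes and repeated defects in the same module, which is presumably why it suits an evolving code base.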

Slide 9: In the rest of this presentation...
• Methodology
  • Demonstrating the evolutionary aspects of Mozilla
  • Cox Modeling
  • Data Collection
• Modeling and Results
• Future Work
• Conclusion

Slide 10: Results: Demonstrating Evolutionary Aspects of Mozilla
• For only Mozilla 1.0 classes
[Figure, panel (a): Cumulative Number of Deleted Classes (y-axis, 0 to 1000) against Years (x-axis, 2003 to 2006)]
• (a) Cumulative number of deleted classes
• (b) Cumulative