SlideShare a Scribd company logo
An Overview of Software
   Defect Density: A
    Scoping Study




       Syed Muhammad Ali Shah*,
       Maurizio Morisio, Marco Torchiano
Software
   Quality
Defect Density
Defect Density
 • Product quality measured in terms of
   Defect Density (DD)

                DD = CND/ Size
   • Where CND is the cumulative number of
     post release defects in the observed period.
   • Size is measured at the end of the observed
     period in thousands of lines of code (KLoC).
RQ 1: What are the Typical Figures of DD in Software
                     Projects



            Typical Figures of
                   DD


Such figures provide managers benchmarks to define quality
goals upfront, to evaluate and to assess the quality
RQ2: Is there a difference in DD between open and closed
source project?




Since often the context, motivation, and development process
differ between open source and proprietary projects. We aim at
finding some evidence, at least in terms of DD.
RQ3: Is there a difference in DD among programming languages?




Different programming languages encompass e.g. varying
styles, expressive power and abstraction level. Such
differences are likely to influence the DD of projects.
What is the relationship between DD and
project size?


It provide researchers the evidence of a
relationship between project size and DD.
What is the relationship of DD and project age?




It provides researchers the evidence of evolution of reliability
over time, not only on failure happening (traditional reliability
definition) but also on DD.
Scoping Study
• Followed the framework of Arksey and O’Malley
  for conducting our scoping study.


  The framework has five stages
• Stage 1: Research Questions Definition
• Stage 2: Relevant Studies Identification
• Stage 3: Selection of Studies
• Stage 4: Charting the Data
• Stage 5: Results
Search Strategy
Three phase search strategy.
Phase 1:
• Considers papers published in top software
  engineering journals from 2000 to 2011.
• The search string is “Defect Density” OR “Fault
  Density” OR “Reliability” is used.
Phase 2
• We search using two publication databases, IEEE
  explorer and ACM digital library.
….
• The search keywords are “Software Defect
  Density” and “Software Fault Density”.
Phase 3
• We followed the references of selected 18
  papers to identify other relevant studies.
• We particularly searched the Promise
  proceedings


  This led to a selection of 19 papers in total
  containing DD data about 110 projects.
Inclusion Criteria
• Are related to software engineering
• Are related to software projects
• Contain directly figures of DD, or contain data
  that allows to compute DD indirectly (such as
  number of post release defects and size in LOC)

• Mention that DD was computed after the
  particular release or the project (operation
  phase, post release phase, etc).
Data Analysis and Hypothesis
 To answer the research questions we used
 descriptive statistics, regression analysis and when
 applicable we also statistically test some hypothesis.


 Concerning RQ 1: we used descriptive statistics and
 cumulative distribution diagrams.


 Concerning RQ 4,5: we performed regression
 analysis
…..
     We answer RQ’s 2 and 3 by means of
     hypothesis testing


     Concerning RQ2:
     H20: There is no significant difference in
      term of DD between open source and
     close source projects.
     H2a: There is a significant difference in
      term of DD between open source
     and close source projects.
  
….
Concerning RQ3:
 H3.10:There is no significant difference in
     terms of DD among projects adopting
 different languages.
 H3.1a:There is a significant difference in
  terms of DD among projects adopting
 different languages.


 If the above null hypothesis can be rejected we can
 conduct a post-hoc investigation of the pair-wise
 differences;
DD Typical Figures

             7.47




                                     STANDARD
           NUMBERS   MEAN   MEDIAN   DEVIATION
Projects   109       7.47     4.3        7.99
DD Vs. (open, closed
               source)
                                    Mann-
                                     Mann-
                                    Whitney
                                    Whitney
                                    p = 0.004
                                    p = 0.004




Open source projects DD that is on average 4 defects/KLoC
smaller than closed source ones
                                                 STANDARD
                NUMBERS    MEAN       MEDIAN     DEVIATION
Closed source      77       8.6         5.4         8.49
 Open source      32        4.66        2.75        5.84
DD Vs. Programming
                  languages Kruskal-Wallis
                            Kruskal-Wallis
            Mann-Whitney
            Mann-Whitney        αB =α/3=0.016               p = 0.009
                 p = 0.003
                 p = 0.003                                  p = 0.009




C projects DD have 4.1 defect per KLoC higher than Java ones
                                                              STANDARD
                NUMBERS        MEAN          MEDIAN           DEVIATION
           C              43          10.0            7.9               7.98
         Java             45           5.9            3.5               6.22
         C++              11          8.73            1.3           13.17
log(DD) = 5.12 - 0.341 log(Size)



 DD Vs. Size of Project
                                                      The negative correlation,
                                                       The negative correlation,
                                                      for the log-values has a
                                                       for the log-values has a
                                                      higher, though still small
                                                       higher, though still small
                                                      practical relevance.
                                                       practical relevance.




           log(DD) = 5.12 - 0.341 log(Size)
Regression’s p-value is smaller than 0.001 and the
corresponding adjusted R2 is 22%.
We find evidence that the larger the projects, the lower the
defect density


                                                     Mann-Whitney
                                                     Mann-Whitney
                                                         p <0.016
                                                         p <0.016




        αB =α/3=0.016
                                        Defect Density
Size range              N        Mean     Median          Std Dev
upto 400K LoC               88   8.63       5.3             8.4
400K to 2M LoC              15   3.3        2.8             2.72
Above 2M LoC                5    0.38      0.40             0.28
DD Vs. Age of Project




                         DD = 1.86 + 0.59 ⋅ Age
The p-value of the regression is 0.011 and the corresponding
                    adjusted R2 is 13.8%.
Contingency Tables
Size  Type                 Closed                       Open
         Small              63           82%             25           78%
       Medium               12           15%              4           12%
         Large               2            3%              3           10%
                            77          100%             32          100%

 Lang            C              C++            Java             Other
Type
Closed       39       91%      11     100%     20      44%       7    70%
 Open         4        9%       0       0%     25      56%       3    30%
             43      100%      11     100%     45     100%      10   100%
Speculative Hypotheses

•    Small projects are more in closed source projects
    (82%) than for open source ones (78%).

• C and Java Unbalanced use Open Source and Close
  Source may have some influence.

• Size is negatively correlated with defect density”
  Large projects typically put more effort on testing
Threats to Validity
• “survivorship bias”. Checking figures.
• Missed papers calling DD by another name.
• Little concern programming language,
  development mode.
• Size, defects are easily subject to ambiguities.
• 109 projects is not a negligible number, but is a
  very limited percentage of the number of projects
  released overall.
Main Limitations
    Lack of process variables:
•    Testing effort
•    Quality effort in general
•    code churn
•    code history
•    Experience (team and manager)

    That are probably key factors for DD.
Conclusion
• The first to mined the literature for DD
  figures that had not been gathered and
  analyzed before.
• Found that development mode is a factor
  for DD
• Programming language is sometimes a
  factor for DD
• Found that projects size is relevant while
  Age is not a factor for DD
Thank You


Questions
  OR
 Coffee
  OR
 Coffee

More Related Content

Similar to An overview of software dd a scoing study

When, why and for whom do practitioners detect technical debts?: An experienc...
When, why and for whom do practitioners detect technical debts?: An experienc...When, why and for whom do practitioners detect technical debts?: An experienc...
When, why and for whom do practitioners detect technical debts?: An experienc...
Norihiro Yoshida
 
ICPE 2022 - Data Challenge
ICPE 2022 - Data ChallengeICPE 2022 - Data Challenge
ICPE 2022 - Data Challenge
Luc Lesoil
 
Conducting and reporting the results of a cfd simulation
Conducting and reporting the results of a cfd simulationConducting and reporting the results of a cfd simulation
Conducting and reporting the results of a cfd simulation
Malik Abdul Wahab
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Gábor Szárnyas
 
Reliability Vs. Testing
Reliability Vs. TestingReliability Vs. Testing
Reliability Vs. Testing
Nicolò Paternoster
 
On Impact in Software Engineering Research
On Impact in Software Engineering ResearchOn Impact in Software Engineering Research
On Impact in Software Engineering Research
CISPA Helmholtz Center for Information Security
 
2010-07-19_rails_tdd_week1
2010-07-19_rails_tdd_week12010-07-19_rails_tdd_week1
2010-07-19_rails_tdd_week1
Wolfram Arnold
 
Rayleigh model
Rayleigh modelRayleigh model
Rayleigh model
Roy Antony Arnold G
 
My Dad Won't Buy Me DevOps
My Dad Won't Buy Me DevOpsMy Dad Won't Buy Me DevOps
My Dad Won't Buy Me DevOps
XebiaLabs
 
Open-DO Update
Open-DO UpdateOpen-DO Update
Open-DO Update
AdaCore
 
"We are doing it wrong."
"We are doing it wrong.""We are doing it wrong."
"We are doing it wrong."
weissgraeber
 
Why computer programming
Why computer programmingWhy computer programming
Why computer programming
TUOS-Sam
 
IEEE ACM Studying the Relationship between Exception Handling Practices and P...
IEEE ACM Studying the Relationship between Exception Handling Practices and P...IEEE ACM Studying the Relationship between Exception Handling Practices and P...
IEEE ACM Studying the Relationship between Exception Handling Practices and P...
Gui Padua
 
Lafauci dv club oct 2006
Lafauci dv club oct 2006Lafauci dv club oct 2006
Lafauci dv club oct 2006
Obsidian Software
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
Roy Antony Arnold G
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
Roy Antony Arnold G
 
Estimating test effort part 2 of 2
Estimating test effort part 2 of 2Estimating test effort part 2 of 2
Estimating test effort part 2 of 2
Ian McDonald
 
Improving your Agile Process
Improving your Agile ProcessImproving your Agile Process
Improving your Agile Process
David Copeland
 
A preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localizationA preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localization
krws
 
Believe it or not - keynote CAS 2015
Believe it or not - keynote CAS 2015Believe it or not - keynote CAS 2015
Believe it or not - keynote CAS 2015
lantoli
 

Similar to An overview of software dd a scoing study (20)

When, why and for whom do practitioners detect technical debts?: An experienc...
When, why and for whom do practitioners detect technical debts?: An experienc...When, why and for whom do practitioners detect technical debts?: An experienc...
When, why and for whom do practitioners detect technical debts?: An experienc...
 
ICPE 2022 - Data Challenge
ICPE 2022 - Data ChallengeICPE 2022 - Data Challenge
ICPE 2022 - Data Challenge
 
Conducting and reporting the results of a cfd simulation
Conducting and reporting the results of a cfd simulationConducting and reporting the results of a cfd simulation
Conducting and reporting the results of a cfd simulation
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
 
Reliability Vs. Testing
Reliability Vs. TestingReliability Vs. Testing
Reliability Vs. Testing
 
On Impact in Software Engineering Research
On Impact in Software Engineering ResearchOn Impact in Software Engineering Research
On Impact in Software Engineering Research
 
2010-07-19_rails_tdd_week1
2010-07-19_rails_tdd_week12010-07-19_rails_tdd_week1
2010-07-19_rails_tdd_week1
 
Rayleigh model
Rayleigh modelRayleigh model
Rayleigh model
 
My Dad Won't Buy Me DevOps
My Dad Won't Buy Me DevOpsMy Dad Won't Buy Me DevOps
My Dad Won't Buy Me DevOps
 
Open-DO Update
Open-DO UpdateOpen-DO Update
Open-DO Update
 
"We are doing it wrong."
"We are doing it wrong.""We are doing it wrong."
"We are doing it wrong."
 
Why computer programming
Why computer programmingWhy computer programming
Why computer programming
 
IEEE ACM Studying the Relationship between Exception Handling Practices and P...
IEEE ACM Studying the Relationship between Exception Handling Practices and P...IEEE ACM Studying the Relationship between Exception Handling Practices and P...
IEEE ACM Studying the Relationship between Exception Handling Practices and P...
 
Lafauci dv club oct 2006
Lafauci dv club oct 2006Lafauci dv club oct 2006
Lafauci dv club oct 2006
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
 
Complexity metrics and models
Complexity metrics and modelsComplexity metrics and models
Complexity metrics and models
 
Estimating test effort part 2 of 2
Estimating test effort part 2 of 2Estimating test effort part 2 of 2
Estimating test effort part 2 of 2
 
Improving your Agile Process
Improving your Agile ProcessImproving your Agile Process
Improving your Agile Process
 
A preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localizationA preliminary study on using code smells to improve bug localization
A preliminary study on using code smells to improve bug localization
 
Believe it or not - keynote CAS 2015
Believe it or not - keynote CAS 2015Believe it or not - keynote CAS 2015
Believe it or not - keynote CAS 2015
 

Recently uploaded

The basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptxThe basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptx
heathfieldcps1
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
deepaannamalai16
 
MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025
khuleseema60
 
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptxA Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
OH TEIK BIN
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
Prof. Dr. K. Adisesha
 
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxxSimple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
RandolphRadicy
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
MJDuyan
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.
IsmaelVazquez38
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
Steve Thomason
 
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
David Douglas School District
 
Contiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptxContiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptx
Kalna College
 
Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10
nitinpv4ai
 
BPSC-105 important questions for june term end exam
BPSC-105 important questions for june term end examBPSC-105 important questions for june term end exam
BPSC-105 important questions for june term end exam
sonukumargpnirsadhan
 
Educational Technology in the Health Sciences
Educational Technology in the Health SciencesEducational Technology in the Health Sciences
Educational Technology in the Health Sciences
Iris Thiele Isip-Tan
 
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
Payaamvohra1
 
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
Nguyen Thanh Tu Collection
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
TechSoup
 
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapitolTechU
 
How to Download & Install Module From the Odoo App Store in Odoo 17
How to Download & Install Module From the Odoo App Store in Odoo 17How to Download & Install Module From the Odoo App Store in Odoo 17
How to Download & Install Module From the Odoo App Store in Odoo 17
Celine George
 

Recently uploaded (20)

The basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptxThe basics of sentences session 7pptx.pptx
The basics of sentences session 7pptx.pptx
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
 
MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025
 
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptxA Free 200-Page eBook ~ Brain and Mind Exercise.pptx
A Free 200-Page eBook ~ Brain and Mind Exercise.pptx
 
Data Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsxData Structure using C by Dr. K Adisesha .ppsx
Data Structure using C by Dr. K Adisesha .ppsx
 
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxxSimple-Present-Tense xxxxxxxxxxxxxxxxxxx
Simple-Present-Tense xxxxxxxxxxxxxxxxxxx
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
 
Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
 
Contiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptxContiguity Of Various Message Forms - Rupam Chandra.pptx
Contiguity Of Various Message Forms - Rupam Chandra.pptx
 
Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10Haunted Houses by H W Longfellow for class 10
Haunted Houses by H W Longfellow for class 10
 
BPSC-105 important questions for june term end exam
BPSC-105 important questions for june term end examBPSC-105 important questions for june term end exam
BPSC-105 important questions for june term end exam
 
Educational Technology in the Health Sciences
Educational Technology in the Health SciencesEducational Technology in the Health Sciences
Educational Technology in the Health Sciences
 
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
NIPER 2024 MEMORY BASED QUESTIONS.ANSWERS TO NIPER 2024 QUESTIONS.NIPER JEE 2...
 
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
 
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptxCapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
CapTechTalks Webinar Slides June 2024 Donovan Wright.pptx
 
How to Download & Install Module From the Odoo App Store in Odoo 17
How to Download & Install Module From the Odoo App Store in Odoo 17How to Download & Install Module From the Odoo App Store in Odoo 17
How to Download & Install Module From the Odoo App Store in Odoo 17
 

An overview of software dd a scoing study

  • 1. An Overview of Software Defect Density: A Scoping Study Syed Muhammad Ali Shah*, Maurizio Morisio, Marco Torchiano
  • 2. Software Quality Defect Density
  • 3. Defect Density • Product quality measured in terms of Defect Density (DD) DD = CND/ Size • Where CND is the cumulative number of post release defects in the observed period. • Size is measured at the end of the observed period in thousands of lines of code (KLoC).
  • 4. RQ 1: What are the Typical Figures of DD in Software Projects Typical Figures of DD Such figures provide managers benchmarks to define quality goals upfront, to evaluate and to assess the quality
  • 5. RQ2: Is there a difference in DD between open and closed source project? Since often the context, motivation, and development process differ between open source and proprietary projects. We aim at finding some evidence, at least in terms of DD.
  • 6. RQ3: Is there a difference in DD among programming languages? Different programming languages encompass e.g. varying styles, expressive power and abstraction level. Such differences are likely to influence the DD of projects.
  • 7. What is the relationship between DD and project size? It provide researchers the evidence of a relationship between project size and DD.
  • 8. What is the relationship of DD and project age? It provides researchers the evidence of evolution of reliability over time, not only on failure happening (traditional reliability definition) but also on DD.
  • 9. Scoping Study • Followed the framework of Arksey and O’Malley for conducting our scoping study. The framework has five stages • Stage 1: Research Questions Definition • Stage 2: Relevant Studies Identification • Stage 3: Selection of Studies • Stage 4: Charting the Data • Stage 5: Results
  • 10. Search Strategy Three phase search strategy. Phase 1: • Considers papers published in top software engineering journals from 2000 to 2011. • The search string is “Defect Density” OR “Fault Density” OR “Reliability” is used. Phase 2 • We search using two publication databases, IEEE explorer and ACM digital library.
  • 11. …. • The search keywords are “Software Defect Density” and “Software Fault Density”. Phase 3 • We followed the references of selected 18 papers to identify other relevant studies. • We particularly searched the Promise proceedings This led to a selection of 19 papers in total containing DD data about 110 projects.
  • 12. Inclusion Criteria • Are related to software engineering • Are related to software projects • Contain directly figures of DD, or contain data that allows to compute DD indirectly (such as number of post release defects and size in LOC) • Mention that DD was computed after the particular release or the project (operation phase, post release phase, etc).
  • 13. Data Analysis and Hypothesis To answer the research questions we used descriptive statistics, regression analysis and when applicable we also statistically test some hypothesis. Concerning RQ 1: we used descriptive statistics and cumulative distribution diagrams. Concerning RQ 4,5: we performed regression analysis
  • 14. ….. We answer RQ’s 2 and 3 by means of hypothesis testing Concerning RQ2: H20: There is no significant difference in term of DD between open source and close source projects. H2a: There is a significant difference in term of DD between open source and close source projects.  
  • 15. …. Concerning RQ3: H3.10:There is no significant difference in terms of DD among projects adopting different languages. H3.1a:There is a significant difference in terms of DD among projects adopting different languages. If the above null hypothesis can be rejected we can conduct a post-hoc investigation of the pair-wise differences;
  • 16. DD Typical Figures 7.47 STANDARD NUMBERS MEAN MEDIAN DEVIATION Projects 109 7.47 4.3 7.99
  • 17. DD Vs. (open, closed source) Mann- Mann- Whitney Whitney p = 0.004 p = 0.004 Open source projects DD that is on average 4 defects/KLoC smaller than closed source ones STANDARD NUMBERS MEAN MEDIAN DEVIATION Closed source 77 8.6 5.4 8.49 Open source 32 4.66 2.75 5.84
  • 18. DD Vs. Programming languages Kruskal-Wallis Kruskal-Wallis Mann-Whitney Mann-Whitney αB =α/3=0.016 p = 0.009 p = 0.003 p = 0.003 p = 0.009 C projects DD have 4.1 defect per KLoC higher than Java ones STANDARD NUMBERS MEAN MEDIAN DEVIATION C 43 10.0 7.9 7.98 Java 45 5.9 3.5 6.22 C++ 11 8.73 1.3 13.17
  • 19. log(DD) = 5.12 - 0.341 log(Size) DD Vs. Size of Project The negative correlation, The negative correlation, for the log-values has a for the log-values has a higher, though still small higher, though still small practical relevance. practical relevance. log(DD) = 5.12 - 0.341 log(Size) Regression’s p-value is smaller than 0.001 and the corresponding adjusted R2 is 22%.
  • 20. We find evidence that the larger the projects, the lower the defect density Mann-Whitney Mann-Whitney p <0.016 p <0.016 αB =α/3=0.016 Defect Density Size range N Mean Median Std Dev upto 400K LoC 88 8.63 5.3 8.4 400K to 2M LoC 15 3.3 2.8 2.72 Above 2M LoC 5 0.38 0.40 0.28
  • 21. DD Vs. Age of Project DD = 1.86 + 0.59 ⋅ Age The p-value of the regression is 0.011 and the corresponding adjusted R2 is 13.8%.
  • 22. Contingency Tables Size Type Closed Open Small 63 82% 25 78% Medium 12 15% 4 12% Large 2 3% 3 10% 77 100% 32 100% Lang C C++ Java Other Type Closed 39 91% 11 100% 20 44% 7 70% Open 4 9% 0 0% 25 56% 3 30% 43 100% 11 100% 45 100% 10 100%
  • 23. Speculative Hypotheses • Small projects are more in closed source projects (82%) than for open source ones (78%). • C and Java Unbalanced use Open Source and Close Source may have some influence. • Size is negatively correlated with defect density” Large projects typically put more effort on testing
  • 24. Threats to Validity • “survivorship bias”. Checking figures. • Missed papers calling DD by another name. • Little concern programming language, development mode. • Size, defects are easily subject to ambiguities. • 109 projects is not a negligible number, but is a very limited percentage of the number of projects released overall.
  • 25. Main Limitations Lack of process variables: • Testing effort • Quality effort in general • code churn • code history • Experience (team and manager) That are probably key factors for DD.
  • 26. Conclusion • The first to mined the literature for DD figures that had not been gathered and analyzed before. • Found that development mode is a factor for DD • Programming language is sometimes a factor for DD • Found that projects size is relevant while Age is not a factor for DD
  • 27. Thank You Questions OR Coffee OR Coffee

Editor's Notes

  1. Such figures provide quality managers and project managers benchmarks to define quality goals upfront, to evaluate the quality of a project during development, and to assess the quality of a project post mortem
  2. Since often the context, motivation, and development process differ between open source and proprietary projects, as the anecdotal story goes one would expect a different quality of products. We aim at finding some evidence, at least in terms of DD.
  3. Such differences are likely to influence the DD of projects.
  4. A lot of analysis has been conducted on the relationship between module size and DD, but we could not find any on project size and DD. RQ4 provide researchers the evidence of a relationship between project size and DD.
  5. A reasonable expectation is that the longer a project is released, more defects are found and therefore the higher the defect density. In addition it also provides researchers the evidence of evolution of reliability over time, not only on failure happening (traditional reliability definition) but also on DD.
  6. System and Software IEEE Transaction on Software Engineering ACM Transaction on Software Engineering and Measurement IEEE Software Empirical Software Engineering Information and Software Technology
  7. to determine up to which extent the dependent variable varies as a function of one or more independent variables
  8. The DD is reported on a logarithmic scale, we observe that it spans nearly three orders of magnitudes, from 0.05 to close to 50.0.
  9. Open source projects DD that is on average 4 defects/KLoC smaller than closed source ones
  10. C projects have a DD that is an average 4.1 defect per KLoC higher than Java ones Bonferroni correction for multiple tests shall be applied.
  11. Since both the DD and size distributions are extremely skewed – actually requiring a dual log scale to have a discernible representation The negative correlation, for the log-values has a higher, though still small practical relevance.
  12. This may give an impression that the reported data to calculate DD would be “survivorship bias”. as in any overview the composition is subject of debate [16], and actually we acknowledge that this is not the definitive: it is deemed to evolve indefinitely. It is well known that measuring size in lines of code is subject to variations due to programming language and modes of measuring (with or without comments, with or without blank lines, including libraries or not, etc). Here too there may be differences in the precision of measurement processes used, and on the definition of a defect used. This lack of precision of defect data may give estimates with larger error but following [16] we believe that this is better than having no data at all and relying on intuition