Predicting Defects in SAP Java Code: An Experience Report
by Tilman Holschuh ...
Which components of a large software system are the
most defect-prone? In a study on a large SAP Java system,
we evaluated and compared a number of defect predictors,
based on code features such as complexity metrics, static
error detectors, change frequency, or component imports,
thus replicating a number of earlier case studies in an industrial
context. We found the overall predictive power to
be lower than expected; still, the resulting regression models
successfully predicted 50–60% of the 20% most defect-prone
components.

Predicting Defects in SAP Java Code: An Experience Report

  1. Predicting Defects in SAP Java Code: An Experience Report, by Tilman Holschuh (SQS AG), Markus Päuser (SAP AG), Kim Herzig (Saarland University), Thomas Zimmermann (Microsoft Research), Rahul Premraj (Vrije University Amsterdam), Andreas Zeller (Saarland University)
  2-8. Motivation: Quality Manager. Problems: Resources, Time, Knowledge. Where do we put the most effort?
  9-14. Replicated 2 Studies, study 1: Source code (McCabe, FanOut, LoC, Coupling), Version archive, Bug database, Predictor, Component Quality
  15-16. Replicated 2 Studies, study 2: Source code (McCabe, FanOut, LoC, Coupling, Dependencies), Version archive, Bug database, Predictor, Component Quality
  17. The Product: SAP Standard Software; large-scale Java software system (> 10M LoC); separated into projects; service pack release cycles
  18-21. Defect Distribution: 20% of the code contains ~75% of defects (upper bound for prediction). [Figure: treemap of the defect distribution; graphic created with TreeMap (University of Maryland), see http://www.cs.umd.edu/hcil/treemap]
  22. Basics: Input → Predictor Model → Output
  23. How to collect Input Data? 1: McCabe, FanOut, LoC, Coupling. 2: Dependencies
  24-27. Collecting Metric Data (1: McCabe, FanOut, LoC, Coupling): metric tools: ckjm, JDepend, ephyra; static code checkers: PMD, FindBugs; change frequency
  28-30. Collecting Dependency Data (2: Dependencies): extracting package import relations; tool: JDepend
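
To make "extracting package import relations" concrete, here is a minimal sketch. It is not how JDepend works internally; it only scans Java source text with regular expressions. The function names `imported_package` and `package_dependencies` are hypothetical, and the rule that a capitalised segment starts a type name is an assumption based on the usual Java naming convention.

```python
import re
from collections import defaultdict

# Package declaration and import statements of a Java compilation unit.
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)


def imported_package(name):
    """Reduce an imported name to its package part.

    Assumption: package segments are lower case and the first capitalised
    segment is a type name, so everything from there on is dropped.
    """
    package = []
    for segment in name.strip(".").split("."):
        if segment[:1].isupper():
            break
        package.append(segment)
    return ".".join(package)


def package_dependencies(sources):
    """Map each package to the set of other packages it imports.

    `sources` is an iterable of Java source texts (hypothetical input format).
    """
    deps = defaultdict(set)
    for text in sources:
        match = PACKAGE_RE.search(text)
        if match is None:
            continue  # file in the default package: skipped in this sketch
        package = match.group(1)
        for name in IMPORT_RE.findall(text):
            target = imported_package(name)
            if target and target != package:
                deps[package].add(target)
    return dict(deps)


example = """package com.example.ui;
import java.util.List;
import java.util.*;
import static java.lang.Math.max;
import com.example.core.Service;
class A {}
"""
print(package_dependencies([example]))
```

The resulting sets could then be turned into one binary feature per imported package, which is one plausible input encoding for a classifier.
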
  31. How to measure Component Quality? Input ✔ → Predictor Model → Output
  32-39. Component Quality: an entry in the bug database (Bug 42233, FileSystemPreferences lockFile() should close ...) is matched with the change in the version archive (v1.17, v1.18, v1.19) that is marked "Fixed Bug 42233". Each such fix on the maintenance branch (v1.17, v1.18, v1.19) raises the count for the component: #defects + 1
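
The mapping between bug database and version archive on the Component Quality slides can be sketched as follows. The commit message pattern, the `(message, components)` tuple format and the function name `count_defects` are assumptions made for illustration; the slides only show a commit marked "Fixed Bug 42233" and the rule "#defects + 1".

```python
import re
from collections import Counter

# Assumed message convention: "bug" followed by an optional "#" and the id.
BUG_ID_RE = re.compile(r"\bbug\s*#?(\d+)\b", re.IGNORECASE)


def count_defects(commits, known_bugs):
    """Count fixed bugs per component.

    commits:    iterable of (message, changed_components) pairs
                (hypothetical format for a version archive export)
    known_bugs: set of ids that exist in the bug database; ids mentioned in a
                message but missing from the database are ignored
    """
    defects = Counter()
    for message, components in commits:
        mentioned = {int(bug_id) for bug_id in BUG_ID_RE.findall(message)}
        fixed = mentioned & known_bugs
        if not fixed:
            continue
        for component in set(components):
            defects[component] += len(fixed)  # "#defects + 1" per fixed bug
    return defects


commits = [
    ("Fixed Bug 42233", ["java.util.prefs"]),
    ("Refactoring, no functional change", ["java.util.prefs"]),
    ("fixed bug #7 and bug 42233", ["a", "b"]),
    ("Fixed Bug 99999", ["a"]),
]
print(count_defects(commits, known_bugs={42233, 7}))
```

To follow the maintenance-branch restriction on the slide, the commit list would be filtered to that branch before it is passed in.
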
  40. How to build Predictor Models? Linear Regression (Y = Xβ + ε): McCabe, FanOut, LoC, Coupling. Support Vector Machine: McCabe, FanOut, LoC, Coupling, Dependencies
  41. Forward Prediction (timeline t, versions V1 and V2): static analysis, training bug data, test bug data
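
A minimal sketch of the linear model Y = Xβ + ε used for forward prediction: β is estimated by ordinary least squares on one version and then used to rank the components of the next. It assumes that training data come from V1 and the ranking is made for V2, which is the usual reading of the timeline slide. All function names and all numbers are hypothetical, and the solver is plain Gauss-Jordan elimination rather than whatever statistics package the study used.

```python
def fit_linear_regression(rows, targets):
    """Least-squares estimate of beta in Y = X beta + eps (with intercept)."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Normal equations (X^T X) beta = X^T y as an augmented matrix.
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot_row][col]) < 1e-12:
            raise ValueError("metrics are linearly dependent")
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(beta, row):
    """Predicted defect count for one component's metric values."""
    return beta[0] + sum(b * float(v) for b, v in zip(beta[1:], row))


def rank_by_prediction(beta, metrics_by_component):
    """Component names, most defect-prone (by prediction) first."""
    return sorted(
        metrics_by_component,
        key=lambda name: predict(beta, metrics_by_component[name]),
        reverse=True,
    )


# Made-up V1 training data: (McCabe, FanOut) per component and defect counts.
v1_metrics = [(1, 0), (0, 1), (1, 1), (2, 1), (3, 5)]
v1_defects = [3, 4, 6, 8, 22]
beta = fit_linear_regression(v1_metrics, v1_defects)

# Made-up V2 components to rank before their bug data exist.
v2_metrics = {"pkg.a": (4, 2), "pkg.b": (0, 0), "pkg.c": (1, 3)}
print(rank_by_prediction(beta, v2_metrics))
```

The ranking, not the absolute predicted count, is what a top-5% or top-20% evaluation consumes.
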
  42. Results
  43-44. Metric Correlations (annotation: prediction is more precise at higher granularity levels)

      Metric              Aggregation   Package level (Project 2)   Class level (Project 4)
      LoC                 Sum           0.583                       0.377
      LoC                 Max           0.587                       n/a
      McCabe              Sum           0.583                       0.299
      McCabe              Max           0.588                       0.261
      Efferent Coupling                 0.608                       n/a
      Design Rules        Sum           0.557                       0.264
      Design Rules        Max           0.578                       n/a
      Changes             Sum           0.308                       0.403
      Changes             Max           0.240                       n/a
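
The slides do not say which correlation coefficient is behind these numbers. As an illustration only, the sketch below computes a Spearman rank correlation between a metric and defect counts; the choice of Spearman is an assumption, and the function names are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_values, defect_counts):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(metric_values), ranks(defect_counts))


# Made-up example: summed LoC per package against defect counts.
loc_sum = [120, 450, 300, 980, 75]
defects = [1, 4, 2, 9, 0]
print(spearman(loc_sum, defects))
```

A rank-based coefficient fits the later evaluation, which also depends only on the ordering of components.
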
  45. Hit Rate: comparison of the actual and the predicted ranking; for the top 20%, hit rate = 50%

      actual      1   2   3   4   5   6   7   8   9   10   11
      predicted   4   9   2   11  6   1   3   5   10  8    7
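
The hit rate on this slide can be read as the share of the predicted top-k components that are also in the actual top k. The sketch below uses the slide's two rankings. The cut-off k = 4 is an assumption, because the flattened slide does not show where the "Top 20%" bracket ends; with that cut-off the slide's 50% is reproduced.

```python
def hit_rate(actual_ranking, predicted_ranking, k):
    """Fraction of the predicted top-k components that are in the actual top k."""
    actual_top = set(actual_ranking[:k])
    predicted_top = set(predicted_ranking[:k])
    return len(actual_top & predicted_top) / k


# Rankings from the slide: component ids ordered from most to least defects.
actual = list(range(1, 12))
predicted = [4, 9, 2, 11, 6, 1, 3, 5, 10, 8, 7]

print(hit_rate(actual, predicted, k=4))  # 0.5: components 2 and 4 are hits
```
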
  46. Predictions using Linear Regression (McCabe, FanOut, LoC, Coupling)

                     Top 5%   Top 20%
      All projects   46%      55%
      Group 1        47%      63%
      Project 1      21%      43%
      Project 2      42%      64%
      Project 3      41%      55%

  47-48. Predicting from Dependencies (Support Vector Machine); annotation: stable prediction results across projects

                     Top 5%   Top 20%
      Group 1        26%      43%
      Project 1      38%      50%
      Project 2      36%      46%
      Project 3      46%      49%
  49-50. Compare Results: complexity metrics have higher predictive power. [Figure: bar chart of hit rate (0% to 80%) for Dependencies and Metrics, shown for Group 1, Project 1, Project 2 and Project 3]
  51. Lessons Learned

                                   Nagappan et al.   Schröter et al.   our study
      metrics defect correlation   ✔                 n/a               ✔
      prediction possible          ✔                 ✔                 ✔
      forward prediction           ✘                 ✘                 ✔
      universal predictor          ✘                 ✘                 ✘
  52-55. Lessons Learned: Predictions based on static code features provide limited results and depend on the project context. Software archives are a reliable and easily accessible source of defect data. Defects have many sources, and code is just one of them.
  56-57. Thank you! SQS Software Quality Systems AG, Stollwerckstraße 11, 51149 Cologne, Germany. Phone: + 49 22 03 91 54 - 7149, Fax: + 49 22 03 91 54 - 15, Email: tilman.holschuh@sqs.de, Internet: www.sqs-group.com
