Predicting Subsystem Defects using Dependency Graph Complexities

Presented at ISSRE 2007



  1. 1. Predicting Subsystem Failures using Dependency Graph Complexities Thomas Zimmermann, University of Calgary, Canada Nachiappan Nagappan, Microsoft Research, USA
  2. 2. Predicting Subsystem Defects using Dependency Graph Complexities search: ISSRE Thomas Zimmermann, University of Calgary, Canada Nachiappan Nagappan, Microsoft Research, USA
  3. 3. Bugs are everywhere
  6. 6. Quality assurance is limited... ...by time...
  7. 7. Quality assurance is limited... ...by time... ...and by money.
  8. 8. Resource allocation Spend resources on the components that need them most, i.e., those that are most likely to fail.
  9. 9. Meet Jacob • Your QA manager • Ten years' knowledge of your project • Aware of its history and the hot spots
  10. 10. Meet Jacob • Your QA manager • Ten years' knowledge of your project • Aware of its history and the hot spots • Likes extreme sports
  11. 11. Meet Emily • Your new QA manager (replaces Jacob) • Not much experience with your project yet • How can she allocate resources effectively?
  13. 13. Indicators of failures • Code complexity: Basili et al. 1996; Subramanyam and Krishnan 2003; Binkley and Schach 1998; Ohlsson and Alberg 1996; Nagappan et al. 2006 • Code churn: Nagappan and Ball 2005 • Historical data: Khoshgoftaar et al. 1996; Graves et al. 2000; Kim et al. 2007; Ostrand et al. 2005; Mockus et al. 2005 • Code dependencies: Nagappan and Ball 2007
  14. 14. Windows Server 2003
  15. 15. Windows Server 2003 • 2254 binaries • 28.4 MLOC
  16. 16. What are dependencies? • Dependency = (directed) relationship between two pieces of code
  17. 17. What are dependencies? • Dependency = (directed) relationship between two pieces of code • MaX dependency analysis framework ◦ Caller-callee dependencies ◦ Imports and exports ◦ RPC ◦ COM ◦ Runtime dependencies (such as LoadLibrary) ◦ Registry access ◦ etc.
  18. 18. Windows Server layout
  22. 22. Complexity of subsystems Subsystem A
  23. 23. Complexity of subsystems Subsystem A Subsystem B
  24. 24. Complexity of subsystems Subsystem A Subsystem B Which subsystem has more defects?
  25. 25. Complexity of subsystems Subsystem A Subsystem B Which subsystem has more defects? Our hypothesis: the more complex one.
  26. 26. Observation #1: Cycles Dependency cycles: No dependency cycle:
  27. 27. Observation #1: Cycles Dependency cycles: No dependency cycle: Binaries that are part of a dependency cycle have on average twice as many defects.
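A minimal sketch of how "part of a dependency cycle" could be checked, assuming the dependencies are available as directed (source, target) pairs. It uses Tarjan's strongly connected components algorithm, which is a standard technique and not necessarily what the authors used; the function name `binaries_in_cycles` and the binary names are hypothetical.

```python
from collections import defaultdict

def binaries_in_cycles(edges):
    """Return the set of nodes that lie on at least one directed cycle."""
    graph = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        graph[src].add(dst)
        nodes.update((src, dst))

    index = {}      # discovery order of each node
    lowlink = {}    # smallest index reachable from the node's DFS subtree
    stack, on_stack = [], set()
    in_cycle = set()
    counter = 0

    def visit(node):
        # Recursive Tarjan SCC; very deep graphs may need an iterative version.
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph[node]:
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            # A component is cyclic if it has several nodes or a self-loop.
            if len(component) > 1 or node in graph[node]:
                in_cycle.update(component)

    for node in sorted(nodes):
        if node not in index:
            visit(node)
    return in_cycle

# Hypothetical binaries: a, b and c form a cycle, d only receives a dependency.
edges = [("a.dll", "b.dll"), ("b.dll", "c.dll"),
         ("c.dll", "a.dll"), ("c.dll", "d.dll")]
print(sorted(binaries_in_cycles(edges)))  # ['a.dll', 'b.dll', 'c.dll']
```

With such a set in hand, the average defect count of binaries inside and outside cycles can be compared directly.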
  28. 28. Observation #2: Cliques
  30. 30. Observation #2: Cliques Average number of defects is higher for binaries in large cliques.
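A small sketch of how cliques could be enumerated, assuming that edge direction is ignored and that a clique is a set of binaries in which every pair is connected by a dependency. The slides do not spell out the definition, so both points are assumptions. The code is the basic Bron-Kerbosch algorithm without pivoting; `maximal_cliques` is a hypothetical name.

```python
from collections import defaultdict

def maximal_cliques(edges):
    """Enumerate maximal cliques of the undirected view of a dependency graph."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        if src != dst:                 # self-loops do not contribute to cliques
            neighbors[src].add(dst)
            neighbors[dst].add(src)
    cliques = []

    def expand(current, candidates, excluded):
        # current: clique built so far; candidates: nodes that can extend it;
        # excluded: nodes already tried, used to avoid reporting non-maximal sets.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & neighbors[node],
                   excluded & neighbors[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(neighbors), set())
    return cliques

# Hypothetical example: a, b, c depend on each other pairwise; d only touches c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
for clique in sorted(maximal_cliques(edges), key=len, reverse=True):
    print(sorted(clique))
```

The size of the largest clique containing a binary could then be compared against its defect count.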
  31. 31. Data collection
  33. 33. Data collection defects Defects
  34. 34. Dependency graphs What is the dependency graph of a subsystem?
  35. 35. Dependency graphs INTRA = Internal dependencies
  36. 36. Dependency graphs OUT = Outgoing dependencies
  37. 37. Dependency graphs DEP = “Neighborhood” = INTRA + OUT + more
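The INTRA and OUT graphs can be sketched as edge filters, assuming each binary is mapped to exactly one subsystem. The mapping `subsystem_of`, the function name, and the sample names are hypothetical. DEP is left out on purpose because the slide only describes it as INTRA + OUT + "more".

```python
def subsystem_graphs(edges, subsystem_of, subsystem):
    """Split the dependencies of one subsystem into INTRA and OUT edge sets.

    edges        -- iterable of directed (source, target) pairs
    subsystem_of -- dict mapping every binary to its subsystem (assumed complete)
    subsystem    -- the subsystem to analyze
    """
    intra, out = set(), set()
    for src, dst in edges:
        if subsystem_of[src] != subsystem:
            continue                      # only edges that start inside count
        if subsystem_of[dst] == subsystem:
            intra.add((src, dst))         # both ends inside: internal dependency
        else:
            out.add((src, dst))           # leaves the subsystem: outgoing dependency
    return intra, out

# Hypothetical layout: two binaries in "Shell", one in "Kernel".
subsystem_of = {"a.dll": "Shell", "b.dll": "Shell", "k.sys": "Kernel"}
edges = [("a.dll", "b.dll"), ("b.dll", "k.sys"), ("k.sys", "a.dll")]
intra, out = subsystem_graphs(edges, subsystem_of, "Shell")
print(sorted(intra))  # [('a.dll', 'b.dll')]
print(sorted(out))    # [('b.dll', 'k.sys')]
```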
  38. 38. Complexity measures • #Nodes: |V| • #Edges: |E| • Complexity: |E| − |V| + |P| • Density: |E| / |V|² • Degree • Multiplicity • Eccentricity • Radius • Diameter
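A sketch of how several of these measures could be computed for one dependency graph. It rests on assumptions the slide does not state: |P| is taken as the number of connected components (as in cyclomatic complexity), distances ignore edge direction, eccentricity is measured within a node's own component, and the graph has at least one edge. Multiplicity is omitted because it needs multi-edge data. The function name is hypothetical.

```python
from collections import defaultdict, deque

def complexity_measures(edges):
    """Compute simple graph measures from directed (source, target) pairs."""
    edges = set(edges)
    nodes = {node for edge in edges for node in edge}
    adjacent = defaultdict(set)           # undirected view, used for distances
    degree = defaultdict(int)             # incoming plus outgoing edges
    for src, dst in edges:
        adjacent[src].add(dst)
        adjacent[dst].add(src)
        degree[src] += 1
        degree[dst] += 1

    def distances_from(start):
        # Breadth-first search over the undirected view.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adjacent[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    all_dist = {node: distances_from(node) for node in nodes}
    # Nodes that reach the same set of nodes belong to the same component.
    components = len({frozenset(dist) for dist in all_dist.values()})
    eccentricity = {node: max(dist.values()) for node, dist in all_dist.items()}

    v, e = len(nodes), len(edges)
    return {
        "nodes": v,
        "edges": e,
        "complexity": e - v + components,   # |E| - |V| + |P|
        "density": e / v ** 2,              # |E| / |V|^2
        "degree_min": min(degree.values()),
        "degree_max": max(degree.values()),
        "degree_avg": sum(degree.values()) / v,
        "radius": min(eccentricity.values()),
        "diameter": max(eccentricity.values()),
    }

# Hypothetical graph: a three-node cycle with one extra outgoing edge.
measures = complexity_measures([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
for name, value in measures.items():
    print(name, value)
```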
  39. 39. Spearman correlations
  40. 40. Spearman correlations Complexity Measures
  41. 41. Spearman correlations Dependency Graphs Complexity Measures
  46. 46. Predicting failures NODES EDGES COMPLEXITY DENSITY DEGREE_MIN DEGREE_MAX DEGREE_AVG ECCENTRICITY_MIN ECCENTRICITY_MAX ECCENTRICITY_AVG MULTI_EDGES MULTI_COMPLEXITY MULTI_DENSITY MULTI_DEGREE_MIN MULTI_DEGREE_MAX MULTI_DEGREE_AVG MULTI_MULTIPLICITY_MIN MULTI_MULTIPLICITY_MAX MULTI_MULTIPLICITY_AVG MULTI_ECCENTRICITY_MIN MULTI_ECCENTRICITY_MAX MULTI_ECCENTRICITY_AVG
  47. 47. Predicting failures Measures: NODES EDGES COMPLEXITY DENSITY DEGREE_MIN DEGREE_MAX DEGREE_AVG ECCENTRICITY_MIN ECCENTRICITY_MAX ECCENTRICITY_AVG MULTI_EDGES MULTI_COMPLEXITY MULTI_DENSITY MULTI_DEGREE_MIN MULTI_DEGREE_MAX MULTI_DEGREE_AVG MULTI_MULTIPLICITY_MIN MULTI_MULTIPLICITY_MAX MULTI_MULTIPLICITY_AVG MULTI_ECCENTRICITY_MIN MULTI_ECCENTRICITY_MAX MULTI_ECCENTRICITY_AVG Dependency graphs: INTRA OUT DEP COMBINED
  48. 48. Ranking
  49. 49. Ranking (predicted rank vs. actual rank, compared with Spearman correlation)

      Rank  Subsystem  Actual Rank
      1     K          3
      2     L          95
      3     C          6
      4     G          2
      5     F          8
      6     A          3
      7     Y          12
      8     O          1
      9     B          18
      10    M          35
      ...   (many more)
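To make the comparison concrete, here is a sketch of Spearman rank correlation computed as the Pearson correlation of average ranks, which handles ties such as the repeated actual rank 3. Only the ten rows visible on the slide are used, so the result is illustrative and is not the value reported by the authors; the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1      # mean of the 1-based positions pos..end
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# The ten rows shown on the slide (the remaining rows are elided there).
predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
actual = [3, 95, 6, 2, 8, 3, 12, 1, 18, 35]
print(round(spearman(predicted, actual), 3))
```

A value near 1 means the predicted ordering closely matches the observed one; a value near 0 means the ordering carries little information.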
  54. 54. Random splits
  55. 55. Random splits 4×50×
  57. 57. Linear regression
  58. 58. Linear regression
  59. 59. Linear regression A higher predicted rank corresponds to a higher observed rank
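A deliberately simplified sketch of the evaluation idea: split the subsystems at random, fit a linear model on one part, and predict defects for the other part. Everything specific here is an assumption made for illustration: a single predictor instead of the full set of complexity measures, a two-thirds training fraction, the synthetic data, and the names `fit_line` and `random_split`.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def random_split(items, train_fraction, rng):
    """Shuffle a copy of items and cut it into training and testing parts."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Synthetic (complexity, defects) pairs, not data from the study.
rng = random.Random(0)
subsystems = [(c, 2 * c + rng.randint(-3, 3)) for c in range(1, 31)]

train, test = random_split(subsystems, 2 / 3, rng)   # assumed split ratio
slope, intercept = fit_line([c for c, _ in train], [d for _, d in train])

# Predict defects for the held-out subsystems and list them, highest first.
predictions = sorted(((slope * c + intercept, d) for c, d in test), reverse=True)
for predicted_defects, observed_defects in predictions[:5]:
    print(round(predicted_defects, 1), observed_defects)
```

Repeating the split many times and correlating predicted with observed ranks on each test part gives a sense of how stable the predictions are.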
  60. 60. Impact of granularity
  61. 61. Impact of granularity The predictions are more reliable for coarse granularities…
  62. 62. Impact of granularity The predictions are more reliable for coarse granularities… …at the cost of locality and stability.
  63. 63. Future work
  64. 64. Future work • Assemble the pieces of the puzzle • Evolution of dependencies: Are churned dependencies better predictors? • Development process: What’s the impact of, say, global development? • Human and social factors
  65. 65. Conclusion • Cycles and cliques correlate with defects. • The complexity of the dependency structure predicts the number of defects. • Defect predictions help to allocate resources for QA more effectively. Slides on Slideshare.net (search for ISSRE)
  66. 66. Contact Email: tz@acm.org nachin@microsoft.com Internet: www.softevo.org research.microsoft.com/esm
