
Predicting Defects using Network Analysis on Dependency Graphs

Presented at ICSE 2008.

Published in: Economy & Finance, Technology

Transcript of "Predicting Defects using Network Analysis on Dependency Graphs"

  1. Predicting Defects using Network Analysis on Dependency Graphs. Thomas Zimmermann, University of Calgary, Canada; Nachiappan Nagappan, Microsoft Research, USA
  2. Bugs are everywhere
  3. Bugs are everywhere
  4. Bugs are everywhere
  5. Quality assurance is limited... by time...
  6. Quality assurance is limited... by time... and by money.
  7. Spend resources on the components that need it most, i.e., those most likely to fail.
  8. Meet Jacob
  9. Meet Jacob • Your QA manager
  10. Meet Jacob • Your QA manager • Ten years' knowledge of your project
  11. Meet Jacob • Your QA manager • Ten years' knowledge of your project • Aware of its history and the hot spots
  12. But then Jacob left...
  13. Meet Emily • Your new QA manager (replaces Jacob) • Not much experience with your project yet • How can she allocate resources effectively?
  14. Meet Emily • Your new QA manager (replaces Jacob) • Not much experience with your project yet • How can she allocate resources effectively?
  15. Indicators of defects • Code complexity: Basili et al. 1996, Subramanyam and Krishnan 2003, Binkley and Schach 1998, Ohlsson and Alberg 1996, Nagappan et al. 2006
  16. Indicators of defects • Code complexity: Basili et al. 1996, Subramanyam and Krishnan 2003, Binkley and Schach 1998, Ohlsson and Alberg 1996, Nagappan et al. 2006 • Code churn: Nagappan and Ball 2005
  17. Indicators of defects • Code complexity: Basili et al. 1996, Subramanyam and Krishnan 2003, Binkley and Schach 1998, Ohlsson and Alberg 1996, Nagappan et al. 2006 • Code churn: Nagappan and Ball 2005 • Historical data: Khoshgoftaar et al. 1996, Graves et al. 2000, Kim et al. 2007, Ostrand et al. 2005, Mockus et al. 2005
  18. Indicators of defects • Code complexity: Basili et al. 1996, Subramanyam and Krishnan 2003, Binkley and Schach 1998, Ohlsson and Alberg 1996, Nagappan et al. 2006 • Code churn: Nagappan and Ball 2005 • Historical data: Khoshgoftaar et al. 1996, Graves et al. 2000, Kim et al. 2007, Ostrand et al. 2005, Mockus et al. 2005 • Code dependencies: Nagappan and Ball 2007, Schröter et al. 2006, Zimmermann and Nagappan 2007
  19. Centrality
  20. Hypothesis: Network measures on dependency graphs - correlate with the number of post-release defects (H1) - can predict the number of post-release defects (H2) - can indicate critical “escrow” binaries (H3)
  21. DATA
  22. 2252 binaries, 28.3 MLOC
  23. Windows Server layout
  24. Windows Server layout
  25. Windows Server layout
  26. Windows Server layout
  27. Data collection • Release point for Windows Server 2003
  28. Data collection • Release point for Windows Server 2003 • Complexity metrics, dependencies, network measures
  29. Data collection • Release point for Windows Server 2003 • Six months to collect defects • Complexity metrics, dependencies, network measures, defects
  30. Dependencies • Directed relationship between two pieces of code (here: binaries) • MaX dependency analysis framework - Caller-callee dependencies - Imports and exports - RPC, COM - Runtime dependencies (such as LoadLibrary) - Registry access - etc.
  31. Centrality • Degree centrality - counts the number of dependencies • Closeness centrality - takes distance to all other binaries into account - Closeness: How close are the other binaries? - Reach: How many binaries can be reached (weighted)? - Eigenvector: similar to PageRank • Betweenness centrality - counts the number of shortest paths through a binary
  32. Structural holes (diagram: nodes A, B, C) • No structural hole
  33. Structural holes (two diagrams, each with nodes A, B, C) • No structural hole • No structural hole between B and C
  34. Ego networks (diagram label: EGO)
  35. Ego networks (diagram labels: EGO, INOUT)
  36. Ego networks (diagram labels: EGO, IN, INOUT)
  37. Ego networks (diagram labels: EGO, IN, OUT, INOUT)
  38. Complexity metrics (columns: Group, Metrics, Aggregation) • Module metrics, for a binary B: # functions in B, # global variables in B • Per-function metrics, for a function f(): # executable lines in f(), # parameters in f(), # functions calling f(), # functions called by f(), McCabe’s cyclomatic complexity of f(); aggregation: Total, Max • OO metrics, for a class C: # methods in C, # subclasses of C, depth of C in the inheritance tree, coupling between classes, cyclic coupling between classes; aggregation: Total, Max
  39. RESULTS
  40. 1 PATTERNS
  41. Star pattern (diagram legend: With defects, No defects)
  42. Undirected cliques ... ...
  43. Undirected cliques
  44. Undirected cliques • Average number of defects is higher for binaries in large cliques.
  45. 2 PREDICTION
  46. Prediction • Input metrics and measures → Model (PCA, Regression) → Prediction
  47. Prediction • Input metrics and measures (Metrics, SNA, Metrics+SNA) → Model (PCA, Regression) → Prediction
  48. Prediction • Input metrics and measures (Metrics, SNA, Metrics+SNA) → Model (PCA, Regression) → Prediction (Classification, Ranking)
  49. Classification: Does a binary have a defect or not?
  50. Ranking: Which binaries have the most defects?
  51. Random splits
  52. Random splits 4×50×
  53. Classification (logistic regression)
  54. Classification (logistic regression): SNA increases the recall by 0.10 (at p=0.01) while precision remains comparable.
  55. Ranking (linear regression)
  56. Ranking (linear regression): SNA+METRICS increases the correlation by 0.10 (significant at p=0.01)
  57. 3 ESCROW
  58. Escrow binaries • Escrow binaries - list of critical binaries for Windows Server 2003 - development teams select binaries for escrow based on (past) experience • Special protocol for escrow binaries - involves more testing, code reviews
  59. Predicting escrow binaries • Network measures (recall): GlobalInClosenessFreeman 0.60, GlobalIndwReach 0.60, EgoInSize 0.55, EgoInPairs 0.55, EgoInBroker 0.55, EgoInTies 0.50, GlobalInDegree 0.50, GlobalBetweenness 0.50, ... • Complexity metrics (recall): TotalParameters 0.30, TotalComplexity 0.30, TotalLines 0.30, TotalFanIn 0.30, TotalFanOut 0.30, ...
  60. Predicting escrow binaries • Network measures (recall): GlobalInClosenessFreeman 0.60, GlobalIndwReach 0.60, EgoInSize 0.55, EgoInPairs 0.55, EgoInBroker 0.55, EgoInTies 0.50, GlobalInDegree 0.50, GlobalBetweenness 0.50, ... • Complexity metrics (recall): TotalParameters 0.30, TotalComplexity 0.30, TotalLines 0.30, TotalFanIn 0.30, TotalFanOut 0.30, ... • Network measures predict twice as many escrow binaries as complexity metrics do.
  61. CONCLUSION • Classification - Recall for network measures is 0.10 higher than for complexity metrics. - The precision remains comparable. • Ranking - Combining network measures with complexity metrics increases the correlation by 0.10. • Escrow - Complexity metrics fail to predict escrow binaries. - Network measures predict 60% of escrow binaries.
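
Slide 31 lists degree and closeness centrality, and slides 34 to 37 label the IN, OUT, and INOUT parts of an ego network. The following is a minimal sketch of how such measures could be computed on a directed dependency graph; it is not the authors' tooling. The graph, the binary names, and the function names are hypothetical. The closeness formula is one simple variant (reachable binaries divided by the sum of their distances), and reading IN, OUT, and INOUT as sets of direct neighbors is an assumption.

```python
from collections import deque

# Hypothetical dependency graph: an edge A -> B means binary A depends on B.
DEPENDENCIES = {
    "app.exe": ["ui.dll", "net.dll"],
    "ui.dll": ["core.dll"],
    "net.dll": ["core.dll"],
    "core.dll": [],
}


def in_degree(graph):
    """Count, for each binary, how many binaries depend on it."""
    counts = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


def out_degree(graph):
    """Count, for each binary, how many binaries it depends on."""
    return {node: len(targets) for node, targets in graph.items()}


def distances_from(graph, source):
    """Breadth-first search: shortest dependency-path length from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for target in graph[node]:
            if target not in dist:
                dist[target] = dist[node] + 1
                queue.append(target)
    return dist


def out_closeness(graph, source):
    """Assumed variant: reachable binaries / sum of distances (0.0 if none)."""
    dist = distances_from(graph, source)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def ego_neighbors(graph, ego):
    """Assumed reading of the IN, OUT, and INOUT labels as neighbor sets."""
    out_nodes = set(graph[ego])
    in_nodes = {node for node, targets in graph.items() if ego in targets}
    return {"IN": in_nodes, "OUT": out_nodes, "INOUT": in_nodes | out_nodes}
```

In this toy graph, `core.dll` has the highest in-degree because both `ui.dll` and `net.dll` depend on it; whether such a binary is actually defect-prone is what the hypotheses on slide 20 set out to test.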

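The classification and escrow results in the transcript are reported as precision and recall. As a reminder of what those numbers measure, here is a small sketch using the standard definitions; the binary names and the example sets are hypothetical and are not data from the study.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for a set of binaries flagged as defective.

    precision = flagged binaries that really are defective / all flagged
    recall    = flagged binaries that really are defective / all defective
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: four binaries actually had defects, three were flagged.
actual_defective = {"a.dll", "b.dll", "c.dll", "d.dll"}
predicted_defective = {"a.dll", "b.dll", "e.dll"}

precision, recall = precision_recall(predicted_defective, actual_defective)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A recall gain with comparable precision, as reported on slide 54, means more of the truly defective binaries are flagged without a larger share of false alarms.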