Predicting Defects using Network Analysis on Dependency Graphs
1. Predicting Defects
using Network Analysis
on Dependency Graphs
Thomas Zimmermann, University of Calgary, Canada
Nachiappan Nagappan, Microsoft Research, USA
13. Meet Emily
• Your new QA manager
(replaces Jacob)
• Not much experience
with your project yet
• How can she allocate
resources effectively?
18. Indicators of defects
• Code complexity
- Basili et al. 1996, Subramanyam and Krishnan 2003, Binkley and Schach 1998, Ohlsson and Alberg 1996, Nagappan et al. 2006
• Code churn
- Nagappan and Ball 2005
• Historical data
- Khoshgoftaar et al. 1996, Graves et al. 2000, Kim et al. 2007, Ostrand et al. 2005, Mockus et al. 2005
• Code dependencies
- Nagappan and Ball 2007, Schröter et al. 2006
- Zimmermann and Nagappan 2007
20. Hypothesis
Network measures on dependency graphs
- correlate with the number of post-release defects (H1)
- can predict the number of post-release defects (H2)
- can indicate critical “escrow” binaries (H3)
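One way to check a claim like H1 is a rank correlation between a network measure and the defect count per binary. The sketch below uses Spearman's coefficient, which is an assumption here because the slides do not name a coefficient. The data values and all function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical values, one entry per binary (not data from the study).
degree = [1, 3, 2, 5, 4]
defects = [0, 4, 1, 9, 6]
rho = spearman(degree, defects)
print(f"Spearman correlation: {rho:.2f}")
```

A coefficient near +1 or -1 would suggest a strong monotonic relationship; values near 0 would suggest none.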
29. Data collection
• Release point for Windows Server 2003
- Complexity Metrics
- Dependencies
- Network Measures
• Six months to collect defects
- Defects
30. Dependencies
• Directed relationship between two pieces
of code (here: binaries)
• MaX dependency analysis framework
- Caller-callee dependencies
- Imports and exports
- RPC, COM
- Runtime dependencies (such as LoadLibrary)
- Registry access
- etc.
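A dependency graph of this kind can be sketched as a set of directed edges between binaries. The binary names and edges below are hypothetical; they are not output of the MaX framework, and `build_graph` is an illustrative helper rather than part of any tool.

```python
# Hypothetical directed dependencies: (source, target) means
# "source depends on target". Names are illustrative only.
DEPENDENCIES = [
    ("app.exe", "ui.dll"),
    ("app.exe", "net.dll"),
    ("ui.dll", "core.dll"),
    ("net.dll", "core.dll"),
]


def build_graph(edges):
    """Return (outgoing, incoming) adjacency maps for a directed graph."""
    outgoing, incoming = {}, {}
    for source, target in edges:
        outgoing.setdefault(source, set()).add(target)
        incoming.setdefault(target, set()).add(source)
        # Make sure every binary appears as a key in both maps.
        outgoing.setdefault(target, set())
        incoming.setdefault(source, set())
    return outgoing, incoming


outgoing, incoming = build_graph(DEPENDENCIES)
for binary in sorted(outgoing):
    print(
        f"{binary}: depends on {len(outgoing[binary])}, "
        f"used by {len(incoming[binary])}"
    )
```

Keeping both directions makes it cheap to ask which binaries a binary depends on and which binaries depend on it.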
31. Centrality
• Degree centrality
- counts the number of dependencies
• Closeness centrality
- takes distance to all other binaries into account
- Closeness: How close are the other binaries?
- Reach: How many binaries can be reached (weighted)?
- Eigenvector: similar to PageRank
• Betweenness centrality
- counts the number of shortest paths through a binary
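The three centrality families above can be illustrated on a tiny graph. This is a minimal sketch under stated assumptions: the graph is treated as undirected and unweighted, closeness is taken as (reachable binaries) / (sum of distances), and betweenness uses Brandes' algorithm. The slides do not specify these details, and the binary names are hypothetical.

```python
from collections import deque

# Hypothetical undirected dependency edges between binaries.
EDGES = [
    ("hub.dll", "a.dll"),
    ("hub.dll", "b.dll"),
    ("hub.dll", "c.dll"),
    ("c.dll", "d.dll"),
]


def neighbors_of(edges):
    """Build an undirected adjacency map from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_distances(graph, start):
    """Shortest-path distances (in hops) from start to reachable nodes."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def degree_centrality(graph):
    """Number of direct dependencies per binary."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def closeness_centrality(graph):
    """Reachable binaries divided by the sum of distances to them."""
    result = {}
    for node in graph:
        dist = bfs_distances(graph, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm: shortest paths passing through each binary."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        order, dist = [], {source: 0}
        preds = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        paths[source] = 1
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    paths[nxt] += paths[node]
                    preds[nxt].append(node)
        delta = {node: 0.0 for node in graph}
        for node in reversed(order):
            for p in preds[node]:
                delta[p] += paths[p] / paths[node] * (1 + delta[node])
            if node != source:
                score[node] += delta[node]
    # Each unordered pair was counted from both endpoints.
    return {node: s / 2 for node, s in score.items()}


graph = neighbors_of(EDGES)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
for node in sorted(graph):
    print(node, degree[node], round(closeness[node], 2), betweenness[node])
```

In this toy graph `hub.dll` scores highest on all three measures, which matches the intuition that a binary many others route through is a central one.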
38. Complexity metrics
• Module metrics for a binary B
- # functions in B
- # global variables in B
• Per-function metrics for a function f() (aggregation: Total, Max)
- # executable lines in f()
- # parameters in f()
- # functions calling f()
- # functions called by f()
- McCabe’s cyclomatic complexity of f()
• OO metrics for a class C (aggregation: Total, Max)
- # methods in C
- # subclasses of C
- Depth of C in the inheritance tree
- Coupling between classes
- Cyclic coupling between classes
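The Total and Max aggregation can be sketched as rolling per-function values up to one row per binary. The metric values and the `aggregate` helper below are hypothetical and only illustrate the idea.

```python
# Hypothetical per-function metrics for one binary; values are made up.
FUNCTIONS = [
    {"name": "parse", "lines": 40, "parameters": 2, "cyclomatic": 7},
    {"name": "render", "lines": 25, "parameters": 4, "cyclomatic": 3},
    {"name": "save", "lines": 10, "parameters": 1, "cyclomatic": 2},
]


def aggregate(functions, metric_names):
    """Roll per-function metrics up to the binary as Total and Max."""
    summary = {}
    for metric in metric_names:
        values = [function[metric] for function in functions]
        summary["total_" + metric] = sum(values)
        summary["max_" + metric] = max(values) if values else 0
    return summary


summary = aggregate(FUNCTIONS, ["lines", "parameters", "cyclomatic"])
for key in sorted(summary):
    print(key, summary[key])
```

The same roll-up would apply to the per-class OO metrics, with classes in place of functions.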
58. Escrow binaries
• Escrow binaries
- list of critical binaries for Windows Server 2003
- development teams select binaries for escrow based on (past) experience
• Special protocol for escrow binaries
- involves more testing, code reviews