Predicting Defects using Network Analysis on Dependency Graphs
Presented at ICSE 2008.


Presentation Transcript

  • Predicting Defects using Network Analysis on Dependency Graphs. Thomas Zimmermann, University of Calgary, Canada; Nachiappan Nagappan, Microsoft Research, USA
  • Bugs are everywhere
  • Quality assurance is limited... by time... and by money.
  • Spend resources on the components that need them most, i.e., those most likely to fail.
  • Meet Jacob • Your QA manager • Ten years knowledge of your project • Aware of its history and the hot spots
  • But then Jacob left...
  • Meet Emily • Your new QA manager (replaces Jacob) • Not much experience with your project yet • How can she allocate resources effectively?
  • Indicators of defects
    - Code complexity: Basili et al. 1996, Subramanyam and Krishnan 2003, Binkley and Schach 1998, Ohlsson and Alberg 1996, Nagappan et al. 2006
    - Code churn: Nagappan and Ball 2005
    - Historical data: Khoshgoftaar et al. 1996, Graves et al. 2000, Kim et al. 2007, Ostrand et al. 2005, Mockus et al. 2005
    - Code dependencies: Nagappan and Ball 2007, Schröter et al. 2006, Zimmermann and Nagappan 2007
  • Centrality
  • Hypothesis: network measures on dependency graphs
    - correlate with the number of post-release defects (H1)
    - can predict the number of post-release defects (H2)
    - can indicate critical “escrow” binaries (H3)
  • DATA
  • 2,252 binaries, 28.3 MLOC
  • Windows Server layout
  • Data collection: at the release point for Windows Server 2003, collect complexity metrics, dependencies, and network measures; then six months to collect post-release defects.
  • Dependencies
    - Directed relationship between two pieces of code (here: binaries)
    - MaX dependency analysis framework: caller-callee dependencies, imports and exports, RPC, COM, runtime dependencies (such as LoadLibrary), registry access, etc.
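To make the data representation concrete, here is a minimal sketch (not the authors' MaX tooling) of modeling binary-level dependencies as a directed graph with networkx; the binary names and edges are hypothetical.

```python
import networkx as nx

# Hypothetical binary-level dependency graph: an edge (a, b) means
# binary a depends on (e.g., calls into) binary b.
deps = [
    ("app.exe", "util.dll"),
    ("app.exe", "net.dll"),
    ("net.dll", "util.dll"),
    ("svc.exe", "net.dll"),
]

G = nx.DiGraph(deps)
print(G.number_of_nodes(), "binaries,", G.number_of_edges(), "dependencies")
```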
  • Centrality
    - Degree centrality: counts the number of dependencies
    - Closeness centrality: takes the distance to all other binaries into account
      - Closeness: How close are the other binaries?
      - Reach: How many binaries can be reached (weighted)?
      - Eigenvector: similar to PageRank
    - Betweenness centrality: counts the number of shortest paths through a binary
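As an illustration of how such measures can be computed, here is a minimal networkx sketch. It mirrors the centrality families named on the slide, not the paper's exact measure set; PageRank stands in for eigenvector centrality, which the slide describes as similar.

```python
import networkx as nx

def network_measures(G: nx.DiGraph) -> dict:
    """Centrality families from the slide, computed per binary."""
    return {
        "degree":      nx.degree_centrality(G),       # number of dependencies
        "closeness":   nx.closeness_centrality(G),    # distance to all other binaries
        "betweenness": nx.betweenness_centrality(G),  # shortest paths through a binary
        "pagerank":    nx.pagerank(G),                # stand-in for eigenvector centrality
    }
```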
  • Structural holes: two example graphs over binaries A, B, C. When B and C are connected to each other, there is no structural hole; when A is the only link between them, there is a structural hole between B and C.
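A structural hole exists between two of a binary's neighbors when they are not connected to each other. A small sketch of that check on the slide's hypothetical A/B/C example, using the undirected view:

```python
import itertools
import networkx as nx

def structural_holes(G: nx.Graph, ego) -> list:
    """Pairs of ego's neighbors that are not connected to each other,
    i.e., structural holes that ego spans."""
    neighbors = list(G.neighbors(ego))
    return [(b, c) for b, c in itertools.combinations(neighbors, 2)
            if not G.has_edge(b, c)]

G = nx.Graph([("A", "B"), ("A", "C")])
print(structural_holes(G, "A"))   # [('B', 'C')] -> hole between B and C
G.add_edge("B", "C")
print(structural_holes(G, "A"))   # [] -> no structural hole
```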
  • Ego networks: EGO, IN, OUT, INOUT (figure)
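networkx's ego_graph can produce the slide's ego-network variants. The function below is a sketch, assuming IN follows incoming dependencies, OUT outgoing ones, and INOUT both; EGO itself is the center binary.

```python
import networkx as nx

def ego_networks(G: nx.DiGraph, ego):
    """Ego-network variants for a binary `ego` (radius 1)."""
    out   = nx.ego_graph(G, ego)                   # OUT: binaries ego depends on
    inn   = nx.ego_graph(G.reverse(), ego)         # IN: binaries that depend on ego
    inout = nx.ego_graph(G, ego, undirected=True)  # INOUT: both directions
    return {"OUT": out, "IN": inn, "INOUT": inout}
```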
  • Complexity metrics

    Group                 Metrics                                  Aggregation
    --------------------- ---------------------------------------- -----------
    Module metrics        # functions in B
    (for a binary B)      # global variables in B
    Per-function metrics  # executable lines in f()                Total, Max
    (for a function f())  # parameters in f()
                          # functions calling f()
                          # functions called by f()
                          McCabe's cyclomatic complexity of f()
    OO metrics            # methods in C                           Total, Max
    (for a class C)       # subclasses of C
                          Depth of C in the inheritance tree
                          Coupling between classes
                          Cyclic coupling between classes
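The Total/Max aggregation in the table rolls each per-function (or per-class) metric up to the binary level; a trivial sketch with hypothetical values:

```python
def aggregate(per_function_values):
    """Each per-function metric contributes a Total and a Max column
    at the binary level, as in the table above."""
    vals = list(per_function_values)
    return {"Total": sum(vals), "Max": max(vals)}

# Hypothetical example: cyclomatic complexity of three functions in one binary.
print(aggregate([3, 7, 2]))  # {'Total': 12, 'Max': 7}
```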
  • RESULTS
  • 1 PATTERNS
  • Star pattern: examples with defects vs. no defects (figure)
  • Undirected cliques: the average number of defects is higher for binaries in large cliques.
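One plausible way to reproduce this analysis (a sketch, not the paper's script): enumerate maximal undirected cliques with networkx and average a hypothetical per-binary defect count by clique size.

```python
import networkx as nx

def defects_by_clique_size(G: nx.DiGraph, defects: dict) -> dict:
    """Average defect count of binaries, grouped by the size of the
    largest undirected clique each binary belongs to."""
    U = G.to_undirected()
    largest = {n: 1 for n in U}
    for clique in nx.find_cliques(U):  # maximal cliques
        for n in clique:
            largest[n] = max(largest[n], len(clique))
    by_size = {}
    for n, size in largest.items():
        by_size.setdefault(size, []).append(defects.get(n, 0))
    return {size: sum(d) / len(d) for size, d in by_size.items()}
```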
  • 2 PREDICTION
  • Prediction model: input metrics and measures → PCA → regression → prediction. Three input sets (Metrics, SNA, Metrics+SNA) and two tasks (classification and ranking).
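A minimal scikit-learn sketch of that pipeline; the choice of five retained principal components is an assumption, as the slide does not state the number.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline

# Classification: does a binary have a defect? (X = Metrics, SNA, or both)
classifier = make_pipeline(PCA(n_components=5), LogisticRegression())

# Ranking: how many defects does a binary have?
ranker = make_pipeline(PCA(n_components=5), LinearRegression())
```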
  • Classification: does a binary have a defect or not?
  • Ranking: which binaries have the most defects?
  • Random splits (4 × 50)
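A sketch of such an evaluation loop, assuming two-thirds training / one-third testing per split (the ratio is an assumption; the 4× would repeat this per input set and task):

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# 50 random splits, two-thirds training / one-third testing.
splitter = ShuffleSplit(n_splits=50, train_size=2 / 3, random_state=0)
X = np.zeros((2252, 10))  # placeholder feature matrix, one row per binary
for train_idx, test_idx in splitter.split(X):
    pass  # fit on X[train_idx], evaluate on X[test_idx]
```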
  • Classification (logistic regression): SNA increases the recall by 0.10 (at p=0.01) while precision remains comparable.
  • Ranking (linear regression): SNA+Metrics increases the correlation by 0.10 (significant at p=0.01).
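The slide reports a correlation for the ranking task; rank correlation is one plausible way to compute it (the numbers below are hypothetical):

```python
from scipy.stats import spearmanr

# Compare predicted vs. actual defect counts by rank correlation.
predicted = [4.2, 1.1, 0.3, 2.9]  # hypothetical model output
actual    = [5,   0,   1,   3]    # hypothetical post-release defects
rho, p = spearmanr(predicted, actual)
print(f"Spearman correlation: {rho:.2f} (p={p:.3f})")
```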
  • 3 ESCROW
  • Escrow binaries
    - List of critical binaries for Windows Server 2003
    - Development teams select binaries for escrow based on (past) experience
    - Special protocol for escrow binaries: involves more testing and code reviews
  • Predicting escrow binaries

    Network measures           Recall     Complexity metrics   Recall
    GlobalInClosenessFreeman   0.60       TotalParameters      0.30
    GlobalIndwReach            0.60       TotalComplexity      0.30
    EgoInSize                  0.55       TotalLines           0.30
    EgoInPairs                 0.55       TotalFanIn           0.30
    EgoInBroker                0.55       TotalFanOut          0.30
    EgoInTies                  0.50       ...                  ...
    GlobalInDegree             0.50
    GlobalBetweenness          0.50
    ...                        ...

    Network measures predict twice as many escrow binaries as complexity metrics do.
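Recall for a single measure can be computed by ranking binaries by that measure and checking how many known escrow binaries land in the top k; a sketch with hypothetical binary names:

```python
def recall_at_k(ranked_binaries, escrow, k):
    """Fraction of known escrow binaries found among the top-k binaries
    when ranked by one measure (e.g., GlobalInClosenessFreeman)."""
    top = set(ranked_binaries[:k])
    return len(top & set(escrow)) / len(escrow)

# Hypothetical: 100 ranked binaries, 4 known escrow binaries, top 20 checked.
ranked = ["b%02d" % i for i in range(100)]
escrow = {"b01", "b05", "b42", "b77"}
print(recall_at_k(ranked, escrow, k=20))  # 0.5
```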
  • CONCLUSION
    - Classification: recall for network measures is 0.10 higher than for complexity metrics; the precision remains comparable.
    - Ranking: combining network measures with complexity metrics increases the correlation by 0.10.
    - Escrow: complexity metrics fail to predict escrow binaries; network measures predict 60% of escrow binaries.