Predicting Defects using Network Analysis on Dependency Graphs

    1. Predicting Defects using Network Analysis on Dependency Graphs Thomas Zimmermann, University of Calgary, Canada Nachiappan Nagappan, Microsoft Research, USA
    2. Bugs are everywhere
    6. Quality assurance is limited... ...by time... ...and by money.
    7. Spend resources on the components that need them most, i.e., those most likely to fail.
    11. Meet Jacob • Your QA manager • Ten years of knowledge of your project • Aware of its history and the hot spots
    12. But then Jacob left...
    13. Meet Emily • Your new QA manager (replaces Jacob) • Not much experience with your project yet • How can she allocate resources effectively?
    18. Indicators of defects • Code complexity - Basili et al. 1996, Subramanyam and Krishnan 2003, Binkley and Schach 1998, Ohlsson and Alberg 1996, Nagappan et al. 2006 • Code churn - Nagappan and Ball 2005 • Historical data - Khoshgoftaar et al. 1996, Graves et al. 2000, Kim et al. 2007, Ostrand et al. 2005, Mockus et al. 2005 • Code dependencies - Nagappan and Ball 2007, Schröter et al. 2006, Zimmermann and Nagappan 2007
    19. Centrality
    20. Hypothesis: network measures on dependency graphs - correlate with the number of post-release defects (H1) - can predict the number of post-release defects (H2) - can indicate critical “escrow” binaries (H3)
    21. DATA
    22. 2252 Binaries 28.3 MLOC
    23. Windows Server layout
    29. Data collection [Timeline: complexity metrics, dependencies, and network measures collected at the release point for Windows Server 2003; defects collected over the following six months]
    30. Dependencies • Directed relationship between two pieces of code (here: binaries) • MaX dependency analysis framework - Caller-callee dependencies - Imports and exports - RPC, COM - Runtime dependencies (such as LoadLibrary) - Registry access - etc.
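The slide lists dependency kinds but not how they are merged into one graph. A minimal sketch, assuming each extracted dependency arrives as a hypothetical (source binary, target binary, kind) record, of collapsing those records into a single directed, weighted graph over binaries. The binary names and the record format are invented for illustration and are not the MaX framework's actual output.

```python
from collections import defaultdict

# Hypothetical dependency records: (source binary, target binary, kind).
# The kinds mirror the slide's categories; the record layout is an assumption.
RECORDS = [
    ("a.dll", "b.dll", "call"),
    ("a.dll", "b.dll", "import"),
    ("a.dll", "c.dll", "rpc"),
    ("b.dll", "c.dll", "loadlibrary"),
    ("c.dll", "c.dll", "call"),  # self-dependency, dropped below
]

def build_dependency_graph(records):
    """Collapse typed records into {source: {target: weight}}.

    The weight counts how many records link the two binaries;
    self-loops are ignored.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst, _kind in records:
        if src != dst:
            graph[src][dst] += 1
    # Convert to plain dicts so callers get ordinary mappings.
    return {src: dict(targets) for src, targets in graph.items()}

if __name__ == "__main__":
    print(build_dependency_graph(RECORDS))
```

Counting records as edge weights is one simple choice; a weighted measure such as the slide's "Reach (weighted)" could then read these weights.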
    31. Centrality • Degree centrality - counts the number of dependencies • Closeness centrality - takes distance to all other binaries into account - Closeness: How close are the other binaries? - Reach: How many binaries can be reached (weighted)? - Eigenvector: similar to PageRank • Betweenness centrality - counts the number of shortest paths through a binary
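To make these centrality families concrete, here is a small sketch on a hypothetical four-binary chain. It computes in- and out-degree on the directed graph and, as a simplifying assumption, closeness and betweenness on the undirected version of the graph (betweenness via Brandes' algorithm). Closeness is computed as reachable nodes divided by total distance, which equals Freeman closeness on connected graphs. Reach and eigenvector centrality are omitted, and the study's exact variants may differ.

```python
from collections import deque

# Hypothetical directed dependencies: A depends on B, B on C, C on D.
EDGES = [("A", "B"), ("B", "C"), ("C", "D")]

def degrees(edges):
    """Return (in-degree, out-degree) dicts for a directed edge list."""
    indeg, outdeg = {}, {}
    for u, v in edges:
        outdeg[u] = outdeg.get(u, 0) + 1
        indeg[v] = indeg.get(v, 0) + 1
        indeg.setdefault(u, 0)   # make sure every node appears in both dicts
        outdeg.setdefault(v, 0)
    return indeg, outdeg

def undirected(edges):
    """Adjacency sets of the undirected version of the graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def closeness(adj):
    """Reachable nodes / sum of BFS distances (Freeman closeness if connected)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(adj):
    """Brandes' algorithm: shortest paths passing through each node."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every path is found from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

if __name__ == "__main__":
    adj = undirected(EDGES)
    print(degrees(EDGES))
    print(closeness(adj))
    print(betweenness(adj))
```

On the chain, the two inner binaries B and C each lie on two shortest paths, while the endpoints lie on none.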
    33. Structural holes [Figure: two graphs over binaries A, B, and C, one labeled “No structural hole” and one labeled “Structural hole between B and C”]
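Burt's effective size is one common way to quantify structural holes. The sketch below assumes an undirected, unweighted graph stored as {node: set of neighbors}. It lists the unconnected pairs among an ego's neighbors and computes effective size with the simplified formula n - 2t/n (n neighbors, t ties among them). The two triads are hypothetical, and which structural-hole measures the study actually used is not stated on the slide.

```python
from itertools import combinations

# Hypothetical triads: in CLOSED every pair is tied; in OPEN, B and C are not.
CLOSED = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
OPEN = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}

def structural_holes(adj, ego):
    """Pairs of the ego's neighbors that have no direct tie to each other."""
    alters = sorted(adj[ego] - {ego})
    return [(a, b) for a, b in combinations(alters, 2) if b not in adj[a]]

def effective_size(adj, ego):
    """Burt's effective size via n - 2t/n (undirected, unweighted graphs)."""
    alters = sorted(adj[ego] - {ego})
    n = len(alters)
    if n == 0:
        return 0.0
    ties = sum(1 for a, b in combinations(alters, 2) if b in adj[a])
    return n - 2 * ties / n

if __name__ == "__main__":
    for name, graph in (("closed", CLOSED), ("open", OPEN)):
        print(name, structural_holes(graph, "A"), effective_size(graph, "A"))
```

In the open triad, A is the only link between B and C, so its two neighbors count as two non-redundant contacts (effective size 2.0); in the closed triad they are redundant (effective size 1.0).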
    37. Ego networks [Figure: the ego network of a binary (EGO), showing its IN, OUT, and INOUT neighborhoods]
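Measures such as EgoInSize, EgoInTies, EgoInPairs, and EgoInBroker (listed with the escrow results) are computed over these neighborhoods. A minimal sketch under assumed definitions, loosely modeled on common ego-network measures: size = number of alters, ties = directed edges among alters, pairs = ordered pairs of distinct alters, broker = unordered alter pairs with no tie in either direction. It also assumes an edge u -> v means "u depends on v"; the graph is hypothetical, and the study's exact definitions may differ.

```python
from itertools import combinations

# Hypothetical graph; an edge u -> v is read as "u depends on v" (assumption).
SUCC = {"A": {"E"}, "B": {"E"}, "C": {"E", "A"}, "D": set(), "E": {"D"}}

def predecessors(succ):
    """Invert a successor map into a predecessor map."""
    pred = {v: set() for v in succ}
    for u, targets in succ.items():
        for v in targets:
            pred.setdefault(v, set()).add(u)
    return pred

def ego_measures(succ, ego, kind="INOUT"):
    """Size, ties, pairs and broker count of one ego network (assumed definitions).

    kind: "IN" -> binaries with an edge into ego,
          "OUT" -> binaries ego has an edge to,
          "INOUT" -> the union of both.
    """
    incoming = predecessors(succ).get(ego, set())
    outgoing = succ.get(ego, set())
    neighborhoods = {"IN": incoming, "OUT": outgoing, "INOUT": incoming | outgoing}
    alters = neighborhoods[kind] - {ego}
    size = len(alters)
    # Directed ties among the alters (the ego's own edges are excluded).
    ties = sum(1 for a in alters for b in succ.get(a, set()) if b in alters and b != a)
    pairs = size * (size - 1)  # ordered pairs of distinct alters
    # Unordered alter pairs with no tie in either direction.
    broker = sum(1 for a, b in combinations(sorted(alters), 2)
                 if b not in succ.get(a, set()) and a not in succ.get(b, set()))
    return {"size": size, "ties": ties, "pairs": pairs, "broker": broker}

if __name__ == "__main__":
    for kind in ("IN", "OUT", "INOUT"):
        print(kind, ego_measures(SUCC, "E", kind))
```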
    38. Complexity metrics • Module metrics for a binary B - # functions in B - # global variables in B • Per-function metrics for a function f(), aggregated as Total and Max - # executable lines in f() - # parameters in f() - # functions calling f() - # functions called by f() - McCabe’s cyclomatic complexity of f() • OO metrics for a class C, aggregated as Total and Max - # methods in C - # subclasses of C - Depth of C in the inheritance tree - Coupling between classes - Cyclic coupling between classes
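Per-function and per-class metrics have to be lifted to the binary level before they can sit next to network measures. A minimal sketch of the Total (sum) and Max aggregation, producing names like TotalLines and TotalParameters that appear with the escrow results; the function names and metric values are hypothetical.

```python
# Hypothetical per-function metrics for the functions of one binary.
FUNCTIONS = {
    "f1": {"Lines": 120, "Parameters": 3, "Complexity": 7},
    "f2": {"Lines": 40, "Parameters": 1, "Complexity": 2},
    "f3": {"Lines": 300, "Parameters": 5, "Complexity": 15},
}

def aggregate(per_function):
    """Lift per-function metrics to the binary level as Total (sum) and Max."""
    result = {}
    metric_names = sorted({m for metrics in per_function.values() for m in metrics})
    for name in metric_names:
        values = [metrics[name] for metrics in per_function.values() if name in metrics]
        result["Total" + name] = sum(values)
        result["Max" + name] = max(values)
    return result

if __name__ == "__main__":
    print(aggregate(FUNCTIONS))
```

The same function works for OO metrics if the inner dicts describe classes instead of functions.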
    39. RESULTS
    40. 1 PATTERNS
    41. Star pattern [Figure: legend distinguishes binaries with defects from binaries with no defects]
    44. Undirected cliques Average number of defects is higher for binaries in large cliques.
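A sketch of how such a comparison could be computed, assuming an undirected dependency graph: enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting), then average defects over clique members grouped by clique size. The graph and defect counts are hypothetical, and a binary in several cliques is counted once per clique, which is an assumption rather than the study's stated procedure.

```python
# Hypothetical undirected dependency graph and post-release defect counts.
ADJ = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
DEFECTS = {"A": 5, "B": 4, "C": 3, "D": 1, "E": 0}

def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; returns a list of maximal cliques (sets)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        # Pick the pivot with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def mean_defects_by_clique_size(adj, defects):
    """Average defects per clique member, grouped by clique size."""
    totals = {}
    for clique in maximal_cliques(adj):
        s, c = totals.get(len(clique), (0, 0))
        totals[len(clique)] = (s + sum(defects[v] for v in clique), c + len(clique))
    return {size: s / c for size, (s, c) in sorted(totals.items())}

if __name__ == "__main__":
    print(mean_defects_by_clique_size(ADJ, DEFECTS))
```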
    45. 2 PREDICTION
    48. Prediction model [Diagram: input metrics and measures (Metrics, SNA, Metrics+SNA) → PCA → Regression → Prediction (Classification, Ranking)]
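The diagram feeds metrics and network measures through PCA and then regression. The sketch below is a generic principal component regression in plain Python: standardize the inputs, take the top k principal components by power iteration with deflation, then regress the target on the component scores. The power-iteration solver, the in-sample predictions, the choice of k, and the toy data are all assumptions for illustration; a real analysis would use a numerical library and evaluate on held-out data.

```python
import math
import random

def standardize(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def covariance(z):
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def principal_components(cov, k, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        comps.append(v)
        # Remove the found component so the next iteration finds the next one.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps

def fit_pcr(X, y, k):
    """Regress y on the first k principal components of standardized X.

    PC scores are centered and mutually uncorrelated, so each coefficient
    is a one-variable least-squares slope. Returns in-sample predictions.
    """
    z = standardize(X)
    comps = principal_components(covariance(z), k)
    scores = [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in z]
    y_mean = sum(y) / len(y)
    coefs = []
    for j in range(k):
        s = [row[j] for row in scores]
        coefs.append(sum(a * (b - y_mean) for a, b in zip(s, y)) / sum(a * a for a in s))
    return [y_mean + sum(c * s for c, s in zip(coefs, row)) for row in scores]

if __name__ == "__main__":
    X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]  # hypothetical metrics
    y = [2, 1, 4, 5, 7, 6]                                  # hypothetical defect counts
    print(fit_pcr(X, y, k=1))
```

Keeping fewer components than inputs is what makes PCA useful here: correlated metrics collapse into a few uncorrelated predictors.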
    49. Classification: Does a binary have a defect or not?
    50. Ranking: Which binaries have the most defects?
    52. Random splits 4×50×
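The slide gives only "4×50×" for its evaluation setup. A minimal sketch of repeated random train/test splits, assuming a 2/3 training and 1/3 testing ratio, a fixed seed, and 50 repetitions; all three are assumptions for illustration rather than the study's documented parameters.

```python
import random

def random_splits(n_items, repetitions, train_fraction=2 / 3, seed=0):
    """Yield (train, test) index lists for repeated random splits."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    cut = round(n_items * train_fraction)
    for _ in range(repetitions):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[:cut]), sorted(shuffled[cut:])

if __name__ == "__main__":
    for train, test in random_splits(9, 3):
        print(train, test)
```

Repeating the split and averaging results reduces the chance that one lucky partition drives the reported differences.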
    54. Classification (logistic regression) SNA increases the recall by 0.10 (significant at p=0.01) while precision remains comparable.
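Recall is the share of defective binaries that the model flags; precision is the share of flagged binaries that actually have defects. A minimal sketch of a logistic regression classifier trained by plain batch gradient descent, plus the two metrics. The learning rate, epoch count, 0.5 threshold, and data are assumptions; a real study would use a statistics package rather than this hand-rolled trainer.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, X, threshold=0.5):
    """1 = predicted to have a defect, 0 = predicted defect-free."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
            for xi in X]

def precision_recall(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

if __name__ == "__main__":
    X = [[-3.0], [-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0], [3.0]]  # hypothetical scores
    y = [0, 0, 0, 0, 1, 1, 1, 1]                                      # hypothetical labels
    w, b = train_logistic(X, y)
    print(precision_recall(y, predict(w, b, X)))
```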
    56. Ranking (linear regression) SNA+METRICS increases the correlation by 0.10 (significant at p=0.01)
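The ranking result is stated as a change in correlation between predicted and actual defect counts; the slide does not say which coefficient, so this sketch shows both Pearson and Spearman (Pearson on average ranks, so ties are handled). The predicted and actual counts are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values get the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

if __name__ == "__main__":
    predicted = [0.2, 3.1, 1.4, 7.8, 0.0]  # hypothetical predicted defect counts
    actual = [0, 2, 2, 9, 1]               # hypothetical actual defect counts
    print(pearson(predicted, actual), spearman(predicted, actual))
```

Spearman only rewards getting the order right, which matches the goal of ranking binaries for QA effort.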
    57. 3 ESCROW
    58. Escrow binaries • Escrow binaries - list of critical binaries for Windows Server 2003 - development teams select binaries for escrow based on (past) experience • Special protocol for escrow binaries - involves more testing, code reviews
    60. Predicting escrow binaries • Network measures (recall) - GlobalInClosenessFreeman 0.60 - GlobalIndwReach 0.60 - EgoInSize 0.55 - EgoInPairs 0.55 - EgoInBroker 0.55 - EgoInTies 0.50 - GlobalInDegree 0.50 - GlobalBetweenness 0.50 - ... • Complexity metrics (recall) - TotalParameters 0.30 - TotalComplexity 0.30 - TotalLines 0.30 - TotalFanIn 0.30 - TotalFanOut 0.30 - ... • Network measures predict twice as many escrow binaries as complexity metrics do.
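One way to obtain a recall figure like those in the table is to rank binaries by a single measure and check how many escrow binaries land near the top. A minimal sketch, assuming a top-k cut-off that defaults to the number of escrow binaries; the cut-off rule, the scores, and the escrow list are all hypothetical.

```python
def recall_at_k(scores, relevant, k=None):
    """Share of relevant binaries found among the k highest-scoring binaries.

    k defaults to the number of relevant binaries (an assumed cut-off).
    """
    if k is None:
        k = len(relevant)
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

# Hypothetical scores from one network measure and a hypothetical escrow list.
SCORES = {"a.dll": 0.9, "b.dll": 0.7, "c.dll": 0.4, "d.dll": 0.2}
ESCROW = {"a.dll", "c.dll"}

if __name__ == "__main__":
    print(recall_at_k(SCORES, ESCROW))
```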
    61. CONCLUSION • Classification - Recall for network measures is 0.10 higher than for complexity metrics. - The precision remains comparable. • Ranking - Combining network measures with complexity metrics increases the correlation by 0.10. • Escrow - Complexity metrics fail to predict escrow binaries. - Network measures predict 60% of escrow binaries.

    Presented at ICSE 2008.

    © All Rights Reserved
