Predicting Subsystem Defects using Dependency Graph Complexities

Thomas Zimmermann, Researcher at Microsoft Research
Predicting Subsystem Failures
  using Dependency Graph Complexities




  Thomas Zimmermann, University of Calgary, Canada
     Nachiappan Nagappan, Microsoft Research, USA
Bugs are everywhere
Quality assurance is limited...

   ...by time...   ...and by money.
Resource allocation
            Spend resources on the
        components that need them most,
          i.e., those most likely to fail.
Meet Jacob

• Your QA manager
• Ten years of knowledge
  of your project
• Aware of its history
  and the hot spots
• Likes extreme sports
Meet Emily

  • Your new QA manager
    (replaces Jacob)
  • Not much experience
    with your project yet
  • How can she allocate
    resources effectively?
Indicators of failures
• Code complexity
  ◦ Basili et al. 1996, Subramanyam and Krishnan 2003,
    Binkley and Schach 1998, Ohlsson and Alberg 1996,
    Nagappan et al. 2006
• Code churn
  ◦ Nagappan and Ball 2005
• Historical data
  ◦ Khoshgoftaar et al. 1996, Graves et al. 2000, Kim et al. 2007,
    Ostrand et al. 2005, Mockus et al. 2005
• Code dependencies
  ◦ Nagappan and Ball 2007
Windows Server 2003

• 2254 binaries
• 28.4 MLOC
What are dependencies?
• Dependency = (directed) relationship between two pieces of code
• MaX dependency analysis framework
  ◦ Caller-callee dependencies
  ◦ Imports and exports
  ◦ RPC
  ◦ COM
  ◦ Runtime dependencies (such as LoadLibrary)
  ◦ Registry access
  ◦ etc.
Windows Server layout
Complexity of subsystems
  Subsystem A           Subsystem B

  Which subsystem has more defects?
  Our hypothesis: the more complex one.
Observation #1: Cycles
Dependency cycles:

No dependency cycle:

  Binaries that are part of a dependency cycle
  have on average twice as many defects.
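
One way to find the binaries that are part of a dependency cycle is to compute the strongly connected components of the directed dependency graph. The sketch below is only an illustration, not the analysis used in the study: the function name `binaries_in_cycles` and the edge-list input format are assumptions, and Kosaraju's algorithm is used simply because it is short.

```python
from collections import defaultdict


def binaries_in_cycles(edges):
    """Return the set of nodes that lie on at least one directed cycle.

    edges: iterable of (source, target) dependency pairs (assumed format).
    """
    graph = defaultdict(set)
    reverse = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        graph[src].add(dst)
        reverse[dst].add(src)
        nodes.update((src, dst))

    # Pass 1 (Kosaraju): record nodes in order of DFS completion.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(sorted(graph[start])))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(sorted(graph[nxt]))))
                    break
            else:
                # All successors handled: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse completion order;
    # each walk collects one strongly connected component.
    in_cycle, assigned = set(), set()
    for root in reversed(order):
        if root in assigned:
            continue
        component, todo = set(), [root]
        assigned.add(root)
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        # A component is cyclic if it has several nodes or a self-loop.
        if len(component) > 1 or root in graph[root]:
            in_cycle |= component
    return in_cycle
```

With such a set in hand, the observation on the slide amounts to comparing the average defect count of binaries inside the set with that of binaries outside it.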
Observation #2: Cliques




       Average number of defects is
    higher for binaries in large cliques.
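
A rough sketch of how clique membership could be computed, assuming that dependencies are treated as undirected for this purpose (the slide does not say) and that a clique is a set of binaries in which every pair is connected. The helper names `maximal_cliques` and `largest_clique_size` are hypothetical; the enumeration is the basic Bron-Kerbosch algorithm without pivoting, which is adequate only for small graphs.

```python
from collections import defaultdict


def maximal_cliques(edges):
    """Enumerate maximal cliques of the undirected view of a dependency graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:               # ignore self-dependencies
            neighbors[a].add(b)
            neighbors[b].add(a)  # drop direction (assumption)

    cliques = []

    def expand(current, candidates, excluded):
        # Bron-Kerbosch without pivoting.
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & neighbors[node],
                   excluded & neighbors[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(neighbors), set())
    return cliques


def largest_clique_size(edges):
    """Map each binary to the size of the largest clique it belongs to."""
    size = defaultdict(int)
    for clique in maximal_cliques(edges):
        for node in clique:
            size[node] = max(size[node], len(clique))
    return dict(size)
```

Grouping binaries by the value returned from `largest_clique_size` and averaging their defect counts per group would be one way to check the observation on a given data set.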
Data collection

  [Figure: data collection; surviving label: Defects]
Dependency graphs

  What is the dependency graph of a subsystem?

• INTRA = Internal dependencies
• OUT = Outgoing dependencies
• DEP = “Neighborhood” = INTRA + OUT + more
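
The INTRA and OUT graphs can be sketched as a simple partition of the edges that start inside a subsystem. This is an illustration under assumptions: the function name `subsystem_graphs`, the edge-list format, and the `subsystem_of` mapping from binary to subsystem are all hypothetical. DEP is left out on purpose, because the slide only describes it as INTRA + OUT + more.

```python
def subsystem_graphs(edges, subsystem_of, subsystem):
    """Split dependency edges into INTRA and OUT lists for one subsystem.

    edges:        iterable of (source, target) binary pairs (assumed format)
    subsystem_of: dict mapping each binary to its subsystem (assumed format)
    subsystem:    the subsystem to analyse
    """
    intra, out = [], []
    for src, dst in edges:
        if subsystem_of.get(src) != subsystem:
            continue  # only edges that start inside the subsystem
        if subsystem_of.get(dst) == subsystem:
            intra.append((src, dst))   # both ends inside: internal dependency
        else:
            out.append((src, dst))     # target outside: outgoing dependency
    return intra, out
```

Edges that enter the subsystem from outside are ignored in this sketch.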
Complexity measures
• #Nodes |V|
• #Edges |E|
• Complexity |E| - |V| + |P|
• Density |E| / |V|²
• Degree
• Multiplicity
• Eccentricity
• Radius
• Diameter
Spearman correlations

  [Figure: Spearman correlations; surviving labels: Complexity Measures, Dependency Graphs]
Predicting failures

Measures:
NODES
EDGES
COMPLEXITY
DENSITY
DEGREE_MIN
DEGREE_MAX
DEGREE_AVG
ECCENTRICITY_MIN
ECCENTRICITY_MAX
ECCENTRICITY_AVG
MULTI_EDGES
MULTI_COMPLEXITY
MULTI_DENSITY
MULTI_DEGREE_MIN
MULTI_DEGREE_MAX
MULTI_DEGREE_AVG
MULTI_MULTIPLICITY_MIN
MULTI_MULTIPLICITY_MAX
MULTI_MULTIPLICITY_AVG
MULTI_ECCENTRICITY_MIN
MULTI_ECCENTRICITY_MAX
MULTI_ECCENTRICITY_AVG

Dependency graphs: INTRA, OUT, DEP, COMBINED
Ranking
 Rank   Subsystem     Actual Rank
    1       K              3
    2       L             95
    3       C              6
    4       G              2
    5       F              8
    6       A              3
    7       Y             12
    8       O              1
    9       B             18
   10       M             35
  ...   (many more)
                         Spearman correlation
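
Comparing a predicted ranking with the actual ranking by Spearman correlation can be sketched in plain Python as the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The function names `average_ranks` and `spearman` are illustrative and not taken from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

A value close to 1 means the predicted order of subsystems closely follows the actual order; a value close to 0 means the prediction carries little ranking information.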
Random splits

  4×50×
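
A minimal sketch of repeated random splitting, under stated assumptions: reading the slide's 50× as 50 repetitions is an interpretation, and the two-thirds training fraction is a placeholder default that the slide does not state. The function name `random_splits` and its parameters are hypothetical.

```python
import random


def random_splits(items, repetitions=50, train_fraction=2 / 3, seed=0):
    """Yield (training, testing) lists for repeated random splits.

    repetitions and train_fraction are assumed defaults, not values
    confirmed by the slides.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    cut = round(len(items) * train_fraction)
    for _ in range(repetitions):
        shuffled = list(items)
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]
```

Each pair would be used to fit a model on the training part and to evaluate its ranking on the testing part, so that no subsystem is used for both within one repetition.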
Linear regression

    A higher predicted rank corresponds
         to a higher observed rank
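
To make the idea concrete, here is a deliberately simplified sketch: ordinary least squares with a single predictor, followed by ranking subsystems by their predicted defect count. Using one predictor instead of the full set of complexity measures is a simplification for illustration, and the names `fit_line` and `rank_by_prediction` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rank_by_prediction(model, features):
    """Order subsystem names from highest to lowest predicted defect count.

    model:    (intercept, slope) as returned by fit_line
    features: dict mapping subsystem name to its predictor value
    """
    intercept, slope = model
    predicted = {name: intercept + slope * x for name, x in features.items()}
    return sorted(predicted, key=predicted.get, reverse=True)
```

The resulting order is what a ranking table would list in its predicted-rank column, to be compared against the ranks derived from observed defects.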
Impact of granularity


      The predictions are more reliable
         for coarse granularities…


     …at the cost of locality and stability.
Future work

• Assemble the pieces of the puzzle
• Evolution of dependencies
  Are churned dependencies better predictors?
• Development process
  What’s the impact of, say, global development?
• Human and social factors
Conclusion

• Cycles and cliques correlate with defects.
• The complexity of the dependency
  structure predicts the number of defects.
• Defect predictions help to allocate
  resources for QA more effectively.
   Slides on Slideshare.net (search for ISSRE)
Contact

          Email:
       tz@acm.org
  nachin@microsoft.com

         Internet:
     www.softevo.org
research.microsoft.com/esm