Changes and Bugs: Mining and Predicting Development Activities

Thomas Zimmermann
Researcher at Microsoft Research
MINING SOFTWARE ARCHIVES




 Changes and Bugs
Mining and Predicting Development Activities



               Thomas Zimmermann
             Saarbrücken, May 26, 2008
Software development



         Build
Collaboration

Comm. Archive, Version Archive, Bug Database

Mining Software Archives
eROSE: Guiding developers

Purchase History: Customers who bought this item also bought...
Version Archive:  Developers who changed this function also changed...
eROSE suggests further locations.
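
The "developers who changed this function also changed..." idea can be sketched as a co-change count over past transactions. This is a minimal illustration, not eROSE's actual implementation: it assumes each transaction is simply a set of changed entity names, and the class name, method names, and sample entities are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CoChangeSketch {

    /** Counts, for every other entity, how often it was changed together with `changed`. */
    static Map<String, Integer> coChangeCounts(List<Set<String>> transactions, String changed) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> tx : transactions) {
            if (!tx.contains(changed)) continue;          // only transactions touching `changed`
            for (String other : tx) {
                if (!other.equals(changed)) counts.merge(other, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns co-changed entities, most frequent first (ties broken by name). */
    static List<String> suggest(List<Set<String>> transactions, String changed) {
        Map<String, Integer> counts = coChangeCounts(transactions, changed);
        List<String> result = new ArrayList<>(counts.keySet());
        result.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical version history: each set is one commit's changed functions.
        List<Set<String>> history = List.of(
            Set.of("Editor.open()", "Editor.close()"),
            Set.of("Editor.open()", "Editor.close()", "Menu.build()"),
            Set.of("Menu.build()", "Toolbar.init()"));
        System.out.println(suggest(history, "Editor.open()"));
    }
}
```

A real recommender would typically also filter suggestions by thresholds such as minimum support; that is left out here to keep the sketch short.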
THIS THESIS
additions analysis architecture archives aspects bug cached calls
changes collaboration complexities component concerns cross-cutting
cvs data defects design development drawing dynamine
eclipse effort evolves failures fine-grained fix fix-inducing
graphs hatari history locate matching method mining
predicting program programmers report repositories
revision software support system taking transactions
version visualizing
Contributions of the thesis

1. Fine-grained analysis of version archives.
   - Project-specific usage patterns of methods (FSE 2005)
   - Identification of cross-cutting changes (ASE 2006)

2. Mining bug databases to predict defects.
   - Dependencies predict defects (ISSRE 2007, ICSE 2008)
   - Domino effect: depending on defect-prone binaries increases
     the chances of having defects (in submission).
Fine-grained analysis

public void createPartControl(Composite parent) {
    ...
    // add listener for editor page activation
    getSite().getPage().addPartListener(partListener);
}

public void dispose() {
    ...
    getSite().getPage().removePartListener(partListener);
}

Slide annotation: co-added (the addPartListener and removePartListener calls)
Further method names shown on the slide: close, open, println, begin

Co-added items = patterns
Fine-grained analysis

public static final native void _XFree(int address);
public static final void XFree(int /*long*/ address) {
    lock.lock();
    try {
        _XFree(address);
    } finally {
        lock.unlock();
    }
}

CHANGED IN 1284 LOCATIONS

Crosscutting changes = aspect candidates
Bugs! Bugs! Bugs!
Quality assurance is limited...

   ...by time...   ...and by money.
Spend resources on the components that need them most,
i.e., those most likely to fail.
Indicators of defects
•   Code complexity
    -   Basili et al. 1996, Subramanyam and Krishnan 2003,
    -   Binkley and Schach 1998, Ohlsson and Alberg 1996,
    -   Nagappan et al. 2006, Knab et al. 2006

•   Code churn
    -   Nagappan and Ball 2005

•   Historical data
    -   Khoshgoftaar et al. 1996, Graves et al. 2000, Kim et al. 2007,
    -   Ostrand et al. 2005, Mockus et al. 2005

•   Code dependencies
    -   Nagappan and Ball 2007, Schröter et al. 2006
2252 Binaries
28.3 MLOC
Windows Server layout
Hypotheses

Subsystem level: Complexity of dependency graphs
    -   correlates with the number of post-release defects (H1)
    -   can predict the number of post-release defects (H2)

Binary level: Network measures on dependency graphs
    -   correlate with the number of post-release defects (H3)
    -   can predict the number of post-release defects (H4)
    -   can indicate critical “escrow” binaries (H5)
DATA
Data collection

Release point for Windows Server 2003, followed by six months to collect defects.

At the release point: Dependencies, Network Measures, Complexity Metrics
After the release:    Defects
Dependencies
• Directed relationship between two pieces
  of code (here: binaries)
• MaX dependency analysis framework
  - Caller-callee dependencies
  - Imports and exports
  - RPC, COM
  - Runtime dependencies (such as LoadLibrary)
  - Registry access
  - etc.
Centrality

Degree:      Blue binary has dependencies to many other binaries
Closeness:   Blue binary is close to all other binaries (only two steps)
Betweenness: Blue binary connects the left with the right graph (bridge)
Centrality
• Degree centrality
   - counts the number of dependencies
• Closeness centrality
   - takes distance to all other binaries into account
   - Closeness: How close are the other binaries?
   - Reach: How many binaries can be reached (weighted)?
   - Eigenvector: similar to PageRank
• Betweenness centrality
   - counts the number of shortest paths through a binary
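
The three centrality families listed above can be sketched on a small graph. This is a minimal illustration under stated assumptions, not the tooling used in the study: dependencies are treated as undirected and unweighted, binaries are numbered 0..n-1, closeness is taken as the number of reachable nodes divided by the sum of distances to them, and betweenness is computed with Brandes' algorithm. All class and method names are hypothetical; reach and eigenvector centrality are not covered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class CentralitySketch {

    /** Builds an undirected adjacency list for n nodes from edge pairs. */
    static List<List<Integer>> graph(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    /** Degree centrality: number of direct neighbors. */
    static int degree(List<List<Integer>> adj, int v) {
        return adj.get(v).size();
    }

    /** Breadth-first distances from source; -1 marks unreachable nodes. */
    static int[] distances(List<List<Integer>> adj, int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        dist[source] = 0;
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            int v = queue.poll();
            for (int w : adj.get(v)) {
                if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.add(w); }
            }
        }
        return dist;
    }

    /** Closeness centrality: (reachable nodes) / (sum of distances to them). */
    static double closeness(List<List<Integer>> adj, int v) {
        int sum = 0, reachable = 0;
        for (int d : distances(adj, v)) {
            if (d > 0) { sum += d; reachable++; }
        }
        return sum == 0 ? 0.0 : (double) reachable / sum;
    }

    /** Betweenness centrality via Brandes' algorithm (unweighted, undirected). */
    static double[] betweenness(List<List<Integer>> adj) {
        int n = adj.size();
        double[] cb = new double[n];
        for (int s = 0; s < n; s++) {
            List<List<Integer>> pred = new ArrayList<>();
            for (int i = 0; i < n; i++) pred.add(new ArrayList<>());
            double[] sigma = new double[n];   // number of shortest paths from s
            int[] dist = new int[n];
            Arrays.fill(dist, -1);
            sigma[s] = 1;
            dist[s] = 0;
            Deque<Integer> queue = new ArrayDeque<>();
            Deque<Integer> stack = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty()) {
                int v = queue.poll();
                stack.push(v);
                for (int w : adj.get(v)) {
                    if (dist[w] < 0) { dist[w] = dist[v] + 1; queue.add(w); }
                    if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; pred.get(w).add(v); }
                }
            }
            double[] delta = new double[n];   // dependency of s on each node
            while (!stack.isEmpty()) {
                int w = stack.pop();
                for (int v : pred.get(w)) delta[v] += sigma[v] / sigma[w] * (1 + delta[w]);
                if (w != s) cb[w] += delta[w];
            }
        }
        for (int i = 0; i < n; i++) cb[i] /= 2;   // each undirected pair was counted twice
        return cb;
    }

    public static void main(String[] args) {
        // Node 2 plays the role of the "blue binary": it bridges {0, 1} and {3, 4}.
        List<List<Integer>> g = graph(5, new int[][] {{0, 2}, {1, 2}, {2, 3}, {3, 4}});
        System.out.println("degree(2)      = " + degree(g, 2));
        System.out.println("closeness(2)   = " + closeness(g, 2));
        System.out.println("betweenness(2) = " + betweenness(g)[2]);
    }
}
```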
Ego networks

(Figure: a binary (EGO) with its IN, OUT, and INOUT ego networks)
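
One way to read the three ego networks is sketched here, under an assumption the slide does not spell out: IN holds the ego plus the binaries that depend on it, OUT holds the ego plus the binaries it depends on, and INOUT is their union. The class name, method names, and binary names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class EgoNetworkSketch {

    /** OUT: the ego plus every binary it depends on. `deps` maps a binary to its dependencies. */
    static Set<String> out(Map<String, Set<String>> deps, String ego) {
        Set<String> nodes = new TreeSet<>(deps.getOrDefault(ego, Set.of()));
        nodes.add(ego);
        return nodes;
    }

    /** IN: the ego plus every binary that depends on it. */
    static Set<String> in(Map<String, Set<String>> deps, String ego) {
        Set<String> nodes = new TreeSet<>();
        nodes.add(ego);
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            if (e.getValue().contains(ego)) nodes.add(e.getKey());
        }
        return nodes;
    }

    /** INOUT: union of the IN and OUT neighborhoods. */
    static Set<String> inOut(Map<String, Set<String>> deps, String ego) {
        Set<String> nodes = new TreeSet<>(in(deps, ego));
        nodes.addAll(out(deps, ego));
        return nodes;
    }

    public static void main(String[] args) {
        // Hypothetical dependency graph: app.exe -> core.dll -> {io.dll, net.dll}
        Map<String, Set<String>> deps = Map.of(
            "app.exe", Set.of("core.dll"),
            "core.dll", Set.of("io.dll", "net.dll"));
        System.out.println("IN    = " + in(deps, "core.dll"));
        System.out.println("OUT   = " + out(deps, "core.dll"));
        System.out.println("INOUT = " + inOut(deps, "core.dll"));
    }
}
```

Network measures such as size or density could then be computed per ego network; the sketch stops at collecting the node sets.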
Complexity metrics

Group                                     Metrics                                  Aggregation
Module metrics for a binary B             # functions in B
                                          # global variables in B
Per-function metrics for a function f()   # executable lines in f()                Total, Max
                                          # parameters in f()
                                          # functions calling f()
                                          # functions called by f()
                                          McCabe’s cyclomatic complexity of f()
OO metrics for a class C                  # methods in C                           Total, Max
                                          # subclasses of C
                                          Depth of C in the inheritance tree
                                          Coupling between classes
                                          Cyclic coupling between classes
RESULTS
1 PATTERNS
Star pattern

(Figure legend: With defects, No defects)
Undirected cliques

Average number of defects is higher for binaries in large cliques.
2 PREDICTION
Prediction

Input metrics and measures: Metrics, SNA, Metrics+SNA
Model:                      PCA, Regression
Prediction:                 Classification, Ranking
Classification

Does a binary have a defect or not?
Ranking

Which binaries have the most defects?
Random splits




4×50×
Classification (logistic regression)

SNA increases the recall by 0.10 (at p=0.01)
while precision remains comparable.
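
For reference, precision and recall of a defect classifier can be computed as sketched here. This is a generic illustration of the two measures, not the evaluation code behind the reported numbers; the class name, method name, and sample labels are hypothetical.

```java
final class PrecisionRecallSketch {

    /** Returns {precision, recall} for predicted vs. actual "has a defect" labels. */
    static double[] precisionRecall(boolean[] actual, boolean[] predicted) {
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] && actual[i]) tp++;        // defect-prone, correctly flagged
            else if (predicted[i]) fp++;                // flagged, but no defect
            else if (actual[i]) fn++;                   // defect-prone, but missed
        }
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        return new double[] {precision, recall};
    }

    public static void main(String[] args) {
        // Hypothetical labels for six binaries.
        boolean[] actual    = {true, true, true, true, false, false};
        boolean[] predicted = {true, true, false, false, true, false};
        double[] pr = precisionRecall(actual, predicted);
        System.out.println("precision = " + pr[0] + ", recall = " + pr[1]);
    }
}
```

Higher recall at comparable precision means more of the defect-prone binaries are found without flagging many more defect-free ones.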
Ranking (linear regression)

SNA+METRICS increases the correlation
by 0.10 (significant at p=0.01)
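
A ranking can be scored by correlating predicted with actual defect counts. The slide does not say which correlation coefficient was used, so this sketch assumes Spearman rank correlation, a common choice for comparing rankings. The class name, method names, and sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

final class RankCorrelationSketch {

    /** Ranks values from 1..n, giving tied values their average rank. */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) order[k] = k;
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double[] rank = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) j++;
            double avg = (i + j) / 2.0 + 1;             // average of positions i..j, 1-based
            for (int k = i; k <= j; k++) rank[order[k]] = avg;
            i = j + 1;
        }
        return rank;
    }

    /** Pearson correlation of two equally long samples. */
    static double pearson(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0);
        double my = Arrays.stream(y).average().orElse(0);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Spearman correlation: Pearson correlation of the ranks. */
    static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        // Hypothetical predicted scores and actual defect counts for four binaries.
        double[] predicted = {0.1, 0.4, 0.3, 0.9};
        double[] actual = {0, 2, 3, 7};
        System.out.println("spearman = " + spearman(predicted, actual));
    }
}
```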
FUTURE WORK
analysis archives aspects bug changes collaboration
complexities component concerns cross-cutting cvs data defects
design development drawing eclipse erose evolves factor
failures fine-grained fix fix-inducing fm graphs guide hatari
history human matching mining networking
predicting program programmers quality report repositories
revision social software support system taking version
Collaboration

Collab. Data, Effort Data

Comm. Archive, Version Archive, Bug Database

Social Networking for Software Development