Mining Software Archives
to Support Software Development




         Tom Zimmermann
         Saarland University
Software Development


                 Hello
         Build   Calgary!
Software Development



         Build
Collaboration
Collaboration
Collaboration




Comm.
Archive
Collaboration




          Version
Comm.
          Archive
Archive
Collaboration




          Version
Comm.                 Bug
          Archive
Archive             Database
Collaboration




           Version
Comm.                   Bug
           Archive
Archive               Database


  Min...
Mining Software Archives
Mining Software Archives




eROSE    BugCache   Vulture
eROSE
              Related Changes
                    (ICSE 2004, TSE 2005)




Tom Zimmermann • Saarland University
  P...
Developers who changed this function
also changed...
eROSE: Guiding Developers

       Customers who
     bought this item also
          bought...




Purchase
 History
eROSE: Guiding Developers

                                    Developers who
       Customers who
                       ...
eROSE suggests further locations.
eROSE prevents incomplete changes.
Processing CVS data
Processing CVS data
Processing CVS data




  1. Comparing files
  2. Building transactions
Comparing Files
Comparing Files
A()


B()


C()


D()


E()
Comparing Files
A()          A()


B()          F()


C()          B()


D()          D()


E()          E()
Comparing Files
A()          A()


B()          F()


C()          B()


D()          D()


E()          E()
Building Transactions


   CVS
150,000
Building Transactions

                2003-02-19 (aweinand): fixed #13332
   CVS
                createGeneralPage()
     ...
Building Transactions
                   same author + message + time

                2003-02-19 (aweinand): fixed #13332
...
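
The grouping rule on this slide (same author + message + time) can be sketched as follows. This is a minimal illustration, not eROSE's actual implementation: the tuple layout, the function name `build_transactions`, the 200-second window, the times of day, and the second author are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from itertools import groupby

def build_transactions(checkins, window=timedelta(seconds=200)):
    """Group (author, message, timestamp, entity) check-ins into transactions.

    Check-ins with the same author and message belong to one transaction
    as long as each follows the previous one within `window` (assumed value).
    """
    ordered = sorted(checkins, key=lambda c: (c[0], c[1], c[2]))
    transactions = []
    for (author, message), group in groupby(ordered, key=lambda c: (c[0], c[1])):
        current, last_time = [], None
        for _, _, timestamp, entity in group:
            # A gap larger than the window starts a new transaction.
            if last_time is not None and timestamp - last_time > window:
                transactions.append((author, message, current))
                current = []
            current.append(entity)
            last_time = timestamp
        transactions.append((author, message, current))
    return transactions

# Hypothetical check-in times; only the date, author, and message come from the slide.
checkins = [
    ("aweinand", "fixed #13332", datetime(2003, 2, 19, 10, 0, 0), "createGeneralPage()"),
    ("aweinand", "fixed #13332", datetime(2003, 2, 19, 10, 0, 30), "fKeys[]"),
    ("aweinand", "fixed #13332", datetime(2003, 2, 19, 10, 1, 0), "plugin.properties"),
    ("alice", "update docs", datetime(2003, 2, 19, 10, 0, 10), "README"),
]
transactions = build_transactions(checkins)
```

Comparing each check-in to the previous one (a sliding window) rather than to the first one lets long commits stay together even when they take longer than the window overall.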
Mining Associations

User changes fKeys[] and initDefaults()
Mining Associations
Mining Associations
EROSE
finds past
transactions
Mining Associations
                    #756                #6721               #21078
EROSE               fKeys[]        ...
Mining Associations
                   #756                     #6721               #21078
EROSE              fKeys[]     ...
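
The rule shown on this slide, {fKeys[], initDefaults()} ⇒ {plugin.properties} with support 10 and confidence 10/11, follows from counting past transactions. A minimal sketch of that arithmetic, with `rule_stats` as a hypothetical helper name:

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent.

    support    = number of transactions containing antecedent and consequent
    confidence = support / number of transactions containing the antecedent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    matching = [t for t in transactions if antecedent <= set(t)]
    support = sum(1 for t in matching if consequent <= set(t))
    confidence = support / len(matching) if matching else 0.0
    return support, confidence

# Eleven past transactions changed fKeys[] and initDefaults();
# ten of them also changed plugin.properties.
past = [{"fKeys[]", "initDefaults()", "plugin.properties"}] * 10
past.append({"fKeys[]", "initDefaults()"})

support, confidence = rule_stats(past, {"fKeys[]", "initDefaults()"}, {"plugin.properties"})
print(support, round(confidence, 3))  # 10 0.909
```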
Evaluation

                        GIMP




         PostgreSQL



                      KOffice
jEdit
Evaluation

EROSE predicts 33% of all changed entities.
                                         GIMP
(files: 44%)



     ...
Evaluation

EROSE predicts 33% of all changed entities.
                                         GIMP
(files: 44%)

In 70% ...
Evaluation

EROSE predicts 33% of all changed entities.
                                         GIMP
(files: 44%)

In 70% ...
eROSE
        Related Changes
            (ICSE 2004, TSE 2005)



guides developers

 non-program elements
   (documentat...
BugCache
            Predicting Defects
                    (ASE 2006, ICSE 2007)





     ...
The Problem

     How should we
 allocate our resources
 for quality assurance?
One Solution

    List with elements that
       (will) have defects




         List is adaptive, i.e.,
        it chang...
One Solution

    List with elements that
       (will) have defects

        Cache
         List is adaptive, i.e.,
     ...
The BugCache Model

                 What is loaded in the
                 cache?

                                      ...
The BugCache Model


            Cache size: 2




  Miss
The BugCache Model


            Cache size: 2




  Miss
The BugCache Model


               Cache size: 2




  Miss   Hit
The BugCache Model


               Cache size: 2




  Miss   Hit
The BugCache Model


               Cache size: 2




  Miss   Hit      Miss
The BugCache Model


               Cache size: 2




  Miss   Hit      Miss
The BugCache Model


                        Cache size: 2




      Miss    Hit          Miss

Hit rate = #Hits / #Defect...
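
The Miss / Hit / Miss sequence and the resulting hit rate of 33.3% can be reproduced with a small simulation. This is a sketch under assumptions: the slides do not name the replacement policy here, so least-recently-used eviction is used for illustration, and the entity names are hypothetical.

```python
from collections import OrderedDict

def simulate_cache(defect_locations, cache_size):
    """Replay defects in order and return hit rate = #hits / #defects.

    A defect in an entity that is already cached counts as a hit.
    On a miss the entity is loaded; if the cache is full, the least
    recently used entry is evicted (assumed policy).
    """
    cache = OrderedDict()
    hits = 0
    for location in defect_locations:
        if location in cache:
            hits += 1
            cache.move_to_end(location)    # mark as most recently used
        else:
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict least recently used
            cache[location] = True
    return hits / len(defect_locations) if defect_locations else 0.0

# Miss, hit, miss with a cache of size 2, as on the slide.
rate = simulate_cache(["parser", "parser", "lexer"], cache_size=2)
print(f"{rate:.1%}")  # 33.3%
```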
The BugCache Model


               Cache size: 2




  Miss   Hit      Miss
The BugCache Model


               Cache size: 2




  Miss   Hit      Miss
The BugCache Model


               Cache size: 2




  Miss   Hit      Miss         Miss
The BugCache Model


               Cache size: 2




  Miss   Hit      Miss         Miss
The BugCache Model


               Cache size: 2




  Miss   Hit      Miss         Miss
Loading Elements

Temporal locality – as shown before
Spatial locality – load “nearby” elements
(i.e., co-changed before)
...
Evaluation



                        Mozilla
jEdit
        PostgreSQL   Columba
Hit Rates
                 Methods               Files
Project      BugCache FixCache BugCache FixCache
Apache 1.3    59.6...
Hit Rates
                 Methods               Files
Project      BugCache FixCache BugCache FixCache
Apache 1.3    59.6...
Reasons for Hits
                   Initial pre-fetch
Spatial locality          18%
     18%




                         ...
Warning Developers

 “Safe” Location
(not in FixCache)



  Risky Location
(red, in FixCache)
BugCache
       Predicting Defects
             (ASE 2006, ICSE 2007)




temporal locality

       adaptive
    hit rates...
Vulture
                   Predicting
      Security Vulnerabilities
                      (Work in Progress)




 Stephan...
Firefox/Mozilla
  >700 developers         228,365 commits




 14,368 C/C++ files
                          1,012,512 revis...
>700 developers     228,365 commits




 14,368 C/C++ files
                      1,012,512 revisions
(10,452 components)
Vulnerabilities
Vulnerabilities
Vulnerabilities




0
    Vulnerabilities
Vulnerabilities
      Security Advisory 2005-12
    Title: Livefeed bookmarks can steal cookies
    Impact: High
    Produ...
Vulnerabilities




0
    Vulnerabilities
Vulnerabilities
      Security Advisory 2005-13
    Title: Window Injection Spoofing
    Severity: Low
    Products: Firefo...
Vulnerabilities
      Security Advisory 2005-15
                        2005-41
                        2005-16
          ...
Vulnerabilities




0
    Vulnerabilities
Vulnerabilities
10,452 components

    424 vulnerable

     4.05%
0
    Vulnerabilities
Vulnerabilities

             What other
           components are
             vulnerable?




0
    Vulnerabilities
Vulnerabilities




0
    Vulnerabilities
Vulnerabilities




0
    Vulnerabilities                 ?
Vulnerabilities
             Is this new
          component likely
          to be vulnerable?




0
    Vulnerabilities ...
Vulture
                                Code
Vulnerability   Version          Code
                                  Code
...
Vulture
                                Code
Vulnerability   Version          Code
                                  Code
...
Vulture
                                     Code
   Vulnerability    Version           Code
                             ...
Vulture
                                     Code
   Vulnerability    Version           Code
                             ...
Vulture
                                     Code
   Vulnerability    Version           Code
                             ...
Correlations
Correlations
Programmer            Code Complexity




  Language
Correlations

                    Code Complexity




Language
Correlations




Language
Correlations




Language
                   Problem Domain
Imports
Imports




GUI   Database   Certificates   OS
Imports




GUI   Database   Certificates   OS
Imports




GUI   Database   Certificates   OS
Example (1)


                     nsIContent.h




                  nsIContentUtils.h




              nsIScriptSecurit...
Example (1)


                     nsIContent.h




    import
                  nsIContentUtils.h




              nsISc...
Example (1)
                    ✘
✘           ✘
    ✘           ✘
        ✘                                nsIContent.h
  ...
Example (2)



              nsIPrivateDOMEvent.h




               nsReadableUtils.h
Example (2)



    import    nsIPrivateDOMEvent.h




               nsReadableUtils.h
Example (2)
                    ✘
✘           ✘
    ✘           ✘
                    ✘
✘           ✘
    ✘           ✘   ...
Research Questions


• How well do imports predict vulnerabilities?
• Can imports be used for
  − classification (vulnerabl...
Input Data


     nsCOMArray              0
   nsIDocument.h             1
        nspr_md.h            0
 nsDOMClassInfo ...
Input Data




[Table: components (rows) by imported header files (columns); column headers not recoverable]
                                                 ...
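
The input data described here is one row per component: a vulnerability count plus a 0/1 column per imported header. A minimal sketch of building such rows, and of the share of vulnerable components among those that import a given set of headers (the kind of figure shown in the import examples). The function names are hypothetical; apart from nsDOMClassInfo importing nsIXPConnect.h and the vulnerability counts, the import sets below are made up for illustration.

```python
def import_matrix(imports_by_component):
    """Return (headers, rows): one 0/1 feature per imported header file."""
    headers = sorted({h for hs in imports_by_component.values() for h in hs})
    rows = {
        component: [int(h in hs) for h in headers]
        for component, hs in imports_by_component.items()
    }
    return headers, rows

def vulnerable_share(imports_by_component, vulnerability_counts, required):
    """Share of components importing all `required` headers that are vulnerable."""
    required = set(required)
    matching = [c for c, hs in imports_by_component.items() if required <= set(hs)]
    if not matching:
        return None
    vulnerable = sum(1 for c in matching if vulnerability_counts.get(c, 0) > 0)
    return vulnerable / len(matching)

# Hypothetical import sets, except nsDOMClassInfo -> nsIXPConnect.h (from the slide).
imports = {
    "nsDOMClassInfo": {"nsIXPConnect.h", "nsIContent.h"},
    "nsIDocument.h": {"nsIContent.h"},
    "EmbedGTKTools": {"gtk.h"},
}
counts = {"nsDOMClassInfo": 10, "nsIDocument.h": 1, "EmbedGTKTools": 0}

headers, rows = import_matrix(imports)
share = vulnerable_share(imports, counts, {"nsIContent.h"})
```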
Distribution
Distribution of MFSAs                                   Distribution of Bug Reports


                        ...
Experiments

• 40 random splits
  6,968 rows in training set, 3,484 rows in validation set

• Classification...
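
The split sizes and the classification metrics of this setup can be sketched as below. This is only an illustration of the evaluation arithmetic: the SVM itself is omitted, the function names and the seed are assumptions, and the small label vectors are made up.

```python
import random

def random_split(rows, train_fraction=2 / 3, seed=0):
    """Shuffle rows and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def precision_recall(actual, predicted):
    """Precision and recall for binary labels (1 = vulnerable)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 10,452 components split two thirds / one third.
train, validation = random_split(range(10452))
print(len(train), len(validation))  # 6968 3484

# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
precision, recall = precision_recall([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```

Repeating the split with 40 different seeds gives one precision/recall pair per split, which is what a scatter plot of the results would show.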
Results

                          (a) Precision and Recall                                                               ...
Results
moderately strong correlation (mostly significant at p < 0.01)
                              (a) Precision and Reca...
Ranking
Ranking
Rank   Component              Actual Rank
 1     nsDOMClassInfo              3
 2     SGridRowLayout             9...
Similar Results for Bugs

        Packages + Import relationships
        (ISESE 2006)


        Precision: 66.7% Recall: ...
Vulture
                 Predicting
    Security Vulnerabilities
                (Work in Progress)




locates past + pre...
Future
 Work


    ?
#1: Mining across Projects


            • Complement source
              code search engines
              with mining t...
#2: Developer Buddy




               MOCKUP
eROSE   BugCache   Vulture
automatic




  eROSE     BugCache   Vulture
automatic
                       large-scale




  eROSE     BugCache        Vulture
automatic
                           large-scale




  eROSE         BugCache        Vulture


    tool-oriented
automatic
                    large-scale


       Empirical Software
        Engineering 2.0


    tool-oriented
automatic
                     large-scale


       Empirical Software
        Engineering 2.0


    tool-oriented   Thank...

Job application talk.


Mining Software Archives to Support Software Development

  1. 1. Mining Software Archives to Support Software Development Tom Zimmermann Saarland University
  2. 2. Software Development Hello Build Calgary!
  3. 3. Software Development Build
  4. 4. Collaboration
  5. 5. Collaboration
  6. 6. Collaboration: Comm. Archive
  7. 7. Collaboration: Version Archive, Comm. Archive
  8. 8. Collaboration: Version Archive, Comm. Archive, Bug Database
  9. 9. Collaboration: Version Archive, Comm. Archive, Bug Database. Mining Software Archives
  10. 10. Mining Software Archives
  11. 11. Mining Software Archives eROSE BugCache Vulture
  12. 12. eROSE Related Changes (ICSE 2004, TSE 2005) Tom Zimmermann • Saarland University Peter Weißgerber • University of Trier Stephan Diehl • University of Trier Andreas Zeller • Saarland University
  13. 13. Developers who changed this function also changed...
  14. 14. eROSE: Guiding Developers Customers who bought this item also bought... Purchase History
  15. 15. eROSE: Guiding Developers. Version Archive: "Developers who changed this function also changed..." Purchase History: "Customers who bought this item also bought..."
  16. 16. eROSE suggests further locations.
  17. 17. eROSE prevents incomplete changes.
  18. 18. Processing CVS data
  19. 19. Processing CVS data
  20. 20. Processing CVS data 1. Comparing files 2. Building transactions
  21. 21. Comparing Files
  22. 22. Comparing Files A() B() C() D() E()
  23. 23. Comparing Files A() A() B() F() C() B() D() D() E() E()
  24. 24. Comparing Files A() A() B() F() C() B() D() D() E() E()
  25. 25. Building Transactions CVS 150,000
  26. 26. Building Transactions (CVS, 150,000) 2003-02-19 (aweinand): fixed #13332: createGeneralPage(), createTextComparePage(), fKeys[], initDefaults(), buildnotes_compare.html, PatchMessages.properties, plugin.properties
  27. 27. Building Transactions (CVS, 150,000) Grouped by same author + message + time. 2003-02-19 (aweinand): fixed #13332: createGeneralPage(), createTextComparePage(), fKeys[], initDefaults(), buildnotes_compare.html, PatchMessages.properties, plugin.properties
  28. 28. Mining Associations User changes fKeys[] and initDefaults()
  29. 29. Mining Associations
  30. 30. Mining Associations EROSE finds past transactions
  31. 31. Mining Associations EROSE finds past transactions (#756, #6721, #21078, #42432, #51345, #59998, #71003, #87264, #91220, #101823, #104223). Each changed fKeys[] and initDefaults(); all but one also changed plugin.properties.
  32. 32. Mining Associations EROSE finds past transactions (same eleven transactions as slide 31). Rule: {fKeys[], initDefaults()} ⇒ {plugin.properties}. Support 10, Confidence 10/11 = 0.909
  33. 33. Evaluation GIMP PostgreSQL KOffice jEdit
  34. 34. Evaluation EROSE predicts 33% of all changed entities. GIMP (files: 44%) PostgreSQL KOffice jEdit
  35. 35. Evaluation EROSE predicts 33% of all changed entities. GIMP (files: 44%) In 70% of all transactions, EROSE’s topmost three suggestions contain a changed entity. PostgreSQL (files: 72%) KOffice jEdit
  36. 36. Evaluation EROSE predicts 33% of all changed entities. GIMP (files: 44%) In 70% of all transactions, EROSE’s topmost three suggestions contain a changed entity. PostgreSQL (files: 72%) EROSE learns quickly (within 30 days). KOffice jEdit
  37. 37. eROSE Related Changes (ICSE 2004, TSE 2005) guides developers non-program elements (documentation) learns quickly
  38. 38. BugCache Predicting Defects (ASE 2006, ICSE 2007) Sung Kim • MIT Tom Zimmermann • Saarland University Jim Whitehead • Univ. of California SC Andreas Zeller • Saarland University
  39. 39. The Problem How should we allocate our resources for quality assurance?
  40. 40. One Solution List with elements that (will) have defects List is adaptive, i.e., it changes over time
  41. 41. One Solution List with elements that (will) have defects Cache List is adaptive, i.e., it changes over time
  42. 42. The BugCache Model What is loaded in the cache? Cache size: 2 Hypothesis: Temporal locality between defects
  43. 43. The BugCache Model What is loaded in the cache? Cache size: 2 Hypothesis: Temporal locality between defects
  44. 44. The BugCache Model What is loaded in the cache? Cache size: 2 Hypothesis: Temporal locality between defects
  45. 45. The BugCache Model What is loaded in the cache? Cache size: 2 Hypothesis: Temporal locality between defects
  46. 46. The BugCache Model What is loaded in the cache? Cache size: 2 Hypothesis: Temporal locality between defects
  47. 47. The BugCache Model What is loaded in the cache? Cache size: 2 Miss Hypothesis: Temporal locality between defects
  48. 48. The BugCache Model What is loaded in the cache? Cache size: 2 Miss Hypothesis: Temporal locality between defects
  49. 49. The BugCache Model Cache size: 2 Miss
  50. 50. The BugCache Model Cache size: 2 Miss
  51. 51. The BugCache Model Cache size: 2 Miss Hit
  52. 52. The BugCache Model Cache size: 2 Miss Hit
  53. 53. The BugCache Model Cache size: 2 Miss Hit Miss
  54. 54. The BugCache Model Cache size: 2 Miss Hit Miss
  55. 55. The BugCache Model Cache size: 2 Miss Hit Miss Hit rate = #Hits / #Defects = 33.3%
  56. 56. The BugCache Model Cache size: 2 Miss Hit Miss
  57. 57. The BugCache Model Cache size: 2 Miss Hit Miss
  58. 58. The BugCache Model Cache size: 2 Miss Hit Miss Miss
  59. 59. The BugCache Model Cache size: 2 Miss Hit Miss Miss
  60. 60. The BugCache Model Cache size: 2 Miss Hit Miss Miss
  61. 61. Loading Elements Temporal locality – as shown before Spatial locality – load “nearby” elements (i.e., co-changed before) Changed-entity locality – load changed elements New-entity locality – load new elements Initial pre-fetch – start with a loaded cache
  62. 62. Evaluation Mozilla jEdit PostgreSQL Columba
  63. 63. Hit Rates (Cache size = 10%)
                          Methods               Files
      Project       BugCache FixCache    BugCache FixCache
      Apache 1.3      59.6%    61.5%       83.9%    81.5%
      Columba         58.9%    67.6%       83.5%    83.0%
      Eclipse         64.5%    71.6%       95.1%    95.0%
      JEdit           50.5%    48.9%       85.7%    85.4%
      Mozilla         49.3%    55.0%       93.3%    88.0%
      PostgreSQL      61.9%    59.2%       73.9%    71.0%
      Subversion      68.3%    43.8%       82.0%    81.3%
  64. 64. Hit Rates (table repeated from slide 63) Cache size = 10%
  65. 65. Reasons for Hits [Pie chart: temporal locality 60%, spatial locality 18%, initial pre-fetch 18%; the legend also lists changed-entity locality and new-entity locality]
  66. 66. Warning Developers “Safe” Location (not in FixCache) Risky Location (red, in FixCache)
  67. 67. BugCache Predicting Defects (ASE 2006, ICSE 2007) temporal locality adaptive hit rates of 71%~95%
  68. 68. Vulture Predicting Security Vulnerabilities (Work in Progress) Stephan Neuhaus • Saarland University Tom Zimmermann • Saarland University Andreas Zeller • Saarland University
  69. 69. Firefox/Mozilla >700 developers 228,365 commits 14,368 C/C++ files 1,012,512 revisions (10,452 components)
  70. 70. >700 developers 228,365 commits 14,368 C/C++ files 1,012,512 revisions (10,452 components)
  71. 71. Vulnerabilities
  72. 72. Vulnerabilities
  73. 73. Vulnerabilities 0 Vulnerabilities
  74. 74. Vulnerabilities Security Advisory 2005-12 Title: Livefeed bookmarks can steal cookies Impact: High Products: Firefox Description: Earlier versions of Firefox allowed javascript: and data: URLs as Livefeed bookmarks. When they updated the URL would be run in the context of the current page and could be used to steal cookies or data displayed on the page. If the user were on a page with elevated privileges (for example, about:config) when the Livefeed was updated, the feed URL could potentially run arbitrary code on the user's machine. 0 Vulnerabilities
  75. 75. Vulnerabilities 0 Vulnerabilities
  76. 76. Vulnerabilities Security Advisory 2005-13 Title: Window Injection Spoofing Severity: Low Products: Firefox, Mozilla Suite Description: A website can inject content into a popup opened by another site if the target name of the popup window is known. An attacker who knows you are going to visit that other site could spoof the contents of the popup. 0 Vulnerabilities
  77. 77. Vulnerabilities Security Advisories (five overlapping advisory cards; only the numbers and titles are recoverable): 2005-14 SSL "secure site" indicator spoofing; 2005-15 Heap overflow possible in UTF8 to Unicode conversion; 2005-16 Spoofing download and security dialogs with overlapping windows; 2005-41 Privilege escalation via DOM property overrides; 2006-76 XSS using outer window's Function object. 0 Vulnerabilities
  78. 78. Vulnerabilities 0 Vulnerabilities
  79. 79. Vulnerabilities 10,452 components 424 vulnerable 4.05% 0 Vulnerabilities
  80. 80. Vulnerabilities What other components are vulnerable? 0 Vulnerabilities
  81. 81. Vulnerabilities 0 Vulnerabilities
  82. 82. Vulnerabilities 0 Vulnerabilities ?
  83. 83. Vulnerabilities Is this new component likely to be vulnerable? 0 Vulnerabilities ?
  84. 84. Vulture Code Vulnerability Version Code Code Database Archive Code Redo diagram
  85. 85. Vulture Code Vulnerability Version Code Code Database Archive Code Redo diagram Vulture
  86. 86. Vulture Code Vulnerability Version Code Code Database Archive Code Redo diagram Vulture Component Component Component
  87. 87. Vulture Code Vulnerability Version Code Code Database Archive Code Redo diagram Vulture Predictor Component Component Component
  88. 88. Vulture Code Vulnerability Version Code Code Code Database Archive Code Redo diagram Vulture Predictor Component Component Component
  89. 89. Correlations
  90. 90. Correlations Programmer Code Complexity Language
  91. 91. Correlations Code Complexity Language
  92. 92. Correlations Language
  93. 93. Correlations Language Problem Domain
  94. 94. Imports
  95. 95. Imports GUI Database Certificates OS
  96. 96. Imports GUI Database Certificates OS
  97. 97. Imports GUI Database Certificates OS
  98. 98. Example (1) nsIContent.h nsIContentUtils.h nsIScriptSecurityManager.h
  99. 99. Example (1) nsIContent.h import nsIContentUtils.h nsIScriptSecurityManager.h
  100. 100. Example (1) ✘ ✘ ✘ ✘ ✘ ✘ nsIContent.h ✘ ✘ ✘ ✘✘ ✘ import ✘ ✘ ✘ nsIContentUtils.h ✘ ✘ 95.5% ✘ ✔ ✘ ✘ ✘ nsIScriptSecurityManager.h
  101. 101. Example (2) nsIPrivateDOMEvent.h nsReadableUtils.h
  102. 102. Example (2) import nsIPrivateDOMEvent.h nsReadableUtils.h
  103. 103. Example (2) ✘ ✘ ✘ ✘ ✘ ✘ ✘ ✘ ✘ ✘ import nsIPrivateDOMEvent.h ✘ ✘ ✘ ✘ 100% ✘ ✘ ✘ ✘ ✘ nsReadableUtils.h
  104. 104. Research Questions • How well do imports predict vulnerabilities? • Can imports be used for − classification (vulnerable or not) and for − regression (number of vulnerabilities)?
  105. 105. Input Data nsCOMArray 0 nsIDocument.h 1 nspr_md.h 0 nsDOMClassInfo 10 EmbedGTKTools 0 MozillaControl.cpp 0 nsDOMClassInfo has had 10 vulnerability-related bug reports
  106. 106. Input Data (one row per component: number of vulnerability-related bug reports, then one 0/1 column per imported header; column headers not recoverable)
       nsCOMArray           0 | 1 0 0 0 1 0 0
       nsIDocument.h        1 | 0 0 1 0 0 1 0
       nspr_md.h            0 | 0 1 1 0 0 1 0
       nsDOMClassInfo      10 | 0 0 1 0 1 0 0
       EmbedGTKTools        0 | 0 0 0 0 1 0 0
       MozillaControl.cpp   0 | 0 1 0 1 0 0 0
       nsDOMClassInfo has had 10 vulnerability-related bug reports. nsDOMClassInfo imports “nsIXPConnect.h”.
  107. 107. Distribution [Figure: two histograms, "Distribution of MFSAs" and "Distribution of Bug Reports"; x-axes: Number of MFSAs, Number of Bug Reports; y-axis: Number of Components]
  108. 108. Experiments • 40 random splits: 6,968 rows in training set, 3,484 rows in validation set • Classification: train SVM, compute recall and precision • Regression: train SVM, compute rank correlation on top 1% • SVM: R implementation, linear kernel with default parameters (up to 10GB of main memory)
  109. 109. Results [Figure: (a) Precision and Recall, scatter plot of precision against recall; (b) Rank Correlation, cumulative distribution]
  110. 110. Results [Figure as on slide 109] 45% (about 1/2) of predictions correct
  111. 111. Results [Figure as on slide 109] 2/3 of all vulnerable components detected. 45% (about 1/2) of predictions correct
  112. 112. Results [Figure as on slide 109] 2/3 of all vulnerable components detected. 45% (about 1/2) of predictions correct
  113. 113. Results [Figure as on slide 109] Moderately strong correlation (mostly significant at p < 0.01). 2/3 of all vulnerable components detected. 45% (about 1/2) of predictions correct
  114. 114. Ranking
  115. 115. Ranking
       Rank  Component              Actual Rank
        1    nsDOMClassInfo               3
        2    SGridRowLayout              95
        3    xpcprivate                   6
        4    jsxml                        2
        5    nsGenericHTMLElement         8
        6    jsgc                         3
        7    nsISEnvironment             12
        8    jsfun                        1
        9    nsHTMLLabelElement          18
       10    nsHttpTransaction           35
       ...   (3,474 components)
  116. 116. Ranking (table repeated from slide 115)
  117. 117. Ranking (table repeated from slide 115)
  118. 118. Ranking (table repeated from slide 115)
  119. 119. Similar Results for Bugs Packages + Import relationships (ISESE 2006) Precision: 66.7% Recall: 69.4% Binaries + Dependencies (Internship @ Microsoft Research, 2006) Precision: 64.4% Recall: 75.3%
  120. 120. Vulture Predicting Security Vulnerabilities (Work in Progress) locates past + predicts new vulnerabilities problem domain
  121. 121. Future Work ?
  122. 122. #1: Mining across Projects • Complement source code search engines with mining techniques. • Large-scale mining (144,000 SF projects)
  123. 123. #2: Developer Buddy MOCKUP
  124. 124. eROSE BugCache Vulture
  125. 125. automatic eROSE BugCache Vulture
  126. 126. automatic large-scale eROSE BugCache Vulture
  127. 127. automatic large-scale eROSE BugCache Vulture tool-oriented
  128. 128. automatic large-scale Empirical Software Engineering 2.0 tool-oriented
  129. 129. automatic large-scale Empirical Software Engineering 2.0 tool-oriented Thanks! Questions?