Bug Prediction Based on Fine-Grained Module Histories
Hideaki Hata
Osamu Mizuno
Tohru Kikuno
Overview
Background
 Historical metrics are useful for bug prediction
Problem
 For method-level prediction, it is difficult to
 collect historical metrics
Solution & Results
 Historage: fine-grained version control system
 First study of method-level bug prediction with
 well-known historical metrics
Bug Prediction Papers
 [Figure: number of bug prediction papers per year, 2000 to 2010, in TSE, EMSE, ICSE, ESEC/FSE, FSE, ICSM, and MSR; vertical axis from 0 to 15]
Historical Metrics
 Code
  •Code churn
 Process
  •Changes
  •Past bugs
  •Process complexity
 Organization
  •Developers
  •Org structure
  •Network
  •Ownership
 Geography
  •Locations
  •Distribution
Bug Prediction Survey: http://bpsurvey-hidehata.dotcloud.com/
Mining Version Control Repository
 [Figure: a sequence of commits (..., n-3, n-2, n-1, n, n+1, n+2, n+3, ...); commit n is annotated with its commit message ("Fix bug #32528"), its date (a July 2007 calendar), and its code delta]
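The commit message on this slide ("Fix bug #32528") suggests how bug-fix commits can be recognized when mining a repository. The following is a minimal sketch, assuming bug IDs appear after words such as "bug" or "fix"; the pattern and the function names are illustrative assumptions, not the rule used in the study.

```python
import re

# Assumed convention: a bug ID is a number following "bug", "fix", "fixes",
# or "fixed", optionally prefixed with "#". Real projects may differ.
BUG_ID_PATTERN = re.compile(r"\b(?:bug|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE)


def extract_bug_ids(message):
    """Return the bug IDs mentioned in a commit message as integers."""
    return [int(bug_id) for bug_id in BUG_ID_PATTERN.findall(message)]


def is_bug_fix(message):
    """Treat a commit as a bug fix if its message mentions at least one bug ID."""
    return bool(extract_bug_ids(message))


if __name__ == "__main__":
    for msg in ["Fix bug #32528", "Refactor parser"]:
        print(msg, "->", extract_bug_ids(msg), is_bug_fix(msg))
```

A keyword heuristic like this can miss fixes that do not cite an ID and can misread unrelated numbers, so its output is best checked against the project's bug tracker.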
What We Have Learned
Prediction accuracy

  Historical metrics ≥ Static code metrics
  [Moser et al. ’08, Kamei et al. ’10]

Required effort

  File-level ≤ Package-level
  [Kamei et al. ’10, Nguyen et al. ’10,
   Posnett et al. ’11]

State of the Art
 [Figure: number of papers (TSE, EMSE, ICSE, ESEC/FSE, FSE, ICSM, MSR) by prediction level: package-level, file-level, and method-level; horizontal axis from 0 to 15]
 Models annotated in the figure: cache model [Kim et al. ’07], spam filtering model [Mizuno et al. ’07]

No method-level prediction with well-known historical metrics
Method-Level Prediction

Requirement

 Method-level historical metrics

Problem

 Analysis of method histories is difficult


Difficulties
1. Tracking methods is troublesome

   Matching methods should be found between
   sequential snapshots

2. Method-level metadata are not easily available

   Metadata (who, when, how, etc.) are associated
   with files
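To illustrate the first difficulty, the sketch below matches methods between two snapshots: first by identical signature, then by body similarity to catch renamed methods. This is a simplified assumption for illustration (the function name, the dictionary input format, and the 0.8 threshold are all hypothetical), not the technique used by the authors.

```python
from difflib import SequenceMatcher


def match_methods(old, new, threshold=0.8):
    """Match methods of an old snapshot to methods of a new snapshot.

    old, new: dicts mapping a method signature to its body text.
    Returns a dict mapping old signatures to new signatures.
    """
    # Step 1: methods whose signature is unchanged are matched directly.
    matches = {sig: sig for sig in old if sig in new}

    # Step 2: for the rest, pair each old method with the most similar
    # unmatched new method, if the similarity reaches the threshold.
    unmatched_new = [sig for sig in new if sig not in old]
    for old_sig in (sig for sig in old if sig not in new):
        best_sig, best_ratio = None, threshold
        for new_sig in unmatched_new:
            ratio = SequenceMatcher(None, old[old_sig], new[new_sig]).ratio()
            if ratio >= best_ratio:
                best_sig, best_ratio = new_sig, ratio
        if best_sig is not None:
            matches[old_sig] = best_sig
            unmatched_new.remove(best_sig)
    return matches


if __name__ == "__main__":
    body = "int total = 0; for (int v : values) total += v; return total;"
    old = {"A.foo()": "return x + 1;", "A.bar()": body}
    new = {"A.foo()": "return x + 2;", "A.sum()": body}
    print(match_methods(old, new))
```

Repeating this pairwise comparison for every pair of consecutive snapshots is the kind of bookkeeping that makes method tracking troublesome.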
Historage
 Fine-grained version control system[1]
  is created on top of a Git repository
  stores methods as files
  detects rename/move with Git mechanism
 [Figure: commits com1 and com2, each containing several methods]
[1] Hata et al., “Historage: Fine-Grained Version Control System for Java,” IWPSE-EVOL ’11.
Tool: git2historage (https://github.com/hdrky/git2historage)
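Because Historage stores methods as files in a Git repository, ordinary Git commands can in principle be used to read a method's history. The sketch below is an assumption-laden illustration: it calls the standard `git log --follow` command on a single path and parses the result. The function names are hypothetical, and the path layout that git2historage actually produces is not shown in these slides, so any path passed in is a placeholder.

```python
import subprocess

# Standard git pretty-format placeholders: commit hash, author name,
# author date as a Unix timestamp, separated by tab bytes (%x09).
LOG_FORMAT = "%H%x09%an%x09%at"


def parse_log(output):
    """Parse tab-separated `git log` output into a list of change records."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit, rest = line.split("\t", 1)
        author, timestamp = rest.rsplit("\t", 1)
        records.append({"commit": commit, "author": author, "time": int(timestamp)})
    return records


def method_history(repo_dir, method_path):
    """List the commits that touched one method file, following renames.

    method_path is a hypothetical path to a method stored as a file.
    """
    cmd = ["git", "-C", repo_dir, "log", "--follow",
           f"--format={LOG_FORMAT}", "--", method_path]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_log(result.stdout)
```

The `--follow` option asks Git to continue the history across renames of a single file, which corresponds to the rename/move detection mentioned on the slide.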
Visualization of repository history
•tree: directory
•white node: method
 [Figure: left, Git (file histories); right, Historage (method histories)]
Mining Historage
 [Figure: a sequence of commits (..., n-3, n-2, n-1, n, n+1, n+2, n+3, ...), each made up of individual methods; commit n is annotated with its commit message ("Fix bug #32528"), its date (a July 2007 calendar), and its code delta]
Study
           Comparison
              Prediction level: package, file, and method
              The same metrics and the same prediction algorithm
              (random forest)
              Buggy modules: identified with the SZZ algorithm[2]

           Evaluation
              10-fold cross validation
              Effort-based evaluation
[2] Sliwerski et al., “When do changes induce fixes?” MSR ’05.
Target
Project          Period   # of commits
Xpand            2y6m      1,038
WTP Incubator    2y8m      1,133
Ant              11y7m     2,590
Lucene/Solr      1y6m      3,485
OpenJPA          5y4m      4,180
Cassandra        2y6m      4,423
ECF              6y6m      9,748
Wicket           7y       15,033
Collected Metrics
LOC                        Lines of code
Add/DelLOC                 Added / deleted LOC

Chg/FixChgNum              # of changes / bug-fix changes
PastBugNum                 # of fixed bug IDs
Period                     Days in existence
BugIntroNum                # of bug-introducing changes
LogCoupNum                 # of logical coupling changes
Avg/Max/MinInterval        Avg / max / min change interval
HCM                        Process complexity metric

DevTotal/Major/Minor       # of total / major / minor developers
Ownership                  Highest proportion of ownership
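Several of the metrics above can be derived from a module's list of change records. The sketch below is one possible reading, not the authors' definitions: it assumes each record has a Unix timestamp, an author, and a bug-fix flag, measures intervals in days, counts Period from the first change to a reference time, and takes Ownership as the largest share of changes made by one developer. The function name and record fields are hypothetical.

```python
from collections import Counter

SECONDS_PER_DAY = 86400


def history_metrics(changes, now):
    """Compute a subset of historical metrics for one module.

    changes: non-empty list of dicts with keys 'time' (Unix seconds),
             'author' (str), and 'is_fix' (bool).
    now: reference time in Unix seconds, used for Period.
    """
    times = sorted(c["time"] for c in changes)
    # Gaps between consecutive changes, in days.
    intervals = [(b - a) / SECONDS_PER_DAY for a, b in zip(times, times[1:])]
    authors = Counter(c["author"] for c in changes)
    return {
        "ChgNum": len(changes),
        "FixChgNum": sum(1 for c in changes if c["is_fix"]),
        "Period": (now - times[0]) / SECONDS_PER_DAY,
        "AvgInterval": sum(intervals) / len(intervals) if intervals else 0.0,
        "MaxInterval": max(intervals, default=0.0),
        "MinInterval": min(intervals, default=0.0),
        "DevTotal": len(authors),
        # Assumption: ownership is measured by number of changes.
        "Ownership": max(authors.values()) / len(changes),
    }
```

Metrics such as PastBugNum, BugIntroNum, LogCoupNum, and HCM need additional inputs (bug IDs, bug-introducing commits, co-changed modules) and are left out of this sketch.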
Effort-Based Evaluation
 [Figure: sample curve; horizontal axis: percent of LOC (0 to 100); vertical axis: percent of bugs found (0 to 100)]
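One point on such a curve can be computed as sketched below: modules are inspected in descending order of predicted risk until a given share of the total LOC is used up, and the share of bugs contained in the inspected modules is reported. This is a minimal sketch under assumptions that the slides do not confirm: the ranking key, the whole-module inspection rule, and the function name are all illustrative.

```python
def bugs_found_at_effort(modules, effort=0.2):
    """Fraction of bugs found when inspecting `effort` of the total LOC.

    modules: list of (predicted_risk, loc, bug_count) tuples.
    Modules are inspected whole, highest predicted risk first; inspection
    stops at the first module that does not fit in the remaining budget.
    """
    total_loc = sum(loc for _, loc, _ in modules)
    total_bugs = sum(bugs for _, _, bugs in modules)
    if total_loc == 0 or total_bugs == 0:
        return 0.0

    budget = effort * total_loc
    inspected_loc = 0
    found = 0
    for _, loc, bugs in sorted(modules, key=lambda m: m[0], reverse=True):
        if inspected_loc + loc > budget:
            break
        inspected_loc += loc
        found += bugs
    return found / total_bugs


if __name__ == "__main__":
    sample = [(0.9, 10, 2), (0.8, 5, 1), (0.1, 85, 1)]
    print(bugs_found_at_effort(sample, effort=0.2))
```

Evaluating the function over a range of effort values gives the full curve; smaller modules let the budget be spent more precisely, which is one reason the granularity of prediction matters for this measure.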
Result (ECF)
 [Figure: percent of bugs found (0 to 100) against percent of lines (0 to 100), with one curve each for Package, File, and Method]
1000 Times Run (ECF)
 [Figure: percentage of bugs found in 20% of LOC over 1,000 runs, for Package, File, and Method; vertical axis: percent of bugs found (0 to 80)]
1000 Times Run (All)
 [Figure: median percentage of bugs found in 20% of LOC, for Package, File, and Method, across Xpand, WTP Incubator, Ant, Lucene/Solr, OpenJPA, Cassandra, ECF, and Wicket; vertical axis: percent of bugs found (0 to 100)]
Why Is Method-Level Prediction Effective?
 [Figure, left: size in LOC (0 to 800) for Package, File, and Method]
 [Figure, right: # of methods in a file (0 to 60), All vs. Buggy]
 Even when models predict buggy modules correctly, most of the
 code in those packages or files is non-buggy.
Observations from Correlation Analysis
Are there differences between method-level and
package/file-level prediction models?

 Same
   Large changes tend to be buggy

   Frequent changes tend to be buggy

 Different
   Bugs do not occur repeatedly
   Organizational metrics may not contribute to
   method-level prediction
Threats to Validity

Targets are limited to open-source projects
written in Java

Buggy modules were identified without manual
inspection

Effort-based evaluation may not reflect actual
effort
Fine-Grained Study Is Big Data Analysis
 Need scalable techniques
   preparing fine-grained data
   (making Historage)
   analyzing histories
   (collecting metrics)
   building prediction models
 [Figure: # of modules in one snapshot (0 to 30,000), files vs. methods, for Xpand, Ant, ECF, and Wicket]
Conclusions
Summary

 Method-level bug prediction with well-known
 historical metrics
Future work

 Empirical studies of actual effort using method-
 level prediction

 More metrics and more projects (including
 industrial projects)

                      24

More Related Content

Similar to Bug Prediction Based on Fine-Grained Module Histories

Bára Bühnová: Naučte se taktizovat s pomocí bad code smells a quality tactics
Bára Bühnová: Naučte se taktizovat s pomocí bad code smells a quality tacticsBára Bühnová: Naučte se taktizovat s pomocí bad code smells a quality tactics
Bára Bühnová: Naučte se taktizovat s pomocí bad code smells a quality tacticsDevelcz
 
The Art Of Debugging
The Art Of DebuggingThe Art Of Debugging
The Art Of Debuggingsvilen.ivanov
 
EKON 23 Code_review_checklist
EKON 23 Code_review_checklistEKON 23 Code_review_checklist
EKON 23 Code_review_checklistMax Kleiner
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...Ali Ouni
 
Predicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsPredicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsThomas Zimmermann
 
MEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational ExperimentsMEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational ExperimentsGIScRG
 
Serenity Project: Security in Software Enginering
Serenity Project: Security in Software EngineringSerenity Project: Security in Software Enginering
Serenity Project: Security in Software EngineringFrancisco Sanchez Cid
 
Continuous Inspection - Uma abordagem efetiva para melhoria contínua da quali...
Continuous Inspection - Uma abordagem efetiva para melhoria contínua da quali...Continuous Inspection - Uma abordagem efetiva para melhoria contínua da quali...
Continuous Inspection - Uma abordagem efetiva para melhoria contínua da quali...Roberto Pepato
 
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...IRJET Journal
 
Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...eSAT Journals
 
Asufe juniors-training session2
Asufe juniors-training session2Asufe juniors-training session2
Asufe juniors-training session2Omar Ahmed
 
Systems Lifecycle workbook
Systems Lifecycle workbookSystems Lifecycle workbook
Systems Lifecycle workbookMISY
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icseSAIL_QU
 
Performing Large Scale Repeatable Software Engineering Studies
Performing Large Scale Repeatable Software Engineering StudiesPerforming Large Scale Repeatable Software Engineering Studies
Performing Large Scale Repeatable Software Engineering StudiesGeorgios Gousios
 
SourceWarp AST 2023.pdf
SourceWarp AST 2023.pdfSourceWarp AST 2023.pdf
SourceWarp AST 2023.pdfJulian Thome
 
Network Metrics and Measurements in the Era of the Digital Economies
Network Metrics and Measurements in the Era of the Digital EconomiesNetwork Metrics and Measurements in the Era of the Digital Economies
Network Metrics and Measurements in the Era of the Digital EconomiesPavel Loskot
 
PHX - Session #2 Test Driven Development: Improving .NET Application Performa...
PHX - Session #2 Test Driven Development: Improving .NET Application Performa...PHX - Session #2 Test Driven Development: Improving .NET Application Performa...
PHX - Session #2 Test Driven Development: Improving .NET Application Performa...Steve Lange
 
Matrioska tracking keypoints in real-time
Matrioska tracking keypoints in real-timeMatrioska tracking keypoints in real-time
Matrioska tracking keypoints in real-timepowerUserHallo
 

Similar to Bug Prediction Based on Fine-Grained Module Histories (20)

Bára Bühnová: Naučte se taktizovat s pomocí bad code smells a quality tactics
Bára Bühnová: Naučte se taktizovat s pomocí bad code smells a quality tacticsBára Bühnová: Naučte se taktizovat s pomocí bad code smells a quality tactics
Bára Bühnová: Naučte se taktizovat s pomocí bad code smells a quality tactics
 
The Art Of Debugging
The Art Of DebuggingThe Art Of Debugging
The Art Of Debugging
 
EKON 23 Code_review_checklist
EKON 23 Code_review_checklistEKON 23 Code_review_checklist
EKON 23 Code_review_checklist
 
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...The Use of Development History in Software Refactoring Using a Multi-Objectiv...
The Use of Development History in Software Refactoring Using a Multi-Objectiv...
 
Predicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode OperationsPredicting Method Crashes with Bytecode Operations
Predicting Method Crashes with Bytecode Operations
 
MEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational ExperimentsMEME – An Integrated Tool For Advanced Computational Experiments
MEME – An Integrated Tool For Advanced Computational Experiments
 
Serenity Project: Security in Software Enginering
Serenity Project: Security in Software EngineringSerenity Project: Security in Software Enginering
Serenity Project: Security in Software Enginering
 
Continuous Inspection - Uma abordagem efetiva para melhoria contínua da quali...
Continuous Inspection - Uma abordagem efetiva para melhoria contínua da quali...Continuous Inspection - Uma abordagem efetiva para melhoria contínua da quali...
Continuous Inspection - Uma abordagem efetiva para melhoria contínua da quali...
 
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...IRJET-  	  A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
IRJET- A Detailed Analysis on Windows Event Log Viewer for Faster Root Ca...
 
The art of project estimation
The art of project estimationThe art of project estimation
The art of project estimation
 
FASE08.ppt
FASE08.pptFASE08.ppt
FASE08.ppt
 
Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...Implementation of reducing features to improve code change based bug predicti...
Implementation of reducing features to improve code change based bug predicti...
 
Asufe juniors-training session2
Asufe juniors-training session2Asufe juniors-training session2
Asufe juniors-training session2
 
Systems Lifecycle workbook
Systems Lifecycle workbookSystems Lifecycle workbook
Systems Lifecycle workbook
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icse
 
Performing Large Scale Repeatable Software Engineering Studies
Performing Large Scale Repeatable Software Engineering StudiesPerforming Large Scale Repeatable Software Engineering Studies
Performing Large Scale Repeatable Software Engineering Studies
 
SourceWarp AST 2023.pdf
SourceWarp AST 2023.pdfSourceWarp AST 2023.pdf
SourceWarp AST 2023.pdf
 
Network Metrics and Measurements in the Era of the Digital Economies
Network Metrics and Measurements in the Era of the Digital EconomiesNetwork Metrics and Measurements in the Era of the Digital Economies
Network Metrics and Measurements in the Era of the Digital Economies
 
PHX - Session #2 Test Driven Development: Improving .NET Application Performa...
PHX - Session #2 Test Driven Development: Improving .NET Application Performa...PHX - Session #2 Test Driven Development: Improving .NET Application Performa...
PHX - Session #2 Test Driven Development: Improving .NET Application Performa...
 
Matrioska tracking keypoints in real-time
Matrioska tracking keypoints in real-timeMatrioska tracking keypoints in real-time
Matrioska tracking keypoints in real-time
 

More from Hideaki Hata

Same File, Different Changes: The Potential of Meta-Maintenance on GitHub
Same File, Different Changes: The Potential of Meta-Maintenance on GitHubSame File, Different Changes: The Potential of Meta-Maintenance on GitHub
Same File, Different Changes: The Potential of Meta-Maintenance on GitHubHideaki Hata
 
Are Donation Badges Appealing?: A Case Study of Developer Responses to Eclips...
Are Donation Badges Appealing?: A Case Study of Developer Responses to Eclips...Are Donation Badges Appealing?: A Case Study of Developer Responses to Eclips...
Are Donation Badges Appealing?: A Case Study of Developer Responses to Eclips...Hideaki Hata
 
9.6 million links in source code comments: purpose, evolution, and decay
9.6 million links in source code comments: purpose, evolution, and decay9.6 million links in source code comments: purpose, evolution, and decay
9.6 million links in source code comments: purpose, evolution, and decayHideaki Hata
 
Understanding the Heterogeneity of Contributors in Bug Bounty Programs
Understanding the Heterogeneity of Contributors in Bug Bounty Programs Understanding the Heterogeneity of Contributors in Bug Bounty Programs
Understanding the Heterogeneity of Contributors in Bug Bounty Programs Hideaki Hata
 
Using High-Rising Cities to Visualize Performance in Real-Time
Using High-Rising Cities to Visualize Performance in Real-TimeUsing High-Rising Cities to Visualize Performance in Real-Time
Using High-Rising Cities to Visualize Performance in Real-TimeHideaki Hata
 
Bug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram IdfBug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram IdfHideaki Hata
 

More from Hideaki Hata (6)

Same File, Different Changes: The Potential of Meta-Maintenance on GitHub
Same File, Different Changes: The Potential of Meta-Maintenance on GitHubSame File, Different Changes: The Potential of Meta-Maintenance on GitHub
Same File, Different Changes: The Potential of Meta-Maintenance on GitHub
 
Are Donation Badges Appealing?: A Case Study of Developer Responses to Eclips...
Are Donation Badges Appealing?: A Case Study of Developer Responses to Eclips...Are Donation Badges Appealing?: A Case Study of Developer Responses to Eclips...
Are Donation Badges Appealing?: A Case Study of Developer Responses to Eclips...
 
9.6 million links in source code comments: purpose, evolution, and decay
9.6 million links in source code comments: purpose, evolution, and decay9.6 million links in source code comments: purpose, evolution, and decay
9.6 million links in source code comments: purpose, evolution, and decay
 
Understanding the Heterogeneity of Contributors in Bug Bounty Programs
Understanding the Heterogeneity of Contributors in Bug Bounty Programs Understanding the Heterogeneity of Contributors in Bug Bounty Programs
Understanding the Heterogeneity of Contributors in Bug Bounty Programs
 
Using High-Rising Cities to Visualize Performance in Real-Time
Using High-Rising Cities to Visualize Performance in Real-TimeUsing High-Rising Cities to Visualize Performance in Real-Time
Using High-Rising Cities to Visualize Performance in Real-Time
 
Bug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram IdfBug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram Idf
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 

Recently uploaded (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts

Bug Prediction Based on Fine-Grained Module Histories

  • 1. Bug Prediction Based on Fine-Grained Module Histories. Hideaki Hata, Osamu Mizuno, Tohru Kikuno
  • 2. Overview. Background: Historical metrics are useful for bug prediction. Problem: For method-level prediction, it is difficult to collect historical metrics. Solution & Results: Historage, a fine-grained version control system; first study of method-level bug prediction with well-known historical metrics.
  • 3. Bug Prediction Papers. [Chart: number of bug prediction papers per year, 2000 to 2010, in TSE, EMSE, ICSE, ESEC/FSE, FSE, ICSM, MSR; y-axis 0 to 15]
  • 4. Historical Metrics. Code: code churn. Process: changes, past bugs, process complexity. Organization: developers, org structure, network, ownership. Geography: locations, distribution. Bug Prediction Survey: http://bpsurvey-hidehata.dotcloud.com/
  • 5. Mining Version Control Repository. [Figure: a sequence of commits ..., n-3, n-2, n-1, n, n+1, n+2, n+3, ...; each commit carries a commit message (e.g. "Fix bug #32528"), a date (calendar showing July 2007), and a code delta]
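The mining step on slide 5 can be illustrated with a minimal sketch that flags bug-fix commits from their messages. The keyword and ID patterns below are assumptions chosen to match the slide's example message "Fix bug #32528"; the slides do not state the exact rules the authors used.

```python
import re

# Assumed heuristics (not from the slides): a fix keyword plus a "#<digits>" bug ID.
FIX_KEYWORD = re.compile(r"\b(fix(es|ed)?|bug)\b", re.IGNORECASE)
BUG_ID = re.compile(r"#(\d+)")


def bug_ids_fixed(message):
    """Return the bug IDs referenced by a commit message that looks like a bug fix."""
    if not FIX_KEYWORD.search(message):
        return []
    return BUG_ID.findall(message)


# Hypothetical commit sequence, labeled like the slide's n-1, n, n+1.
commits = [
    ("n-1", "Add export option"),
    ("n", "Fix bug #32528"),
    ("n+1", "Refactor parser, see #100"),
]

# Keep only the commits whose message matches the bug-fix heuristic.
fix_commits = {cid: bug_ids_fixed(msg) for cid, msg in commits if bug_ids_fixed(msg)}
```

Counting such commits per module would give a change-based metric like FixChgNum, and counting distinct IDs would give one like PastBugNum, under these assumed matching rules.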
  • 6. What We Have Learned. Prediction accuracy: Historical metrics ≥ Static code metrics [Moser et al. ’08, Kamei et al. ’10]. Required effort: File-level ≤ Package-level [Kamei et al. ’10, Nguyen et al. ’10, Posnett et al. ’11].
  • 7. State of the Art. [Chart: number of papers (TSE, EMSE, ICSE, ESEC/FSE, FSE, ICSM, MSR) by prediction level (package, file, method), axis 0 to 15; annotations: cache model [Kim et al. ’07], spam filtering model [Mizuno et al. ’07]] No method-level prediction with well-known historical metrics.
  • 8. Method-Level Prediction. Requirement: method-level historical metrics. Problem: analysis of method histories is difficult.
  • 9. Difficulties. 1. Tracking methods is troublesome: matching methods must be found between sequential snapshots. 2. Method-level metadata are not easily available: metadata (who, when, how, etc.) are associated with files. [Figure: snapshots n-2, n-1, n]
  • 10. Historage. A fine-grained version control system [1] that is created on top of a Git repository, stores methods as files, and detects renames/moves with Git's own mechanism. [Figure: commits com1 and com2, each containing methods] [1] Hata et al., “Historage: Fine-Grained Version Control System for Java,” IWPSE-EVOL ’11. Tool: git2historage (https://github.com/hdrky/git2historage)
  • 11. Visualization of repository history. Legend: tree = directory, white node = method. [Figures: Git, file histories; Historage, method histories]
  • 12. Mining Historage. [Figure: a sequence of commits ..., n-3, n-2, n-1, n, n+1, n+2, n+3, ..., each containing methods; each commit carries a commit message (e.g. "Fix bug #32528"), a date (calendar showing July 2007), and a code delta]
  • 13. Study. Comparison: prediction levels (package, file, and method); the same metrics and the same prediction algorithm (random forest); buggy modules identified with the SZZ algorithm [2]. Evaluation: 10-fold cross validation; effort-based evaluation. [2] Sliwerski et al., “When do changes induce fixes?” MSR ’05.
  • 14. Target. Project (period, # of commits): Xpand (2y6m, 1,038); WTP Incubator (2y8m, 1,133); Ant (11y7m, 2,590); Lucene/Solr (1y6m, 3,485); OpenJPA (5y4m, 4,180); Cassandra (2y6m, 4,423); ECF (6y6m, 9,748); Wicket (7y, 15,033).
  • 15. Collected Metrics. LOC: lines of code. Add/DelLOC: added/deleted LOC. Chg/FixChgNum: # of changes/bug-fix changes. PastBugNum: # of fixed bug IDs. Period: existing days. BugIntroNum: # of bug-introducing changes. LogCoupNum: # of logical coupling changes. Avg/Max/MinInterval: average/maximum/minimum change interval. HCM: process complexity metric. DevTotal/Major/Minor: # of total/major/minor developers. Ownership: highest proportion of ownership.
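As an illustration of the developer-related metrics on slide 15, the sketch below derives DevTotal, DevMajor, DevMinor, and Ownership from the list of authors who changed a module. The slides do not define "major" and "minor"; the 5% threshold and the use of change counts as the ownership measure are assumptions made only for this example.

```python
from collections import Counter

# Assumption: a developer with less than 5% of a module's changes counts as "minor".
MINOR_THRESHOLD = 0.05


def developer_metrics(authors, minor_threshold=MINOR_THRESHOLD):
    """Compute developer-count and ownership metrics from one author name per change."""
    counts = Counter(authors)
    total_changes = sum(counts.values())
    if total_changes == 0:
        return {"DevTotal": 0, "DevMajor": 0, "DevMinor": 0, "Ownership": 0.0}
    # Share of the module's changes made by each developer.
    shares = [n / total_changes for n in counts.values()]
    minor = sum(1 for share in shares if share < minor_threshold)
    return {
        "DevTotal": len(counts),
        "DevMajor": len(counts) - minor,
        "DevMinor": minor,
        "Ownership": max(shares),  # highest proportion of ownership
    }


# Hypothetical change history of one method: 40 changes by three developers.
authors = ["alice"] * 30 + ["bob"] * 9 + ["carol"] * 1
metrics = developer_metrics(authors)
```

With method-level histories, the same function could be applied per method instead of per file, which is the granularity the talk argues for.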
  • 16. Effort-Based Evaluation. [Chart: sample curve of percent of bugs found (y-axis, 0 to 100) against percent of LOC (x-axis, 0 to 100)]
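One point on the effort-based curve of slide 16 can be computed with a sketch like the following: modules are inspected in descending order of predicted risk until a fixed share of the total LOC is used, and the share of bugs found is reported. The ranking key and the rule of stopping at the first module that exceeds the budget are assumptions; the slides do not specify them.

```python
def bugs_found_at_effort(modules, effort_fraction=0.2):
    """Return the fraction of bugs found within a LOC budget.

    modules: list of (predicted_risk, loc, bug_count) tuples.
    Modules are inspected in descending order of predicted risk (an assumed
    ranking) until the next module would exceed effort_fraction of total LOC.
    """
    total_loc = sum(loc for _, loc, _ in modules)
    total_bugs = sum(bugs for _, _, bugs in modules)
    if total_loc == 0 or total_bugs == 0:
        return 0.0
    budget = effort_fraction * total_loc
    inspected_loc = 0
    found = 0
    for _, loc, bugs in sorted(modules, key=lambda m: m[0], reverse=True):
        if inspected_loc + loc > budget:
            break  # the next module does not fit in the remaining effort
        inspected_loc += loc
        found += bugs
    return found / total_bugs


# Hypothetical modules: (predicted risk, LOC, number of bugs).
modules = [(0.9, 10, 2), (0.8, 10, 1), (0.4, 50, 1), (0.1, 30, 0)]
share_at_20 = bugs_found_at_effort(modules, effort_fraction=0.2)
share_at_100 = bugs_found_at_effort(modules, effort_fraction=1.0)
```

Evaluating the function over a range of effort fractions would trace a curve of the kind the slide plots, and the 20% point corresponds to the "bugs found in 20% LOC" figures reported on later slides.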
  • 17. Result (ECF). [Chart: percent of bugs found (y-axis, 0 to 100) against percent of lines (x-axis, 0 to 100) for package-, file-, and method-level prediction]
  • 18. 1000 Times Run (ECF). [Chart: percentages of bugs found in 20% LOC over 1,000 runs, for package, file, and method levels; y-axis percent of bugs found, 0 to 80]
  • 19. 1000 Times Run (All). [Chart: median values of the percentage of bugs found in 20% LOC, for package, file, and method levels, across Xpand, WTP Incubator, Ant, Lucene/Solr, OpenJPA, Cassandra, ECF, and Wicket; y-axis 0 to 100]
  • 20. Why Is Method-Level Prediction Effective? [Charts: size in LOC of all vs. buggy modules for package, file, and method levels; number of methods in a file] Although models predict buggy modules correctly, at the package or file level those modules consist largely of non-buggy code.
  • 21. Observations from Correlation Analysis. Are there differences between method-level and package/file-level prediction models? Same: large changes tend to be buggy; frequent changes tend to be buggy. Different: bugs do not occur repeatedly; organizational metrics may not contribute to method-level prediction.
  • 22. Threats to Validity. Targets are limited to open-source projects written in Java. Buggy modules were identified without manual inspection. Effort-based evaluation may not reflect actual effort.
  • 23. Fine-Grained Study Is Big Data Analysis. Need scalable techniques for: preparing fine-grained data (making Historage); analyzing histories (collecting metrics); building prediction models. [Chart: # of modules (files vs. methods) in one snapshot for Xpand, Ant, ECF, and Wicket; y-axis 0 to 30,000]
  • 24. Conclusions. Summary: method-level bug prediction with well-known historical metrics. Future work: empirical studies of actual effort using method-level prediction; more metrics and more projects (including industrial projects).