SlideShare a Scribd company logo
1 of 22
Download to read offline
Extracting Change patterns
          from CVS Repositories


  Salah Bouktif, Yann-Gaël Guéhéneuc, and Giuliano Antoniol

                         WCRE 2006
                        Benevento, Italy


GEODES



                                           © Bouktif, Guéhéneuc, and Antoniol, 2006
Summary

        Our work
        – Introduces the concept of change patterns
          to analyse evolution information
        – Defines and identifies the Synchrony
          change pattern using the DTW technique
        – Evaluates the extraction of the Synchrony
          change pattern wrt. a golden standard
        – Eases tracking changes in time- and
          geographically-distributed software
2/20
          development processes, e.g., open source
Context and Problem

        Time- and geographically-distributed
        software development is common
        Difficulty in sharing timely information
        about changes among developers

        Version control systems contains
        millions of fine-grained language-
        independent changes difficult to track
3/20
        and to organise
Definitions                          (1/2)

        Change patterns
          (Similar in the idea to design patterns)
        – Common and recurring changes during the
          evolution of a software
        – Reification of explicit and implicit
          relationships among software artefacts in
          the version control system


4/20
Definitions                              (2/2)

        Synchrony change patterns
        – Files that change at almost the same
          moments in time
        – Co-changes that happened in the past are
          likely to occur in the near future
        – Explicit and implicit dependencies among
          files remain stable over time

          “If this particular file changes, what other
5/20      files should change?”
Related Work

        Gall
        – Evolution patterns
        – Growth and change of behaviour
        German
        – Modification requests, interrelationships
        Beyer
        – Interpretable graph layouts
        Zimmermann
        – Co-changing fine grain entities using data mining
6/20    – Limited precision and recall
Difficulties

        Version configuration systems, i.e.,
        CVS, record change
         – Line of code
         – 1/1,000th of seconds
            Artefacts   Change


            a1
            a2
            a3
            a4

            a5
            a6

7/20                                                    t
                 t0     t1       t2   t3   t4   t5 t6
Solutions

        Choosing software artefacts
        – Files, for simplicity
        – Could be other artefacts, i.e., class…


        Computing appropriate window size

        Grouping co-changing files with DTW
8/20
Window Size
                Files     Change   w3
                                        w2
                                             w1
                f1
                f2
                f3
                f4

                f5
                f6

                     t0
                                                  t


            Different co-changes in time
            – Keeping all changes
            – Missing recent co-changes
            Most files are changed frequently
9/20        – Window of 5 to 7 changes
Grouping Co-changing Files(1/3)

         DTW: Dynamic Time Warping



         Such as




10/20
Grouping Co-changing Files(2/3)

         Example with Euclidian distance
         – Q=(1 2 3 3 2 2) and C = (1 1 3 3 4 2)




11/20
Grouping Co-changing Files(3/3)

                                           f1
                                              f2
                                          f3
                                                f6
                                                   f5
                                               f4

         Threshold
         – Between 43,200 sec. and 86,400 sec. ⇒ Balance
           between precision and recall
         – Above 86,400 sec. ⇒ Recall over precision
12/20
         – An artefact may belong to more than one group
Case Study                                             (1/8)
         Objectives
         – Compute internal precision and recall
            • k-fold cross validation
         – Compute external precision and recall
            • Comparison with a golden standard provided by an
              expert
            • The expert was not aware of the goal nor of the applied
              process and methodology
         – Assess predictive power
            • Prediction of changes with the golden standard
         – Study the scalability of the approach
            • k-fold cross validation on large program
13/20
Case Study                      (2/8)

         Measures
         – Precision
         – Recall
         Query = File to be changed
         Retrieved documents = Co-changing files
         Average and weighted precision/recall


14/20
Case Study                                 (3/8)

         Object of the case study
         – PADL
            • Meta-model to describe OO programs
         – 3 years of development
         – 91 files with a history of a least 5 changes




15/20
Case Study                                   (4/8)

         Internal precision and recall
         – Building disjoint test and training sets
            • Maximum length of 7
            • Testing sets a–b–c (c < 5, a > 1, b > 1, c > 1)
            • a, b, and c are numbers of changes in histories
              of length 5, 6, and 7 used to build test sets
            • 2-3-3 means test sets with the 2 most recent
              changes for history of length 5, 3 for length 6…


16/20
Case Study                       (5/8)

         External precision and recall
         – Golden standard from PADL
            • Conservative approach




17/20
Case Study                                 (6/8)

         Predictive power
         – Real application for prediction
            • Take the last 7 changes
            • Ignore the most recent change
            • Use the 6 changes to predict the recent change




18/20
Case Study                                      (7/8)

         Scalability of the approach
         – PADL
            • 91 files, 3 years
         – Mozilla Web browser
            •   Mirrored on July 15, 2005
            •   Versions of the next 5 days
            •   More than 20,134 files (in 2,480 directories)
            •   3 minutes

19/20
Case Study                               (8/8)

         Threat to the validity
         – Expert did not participate in the building of
           the DTW-based groupings
         – Use of same files to identify window size
           and precision/recall
         – Generalisation



20/20
Conclusion

         Change patterns
         Synchrony change patterns
         Dynamic Time Warping
         – Increased precision and recall over previous
           work, in particular Zimmermann
         – External: 78.98% precision, 76.89% recall
         – Prediction: 77.20% precision, 79.20% recall

21/20
Future Work

         Analysis of the strength of the groups
         Study of fluctuations in precision/recall
         Perform more external validations
         Apply clustering techniques
         Compare time intervals with windows
         based on the number of changes


22/20

More Related Content

Similar to WCRE06.ppt

Overview of DuraMat software tool development
Overview of DuraMat software tool developmentOverview of DuraMat software tool development
Overview of DuraMat software tool developmentAnubhav Jain
 
Investigating the Impact of Network Topology on the Processing Times of SDN C...
Investigating the Impact of Network Topology on the Processing Times of SDN C...Investigating the Impact of Network Topology on the Processing Times of SDN C...
Investigating the Impact of Network Topology on the Processing Times of SDN C...Steffen Gebert
 
Emerson Exchange 3D plots Process Analysis
Emerson Exchange 3D plots Process AnalysisEmerson Exchange 3D plots Process Analysis
Emerson Exchange 3D plots Process AnalysisEmerson Exchange
 
Towards Software Sustainability Guides for Industrial Software Systems
Towards Software Sustainability Guides for Industrial Software SystemsTowards Software Sustainability Guides for Industrial Software Systems
Towards Software Sustainability Guides for Industrial Software SystemsHeiko Koziolek
 
naveed-kamran-software-architecture-agile
naveed-kamran-software-architecture-agilenaveed-kamran-software-architecture-agile
naveed-kamran-software-architecture-agileNaveed Kamran
 
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Saliya Ekanayake
 
Configuration Management
Configuration ManagementConfiguration Management
Configuration Managementelliando dias
 
Seminar on Parallel and Concurrent Programming
Seminar on Parallel and Concurrent ProgrammingSeminar on Parallel and Concurrent Programming
Seminar on Parallel and Concurrent ProgrammingStefan Marr
 
MODELS 2019: Querying and annotating model histories with time-aware patterns
MODELS 2019: Querying and annotating model histories with time-aware patternsMODELS 2019: Querying and annotating model histories with time-aware patterns
MODELS 2019: Querying and annotating model histories with time-aware patternsAntonio García-Domínguez
 
Bio inspired use case variability modelling, ijsea
Bio inspired use case variability modelling, ijseaBio inspired use case variability modelling, ijsea
Bio inspired use case variability modelling, ijseaijseajournal
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilSpark Summit
 
Evaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsEvaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsIsrael Herraiz
 
Multi-component Modeling with Swift at Extreme Scale
Multi-component Modeling with Swift at Extreme ScaleMulti-component Modeling with Swift at Extreme Scale
Multi-component Modeling with Swift at Extreme ScaleDaniel S. Katz
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...EUDAT
 
Agile performance engineering with cloud 2016
Agile performance engineering with cloud   2016Agile performance engineering with cloud   2016
Agile performance engineering with cloud 2016Ken Chan
 

Similar to WCRE06.ppt (20)

Overview of DuraMat software tool development
Overview of DuraMat software tool developmentOverview of DuraMat software tool development
Overview of DuraMat software tool development
 
Investigating the Impact of Network Topology on the Processing Times of SDN C...
Investigating the Impact of Network Topology on the Processing Times of SDN C...Investigating the Impact of Network Topology on the Processing Times of SDN C...
Investigating the Impact of Network Topology on the Processing Times of SDN C...
 
Emerson Exchange 3D plots Process Analysis
Emerson Exchange 3D plots Process AnalysisEmerson Exchange 3D plots Process Analysis
Emerson Exchange 3D plots Process Analysis
 
3rd 3DDRESD: DReAMS
3rd 3DDRESD: DReAMS3rd 3DDRESD: DReAMS
3rd 3DDRESD: DReAMS
 
Towards Software Sustainability Guides for Industrial Software Systems
Towards Software Sustainability Guides for Industrial Software SystemsTowards Software Sustainability Guides for Industrial Software Systems
Towards Software Sustainability Guides for Industrial Software Systems
 
naveed-kamran-software-architecture-agile
naveed-kamran-software-architecture-agilenaveed-kamran-software-architecture-agile
naveed-kamran-software-architecture-agile
 
COMPSAC 2008 Presentation
COMPSAC 2008 PresentationCOMPSAC 2008 Presentation
COMPSAC 2008 Presentation
 
Be cse
Be cseBe cse
Be cse
 
Developing a Framework for File Format Migrations. Joey Heinen and Andrea Goe...
Developing a Framework for File Format Migrations. Joey Heinen and Andrea Goe...Developing a Framework for File Format Migrations. Joey Heinen and Andrea Goe...
Developing a Framework for File Format Migrations. Joey Heinen and Andrea Goe...
 
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...Java Thread and Process Performance for Parallel Machine Learning on Multicor...
Java Thread and Process Performance for Parallel Machine Learning on Multicor...
 
Configuration Management
Configuration ManagementConfiguration Management
Configuration Management
 
Seminar on Parallel and Concurrent Programming
Seminar on Parallel and Concurrent ProgrammingSeminar on Parallel and Concurrent Programming
Seminar on Parallel and Concurrent Programming
 
MODELS 2019: Querying and annotating model histories with time-aware patterns
MODELS 2019: Querying and annotating model histories with time-aware patternsMODELS 2019: Querying and annotating model histories with time-aware patterns
MODELS 2019: Querying and annotating model histories with time-aware patterns
 
Bio inspired use case variability modelling, ijsea
Bio inspired use case variability modelling, ijseaBio inspired use case variability modelling, ijsea
Bio inspired use case variability modelling, ijsea
 
Final_Latest
Final_LatestFinal_Latest
Final_Latest
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And Toil
 
Evaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasetsEvaluating the presence and impact of bias in bug-fix datasets
Evaluating the presence and impact of bias in bug-fix datasets
 
Multi-component Modeling with Swift at Extreme Scale
Multi-component Modeling with Swift at Extreme ScaleMulti-component Modeling with Swift at Extreme Scale
Multi-component Modeling with Swift at Extreme Scale
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
Agile performance engineering with cloud 2016
Agile performance engineering with cloud   2016Agile performance engineering with cloud   2016
Agile performance engineering with cloud 2016
 

More from Ptidej Team

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software MiniaturisationPtidej Team
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel BriandPtidej Team
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel AbdellatifPtidej Team
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh KermansaraviPtidej Team
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel GrichiPtidej Team
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano PolitowskiPtidej Team
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisisPtidej Team
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptPtidej Team
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptPtidej Team
 

More from Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

WCRE06.ppt

  • 1. Extracting Change patterns from CVS Repositories Salah Bouktif, Yann-Gaël Guéhéneuc, and Giuliano Antoniol WCRE 2006 Benevento, Italy GEODES © Bouktif, Guéhéneuc, and Antoniol, 2006
  • 2. Summary Our work – Introduces the concept of change patterns to analyse evolution information – Defines and identifies the Synchrony change pattern using the DTW technique – Evaluates the extraction of the Synchrony change pattern wrt. a golden standard – Eases tracking changes in time- and geographically-distributed software 2/20 development processes, e.g., open source
  • 3. Context and Problem Time- and geographically-distributed software development is common Difficulty in sharing timely information about changes among developers Version control systems contains millions of fine-grained language- independent changes difficult to track 3/20 and to organise
  • 4. Definitions (1/2) Change patterns (Similar in the idea to design patterns) – Common and recurring changes during the evolution of a software – Reification of explicit and implicit relationships among software artefacts in the version control system 4/20
  • 5. Definitions (2/2) Synchrony change patterns – Files that change at almost the same moments in time – Co-changes that happened in the past are likely to occur in the near future – Explicit and implicit dependencies among files remain stable over time “If this particular file changes, what other 5/20 files should change?”
  • 6. Related Work Gall – Evolution patterns – Growth and change of behaviour German – Modification requests, interrelationships Beyer – Interpretable graph layouts Zimmermann – Co-changing fine grain entities using data mining 6/20 – Limited precision and recall
  • 7. Difficulties Version configuration systems, i.e., CVS, record change – Line of code – 1/1,000th of seconds Artefacts Change a1 a2 a3 a4 a5 a6 7/20 t t0 t1 t2 t3 t4 t5 t6
  • 8. Solutions Choosing software artefacts – Files, for simplicity – Could be other artefacts, i.e., class… Computing appropriate window size Grouping co-changing files with DTW 8/20
  • 9. Window Size Files Change w3 w2 w1 f1 f2 f3 f4 f5 f6 t0 t Different co-changes in time – Keeping all changes – Missing recent co-changes Most files are changed frequently 9/20 – Window of 5 to 7 changes
  • 10. Grouping Co-changing Files(1/3) DTW: Dynamic Time Warping Such as 10/20
  • 11. Grouping Co-changing Files(2/3) Example with Euclidian distance – Q=(1 2 3 3 2 2) and C = (1 1 3 3 4 2) 11/20
  • 12. Grouping Co-changing Files(3/3) f1 f2 f3 f6 f5 f4 Threshold – Between 43,200 sec. and 86,400 sec. ⇒ Balance between precision and recall – Above 86,400 sec. ⇒ Recall over precision 12/20 – An artefact may belong to more than one group
  • 13. Case Study (1/8) Objectives – Compute internal precision and recall • k-fold cross validation – Compute external precision and recall • Comparison with a golden standard provided by an expert • The expert was not aware of the goal nor of the applied process and methodology – Assess predictive power • Prediction of changes with the golden standard – Study the scalability of the approach • k-fold cross validation on large program 13/20
  • 14. Case Study (2/8) Measures – Precision – Recall Query = File to be changed Retrieved documents = Co-changing files Average and weighted precision/recall 14/20
  • 15. Case Study (3/8) Object of the case study – PADL • Meta-model to describe OO programs – 3 years of development – 91 files with a history of a least 5 changes 15/20
  • 16. Case Study (4/8) Internal precision and recall – Building disjoint test and training sets • Maximum length of 7 • Testing sets a–b–c (c < 5, a > 1, b > 1, c > 1) • a, b, and c are numbers of changes in histories of length 5, 6, and 7 used to build test sets • 2-3-3 means test sets with the 2 most recent changes for history of length 5, 3 for length 6… 16/20
  • 17. Case Study (5/8) External precision and recall – Golden standard from PADL • Conservative approach 17/20
  • 18. Case Study (6/8) Predictive power – Real application for prediction • Take the last 7 changes • Ignore the most recent change • Use the 6 changes to predict the recent change 18/20
  • 19. Case Study (7/8) Scalability of the approach – PADL • 91 files, 3 years – Mozilla Web browser • Mirrored on July 15, 2005 • Versions of the next 5 days • More than 20,134 files (in 2,480 directories) • 3 minutes 19/20
  • 20. Case Study (8/8) Threat to the validity – Expert did not participate in the building of the DTW-based groupings – Use of same files to identify window size and precision/recall – Generalisation 20/20
  • 21. Conclusion Change patterns Synchrony change patterns Dynamic Time Warping – Increased precision and recall over previous work, in particular Zimmermann – External: 78.98% precision, 76.89% recall – Prediction: 77.20% precision, 79.20% recall 21/20
  • 22. Future Work Analysis of the strength of the groups Study of fluctuations in precision/recall Perform more external validations Apply clustering techniques Compare time intervals with windows based on the number of changes 22/20