Salah Bouktif, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
© Bouktif, Guéhéneuc, and Antoniol, 2006
GEODES
Extracting Change patterns
from CVS Repositories
WCRE 2006
Benevento, Italy
2/20
Summary
Our work
– Introduces the concept of change patterns
to analyse evolution information
– Defines and identifies the Synchrony
change pattern using the DTW technique
– Evaluates the extraction of the Synchrony
change pattern with respect to a golden standard
– Eases tracking changes in time- and
geographically-distributed software
development processes, e.g., open source
3/20
Context and Problem
Time- and geographically-distributed
software development is common
Difficulty in sharing timely information
about changes among developers
Version control systems contain
millions of fine-grained, language-independent
changes that are difficult to track
and to organise
4/20
Definitions (1/2)
Change patterns
(Similar in spirit to design patterns)
– Common and recurring changes during the
evolution of a software system
– Reification of explicit and implicit
relationships among software artefacts in
the version control system
5/20
Definitions (2/2)
Synchrony change patterns
– Files that change at almost the same
moments in time
– Co-changes that happened in the past are
likely to occur in the near future
– Explicit and implicit dependencies among
files remain stable over time
“If this particular file changes, what other
files should change?”
6/20
Related Work
Gall
– Evolution patterns
– Growth and change of behaviour
German
– Modification requests, interrelationships
Beyer
– Interpretable graph layouts
Zimmermann
– Co-changing fine grain entities using data mining
– Limited precision and recall
7/20
Difficulties
Version control systems, e.g.,
CVS, record changes at a fine granularity
– Per line of code
– With timestamps of 1/1,000th of a second
[Figure: changes to artefacts a1 to a6 over time t, with time points t0 to t6; axis labels: Change, Artefacts]
8/20
Solutions
Choosing software artefacts
– Files, for simplicity
– Could be other artefacts, e.g., classes…
Computing appropriate window size
Grouping co-changing files with DTW
9/20
Window Size
[Figure: changes to files f1 to f6 over time t, starting at t0, with candidate windows w1, w2, and w3; axis labels: Change, Files]
Different co-changes in time
– Keeping all changes
– Missing recent co-changes
Most files are changed frequently
– Window of 5 to 7 changes
10/20
Grouping Co-changing Files (1/3)
DTW: Dynamic Time Warping
Such as
11/20
Grouping Co-changing Files (2/3)
Example with Euclidean distance
– Q=(1 2 3 3 2 2) and C = (1 1 3 3 4 2)
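The example sequences above can be run through a minimal DTW sketch. The slides do not give the exact formulation, so this assumes the textbook recurrence with the absolute difference |q_i - c_j| as the local (one-dimensional Euclidean) cost; the function name `dtw_distance` is hypothetical.

```python
from typing import Sequence


def dtw_distance(q: Sequence[float], c: Sequence[float]) -> float:
    """Cumulative DTW cost between q and c, with |q_i - c_j| as local cost.

    Assumption: classic recurrence with unit steps (match, insert, delete)
    and no warping-window constraint.
    """
    n, m = len(q), len(c)
    inf = float("inf")
    # d[i][j] holds the minimal cost of aligning q[:i] with c[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(q[i - 1] - c[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


Q = [1, 2, 3, 3, 2, 2]
C = [1, 1, 3, 3, 4, 2]

# Lock-step comparison (no warping) for reference.
lock_step = sum(abs(a - b) for a, b in zip(Q, C))

print(dtw_distance(Q, C), lock_step)  # prints "2.0 3"
```

Under these assumptions, warping lowers the cost from 3 (point-by-point comparison) to 2, because elements slightly shifted in time can be aligned with each other.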
12/20
Grouping Co-changing Files (3/3)
Threshold
– Between 43,200 sec. and 86,400 sec. ⇒ Balance
between precision and recall
– Above 86,400 sec. ⇒ Recall over precision
– An artefact may belong to more than one group
[Figure: grouping of files f1 to f6]
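One possible way to turn the threshold into groups is sketched below. The slides do not describe the grouping procedure, so everything here is an assumption: each file's change history is a list of timestamps in seconds, the threshold applies to the cumulative DTW cost, and each file gets one group made of itself plus every file within the threshold. The names `dtw_distance` and `group_co_changing` are hypothetical.

```python
from typing import Dict, List, Set


def dtw_distance(q: List[float], c: List[float]) -> float:
    """Textbook DTW with |q_i - c_j| as local cost (assumed formulation)."""
    inf = float("inf")
    d = [[inf] * (len(c) + 1) for _ in range(len(q) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(q) + 1):
        for j in range(1, len(c) + 1):
            cost = abs(q[i - 1] - c[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(q)][len(c)]


def group_co_changing(
    histories: Dict[str, List[float]], threshold: float
) -> Dict[str, Set[str]]:
    """Build one group per file: the file plus all files within the threshold.

    Groups may overlap, so a file can belong to more than one group.
    """
    groups: Dict[str, Set[str]] = {}
    for name, times in histories.items():
        groups[name] = {
            other
            for other, other_times in histories.items()
            if other == name or dtw_distance(times, other_times) <= threshold
        }
    return groups


# Illustrative change times in seconds (made-up data, not from the study).
histories = {
    "f1": [0, 100_000, 200_000],
    "f2": [3_600, 103_600, 203_600],   # always one hour after f1
    "f3": [50_000, 400_000, 900_000],  # unrelated rhythm
}
groups = group_co_changing(histories, threshold=43_200)
print(groups)
```

With the 43,200-second threshold, f1 and f2 end up in each other's groups while f3 stays alone; raising the threshold would admit more files and trade precision for recall.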
13/20
Case Study (1/8)
Objectives
– Compute internal precision and recall
• k-fold cross validation
– Compute external precision and recall
• Comparison with a golden standard provided by an
expert
• The expert was not aware of the goal nor of the applied
process and methodology
– Assess predictive power
• Prediction of changes with the golden standard
– Study the scalability of the approach
• k-fold cross validation on a large program
14/20
Case Study (2/8)
Measures
– Precision
– Recall
Query = File to be changed
Retrieved documents = Co-changing files
Average and weighted precision/recall
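The two measures can be illustrated with a short sketch in the information-retrieval sense used above: the query is a file to be changed, the retrieved set is the co-changing files proposed for it, and the relevant set is the files that actually changed. The function name and the file names are hypothetical, and the weighting scheme mentioned on the slide is not reproduced because it is not specified.

```python
from typing import List, Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 if empty)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average(values: List[float]) -> float:
    """Plain (unweighted) mean over queries."""
    return sum(values) / len(values) if values else 0.0


# Query: "A.java" is about to change (illustrative data only).
retrieved = {"B.java", "C.java", "D.java", "E.java"}  # proposed co-changing files
relevant = {"B.java", "C.java", "F.java"}             # files that really changed

p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Here two of the four proposed files are correct (precision 0.5) and two of the three files that changed were found (recall 2/3).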
15/20
Case Study (3/8)
Object of the case study
– PADL
• Meta-model to describe OO programs
– 3 years of development
– 91 files with a history of at least 5 changes
16/20
Case Study (4/8)
Internal precision and recall
– Building disjoint test and training sets
• Maximum length of 7
• Testing sets a–b–c (c < 5, a > 1, b > 1, c > 1)
• a, b, and c are numbers of changes in histories
of length 5, 6, and 7 used to build test sets
• 2-3-3 means test sets with the 2 most recent
changes for history of length 5, 3 for length 6…
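The a–b–c scheme can be read as a rule for splitting one file's history into training and test parts. The sketch below is one interpretation, assuming that histories are ordered from oldest to newest and that longer histories are cut to their 7 most recent changes; `split_history` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def split_history(
    history: Sequence[int], scheme: Tuple[int, int, int] = (2, 3, 3)
) -> Tuple[List[int], List[int]]:
    """Split a change history into (training, test) parts.

    scheme = (a, b, c): number of most recent changes held out for testing
    when the history has length 5, 6, or 7 respectively.
    """
    recent = list(history[-7:])  # maximum history length of 7
    if len(recent) < 5:
        raise ValueError("history must contain at least 5 changes")
    n_test = scheme[len(recent) - 5]
    return recent[:-n_test], recent[-n_test:]


# Change identifiers, oldest first (illustrative).
print(split_history([1, 2, 3, 4, 5]))           # ([1, 2, 3], [4, 5])
print(split_history([1, 2, 3, 4, 5, 6, 7, 8]))  # ([2, 3, 4, 5], [6, 7, 8])
```

Holding out only the most recent changes keeps training and test sets disjoint, so the groupings are always evaluated on changes they have not seen.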
17/20
Case Study (5/8)
External precision and recall
– Golden standard from PADL
• Conservative approach
18/20
Case Study (6/8)
Predictive power
– Real application for prediction
• Take the last 7 changes
• Ignore the most recent change
• Use the 6 changes to predict the recent change
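The hold-out step described above amounts to a simple slice of each file's history. This is a sketch only, assuming histories are ordered from oldest to newest; `prediction_split` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def prediction_split(history: Sequence[str]) -> Tuple[List[str], str]:
    """Return (6 changes used for prediction, most recent change to predict)."""
    window = list(history[-7:])  # take the last 7 changes
    if len(window) < 7:
        raise ValueError("history must contain at least 7 changes")
    return window[:-1], window[-1]


known, target = prediction_split(["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"])
print(known, target)
```

The groupings are then built from `known` only, and the prediction is checked against `target`.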
19/20
Case Study (7/8)
Scalability of the approach
– PADL
• 91 files, 3 years
– Mozilla Web browser
• Mirrored on July 15, 2005
• Versions of the next 5 days
• More than 20,134 files (in 2,480 directories)
• 3 minutes
20/20
Case Study (8/8)
Threats to validity
– Expert did not participate in the building of
the DTW-based groupings
– Use of same files to identify window size
and precision/recall
– Generalisation
21/20
Conclusion
Change patterns
Synchrony change patterns
Dynamic Time Warping
– Increased precision and recall over previous
work, in particular Zimmermann
– External: 78.98% precision, 76.89% recall
– Prediction: 77.20% precision, 79.20% recall
22/20
Future Work
Analysis of the strength of the groups
Study of fluctuations in precision/recall
Perform more external validations
Apply clustering techniques
Compare time intervals with windows
based on the number of changes
