Late Propagation in Software Clones
Liliane Barbour, Foutse Khomh, and Ying Zou
Late Propagation (LP)
• Definition: An inconsistent change that diverges a
  clone pair, later followed by a consistent,
  re-synchronizing change.
• It can be risky because failure to propagate changes
  between clones in a clone pair can lead to faults.
• In our work, we found that 8-21% of genealogies
  contain a late propagation.

LP With Propagation Example from ArgoUML
//Clone A, Revision 595
addField(new UMLComboBox(typeModel),1,0,0);

//Clone B, Revision 595
addField(new UMLComboBox(classifierModel),2,0,0);

//Diverging Change: Clone A, Revision 602
addField(new UMLComboBoxNavigator(this,"NavClass",
         new UMLComboBox(typeModel)),1,0,0);

//Re-synchronizing Change: Clone B, Revision 604
addField(new UMLComboBoxNavigator(this,"NavClass",
         new UMLComboBox(classifierModel)),2,0,0);

[Figure: timeline of Clone A and Clone B at Revision 595, Revision 602 (diverging change), and Revision 604 (re-synchronizing change)]

LP Without Propagation Example from Ant
//Clone A, Revision 270250
if( destFile == null )
{
   destFile = new File(destDir,file.getName());
}

//Clone B, Revision 270250
if (destFile == null ) {
   destFile = new File(destDir,file.getName());
}

//Diverging Change: Clone A, Revision 270264
if ( m_destFile == null )
{
   m_destFile = new File(m_destDir,m_file.getName());
}

//Re-synchronizing Change: Clone A, Revision 271109
if ( destFile == null ) {
   destFile = new File(destDir,file.getName());
}

[Figure: timeline of Clone A and Clone B at Revision 270250, Revision 270264 (diverging change), and Revision 271109 (re-synchronizing change)]

Types of Late Propagation
Propagation Category              LP Type  Modified During   Modified During the   Modified During
                                           Diverging Change  Period of Divergence  Re-synchronizing Change
Propagation Always Occurs         LP1      A                 A                     B
                                  LP2      A                 A and B               B
                                  LP3      A                 A                     A and B
Propagation May or May Not Occur  LP4      A                 A and B               A
                                  LP5      A                 A and B               A and B
                                  LP6      A and B           A and B               A or B
                                  LP7      A and B           A and B               A and B
Propagation Never Occurs          LP8      A                 A                     A

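The table above can be read as a lookup from "which clone was modified in each phase" to an LP type. The following is a minimal sketch of that lookup, not the authors' tooling: the class name `LpTypeClassifier`, the enum `Modified`, and the method `classify` are hypothetical names introduced here, and "A" is assumed to denote the clone touched by the diverging change.

```java
import java.util.Optional;

// Hypothetical helper: maps the three columns of the LP type table to a type label.
class LpTypeClassifier {
    /** Which clone(s) of the pair were modified in a phase. */
    enum Modified { A, B, BOTH }

    /** Returns the LP type for a combination listed in the table, or empty otherwise. */
    static Optional<String> classify(Modified diverging, Modified during, Modified resync) {
        if (diverging == Modified.A && during == Modified.A) {
            switch (resync) {
                case B:    return Optional.of("LP1");
                case BOTH: return Optional.of("LP3");
                case A:    return Optional.of("LP8");
            }
        }
        if (diverging == Modified.A && during == Modified.BOTH) {
            switch (resync) {
                case B:    return Optional.of("LP2");
                case A:    return Optional.of("LP4");
                case BOTH: return Optional.of("LP5");
            }
        }
        if (diverging == Modified.BOTH && during == Modified.BOTH) {
            // LP7 when both clones are re-synchronized together; LP6 when only one is ("A or B").
            return Optional.of(resync == Modified.BOTH ? "LP7" : "LP6");
        }
        return Optional.empty(); // combination not listed in the table
    }

    public static void main(String[] args) {
        System.out.println(classify(Modified.A, Modified.A, Modified.B));
        System.out.println(classify(Modified.BOTH, Modified.BOTH, Modified.A));
    }
}
```

Returning an empty `Optional` for unlisted combinations keeps the sketch faithful to the table rather than guessing a type for rows the slide does not define.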
Research Questions
RQ1: Are there different types of LP?

RQ2: Are some types of LP more fault-prone than
  others?

RQ3: Which type of LP experiences the highest
    proportion of faults?



Subject Systems
System    # LOC   # Revisions   # Gen (CCFinder)   # LP (CCFinder)   # Gen (Simian)   # LP (Simian)
ArgoUML   3.1M    18k           14k                1.1k              111              23
Ant       2.3M    1.0M          30k                4.7k              461              80

Our Approach




Mining the SVN
• Use J-Rex to mine the SVN repository
• Heuristics are used to identify the reason for each
  commit (Mockus et al., 2000)
• Snapshots of all revisions of each Java file are stored
  in an XML file
• Test files are removed

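As a rough illustration of what a commit-reason heuristic can look like, the sketch below flags a commit as fault-fixing when its message contains certain keywords. This is an assumption for illustration only: the slides do not list the actual heuristics, and the keyword set, the class name `CommitReasonHeuristic`, and the method `isFaultFix` are all hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical keyword-based classifier for commit messages.
class CommitReasonHeuristic {
    // Assumed keyword list; a real study would calibrate this against its own data.
    private static final Pattern FAULT_KEYWORDS = Pattern.compile(
            "\\b(fix(es|ed)?|bugs?|defects?|faults?)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns true if the commit message contains one of the assumed fault-fix keywords. */
    static boolean isFaultFix(String commitMessage) {
        return commitMessage != null && FAULT_KEYWORDS.matcher(commitMessage).find();
    }

    public static void main(String[] args) {
        System.out.println(isFaultFix("Fixed NPE when destFile is null"));
        System.out.println(isFaultFix("Add combo box navigator to property panel"));
    }
}
```

The word boundaries in the pattern keep words such as "prefix" from matching, which is one of the common pitfalls of plain substring checks.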
Clone Detection
• The contents of each method revision are extracted
  into individual files
• Clone detection is performed once on all snapshots
• Two existing clone detection tools are used
   – Simian (text-based) and CCFinder (token-based)

Building Clone Genealogies




• Build clone genealogies using the existing clone list
• Query the SVN using diff to track changes to each
  clone in a clone pair over time.
• If a change modifies one of the clones in a clone
  pair, query the clone list for a matching clone
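
Once a genealogy records whether the clone pair was consistent after each change, spotting a late propagation reduces to finding a "consistent, then inconsistent, then consistent again" pattern. The sketch below shows that check under a simplifying assumption: a genealogy is represented as a list of booleans (true meaning the two clones still match after that change). The class name `LatePropagationDetector` and its method are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical detector working on a simplified genealogy representation.
class LatePropagationDetector {
    /**
     * Each element tells whether the clone pair was consistent after one change.
     * A late propagation is a diverging change followed later by a re-synchronizing change.
     */
    static boolean containsLatePropagation(List<Boolean> consistentAfterChange) {
        boolean seenConsistent = false; // the pair was consistent at some earlier point
        boolean diverged = false;       // the pair diverged after having been consistent
        for (boolean consistent : consistentAfterChange) {
            if (consistent) {
                if (diverged) {
                    return true;        // re-synchronizing change found
                }
                seenConsistent = true;
            } else if (seenConsistent) {
                diverged = true;        // diverging change, or still in the period of divergence
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsLatePropagation(Arrays.asList(true, false, false, true)));
        System.out.println(containsLatePropagation(Arrays.asList(true, false, false)));
    }
}
```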
RQ1: Are there different types of LP?
[Figure: Breakdown of LP Type by System. Bar chart of Percentage of All LP Occurrences (0% to 80%) per LP type (LP1 to LP8), with series ArgoUML - Simian, ArgoUML - CCFinder, Ant - Simian, and Ant - CCFinder]

There is representation from multiple types of LP
and across all categories of LP.

RQ2: Are some types of LP more fault-prone than others?
Part 1: Is Late Propagation fault-prone?
Part 2: Are specific types of late propagation more fault-prone?

Part 1: Is Late Propagation Fault-prone?
[Figure: LP vs. Non-LP Odds Ratios. Bar chart of odds ratio (0 to 4) for Ant - Simian, ArgoUML - CCFinder, and Ant - CCFinder]
Note: ArgoUML – Simian is omitted because it is not statistically significant

In all significant cases, the odds ratio is greater than 1.
Therefore, LP genealogies are more fault-prone than
non-LP genealogies.

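For readers unfamiliar with the measure, an odds ratio compares the odds of a fault in one group (here, LP genealogies) with the odds in another (non-LP genealogies); a value above 1 means the first group has higher odds. The sketch below uses the standard 2x2 contingency-table definition. The counts in `main` are made up for illustration and are not taken from the study; the class and method names are hypothetical.

```java
// Hypothetical helper computing an odds ratio from a 2x2 contingency table.
class OddsRatio {
    /**
     * @param lpFaulty     LP genealogies with at least one fault
     * @param lpClean      LP genealogies without faults
     * @param nonLpFaulty  non-LP genealogies with at least one fault
     * @param nonLpClean   non-LP genealogies without faults
     * @return (lpFaulty / lpClean) / (nonLpFaulty / nonLpClean)
     */
    static double oddsRatio(long lpFaulty, long lpClean, long nonLpFaulty, long nonLpClean) {
        if (lpClean == 0 || nonLpFaulty == 0) {
            throw new IllegalArgumentException("odds ratio is undefined when a denominator count is zero");
        }
        return ((double) lpFaulty * nonLpClean) / ((double) lpClean * nonLpFaulty);
    }

    public static void main(String[] args) {
        // Illustrative counts only, not data from the slides.
        double or = oddsRatio(30, 70, 20, 180);
        System.out.println(or > 1.0 ? "LP group has higher odds of faults" : "LP group does not have higher odds");
    }
}
```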
Part 2: Are specific types of late propagation more fault-prone?
[Figure: Odds Ratios Between Each LP Type and Non-LP Genealogies. Bar chart of odds ratio (0 to 16) per LP type (LP1 to LP8), with series Ant - Simian, ArgoUML - CCFinder, and Ant - CCFinder]
Note: ArgoUML – Simian is omitted because it is not statistically significant

RQ2 Observations
• In general, some LP types are not more fault-prone
  than non-LP genealogies (i.e. odds ratio < 1)
• Some types that make up a small proportion of LP
  instances have a very high odds ratio
• LP7 and LP8 occur frequently but have low odds
  ratios.
Each type of LP has a different level of fault-proneness.



RQ3: Which type of LP experiences the highest proportion of faults?
[Figure: Percentage of Fault Occurrences Broken Down by LP Type. Bar chart of Percentage of Fault Occurrences (0% to 80%) per LP type (LP1 to LP8), with series Ant - Simian, ArgoUML - CCFinder, and Ant - CCFinder]
Note: ArgoUML – Simian is omitted because it is not statistically significant

RQ3 Observations
• LP7 and LP8 contribute a large proportion of the
  faults but have lower odds ratios (RQ2)
   – When faults occur, they occur in large numbers
• Overall, LP7 and LP8 are the most dangerous; the
  fault-proneness of the other types is system-dependent.

The proportion of faults is different for each LP type.

Conclusion
• In general, LP genealogies are more fault-prone than
  non-LP genealogies
• LP7 and LP8 are the riskiest, in terms of their
  fault-proneness and magnitude of faults.
   – LP8 contains no propagation of changes
   – LP7 may or may not contain any propagation of
     changes
• Fault-proneness and fault occurrence depend on the
  LP type and on the system.
