SlideShare a Scribd company logo
1 of 24
Partitioning Composite Code Changes
to Facilitate Code Review
Yida Tao and Sunghun Kim
The Hong Kong University of Science and Technology
Atomic Code Change
Fixed bug #123
Atomic Code Change
Fixed bug #12, #34, #56
Removed duplicate code
Add a feature
Javadoc updated
Composite Code Change
Fixed bug #123
Atomic Code Change
Fixed bug #12, #34, #56
Removed duplicate code
Add a feature
Javadoc updated
Composite Code Change
Fixed bug #123
Difficult to review
Likely be rejected
Research Questions
• RQ1: Are composite code changes prevalent?
• RQ2: Can we propose an approach to improve the semantic atomicity
of composite code changes?
• RQ3: Can our approach help developers better review composite code
changes?
RQ1: Occurrence of composite code changes
• Data source
• 4 open-source Java projects
• Revisions that changed >= 2 lines of code
• Commit logs and source code were manually inspected
6
Time period Revisions Avg. cLOC Avg. files
Ant 2010/04/27 -- 2012/03/05 137 26.1 2.0
Commons Math 2011/11/28 -- 2012/04/12 107 84.7 3.5
Xerces 2008/11/03 -- 2012/03/13 116 63.6 3.0
JFreeChart 2008/07/02 -- 2010/03/30 93 144.9 4.1
Total revisions 453
82%
13%
5%
Xerces
7
8% - 29% revisions address multiple issues
RQ1: Occurrence of composite code changes
92%
7%
1%
Ant
82%
12%
6%
Commons Math
71%
18%
11%
JFreeChart
1 issue
2 issues
> 2 issues
Approach
A set of changed
statements
Partition of the change
(A subset is a change-slice)
A composite
code change
Identify
related
statements
8
Approach Formatting
Dependency
Similarity
9
A set of changed
statements
A composite
code change
Partition of the change
(A subset is a change-slice)
Approach
Partition of the patch
Unix diff
(text differencing)
ChangeDistiller*
(AST differencing)
*http://www.ifi.uzh.ch/seal/research
/tools/changeDistiller.html
Formatting
changes
10
Formatting
Dependency
Similarity
A set of changed
statements
A composite
code change
Approach
Partition of the patch
IBM T.J. Watson
Libraries for Analysis
(WALA)*
Inter-procedural
Backward static slicing
*http://wala.sourceforge.net/wiki/in
dex.php/Main_Page
11
Formatting
Dependency
Similarity
A set of changed
statements
A composite
code change
Approach
Partition of the patch“protect array entries against
corruption by returning a clone”
Same change type
Similar delta
12
Formatting
Dependency
Similarity
A set of changed
statements
A composite
code change
P
P’
Evaluation
• 78 composite code changes from the previously inspected data
• 3 human evaluators establish manual partitions for these changes
• Automatic partition results are compared to manual partitions
• Considered acceptableif it exactly matched the manual partition
82%
13%
5%
Xerces
92%
7%
1%
Ant
82%
12%
6%
Commons Math
71%
18%
11%
JFreeChart
1 issue
2 issues
> 2 issues
Evaluation
• 78 composite code changes from the previously inspected data
• 3 human evaluators establish manual partitions for these changes
• Automatic partition results are compared to manual partitions
• Considered acceptableif it exactly matched the manual partition
82%
13%
5%
Xerces
92%
7%
1%
Ant
82%
12%
6%
Commons Math
71%
18%
11%
JFreeChart
1 issue
2 issues
> 2 issues
Evaluation
• 78 composite code changes from the previously inspected data
• 3 human evaluators establish manual partitions for these changes
• Automatic partition results are compared to manual partitions
• Considered acceptableif it exactly matched the manual partition
Acceptable # / Total #
Ant 8 / 11
Commons Math 10 / 19
Xerces 16 / 21
JFreeChart 20 / 27
54 / 78 (69%)
Ant revision 943068 (24 changed LOC)
“Wrong assignment after I renamed the parameter. Unfortunately
there doesn’t seem to be a testcase that catches the error.”
Later fixed in revision 943070
17
One of the two change-slices after partitioning
Preliminary User Study
• RQ3
• Can our automatic partition help developers better review composite changes?
• Participants
• 18 CS graduate students
• Task
• Participants review 12 composite code changes
• Answer a series of code review questions [1], e.g.,
• “What is the consequence of removing the schemaType field?”
• “What do changes in these files have in common?”
18
[1]“Questions programmers ask during software evolution tasks” Sillito et al. FSE 2006
Experimental Settings
• Treatments
• Control group: review code changes by file
• Experimental group: review code changes by partition
19
Results
By file By partition By file By partition
20
p = 0.01 p = 0.81
Formatting
Dependency
Similarity
Composite
Code Changes
Partition of
the change
8% - 29% 69%
By file By partition
Discussion
•Impact of unsatisfactory change partitions
•Balancing between partition costs and benefits
Related Work
• Helping developers help themselves: Automatic decomposition of
code review changesets. Barnett et al. ICSE 2015
• The industrial perspective of composite code changes
• The impact of tangled code changes. Kim Herzig and Andreas Zeller.
MSR 2013
• Filtering Noise in Mixed-Purpose Fixing Commits to Improve Defect
Prediction and Localization. Nguyen et al. ISSRE 2013
• How composite code changes affect defect prediction
Acknowledgment
• We thank all the students that participated in our user study.
• We especially thank Tao He and Hai Wan for their kind help of
arranging the user study.
• We also thank Ananya Kanjilal for her comments on the paper draft.

More Related Content

What's hot

The Impact of Test Ownership and Team Structure on the Reliability and Effect...
The Impact of Test Ownership and Team Structure on the Reliability and Effect...The Impact of Test Ownership and Team Structure on the Reliability and Effect...
The Impact of Test Ownership and Team Structure on the Reliability and Effect...Kim Herzig
 
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug PredictionIt's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug Predictionsjust
 
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...Ali Ouni
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsSAIL_QU
 
Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...The University of Adelaide
 
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect PredictionSung Kim
 
Detecting Interaction Coupling from Task Interaction Histories
Detecting Interaction Coupling from Task Interaction HistoriesDetecting Interaction Coupling from Task Interaction Histories
Detecting Interaction Coupling from Task Interaction HistoriesSAIL_QU
 
A Mono- and Multi-objective Approach for Recommending Software Refactoring
A Mono- and Multi-objective Approach for Recommending Software RefactoringA Mono- and Multi-objective Approach for Recommending Software Refactoring
A Mono- and Multi-objective Approach for Recommending Software RefactoringAli Ouni
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSung Kim
 
Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsThe University of Adelaide
 
INGI2252 Software Measures & Maintenance
INGI2252 Software Measures & MaintenanceINGI2252 Software Measures & Maintenance
INGI2252 Software Measures & Maintenancekim.mens
 
Software testing: an introduction - 2017
Software testing: an introduction - 2017Software testing: an introduction - 2017
Software testing: an introduction - 2017XavierDevroey
 
Exploring neXtProt data and beyond: A SPARQLing solution
Exploring neXtProt data and beyond: A SPARQLing solutionExploring neXtProt data and beyond: A SPARQLing solution
Exploring neXtProt data and beyond: A SPARQLing solutionneXtProt
 
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...The University of Adelaide
 
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...Ali Ouni
 

What's hot (20)

The Impact of Test Ownership and Team Structure on the Reliability and Effect...
The Impact of Test Ownership and Team Structure on the Reliability and Effect...The Impact of Test Ownership and Team Structure on the Reliability and Effect...
The Impact of Test Ownership and Team Structure on the Reliability and Effect...
 
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug PredictionIt's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
 
Cser13.ppt
Cser13.pptCser13.ppt
Cser13.ppt
 
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise Applications
 
Icsm19.ppt
Icsm19.pptIcsm19.ppt
Icsm19.ppt
 
Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...
 
Who Should Review My Code?
Who Should Review My Code?  Who Should Review My Code?
Who Should Review My Code?
 
Enase20.ppt
Enase20.pptEnase20.ppt
Enase20.ppt
 
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect Prediction
 
Detecting Interaction Coupling from Task Interaction Histories
Detecting Interaction Coupling from Task Interaction HistoriesDetecting Interaction Coupling from Task Interaction Histories
Detecting Interaction Coupling from Task Interaction Histories
 
A Mono- and Multi-objective Approach for Recommending Software Refactoring
A Mono- and Multi-objective Approach for Recommending Software RefactoringA Mono- and Multi-objective Approach for Recommending Software Refactoring
A Mono- and Multi-objective Approach for Recommending Software Refactoring
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
 
Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer Recommendations
 
INGI2252 Software Measures & Maintenance
INGI2252 Software Measures & MaintenanceINGI2252 Software Measures & Maintenance
INGI2252 Software Measures & Maintenance
 
Software testing: an introduction - 2017
Software testing: an introduction - 2017Software testing: an introduction - 2017
Software testing: an introduction - 2017
 
Exploring neXtProt data and beyond: A SPARQLing solution
Exploring neXtProt data and beyond: A SPARQLing solutionExploring neXtProt data and beyond: A SPARQLing solution
Exploring neXtProt data and beyond: A SPARQLing solution
 
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
 
Icsm20.ppt
Icsm20.pptIcsm20.ppt
Icsm20.ppt
 
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
 

Viewers also liked

Best of tomhanks
Best of tomhanksBest of tomhanks
Best of tomhanksclicktoseo
 
Sustainability with Regards to coal energy Production
Sustainability with Regards to coal energy ProductionSustainability with Regards to coal energy Production
Sustainability with Regards to coal energy ProductionSangeen Jogezai
 
Writing acceptable patches: an empirical study of open source project patches
Writing acceptable patches: an empirical study of open source project patchesWriting acceptable patches: an empirical study of open source project patches
Writing acceptable patches: an empirical study of open source project patchesYida Tao
 
How do software engineers understand code changes?
How do software engineers understand code changes?How do software engineers understand code changes?
How do software engineers understand code changes?Yida Tao
 
Ch06 records management slide show part 2 with notes
Ch06 records management slide show part 2 with notesCh06 records management slide show part 2 with notes
Ch06 records management slide show part 2 with notesfrancarter2
 
Automatically generated patches as debugging aids
Automatically generated patches as debugging aidsAutomatically generated patches as debugging aids
Automatically generated patches as debugging aidsYida Tao
 
Kemal göktaş
Kemal göktaş Kemal göktaş
Kemal göktaş kemal99
 
Cultura de bolivia
Cultura de boliviaCultura de bolivia
Cultura de boliviaSanncal
 
Sinif yöneti̇mi̇
Sinif yöneti̇mi̇Sinif yöneti̇mi̇
Sinif yöneti̇mi̇kemal99
 
Neoimpressionism and postimpressionism
Neoimpressionism and postimpressionismNeoimpressionism and postimpressionism
Neoimpressionism and postimpressionismnaranjoisabel
 
Conversation Techniques
Conversation TechniquesConversation Techniques
Conversation TechniquesLu Gines
 
Aplikasi 5R dan tahap-tahapnya
Aplikasi 5R dan tahap-tahapnyaAplikasi 5R dan tahap-tahapnya
Aplikasi 5R dan tahap-tahapnyadani senja
 

Viewers also liked (15)

Best of tomhanks
Best of tomhanksBest of tomhanks
Best of tomhanks
 
Sustainability with Regards to coal energy Production
Sustainability with Regards to coal energy ProductionSustainability with Regards to coal energy Production
Sustainability with Regards to coal energy Production
 
Writing acceptable patches: an empirical study of open source project patches
Writing acceptable patches: an empirical study of open source project patchesWriting acceptable patches: an empirical study of open source project patches
Writing acceptable patches: an empirical study of open source project patches
 
Tema audio
Tema audioTema audio
Tema audio
 
How do software engineers understand code changes?
How do software engineers understand code changes?How do software engineers understand code changes?
How do software engineers understand code changes?
 
Ch06 records management slide show part 2 with notes
Ch06 records management slide show part 2 with notesCh06 records management slide show part 2 with notes
Ch06 records management slide show part 2 with notes
 
Automatically generated patches as debugging aids
Automatically generated patches as debugging aidsAutomatically generated patches as debugging aids
Automatically generated patches as debugging aids
 
Kemal göktaş
Kemal göktaş Kemal göktaş
Kemal göktaş
 
Cultura de bolivia
Cultura de boliviaCultura de bolivia
Cultura de bolivia
 
Sinif yöneti̇mi̇
Sinif yöneti̇mi̇Sinif yöneti̇mi̇
Sinif yöneti̇mi̇
 
Gambar Teknik
Gambar TeknikGambar Teknik
Gambar Teknik
 
Neoimpressionism and postimpressionism
Neoimpressionism and postimpressionismNeoimpressionism and postimpressionism
Neoimpressionism and postimpressionism
 
Conversation Techniques
Conversation TechniquesConversation Techniques
Conversation Techniques
 
Aplikasi 5R dan tahap-tahapnya
Aplikasi 5R dan tahap-tahapnyaAplikasi 5R dan tahap-tahapnya
Aplikasi 5R dan tahap-tahapnya
 
Tarun hait cv
Tarun hait cvTarun hait cv
Tarun hait cv
 

Similar to Partitioning composite code changes to facilitate code review

Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsÁkos Horváth
 
Using HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review AnalyticsUsing HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review AnalyticsThe University of Adelaide
 
Incremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software EngineeringIncremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software EngineeringÁkos Horváth
 
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringA Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringRaffi Khatchadourian
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talkRay Buse
 
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...VincitOy
 
Scam2010 thomas presentation
Scam2010 thomas presentationScam2010 thomas presentation
Scam2010 thomas presentationSAIL_QU
 
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Amine Barrak
 
Improving the accuracy and reliability of data analysis code
Improving the accuracy and reliability of data analysis codeImproving the accuracy and reliability of data analysis code
Improving the accuracy and reliability of data analysis codeJohan Carlin
 
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...Iosif Itkin
 
Synthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsSynthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsJeongwhan Choi
 
AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2Sean Braymen
 
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...Daniel Varro
 
Code Management Workshop
Code Management WorkshopCode Management Workshop
Code Management WorkshopSameh El-Ashry
 
Software Coding- Software Coding
Software Coding- Software CodingSoftware Coding- Software Coding
Software Coding- Software CodingNikhil Pandit
 
Bug prediction + sdlc automation
Bug prediction + sdlc automationBug prediction + sdlc automation
Bug prediction + sdlc automationAlexey Tokar
 
Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories Dániel Stein
 

Similar to Partitioning composite code changes to facilitate code review (20)

poster_3.0
poster_3.0poster_3.0
poster_3.0
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical Systems
 
Using HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review AnalyticsUsing HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review Analytics
 
Incremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software EngineeringIncremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software Engineering
 
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringA Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talk
 
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
 
Scam2010 thomas presentation
Scam2010 thomas presentationScam2010 thomas presentation
Scam2010 thomas presentation
 
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
 
Improving the accuracy and reliability of data analysis code
Improving the accuracy and reliability of data analysis codeImproving the accuracy and reliability of data analysis code
Improving the accuracy and reliability of data analysis code
 
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
 
Synthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsSynthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development Artifacts
 
AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2
 
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
 
Code Management Workshop
Code Management WorkshopCode Management Workshop
Code Management Workshop
 
Software Coding- Software Coding
Software Coding- Software CodingSoftware Coding- Software Coding
Software Coding- Software Coding
 
Bug prediction + sdlc automation
Bug prediction + sdlc automationBug prediction + sdlc automation
Bug prediction + sdlc automation
 
Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories
 
Javantura v6 - How can you improve the quality of your application - Ioannis ...
Javantura v6 - How can you improve the quality of your application - Ioannis ...Javantura v6 - How can you improve the quality of your application - Ioannis ...
Javantura v6 - How can you improve the quality of your application - Ioannis ...
 

Recently uploaded

Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxraffaeleoman
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoKayode Fayemi
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lodhisaajjda
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaKayode Fayemi
 
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Pooja Nehwal
 
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedDelhi Call girls
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsaqsarehman5055
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Delhi Call girls
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfSenaatti-kiinteistöt
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubssamaasim06
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatmentnswingard
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Baileyhlharris
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...Sheetaleventcompany
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCamilleBoulbin1
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardsticksaastr
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIINhPhngng3
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Vipesco
 

Recently uploaded (20)

Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
 
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptx
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 

Partitioning composite code changes to facilitate code review

  • 1. Partitioning Composite Code Changes to Facilitate Code Review Yida Tao and Sunghun Kim The Hong Kong University of Science and Technology
  • 3. Atomic Code Change Fixed bug #12, #34, #56 Removed duplicate code Add a feature Javadoc updated Composite Code Change Fixed bug #123
  • 4. Atomic Code Change Fixed bug #12, #34, #56 Removed duplicate code Add a feature Javadoc updated Composite Code Change Fixed bug #123 Difficult to review Likely be rejected
  • 5. Research Questions • RQ1: Are composite code changes prevalent? • RQ2: Can we propose an approach to improve the semantic atomicity of composite code changes? • RQ3: Can our approach help developers better review composite code changes?
  • 6. RQ1: Occurrence of composite code changes • Data source • 4 open-source Java projects • Revisions that changed >= 2 lines of code • Commit logs and source code were manually inspected 6 Time period Revisions Avg. cLOC Avg. files Ant 2010/04/27 -- 2012/03/05 137 26.1 2.0 Commons Math 2011/11/28 -- 2012/04/12 107 84.7 3.5 Xerces 2008/11/03 -- 2012/03/13 116 63.6 3.0 JFreeChart 2008/07/02 -- 2010/03/30 93 144.9 4.1 Total revisions 453
  • 7. 82% 13% 5% Xerces 7 8% - 29% revisions address multiple issues RQ1: Occurrence of composite code changes 92% 7% 1% Ant 82% 12% 6% Commons Math 71% 18% 11% JFreeChart 1 issue 2 issues > 2 issues
  • 8. Approach A set of changed statements Partition of the change (A subset is a change-slice) A composite code change Identify related statements 8
  • 9. Approach Formatting Dependency Similarity 9 A set of changed statements A composite code change Partition of the change (A subset is a change-slice)
  • 10. Approach Partition of the patch Unix diff (text differencing) ChangeDistiller* (AST differencing) *http://www.ifi.uzh.ch/seal/research /tools/changeDistiller.html Formatting changes 10 Formatting Dependency Similarity A set of changed statements A composite code change
  • 11. Approach Partition of the patch IBM T.J. Watson Libraries for Analysis (WALA)* Inter-procedural Backward static slicing *http://wala.sourceforge.net/wiki/in dex.php/Main_Page 11 Formatting Dependency Similarity A set of changed statements A composite code change
  • 12. Approach Partition of the patch“protect array entries against corruption by returning a clone” Same change type Similar delta 12 Formatting Dependency Similarity A set of changed statements A composite code change P P’
  • 13.
  • 14. Evaluation • 78 composite code changes from the previously inspected data • 3 human evaluators establish manual partitions for these changes • Automatic partition results are compared to manual partitions • Considered acceptableif it exactly matched the manual partition 82% 13% 5% Xerces 92% 7% 1% Ant 82% 12% 6% Commons Math 71% 18% 11% JFreeChart 1 issue 2 issues > 2 issues
  • 15. Evaluation • 78 composite code changes from the previously inspected data • 3 human evaluators establish manual partitions for these changes • Automatic partition results are compared to manual partitions • Considered acceptableif it exactly matched the manual partition 82% 13% 5% Xerces 92% 7% 1% Ant 82% 12% 6% Commons Math 71% 18% 11% JFreeChart 1 issue 2 issues > 2 issues
  • 16. Evaluation • 78 composite code changes from the previously inspected data • 3 human evaluators establish manual partitions for these changes • Automatic partition results are compared to manual partitions • Considered acceptableif it exactly matched the manual partition Acceptable # / Total # Ant 8 / 11 Commons Math 10 / 19 Xerces 16 / 21 JFreeChart 20 / 27 54 / 78 (69%)
  • 17. Ant revision 943068 (24 changed LOC) “Wrong assignment after I renamed the parameter. Unfortunately there doesn’t seem to be a testcase that catches the error.” Later fixed in revision 943070 17 One of the two change-slices after partitioning
  • 18. Preliminary User Study • RQ3 • Can our automatic partition help developers better review composite changes? • Participants • 18 CS graduate students • Task • Participants review 12 composite code changes • Answer a series of code review questions [1], e.g., • “What is the consequence of removing the schemaType field?” • “What do changes in these files have in common?” 18 [1]“Questions programmers ask during software evolution tasks” Sillito et al. FSE 2006
  • 19. Experimental Settings • Treatments • Control group: review code changes by file • Experimental group: review code changes by partition 19
  • 20. Results By file By partition By file By partition 20 p = 0.01 p = 0.81
  • 22. Discussion •Impact of unsatisfactory change partitions •Balancing between partition costs and benefits
  • 23. Related Work • Helping developers help themselves: Automatic decomposition of code review changesets. Barnett et al. ICSE 2015 • The industrial perspective of composite code changes • The impact of tangled code changes. Kim Herzig and Andreas Zeller. MSR 2013 • Filtering Noise in Mixed-Purpose Fixing Commits to Improve Defect Prediction and Localization. Nguyen et al. ISSRE 2013 • How composite code changes affect defect prediction
  • 24. Acknowledgment • We thank all the students that participated in our user study. • We especially thank Tao He and Hai Wan for their kind help of arranging the user study. • We also thank Ananya Kanjilal for her comments on the paper draft.

Editor's Notes

  1. The focus of this session, and of course this work, is code review, which in modern software development typically practiced on lightweight code changes. Characteristics of changes directly affect how code review is practice
  2. Considering two types of code changes/commits/revisions.
  3. The second change fixed 3 bugs, did some refactoring…. Which one is more acceptable from a code reviewers’ perspective?
  4. From our previous study, we found evidences that CCC are … compared to atomic changes. And this is the particular problem we try to tackle with in this work
  5. Specifically, we want to answer these 3 questions.
  6. We first investigate whether the problem of CCC is really prominent by taking a look the revision history of these 4 OS projects. We inspected revisions that changed at least 2 lines of code within this time period, that results in 453 revisions in total We mainly inspect the commit log to see how many logical issues are addressed. Filter criteria: multiple lines of source code changes, compilable,
  7. Here is what we observed. Blue means the revision is atomic, that is, it addresses one single issue. And orange and grey mean that the revision addresses 2 or more than 2 issues. Among the revisions that changed multiple source code lines, from 8% - 29% , on average 17% patches are composite, that is, they address 2 or more logical issues. non-trivial
  8. So next, we propose an approach to partition a composite code change, so that in the end each subset is more semantically cohesive and hopefully easier to review. First, a code change is modeled as a set of changed statements. Basically, we identify relations between statements to derive change partition. Statements in the same partition (change-slice) are related while statements in different subsets are not Now the core of our approach is this black box: how do we …
  9. We use 3 heuristics here
  10. First, we believe that formatting changes should be on their own instead of mingled with other type of changes, such as bug fixes. So we compute textual and AST differencing results of the change. Changed lines that are not identified in AST differencing but in the textual differencing are considered as formatting changes
  11. Second, we believe that changed statements that have static dependency should belong to the same subset. We use WALA to perform … on changed lines so that we know whether two changes have static dependency.
  12. Third, we believe that changes with similar patterns are also related. For example, these two functions both change the returning of a variable the same way. Although they don’t have static dependency, it’s obvious that they are serving the same purpose… So, as the 3rd heuristic, we check if two changes have same change type (e.g., if they both update the return statement, which we can tell from its AST node type) If so, whether they have string delta is above a threshold (0.8 by python library default)
  13. Here is an example of how our approach works. Before the change and after the change. Some code is added with the plus sign, some is deleted with the minus sign, others are modified with the caret sign Based on our heuristics, this change is partitioned into 3 change-slices, marked with different colors. First, the orange part is formatting changes Another change-slice is the green part of the change, … Finally, these two added methods also form a partition cuz their method names are similar.
  14. Remember in RQ1, we inspected over 400 revisions and some of them address multiple logical issues? We used these 78 revisions, marked in yellow and grey, in our evaluation.
  15. We compare the automatic partitions produced by our approach to the manual partitions. An automatic partition for a change is considered acceptable if … We are not claiming that the manual partition by human evaluators is the only way of partitioning a change, because code semantics are subject to …. In other words, the solution space for partitioning is potentially infinite. However, our goal is to facilitate code review. So if our automatic partition results match a common way of manual partition, then we say it’s acceptable.
  16. Among the 78 CCC, 54 are automatically partitioned the same way as human evaluators did. Yet, this brings up our third research question: does this automatic partition really help code review?
  17. We’ve observed several cases that indicate the benefits of our approach. Here is one of them. Our approach partitions a Ant revision, which changes 24 LOC, into 2 subsets. One subset pinpoints a buggy area, which was later fixed by developers. Specifically, developers said “no testcase…” We think that our patch partition may help make this bug more obvious and identifiable from a big, composite code change.
  18. We also conduct a user study to investigate…? We recruit … to review 12 composite … from our evaluation data and answer
  19. There are 2 treatments. The control condition is that participants review … by file, as they usually do The experimental condition is that .. by partition
  20. We analyze the correctness of their answers and the time they spent on coming up the answers. We found that the group using partition to review patches performs significantly better than the control group in answering questions correctly, although they spend a slightly, but not significantly longer time. This result is also promising in showing that our approach potentially help the patch review process One-tailed Paired-samples t-test Time no significant different. Participants even spent a slightly longer time under the by slice treatment. We conjecture that such a time increase mainly resulted from participants’ additional learning cost for using the unfamiliar “partitions” to understand patches.
  21. 69% composite changes are automatically partitioned the same way as human evaluators did Our user study shows Participants’ code review effectiveness is significantly improved
  22. Finally, I would like to bring up these issues that are highly relevant and important, yet we haven’t been able to address them in this work yet. For example …. I think these would to interesting to discussion and I’ll leave it to the discussion slot. Cost: learning cost, execution cost Hypothesis: size and complexity