SlideShare a Scribd company logo
Partitioning Composite Code Changes
to Facilitate Code Review
Yida Tao and Sunghun Kim
The Hong Kong University of Science and Technology
Atomic Code Change
Fixed bug #123
Atomic Code Change
Fixed bug #12, #34, #56
Removed duplicate code
Add a feature
Javadoc updated
Composite Code Change
Fixed bug #123
Atomic Code Change
Fixed bug #12, #34, #56
Removed duplicate code
Add a feature
Javadoc updated
Composite Code Change
Fixed bug #123
Difficult to review
Likely be rejected
Research Questions
• RQ1: Are composite code changes prevalent?
• RQ2: Can we propose an approach to improve the semantic atomicity
of composite code changes?
• RQ3: Can our approach help developers better review composite code
changes?
RQ1: Occurrence of composite code changes
• Data source
• 4 open-source Java projects
• Revisions that changed >= 2 lines of code
• Commit logs and source code were manually inspected
6
Time period Revisions Avg. cLOC Avg. files
Ant 2010/04/27 -- 2012/03/05 137 26.1 2.0
Commons Math 2011/11/28 -- 2012/04/12 107 84.7 3.5
Xerces 2008/11/03 -- 2012/03/13 116 63.6 3.0
JFreeChart 2008/07/02 -- 2010/03/30 93 144.9 4.1
Total revisions 453
82%
13%
5%
Xerces
7
8% - 29% revisions address multiple issues
RQ1: Occurrence of composite code changes
92%
7%
1%
Ant
82%
12%
6%
Commons Math
71%
18%
11%
JFreeChart
1 issue
2 issues
> 2 issues
Approach
A set of changed
statements
Partition of the change
(A subset is a change-slice)
A composite
code change
Identify
related
statements
8
Approach Formatting
Dependency
Similarity
9
A set of changed
statements
A composite
code change
Partition of the change
(A subset is a change-slice)
Approach
Partition of the patch
Unix diff
(text differencing)
ChangeDistiller*
(AST differencing)
*http://www.ifi.uzh.ch/seal/research
/tools/changeDistiller.html
Formatting
changes
10
Formatting
Dependency
Similarity
A set of changed
statements
A composite
code change
Approach
Partition of the patch
IBM T.J. Watson
Libraries for Analysis
(WALA)*
Inter-procedural
Backward static slicing
*http://wala.sourceforge.net/wiki/in
dex.php/Main_Page
11
Formatting
Dependency
Similarity
A set of changed
statements
A composite
code change
Approach
Partition of the patch“protect array entries against
corruption by returning a clone”
Same change type
Similar delta
12
Formatting
Dependency
Similarity
A set of changed
statements
A composite
code change
P
P’
Evaluation
• 78 composite code changes from the previously inspected data
• 3 human evaluators establish manual partitions for these changes
• Automatic partition results are compared to manual partitions
• Considered acceptableif it exactly matched the manual partition
82%
13%
5%
Xerces
92%
7%
1%
Ant
82%
12%
6%
Commons Math
71%
18%
11%
JFreeChart
1 issue
2 issues
> 2 issues
Evaluation
• 78 composite code changes from the previously inspected data
• 3 human evaluators establish manual partitions for these changes
• Automatic partition results are compared to manual partitions
• Considered acceptableif it exactly matched the manual partition
82%
13%
5%
Xerces
92%
7%
1%
Ant
82%
12%
6%
Commons Math
71%
18%
11%
JFreeChart
1 issue
2 issues
> 2 issues
Evaluation
• 78 composite code changes from the previously inspected data
• 3 human evaluators establish manual partitions for these changes
• Automatic partition results are compared to manual partitions
• Considered acceptableif it exactly matched the manual partition
Acceptable # / Total #
Ant 8 / 11
Commons Math 10 / 19
Xerces 16 / 21
JFreeChart 20 / 27
54 / 78 (69%)
Ant revision 943068 (24 changed LOC)
“Wrong assignment after I renamed the parameter. Unfortunately
there doesn’t seem to be a testcase that catches the error.”
Later fixed in revision 943070
17
One of the two change-slices after partitioning
Preliminary User Study
• RQ3
• Can our automatic partition help developers better review composite changes?
• Participants
• 18 CS graduate students
• Task
• Participants review 12 composite code changes
• Answer a series of code review questions [1], e.g.,
• “What is the consequence of removing the schemaType field?”
• “What do changes in these files have in common?”
18
[1]“Questions programmers ask during software evolution tasks” Sillito et al. FSE 2006
Experimental Settings
• Treatments
• Control group: review code changes by file
• Experimental group: review code changes by partition
19
Results
By file By partition By file By partition
20
p = 0.01 p = 0.81
Formatting
Dependency
Similarity
Composite
Code Changes
Partition of
the change
8% - 29% 69%
By file By partition
Discussion
•Impact of unsatisfactory change partitions
•Balancing between partition costs and benefits
Related Work
• Helping developers help themselves: Automatic decomposition of
code review changesets. Barnett et al. ICSE 2015
• The industrial perspective of composite code changes
• The impact of tangled code changes. Kim Herzig and Andreas Zeller.
MSR 2013
• Filtering Noise in Mixed-Purpose Fixing Commits to Improve Defect
Prediction and Localization. Nguyen et al. ISSRE 2013
• How composite code changes affect defect prediction
Acknowledgment
• We thank all the students that participated in our user study.
• We especially thank Tao He and Hai Wan for their kind help of
arranging the user study.
• We also thank Ananya Kanjilal for her comments on the paper draft.

More Related Content

What's hot

The Impact of Test Ownership and Team Structure on the Reliability and Effect...
The Impact of Test Ownership and Team Structure on the Reliability and Effect...The Impact of Test Ownership and Team Structure on the Reliability and Effect...
The Impact of Test Ownership and Team Structure on the Reliability and Effect...
Kim Herzig
 
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug PredictionIt's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
sjust
 
Cser13.ppt
Cser13.pptCser13.ppt
Cser13.ppt
Ptidej Team
 
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
Ali Ouni
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise Applications
SAIL_QU
 
Icsm19.ppt
Icsm19.pptIcsm19.ppt
Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...
The University of Adelaide
 
Who Should Review My Code?
Who Should Review My Code?  Who Should Review My Code?
Who Should Review My Code?
The University of Adelaide
 
Enase20.ppt
Enase20.pptEnase20.ppt
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect Prediction
Sung Kim
 
Detecting Interaction Coupling from Task Interaction Histories
Detecting Interaction Coupling from Task Interaction HistoriesDetecting Interaction Coupling from Task Interaction Histories
Detecting Interaction Coupling from Task Interaction Histories
SAIL_QU
 
A Mono- and Multi-objective Approach for Recommending Software Refactoring
A Mono- and Multi-objective Approach for Recommending Software RefactoringA Mono- and Multi-objective Approach for Recommending Software Refactoring
A Mono- and Multi-objective Approach for Recommending Software Refactoring
Ali Ouni
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
Sung Kim
 
Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer Recommendations
The University of Adelaide
 
INGI2252 Software Measures & Maintenance
INGI2252 Software Measures & MaintenanceINGI2252 Software Measures & Maintenance
INGI2252 Software Measures & Maintenance
kim.mens
 
Software testing: an introduction - 2017
Software testing: an introduction - 2017Software testing: an introduction - 2017
Software testing: an introduction - 2017
XavierDevroey
 
Exploring neXtProt data and beyond: A SPARQLing solution
Exploring neXtProt data and beyond: A SPARQLing solutionExploring neXtProt data and beyond: A SPARQLing solution
Exploring neXtProt data and beyond: A SPARQLing solution
neXtProt
 
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
The University of Adelaide
 
Icsm20.ppt
Icsm20.pptIcsm20.ppt
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
Ali Ouni
 

What's hot (20)

The Impact of Test Ownership and Team Structure on the Reliability and Effect...
The Impact of Test Ownership and Team Structure on the Reliability and Effect...The Impact of Test Ownership and Team Structure on the Reliability and Effect...
The Impact of Test Ownership and Team Structure on the Reliability and Effect...
 
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug PredictionIt's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
It's Not a Bug, It's a Feature — How Misclassification Impacts Bug Prediction
 
Cser13.ppt
Cser13.pptCser13.ppt
Cser13.ppt
 
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
ICGSE2020: On the Detection of Community Smells Using Genetic Programming-bas...
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise Applications
 
Icsm19.ppt
Icsm19.pptIcsm19.ppt
Icsm19.ppt
 
Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...Review Participation in Modern Code Review: An Empirical Study of the Android...
Review Participation in Modern Code Review: An Empirical Study of the Android...
 
Who Should Review My Code?
Who Should Review My Code?  Who Should Review My Code?
Who Should Review My Code?
 
Enase20.ppt
Enase20.pptEnase20.ppt
Enase20.ppt
 
Survey on Software Defect Prediction
Survey on Software Defect PredictionSurvey on Software Defect Prediction
Survey on Software Defect Prediction
 
Detecting Interaction Coupling from Task Interaction Histories
Detecting Interaction Coupling from Task Interaction HistoriesDetecting Interaction Coupling from Task Interaction Histories
Detecting Interaction Coupling from Task Interaction Histories
 
A Mono- and Multi-objective Approach for Recommending Software Refactoring
A Mono- and Multi-objective Approach for Recommending Software RefactoringA Mono- and Multi-objective Approach for Recommending Software Refactoring
A Mono- and Multi-objective Approach for Recommending Software Refactoring
 
Software Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled DatasetsSoftware Defect Prediction on Unlabeled Datasets
Software Defect Prediction on Unlabeled Datasets
 
Improving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer RecommendationsImproving Code Review Effectiveness Through Reviewer Recommendations
Improving Code Review Effectiveness Through Reviewer Recommendations
 
INGI2252 Software Measures & Maintenance
INGI2252 Software Measures & MaintenanceINGI2252 Software Measures & Maintenance
INGI2252 Software Measures & Maintenance
 
Software testing: an introduction - 2017
Software testing: an introduction - 2017Software testing: an introduction - 2017
Software testing: an introduction - 2017
 
Exploring neXtProt data and beyond: A SPARQLing solution
Exploring neXtProt data and beyond: A SPARQLing solutionExploring neXtProt data and beyond: A SPARQLing solution
Exploring neXtProt data and beyond: A SPARQLing solution
 
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...Revisiting Code Ownership and Its Relationship with Software Quality in the S...
Revisiting Code Ownership and Its Relationship with Software Quality in the S...
 
Icsm20.ppt
Icsm20.pptIcsm20.ppt
Icsm20.ppt
 
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
A Multi-Objective Refactoring Approach to Introduce Design Patterns and Fix A...
 

Viewers also liked

Best of tomhanks
Best of tomhanksBest of tomhanks
Best of tomhanksclicktoseo
 
Sustainability with Regards to coal energy Production
Sustainability with Regards to coal energy ProductionSustainability with Regards to coal energy Production
Sustainability with Regards to coal energy ProductionSangeen Jogezai
 
Writing acceptable patches: an empirical study of open source project patches
Writing acceptable patches: an empirical study of open source project patchesWriting acceptable patches: an empirical study of open source project patches
Writing acceptable patches: an empirical study of open source project patches
Yida Tao
 
How do software engineers understand code changes?
How do software engineers understand code changes?How do software engineers understand code changes?
How do software engineers understand code changes?
Yida Tao
 
Ch06 records management slide show part 2 with notes
Ch06 records management slide show part 2 with notesCh06 records management slide show part 2 with notes
Ch06 records management slide show part 2 with notes
francarter2
 
Automatically generated patches as debugging aids
Automatically generated patches as debugging aidsAutomatically generated patches as debugging aids
Automatically generated patches as debugging aids
Yida Tao
 
Kemal göktaş
Kemal göktaş Kemal göktaş
Kemal göktaş
kemal99
 
Cultura de bolivia
Cultura de boliviaCultura de bolivia
Cultura de boliviaSanncal
 
Sinif yöneti̇mi̇
Sinif yöneti̇mi̇Sinif yöneti̇mi̇
Sinif yöneti̇mi̇kemal99
 
Gambar Teknik
Gambar TeknikGambar Teknik
Gambar Teknik
dani senja
 
Neoimpressionism and postimpressionism
Neoimpressionism and postimpressionismNeoimpressionism and postimpressionism
Neoimpressionism and postimpressionismnaranjoisabel
 
Conversation Techniques
Conversation TechniquesConversation Techniques
Conversation Techniques
Lu Gines
 
Aplikasi 5R dan tahap-tahapnya
Aplikasi 5R dan tahap-tahapnyaAplikasi 5R dan tahap-tahapnya
Aplikasi 5R dan tahap-tahapnya
dani senja
 

Viewers also liked (15)

Best of tomhanks
Best of tomhanksBest of tomhanks
Best of tomhanks
 
Sustainability with Regards to coal energy Production
Sustainability with Regards to coal energy ProductionSustainability with Regards to coal energy Production
Sustainability with Regards to coal energy Production
 
Writing acceptable patches: an empirical study of open source project patches
Writing acceptable patches: an empirical study of open source project patchesWriting acceptable patches: an empirical study of open source project patches
Writing acceptable patches: an empirical study of open source project patches
 
Tema audio
Tema audioTema audio
Tema audio
 
How do software engineers understand code changes?
How do software engineers understand code changes?How do software engineers understand code changes?
How do software engineers understand code changes?
 
Ch06 records management slide show part 2 with notes
Ch06 records management slide show part 2 with notesCh06 records management slide show part 2 with notes
Ch06 records management slide show part 2 with notes
 
Automatically generated patches as debugging aids
Automatically generated patches as debugging aidsAutomatically generated patches as debugging aids
Automatically generated patches as debugging aids
 
Kemal göktaş
Kemal göktaş Kemal göktaş
Kemal göktaş
 
Cultura de bolivia
Cultura de boliviaCultura de bolivia
Cultura de bolivia
 
Sinif yöneti̇mi̇
Sinif yöneti̇mi̇Sinif yöneti̇mi̇
Sinif yöneti̇mi̇
 
Gambar Teknik
Gambar TeknikGambar Teknik
Gambar Teknik
 
Neoimpressionism and postimpressionism
Neoimpressionism and postimpressionismNeoimpressionism and postimpressionism
Neoimpressionism and postimpressionism
 
Conversation Techniques
Conversation TechniquesConversation Techniques
Conversation Techniques
 
Aplikasi 5R dan tahap-tahapnya
Aplikasi 5R dan tahap-tahapnyaAplikasi 5R dan tahap-tahapnya
Aplikasi 5R dan tahap-tahapnya
 
Tarun hait cv
Tarun hait cvTarun hait cv
Tarun hait cv
 

Similar to Partitioning composite code changes to facilitate code review

Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
Lionel Briand
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical Systems
Ákos Horváth
 
Using HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review AnalyticsUsing HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review Analytics
The University of Adelaide
 
Incremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software EngineeringIncremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software Engineering
Ákos Horváth
 
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringA Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
Raffi Khatchadourian
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talkRay Buse
 
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
VincitOy
 
Scam2010 thomas presentation
Scam2010 thomas presentationScam2010 thomas presentation
Scam2010 thomas presentationSAIL_QU
 
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Amine Barrak
 
Improving the accuracy and reliability of data analysis code
Improving the accuracy and reliability of data analysis codeImproving the accuracy and reliability of data analysis code
Improving the accuracy and reliability of data analysis code
Johan Carlin
 
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
Iosif Itkin
 
Synthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsSynthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development Artifacts
Jeongwhan Choi
 
AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2Sean Braymen
 
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
Daniel Varro
 
Code Management Workshop
Code Management WorkshopCode Management Workshop
Code Management Workshop
Sameh El-Ashry
 
Software Coding- Software Coding
Software Coding- Software CodingSoftware Coding- Software Coding
Software Coding- Software Coding
Nikhil Pandit
 
Bug prediction + sdlc automation
Bug prediction + sdlc automationBug prediction + sdlc automation
Bug prediction + sdlc automation
Alexey Tokar
 
Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories
Dániel Stein
 
How to improve the quality of your application
How to improve the quality of your applicationHow to improve the quality of your application
How to improve the quality of your application
EUR ING Ioannis Kolaxis MSc
 

Similar to Partitioning composite code changes to facilitate code review (20)

poster_3.0
poster_3.0poster_3.0
poster_3.0
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical Systems
 
Using HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review AnalyticsUsing HPC Resources to Exploit Big Data for Code Review Analytics
Using HPC Resources to Exploit Big Data for Code Review Analytics
 
Incremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software EngineeringIncremental Model Queries for Model-Dirven Software Engineering
Incremental Model Queries for Model-Dirven Software Engineering
 
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringA Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talk
 
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
Improving Code Quality In Medical Software Through Code Reviews - Vincit Teat...
 
Scam2010 thomas presentation
Scam2010 thomas presentationScam2010 thomas presentation
Scam2010 thomas presentation
 
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
Just-in-time Detection of Protection-Impacting Changes on WordPress and Media...
 
Improving the accuracy and reliability of data analysis code
Improving the accuracy and reliability of data analysis codeImproving the accuracy and reliability of data analysis code
Improving the accuracy and reliability of data analysis code
 
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
TMPA-2015: The Application of Parameterized Hierarchy Templates for Automated...
 
Synthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development ArtifactsSynthesizing Knowledge from Software Development Artifacts
Synthesizing Knowledge from Software Development Artifacts
 
AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2AOUG_11Nov2016_Challenges_with_EBS12_2
AOUG_11Nov2016_Challenges_with_EBS12_2
 
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineerin...
 
Code Management Workshop
Code Management WorkshopCode Management Workshop
Code Management Workshop
 
Software Coding- Software Coding
Software Coding- Software CodingSoftware Coding- Software Coding
Software Coding- Software Coding
 
Bug prediction + sdlc automation
Bug prediction + sdlc automationBug prediction + sdlc automation
Bug prediction + sdlc automation
 
Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories Graph-Based Source Code Analysis of JavaScript Repositories
Graph-Based Source Code Analysis of JavaScript Repositories
 
How to improve the quality of your application
How to improve the quality of your applicationHow to improve the quality of your application
How to improve the quality of your application
 

Recently uploaded

AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AwangAniqkmals
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
OECD Directorate for Financial and Enterprise Affairs
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
eCommerce Institute
 
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Dutch Power
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
faizulhassanfaiz1670
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
Howard Spence
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
kkirkland2
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
OWASP Beja
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
IP ServerOne
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
gharris9
 
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Dutch Power
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
amekonnen
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
Sebastiano Panichella
 
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Sebastiano Panichella
 
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Matjaž Lipuš
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Sebastiano Panichella
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Access Innovations, Inc.
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
khadija278284
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
Vladimir Samoylov
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
Faculty of Medicine And Health Sciences
 

Recently uploaded (20)

AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
 
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
 
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
 
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
 
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
 
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
 

Partitioning composite code changes to facilitate code review

  • 1. Partitioning Composite Code Changes to Facilitate Code Review Yida Tao and Sunghun Kim The Hong Kong University of Science and Technology
  • 3. Atomic Code Change Fixed bug #12, #34, #56 Removed duplicate code Add a feature Javadoc updated Composite Code Change Fixed bug #123
  • 4. Atomic Code Change Fixed bug #12, #34, #56 Removed duplicate code Add a feature Javadoc updated Composite Code Change Fixed bug #123 Difficult to review Likely be rejected
  • 5. Research Questions • RQ1: Are composite code changes prevalent? • RQ2: Can we propose an approach to improve the semantic atomicity of composite code changes? • RQ3: Can our approach help developers better review composite code changes?
  • 6. RQ1: Occurrence of composite code changes • Data source • 4 open-source Java projects • Revisions that changed >= 2 lines of code • Commit logs and source code were manually inspected 6 Time period Revisions Avg. cLOC Avg. files Ant 2010/04/27 -- 2012/03/05 137 26.1 2.0 Commons Math 2011/11/28 -- 2012/04/12 107 84.7 3.5 Xerces 2008/11/03 -- 2012/03/13 116 63.6 3.0 JFreeChart 2008/07/02 -- 2010/03/30 93 144.9 4.1 Total revisions 453
  • 7. 82% 13% 5% Xerces 7 8% - 29% revisions address multiple issues RQ1: Occurrence of composite code changes 92% 7% 1% Ant 82% 12% 6% Commons Math 71% 18% 11% JFreeChart 1 issue 2 issues > 2 issues
  • 8. Approach A set of changed statements Partition of the change (A subset is a change-slice) A composite code change Identify related statements 8
  • 9. Approach Formatting Dependency Similarity 9 A set of changed statements A composite code change Partition of the change (A subset is a change-slice)
  • 10. Approach Partition of the patch Unix diff (text differencing) ChangeDistiller* (AST differencing) *http://www.ifi.uzh.ch/seal/research /tools/changeDistiller.html Formatting changes 10 Formatting Dependency Similarity A set of changed statements A composite code change
  • 11. Approach Partition of the patch IBM T.J. Watson Libraries for Analysis (WALA)* Inter-procedural Backward static slicing *http://wala.sourceforge.net/wiki/in dex.php/Main_Page 11 Formatting Dependency Similarity A set of changed statements A composite code change
  • 12. Approach Partition of the patch“protect array entries against corruption by returning a clone” Same change type Similar delta 12 Formatting Dependency Similarity A set of changed statements A composite code change P P’
  • 13.
  • 14. Evaluation • 78 composite code changes from the previously inspected data • 3 human evaluators establish manual partitions for these changes • Automatic partition results are compared to manual partitions • Considered acceptableif it exactly matched the manual partition 82% 13% 5% Xerces 92% 7% 1% Ant 82% 12% 6% Commons Math 71% 18% 11% JFreeChart 1 issue 2 issues > 2 issues
  • 15. Evaluation • 78 composite code changes from the previously inspected data • 3 human evaluators establish manual partitions for these changes • Automatic partition results are compared to manual partitions • Considered acceptableif it exactly matched the manual partition 82% 13% 5% Xerces 92% 7% 1% Ant 82% 12% 6% Commons Math 71% 18% 11% JFreeChart 1 issue 2 issues > 2 issues
  • 16. Evaluation • 78 composite code changes from the previously inspected data • 3 human evaluators establish manual partitions for these changes • Automatic partition results are compared to manual partitions • Considered acceptableif it exactly matched the manual partition Acceptable # / Total # Ant 8 / 11 Commons Math 10 / 19 Xerces 16 / 21 JFreeChart 20 / 27 54 / 78 (69%)
  • 17. Ant revision 943068 (24 changed LOC) “Wrong assignment after I renamed the parameter. Unfortunately there doesn’t seem to be a testcase that catches the error.” Later fixed in revision 943070 17 One of the two change-slices after partitioning
  • 18. Preliminary User Study • RQ3 • Can our automatic partition help developers better review composite changes? • Participants • 18 CS graduate students • Task • Participants review 12 composite code changes • Answer a series of code review questions [1], e.g., • “What is the consequence of removing the schemaType field?” • “What do changes in these files have in common?” 18 [1]“Questions programmers ask during software evolution tasks” Sillito et al. FSE 2006
  • 19. Experimental Settings • Treatments • Control group: review code changes by file • Experimental group: review code changes by partition 19
  • 20. Results By file By partition By file By partition 20 p = 0.01 p = 0.81
  • 22. Discussion •Impact of unsatisfactory change partitions •Balancing between partition costs and benefits
  • 23. Related Work • Helping developers help themselves: Automatic decomposition of code review changesets. Barnett et al. ICSE 2015 • The industrial perspective of composite code changes • The impact of tangled code changes. Kim Herzig and Andreas Zeller. MSR 2013 • Filtering Noise in Mixed-Purpose Fixing Commits to Improve Defect Prediction and Localization. Nguyen et al. ISSRE 2013 • How composite code changes affect defect prediction
  • 24. Acknowledgment • We thank all the students that participated in our user study. • We especially thank Tao He and Hai Wan for their kind help of arranging the user study. • We also thank Ananya Kanjilal for her comments on the paper draft.

Editor's Notes

  1. The focus of this session, and of course this work, is code review, which in modern software development typically practiced on lightweight code changes. Characteristics of changes directly affect how code review is practice
  2. Considering two types of code changes/commits/revisions.
  3. The second change fixed 3 bugs, did some refactoring…. Which one is more acceptable from a code reviewers’ perspective?
  4. From our previous study, we found evidences that CCC are … compared to atomic changes. And this is the particular problem we try to tackle with in this work
  5. Specifically, we want to answer these 3 questions.
  6. We first investigate whether the problem of CCC is really prominent by taking a look the revision history of these 4 OS projects. We inspected revisions that changed at least 2 lines of code within this time period, that results in 453 revisions in total We mainly inspect the commit log to see how many logical issues are addressed. Filter criteria: multiple lines of source code changes, compilable,
  7. Here is what we observed. Blue means the revision is atomic, that is, it addresses one single issue. And orange and grey mean that the revision addresses 2 or more than 2 issues. Among the revisions that changed multiple source code lines, from 8% - 29% , on average 17% patches are composite, that is, they address 2 or more logical issues. non-trivial
  8. So next, we propose an approach to partition a composite code change, so that in the end each subset is more semantically cohesive and hopefully easier to review. First, a code change is modeled as a set of changed statements. Basically, we identify relations between statements to derive change partition. Statements in the same partition (change-slice) are related while statements in different subsets are not Now the core of our approach is this black box: how do we …
  9. We use 3 heuristics here
  10. First, we believe that formatting changes should be on their own instead of mingled with other type of changes, such as bug fixes. So we compute textual and AST differencing results of the change. Changed lines that are not identified in AST differencing but in the textual differencing are considered as formatting changes
  11. Second, we believe that changed statements that have static dependency should belong to the same subset. We use WALA to perform … on changed lines so that we know whether two changes have static dependency.
  12. Third, we believe that changes with similar patterns are also related. For example, these two functions both change the returning of a variable the same way. Although they don’t have static dependency, it’s obvious that they are serving the same purpose… So, as the 3rd heuristic, we check if two changes have same change type (e.g., if they both update the return statement, which we can tell from its AST node type) If so, whether they have string delta is above a threshold (0.8 by python library default)
  13. Here is an example of how our approach works. Before the change and after the change. Some code is added with the plus sign, some is deleted with the minus sign, others are modified with the caret sign Based on our heuristics, this change is partitioned into 3 change-slices, marked with different colors. First, the orange part is formatting changes Another change-slice is the green part of the change, … Finally, these two added methods also form a partition cuz their method names are similar.
  14. Remember in RQ1, we inspected over 400 revisions and some of them address multiple logical issues? We used these 78 revisions, marked in yellow and grey, in our evaluation.
  15. We compare the automatic partitions produced by our approach to the manual partitions. An automatic partition for a change is considered acceptable if … We are not claiming that the manual partition by human evaluators is the only way of partitioning a change, because code semantics are subject to …. In other words, the solution space for partitioning is potentially infinite. However, our goal is to facilitate code review. So if our automatic partition results match a common way of manual partition, then we say it’s acceptable.
  16. Among the 78 CCC, 54 are automatically partitioned the same way as human evaluators did. Yet, this brings up our third research question: does this automatic partition really help code review?
  17. We’ve observed several cases that indicate the benefits of our approach. Here is one of them. Our approach partitions a Ant revision, which changes 24 LOC, into 2 subsets. One subset pinpoints a buggy area, which was later fixed by developers. Specifically, developers said “no testcase…” We think that our patch partition may help make this bug more obvious and identifiable from a big, composite code change.
  18. We also conduct a user study to investigate…? We recruit … to review 12 composite … from our evaluation data and answer
  19. There are 2 treatments. The control condition is that participants review … by file, as they usually do The experimental condition is that .. by partition
  20. We analyze the correctness of their answers and the time they spent on coming up the answers. We found that the group using partition to review patches performs significantly better than the control group in answering questions correctly, although they spend a slightly, but not significantly longer time. This result is also promising in showing that our approach potentially help the patch review process One-tailed Paired-samples t-test Time no significant different. Participants even spent a slightly longer time under the by slice treatment. We conjecture that such a time increase mainly resulted from participants’ additional learning cost for using the unfamiliar “partitions” to understand patches.
  21. 69% composite changes are automatically partitioned the same way as human evaluators did Our user study shows Participants’ code review effectiveness is significantly improved
  22. Finally, I would like to bring up these issues that are highly relevant and important, yet we haven’t been able to address them in this work yet. For example …. I think these would to interesting to discussion and I’ll leave it to the discussion slot. Cost: learning cost, execution cost Hypothesis: size and complexity