SlideShare a Scribd company logo
1 of 22
PREDICTING USEFULNESS OF CODE REVIEW
COMMENTS USING TEXTUAL FEATURES AND
DEVELOPER EXPERIENCE
Mohammad Masudur Rahman, Chanchal K. Roy, and
Raula G. Kula*
University of Saskatchewan, Canada,
Osaka University, Japan*
14th International Conference on Mining Software
Repositories (MSR 2017), Buenos Aires, Argentina
RESEARCH PROBLEM: USEFULNESS OF CODE
REVIEW COMMENTS
2
 What makes a review comment
useful or non-useful?
 34.5% of review comments are non-
useful at Microsoft (Bosu et al., MSR 2015)
 No automated support to detect or
improve such comments so far
STUDY METHODOLOGY
3
1,482 Review
comments (4 systems)
Manual tagging with
Bosu et al., MSR 2015
Non-useful
comments (602)
Useful
comments (880)
(1)
Comparative
study
(2)
Prediction
model
COMPARATIVE STUDY: VARIABLES
Independent Variables (8) Response Variable (1)
Reading Ease Textual
Comment Usefulness
(Yes / No)
Stop word Ratio Textual
Question Ratio Textual
Code Element Ratio Textual
Conceptual Similarity Textual
Code Authorship Experience
Code Reviewership Experience
External Lib. Experience Experience
4
 Contrast between useful and non-useful comments.
 Two paradigms– comment texts, and
commenter’s/developer’s experience
 Answers two RQs related to two paradigms.
ANSWERING RQ1: READING EASE
 Flesch-Kincaid Reading Ease applied.
 No significant difference between useful and non-
useful review comments.
5
ANSWERING RQ1: STOP WORD RATIO
 Used Google stop word list and Python keywords.
 Stop word ratio = #stop or keywords/#all words from a
review comment
 Non-useful comments contain more stop words than
useful comments, i.e., statistically significant.
6
ANSWERING RQ1: QUESTION RATIO
 Developers treat clarification questions as non-useful
review comments.
 Question ratio = #questions/#sentences of a comment.
 No significant difference between useful and non-useful
comments in question ratio.
7
ANSWERING RQ1: CODE ELEMENT RATIO
 Important code elements (e.g., identifiers) in the
comments texts, possibly trigger the code change.
 Code element ratio = #source tokens/#all tokens
 Useful comments > non-useful comments for code
element ratio, i.e., statistically significant.
8
ANSWERING RQ1: CONCEPTUAL SIMILARITY
BETWEEN COMMENTS & CHANGED CODE
 How relevant the comment is with the changed code?
 Do comments & changed code share vocabularies?
 Yes, useful comments do more sharing than non-useful ones,
i.e., statistically significant.
9
ANSWERING RQ2: CODE AUTHORSHIP
 File level authorship did not make much difference, a
bit counter-intuitive.
 Project level authorship differs between useful and
non-useful comments, mostly for Q2 and Q3
10
ANSWERING RQ2: CODE REVIEWERSHIP
 Does reviewing experience matter in providing useful
comments?
 Yes, it does. File level reviewing experience matters. Especially
true for Q2 and Q3.
 Experienced reviewers provide more useful comments than non-
useful comments.
11
ANSWERING RQ2: EXT. LIB. EXPERIENCE
 Familiarity with the library used in the changed
code for which comment is posted.
 Significantly higher for the authors of useful
comments for Q3 only.
12
SUMMARY OF COMPARATIVE STUDY
13
RQ Independent Variables Useful vs. Non-useful
Difference
RQ1
Reading Ease Not significant
Stop word Ratio Significant
Question Ratio Not significant
Code Element Ratio Significant
Conceptual Similarity Significant
RQ2
Code Authorship Somewhat significant
Code Reviewership Significant
External Lib. Experience Somewhat significant
EXPERIMENTAL DATASET & SETUP
14
1,482 code review
comments
Evaluation set
(1,116)
Validation set
(366)
Model training &
cross-validation
Validation with
unseen comments
REVHELPER: USEFULNESS PREDICTION MODEL
15
Review
comments
Manual classification
using Bosu et al.
Useful & non-useful
comments Model training
Prediction
model
New review comment
 Prediction of usefulness for a new review
comment to be submitted.
 Applied three ML algorithms– NB, LR, and RF
 Evaluation & validation with different data sets
 Answered 3 RQs– RQ3, RQ4 and RQ5
ANSWERING RQ3: MODEL PERFORMANCE
Learning Algorithm Useful Comments Non-useful Comments
Precision Recall Precision Recall
Naïve Bayes 61.30% 66.00% 53.30% 48.20%
Logistic Regression 60.70% 71.40% 54.60% 42.80%
Random Forest 67.93% 75.04% 63.06% 54.54%
16
 Random Forest based model performs the best.
 Both F1-score and accuracy 66%.
 Comment usefulness and features are not linearly
correlated.
 As a primer, this prediction could be useful.
ANSWERING RQ4: ROLE OF PARADIGMS
17
ANSWERING RQ4: ROLE OF PARADIGMS
18
ANSWERING RQ5: COMPARISON WITH
BASELINE (VALIDATION)
19
ANSWERING RQ5: COMPARISON WITH
BASELINE (ROC)
20
TAKE-HOME MESSAGES
 Usefulness of review comments is complex but a
much needed piece of information.
 No automated support available so far to predict
usefulness of review comments instantly.
 Non-useful comments are significantly different from
useful comments in several textual features (e.g.,
conceptual similarity)
 Reviewing experience matters in providing useful
review comments.
 Our prediction model can predict the usefulness of a
new review comment.
 RevHelper performs better than random guessing and
available alternatives.
21
THANK YOU!! QUESTIONS?
22
Email: chanchal.roy@usask.ca or
masud.rahman@usask.ca

More Related Content

What's hot

Equivalence class testing
Equivalence  class testingEquivalence  class testing
Equivalence class testingMani Kanth
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directionsTao He
 
A software fault localization technique based on program mutations
A software fault localization technique based on program mutationsA software fault localization technique based on program mutations
A software fault localization technique based on program mutationsTao He
 
Cs6502 ooad-cse-vst-au-unit-v dce
Cs6502 ooad-cse-vst-au-unit-v dceCs6502 ooad-cse-vst-au-unit-v dce
Cs6502 ooad-cse-vst-au-unit-v dcetagoreengineering
 
Dynamic analysis in Software Testing
Dynamic analysis in Software TestingDynamic analysis in Software Testing
Dynamic analysis in Software TestingSagar Pednekar
 
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...CS, NcState
 
Source code comprehension on evolving software
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving softwareSung Kim
 
Boundary value analysis and equivalence partitioning
Boundary value analysis and equivalence partitioningBoundary value analysis and equivalence partitioning
Boundary value analysis and equivalence partitioningSneha Singh
 
Formal Method for Avionics Software Verification
 Formal Method for Avionics Software Verification Formal Method for Avionics Software Verification
Formal Method for Avionics Software VerificationAdaCore
 
A Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesA Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesSung Kim
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel GrichiPtidej Team
 

What's hot (17)

Equivalence class testing
Equivalence  class testingEquivalence  class testing
Equivalence class testing
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directions
 
Driven to Tests
Driven to TestsDriven to Tests
Driven to Tests
 
Design Pattern
Design PatternDesign Pattern
Design Pattern
 
A software fault localization technique based on program mutations
A software fault localization technique based on program mutationsA software fault localization technique based on program mutations
A software fault localization technique based on program mutations
 
Cs6502 ooad-cse-vst-au-unit-v dce
Cs6502 ooad-cse-vst-au-unit-v dceCs6502 ooad-cse-vst-au-unit-v dce
Cs6502 ooad-cse-vst-au-unit-v dce
 
Dynamic analysis in Software Testing
Dynamic analysis in Software TestingDynamic analysis in Software Testing
Dynamic analysis in Software Testing
 
Learning Curve
Learning CurveLearning Curve
Learning Curve
 
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Predic...
 
Source code comprehension on evolving software
Source code comprehension on evolving softwareSource code comprehension on evolving software
Source code comprehension on evolving software
 
WCRE11b.ppt
WCRE11b.pptWCRE11b.ppt
WCRE11b.ppt
 
Boundary value analysis and equivalence partitioning
Boundary value analysis and equivalence partitioningBoundary value analysis and equivalence partitioning
Boundary value analysis and equivalence partitioning
 
Cser13.ppt
Cser13.pptCser13.ppt
Cser13.ppt
 
Icsm20.ppt
Icsm20.pptIcsm20.ppt
Icsm20.ppt
 
Formal Method for Avionics Software Verification
 Formal Method for Avionics Software Verification Formal Method for Avionics Software Verification
Formal Method for Avionics Software Verification
 
A Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution TechniquesA Survey on Automatic Software Evolution Techniques
A Survey on Automatic Software Evolution Techniques
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 

Similar to MSR2017-RevHelper

Code-Review-COW56-Meeting
Code-Review-COW56-MeetingCode-Review-COW56-Meeting
Code-Review-COW56-MeetingMasud Rahman
 
CodeInsight-SCAM2015
CodeInsight-SCAM2015CodeInsight-SCAM2015
CodeInsight-SCAM2015Masud Rahman
 
Speculative analysis for comment quality assessment
Speculative analysis for comment quality assessmentSpeculative analysis for comment quality assessment
Speculative analysis for comment quality assessmentPooja Rani
 
CS 584 - Aligning development tools with the way programmers think about code...
CS 584 - Aligning development tools with the way programmers think about code...CS 584 - Aligning development tools with the way programmers think about code...
CS 584 - Aligning development tools with the way programmers think about code...Sergii Shmarkatiuk
 
SOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEW
SOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEWSOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEW
SOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEWijseajournal
 
CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016Masud Rahman
 
Framework for a Software Quality Rating System
Framework for a Software Quality Rating SystemFramework for a Software Quality Rating System
Framework for a Software Quality Rating SystemKarthik Murali
 
Automatic Code Completion Exploting Semantic Similarity
Automatic Code Completion Exploting Semantic SimilarityAutomatic Code Completion Exploting Semantic Similarity
Automatic Code Completion Exploting Semantic SimilarityMasud Rahman
 
Criteria Ratings PointsIntroduction 10 to 8.0 ptsAdva
Criteria Ratings PointsIntroduction 10 to 8.0 ptsAdvaCriteria Ratings PointsIntroduction 10 to 8.0 ptsAdva
Criteria Ratings PointsIntroduction 10 to 8.0 ptsAdvaCruzIbarra161
 
International Journal of Computer Science and Security Volume (1) Issue (2)
International Journal of Computer Science and Security Volume (1) Issue (2)International Journal of Computer Science and Security Volume (1) Issue (2)
International Journal of Computer Science and Security Volume (1) Issue (2)CSCJournals
 
When Testing Meets Code Review: Why and How Developers Review Tests
When Testing Meets Code Review: Why and How Developers Review TestsWhen Testing Meets Code Review: Why and How Developers Review Tests
When Testing Meets Code Review: Why and How Developers Review TestsDelft University of Technology
 
Generating_Automated_Feedback_To_Improve_Software_Testing_Quality
Generating_Automated_Feedback_To_Improve_Software_Testing_QualityGenerating_Automated_Feedback_To_Improve_Software_Testing_Quality
Generating_Automated_Feedback_To_Improve_Software_Testing_QualityPratik Gundlupet Venkatesh
 
Adopting code reviews for agile software development
Adopting code reviews for agile software developmentAdopting code reviews for agile software development
Adopting code reviews for agile software developmentmariobernhart
 
Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...
Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...
Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...University of Hawai‘i at Mānoa
 
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Debdoot Mukherjee
 
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Debdoot Mukherjee
 

Similar to MSR2017-RevHelper (20)

Code-Review-COW56-Meeting
Code-Review-COW56-MeetingCode-Review-COW56-Meeting
Code-Review-COW56-Meeting
 
CodeInsight-SCAM2015
CodeInsight-SCAM2015CodeInsight-SCAM2015
CodeInsight-SCAM2015
 
ISEC-2021-Presentation-Saikat-Mondal
ISEC-2021-Presentation-Saikat-MondalISEC-2021-Presentation-Saikat-Mondal
ISEC-2021-Presentation-Saikat-Mondal
 
Speculative analysis for comment quality assessment
Speculative analysis for comment quality assessmentSpeculative analysis for comment quality assessment
Speculative analysis for comment quality assessment
 
CS 584 - Aligning development tools with the way programmers think about code...
CS 584 - Aligning development tools with the way programmers think about code...CS 584 - Aligning development tools with the way programmers think about code...
CS 584 - Aligning development tools with the way programmers think about code...
 
SOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEW
SOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEWSOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEW
SOFTWARE CODE MAINTAINABILITY: A LITERATURE REVIEW
 
CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016CORRECT-ToolDemo-ASE2016
CORRECT-ToolDemo-ASE2016
 
Framework for a Software Quality Rating System
Framework for a Software Quality Rating SystemFramework for a Software Quality Rating System
Framework for a Software Quality Rating System
 
C24011018
C24011018C24011018
C24011018
 
Automatic Code Completion Exploting Semantic Similarity
Automatic Code Completion Exploting Semantic SimilarityAutomatic Code Completion Exploting Semantic Similarity
Automatic Code Completion Exploting Semantic Similarity
 
Criteria Ratings PointsIntroduction 10 to 8.0 ptsAdva
Criteria Ratings PointsIntroduction 10 to 8.0 ptsAdvaCriteria Ratings PointsIntroduction 10 to 8.0 ptsAdva
Criteria Ratings PointsIntroduction 10 to 8.0 ptsAdva
 
International Journal of Computer Science and Security Volume (1) Issue (2)
International Journal of Computer Science and Security Volume (1) Issue (2)International Journal of Computer Science and Security Volume (1) Issue (2)
International Journal of Computer Science and Security Volume (1) Issue (2)
 
When Testing Meets Code Review: Why and How Developers Review Tests
When Testing Meets Code Review: Why and How Developers Review TestsWhen Testing Meets Code Review: Why and How Developers Review Tests
When Testing Meets Code Review: Why and How Developers Review Tests
 
Generating_Automated_Feedback_To_Improve_Software_Testing_Quality
Generating_Automated_Feedback_To_Improve_Software_Testing_QualityGenerating_Automated_Feedback_To_Improve_Software_Testing_Quality
Generating_Automated_Feedback_To_Improve_Software_Testing_Quality
 
Adopting code reviews for agile software development
Adopting code reviews for agile software developmentAdopting code reviews for agile software development
Adopting code reviews for agile software development
 
Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...
Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...
Supporting the Maintenance of Identifier Names: A Holistic Approach to High-Q...
 
MSR2015-Challenge
MSR2015-ChallengeMSR2015-Challenge
MSR2015-Challenge
 
Ijcatr04051006
Ijcatr04051006Ijcatr04051006
Ijcatr04051006
 
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
 
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
 

More from Masud Rahman

HereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityHereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityMasud Rahman
 
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...Masud Rahman
 
PhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanPhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanMasud Rahman
 
PhD proposal of Masud Rahman
PhD proposal of Masud RahmanPhD proposal of Masud Rahman
PhD proposal of Masud RahmanMasud Rahman
 
PhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanPhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanMasud Rahman
 
Doctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanDoctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanMasud Rahman
 
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Masud Rahman
 
ICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationMasud Rahman
 
RACK-Tool-ICSE2017
RACK-Tool-ICSE2017RACK-Tool-ICSE2017
RACK-Tool-ICSE2017Masud Rahman
 
QUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeQUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeMasud Rahman
 
ACER-ASE2017-slides
ACER-ASE2017-slidesACER-ASE2017-slides
ACER-ASE2017-slidesMasud Rahman
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureMasud Rahman
 
NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018Masud Rahman
 

More from Masud Rahman (20)

HereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie UniversityHereWeCode 2022: Dalhousie University
HereWeCode 2022: Dalhousie University
 
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric...
 
PhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of SaskatchewanPhD Seminar - Masud Rahman, University of Saskatchewan
PhD Seminar - Masud Rahman, University of Saskatchewan
 
PhD proposal of Masud Rahman
PhD proposal of Masud RahmanPhD proposal of Masud Rahman
PhD proposal of Masud Rahman
 
PhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud RahmanPhD Comprehensive exam of Masud Rahman
PhD Comprehensive exam of Masud Rahman
 
Doctoral Symposium of Masud Rahman
Doctoral Symposium of Masud RahmanDoctoral Symposium of Masud Rahman
Doctoral Symposium of Masud Rahman
 
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
Supporting Source Code Search with Context-Aware and Semantics-Driven Code Se...
 
ICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-LocalizationICSE2018-Poster-Bug-Localization
ICSE2018-Poster-Bug-Localization
 
MSR2017-Challenge
MSR2017-ChallengeMSR2017-Challenge
MSR2017-Challenge
 
STRICT-SANER2017
STRICT-SANER2017STRICT-SANER2017
STRICT-SANER2017
 
MSR2014-Challenge
MSR2014-ChallengeMSR2014-Challenge
MSR2014-Challenge
 
STRICT-SANER2015
STRICT-SANER2015STRICT-SANER2015
STRICT-SANER2015
 
CMPT-842-BRACK
CMPT-842-BRACKCMPT-842-BRACK
CMPT-842-BRACK
 
RACK-Tool-ICSE2017
RACK-Tool-ICSE2017RACK-Tool-ICSE2017
RACK-Tool-ICSE2017
 
RACK-SANER2016
RACK-SANER2016RACK-SANER2016
RACK-SANER2016
 
QUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-SingaporeQUICKAR-ASE2016-Singapore
QUICKAR-ASE2016-Singapore
 
CORRECT-ICSE2016
CORRECT-ICSE2016CORRECT-ICSE2016
CORRECT-ICSE2016
 
ACER-ASE2017-slides
ACER-ASE2017-slidesACER-ASE2017-slides
ACER-ASE2017-slides
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

MSR2017-RevHelper

  • 1. PREDICTING USEFULNESS OF CODE REVIEW COMMENTS USING TEXTUAL FEATURES AND DEVELOPER EXPERIENCE Mohammad Masudur Rahman, Chanchal K. Roy, and Raula G. Kula* University of Saskatchewan, Canada, Osaka University, Japan* 14th International Conference on Mining Software Repositories (MSR 2017), Buenos Aires, Argentina
  • 2. RESEARCH PROBLEM: USEFULNESS OF CODE REVIEW COMMENTS 2  What makes a review comment useful or non-useful?  34.5% of review comments are non- useful at Microsoft (Bosu et al., MSR 2015)  No automated support to detect or improve such comments so far
  • 3. STUDY METHODOLOGY 3 1,482 Review comments (4 systems) Manual tagging with Bosu et al., MSR 2015 Non-useful comments (602) Useful comments (880) (1) Comparative study (2) Prediction model
  • 4. COMPARATIVE STUDY: VARIABLES Independent Variables (8) Response Variable (1) Reading Ease Textual Comment Usefulness (Yes / No) Stop word Ratio Textual Question Ratio Textual Code Element Ratio Textual Conceptual Similarity Textual Code Authorship Experience Code Reviewership Experience External Lib. Experience Experience 4  Contrast between useful and non-useful comments.  Two paradigms– comment texts, and commenter’s/developer’s experience  Answers two RQs related to two paradigms.
  • 5. ANSWERING RQ1: READING EASE  Flesch-Kincaid Reading Ease applied.  No significant difference between useful and non- useful review comments. 5
  • 6. ANSWERING RQ1: STOP WORD RATIO  Used Google stop word list and Python keywords.  Stop word ratio = #stop or keywords/#all words from a review comment  Non-useful comments contain more stop words than useful comments, i.e., statistically significant. 6
  • 7. ANSWERING RQ1: QUESTION RATIO  Developers treat clarification questions as non-useful review comments.  Question ratio = #questions/#sentences of a comment.  No significant difference between useful and non-useful comments in question ratio. 7
  • 8. ANSWERING RQ1: CODE ELEMENT RATIO  Important code elements (e.g., identifiers) in the comments texts, possibly trigger the code change.  Code element ratio = #source tokens/#all tokens  Useful comments > non-useful comments for code element ratio, i.e., statistically significant. 8
  • 9. ANSWERING RQ1: CONCEPTUAL SIMILARITY BETWEEN COMMENTS & CHANGED CODE  How relevant the comment is with the changed code?  Do comments & changed code share vocabularies?  Yes, useful comments do more sharing than non-useful ones, i.e., statistically significant. 9
  • 10. ANSWERING RQ2: CODE AUTHORSHIP  File level authorship did not make much difference, a bit counter-intuitive.  Project level authorship differs between useful and non-useful comments, mostly for Q2 and Q3 10
  • 11. ANSWERING RQ2: CODE REVIEWERSHIP  Does reviewing experience matter in providing useful comments?  Yes, it does. File level reviewing experience matters. Especially true for Q2 and Q3.  Experienced reviewers provide more useful comments than non- useful comments. 11
  • 12. ANSWERING RQ2: EXT. LIB. EXPERIENCE  Familiarity with the library used in the changed code for which comment is posted.  Significantly higher for the authors of useful comments for Q3 only. 12
  • 13. SUMMARY OF COMPARATIVE STUDY 13 RQ Independent Variables Useful vs. Non-useful Difference RQ1 Reading Ease Not significant Stop word Ratio Significant Question Ratio Not significant Code Element Ratio Significant Conceptual Similarity Significant RQ2 Code Authorship Somewhat significant Code Reviewership Significant External Lib. Experience Somewhat significant
  • 14. EXPERIMENTAL DATASET & SETUP 14 1,482 code review comments Evaluation set (1,116) Validation set (366) Model training & cross-validation Validation with unseen comments
  • 15. REVHELPER: USEFULNESS PREDICTION MODEL 15 Review comments Manual classification using Bosu et al. Useful & non-useful comments Model training Prediction model New review comment  Prediction of usefulness for a new review comment to be submitted.  Applied three ML algorithms– NB, LR, and RF  Evaluation & validation with different data sets  Answered 3 RQs– RQ3, RQ4 and RQ5
  • 16. ANSWERING RQ3: MODEL PERFORMANCE Learning Algorithm Useful Comments Non-useful Comments Precision Recall Precision Recall Naïve Bayes 61.30% 66.00% 53.30% 48.20% Logistic Regression 60.70% 71.40% 54.60% 42.80% Random Forest 67.93% 75.04% 63.06% 54.54% 16  Random Forest based model performs the best.  Both F1-score and accuracy 66%.  Comment usefulness and features are not linearly correlated.  As a primer, this prediction could be useful.
  • 17. ANSWERING RQ4: ROLE OF PARADIGMS 17
  • 18. ANSWERING RQ4: ROLE OF PARADIGMS 18
  • 19. ANSWERING RQ5: COMPARISON WITH BASELINE (VALIDATION) 19
  • 20. ANSWERING RQ5: COMPARISON WITH BASELINE (ROC) 20
  • 21. TAKE-HOME MESSAGES  Usefulness of review comments is complex but a much needed piece of information.  No automated support available so far to predict usefulness of review comments instantly.  Non-useful comments are significantly different from useful comments in several textual features (e.g., conceptual similarity)  Reviewing experience matters in providing useful review comments.  Our prediction model can predict the usefulness of a new review comment.  RevHelper performs better than random guessing and available alternatives. 21
  • 22. THANK YOU!! QUESTIONS? 22 Email: chanchal.roy@usask.ca or masud.rahman@usask.ca

Editor's Notes

  1. Introduce yourself +introductory statements. Code review is a widely used practice for modern software development, and review comments are the building blocks for code review. However, it is often difficult to find out whether a review comment is useful or not. In my talk, I am going to discuss which factors might be related to the usefulness of code review comments.
  2. For example, these are two review comments. This one triggers a new code change whereas the other one does not. Now, professional developers @Microsoft considered the change-triggering comment as the useful one. Now, in this research, we try to answer what makes a review comment useful? If not useful, can we identify that before sending them to the patch submitters? About 35% of the review comments are found non-useful at Microsoft, which is a significant amount. Unfortunately, there exist no automated supports for detecting or improving such comments so far. So, here comes our study.
  3. So, in this research, we try to understand what makes a review comment useful. For that, we need real code review data. So, we collaborated with Vendasta, a local software company with 150+ employees, and collect about 1,500 recent review comments from four of their systems. Then we apply the change-triggering heuristic of an earlier study, and manually annotate the comments. That is, if a review comment triggers new code change in its vicinity, it is a useful comment and vice versa. Now, we have to distinct sets of comments. Now we do comparative analysis between them, which is our first contribution. Then, we also develop a prediction model so that the non-useful comments can be predicted before submitting to the developers. The goal is to improve the non-useful review comments through identification and possible suggestions.
  4. In the comparative study, we contrast between useful and non-useful comments. We have independent and response variables. We consider two paradigms– comment texts, and the corresponding developer’s experience. In the textual aspect we consider five features of the comments such as reading ease, code element ratio or conceptual similarity between comment texts and the changed code. In the experience paradigm, we consider the authorship, review experience, and library experience of the developers. We have one response variable, that is the comment is useful or not– yes or no.
  5. We applied Flesch-Kincaid reading ease to both review comments which returns a ease score between 0 to 100. We did not find any significant difference in these scores for useful and non-useful comments.
  6. The second variable is stop word ratio. We used Google’s stop word list and Python keyword list since our code base is of Python. We found that non-useful comments contain significantly higher ratio of stop words than useful comments. The findings is also confirmed from quartile level analysis.
  7. Developers often treat clarification question as a sign of non-useful comments. So, we determine question ratio = # of questions/#sentences for each of the comments. We did not find any such evidence of developers claim. There is no significant difference between useful and non-useful comments in the question ratio
  8. From our manual observations, we note that review comments often contain relevant code elements such as method names, or identifiers. We also contrast on code element ratio between useful and non-useful comments. We found that useful comments contain more code elements than non-useful comments.
  9. How relevant the comment is against the changed code for which the comment is posted? This can be determined between lexical similarity between comments and the changed code. Similar concepts are applied with Stack Overflow questions and answers. We found that useful comments are conceptually more similar to their target code, than the non-useful comments. That is, you have to use familiar vocabulary to patch submitters to make your comment useful.
  10. As opposed to earlier findings, file level authorship did not make too much difference. That is, 46% of the useful comment authors changed file at least once. The same statistic for non-useful comments is 49% However, in the case with project level authorship of the developers, we found some difference between useful and non-useful comments. Mostly for Q2 and Q3. For more details, please read the paper.
  11. We also investigate if the reviewing experience is different between the authors of useful and non-useful comments. We found that file level reviewing experience matters here. Reviewing experience reduces the non-useful comments from a developer. Also experienced developers provide more useful comments than less experienced developers.
  12. We repeat the same for the external library experience, and found that developer’s experience for useful comments is slightly higher than that of non-useful comments.
  13. So, these are the summaries from our comparative study. We found three textual features are significantly different between useful and non-useful comments which are interesting. No studies explored these aspects earlier. We also confirmed some of the earlier findings on the developer experience paradigm as well.
  14. We divide our dataset into two groups– evaluation set and validation set. With the evaluation set, we evaluate our technique with 10-fold cross validation. Using the validation set, we validate the performance of our prediction model.
  15. These are the steps of our model development. We first annotate the comments using change-triggering heuristic. Then we extract the textual and experiment features of the comments, train our model using three learning algorithms. Then we use our model to predict whether a new review comment is useful or not. If a review comment is not useful, we can explain that with our identified features as well.
  16. Our evaluation shows that RF-based model performs the best, especially it can identify the non-usefulness comments more than the other variants. Since RF is a tree-based model, it also confirms that comment usefulness and features are NOT linearly correlated. The model provides 66% accuracy with 10-fold cross validation. As a state-of-the-art, this could be useful. Something is always better than nothing we believe.
  17. We also investigate how each of the features perform or contribute during comment usefulness prediction. We see that Conceptual Similarity, a textual feature has the strongest role in the prediction. For example, if we discard that feature, our prediction accuracy drops by 25% The next important features are reviewing experience stats of the developers.
  18. We also contrast between two group of features– textual features and experience features. While experience is a strong metric overall, it has limited practical use/applicability during usefulness prediction. That is why, we introduce the textual features. It also shows some interesting results. ROC and PR-curve show that our prediction is much better than random guessing.
  19. We compare we five variants of a baseline model – Bosu et al., MSR 2015 That model considers a set of keywords and sentiment analysis to separate useful and non-useful comments. However, according to our experiments, these metrics are not sufficient enough, especially for usefulness prediction. We see that our identified features are more effective, and prediction much more accurate than theirs.
  20. This is ROC for our model and the competitive models. While the baseline models are close to random guessing, we are doing much better. Obviously, these can be further improved with more accurate features and possibly more training dataset.
  21. Possibly you can read them out I guess.
  22. Thanks for your time. Questions!!