
Test-Driven Code Review: An Empirical Study

ICSE 2019

Davide Spadini
https://doi.org/10.5281/zenodo.2551217

Published in: Engineering


  1. Test-Driven Code Review: An Empirical Study. Davide Spadini, Fabio Palomba, Tobias Baum, Stefan Hanenberg, Magiel Bruntink, Alberto Bacchelli
  2. Davide Spadini, Fabio Palomba, Tobias Baum, Stefan Hanenberg, Magiel Bruntink, Alberto Bacchelli. @DavideSpadini, ishepard. Test-Driven Code Review: An Empirical Study
  3. Code Review; Software Testing
  4. Code Review; Software Testing. When Testing Meets Code Review: Why and How Developers Review Tests. Davide Spadini (Delft University of Technology, Software Improvement Group, Delft, The Netherlands, d.spadini@sig.eu); Maurício Aniche (Delft University of Technology, Delft, The Netherlands, m.f.aniche@tudelft.nl); Margaret-Anne Storey (University of Victoria, Victoria, BC, Canada, mstorey@uvic.ca); Magiel Bruntink (Software Improvement Group, Amsterdam, The Netherlands, m.bruntink@sig.eu); Alberto Bacchelli (University of Zurich, Zurich, Switzerland, bacchelli@i.uzh.ch). ABSTRACT Automated testing is considered an essential process for ensuring software quality. However, writing and maintaining high-quality test code is challenging and frequently considered of secondary importance. For production code, many open source and industrial software projects employ code review, a well-established software quality practice, but the question remains whether and how code review is also used for ensuring the quality of test code. The aim of this research is to answer this question and to increase our understanding of what developers think and do when it comes to reviewing test code. We conducted both quantitative and qualitative methods to analyze more than 300,000 code reviews, and interviewed 12 developers about how they review test files. This work resulted in an overview of current code reviewing practices, a set of identified obstacles limiting the review of test code, and a set of issues that developers would like to see improved in code review tools. The study reveals that reviewing test files is very different from reviewing production files, and that the navigation within the review itself is one of the main issues developers currently face. Based on our findings, we propose a series of recommendations and suggestions for the design of tools and future research.
CCS CONCEPTS • Software and its engineering → Software testing and debugging; KEYWORDS software testing, automated testing, code review, Gerrit Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden © 2018 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery. ACM ISBN 978-1-4503-5638-1/18/05...$15.00 https://doi.org/10.1145/3180155.3180192 ACM Reference Format: Davide Spadini, Maurício Aniche, Margaret-Anne Storey, Magiel Bruntink, and Alberto Bacchelli. 2018. When Testing Meets Code Review: Why and How Developers Review Tests. In Proceedings of ICSE ’18: 40th International Conference on Software Engineering, Gothenburg, Sweden, May 27-June 3, 2018 (ICSE ’18), 11 pages. https://doi.org/10.1145/3180155.3180192 1 INTRODUCTION Automated testing has become an essential process for improving the quality of software systems [15, 31]. Automated tests (hereafter referred to as just ‘tests’) can help ensure that production code is robust under many usage conditions and that code meets performance and security needs [15, 16]. Nevertheless, writing effective tests is as challenging as writing good production code. A tester has to ensure that test results are accurate, that all important execution paths are considered, and that the tests themselves do not introduce bottlenecks in the development pipeline [15].
Like production code, test code must also be maintained and evolved [49]. As testing has become more commonplace, some have considered that improving the quality of test code should help improve the quality of the associated production code [21, 47]. Unfortunately, there is evidence that test code is not always of high quality [11, 49]. Vazhabzadeh et al. showed that about half of the projects they studied had bugs in the test code [46]. Most of these bugs create false alarms that can waste developer time, while other bugs cause harmful defects in production code that can remain undetected. We also see that test code tends to grow over time, leading to bloat and technical debt [49]. As code review has been shown to improve the quality of source code in general [12, 38], one practice that is now common in many development projects is to use Modern Code Review (MCR) to improve the quality of test code. But how is test code reviewed? Is it reviewed as rigorously as production code, or is it reviewed at all? Are there specific issues that reviewers look for in test files? Does test code pose different reviewing challenges compared to the review of production code? Do some developers use techniques for reviewing test code that could be helpful to other developers? To address these questions and find insights about test code review, we conducted a two-phase study to understand how test
  5. TDR: What? Who? Why? When? How?
  6. Research Questions. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? (Code review experiment with 92 developers, ~150 reviews.) RQ2: How do developers perceive the practice of Test-Driven Code Review? (9 interviews + survey with 103 respondents.)
  8. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Method: a 1st review, followed by a 2nd review (optional).
  9. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Method, dependent variables: bugs of 2 different types, (1) maintainability issues and (2) functional defects, both in test and in production code.
  10. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Method, independent variable: treatment. Production First (A.java, then ATest.java); Test First (ATest.java, then A.java); Only Production (A.java).
  11. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Method, example assignment: Participant A receives the Test First and Production First treatments across the 1st and 2nd reviews; Participant B receives Production First in the 1st review and Only Production in the 2nd review.
  12. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Method, control variables (code and review details, participant’s profile): duration of the code review; patch; role; experience (reviewing, programming, …).
  13. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Method: demographics, confounders. 32 participants completed only the 1st review; 60 participants completed both reviews (the 2nd review was optional).
  14. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Results. [Figure: two bar charts, Found Maintainability Issues and Found Functional Defects (Bugs), each split into In Test Code and In Production Code, for the treatments Test First, Production First, and Only Production; axis 0% to 100%. Values shown: maintainability issues 18%, 21%, 13%, 8%, 18%; functional defects 28%, 33%, 17%, 28%, 40%. Annotations: p < 0.01 with medium effect size, p < 0.01 with small effect size, p < 0.05, p < 0.05.]
  15. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Results. Estimates per variable (columns: Test Bugs estimate and significance, then Production Maintainability Issues estimate and significance): Total duration -0.73 **, 0.04 .; Is First Review 0.16 ., -0.15; Patch 2 -0.92 ***, -2.84 ***; Treatment Prod. First -0.09; Treatment Test First 1.17 **, -1.24 **; Review Practice 0.06, 0.25 .; Programming Pract. -0.66, -0.21; Profession Dev Exp. -0.09, -0.29 *; Java Exp. -0.01, -0.03; Worked Hours -0.02, -0.04. Significance codes: ’***’ p < 0.001, ’**’ p < 0.01, ’*’ p < 0.05, ’.’ p < 0.1
  18. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? Results. With TDR, participants found more bugs in test code and the same number in production code. With TDR, participants found fewer maintainability issues in production code. Review Practice, Progr. Practice, Java Exp.
  19. Research Questions. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? (Code review experiment with 92 developers, ~150 reviews.) RQ2: How do developers perceive the practice of Test-Driven Code Review? (9 interviews + survey with 103 respondents.)
  20. RQ2: How do developers perceive Test-Driven Code Review? Adoption; perceived problems; perceived advantages.
  21. RQ2: How do developers perceive Test-Driven Code Review? 9 interviews (working context: 3 OSS, 6 closed source); 103 survey respondents (44% with 2 to 5 years of experience, 28% with 6 to 10 years of experience).
  22. RQ2: How do developers perceive Test-Driven Code Review? Adoption: how often respondents use TDR. [Figure: bar chart over the answers Never, Almost never, Sometimes, Almost always, Always; axis 0 to 50.]
  23. RQ2: How do developers perceive Test-Driven Code Review? Perceived problems with TDR: tests give less knowledge on the production behavior; tests have low code quality; cannot choose a different order in the CR tool; tests are perceived as less important.
  24. RQ2: How do developers perceive Test-Driven Code Review? Perceived advantages of TDR: black-box view of production code; useful when the developer is unfamiliar with the code; improved test code quality.
  25. Research Questions. RQ1: Does the order of presenting test code to the reviewer influence the code review’s effectiveness? RQ2: How do developers perceive the practice of Test-Driven Code Review?
  26. The order matters. Context dependent. Educate on test writing and reviewing. Test code importance == Prod. code importance.
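
The three treatments on slide 10 amount to a rule for ordering the changed files shown to a reviewer. Below is a minimal sketch of such a rule, not the tooling used in the study. The function names and the treatment identifiers (`test_first`, `production_first`, `only_production`) are hypothetical, and it assumes test files can be recognized by a `Test.java` suffix, as in the slide's `ATest.java`.

```python
from typing import List


def is_test_file(path: str) -> bool:
    # Assumption: test files end in "Test.java", like ATest.java on slide 10.
    return path.endswith("Test.java")


def order_for_review(paths: List[str], treatment: str) -> List[str]:
    """Return the changed files in the order a reviewer would see them.

    `treatment` is a hypothetical identifier for one of the three
    treatments: "test_first", "production_first", or "only_production".
    """
    tests = [p for p in paths if is_test_file(p)]
    production = [p for p in paths if not is_test_file(p)]
    if treatment == "test_first":
        return tests + production
    if treatment == "production_first":
        return production + tests
    if treatment == "only_production":
        # Test files are not shown at all under this treatment.
        return production
    raise ValueError(f"unknown treatment: {treatment}")


changed = ["A.java", "ATest.java"]
for treatment in ("test_first", "production_first", "only_production"):
    print(treatment, order_for_review(changed, treatment))
```

A review tool that exposed this choice to the reviewer would address the perceived problem, listed on slide 23, that the order cannot be changed in the CR tool.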
