Primers or Reminders? The Effects of Existing Review Comments on Code Review

Presentation of the paper "Primers or Reminders? The Effects of Existing Review Comments on Code Review" published at ICSE 2020.

Authors:
Davide Spadini, Gül Calikli, Alberto Bacchelli

Link to the paper: https://research.tudelft.nl/en/publications/primers-or-reminders-the-effects-of-existing-review-comments-on-c

Published in: Engineering

  1. Primers or Reminders? The Effects of Existing Review Comments on Code Review. Davide Spadini, Gül Calikli, Alberto Bacchelli. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 642954.
  2. Primers or Reminders? The Effects of Existing Review Comments on Code Review. Davide Spadini, Gül Calikli, Alberto Bacchelli. Contact: @DavideSpadini, ishepard.
  3. Motivation
       • Peer review of manuscripts [image: first page of the paper “PyDriller: Python Framework for Mining Software Repositories” (Spadini, Aniche, Bacchelli; ESEC/FSE ’18)]
       • Code review
  4. Motivation
       • Peer review of manuscripts: asynchronous
       • Code review: asynchronous
  5. Motivation
       • Peer review of manuscripts: asynchronous; 1 reviewer per peer review
       • Code review: asynchronous; 1 reviewer per code review
  6. Motivation
       • Peer review of manuscripts: asynchronous; 1 reviewer per peer review; reviewers judge the manuscript independently from each other
       • Code review: asynchronous; 1 reviewer per code review; reviews are immediately visible to the other reviewers
       • Could this visibility bias the other reviewers?
  7. Availability Bias
       • Availability bias is one type of cognitive bias.
       • It is the tendency to overestimate the likelihood of events with greater availability in memory (recent memories).
       • By reading other reviewers’ comments, a reviewer might be biased toward finding the same types of errors, resulting in a biased code review outcome.
       • Are reviewers biased by other reviewers’ comments? Should we change the code review process and do something similar to a manuscript review?
  8. Research Questions: What is the effect of priming a reviewer with comments on …
       • RQ1: … a bug type that is not normally considered?
       • RQ2: … a bug type that is normally considered?
  9. [Diagram: experiment design]
       • Demographics, confounders
       • Treatment group: with 1 comment of a previous reviewer
       • Control group: without comments
       • Reviews in both groups cover a normally considered bug (corner case) and a not normally considered bug (NPE on parameters)
  10. [Diagram: experiment design, continued]
       • Treatment group: with 1 comment of a previous reviewer
       • Control group: without comments
       • Reviews in both groups cover a normally considered bug (corner case) and a not normally considered bug (NPE on parameters)
       • Questions on the code review
  11. RQ1: What is the effect of priming a reviewer with comments on a bug type that is not normally considered?
  12. RQ1: What is the effect of priming a reviewer with comments on a bug type that is not normally considered?
       • Reviewers primed on a not commonly considered bug are more likely to find other occurrences of this type of bug. However, this does not prevent them from also finding other types of bugs.
       • 40%: “Extremely influenced”; 40%: “Very influenced”; 20%: “Somewhat influenced”
  13. RQ2: What is the effect of priming a reviewer with comments on a bug type that is normally considered?
  14. RQ2: Results
       • Reviewers primed on an algorithmic bug perceive an influence, but are as likely as the others to find algorithmic bugs. Furthermore, primed participants did not capture fewer bugs of the other type.
       • 50%: “Extremely influenced”; 10%: “Somewhat influenced”; 40%: “Slightly/Not influenced”
  15. Closing the circle
       • Peer review of manuscripts [image: first page of the paper “PyDriller: Python Framework for Mining Software Repositories” (Spadini, Aniche, Bacchelli; ESEC/FSE ’18)]
       • Code review
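
The treatment/control comparison on slides 9 and 10 can be illustrated with a small sketch of how detection rates in two groups might be compared. This is not the authors' analysis: the choice of a one-sided Fisher exact test, the function names, and every count below are assumptions made up for illustration, not results from the study.

```python
from math import comb


def detection_rate(found: int, total: int) -> float:
    """Share of reviewers in a group who reported the seeded bug."""
    return found / total


def fisher_exact_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table

        treatment: a found, b missed
        control:   c found, d missed

    It is the probability that at least `a` treatment reviewers find the
    bug if group membership had no effect (hypergeometric tail).
    """
    treatment, control = a + b, c + d
    found_total = a + c
    denominator = comb(treatment + control, found_total)
    upper = min(treatment, found_total)
    tail = sum(
        comb(treatment, k) * comb(control, found_total - k)
        for k in range(a, upper + 1)
    )
    return tail / denominator


# HYPOTHETICAL counts, invented for this sketch (not data from the paper):
# 10 primed reviewers, 8 of whom found the bug; 10 unprimed, 3 of whom did.
primed_found, primed_missed = 8, 2
control_found, control_missed = 3, 7

print("primed rate: ", detection_rate(primed_found, primed_found + primed_missed))
print("control rate:", detection_rate(control_found, control_found + control_missed))
print("one-sided p: ", fisher_exact_one_sided(
    primed_found, primed_missed, control_found, control_missed))
```

A small p-value in such a comparison would suggest that primed reviewers find the bug more often than chance alone would explain; with real data, demographics and other confounders (slide 9) would also need to be accounted for.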
