Mining the Modern Code Review Repositories: A Dataset of People, Process and Product
Xin Yang, Raula G. Kula, Norihiro Yoshida, Hajimu Iida
Osaka University, Japan; Nagoya University, Japan; NAIST, Japan
MSR 2016 data showcase, May 14–15, 2016, Austin, Texas
An Overview of the Code Review Dataset
● Code Review
● Source Code
● Human / Social
Why we made this dataset?
Our previous work (Hamasaki et al. MSR '13)*
Our JSON-based dataset (Hamasaki et al. MSR '13)*
Some feedback:
“Hard to query...”
“Hard to convert...”
“Unable to access the source code...”
Script
*Hamasaki et al., “Who does what during a code review? datasets of OSS peer review repositories”. MSR '13
Typical Modern Code Review Process
You can mine from three different aspects
● People
● Process
● Product
Dataset Statistics (updated to May 2015)
              OpenStack  LibreOffice  AOSP     Qt       Eclipse
Time          4 years    3 years      7 years  4 years  3 years
Repositories  611        20           567      111      189
Patches       173,749    13,597       63,610   110,172  9,168
Participants  5,091      437          3,334    1,437    759
Download the Dataset
goo.gl/Wi4UoJ

Editor's Notes

  • #3 Why we made this dataset? The dataset combines code review data from 5 successful OSS projects, source code from Git, and human and social information (anonymized usernames and email addresses).
  • #4 Our previous work at MSR 2013 provided a dataset in JSON format and a refined dataset in CSV format. Over the past 3 years we have received a lot of feedback from our dataset users. Some users complained that: ……. Thus, we improved our dataset by converting the JSON to a MySQL database and by providing shell scripts to access the source code...
  • #7 This is a typical MCR process. Authors create and update their patches (changes). Reviewers perform code reviews on the changes and send feedback to the authors. Continuous Integration (CI) tools build and test the changes. After several revisions, the changes pass review and are integrated into the code repositories.
  • #8 Our dataset captures data on three different aspects of the code review process. First, how developers, reviewers and CI tools collaborate (see People). Second, the life cycle of a change from initial commit to final decision (see Process). Finally, the product of the code review (see Product).
  • #9 Some basic statistics about our dataset. We retrieved data from 5 large, successful OSS projects: OpenStack, LibreOffice, AOSP, Qt and Eclipse. Time: how long the project has used Gerrit code review (counted from when it adopted Gerrit). Repositories: how many repositories are involved. Patches: how many changes have been created. Participants: how many people have participated.
  • #10 You can download our dataset here and now!
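
Note #4 describes moving from a JSON dump to a relational database because the JSON was hard to query. The sketch below illustrates that idea only. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the record fields, table names and helper functions are hypothetical. They are not the dataset's actual schema or the authors' conversion scripts.

```python
import json
import sqlite3

# Hypothetical review records. Field names are illustrative assumptions,
# not the real dataset schema.
RAW = json.dumps([
    {"id": 1, "project": "demo", "owner": "dev_a", "status": "MERGED",
     "comments": [{"author": "rev_b", "message": "Looks good"},
                  {"author": "ci_bot", "message": "Build succeeded"}]},
    {"id": 2, "project": "demo", "owner": "dev_c", "status": "ABANDONED",
     "comments": [{"author": "rev_b", "message": "Needs tests"}]},
])


def load_reviews(conn, raw_json):
    """Flatten nested JSON review records into two relational tables."""
    conn.execute(
        "CREATE TABLE changes ("
        "id INTEGER PRIMARY KEY, project TEXT, owner TEXT, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE comments ("
        "change_id INTEGER REFERENCES changes(id), author TEXT, message TEXT)"
    )
    for rec in json.loads(raw_json):
        conn.execute(
            "INSERT INTO changes VALUES (?, ?, ?, ?)",
            (rec["id"], rec["project"], rec["owner"], rec["status"]),
        )
        conn.executemany(
            "INSERT INTO comments VALUES (?, ?, ?)",
            [(rec["id"], c["author"], c["message"]) for c in rec["comments"]],
        )
    conn.commit()


def comments_per_status(conn):
    """Count review comments grouped by the final status of their change."""
    rows = conn.execute(
        "SELECT ch.status, COUNT(*) "
        "FROM changes ch JOIN comments co ON co.change_id = ch.id "
        "GROUP BY ch.status ORDER BY ch.status"
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
load_reviews(conn, RAW)
print(comments_per_status(conn))  # [('ABANDONED', 1), ('MERGED', 2)]
```

Once the records are in tables, a question such as "how many comments do merged changes receive?" becomes a single join, which would otherwise need custom traversal code over the nested JSON.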