MSR2017-Challenge
1. IMPACT OF CONTINUOUS
INTEGRATION ON CODE REVIEWS
Mohammad Masudur Rahman, Chanchal K. Roy
Department of Computer Science
University of Saskatchewan, Canada
14th International Conference on Mining Software
Repositories (MSR 2017) (Challenge Track)
Buenos Aires, Argentina
2. RESEARCH PROBLEM: IMPACT OF
AUTOMATED BUILDS ON CODE REVIEWS
Automated builds are an important part of CI for commit merging and consistency.
Automated builds with Travis CI have increased exponentially over the years.
Builds and code reviews are interleaving steps in pull-based development.
RQ1: Does the status of automated builds influence the code
review participation in open source projects?
RQ2: Do frequent automated builds help improve the
overall quality of peer code reviews?
RQ3: Can we automatically predict whether an automated
build would trigger new code reviews or not?
4. ANSWERING RQ1: BUILD STATUS &
CODE REVIEW PARTICIPATION
Build Status    Build Only    Builds + Reviews      Total
Canceled             2,616               1,368      3,984
Errored             51,729              27,262     78,991
Failed              55,546              39,025     94,571
Passed             236,573             164,174    400,747
All                346,464       231,829 (40%)    578,293
578K PR-based builds.
Four build statuses: canceled, errored, failed, and passed.
232K (40%) build entries with code reviews.
Chi-squared test: p-value = 2.2e-16 < 0.05.
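The chi-squared result above can be sketched in a few lines. This is a minimal illustration, assuming the test was run on the 4x2 table of build status against review presence shown on this slide; the slides do not state the exact setup. The function name `chi_squared_statistic` is hypothetical, and the critical value 7.815 is the standard chi-squared threshold for 3 degrees of freedom at alpha = 0.05.

```python
# Counts from the slide: build status -> (build only, builds + reviews).
observed = {
    "Canceled": (2_616, 1_368),
    "Errored": (51_729, 27_262),
    "Failed": (55_546, 39_025),
    "Passed": (236_573, 164_174),
}


def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a dict of row label -> tuple of counts."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            # Expected count if build status and review presence were independent.
            expected = row_total * col_total / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


statistic = chi_squared_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (2 - 1)
CRITICAL_VALUE_05 = 7.815  # chi-squared critical value for df = 3, alpha = 0.05

print(f"chi2 = {statistic:.1f}, df = {degrees_of_freedom}")
print("reject independence at 0.05:", statistic > CRITICAL_VALUE_05)
```

A statistic above the critical value means the review rate differs across build statuses by more than chance would explain, which is what the reported p-value expresses.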
5. ANSWERING RQ1: BUILD STATUS &
CODE REVIEW PARTICIPATION
#PRs with changed review comments, by previous build status
Previous Build Status    Only Added↑    Only Removed↓    Total Changed↑↓
Canceled                          20               24                 65
Errored                          510              265                812
Failed                         1,542              826              2,316
Passed                         4,235            1,788              5,677
All                            6,307            2,903        8,870 (28%)
31,648 PRs cover the 232K build entries, from 1,000+ projects.
For 28% of PRs, the number of review comments changed.
Passed builds triggered new reviews in 18% of PRs.
Errored and failed builds together triggered them in 10% of PRs.
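The percentages on this slide follow from the "Total Changed" column. The sketch below recomputes them, assuming each share is taken relative to all 31,648 PRs (an interpretation that matches the reported figures, not something the slides state). The helper `share` is a hypothetical name.

```python
TOTAL_PRS = 31_648

# "Total Changed" column from the slide, keyed by previous build status.
changed = {"Canceled": 65, "Errored": 812, "Failed": 2_316, "Passed": 5_677}


def share(count, total=TOTAL_PRS):
    """Percentage of all PRs represented by count."""
    return 100.0 * count / total


all_changed = sum(changed.values())
overall_share = share(all_changed)
passed_share = share(changed["Passed"])
broken_share = share(changed["Errored"] + changed["Failed"])

print(f"PRs with changed review comments: {all_changed} ({overall_share:.1f}%)")
print(f"after passed builds: {passed_share:.1f}%")
print(f"after errored or failed builds: {broken_share:.1f}%")
```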
6. ANSWERING RQ2: BUILD FREQUENCY &
CODE REVIEW QUALITY
Quantile    Issue Comments            PR Comments               All Review Comments
            M      p-value    ∆       M      p-value    ∆       M      p-value    ∆
Q1          0.60   <0.001*    0.35    0.24   <0.001*    0.49    0.84   <0.001*    0.41
Q4          0.99                      0.52                      1.50
M = mean #review comments, * = statistically significant, ∆ = Cliff's delta (p-value and ∆ compare Q1 with Q4)
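Cliff's delta, the effect size reported in the table, measures how often values in one group exceed values in the other. The following is a minimal sketch of the standard definition; the function name and the sample values are hypothetical and are not data from the study.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical per-project mean comment counts, for illustration only.
q4_example = [1.2, 0.9, 1.6, 1.1]  # frequently built projects
q1_example = [0.5, 1.0, 0.7, 0.4]  # infrequently built projects

delta = cliffs_delta(q4_example, q1_example)
print(f"Cliff's delta = {delta:.3f}")
```

The value ranges from -1 to 1, with 0 meaning the two groups overlap completely; swapping the arguments flips the sign.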
7. ANSWERING RQ2: BUILD FREQUENCY &
CODE REVIEW QUALITY
5 projects from Q1 and 5 from Q4, each 3-4 years old.
Cumulative #review comments per build over 48 months.
Code review quality (i.e., #comments) improved almost linearly for frequently built projects.
The less frequently built projects showed no such trend; their curves zigzag.
8. ANSWERING RQ3: PREDICTION OF NEW
CODE REVIEW TRIGGERING
                                            New Review Triggered?
Learning Algorithm     Overall Accuracy     Precision     Recall
Naïve Bayes                      58.03%        68.70%     29.50%
Logistic Regression              60.56%        64.50%     47.00%
J48                              64.04%        69.50%     50.10%
Features: build status, code change statistics, test change statistics, and code review comments.
Response: new review triggered or unchanged.
Three ML algorithms with 10-fold cross-validation.
26.5K build entries as a balanced dataset.
J48 performed best: 64.04% accuracy, 69.50% precision, and 50.10% recall.
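The three reported metrics are linked when the dataset is balanced. The sketch below is a consistency check rather than the authors' evaluation code: it assumes "balanced" means exactly equal numbers of triggered and unchanged entries, and derives the accuracy implied by each precision and recall pair. The function name `implied_accuracy` is hypothetical.

```python
# Reported values from the slide: (accuracy, precision, recall) for the
# "new review triggered" class.
reported = {
    "Naive Bayes": (0.5803, 0.6870, 0.2950),
    "Logistic Regression": (0.6056, 0.6450, 0.4700),
    "J48": (0.6404, 0.6950, 0.5010),
}


def implied_accuracy(precision, recall):
    """Accuracy implied by precision and recall when classes are 50/50.

    With P positives and P negatives: TP = recall * P and
    FP = TP * (1 - precision) / precision, so the false positive rate is
    recall * (1 - precision) / precision.
    """
    false_positive_rate = recall * (1 - precision) / precision
    true_negative_rate = 1 - false_positive_rate
    return (recall + true_negative_rate) / 2


implied = {
    name: implied_accuracy(precision, recall)
    for name, (_, precision, recall) in reported.items()
}

for name, (accuracy, _, _) in reported.items():
    print(f"{name}: reported {accuracy:.4f}, implied {implied[name]:.4f}")
```

Under that assumption the implied accuracies agree with the reported ones to within rounding, which suggests the table is internally consistent.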
9. TAKE-HOME MESSAGES
Automated builds might influence manual code reviews, since the two interleave in modern pull-based development.
Passed builds are more strongly associated with review participation and with new code reviews.
Frequently built projects received more review comments than less frequently built ones.
Code review activity grows steadily over time in frequently built projects, but not in their less frequently built counterparts.
Our prediction model can predict whether a build will trigger a new code review, with 64% accuracy.