1
Junji Shimagaki
Yasutaka Kamei
Ahmed E. Hassan
Naoyasu Ubayashi
Shane McIntosh
A Study of the Quality-Impacting Practices
of Modern Code Review at Sony Mobile
2
Code review is an important
software quality assurance practice
Programmer Code reviewer
3
Sony Mobile uses
Gerrit Code Review tools
1. Commit message
2. Files under review
4
Sony Mobile uses
Gerrit Code Review tools
1. Commit message
2. Files under review
3. Reviewers
4. Review scores
5
Code Review context
at Sony Mobile
“Code-Review” (e.g., syntax, grammar, logic, ...)
Code Review context
at Sony Mobile
“Code-Review” (e.g., syntax, grammar, logic, ...)
Test results on
Application crashes?
Reboot after 10 sec
?
6
7
Lax reviewing practices
impact software quality
Reviewer
Programmer
Poorly
reviewed code
Code
repository
McIntosh et al., EMSE 2015
Poorly
reviewed code
Reviewer Programmer
Code
repository
McIntosh et al., EMSE 2015
8
How about at Sony Mobile?
Lax reviewing practices
impact software quality
9
Approach
Quantitative
study
Qualitative
study
Replication of
McIntosh et al.,
EMSE 2015
Developer surveys
at Sony Mobile
with 100+ people
Implications
Better code review
practices validated
by stakeholders
10
Quick results
Simple adaptation does not work!
Review coverage
✗ Un-review ratio
Review participation
✗ Self approval
✗ Discussion volume
11
A software project for this.
Target system: A smartphone product
with a 6-month release cycle
Sony's unique apps, HW
A software project for this.
Target system: A smartphone product
with a 6-month release cycle
Sony's unique apps, HW
Chipset and modem
Android OS
Strong dependencies on third-party systems
12
Why might Sony Mobile be Different?
Third-party
dependencies
Offline
Communication
Embedded
Software
Development
13
Third-party
dependencies
Why might Sony Mobile be Different?
Embedded
Software
Development
Offline
Communication
Previously studied systems are
less impacted by third-party dependencies.
14
Third-party
dependencies
Why might Sony Mobile be Different?
Embedded
Software
Development
Offline
Communication
Previously studied systems rely on
online communication methods.
15
16
Why might Sony Mobile be Different?
Embedded
Software
Development
Third-party
dependencies
Offline
Communication
Previously studied systems are applications
at higher levels in the application stack
Do reviewing practices
impact software quality?
Review coverage
✗ Un-review ratio
✓ Third-party ratio
Components with a higher third-party
codebase ratio are more defect-prone.
17
Do reviewing practices
impact software quality?
Review participation
✗ Discussion volume
✓ Patch update activity
Components with high patch update
activity are less defect-prone.
18
Do reviewing practices
impact software quality?
Review participation
✗ Self approval
✓ Self verify
Components where self-verification is
prevalent are more defect-prone.
19
20
Do reviewing practices
impact software quality?
Review coverage
✓ Third-party ratio
Review participation
✓ Self verify
✓ Patch update activity
Do reviewing practices
impact software quality?
Review coverage
✓ Third-party ratio
Review participation
✓ Self verify
Why are these metrics effective?
Let's ask the developers!
22
Qualitative Study Approach
Presentation
Initial survey
(93 stakeholders)
Interviewee's list
23
Qualitative Study Approach
Presentation
Initial survey
(93 stakeholders)
Interviewee's list
Semi-structured
Interviews
(15 key engineers)
Implications
Qualitative Study Approach
Presentation
Initial survey
(93 stakeholders)
Interviewee's list
Semi-structured
Interviews
(15 key engineers)
Implications
Validation survey
(25 senior
stakeholders)
✓ Confirmed
Implications
24
Why does third-party ratio matter
at Sony Mobile?
Developers require more time and effort to
understand, extend, or repair components with
high third-party rates.
“An external codebase takes more time
from me to understand the code and
to develop patches.”
Software engineer
✓ 92% of stakeholders agreed
25
Why does self-verify rate matter
at Sony Mobile?
The self-verification practice is coloured by
the author’s subjective perspective,
which may bias the testing procedures and results.
“I understand the architecture, and
I am the one who can test my
commit properly.”
Software engineer
✓ 75% of stakeholders agreed
26
Why does patch update rate
matter at Sony Mobile?
“Patch update rate” captures developer
effort in a way that is not diminished by
in-person discussion at Sony Mobile.
“… it is much easier to work with
direct communication rather
than with the Gerrit tools.”
Software architect
✓ 81% of stakeholders agreed
27
What is Sony Mobile doing to
adjust their reviewing process?
QA has new focus on
test coverage of external code
Discouraging the practice of
self-verification
Investigating ways to encourage
passive developers to participate more
in code review
28
Backup slides
37
Review coverage of a component
✓ Review is performed
at Sony Mobile's Gerrit
Code review participation metrics
Self review only?
Number of commits approved by their own author
Enough code review effort?
Discussion volume recorded in Gerrit
38
But, again, they do NOT share a
relationship with defect-proneness
✗ Number of commits approved by their own author
✗ Discussion volume recorded in Gerrit
39
Adjusted review participation metrics share a
relationship with defect-proneness
✓ Number of commits verified by their own author
✓ Patch update activity
Lax reviewing practices are associated with
defect-proneness.
40
41
In-House shares a stronger relationship
with defect-proneness at Sony Mobile
Review coverage
No significant link with
defect-proneness
In-House
Defect-proneness
declines as In-House
ratio increases.
External components tend to be
more defect-prone.
Self-verify shares an increasing
relationship with defect-proneness.
[Plot axes: Number of self_verify commits; Defect proneness (logit transformed)]
42
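For reference, the logit transformation named on the plot axis is conventionally defined as below. Here p is assumed to stand for a component's estimated probability of being defect-prone; the slides do not spell out the exact quantity that is transformed.

```latex
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1 - p}\right), \qquad 0 < p < 1
```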
Patch update activity shares a decreasing
relationship with defect-proneness.
[Plot axes: Patch update activity; Defect proneness (logit transformed)]
43
44
Quantitative
study
Replication of
McIntosh et al.,
EMSE 2015
Do reviewing practices impact
software quality at Sony Mobile?
Do reviewing practices impact
software quality at Sony Mobile?
Quantitative
study
Replication of
McIntosh et al.,
EMSE 2015
Qualitative
study
Developer surveys
at Sony Mobile
with 100+ people
45
Do reviewing practices impact
software quality at Sony Mobile?
Quantitative
study
Replication of
McIntosh et al.,
EMSE 2015
Qualitative
study
Developer surveys
at Sony Mobile
with 100+ people
Implications
Better code review
practices validated
by stakeholders
46
47
A software project for this.
Target system: A smartphone product
with a 6-month release cycle
700 components
...
300 components
...
We study defect-proneness
of those 1,000 components
Do reviewing practices
impact software quality?
RQ1:
Review coverage
RQ2:
Review participation
48
Review coverage of a component
Code repository of
1 component
1 commit
49
50
Review coverage of a component
8 commits, 4 reviewed
5 commits, 1 reviewed
2 commits, 2 reviewed
✓ Review is performed
at Sony Mobile's Gerrit
Review coverage of a component
50% 20% 100%
51
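The coverage percentages above follow directly from the commit counts on the previous slide. A minimal sketch of that arithmetic is given below; the component names "A", "B" and "C" and the `Component` type are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str              # hypothetical label, not from the slides
    total_commits: int     # all commits in the component's repository
    reviewed_commits: int  # commits reviewed in Gerrit


def review_coverage(component: Component) -> float:
    """Proportion of a component's commits that were reviewed."""
    if component.total_commits == 0:
        return 0.0  # assumption: an empty component has zero coverage
    return component.reviewed_commits / component.total_commits


# Commit counts taken from the slide: 8/4, 5/1 and 2/2.
components = [
    Component("A", total_commits=8, reviewed_commits=4),
    Component("B", total_commits=5, reviewed_commits=1),
    Component("C", total_commits=2, reviewed_commits=2),
]
coverages = {c.name: review_coverage(c) for c in components}
```

With these inputs the three components come out at 50%, 20% and 100%, matching the slide.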
However, it does NOT share a
relationship with defect-proneness
✗ 50% ✗ 20% ✗ 100%
52
At Sony Mobile, review status is equivalent to
whether a commit is made 'In-House'
✓ Sony Mobile's
internal patches
Linux kernel's
baseline commits
53
But, our defined bags look too small to
represent 'In-House' made ratio
Commits during
development of
Slipped historic
kernel commits
54
55
We adjusted the definition of
'review coverage'
→ ??%
Proportion of 'In-House' commits in total
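The adjusted metric can be sketched as a simple proportion. The snippet below assumes each commit has already been tagged with an origin string ("in-house" or "external"); that tagging and the function name are hypothetical, since the slides do not say how origin is determined.

```python
from typing import Iterable


def in_house_ratio(commit_origins: Iterable[str]) -> float:
    """Proportion of a component's commits that were made in-house."""
    origins = list(commit_origins)
    if not origins:
        return 0.0  # assumption: no commits means a ratio of zero
    in_house = sum(1 for origin in origins if origin == "in-house")
    return in_house / len(origins)


# Hypothetical component: 1 in-house patch on top of 99 external commits.
example_origins = ["in-house"] + ["external"] * 99
ratio = in_house_ratio(example_origins)
third_party_ratio = 1.0 - ratio  # complement, under the same assumption
```

A component dominated by upstream history, as in this example, ends up with an in-house ratio of about 1%.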
56
Adjusted review coverage shares a
relationship with defect-proneness
✓ 0.1% ✓ 1% ✓ 100%
Externally originated components tend to be
more defect-prone.
Do reviewing practices
impact software quality?
RQ1:
Review coverage
✓ YES! In-House ratio
RQ2:
Review participation
???
57
58
OK, code-reviewed, but...
✓ A reviewed commit is
no guarantee of active participation
Definitions of participation
Who
approved this
commit?
59
Definitions of participation
Who
approved this
commit?
Sufficient
discussion
(effort) made?
60
Code review participation metrics
Who approved this commit?
Number of commits approved by their own author
Sufficient discussion (effort) made?
Discussion volume recorded in Gerrit
61
But, they do NOT share a relationship
with defect-proneness
Who approved this commit?
✗ Number of commits approved by their own author
Sufficient discussion (effort) made?
✗ Discussion volume recorded in Gerrit
62
We adjust self-approval
Before: we only counted the number of self “Code-Review” votes
Now: we also count the number of self “Verified” votes
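The adjustment can be illustrated by counting both kinds of self-vote. This is a sketch under stated assumptions: the `Change` record and its `votes` mapping are hypothetical and are not Gerrit's data model, and a change is treated as self-voted on a label when its author is the only account that voted on that label.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Change:
    author: str
    # Hypothetical shape: label name -> accounts that gave a positive vote.
    votes: Dict[str, List[str]] = field(default_factory=dict)


def is_self_voted(change: Change, label: str) -> bool:
    """True when the author is the only account that voted on `label`."""
    voters = set(change.votes.get(label, []))
    return voters == {change.author}


def count_self_voted(changes: List[Change], label: str) -> int:
    return sum(1 for change in changes if is_self_voted(change, label))


changes = [
    Change("alice", {"Code-Review": ["bob"], "Verified": ["alice"]}),
    Change("bob", {"Code-Review": ["bob"], "Verified": ["bob"]}),
    Change("carol", {"Code-Review": ["alice"], "Verified": ["dave"]}),
]
self_approved = count_self_voted(changes, "Code-Review")
self_verified = count_self_voted(changes, "Verified")
```

In this toy data only one change is self-approved, while two are self-verified. That is the kind of gap a “Code-Review”-only count would miss.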
63
Code review process at
Sony Mobile
Programmer
Code reviewer
Gerrit server
1. Code.
2. Upload.
3. Review.
4. Verify on HW.
5. Privileged.
6. Submit.
64
65
Code review system
at Sony Mobile
We adjust effort
“… it is much easier to work with
direct communication rather
than with the Gerrit tools.”
Software architect
66
No longer assume discussion is online.
We introduce a new “patch update ratio”
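The slides do not define the “patch update ratio” precisely, so the sketch below rests on a loudly stated assumption: a commit counts as updated when its review went through more than one patch set, and the ratio is the share of such commits in a component. The function name and input shape are hypothetical.

```python
from typing import Iterable


def patch_update_ratio(patch_set_counts: Iterable[int]) -> float:
    """Share of commits whose review had more than one patch set.

    ASSUMPTION: this definition is illustrative only; the slides name
    the metric but do not define it.
    """
    counts = list(patch_set_counts)
    if not counts:
        return 0.0
    updated = sum(1 for n in counts if n > 1)
    return updated / len(counts)


# Hypothetical component with four reviewed commits.
ratio = patch_update_ratio([1, 3, 2, 1])
```

Because patch sets are recorded even when the discussion happens in person, a measure of this kind does not depend on comments being written in Gerrit.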
Adjusted review participation metrics share a
relationship with defect-proneness
Who verified this commit?
✓ Number of commits verified by their own author
Sufficient discussion (effort) made?
✓ Patch update activity
67
