The Impact of Task Granularity on Co-evolution Analyses
Yasutaka Kamei, Keisuke Miura, Shane McIntosh, Naoyasu Ubayashi, Ahmed E. Hassan

Software evolution aims to recover knowledge about development
Repositories (Gerrit, Git, GitHub, RHSA, Mylyn) → Knowledge

Co-evolution of production & test code
Growth history view of ArgoUML [1]
[1] A. Zaidman, B. Van Rompaey, S. Demeyer, and A. Van Deursen. Mining software repositories to study co-evolution of production & test code. In Proc. Int’l Conf. on Software Testing, Verification, and Validation (ICST’08), pages 220–229, 2008.

There are several levels of granularity
Commits
Pull Requests
Merge
Releases (v2.0)

Some issues may require several commits
Commit "Fix #1000": changes A and Test A
Commit "Fix #1100": changes B
Commit "Fix #1100" (1 day later): changes Test B
Commit-level analysis would miss the co-change relationship between them

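The gap between the two views can be made concrete with a small sketch. It derives co-changed file pairs once per commit and once per work item. The commit records, the file names, and the helper names `co_change_pairs` and `group_by_work_item` are hypothetical and only mirror the example on this slide.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical commit records mirroring the slide: (work item ID, files changed).
commits = [
    ("#1000", {"A", "TestA"}),  # one commit changes A and its test together
    ("#1100", {"B"}),           # first commit for #1100
    ("#1100", {"TestB"}),       # follow-up commit one day later
]

def co_change_pairs(file_sets):
    """Return every unordered pair of files that share a change set."""
    pairs = set()
    for files in file_sets:
        pairs.update(frozenset(p) for p in combinations(sorted(files), 2))
    return pairs

def group_by_work_item(commits):
    """Union the files of all commits that reference the same work item."""
    grouped = defaultdict(set)
    for work_item, files in commits:
        grouped[work_item] |= files
    return grouped

commit_level = co_change_pairs(files for _, files in commits)
work_item_level = co_change_pairs(group_by_work_item(commits).values())

# Pairs that only become visible once commits are grouped by work item.
missed = work_item_level - commit_level
print(missed)
```

In this toy input the pair (B, TestB) appears only at the work item level, because the two files were never changed in the same commit.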
Work items can be used to study software evolution
Jira work items: #1000, #1100
Git commits linked to #1000: "Fix #1000" (A, Test A)
Git commits linked to #1100: "Fix #1100" (B), "Fix #1100" (Test B)
How does the work item granularity impact co-evolution analyses?

Studied systems
Jira: 119 systems

Two important criteria needed to be satisfied for a system to qualify for our analysis
System size
ITS usage (Jira)
[Figure: plot with # of commits on one axis (ticks at 0, 10,000, 20,000) and ticks from 0 to 100 on the other]

How often are work items composed of several commits?
[Figure: Git commits (A, B, C) whose messages ("Fix #1000", "Bug Fix #1100", "Fix a Bug #1200") link them to Jira work items #1000, #1100, and #1200]

A median of 29% of work items consist of two or more commits
Granularity may have a considerable impact on co-evolution analyses

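A minimal sketch of how such a share could be computed for one system, assuming each commit has already been linked to a single work item ID. The function name `multi_commit_share` and the sample IDs are hypothetical.

```python
from collections import Counter

def multi_commit_share(commit_work_items):
    """Fraction of work items that are referenced by two or more commits."""
    counts = Counter(commit_work_items)
    if not counts:
        return 0.0
    multi = sum(1 for n in counts.values() if n >= 2)
    return multi / len(counts)

# Hypothetical input: the work item ID that each commit references.
example = ["#1000", "#1100", "#1100", "#1200"]
share = multi_commit_share(example)  # 1 of the 3 work items has several commits
print(f"{share:.0%}")
```

Since the figure on the slide is a median, a per-system value like this would be computed for every studied system and then summarized, for example with `statistics.median`.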
The impact of the work item granularity
File Spread
Time Spread
Developer Spread

How many files are changed by the commits of work items?
[Figure: work item #1000 with three Git commits, each with the message "Fix #1000", that change files A, Test A, A2, and A3]

OpenJPA #1763: An example of file spread
File spread: 60% (= 3 / 5)
First-commit analysis would overlook a median of 24% of the co-changed files

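The following sketch assumes that file spread is the share of a work item's files that its first commit does not touch, which is how the 60% (= 3 / 5) example appears to read. That definition, the function name `file_spread`, and the file names are assumptions made for illustration.

```python
def file_spread(commit_files):
    """Share of a work item's files that its first commit does not touch.

    commit_files: list of sets of file paths, in commit order.
    """
    if not commit_files:
        return 0.0
    all_files = set().union(*commit_files)
    if not all_files:
        return 0.0
    missed = all_files - commit_files[0]
    return len(missed) / len(all_files)

# Hypothetical work item: five files in total, two of them in the first commit.
example = [{"A.java", "TestA.java"}, {"A2.java"}, {"A3.java", "B.java"}]
spread = file_spread(example)
print(f"{spread:.0%}")
```

Under this reading, an analysis that looks only at the first commit would never see the remaining three files co-change with the first two.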
QPID-4575: adds support for Visual Studio 2012
1st commit: .cpp, .h
5th commit: .cpproj
This co-change activity of production code and build system would be missed

The impact of the work item granularity
File Spread: 24% of the co-changed files are overlooked
Time Spread: how much time elapses between the commits of work items?
Developer Spread

Sliding time window technique
A common setting in software evolution studies:
Same commit message
Same developer
Similar time (300 secs)
[Figure: two Git commits with the message "Fix #1000", one changing A and one changing Test A, less than 300 secs apart]

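The three conditions can be sketched as a simple grouping routine. This is one plausible reading of the technique rather than the exact procedure used in any particular study. The `Commit` record, the function name `sliding_window_groups`, and the sample data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Commit:
    author: str
    message: str
    timestamp: int  # seconds since some reference point

def sliding_window_groups(commits, window=300):
    """Group commits that share author and message and whose consecutive
    timestamps are less than `window` seconds apart."""
    groups = []
    open_groups = {}  # (author, message) -> most recent group for that key
    for c in sorted(commits, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        group = open_groups.get(key)
        if group and c.timestamp - group[-1].timestamp < window:
            group.append(c)  # the window slides forward with each commit
        else:
            group = [c]
            groups.append(group)
            open_groups[key] = group
    return groups

# Hypothetical commits: the third one arrives too late to join the first two,
# and the fourth has a different developer.
commits = [
    Commit("alice", "Fix #1000", 0),
    Commit("alice", "Fix #1000", 250),
    Commit("alice", "Fix #1000", 900),
    Commit("bob", "Fix #1000", 260),
]
groups = sliding_window_groups(commits)
print([len(g) for g in groups])
```

The late commit and the commit by a second developer end up in their own groups even though all four commits reference the same work item.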
An example of time spread
Jira work item #1000 has four revisions, R1 to R4, committed 250 secs, 200 secs, and 400 secs apart
Time spread: 33% (= 1 / 3)

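A small sketch that reproduces the arithmetic of this example, assuming time spread is the share of gaps between consecutive commits that fall outside the 300-second window. That definition and the function name `time_spread` are assumptions inferred from the 33% (= 1 / 3) figure.

```python
def time_spread(timestamps, window=300):
    """Share of gaps between consecutive commits that are not within the window."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    outside = sum(1 for gap in gaps if gap >= window)
    return outside / len(gaps)

# R1 to R4 spaced 250, 200, and 400 seconds apart, as in the example.
revisions = [0, 250, 450, 850]
spread = time_spread(revisions)
print(f"{spread:.0%}")
```

Only the 400-second gap falls outside the window, so one of the three gaps counts toward the spread.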
48%-97% of related commits cannot be grouped using the sliding window

ACCUMULO-1890: recovers from a failure due to limited resources
Commit: "ACCUMULO-1890 Clean up the test to avoid spinning up a MAC"
11 minutes later: "ACCUMULO-1890 Forgot to re-add changes before commit"

The impact of the work item granularity
File Spread: 24% of the co-changed files are overlooked
Time Spread: 83% of related commits cannot be grouped
Developer Spread: how many developers are involved across revisions of a work item?

How many developers are involved across revisions of a work item?
[Figure: Jira work item #1000 linked to three Git commits ("Fix #1000", "Bug Fix #1000", "Fix a Bug #1000") labeled A, B, and C]

25% of work items involve multiple developers

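A minimal sketch of how this share could be measured, assuming every commit has been linked to a work item and carries an author name. The function name `multi_developer_share` and the sample pairs are hypothetical; the sample was chosen only to illustrate the calculation.

```python
from collections import defaultdict

def multi_developer_share(commits):
    """Fraction of work items whose commits come from two or more authors.

    commits: iterable of (work_item_id, author) pairs.
    """
    authors = defaultdict(set)
    for work_item, author in commits:
        authors[work_item].add(author)
    if not authors:
        return 0.0
    multi = sum(1 for devs in authors.values() if len(devs) >= 2)
    return multi / len(authors)

# Hypothetical data: only #1000 is touched by more than one developer.
example = [
    ("#1000", "A"), ("#1000", "B"), ("#1000", "C"),
    ("#1100", "A"),
    ("#1200", "B"),
    ("#1300", "C"),
]
share = multi_developer_share(example)
print(f"{share:.0%}")
```

A sliding window that requires the same developer cannot group the three commits of #1000 in this input, whatever their timing.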
The impact of the work item granularity
File Spread: 24% of the co-changed files are overlooked
Time Spread: 83% of related commits cannot be grouped
Developer Spread: 25% of work items involve multiple developers

Synchronous development [2]
A set of commits where one file is modified by multiple developers within a time window
[2] Q. Xuan and V. Filkov. Building it together: Synchronous development in OSS. In Proc. Int’l Conf. on Software Engineering (ICSE’14), pages 222–233, 2014.

Collaborative development
A set of commits where different files are modified by multiple developers under the same work item
[Figure: work item #1000 with changes to A and Test A]
We investigate collaborative work items that cannot be detected as synchronous ones
This type of collaboration is not rare
27%-83% of collaborative work items involve developers modifying different files

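The two definitions can be contrasted in a short sketch. It is a simplified reading of both notions, not the detection procedure of [2] or of this study. The commit tuples, the one-hour window, and the function names `is_synchronous` and `developers_touch_different_files` are assumptions for illustration.

```python
from collections import defaultdict

def is_synchronous(commits, window):
    """True if some file is changed by two different developers within
    `window` seconds. commits: (author, timestamp, files) tuples."""
    touches = defaultdict(list)  # file -> [(timestamp, author)]
    for author, timestamp, files in commits:
        for path in files:
            touches[path].append((timestamp, author))
    for events in touches.values():
        events.sort()
        for i, (t1, a1) in enumerate(events):
            for t2, a2 in events[i + 1:]:
                if t2 - t1 > window:
                    break  # later events are even further away
                if a1 != a2:
                    return True
    return False

def developers_touch_different_files(commits):
    """True if several developers contribute and no file is shared by them."""
    owners = defaultdict(set)  # file -> developers who changed it
    developers = set()
    for author, _timestamp, files in commits:
        developers.add(author)
        for path in files:
            owners[path].add(author)
    return len(developers) >= 2 and all(len(d) == 1 for d in owners.values())

# Hypothetical work item #1000: one developer changes A, another changes Test A.
work_item = [("alice", 0, {"A"}), ("bob", 600, {"TestA"})]
print(is_synchronous(work_item, window=3600))       # no shared file
print(developers_touch_different_files(work_item))  # two developers, disjoint files
```

A file-based check cannot see this collaboration because the two developers never touch the same file; only the shared work item ties their commits together.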
Conclusion
A median of 29% of work items consist of two or more commits
Granularity may have a considerable impact on co-evolution analyses
The impact of the work item granularity
File Spread: 24% of the co-changed files are overlooked
Time Spread: 83% of related commits cannot be grouped
Developer Spread: 25% of work items involve multiple developers
Given the impact of work item grouping, we recommend that future software evolution studies be performed at the work item level.

Back up slides
Work item aggregation
Example Git commit messages:
AMBARI-15217. Folder name spills out of Upload file window
Update ambari docs for ambari 2.2.0 release
Patterns:
/PROJECT_NAME.?(\d+)/i
/PROJECT_NAME.?(\d+)[.-]\d/i
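
A hedged sketch of how two such patterns might be applied in Python. It assumes the patterns are meant as `PROJECT_NAME.?(\d+)` and `PROJECT_NAME.?(\d+)[.-]\d`, that the first one extracts a candidate work item number, and that the second one screens out version-like strings such as "ambari 2.2.0". That reading and the function name `work_item_id` are assumptions, not something the slide states.

```python
import re

def work_item_id(message, project):
    """Return the work item number referenced in a commit message, or None."""
    name = re.escape(project)
    id_pattern = re.compile(name + r".?(\d+)", re.IGNORECASE)
    version_pattern = re.compile(name + r".?(\d+)[.-]\d", re.IGNORECASE)
    for match in id_pattern.finditer(message):
        # Skip candidates that look like a release number at the same position.
        if version_pattern.match(message, match.start()):
            continue
        return int(match.group(1))
    return None

linked = work_item_id(
    "AMBARI-15217. Folder name spills out of Upload file window", "AMBARI")
unlinked = work_item_id("Update ambari docs for ambari 2.2.0 release", "AMBARI")
print(linked, unlinked)
```

Commits that share the same extracted number can then be aggregated into one work item, while messages that only mention a release number stay unlinked.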
