The Impact of Task Granularity on Co-evolution Analyses

Yasutaka Kamei, Keisuke Miura, Shane McIntosh, Naoyasu Ubayashi, Ahmed E. Hassan

Software evolution aims to recover knowledge about development
Repositories (Gerrit, Git, GitHub, RHSA, Mylyn) are mined for knowledge

Co-evolution of production & test code
Growth history view of ArgoUML [1]
[1] A. Zaidman, B. Van Rompaey, S. Demeyer, and A. Van Deursen. Mining software repositories to study co-evolution of production & test code. In Proc. Int'l Conf. on Software Testing, Verification, and Validation (ICST'08), pages 220–229, 2008.

There are several levels of granularity
Commits
Pull Requests (Merge)
Releases (v2.0)

Some issues may require several commits
Fix #1000: changes A and Test A in one commit
Fix #1100: changes B
Fix #1100 (1 day later): changes Test B
Commit-level analysis would miss the co-change relationship between B and Test B

Work items can be used to study software evolution
Jira: work items #1000 and #1100
Git: Fix #1000 (A, Test A); Fix #1100 (B); Fix #1100 (Test B)
How does the work item granularity impact co-evolution analyses?
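
The difference between the two granularities can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the commit list mirrors the slide, while the file name spellings, the `work_item_id` helper, and the assumption that the ID is the token starting with `#` are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical commit log mirroring the slide: (message, files changed).
commits = [
    ("Fix #1000", {"A", "TestA"}),
    ("Fix #1100", {"B"}),
    ("Fix #1100", {"TestB"}),  # committed one day later
]

def co_changed_pairs(change_sets):
    """Return every unordered pair of files that share a change set."""
    pairs = set()
    for files in change_sets:
        pairs.update(frozenset(p) for p in combinations(sorted(files), 2))
    return pairs

def work_item_id(message):
    """Assumed convention: the work item ID is the token starting with '#'."""
    return next(tok for tok in message.split() if tok.startswith("#"))

# Commit-level view: every commit is its own change set.
commit_level = co_changed_pairs(files for _, files in commits)

# Work-item-level view: union the files of all commits that share an ID.
files_by_item = defaultdict(set)
for message, files in commits:
    files_by_item[work_item_id(message)] |= files
item_level = co_changed_pairs(files_by_item.values())

# Co-change pairs that only the work-item-level view can see.
missed = item_level - commit_level
```

Under these assumptions, `missed` contains exactly the pair B / Test B, which is the relationship the commit-level analysis overlooks.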

Studied systems
Jira: 119 systems
Two important criteria had to be satisfied to qualify for our analysis: system size and ITS usage
[Figure: plot over the Jira systems; x-axis: # of commits (0 to 20,000); y-axis ticks: 0 to 100]

How often are work items composed of several commits?
Jira: work items #1000, #1100, #1200
Git: commit A "Fix #1000", commit B "Bug Fix #1100", commit C "Fix a Bug #1200"

A median of 29% of work items consist of two or more commits
Granularity may have a considerable impact on co-evolution analyses
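
One way such a figure could be computed is sketched below. The per-system data, the system names, and the `multi_commit_ratio` helper are invented for illustration; the sketch assumes each commit has already been linked to one work item ID and that the median is taken across systems.

```python
from collections import Counter
from statistics import median

def multi_commit_ratio(commit_item_ids):
    """Share of work items that are referenced by two or more commits."""
    counts = Counter(commit_item_ids)
    multi = sum(1 for n in counts.values() if n >= 2)
    return multi / len(counts)

# Hypothetical data: one work item ID per commit, per system.
systems = {
    "system_x": ["#1", "#1", "#2", "#3"],        # 1 of 3 items is multi-commit
    "system_y": ["#7", "#8", "#8", "#9", "#9"],  # 2 of 3 items
    "system_z": ["#4", "#5"],                    # 0 of 2 items
}

ratios = {name: multi_commit_ratio(ids) for name, ids in systems.items()}
overall = median(ratios.values())  # median across systems
```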

The impact of the work item granularity
File Spread, Time Spread, Developer Spread

How many files are changed by the commits of work items?
Git, work item #1000: three commits labelled "Fix #1000" change the files A, Test A, A2, and A3

OpenJPA #1763: An example of file spread
60% (= 3 / 5)
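
The slide does not spell out the formula. A reading that is consistent with the 3 / 5 example and with the first-commit finding is that file spread is the share of a work item's files that the first commit does not touch. The sketch below assumes that reading; the file names are placeholders, not the actual files of OpenJPA #1763.

```python
def file_spread(commits):
    """Share of a work item's files that its first commit does not touch.

    `commits` is a chronologically ordered list of sets of file paths.
    This definition is an assumption inferred from the 3 / 5 example.
    """
    first = commits[0]
    all_files = set().union(*commits)
    return len(all_files - first) / len(all_files)

# Placeholder work item: 5 files overall, 2 of them in the first commit.
example = [
    {"a.java", "b.java"},
    {"b.java", "c.java"},
    {"d.java", "e.java"},
]
spread = file_spread(example)  # 3 of 5 files appear only in later commits
```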

First-commit analysis would overlook 24% of the co-changed files (median)

QPID-4575: adds support for Visual Studio 2012
Git: 1st commit (.cpp, .h) ... 5th commit (.cpproj)
This co-change activity of production code and build system would be missed

File Spread: 24% of the co-changed files are overlooked
Time Spread: How much time elapses between the commits of work items?

Sliding time window technique
A common setting in software evolution studies:
Same commit message
Same developer
Similar time (300 secs)
Example (Git): two commits "Fix #1000" that change A and Test A less than 300 secs apart are grouped
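
The three conditions above can be turned into a small grouping routine. The following is a sketch of one common way to implement a sliding window, assuming the window slides with the most recent commit of each group; the `Commit` class, the author name, and the timestamps are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Commit:
    author: str
    message: str
    timestamp: int  # seconds
    files: frozenset

def sliding_window_groups(commits, window=300):
    """Group commits with the same author and message whose gap to the
    previous commit of the group is below `window` seconds."""
    groups = []
    latest = {}  # (author, message) -> most recent group for that key
    for c in sorted(commits, key=lambda c: c.timestamp):
        key = (c.author, c.message)
        group = latest.get(key)
        if group is not None and c.timestamp - group[-1].timestamp < window:
            group.append(c)  # the window slides forward with this commit
        else:
            group = [c]
            groups.append(group)
            latest[key] = group
    return groups

day = 24 * 60 * 60
log = [
    Commit("alice", "Fix #1000", 0, frozenset({"A"})),
    Commit("alice", "Fix #1000", 120, frozenset({"TestA"})),
    Commit("alice", "Fix #1100", 1000, frozenset({"B"})),
    Commit("alice", "Fix #1100", 1000 + day, frozenset({"TestB"})),
]
groups = sliding_window_groups(log)
sizes = [len(g) for g in groups]
```

In this example the two "Fix #1000" commits end up in one group, while the two "Fix #1100" commits stay separate because they are a day apart.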

An example of time spread
Jira work item #1000 with revisions R1, R2, R3, R4
Gaps between consecutive revisions: 250 secs, 200 secs, 400 secs
33% (= 1 / 3)
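
The 1 / 3 in this example is consistent with counting the gaps between consecutive revisions that fall outside the 300-second window. The sketch below assumes that reading; the absolute timestamps are made up so that the gaps match the slide.

```python
def time_spread(timestamps, window=300):
    """Share of consecutive-commit gaps that are not below `window` seconds.

    Assumed definition, inferred from the 33% (= 1 / 3) example.
    """
    ts = sorted(timestamps)
    if len(ts) < 2:
        raise ValueError("need at least two commits to measure a gap")
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return sum(1 for gap in gaps if gap >= window) / len(gaps)

# R1..R4 with gaps of 250, 200, and 400 seconds.
revisions = [0, 250, 450, 850]
spread = time_spread(revisions)  # only the 400-second gap exceeds the window
```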

48%-97% of related commits cannot be grouped using the sliding window

ACCUMULO-1890: recovers from a failure due to limited resources
1st commit: "ACCUMULO-1890 Clean up the test to avoid spinning up a MAC"
11 minutes later: "ACCUMULO-1890 Forgot to re-add changes before commit"

Time Spread: 83% of related commits cannot be grouped
Developer Spread: How many developers are involved across revisions of a work item?

How many developers are involved across revisions of a work item?
Jira: work item #1000
Git: commit A "Fix #1000", commit B "Bug Fix #1000", commit C "Fix a Bug #1000"

25% of work items involve multiple developers
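
A share like this could be derived from a list of (work item, author) pairs, one per commit. The sketch below uses made-up data and a hypothetical helper name, and assumes authors have already been de-duplicated (for example, one identity per developer).

```python
from collections import defaultdict

def multi_developer_share(commits):
    """Share of work items whose commits come from more than one author.

    `commits` is an iterable of (work_item_id, author) pairs.
    """
    authors = defaultdict(set)
    for item, author in commits:
        authors[item].add(author)
    multi = sum(1 for devs in authors.values() if len(devs) > 1)
    return multi / len(authors)

# Hypothetical log: only #1000 is touched by two developers.
log = [
    ("#1000", "alice"), ("#1000", "bob"), ("#1000", "alice"),
    ("#1100", "alice"),
    ("#1200", "carol"), ("#1200", "carol"),
    ("#1300", "bob"),
]
share = multi_developer_share(log)
```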

The impact of the work item granularity
File Spread: 24% of the co-changed files are overlooked
Time Spread: 83% of related commits cannot be grouped
Developer Spread: 25% of work items involve multiple developers

Synchronous development [2]
A set of commits where one file is modified by multiple developers within a time window
[2] Q. Xuan and V. Filkov. Building it together: Synchronous development in OSS. In Proc. Int'l Conf. on Software Engineering (ICSE'14), pages 222–233, 2014.

Collaborative development
A set of commits where different files are modified by multiple developers under the same work item (for example, A and Test A under #1000)
We investigate collaborative work items that cannot be detected as synchronous ones
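
The two definitions can be contrasted with a pair of predicates. This is only a sketch of the definitions as worded on the slides: the dictionary layout, the developer names, and the one-day window are assumptions, and it does not reproduce the detection procedure of [2].

```python
from itertools import combinations

def is_collaborative(commits):
    """More than one developer committed under the same work item."""
    return len({c["author"] for c in commits}) > 1

def is_synchronous(commits, window):
    """Some file is modified by two different developers within `window` secs."""
    for a, b in combinations(commits, 2):
        if (a["author"] != b["author"]
                and a["files"] & b["files"]
                and abs(a["time"] - b["time"]) <= window):
            return True
    return False

DAY = 24 * 60 * 60  # hypothetical window size

# Work item #1000: two developers, but each touches a different file.
different_files = [
    {"author": "alice", "files": {"A"}, "time": 0},
    {"author": "bob", "files": {"TestA"}, "time": 3600},
]
# Contrast case: both developers touch the same file.
same_file = [
    {"author": "alice", "files": {"A"}, "time": 0},
    {"author": "bob", "files": {"A"}, "time": 3600},
]

hidden = is_collaborative(different_files) and not is_synchronous(different_files, DAY)
```

Here `hidden` is true for the first work item: it is collaborative, yet a file-based synchronous check does not flag it.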

This type of collaboration is not rare
27%-83% of collaborative work items involve developers modifying different files

Given the impact of work item grouping, we recommend that future software evolution studies be performed at the work item level.

Work item aggregation
Git commit messages:
AMBARI-15217. Folder name spills out of Upload file window
Update ambari docs for ambari 2.2.0 release
Patterns:
/PROJECT_NAME.?(\d+)/i
/PROJECT_NAME.?(\d+)[.-]\d/i
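
The slide does not say how the two patterns are combined. One plausible reading, given the two example messages, is that the first pattern finds candidate work item IDs and the second flags version-like strings such as "ambari 2.2.0" that should be skipped. The sketch below assumes that reading; the function names are hypothetical.

```python
import re

def work_item_patterns(project):
    """Build both slide patterns for a given project name (case-insensitive)."""
    name = re.escape(project)
    candidate = re.compile(rf"{name}.?(\d+)", re.IGNORECASE)
    version_like = re.compile(rf"{name}.?(\d+)[.-]\d", re.IGNORECASE)
    return candidate, version_like

def extract_work_item(message, project):
    """Return the first work item number in `message`, or None.

    Assumption: a candidate that the second pattern also matches at the
    same position is a version number, not a work item reference.
    """
    candidate, version_like = work_item_patterns(project)
    for match in candidate.finditer(message):
        if version_like.match(message, match.start()):
            continue  # looks like "ambari 2.2...", skip it
        return int(match.group(1))
    return None

linked = extract_work_item(
    "AMBARI-15217. Folder name spills out of Upload file window", "ambari")
unlinked = extract_work_item(
    "Update ambari docs for ambari 2.2.0 release", "ambari")
```

Under this reading the first message is linked to work item 15217, and the second message is left unlinked because its only numeric hit is a release number.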
