Abstract--Learning in organizations is important for success and survival. Recent research into open source software developers has primarily suggested a social constructivist view in which knowledge is constructed through social relationships within the organizational culture. I report results from a case study that investigated the presence of situated learning among open source developers during the early period of a project. Thirty-eight developers were systematically selected and examined on their performance, experience, and roles during ten months of maintenance work. I followed a model of learning curve effects that associates the improvement in average resolving time with accumulated experience. I found a strong relationship between the two variables, confirming the presence of learning. The evidence that knowledge depreciates among open source software developers was less convincing: the depreciation factor was estimated at 94 percent, compared with estimates between 65 and 85 percent in other studies. An additional investigation examined the organizational structure to understand whether core and peripheral members have different average resolving times; the finding was inconclusive, so no claim can be made that the two groups differ in mean resolution time. The consistency between this thesis and several related research efforts regarding the existence of learning suggests that learning is likely an intrinsic characteristic of open source software development rather than a speculative belief.
Situated learning among open source software developers
1. A Master Thesis Presentation
Situated Learning in Open Source Software Developers: The Case of the Google Chrome Project
(Cover image: Dartington Pottery Training Workshop, 1978)
Author: Josef Hardi, European Master in Software Engineering
Supervisors: Prof. Barbara Russo, Dr. Richard Torkar
Thursday, August 4, 2011
2. Introduction
• Situated learning is the learning that occurs in workplaces [Brown et al., 1989].
• There is no separation between ‘knowing’ and ‘doing’.
• Situated learning is primarily practiced by the community of practitioners.
3. Existing Findings
• Learning curve effect: “the more times a task has been performed, the less time will be required on each subsequent iteration.” [T. P. Wright, 1936]
• [Huntley, 2003]: Mozilla is reported to exhibit a strong learning curve compared to Apache.
• [Au et al., 2009]: Learning is universally present in OSS projects.
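Wright's law itself is not spelled out on the slide; for reference, a minimal statement of the standard power-law form (textbook form, not quoted from the thesis):

```latex
% T_n : time required for the n-th repetition of a task
% T_1 : time for the first repetition
% b   : learning exponent (b >= 0)
T_n = T_1\, n^{-b}, \qquad \text{learning rate} = 2^{-b}
```

For example, an "80 percent" learning curve (b ≈ 0.322) means each doubling of cumulative repetitions cuts the task time to 80 percent of its previous value.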
4. Distinctions in this Thesis
• Data are taken from each individual instead of from an aggregation of individuals.
• This gives more insight into individual characteristics, i.e., knowledge depreciation and team roles as factors that affect the learning process.
5. Research Question 1: Is learning present in OSS developers?
• Hypothesis 1: There is a relation between the accumulated experience and the performance.
Research Question 2: What are the factors that affect learning?
• Hypothesis 2: Knowledge depreciates over time among the OSS developers.
• Hypothesis 3: Core developers resolve issues faster.
6. Case Study
• Google Chrome Project.
• Duration: 10 months, covering 10 releases (December 2008 to October 2009).
7. Research Methodology
[Process diagram, reconstructed as a list:]
Step 1. Data collection: issue report data and review interaction data.
Step 2. Data exploration: performance, experience, and team role.
Step 3. Construct input data.
Step 4. Identification of learning curve: models and data fitting.
8. Research Methodology, Step 1: Data Collection
Issue Report = [ID, Type, Area, Status, Owner, Open date, Assigned date, Started date, Close date]
Entries were filtered out when they had:
1. unrelated project areas,
2. invalid issue status, or
3. an empty owner name.
Result: issue report data (5,160 entries).
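A minimal sketch of that filtering step, assuming a CSV export and hypothetical field names mirroring the schema above (the transcript does not show the actual file format or the concrete area/status values):

```python
import csv

RELATED_AREAS = {"Area-UI", "Area-WebKit"}   # assumption: areas treated as related
VALID_STATUS = {"Fixed", "Verified"}         # assumption: statuses treated as valid

def load_issue_reports(path):
    """Load issue reports, dropping entries that match the three exclusion rules."""
    kept = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["Area"] not in RELATED_AREAS:    # rule 1: unrelated project area
                continue
            if row["Status"] not in VALID_STATUS:   # rule 2: invalid issue status
                continue
            if not row["Owner"].strip():            # rule 3: empty owner name
                continue
            kept.append(row)
    return kept
```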
9. Research Methodology, Step 1 (continued): Review Interaction Data
Sample of the raw data:
"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
"sgk","tony",1226874776
"phajdan.jr","deanm",1227808551
"phajdan.jr","deanm",1227809341
"phajdan.jr","mark",1228496086
...
Interaction = [Owner, Reviewer, Comment date]
Result: review interaction data (12,037 entries).
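A small parsing sketch for triples in that shape. The third field appears to be a Unix timestamp (1226700214 falls in mid-November 2008, which matches the study window), though the transcript does not state this explicitly:

```python
import csv
from datetime import datetime, timezone

def load_interactions(path):
    """Parse (owner, reviewer, comment-date) triples like the sample above."""
    interactions = []
    with open(path, newline="") as f:
        for owner, reviewer, ts in csv.reader(f):
            when = datetime.fromtimestamp(int(ts), tz=timezone.utc)
            interactions.append((owner, reviewer, when))
    return interactions
```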
10. Research Methodology, Step 2: Data Exploration
[Diagram, reconstructed:]
• Performance: for each developer and release, the average issue resolution time (from the issue report data).
• Experience: for each developer and release, the number of resolved issues (from the issue report data).
Sample = 274 developers.
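Both measures can be computed in one pass; a sketch with pandas, again using hypothetical column names (owner, release, open_date, close_date):

```python
import pandas as pd

def measure(issues: pd.DataFrame) -> pd.DataFrame:
    """Per developer and release: average resolution time and resolved-issue count."""
    issues = issues.assign(
        resolve_days=(issues["close_date"] - issues["open_date"]).dt.days
    )
    return (
        issues.groupby(["owner", "release"])
        .agg(
            performance=("resolve_days", "mean"),  # average resolving time, in days
            experience=("resolve_days", "size"),   # number of resolved issues
        )
        .reset_index()
    )
```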
11. Research Methodology, Step 2 (continued): Team Role
• Team role is estimated from the review interaction data using the core/periphery structure model [Borgatti, 1999].
• Core entails a dense, cohesive structure; periphery entails a sparse, loose structure.
• The estimation is performed using UCINET.
Sample = 274 developers.
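UCINET implements the Borgatti-Everett fit directly. For readers without it, here is a minimal illustrative sketch of one discrete variant: score a 0/1 coreness vector by correlating the observed adjacency matrix with an ideal pattern in which any tie involving a core member is expected and periphery-periphery ties are absent, then hill-climb. Variants of the model treat the core-periphery blocks differently, so this is one plausible reading, not the thesis's exact procedure:

```python
import numpy as np

def fit_core_periphery(adj: np.ndarray, sweeps: int = 20) -> np.ndarray:
    """Greedy discrete core/periphery fit in the spirit of Borgatti & Everett.
    Returns a 0/1 vector (1 = core, 0 = periphery)."""
    n = adj.shape[0]
    off_diag = ~np.eye(n, dtype=bool)            # ignore self-ties

    def score(c):
        ideal = np.maximum.outer(c, c)           # 1 iff node i or node j is core
        return np.corrcoef(adj[off_diag], ideal[off_diag])[0, 1]

    rng = np.random.default_rng(0)
    c = rng.integers(0, 2, size=n)
    best = score(c)
    for _ in range(sweeps):                      # hill-climb: flip one node at a time
        improved = False
        for i in range(n):
            c[i] ^= 1
            s = score(c)
            if s > best:
                best, improved = s, True
            else:
                c[i] ^= 1                        # revert the flip
        if not improved:
            break
    return c
```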
12. Research Methodology, Step 3: Construct Input Data
• The 274 developers are refined to 38 long-term contributors: those who participated for at least 8 releases.
• Not all developers contribute over the long term; the refinement yields new longitudinal data sets.
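Continuing the earlier sketch (same hypothetical columns), the refinement might look like:

```python
import pandas as pd

def long_term_contributors(metrics: pd.DataFrame, min_releases: int = 8) -> pd.DataFrame:
    """Keep developers who appear in at least `min_releases` releases."""
    release_counts = metrics.groupby("owner")["release"].nunique()
    keep = release_counts[release_counts >= min_releases].index
    return metrics[metrics["owner"].isin(keep)]
```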
13. Input Data Set: Performance
[Figure: distribution of the average issue-resolving time (log days) per release for the long-term developer group.]
14. Input Data Set: Experience
[Figure: distribution of the number of resolved issues (N) per release for the long-term developer group.]
15. Input Data Set: Team Role
[Figure: team composition of the long-term group per release, R1 through R10. Each release shows a two-way core/periphery split; the extracted values pair up as 46/54, 39/61, 39/61, 45/55, 47/53, 47/53, 47/53, 42/58, 42/58, and 39/61 percent, but the extraction does not preserve which share is the core or the exact mapping of values to releases.]
16. Research Methodology, Step 4: Identification of Learning Curve Models and Data Fitting
[The two model equations (Model 1 and Model 2) were rendered as images and did not survive extraction; a hedged reconstruction follows below.]
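Since the original formulas are lost, here is a reconstruction following the standard learning-curve-with-knowledge-depreciation specification from the organizational-learning literature, chosen to be consistent with the variable names and coefficients on the result slide (KnowledgeStock, Lambda, TeamRole) and with performance being measured in log days. This is an assumption, not the thesis's verbatim equations:

```latex
% Knowledge stock with depreciation: lambda is the retention factor
% (the abstract's "depreciation factor" of 94 percent reads as 94 percent
% of the stock carrying over per release); q_{i,t} is the number of issues
% developer i resolved in release t.
K_{i,t} = \lambda\, K_{i,t-1} + q_{i,t}

% Model 1: log average resolving time against the accumulated knowledge stock.
\ln y_{i,t} = \beta_0 + \beta_1 K_{i,t} + \varepsilon_{i,t}

% Model 2: Model 1 plus a core/periphery indicator.
\ln y_{i,t} = \beta_0 + \beta_1 K_{i,t} + \beta_2\, \mathrm{TeamRole}_{i,t} + \varepsilon_{i,t}
```

Under this reading, the reported KnowledgeStock coefficient of -0.01 (resolving time falls as the stock grows) and lambda of 0.94 match the signs and magnitudes on the result slide.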
17. Result Summary
Hypothesis | Variable       | Model 1  | Model 2  | Supported?
H1         | KnowledgeStock | -0.01*** | -0.01*** | Yes
H2         | Lambda         |  0.94*** |  0.94*** | Yes
H3         | TeamRole       |  NA      |  0.18    | No
*** Statistically significant at p < 0.001.
18. Threats to Validity
Internal validity:
• The improvement in resolving issues might be caused by improvements in the system design rather than by learning.
• Some of the issue data are incomplete.
External validity:
• Both models have very low statistical prediction power (less than 5 percent).
Construct validity:
• The estimated core/periphery structure might not reflect the real situation; however, the communication pattern is the best available indicator.
19. Conclusion
• I affirmed that learning is present in open source software developers.
• Knowledge does not depreciate significantly in the Google Chrome team.
• The evidence is inconclusive on whether core developers work faster than those in the periphery.
• Methodological contribution: a method to harvest and analyze data from code review.