Situated learning among open source software developers


Abstract--Learning in organizations is important for success and survival. Recent research into open source software developers has primarily suggested a social constructivist view, in which knowledge is constructed in the social relationships within the organizational culture. I report results from a case study that investigated the presence of situated learning among open source developers at an early stage of a project. Thirty-eight developers were systematically selected and examined on their performance, experience, and roles during ten months of maintenance work. I followed a model of learning curve effects that associates the improvement in the average resolving time with the accumulated experience. I found a strong relationship between the two variables, which confirmed the presence of learning. In addition, I found less convincing evidence that knowledge depreciates among open source software developers. The depreciation factor was estimated to be 94 percent, compared with other studies, in which it ranged between 65 and 85 percent. An additional investigation of the organizational structure examined whether core and peripheral members have different average resolving times. The finding was inconclusive: it could not be claimed that the two groups have different mean issue resolution times. The consistency of the result about the existence of learning between this thesis and several related research efforts suggests that learning is likely to be an intrinsic characteristic of open source software development rather than just a speculative belief.
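
The learning curve with knowledge depreciation described in the abstract can be illustrated with a minimal Python sketch. It assumes a recurrence that is common in the organizational learning literature, K[t] = retention * K[t-1] + q[t-1], where q is the number of issues a developer resolved in a release. The function name, the recurrence, and the reading of the reported 94 percent as the fraction of knowledge carried over from one release to the next are all assumptions made for illustration; the source does not confirm that this is the exact formulation used in the thesis.

```python
def knowledge_stock(resolved_per_release, retention):
    """Return the depreciated experience a developer brings into each release.

    Assumed recurrence (not confirmed by the source):
        K[0] = 0
        K[t] = retention * K[t-1] + q[t-1]
    where q[t] is the number of issues resolved in release t and
    retention is the fraction of prior knowledge that survives one release.
    """
    stock = []
    k = 0.0
    for q in resolved_per_release:
        stock.append(k)            # knowledge available at the start of this release
        k = retention * k + q      # carry over retained knowledge, add new experience
    return stock


if __name__ == "__main__":
    # Hypothetical counts of resolved issues over four releases.
    counts = [10, 10, 10, 10]
    print(knowledge_stock(counts, 1.0))   # no depreciation: plain cumulative count
    print(knowledge_stock(counts, 0.94))  # assumed reading of the reported estimate
```

With `retention = 1.0` the stock reduces to the plain cumulative count of resolved issues, which is the classic learning curve without depreciation; values below 1.0 discount older experience.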

Published in: Technology, News & Politics

  1. A Master Thesis Presentation. Situated Learning in Open Source Software Developers: The Case of Google Chrome Project. Author: Josef Hardi, European Master in Software Engineering. Supervisors: Prof. Barbara Russo, Dr. Richard Torkar. [Cover image: Dartington Pottery Training Workshop, 1978.] Thursday, August 4, 2011
  2. Introduction • Situated learning is learning that occurs in the workplace [Brown et al., 1989]. • There is no separation between ‘knowing’ and ‘doing’. • Situated learning is primarily practiced by a community of practitioners.
  3. Existing Findings • Learning curve effect: “That the more times a task has been performed, the less time will be required on each subsequent iteration.” [T.P. Wright, 1936] • [Huntley, 2003]: Mozilla is reported to exhibit a strong learning curve compared to Apache. • [Au et al., 2009]: Learning is universally present in OSS projects.
  4. Distinctions in this Thesis • Data are taken from each individual instead of from an aggregation of individuals. • This gives more insight into individual characteristics, i.e., knowledge depreciation and team roles as factors that affect the learning process.
  5. Research Question 1: Is learning present in OSS developers? Hypothesis 1: There is a relation between the accumulated experience and the performance. Research Question 2: What are the factors that affect learning? Hypothesis 2: Knowledge depreciates over time among the OSS developers. Hypothesis 3: Core developers resolve issues faster.
  6. Case Study • Google Chrome Project. • Duration: 10 months, approximately 10 releases (December 2008 - October 2009).
  7. Research Methodology. Step 1, Data Collection: issue report data and review interaction data. Step 2, Data Exploration: performance, experience, and team role. Step 3, Construct Input Data. Step 4, Identification of Learning Curve Models and Data Fitting.
  8. Research Methodology, step 1: Data Collection. Issue Report = [ID, Type, Area, Status, Owner, Open date, Assigned date, Started date, Close date]. Filtering criteria: 1. Unrelated project areas, 2. Invalid issue status, 3. Empty owner name. Issue Report Data (5,160 entries)
  9. Research Methodology, step 1: Data Collection. Interaction = [Owner, Reviewer, Comment date]. Sample rows: "ben","sky",1226700214 "ben","sky",1226706864 "ben","pkasting",1226707765 "mal","tony",1226809276 "sgk","tony",1226874776 "phajdan.jr","deanm",1227808551 "phajdan.jr","deanm",1227809341 "phajdan.jr","mark",1228496086 ... Review Interaction Data (12,037 entries)
  10. Research Methodology, step 2: Data Exploration. From the issue report data, measured per developer and per release: Performance = average issue resolution time; Experience = number of resolved issues. Sample = 274 developers
  11. Research Methodology, step 2: Data Exploration. From the review interaction data, estimated per developer and per release: Team Role, using the core and periphery structure model [Borgatti, 1999]. • Core entails a dense, cohesive structure and periphery entails a sparse, loose structure. • The estimation is performed by using UCINET. Sample = 274 developers
  12. Research Methodology, step 3: Construct Input Data. The 274 developers are refined to 38 long-term contributors, who participate in at least 8 releases, because not all developers work on the project long-term. The result is a new set of longitudinal data.
  13. Input data set: Performance. The data distribution in the group of long-term developers. [Figure: average time of resolving issues (log days) by release.]
  14. Input data set: Experience. The data distribution in the group of long-term developers. [Figure: amount of resolved issues (N) by release.]
  15. Input data set: Team Role. The team composition in the group of long-term developers. [Figure: one chart per release, R1 to R10, each splitting the group into two shares; the splits range from 39%/61% to 47%/53%.]
  16. Research Methodology, step 4: Identification of Learning Curve Models and Data Fitting. [Equations for Model 1 and Model 2, with an accompanying note.]
  17. Result Summary
      Hypothesis  Variable        Model 1   Model 2   Supported?
      H1          KnowledgeStock  -0.01***  -0.01***  Yes
      H2          Lambda          0.94***   0.94***   Yes
      H3          TeamRole        NA        0.18      No
      *** Statistically significant, p < 0.001
  18. Threats to Validity. Internal Validity: • The improvement in solving issues might be caused by the improvement in the system design. • Some of the issue data are incomplete. External Validity: • Both models have a very low statistical prediction power (less than 5%). Construct Validity: • The estimation of the Core and Periphery structure might not reflect the real situation. However, the communication pattern is the best indicator.
  19. Conclusion • I affirmed that learning is present in open source software developers. • Knowledge does not significantly depreciate in the Google Chrome team. • It is inconclusive to claim core developers work faster than those who are in the periphery. • Methodological contribution: A method to harvest and analyze data from code review.
  20. Thank you! Bolzano, 8 October 2010
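
The performance and experience measures from slide 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys (`owner`, `release`, `open_ts`, `close_ts`) are hypothetical names, the resolving time is assumed to run from the open date to the close date, and the mapping of an issue to a release is assumed to be given. The slides do not say which of the recorded dates the thesis actually used.

```python
from collections import defaultdict
from statistics import mean

SECONDS_PER_DAY = 86400


def summarize(issues):
    """Aggregate issue records per (owner, release).

    Each issue is a dict with hypothetical keys:
      owner    - developer name (records with an empty owner are skipped)
      release  - release label the issue is attributed to (assumed given)
      open_ts  - open date as a Unix timestamp in seconds
      close_ts - close date as a Unix timestamp in seconds
    Returns {(owner, release): {"experience": count, "performance": mean days}}.
    """
    durations = defaultdict(list)
    for issue in issues:
        if not issue["owner"]:
            continue  # mirrors the "empty owner name" filter on slide 8
        days = (issue["close_ts"] - issue["open_ts"]) / SECONDS_PER_DAY
        durations[(issue["owner"], issue["release"])].append(days)
    return {
        key: {"experience": len(values), "performance": mean(values)}
        for key, values in durations.items()
    }


if __name__ == "__main__":
    # Hypothetical records, not taken from the Chrome issue tracker.
    sample = [
        {"owner": "ben", "release": "R1", "open_ts": 0, "close_ts": 1 * SECONDS_PER_DAY},
        {"owner": "ben", "release": "R1", "open_ts": 0, "close_ts": 3 * SECONDS_PER_DAY},
        {"owner": "", "release": "R1", "open_ts": 0, "close_ts": 5 * SECONDS_PER_DAY},
    ]
    print(summarize(sample))
```

Keeping the aggregation keyed by developer and release matches the thesis's stated choice to analyze individuals rather than an aggregate of the whole team.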