
Multilevel Collaboration between Software Developers and the Impact of Proximity: an Early, Preliminary Work



Very early preliminary work presented for feedback only.

Published in: Technology


  1. 1. Multilevel Collaboration between Software Developers and the Impact of Proximity: an Early, Preliminary Work. Dawn Foster, Guido Conaldi, Riccardo De Vita. University of Greenwich, Centre for Business Network Analysis
  2. 2. Goals for Today Very early work – seeking feedback on •  Best approaches for incorporating multilevel concepts. •  Fitting a suitable model for multilevel networks. •  What we have done so far. 2  
  3. 3. Research Overview How do participants who are paid by firms collaborate within a fluid organization? Proximity theory as a theoretical framework: •  to understand intraorganizational collaboration •  within fluid organizations •  using an open source software project, the Linux kernel, as the empirical setting. 3  
  4. 4. Contributions Contribute to literature on fluid organizations by: •  Determining the impact of firm affiliation on intraorganizational collaboration between individuals in fluid organizations. –  Existing studies on open source mostly focus on individual motivations. –  Firms can influence collaboration of employees. •  Demonstrating that proximity theory can be used to better understand collaboration within fluid organizations. –  Boschma’s (2005) five dimensions should further our understanding. –  Most proximity studies are interorganizational; fluid boundaries blur the inter/intra distinction. As fluid organizations become more common, understanding collaboration within them is increasingly important.
  5. 5. Fluid Organizations •  In fluid organizations, the boundaries and structures allow fluid movement within the organization as individuals collaborate to coordinate activities (Ashkenas et al., 2002; Glance & Huberman, 1994). •  Some fluid organizations are based on global virtual work across many time zones by people from different backgrounds (Nurmi & Hinds, 2016) and may include individuals from different firms and different types of institutions (O’Mahony & Bechky, 2008). •  Collaboration, especially within fluid organizations, crosses dimensions of proximity, including cognitive, organizational, social, institutional and geographical, which can be used to better understand collaboration (Balland, 2012; Boschma, 2005; Cantner & Graf, 2006; Crescenzi, Nathan, & Rodríguez-Pose, 2016; Knoben & Oerlemans, 2006). 5  
  6. 6. Proximity Theory •  Social proximity: relations between actors with trust coming from friendship and experience (Boschma 2005). •  Institutional proximity: whether individuals collaborate more with others in a similar institutional setting, like corporation, non-profit, university, non-affiliated, etc. (Balland 2012; Crescenzi et al. 2013). •  Organizational proximity: relationships within an organizational structure (Boschma 2005); used to look at collaboration within and between organizations. •  Cognitive proximity: similarity of frames of reference and knowledge (Knoben & Oerlemans 2006). •  Geographic proximity: physical, spatial distance between actors (Boschma 2005). Online, geographical proximity is often irrelevant, but others have used a temporal measure (time zones) (O’Leary & Cummings, 2007).
  7. 7. Empirical Setting: Open Source •  Open source is frequently studied as a fluid organization (e.g. Chen & O’Mahony, 2009; O’Mahony & Bechky, 2008; Puranam et al., 2014). •  Contributions are made by individuals, not firms (O’Mahony, 2007), but firms are increasingly paying employees to contribute as a way to participate (Jensen & Scacchi, 2007; Roberts et al., 2006). •  Linux kernel (Corbet & Kroah-Hartman, 2016): –  < 8% of contributions by unpaid software developers –  Neutral project, competing companies participate –  22 million lines of code –  14,000 developers –  1,300 organisations [Figure: software stack from system-only to user-facing layers: computer hardware (CPU, memory, disk), Linux kernel, Linux operating system (Red Hat, Ubuntu), applications (web browser, office).]
  8. 8. Collaboration Network •  Network ties: mailing lists, where ego replies to alter. –  Collaboration for code review, patch feedback, bugs and discussions takes place on mailing lists before source code is accepted into the repository. •  “The mailing lists are still the primary communications space.” •  “All of our collaboration happens over discussing patches.” [Figure: collaboration network; 10 mailing lists, 2015-01-27, 90 days, k-core >= 10.]
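
The tie definition on this slide (ego replies to alter) and the k-core filter from the figure can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the message fields `id`, `sender` and `in_reply_to`, the sample names, and the choice to compute the k-core on the undirected projection are all assumptions.

```python
from collections import Counter


def reply_ties(messages):
    """Map (ego, alter) to the number of replies ego sent to alter."""
    sender_of = {m["id"]: m["sender"] for m in messages}
    ties = Counter()
    for m in messages:
        alter = sender_of.get(m.get("in_reply_to"))
        # Skip thread starters, replies to messages outside the sample,
        # and self-replies.
        if alter is None or alter == m["sender"]:
            continue
        ties[(m["sender"], alter)] += 1
    return ties


def k_core(ties, k):
    """Nodes in the k-core of the undirected projection of the ties."""
    neighbours = {}
    for ego, alter in ties:
        neighbours.setdefault(ego, set()).add(alter)
        neighbours.setdefault(alter, set()).add(ego)
    nodes = set(neighbours)
    changed = True
    while changed:
        changed = False
        # Repeatedly drop nodes with fewer than k remaining neighbours.
        for node in list(nodes):
            if len(neighbours[node] & nodes) < k:
                nodes.discard(node)
                changed = True
    return nodes


# Hypothetical toy thread data (assumed field names).
messages = [
    {"id": "m1", "sender": "alice", "in_reply_to": None},
    {"id": "m2", "sender": "bob", "in_reply_to": "m1"},
    {"id": "m3", "sender": "carol", "in_reply_to": "m1"},
    {"id": "m4", "sender": "alice", "in_reply_to": "m2"},
    {"id": "m5", "sender": "carol", "in_reply_to": "m2"},
    {"id": "m6", "sender": "dave", "in_reply_to": "m5"},
]
ties = reply_ties(messages)
core = k_core(ties, 2)
```

In this toy data, `dave` has a single neighbour and falls out of the 2-core, while the three developers who all reply to each other remain.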
  9. 9. Multilevel Network •  Individual / Organizational / Mailing List Levels –  Employers pay developers to enable the firm’s products, gain influence and set direction, share information, and more. –  Most consider affiliation with the Linux kernel community to be more important than their employer. –  Almost all contributions come from paid software developers. –  Collaboration occurs in 200+ mailing lists simultaneously. •  How does firm affiliation with an organization shape collaboration of individuals? •  How do mailing lists enable collaboration?
  10. 10. Operationalizing Proximity Using Boschma’s (2005) 5 dimensions of proximity •  Organizational: –  Operationalized as firm affiliation (company) or unaffiliated (hobbyist, etc.) •  Cognitive: –  Usually measured based on shared knowledge / technologies –  Operationalized as contributing to areas of the source code (subsystems) •  Geographic: –  Usually measured based on physical location, less relevant for online collaboration. –  Operationalized using time zones (temporal geographic proximity) •  Institutional: –  Operationalized based on employment by firm, academia, or unaffiliated •  Social: –  Often measured using collaboration network (seems like double counting) –  Operationalized by number of times dyad participated in same mailing list threads
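
One way the five operationalizations above could be turned into dyad-level covariates is sketched below. Everything beyond the slide is an assumption made for illustration: the `Developer` fields, the Jaccard overlap used for cognitive proximity, the wrap-around time zone distance, and the firm and subsystem names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Developer:
    name: str
    firm: Optional[str]       # None means unaffiliated (hobbyist, etc.)
    institution: str          # e.g. "firm", "academia", "unaffiliated"
    utc_offset: float         # time zone as hours from UTC
    subsystems: set = field(default_factory=set)
    threads: set = field(default_factory=set)


def proximity(a, b):
    """Dyad-level proximity covariates for developers a and b."""
    union = a.subsystems | b.subsystems
    tz_gap = abs(a.utc_offset - b.utc_offset) % 24
    return {
        # Organizational: both paid by the same firm.
        "organizational": int(a.firm is not None and a.firm == b.firm),
        # Cognitive: overlap in subsystems contributed to (Jaccard, assumed).
        "cognitive": len(a.subsystems & b.subsystems) / len(union) if union else 0.0,
        # Geographic: time zone difference in hours, wrapping around the globe.
        "geographic": min(tz_gap, 24 - tz_gap),
        # Institutional: same type of institution.
        "institutional": int(a.institution == b.institution),
        # Social: number of mailing list threads both took part in.
        "social": len(a.threads & b.threads),
    }


# Hypothetical developers.
alice = Developer("alice", "FirmA", "firm", 1.0, {"net", "mm"}, {"t1", "t2", "t3"})
bob = Developer("bob", "FirmA", "firm", -8.0, {"net", "fs"}, {"t2", "t3", "t9"})
p = proximity(alice, bob)
```

Note that the geographic entry is a distance, so larger values mean less proximity, whereas the other four entries increase with proximity.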
  11. 11. Dataset •  Subset for testing multilevel analysis: 2 years •  Dates: –  Start: 2013-11-01 (the complete dataset starts 2006-03-20, first LTS release) –  End: 2015-11-01, date of the 4.3 release –  15, 30, 45, 60, 75, 90 day moving windows •  Mailing Lists: –  19 of the top 20 mailing lists (out of over 200); the top mailing list was excluded –  226,919 messages (out of 2,818,774 for top 20, all dates) •  Source Code: –  Linux-stable tree –  177,113 commits (out of 603,006 for all dates)
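
The moving windows listed above could be applied along these lines. The slide does not say how the windows are anchored or stepped, so this sketch assumes a trailing window that ends just before a reference date; the event tuples and names are made up.

```python
from datetime import datetime, timedelta

WINDOW_DAYS = (15, 30, 45, 60, 75, 90)


def trailing_window(events, as_of, days):
    """Events whose timestamp falls in [as_of - days, as_of)."""
    start = as_of - timedelta(days=days)
    return [e for e in events if start <= e[0] < as_of]


# Hypothetical (timestamp, sender, receiver) reply events.
events = [
    (datetime(2015, 10, 30), "alice", "bob"),
    (datetime(2015, 10, 10), "bob", "carol"),
    (datetime(2015, 9, 1), "carol", "alice"),
    (datetime(2015, 7, 1), "dave", "alice"),
]
as_of = datetime(2015, 11, 1)
counts = {d: len(trailing_window(events, as_of, d)) for d in WINDOW_DAYS}
```

Covariates such as shared threads or shared subsystems would then be computed from the events inside each window rather than from the full history.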
  12. 12. Relational Event Models •  Relational event models provide a “highly flexible framework for modeling actions within social settings, which permits likelihood-based inference for behavioral mechanisms with complex dependence.” (Butts, 2008, p. 155) •  Based on relational events, or actions generated by sender directed toward a receiver. Represented by sender, receiver, action type and time (Butts, 2008). •  Mailing list data with a time stamp for each message provides useful data for relational event models. •  Each reply to a mailing list post can be thought of as an event created by a sender targeted at a receiver. •  Used to explain likelihood of collaboration between 2 developers given influence of dimensions of proximity and other effects. 12  
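
For orientation, the general form of the relational event model for ordered events in Butts (2008) can be written as below. The notation is chosen here for illustration and does not come from the slides: $\theta$ is the parameter vector, $u(i,j,A_t)$ collects the statistics for a possible event from sender $i$ to receiver $j$ given the event history $A_t$ (for example proximity covariates, P-shifts and recency), $(s_k, r_k)$ is the sender and receiver of the $k$-th of $M$ observed events, and $\mathcal{R}$ is the set of sender-receiver pairs at risk.

```latex
\lambda_{ij}(t) = \exp\!\left(\theta^{\top} u(i, j, A_t)\right)

p(A_T \mid \theta) = \prod_{k=1}^{M}
  \frac{\lambda_{s_k r_k}(t_k)}
       {\sum_{(i,j) \in \mathcal{R}} \lambda_{ij}(t_k)}
```

A positive coefficient in $\theta$ raises the rate of events with a high value on the corresponding statistic. The denominator sums over every pair at risk for every event, so the cost of evaluating the likelihood grows with both the number of events and the number of possible dyads.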
  13. 13. Effects: Dyadic P-Shifts, Recency. Illustrations by Carter Butts, Sunbelt 2015
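
Since the illustrations did not survive in this transcript, here is a rough sketch of the two kinds of effect named on the slide. It assumes one particular P-shift (AB-BA, where the receiver of the last message replies to its sender) and an inverse-rank form of recency; the slide does not state which P-shifts or which recency statistic were used, and the function names are made up.

```python
def ab_ba(history, sender, receiver):
    """1 if sender -> receiver would directly answer the last event."""
    if not history:
        return 0
    last_sender, last_receiver = history[-1]
    return int(last_sender == receiver and last_receiver == sender)


def recency_of_receipt(history, sender, receiver):
    """Inverse rank of receiver among those who most recently wrote to sender.

    Returns 1.0 for the most recent correspondent, 0.5 for the second most
    recent, and 0.0 if receiver has never written to sender.
    """
    seen = []
    for s, r in reversed(history):
        if r == sender and s not in seen:
            seen.append(s)
    if receiver not in seen:
        return 0.0
    return 1.0 / (seen.index(receiver) + 1)


# Hypothetical ordered (sender, receiver) events, oldest first.
history = [("alice", "bob"), ("carol", "bob"), ("dave", "alice")]
reply_now = ab_ba(history, "alice", "dave")
bob_to_carol = recency_of_receipt(history, "bob", "carol")
bob_to_alice = recency_of_receipt(history, "bob", "alice")
```

Both statistics depend only on the ordered event history, so they can be recomputed for every candidate dyad before each event.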
  14. 14. Results: Series of Difficulties •  The REM struggled with the number of events: –  Reduced to the first 500 events (1.5 days) to get the model to run (used the first 200 events as control, ran the model with 300 events). –  Takes 6+ hours to estimate 600 events (3 days) on a big server. –  Might have to do with the way we are loading variables into the model. –  Possible other limitations with the REM / relevent software.
  15. 15. Preliminary Results 15  
  16. 16. Preliminary Results •  Model not yet complete: testing the waters now. –  Tiny number of events won’t represent the whole. –  Missing variables are likely to change these results. –  Need to analyze per mailing list (mailing list level). •  Proximity looks promising as theoretical framework –  Org prox: less likely to reply to other employees. Do they use internal corporate channels to collaborate? –  Cognitive prox: more likely to reply to people working in same areas of code. –  Geo prox: less likely to reply as time zone difference increases.
  17. 17. Future Developments / Feedback •  We know the Model has issues: –  Get feedback on what we have done so far and on fitting a suitable model for multilevel networks. •  Multilevel: Both aspects need to be developed: –  Multilevel analysis of networks: multiple mailing lists at the same time (like classrooms within schools) •  Mailing lists as levels? How do we do this? –  Analysis of multilevel networks: complex models for networks - modeling organizational affiliation as a level. •  Can we treat organizations as a level, instead of as an attribute of developers? •  Need to look at org level to see interactions by organization. •  Relational Event Models: –  Options for modeling large event sequences in networks. 17  
  18. 18. Thank You and Questions Authors: Dawn M. Foster, Guido Conaldi, Riccardo De Vita. University of Greenwich, Centre for Business Network Analysis. http://