Multilevel Collaboration between Software Developers and the Impact of Proximity: an Early, Preliminary Work
Dawn Foster, Guido Conaldi, Riccardo De Vita
University of Greenwich
Centre for Business Network Analysis
Goals for Today
Very early work – seeking feedback on
• Best approaches for incorporating
• Fitting a suitable model for multilevel networks
• What we have done so far.
How do participants who are paid by
firms collaborate within a fluid organization?
Proximity theory as a theoretical framework:
• to understand intraorganizational collaboration
• within fluid organizations
• using an open source software project, the
Linux kernel, as the empirical setting.
Contribute to literature on fluid organizations by:
• Determining the impact of firm affiliation on intraorganizational
collaboration between individuals in fluid organizations.
– Existing studies on open source mostly address individual motivations.
– Firms can influence collaboration of employees.
• Demonstrating that proximity theory can be used to better
understand collaboration within fluid organizations.
– Boschma’s (2005) five dimensions should further our understanding.
– Most proximity studies are interorganizational; fluid boundaries blur the inter/intra distinction.
As fluid organizations become more common, understanding
collaboration within them is increasingly important.
• In fluid organizations, the boundaries and structures allow fluid
movement within the organization as individuals collaborate to
coordinate activities (Ashkenas et al., 2002; Glance & Huberman, 1994).
• Some fluid organizations are based on global virtual work across many
time zones by people from different backgrounds (Nurmi & Hinds, 2016)
and may include individuals from different firms and different types of
institutions (O’Mahony & Bechky, 2008).
• Collaboration, especially within fluid organizations, crosses dimensions
of proximity, including cognitive, organizational, social, institutional and
geographical, which can be used to better understand collaboration
(Balland, 2012; Boschma, 2005; Cantner & Graf, 2006; Crescenzi,
Nathan, & Rodríguez-Pose, 2016; Knoben & Oerlemans, 2006).
• Social proximity: relations between actors with trust coming from friendship and
experience (Boschma 2005).
• Institutional proximity: whether individuals collaborate more with others in a
similar institutional setting, like corporation, non-profit, university, non-affiliated,
etc. (Balland 2012; Crescenzi et al. 2013).
• Organizational proximity: relationships within an organizational structure
(Boschma 2005); used to look at collaboration within and between organizations.
• Cognitive proximity: similarity of frames of reference and knowledge (Knoben &
Oerlemans 2006).
• Geographic proximity: physical, spatial distance between actors (Boschma
2005). Online, geographical proximity is often irrelevant, but others have used a
temporal measure (time zones) (O’Leary & Cummings, 2007).
Empirical Setting: Open Source
• Open source frequently studied as a fluid organization (e.g. Chen
& O’Mahony, 2009; O'Mahony & Bechky, 2008; Puranam et al.,
• Contributions by individuals, not firms (O’Mahony, 2007), but firms
are increasingly paying employees to contribute as a way to
participate (Jensen & Scacchi, 2007; Roberts et al., 2006).
• Linux kernel¹:
– < 8% of contributions by
unpaid software developers
– Neutral project, competing
– 22 million lines of code
– 14,000 developers
– 1,300 organisations
[Figure: stack diagram with layers Computer Hardware (CPU, memory, disk), Linux Operating System (Red Hat, Ubuntu), and Applications (web browser, office); side labels: System only, User facing]
¹ Corbet & Kroah-Hartman, 2016
• Network ties: Mailing Lists – ego replies to alter
– Collaboration for code review, patch feedback, bugs & discussions
are on mailing lists before source code is accepted into repository.
• “The mailing lists are still the primary communications space.”
• “All of our collaboration happens over discussing patches.”
[Figure: network of 10 mailing lists, 2015-01-27, 90-day window, k-core >= 10]
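The tie definition above (ego replies to alter), together with the 90-day window and k-core threshold from the figure, could be sketched roughly as follows. This is only an illustration under assumptions: the message fields (`id`, `sender`, `date`, `in_reply_to`), the choice to treat the date as the end of the window, and the use of undirected degree for the k-core are all hypothetical and not taken from the actual pipeline.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def reply_ties(messages, end, window_days=90):
    """Count directed (ego, alter) reply ties inside the window ending at `end`.

    Each message is a dict with hypothetical keys: id, sender, date, in_reply_to.
    """
    by_id = {m["id"]: m for m in messages}
    start = end - timedelta(days=window_days)
    ties = defaultdict(int)
    for m in messages:
        parent = by_id.get(m.get("in_reply_to"))
        if parent is None or not (start < m["date"] <= end):
            continue  # not a reply to a known message, or outside the window
        if m["sender"] != parent["sender"]:  # ignore self-replies
            ties[(m["sender"], parent["sender"])] += 1
    return dict(ties)


def k_core(ties, k):
    """Return actors left after repeatedly dropping those with fewer than k neighbours."""
    neighbours = defaultdict(set)
    for ego, alter in ties:
        neighbours[ego].add(alter)
        neighbours[alter].add(ego)
    changed = True
    while changed:
        changed = False
        for actor in [a for a, nb in neighbours.items() if len(nb) < k]:
            for nb in neighbours.pop(actor):
                if nb in neighbours:
                    neighbours[nb].discard(actor)
            changed = True
    return set(neighbours)
```

Calling `k_core(reply_ties(messages, end=datetime(2015, 1, 27)), k=10)` would then give the set of developers to keep for a plot of this kind.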
• Individual / Organizational / Mailing List Levels
– Employers pay developers to enable firm’s products, gain
influence and set direction, share information, more.
– Most consider affiliation with the Linux kernel community to
be more important than their employer.
– Almost all contributions come from paid software developers.
– Collaboration occurs in 200+ mailing lists simultaneously.
• How does firm affiliation with an organization shape
collaboration of individuals?
• How do mailing lists enable collaboration?
Using Boschma’s (2005) five dimensions of proximity
• Organizational proximity
– Operationalized as firm affiliation (company) or unaffiliated (hobbyist, etc.)
• Cognitive proximity
– Usually measured based on shared knowledge / technologies
– Operationalized as contributing to areas of the source code (subsystems)
• Geographic proximity
– Usually measured based on physical location, less relevant for online
– Operationalized using time zones (temporal geographic proximity)
• Institutional proximity
– Operationalized based on employment by firm, academia, or unaffiliated
• Social proximity
– Often measured using the collaboration network (which seems like double counting)
– Operationalized by the number of times a dyad participated in the same mailing list threads
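As a rough illustration of how these operationalizations might turn into dyad-level covariates, the sketch below computes one value per dimension for a pair of developers. The `Developer` fields, the use of counts for shared subsystems and threads, the wrap-around of time zone gaps, and the treatment of two unaffiliated developers as organizationally distant are all assumptions made for the example, not the measures used in the study.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Developer:
    name: str
    firm: Optional[str]          # None = unaffiliated (hobbyist, etc.)
    institution: str             # e.g. "firm", "academia", "unaffiliated"
    utc_offset: float            # time zone as hours from UTC
    subsystems: FrozenSet[str]   # areas of the source code contributed to
    threads: FrozenSet[str]      # mailing list threads participated in


def proximity(a: Developer, b: Developer) -> dict:
    """Return one illustrative covariate per proximity dimension for the dyad (a, b)."""
    gap = abs(a.utc_offset - b.utc_offset) % 24
    tz_gap = min(gap, 24 - gap)  # assumption: time zone distance wraps around the globe
    return {
        "organizational": int(a.firm is not None and a.firm == b.firm),
        "cognitive": len(a.subsystems & b.subsystems),
        "geographic_tz_gap": tz_gap,
        "institutional": int(a.institution == b.institution),
        "social": len(a.threads & b.threads),
    }
```

Because the social measure is built from the same mailing list activity that generates the ties, it would need to be computed from an earlier period than the events being modeled to avoid the double counting noted above.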
• Subset for testing multilevel analysis – 2 years
– 2013-11-01 (complete dataset: 2006-03-20 first LTS release)
– 2015-11-01 – date of 4.3 release
– 15, 30, 45, 60, 75, 90 day moving windows
• Mailing Lists:
– 19 of the top 20 mailing lists (out of over 200); the top mailing list was excluded
– 226,919 messages (out of 2,818,774 for top 20, all dates)
• Source Code:
– Linux-stable tree
– 177,113 commits (out of 603,006 for all dates)
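The moving windows over the two-year subset could be generated along these lines. This is a minimal sketch: whether the windows overlap is not stated on the slide, so the `step_days` parameter is a hypothetical knob that defaults to non-overlapping windows.

```python
from datetime import date, timedelta


def moving_windows(start, end, length_days, step_days=None):
    """Yield half-open (window_start, window_end) pairs that fit inside [start, end)."""
    length = timedelta(days=length_days)
    step = timedelta(days=step_days or length_days)  # default: non-overlapping windows
    cursor = start
    while cursor + length <= end:
        yield cursor, cursor + length
        cursor += step


# Window lengths listed on the slide, applied to the 2013-11-01 to 2015-11-01 subset.
windows_by_length = {
    days: list(moving_windows(date(2013, 11, 1), date(2015, 11, 1), days))
    for days in (15, 30, 45, 60, 75, 90)
}
```

Passing `step_days=1` instead would slide each window forward one day at a time, which gives many more, heavily overlapping, windows.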
Relational Event Models
• Relational event models provide a “highly flexible framework for
modeling actions within social settings, which permits likelihood-based
inference for behavioral mechanisms with complex
dependence.” (Butts, 2008, p. 155)
• Based on relational events, or actions generated by sender directed
toward a receiver. Represented by sender, receiver, action type and
time (Butts, 2008).
• Mailing list data with a time stamp for each message provides useful
data for relational event models.
• Each reply to a mailing list post can be thought of as an event created
by a sender targeted at a receiver.
• Used to explain likelihood of collaboration between 2 developers given
influence of dimensions of proximity and other effects.
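To make the idea concrete, the sketch below evaluates an ordinal (order-only) relational event log-likelihood in the spirit of Butts (2008): each observed reply is treated as a choice among all possible sender-receiver pairs, with probability proportional to the exponentiated weighted sum of that pair's statistics. The actor names, the `FIRM` lookup, and the two statistics are hypothetical and only stand in for the proximity covariates; this is not the relevent implementation.

```python
import math
from itertools import permutations

FIRM = {"ann": "FirmA", "bo": "FirmA", "cy": "FirmB"}  # hypothetical affiliations


def example_stats(sender, receiver, history):
    """Two illustrative statistics: same firm, and replying to whoever just replied to you."""
    same_firm = float(FIRM[sender] == FIRM[receiver])
    reciprocates = float(bool(history) and history[-1] == (receiver, sender))
    return [same_firm, reciprocates]


def log_likelihood(events, actors, theta, stats):
    """Ordinal REM log-likelihood for time-ordered (sender, receiver) events."""
    total = 0.0
    for i, (sender, receiver) in enumerate(events):
        history = events[:i]

        def score(a, b):
            return sum(t * x for t, x in zip(theta, stats(a, b, history)))

        # Normalize over every ordered dyad that could have produced the event.
        log_norm = math.log(sum(math.exp(score(a, b)) for a, b in permutations(actors, 2)))
        total += score(sender, receiver) - log_norm
    return total
```

The normalizing sum runs over all n(n-1) ordered dyads for every event, so the cost of a single likelihood evaluation grows with both the number of events and the square of the number of developers.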
Effects: Dyadic P-Shifts, Recency
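For readers unfamiliar with these effects, the sketch below shows one way to label the dyadic participation shift between two consecutive events and to compute an inverse-rank recency statistic. The label strings and the exact recency definition are assumptions chosen for illustration; the effects estimated in the model may be defined differently.

```python
def dyadic_pshift(prev, cur):
    """Label the shift from prev = (A, B) to cur, or None if cur simply repeats A -> B.

    Assumes no event has the same actor as sender and receiver.
    """
    a, b = prev
    sender, receiver = cur
    if sender == b:  # turn receiving: the previous target speaks next
        return "AB-BA" if receiver == a else "AB-BY"
    if sender == a:  # turn continuing: the previous speaker addresses someone new
        return "AB-AY" if receiver != b else None
    # turn usurping: a third party X speaks next
    if receiver == a:
        return "AB-XA"
    if receiver == b:
        return "AB-XB"
    return "AB-XY"


def recency_effect(history, sender, receiver):
    """Inverse rank of `receiver` among actors who most recently sent to `sender`."""
    recent = []
    for s, r in reversed(history):
        if r == sender and s not in recent:
            recent.append(s)
    return 1.0 / (recent.index(receiver) + 1) if receiver in recent else 0.0
```

Under this definition, the developer who most recently replied to the sender gets a value of 1, the one before that 1/2, and anyone who has never replied to the sender gets 0.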
Results - Series of difficulties
• REM model struggled with number of events:
– Reduced to first 500 events (1.5 days) to get the model to run
(used first 200 events as control, ran model with 300 events)
– Takes 6+ hours to estimate 600 events (3 days) on a big server.
– Might have to do with the way we are loading variables into the
– Possible other limitations of the REM / the relevent R package
• Model not yet complete: testing the waters now.
– The tiny number of events won’t represent the whole dataset.
– Missing variables are likely to change these results.
– Need to analyze per mailing list (mailing list level).
• Proximity looks promising as a theoretical framework
– Organizational proximity: less likely to reply to other employees of the same firm. Do they
use internal corporate channels to collaborate?
– Cognitive proximity: more likely to reply to people working in
the same areas of code.
– Geographic proximity: less likely to reply as the time zone difference increases.
Future Developments / Feedback
• We know the model has issues:
– Get feedback on what we have done so far and on
fitting a suitable model for multilevel networks.
• Multilevel: Both aspects need to be developed:
– Multilevel analysis of networks: multiple mailing lists at the same
time (like classrooms within schools)
• Mailing lists as levels? How do we do this?
– Analysis of multilevel networks: complex models for networks -
modeling organizational affiliation as a level.
• Can we treat organizations as a level, instead of as an attribute of developers?
• Need to look at org level to see interactions by organization.
• Relational Event Models:
– Options for modeling large event sequences in networks.
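One simple descriptive starting point for the level questions raised above is to aggregate developer-level replies to the organization level, optionally keeping mailing lists separate. The sketch below assumes events are (sender, receiver, mailing list) triples and that affiliation is a plain lookup; both are hypothetical simplifications, and this aggregation is not itself a multilevel model.

```python
from collections import Counter


def aggregate_ties(events, affiliation, by_list=False):
    """Count replies between organizations; with by_list=True, count per mailing list."""
    counts = Counter()
    for sender, receiver, mailing_list in events:
        key = (
            affiliation.get(sender, "unaffiliated"),
            affiliation.get(receiver, "unaffiliated"),
        )
        if by_list:
            key = (mailing_list,) + key
        counts[key] += 1
    return counts
```

Comparing the per-list counts with the pooled counts would show whether organizations interact differently across mailing lists, which is one way to judge whether lists need to be modeled as a separate level.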
Thank You and Questions