
Social dynamics of FLOSS team communication across channels


Presentation for IFIP 2.13 Open Source Software 2008 in Milan, reporting findings of dynamic social network analysis of communications in FLOSS teams in a variety of communication venues, including trackers, developer communication venues and user communication venues.


  1. Social dynamics of FLOSS team communication across channels
     Andrea Wiggins, James Howison & Kevin Crowston
     Syracuse University School of Information Studies
     9 September 2008 ~ IFIP 2.13 - OSS 2008
  2. Introduction
     • FLOSS studies using social network analysis (SNA) techniques raise questions of validity
     • Methodological issues for SNA on FLOSS:
       • Choice of network measures
       • Evaluation of intensity of relationships
       • Effects of time
       • Variations across communication venues
     • Builds upon "Social dynamics of free and open source team communications" by Howison et al., from IFIP 2.13 in 2006
  3. FLOSS Networks
     • Membership/association networks
       • Developer-project networks; not the focus of this paper
     • Communication networks
       • Network based on the reply structure of public threads
         • A proxy for direct communication between individuals, leaving external validity questions aside
       • A link forms between a replier and the immediately previous poster in a threaded discussion
         • Example: Andrea starts a thread, James replies to Andrea, and Kevin replies to James
         • (Directed) links: J -> A, K -> J
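The reply-to link rule on this slide can be sketched in a few lines of Python; the thread representation (an ordered list of poster names) is illustrative, not from the paper:

```python
def reply_edges(thread):
    """Given a thread as an ordered list of poster names, return the
    directed (replier -> immediately previous poster) edges."""
    return [(thread[i], thread[i - 1]) for i in range(1, len(thread))]

# Andrea starts a thread, James replies to Andrea, Kevin replies to James:
edges = reply_edges(["Andrea", "James", "Kevin"])
print(edges)  # [('James', 'Andrea'), ('Kevin', 'James')]
```
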
  4. Methodological Issues
     • Choice of measures
       • Are the measures appropriate to the data?
       • Example: a broadcast network (e.g. an email list) violates the assumptions of brokerage measures based on control of information flow
     • Time
       • Aggregation is necessary but masks informative details
     • Intensity of interactions
       • Most SNA measures are binary
     • Variations across venues
       • A significant issue for sampling in FLOSS SNA studies
  5. Measures and Time
     • Measures
       • Outdegree centralization: a whole-network measure of inequality in communication contribution
       • Outdegree (centrality): the number of outbound links in a directed network, used for the reply-to structure of threads
       • Centralization: the relationship between all centralities in the network
         • High values: a few individuals make most responses
         • Low values: more equal levels of communication
     • Time
       • A series of network snapshots, with a sliding window to handle low-volume time periods
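One common (Freeman-style) formulation of outdegree centralization sums each node's shortfall from the maximum outdegree and normalizes by the theoretical maximum, (n-1)² for a directed network. The slide does not spell out its exact normalization, so this sketch is an assumption:

```python
from collections import Counter

def outdegree_centralization(edges, nodes):
    """Freeman-style centralization over outdegrees: 0 means everyone
    replies equally often, 1 means one node makes all the replies."""
    out = Counter(src for src, _dst in edges)
    degrees = [out.get(v, 0) for v in nodes]
    n = len(nodes)
    max_deg = max(degrees)
    return sum(max_deg - d for d in degrees) / ((n - 1) ** 2)

# A star of replies (one person answers everyone) is maximally centralized:
star = [("A", "B"), ("A", "C"), ("A", "D")]
print(outdegree_centralization(star, "ABCD"))  # 1.0
```
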
  6. Sliding Windows and Smoothing
     • [Figure: a 90-day window, sliding forward by 30 days at a time]
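The snapshot scheme in the figure can be sketched as follows; dates are plain day offsets for illustration, and consecutive 90-day windows stepped by 30 days overlap by 60 days:

```python
def sliding_windows(start, end, width=90, step=30):
    """Return (window_start, window_end) pairs covering [start, end],
    each `width` days long and slid forward `step` days at a time."""
    windows = []
    t = start
    while t + width <= end:
        windows.append((t, t + width))
        t += step
    return windows

print(sliding_windows(0, 180))  # [(0, 90), (30, 120), (60, 150), (90, 180)]
```
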
  7. Intensity and Variation Across Venues
     • Intensity
       • Created an intensity-based smoothing function
       • Exponential decay of interaction weight as time passes: more recent events are weighted more heavily
       • Threshold dichotomization for use with binary SNA measures
     • Variation in venues
       • Analysis of all venues for each project
       • Venues grouped by target audience/purpose: users, developers, and trackers
       • Trackers include bugs, feature requests, etc.
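Threshold dichotomization, as mentioned above, converts summed edge weights back into the binary ties that most SNA measures expect. A minimal sketch; the threshold value here is illustrative, not taken from the paper:

```python
def dichotomize(edge_weights, threshold=0.5):
    """Keep a dyad as a binary tie only if its summed interaction
    weight meets the threshold."""
    return {dyad for dyad, w in edge_weights.items() if w >= threshold}

# One strong dyad survives; the weak one is dropped:
weights = {("J", "A"): 1.4, ("K", "J"): 0.2}
print(dichotomize(weights))  # {('J', 'A')}
```
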
  8. Venue Volumes Over Time
  9. Data
     • Data from FLOSSmole
     • Compared two projects, Fire & Gaim, both IM clients
     • Gaim
       • November 1999 - April 2006, when the project identity changed to Pidgin
       • 4 trackers, 1 user forum, 2 developer email lists
       • Considered successful
     • Fire
       • 2001 - March 2006, the project's final release
       • 2 trackers, 1 user email list, 2 developer email lists
       • Not considered successful
  10. Intensity-Based Smoothing
     • (Simplified) R script for calculating edge weights; the inputs are the beginning and end dates of the period, plus the date of the event (the original variable names were lost in extraction, so period.start, period.end, and event.date are reconstructed placeholders):
         total.time <- period.end - period.start + 1
         elapsed.time <- event.date - period.start + 1
         event.rate <- (total.time - elapsed.time) / total.time
         event.weight <- exp(-log(total.time) * event.rate)
     • Sum the interaction weights for each dyad in the time period to obtain the edge weight
     • Intent is to reduce undesirable effects of smoothing from overlapping windows
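The weighting above can be transcribed into Python to see the decay behave as described, with dates as plain day numbers (an assumption for illustration): an event on the last day of the window keeps full weight, and weight decays toward 1/total.time at the start of the window.

```python
import math

def event_weight(period_start, period_end, event_date):
    """Python transcription of the (simplified) R weighting:
    exponential decay so that more recent events weigh more."""
    total_time = period_end - period_start + 1
    elapsed_time = event_date - period_start + 1
    event_rate = (total_time - elapsed_time) / total_time
    return math.exp(-math.log(total_time) * event_rate)

# In a 90-day window, an event on the last day keeps full weight:
print(event_weight(1, 90, 90))  # 1.0
# An event on the first day decays to roughly 1/90:
print(round(event_weight(1, 90, 1), 4))  # 0.0117
```
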
  11. Applying the Smoothing Function
  12. Effect of Edge Weighting on Smoothing
     • No effect on a very large data set like Gaim
     • Improved smoothing over unit weighting for sparse data
       • The biggest difference appeared in the least active venue
     • Exhaustive sensitivity testing is future work
  13. Scientific Workflows for Dynamic SNA
     • Used Taverna Workbench to design the analysis workflow
     • An automated method for replicable analysis of large data sets
     • Example: the Gaim data set
       • 41K events in the user forum
       • 30K events in developer venues
       • 20K events in trackers
     • Can (and intend to) apply the same analysis to data from additional projects
  14. Findings: Variations in Dynamics
     • Compared the mean and standard deviation of centralizations in aggregated venues
     • For both projects, different venues showed different communication dynamics
  15. Findings: Gaim
     • Average centralizations were lowest for the user forum and highest for the developer lists: the number of participants matters
     • Standard deviations of the centralizations for the user forum and the trackers were comparable, while the standard deviation for the developer lists was much lower: consistency of participation
     • Centralization trends reflect a more varied participation dynamic in the user forum and trackers, and a more regular one in the developer lists
     • Periodic spikes in tracker activity (to be continued…)
  16. Findings: Fire
     • Comparable mean centralizations for user and developer venues, but higher for the tracker
       • An anomalous data pattern, explained momentarily
     • Excluding the anomalous tracker data, the means and standard deviations of centralizations for trackers and email lists were comparable
     • Suggests a degree of regularity across the communication venues
     • All venues tended toward decentralization over time
       • Except for that anomaly…
  17. Tracker Housekeeping Behavior
     • Observed anomalous patterns in the trackers for both projects: periodic centralization spikes
     • A single user makes batch bug closings (up to 279!)
       • Fire's (feature request) tracker housekeeping appears to be preparation for project closure
       • Gaim's tracker housekeeping was more regular and repeated
     • [Figure caption: Cleaning up before shutting down]
  18. Observations on Dynamics
     • Different levels of correlation between venues suggest different types of interactions
     • User venues are more decentralized than developer venues, reflecting the greater number of participants
     • The overall trend toward decentralization over time could result from different influences
       • Fire: decentralization due to loss of project leadership
       • Gaim: decentralization due to growth in user participation
       • Highlights the duality of the centralization measure: it can be affected by leadership and/or the number of participants
     • These variations indicate reason for concern over validity
  19. Conclusion
     • Contributed an original method for exponentially decayed edge weighting in dynamic networks
     • Variation in communication centralization dynamics across venues has implications for research design
     • Periodic project management activity is apparent in batch bug closings by a few individuals, which cause spikes in tracker centralization
       • Interesting housekeeping behavior, but also a potential confound for tracker-based analysis
     • All venues in both projects tended toward decentralization over time