Social dynamics of FLOSS team communication across channels Andrea Wiggins, James Howison & Kevin Crowston Syracuse Univer...
Introduction <ul><li>FLOSS studies using social network analysis (SNA) techniques raise questions of validity </li></ul><u...
FLOSS Networks <ul><li>Membership/association networks </li></ul><ul><ul><li>Developer-project networks, not the focus of ...
Methodological Issues <ul><li>Choice of measures </li></ul><ul><ul><li>Are the measures appropriate to the data? </li></ul...
Measures and Time <ul><li>Measures </li></ul><ul><ul><li>Outdegree centralization: whole network measure of inequality in ...
Sliding Windows and Smoothing 90 days 30 days 30 days 90 day window, sliding forward by 30 days at a time
Intensity and Variation Across Venues <ul><li>Intensity </li></ul><ul><ul><li>Created an intensity-based smoothing functio...
Venue Volumes Over Time
Data <ul><li>Data from FLOSSmole </li></ul><ul><li>Compared two projects: Fire & Gaim, both IM clients </li></ul><ul><li>G...
Intensity-Based Smoothing <ul><li>(simplified) R script for calculating edge weights: </li></ul><ul><ul><li>end.date ,  fi...
Applying the Smoothing Function
Effect of Edge Weighting on Smoothing <ul><li>No effect on a very large data set like Gaim </li></ul><ul><li>Improved smoo...
Scientific Workflows for Dynamic SNA <ul><li>Used Taverna Workbench to design analysis workflow </li></ul><ul><li>Automate...
Findings: Variations in Dynamics <ul><li>Compared mean and standard deviation of centralizations in aggregated venues </li...
Findings: Gaim <ul><li>Average centralizations lowest for user email list and highest for the developer lists: number of p...
Findings: Fire <ul><li>Comparable mean values for centralizations for user and developer venues, but higher for tracker </...
Tracker Housekeeping Behavior <ul><li>Observed anomalous patterns in trackers for both projects: periodic centralization s...
Observations on Dynamics <ul><li>Different levels of correlation between venues, suggesting different types of interaction...
Conclusion <ul><li>Contributed an original method for exponentially decayed edge weightings in dynamic networks </li></ul>...
Upcoming SlideShare
Loading in …5
×

Social dynamics of FLOSS team communication across channels

1,366
-1

Published on

Presentation for IFIP 2.13 Open Source Software 2008 in Milan, reporting findings of dynamic social network analysis of communications in FLOSS teams in a variety of communication venues, including trackers, developer communication venues and user communication venues.

Published in: Technology, Economy & Finance
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,366
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
19
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Social dynamics of FLOSS team communication across channels

    1. 1. Social dynamics of FLOSS team communication across channels Andrea Wiggins, James Howison & Kevin Crowston Syracuse University School of Information Studies 9 September 2008 ~ IFIP 2.13 - OSS 2008
    2. 2. Introduction <ul><li>FLOSS studies using social network analysis (SNA) techniques raise questions of validity </li></ul><ul><li>Methodological issues for SNA on FLOSS </li></ul><ul><ul><li>Choices of network measures </li></ul></ul><ul><ul><li>Evaluation of intensity of relationships </li></ul></ul><ul><ul><li>Effects of time </li></ul></ul><ul><ul><li>Variations across communication venues </li></ul></ul><ul><li>Building upon Social dynamics of free and open source team communications by Howison et al., from IFIP 2.13 in 2006 </li></ul>
    3. 3. FLOSS Networks <ul><li>Membership/association networks </li></ul><ul><ul><li>Developer-project networks, not the focus of this paper </li></ul></ul><ul><li>Communication networks </li></ul><ul><ul><li>Network based on reply structure of public threads </li></ul></ul><ul><ul><ul><li>Proxy for direct communication between individuals, leaving external validity questions aside… </li></ul></ul></ul><ul><ul><li>Link formed between a replier and the immediately previous poster in a threaded discussion </li></ul></ul><ul><ul><ul><li>Example: Andrea starts a thread, James replies to Andrea, and Kevin replies to James </li></ul></ul></ul><ul><ul><ul><li>(Directed) links: J -> A, K -> J </li></ul></ul></ul>
    4. 4. Methodological Issues <ul><li>Choice of measures </li></ul><ul><ul><li>Are the measures appropriate to the data? </li></ul></ul><ul><ul><li>Example: broadcast network (e.g. email list) violates assumptions of brokerage measures based on control of information flow </li></ul></ul><ul><li>Time </li></ul><ul><ul><li>Aggregation is necessary but masks informative details </li></ul></ul><ul><li>Intensity of interactions </li></ul><ul><ul><li>Most SNA measures are binary </li></ul></ul><ul><li>Variations across venues </li></ul><ul><ul><li>A significant issue for FLOSS SNA study sampling </li></ul></ul>
    5. 5. Measures and Time <ul><li>Measures </li></ul><ul><ul><li>Outdegree centralization: whole network measure of inequality in communication contribution </li></ul></ul><ul><ul><li>Outdegree (centrality): number of outbound links in a directed network, used for reply-to structure of threads </li></ul></ul><ul><ul><li>Centralization: relationship between all centralities in the network </li></ul></ul><ul><ul><ul><li>High values: a few individuals make most responses </li></ul></ul></ul><ul><ul><ul><li>Low values: more equal communication levels </li></ul></ul></ul><ul><li>Time </li></ul><ul><ul><li>Series of snapshots of network, with sliding window to handle low-volume time periods </li></ul></ul>
    6. 6. Sliding Windows and Smoothing 90 days 30 days 30 days 90 day window, sliding forward by 30 days at a time
    7. 7. Intensity and Variation Across Venues <ul><li>Intensity </li></ul><ul><ul><li>Created an intensity-based smoothing function </li></ul></ul><ul><ul><li>Exponential decay of interaction weight as time passes: more recent events more heavily weighted </li></ul></ul><ul><ul><li>Threshold dichotomization for use with binary SNA measures </li></ul></ul><ul><li>Variation in venues </li></ul><ul><ul><li>Analysis of all venues for each project </li></ul></ul><ul><ul><li>Grouped by target audience/purpose for venues: users, developers, and trackers </li></ul></ul><ul><ul><li>Trackers include bugs, feature requests, etc. </li></ul></ul>
    8. 8. Venue Volumes Over Time
    9. 9. Data <ul><li>Data from FLOSSmole </li></ul><ul><li>Compared two projects: Fire & Gaim, both IM clients </li></ul><ul><li>Gaim </li></ul><ul><ul><li>November 1999 - April 2006, when project identity changed to Pidgin </li></ul></ul><ul><ul><li>4 trackers, 1 user forum, 2 developer email lists </li></ul></ul><ul><ul><li>Considered successful </li></ul></ul><ul><li>Fire </li></ul><ul><ul><li>2001 - March 2006, project’s final release </li></ul></ul><ul><ul><li>2 trackers, 1 user email list, 2 developer email lists </li></ul></ul><ul><ul><li>Not considered successful </li></ul></ul>
    10. 10. Intensity-Based Smoothing <ul><li>(simplified) R script for calculating edge weights: </li></ul><ul><ul><li>end.date , first.date , event.date : inputs for beginning and end dates for period, plus date of event </li></ul></ul><ul><ul><li>total.time <- end.date - first.date + 1 elapsed.time <- event.date - first.date +1 event.rate <- ( total.time - elapsed.time )/ total.time event.weight <- exp(-log( total.time )* event.rate ) </li></ul></ul><ul><ul><li>Sum up interaction weights for each dyad in time period for edge weight </li></ul></ul><ul><li>Intent is to reduce undesirable effects of smoothing from overlapping windows </li></ul>
    11. 11. Applying the Smoothing Function
    12. 12. Effect of Edge Weighting on Smoothing <ul><li>No effect on a very large data set like Gaim </li></ul><ul><li>Improved smoothing over unit weighting for sparse data </li></ul><ul><ul><li>Most difference seen in least active venue </li></ul></ul><ul><li>Exhaustive sensitivity testing - future work </li></ul>
    13. 13. Scientific Workflows for Dynamic SNA <ul><li>Used Taverna Workbench to design analysis workflow </li></ul><ul><li>Automated method for replicable analysis of large data sets </li></ul><ul><li>Example: Gaim data set </li></ul><ul><ul><li>41K events in user forum </li></ul></ul><ul><ul><li>30K events in developer venues </li></ul></ul><ul><ul><li>20K events in trackers </li></ul></ul><ul><li>Can (and intend to) apply same analysis to data for additional projects </li></ul>
    14. 14. Findings: Variations in Dynamics <ul><li>Compared mean and standard deviation of centralizations in aggregated venues </li></ul><ul><li>For both projects, different venues showed different communication dynamics </li></ul>
    15. 15. Findings: Gaim <ul><li>Average centralizations lowest for user email list and highest for the developer lists: number of participants matters </li></ul><ul><li>Standard deviations of the centralizations for the user forum and the trackers are comparable, while the standard deviation for the developer lists was much lower: consistency of participation </li></ul><ul><li>Centralization trends reflect more varied participation dynamic in user forum and trackers, more regular for developer lists </li></ul><ul><li>Periodic spikes in tracker activity (to be continued…) </li></ul>
    16. 16. Findings: Fire <ul><li>Comparable mean values for centralizations for user and developer venues, but higher for tracker </li></ul><ul><ul><li>Anomalous data pattern, to be explained momentarily </li></ul></ul><ul><li>Excluding the anomalous tracker data, means and standard deviations of centralizations for trackers and email lists were comparable </li></ul><ul><li>Suggests a degree of regularity across the communication venues </li></ul><ul><li>All venues tended toward decentralization over time </li></ul><ul><ul><li>Except for that anomaly… </li></ul></ul>
    17. 17. Tracker Housekeeping Behavior <ul><li>Observed anomalous patterns in trackers for both projects: periodic centralization spikes </li></ul><ul><li>A single user makes batch bug closings (up to 279!) </li></ul><ul><ul><li>Fire’s (feature request) tracker housekeeping appears to be preparation for project closure </li></ul></ul><ul><ul><li>Gaim’s tracker housekeeping was more regular and repeated </li></ul></ul>Cleaning up before shutting down
    18. 18. Observations on Dynamics <ul><li>Different levels of correlation between venues, suggesting different types of interactions </li></ul><ul><li>User venues more decentralized than developer venues, reflecting greater number of participants </li></ul><ul><li>Overall trend toward decentralization over time could be a result of different influences </li></ul><ul><ul><li>Fire: decentralization due to loss of project leadership </li></ul></ul><ul><ul><li>Gaim: decentralization due to growth in user participation </li></ul></ul><ul><ul><li>Highlights duality of centralization measure: can be affected by leadership and/or number of participants </li></ul></ul><ul><li>Variations indicate reason for concern over validity </li></ul>
    19. 19. Conclusion <ul><li>Contributed an original method for exponentially decayed edge weightings in dynamic networks </li></ul><ul><li>Variation in communication centralization dynamics across venues has implications for research design </li></ul><ul><li>Periodic project management activities are apparent in batch bug closings by few individuals, which cause spikes in tracker centralization </li></ul><ul><ul><li>Interesting housekeeping behavior, but also a potential confound for analysis based on trackers </li></ul></ul><ul><li>All venues in both projects tended toward decentralization over time </li></ul>

    ×