Dynamic Social Network Analysis (and more!) with eResearch Tools


A presentation for the OSS Watch Expert Workshop on Profiling Communities, demonstrating eResearch methodology applied to replicating research on open source software development.

Published in: Economy & Finance, Technology

  1. Dynamic Social Network Analysis (and more!) with eResearch Tools
     Andrea Wiggins, iSchool @ Syracuse University, 21 July 2008
  2. eResearch for FLOSS
     • An approach to research using cyberinfrastructure
     • Collaborative and transparent, like FLOSS
     • Large-scale shared data sets
       • FLOSSmole
       • Notre Dame SourceForge dumps
       • CVSanalY
       • Etc…
     • Uses tools and analyses that allow sharing among researchers to support open science ideals
       • Taverna Workbench
       • MyExperiment.org
  3. Using Taverna
     • Scientific analysis workflow tool
       • Open source development led by the myGrid team
       • Target users are the UK life sciences community
     • Create analysis workflows by connecting modular components through input/output ports
       • Produces (rigorous) analyses that are replicable, self-documenting, and easy to share
       • Components include WSDL SOAP web services, Beanshell, RShell, and local Java shims
     • Collaboratively developing our workflows
  4. Replicating FLOSS Research as eResearch
     • Replicating a selection of FLOSS papers and presentations, currently in progress
     • Demonstrating utility and viability of eResearch approaches for FLOSS and social science
     • Building reusable, customizable analysis components specific to FLOSS research, e.g. for data selection, sociomatrix generation for SNA, etc.
     • Extending the original research analysis by parameterization (inputs, thresholds) and implementing “future work” suggestions of authors (plus our own ideas, of course)
  5. Social dynamics of FLOSS communications
     • Replication of Howison, Inoue & Crowston, 2006
       • Compute dynamic network centrality of projects from trackers for 120 projects
     • Extension
       • Added exponentially-decayed edge weighting function (needs sensitivity testing)
       • Made sliding window adjustable
       • Can apply to any threaded communication venue for which data is available
       • Completed: all venues for 2 projects; queued: 216 projects with 635 venues!
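
The sliding window and exponentially decayed edge weighting mentioned on this slide can be sketched roughly as follows. This is a minimal illustration, not the Taverna workflow itself: the `(day, sender, recipient)` message layout, the half-life parameterization of the decay, the `min_weight` cutoff, and the choice of Freeman-style outdegree centralization as the centrality measure are all assumptions made for the example.

```python
import math
from collections import defaultdict


def decayed_edge_weights(messages, window_end, window_days, half_life_days):
    """Return {(sender, recipient): weight} for one window.

    messages: iterable of (day, sender, recipient) tuples (hypothetical layout).
    Each message contributes exp(-rate * age), so a message that is one
    half-life old counts half as much as one sent on window_end.
    """
    decay_rate = math.log(2) / half_life_days
    window_start = window_end - window_days
    weights = defaultdict(float)
    for day, sender, recipient in messages:
        # Keep messages in the half-open interval (window_start, window_end].
        if window_start < day <= window_end and sender != recipient:
            weights[(sender, recipient)] += math.exp(-decay_rate * (window_end - day))
    return dict(weights)


def outdegree_centralization(weights, min_weight=0.0):
    """Freeman-style outdegree centralization of the thresholded network.

    Edges lighter than min_weight (an assumed cutoff) are dropped. The result
    is 1.0 for a star in which one node sends to everyone else and 0.0 when
    all outdegrees are equal.
    """
    edges = [pair for pair, w in weights.items() if w >= min_weight]
    nodes = {node for pair in edges for node in pair}
    n = len(nodes)
    if n < 2:
        return 0.0
    outdegree = {node: 0 for node in nodes}
    for sender, _ in edges:
        outdegree[sender] += 1
    largest = max(outdegree.values())
    return sum(largest - d for d in outdegree.values()) / ((n - 1) ** 2)


def sliding_centralization(messages, first_end, last_end, step_days,
                           window_days, half_life_days, min_weight=0.0):
    """Compute one centralization score per window position."""
    messages = list(messages)
    series = []
    window_end = first_end
    while window_end <= last_end:
        weights = decayed_edge_weights(messages, window_end, window_days, half_life_days)
        series.append((window_end, outdegree_centralization(weights, min_weight)))
        window_end += step_days
    return series
```

Because `window_days`, `step_days`, `half_life_days`, and `min_weight` are plain arguments, sensitivity testing amounts to calling `sliding_centralization` repeatedly with different values and comparing the resulting series.
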
  6. Workflow for Dynamic SNA
  7. Dynamic SNA Across FLOSS Communication Channels
     • Clearly a lot of variation across channels (user, developer & trackers); no easily observed patterns except an overall trend toward decentralization
     • Implications: carefully match theoretical constructs to data sampling, as different venues are very likely to yield different results, which significantly affects interpretation
  8. “Do the Rich Get Richer?”
     • Replication of OSCon 2004 presentation by Conklin
       • Demonstrate scale-free distribution of developers among projects
     • Almost there
       • A little more analysis to replicate
     • Hoping to extend to dynamic analysis of preferential attachment
       • Showing change to project sizes over time
       • Comparing evolution and growth across repositories
  9. Workflow for Rich Get Richer
  10. Scale-free Developer Distribution in FLOSS
     • Using a single FLOSSmole summary statistic
     • Very simple workflow, can expand analysis considerably
     • Analysis of over 65K projects completes in under 3 minutes!
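
A rough way to check for a scale-free developer distribution from a single per-project statistic is sketched below. It assumes the statistic is the number of developers on each project, and it uses an ordinary least-squares fit on log-log axes, which is only a quick visual-style check rather than a rigorous power-law test; neither choice is confirmed as the method used in the original presentation or in the workflow.

```python
import math
from collections import Counter


def size_distribution(developer_counts):
    """Map team size k to the number of projects with exactly k developers."""
    counts = Counter(c for c in developer_counts if c > 0)
    return dict(sorted(counts.items()))


def loglog_slope(distribution):
    """Least-squares slope of log(project count) against log(team size).

    For a power law, count ~ k ** slope, so the slope is negative and
    roughly constant across the range of k.
    """
    xs = [math.log(k) for k in distribution]
    ys = [math.log(n) for n in distribution.values()]
    if len(xs) < 2:
        raise ValueError("need at least two distinct team sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Feeding in one developer count per project and plotting `size_distribution` on log-log axes would show whether the points fall near a straight line with the fitted slope.
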
  11. “Identifying success and tragedy of FLOSS Commons”
     • Replication of English & Schweik, 2007
       • Classification of project success by stage of growth for 110K projects as of August 2006
       • Requires data from 2 repositories, FLOSSmole & ND
     • Extension
       • Parameterized all thresholds, which makes sensitivity analysis possible
       • Added 2 options for a criterion test, one suggested by the authors in the article
       • Limitation: slightly less data available in FLOSSmole, 94K projects as of April 2005
  12. Workflow for Success-Abandonment Classification
  13. Classifying FLOSS Projects
     • Very complex data requirements; meshing across repositories
     • Difficult to scale and resource intensive
     • Already using this workflow for project sampling
     • For small (non-random) sample of 54 projects:
       • 64% growth, 17% initiation, 19% null (i.e. missing data)
         • Indeterminate Growth: 18.9%
         • Success Growth: 39.6%
         • Tragedy Growth: 7.5%
         • Other: 34%
     Sample output:
       amsn,downloaded,growth,enough.releases,active,ok.release.rate,true,SG
       anjuta,downloaded,growth,enough.releases,active,ok.release.rate,false,SG
       anon,downloaded,growth,enough.releases,inactive,fast.release.rate,false,TG
       etc…
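
The idea of parameterizing every threshold so that sensitivity analysis becomes possible can be illustrated with a small sketch. Everything specific here is hypothetical: the `Project` fields, the default threshold values, the decision rules, and the `IG` code for Indeterminate Growth are invented for illustration and are not the criteria from English & Schweik or from the workflow; only the `SG` and `TG` codes appear in the sample output above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All default values are hypothetical placeholders.
    min_releases: int = 3
    min_downloads: int = 1
    max_inactive_days: int = 365


@dataclass(frozen=True)
class Project:
    # Hypothetical summary of data merged from the two repositories.
    name: str
    releases: int
    downloads: int
    days_since_last_release: int


def classify(project, thresholds=Thresholds()):
    """Assign a stage/outcome code using illustrative rules only."""
    if project.releases == 0:
        return "initiation"  # assumed: no release yet means initiation stage
    active = project.days_since_last_release <= thresholds.max_inactive_days
    enough_releases = project.releases >= thresholds.min_releases
    downloaded = project.downloads >= thresholds.min_downloads
    if enough_releases and downloaded and active:
        return "SG"  # success in growth
    if not active:
        return "TG"  # tragedy in growth
    return "IG"  # indeterminate growth (abbreviation assumed)


def sweep_min_releases(projects, values):
    """Count class labels for each candidate min_releases threshold."""
    projects = list(projects)
    return {
        value: Counter(classify(p, Thresholds(min_releases=value)) for p in projects)
        for value in values
    }
```

Comparing the counts returned by `sweep_min_releases` across threshold values shows how strongly the class proportions depend on a single cutoff, which is the kind of sensitivity check the parameterization is meant to enable.
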
  14. Future Directions
     • Replication of “Evolution & Growth in Large Libre Software Projects” by Robles et al., 2005
     • Prototyping OWL ontology of FLOSS communication data, already in use with RDF & SPARQL
     • Cross-linking data, analyses, and papers
     • Increasing scale of analyses to thousands of projects
     • Extending analyses, sensitivity testing to strengthen findings
     • Building reusable analysis components to share, enabling cumulative research
  15. Thanks!
     • More at floss.syr.edu/publications/