Dynamic Social Network Analysis (and more!) with eResearch Tools



A presentation for the OSS Watch Expert Workshop on Profiling Communities, demonstrating eResearch methodology applied to replicating research on open source software development.

Published in: Economy & Finance, Technology


  1. Dynamic Social Network Analysis (and more!) with eResearch Tools
     Andrea Wiggins, iSchool @ Syracuse University, 21 July 2008
  2. eResearch for FLOSS
     • An approach to research using cyberinfrastructure
     • Collaborative and transparent, like FLOSS
     • Large-scale shared data sets
         • FLOSSmole
         • Notre Dame SourceForge dumps
         • CVSanalY
         • Etc…
     • Uses tools and analyses that allow sharing among researchers to support open science ideals
         • Taverna Workbench
  3. Using Taverna
     • Scientific analysis workflow tool
         • Open source development led by the myGrid team
         • Target users are the UK life sciences community
     • Create analysis workflows by connecting modular components through input/output ports
         • Produces (rigorous) analyses that are replicable, self-documenting, and easy to share
         • Components include WSDL SOAP web services, Beanshell, RShell, and local Java shims
     • Collaboratively developing our workflows
  4. Replicating FLOSS Research as eResearch
     • Replicating a selection of FLOSS papers and presentations, currently in progress
     • Demonstrating utility and viability of eResearch approaches for FLOSS and social science
     • Building reusable, customizable analysis components specific to FLOSS research, e.g. for data selection, sociomatrix generation for SNA, etc.
     • Extending the original research analysis by parameterization (inputs, thresholds) and implementing “future work” suggestions of authors (plus our own ideas, of course)
  5. Social dynamics of FLOSS communications
     • Replication of Howison, Inoue & Crowston, 2006
         • Compute dynamic network centrality of projects from trackers for 120 projects
     • Extension
         • Added exponentially decayed edge weighting function (needs sensitivity testing)
         • Made sliding window adjustable
         • Can apply to any threaded communication venue for which data is available
         • Completed: all venues for 2 projects; queued: 216 projects with 635 venues!
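
The extension on this slide (exponentially decayed edge weights inside an adjustable sliding window) might look roughly like the Python sketch below. It is not the Taverna workflow from the talk: the function names, the half-life parameterization of the decay, the default window sizes, and the normalization used for centralization are all assumptions made for illustration.

```python
from collections import defaultdict


def decayed_edge_weights(messages, window_end, window_days=90.0, half_life_days=30.0):
    """Sum decayed weights per (sender, recipient) edge.

    messages: iterable of (sender, recipient, day) tuples, with day as a number.
    Only messages inside (window_end - window_days, window_end] are counted.
    Each message contributes 0.5 ** (age / half_life_days); the half-life
    form of the decay is an assumption, not taken from the slides.
    """
    weights = defaultdict(float)
    window_start = window_end - window_days
    for sender, recipient, day in messages:
        if window_start < day <= window_end and sender != recipient:
            age = window_end - day
            weights[(sender, recipient)] += 0.5 ** (age / half_life_days)
    return dict(weights)


def outdegree_centralization(weights):
    """Centralization of weighted out-degree, scaled to [0, 1].

    Assumed normalization: sum(max - d_i) / ((n - 1) * max), which gives
    1.0 for a star (one sender, equal weights) and 0.0 when all nodes
    have the same weighted out-degree.
    """
    nodes = set()
    out = defaultdict(float)
    for (sender, recipient), w in weights.items():
        nodes.update((sender, recipient))
        out[sender] += w
    n = len(nodes)
    if n < 2:
        return 0.0
    top = max(out.values())
    if top == 0:
        return 0.0
    return sum(top - out.get(v, 0.0) for v in nodes) / ((n - 1) * top)


def sliding_centralization(messages, first_end, last_end, step_days,
                           window_days=90.0, half_life_days=30.0):
    """Return [(window_end, centralization), ...] for each window position."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    messages = list(messages)
    results = []
    end = first_end
    while end <= last_end:
        w = decayed_edge_weights(messages, end, window_days, half_life_days)
        results.append((end, outdegree_centralization(w)))
        end += step_days
    return results
```

Because the window length, step, and half-life are plain parameters, the sensitivity testing the slide calls for amounts to re-running `sliding_centralization` over a grid of values and comparing the resulting series.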
  6. Workflow for Dynamic SNA
  7. Dynamic SNA Across FLOSS Communication Channels
     • Clearly a lot of variation across channels (user, developer & trackers), no easily observed patterns except overall trend toward decentralization
     • Implications: carefully match theoretical constructs to data sampling, as different venues are very likely to yield different results, which significantly impacts interpretations
  8. “Do the Rich Get Richer?”
     • Replication of OSCon 2004 presentation by Conklin
         • Demonstrate scale-free distribution of developers among projects
     • Almost there
         • A little more analysis to replicate
     • Hoping to extend to dynamic analysis of preferential attachment
         • Showing change to project sizes over time
         • Comparing evolution and growth across repositories
  9. Workflow for Rich Get Richer
  10. Scale-free Developer Distribution in FLOSS
     • Using a single FLOSSmole summary statistic
     • Very simple workflow, can expand analysis considerably
     • Analysis of over 65K projects completes in under 3 minutes!
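
One simple way to look for a scale-free pattern in a developers-per-project statistic is to tabulate how many projects have each team size and fit a straight line on log-log axes. The Python sketch below assumes the input is a plain list of developer counts, one per project; the function names are hypothetical, and an ordinary least-squares fit is a rough check only, not necessarily the method used in the original presentation or in the workflow.

```python
import math
from collections import Counter


def size_frequency(dev_counts):
    """Map team size -> number of projects of that size (sizes below 1 are ignored)."""
    return Counter(c for c in dev_counts if c >= 1)


def loglog_slope(freq):
    """Least-squares slope of log(project count) against log(team size).

    A roughly linear log-log relationship with a negative slope is
    consistent with (but does not prove) a power-law distribution.
    """
    points = [(math.log(size), math.log(n)) for size, n in freq.items()]
    if len(points) < 2:
        raise ValueError("need at least two distinct team sizes")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx
```

Usage would be along the lines of `loglog_slope(size_frequency(counts))`, where `counts` holds one developer count per project.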
  11. “Identifying success and tragedy of FLOSS Commons”
     • Replication of English & Schweik, 2007
         • Classification of project success by stage of growth for 110K projects as of August 2006
         • Requires data from 2 repositories, FLOSSmole & ND
     • Extension
         • Parameterized all thresholds, which makes sensitivity analysis possible
         • Added 2 options for a criterion test, one suggested by the authors in the article
         • Limitation: slightly less data available in FLOSSmole, 94K projects as of April 2005
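
To show what "parameterized all thresholds" can mean in practice, here is a minimal Python sketch of a threshold-driven classifier. The rules, field names, and default values are illustrative assumptions only and are not the criteria from English & Schweik; only the class codes SG, TG (and an indeterminate class, written IG here) echo the labels that appear in this deck.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All defaults are placeholders chosen for illustration.
    min_releases: int = 3
    min_downloads: int = 1
    inactive_after_days: int = 365


DEFAULT = Thresholds()


def classify(release_count, downloads, days_since_last_activity, t=DEFAULT):
    """Return (stage, label) under the assumed rules.

    No releases -> initiation stage, no label.
    Otherwise growth stage: inactive -> "TG"; active with enough releases
    and downloads -> "SG"; anything else -> "IG" (indeterminate).
    """
    if release_count == 0:
        return ("initiation", None)
    active = days_since_last_activity <= t.inactive_after_days
    enough = release_count >= t.min_releases and downloads >= t.min_downloads
    if not active:
        return ("growth", "TG")
    if enough:
        return ("growth", "SG")
    return ("growth", "IG")


def class_shares(projects, t=DEFAULT):
    """Share of projects per label; projects are (releases, downloads, idle_days) tuples."""
    labels = Counter(classify(*p, t=t)[1] or "initiation" for p in projects)
    total = sum(labels.values())
    return {label: n / total for label, n in labels.items()}
```

Keeping every cutoff in one `Thresholds` object means a sensitivity analysis is just a loop: call `class_shares` with different `Thresholds` values and compare how the shares move.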
  12. Workflow for Success-Abandonment Classification
  13. Classifying FLOSS Projects
     • Very complex data requirements; meshing across repositories
     • Difficult to scale and resource intensive
     • Already using this workflow for project sampling
     • For small (non-random) sample of 54 projects:
         • 64% growth, 17% initiation, 19% null (i.e. missing data)
             • Indeterminate Growth: 18.9%
             • Success Growth: 39.6%
             • Tragedy Growth: 7.5%
             • Other: 34%

     amsn,downloaded,growth,enough.releases,active,ok.release.rate,true,SG
     anjuta,downloaded,growth,enough.releases,active,ok.release.rate,false,SG
     anon,downloaded,growth,enough.releases,inactive,fast.release.rate,false,TG
     etc…
  14. Future Directions
     • Replication of “Evolution & Growth in Large Libre Software Projects” by Robles et al., 2005
     • Prototyping OWL ontology of FLOSS communication data, already in use with RDF & SPARQL
     • Cross-linking data, analyses, and papers
     • Increasing scale of analyses to thousands of projects
     • Extending analyses, sensitivity testing to strengthen findings
     • Building reusable analysis components to share, enabling cumulative research
  15. Thanks!
     • More at