Dynamic Social Network Analysis (and more!) with eResearch Tools

A presentation for the OSS Watch Expert Workshop on Profiling Communities, demonstrating eResearch methodology applied to replicating research on open source software development.

Transcript

  • 1. Dynamic Social Network Analysis (and more!) with eResearch Tools. Andrea Wiggins, iSchool @ Syracuse University, 21 July 2008
  • 2. eResearch for FLOSS
    • An approach to research using cyberinfrastructure
    • Collaborative and transparent, like FLOSS
    • Large-scale shared data sets
      • FLOSSmole
      • Notre Dame SourceForge dumps
      • CVSanalY
      • Etc…
    • Uses tools and analyses that allow sharing among researchers to support open science ideals
      • Taverna Workbench
      • MyExperiment.org
  • 3. Using Taverna
    • Scientific analysis workflow tool
      • Open source development led by myGrid team
      • Target users are the UK life sciences community
    • Create analysis workflows by connecting modular components through input/output ports
      • Produces (rigorous) analyses that are replicable, self-documenting, and easy to share
      • Components include WSDL SOAP web services, Beanshell, RShell, and local Java shims
    • Collaboratively developing our workflows
  • 4. Replicating FLOSS Research as eResearch
    • Replicating a selection of FLOSS papers and presentations, currently in progress
    • Demonstrating utility and viability of eResearch approaches for FLOSS and social science
    • Building reusable, customizable analysis components specific to FLOSS research, e.g. for data selection, sociomatrix generation for SNA, etc.
    • Extending the original research analysis by parameterization (inputs, thresholds) and implementing “future work” suggestions of authors (plus our own ideas, of course)
  • 5. Social dynamics of FLOSS communications
    • Replication of Howison, Inoue & Crowston, 2006
      • Compute dynamic network centrality of projects from trackers for 120 projects
    • Extension
      • Added exponentially-decayed edge weighting function (needs sensitivity testing)
      • Made sliding window adjustable
      • Can apply to any threaded communication venue for which data is available
      • Completed: all venues for 2 projects; queued: 216 projects with 635 venues!
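The extension on slide 5 (decayed edge weights plus an adjustable sliding window) can be sketched roughly as follows. This is a minimal illustration, not the Taverna workflow itself: the `(sender, recipient, day)` input format, the half-life parameterization of the decay, the undirected treatment of ties, and the choice of Freeman degree centralization as the centrality measure are all assumptions made for the example.

```python
from collections import defaultdict


def decayed_edge_weights(messages, window_end, window_days, half_life_days):
    """Sum exponentially decayed tie weights inside one time window.

    messages: iterable of (sender, recipient, day) tuples (hypothetical format).
    A message sent exactly at window_end counts 1.0; its weight halves
    every half_life_days before that.
    """
    window_start = window_end - window_days
    weights = defaultdict(float)
    for sender, recipient, day in messages:
        if sender == recipient or not (window_start < day <= window_end):
            continue
        age = window_end - day
        edge = tuple(sorted((sender, recipient)))  # assumption: undirected ties
        weights[edge] += 0.5 ** (age / half_life_days)
    return dict(weights)


def degree_centralization(weights, min_weight=0.0):
    """Freeman degree centralization after dichotomizing ties at min_weight.

    Only participants with at least one tie above min_weight are counted.
    """
    degree = defaultdict(int)
    for (a, b), w in weights.items():
        if w > min_weight:
            degree[a] += 1
            degree[b] += 1
    n = len(degree)
    if n < 3:
        return 0.0  # not meaningful for fewer than 3 participants
    top = max(degree.values())
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))


def sliding_centralization(messages, first_end, last_end, step_days,
                           window_days, half_life_days):
    """Return [(window_end, centralization), ...] for each window position."""
    messages = list(messages)
    series = []
    end = first_end
    while end <= last_end:
        weights = decayed_edge_weights(messages, end, window_days, half_life_days)
        series.append((end, degree_centralization(weights)))
        end += step_days
    return series
```

Varying `window_days`, `step_days`, and `half_life_days` over the same message data is one way to approach the sensitivity testing the slide says is still needed.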
  • 6. Workflow for Dynamic SNA
  • 7. Dynamic SNA Across FLOSS Communication Channels
    • Clearly a lot of variation across channels (user, developer & trackers), no easily observed patterns except overall trend toward decentralization
    • Implications: carefully match theoretical constructs to data sampling, as different venues are very likely to yield different results, which significantly impacts interpretations
  • 8. “Do the Rich Get Richer?”
    • Replication of OSCon 2004 presentation by Conklin
      • Demonstrate scale-free distribution of developers among projects
    • Almost there
      • A little more analysis to replicate
    • Hoping to extend to dynamic analysis of preferential attachment
      • Showing change to project sizes over time
      • Comparing evolution and growth across repositories
  • 9. Workflow for Rich Get Richer
  • 10. Scale-free Developer Distribution in FLOSS
    • Using a single FLOSSmole summary statistic
    • Very simple workflow, can expand analysis considerably
    • Analysis of over 65K projects completes in under 3 minutes!
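A rough sketch of the kind of check behind a scale-free claim: count how many projects have each team size, then estimate the slope of that distribution on log-log axes. The input (a plain list of developers-per-project counts) is an assumed stand-in for the FLOSSmole summary statistic, and a least-squares fit on log-log data is only a quick diagnostic, not a rigorous power-law test or necessarily the method used in the original presentation.

```python
import math
from collections import Counter


def size_distribution(dev_counts):
    """Return sorted (team_size, number_of_projects) pairs, ignoring empty projects."""
    return sorted(Counter(c for c in dev_counts if c > 0).items())


def loglog_slope(distribution):
    """Least-squares slope of log(number_of_projects) against log(team_size).

    A roughly straight line with a negative slope on log-log axes is
    consistent with (but does not prove) a power-law distribution.
    """
    if len(distribution) < 2:
        raise ValueError("need at least two distinct team sizes")
    xs = [math.log(size) for size, _ in distribution]
    ys = [math.log(count) for _, count in distribution]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Calling `loglog_slope(size_distribution(counts))` on one snapshot per month would be a simple starting point for the dynamic extension mentioned on slide 8.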
  • 11. “Identifying success and tragedy of FLOSS Commons”
    • Replication of English & Schweik, 2007
      • Classification of project success by stage of growth for 110K projects as of August 2006
      • Requires data from 2 repositories, FLOSSmole & ND
    • Extension
      • Parameterized all thresholds, makes sensitivity analysis possible
      • Added 2 additional options for a criterion test, one suggested by authors in article
      • Limitation: slightly less available data in FLOSSmole, 94K projects as of April 2005
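Parameterizing every threshold, as described above, is what makes sensitivity analysis possible. The sketch below shows the idea only: the `Project` fields, the three thresholds, their default values, and the decision rules are hypothetical placeholders, not English & Schweik's published criteria. The labels SG, TG, and IG stand for Success, Tragedy, and Indeterminate in the Growth stage.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Thresholds:
    # All values are hypothetical defaults, chosen only for illustration.
    min_releases: int = 3
    min_downloads: int = 1
    max_idle_days: int = 365


@dataclass(frozen=True)
class Project:
    # Hypothetical record merged from the two repositories.
    name: str
    releases: int
    downloads: int
    idle_days: int  # days since last recorded activity


def classify(project, t=Thresholds()):
    """Return 'initiation', 'SG', 'TG', or 'IG' under the given thresholds."""
    if project.releases == 0:
        return "initiation"  # no release yet, so not in the growth stage
    if project.releases >= t.min_releases and project.downloads >= t.min_downloads:
        return "SG"
    if project.idle_days > t.max_idle_days:
        return "TG"
    return "IG"


def sweep_min_releases(projects, values, base=Thresholds()):
    """Re-run the classification for each candidate min_releases value."""
    results = {}
    for value in values:
        t = replace(base, min_releases=value)
        results[value] = Counter(classify(p, t) for p in projects)
    return results
```

Comparing the label counts returned by `sweep_min_releases` across values shows how strongly the classification depends on a single threshold.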
  • 12. Workflow for Success-Abandonment Classification
  • 13. Classifying FLOSS Projects
    • Very complex data requirements; meshing across repositories
    • Difficult to scale and resource intensive
    • Already using this workflow for project sampling
    • For small (non-random) sample of 54 projects:
      • 64% growth, 17% initiation, 19% null (i.e. missing data)
        • Indeterminate Growth: 18.9%
        • Success Growth: 39.6%
        • Tragedy Growth: 7.5%
        • Other: 34%
    amsn,downloaded,growth,enough.releases,active,ok.release.rate,true,SG
    anjuta,downloaded,growth,enough.releases,active,ok.release.rate,false,SG
    anon,downloaded,growth,enough.releases,inactive,fast.release.rate,false,TG
    etc…
  • 14. Future Directions
    • Replication of “Evolution & Growth in Large Libre Software Projects” by Robles et al., 2005
    • Prototyping OWL ontology of FLOSS communication data, already in use with RDF & SPARQL
    • Cross-linking data, analyses, and papers
    • Increasing scale of analyses to thousands of projects
    • Extending analyses, sensitivity testing to strengthen findings
    • Building reusable analysis components to share, enabling cumulative research
  • 15. Thanks!
    • More at floss.syr.edu/publications/