Dynamic Social Network Analysis (and more!) with eResearch Tools
 


A presentation for the OSS Watch Expert Workshop on Profiling Communities, demonstrating eResearch methodology applied to replicating research on open source software development.


Presentation Transcript

    • Dynamic Social Network Analysis (and more!) with eResearch Tools
      • Andrea Wiggins, iSchool @ Syracuse University
      • 21 July, 2008
    • eResearch for FLOSS
      • An approach to research using cyberinfrastructure
      • Collaborative and transparent, like FLOSS
      • Large-scale shared data sets
        • FLOSSmole
        • Notre Dame SourceForge dumps
        • CVSanalY
        • Etc…
      • Uses tools and analyses that allow sharing among researchers to support open science ideals
        • Taverna Workbench
        • MyExperiment.org
    • Using Taverna
      • Scientific analysis workflow tool
        • Open source development led by the myGrid team
        • Target users are the UK life sciences community
      • Create analysis workflows by connecting modular components through input/output ports
        • Produces (rigorous) analyses that are replicable, self-documenting, and easy to share
        • Components include WSDL SOAP web services, Beanshell, RShell, and local Java shims
      • Collaboratively developing our workflows
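Taverna's actual workflow model is richer than this, and none of its APIs are used here. As a rough illustration of connecting modular components through input/output ports, the sketch below wires plain Python functions together by named ports and runs them in dependency order. The `Processor` and `Workflow` classes, the two example components, and their toy inputs are all hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

Port = Tuple[str, str]  # (processor name, port name)


class Processor:
    """A modular component: a function with named input and output ports."""

    def __init__(self, name: str, func: Callable[..., Dict[str, object]],
                 inputs: List[str], outputs: List[str]) -> None:
        self.name = name
        self.func = func
        self.inputs = inputs
        self.outputs = outputs


class Workflow:
    def __init__(self) -> None:
        self.processors: Dict[str, Processor] = {}
        self.links: Dict[Port, Port] = {}  # destination port -> source port

    def add(self, proc: Processor) -> None:
        self.processors[proc.name] = proc

    def connect(self, src: str, src_port: str, dst: str, dst_port: str) -> None:
        self.links[(dst, dst_port)] = (src, src_port)

    def run(self, inputs: Dict[Port, object]) -> Dict[Port, object]:
        # Each processor depends on the processors that feed its input ports.
        deps = {name: set() for name in self.processors}
        for (dst, _), (src, _) in self.links.items():
            deps[dst].add(src)
        values: Dict[Port, object] = dict(inputs)
        # Run processors so that every upstream component finishes first.
        for name in TopologicalSorter(deps).static_order():
            proc = self.processors[name]
            # Linked ports read upstream outputs; unlinked ports read workflow inputs.
            args = {port: values[self.links.get((name, port), (name, port))]
                    for port in proc.inputs}
            result = proc.func(**args)
            for port in proc.outputs:
                values[(name, port)] = result[port]
        return values


# Hypothetical two-step workflow: select projects, then count them.
wf = Workflow()
wf.add(Processor("select",
                 lambda projects, min_devs: {"selected": [n for n, d in projects if d >= min_devs]},
                 ["projects", "min_devs"], ["selected"]))
wf.add(Processor("count", lambda items: {"n": len(items)}, ["items"], ["n"]))
wf.connect("select", "selected", "count", "items")
out = wf.run({("select", "projects"): [("proj_a", 12), ("proj_b", 1)],
              ("select", "min_devs"): 2})
print(out[("count", "n")])  # 1
```

Because each component sees only its named ports, a component such as `select` can be reused in another workflow simply by reconnecting its ports.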
    • Replicating FLOSS Research as eResearch
      • Replicating a selection of FLOSS papers and presentations, currently in progress
      • Demonstrating utility and viability of eResearch approaches for FLOSS and social science
      • Building reusable, customizable analysis components specific to FLOSS research, e.g. for data selection, sociomatrix generation for SNA, etc.
      • Extending the original research analyses through parameterization (inputs, thresholds) and by implementing the authors' “future work” suggestions (plus our own ideas, of course)
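One of the reusable components mentioned above is sociomatrix generation for SNA. A minimal sketch, assuming threaded messages arrive as hypothetical `(message_id, parent_id, sender)` records and that a tie runs from a replier to the author of the message being answered (a common convention for threaded venues, not necessarily the one used in these workflows):

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record format: (message_id, parent_id or None, sender)
Message = Tuple[str, Optional[str], str]


def sociomatrix(messages: List[Message]) -> Tuple[List[str], List[List[int]]]:
    """Directed reply sociomatrix: cell [i][j] counts replies from person i
    to messages written by person j. Self-replies are ignored."""
    author_of: Dict[str, str] = {mid: sender for mid, _, sender in messages}
    people = sorted({sender for _, _, sender in messages})
    index = {name: k for k, name in enumerate(people)}
    matrix = [[0] * len(people) for _ in people]
    for _, parent, sender in messages:
        if parent is None or parent not in author_of:
            continue  # thread starter, or parent message outside the sample
        target = author_of[parent]
        if target != sender:
            matrix[index[sender]][index[target]] += 1
    return people, matrix


msgs = [
    ("m1", None, "alice"),
    ("m2", "m1", "bob"),
    ("m3", "m1", "carol"),
    ("m4", "m2", "alice"),
    ("m5", "m4", "alice"),  # self-reply, ignored
]
people, m = sociomatrix(msgs)
print(people)  # ['alice', 'bob', 'carol']
print(m)       # [[0, 1, 0], [1, 0, 0], [1, 0, 0]]
```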
    • Social dynamics of FLOSS communications
      • Replication of Howison, Inoue & Crowston, 2006
        • Compute dynamic network centrality from tracker data for 120 projects
      • Extension
        • Added an exponentially decayed edge-weighting function (needs sensitivity testing)
        • Made sliding window adjustable
        • Can apply to any threaded communication venue for which data is available
        • Completed: all venues for 2 projects; queued: 216 projects with 635 venues!
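The slides do not spell out the decay function or the centrality measure, so the following is only a sketch under stated assumptions: reply edges carry timestamps in days, fixed-width windows advance by an adjustable `step`, each edge is weighted by exp(−decay × age) measured back from the window's end, and each window is summarized by a weighted out-degree centralization normalized by (n − 1) times the largest degree (a variant of Freeman's index chosen for illustration). The function names, the edge format, and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical reply-edge format: (time in days, replier, person replied to)
Edge = Tuple[float, str, str]


def window_weights(edges: List[Edge], win_start: float, width: float,
                   decay: float) -> Dict[Tuple[str, str], float]:
    """Sum exponentially decayed weights for edges in [win_start, win_start + width)."""
    win_end = win_start + width
    weights: Dict[Tuple[str, str], float] = {}
    for t, src, dst in edges:
        if win_start <= t < win_end:
            # Recent replies weigh close to 1; older ones fade by exp(-decay * age).
            w = math.exp(-decay * (win_end - t))
            weights[(src, dst)] = weights.get((src, dst), 0.0) + w
    return weights


def out_degree_centralization(weights: Dict[Tuple[str, str], float]) -> float:
    """0 when everyone sends equal weight, 1 when one person sends all of it."""
    nodes = {n for pair in weights for n in pair}
    if len(nodes) < 2:
        return 0.0  # undefined for fewer than two participants; 0.0 by convention here
    out = {n: 0.0 for n in nodes}
    for (src, _), w in weights.items():
        out[src] += w
    d_max = max(out.values())
    if d_max == 0.0:
        return 0.0  # all weights decayed to zero
    return sum(d_max - d for d in out.values()) / ((len(nodes) - 1) * d_max)


def dynamic_centralization(edges: List[Edge], width: float, step: float,
                           decay: float) -> List[Tuple[float, float]]:
    """Slide the window by `step`; return (window start, centralization) pairs."""
    first = min(t for t, _, _ in edges)
    last = max(t for t, _, _ in edges)
    series = []
    win_start = first
    while win_start <= last:
        w = window_weights(edges, win_start, width, decay)
        series.append((win_start, out_degree_centralization(w)))
        win_start += step
    return series


edges = [(0, "alice", "bob"), (1, "alice", "carol"), (2, "alice", "dave"),
         (10, "bob", "alice"), (11, "carol", "dave"), (12, "dave", "bob")]
series = dynamic_centralization(edges, width=5, step=5, decay=0.1)
for start, c in series:
    print(start, round(c, 3))
```

Re-running with several `decay` and `width` values is one way to carry out the sensitivity testing the slide calls for.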
    • Workflow for Dynamic SNA
    • Dynamic SNA Across FLOSS Communication Channels
      • Considerable variation across channels (user, developer, and tracker venues); no easily observed patterns except an overall trend toward decentralization
      • Implication: match theoretical constructs carefully to data sampling, since different venues are very likely to yield different results, which significantly affects interpretation
    • “Do the Rich Get Richer?”
      • Replication of Conklin's OSCON 2004 presentation
        • Demonstrates a scale-free distribution of developers among projects
      • Nearly complete
        • A little more analysis remains to be replicated
      • Hoping to extend to dynamic analysis of preferential attachment
        • Showing changes in project size over time
        • Comparing evolution and growth across repositories
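The extension to dynamic preferential attachment is still planned, so nothing here reflects the actual analysis. To make the mechanism concrete, the sketch below simulates a toy Simon-style growth process in which each new developer either starts a project (with hypothetical probability `p_new`) or joins an existing one with probability proportional to its current size. The function name and parameters are illustrative assumptions.

```python
import random
from typing import List


def simulate_project_growth(steps: int, p_new: float, seed: int = 0) -> List[int]:
    """Toy preferential-attachment (Simon-style) model of developers joining projects.

    Each step adds one developer: with probability p_new they start a new
    project; otherwise they join an existing project chosen with probability
    proportional to its current size ("the rich get richer").
    """
    rng = random.Random(seed)
    sizes = [1]        # developers per project; start with one single-developer project
    memberships = [0]  # one entry per developer, holding their project's index
    for _ in range(steps):
        if rng.random() < p_new:
            sizes.append(1)
            memberships.append(len(sizes) - 1)
        else:
            # A uniformly random membership lands on project j with
            # probability sizes[j] / sum(sizes): preferential attachment.
            project = rng.choice(memberships)
            sizes[project] += 1
            memberships.append(project)
    return sizes


sizes = simulate_project_growth(steps=10_000, p_new=0.2, seed=1)
print(len(sizes), max(sizes))
```

Recording `sizes` at regular intervals would give simulated project-size trajectories that could be compared with repository snapshots over time.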
    • Workflow for Rich Get Richer
      • Using a single FLOSSmole summary statistic
      • Very simple workflow; the analysis could be expanded considerably
      • Analysis of over 65K projects completes in under 3 minutes!
    • Scale-free Developer Distribution in FLOSS
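The presentation does not say how the scale-free claim is tested. Assuming the FLOSSmole summary statistic amounts to a developer count per project, the sketch below tabulates how many projects have k developers and estimates a power-law exponent with the discrete maximum-likelihood approximation of Clauset, Shalizi & Newman, swapped in here for illustration. The counts are made up, and `x_min` is fixed rather than fitted.

```python
import math
from collections import Counter
from typing import List, Tuple


def developer_distribution(dev_counts: List[int]) -> List[Tuple[int, int]]:
    """(k, number of projects with exactly k developers), sorted by k."""
    return sorted(Counter(dev_counts).items())


def powerlaw_alpha(dev_counts: List[int], x_min: int = 1) -> float:
    """Approximate discrete power-law MLE for the tail x >= x_min
    (Clauset, Shalizi & Newman's x_min - 1/2 approximation)."""
    tail = [x for x in dev_counts if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)


# Hypothetical developer counts for ten projects (not FLOSSmole data).
counts = [1, 1, 1, 1, 1, 1, 2, 2, 3, 10]
print(developer_distribution(counts))  # [(1, 6), (2, 2), (3, 1), (10, 1)]
print(round(powerlaw_alpha(counts), 2))  # 1.85
```

A fuller test would also choose `x_min` from the data and check goodness of fit against alternatives such as the log-normal.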
    • “Identifying success and tragedy of FLOSS Commons”
      • Replication of English & Schweik, 2007
        • Classification of project success by stage of growth for 110K projects as of August 2006
        • Requires data from 2 repositories: FLOSSmole and ND (Notre Dame SourceForge dumps)
      • Extension
        • Parameterized all thresholds, making sensitivity analysis possible
        • Added 2 new options for one criterion test, one of them suggested by the authors in the article
        • Limitation: FLOSSmole has slightly less data available (94K projects as of April 2005)
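English & Schweik's actual criteria are not reproduced here. The sketch below only illustrates the parameterization idea: every cutoff lives in one `Thresholds` object, so a sensitivity analysis becomes a loop over alternative values. The `Project` fields, the rule order, treating projects with no release as initiation-stage, and all default thresholds are hypothetical simplifications; only the SG/TG/IG labels come from the slides.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class Project:
    name: str
    releases: int
    downloads: int
    days_since_last_release: int
    release_span_days: int  # days between first and last release


@dataclass(frozen=True)
class Thresholds:
    # Illustrative defaults only; not taken from English & Schweik.
    min_releases: int = 3
    min_downloads: int = 10
    min_release_span_days: int = 180
    max_inactive_days: int = 365


def classify(p: Project, th: Thresholds) -> str:
    """Simplified, hypothetical rules; every cutoff comes from `th`."""
    if p.releases == 0:
        return "initiation"  # no release yet (not subdivided in this sketch)
    if (p.releases >= th.min_releases and p.downloads >= th.min_downloads
            and p.release_span_days >= th.min_release_span_days):
        return "SG"  # success, growth stage
    if p.days_since_last_release > th.max_inactive_days:
        return "TG"  # tragedy (abandonment), growth stage
    return "IG"      # indeterminate, growth stage


def sensitivity(projects: List[Project], param: str, values: List[int],
                base: Optional[Thresholds] = None) -> Dict[int, Counter]:
    """Re-run the classification while varying one threshold."""
    if base is None:
        base = Thresholds()
    return {v: Counter(classify(p, replace(base, **{param: v})) for p in projects)
            for v in values}


projects = [
    Project("proj_a", 5, 500, 30, 400),
    Project("proj_b", 4, 200, 800, 60),
    Project("proj_c", 1, 3, 100, 0),
    Project("proj_d", 0, 0, 0, 0),
]
res = sensitivity(projects, "min_release_span_days", [30, 180])
for v, counts in res.items():
    print(v, dict(counts))
```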
    • Workflow for Success-Abandonment Classification
    • Classifying FLOSS Projects
      • Very complex data requirements: data must be merged across repositories
      • Difficult to scale and resource-intensive
      • Already using this workflow for project sampling
      • For a small (non-random) sample of 54 projects:
        • 64% growth, 17% initiation, 19% null (i.e., missing data)
          • Indeterminate Growth: 18.9%
          • Success Growth: 39.6%
          • Tragedy Growth: 7.5%
          • Other: 34%
      amsn,downloaded,growth,enough.releases,active,ok.release.rate,true,SG
      anjuta,downloaded,growth,enough.releases,active,ok.release.rate,false,SG
      anon,downloaded,growth,enough.releases,inactive,fast.release.rate,false,TG
      etc…
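The sample rows above appear to contain one project per line with the class code (SG, TG, ...) in the last field; that reading of the format is an assumption, and the meaning of the other fields (such as the true/false column) is not stated. Under that assumption, per-class percentages like those reported for the sample can be tallied with the standard `csv` module, as in this sketch using the three rows shown:

```python
import csv
import io
from collections import Counter
from typing import Dict


def class_shares(csv_text: str) -> Dict[str, float]:
    """Percentage of projects per class label, assuming the label is the last field."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    counts = Counter(row[-1] for row in rows)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


sample = """amsn,downloaded,growth,enough.releases,active,ok.release.rate,true,SG
anjuta,downloaded,growth,enough.releases,active,ok.release.rate,false,SG
anon,downloaded,growth,enough.releases,inactive,fast.release.rate,false,TG
"""
shares = class_shares(sample)
for label, pct in sorted(shares.items()):
    print(f"{label}: {pct:.1f}%")  # SG: 66.7%, then TG: 33.3%
```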
    • Future Directions
      • Replication of “Evolution & Growth in Large Libre Software Projects” by Robles et al., 2005
      • Prototyping an OWL ontology of FLOSS communication data, already in use with RDF and SPARQL
      • Cross-linking data, analyses, and papers
      • Increasing scale of analyses to thousands of projects
      • Extending analyses, sensitivity testing to strengthen findings
      • Building reusable analysis components to share, enabling cumulative research
    • Thanks!
      • More at floss.syr.edu/publications/