Reclassifying Success and Tragedy in FLOSS Projects
 

Presentation Transcript

  • Reclassifying Success and Tragedy in FLOSS Projects (Andrea Wiggins and Kevin Crowston, 1 June 2010)
  • Motivation
    • Replication of prior research using data from RoRs (repositories of repositories)
      • English & Schweik’s 2007 classification of project growth and success
      • Relevant to both researchers and practitioners
  • Success and Tragedy
    • English & Schweik generated classification criteria based on empirical research
      • Stage of growth: Initiation (I) or Growth (G)
      • Outcome: Success (S), Tragedy (T), or Indeterminate (I)
      • Variables (project level): age, releases, release timing, downloads, other distribution channels (see the classification sketch below)
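A minimal sketch of how these project-level variables might drive the classification; the thresholds here (one year of inactivity, three releases) are illustrative placeholders, not English & Schweik's exact values:

    def classify_project(age_days, n_releases, days_since_last_release, downloads):
        # Initiation stage: no public release yet
        if n_releases == 0:
            if age_days > 365:   # illustrative threshold, not the paper's value
                return "TI"      # tragedy in initiation: abandoned before a release
            return "II"          # indeterminate initiation: too young to judge
        # Growth stage: at least one release
        if days_since_last_release > 365:
            return "TG"          # tragedy in growth: release activity has stopped
        if n_releases >= 3 and downloads > 0:
            return "SG"          # success in growth: sustained, downloaded releases
        return "IG"              # indeterminate growth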
  • Replication
    • Data
      • Used SRDA (SourceForge Research Data Archive) data
      • Original used FLOSSmole data + spidered data
        • Some differences between data sets due to offsets in the original's data collection dates
    • Analysis
      • Identical criteria for 4 of 6 classes
      • Slight variations for TG and SG were operationally equivalent to the original
  • Extended Analysis: Suggested Future Work
    • Release-based sustainability criteria
      • Original operationalization does not account for diverse release management strategies
    • Added two different release rate criteria
      • Original: time between first and last releases
      • V2: Threshold on the time between the two most recent releases (suggested by English & Schweik)
      • V3: Average time between each release (my idea); all three measures are sketched below
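The three measures differ only in which inter-release gaps they consider. A short sketch, assuming each project's release dates are available as datetime.date values:

    from datetime import date

    def release_rate_measures(release_dates):
        """Return (v1, v2, v3) in days for a project with two or more releases.
        v1: span between first and last release (original criterion)
        v2: gap between the two most recent releases
        v3: mean gap between consecutive releases"""
        dates = sorted(release_dates)
        v1 = (dates[-1] - dates[0]).days
        v2 = (dates[-1] - dates[-2]).days
        gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
        v3 = sum(gaps) / len(gaps)
        return v1, v2, v3

    # Example: three releases over eight months
    # release_rate_measures([date(2006, 1, 1), date(2006, 4, 1), date(2006, 9, 1)])

Each measure is then compared against a sustainability threshold (for example, six months) to decide whether release activity counts as sustained.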
  • Extended Analysis: Over Time
    • Additional dates
      • Original used data from October 2006
      • Added April 2006 for short-term comparison of stability of classification
      • With the default values for several variables, a project's classification can change within a period of six months
  • Results
    • Comparison to original published results (from 2007 paper)
    • Comparison of results from varying the release rate classification criterion
    • Comparison of classifications over time, Markov model showing state changes
  • Comparison to Original Results (2006-10)

        Class            Original         Replication
        II               13,342 (12%)     16,252 (14%)
        IG               10,711 (10%)     12,991 (11%)
        TI               37,320 (35%)     36,507 (31%)
        TG               30,592 (28%)     32,642 (28%)
        SG               15,782 (15%)     16,045 (14%)
        other             8,422           n/a
        unclassifiable    3,186            3,296
        Total           119,355          117,733 (+9.6%)
  • Comparison of Release Rate Criteria (2006-10-23)

        Class   Method 1   Method 2   Method 3*
        II      14%        14%        14%
        IG      11%        12%        16%
        TI      32%        32%        32%
        TG      28%        27%        33%
        SG      13%        13%         3%

    (Methods 1-3 correspond to the original, V2, and V3 release-rate criteria above.)
  • Comparison Over Time

        Class            2006-04-21         2006-10-23
        II               13,592 (12.4%)     16,252 (13.8%)
        IG               12,166 (10.8%)     12,991 (11.0%)
        TI               39,948 (35.5%)     36,507 (31.0%)
        TG               28,777 (25.6%)     32,642 (27.7%)
        SG               14,244 (12.7%)     16,045 (13.6%)
        unclassifiable    3,343 (3.0%)       3,296 (2.8%)
        Total           112,430            117,733
  • Changes to Project Classification
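A sketch of how the state changes behind these charts could be tallied into the empirical transition matrix for the Markov model mentioned above; the snapshot dictionaries and class list are assumptions, not the authors' code:

    from collections import Counter

    CLASSES = ["II", "IG", "TI", "TG", "SG"]

    def transition_matrix(class_apr, class_oct):
        """class_apr and class_oct map project id -> class label at the
        April and October 2006 snapshots (hypothetical structure)."""
        counts = Counter((class_apr[p], class_oct[p])
                         for p in class_apr if p in class_oct)
        matrix = {}
        for src in CLASSES:
            total = sum(counts[(src, dst)] for dst in CLASSES)
            # Row-normalize counts into empirical transition probabilities
            matrix[src] = {dst: counts[(src, dst)] / total if total else 0.0
                           for dst in CLASSES}
        return matrix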
  • Discussion of Methods
    • Challenges for large-scale analysis
      • Data exceptions + automated processing
        • Allow extra time to refine data handling
        • Adapt processes for changes in data structures
      • Managing data flow across tools
        • Advantages and disadvantages for each tool
        • Create test data sets to speed debugging (see the sketch below)
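One way to act on the test-set advice: draw a small stratified sample so pipeline changes can be debugged in seconds instead of re-running the full dataset; the per-record schema here is hypothetical:

    import random

    def make_test_set(projects, per_class=50, seed=42):
        """projects: iterable of dicts with a 'class' key (hypothetical schema).
        Returns up to per_class projects from each classification."""
        rng = random.Random(seed)  # fixed seed keeps the sample reproducible
        by_class = {}
        for p in projects:
            by_class.setdefault(p["class"], []).append(p)
        sample = []
        for group in by_class.values():
            sample.extend(rng.sample(group, min(per_class, len(group))))
        return sample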
  • Limitations
    • Same as the original work
      • Generalizability beyond SourceForge, imperfect data sources, simplistic measures
    • Specific to this work
      • Changes to data source
      • Need for sensitivity analysis on parameters
    • Inherent to topic and methods
      • Hard (impossible?) to validate empirically on large scale
  • Future Work
    • Additional replication and extension
    • Exhaustive testing of threshold values
    • Evaluate alternate measures & dynamic thresholds based on project statistics
    • Incorporate CVS/email/forum data
    • More closely examine changes in classification over time
  • Conclusions
    • Replicated classification of FLOSS project growth and development
    • Extended analysis with variations on classification criteria
    • Extended analysis with an additional observation date (April 2006)
    • Recommendations for large-scale FLOSS data analysis and future work
  • Questions?
    • Data, workflows & scripts: http://floss.syr.edu/reclassifying-success-and-tragedy