Reclassifying Success and Tragedy in FLOSS Projects
Comment: Could the shift in TI be because the thresholds set for determining project failure are too low?

Reclassifying Success and Tragedy in FLOSS Projects Presentation Transcript

  • 1. Reclassifying Success and Tragedy in FLOSS Projects (Andrea Wiggins and Kevin Crowston, 1 June 2010)
  • 2. Motivation
    • Replication of prior research using data from repositories of repositories (RoRs)
      • English & Schweik’s 2007 classification of project growth and success
      • Relevant to both researchers and practitioners
  • 3. Success and Tragedy
    • English & Schweik generated classification criteria based on empirical research
      • Stage of growth: Initiation (I) or Growth (G)
      • Outcome: Success (S), Tragedy (T), or Indeterminate (I)
      • Variables (project level): age, releases, release timing, downloads, other distribution channels (see the classification sketch below)
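
    As a companion to the criteria above, here is a minimal sketch of how such a classifier might look in code. It is illustrative only: the class labels (II, IG, TI, TG, SG) follow the slides, but the thresholds and the exact decision rules are placeholder assumptions, not the criteria defined by English & Schweik or used in the replication scripts.

        from dataclasses import dataclass
        from datetime import datetime, timedelta
        from typing import List

        @dataclass
        class Project:
            registered: datetime        # registration date
            releases: List[datetime]    # release dates, sorted ascending
            downloads: int              # total downloads across channels

        # Placeholder thresholds; the real cut-offs come from English & Schweik (2007).
        MIN_AGE = timedelta(days=365)   # assumed age before a project can be judged
        GROWTH_RELEASES = 3             # assumed releases needed to reach the Growth stage
        MIN_DOWNLOADS = 100             # assumed evidence of use

        def classify(p: Project, as_of: datetime) -> str:
            """Return a class label: II, IG, TI, TG, or SG."""
            stage = "G" if len(p.releases) >= GROWTH_RELEASES else "I"
            if (as_of - p.registered) < MIN_AGE:
                return "I" + stage              # too young to judge: Indeterminate
            if stage == "I":
                # Initiation stage: tragedy if no release activity has appeared
                return "TI" if not p.releases else "II"
            # Growth stage: success requires sustained releases plus evidence of use
            sustained = (p.releases[-1] - p.releases[0]) >= timedelta(days=180)
            return "SG" if sustained and p.downloads >= MIN_DOWNLOADS else "TG"
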
  • 4. Replication
    • Data
      • Used SRDA (SourceForge Research Data Archive) data
      • Original used FLOSSmole data + spidered data
        • Some differences between the data sets due to offsets in the original study's data collection dates
    • Analysis
      • Identical criteria for 4 of 6 classes
      • Slight variations for TG and SG that were operationally equivalent to the original
  • 5. Extended Analysis: Suggested Future Work
    • Release-based sustainability criteria
      • Original operationalization does not account for diverse release management strategies
    • Added two different release rate criteria (a sketch of all three measures follows below)
      • Original: time between first and last releases
      • V2: Threshold for time between most recent releases (suggested by English & Schweik)
      • V3: Average time between each release (my idea)
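
    A minimal sketch of the three release-timing measures follows. The function name and the choice to return raw durations (rather than pass/fail flags) are illustrative, and the sustainability thresholds applied to these values in the actual analysis are not shown here.

        from datetime import datetime, timedelta
        from typing import Dict, List

        def release_measures(releases: List[datetime]) -> Dict[str, timedelta]:
            """Compute the three release-timing measures compared in the talk."""
            releases = sorted(releases)
            if len(releases) < 2:
                return {}   # a single release carries no spacing information

            gaps = [later - earlier for earlier, later in zip(releases, releases[1:])]
            return {
                # Original criterion: span between the first and last release
                "v1_first_to_last": releases[-1] - releases[0],
                # V2 (suggested by English & Schweik): gap between the two most recent releases
                "v2_latest_gap": gaps[-1],
                # V3 (added in this work): average gap between consecutive releases
                "v3_mean_gap": sum(gaps, timedelta()) / len(gaps),
            }

    Each measure can then be compared against a sustainability threshold to decide whether a project's release activity counts as sustained.
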
  • 6. Extended Analysis: Over Time
    • Additional dates
      • Original used data from October 2006
      • Added April 2006 for a short-term comparison of classification stability
      • With the default values for several variables, a project's status can change within six months, altering its classification
  • 7. Results
    • Comparison to original published results (from 2007 paper)
    • Comparison of results from varying the release rate classification criterion
    • Comparison of classifications over time, with a Markov model showing state changes (a sketch of the transition tabulation follows below)
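
    One straightforward way to produce the state-change (Markov) comparison is to tabulate, for every project classified at both dates, its class in April against its class in October and normalize each row into empirical transition probabilities. The sketch below assumes the two snapshots are available as dictionaries mapping project id to class label; the names are illustrative.

        from collections import Counter
        from typing import Dict

        STATES = ["II", "IG", "TI", "TG", "SG"]

        def transition_matrix(before: Dict[str, str],
                              after: Dict[str, str]) -> Dict[str, Dict[str, float]]:
            """Empirical transition probabilities between two classification snapshots."""
            counts = {s: Counter() for s in STATES}
            for pid, b in before.items():
                a = after.get(pid)
                if b in counts and a in STATES:   # only projects classified at both dates
                    counts[b][a] += 1

            matrix = {}
            for s, row in counts.items():
                total = sum(row.values())
                matrix[s] = {t: (row[t] / total if total else 0.0) for t in STATES}
            return matrix

    For example, transition_matrix(class_2006_04, class_2006_10)["TI"]["TG"] would estimate the share of projects classified TI in April that had moved to TG by October.
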
  • 8. Comparison to Original Results (2006-10)

        Class            Original          Replication
        II               13,342 (12%)      16,252 (14%)
        IG               10,711 (10%)      12,991 (11%)
        TI               37,320 (35%)      36,507 (31%)
        TG               30,592 (28%)      32,642 (28%)
        SG               15,782 (15%)      16,045 (14%)
        other             8,422            n/a
        unclassifiable    3,186            3,296
        Total           119,355           117,733 (+9.6%)
  • 9. Comparison of Release Rate Criteria (2006-10-23)

        Class   Method 1   Method 2   Method 3*
        II      14%        14%        14%
        IG      11%        12%        16%
        TI      32%        32%        32%
        TG      28%        27%        33%
        SG      13%        13%         3%

        * Method 3: average time between releases (V3)
  • 10. Comparison Over Time

        Class            2006-04-21        2006-10-23
        II               13,592 (12.4%)    16,252 (13.8%)
        IG               12,166 (10.8%)    12,991 (11.0%)
        TI               39,948 (35.5%)    36,507 (31.0%)
        TG               28,777 (25.6%)    32,642 (27.7%)
        SG               14,244 (12.7%)    16,045 (13.6%)
        unclassifiable    3,343 (3.0%)      3,296 (2.8%)
        Total           112,430           117,733
  • 11. Changes to Project Classification
  • 12. Changes to Project Classification
  • 13. Discussion of Methods
    • Challenges for large-scale analysis
      • Data exceptions + automated processing
        • Allow extra time to refine data handling
        • Adapt processes for changes in data structures
      • Managing data flow across tools
        • Advantages and disadvantages for each tool
        • Create test data sets to speed debugging (e.g., a fixed-seed sample; see the sketch below)
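
    A fixed-seed sample is one simple way to build such a test set; the sample size and seed below are arbitrary choices for illustration.

        import random

        def make_test_sample(project_ids, n=1000, seed=42):
            """Draw a small, reproducible sample of project ids for fast debugging runs."""
            rng = random.Random(seed)
            ids = sorted(project_ids)
            return sorted(rng.sample(ids, min(n, len(ids))))

    Running the full workflow against this subset exercises the same code paths as the complete SRDA extract while keeping debugging cycles short.
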
  • 14. Limitations
    • Same as the original work
      • Generalizability beyond SourceForge, imperfect data sources, simplistic measures
    • Specific to this work
      • Changes to data source
      • Need for sensitivity analysis on parameters
    • Inherent to topic and methods
      • Hard (impossible?) to validate empirically at large scale
  • 15. Future Work
    • Additional replication and extension
    • Exhaustive testing of threshold values
    • Evaluate alternate measures & dynamic thresholds based on project statistics
    • Incorporate CVS/email/forum data
    • More closely examine changes in classification over time
  • 16. Conclusions
    • Replicated classification of FLOSS project growth and development
    • Extended analysis with variations on classification criteria
    • Extended analysis with an additional data collection date
    • Recommendations for large-scale FLOSS data analysis and future work
  • 17. Questions?
    • Data, workflows & scripts:
    • http://floss.syr.edu/reclassifying-success-and-tragedy