Reclassifying Success and Tragedy in FLOSS Projects

Slide note: the shift in TI may be because the thresholds set for determining project failure are too low.
Transcript of "Reclassifying Success and Tragedy in FLOSS Projects"

1. Reclassifying Success and Tragedy in FLOSS Projects
   Andrea Wiggins and Kevin Crowston
   1 June 2010
2. Motivation
   • Replication of prior research using data from repositories of repositories (RoRs)
     - English & Schweik's 2007 classification of project growth and success
     - Relevant to both researchers and practitioners
3. Success and Tragedy
   • English & Schweik generated classification criteria based on empirical research
     - Stage of growth: Initiation (I) or Growth (G)
     - Outcome: Success (S), Tragedy (T), or Indeterminate (I)
     - Variables (project level): age, releases, release timing, downloads, other distribution channels
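Below is a minimal sketch (not part of the original deck) of how a classifier over these variables might look. The threshold values, the abandonment rule, and the role of downloads are placeholder assumptions; the slide lists the variables but not English & Schweik's actual cut-offs.

    from dataclasses import dataclass
    from datetime import date, timedelta
    from typing import List, Optional

    # Placeholder thresholds for illustration only; the real cut-offs are
    # defined in English & Schweik (2007) and are not given on the slide.
    MIN_AGE = timedelta(days=365)    # hypothetical: age before judging Initiation
    MIN_RELEASES = 3                 # hypothetical: releases suggesting Growth success
    STALE_GAP = timedelta(days=365)  # hypothetical: release gap signaling abandonment

    @dataclass
    class Project:
        registered: Optional[date]   # registration date (may be missing)
        releases: List[date]         # release dates, sorted ascending
        downloads: int               # total download count

    def classify(p: Project, observed: date) -> str:
        """Assign one of II, TI, IG, SG, TG, or 'unclassifiable'."""
        if p.registered is None:
            return "unclassifiable"
        age = observed - p.registered
        if not p.releases:                      # Initiation stage: no release yet
            return "II" if age < MIN_AGE else "TI"
        gap = observed - p.releases[-1]         # time since most recent release
        if gap >= STALE_GAP:
            return "TG"                         # Growth stage, apparently abandoned
        if len(p.releases) >= MIN_RELEASES and p.downloads > 0:
            return "SG"                         # sustained releases with some uptake
        return "IG"                             # still growing; outcome indeterminate

Under these placeholder thresholds, classify(Project(date(2004, 1, 1), [], 0), date(2006, 10, 23)) returns "TI": the project is well past the hypothetical age cut-off with no release.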
4. Replication
   • Data
     - Used SRDA data
     - Original used FLOSSmole data plus spidered data
       - Some differences between the data sets due to offsets in the original's data collection dates
   • Analysis
     - Identical criteria for 4 of 6 classes
     - Slight variations for TG and SG were operationally equivalent to the original
5. Extended Analysis: Suggested Future Work
   • Release-based sustainability criteria
     - The original operationalization does not account for diverse release management strategies
   • Added two different release rate criteria (sketched below)
     - Original: time between the first and last releases
     - V2: threshold for the time between the most recent releases (suggested by English & Schweik)
     - V3: average time between each release (my idea)
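A sketch of the three release-rate measures as I read them from this slide; the function names and day-based units are assumptions, and the threshold comparison for V2 is left to the caller.

    from datetime import date
    from typing import List, Optional

    def gaps_in_days(releases: List[date]) -> List[int]:
        """Days between consecutive releases (input sorted ascending)."""
        return [(b - a).days for a, b in zip(releases, releases[1:])]

    def method1_span(releases: List[date]) -> Optional[int]:
        """Original criterion: time between the first and last releases."""
        return (releases[-1] - releases[0]).days if len(releases) >= 2 else None

    def method2_latest_gap(releases: List[date]) -> Optional[int]:
        """V2: gap between the two most recent releases, to be checked
        against a sustainability threshold."""
        gaps = gaps_in_days(releases)
        return gaps[-1] if gaps else None

    def method3_mean_gap(releases: List[date]) -> Optional[float]:
        """V3: average time between each release."""
        gaps = gaps_in_days(releases)
        return sum(gaps) / len(gaps) if gaps else None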
6. Extended Analysis: Over Time
   • Additional dates
     - The original used data from October 2006
     - Added April 2006 for a short-term comparison of classification stability
     - With the default values for several variables, a project's status can change within six months, affecting its classification
7. Results
   • Comparison to the original published results (from the 2007 paper)
   • Comparison of results from varying the release rate classification criterion
   • Comparison of classifications over time, with a Markov model showing state changes
8. Comparison to Original Results (2006-10)

   Class            Replication        Original
   IG               12,991 (11%)       10,711 (10%)
   II               16,252 (14%)       13,342 (12%)
   SG               16,045 (14%)       15,782 (15%)
   TG               32,642 (28%)       30,592 (28%)
   TI               36,507 (31%)       37,320 (35%)
   other            n/a                8,422
   unclassifiable   3,296              3,186
   Total            117,733 (+9.6%)    119,355
9. Comparison of Release Rate Criteria (2006-10-23)

   Class   Method 1   Method 2   Method 3*
   IG      11%        12%        16%
   II      14%        14%        14%
   SG      13%        13%        3%
   TG      28%        27%        33%
   TI      32%        32%        32%
10. Comparison Over Time

   Class            2006-04-21         2006-10-23
   IG               12,166 (10.8%)     12,991 (11.0%)
   II               13,592 (12.4%)     16,252 (13.8%)
   SG               14,244 (12.7%)     16,045 (13.6%)
   TG               28,777 (25.6%)     32,642 (27.7%)
   TI               39,948 (35.5%)     36,507 (31.0%)
   unclassifiable   3,343 (3.0%)       3,296 (2.8%)
   Total            112,430            117,733
11. Changes to Project Classification
12. Changes to Project Classification
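The state changes shown on these two slides amount to an empirical Markov transition matrix over the classes. Here is a sketch of how one could be estimated from per-project (April, October) class pairs; the function name and input format are assumptions, not from the deck.

    from collections import Counter
    from typing import Dict, List, Tuple

    # The "unclassifiable" state is omitted here for brevity.
    STATES = ["II", "IG", "SG", "TG", "TI"]

    def transition_matrix(pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
        """Estimate P(class at t2 | class at t1) from one
        (class_at_t1, class_at_t2) pair per project; each observed
        source state's row sums to 1."""
        counts = Counter(pairs)
        row_totals = Counter(src for src, _ in pairs)
        return {src: {dst: (counts[(src, dst)] / row_totals[src]
                            if row_totals[src] else 0.0)
                      for dst in STATES}
                for src in STATES}

    # Example: of two II projects, one stays II and one slips to TI.
    matrix = transition_matrix([("II", "II"), ("II", "TI")])
    assert matrix["II"]["TI"] == 0.5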
13. Discussion of Methods
   • Challenges for large-scale analysis
     - Data exceptions and automated processing
       - Allow extra time to refine data handling
       - Adapt processes for changes in data structures
     - Managing data flow across tools
       - Each tool has advantages and disadvantages
       - Create test data sets to speed debugging (see the sketch below)
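Following that last point, a tiny hand-built test set can exercise every branch of a classifier before a run over 117,733 projects. This sketch assumes the placeholder Project and classify definitions from the sketch after slide 3 are in scope; all values are synthetic.

    from datetime import date

    # One synthetic project per expected class; values chosen to hit each
    # branch of the placeholder classify() sketched after slide 3.
    CASES = {
        "unclassifiable": Project(None, [], 0),
        "II": Project(date(2006, 6, 1), [], 0),
        "TI": Project(date(2004, 1, 1), [], 0),
        "TG": Project(date(2004, 1, 1), [date(2004, 6, 1)], 50),
        "SG": Project(date(2004, 1, 1),
                      [date(2005, 1, 1), date(2005, 12, 1), date(2006, 8, 1)], 500),
        "IG": Project(date(2006, 1, 1), [date(2006, 9, 1)], 10),
    }

    observed = date(2006, 10, 23)
    for expected, project in CASES.items():
        assert classify(project, observed) == expected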
14. Limitations
   • Same as the original work
     - Generalizability beyond SourceForge, imperfect data sources, simplistic measures
   • Specific to this work
     - Changes to the data source
     - Need for sensitivity analysis on parameters
   • Inherent to the topic and methods
     - Hard (impossible?) to validate empirically at large scale
15. Future Work
   • Additional replication and extension
   • Exhaustive testing of threshold values
   • Evaluate alternate measures and dynamic thresholds based on project statistics
   • Incorporate CVS/email/forum data
   • Examine changes in classification over time more closely
16. Conclusions
   • Replicated the classification of FLOSS project growth and development
   • Extended the analysis with variations on the classification criteria
   • Extended the analysis with an additional date
   • Recommendations for large-scale FLOSS data analysis and future work
17. Questions?
   • Data, workflows & scripts:
     http://floss.syr.edu/reclassifying-success-and-tragedy