The Evolution of the R Software Ecosystem (CSMR 2013)

Software ecosystems form the heart of modern companies’ collaboration strategies with end users, open source developers and other companies. An ecosystem consists of a core platform and a halo of user contributions that provide value to a company or project. In order to sustain the level and number of high-quality contributions, it is crucial for companies and contributors to understand how ecosystems tend to evolve and can be maintained successfully over time.

As a first step, this presentation explores the evolution characteristics of the statistical computing project GNU R, which is a successful, end-user programming ecosystem. We find that the ecosystem of user-contributed R packages has been growing steadily since R’s conception, at a significantly faster rate than core packages, yet each individual package remains stable in size. We also identified differences in the way user-contributed and core packages are able to attract an active community of users.

http://sail.cs.queensu.ca/publications/pubs/German-CSMR2013.pdf

Transcript

  • 1. The Evolution of the R Software Ecosystem. Daniel M. German (University of Victoria), Bram Adams (École Polytechnique de Montréal), Ahmed E. Hassan (Queen's University)
  • 2. An Ecosystem is ...
  • 3. An Ecosystem is ... Jansen et al., ICSE '09 a set of (1) businesses functioning as a unit and interacting with a shared market for (2) software and services, together with (3) the relationships among [the businesses].
  • 4. In Other Words
  • 5. core platform
  • 6. user contributions building on platform core platform
  • 7. user contributions building on platform core platform ecosystem infrastructure
  • 8. user contributions building on platform ecosystem infrastructure
  • 9. user contributions building on platform CRAN
  • 10. CRAN packages: ggplot, wethepeople, data.table, Sim.DiffProc, randomForest, rbundler, foreach, RODBC, rms, WGCNA, minpack.lm, fields, caret, heavy, plm, rv, ggplot2, Sim.DiffProcGUI
  • 12. In Other Words
  • 13. Bosch, SPLC '09: Desktop ecosystems for end-user programming are the holy grail of software platforms!
  • 15. http://www.tiobe.com
  • 16. http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html
  • 17. http://www.rexeranalytics. But How Did they Get This Far?
  • 18. Robert Gentleman, 1993
  • 19. Robert Gentleman, 1993: non-programmers
  • 20. The R Language
      # Goals: A first look at R objects - vectors, lists, matrices, data frames.

      # To make vectors "x" "y" "year" and "names"
      x <- c(2,3,7,9)
      y <- c(9,7,3,2)
      year <- 1990:1993
      names <- c("payal", "shraddha", "kritika", "itida")
      # Accessing the 1st and last elements of y --
      y[1]
      y[length(y)]

      # To make a list "person" --
      person <- list(name="payal", x=2, y=9, year=1990)
      person
      # Accessing things inside a list --
      person$name
      person$x

      # To make a matrix, pasting together the columns "year" "x" and "y"
      # The verb cbind() stands for "column bind"
      cbind(year, x, y)

      # To make a "data frame", which is a list of vectors of the same length --
      D <- data.frame(names, year, x, y)
      nrow(D)
      # Accessing one of these vectors
      D$names
      # Accessing the last element of this vector
      D$names[nrow(D)]
      # Or equally,
      D$names[length(D$names)]
  • 21. R has an ACTIVE Community
  • 22. R has an ACTIVE Community: package infrastructure
  • 23. R has an ACTIVE Community: package infrastructure, mailing lists
  • 24. R has an ACTIVE Community: package infrastructure, blogs, mailing lists
  • 25. R has an ACTIVE Community: package infrastructure, books, blogs, mailing lists
  • 26. R has an ACTIVE Community: package infrastructure, books, blogs, mailing lists, commercial partners
  • 27. R has an ACTIVE Community: package infrastructure, books, blogs, mailing lists, commercial partners, conference
  • 28. How does a Successful Ecosystem like R Evolve?
  • 29. How does a Successful Ecosystem like R Evolve? Package Characteristics
  • 30. How does a Successful Ecosystem like R Evolve? Package Characteristics, Package Evolution
  • 31. How does a Successful Ecosystem like R Evolve? Package Characteristics, Package Evolution, Package Dependencies
  • 32. How does a Successful Ecosystem like R Evolve? Package Characteristics, Package Evolution, Package Dependencies, Package Community
  • 33. Package Data Used
  • 34. Package Data Used: CRAN, 23/04/1997 - 25/02/2011, 80 official R versions
  • 35. Package Data Used: CRAN, 23/04/1997 - 25/02/2011, 80 official R versions; package categories: base, recommended, popular, contributed; counts shown: 2,733, 15, 13, 179; 19,593 versions
  • 36. How to Define Popular Packages?
  • 38. How to Define Popular Packages? Contest providing list of installed packages by 52 users
  • 39. [Figure: Number of Packages Installed; axis: number of different packages per user; series: All, Inst. by at least 20% users]
  • 40. popular packages = [Figure: Number of Packages Installed; axis: number of different packages per user; series: All, Inst. by at least 20% users]
  • 41. Mailing List Data Used
  • 42. Mailing List Data Used: R-help, R-devel
  • 43. Mailing List Data Used: R-help, R-devel; MailMiner [Bettenburg et al.]
  • 44. Mailing List Data Used: R-help, R-devel; MailMiner [Bettenburg et al.]; PostgreSQL
  • 45. How does a Successful Ecosystem like R Evolve? Package Characteristics, Package Evolution, Package Dependencies, Package Community
  • 47. Documentation Files Dominate! [Figure: Proportion of files for a given extension (axis: proportion of files, 0.0 to 0.5); series: Base, Recommended, Popular, Contributed; extensions: rd, r, txt, hpp, rda, c, h, description, pdf, cpp, namespace, f, rdata, png, gif, java, rnw, save, html, xml, tex, s, q, citation]
  • 48. Documentation Files Dominate! [Same figure, annotated: documentation]
  • 49. Documentation Files Dominate! [Same figure, annotated: documentation, source code]
  • 50. Extensive Package Documentation [Figure: Size of Documentation per Package, Documentation Files (.rd), in lines (axis: 0, 100, 1k, 10k, 100k); categories: base, recommended, popular, contributed; annotated values: 5.3k, 3.6k, 1.7k, 0.6k]
  • 51. Contributed Packages Contain Less Code [Figure panels: Size of Source Code per Package, all source code, in SLOCs (axis: 0, 100, 1k, 10k, 100k, 1M); Size of Documentation per Package, Documentation Files (.rd), in lines (axis: 0, 100, 1k, 10k, 100k); categories: base, recommended, popular, contributed; annotated values: 7.3k, 3.5k, 1.8k, 0.7k]
  • 52. Package Characteristics, Package Evolution, Package Dependencies, Package Community
  • 53. Package Characteristics (extensive documentation; small contributed packages), Package Evolution, Package Dependencies, Package Community
  • 55. Fast Growth of Contributed Packages [Figure: Number of Packages over Time (total), 1998 to 2010; series: Base, Recommended, Popular, Contributed]
  • 56. Fast Growth of Contributed Packages [Same figure, annotated: super-linear growth]
  • 57. Fast Growth of Contributed Packages [Same figure, annotated: super-linear growth; conservative base/recommended evolution]
  • 58. Contributed Packages have Stable Size [Figure: Evolution of the Size of Source Code per Package, 1998 to 2011; panels: Base, Recommended, Popular, Contributed]
  • 59. The Less Core, the Less Releases [Figure: Number of Releases Per Package (axis: 0 to 1); series: Recommended, Popular, Contributed]
  • 60. The Less Core, the Less Releases [Same figure, annotated: 50% had <=17 releases]
  • 61. The Less Core, the Less Releases [Same figure, annotated: 50% had <=3 releases; 50% had <=17 releases]
  • 62. ... but Contributed Packages are Actively Maintained! [Figure: Date of Latest Release per Package, 2003 to 2011 (axis: 0 to 0.9); series: Recommended, Popular, Contributed; annotated: >90% of packages had release in last 2 years]
  • 65. Package Characteristics (extensive documentation; small contributed packages), Package Evolution, Package Dependencies, Package Community
  • 66. Package Characteristics (extensive documentation; small contributed packages), Package Evolution (fast growth of contributed packages; stable package size; active maintenance), Package Dependencies, Package Community
  • 68. Packages have Few Dependencies [Figure: Number of Dependencies Per Package; axes: proportion of packages (0 to 1), number of dependencies; series: Recommended, Popular, Contributed]
  • 69. Packages have Few Dependencies [Same figure, annotated: 1/3 has NONE]
  • 70. Packages have Few Dependencies [Same figure, annotated: 1/3 has NONE; 1/4 has 1 dependency]
  • 71. Contributed Packages are Higher-Level [Figure: Number of Dependents Per Package; axes: proportion of packages (0 to 1), number of dependents; series: Recommended, Popular, Contributed]
  • 72. Contributed Packages are Higher-Level [Same figure, annotated: NO dependents]
  • 73. Contributed Packages are Higher-Level [Same figure, annotated: NO dependents; 50% popular packages has <=6 dependents]
  • 74. Package Characteristics (extensive documentation; small contributed packages), Package Evolution (fast growth of contributed packages; stable package size; active maintenance), Package Dependencies, Package Community
  • 75. Package Characteristics (extensive documentation; small contributed packages), Package Evolution (fast growth of contributed packages; stable package size; active maintenance), Package Dependencies (few dependencies; contributed packages are higher level), Package Community
  • 77. Contributed Packages Generate More User Traffic [Figure: #messages, 1998 to 2010 (axis: 0 to 20000); series: base, recommended, popular, contributed]
  • 78. Contributed Packages take over Developer Traffic [Figure: #messages, 1998 to 2010 (axis: 0 to 2500); series: base, recommended, popular, contributed]
  • 80. The Less Core, the Less Traffic [Figure: Total #messages; categories: base, recommended, popular, contributed]
  • 81. The Less Core, the Less Traffic [Same figure, annotated: strong competition]
  • 82. Starting up a Community takes 1 Year [Figure: Time (instant, day, week, month, year, 5 year, 10 year); 1st msg., 10th msg., 100th msg., 1000th msg.; categories: base, recommended, popular, contributed]
  • 83. Starting up a Community takes 1 Year [Same figure, annotated: 3 months]
  • 84. Starting up a Community takes 1 Year [Same figure, annotated: 3 months; 1 year]
  • 85. Starting up a Community takes 1 Year [Same figure, annotated: 3 months; 1 year; 5 months slower]
  • 86. Starting up a Community takes 1 Year [Same figure, annotated: 3 months; 1 year; 5 months slower; 44.9% gets here]
  • 87. Starting up a Community takes 1 Year [Same figure, annotated: 3 months; 1 year; 5 months slower; only 6.5% gets this far; 44.9% gets here]
  • 88. Package Characteristics (extensive documentation; small contributed packages), Package Evolution (fast growth of contributed packages; stable package size; active maintenance), Package Dependencies (few dependencies; contributed packages are higher level), Package Community
  • 89. Package Characteristics (extensive documentation; small contributed packages), Package Evolution (fast growth of contributed packages; stable package size; active maintenance), Package Dependencies (few dependencies; contributed packages are higher level), Package Community (strong competition for attention; building a community takes a year)
  • 90. So What?
      • How do contributors deal with the fight for attention?
          – What is their motivation?
          – How much effort do they spend on their package?
      • How does a package become popular/recommended?
          – Do bloggers/books have an impact?
          – Or is it the other way around?
      • How do R-forge and the core team ensure high quality releases without broken packages?
      • ...
  • 91. Bosch, SPLC '09: Desktop ecosystems for end-user programming are the holy grail of software platforms!
  • 92. Case Study on R: CRAN, 23/04/1997 - 25/02/2011, 80 official R versions; package categories: base, recommended, popular, contributed; counts shown: 2,733, 15, 13, 179; 19,593 versions
  • 93. Package Characteristics (extensive documentation; small contributed packages), Package Evolution (fast growth of contributed packages; stable package size; active maintenance), Package Dependencies (few dependencies; contributed packages are higher level), Package Community (strong competition for attention; building a community takes a year)
  • 94. 1st International Workshop on Release Engineering http://releng.polymtl.ca May 20, 2013, San Francisco, USA RELENG 2013
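
Slides 38 to 40 suggest that popular packages were identified from a contest that collected the installed-package lists of 52 users, with a cut-off at packages installed by at least 20% of those users. The thresholding step can be sketched in Python as below. The function name `popular_packages`, the `min_percent` parameter, and the six-user toy survey are assumptions made for illustration; they are not the authors' tooling or data.

```python
from collections import Counter


def popular_packages(installed_by_user, min_percent=20):
    """Return the packages installed by at least `min_percent` % of users.

    `installed_by_user` maps a user id to the packages that user installed.
    The name and signature are hypothetical, chosen for this sketch.
    """
    n_users = len(installed_by_user)
    counts = Counter()
    for packages in installed_by_user.values():
        counts.update(set(packages))  # count each package once per user
    # Integer comparison avoids floating-point rounding at the threshold.
    return sorted(p for p, n in counts.items() if n * 100 >= min_percent * n_users)


# Toy survey: made-up users, package names borrowed from slide 10.
survey = {
    "u1": {"ggplot2", "randomForest", "caret"},
    "u2": {"ggplot2", "randomForest"},
    "u3": {"ggplot2", "RODBC"},
    "u4": {"foreach"},
    "u5": {"ggplot2", "fields"},
    "u6": {"RODBC"},
}
popular = popular_packages(survey)
print(popular)
```

With six users, the 20% cut-off means a package needs at least two installations to count as popular in this toy data.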
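
The annotations on slides 60 and 61 ("50% had <=17 releases", "50% had <=3 releases") read as medians of the number of releases per package within a category. A minimal sketch of that computation, assuming per-package release counts are already available as plain lists, could look like this. The helper `share_at_most` and the release counts are invented for the example and do not reproduce the study's numbers.

```python
from statistics import median


def share_at_most(values, limit):
    """Fraction of `values` that are less than or equal to `limit`."""
    return sum(1 for v in values if v <= limit) / len(values)


# Made-up release counts per package; not the study's data.
releases = {
    "recommended": [6, 8, 10, 10, 15, 30],
    "contributed": [1, 1, 2, 2, 4, 7],
}

medians = {category: median(counts) for category, counts in releases.items()}
for category, m in medians.items():
    share = share_at_most(releases[category], m)
    print(f"{category}: median {m}, {share:.0%} of packages had <= {m} releases")
```

By definition at least half of the packages in a category sit at or below its median, which is what the "50% had <=N releases" annotations express.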
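
Slides 68 to 73 distinguish a package's dependencies (what it needs) from its dependents (what needs it). Given a mapping from each package to its declared dependencies, the dependents count is obtained by inverting that mapping. The sketch below assumes such a mapping has already been extracted; `count_dependents` and the `pkgA` to `pkgD` names are hypothetical.

```python
def count_dependents(dependencies):
    """Map each package to the number of packages that depend on it.

    `dependencies` maps a package name to the list of packages it requires.
    """
    dependents = {pkg: set() for pkg in dependencies}
    for pkg, deps in dependencies.items():
        for dep in deps:
            # setdefault covers dependencies that are not keys themselves
            dependents.setdefault(dep, set()).add(pkg)
    return {pkg: len(users) for pkg, users in dependents.items()}


# Hypothetical dependency declarations for four packages.
deps = {
    "pkgA": [],
    "pkgB": ["pkgA"],
    "pkgC": ["pkgA", "pkgB"],
    "pkgD": ["pkgA"],
}

n_dependents = count_dependents(deps)
no_deps_share = sum(1 for d in deps.values() if not d) / len(deps)
print(n_dependents)
print(f"{no_deps_share:.0%} of packages have no dependencies")
```

In this toy graph `pkgC` and `pkgD` have no dependents, which mirrors the "higher-level" role the slides attribute to contributed packages: they build on others rather than being built upon.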