On the Development and
Distribution of R Packages
An Empirical Analysis of the R Ecosystem
Alexandre Decan, Tom Mens,
Maëlick Claes & Philippe Grosjean
COMPLEXYS Research Institute
8th September 2015, IWSECO-WEA 2015
Statistical environment
Packages with code, doc, examples, tests, datasets:
http://www.r-project.org
install.packages("MyPackage")
R package repositories (in March 2015)
Repository name   Number of packages   Since   Role
CRAN              6411                 1997    Distribution
Bioconductor      997                  2001    Distribution
R-Forge           1883                 2006    SVN development, distribution
GitHub            5150                 2008    Git development, distribution using devtools
But there are more: RForge, Omegahat, Bitbucket, Sourceforge, Google code, ...
How to install packages
install.packages function:
automatically installs a package and its dependencies if needed
only uses CRAN by default
can be configured to use other repositories like Bioconductor and R-Forge
Package devtools provides various functions to install packages from other sources:
SVN
Git
GitHub
Bitbucket
Gitorious
devtools retrieves the package content and installs it using install.packages
Previous work
Preliminary empirical study using CRAN meta-data
On the maintainability of CRAN packages (CSMR-WCRE 2014)
Inter-project (Type1) clone study of CRAN packages:
An Empirical Study of Identical Function Clones in CRAN (IWSC 2015)
Web-dashboard for CRAN maintainers
maintaineR, a web-based dashboard for maintainers of CRAN packages (ICSME 2014)
Research Questions
Where are R packages developed and/or distributed?
How to resolve package dependencies?
Where are R packages developed and/or distributed?
Packages contained in the different repositories
in March 2015
Number of newly created packages on GitHub
More and more packages are developed on GitHub without being distributed anywhere else.
Evolution of the number of packages in CRAN
and GitHub
The number of packages only on GitHub grows faster than the number of packages on CRAN!
But it does not seem to impact the growth of CRAN.
How to resolve package dependencies?
Dependencies
Defined in the DESCRIPTION file
Using the fields Depends and Imports
These fields do not specify which repository the dependency must come from!
    Package: SciViews
    Type: Package
    Title: SciViews GUI API - Main package
    Imports: ellipse
    Depends: R (>= 2.6.0), stats, grDevices, graphics, MASS
    Enhances: base
    Description: Functions to install SciViews additions to R, and more (various) tools
    Version: 0.9-5
    Date: 2013-03-01
    Author: Philippe Grosjean
    Maintainer: Philippe Grosjean <phgrosjean@sciviews.org>
    License: GPL-2
    LazyLoad: yes
    URL: http://www.sciviews.org/SciViews-R
    BugReports: https://r-forge.r-project.org/tracker/?group_id=194
    Packaged: 2014-03-01 20:34:11 UTC; phgrosjean
    NeedsCompilation: no
    Repository: CRAN
    Date/Publication: 2014-03-02 12:40:42
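The Depends and Imports fields of a DESCRIPTION file like the one above can be read with a few lines of code. The following is a minimal sketch, not the tooling used in the study: the function names `parse_description` and `dependencies` are hypothetical, the sample text is an abridged copy of the SciViews file, and the parser assumes simple "Field: value" lines with indented continuation lines.

```python
import re

# Abridged copy of the SciViews DESCRIPTION file shown above.
DESCRIPTION = """\
Package: SciViews
Type: Package
Imports: ellipse
Depends: R (>= 2.6.0), stats, grDevices, graphics, MASS
Enhances: base
Version: 0.9-5
URL: http://www.sciviews.org/SciViews-R
Repository: CRAN
"""


def parse_description(text):
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            fields[current] += " " + line.strip()
        else:
            # Split on the first colon only, so URLs in values stay intact.
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def dependencies(fields, keys=("Depends", "Imports")):
    """Return package names from the given fields, without version constraints."""
    names = []
    for key in keys:
        for item in fields.get(key, "").split(","):
            # Drop a version constraint such as "(>= 2.6.0)".
            name = re.sub(r"\(.*?\)", "", item).strip()
            # "R" refers to the interpreter itself, not to a package.
            if name and name != "R" and name not in names:
                names.append(name)
    return names


fields = parse_description(DESCRIPTION)
deps = dependencies(fields)
print(deps)  # ['stats', 'grDevices', 'graphics', 'MASS', 'ellipse']
```

The result is a list of bare package names. Nothing in it says where each package should be fetched from, which is the problem the slide points out.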
Package repository priority
For each declared dependency, we take the first package that matches it,
giving priority to repositories in this order:
CRAN, then Bioconductor, then GitHub, then R-Forge
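The priority rule can be sketched as a lookup that walks the repositories in order and stops at the first one providing the package. This is only an illustration: the function name `resolve`, the package names and the repository contents below are all hypothetical.

```python
# Repository order taken from the slide.
PRIORITY = ["CRAN", "Bioconductor", "GitHub", "R-Forge"]


def resolve(dependency, repositories, priority=PRIORITY):
    """Return the first repository, in priority order, that provides the package."""
    for repo in priority:
        if dependency in repositories.get(repo, set()):
            return repo
    return None  # not found in any of the considered repositories


# Hypothetical repository contents, for illustration only.
repositories = {
    "CRAN": {"pkgA"},
    "Bioconductor": {"pkgB"},
    "GitHub": {"pkgA", "pkgC"},
    "R-Forge": {"pkgB", "pkgC"},
}

resolved = {name: resolve(name, repositories)
            for name in ["pkgA", "pkgB", "pkgC", "pkgX"]}
print(resolved)
# {'pkgA': 'CRAN', 'pkgB': 'Bioconductor', 'pkgC': 'GitHub', 'pkgX': None}
```

A package available in several repositories is always attributed to the highest-priority one, so `pkgA` resolves to CRAN even though a copy also exists on GitHub in this toy data.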
Dependencies between repositories
[Figure: dependencies between CRAN, Bioconductor, GitHub and R-Forge, annotated with percentages]
CRAN is the core of the ecosystem
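One way to obtain inter-repository percentages of this kind is sketched below. It assumes that each percentage is the share of packages hosted in one repository that have at least one dependency resolved to a given repository. That definition is an assumption, not something the slides state, and the function name `cross_repository_shares` and the data are hypothetical.

```python
from collections import defaultdict


def cross_repository_shares(packages):
    """packages maps a name to (home repository, repositories its dependencies resolve to).

    Returns {(home, target): percentage of packages in `home` that have
    at least one dependency resolved to `target`}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for home, targets in packages.values():
        totals[home] += 1
        # Count each target repository once per package.
        for target in set(targets):
            hits[(home, target)] += 1
    return {pair: 100.0 * n / totals[pair[0]] for pair, n in hits.items()}


# Hypothetical data, for illustration only.
packages = {
    "pkgA": ("GitHub", ["CRAN", "CRAN"]),
    "pkgB": ("GitHub", ["CRAN", "GitHub"]),
    "pkgC": ("GitHub", []),
    "pkgD": ("GitHub", ["Bioconductor"]),
    "pkgE": ("CRAN", ["CRAN"]),
}

shares = cross_repository_shares(packages)
print(shares[("GitHub", "CRAN")])  # 50.0
```

Because a package can depend on several repositories at once, the percentages leaving one repository do not have to sum to 100.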
Conclusion
We looked at where R packages are developed and distributed, taking into account CRAN,
Bioconductor, GitHub and R-Forge
GitHub is growing at a faster pace than the other repositories
More and more packages are developed on GitHub but not distributed anywhere else
However, this does not impact the other repositories:
CRAN is (still) at the center of the ecosystem
Most packages on Bioconductor, R-Forge and GitHub require CRAN in order to work
Current and future work
Take into account more R package repositories (e.g. Bitbucket)
Investigate why there are so many packages only on GitHub
Ask developers (survey) about their usage of CRAN and GitHub
Eventually provide support to R package users and developers
by improving package dependency management
Socio-technical analysis of R package developer communities
Similar study of an ecosystem based on another programming language
Thanks for your attention
Questions?
Slides: http://maelick.net/presentations/iwseco-wea2015/

More Related Content

Similar to On the development and distribution of R packages

Sat4j: from the lab to desktop computers. OW2con'15, November 17, Paris.
Sat4j: from the lab to desktop computers. OW2con'15, November 17, Paris. Sat4j: from the lab to desktop computers. OW2con'15, November 17, Paris.
Sat4j: from the lab to desktop computers. OW2con'15, November 17, Paris.
OW2
 
Generating Linked Data descriptions of Debian packages in the Debian PTS
Generating Linked Data descriptions of Debian packages in the Debian PTSGenerating Linked Data descriptions of Debian packages in the Debian PTS
Generating Linked Data descriptions of Debian packages in the Debian PTSolberger
 
PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet
PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet
PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet
Pôle Systematic Paris-Region
 
What every C++ programmer should know about modern compilers (w/o comments, A...
What every C++ programmer should know about modern compilers (w/o comments, A...What every C++ programmer should know about modern compilers (w/o comments, A...
What every C++ programmer should know about modern compilers (w/o comments, A...
Sławomir Zborowski
 
Debian meetup nantes 2015 : Salt pour gérer de nombreux serveurs debian
Debian meetup nantes 2015 : Salt pour gérer de nombreux serveurs debianDebian meetup nantes 2015 : Salt pour gérer de nombreux serveurs debian
Debian meetup nantes 2015 : Salt pour gérer de nombreux serveurs debian
Arthur Lutz
 
A Deep Dive into the Socio-Technical Aspects of Delays in Security Patching
A Deep Dive into the Socio-Technical Aspects of Delays in Security PatchingA Deep Dive into the Socio-Technical Aspects of Delays in Security Patching
A Deep Dive into the Socio-Technical Aspects of Delays in Security Patching
CREST
 
PyLadies Talk: Learn to love the command line!
PyLadies Talk: Learn to love the command line!PyLadies Talk: Learn to love the command line!
PyLadies Talk: Learn to love the command line!
Blanca Mancilla
 
Reducing Resistance: Deployment as Surface
Reducing Resistance: Deployment as SurfaceReducing Resistance: Deployment as Surface
Reducing Resistance: Deployment as Surface
Jeffrey Hulten
 
Web enabling your survey business
Web enabling your survey businessWeb enabling your survey business
Web enabling your survey businessRudy Stricklan
 
The net is dark and full of terrors - James Bennett
The net is dark and full of terrors - James BennettThe net is dark and full of terrors - James Bennett
The net is dark and full of terrors - James Bennett
Leo Zhou
 
airflow_aws_snow.pptx
airflow_aws_snow.pptxairflow_aws_snow.pptx
airflow_aws_snow.pptx
rishikakhanna7
 
DevOps introduction
DevOps introductionDevOps introduction
DevOps introduction
Ahmed Ehab AbdulAziz
 
Spring Roo 2.0 Preview at Spring I/O 2016
Spring Roo 2.0 Preview at Spring I/O 2016 Spring Roo 2.0 Preview at Spring I/O 2016
Spring Roo 2.0 Preview at Spring I/O 2016
DISID
 
Svelte (adjective): Attractively thin, graceful, and stylish
Svelte (adjective): Attractively thin, graceful, and stylishSvelte (adjective): Attractively thin, graceful, and stylish
Svelte (adjective): Attractively thin, graceful, and stylish
The Software House
 
Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014
StampedeCon
 
Through the firewall with miniCRAN
Through the firewall with miniCRANThrough the firewall with miniCRAN
Through the firewall with miniCRANRevolution Analytics
 
E xact micro 10 photometer v4
E xact micro 10 photometer v4E xact micro 10 photometer v4
E xact micro 10 photometer v4Ronnie Lewis
 
Meteor - not just for rockstars
Meteor - not just for rockstarsMeteor - not just for rockstars
Meteor - not just for rockstars
Stephan Hochhaus
 
Hardware Description Languages .pptx
Hardware Description Languages .pptxHardware Description Languages .pptx
Hardware Description Languages .pptx
wafawafa52
 
Metadata Provenance
Metadata ProvenanceMetadata Provenance
Metadata Provenance
Kai Eckert
 

Similar to On the development and distribution of R packages (20)

Sat4j: from the lab to desktop computers. OW2con'15, November 17, Paris.
Sat4j: from the lab to desktop computers. OW2con'15, November 17, Paris. Sat4j: from the lab to desktop computers. OW2con'15, November 17, Paris.
Sat4j: from the lab to desktop computers. OW2con'15, November 17, Paris.
 
Generating Linked Data descriptions of Debian packages in the Debian PTS
Generating Linked Data descriptions of Debian packages in the Debian PTSGenerating Linked Data descriptions of Debian packages in the Debian PTS
Generating Linked Data descriptions of Debian packages in the Debian PTS
 
PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet
PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet
PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet
 
What every C++ programmer should know about modern compilers (w/o comments, A...
What every C++ programmer should know about modern compilers (w/o comments, A...What every C++ programmer should know about modern compilers (w/o comments, A...
What every C++ programmer should know about modern compilers (w/o comments, A...
 
Debian meetup nantes 2015 : Salt pour gérer de nombreux serveurs debian
Debian meetup nantes 2015 : Salt pour gérer de nombreux serveurs debianDebian meetup nantes 2015 : Salt pour gérer de nombreux serveurs debian
Debian meetup nantes 2015 : Salt pour gérer de nombreux serveurs debian
 
A Deep Dive into the Socio-Technical Aspects of Delays in Security Patching
A Deep Dive into the Socio-Technical Aspects of Delays in Security PatchingA Deep Dive into the Socio-Technical Aspects of Delays in Security Patching
A Deep Dive into the Socio-Technical Aspects of Delays in Security Patching
 
PyLadies Talk: Learn to love the command line!
PyLadies Talk: Learn to love the command line!PyLadies Talk: Learn to love the command line!
PyLadies Talk: Learn to love the command line!
 
Reducing Resistance: Deployment as Surface
Reducing Resistance: Deployment as SurfaceReducing Resistance: Deployment as Surface
Reducing Resistance: Deployment as Surface
 
Web enabling your survey business
Web enabling your survey businessWeb enabling your survey business
Web enabling your survey business
 
The net is dark and full of terrors - James Bennett
The net is dark and full of terrors - James BennettThe net is dark and full of terrors - James Bennett
The net is dark and full of terrors - James Bennett
 
airflow_aws_snow.pptx
airflow_aws_snow.pptxairflow_aws_snow.pptx
airflow_aws_snow.pptx
 
DevOps introduction
DevOps introductionDevOps introduction
DevOps introduction
 
Spring Roo 2.0 Preview at Spring I/O 2016
Spring Roo 2.0 Preview at Spring I/O 2016 Spring Roo 2.0 Preview at Spring I/O 2016
Spring Roo 2.0 Preview at Spring I/O 2016
 
Svelte (adjective): Attractively thin, graceful, and stylish
Svelte (adjective): Attractively thin, graceful, and stylishSvelte (adjective): Attractively thin, graceful, and stylish
Svelte (adjective): Attractively thin, graceful, and stylish
 
Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014Apache Spark: the next big thing? - StampedeCon 2014
Apache Spark: the next big thing? - StampedeCon 2014
 
Through the firewall with miniCRAN
Through the firewall with miniCRANThrough the firewall with miniCRAN
Through the firewall with miniCRAN
 
E xact micro 10 photometer v4
E xact micro 10 photometer v4E xact micro 10 photometer v4
E xact micro 10 photometer v4
 
Meteor - not just for rockstars
Meteor - not just for rockstarsMeteor - not just for rockstars
Meteor - not just for rockstars
 
Hardware Description Languages .pptx
Hardware Description Languages .pptxHardware Description Languages .pptx
Hardware Description Languages .pptx
 
Metadata Provenance
Metadata ProvenanceMetadata Provenance
Metadata Provenance
 

More from Tom Mens

How to be(come) a successful PhD student
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD student
Tom Mens
 
Recognising bot activity in collaborative software development
Recognising bot activity in collaborative software developmentRecognising bot activity in collaborative software development
Recognising bot activity in collaborative software development
Tom Mens
 
A Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHubA Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHub
Tom Mens
 
The (r)evolution of CI/CD on GitHub
 The (r)evolution of CI/CD on GitHub The (r)evolution of CI/CD on GitHub
The (r)evolution of CI/CD on GitHub
Tom Mens
 
Nurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureNurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the Future
Tom Mens
 
Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?
Tom Mens
 
On the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHubOn the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHub
Tom Mens
 
On backporting practices in package dependency networks
On backporting practices in package dependency networksOn backporting practices in package dependency networks
On backporting practices in package dependency networks
Tom Mens
 
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and RubygemsComparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Tom Mens
 
Lost in Zero Space
Lost in Zero SpaceLost in Zero Space
Lost in Zero Space
Tom Mens
 
Evaluating a bot detection model on git commit messages
Evaluating a bot detection model on git commit messagesEvaluating a bot detection model on git commit messages
Evaluating a bot detection model on git commit messages
Tom Mens
 
Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!
Tom Mens
 
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Tom Mens
 
On the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystemsOn the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystems
Tom Mens
 
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
Tom Mens
 
Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)
Tom Mens
 
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Tom Mens
 
SecoHealth 2019 Research Achievements
SecoHealth 2019 Research AchievementsSecoHealth 2019 Research Achievements
SecoHealth 2019 Research Achievements
Tom Mens
 
SECO-Assist 2019 research seminar
SECO-Assist 2019 research seminarSECO-Assist 2019 research seminar
SECO-Assist 2019 research seminar
Tom Mens
 
Empirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package ManagersEmpirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package Managers
Tom Mens
 

More from Tom Mens (20)

How to be(come) a successful PhD student
How to be(come) a successful PhD studentHow to be(come) a successful PhD student
How to be(come) a successful PhD student
 
Recognising bot activity in collaborative software development
Recognising bot activity in collaborative software developmentRecognising bot activity in collaborative software development
Recognising bot activity in collaborative software development
 
A Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHubA Dataset of Bot and Human Activities in GitHub
A Dataset of Bot and Human Activities in GitHub
 
The (r)evolution of CI/CD on GitHub
 The (r)evolution of CI/CD on GitHub The (r)evolution of CI/CD on GitHub
The (r)evolution of CI/CD on GitHub
 
Nurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the FutureNurturing the Software Ecosystems of the Future
Nurturing the Software Ecosystems of the Future
 
Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?Comment programmer un robot en 30 minutes?
Comment programmer un robot en 30 minutes?
 
On the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHubOn the rise and fall of CI services in GitHub
On the rise and fall of CI services in GitHub
 
On backporting practices in package dependency networks
On backporting practices in package dependency networksOn backporting practices in package dependency networks
On backporting practices in package dependency networks
 
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and RubygemsComparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
Comparing semantic versioning practices in Cargo, npm, Packagist and Rubygems
 
Lost in Zero Space
Lost in Zero SpaceLost in Zero Space
Lost in Zero Space
 
Evaluating a bot detection model on git commit messages
Evaluating a bot detection model on git commit messagesEvaluating a bot detection model on git commit messages
Evaluating a bot detection model on git commit messages
 
Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!Is my software ecosystem healthy? It depends!
Is my software ecosystem healthy? It depends!
 
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...Bot or not? Detecting bots in GitHub pull request activity based on comment s...
Bot or not? Detecting bots in GitHub pull request activity based on comment s...
 
On the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystemsOn the fragility of open source software packaging ecosystems
On the fragility of open source software packaging ecosystems
 
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...How magic is zero? An Empirical Analysis of Initial Development Releases in S...
How magic is zero? An Empirical Analysis of Initial Development Releases in S...
 
Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)Comparing dependency issues across software package distributions (FOSDEM 2020)
Comparing dependency issues across software package distributions (FOSDEM 2020)
 
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
Measuring Technical Lag in Software Deployments (CHAOSScon 2020)
 
SecoHealth 2019 Research Achievements
SecoHealth 2019 Research AchievementsSecoHealth 2019 Research Achievements
SecoHealth 2019 Research Achievements
 
SECO-Assist 2019 research seminar
SECO-Assist 2019 research seminarSECO-Assist 2019 research seminar
SECO-Assist 2019 research seminar
 
Empirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package ManagersEmpirically Analysing the Socio-Technical Health of Software Package Managers
Empirically Analysing the Socio-Technical Health of Software Package Managers
 

Recently uploaded

Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
ShamsuddeenMuhammadA
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Yara Milbes
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
Shane Coughlan
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 

Recently uploaded (20)

Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptxText-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
Text-Summarization-of-Breaking-News-Using-Fine-tuning-BART-Model.pptx
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
openEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain SecurityopenEuler Case Study - The Journey to Supply Chain Security
openEuler Case Study - The Journey to Supply Chain Security
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 

On the development and distribution of R packages

  • 1. On the Development and Distribution of R Packages An Empirical Analysis of the R Ecosystem Alexandre Decan, Tom Mens, Maëlick Claes & Philippe Grosjean COMPLEXYS Research Institute 8th September 2015, IWSECO-WEA 2015
  • 2. Statistical environment Packages with code, doc, examples, tests, datasets: http://www.r-project.org i n s t a l l . p a c k a g e s ( " M y P a c k a g e " )
  • 3. R package repositories (in March 2015)
      Repository name   Number of packages   Since   Role
      CRAN              6411                 1997    Distribution
      Bioconductor      997                  2001    Distribution
      R-Forge           1883                 2006    SVN development, distribution
      GitHub            5150                 2008    Git development, distribution using devtools
      But there are more: RForge, Omegahat, Bitbucket, Sourceforge, Google code, ...
  • 4. How to install packages
      • install.packages function:
          • automatically installs a package and its dependencies if needed
          • only uses CRAN by default
          • can be configured to use other repositories like Bioconductor and R-Forge
      • Package devtools provides various functions to install packages from other sources: SVN, Git, GitHub, Bitbucket, Gitorious
      • devtools retrieves the package content and installs it using install.packages
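The idea of installing a package together with its missing dependencies can be sketched as a depth-first traversal that lists dependencies before the packages that need them. This is a minimal illustration and not how install.packages is implemented; the function name `install_order` and the package names in the sample table are hypothetical.

```python
def install_order(package, dependencies, installed=None):
    """Return the packages to install, dependencies first.

    dependencies maps a package name to the names it requires;
    installed is the set of packages already present (skipped).
    """
    installed = set() if installed is None else set(installed)
    order = []        # result, in installation order
    visiting = set()  # packages on the current path, to detect cycles

    def visit(name):
        if name in installed or name in order:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        visiting.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order


# Hypothetical dependency table, for illustration only.
deps = {
    "MyPackage": ["HelperA", "HelperB"],
    "HelperA": ["HelperB"],
    "HelperB": [],
}
print(install_order("MyPackage", deps, installed={"HelperB"}))
# prints ['HelperA', 'MyPackage']
```

Packages that are already installed are skipped, which mirrors the "if needed" part of the slide.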
  • 5. Previous work
      • Preliminary empirical study using CRAN meta-data: "On the maintainability of CRAN packages" (CSMR-WCRE 2014)
      • Inter-project (Type 1) clone study of CRAN packages: "An Empirical Study of Identical Function Clones in CRAN" (IWSC 2015)
      • Web dashboard for CRAN maintainers: "maintaineR, a web-based dashboard for maintainers of CRAN packages" (ICSME 2014)
  • 6. Research Questions
      • Where are R packages developed and/or distributed?
      • How to resolve package dependencies?
  • 7. Where are R packages developed and/or distributed?
  • 8. Packages contained in the different repositories in March 2015
  • 9. Number of newly created packages on GitHub. More and more packages are developed on GitHub without being distributed anywhere else.
  • 10. Evolution of the number of packages in CRAN and GitHub. The number of packages found only on GitHub grows faster than the number of packages on CRAN! But it does not seem to impact the growth of CRAN.
  • 11. How to resolve package dependencies?
  • 12. Dependencies
      • Defined in the DESCRIPTION file
      • Using the fields Depends and Imports
      • These fields do not specify from which repository the dependency must come!
      Example DESCRIPTION file (SciViews):
          Package: SciViews
          Type: Package
          Title: SciViews GUI API - Main package
          Imports: ellipse
          Depends: R (>= 2.6.0), stats, grDevices, graphics, MASS
          Enhances: base
          Description: Functions to install SciViews additions to R, and more (various) tools
          Version: 0.9-5
          Date: 2013-03-01
          Author: Philippe Grosjean
          Maintainer: Philippe Grosjean phgrosjean@sciviews.org
          License: GPL-2
          LazyLoad: yes
          URL: http://www.sciviews.org/SciViews-R
          BugReports: https://r-forge.r-project.org/tracker/?group_id=194
          Packaged: 2014-03-01 20:34:11 UTC; phgrosjean
          NeedsCompilation: no
          Repository: CRAN
          Date/Publication: 2014-03-02 12:40:42
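Reading the Depends and Imports fields from a DESCRIPTION file can be sketched as follows. This is a simplified parser written for illustration (the helper names `parse_description` and `dependencies` are hypothetical, and it is not the parser used by R or by the authors). It handles "Field: value" lines with indented continuation lines, strips version constraints, and ignores the entry for R itself.

```python
import re


def parse_description(text):
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            fields[current] += " " + line.strip()
        elif ":" in line:
            current, _, value = line.partition(":")
            current = current.strip()
            fields[current] = value.strip()
    return fields


def dependencies(fields):
    """Return package names listed in Depends and Imports, without versions."""
    names = []
    for field in ("Depends", "Imports"):
        for item in fields.get(field, "").split(","):
            name = re.sub(r"\(.*?\)", "", item).strip()  # drop "(>= 2.6.0)"
            if name and name != "R":  # "R" is the interpreter, not a package
                names.append(name)
    return names


# Fields taken from the SciViews example on the slide.
DESCRIPTION = """\
Package: SciViews
Imports: ellipse
Depends: R (>= 2.6.0), stats, grDevices, graphics, MASS
"""
print(dependencies(parse_description(DESCRIPTION)))
# prints ['stats', 'grDevices', 'graphics', 'MASS', 'ellipse']
```

As the slide points out, the result is only a list of names: nothing in these fields says which repository each dependency should come from.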
  • 13. Package repository priority. For each dependency relationship, we consider the first package matching the dependency, giving priority to repositories in this order:
      1. CRAN
      2. Bioconductor
      3. GitHub
      4. R-Forge
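The priority rule above amounts to a first-match lookup over an ordered list of repositories. A minimal sketch, assuming each repository is represented as a set of package names (the function name `resolve` and the sample repository contents are hypothetical):

```python
# Priority order stated on the slide.
PRIORITY = ["CRAN", "Bioconductor", "GitHub", "R-Forge"]


def resolve(dependency, repositories, priority=PRIORITY):
    """Return the first repository, in priority order, that provides the dependency.

    repositories maps a repository name to the set of package names it hosts.
    Returns None when no repository provides the dependency.
    """
    for repo in priority:
        if dependency in repositories.get(repo, ()):
            return repo
    return None


# Hypothetical repository contents, for illustration only.
repositories = {
    "CRAN": {"pkgA", "pkgB"},
    "Bioconductor": {"pkgC"},
    "GitHub": {"pkgA", "pkgD"},
    "R-Forge": {"pkgD", "pkgE"},
}
print(resolve("pkgA", repositories))  # prints CRAN
print(resolve("pkgD", repositories))  # prints GitHub
print(resolve("pkgZ", repositories))  # prints None
```

A package hosted in several places, such as `pkgA` here, is attributed to the highest-priority repository that contains it.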
  • 14. Dependencies between repositories. [Figure: graph of dependencies between CRAN, Bioconductor, GitHub and R-Forge; percentage labels as extracted: 58.8%, 48.9%, 37.2%, 5.2%, 2.3%, 77.1%, 61%, 5.8%, 5.7%] CRAN is the core of the ecosystem.
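One plausible way to obtain percentages of this kind is to count, for each repository, the share of its packages that have at least one dependency resolved to each target repository. The slide does not state how its figures were computed, so the sketch below is an assumption; the function name `cross_repository_shares` and the sample data are hypothetical.

```python
from collections import defaultdict


def cross_repository_shares(packages, resolve):
    """Percentage of packages in each repository depending on each target repository.

    packages maps repository -> {package name: [dependency names]};
    resolve maps a dependency name to the repository providing it, or None.
    """
    shares = {}
    for source, pkgs in packages.items():
        counts = defaultdict(int)
        for deps in pkgs.values():
            # Count each target repository at most once per package.
            targets = {resolve(dep) for dep in deps}
            for target in targets:
                if target is not None:
                    counts[target] += 1
        shares[source] = {t: 100.0 * n / len(pkgs) for t, n in counts.items()}
    return shares


# Hypothetical data: where each package lives and what it depends on.
home = {"a": "CRAN", "b": "CRAN", "c": "GitHub", "d": "GitHub",
        "e": "GitHub", "f": "GitHub"}
packages = {
    "CRAN": {"a": [], "b": ["a"]},
    "GitHub": {"c": ["a"], "d": ["c"], "e": [], "f": ["a", "unknown"]},
}
shares = cross_repository_shares(packages, home.get)
print(shares["GitHub"]["CRAN"])    # prints 50.0
print(shares["GitHub"]["GitHub"])  # prints 25.0
```

Dependencies that cannot be resolved to any repository, like `unknown` here, are left out of the counts.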
  • 15. Conclusion
      • We looked at where R packages are developed and distributed, taking into account CRAN, Bioconductor, GitHub and R-Forge
      • GitHub is growing at a faster pace than the other repositories
      • More and more packages are developed on GitHub but not distributed anywhere else
      • However, this does not impact the other repositories: CRAN is (still) at the center of the ecosystem
      • Most Bioconductor, R-Forge and GitHub packages require CRAN in order to work
  • 16. Current and future work
      • Take into account more R package repositories (e.g. Bitbucket)
      • Investigate why there are so many packages only on GitHub
      • Survey developers about their usage of CRAN and GitHub
      • Eventually provide support to R package users and developers by improving package dependency management
      • Socio-technical analysis of R package developer communities
      • Similar study of an ecosystem based on another programming language
  • 17. Thanks for your attention. Questions? Slides: http://maelick.net/presentations/iwseco-wea2015/