On the Development and
Distribution of R Packages
An Empirical Analysis of the R Ecosystem
Alexandre Decan, Tom Mens,
Maëlick Claes & Philippe Grosjean
COMPLEXYS Research Institute
8th September 2015, IWSECO-WEA 2015
Statistical environment
Packages with code, doc, examples, tests, datasets:
http://www.r-project.org
install.packages("MyPackage")
R package repositories (in March 2015)
Repository name   Number of packages   Since   Role
CRAN              6411                 1997    Distribution
Bioconductor      997                  2001    Distribution
R-Forge           1883                 2006    SVN development, distribution
GitHub            5150                 2008    Git development, distribution using devtools
But there are more: RForge, Omegahat, Bitbucket, Sourceforge, Google code, ...
How to install packages
install.packages function:
automatically installs a package and its dependencies if needed
only uses CRAN by default
can be configured to use other repositories like Bioconductor and R-Forge
Package devtools provides various functions to install packages from other sources:
SVN
Git
GitHub
Bitbucket
Gitorious
devtools retrieves the package content and installs it using install.packages
Previous work
Preliminary empirical study using CRAN meta-data
On the maintainability of CRAN packages (CSMR-WCRE 2014)
Inter-project (Type1) clone study of CRAN packages:
An Empirical Study of Identical Function Clones in CRAN (IWSC 2015)
Web-dashboard for CRAN maintainers
maintaineR, a web-based dashboard for maintainers of CRAN packages (ICSME 2014)
Research Questions
Where are R packages developed and/or distributed?
How to resolve package dependencies?
Where are R packages developed and/or distributed?
Packages contained in the different repositories in March 2015
Number of newly created packages on GitHub
More and more packages are developed on GitHub and not distributed anywhere else.
Evolution of the number of packages in CRAN and GitHub
The number of packages only on GitHub grows faster than the number of packages on CRAN!
But it does not seem to impact the growth of CRAN.
How to resolve package dependencies?
Dependencies
Defined in the DESCRIPTION file
Using the fields Depends and Imports
These fields do not specify from which repository the dependency must come!
Package: SciViews
Type: Package
Title: SciViews GUI API - Main package
Imports: ellipse
Depends: R (>= 2.6.0), stats, grDevices, graphics, MASS
Enhances: base
Description: Functions to install SciViews additions to R, and more (various) tools
Version: 0.9-5
Date: 2013-03-01
Author: Philippe Grosjean
Maintainer: Philippe Grosjean <phgrosjean@sciviews.org>
License: GPL-2
LazyLoad: yes
URL: http://www.sciviews.org/SciViews-R
BugReports: https://r-forge.r-project.org/tracker/?group_id=194
Packaged: 2014-03-01 20:34:11 UTC; phgrosjean
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2014-03-02 12:40:42
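As a rough illustration of how the Depends and Imports fields can be read from a DESCRIPTION file, here is a minimal sketch in Python. It is not the tooling used in the study. The function names parse_fields and dependencies are hypothetical, and the parsing handles only simple "Field: value" lines with indented continuation lines.

```python
import re

# Excerpt of the DESCRIPTION file shown on the slide.
DESCRIPTION = """\
Package: SciViews
Imports: ellipse
Depends: R (>= 2.6.0), stats, grDevices, graphics, MASS
"""


def parse_fields(text):
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and current is not None:
            fields[current] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def dependencies(fields, keys=("Depends", "Imports")):
    """Return package names from the given fields.

    Version constraints such as '(>= 2.6.0)' are dropped, and so is the
    entry for R itself, which is not a package.
    """
    names = []
    for key in keys:
        for item in fields.get(key, "").split(","):
            name = re.sub(r"\(.*?\)", "", item).strip()
            if name and name != "R":
                names.append(name)
    return names


fields = parse_fields(DESCRIPTION)
deps = dependencies(fields)
print(deps)
```

The resulting list contains bare package names only: nothing in it says which repository each dependency should come from, which is the point made on the slide.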
Package repository priority
For each declared dependency, we take the first package matching the dependency,
giving priority to repositories in this order:
CRAN, Bioconductor, GitHub, R-Forge
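The priority rule can be sketched as a first-match lookup over an ordered list of repositories. The Python snippet below is an illustration under assumptions, not the authors' implementation: the repository contents are made up for the example, and the function name resolve is hypothetical.

```python
# Repository order taken from the slide.
PRIORITY = ["CRAN", "Bioconductor", "GitHub", "R-Forge"]

# Hypothetical repository contents, for illustration only.
REPOSITORIES = {
    "CRAN": {"MASS", "ellipse"},
    "Bioconductor": {"pkgB"},
    "GitHub": {"ellipse", "pkgG"},
    "R-Forge": {"pkgG", "pkgF"},
}


def resolve(dependency, repositories=REPOSITORIES, priority=PRIORITY):
    """Return the first repository, in priority order, that hosts the package.

    Returns None when no repository contains a package with that name.
    """
    for repo in priority:
        if dependency in repositories.get(repo, set()):
            return repo
    return None


resolved = {name: resolve(name) for name in ["ellipse", "pkgG", "pkgF", "missing"]}
print(resolved)
```

A package found in several repositories is attributed to the one that comes first in the list, so in this toy data "ellipse" resolves to CRAN even though a copy also exists on GitHub.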
Dependencies between repositories
[Figure: dependency graph over CRAN, Bioconductor, GitHub and R-Forge. Percentage labels, in extraction order: 58.8%, 48.9%, 37.2%, 5.2%, 2.3%, 77.1%, 61%, 5.8%, 5.7%]
CRAN is the core of the ecosystem
Conclusion
We looked at where R packages are developed and distributed, taking into account CRAN, Bioconductor, GitHub and R-Forge
GitHub is growing at a faster pace than the other repositories
More and more packages are developed on GitHub but not distributed anywhere else
However, this does not impact the other repositories:
CRAN is (still) at the center of the ecosystem
Most Bioconductor, R-Forge and GitHub packages require CRAN in order to work
Current and future work
Take into account more R package repositories (e.g. Bitbucket)
Investigate why there are so many packages only on GitHub
Survey developers about their use of CRAN and GitHub
Eventually provide support to R package users and developers by improving package dependency management
Socio-technical analysis of R package developer communities
Similar study of an ecosystem based on another programming language
Thanks for your attention
Questions?
Slides: http://maelick.net/presentations/iwseco-wea2015/
