On the topology of package dependency networks: A comparison of programming language ecosystems

This presentation is joint work by Alexandre Decan, Tom Mens and Maëlick Claes (Software Engineering Lab, COMPLEXYS research institute, University of Mons). It was presented by Maëlick Claes at the International Workshop on Software Ecosystem Architectures (WEA 2016) in Copenhagen, on 29 November 2016.
Abstract of the accompanying paper (DOI 10.1145/1235):
Package-based software ecosystems are composed of thousands of interdependent software packages. Many empirical studies have focused on software packages belonging to a single software ecosystem, and suggest to generalise the results to more ecosystems. We claim that such a generalisation is not always possible, because the technical structure of software ecosystems can be very different, even if these ecosystems belong to the same domain. We confirm this claim through a study of three big and popular package-based programming language ecosystems: R’s CRAN archive network, Python’s PyPI distribution, and JavaScript’s NPM package manager. We study and compare the structure of their package dependency graphs and reveal some important differences that may make it difficult to generalise the findings of one ecosystem to another one.

A follow-up to this work can be found in the SANER 2017 paper by the same authors, entitled "An Empirical Comparison of Dependency Issues in OSS Packaging Ecosystems".

  1. On the Topology of Package Dependency Networks: A Comparison of Programming Language Ecosystems
     Alexandre Decan, Tom Mens, Maëlick Claes (Software Engineering Lab)
     29 November 2016 – Int’l Workshop on Software Ecosystem Architectures (WEA)
  2. Research Team
  3. Previous Work
     • A. Decan, T. Mens, M. Claes, P. Grosjean
       – IWSECO-WEA 2015: "On the Development and Distribution of R Packages: An Empirical Analysis of the R Ecosystem"
       – SANER 2016: "When GitHub Meets CRAN: An Analysis of Inter-Repository Package Dependency Problems"
     • A. Serebrenik, T. Mens
       – WEA 2015: "Challenges in Software Ecosystems Research"
         • Generalizability
         • Comparing different ecosystems
  4. Software Packaging Ecosystems
     • Ecosystem: "a collection of software projects which are developed and evolve together in the same environment" [Lungu]
     • Software distributed as packages
       – Dependency relationships between packages
       – Package versioning
  5. Software Packaging Ecosystems for programming languages
     • Many programming-language-specific package managers: npm (JavaScript), PyPI (Python), RubyGems (Ruby), CRAN (R)
  6. Software Packaging Ecosystems for programming languages
     IEEE Spectrum ranking of most popular programming languages (http://spectrum.ieee.org/image/Mjc5MjI0Ng.png)
     “The real standard library people want is more like what you find in Python or Ruby, and it’s more batteries included, feature complete, and that is not in JavaScript. That’s in the NPM world or the larger world.”
  7. Ecosystem comparison
                              CRAN        PyPI        NPM
     Snapshot date            2016-04-26  2016-02-17  2016-06-28
     Packages                 9k          56k         317k
     Dependencies             21k         53k         728k
     New packages in 2015     1.6k        17k         113k
     Updates in 2015          8k          131k        711k
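One rough way to read the comparison table is to divide the number of dependencies by the number of packages, which gives the average number of declared dependencies per package. The sketch below does this with the rounded figures from the slide, so the results are approximate; the object and function names are illustrative and do not come from the slides.

```javascript
// Rounded snapshot figures from the slide, in thousands.
// The names used here are illustrative only.
const ecosystems = {
  CRAN: { packages: 9, dependencies: 21 },
  PyPI: { packages: 56, dependencies: 53 },
  npm: { packages: 317, dependencies: 728 },
};

// Average number of declared dependencies per package.
function dependenciesPerPackage({ packages, dependencies }) {
  return dependencies / packages;
}

for (const [name, counts] of Object.entries(ecosystems)) {
  console.log(name, dependenciesPerPackage(counts).toFixed(2));
}
```

With these rounded inputs, CRAN and npm both come out at roughly 2.3 dependencies per package, while PyPI stays just under 1.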
  8. Data extraction
     • CRAN: https://github.com/ecos-umons/extractoR
     • npm: https://registry.npmjs.org
     • PyPI: Missing dependencies information => https://kgullikson88.github.io/blog/pypi-analysis.html
  9. Terminology
     • b is a dependency of a
     • a is a reverse dependency of b
     • c is a transitive dependency of a
     • a is a transitive reverse dependency of c
     • {a, b, c, d, e, f} is a (weakly connected) component
     • g is an isolated package
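The terminology above can be sketched on a toy graph. The figure from the slide is not part of the transcript, so the edge list below is an assumption chosen only to match the bullet points (a depends on b, b depends on c, g has no edges); the helper names are hypothetical.

```javascript
// Hypothetical edge list: [x, y] means "x depends on y".
// These edges are assumed; the slide's figure is not in the transcript.
const packages = ["a", "b", "c", "d", "e", "f", "g"];
const edges = [["a", "b"], ["b", "c"], ["d", "b"], ["e", "c"], ["f", "e"]];

// Build an adjacency map: package -> Set of direct (reverse) dependencies.
function buildAdjacency(nodes, edgeList, reverse = false) {
  const adjacency = new Map(nodes.map((n) => [n, new Set()]));
  for (const [from, to] of edgeList) {
    if (reverse) adjacency.get(to).add(from);
    else adjacency.get(from).add(to);
  }
  return adjacency;
}

// Every package reachable from `start`, i.e. direct plus transitive neighbours.
function reachable(adjacency, start) {
  const seen = new Set();
  const stack = [...adjacency.get(start)];
  while (stack.length > 0) {
    const node = stack.pop();
    if (seen.has(node)) continue;
    seen.add(node);
    stack.push(...adjacency.get(node));
  }
  return seen;
}

const deps = buildAdjacency(packages, edges);
const reverseDeps = buildAdjacency(packages, edges, true);

// b is a (direct) dependency of a; c is reached transitively through b.
const transitiveDepsOfA = reachable(deps, "a");
// a is a transitive reverse dependency of c.
const transitiveReverseDepsOfC = reachable(reverseDeps, "c");
// An isolated package has neither dependencies nor reverse dependencies.
const isolated = packages.filter(
  (p) => deps.get(p).size === 0 && reverseDeps.get(p).size === 0
);

console.log([...transitiveDepsOfA], [...transitiveReverseDepsOfC], isolated);
```

Note that `reachable` returns direct and transitive neighbours together; subtracting the direct set would give the strictly indirect ones.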
  10. Dependency usage in programming language ecosystems
     PyPI has proportionally more isolated Python packages (due to its extensive standard library?)
     “The real standard library people want is more like what you find in Python or Ruby, and it’s more batteries included, feature complete, and that is not in JavaScript. That’s in the NPM world or the larger world.”
  11. Topology of programming language ecosystems
     The majority of packages are part of a single huge component
     Largest component:
     • 76.5% (CRAN), 35.6% (PyPI), 63.8% (npm) of all packages
     • 91% (CRAN), 88% (PyPI), 92% (npm) of all non-isolated packages
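The two percentages reported for the largest component can be computed as follows. This is a minimal sketch on a hypothetical toy ecosystem, not the authors' tooling: edge direction is ignored to find weakly connected components, and the largest one is then compared against all packages and against non-isolated packages only. The helper `impliedIsolatedShare` is likewise an illustration of the arithmetic linking the two figures.

```javascript
// Hypothetical toy ecosystem: [x, y] means "x depends on y".
const packages = ["a", "b", "c", "d", "e", "f", "g", "h", "i"];
const edges = [
  ["a", "b"], ["b", "c"], ["d", "b"], ["e", "c"], ["f", "e"], ["h", "i"],
];

// Weakly connected components: ignore edge direction, then flood-fill.
function weaklyConnectedComponents(nodes, edgeList) {
  const neighbours = new Map(nodes.map((n) => [n, new Set()]));
  for (const [from, to] of edgeList) {
    neighbours.get(from).add(to);
    neighbours.get(to).add(from);
  }
  const seen = new Set();
  const components = [];
  for (const start of nodes) {
    if (seen.has(start)) continue;
    const component = [];
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const node = stack.pop();
      component.push(node);
      for (const next of neighbours.get(node)) {
        if (!seen.has(next)) {
          seen.add(next);
          stack.push(next);
        }
      }
    }
    components.push(component);
  }
  return components;
}

const components = weaklyConnectedComponents(packages, edges);
const largest = Math.max(...components.map((c) => c.length));
// Packages in components of size 1 are isolated.
const nonIsolated = components
  .filter((c) => c.length > 1)
  .reduce((sum, c) => sum + c.length, 0);

const shareOfAll = largest / packages.length;
const shareOfNonIsolated = largest / nonIsolated;

// If the largest component holds a share p of all packages and a share q of
// non-isolated packages, the share of isolated packages is 1 - p / q.
function impliedIsolatedShare(p, q) {
  return 1 - p / q;
}

console.log(shareOfAll.toFixed(3), shareOfNonIsolated.toFixed(3));
console.log(impliedIsolatedShare(0.765, 0.91).toFixed(3)); // CRAN figures from the slide
```

Applied to the rounded figures on the slide, `impliedIsolatedShare` suggests that roughly 16% of CRAN packages, 60% of PyPI packages and 31% of npm packages are isolated. These are approximations derived from rounded percentages, but they are consistent with the observation that PyPI has proportionally more isolated packages.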
  12. Differences in dependencies between programming language ecosystems
     npm packages have a much higher ratio of transitive dependencies
  13. Differences in reverse dependencies between programming language ecosystems
     There are proportionally more very popular npm packages (i.e. higher number of transitive reverse dependencies)
  14. Differences in reverse dependencies between programming language ecosystems
     Number of packages required by more than 2% of the ecosystem
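A metric of this kind can be sketched as below. The transcript does not say exactly how "required by" is computed; this sketch assumes it means transitive reverse dependencies, in line with the previous slide. The graph is a hypothetical toy example and the function names are illustrative.

```javascript
// Hypothetical toy ecosystem: [x, y] means "x depends on y".
const packages = ["a", "b", "c", "d", "e", "f", "g"];
const edges = [["a", "b"], ["b", "c"], ["d", "b"], ["e", "c"], ["f", "e"]];

// For each package, count the packages that require it directly or transitively.
function transitiveReverseDependencyCounts(nodes, edgeList) {
  const dependents = new Map(nodes.map((n) => [n, new Set()]));
  for (const [from, to] of edgeList) dependents.get(to).add(from);

  const counts = new Map();
  for (const pkg of nodes) {
    const seen = new Set();
    const stack = [...dependents.get(pkg)];
    while (stack.length > 0) {
      const node = stack.pop();
      if (seen.has(node)) continue;
      seen.add(node);
      stack.push(...dependents.get(node));
    }
    seen.delete(pkg); // do not count the package itself if there is a cycle
    counts.set(pkg, seen.size);
  }
  return counts;
}

// Packages required by more than `threshold` (a fraction) of the ecosystem.
function requiredByMoreThan(nodes, edgeList, threshold) {
  const counts = transitiveReverseDependencyCounts(nodes, edgeList);
  return nodes.filter((p) => counts.get(p) / nodes.length > threshold);
}

// A toy graph needs a larger threshold than 2% to be informative.
console.log(requiredByMoreThan(packages, edges, 0.25));
```

On a real registry snapshot the threshold would be 0.02, and a per-package traversal like this one may need to be replaced by something more scalable for hundreds of thousands of packages.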
  15. Possible explanation: micro-packages in npm
     “In a lot of JavaScript environments, space is at a premium. [...] Several larger libraries [...] have actually intentionally split themselves into sub-modules because people usually only ever load them to use a single merge function.”
     Example: isarray (150 direct, 77K inverse transitive deps in August 2016)
     var toString = {}.toString;
     module.exports = Array.isArray || function (arr) {
       return toString.call(arr) == '[object Array]';
     };
  16. Known problems: leftpad
     function leftpad (str, len, ch) {
       str = String(str);
       var i = -1;
       if (!ch && ch !== 0) ch = ' ';
       len = len - str.length;
       while (++i < len) {
         str = ch + str;
       }
       return str;
     }
     Its developer removed all his packages from npm: “This impacted many thousands of projects. [...] We began observing hundreds of failures per minute, as dependent projects – and their dependents, and their dependents... – all failed when requesting the now-unpublished package.”
     http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
  17. Known problems: leftpad
     function leftpad (str, len, ch) {
       str = String(str);
       var i = -1;
       if (!ch && ch !== 0) ch = ' ';
       len = len - str.length;
       while (++i < len) {
         str = ch + str;
       }
       return str;
     }
     npm managers un-unpublished leftpad but … “a number of dependency chains [...] explicitly requested 0.0.3.”
     http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
  18. Conclusion
     • Simple metrics can be used to compare the topology of different package-based software ecosystems
     • Similarities in the dependency graph structure
     • Most non-isolated packages are part of a large weakly connected component
     • Differences that can be explained by the specificities of each ecosystem
     • Python’s extensive standard library
     • CRAN’s particular versioning policy
     • npm's abundance of micro-packages
  19. Future work
     • See our SANER 2017 article “An empirical comparison of dependency issues in OSS packaging ecosystems”
     • Include RubyGems
     • Study the evolution over time
     • Frequency of package updates
     • Resilience of packages to failures in dependencies
     • Impact of solutions that rely on dependency constraints and semantic versioning
     • Beyond SANER 2017: study the interplay between social and technical aspects
  20. Thanks for your attention! Questions?