
An empirical comparison of dependency issues in open source software packaging ecosystems

Presentation of the paper published at the International Conference on Software Analysis, Evolution and Reengineering (SANER 2017), Klagenfurt, Austria, February 2017. Co-authored by Alexandre Decan, Tom Mens and Maelick Claes.

  1. An Empirical Comparison of Dependency Issues in Packaging Ecosystems
     Alexandre Decan, Tom Mens, Maelick Claes (Software Engineering Lab, Belgium)
  2. SANER – Klagenfurt, Austria, February 2017
     Packaging Ecosystem: a large collection of interdependent software packages that can be installed and distributed using a package manager.
     Selected examples: open source packaging ecosystems. Bogart, Kastner, Herbsleb & Thung (FSE 2016), "How to break an API: Cost Negotiation and Community Values in Three Software Ecosystems".
  3. Package Dependencies Are Necessary
     • Increase modularity and support evolution
     • Facilitate reuse
     • Reduce complexity
  4. Package Dependencies Are Omnipresent
                             CRAN   RubyGems   npm
     Language                R      Ruby       JavaScript
     Packages                10K    123K       317K
     Dependencies            22K    183K       728K
     All package releases    57K    685K       2000K
     All dependencies        128K   1675K      7500K
  5. Most packages depend on at least one other package (April 2016): npm ~60%, RubyGems ~60%, CRAN ~70%.
  6. Package Dependencies Are Difficult to Manage: Transitive Dependencies
  7. The Problem of Transitive Package Dependencies
     "Dealing with the deeply nested dependencies has caused us no end of frustrations. A dependency of a dependency of a dependency breaks and we're left trying to trace the source of the error and figure out which repo to open an issue on."
     http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how-to-program/
  8. The Problem of Transitive Package Dependencies
     "This impacted many thousands of projects. [...] We began observing hundreds of failures per minute, as dependent projects – and their dependents, and their dependents... – all failed when requesting the now-unpublished package."
  9. The Problem of Transitive Package Dependencies
     More than 2% of all npm packages relied on left-pad, and left-pad is not an exception.
     [Figure: evolution of the number of packages having a relative impact >2%]
  10. Package Dependencies: some packages have very high impact (>30%)
      [Figure: relative number of (transitive) dependents of the most required package in npm, CRAN and RubyGems, 2011 to 2016; y-axis: ratio of packages]
  11. The Problem of Incompatible Package Updates
      "[...] the risk of things breaking at some point due to the fact that a version of a dependency has changed without you knowing about it is immense. That actually cost us weeks and months in a couple of professional projects I was part of."
      "One recent example was the forced roll-back of the ggplot2 update to version 0.9.0, because the introduced changes caused several other packages to break."
  12. The Problem of Incompatible Package Updates
      • 41% of observed errors are caused by incompatible updates; on average, there is one backward incompatible update per 20 new releases (Decan, Mens, Claes & Grosjean, SANER 2016: "When GitHub meets CRAN: an analysis of inter-repository package dependency problems").
      • In 2010, release 0.5.0 of i18n broke the popular ActiveRecord gem, on which 874 packages relied, i.e., 5.2% of all packages!
  13. Possible Solutions to Package Dependency Management
      Solutions tend to be ecosystem-specific:
      1. Package update policy
      2. Semantic versioning
      3. Dependency constraints
      4. Continuous integration tools
  14. Possible Solutions to Package Dependency Management: 1. Package Update Policy
      "Submitting updates should be done responsibly and with respect for the volunteers' time. Once a package is established (which may take several rounds), 'no more than every 1–2 months' seems appropriate."
      "Changes to CRAN packages causing significant disruption to other packages must be agreed with the CRAN maintainers well in advance of any publicity."
      "One recent example was the forced roll-back of the ggplot2 update to version 0.9.0, because the introduced changes caused several other packages to break."
  15. How Frequently Are Packages Updated?
      • Packages tend to be updated shortly after a previous update.
      • Packages required by other packages are updated more frequently.
  16. Possible Solutions to Package Dependency Management: 2. Semantic Versioning
      MAJOR.MINOR.PATCH
      – MAJOR: breaking changes are allowed
      – MINOR: only backward compatible updates
      – PATCH: only bug and security fixes
      While semantic versioning can be suggested, it cannot be enforced! Release 0.5.0 of i18n broke 875 packages (i.e., 5% of the ecosystem).
  17. Possible Solutions to Package Dependency Management: 3. Dependency Constraints
      • Minimal constraint: pkg >= 2.4.0
      • Maximal constraint: pkg < 3.0.0
      • Strict constraint: pkg == 2.4.0
      [Figure: proportion of packages (solid lines) and proportion of dependencies (dotted lines) that use a dependency constraint]
  18. Possible Solutions to Package Dependency Management: 3. Dependency Constraints
      • Minimal constraint: pkg >= 2.4.0
      • Maximal constraint: pkg < 3.0.0
      • Strict constraint: pkg == 2.4.0
      [Figure: proportion of packages with dependencies (solid lines) and of dependencies (dotted lines) that specify a strict, minimal or maximal dependency constraint]
  19. Possible Solutions to Package Dependency Management: 3. Dependency Constraints
      • Minimal constraint: pkg >= 2.4.0
      • Maximal constraint: pkg < 3.0.0
      • Strict constraint: pkg == 2.4.0
      "we continued to observe many errors. This happened because a number of dependency chains [...] explicitly requested 0.0.3."
  20. Possible Solutions to Package Dependency Management
      Constraints that require a specific subset of accepted versions:
      • can lead to co-installability issues;
      • may prevent a package from benefiting from updates, e.g., security fixes in C 1.4.1.
      [Diagram: A requires C <= 1.4.0 (C 1.4.0) while B requires C >= 1.4.1 (C 1.4.1)]
  21. Possible Solutions to Package Dependency Management: 4. Continuous Integration Management
      • Automated monitoring of dependency updates and security issues (e.g., Gemnasium, Requires.io, DependencyCI, GreenKeeper): these tools only monitor direct dependencies, not transitive ones.
      • Automated testing for breaking changes (e.g., travis-ci, codeship): this helps to detect breaking changes, but not to address them.
  22. Empirical Comparison of 3 Packaging Ecosystems
      Need to find the right balance between
      – having up-to-date dependencies
      – facing the risk of backward incompatible changes
      Requires a combination of
      – technical solutions (constraints, CI)
      – social responsibilities
  23. Are micro-packages harmful?
      – 11 lines of the left-pad package breaking more than 6000 packages?
      Is installing packages directly from GitHub harmful?
      – No specific notion of version (only commits and tags)
      – Will make package management even more problematic
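Slide 10 reports the relative number of (transitive) dependents of each ecosystem's most required package. As a minimal sketch, assuming a hypothetical toy dependency graph and assuming the ratio is taken over all packages in the ecosystem, this metric could be computed in Python by inverting the dependency edges and walking them breadth-first:

```python
from collections import deque

# Hypothetical toy ecosystem: package -> its direct dependencies.
DEPENDS_ON = {
    "a": ["left-pad"],
    "b": ["a"],
    "c": ["b", "log"],
    "d": ["log"],
    "log": [],
    "left-pad": [],
}


def reverse_graph(graph):
    """Invert the edges: package -> packages that depend on it directly."""
    dependents = {pkg: [] for pkg in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(pkg)
    return dependents


def transitive_dependents(dependents, package):
    """Breadth-first walk collecting every direct or indirect dependent."""
    seen = set()
    queue = deque(dependents.get(package, []))
    while queue:
        pkg = queue.popleft()
        if pkg not in seen:
            seen.add(pkg)
            queue.extend(dependents.get(pkg, []))
    return seen


def relative_impact(graph, package):
    """Share of all packages that would be affected if `package` broke."""
    return len(transitive_dependents(reverse_graph(graph), package)) / len(graph)


most_required = max(DEPENDS_ON, key=lambda p: relative_impact(DEPENDS_ON, p))
print(most_required, relative_impact(DEPENDS_ON, most_required))  # left-pad 0.5
```

In this toy graph left-pad sits at the bottom of a chain, so breaking it affects a, b and c, half of the ecosystem. On real data with hundreds of thousands of packages, the reverse graph would be computed once rather than on every call.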
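The MAJOR.MINOR.PATCH rules of slide 16 can be read as a classifier over version pairs. A minimal Python sketch, assuming plain numeric versions without pre-release or build suffixes (the function names are hypothetical):

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def update_kind(old: str, new: str) -> str:
    """Classify the step from `old` to `new` following the MAJOR.MINOR.PATCH rules."""
    before, after = parse(old), parse(new)
    if after <= before:
        raise ValueError(f"{new} is not newer than {old}")
    if after[0] != before[0]:
        return "major"  # breaking changes are allowed
    if after[1] != before[1]:
        return "minor"  # should only contain backward compatible updates
    return "patch"  # should only contain bug and security fixes


print(update_kind("2.4.0", "2.4.1"))  # patch
print(update_kind("2.4.1", "3.0.0"))  # major
```

Such a classifier only reports what the version number claims. Nothing stops a maintainer from shipping a breaking change under a "minor" or "patch" number, which is why the slide stresses that semantic versioning cannot be enforced.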
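The three constraint kinds on slide 17 boil down to comparisons against a version bound. A minimal Python sketch, assuming plain numeric MAJOR.MINOR.PATCH versions and a hypothetical comma-separated clause syntax (npm, RubyGems and R's DESCRIPTION files each use their own notation):

```python
import operator

# Operator for each constraint kind shown on the slide (plus an inclusive maximum).
OPS = {
    ">=": operator.ge,  # minimal constraint
    "<": operator.lt,   # maximal constraint
    "<=": operator.le,  # maximal constraint, inclusive
    "==": operator.eq,  # strict constraint
}


def parse(version):
    """Turn '2.4.0' into (2, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, constraint):
    """Check a version against comma-separated clauses, e.g. '>= 2.4.0, < 3.0.0'."""
    for clause in constraint.split(","):
        op, bound = clause.split()
        if not OPS[op](parse(version), parse(bound)):
            return False
    return True


print(satisfies("2.9.1", ">= 2.4.0, < 3.0.0"))  # True
print(satisfies("3.0.0", ">= 2.4.0, < 3.0.0"))  # False
print(satisfies("2.4.1", "== 2.4.0"))           # False
```

Combining a minimal and a maximal clause, as in the first call, accepts a range of versions; a strict constraint accepts exactly one, so a pinned dependency does not pick up later fixes.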
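Reading the diagram on slide 20 as A requiring C <= 1.4.0 while B requires C >= 1.4.1, co-installability amounts to asking whether any release of C satisfies both constraints. A minimal sketch, assuming an ecosystem that installs a single version of each package (npm, by contrast, can install several copies side by side) and a hypothetical release list containing only the two versions on the slide:

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "<": operator.lt, "==": operator.eq}

# Releases of C from the slide's example.
AVAILABLE_C = ["1.4.0", "1.4.1"]


def parse(version):
    """Turn '1.4.0' into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def accepted(versions, constraint):
    """Releases matching a single constraint such as '<= 1.4.0'."""
    op, bound = constraint.split()
    return {v for v in versions if OPS[op](parse(v), parse(bound))}


def co_installable(versions, *constraints):
    """Releases acceptable to every dependent at once (one installed copy assumed)."""
    result = set(versions)
    for constraint in constraints:
        result &= accepted(versions, constraint)
    return result


print(sorted(co_installable(AVAILABLE_C, "<= 1.4.0"), key=parse))              # ['1.4.0']
print(sorted(co_installable(AVAILABLE_C, ">= 1.4.1"), key=parse))              # ['1.4.1']
print(sorted(co_installable(AVAILABLE_C, "<= 1.4.0", ">= 1.4.1"), key=parse))  # []
```

The empty intersection means A and B cannot be installed together, and A's upper bound also keeps it away from the security fixes in C 1.4.1.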
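Slide 21 notes that dependency monitoring tools only watch direct dependencies. The resulting blind spot can be illustrated with a minimal Python sketch, assuming a hypothetical project graph and a hypothetical set of packages with a known security advisory; the two functions are illustrative only and do not describe how any of the tools named on the slide work:

```python
from collections import deque

# Hypothetical data: project dependency graph and packages with a known advisory.
DEPENDS_ON = {
    "my-app": ["framework", "utils"],
    "framework": ["parser"],
    "parser": ["left-pad"],
    "utils": [],
    "left-pad": [],
}
VULNERABLE = {"left-pad"}


def direct_alerts(graph, root, flagged):
    """What a monitor that only checks direct dependencies would report."""
    return flagged & set(graph.get(root, []))


def transitive_alerts(graph, root, flagged):
    """Walk the whole dependency tree and report every flagged package reached."""
    seen, queue = set(), deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg not in seen:
            seen.add(pkg)
            queue.extend(graph.get(pkg, []))
    return flagged & seen


print(direct_alerts(DEPENDS_ON, "my-app", VULNERABLE))      # set()
print(transitive_alerts(DEPENDS_ON, "my-app", VULNERABLE))  # {'left-pad'}
```

The flagged package sits three levels below the application, so a direct-only check reports nothing while a full walk of the dependency tree finds it.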
