On the fragility of open source software packaging ecosystems

Keynote at the Seminar on Advanced Techniques and Tools for Software Evolution (SATTOSE 2020) by Tom Mens, University of Mons (Belgium)

  1. 1. Tom Mens Software Engineering Lab Faculty of Sciences tom.mens@umons.ac.be @tom_mens On the fragility of open source software packaging ecosystems
  2. 2. Directed by Tom Mens Department of Computer Science Faculty of Sciences tom.mens@umons.ac.be http://informatique.umons.ac.be Software Engineering Lab
  3. 3. SECO-ASSIST “Excellence of Science” Research Project 2018-2021 secoassist.github.io @secoassist
  4. 4. Duration: 2018-2021 Budget: 2.4 million euros “Excellence of Science” Research Project
  6. 6. What is a software packaging ecosystem? A collection of interdependent software packages that are developed and distributed by a large community of software developers • Distributed development, e.g., through git • Social coding, e.g., through GitHub • Package distribution through dedicated package managers • Ecosystem-specific versioning and release policies
  7. 7. Packaging ecosystems can be for a specific operating system (OS: package managers) • macOS: MacPorts, Homebrew • Linux: dpkg, apt, RPM, pacman • Windows: winget, Windows Store, Chocolatey • Android: Play Store • iOS: App Store • ROS: rospkg
  8. 8. Packaging ecosystems can be for a specific programming language (language: package manager, #packages) • JavaScript: npm, >1.4M • PHP: Packagist, >0.33M • Python: PyPI, >0.26M • .NET: NuGet, >0.22M • Java: Maven, >0.19M • Ruby: RubyGems, >0.16M • Others: Cargo (Rust), CPAN (Perl), CRAN (R), Hackage (Haskell), …
  9. 9. Packaging ecosystems can be for a specific (open source) project / community (project: #packages) • Eclipse: >40M • WordPress: >67K • Atom: >13K • Emacs: >5K • …
  10. 10. Libraries.io monitors 7,387,590 open source packages across 37 different package managers https://libraries.io (20 May 2020)
  11. 11. Why are packaging ecosystems fragile? • Rapid ecosystem evolution and growth • Bugs • Security vulnerabilities • Backward incompatibilities • Abandoned or unmaintained packages • Deprecated packages • Incompatible or prohibited licences • Suboptimal release and update policies • Insufficient social diversity • Social conflicts • …
  12. 12. Dependency Hell
  13. 13. Dependency Hell Dependency issues • Too many direct and transitive dependencies • Broken dependencies due to backward incompatibilities • Co-installability problems • Incompatible licences • Deprecated dependencies “Technical lag” due to outdated dependencies
  14. 14. Case study Evolution of the “request” package
  15. 15. Case study Evolution of the “request” package https://npm.anvaka.com/
  22. 22. Alternatives to request?
  24. 24. Research Questions Raised For maintainers of dependent packages: • (When) should I upgrade the version of my dependency? • How to manage/avoid explosive growth of dependencies? • How to avoid depending on fragile packages? • How to deal with breaking changes? • How to decide which (alternative) package to depend upon? • How to migrate to alternative packages?
  25. 25. Research Questions Raised For maintainers of required packages: • How to assess impact through transitive dependents? • E.g. propagation of security vulnerabilities and their fixes • How to inform dependents of bugs and security vulnerabilities? • When and how to release backward incompatible changes? • When and how to decide to deprecate a package (release)? • When to declare a package as being stable? • How to attract contributors and avoid abandonment?
  26. 26. Research Questions Raised For ecosystem managers: • How to identify fragile packages? • Which of those fragile packages have a high ecosystem-wide impact? • How to compare fragility between ecosystems? • How to reduce fragility over time?
  27. 27. Characterising the evolution of software packaging ecosystems Observation: Fast package dependency network growth in two years (2018-01 to 2020-01) • npm: packages 630K to 1,218K (+93%), dependencies 19.0M to 48.7M (+156%) • RubyGems: packages 141K to 180K (+28%), dependencies 1.92M to 2.40M (+25%) • Packagist: packages 121K to 155K (+28%), dependencies 2.17M to 4.73M (+118%) • Cargo: packages 13K to 35K (+169%), dependencies 257K to 796K (+210%)
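The growth percentages on this slide can be rechecked from the two snapshots. The following is a minimal sketch, reading the npm 2020 package count as 1,218K; the dictionary layout and the function name `growth_percent` are illustrative choices, not part of the original study.

```python
# Package and dependency counts from the slide, expressed in thousands.
# The npm 2020 package count is read as 1,218K (about 1.2 million).
SNAPSHOTS = {
    "npm": {"packages": (630, 1218), "deps": (19000, 48700)},
    "RubyGems": {"packages": (141, 180), "deps": (1920, 2400)},
    "Packagist": {"packages": (121, 155), "deps": (2170, 4730)},
    "Cargo": {"packages": (13, 35), "deps": (257, 796)},
}


def growth_percent(before, after):
    """Relative growth from `before` to `after`, rounded to a whole percent."""
    return round(100 * (after - before) / before)


for name, counts in SNAPSHOTS.items():
    pkg = growth_percent(*counts["packages"])
    dep = growth_percent(*counts["deps"])
    print(f"{name}: packages +{pkg}%, dependencies +{dep}%")
```

In every ecosystem except RubyGems, the number of dependencies grows faster than the number of packages.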
  28. 28. Characterising the evolution of software packaging ecosystems 830K packages – 5.8M package versions – 20.5M dependencies (April 2017) An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems A Decan, T. Mens, Ph. Grosjean (2019) Empirical Software Engineering 24(1)
  29. 29. Fast growth Package dependency networks grow exponentially in terms of number of packages and/or dependencies Fastest growth for npm Slowest growth for CRAN
  30. 30. Continuing change • Number of package updates grows over time • >50% of package releases are updated within 2 months • Required and young packages are updated more frequently [Figure: number of updates (log scale), 2012-2017, for cargo, cpan, cran, npm, nuget, packagist, rubygems] Fastest growth for npm Slowest growth for CRAN
  31. 31. Increasingly connected • Highly connected network, containing 60% to 80% of all packages • Pareto principle: A stable minority (20%) of required packages collect over 80% of all reverse dependencies Reusability index: Maximal value n such that there exist n required packages having at least n dependent packages. [Figure: reusability index, 2012-2017, for cargo, cpan, cran, npm, nuget, packagist, rubygems] Fastest growth for npm
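The reusability index defined on this slide has the same shape as an h-index and can be computed by ranking packages by their number of dependents. A minimal sketch follows; the function name and the toy counts are hypothetical and do not come from the study's dataset.

```python
def reusability_index(dependent_counts):
    """Largest n such that at least n packages each have >= n dependents."""
    ranked = sorted(dependent_counts, reverse=True)
    n = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            n = rank  # the top `rank` packages all have >= rank dependents
        else:
            break
    return n


# Hypothetical number of dependent packages for six required packages.
toy_counts = [10, 7, 4, 3, 1, 0]
print(reusability_index(toy_counts))
```

Because the index rises only when many packages are each heavily reused, a single very popular package cannot inflate it.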
  32. 32. Ecosystem fragility due to transitive dependencies • March 2016: Unexpected removal of left-pad caused >2% of all packages to become uninstallable (>5,400 packages) • November 2010: Release 0.5.0 of i18n broke dependent package ActiveRecord, which was transitively required by >5% of all packages
  33. 33. Many deep transitive dependencies • Fragile packages may have a very high transitive impact • Over 50% of top-level packages have a deep dependency graph [Figure: number of packages that are transitively required by at least 5% of all packages, 2012-2017, for cargo, cpan, cran, npm, nuget, packagist, rubygems] [Figure: transitive dependency depth of top-level packages (depth 1 to 6+ versus proportion of top-level packages), one panel per ecosystem]
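Transitive impact and dependency depth can be illustrated on a small dependency graph. This is a sketch on a hypothetical five-package network; the package names, the `DEPENDS_ON` mapping and all function names are assumptions made for illustration, and the graph is assumed to be acyclic.

```python
# Hypothetical toy network: package -> packages it directly depends on.
DEPENDS_ON = {
    "app": ["web", "pad"],
    "web": ["http", "pad"],
    "http": ["pad"],
    "cli": ["web"],
    "pad": [],
}


def transitive_dependencies(package, graph):
    """All packages reachable from `package` by following dependency edges."""
    seen = set()
    stack = list(graph[package])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def dependency_depth(package, graph):
    """Length of the longest dependency chain starting at `package`.

    Assumes the graph has no cycles.
    """
    deps = graph[package]
    if not deps:
        return 0
    return 1 + max(dependency_depth(dep, graph) for dep in deps)


def transitive_impact(package, graph):
    """Share of the other packages that transitively require `package`."""
    others = [p for p in graph if p != package]
    affected = [p for p in others if package in transitive_dependencies(p, graph)]
    return len(affected) / len(others)


print(dependency_depth("app", DEPENDS_ON))
print(transitive_impact("pad", DEPENDS_ON))
```

In this toy network every other package transitively requires `pad`, so removing it would break the whole network, which mirrors the left-pad incident at a small scale.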
  34. 34. Many outdated dependencies Should package maintainers upgrade their dependencies to more recent versions? 😀 Upgrades benefit from bug and security fixes 😀 Upgrading gives access to new features 😢 Upgrading requires effort 😢 Upgrading may introduce breaking changes
  35. 35. Measuring Technical Lag Technical lag measures how outdated a package or dependency is w.r.t. the “ideal” situation, where “ideal” = “most recent”; “most secure”; “fewest bugs”; “most stable”; “most compatible”; … • A formal framework for measuring technical lag in component repositories – and its application to npm. A Zerouali, T Mens, et al. (2019) J. Software Evolution and Process • Technical lag in software compilations: Measuring how outdated a software deployment is. J Gonzalez-Barahona, P Sherwood, G Robles, D Izquierdo (2017) IFIP International Conference on Open Source Systems. Springer
  36. 36. Technical Lag - Example Time-based measurement of technical lag (ideal = most recent release; delta = time difference) Releases shown: 1.0.1, 1.1.0, 2.0.0, 1.2.0, 2.0.1 (deployed package: 1.1.0; upstream package: 2.0.1) Time lag = date(2.0.1) - date(1.1.0)
  37. 37. Technical Lag - Example Version-based measurement of technical lag (ideal = highest release; delta = version difference) Releases shown: 1.0.1, 1.1.0, 2.0.0, 1.2.0, 2.0.1 From the deployed package to the upstream package: 1 major, then 1 patch Version lag = 1 major + 1 patch
  38. 38. Technical Lag - Example Vulnerability-based measurement of technical lag (ideal = least vulnerable release; delta = #vulnerabilities) Releases shown: 1.0.1, 1.1.0, 2.0.0, 1.2.0, 2.0.1 (deployed package versus upstream package) Security lag = 1 vulnerability fix behind
  39. 39. Technical Lag - Example Bug-based measurement of technical lag (ideal = least known bugs; delta = #known bugs) 1.0.1 1.1.0 2.0.0 deployed package upstream package 1.2.0 2.0.1 Dependency needs to be downgraded to be able to use most stable version… Bug lag 1 more bug than most stable version
  40. 40. Technical Lag - Example Bug-based measurement of technical lag (ideal = least known bugs; delta = #known bugs) 1.0.1 1.1.0 2.0.0 deployed package upstream package 1.2.0 2.0.1 An empirical study of dependency downgrades in the npm ecosystem. F Roseiro Côgo, G Ansaldi Oliva, A E Hassan (2019) IEEE Transactions on Software Engineering On the evolution of technical lag in the npm package dependency network. A Decan, T Mens, E Constantinou (2018) IEEE Int’l Conf. Software Maintenance and Evolution
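The time-based and version-based measurements from the examples above can be sketched in a few lines. Only the version numbers come from the slides: the release dates are invented for illustration, and `version_lag` is a simplified component-wise reading of "1 major + 1 patch", not the formal framework's definition.

```python
from datetime import date

# Hypothetical release dates; only the version numbers appear on the slides.
RELEASES = {
    "1.0.1": date(2019, 1, 10),
    "1.1.0": date(2019, 3, 1),
    "2.0.0": date(2019, 6, 15),
    "1.2.0": date(2019, 8, 1),
    "2.0.1": date(2019, 9, 20),
}


def parse(version):
    """Split 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def time_lag(deployed, ideal):
    """Time elapsed between the deployed release and the ideal release."""
    return RELEASES[ideal] - RELEASES[deployed]


def version_lag(deployed, ideal):
    """Simplified (major, minor, patch) distance from deployed to ideal."""
    d, i = parse(deployed), parse(ideal)
    if i[0] != d[0]:
        return (i[0] - d[0], i[1], i[2])
    if i[1] != d[1]:
        return (0, i[1] - d[1], i[2])
    return (0, 0, i[2] - d[2])


# "Ideal" is taken here as the highest available version number.
latest = max(RELEASES, key=parse)
print(latest, time_lag("1.1.0", latest).days, version_lag("1.1.0", latest))
```

Swapping the definition of "ideal" (for example, the release with the fewest known vulnerabilities) changes only how `latest` is selected; the lag functions stay the same.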
  41. 41. Technical Lag Do semantic versioning and dependency constraints play a role? Example version 3.9.2: major = 3 (breaking changes), minor = 9 (backwards compatible changes), patch = 2 (bug fixes) Dependency constraints range from most permissive to most restrictive
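How a dependency constraint determines which upgrades are adopted automatically can be shown with npm-style notation. This sketch assumes the usual npm reading of caret (`^`), tilde (`~`) and exact constraints for versions at or above 1.0.0; the helper names and the list of available versions are hypothetical, and real constraint grammars are far richer.

```python
def parse(version):
    """Split 'major.minor.patch' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def accepted_range(constraint):
    """Return (lower, upper) bounds for a constraint; upper is exclusive.

    Supports only '^x.y.z', '~x.y.z' and exact 'x.y.z' with x >= 1.
    """
    if constraint.startswith("^"):
        low = parse(constraint[1:])
        return low, (low[0] + 1, 0, 0)       # any minor or patch update
    if constraint.startswith("~"):
        low = parse(constraint[1:])
        return low, (low[0], low[1] + 1, 0)  # patch updates only
    low = parse(constraint)
    return low, (low[0], low[1], low[2] + 1)  # exactly this version


def satisfies(version, constraint):
    """True if `version` falls inside the range accepted by `constraint`."""
    low, high = accepted_range(constraint)
    return low <= parse(version) < high


# Hypothetical set of published versions of a dependency.
available = ["3.9.2", "3.9.5", "3.10.0", "4.0.0"]
for constraint in ["^3.9.2", "~3.9.2", "3.9.2"]:
    print(constraint, [v for v in available if satisfies(v, constraint)])
```

The stricter the constraint, the fewer releases a dependent picks up automatically, which is one way technical lag accumulates.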
  42. 42. Technical Lag in npm • 1 out of 3 dependents never update their dependency • Outdatedness is related to the type of dependency constraint being used • Strict constraints represent about 20% of all dependencies, but about 33% of all outdated dependencies [Figure panels: all runtime dependencies; outdated runtime dependencies]
  43. 43. Technical Lag in npm “What if” analysis: By making dependency constraints “semver-compliant”, the proportion of releases suffering from technical lag could be reduced by >17%
  44. 44. Semantic versioning To what extent do software packaging ecosystems enable/adhere to semantic versioning? • What do package dependencies tell us about semantic versioning? A Decan, T Mens (2019) IEEE Transactions on Software Engineering
  45. 45. Semantic versioning Different packaging ecosystems interpret version constraints in different ways [Figure legend: more restrictive than semver; more permissive than semver]
  46. 46. Semantic versioning Proportion of dependency constraints (for package releases ≥1.0.0) that are semver-compliant, more permissive, or more restrictive (based on January 2018 dataset from libraries.io)
  47. 47. Semantic versioning To what extent do software packaging ecosystems enable/adhere to semantic versioning? • Cargo, npm and Packagist are mostly semver-compliant. • All considered ecosystems become more compliant over time. • More than 16% of the constraints in npm, Packagist and RubyGems are restrictive, preventing automatic adoption of backward compatible upgrades
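The three categories used in this analysis (semver-compliant, more permissive, more restrictive) can be approximated by comparing a constraint's upper bound with the next major version. This is a simplified sketch under the assumption that a constraint is reduced to a lower bound and an exclusive upper bound, for releases at or above 1.0.0; the function name `classify` is hypothetical and the study's actual classification may differ in its details.

```python
def classify(lower, upper):
    """Classify a version range relative to semantic versioning.

    `lower` is a (major, minor, patch) tuple with major >= 1.
    `upper` is an exclusive upper bound, or None when unbounded.
    """
    next_major = (lower[0] + 1, 0, 0)
    if upper is None or upper > next_major:
        return "more permissive"   # may pull in breaking major releases
    if upper < next_major:
        return "more restrictive"  # blocks some compatible upgrades
    return "semver-compliant"      # all minor and patch upgrades, no major


# Illustrative constraints expressed as (lower, upper) ranges.
print(classify((1, 2, 3), (2, 0, 0)))  # accepts 1.x.y from 1.2.3 upwards
print(classify((1, 2, 3), (1, 3, 0)))  # accepts patch updates only
print(classify((1, 2, 3), None))       # accepts anything from 1.2.3 upwards
```

Under this reading, a "what if" analysis amounts to replacing every restrictive range with the compliant one and recomputing which releases would then be picked up.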
  48. 48. Abandoned packages • Will continue to increase their lag • Will not incorporate fixes of bugs or vulnerabilities in their dependencies, even if those fixes exist How to reduce the risk of abandoned packages? • By forecasting the future commit activity of their contributors • GAP: Forecasting commit activity in git projects. A Decan, E Constantinou, T Mens, H Rocha (2020) Journal of Systems and Software
  49. 49. Abandoned packages Forecasting future commit activity of git contributors • Based on a probabilistic model of future days of activity • Install with: pip install git+https://github.com/AlexandreDecan/gap
  50. 50. Conclusion Packaging ecosystems are affected by “fragile dependency” issues • Many and deep transitive dependencies • Fast growth and continuing change • Outdated or deprecated dependencies • Breaking changes • Unmaintained packages
  51. 51. Conclusion Tools and policies can help to mitigate these issues • Measuring, monitoring and updating fragile dependencies and contributor abandonment • Supporting semantic versioning • Supporting transitive dependencies • Automating selection of and migration to alternative packages
