What is a software packaging ecosystem?
A collection of interdependent software packages that are
developed and distributed by a large community of software
developers
• Distributed development, e.g., through git
• Social coding, e.g., through GitHub
• Package distribution through dedicated package managers
• Ecosystem-specific versioning and release policies
OS Package manager Logo
macOS MacPorts, Homebrew
Linux dpkg, apt, RPM, pacman
Windows winget, Windows Store, Chocolatey
Android Play Store
iOS App Store
ROS rospkg
Packaging ecosystems can be
for a specific operating system
Language Package manager #packages Logo
JavaScript npm >1.4M
PHP Packagist >0.33M
Python PyPI >0.26K
.NET NuGet >0.22M
Java Maven >0.19K
Ruby RubgyGems >0.16M
Cargo (Rust), CPAN (Perl), CRAN (R), NuGet (.NET), Hackage (Haskell), …
Packaging ecosystems can be
for a specific programming language
Project #packages Logo
Eclipse >40M
Wordpress >67K
Atom >13K
Emacs >5K
…
Packaging ecosystems can be
for a specific (open source) project / community
Dependency Hell
Dependency issues
• Too many direct and transitive
dependencies
• Broken dependencies due to
backward incompatibilities
• Co-installability problems
• Incompatible licences
• Deprecated dependencies
“Technical lag” due to outdated
dependencies
Research Questions Raised
For maintainers of dependent packages:
• (When) should I upgrade the version of my dependency?
• How to manage/avoid explosive growth of dependencies?
• How to avoid depending on fragile packages?
• How to deal with breaking changes?
• How to decide which (alternative) package to depend upon?
• How to migrate to alternative packages?
Research Questions Raised
For maintainers of required packages:
• How to assess impact through transitive dependents?
• E.g. propagation of security vulnerabilities and their fixes
• How to inform dependents of bugs and security vulnerabilities?
• When and how to release backward incompatible changes?
• When and how to decide to deprecate a package (release)?
• When to declare a package as being stable?
• How to attract contributors and avoid abandonment?
Research Questions Raised
For ecosystem managers:
• How to identify fragile packages?
• Which of those fragile packages have a high ecosystem-wide
impact?
• How to compare fragility between ecosystems?
• How to reduce fragility over time?
Characterising the evolution of
software packaging ecosystems
Observation: Fast package dependency network growth
in two years
Packaging
ecosystem
#packages
(2018-01)
#packages
(2020-01)
% growth #deps
(2018-01)
#deps
(2020-01)
% growth
npm 630K 1.218K 93% 19.0M 48.7M 156%
RubyGems 141K 180K 28% 1.92M 2.40M 25%
Packagist 121K 155K 28% 2.17M 4.73M 118%
Cargo 13K 35K 169% 257K 796K 210%
Characterising the evolution of
software packaging ecosystems
830K packages – 5.8M package versions – 20.5M dependencies (April 2017)
An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging
Ecosystems
A Decan, T. Mens, Ph. Grosjean (2019) Empirical Software Engineering 24(1)
Fast growth
Package dependency networks grow exponentially in terms of
number of packages and/or dependencies
Fastest growth for npm
Slowest growth for CRAN
Continuing change
• Number of package updates grows over time
• >50% of package releases are updated within 2 months
• Required and young packages are updated more frequently
2012 2013 2014 2015 2016 2017
100
101
102
103
104
105
106
number of updates (log)
cargo
cpan
cran
npm
nuget
packagist
rubygems
Fastest growth for npm
Slowest growth for CRAN
2012 2013 2014 2015 2016 2017
0
50
100
150
200
250
300
350
400
cargo
cpan
cran
npm
nuget
packagist
rubygems
Increasingly connected
• Highly connected network, containing 60% to 80% of all packages
• Pareto principle: A stable minority (20%) of required packages collect
over 80% of all reverse dependencies
Reusability index: Maximal value n such that there exist n
required packages having at least n dependent packages.
Fastest growth for npm
Ecosystem fragility due to
transitive dependencies
March 2016
November 2010
Unexpected removal of
left-pad caused > 2% of all
packages to become uninstallable
(> 5,400 packages)
Release 0.5.0 of i18n broke dependent package
ActiveRecord that was transitively required by
>5% of all packages
Many deep transitive dependencies
• Fragile packages may have a very high transitive impact
• Over 50% of top-level packages have a deep dependency
graph
2012 2013 2014 2015 2016 2017
0
50
100
150
200
250
300
number of packages
cargo
cpan
cran
npm
nuget
packagist
rubygems
Number of packages that are transitively
required by at least 5% of all packages.
1 2 3 4 5 6+
0.0
0.1
0.2
0.3
0.4
0.5
proportion of toplevel packages
cargo
1 2 3 4 5 6+
cpan
1 2 3 4 5 6+
cran
1 2 3 4 5 6+
npm
1 2 3 4 5 6+
nuget
1 2 3 4 5 6+
packagist
1 2 3 4 5 6+
rubygems
Transitive dependency depth
of top-level packages
Many outdated dependencies
Should package maintainers upgrade their
dependencies to more recent versions?
😀 Upgrades benefit from bug and security fixes
😀 Upgrading allows to use new features
😢 Upgrading requires effort
😢 Upgrading may introduce breaking changes
Measuring Technical Lag
Technical lag measures how outdated a package or dependency
is w.r.t. the “ideal” situation
where “ideal” = “most recent”;
“most secure”;
”least bugs”;
“most stable”;
“most compatible”; …
A formal framework for measuring technical lag in component repositories
– and its application to npm
A Zerouali, T Mens, et al. (2019) J. Software Evolution and Process
Technical lag in software compilations: Measuring how outdated a software deployment is
J Gonzalez-Barahona, P Sherwood, G Robles, D Izquierdo (2017)
IFIP International Conference on Open Source Systems. Springer
Technical Lag - Example
Time-based measurement of technical lag
(ideal = most recent release; delta = time difference)
1.0.1 1.1.0 2.0.01.2.0 2.0.1
deployed package
upstream package
Time lag
date(2.0.1) - date(1.1.0)
Technical Lag - Example
Version-based measurement of technical lag
(ideal = highest release; delta = version difference)
1.0.1 1.1.0 2.0.12.0.0 1.2.0
deployed
package
1 major
upstream package
1 patch
Version lag
1 major + 1 patch
Technical Lag - Example
Vulnerability-based measurement of technical lag
(ideal = least vulnerable release; delta = #vulnerabilities)
1.0.1 1.1.0 2.0.01.2.0 2.0.1
deployed
package
upstream package
Security lag
1 vulnerability fix behind
Technical Lag - Example
Bug-based measurement of technical lag
(ideal = least known bugs; delta = #known bugs)
1.0.1 1.1.0 2.0.0
deployed
package
upstream package
1.2.0 2.0.1
Dependency needs to be downgraded to be
able to use most stable version…
Bug lag
1 more bug than
most stable version
Technical Lag - Example
Bug-based measurement of technical lag
(ideal = least known bugs; delta = #known bugs)
1.0.1 1.1.0 2.0.0
deployed
package
upstream package
1.2.0 2.0.1
An empirical study of dependency downgrades in the npm ecosystem.
F Roseiro Côgo, G Ansaldi Oliva, A E Hassan (2019) IEEE Transactions on Software Engineering
On the evolution of technical lag in the npm package dependency network.
A Decan, T Mens, E Constantinou (2018) IEEE Int’l Conf. Software Maintenance and Evolution
Technical Lag
Do semantic versioning and dependency constraints
play a role?
major minor patch
3 9 2
Breaking
changes
Backwards
compatible
changes
Bug fixes
Most
permissive
Most
Restrictive
Technical Lag in npm
ht
• 1 out of 3 dependents never update their dependency
• Outdatedness is related to the type of dependency constraint being used
Strict constraints represent about 20% of all dependencies,
but about 33% of all outdated dependencies
All
runtime dependencies
Outdated
runtime dependencies
Technical Lag in npm
By making dependency constraints “semver-compliant”, the proportion
of releases suffering from technical lag could be reduced by >17%
“What if” analysis:
Semantic versioning
To which extent do software packaging ecosystems
enable/adhere to semantic versioning?
What do package dependencies tell us about semantic versioning?
A Decan, T Mens (2019) IEEE Transactions on Software Engineering
Semantic versioning
Proportion of dependency constraints
(for package releases ≥1.0.0) that are
semver-compliant, more permissive, or more restrictive
(based on January 2018 dataset from libraries.io)
Semantic versioning
To which extent do software packaging ecosystems
enable/adhere to semantic versioning?
• Cargo, npm and Packagist are mostly semver-compliant.
• All considered ecosystems become more compliant over time.
• More than 16% of the constraints in npm, Packagist
and Rubygems are restrictive, preventing automatic adoption of
backward compatible upgrades
Abandoned packages
• Will continue to increase their lag
• Will not incorporate fixes of bugs or vulnerabilities in their
dependencies, even if those fixes exist
How to reduce the risk of abandoned packages?
• By forecasting future commit activity of its contributors
GAP: Forecasting commit activity in git projects
A Decan, E Constantinou, T Mens, H Rocha (2020) Journal on Systems and Software
Abandoned packages
Forecasting future commit activity of git contributors
• Based on a probabistic model of future days of activity
pip install git+https://github.com/AlexandreDecan/gap
Conclusion
Packaging ecosystems are affected by “fragile
dependency” issues
• Many and deep transitive dependencies
• Fast growth and continuing change
• Outdated or deprecated dependencies
• Breaking changes
• Unmaintained pakages
Conclusion
Tools and policies can help to mitigate these issues
• Measuring, monitoring and updating fragile
dependencies and contributor abandonment
• Supporting semantic versioning
• Supporting transitive dependencies
• Automating selection of and migration to alternative
packages