Dependency Issues in Open Source Software Package Registries
Tom Mens
tom.mens@umons.ac.be
Software Engineering Lab
Faculty of Sciences
Software package registry
• A collection of often interdependent software packages
• Distributed through dedicated package managers
• Focus on a specific programming language, OS, application, ...
• Ecosystem-specific formats, policies, tools, ...
© 2019 Théo Zimmermann. Challenges in the collaborative evolution of a proof language and its ecosystem. PhD dissertation, Université de Paris
An empirical comparison of dependency network evolution in seven software packaging ecosystems
A Decan, T Mens, P Grosjean (2019) Empirical Software Engineering
When and how to make breaking changes: Policies and practices in 18 open source software ecosystems
C Kästner, J Herbsleb, F Thung, T Mens (2021) ACM TOSEM
Libraries.io monitors >9M open source packages
across 32 different package registries
https://libraries.io (12 November 2024)
Catalogue of Dependency Challenges
See paper!
http://arxiv.org/abs/2409.18884
[Figure: adapted from xkcd 2347, https://xkcd.com/2347, CC BY-NC 2.5; visible labels: "A TYPICAL SOFTWARE SYSTEM", "A PACKAGE SOME"]
Challenge
Outdated Dependencies
Problem
• Outdated dependencies cannot benefit from bug fixes and security fixes
“attackers entered its system in mid-May through a web-application
vulnerability (CVE-2017-5638) that had a patch available in March. In
other words, the credit-reporting giant had more than two months to
take precautions that would have defended the personal data of 143
million people from being exposed. It didn’t.”
Wired Magazine, “Equifax Has No Excuse”, September 2017
Equifax data breach (May 2017)
Solution
• Use the technical lag framework to quantify outdatedness
• Use monitoring and tools to detect and update outdated dependencies (e.g. Dependabot, Renovate)
“systems using outdated dependencies four times as likely to
have security issues as opposed to systems that are up-to-date”
Measuring Dependency Freshness in Software Systems
J Cox, E Bouwers, M van Eekelen, J Visser. (2015) ICSE
Outdated Dependencies
Technical Lag
Quantifies the difference (e.g. time delta) between the current situation and the ideal one (e.g. most up-to-date)
[Figure: release timeline of required package p (releases 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 2.0.0, 2.0.1) and a dependent package that depends on it]
Time lag: date(1.1.3) - date(1.1.0)
[Figure 4.1: Development of parallel versions in npm (branches 1.0, 1.1 and 2.0; releases 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.1.2, 2.0.0, 2.0.1)]
Orderings of the releases in Figure 4.1 (≺ denotes precedence):
Chronological: 1.0.0 ≺ 1.0.1 ≺ 1.1.0 ≺ 1.1.1 ≺ 2.0.0 ≺ 1.1.2 ≺ 2.0.1
Numerical: 1.0.0 ≺ 1.0.1 ≺ 1.1.0 ≺ 1.1.1 ≺ 1.1.2 ≺ 2.0.0 ≺ 2.0.1
A formal framework for measuring technical lag in component repositories
A Zerouali, T Mens, et al. (2019) Wiley Journal of Software: Evolution and Process, 31(8)
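The time lag on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the release dates are invented (the slide gives version numbers but no dates), the dependent package is assumed to use 1.1.0, and the "ideal" release is assumed to be the numerically highest one, optionally restricted to the same major.minor branch. The names `RELEASES`, `parse` and `time_lag` are hypothetical and do not come from the cited framework.

```python
from datetime import date

# Hypothetical release dates for required package p; only the version
# numbers come from the slide, the dates are invented for illustration.
RELEASES = {
    "1.0.0": date(2018, 1, 10),
    "1.0.1": date(2018, 2, 1),
    "1.1.0": date(2018, 3, 15),
    "1.1.1": date(2018, 4, 2),
    "2.0.0": date(2018, 5, 20),
    "1.1.2": date(2018, 6, 5),
    "2.0.1": date(2018, 7, 1),
    "1.1.3": date(2018, 8, 12),
}


def parse(version: str) -> tuple:
    """Turn '1.1.3' into (1, 1, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def time_lag(used: str, same_minor_only: bool = True):
    """Return (ideal_version, lag) for the version currently in use.

    The ideal release is the numerically highest one, optionally
    restricted to the same major.minor branch as the used version.
    """
    candidates = [
        v for v in RELEASES
        if not same_minor_only or parse(v)[:2] == parse(used)[:2]
    ]
    ideal = max(candidates, key=parse)
    return ideal, RELEASES[ideal] - RELEASES[used]


ideal, lag = time_lag("1.1.0")
print(f"ideal={ideal}, time lag={lag.days} days")
```

Restricting the ideal to the 1.1 branch reproduces the slide's date(1.1.3) - date(1.1.0); passing `same_minor_only=False` measures the lag against the highest release overall instead.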
Challenge
Breaking Changes
Problem
• Upgrading dependencies may require effort
• Upgrading dependencies may cause your software to break
• Deep transitive dependencies are a major source of breaking changes
Solution
• Semantic versioning policy signals consumers whether an update is potentially backward incompatible
• Tools can help to detect potential breaking changes proactively
• E.g. by running the test suites of all clients on the updated dependency
Model-based testing of breaking changes in Node.js libraries
A. Møller, M. T. Torp, ESEC/FSE (2019)
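How a semantic version number signals a potentially backward incompatible update can be sketched as follows. This is a simplified illustration, not a full implementation of the Semantic Versioning specification: it assumes plain MAJOR.MINOR.PATCH strings without pre-release or build tags, and the function names are hypothetical.

```python
def parse(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_update(current: str, candidate: str) -> str:
    """Classify an update as 'major', 'minor', 'patch', 'none' or 'downgrade'."""
    old, new = parse(current), parse(candidate)
    if new == old:
        return "none"
    if new < old:
        return "downgrade"
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "patch"


def potentially_breaking(current: str, candidate: str) -> bool:
    """True when the version numbers give no backward-compatibility promise."""
    kind = classify_update(current, candidate)
    if kind == "none":
        return False
    # Major version 0 is reserved for initial development: anything may change.
    if parse(current)[0] == 0:
        return True
    return kind in {"major", "downgrade"}


for old, new in [("1.1.0", "1.1.3"), ("1.1.3", "2.0.0"), ("0.3.1", "0.4.0")]:
    print(old, "->", new, classify_update(old, new), potentially_breaking(old, new))
```

The version number is only a signal from the maintainer; it does not replace running client test suites against the updated dependency.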
Challenge
Deprecated Dependencies
• Depending on them increases risk of bugs, vulnerabilities, incompatibilities
• Tools help to detect use of deprecated dependencies, but not always where they occur in the dependency tree
• Deprecated transitive dependencies are hard to replace
Deprecation of packages and releases in software ecosystems: A case study on npm.
F Cogo, G Oliva, A Hassan (2022) IEEE Transactions on Software Engineering
• 54% of all packages transitively depend on at least one deprecated package release.
• In more than half of the cases, dependency depth is 4 or higher.
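Locating where a deprecated package sits in the dependency tree can be sketched as a breadth-first search. The graph, the package names and the deprecation set below are all hypothetical; a real tool would read them from the registry's metadata.

```python
from collections import deque

# Hypothetical dependency graph: package -> direct dependencies.
DEPENDENCIES = {
    "app": ["web-framework", "logger"],
    "web-framework": ["router", "http-client"],
    "logger": [],
    "router": ["path-utils"],
    "http-client": ["old-request"],
    "path-utils": [],
    "old-request": [],
}
DEPRECATED = {"old-request"}  # hypothetical deprecation flags


def deprecated_paths(root: str) -> dict:
    """Breadth-first search returning the shortest path to each deprecated package."""
    found = {}
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        for dep in DEPENDENCIES.get(path[-1], []):
            if dep in seen:
                continue
            seen.add(dep)
            new_path = path + [dep]
            if dep in DEPRECATED:
                found[dep] = new_path
            queue.append(new_path)
    return found


for package, path in deprecated_paths("app").items():
    print(f"{package} (depth {len(path) - 1}): {' -> '.join(path)}")
```

Reporting the whole path rather than just the package name shows which direct dependency would have to be updated or replaced to get rid of the deprecated one.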
Challenge
Incompatible Dependencies
• Incompatibilities due to dependency conflicts may occur when installing or upgrading (versions of) packages
• Problem
• Dependency solving is an NP-complete problem
• Package managers use ad hoc solutions that lack expressiveness
https://research.swtch.com/version-sat
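A toy example may help to see why dependency solving is hard. The sketch below tries every combination of versions for a small, hypothetical registry with a diamond dependency (both `a` and `b` depend on `c`). It assumes that every package in the registry must be installed in exactly one version; it is a brute-force illustration, not the algorithm of any real package manager, and its running time grows exponentially with the number of packages.

```python
from itertools import product

# Hypothetical registry: package -> version -> {dependency: allowed versions}.
REGISTRY = {
    "app": {"1.0": {"a": {"1.0", "2.0"}, "b": {"1.0"}}},
    "a": {"1.0": {"c": {"1.0"}}, "2.0": {"c": {"2.0"}}},
    "b": {"1.0": {"c": {"2.0"}}},
    "c": {"1.0": {}, "2.0": {}},
}


def solve(registry: dict):
    """Try every combination of one version per package.

    Returns the first assignment in which every declared constraint
    is satisfied, or None when no such assignment exists.
    """
    names = sorted(registry)
    for versions in product(*(sorted(registry[name]) for name in names)):
        chosen = dict(zip(names, versions))
        if all(
            chosen[dep] in allowed
            for name, version in chosen.items()
            for dep, allowed in registry[name][version].items()
        ):
            return chosen
    return None


print(solve(REGISTRY))
```

Here `a` 1.0 is ruled out because it needs `c` 1.0 while `b` needs `c` 2.0, so the only consistent choice uses `a` 2.0. If `app` insisted on `a` 1.0, no assignment would exist and `solve` would return `None`.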
Challenge
Incompatible Dependencies
• Solutions
• Researchers are proposing generic solutions based on formalisms such as constraint (SAT) solvers and optimisation
• Functional package managers (e.g. Guix, Nix) avoid the problem by allowing incompatible packages to be deployed side by side
• They enable creating separate namespaces on the fly, allowing multiple versions of the same package to be installed side by side without any risk of incompatibility or inconsistency.
Dependency solving is still hard, but we are getting better at it
P Abate, R Di Cosmo, G Gousios, S Zacchiroli (2020) SANER
Challenge
Bloated Dependencies
• Dependencies that are packaged with an application although they are not needed to build and run it
• Including them increases application size and may affect performance and security posture
• Solution: Researchers are proposing debloating techniques
A comprehensive study of bloated dependencies in the Maven ecosystem
Soto-Valero et al. (2021) Empirical Software Engineering
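The basic idea behind detecting bloated dependencies, comparing what is declared with what is actually used, can be sketched for Python source code. This is not the technique of the cited Maven study; it is a simplified illustration that assumes a declared dependency has the same name as the module it provides and that only static `import` statements matter.

```python
import ast


def imported_modules(source: str) -> set:
    """Collect the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    return modules


def unused_dependencies(declared: set, sources: list) -> set:
    """Return declared dependencies that no source file imports."""
    used = set()
    for source in sources:
        used |= imported_modules(source)
    return declared - used


# Hypothetical project: three declared dependencies, two of them imported.
SOURCE = """
import requests
from yaml import safe_load
"""
print(unused_dependencies({"requests", "yaml", "numpy"}, [SOURCE]))
```

The source is only parsed, never executed, so the listed modules do not need to be installed. Dynamic imports and dependencies used only at build time would need separate handling.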
Challenge
Software Supply Chain Attacks
SolarWinds (2019-2020)
malicious update of network monitoring software affecting thousands of organisations, including the US government
https://security.googleblog.com/2021/06/introducing-slsa-end-to-end-framework.html
Challenge
Software Supply Chain Attacks
OWASP Top 10 CI/CD Security Risks (2022)
• CICD-SEC-3 Dependency Chain Abuse: Abuse flaws relating to how build environments fetch code dependencies, to enable malicious packages to be fetched and executed locally.
• Dependency confusion: Publication of malicious packages in public repositories with the same name as internal package names, to trick clients into downloading the malicious package rather than the private one.
• Dependency hijacking: Obtaining control of the account of a package maintainer on the public repository, in order to upload a new, malicious version of a widely used package, with the intent of compromising unsuspecting clients who pull the latest version of the package.
• Typosquatting: Publication of malicious packages with similar names to those of popular packages in the hope that a developer will misspell a package name and unintentionally fetch the typosquatted package.
• Brandjacking: Publication of malicious packages in a manner that is consistent with the naming convention or other characteristics of a specific brand’s package, in an attempt to get unsuspecting developers to fetch these packages due to falsely associating them with the trusted brand.
https://owasp.org/www-project-top-10-ci-cd-security-risks/CICD-SEC-03-Dependency-Chain-Abuse
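One simple defence against typosquatting is to flag package names that are within a small edit distance of a well-known name. The sketch below uses the Levenshtein distance; the list of popular packages and the distance threshold are assumptions chosen for illustration, and real registries combine such checks with other signals.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


POPULAR = {"requests", "numpy", "lodash", "express"}  # hypothetical allow-list


def possible_typosquat(name: str, max_distance: int = 1):
    """Return the popular package that `name` closely resembles, if any."""
    if name in POPULAR:
        return None
    for known in sorted(POPULAR):
        if edit_distance(name, known) <= max_distance:
            return known
    return None


for candidate in ["requets", "numpy", "flask"]:
    print(candidate, "->", possible_typosquat(candidate))
```

A threshold of 1 misses transposed letters (distance 2 under this metric), while a higher threshold flags more legitimate packages, so the value is a trade-off rather than a fixed rule.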
Challenge
Software Supply Chain Attacks
Solutions
• Software Bill of Materials (SBOM)
• formally structured lists of all software components present in a software product, including their licenses, versions, security vulnerabilities, and vendors
• imposed or recommended by
• US Executive Order 14028 https://www.federalregister.gov/d/2021-10460
• EU Cyber Resilience Act https://www.cyberresilienceact.eu
• Supply-chain Levels for Software Artifacts (SLSA) https://slsa.dev
• SLSA L3: Hardened builds
• Reproducible builds are a set of software development practices that create an independently-verifiable path from source to binary code https://reproducible-builds.org
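The verification step behind reproducible builds, rebuilding from source and checking that the result is bit-for-bit identical to the published artefact, can be sketched with a hash comparison. The `toy_build` function is a hypothetical stand-in for a real build; it only serves to show how an embedded build time breaks reproducibility.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artefact."""
    return hashlib.sha256(data).hexdigest()


def is_reproduced(published: bytes, rebuilt: bytes) -> bool:
    """A rebuild matches only if it is bit-for-bit identical to the published artefact."""
    return sha256_hex(published) == sha256_hex(rebuilt)


def toy_build(source: str, build_time: str = "") -> bytes:
    """Hypothetical 'build': the artefact is the source plus an optional embedded build time."""
    return (source + build_time).encode("utf-8")


source = "print('hello')\n"
# Deterministic build: two independent builds give the same artefact.
print(is_reproduced(toy_build(source), toy_build(source)))
# Embedding the build time makes the two artefacts differ.
print(is_reproduced(toy_build(source, "2024-01-01"), toy_build(source, "2024-01-02")))
```

In practice the comparison is made between the published binary and one built independently from the same source revision.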
Challenge: Abandoned and unmaintained dependencies
event-stream (November 2018)
Maintenance of the npm package was unknowingly handed over to a malicious developer, who subsequently modified the package to include code for stealing cryptocoins. A malicious package, called flatmap-stream, was added as a dependency to version 3.3.6 of the popular package event-stream. It contained an encrypted payload that stole bitcoins from certain applications.
XZ-Utils (March 2024)
Compromised software compression package for Linux distributions. Its original, well-intentioned maintainer was no longer able to fully maintain the package. After gaining this maintainer’s trust over a period of two years, a malicious attacker took over its maintenance and introduced a backdoor allowing remote code execution on affected systems.
Challenge: Abandoned and unmaintained dependencies
Observations
• OSS packages are often
• insufficiently credited/sponsored
• developed by single (or few) maintainers
• OSS package maintainers are often
• underproductive
• unpaid volunteers
Problems
• Slows down development
• Increases risk of bugs and vulnerabilities
• Increases risk of package becoming unmaintained / abandoned
• Increases risk of “hostile takeovers” by malicious developers
Challenge: Abandoned and unmaintained dependencies
Solutions
• ensure that package maintainers have the necessary resources to maintain their code
• provide/use tools to detect unmaintained/single maintainer packages and avoid depending on such packages
• maintain healthy and sustainable OSS communities able to attract and retain motivated contributors
• put into place community package maintenance organizations (CPMO), consisting of volunteers that steward and maintain abandoned packages
A first look at an emerging model of community organizations for the long-term maintenance of ecosystems' packages.
Théo Zimmermann (2020) ICSE Workshop on Software Health
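A tool for spotting single-maintainer or seemingly unmaintained packages could start from a check like the one below. The `PackageInfo` fields, the example values and the two-year inactivity threshold are assumptions made for illustration; registries expose such metadata in different ways, and a long period without releases does not by itself prove that a package is abandoned.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageInfo:
    """Hypothetical subset of registry metadata for one package."""
    name: str
    maintainers: int
    last_release: date


def maintenance_warnings(pkg: PackageInfo, today: date, max_idle_days: int = 730) -> list:
    """Return human-readable warnings about the maintenance status of a package."""
    warnings = []
    if pkg.maintainers <= 1:
        warnings.append("single maintainer")
    idle_days = (today - pkg.last_release).days
    if idle_days > max_idle_days:
        warnings.append(f"no release for {idle_days} days")
    return warnings


today = date(2024, 11, 12)
packages = [
    PackageInfo("active-lib", maintainers=5, last_release=date(2024, 10, 1)),
    PackageInfo("lonely-lib", maintainers=1, last_release=date(2021, 3, 1)),
]
for pkg in packages:
    print(pkg.name, maintenance_warnings(pkg, today))
```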
Challenge
Incompatible Software Licenses
Software licenses determine the terms and conditions to use or modify libraries within one’s own software
• Examples
• (L)GPL, Apache, MIT, BSD, CC, Eclipse, European Union, ...
https://spdx.org/licenses/
• Problem
• A software system’s license may be incompatible with the license of its dependencies, leading to legal disputes
Challenge
Incompatible Licenses
https://en.wikipedia.org/wiki/License_compatibility
Challenge
Incompatible Licenses
Solution
• Use tools to detect and resolve license incompatibilities
https://www.npmjs.com/package/license-compatibility-checker
https://www.npmjs.com/package/license-checker
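The core of such a check is a lookup from the project's license to the dependency licenses it can accept. The sketch below does not use the API of the tools linked above; the table is a deliberately simplified assumption covering four SPDX identifiers and should not be read as a statement of what the licenses legally allow in a given situation.

```python
# Simplified, illustrative table (an assumption for this sketch):
# project license -> dependency licenses assumed to be acceptable.
ACCEPTABLE = {
    "MIT": {"MIT", "BSD-3-Clause", "Apache-2.0"},
    "Apache-2.0": {"MIT", "BSD-3-Clause", "Apache-2.0"},
    "GPL-3.0-only": {"MIT", "BSD-3-Clause", "Apache-2.0", "GPL-3.0-only"},
}


def license_conflicts(project_license: str, dependencies: dict) -> list:
    """Return the names of dependencies whose license is not in the accepted set."""
    allowed = ACCEPTABLE[project_license]
    return sorted(
        name for name, license_id in dependencies.items()
        if license_id not in allowed
    )


# Hypothetical dependencies and their declared licenses.
deps = {"fast-json": "MIT", "net-utils": "Apache-2.0", "crypto-kit": "GPL-3.0-only"}
print(license_conflicts("MIT", deps))
print(license_conflicts("GPL-3.0-only", deps))
```

The asymmetry is the point of the example: a permissively licensed dependency fits into a copyleft project, while a copyleft dependency is flagged for a permissively licensed project.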
Challenge
Depending on trivial packages
left-pad (March 2016)
The package was unpublished as the result of a naming dispute between Azer Koçulu, an individual software engineer, and Kik. The package was immensely popular on the platform, being depended on by thousands of projects and reaching 15 million downloads prior to its removal. Several projects critical to the JavaScript ecosystem, including Babel and Webpack, depended on left-pad and were rendered unusable. Although the package was republished three hours later, it caused widespread disruption, leading npm to change its policies regarding unpublishing to prevent a similar event in the future.
https://en.wikipedia.org/wiki/Npm_left-pad_incident
Challenge
Depending on trivial packages
• Trivial packages implement simple and trivial tasks
Cf. left-pad and is-promise case study
• Trivial packages are prominent
They make up 16.8% of 230K studied packages
• Developers perceive trivial packages as well implemented and well-tested
• In reality, less than half of all trivial packages have tests!
Why do developers use trivial packages? An empirical case study on npm.
R Abdalkareem, O. Nourry, et al. (2017) ESEC/FSE conference
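To give a sense of how small a trivial package can be, the whole task that left-pad performs (padding a string on the left up to a given length) fits in a few lines. The function below is a Python sketch of that task, not a port of the original npm package, and its name and parameters are chosen for illustration.

```python
def left_pad(text, length: int, fill: str = " ") -> str:
    """Pad `text` on the left with `fill` until it is `length` characters long."""
    text = str(text)
    if len(fill) != 1:
        raise ValueError("fill must be a single character")
    # str.rjust leaves the string unchanged when it is already long enough.
    return text.rjust(length, fill)


print(left_pad("5", 3, "0"))
print(left_pad(42, 5))
```

Even for code this short, untested edge cases (non-string input, multi-character fill) are exactly where bugs hide, which is why the absence of tests in many trivial packages matters.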
Conclusion
• Depending on reusable packages comes with a wide range of challenges
• Problems may differ across package registries/managers due to different policies, tools, practices, ...
• Partial solutions exist but cannot solve everything
• Many opportunities for further empirical research, tooling, awareness, standardisation ...
