Prevalence and Evolution of License Violations in npm and RubyGems Dependency Networks
1. Prevalence and Evolution of License Violations
in npm and RubyGems Dependency Networks
Ilyas Saïd Makari Ilyas.Said.Makari@vub.be
Ahmed Zerouali Ahmed.Zerouali@vub.be
Coen De Roover Coen.De.Roover@vub.be
The International Conference on Software and Systems Reuse (ICSR)
Virtual (originally Montpellier, France) - June 15-17, 2022
3. Open source software can be distributed with varying degrees of freedom
● 1. Public domain
○ All rights are granted with no conditions whatsoever
○ For example: “The Unlicense”
● 2. Permissive licenses
○ Little restrictions imposed
○ Must include copyright notice from original author
○ MIT, Apache, BSD, etc
● 3. Restrictive licenses (copyleft)
Background
4. ● Strong copyleft licenses
○ All derivatives of the original work should be released under the same license
○ For example: GNU General Public License (GPL)
● Weak copyleft licenses
○ Exception when work is used as independent building block
○ For example: GNU Lesser General Public License (LGPL)
Background
5. Not all licenses may be legally combined in one software package
- For example: MIT (permissive) is compatible with GPL (restrictive), but not vice
versa.
We call a license A “one-way compatible” with license B, if software that
contains packages from both licenses may be legally licensed under
license B.
Background
10. Method
A new license Compatibility Matrix, based on:
1. Kapitsaki et al. [*]
[*] Georgia M Kapitsaki, Frederik Kramer, and Nikolaos D Tselikas. Automating the license compatibility process in open source
software with spdx. Journal of Systems and Software, 131:386–401, 2017.
Compatibility graph from Kapitsaki et al. [*]
Then, we manually included information from:
1. Free Software Foundation
2. The European Commission
=> answer for 1,681 pairs of licenses, from which 205 (12.2%) are labeled as “Unknown”
11. Research questions
RQ1: What are the most prevalent licenses in package repositories?
○ What are the climates of each ecosystem?
○ Permissive or restrictive climate?
○ Does it influence the number of incompatibilities?
RQ2: To which extent do packages rely on direct dependencies with incompatible
licenses?
○ How prevalent are license violations on the first dependency tree level?
RQ3: How does license incompatibility spread across package dependency
networks?
○ How prevalent are license violations on each dependency tree level?
12. Case studies
~750k packages (latest release)
~3.5M direct runtime
dependencies
~95k packages (latest release)
~211k direct runtime
dependencies
13. Open Data:
- Libraries.io gathers data from 32 package managers and 3 source code
repositories.
- They monitor over 5.4M unique open source packages, and more than 500M
interdependencies between them.
Dataset
15. Case studies
~750k packages
~3.5M direct dependencies
On January 12th, 2020
~ 66.4 M (all) runtime dependencies
~7.3% of the packages have
dependencies with incompatible
licenses,
~95k packages
~211k direct dependencies
On January 12th, 2020:
~ 1.2M (all) runtime dependencies
~13.9% of the packages have
dependencies with incompatible
licenses,
16. Research questions
RQ1: What are the most prevalent licenses in package repositories?
- MIT is the most popular license in npm and RubyGems.
17. Research questions
RQ1: What are the most prevalent licenses in package repositories?
- MIT has been popular in npm since its beginning.
- ISC becoming the new default license for npm packages increased its
popularity.
18. Research questions
RQ1: What are the most prevalent licenses in package repositories?
- MIT gradually evolved into the most popular license.
- Over the last few years, Apache has become the second most popular
license choice within the RubyGems ecosystem.
19. Research questions
RQ2: To which extent do packages rely on direct dependencies with incompatible
licenses?
- Only 0.9% and 4.3% of npm and RubyGems dependencies have licenses
that are incompatible with those of their dependents, respectively.
- The most common pair of incompatible licenses is MIT with GPL.
20. Research questions
RQ3: How does license incompatibility spread across package dependency networks?
- npm packages have more indirect dependencies with incompatible licenses than
RubyGems (due to the high number of dependencies that npm packages include.)
- However, RubyGems has proportionally more incompatible indirect dependencies
than npm.
21. Research questions
RQ3: How does license incompatibility spread across package dependency networks?
- The number of dependencies without a license decreases from one level to the
next as we go deeper in both package repositories.
22. Tooling
Screenshot of the license compatibility checking tool.
https://doi.org/10.5281/zenodo.5913761
23. Conclusion
● Deeper-level dependencies cause fewer incompatibilities than those at the shallow levels.
● GPL dependencies are the major cause for incompatibilities, and they are more present in the first
level of dependency trees.
● We found that a set of packages created by a single organization can influence an ecosystem when it
consistently releases useful packages under a particular license.
● Our results help in understanding the state of license incompatibilities in software package
ecosystems.
24. Threats to validity
● Libraries.io dataset.
● Various sources of information to construct our license compatibility matrix.
● We only considered the license of the latest release of each package.
● Many packages do not have any license.