DO SOFTWARE DEVELOPERS
UNDERSTAND OPEN SOURCE LICENSES?
Daniel A. Almeida and Gail C. Murphy
University of British Columbia
Greg Wilson
Rangle.io
Mike Hoye
Mozilla Corporation
1
@_DanielAlmeida
[Figure: an application built from a database, a database adapter, a web framework, and a front-end framework; licenses shown: PostgreSQL License, GNU LGPL 3.0, BSD License, MIT License]
2
• Java applications using the Central Repository relied on more than 100 open source components in 2014. [1]
• More than 25 different licenses in use in a sample of Java GitHub projects. [2]
• License compliance problems in 150+ products found by gpl-violations.org. [3]
[1] Sonatype, "2015 state of the software supply chain report: Hidden speed bumps on the road to 'continuous'."
[2] C. Vendome, "A large scale study of license usage on GitHub."
[3] A. Hemel, K. T. Kalleberg, R. Vermaas, and E. Dolstra, "Finding software license violations through binary code clone detection."
3
Do software developers understand
open source licenses?
• Survey consisting of 7 hypothetical software development scenarios.
• 375 participants from many countries, recruited on social media and mailing lists.
• Our participants:
  • Software developers (67%)
  • At least 3 years of development experience (93%)
  • Had to choose a project’s license before (85%)
  • Responsible for licensing decisions (63%)
  • Often contribute to open source projects (74%)
4
Licenses
5
Restrictive/copyleft: generally require derivative work to be released with the same rights (e.g., GNU GPL).
Weak copyleft: generally, not all of the derivative work needs to be released under the copyleft license (e.g., GNU LGPL, MPL).
Permissive: require only attribution, allowing derivative work to be proprietary (e.g., MIT License and BSD License).
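The three categories can be written down as a small lookup table, the kind of structure a license-checking tool would start from. This is a simplified sketch for illustration, not legal advice: the SPDX-style identifiers (including `BSD-3-Clause` standing in for "BSD License") are assumptions of this example, and the weak-copyleft entries follow the speaker notes, which describe LGPL and MPL that way.

```python
from enum import Enum


class LicenseCategory(Enum):
    """Coarse license families, following the one-line definitions above."""
    COPYLEFT = "restrictive/copyleft"  # derivative work generally keeps the same rights
    WEAK_COPYLEFT = "weak copyleft"    # only part of the derivative work must stay copyleft
    PERMISSIVE = "permissive"          # attribution only; derivative work may be proprietary


# Illustrative mapping; identifiers are SPDX-style names chosen for this sketch.
CATEGORY_BY_LICENSE = {
    "GPL-3.0": LicenseCategory.COPYLEFT,
    "LGPL-3.0": LicenseCategory.WEAK_COPYLEFT,
    "MPL-2.0": LicenseCategory.WEAK_COPYLEFT,
    "MIT": LicenseCategory.PERMISSIVE,
    "BSD-3-Clause": LicenseCategory.PERMISSIVE,  # assumption: stands in for "BSD License"
}


def category_of(license_id: str) -> LicenseCategory:
    """Look up a license's category; unknown licenses need human review."""
    try:
        return CATEGORY_BY_LICENSE[license_id]
    except KeyError:
        raise ValueError(f"no category recorded for {license_id!r}") from None


print(category_of("MPL-2.0").value)  # weak copyleft
```

A category alone does not settle what a combination of components allows; the questions that trouble developers arise when two licenses interact.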
Scenario 1
6
John has been working on ToDoApp, his own personal task management application. ToDoApp will be used exclusively by John on his own computer. John will use LightDB to persist ToDoApp’s data.
If LightDB is distributed under each of the following licenses, would John be allowed to use it as part of ToDoApp?
LightDB license   Choices
GNU GPL 3.0       Yes / No / Unsure
GNU LGPL 3.0      Yes / No / Unsure
MPL 2.0           Yes / No / Unsure
[Diagram: John, ToDoApp, LightDB]
Scenario 1
7
Could you explain why you are not sure about your answer for MPL 2.0?
LightDB license   Choices
GNU GPL 3.0       Yes / No / Unsure
GNU LGPL 3.0      Yes / No / Unsure
MPL 2.0           Yes / No / Unsure
“I’m not very familiar with MPL.”
Are there any assumptions you've made about this scenario? Is anything unclear or confusing to you?
“This is all assuming John follows the details of each license.”
Scenario 2
8
[Diagram: ToDoApp, LightDB]
If LightDB, the lightweight library used to persist ToDoApp’s data, is distributed under [LightDB LICENSE], would John be allowed to make ToDoApp available under [ToDoApp LICENSE]?
LightDB license   ToDoApp license   Choices
GPL               GPL               Yes / No / Unsure
GPL               LGPL              Yes / No / Unsure
GPL               MPL               Yes / No / Unsure
LGPL              GPL               Yes / No / Unsure
LGPL              LGPL              Yes / No / Unsure
…                 …                 …
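Each row of the Scenario 2 grid pairs LightDB's license with the license John wants for ToDoApp. The slide shows only the first five rows; assuming the elided rows continue the same pattern over every pairing of the three survey licenses (an assumption, not something the slide states), the cases can be enumerated like this:

```python
from itertools import product

# The three licenses used in the survey (short names as on the slide).
LICENSES = ("GPL", "LGPL", "MPL")

# Assumption: every (LightDB license, ToDoApp license) pair is asked,
# in the same order as the rows shown on the slide.
scenario2_cases = list(product(LICENSES, repeat=2))

for lightdb_license, todoapp_license in scenario2_cases:
    print(f"LightDB: {lightdb_license:<5} ToDoApp: {todoapp_license:<5} Yes / No / Unsure")
```

`itertools.product` with `repeat=2` yields the pairs in the same row-major order as the rows that are visible on the slide.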
Method
Our oracle: an intellectual property lawyer with more than a decade of experience in software licensing.
Quantitative analysis: all 7 scenarios and their 45 cases.
Qualitative analysis: open coding of the comments and assumptions for cases where (1) over 30% of the participants disagreed with the expert, OR (2) at least 10% of the participants answered “Unsure”.
9
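The selection rule for the qualitative analysis is mechanical enough to write down. In this sketch, a case's answers are split into those matching the oracle, those contradicting it, and "Unsure"; following the speaker notes, "over 30% disagreed" is read as "fewer than 70% agreed", so "Unsure" answers count against agreement. The `CaseResult` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    """Hypothetical answer counts for one survey case."""
    agree: int     # answers matching the oracle (the IP lawyer)
    disagree: int  # Yes/No answers contradicting the oracle
    unsure: int    # "Unsure" answers

    @property
    def total(self) -> int:
        return self.agree + self.disagree + self.unsure


def selected_for_open_coding(case: CaseResult) -> bool:
    """(1) over 30% did not agree with the expert, OR (2) at least 10% answered Unsure.

    Integer arithmetic avoids floating-point edge cases exactly at the thresholds.
    """
    low_agreement = case.agree * 10 < case.total * 7  # agree share < 70%
    many_unsure = case.unsure * 10 >= case.total       # unsure share >= 10%
    return low_agreement or many_unsure


print(selected_for_open_coding(CaseResult(agree=80, disagree=15, unsure=5)))   # False
print(selected_for_open_coding(CaseResult(agree=75, disagree=13, unsure=12)))  # True
```

The second case shows why the two criteria are combined: agreement is acceptable (75%), but the share of "Unsure" answers alone is enough to select the case.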
[Chart: participants' answers (Yes, No, Unsure) for each case, marked against the right answer]
10
Overview
• 7 scenarios (total of 45 cases) involving 3 open source licenses.
• Participants were given access to the licenses used in the survey and were free to use external resources as needed.
• Participants agreed with our oracle in 26 out of 42 cases (62%).
• Open coding focused on the 19 cases (and related scenarios) where over 30% of the participants disagreed with our oracle or at least 10% of the participants answered “Unsure”.
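The headline agreement figure is a simple ratio over the analysed cases; what counts as "agreed" per case presumably follows the 70% threshold given in the speaker notes. The variable names below are illustrative.

```python
agreed_cases = 26    # cases where participants agreed with the oracle (Overview slide)
analysed_cases = 42  # denominator used on the slide

agreement_rate = agreed_cases / analysed_cases
print(f"{agreement_rate:.1%}")      # 61.9%
print(round(agreement_rate * 100))  # 62
```

So the 62% on the slide is 61.9% rounded to the nearest whole percent.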
Observation #1
Developers cope well with single
licenses even in complex scenarios,
but have difficulty when more than
one license is in use.
11
[Chart: participants' answers (Yes, No, Unsure) against the right answer, split into simple cases and complex cases]
12
Observation #2
Developers understand that how the software
is built affects license interactions, but don't
have a deep grasp of what technical details
matter.
13
Assumptions coding
14
[Chart: frequency of assumption codes, highlighting Technical Assumption (system structure, deployment, etc.) and Change Dependent (what files are modified)]
AG: Authorship, I: Invalid, IQ: Invalid Question, LA: License Assumption, LI: License Interactions, PA: Patent Assumption,
SC: Specific Case, TeA: Term Assumption, U: Unsure
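The assumption codes from the two "Assumptions coding" charts can be kept in a single codebook. This sketch merges the two legends (each chart's legend omits the codes it highlights) and tallies coded answers; the `tally` helper and the sample input are hypothetical, made up for illustration, and are not study data.

```python
from collections import Counter

# Union of the legends on the two "Assumptions coding" slides.
ASSUMPTION_CODES = {
    "AG": "Authorship",
    "CD": "Change Dependent",      # what files are modified
    "I": "Invalid",
    "IQ": "Invalid Question",
    "LA": "License Assumption",    # characteristics of a license
    "LI": "License Interactions",  # ramifications of more than one license
    "PA": "Patent Assumption",
    "SC": "Specific Case",
    "TA": "Technical Assumption",  # system structure, deployment, etc.
    "TeA": "Term Assumption",
    "U": "Unsure",
}


def tally(coded_answers: list[list[str]]) -> Counter:
    """Count how often each code was applied; one answer may carry several codes."""
    counts = Counter()
    for codes in coded_answers:
        for code in codes:
            if code not in ASSUMPTION_CODES:
                raise ValueError(f"unknown code: {code}")
            counts[ASSUMPTION_CODES[code]] += 1
    return counts


# Hypothetical input: three answers and the codes assigned to each.
print(tally([["TA"], ["TA", "CD"], ["U"]]).most_common(1))  # [('Technical Assumption', 2)]
```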
Scenario 2: comments coding
15
[Chart: frequency of comment codes, highlighting Technical Detail (concerns about technical aspects of the case)]
A: Assumption, Am: Ambiguity, I: Invalid, LI: License Interactions, SC: Specific Case, U: Unsure
Scenario 2 (GPL-LGPL): comments
“It depends on how ToDoApp is distributed. If ToDoApp
was only distributed as source then this would be fine.
For binary distributions, if ToDoApp is statically linked
against LightDB it must be distributed under GPL. The
case is less clear for dynamically linked code - I
understand the FSF and other organizations disagree!”.
“I think it might depend on how the two libraries are
linked together”.
16
Observation #3
Developers don't have a solid understanding
of the intricacies of how licenses interact.
17
Assumptions coding
18
[Chart: frequency of assumption codes, highlighting License Interactions (ramifications of more than one license) and License Assumption (characteristics of a license)]
AG: Authorship, CD: Change Dependent, I: Invalid, IQ: Invalid Question, PA: Patent Assumption, SC: Specific Case, TA:
Technical Assumption, TeA: Term Assumption, U: Unsure
Scenario 2: comments coding
19
[Chart: frequency of comment codes, highlighting License Interaction (what actions are possible with more than one license) and Specific Case (dual licensing or relicensing)]
A: Assumption, Am: Ambiguity, I: Invalid, LI: License Interactions, SC: Specific Case, U: Unsure
Scenario 2 (GPL-MPL): comments
“I don’t understand how the secondary license
restriction and GPL interact”
“MPL/(L)GPL dual licensing is popular, so I assume
there is a reason for that”
“Have not studied the details; generically expect
trouble when mixing non-GPL licenses with GPL so
would have guessed ‘No’ if forced”
20
Other observations
• Questions that arise about the use of multiple open
source licenses are situationally dependent.
• A number of developers lack knowledge of the
details of open source licenses.
21
Implications
22
[Diagram: File A, File B]
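The speaker notes point to tools that formally model license interactions, building on German and Hassan's models for identifying mismatches. Below is a minimal sketch of only the mismatch-finding step. It assumes the hard part, deciding which license combinations are acceptable, has already been captured as an explicit relation (for example, by a legal expert). `Component`, `find_mismatches`, and the license names in the example are hypothetical placeholders; nothing here states which real licenses are compatible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    license: str


def find_mismatches(product_license: str,
                    components: list[Component],
                    may_include: set[tuple[str, str]]) -> list[Component]:
    """Return components that are not known to be usable in the product.

    `may_include` holds (component license, product license) pairs judged acceptable;
    any pair not listed is flagged for review rather than assumed safe.
    """
    return [c for c in components
            if (c.license, product_license) not in may_include]


# Placeholder licenses and relation, for illustration only.
MAY_INCLUDE = {("License-P", "License-C"), ("License-C", "License-C")}
parts = [Component("db-adapter", "License-P"), Component("ui-widgets", "License-W")]
print([c.name for c in find_mismatches("License-C", parts, MAY_INCLUDE)])  # ['ui-widgets']
```

Flagging unlisted pairs instead of allowing them is deliberate: multi-license combinations are exactly where the study found developers most often go wrong.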
DO DEVELOPERS UNDERSTAND OPEN SOURCE LICENSES?
• Cope well with single licenses even in complex scenarios,
but struggle when more than one license is in use.
• Understand that technical details affect license interactions.
• Don't have a deep grasp of what technical details matter or
of the intricacies of how licenses interact.
23

Editor's Notes

  • #3 + Reuse of high-quality components + Fast production of software + Low cost
  • #6 Why these licenses: they are common licenses in use, they range from restrictive to permissive, and they result in different restrictions (GPL vs. LGPL). GPL: strong copyleft; requires licensed works or modifications to be open source. LGPL: weak copyleft, mostly used for shared libraries; allows us to use the licensed library without making the rest of the code or product open source. MPL: weak copyleft; similar to LGPL, but at the file level.
  • #10 Quantitative analysis of all scenarios: to account for some degree of ambiguity, we consider that the participants did well when at least 70% of the answers matched the expert’s. We focused our qualitative analysis on the cases where participants disagreed with the expert (that is, fewer than 70% of them agreed) or more than 10% of them answered “Unsure”.
  • #11 Four scenarios have cases where more than 30% of the participants disagreed with the expert, or where more than 10% of them answered “Unsure”. We noticed an issue with Scenario 5: our legal expert and the participants made such different assumptions that we decided to focus our analysis on the other scenarios.
  • #13 Four scenarios have cases where more than 30% of the participants disagreed with the expert, or where more than 10% of them answered “Unsure”. We noticed an issue with Scenario 5: our legal expert and the participants made such different assumptions that we decided to focus our analysis on the other scenarios.
  • #22 Different combinations of codes appear for the same license combinations (e.g., S2-GPL-LGPL had a lot of LI, but that was not an issue for S3-GPL-LGPL or S6-LGPL-GPL). The difference is in how the software is used, changed and combined. The most frequent code for the case comments was Unsure, indicating a lack of knowledge of the licenses used in this survey.
  • #23 German and Hassan: models for identifying possible mismatches and a number of “patterns of integration”. Vendome and Poshyvanyk: find, explain, and recommend a fix for license incompatibility (either a license change or code restructuring). Our participants described ways of restructuring their code or the open source component’s code to avoid license incompatibility. We believe we may need a more robust recommender system that can formally model license interactions in terms of how a component is used in the code and the other ways it could be used. A model such as the one introduced by Alspaugh and others might be a starting point to build tools that recommend how the code can be refactored to resolve license compliance issues.
  • #24  Open source software is not a small, self-contained set of licenses and components. Software developers struggle to interpret the implications of license interactions and the relevant technical details. We need tools to help developers identify and resolve license incompatibility issues. Open source components are released under a variety of licenses and used in closed and open-source projects. There are thousands of components released under many different licenses. Developers have a good understanding of at least three licenses, but in many cases they struggle to identify the relevant technical details and correctly interpret the license interactions. We need tools to help developers identify and solve license incompatibility problems. One possibility, based on existing work by Germán and Hassan, is to use formal models to identify mismatches. We can go further and help developers change the code structures that are causing the mismatch.