Do software developers understand open source licenses?

Talk at ICPC 2017.

Abstract: Software provided under open source licenses is widely used, from forming high-profile stand-alone applications (e.g., Mozilla Firefox) to being embedded in commercial offerings (e.g., network routers). Despite the high frequency of use of open source licenses, there has been little work about whether software developers understand the open source licenses they use. To help fill the gap of whether or not developers understand the open source licenses they use, we conducted a survey that posed development scenarios involving three popular open source licenses (GNU GPL 3.0, GNU LGPL 3.0 and MPL 2.0) both alone and in combination.

Published in: Software

  1. DO SOFTWARE DEVELOPERS UNDERSTAND OPEN SOURCE LICENSES? Daniel A. Almeida and Gail C. Murphy (University of British Columbia), Greg Wilson (Rangle.io), Mike Hoye (Mozilla Corporation). @_DanielAlmeida
  2. Example application stack and its licenses: database (PostgreSQL License), database adapter (GNU LGPL 3.0), web framework (BSD License), front-end framework (MIT License).
  3. • Java applications using the Central Repository relied on more than 100 open source components in 2014. [1] • More than 25 different licenses in use in a sample of Java GitHub projects. [2] • License compliance problems in 150+ products found by gpl-violations.org. [3] Do software developers understand open source licenses? References: [1] Sonatype, “2015 state of the software supply chain report: Hidden speed bumps on the road to “continuous. [2] C. Vendome, “A large scale study of license usage on GitHub”. [3] A. Hemel, K. T. Kalleberg, R. Vermaas, and E. Dolstra, “Finding software license violations through binary code clone detection”.
  4. • Survey consisting of 7 hypothetical software development scenarios. • 375 participants from many countries, recruited on social media and mailing lists. • Our participants: • Software developers (67%) • At least 3 years of development experience (93%) • Have had to choose a project’s license before (85%) • Responsible for licensing decisions (63%) • Often contribute to open source projects (74%)
  5. Licenses. Restrictive/copyleft: generally require the same rights on derivative work (e.g., GNU GPL). Weak copyleft: generally, not all of the derivative work needs to be released under the copyleft license. Permissive: require only attribution, allowing derivative work to be proprietary (e.g., MIT License and BSD License).
  6. Scenario 1. John has been working on ToDoApp, his own personal task management application. ToDoApp will be used exclusively by John on his own computer. John will use LightDB to persist ToDoApp’s data. If LightDB is distributed under the following licenses, would John be allowed to use it as part of ToDoApp? Answer choices (Yes / No / Unsure) for each LightDB license: GNU GPL 3.0, GNU LGPL 3.0, MPL 2.0. [Diagram: John, ToDoApp, LightDB]
  7. Scenario 1, follow-up questions. Answer choices (Yes / No / Unsure) for each LightDB license: GNU GPL 3.0, GNU LGPL 3.0, MPL 2.0. Question: “Could you explain why you are not sure about your answer for MPL 2.0?” Sample response: “I’m not very familiar with MPL.” Question: “Are there any assumptions you've made about this scenario? Is anything unclear or confusing to you?” Sample response: “This is all assuming John follows the details of each license.”
  8. Scenario 2. If LightDB, the lightweight library used to persist ToDoApp’s data, is distributed under [LightDB LICENSE], would John be allowed to make ToDoApp available under [ToDoApp LICENSE]? Answer choices (Yes / No / Unsure) for each LightDB license / ToDoApp license pair: GPL / GPL, GPL / LGPL, GPL / MPL, LGPL / GPL, LGPL / LGPL, … [Diagram: LightDB, ToDoApp]
  9. Method. Our oracle: an intellectual property lawyer with more than a decade of experience in software licensing. Quantitative analysis: all 7 scenarios and their 45 cases. Qualitative analysis: open-coding of the comments and assumptions for cases where (1) over 30% of the participants disagreed with the expert, OR (2) at least 10% of the participants answered “Unsure”.
  10. Overview. • 7 scenarios (total of 45 cases) involving 3 open source licenses. • Participants were given access to the licenses used in the survey and were free to use external resources as needed. • Participants agreed with our oracle in 26 out of 42 cases (62%). • Open-coding focused on the 19 cases (and related scenarios) where over 30% of the participants disagreed with our oracle or at least 10% of the participants answered “Unsure”. [Chart legend: Yes, No, Unsure, right answer]
  11. Observation #1: Developers cope well with single licenses even in complex scenarios, but have difficulty when more than one license is in use.
  12. [Chart: Yes / No / Unsure answers against the right answer, grouped into simple cases and complex cases]
  13. Observation #2: Developers understand that how the software is built affects license interactions, but don't have a deep grasp of which technical details matter.
  14. Assumptions coding. Technical Assumption (system structure, deployment, etc.); Change Dependent (what files are modified). Legend: AG: Authorship, I: Invalid, IQ: Invalid Question, LA: License Assumption, LI: License Interactions, PA: Patent Assumption, SC: Specific Case, TeA: Term Assumption, U: Unsure.
  15. Scenario 2: comments coding. Technical Detail (concerns about technical aspects of the case). Legend: A: Assumption, Am: Ambiguity, I: Invalid, LI: License Interactions, SC: Specific Case, U: Unsure.
  16. Scenario 2 (GPL-LGPL): comments. “It depends on how ToDoApp is distributed. If ToDoApp was only distributed as source then this would be fine. For binary distributions, if ToDoApp is statically linked against LightDB it must be distributed under GPL. The case is less clear for dynamically linked code - I understand the FSF and other organizations disagree!” “I think it might depend on how the two libraries are linked together.”
  17. Observation #3: Developers don't have a solid understanding of the intricacies of how licenses interact.
  18. Assumptions coding. License Interactions (ramifications of more than one license); License Assumption (characteristics of license). Legend: AG: Authorship, CD: Change Dependent, I: Invalid, IQ: Invalid Question, PA: Patent Assumption, SC: Specific Case, TA: Technical Assumption, TeA: Term Assumption, U: Unsure.
  19. Scenario 2: comments coding. License Interaction (what actions are possible with more than one license); Specific Case (dual licensing or relicensing). Legend: A: Assumption, Am: Ambiguity, I: Invalid, LI: License Interactions, SC: Specific Case, U: Unsure.
  20. Scenario 2 (GPL-MPL): comments. “I don’t understand how the secondary license restriction and GPL interact.” “MPL/(L)GPL dual licensing is popular, so I assume there is a reason for that.” “Have not studied the details; generically expect trouble when mixing non-GPL licenses with GPL so would have guessed ‘No’ if forced.”
  21. Other observations. • Questions that arise about the use of multiple open source licenses are situationally dependent. • A number of developers lack knowledge of the details of open source licenses.
  22. Implications. [Diagram: File A, File B]
  23. DO DEVELOPERS UNDERSTAND OPEN SOURCE LICENSES? • Cope well with single licenses even in complex scenarios, but struggle when more than one license is in use. • Understand that technical details affect license interactions. • Don't have a deep grasp of which technical details matter or of the intricacies of how licenses interact. @_DanielAlmeida
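
The selection rule on the Method slide (open-code a case when over 30% of participants disagreed with the expert, or at least 10% answered “Unsure”) can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the `CaseTally` structure, the function names, and the counts in the usage note are hypothetical, and the sketch assumes that "disagreed" means picking the definite answer opposite to the expert's, with "Unsure" counted separately.

```python
from dataclasses import dataclass


@dataclass
class CaseTally:
    """Answer counts for one survey case (hypothetical structure)."""
    name: str
    yes: int
    no: int
    unsure: int
    oracle: str  # the expert's answer for this case: "yes" or "no"

    @property
    def total(self) -> int:
        return self.yes + self.no + self.unsure

    def disagree_share(self) -> float:
        # Assumption: only the opposite definite answer counts as
        # disagreement; "Unsure" answers are tracked separately.
        if self.total == 0:
            return 0.0
        wrong = self.no if self.oracle == "yes" else self.yes
        return wrong / self.total

    def unsure_share(self) -> float:
        if self.total == 0:
            return 0.0
        return self.unsure / self.total


def needs_open_coding(case: CaseTally,
                      disagree_cutoff: float = 0.30,
                      unsure_cutoff: float = 0.10) -> bool:
    """Apply the slide's rule: over 30% disagree OR at least 10% unsure."""
    return (case.disagree_share() > disagree_cutoff
            or case.unsure_share() >= unsure_cutoff)
```

With made-up counts, `needs_open_coding(CaseTally("example", yes=50, no=35, unsure=15, oracle="yes"))` selects the case on both criteria, while a tally of 90 "Yes", 5 "No", and 5 "Unsure" against a "yes" expert answer is not selected.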
