Code siblings: technical and legal
                implications of copying code
                   between applications

            Daniel German, Massimiliano Di Penta, Yann-Gaël
               Guéhéneuc, and Giuliano (Giulio) Antoniol




MSR 2009,
Vancouver
     1/17
The Challenge

            Code, like any other creative work, is
            protected by copyright law

            Companies hold ownership of their source code

            Free and open source software (FOSS) model
            is different

            Copying 27 LOC out of 525 KLOC resulted in a
            copyright infringement

            Users and companies must be aware of copyright
            law and ownership
Code Has Preferential Migration Flows




License Types

            Permissive – the MIT/X11 and BSD licenses
               Minor constraints on the licensee
               Inclusion of fragments in a system under a different license
                   BSD-licensed fragments can be included in proprietary systems.
               CAVEAT! Multiple BSD licenses: the original BSD (4-clause
               BSD), the new BSD (3-clause BSD), and the 2-clause BSD
               Code licensed under the original 4-clause BSD cannot
               be included in systems licensed under the GPL


            Reciprocal – GNU variants
               Any system that includes the fragments must be licensed
               under the same license
               GPL-licensed fragments can only be included in systems
               licensed under the same version of the GPL
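
The inclusion rules on this slide can be sketched as a small lookup. This is an illustration only: the license identifiers and the `can_include` helper are hypothetical names, and the function encodes just the statements made on the slide, not a complete legal compatibility matrix.

```python
# Hypothetical license identifiers; only the rules stated on the slide are encoded.
PERMISSIVE = {"MIT/X11", "BSD-2-clause", "BSD-3-clause", "BSD-4-clause"}
GPL_VERSIONS = {"GPL-2", "GPL-3"}


def can_include(fragment_license: str, system_license: str) -> bool:
    """Return True if a fragment may be included in a system, per the slide's rules."""
    if fragment_license in GPL_VERSIONS:
        # Reciprocal: only systems under the same GPL version may include it.
        return system_license == fragment_license
    if fragment_license in PERMISSIVE:
        # Caveat: original 4-clause BSD code cannot go into GPL-licensed systems.
        if fragment_license == "BSD-4-clause" and system_license in GPL_VERSIONS:
            return False
        return True
    # Unknown licenses are treated conservatively.
    return False
```

For example, `can_include("BSD-3-clause", "Proprietary")` is allowed under these rules, while `can_include("GPL-2", "BSD-2-clause")` is not.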

The Scale of the Problem

            Widely adopted systems are in the range of
            MLOC and thousands of files

            If 27 LOC in 525 KLOC can lead to copyright
            infringement
              Implications for companies reusing code
              Implications for end users


            We are like detectives
              Help monitor and detect license inconsistencies
              Help identify inconsistent licenses in
              code fragments
Empirical Study

            Code siblings: code fragments that migrated from
            one system to another and then evolved following
            their own paths

            Three *nix kernels
              Linux ~7MLOC and 20,000 files
              FreeBSD ~8MLOC and 21,000 files
              OpenBSD ~2MLOC and 5,500 files


            Overall size as of Jan. 2009: ~17MLOC


Research Questions

            RQ1: What kinds of open source licenses are
            used in the three kernels?

            RQ2: How many potential siblings exist between
            the BSD kernels and the Linux kernel?

            RQ3: What licenses are used by siblings and, if
            different, why?




Technologies and Setup

            Clone detection tool
               CCFinderX tool
               Min 100 tokens
               Parse only .c files
               Concentrate on pairs of files sharing a high percentage
               of common code fragments, at least ~30%, i.e., ~20LOC
               Prune files mapped into more than five siblings


            License detection and identification
               First comment(s)
               FOSSology version 1.0.0
                  78 different license variants
                  Added 5 more licenses
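
As a rough illustration of license identification from a file's first comment, the sketch below matches a few well-known license phrases. It is not how FOSSology works internally; the pattern table and the `detect_licenses` helper are hypothetical and cover only three license families.

```python
import re

# Hypothetical keyword patterns; real tools such as FOSSology use far richer rules.
LICENSE_PATTERNS = {
    "GPL": re.compile(r"GNU General Public License", re.IGNORECASE),
    "BSD": re.compile(r"Redistribution and use in source and binary forms", re.IGNORECASE),
    "MIT": re.compile(r"Permission is hereby granted, free of charge", re.IGNORECASE),
}

FIRST_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def detect_licenses(source: str):
    """Return the sorted license names found in the first block comment of a C file."""
    match = FIRST_COMMENT.search(source)
    if match is None:
        return []  # no block comment at all: treated as "no license"
    # Collapse whitespace and comment decoration so phrases match across lines.
    text = re.sub(r"[\s*]+", " ", match.group(0))
    return sorted(name for name, pat in LICENSE_PATTERNS.items() if pat.search(text))
```

A file whose first comment carries both a BSD and a GPL phrase would be reported with both names, which is how dual-licensed files show up in this sketch.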

Sibling(s) Origin

               Identify current siblings
               Trace the siblings back through past versions, following
               their code fragments in the same files
               When the fragments disappear, we have found their origins
               Take the older of the two as the true origin
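
The tracing procedure can be sketched as below. The history representation (a newest-first list of dated snapshots with a flag saying whether the cloned fragment is present) and both function names are assumptions made for illustration; the study's actual tooling is not shown on the slide.

```python
from datetime import date


def first_appearance(history):
    """history: list of (date, has_fragment) snapshots, newest first (hypothetical shape)."""
    origin = None
    for when, has_fragment in history:
        if not has_fragment:
            break  # the fragment disappears: the previous snapshot is its origin
        origin = when
    return origin


def true_origin(histories):
    """histories maps system name -> history; returns the system where the fragment is oldest."""
    firsts = {name: first_appearance(h) for name, h in histories.items()}
    present = {name: d for name, d in firsts.items() if d is not None}
    return min(present, key=present.get)
```

Given one history per system, `true_origin` returns the name of the system whose copy of the fragment appeared first.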

    [Figure: cloned fragments in Sys 1 – File i and Sys 2 – File j are siblings; an arrow shows the migration direction between the two files.]
RQ1: Kinds of open source licenses

            Linux… is Linux… 65% of files under the GPL plus 25% of
            files “promoted” to GPL by L. Torvalds
              A few files (35) have two licenses
            FreeBSD 75% of the files with BSD license
              189 files (5%) with no license
              179 files with a corporate license (Intel licenses)
              167 files with MIT license
              A few multiple licenses – 19 BSD and GPL, 15 BSD and
              Educational, 14 MIT and GPL
            OpenBSD 76% BSD licenses
              295 files (9%) with an MIT license, 179 with an
              educational license
              138 (84%) without license
              59 files with BSD and Educational, 25 with MIT and
              BSD, and 14 with BSD and GPL
RQ2: Siblings between kernels
[Bar chart "Siblings": clone pairs, files Linux, files BSD, file pairs, and file pairs (same name), for FreeBSD vs. Linux and OpenBSD vs. Linux; y-axis from 0 to 2500.]
[Bar chart "Filtered siblings": files Linux, files BSD, file pairs, and file pairs (same name), for FreeBSD vs. Linux and OpenBSD vs. Linux; y-axis from 0 to 250.]
RQ3: Code Migration and Licenses
Before Jan 1, 2002 (almost nothing after)

FreeBSD        Linux         Files
BSD            GPL               8
BSD            MIT               2
BSD            None              2
Corporate      BSD+GPL          89
GPL            None              1
Phrase         BSD+GPL           1
X.Net+BSD      MIT               1

OpenBSD        Linux         Files
BSD            BSD+GPL           1
BSD            MIT               2
BSD            Unknown           1
BSD+GPL        GPL               1
BSD+Phrase     Phrase+GPL        1
MIT            GPL              23

After Jan 1, 2002 (nothing before)

Linux          FreeBSD       Files
BSD+GPL        Corporate         8
GPL            BSD              17
GPL            BSD+GPL           1
GPL            CPL+BSD+GPL       1
MIT            BSD               1
MIT+GPL        None              2
None           BSD               1
Phrase+GPL     MIT               2
AIC7xxx: Maintaining Siblings

            1994: Linux driver for AIC7xxx series SCSI adapters
            1995: Linux code is incorporated into an
            OpenBSD driver
            1996: NetBSD driver is ported to FreeBSD
               #ifdef to maintain the variants
            1997: A mailing list is created in FreeBSD to unify
            the efforts of people in the different kernels
               The major development of the driver seems to happen
               in FreeBSD
            2000: Development propagates to Linux,
            NetBSD, and OpenBSD
            Today: Development happens mostly in Linux and FreeBSD
GPL code in FreeBSD

            2002: Silicon Graphics xfs file system integrated
            into Linux
            Dec 12, 2005: xfs appears in FreeBSD
               The license of xfs is GPL
               FreeBSD is licensed under the 2-clause BSD license
               Including xfs in a BSD kernel requires the kernel to be
               under the GPL too


            Compiling GPL-licensed code into the kernel
            makes it “RESTRICTED”
               It can no longer be distributed in binary form, nor can its
               source code be made available for mirroring

License Defects

             FreeBSD rdma_cma.c / Linux cma.c are siblings
             In Linux, it appeared on Jun 17, 2006, and has had 64 changes,
             including 8 changes after it appeared in FreeBSD
             The Linux sibling is licensed under the GPL v2 and the
             2-clause BSD license
             The FreeBSD sibling is licensed under the terms of the new
             BSD license, the GPL v2, and the Common Public License
             Original license still present in FreeBSD
             Linux license was changed:

            commit a9474917099e007c0f51d5474394b5890111614f
            Author: Sean Hefty <sean.hefty@intel.com>
            Date: Mon Jul 14 23:48:43 2008 -0700
            RDMA: Fix license text
            The license text for several files references a third software license
            that was inadvertently copied in. Update the license to what was
            intended. This update was based on a request from HP. [..]
Conclusion

            Code migration and code siblings do exist
            Siblings have a preferential flow
               Initially from BSD(s) to Linux – frequent
               Today from Linux to FreeBSD – less frequent
            Companies directly contribute to code in different
            kernels – see Intel drivers with dual licenses

            Managing siblings is a difficult problem




If you don’t monitor, code may sneak in …




                      Questions ?




