GitHub + F/OSS => 1 million SPDX
Large-scale license transparency using open data, open standards and F/OSS 
http://triplecheck.net http://searchcode.com
Speaker 
Slide #2 
Nuno Brito 
 Free/open source contributor since 2005 
 Last 12 months wrote 100k F/OSS lines of code 
 SPDX contributor, co-founder of TripleCheck 
Around the web 
http://nunobrito.eu
Transparency 
Slide #3 
Take some source code as example 
Who developed the code? 
Which licenses are applicable? 
Was the code copied from somewhere else?
Size 
Slide #4 
A problem of scale 
Open licenses? > 300 types to choose from
> 5 million F/OSS projects 
> 100 million source code files
Practice 
Slide #5 
Applying licenses 
 Burden on developer (do correctly, do enough) 
 Expressed differently (difficult to understand) 
 Scaling obstacles (scarce automation) 
Transparency?
What to do?
Slide #6 
Ideally, we'd have tooling that is:
a) Reachable 
b) Cooperative 
c) Free 
Choose two. (sad reality)
Choose three 
Slide #7 
Choose building blocks based on: 
a) Open standards 
b) Open data 
c) Reachable tools 
Learn, write, improve. 
Share.
Standards 
Slide #8 
SPDX: Open standard for software licensing
 Standardizes license description
 Defines identifiers for license terms
 http://spdx.org 
Pro: Good docs, straightforward, getting better 
Cons: Slow adoption, scarce tooling
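As a rough illustration of what a standardized license description looks like, the sketch below writes the per-file portion of an SPDX tag-value document. The tag names follow the SPDX tag-value format and `Apache-2.0` is an identifier from the SPDX license list; the helper name `spdx_file_entry`, the file name, and the copyright holder are made up for the example.

```python
import hashlib

def spdx_file_entry(name: str, content: bytes, license_id: str, copyright_text: str) -> str:
    """Return the per-file section of an SPDX tag-value document (illustrative only)."""
    sha1 = hashlib.sha1(content).hexdigest()
    lines = [
        f"FileName: {name}",
        f"FileChecksum: SHA1: {sha1}",
        f"LicenseConcluded: {license_id}",   # SPDX short identifier, e.g. Apache-2.0
        f"LicenseInfoInFile: {license_id}",
        f"FileCopyrightText: <text>{copyright_text}</text>",
    ]
    return "\n".join(lines)

# Hypothetical input, not taken from any real project.
entry = spdx_file_entry(
    "./src/Main.java", b"class Main {}\n", "Apache-2.0", "Copyright 2014 Example Author"
)
print(entry)
```

A complete document would also carry document-level and package-level fields; only the file section is sketched here.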
Open data 
Slide #9 
GitHub: Targeting open data repositories 
 API suited for intensive access 
 Social coding 
 Largest open source code collection 
Pro: Reachable, diverse 
Cons: Repositories processed one-by-one
Tooling 
Slide #10 
Custom-built tools for software licenses 
 Large-scale repository data-mining 
 Find applicable licenses inside content 
 Share millions of SPDX documents 
Pro: Learn by doing, modularized, single language 
Cons: Built from scratch, needs consolidation
Step 1 
Slide #11 
Desktop tool/engine to discover licenses
 SPDX format as storage medium
 Identify copyright and 18 license types
 Java, released in Feb. 2014 under the EUPL
http://spdx.org/tools/community/triplecheck-reporter
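One simple way to picture license and copyright discovery is phrase matching on file headers. The sketch below is not the triplecheck-reporter engine (which is written in Java and covers 18 license types); the two patterns, the `scan` helper, and the sample header are assumptions chosen for illustration.

```python
import re

# Hypothetical phrase table: SPDX identifier -> text that commonly appears in a header.
LICENSE_PATTERNS = {
    "Apache-2.0": re.compile(r"Licensed under the Apache License, Version 2\.0", re.I),
    "MIT": re.compile(r"Permission is hereby granted, free of charge", re.I),
}
# Matches lines such as "Copyright (c) 2014 Jane Doe".
COPYRIGHT = re.compile(r"Copyright\s+(?:\(c\)\s*)?(\d{4})\s+(.+)", re.I)

def scan(text):
    """Return (sorted license identifiers, list of (year, holder)) found in text."""
    licenses = sorted(lid for lid, rx in LICENSE_PATTERNS.items() if rx.search(text))
    copyrights = [(m.group(1), m.group(2).strip()) for m in COPYRIGHT.finditer(text)]
    return licenses, copyrights

sample = """/*
 * Copyright (c) 2014 Jane Doe
 * Licensed under the Apache License, Version 2.0 (the "License");
 */"""

print(scan(sample))
```

Phrase matching of this kind misses reworded or truncated notices, which is one reason dedicated detection engines exist.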
Desktop 
Slide #12
File detail 
Slide #13
SPDX file 
Slide #14
Customize 
Slide #15
Details 
Slide #16 
Under the hood
 147 file extensions, 18 license types 
 LOC, hashes (SHA1, MD5, SHA256, SSDEEP) 
 Command line supported (Jenkins, cron) 
 Fast, 40k files/minute (Pentium IV)
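The per-file measurements listed above can be sketched with the Python standard library. This is an illustration only: `file_metrics` is a hypothetical helper, the line count here simply counts every line (the tool's own LOC definition is not stated on the slide), and SSDEEP is omitted because it needs a third-party library.

```python
import hashlib

def file_metrics(data: bytes) -> dict:
    """Count lines and compute the stdlib-supported hashes named on the slide."""
    return {
        "loc": len(data.splitlines()),  # counts all lines, blank ones included
        "sha1": hashlib.sha1(data).hexdigest(),
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

metrics = file_metrics(b"int main() {\n  return 0;\n}\n")
print(metrics["loc"], metrics["sha1"])
```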
Step 2
Slide #17
Discovering repositories with gitFinder
Create a list of projects online to use as components.
Get basic licensing information from each project.
 Write a text file listing each GitHub user (~7 million)
 For each user, find repositories that are not forks (~10M)
 Split the repositories by language (197)
 For each language's repository list, download the code
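The filtering and splitting steps above can be sketched on in-memory records. This is not gitFinder's code: `split_by_language` is a hypothetical helper, the sample repositories are invented, and the keys `full_name`, `fork`, and `language` are assumed to mirror the fields GitHub returns for a repository. No network access is involved.

```python
from collections import defaultdict

def split_by_language(repos):
    """Group non-forked repositories by their primary language."""
    by_language = defaultdict(list)
    for repo in repos:
        if repo["fork"]:
            continue  # forks are skipped so the same code is not processed twice
        by_language[repo["language"] or "unknown"].append(repo["full_name"])
    return dict(by_language)

# Invented sample data standing in for the per-user repository listings.
sample = [
    {"full_name": "alice/parser", "fork": False, "language": "Java"},
    {"full_name": "bob/parser", "fork": True, "language": "Java"},
    {"full_name": "carol/site", "fork": False, "language": None},
]
groups = split_by_language(sample)
print(groups)
```

Each resulting list would then feed the download step for that language.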
Performance 
Slide #18 
~70k repositories/day
 Single machine (i7, 8 GB RAM, CentOS)
 9 parallel threads 
 Resume/recover supported 
 Released in Jun. 2014 
https://github.com/triplecheck/gitfinder
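A minimal sketch of how nine parallel workers and resume support can fit together, assuming a set of already-processed names is kept between runs. The function `process_all` and the repository names are hypothetical; gitFinder's actual mechanism is not described on the slide.

```python
from concurrent.futures import ThreadPoolExecutor

def process_all(repos, worker, done, threads=9):
    """Run worker over every repository not yet in done, using a fixed thread pool."""
    pending = [r for r in repos if r not in done]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, so each result pairs with its repo.
        for repo, _ in zip(pending, pool.map(worker, pending)):
            done.add(repo)  # a real tool would also persist this to disk
    return done

seen = []
done = {"a/one"}  # pretend this one was finished before an interruption
process_all(["a/one", "b/two", "c/three"], seen.append, done)
print(sorted(seen), sorted(done))
```

On restart, anything already recorded in `done` is skipped, which is the essence of resume/recover.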
Output 
Slide #19
Storage?
Slide #20
https://what-if.xkcd.com/29/ (CC BY-NC 2.5)
Storage
Slide #21
BigZip, 100+ million files in a single download
 Flat-file, zip compression (per entry) 
 Fast, simple, portable. Indexed search 
https://github.com/triplecheck/big
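The idea of a flat file with per-entry compression and an index can be sketched as follows. This is not the BigZip on-disk format, which the slide does not specify: `append_entry`, `read_entry`, and the in-memory index are assumptions, and `zlib` stands in for zip compression.

```python
import io
import zlib

def append_entry(archive, index, name, data):
    """Compress one entry on its own and record where it sits in the flat file."""
    archive.seek(0, io.SEEK_END)  # always append at the end of the file
    blob = zlib.compress(data)
    index[name] = (archive.tell(), len(blob))
    archive.write(blob)

def read_entry(archive, index, name):
    """Seek straight to an entry via the index and decompress only that entry."""
    offset, size = index[name]
    archive.seek(offset)
    return zlib.decompress(archive.read(size))

archive, index = io.BytesIO(), {}  # BytesIO stands in for one large file on disk
append_entry(archive, index, "a/LICENSE", b"MIT License\n")
append_entry(archive, index, "b/README", b"hello\n" * 100)
print(read_entry(archive, index, "a/LICENSE"))
```

Because each entry is compressed independently, one file can be read without unpacking the rest of the archive.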
How it looks 
Slide #22
Step 3 
Slide #23 
SPDX search engine 
 One-click SPDX creation from open data 
 Visualize license and copyright data 
 Visit at http://searchcode.com/spdx
Example 
Slide #24 
Using the original URL:
 https://github.com/iuly/europa_kernel/ 
=> 
 https://spdxhub.com/iuly/europa_kernel/
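The rewrite shown above amounts to swapping the host name and keeping the path. A small sketch, where `to_spdx_url` is a hypothetical helper and the only mapping assumed is the one on the slide:

```python
from urllib.parse import urlsplit, urlunsplit

def to_spdx_url(github_url: str, host: str = "spdxhub.com") -> str:
    """Swap the github.com host for the SPDX host, keeping scheme and path."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    return urlunsplit((parts.scheme, host, parts.path, "", ""))

print(to_spdx_url("https://github.com/iuly/europa_kernel/"))
```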
Example 
Slide #25
SPDX-1M
Slide #26
“Do It Yourself” kit. Generate 1 million SPDX documents
 https://github.com/triplecheck/diy
 1.2 million open source projects
 “Arduino” for software license detection
9 GB worth of SPDX? Grab:
http://triplecheck.net/public/storage/spdx.big
Screenshots 
Slide #27
Next step? 
Slide #28 
F2F – pinpointing non-original code 
 Decompose code into blocks 
 Tokenize/anonymize data 
 Find code matches across knowledge base 
ETA in Dec. 2014 
https://github.com/triplecheck/f2f
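The three steps listed for F2F can be pictured with a toy fingerprinting scheme: strip names and literals from each line, hash sliding blocks of lines, and compare the hash sets. This is a sketch under stated assumptions, not the F2F implementation; the keyword list, the three-line block size, and the helpers `anonymize` and `fingerprints` are all invented for the example.

```python
import hashlib
import re

KEYWORDS = {"if", "else", "for", "while", "return", "int", "void"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def anonymize(line):
    """Replace identifiers with ID and numbers with NUM so renaming does not hide a copy."""
    out = []
    for tok in TOKEN.findall(line):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return " ".join(out)

def fingerprints(code, block=3):
    """Hash every window of `block` consecutive non-blank, anonymized lines."""
    lines = [anonymize(l) for l in code.splitlines() if l.strip()]
    return {
        hashlib.sha1("\n".join(lines[i:i + block]).encode()).hexdigest()
        for i in range(len(lines) - block + 1)
    }

original = "int total = 0;\nfor (i = 0; i < n; i++) {\n  total += values[i];\n}\nreturn total;"
copied = "int sum = 0;\nfor (k = 0; k < len; k++) {\n  sum += data[k];\n}\nreturn sum;"
other = "void hello() {\n  print(1);\n}\nreturn 0;"

print(len(fingerprints(original) & fingerprints(copied)))
print(len(fingerprints(original) & fingerprints(other)))
```

Shared fingerprints flag blocks that survive renaming; a knowledge base would store such hashes for millions of files and look them up rather than compare files pairwise.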
Preview 
Slide #29
Conclusion 
Slide #30 
What is now available to everyone
 Desktop tooling / detection engine
 Extraction of open data at scale
 Search engine for SPDX
Questions? 
Slide #31 
http://spdx.org 
http://searchcode.com/spdx 
http://github.com/triplecheck 
Interesting stuff? 
Let us know: @nn81 @boyte #linuxcon 
http://xkcd.com/1118/
Backup slides 
Slide #32
Engine 
Slide #33
License DB 
Slide #34
Components 
Slide #35
Exporting 
Slide #36

2014 10-14: GitHub plus FOSS == 1 million SPDX
