SlideShare a Scribd company logo
1 of 20
Download to read offline
Software Heritage
the Great Library of Free and Open Source Software
Stefano Zacchiroli
University Paris Diderot & Inria — zack@upsilon.cc
7 December 2017
Paris Open Source Summit
THE GREAT LIBRARY OF SOURCE CODE
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 1 / 14
Software is everywhere
Software
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 2 / 14
Our Software Commons
Definition (Commons)
The commons is the cultural and natural resources accessible to all members of a
society, including natural materials such as air, water, and a habitable earth. These
resources are held in common, not owned privately. https://en.wikipedia.org/wiki/Commons
Definition (Software Commons)
The software commons consists of all computer software which is available at little or no
cost and which can be altered and reused with few restrictions. Thus all open source
software and all free software are part of the [software] commons. [...]
https://en.wikipedia.org/wiki/Software_Commons
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 3 / 14
Our Software Commons
Definition (Commons)
The commons is the cultural and natural resources accessible to all members of a
society, including natural materials such as air, water, and a habitable earth. These
resources are held in common, not owned privately. https://en.wikipedia.org/wiki/Commons
Definition (Software Commons)
The software commons consists of all computer software which is available at little or no
cost and which can be altered and reused with few restrictions. Thus all open source
software and all free software are part of the [software] commons. [...]
https://en.wikipedia.org/wiki/Software_Commons
Source code is a precious part of our commons
are we taking care of it?
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 3 / 14
Software is fragile
Like all digital information, FOSS is fragile
inconsiderate and/or malicious code loss (e.g., Code Spaces)
business-driven code loss (e.g., Gitorious, Google Code)
for obsolete code: physical media decay (data rot)
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 4 / 14
Software is fragile
Like all digital information, FOSS is fragile
inconsiderate and/or malicious code loss (e.g., Code Spaces)
business-driven code loss (e.g., Gitorious, Google Code)
for obsolete code: physical media decay (data rot)
Where is the archive...
where we go if (a repository on) GitHub or GitLab.com goes away?
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 4 / 14
Software lacks its own research infrastructure
A wealth of software research on crucial issues...
safety, security, test, verification, proof
software engineering, software evolution
big data, machine learning, empirical studies
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 5 / 14
Software lacks its own research infrastructure
A wealth of software research on crucial issues...
safety, security, test, verification, proof
software engineering, software evolution
big data, machine learning, empirical studies
If you study the stars, you go to Atacama...
... where is the very large telescope of source code?
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 5 / 14
The Software Heritage Project
THE GREAT LIBRARY OF SOURCE CODE
Our mission
Collect, preserve and share the source code of all the software that is publicly available.
Past, present and future
Preserving the past, enhancing the present, preparing the future.
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 6 / 14
We are working on the foundations
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 7 / 14
Core principles
Technology
transparency and FOSS
replicas all the way down
Content
source code first
no a priori selection
intrinsic identifiers
facts and provenance
Organization
non-profit
multi-stakeholder
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 8 / 14
Data flow
dsc
dsc
hg
hg
hg
git
git
git git
svn
svn
svn
tar
zip
software
origins
Package
repos
Software Heritage
Archive
Forges
GitHub
lister
GitLab
lister
Debian
lister
Git
loader
Mercurial
loader
Debian source
package loader
PyPi
lister
tar loader
Merkle DAG
+
blob storage
.
.
.
.
.
.
Distros
...
Scheduling
Listing
(full/incremental)
Loading
& deduplication
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 9 / 14
The archive: a (giant) Merkle DAG
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 10 / 14
Archive coverage
Our current sources
GitHub
Debian, GNU
WIP: Gitorious, Google Code, Bitbucket
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 11 / 14
Archive coverage
Our current sources
GitHub
Debian, GNU
WIP: Gitorious, Google Code, Bitbucket
150 TB blobs, 5 TB database (as a graph: 7 B nodes + 60 B edges)
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 11 / 14
Archive coverage
Our current sources
GitHub
Debian, GNU
WIP: Gitorious, Google Code, Bitbucket
150 TB blobs, 5 TB database (as a graph: 7 B nodes + 60 B edges)
The richest source code archive already, ... and growing daily!
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 11 / 14
Roadmap
Features...
(done) lookup by content hash
browsing: "wayback machine" for archived code
(done) via Web API
(todo) via Web UI
(todo) download: wget / git clone from the archive
(todo) deposit of source code bundles directly to the archive
(todo) provenance lookup for all archived content
(todo) full-text search on all archived source code files
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 12 / 14
Roadmap
Features...
(done) lookup by content hash
browsing: "wayback machine" for archived code
(done) via Web API
(todo) via Web UI
(todo) download: wget / git clone from the archive
(todo) deposit of source code bundles directly to the archive
(todo) provenance lookup for all archived content
(todo) full-text search on all archived source code files
... and much more than one could possibly imagine
all the world’s software development history in a single graph!
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 12 / 14
The Software Heritage community
Sponsors Testimonials
UNESCO/Inria agreement (April 3rd, 2017)
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 13 / 14
Conclusion
it is now urgent to preserve software source code
Software Heritage is taking a systematic approach at it
Software Heritage has already assembled the largest archive to date
Software Heritage has synergies with cultural, research, and industry needs
it is a shared infrastructure that can benefit us all...
... we should collaborate and pool resources to make it so
Come in, we’re open!
www.softwareheritage.org — learn more
www.softwareheritage.org/support/sponsors/ — sponsoring info
forge.softwareheritage.org — our own code
Thanks!
Stefano Zacchiroli Software Heritage 07/12/2017, POSS 14 / 14

More Related Content

Similar to Avoiding the tragedy of the commons: some lessons from the Software Heritage experience, Stefano Zacchiroli, Software heritage, Paris Open Source Summit 2017

OpenChain Webinar #5: Software Heritage
OpenChain Webinar #5: Software HeritageOpenChain Webinar #5: Software Heritage
OpenChain Webinar #5: Software HeritageShane Coughlan
 
Open P2P Design @ Simbioms.org, Helsinki 12/11/2011
Open P2P Design @ Simbioms.org, Helsinki 12/11/2011Open P2P Design @ Simbioms.org, Helsinki 12/11/2011
Open P2P Design @ Simbioms.org, Helsinki 12/11/2011Massimo Menichinelli
 
Greenstone Digital Library Software
Greenstone Digital Library SoftwareGreenstone Digital Library Software
Greenstone Digital Library SoftwareMINTUMATHEW8
 
FFmpeg - the universal multimedia toolkit
FFmpeg - the universal multimedia toolkitFFmpeg - the universal multimedia toolkit
FFmpeg - the universal multimedia toolkitStefano Sabatini
 
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...Dr. Haxel Consult
 
You think you're not a target? A tale of three developers...
You think you're not a target? A tale of three developers...You think you're not a target? A tale of three developers...
You think you're not a target? A tale of three developers...Speck&Tech
 
OpenChain Automation Case Study - September to December 2021
OpenChain Automation Case Study - September to December 2021OpenChain Automation Case Study - September to December 2021
OpenChain Automation Case Study - September to December 2021Shane Coughlan
 
Software Preservation: challenges and opportunities for reproductibility (Sci...
Software Preservation: challenges and opportunities for reproductibility (Sci...Software Preservation: challenges and opportunities for reproductibility (Sci...
Software Preservation: challenges and opportunities for reproductibility (Sci...Roberto Di Cosmo
 
ScilabTEC 2015 - Irill
ScilabTEC 2015 - IrillScilabTEC 2015 - Irill
ScilabTEC 2015 - IrillScilab
 
2014 10-14: GitHub plus FOSS == 1 million SPDX
2014 10-14: GitHub plus FOSS == 1 million SPDX2014 10-14: GitHub plus FOSS == 1 million SPDX
2014 10-14: GitHub plus FOSS == 1 million SPDXNuno Brito
 
Building an Open Source iOS app: lessons learned
Building an Open Source iOS app: lessons learnedBuilding an Open Source iOS app: lessons learned
Building an Open Source iOS app: lessons learnedWojciech Koszek
 
GoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaGoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaFriprogsenteret
 
SFScon21 - Carlo Piana - Alberto Pianon - Aliens4friends: make yourself an al...
SFScon21 - Carlo Piana - Alberto Pianon - Aliens4friends: make yourself an al...SFScon21 - Carlo Piana - Alberto Pianon - Aliens4friends: make yourself an al...
SFScon21 - Carlo Piana - Alberto Pianon - Aliens4friends: make yourself an al...South Tyrol Free Software Conference
 
FOSS4G2018: presentation of the app Geopaparazzi
FOSS4G2018: presentation of the app GeopaparazziFOSS4G2018: presentation of the app Geopaparazzi
FOSS4G2018: presentation of the app Geopaparazzisilli
 
Open-source, how we survive with it?
Open-source, how we survive with it?Open-source, how we survive with it?
Open-source, how we survive with it?Hermet Park
 
Community based software development: The GRASS GIS project
Community based software development: The GRASS GIS projectCommunity based software development: The GRASS GIS project
Community based software development: The GRASS GIS projectMarkus Neteler
 
OpenChain Automation Case Study - September to December 2021
OpenChain Automation Case Study - September to December 2021OpenChain Automation Case Study - September to December 2021
OpenChain Automation Case Study - September to December 2021Shane Coughlan
 

Similar to Avoiding the tragedy of the commons: some lessons from the Software Heritage experience, Stefano Zacchiroli, Software heritage, Paris Open Source Summit 2017 (20)

OpenChain Webinar #5: Software Heritage
OpenChain Webinar #5: Software HeritageOpenChain Webinar #5: Software Heritage
OpenChain Webinar #5: Software Heritage
 
Open P2P Design @ Simbioms.org, Helsinki 12/11/2011
Open P2P Design @ Simbioms.org, Helsinki 12/11/2011Open P2P Design @ Simbioms.org, Helsinki 12/11/2011
Open P2P Design @ Simbioms.org, Helsinki 12/11/2011
 
Greenstone Digital Library Software
Greenstone Digital Library SoftwareGreenstone Digital Library Software
Greenstone Digital Library Software
 
FFmpeg - the universal multimedia toolkit
FFmpeg - the universal multimedia toolkitFFmpeg - the universal multimedia toolkit
FFmpeg - the universal multimedia toolkit
 
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
II-SDV 2017: Custom Open Source Search Engine with Drupal 8 and Solr at Frenc...
 
You think you're not a target? A tale of three developers...
You think you're not a target? A tale of three developers...You think you're not a target? A tale of three developers...
You think you're not a target? A tale of three developers...
 
OpenChain Automation Case Study - September to December 2021
OpenChain Automation Case Study - September to December 2021OpenChain Automation Case Study - September to December 2021
OpenChain Automation Case Study - September to December 2021
 
Software Preservation: challenges and opportunities for reproductibility (Sci...
Software Preservation: challenges and opportunities for reproductibility (Sci...Software Preservation: challenges and opportunities for reproductibility (Sci...
Software Preservation: challenges and opportunities for reproductibility (Sci...
 
ScilabTEC 2015 - Irill
ScilabTEC 2015 - IrillScilabTEC 2015 - Irill
ScilabTEC 2015 - Irill
 
2014 10-14: GitHub plus FOSS == 1 million SPDX
2014 10-14: GitHub plus FOSS == 1 million SPDX2014 10-14: GitHub plus FOSS == 1 million SPDX
2014 10-14: GitHub plus FOSS == 1 million SPDX
 
Building an Open Source iOS app: lessons learned
Building an Open Source iOS app: lessons learnedBuilding an Open Source iOS app: lessons learned
Building an Open Source iOS app: lessons learned
 
GoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'EliaGoOpen 2010: Sandro D'Elia
GoOpen 2010: Sandro D'Elia
 
IoTivity: From Devices to the Cloud
IoTivity: From Devices to the CloudIoTivity: From Devices to the Cloud
IoTivity: From Devices to the Cloud
 
SFScon21 - Carlo Piana - Alberto Pianon - Aliens4friends: make yourself an al...
SFScon21 - Carlo Piana - Alberto Pianon - Aliens4friends: make yourself an al...SFScon21 - Carlo Piana - Alberto Pianon - Aliens4friends: make yourself an al...
SFScon21 - Carlo Piana - Alberto Pianon - Aliens4friends: make yourself an al...
 
FOSS4G2018: presentation of the app Geopaparazzi
FOSS4G2018: presentation of the app GeopaparazziFOSS4G2018: presentation of the app Geopaparazzi
FOSS4G2018: presentation of the app Geopaparazzi
 
Open-source, how we survive with it?
Open-source, how we survive with it?Open-source, how we survive with it?
Open-source, how we survive with it?
 
2nd
2nd2nd
2nd
 
2nd
2nd2nd
2nd
 
Community based software development: The GRASS GIS project
Community based software development: The GRASS GIS projectCommunity based software development: The GRASS GIS project
Community based software development: The GRASS GIS project
 
OpenChain Automation Case Study - September to December 2021
OpenChain Automation Case Study - September to December 2021OpenChain Automation Case Study - September to December 2021
OpenChain Automation Case Study - September to December 2021
 

More from OW2

OW2 and RIOS teaming up to boost the open source impact, Nov. 2022 in Roma
OW2 and RIOS teaming up to boost the open source impact, Nov. 2022 in RomaOW2 and RIOS teaming up to boost the open source impact, Nov. 2022 in Roma
OW2 and RIOS teaming up to boost the open source impact, Nov. 2022 in RomaOW2
 
The Open Source Good Governance Initiative presented at RIOS OS Week, Nov. 20...
The Open Source Good Governance Initiative presented at RIOS OS Week, Nov. 20...The Open Source Good Governance Initiative presented at RIOS OS Week, Nov. 20...
The Open Source Good Governance Initiative presented at RIOS OS Week, Nov. 20...OW2
 
GLPi v.10, les fonctionnalités principales et l'offre cloud
GLPi v.10, les fonctionnalités principales et l'offre cloudGLPi v.10, les fonctionnalités principales et l'offre cloud
GLPi v.10, les fonctionnalités principales et l'offre cloudOW2
 
Centreon: superviser le Cloud et le Legacy à partir d'une même plateforme, po...
Centreon: superviser le Cloud et le Legacy à partir d'une même plateforme, po...Centreon: superviser le Cloud et le Legacy à partir d'une même plateforme, po...
Centreon: superviser le Cloud et le Legacy à partir d'une même plateforme, po...OW2
 
FusionIAM : la gestion des identités et des accés open source
FusionIAM : la gestion des identités et des accés open sourceFusionIAM : la gestion des identités et des accés open source
FusionIAM : la gestion des identités et des accés open sourceOW2
 
OW2 Association Européenne aux racines grenobloises, transformer l'industrie ...
OW2 Association Européenne aux racines grenobloises, transformer l'industrie ...OW2 Association Européenne aux racines grenobloises, transformer l'industrie ...
OW2 Association Européenne aux racines grenobloises, transformer l'industrie ...OW2
 
SFScon'20 Bringing the User into the Equation
SFScon'20 Bringing the User into the EquationSFScon'20 Bringing the User into the Equation
SFScon'20 Bringing the User into the EquationOW2
 
Towards a sustainable solution to open source sustainability, OW2online20, Ju...
Towards a sustainable solution to open source sustainability, OW2online20, Ju...Towards a sustainable solution to open source sustainability, OW2online20, Ju...
Towards a sustainable solution to open source sustainability, OW2online20, Ju...OW2
 
Advanced proactive and polymorphing cloud application adaptation with MORPHEM...
Advanced proactive and polymorphing cloud application adaptation with MORPHEM...Advanced proactive and polymorphing cloud application adaptation with MORPHEM...
Advanced proactive and polymorphing cloud application adaptation with MORPHEM...OW2
 
Open Source governance and the Eclipse Foundation, OW2online, June 2020
Open Source governance and the Eclipse Foundation, OW2online, June 2020Open Source governance and the Eclipse Foundation, OW2online, June 2020
Open Source governance and the Eclipse Foundation, OW2online, June 2020OW2
 
Open source contribution policies, OW2online, June 2020
Open source contribution policies, OW2online, June 2020Open source contribution policies, OW2online, June 2020
Open source contribution policies, OW2online, June 2020OW2
 
Software development at scale, pandemic lockdown and oss ecosystems, OW2onlin...
Software development at scale, pandemic lockdown and oss ecosystems, OW2onlin...Software development at scale, pandemic lockdown and oss ecosystems, OW2onlin...
Software development at scale, pandemic lockdown and oss ecosystems, OW2onlin...OW2
 
Overview of the OpenChain Reference Tooling Work Group, OW2online20, June 2020
Overview of the OpenChain Reference Tooling Work Group, OW2online20, June 2020Overview of the OpenChain Reference Tooling Work Group, OW2online20, June 2020
Overview of the OpenChain Reference Tooling Work Group, OW2online20, June 2020OW2
 
Open Source Compliance at Orange, OW2online, June 2020
Open Source Compliance at Orange, OW2online, June 2020Open Source Compliance at Orange, OW2online, June 2020
Open Source Compliance at Orange, OW2online, June 2020OW2
 
Ideas, methods and tools for OSS Compliance assessment, OW2online, June 2020
Ideas, methods and tools for OSS Compliance assessment, OW2online, June 2020Ideas, methods and tools for OSS Compliance assessment, OW2online, June 2020
Ideas, methods and tools for OSS Compliance assessment, OW2online, June 2020OW2
 
Intelligent package management with FASTEN, OW2online, June 2020
Intelligent package management with FASTEN, OW2online, June 2020Intelligent package management with FASTEN, OW2online, June 2020
Intelligent package management with FASTEN, OW2online, June 2020OW2
 
DECODER, a Smarter Environment for DevOps Teams , OW2online, June 2020
DECODER, a Smarter Environment for DevOps Teams , OW2online, June 2020DECODER, a Smarter Environment for DevOps Teams , OW2online, June 2020
DECODER, a Smarter Environment for DevOps Teams , OW2online, June 2020OW2
 
Enabling DevOps for IoT software development, powered by Open Source, OW2onli...
Enabling DevOps for IoT software development, powered by Open Source, OW2onli...Enabling DevOps for IoT software development, powered by Open Source, OW2onli...
Enabling DevOps for IoT software development, powered by Open Source, OW2onli...OW2
 
Upcoming Challenges in Artificial Intelligence Research and Development, OW2o...
Upcoming Challenges in Artificial Intelligence Research and Development, OW2o...Upcoming Challenges in Artificial Intelligence Research and Development, OW2o...
Upcoming Challenges in Artificial Intelligence Research and Development, OW2o...OW2
 
Cacti and Big Data at Orange France, OW2online, June 2020
Cacti and Big Data at Orange France, OW2online, June 2020Cacti and Big Data at Orange France, OW2online, June 2020
Cacti and Big Data at Orange France, OW2online, June 2020OW2
 

More from OW2 (20)

OW2 and RIOS teaming up to boost the open source impact, Nov. 2022 in Roma
OW2 and RIOS teaming up to boost the open source impact, Nov. 2022 in RomaOW2 and RIOS teaming up to boost the open source impact, Nov. 2022 in Roma
OW2 and RIOS teaming up to boost the open source impact, Nov. 2022 in Roma
 
The Open Source Good Governance Initiative presented at RIOS OS Week, Nov. 20...
The Open Source Good Governance Initiative presented at RIOS OS Week, Nov. 20...The Open Source Good Governance Initiative presented at RIOS OS Week, Nov. 20...
The Open Source Good Governance Initiative presented at RIOS OS Week, Nov. 20...
 
GLPi v.10, les fonctionnalités principales et l'offre cloud
GLPi v.10, les fonctionnalités principales et l'offre cloudGLPi v.10, les fonctionnalités principales et l'offre cloud
GLPi v.10, les fonctionnalités principales et l'offre cloud
 
Centreon: superviser le Cloud et le Legacy à partir d'une même plateforme, po...
Centreon: superviser le Cloud et le Legacy à partir d'une même plateforme, po...Centreon: superviser le Cloud et le Legacy à partir d'une même plateforme, po...
Centreon: superviser le Cloud et le Legacy à partir d'une même plateforme, po...
 
FusionIAM : la gestion des identités et des accés open source
FusionIAM : la gestion des identités et des accés open sourceFusionIAM : la gestion des identités et des accés open source
FusionIAM : la gestion des identités et des accés open source
 
OW2 Association Européenne aux racines grenobloises, transformer l'industrie ...
OW2 Association Européenne aux racines grenobloises, transformer l'industrie ...OW2 Association Européenne aux racines grenobloises, transformer l'industrie ...
OW2 Association Européenne aux racines grenobloises, transformer l'industrie ...
 
SFScon'20 Bringing the User into the Equation
SFScon'20 Bringing the User into the EquationSFScon'20 Bringing the User into the Equation
SFScon'20 Bringing the User into the Equation
 
Towards a sustainable solution to open source sustainability, OW2online20, Ju...
Towards a sustainable solution to open source sustainability, OW2online20, Ju...Towards a sustainable solution to open source sustainability, OW2online20, Ju...
Towards a sustainable solution to open source sustainability, OW2online20, Ju...
 
Advanced proactive and polymorphing cloud application adaptation with MORPHEM...
Advanced proactive and polymorphing cloud application adaptation with MORPHEM...Advanced proactive and polymorphing cloud application adaptation with MORPHEM...
Advanced proactive and polymorphing cloud application adaptation with MORPHEM...
 
Open Source governance and the Eclipse Foundation, OW2online, June 2020
Open Source governance and the Eclipse Foundation, OW2online, June 2020Open Source governance and the Eclipse Foundation, OW2online, June 2020
Open Source governance and the Eclipse Foundation, OW2online, June 2020
 
Open source contribution policies, OW2online, June 2020
Open source contribution policies, OW2online, June 2020Open source contribution policies, OW2online, June 2020
Open source contribution policies, OW2online, June 2020
 
Software development at scale, pandemic lockdown and oss ecosystems, OW2onlin...
Software development at scale, pandemic lockdown and oss ecosystems, OW2onlin...Software development at scale, pandemic lockdown and oss ecosystems, OW2onlin...
Software development at scale, pandemic lockdown and oss ecosystems, OW2onlin...
 
Overview of the OpenChain Reference Tooling Work Group, OW2online20, June 2020
Overview of the OpenChain Reference Tooling Work Group, OW2online20, June 2020Overview of the OpenChain Reference Tooling Work Group, OW2online20, June 2020
Overview of the OpenChain Reference Tooling Work Group, OW2online20, June 2020
 
Open Source Compliance at Orange, OW2online, June 2020
Open Source Compliance at Orange, OW2online, June 2020Open Source Compliance at Orange, OW2online, June 2020
Open Source Compliance at Orange, OW2online, June 2020
 
Ideas, methods and tools for OSS Compliance assessment, OW2online, June 2020
Ideas, methods and tools for OSS Compliance assessment, OW2online, June 2020Ideas, methods and tools for OSS Compliance assessment, OW2online, June 2020
Ideas, methods and tools for OSS Compliance assessment, OW2online, June 2020
 
Intelligent package management with FASTEN, OW2online, June 2020
Intelligent package management with FASTEN, OW2online, June 2020Intelligent package management with FASTEN, OW2online, June 2020
Intelligent package management with FASTEN, OW2online, June 2020
 
DECODER, a Smarter Environment for DevOps Teams , OW2online, June 2020
DECODER, a Smarter Environment for DevOps Teams , OW2online, June 2020DECODER, a Smarter Environment for DevOps Teams , OW2online, June 2020
DECODER, a Smarter Environment for DevOps Teams , OW2online, June 2020
 
Enabling DevOps for IoT software development, powered by Open Source, OW2onli...
Enabling DevOps for IoT software development, powered by Open Source, OW2onli...Enabling DevOps for IoT software development, powered by Open Source, OW2onli...
Enabling DevOps for IoT software development, powered by Open Source, OW2onli...
 
Upcoming Challenges in Artificial Intelligence Research and Development, OW2o...
Upcoming Challenges in Artificial Intelligence Research and Development, OW2o...Upcoming Challenges in Artificial Intelligence Research and Development, OW2o...
Upcoming Challenges in Artificial Intelligence Research and Development, OW2o...
 
Cacti and Big Data at Orange France, OW2online, June 2020
Cacti and Big Data at Orange France, OW2online, June 2020Cacti and Big Data at Orange France, OW2online, June 2020
Cacti and Big Data at Orange France, OW2online, June 2020
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Avoiding the tragedy of the commons: some lessons from the Software Heritage experience, Stefano Zacchiroli, Software heritage, Paris Open Source Summit 2017

  • 1. Software Heritage the Great Library of Free and Open Source Software Stefano Zacchiroli University Paris Diderot & Inria — zack@upsilon.cc 7 December 2017 Paris Open Source Summit THE GREAT LIBRARY OF SOURCE CODE Stefano Zacchiroli Software Heritage 07/12/2017, POSS 1 / 14
  • 2. Software is everywhere Software Stefano Zacchiroli Software Heritage 07/12/2017, POSS 2 / 14
  • 3. Our Software Commons Definition (Commons) The commons is the cultural and natural resources accessible to all members of a society, including natural materials such as air, water, and a habitable earth. These resources are held in common, not owned privately. https://en.wikipedia.org/wiki/Commons Definition (Software Commons) The software commons consists of all computer software which is available at little or no cost and which can be altered and reused with few restrictions. Thus all open source software and all free software are part of the [software] commons. [...] https://en.wikipedia.org/wiki/Software_Commons Stefano Zacchiroli Software Heritage 07/12/2017, POSS 3 / 14
  • 4. Our Software Commons Definition (Commons) The commons is the cultural and natural resources accessible to all members of a society, including natural materials such as air, water, and a habitable earth. These resources are held in common, not owned privately. https://en.wikipedia.org/wiki/Commons Definition (Software Commons) The software commons consists of all computer software which is available at little or no cost and which can be altered and reused with few restrictions. Thus all open source software and all free software are part of the [software] commons. [...] https://en.wikipedia.org/wiki/Software_Commons Source code is a precious part of our commons are we taking care of it? Stefano Zacchiroli Software Heritage 07/12/2017, POSS 3 / 14
  • 5. Software is fragile Like all digital information, FOSS is fragile inconsiderate and/or malicious code loss (e.g., Code Spaces) business-driven code loss (e.g., Gitorious, Google Code) for obsolete code: physical media decay (data rot) Stefano Zacchiroli Software Heritage 07/12/2017, POSS 4 / 14
  • 6. Software is fragile Like all digital information, FOSS is fragile inconsiderate and/or malicious code loss (e.g., Code Spaces) business-driven code loss (e.g., Gitorious, Google Code) for obsolete code: physical media decay (data rot) Where is the archive... where we go if (a repository on) GitHub or GitLab.com goes away? Stefano Zacchiroli Software Heritage 07/12/2017, POSS 4 / 14
  • 7. Software lacks its own research infrastructure A wealth of software research on crucial issues... safety, security, test, verification, proof software engineering, software evolution big data, machine learning, empirical studies Stefano Zacchiroli Software Heritage 07/12/2017, POSS 5 / 14
  • 8. Software lacks its own research infrastructure A wealth of software research on crucial issues... safety, security, test, verification, proof software engineering, software evolution big data, machine learning, empirical studies If you study the stars, you go to Atacama... ... where is the very large telescope of source code? Stefano Zacchiroli Software Heritage 07/12/2017, POSS 5 / 14
  • 9. The Software Heritage Project THE GREAT LIBRARY OF SOURCE CODE Our mission Collect, preserve and share the source code of all the software that is publicly available. Past, present and future Preserving the past, enhancing the present, preparing the future. Stefano Zacchiroli Software Heritage 07/12/2017, POSS 6 / 14
  • 10. We are working on the foundations Stefano Zacchiroli Software Heritage 07/12/2017, POSS 7 / 14
  • 11. Core principles Technology transparency and FOSS replicas all the way down Content source code first no a priori selection intrinsic identifiers facts and provenance Organization non-profit multi-stakeholder Stefano Zacchiroli Software Heritage 07/12/2017, POSS 8 / 14
  • 12. Data flow dsc dsc hg hg hg git git git git svn svn svn tar zip software origins Package repos Software Heritage Archive Forges GitHub lister GitLab lister Debian lister Git loader Mercurial loader Debian source package loader PyPi lister tar loader Merkle DAG + blob storage . . . . . . Distros ... Scheduling Listing (full/incremental) Loading & deduplication Stefano Zacchiroli Software Heritage 07/12/2017, POSS 9 / 14
  • 13. The archive: a (giant) Merkle DAG Stefano Zacchiroli Software Heritage 07/12/2017, POSS 10 / 14
  • 14. Archive coverage Our current sources GitHub Debian, GNU WIP: Gitorious, Google Code, Bitbucket Stefano Zacchiroli Software Heritage 07/12/2017, POSS 11 / 14
  • 15. Archive coverage Our current sources GitHub Debian, GNU WIP: Gitorious, Google Code, Bitbucket 150 TB blobs, 5 TB database (as a graph: 7 B nodes + 60 B edges) Stefano Zacchiroli Software Heritage 07/12/2017, POSS 11 / 14
  • 16. Archive coverage Our current sources GitHub Debian, GNU WIP: Gitorious, Google Code, Bitbucket 150 TB blobs, 5 TB database (as a graph: 7 B nodes + 60 B edges) The richest source code archive already, ... and growing daily! Stefano Zacchiroli Software Heritage 07/12/2017, POSS 11 / 14
  • 17. Roadmap Features... (done) lookup by content hash browsing: "wayback machine" for archived code (done) via Web API (todo) via Web UI (todo) download: wget / git clone from the archive (todo) deposit of source code bundles directly to the archive (todo) provenance lookup for all archived content (todo) full-text search on all archived source code files Stefano Zacchiroli Software Heritage 07/12/2017, POSS 12 / 14
  • 18. Roadmap Features... (done) lookup by content hash browsing: "wayback machine" for archived code (done) via Web API (todo) via Web UI (todo) download: wget / git clone from the archive (todo) deposit of source code bundles directly to the archive (todo) provenance lookup for all archived content (todo) full-text search on all archived source code files ... and much more than one could possibly imagine all the world’s software development history in a single graph! Stefano Zacchiroli Software Heritage 07/12/2017, POSS 12 / 14
  • 19. The Software Heritage community Sponsors Testimonials UNESCO/Inria agreement (April 3rd, 2017) Stefano Zacchiroli Software Heritage 07/12/2017, POSS 13 / 14
  • 20. Conclusion it is now urgent to preserve software source code Software Heritage is taking a systematic approach at it Software Heritage has already assembled the largest archive to date Software Heritage has synergies with cultural, research, and industry needs it is a shared infrastructure that can benefit us all... ... we should collaborate and pool resources to make it so Come in, we’re open! www.softwareheritage.org — learn more www.softwareheritage.org/support/sponsors/ — sponsoring info forge.softwareheritage.org — our own code Thanks! Stefano Zacchiroli Software Heritage 07/12/2017, POSS 14 / 14