Software Heritage
A revolutionary infrastructure for Open Source
Roberto Di Cosmo
June 1st, 2020
OpenChain Webinar
THE GREAT LIBRARY OF SOURCE CODE
Roberto Di Cosmo www.softwareheritage.org Software Heritage: key infrastructure CC-BY 4.0 June 1st, 2020 1 / 12
Outline
1 Introduction
2 Knowing Open Source Software
3 Meet Software Heritage
4 Zoom on selected industry use cases
5 Conclusion
Short Bio: Roberto Di Cosmo
Computer Science professor in Paris, now working at INRIA
30 years of research (Theoretical CS, Programming, Software Engineering; Erdős number: 3)
20 years of Free and Open Source Software
10 years building and directing structures for the common good
1999 DemoLinux – first live GNU/Linux distro
2007 Free Software Thematic Group: 150 members, 40 projects, 200 M€
2015 Software Heritage at INRIA
2018 National Committee for Open Science, France
Reuse is the new rule ... ... KYSW is coming!
Reuse is the new rule
80% to 90% of a new application is ... just reuse! (Sonatype survey, 2017)
Where does reused software come from? Do you know where it comes from?
the software you ship
the software you use
the software you acquire
the software that has that bug
the software that has that vulnerability
KYSW: Know Your SoftWare
Like KYC in banking, KYSW is now essential all over IT
An open approach is needed
Open Data
Open Standards
Open Process
Open Tools
Software Heritage, in a nutshell www.softwareheritage.org
THE GREAT LIBRARY OF SOURCE CODE
Collect, preserve and share the source code of all the software
Preserving our heritage, enabling better software and better science for all
Reference catalog: find and reference all the source code
Universal archive: preserve all the source code
Research infrastructure: enable analysis of all the source code
An international, non-profit initiative built for the long term
Sharing the vision
And many more ...
www.softwareheritage.org/support/testimonials
Donors, members, sponsors
Platinum sponsors
Silver sponsors
Bronze sponsors
Gold sponsor
A dedicated team
Find us at
https://www.softwareheritage.org/people/
The largest software archive, a shared infrastructure
A revolutionary infrastructure for software source code
The graph of Software Development
Snapshots
Releases
Revisions
Directories
Contents
All software development in a single graph ...
a long term archive
preserve open source
ensure access
The blockchain of Software Development
... a single Merkle graph
cryptographic identifiers for SBOMs
trusted traceability
20B+ artifacts already
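To make the "single Merkle graph" idea concrete, here is a minimal sketch of a content-addressed store in which every node (content, directory, revision) is identified by a hash of what it contains. The class name MerkleStore, its methods, and the serialization format are assumptions made for illustration only; they are not the actual Software Heritage data model or its on-disk format.

```python
import hashlib


class MerkleStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}  # identifier -> serialized node

    def _put(self, kind: str, payload: bytes) -> str:
        # The identifier depends only on the node's kind and payload,
        # so identical nodes always get the same identifier.
        node_id = hashlib.sha1(kind.encode() + b"\x00" + payload).hexdigest()
        self.objects.setdefault(node_id, payload)  # stored once, however often added
        return node_id

    def add_content(self, data: bytes) -> str:
        return self._put("content", data)

    def add_directory(self, entries: dict) -> str:
        # entries maps file name -> child identifier; sorted for determinism.
        lines = [f"{name} {child}" for name, child in sorted(entries.items())]
        return self._put("directory", "\n".join(lines).encode())

    def add_revision(self, directory_id: str, parents: list, message: str) -> str:
        lines = [f"directory {directory_id}"]
        lines += [f"parent {p}" for p in parents]
        lines += ["", message]
        return self._put("revision", "\n".join(lines).encode())


store = MerkleStore()
readme = store.add_content(b"hello\n")
main_v1 = store.add_content(b"print('hi')\n")
dir_a = store.add_directory({"README": readme, "main.py": main_v1})
rev1 = store.add_revision(dir_a, [], "initial")

# A fork holding the same files yields the same directory identifier.
dir_b = store.add_directory({"main.py": main_v1, "README": readme})

# Changing one file creates a new content, a new directory and a new revision,
# while the unchanged README is shared between both versions.
main_v2 = store.add_content(b"print('hello')\n")
dir_c = store.add_directory({"README": readme, "main.py": main_v2})
rev2 = store.add_revision(dir_c, [rev1], "update")

print(dir_a == dir_b, len(store.objects))
```

Because each identifier is derived from the identifiers below it, any change to a file propagates up to the directory and revision that contain it, which is what makes such a graph usable for traceability.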
Software Heritage Identifiers (SWHID) link to full docs
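As a small illustration of what a SWHID looks like, the sketch below computes the identifier of a single file's content. It relies on the fact that the identifier of a content object uses the same SHA-1 that Git computes for a blob; the function name content_swhid is an assumption for this example, and the other object types (directories, revisions, releases, snapshots) are not covered here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's content (object type "cnt")."""
    # Same hash as a Git blob: sha1(b"blob <length>\0" + data).
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


# The empty file has the well-known Git blob hash e69de29b...
print(content_swhid(b""))  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The identifier is intrinsic: anyone holding the bytes can recompute it without asking a registry.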
Industry use cases (selection)
Open Source complete and corresponding source code distribution (Intel)
Software Heritage members can:
archive source code in Software Heritage, distribute only the SWHID
Traceability and integrity (OIN for the Linux System Definition)
Software Heritage members can:
archive source code in Software Heritage
track it and verify its integrity using its SWHID
And much more!
provenance/compliance (collaborations with Intel, FossId, CAST, ...)
security (ongoing collaboration, US Department of Commerce)
supply chain management, long term archive
add your use case here
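The "verify its integrity using its SWHID" use case can be sketched as follows for a single file: recompute the content hash locally and compare it with the identifier that was distributed. The function name verify_content is an assumption for this example, and it only handles content ("cnt") identifiers; qualifiers after a semicolon are ignored.

```python
import hashlib


def verify_content(data: bytes, swhid: str) -> bool:
    """Check that data matches a content SWHID of the form swh:1:cnt:<hex>."""
    core = swhid.split(";", 1)[0]  # drop optional qualifiers such as ;origin=...
    parts = core.split(":")
    if len(parts) != 4 or parts[:3] != ["swh", "1", "cnt"]:
        raise ValueError(f"not a version 1 content SWHID: {swhid!r}")
    # Content identifiers use the Git blob hash: sha1(b"blob <length>\0" + data).
    digest = hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
    return digest == parts[3]


EMPTY = "swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
print(verify_content(b"", EMPTY))          # True
print(verify_content(b"tampered", EMPTY))  # False
```

No network access is needed for the check itself: the archive is only needed to retrieve the source code, not to trust it.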
Join the revolution!
www.softwareheritage.org @swheritage
Learn more
SWHIDs https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html
Archive https://archive.softwareheritage.org/
News https://www.softwareheritage.org/blog/
Becoming a member
https://sponsorship.softwareheritage.org
Contact: mailto:sponsor@softwareheritage.org
Automation, and storage
[Architecture diagram]
Software origins: Forges, Distros, Package repos, ...
Listing (full/incremental): GitHub lister, GitLab lister, Debian lister, PyPI lister, ...
Scheduling
Loading & deduplication (git, hg, svn, dsc, tar, zip): Git loader, Mercurial loader, Debian source package loader, tar loader, ...
Software Heritage Archive: Merkle DAG + blob storage
full development history permanently archived!
over 8 billion unique source files from 120+ million origins
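The listing, scheduling and loading steps named on this slide can be sketched as a toy pipeline. Everything here is hypothetical: the example URLs, the FAKE_ORIGINS data, and the function names stand in for real listers and loaders, and the dictionary keyed by hash is only a stand-in for the archive's blob storage.

```python
import hashlib
from collections import deque

# Hypothetical origins and the files they contain (stand-ins for real repositories).
FAKE_ORIGINS = {
    "https://forge.example/alice/project": [b"README\n", b"main.c\n"],
    "https://forge.example/bob/fork": [b"README\n", b"main.c\n", b"extra.c\n"],
    "https://packages.example/project-1.0.tar": [b"README\n"],
}


def forge_lister():
    """Listing: enumerate (origin URL, origin type) pairs found on a forge."""
    yield "https://forge.example/alice/project", "git"
    yield "https://forge.example/bob/fork", "git"


def package_lister():
    """Listing: enumerate origins found in a package repository."""
    yield "https://packages.example/project-1.0.tar", "tar"


def load_origin(url):
    """Stand-in for a type-specific loader: return the files found at an origin."""
    return FAKE_ORIGINS[url]


# One loader per origin type; both types share the same stand-in here.
LOADERS = {"git": load_origin, "tar": load_origin}


def run_pipeline(listers):
    archive = {}     # hash -> file content, each distinct file stored once
    queue = deque()  # scheduling: loading tasks waiting to run
    for lister in listers:
        queue.extend(lister())
    visits = 0
    while queue:
        url, origin_type = queue.popleft()
        for blob in LOADERS[origin_type](url):
            # Deduplication: a file already in the archive is not stored again.
            archive.setdefault(hashlib.sha1(blob).hexdigest(), blob)
        visits += 1
    return archive, visits


archive, visits = run_pipeline([forge_lister, package_lister])
print(visits, "origins visited,", len(archive), "unique files stored")
```

In this toy run, six files are seen across three origins but only three distinct files are stored, which is the effect the "Loading & deduplication" box describes.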

OpenChain Webinar #5: Software Heritage
