The goal of the Software Heritage project is to collect, preserve, and share all publicly available software in source code form. Forever.
By doing so Software Heritage will serve the needs of: Society, by preserving our collective technological heritage; Industry, by building the largest software provenance open database; Science, by assembling the largest curated archive for software research; and Education, by creating the ultimate anthology for programming curricula.
Although still in Beta, Software Heritage has already archived more than 2.5 billion unique source code files and 600 million unique commits, spanning more than 20 million projects from major software development hubs, GNU/Linux distributions, and upstream software collections.
Software Heritage is developed transparently as a collaborative project and all its own source code is available as Free/Open Source Software. Currently incubated by Inria, the project will graduate soon to an independent charitable, nonprofit organization.
Software Heritage: Building the Universal Software Archive, OW2con'16, Paris.
1. Software Heritage
Building the Universal Software Archive
Stefano Zacchiroli
Inria & University Paris Diderot
zack@softwareheritage.org
21 September 2016
OW2con
Paris, France
Stefano Zacchiroli Software Heritage 21/09/2016, OW2con 1 / 9
3. Software is pervasive
At the heart of our society
Software
communication, entertainment
administration, finance
health, energy, transportation
education, research, politics
...
At the heart of technology
house appliances ≈ 10M SLOC
phones ≈ 20M SLOC, cars ≈ 100M SLOC
Internet of things, ...
6. Software is knowledge
Key mediator for accessing all information
Information is a main pillar of our modern societies.
Absent an ability to correctly interpret digital
information, we are left with [...] "rotting
bits" [...] of no value.
Vinton G. Cerf IEEE 2011
Essential component of modern scientific research
[...] the vast majority describe experimental
methods or software that have become essential
in their fields.
Top 100 papers (Nature, October 2014)
Bottom line: software embodies our Knowledge and Cultural Heritage
It must be collected, preserved, referenced and made accessible!
8. Software is fragile
like all digital information, FOSS is fragile
inconsiderate and/or malicious code loss (e.g., Code Spaces)
business-driven code loss (e.g., Gitorious, Google Code)
for obsolete code: physical media decay (data rot)
If a website disappears you go to the Internet Archive...
... where do you go if (a repository on) GitHub goes away?
9. Mission
Collect, organise, preserve and share all the software source code that
lies at the heart of our culture and our society.
https://www.softwareheritage.org/
12. The archive
Our sources
GitHub — all public repositories as of August 2016
Debian — daily snapshots of all suites from 2005 to 2015
GNU — all releases as of August 2015
Gitorious — retrieved full mirror from Archive Team
Google Code — retrieved full mirror from Google
Some numbers
The richest source code archive already, ... and growing daily!
14. The road ahead
Planned features...
lookup by hashes for contents (done)
download: git clone from Software Heritage
provenance information for all the content
browsing: wayback machine for software source code
full text search: dive into the Software Heritage archive
... and much more one could possibly imagine
all the world’s software development history in a single graph!
that makes a 150TB archive / 5TB database already...
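As an illustration of "lookup by hashes for contents", here is a minimal sketch of a content-addressed index. It assumes the Git blob hashing scheme (SHA-1 over a "blob <size>\0" header followed by the raw bytes); the slides do not say which hash functions the archive uses, and `ContentIndex` is a hypothetical toy class, not the project's API.

```python
import hashlib
from typing import Dict, Optional


def git_blob_sha1(content: bytes) -> str:
    """Return the Git blob identifier (hex SHA-1) of raw file content."""
    # Git hashes a short header ("blob <size>\0") followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ContentIndex:
    """Hypothetical in-memory index that stores each unique content once."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        """Store content under its hash; identical files share one entry."""
        key = git_blob_sha1(content)
        self._by_hash.setdefault(key, content)
        return key

    def lookup(self, key: str) -> Optional[bytes]:
        """Return the content for a hash, or None if it is not archived."""
        return self._by_hash.get(key)

    def __len__(self) -> int:
        return len(self._by_hash)


if __name__ == "__main__":
    index = ContentIndex()
    first = index.add(b"int main(void) { return 0; }\n")
    second = index.add(b"int main(void) { return 0; }\n")  # same file, other project
    print(first == second, len(index))
    print(index.lookup(first) is not None)
```

Because the key depends only on the bytes, the same file found in many projects is stored once, which is what makes counting "unique" source files possible under this assumption.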
16. Making it happen
Inria as initiator
funds the bootstrap phase of Software Heritage
an agreement with [...] is coming soon!
Testimonials and early partners
ACM, Bell Labs, Creative Commons, DANS, Eclipse, Engineering,
FSF, OSI, GitHub, GitLab, IEEE, Informatics Europe, Microsoft, OIN,
OW2, SIF, SFC, SFLC, The Document Foundation, The Linux
Foundation, ...
Going global
building an open, multistakeholder, nonprofit organisation
18. Conclusion
Software Heritage is
a revolutionary reference archive of all software ever written
a fantastic new tool for research software
an international, open, nonprofit, mutualized infrastructure
at the service of our community, at the service of society!
Now open
www.softwareheritage.org — sponsoring, partnerships
wiki.softwareheritage.org — working groups, leads
forge.softwareheritage.org — our own code
Questions?