Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving Conference 2021-06-16 @phonedude_mln, @WebSciDL
Web Archiving in the Year
eaee1902f186819154789ee22ca30035
Michael L. Nelson
@phonedude_mln
with: Scott Ainsworth, Sawood Alam, Mohamed Aturban, John Berlin, Justin
Brunelle, Kritika Garg, Hussam Hallak, Himarsha Jayanetti, Mat Kelly,
Michele C. Weigle
@WebSciDL
Trust in Web Archives Panel, 2021 Web Archiving Conference
2021-06-16
$ echo "2025" | md5
eaee1902f186819154789ee22ca30035
$ # I read somewhere that hashes
$ # were better than datetime
My Vision for Trustworthy
Web Archiving in 2025
2
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
#Disclaimer: “…both the live Web and the Wayback Machine [...] are reasonably reliable for everyday use”
3
This is doable by 2025. But let’s look
further at the challenges that could stop
us from achieving this goal.
4
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
IA: the Walter Cronkite of web archives?
5
https://www.britannica.com/biography/Walter-Cronkite
https://medium.com/tvnewsanalyzer/visualizing-the-who-and-what-of-cable-tv-news-f51d314b4c2d
Cable news now offers greater diversity, representation, and POV. However, few
anchors offer the gravitas of “Uncle Walter”, “the most trusted man in America”,
and some intentionally deceive.
Are we close to 100s of archives?
IIPC has 60+ members!
6
Members are not 1:1 with archives.
OTOH, there are many archives that are not IIPC members.
We certainly have “dozens” of archives.
Will the number of archives continue to grow?
Maybe not -- innumerable examples point toward
centralization / consolidation
7
https://www.currentware.com/the-state-of-the-web-browser-in-2020/
https://en.wikipedia.org/wiki/Elsevier
https://www.forbes.com/sites/sergeiklebnikov/2019/10/15/faang-facebook-amazon-etc-stocks-have-lagged-this-year-heres-why/
IA has admirably supported the Decentralized Web movement.
https://blog.archive.org/tag/decentralized-web/
But centralization is about economics, not technologies:
DSHR: “Unless decentralized technologies specifically address the issue of how
to avoid increasing returns to scale they will not, of themselves, fix this economic
problem. Their increasing returns to scale will drive layering centralized
businesses on top of decentralized infrastructure, replicating the problem we
face now, just on different infrastructure.”
https://blog.dshr.org/2017/08/why-is-web-centralized.html
8
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
We estimated that ~2/3 of web traffic
is not publicly archivable
9
https://ws-dl.blogspot.com/2018/07/2018-07-18-why-we-need-private-web.html
Tools for archiving the private web exist,
but the practice, at least as we might think of it,
is not yet widespread
10
https://oduwsdl.github.io/nehdhig2017/
https://ws-dl.blogspot.com/2019/09/2019-09-02-so-long-and-thanks-for-all.html
https://replayweb.page/
Commercial private (web) archives largely uninformed by
IIPC, Wayback, Heritrix, pywb, Brozzler et al.
11
https://www.g2.com/products/pagefreezer/competitors/alternatives
Dark web archives :-(
12
$ curl -I https://www.webarchive.org.uk/wayback/archive/20150930064233mp_/http://sigbi.org/
HTTP/1.1 451 Unavailable For Legal Reasons
Server: nginx/1.20.1
Date: Tue, 08 Jun 2021 16:46:14 GMT
Content-Type: text/html
Content-Length: 3947
Connection: keep-alive
$
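The same check can be scripted once the response head has been captured. The following is a minimal sketch, assuming the header text is already in hand (for example, saved from `curl -I`); the function names `parse_status` and `is_dark` are hypothetical and not part of any archive's API. A 451 only suggests that the archive is withholding the capture from public view; it says nothing about why.

```python
# Status code for "Unavailable For Legal Reasons", defined in RFC 7725.
UNAVAILABLE_FOR_LEGAL_REASONS = 451

def parse_status(head: str) -> int:
    """Return the numeric status code from the first line of an HTTP response head."""
    status_line = head.strip().splitlines()[0]
    # A status line looks like: HTTP/1.1 451 Unavailable For Legal Reasons
    return int(status_line.split()[1])

def is_dark(head: str) -> bool:
    """True when the response indicates the capture is not publicly served (451)."""
    return parse_status(head) == UNAVAILABLE_FOR_LEGAL_REASONS

# Response head copied from the curl session above.
head = """HTTP/1.1 451 Unavailable For Legal Reasons
Server: nginx/1.20.1
Date: Tue, 08 Jun 2021 16:46:14 GMT
Content-Type: text/html
"""

print(parse_status(head), is_dark(head))
```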
13
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
Three copies archived at exactly the same time --
What are the chances?!
Actually, there are three copies of the same observation, not three independent observations.
14
$ curl -iLs memgator.cs.odu.edu/timemap/link/https://blog.reidreport.com | grep 20051213063757
<https://webarchive.loc.gov/all/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
<http://archive.md/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
<https://web.archive.org/web/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
It will never be 2005 again, so hosting IA’s WARC files from 2005 is the best we can do.
Going forward, it would be nice to have 3+ independent observations, which could all be
different because of GeoIP, personalization, CDN status, etc.
Then it’s up to the reader to determine if the differences
are semantically meaningful.
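Spotting such shared observations in a TimeMap can be sketched as follows. This is a minimal illustration, assuming link-format TimeMap text like the MemGator output on this slide; the regular expression, the function names, and the "same datetime across archives" heuristic are assumptions for the example. An identical to-the-second datetime is only a hint that archives hold copies of one observation, not proof.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# One link-format entry: <URI-M>; rel="..."; datetime="..."
ENTRY = re.compile(r'<([^>]+)>;\s*rel="([^"]*)";\s*datetime="([^"]+)"')

def group_by_datetime(timemap: str) -> dict[str, list[str]]:
    """Map each memento datetime to the archive hosts that list a memento at it."""
    groups = defaultdict(list)
    for uri_m, rel, dt in ENTRY.findall(timemap):
        if "memento" in rel.split():  # rel may be e.g. "first memento"
            groups[dt].append(urlsplit(uri_m).hostname)
    return dict(groups)

def shared_observations(timemap: str) -> dict[str, list[str]]:
    """Keep only datetimes reported by more than one distinct archive host."""
    return {dt: hosts for dt, hosts in group_by_datetime(timemap).items()
            if len(set(hosts)) > 1}

# The three entries shown on this slide.
SAMPLE = '''
<https://webarchive.loc.gov/all/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
<http://archive.md/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
<https://web.archive.org/web/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
'''

for dt, hosts in shared_observations(SAMPLE).items():
    print(dt, "->", ", ".join(hosts))
```

Deciding whether the flagged mementos really are byte-identical copies would still require comparing the payloads themselves.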
15
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
Homogeneity is not true interoperability
16
https://netpreserveblog.wordpress.com/2020/12/16/openwayback-to-pywb-transition-guide/
http://webarchive.cdlib.org/
https://ws-dl.blogspot.com/2019/09/2019-09-10-where-did-archive-go-part-2.html
I don’t fault the staff who converge on popular, high-quality tech stacks & services,
but I do lament the loss of heterogeneity.
True interoperability comes through the hard work of protocols and standards.
17
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
2017: First published works about
robustness vs. malicious .html/.js?
18
http://labs.rhizome.org/presentations/security.html#/
https://blog.dshr.org/2017/06/wac2017-security-issues-for-web-archives.html
https://acmccs.github.io/papers/p1741-lernerAT3.pdf
https://blog.dshr.org/2017/09/attacking-users-of-wayback-machine.html
Prior to these works, our group (@WebSciDL)
had observed: Zombies (live web leakage into
the archive), Temporal Violations (replaying web
pages that never existed), Cookie Violations,
Twitter replay problems, etc., but we never
considered ingesting malicious .html/.js until
these groundbreaking pubs.
2018: Web IDL & Client-side rewriting
2020: Analysis of attacks on rehosting sites
19
https://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html
https://www.ndss-symposium.org/ndss-paper/melting-pot-of-origins-compromising-the-intermediary-web-services-that-rehost-websites/
I signed off on John’s thesis 3 years ago, but
I’m only now really understanding it.
Key contribution: web archives
as subclass of rehosting sites.
20
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
“No man ever steps in the same river twice, for it's not
the same river and he's not the same man”
21
For third party playback, we are far
from being able to do meaningful
audits: replaying the same archived
page over and over produces
different results.
Left: 1566 archived pages reloaded 39 times over 1 year.
Green=resource loaded,
Gray = resource not loaded,
Black line = baseline download.
https://github.com/oduwsdl/mementos-fixity
Conventional fixity-based
approaches will not work.
https://www.slideshare.net/phonedude/blockchain-can-not-be-used-to-verify-replayed-archived-web-pages-125618706
We can’t depend on the archive for
fixity; archives change and/or die.
Cf. “Where did the archive go?”
(parts 1, 2, 3, 4) &
“Archive Assisted Archival Fixity
Verification Framework”
https://arxiv.org/abs/1905.12565
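A toy sketch of why a single hash over a replayed page is fragile: if one embedded resource fails to load on a later replay, a digest computed over the set of loaded resources changes, even though nothing in the stored capture was altered. Everything here is hypothetical (the URIs, the payload bytes, and the `composite_digest` scheme); it is not the method used in mementos-fixity or in the cited framework.

```python
import hashlib

def composite_digest(loaded: dict[str, bytes]) -> str:
    """Hash a replayed page as the sorted (URI, payload digest) pairs that loaded."""
    h = hashlib.sha256()
    for uri in sorted(loaded):
        h.update(uri.encode("utf-8"))
        h.update(hashlib.sha256(loaded[uri]).digest())
    return h.hexdigest()

# Hypothetical baseline replay: every embedded resource loaded.
baseline = {
    "https://example.org/": b"<html>...</html>",
    "https://example.org/style.css": b"body { margin: 0 }",
    "https://example.org/logo.png": b"\x89PNG...",
}

# Hypothetical later replay: one embedded resource did not load.
later = {uri: body for uri, body in baseline.items()
         if not uri.endswith("logo.png")}

missing = sorted(set(baseline) - set(later))
print("same digest:", composite_digest(baseline) == composite_digest(later))
print("not loaded:", missing)
```

Comparing which resources loaded, rather than one opaque hash, at least tells the reader what differed between replays.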
22
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
That archives don’t ingest the output of other archives
is a lack of interoperability.
That we’re not more concerned about this is a lack of cooperation.
23
https://www.slideshare.net/phonedude/web-archives-at-the-nexus-of-good-fakes-and-flawed-originals/87
Kudos to archive.today for preserving machine-readable
source metadata and including it in the UI
24
n.b. tracking source is built into NNTP, SMTP, Atom, etc.
APIs are necessary but not sufficient.
We must be able to preserve/audit the data (e.g., WARC, HAR) as
rendered through software (e.g., pywb), not just the data.
25
https://github.com/WASAPI-Community/data-transfer-apis
26
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
These apps probably* use HTTP, JSON, etc.,
but what’s their URL? Are they even still web?
27
* I really don’t know (WebRTC?). And if they don’t, that further proves my point.
28
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
More than just Wayback Machines:
we must accommodate any system that supports
rehosting and/or revisions
29
see also: https://www.slideshare.net/ibnesayeed/readying-web-archives-to-consume-and-leverage-web-bundles
Web Archiving in the Year
312351bff07989769097660a56395065
30
$ echo -n "2025" | md5
312351bff07989769097660a56395065
$ # oh no - the hash changed from slide 1
$ # is this content drift?!
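The two commands differ by one byte of input: `echo` appends a trailing newline, while `echo -n` does not. A minimal sketch that reproduces both inputs in Python follows; the helper name `md5_hex` is hypothetical, and the two constants are simply the digests printed on the slides, which the script compares against rather than takes for granted.

```python
import hashlib

def md5_hex(text: str) -> str:
    """Hex MD5 digest of the UTF-8 bytes of text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

with_newline = md5_hex("2025\n")   # bytes that `echo "2025"` pipes to md5
without_newline = md5_hex("2025")  # bytes that `echo -n "2025"` pipes to md5

# Digests as printed on the first and last slides.
SLIDE_1 = "eaee1902f186819154789ee22ca30035"
SLIDE_30 = "312351bff07989769097660a56395065"

print("matches slide 1: ", with_newline == SLIDE_1)
print("matches slide 30:", without_newline == SLIDE_30)
print("digests differ:  ", with_newline != without_newline)
```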
Hundreds of publicly available,
independent, interoperable, robust,
auditable, cooperating web archives.
Can we achieve this by 2025? Yes.
Will we achieve this by 2025? Maybe.
Will we “solve” trust? No.
Technical definitions (e.g., ISO 16363) notwithstanding,
“trust” in web archives might be better understood as analogous to
“relevance” in info retrieval: defined by a user’s information need.

Old Dominion University Computer Science IIPC New Member
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
 


Web Archiving in the Year eaee1902f186819154789ee22ca30035

  • 1. Web Archiving in the Year eaee1902f186819154789ee22ca30035 Web Archiving Conference 2021-06-16 @phonedude_mln, @WebSciDL Web Archiving in the Year eaee1902f186819154789ee22ca30035 Michael L. Nelson @phonedude_mln with: Scott Ainsworth, Sawood Alam, Mohamed Aturban, John Berlin, Justin Brunelle, Kritika Garg, Hussam Hallak, Himarsha Jayanetti, Mat Kelly, Michele C. Weigle @WebSciDL Trust in Web Archives Panel, 2021 Web Archiving Conference 2021-06-16 $ echo "2025" | md5 eaee1902f186819154789ee22ca30035 $ # I read somewhere that hashes $ # were better than datetime
  • 2. My Vision for Trustworthy Web Archiving in 2025: Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives. #Disclaimer: “…both the live Web and the Wayback Machine [...] are reasonably reliable for everyday use”
  • 3. My Vision for Trustworthy Web Archiving in 2025: Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives. This is doable by 2025. But let’s look further at the challenges that could stop us from achieving this goal.
  • 4. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 5. IA: the Walter Cronkite of web archives? https://www.britannica.com/biography/Walter-Cronkite https://medium.com/tvnewsanalyzer/visualizing-the-who-and-what-of-cable-tv-news-f51d314b4c2d Cable news now offers greater diversity, representation, and POV. However, few anchors offer the gravitas of “Uncle Walter”, “the most trusted man in America”, and some intentionally deceive.
  • 6. Are we close to 100s of archives? IIPC has 60+ members! Members are not 1:1 with archives. OTOH, there are many archives that are not IIPC members. We certainly have “dozens” of archives.
  • 7. Will the number of archives continue to grow? Maybe not -- innumerable examples point toward centralization / consolidation https://www.currentware.com/the-state-of-the-web-browser-in-2020/ https://en.wikipedia.org/wiki/Elsevier https://www.forbes.com/sites/sergeiklebnikov/2019/10/15/faang-facebook-amazon-etc-stocks-have-lagged-this-year-heres-why/ IA has admirably supported the Decentralized Web movement. https://blog.archive.org/tag/decentralized-web/ But centralization is about economics, not technologies: DSHR: “Unless decentralized technologies specifically address the issue of how to avoid increasing returns to scale they will not, of themselves, fix this economic problem. Their increasing returns to scale will drive layering centralized businesses on top of decentralized infrastructure, replicating the problem we face now, just on different infrastructure.” https://blog.dshr.org/2017/08/why-is-web-centralized.html
  • 8. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 9. We estimated that ~2/3 of web traffic is not publicly archivable https://ws-dl.blogspot.com/2018/07/2018-07-18-why-we-need-private-web.html
  • 10. Tools for archiving the private web exist, but the practice, at least as we might think of it, is not yet widespread https://oduwsdl.github.io/nehdhig2017/ https://ws-dl.blogspot.com/2019/09/2019-09-02-so-long-and-thanks-for-all.html https://replayweb.page/
  • 11. Commercial private (web) archives largely uninformed by IIPC, Wayback, Heritrix, pywb, Brozzler et al. https://www.g2.com/products/pagefreezer/competitors/alternatives
  • 12. Dark web archives :-( $ curl -I https://www.webarchive.org.uk/wayback/archive/20150930064233mp_/http://sigbi.org/ HTTP/1.1 451 Unavailable For Legal Reasons Server: nginx/1.20.1 Date: Tue, 08 Jun 2021 16:46:14 GMT Content-Type: text/html Content-Length: 3947 Connection: keep-alive $
  • 13. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 14. Three copies archived at exactly the same time -- What are the chances?! Actually, there are three copies of the same observation, not three independent observations. $ curl -iLs memgator.cs.odu.edu/timemap/link/https://blog.reidreport.com | grep 20051213063757 <https://webarchive.loc.gov/all/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT", <http://archive.md/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT", <https://web.archive.org/web/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT", It will never be 2005 again, so hosting IA’s WARC files from 2005 is the best we can do. Going forward, it would be nice to have 3+ independent observations, which could all be different because of GeoIP, personalization, CDN status, etc. Then it’s up to the reader to determine if the differences are semantically meaningful.
  • 15. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 16. Homogeneity is not true interoperability https://netpreserveblog.wordpress.com/2020/12/16/openwayback-to-pywb-transition-guide/ http://webarchive.cdlib.org/ https://ws-dl.blogspot.com/2019/09/2019-09-10-where-did-archive-go-part-2.html I don’t fault the staff who converge on popular, high-quality tech stacks & services, but I do lament the loss of heterogeneity. True interoperability comes through the hard work of protocols and standards.
  • 17. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 18. 2017: First published works about robustness vs. malicious .html/.js? http://labs.rhizome.org/presentations/security.html#/ https://blog.dshr.org/2017/06/wac2017-security-issues-for-web-archives.html https://acmccs.github.io/papers/p1741-lernerAT3.pdf https://blog.dshr.org/2017/09/attacking-users-of-wayback-machine.html Prior to these works, our group (@WebSciDL) had observed: Zombies (live web leakage into the archive), Temporal Violations (replaying web pages that never existed), Cookie Violations, Twitter replay problems, etc., but we never considered ingesting malicious .html/.js until these groundbreaking pubs.
  • 19. 2018: Web IDL & Client-side rewriting 2020: Analysis of attacks on rehosting sites https://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html https://www.ndss-symposium.org/ndss-paper/melting-pot-of-origins-compromising-the-intermediary-web-services-that-rehost-websites/ I signed off on John’s thesis 3 years ago, but I’m only now really understanding it. Key contribution: web archives as subclass of rehosting sites.
  • 20. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 21. “No man ever steps in the same river twice, for it's not the same river and he's not the same man” For third party playback, we are far from being able to do meaningful audits: replaying the same archived page over and over produces different results. Left: Reload 1566 archived pages 39 times over 1 year. Green=resource loaded, Gray = resource not loaded, Black line = baseline download. https://github.com/oduwsdl/mementos-fixity Conventional fixity-based approaches will not work. https://www.slideshare.net/phonedude/blockchain-can-not-be-used-to-verify-replayed-archived-web-pages-125618706 We can’t depend on the archive for fixity; archives change and/or die. Cf. “Where did the archive go?” (parts 1, 2, 3, 4) & “Archive Assisted Archival Fixity Verification Framework” https://arxiv.org/abs/1905.12565
  • 22. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 23. That archives don’t ingest the output of other archives is a lack of interoperability. That we’re not more concerned about this is a lack of cooperation. https://www.slideshare.net/phonedude/web-archives-at-the-nexus-of-good-fakes-and-flawed-originals/87
  • 24. Kudos to archive.today for preserving machine-readable source metadata and including it in the UI n.b. tracking source is built-in to NNTP, SMTP, Atom, etc.
  • 25. APIs are necessary but not sufficient. We must be able to preserve/audit the data (e.g., WARC, HAR) as rendered through software (e.g., pywb), not just the data. https://github.com/WASAPI-Community/data-transfer-apis
  • 26. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 27. These apps probably* use HTTP, json, etc., but what’s their URL? Are they even still web? * I really don’t know (WebRTC?). And if they don’t, that further proves my point.
  • 28. Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives.
  • 29. More than just Wayback Machines: we must accommodate any system that supports rehosting and/or revisions see also: https://www.slideshare.net/ibnesayeed/readying-web-archives-to-consume-and-leverage-web-bundles
  • 30. Web Archiving in the Year 312351bff07989769097660a56395065 $ echo -n "2025" | md5 312351bff07989769097660a56395065 $ # oh no - the hash changed from slide 1 $ # is this content drift?! Hundreds of publicly available, independent, interoperable, robust, auditable, cooperating web archives. Can we achieve this by 2025? Yes. Will we achieve this by 2025? Maybe. Will we “solve” trust? No. Technical definitions (e.g., ISO 16363) notwithstanding, “trust” in web archives might be better understood as analogous to “relevance” in info retrieval: defined by a user’s information need.
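
The two different hashes on slides 1 and 30 most likely come from the trailing newline: `echo "2025"` emits the digits plus a newline, while `echo -n "2025"` emits the digits alone. A minimal Python sketch of that difference follows; it uses `hashlib` in place of the `md5` command shown on the slides, and the helper name `md5_hex` is only illustrative.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hex digest of the UTF-8 encoding of text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# What `echo "2025"` pipes out: the four digits plus a trailing newline.
with_newline = md5_hex("2025\n")

# What `echo -n "2025"` pipes out: the four digits only.
without_newline = md5_hex("2025")

print("with newline:   ", with_newline)
print("without newline:", without_newline)
print("digests differ: ", with_newline != without_newline)
```

One invisible byte is enough to change the whole digest, which is the joke behind "is this content drift?!".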
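
Slide 14 spots the shared observation by grepping a TimeMap for one timestamp. The same check can be sketched in Python: parse the link-format entries and group archive hosts by memento datetime. This is an illustrative sketch, not MemGator's own tooling. The regular expression assumes `rel` precedes `datetime` as in the slide's output, the function name `shared_datetimes` is hypothetical, and an identical datetime across archives is only a hint of a copied capture, not proof.

```python
import re
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit

# Matches entries shaped like: <URI>; rel="memento"; datetime="..."
LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]*)";\s*datetime="([^"]+)"')


def shared_datetimes(timemap: str) -> Dict[str, List[str]]:
    """Map each memento datetime to the archive hosts reporting it,
    keeping only datetimes reported by more than one host."""
    hosts_by_datetime = defaultdict(set)
    for uri, rel, datetime_value in LINK_RE.findall(timemap):
        if "memento" in rel.split():
            hosts_by_datetime[datetime_value].add(urlsplit(uri).netloc)
    return {
        dt: sorted(hosts)
        for dt, hosts in hosts_by_datetime.items()
        if len(hosts) > 1
    }


# The three TimeMap entries quoted on slide 14.
SAMPLE = """
<https://webarchive.loc.gov/all/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
<http://archive.md/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
<https://web.archive.org/web/20051213063757/http://blog.reidreport.com/>; rel="memento"; datetime="Tue, 13 Dec 2005 06:37:57 GMT",
"""

result = shared_datetimes(SAMPLE)
for dt, hosts in result.items():
    print(dt, "->", ", ".join(hosts))
```

Deciding whether such a group is one observation held in three places or three independent captures still needs provenance metadata from the archives themselves.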
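
Slide 21's claim that conventional fixity breaks down under replay can be illustrated with a toy example. The sketch below hashes a replayed page as the set of resources that actually loaded; it is not the method used in the mementos-fixity study, and the function name `composite_digest`, the URLs, and the bodies are all made up for illustration.

```python
import hashlib
from typing import Dict


def composite_digest(resources: Dict[str, bytes]) -> str:
    """Hash a replayed page as the sorted (URL, body digest) pairs that loaded."""
    digest = hashlib.sha256()
    for url in sorted(resources):
        digest.update(url.encode("utf-8"))
        digest.update(b"\0")  # separator between the URL and its body digest
        digest.update(hashlib.sha256(resources[url]).digest())
    return digest.hexdigest()


# Two hypothetical replays of the same archived page.
replay_1 = {
    "https://example.org/": b"<html>...</html>",
    "https://example.org/style.css": b"body{}",
}
# Second replay: the stylesheet failed to load this time.
replay_2 = {
    "https://example.org/": b"<html>...</html>",
}

stable = composite_digest(replay_1) == composite_digest(replay_1)
changed = composite_digest(replay_1) != composite_digest(replay_2)
print("same replay hashes the same:     ", stable)
print("missing resource changes the hash:", changed)
```

Under these assumptions a single resource that fails to load on one replay yields a different digest even though nothing in the archive was altered, so a mismatch alone cannot distinguish tampering from ordinary replay variance.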