SlideShare a Scribd company logo
1 of 20
Download to read offline
Readying Web Archives to Consume
and Leverage Web Bundles
Sawood Alam1
, Michele C. Weigle2
, Michael L. Nelson2
,
Martin Klein3
, and Herbert Van de Sompel4
1
Internet Archive, San Francisco, California, USA
2
Old Dominion University, Norfolk, Virginia, USA
3
Los Alamos National Laboratory, New Mexico, USA
4
Data Archiving and Networked Services, Netherlands
@ibnesayeed
IIPC Web Archiving Conference, June 15, 2021
@ibnesayeed 2
https://github.com/WICG/webpackage#explainers
Bundled HTTP Exchanges may or may not
be signed by their corresponding origins
Web Bundles
Resources from:
Origin A
Origin B
Origin C
Web Browser
Bundled resources are
treated as if they
are coming from their
corresponding origins
A B C
Content-Type: application/webbundle
dist.example.com
@ibnesayeed 3
https://www.slideshare.net/ibnesayeed/web-archive-warc-file-format
Web Bundles in Terms of WARC and CDX
Web Bundle
WARC File
CDX File
Manifest
Web Bundles are optimized for performance, not
for the long-term preservation with provenance
● Uses CBOR format for HTTP Response
Message storage
● Includes an entrypoint URI
● Lossy (e.g., HTTP headers with the same name
are concatenated)
● No provision for provenance information
@ibnesayeed 4
https://datatracker.ietf.org/doc/html/draft-yasskin-wpack-use-cases-01.txt#section-2.2.11
Web Packaging Use Cases
● Essential
○ Offline installation
○ Offline browsing
○ Save and share a web page
○ Privacy-preserving prefetch
● Nice-to-have
○ Packaged Web Publications
○ Avoiding Censorship
○ Third-party security review
○ Building packages from multiple libraries
○ Cross-CDN Serving
○ Pre-installed applications
○ Protecting Users from a Compromised Frontend
○ Installation from a self-extracting executable
○ Packages in version control
○ Subresource bundling
○ Archival
@ibnesayeed
Crawling JavaScript-driven Deferred Representations
5
PhantomJS discovers 75% more resources than Heritrix, but crawls 12 times slower
Web Bundles have the potential to
enable efficient and coherent crawling
https://arxiv.org/abs/1508.02315
@ibnesayeed 6
● Crawlers can content negotiate to prefer Web Bundles, when present
○ Server has likely bundled resources that would be difficult to discover
○ A single transaction downloads multiple resources (efficient politeness)
● Media type “application/webbundle” needs to be decomposed for preservation in
WARC records and indexing
Servers will likely continue to serve
non-bundled resources for years to come
Request, Decompose, and Index Bundled Resources
https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API
@ibnesayeed
Mementos That Never Existed on the Live Web
Temporal Violations
7
Live leakage (Zombies)
Origin Violations
https://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
https://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
https://ws-dl.blogspot.com/2019/03/2019-03-18-cookie-violations-cause.html
Web Packaging
has the potential to
eliminate these issues
Cookie Violations
@ibnesayeed
An HTML Page With External Style Sheet and Image
8
https://www.cs.odu.edu/~salam/dweb/
@ibnesayeed
Archival Replay: Server-side Rewriting
9
https://web.archive.org/web/20190716035757/https://www.cs.odu.edu/~salam/dweb/
URI references (href, src, and srcset etc.) are rewritten to point to their archived versions at a nearby time
@ibnesayeed
Archival Replay: Client-side Rerouting
10
https://oduwsdl.github.io/Reconstructive/
http://ws-dl.blogspot.com/2018/01/2018-01-08-introducing-reconstructive.html
● Avoids zombies (live-leakage)
● Adds an unobtrusive archival banner
(using Custom HTML Element)
@ibnesayeed
Archival Replay: Proxy-based Rerouting
11
http://oldweb.today/chrome/20190715110000/https://www.cs.odu.edu/~salam/dweb/
● A web browser runs
on a remote host
● Configured to use a
web archive proxy
● Accessed via VNC
● Does not scale well
Web Bundles can
enable archival
replay from original
URIs locally without
a replay proxy
@ibnesayeed
Bundled Archival Replay in Portals
12
https://web.dev/hands-on-portals/
2020-07-09
Portal is an experimental HTML element, similar to iframe, but it allows
seamless transition to become the main context when activated
@ibnesayeed
Memento TimeGate
13
$ curl -I https://www.w3.org/wiki/Main_Page
HTTP/2 200
date: Tue, 16 Jul 2019 03:16:01 GMT
link: <https://www.w3.org/wiki/Main_Page>; rel="original latest-version",
<https://www.w3.org/wiki/Special:TimeGate/Main_Page>; rel="timegate",
<https://www.w3.org/wiki/Special:TimeMap/Main_Page>; rel="timemap"; type="application/link-format"; from="Thu, 01 Jan 1970 00:00:00 GMT"; until="Fri, 16 Nov 2018 19:10:23 GMT",
<https://www.w3.org/wiki/index.php?title=Main_Page&oldid=30366>; rel="first memento"; datetime="Thu, 01 Jan 1970 00:00:00 GMT",
<https://www.w3.org/wiki/index.php?title=Main_Page&oldid=108148>; rel="last memento"; datetime="Fri, 16 Nov 2018 19:10:23 GMT"
content-language: en
vary: Accept-Encoding,Cookie
cache-control: s-maxage=18000, must-revalidate, max-age=0
last-modified: Mon, 15 Jul 2019 22:16:01 GMT
content-type: text/html; charset=UTF-8
$ curl -I -H "Accept-Datetime: Sat, 20 Dec 2014 12:30:00 GMT" https://www.w3.org/wiki/Special:TimeGate/Main_Page
HTTP/2 302
date: Tue, 16 Jul 2019 03:16:21 GMT
vary: Accept-Encoding,Cookie,Accept-Datetime
location: https://www.w3.org/wiki/index.php?title=Main_Page&oldid=80125
link: <https://www.w3.org/wiki/Special:TimeMap/Main_Page>; rel="timemap"; type="application/link-format"; from="Thu, 01 Jan 1970 00:00:00 GMT"; until="Fri, 16 Nov 2018 19:10:23 GMT",
<https://www.w3.org/wiki/index.php?title=Main_Page&oldid=30366>; rel="first memento"; datetime="Thu, 01 Jan 1970 00:00:00 GMT",
<https://www.w3.org/wiki/index.php?title=Main_Page&oldid=108148>; rel="last memento"; datetime="Fri, 16 Nov 2018 19:10:23 GMT",
<https://www.w3.org/wiki/Main_Page>; rel="original latest-version"
content-type: text/html; charset=UTF-8
$ curl -I "https://www.w3.org/wiki/index.php?title=Main_Page&oldid=80125"
HTTP/2 200
date: Tue, 16 Jul 2019 03:38:35 GMT
x-content-type-options: nosniff
memento-datetime: Sat, 20 Dec 2014 11:34:08 GMT
link: <https://www.w3.org/wiki/Main_Page>; rel="original latest-version",<https://www.w3.org/wiki/Special:TimeGate/Main_Page>;
rel="timegate",<https://www.w3.org/wiki/Special:TimeMap/Main_Page>; rel="timemap"; type="application/link-format"; from="Thu, 01 Jan 1970 00:00:00 GMT"; until="Fri, 16 Nov 2018
19:10:23 GMT",<https://www.w3.org/wiki/index.php?title=Main_Page&oldid=30366>; rel="first memento"; datetime="Thu, 01 Jan 1970 00:00:00
GMT",<https://www.w3.org/wiki/index.php?title=Main_Page&oldid=108148>; rel="last memento"; datetime="Fri, 16 Nov 2018 19:10:23 GMT"
content-language: en
vary: Accept-Encoding,Cookie
expires: Thu, 01 Jan 1970 00:00:00 GMT
cache-control: private, must-revalidate, max-age=0
content-type: text/html; charset=UTF-8
https://tools.ietf.org/html/rfc7089
Native TimeGate support in
Exchange Loading would be great
Web archives provide
third-party generic
TimeGate resources
Currently, Web Bundles prefer the
most recent version of a resource
@ibnesayeed 14
https://developer.mozilla.org/en-US/docs/Web/API/Cache
● Distributor provides a custom cache namespace with Bundles to stash them in
● More than one related Bundles can have the same namespace
● A configurable policy determines the behavior of cache utilization for Loading
● Archive origins configure Loading policy to only load resources from bundles they
delivered to prevent loading zombie resources
Enable sandboxing via namespaced
caches for security and coherence
Namespaced Cache for Temporal Coherence
Namespaced Bundle Cache
20190718154532
Content-Type: application/webbundle
Bundle-Cache-Name: 20190718154532
Exchange-Loading-Policy: bundle-only
@ibnesayeed 15
● If a needed resources is not in a bundle or any caches of the browser, it causes
live-leakage
● This allows publishers to create partial Web Bundles for distribution while serving
resources like analytics, trackers, ads, etc. live
● A fallback catch-all route would be helpful in preventing Zombies in archives, but
give too much power to distributors
Introduce wildcard routes or Service Workers for
namespaced caches to handle missing resources
Catch-All Route to Prevent Zombies
https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API
@ibnesayeed
Web Archives Currently Lack Technical Means to
Prove Fixity and Non-repudiation
16
https://twitter.com/Jamie_Maz/status/936349041264414721
http://blog.archive.org/2018/04/24/addressing-recent-claims-of-manipulated-blog-posts-in-the-wayback-machine/
https://ws-dl.blogspot.com/2018/04/2018-04-24-why-we-need-multiple-web.html
https://arxiv.org/abs/1905.12565
● Joy Ann Reid claimed copies of her blog in the Internet Archive has been hacked
● The Internet Archive publicly denied it
● We investigated the matter using multiple web archives and concluded Reid’s
claim to be very unlikely
There is strong need of verifiable web archiving
@ibnesayeed 17
http://www.certificate-transparency.org/
● Signed exchanges contain “Date” response header
● An archive returns “Memento-Datetime” header when delivering an HTTP Bundle
● Validate signature in the context of that historical time
○ The difference in the two times should be within an acceptable margin
● Utilize means like Certificate Transparency Logs to establish temporal validation
A means to establish that the signature would have
been “temporally valid” at a given time in the past
Temporal Validation of Signed Exchanges
@ibnesayeed 18
● The origin will likely change (specs are not stable yet)
● Loses the benefits of replaying archived pages as the original domain
● Benefits from faster replay and better caching of popular mementos
http://www.certificate-transparency.org/
Archives can still create Web Bundles with
rewritten URIs to replay in the traditional fashion
Archival Replay With Unsigned Exchanges
@ibnesayeed
Native Memento Support and Acknowledgement
19
https://docs.google.com/presentation/d/1YAQl_1sPH25ZdAiEPhE5cqj8zMomLBKDQWzPS5Avw5s/edit
https://www.slideshare.net/mweigle/enabling-personal-use-of-web-archives
● Visually acknowledge
presence of
“Memento-Datetime”
header
● Surface archival
metadata from “Link”
header
● Enable necessary
security features for
the archived web
which is read-only
● Prevent live-leakage
@ibnesayeed
Conclusions
20
Web Packaging has a unique opportunity to devise a technology that supports web archiving
and provides a much needed capability to verify the integrity of archived web resources
Effective Bundled HTTP Exchanges Efficient and coherent crawling
Temporally coherent replay
TimeGate/Named Cache for Loading
Temporal validation of Signed Exchanges Long-term fixity and non-repudiation
https://arxiv.org/abs/1906.07104
Native Memento support in web browsers Visual archived resource identification

More Related Content

What's hot

A Framework for Verifying the Fixity of Archived Web Resources
A Framework for Verifying the Fixity of Archived Web ResourcesA Framework for Verifying the Fixity of Archived Web Resources
A Framework for Verifying the Fixity of Archived Web Resourcesmaturban
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptMichael Nelson
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingMichael Nelson
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsJustin Brunelle
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Michael Nelson
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesYasmin AlNoamany, PhD
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesMichael Nelson
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingMichael Nelson
 
URI Disambiguation in the Context of Linked Data
URI Disambiguation in the Context of Linked DataURI Disambiguation in the Context of Linked Data
URI Disambiguation in the Context of Linked Databutest
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationMartin Klein
 
Persistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DonePersistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DoneHerbert Van de Sompel
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Mat Kelly
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 
Establishing and Verifying Fixity of Archived Web Pages
Establishing and Verifying Fixity of Archived Web PagesEstablishing and Verifying Fixity of Archived Web Pages
Establishing and Verifying Fixity of Archived Web Pagesmaturban
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
21st Century Archival Appraisal
21st Century Archival Appraisal21st Century Archival Appraisal
21st Century Archival AppraisalArian Ravanbakhsh
 
Making sense of users' Web activities
Making sense of users' Web activitiesMaking sense of users' Web activities
Making sense of users' Web activitiesMathieu d'Aquin
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Herbert Van de Sompel
 

What's hot (20)

A Framework for Verifying the Fixity of Archived Web Resources
A Framework for Verifying the Fixity of Archived Web ResourcesA Framework for Verifying the Fixity of Archived Web Resources
A Framework for Verifying the Fixity of Archived Web Resources
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web Archives
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
 
URI Disambiguation in the Context of Linked Data
URI Disambiguation in the Context of Linked DataURI Disambiguation in the Context of Linked Data
URI Disambiguation in the Context of Linked Data
 
The Web We Want
The Web We WantThe Web We Want
The Web We Want
 
Creating Pockets of Persistence
Creating Pockets of PersistenceCreating Pockets of Persistence
Creating Pockets of Persistence
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly Communication
 
Persistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DonePersistent Identification: Easier Said than Done
Persistent Identification: Easier Said than Done
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 
Establishing and Verifying Fixity of Archived Web Pages
Establishing and Verifying Fixity of Archived Web PagesEstablishing and Verifying Fixity of Archived Web Pages
Establishing and Verifying Fixity of Archived Web Pages
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
 
21st Century Archival Appraisal
21st Century Archival Appraisal21st Century Archival Appraisal
21st Century Archival Appraisal
 
Making sense of users' Web activities
Making sense of users' Web activitiesMaking sense of users' Web activities
Making sense of users' Web activities
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013
 

Similar to Readying Web Archives to Consume and Leverage Web Bundles

Time -Travel on the Internet
Time -Travel on the InternetTime -Travel on the Internet
Time -Travel on the InternetIRJET Journal
 
Keeping Web Records Lg Web Network August 2009
Keeping Web Records Lg Web Network August 2009Keeping Web Records Lg Web Network August 2009
Keeping Web Records Lg Web Network August 2009Cassie Findlay
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
Creating an Effective Mobile API
Creating an Effective Mobile API Creating an Effective Mobile API
Creating an Effective Mobile API Nick DeNardis
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesMichael Nelson
 
Emerging storage-trends-for-containers
Emerging storage-trends-for-containersEmerging storage-trends-for-containers
Emerging storage-trends-for-containerskiran mova
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Artefactual Systems - AtoM
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesMichael Nelson
 
The Case for HTTP/2
The Case for HTTP/2The Case for HTTP/2
The Case for HTTP/2Andy Davies
 
Piggy Bank
Piggy BankPiggy Bank
Piggy BankAinY
 
Your browser, your storage (extended version)
Your browser, your storage (extended version)Your browser, your storage (extended version)
Your browser, your storage (extended version)Francesco Fullone
 
Web performance optimization for modern web applications
Web performance optimization for modern web applicationsWeb performance optimization for modern web applications
Web performance optimization for modern web applicationsChris Love
 
MS PowerPoint format
MS PowerPoint formatMS PowerPoint format
MS PowerPoint formatwebhostingguy
 
Engaging Virtual Communities: Web 2.0
Engaging Virtual Communities: Web 2.0Engaging Virtual Communities: Web 2.0
Engaging Virtual Communities: Web 2.0lisbk
 
Web 2.0: What Is It, How Can I Use It, How Can I Deploy It?
Web 2.0: What Is It, How Can I Use It, How Can I Deploy It?Web 2.0: What Is It, How Can I Use It, How Can I Deploy It?
Web 2.0: What Is It, How Can I Use It, How Can I Deploy It?lisbk
 

Similar to Readying Web Archives to Consume and Leverage Web Bundles (20)

Time -Travel on the Internet
Time -Travel on the InternetTime -Travel on the Internet
Time -Travel on the Internet
 
Keeping Web Records Lg Web Network August 2009
Keeping Web Records Lg Web Network August 2009Keeping Web Records Lg Web Network August 2009
Keeping Web Records Lg Web Network August 2009
 
URL Design
URL DesignURL Design
URL Design
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital Preservation
 
Creating an Effective Mobile API
Creating an Effective Mobile API Creating an Effective Mobile API
Creating an Effective Mobile API
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
Emerging storage-trends-for-containers
Emerging storage-trends-for-containersEmerging storage-trends-for-containers
Emerging storage-trends-for-containers
 
Always on! ... or not?
Always on! ... or not?Always on! ... or not?
Always on! ... or not?
 
your browser, your storage
your browser, your storageyour browser, your storage
your browser, your storage
 
your browser, my storage
your browser, my storageyour browser, my storage
your browser, my storage
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
Cache is king
Cache is kingCache is king
Cache is king
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
The Case for HTTP/2
The Case for HTTP/2The Case for HTTP/2
The Case for HTTP/2
 
Piggy Bank
Piggy BankPiggy Bank
Piggy Bank
 
Your browser, your storage (extended version)
Your browser, your storage (extended version)Your browser, your storage (extended version)
Your browser, your storage (extended version)
 
Web performance optimization for modern web applications
Web performance optimization for modern web applicationsWeb performance optimization for modern web applications
Web performance optimization for modern web applications
 
MS PowerPoint format
MS PowerPoint formatMS PowerPoint format
MS PowerPoint format
 
Engaging Virtual Communities: Web 2.0
Engaging Virtual Communities: Web 2.0Engaging Virtual Communities: Web 2.0
Engaging Virtual Communities: Web 2.0
 
Web 2.0: What Is It, How Can I Use It, How Can I Deploy It?
Web 2.0: What Is It, How Can I Use It, How Can I Deploy It?Web 2.0: What Is It, How Can I Use It, How Can I Deploy It?
Web 2.0: What Is It, How Can I Use It, How Can I Deploy It?
 

More from Sawood Alam

TrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web PagesTrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web PagesSawood Alam
 
CDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection InsightsCDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection InsightsSawood Alam
 
Video Archiving and Playback in the Wayback Machine
Video Archiving and Playback in the Wayback MachineVideo Archiving and Playback in the Wayback Machine
Video Archiving and Playback in the Wayback MachineSawood Alam
 
Profiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento RoutingProfiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento RoutingSawood Alam
 
MementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination FrameworkMementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination FrameworkSawood Alam
 
Web ARChive (WARC) File Format
Web ARChive (WARC) File FormatWeb ARChive (WARC) File Format
Web ARChive (WARC) File FormatSawood Alam
 
MemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in GoMemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in GoSawood Alam
 
Dockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationDockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationSawood Alam
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerSawood Alam
 
Client-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorkerClient-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorkerSawood Alam
 
TPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive ProfilingTPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive ProfilingSawood Alam
 
Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupSawood Alam
 
InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web ArchivesInterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web ArchivesSawood Alam
 
Web Archive Profiling Through Fulltext Search
Web Archive Profiling Through Fulltext SearchWeb Archive Profiling Through Fulltext Search
Web Archive Profiling Through Fulltext SearchSawood Alam
 
JCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingJCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingSawood Alam
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionSawood Alam
 
TPDL 2015 - Profiling Web Archives
TPDL 2015 - Profiling Web ArchivesTPDL 2015 - Profiling Web Archives
TPDL 2015 - Profiling Web ArchivesSawood Alam
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web ArchivesSawood Alam
 
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...Sawood Alam
 
Profile Serialization IIPC GA 2015
Profile Serialization IIPC GA 2015Profile Serialization IIPC GA 2015
Profile Serialization IIPC GA 2015Sawood Alam
 

More from Sawood Alam (20)

TrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web PagesTrendMachine: Temporal Resilience of Web Pages
TrendMachine: Temporal Resilience of Web Pages
 
CDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection InsightsCDX Summary: Web Archival Collection Insights
CDX Summary: Web Archival Collection Insights
 
Video Archiving and Playback in the Wayback Machine
Video Archiving and Playback in the Wayback MachineVideo Archiving and Playback in the Wayback Machine
Video Archiving and Playback in the Wayback Machine
 
Profiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento RoutingProfiling Web Archival Voids for Memento Routing
Profiling Web Archival Voids for Memento Routing
 
MementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination FrameworkMementoMap: An Archive Profile Dissemination Framework
MementoMap: An Archive Profile Dissemination Framework
 
Web ARChive (WARC) File Format
Web ARChive (WARC) File FormatWeb ARChive (WARC) File Format
Web ARChive (WARC) File Format
 
MemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in GoMemGator - A Memento Aggregator CLI and Server in Go
MemGator - A Memento Aggregator CLI and Server in Go
 
Dockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to ContainerizationDockerize Your Projects - A Brief Introduction to Containerization
Dockerize Your Projects - A Brief Introduction to Containerization
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorker
 
Client-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorkerClient-side Reconstruction of Composite Mementos Using ServiceWorker
Client-side Reconstruction of Composite Mementos Using ServiceWorker
 
TPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive ProfilingTPDL 2016 Doctoral Consortium - Web Archive Profiling
TPDL 2016 Doctoral Consortium - Web Archive Profiling
 
Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research Group
 
InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web ArchivesInterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives
 
Web Archive Profiling Through Fulltext Search
Web Archive Profiling Through Fulltext SearchWeb Archive Profiling Through Fulltext Search
Web Archive Profiling Through Fulltext Search
 
JCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive ProfilingJCDL 2016 Doctoral Consortium - Web Archive Profiling
JCDL 2016 Doctoral Consortium - Web Archive Profiling
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief Introduction
 
TPDL 2015 - Profiling Web Archives
TPDL 2015 - Profiling Web ArchivesTPDL 2015 - Profiling Web Archives
TPDL 2015 - Profiling Web Archives
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
Improving Accessibility of Archived Raster Dictionaries of Complex Script Lan...
 
Profile Serialization IIPC GA 2015
Profile Serialization IIPC GA 2015Profile Serialization IIPC GA 2015
Profile Serialization IIPC GA 2015
 

Recently uploaded

VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Lucknow
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012rehmti665
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一3sw2qly1
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一Fs
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 

Recently uploaded (20)

VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Uttam Nagar Delhi 💯Call Us 🔝8264348440🔝
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja VipCall Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
Call Girls Service Adil Nagar 7001305949 Need escorts Service Pooja Vip
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
定制(CC毕业证书)美国美国社区大学毕业证成绩单原版一比一
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
定制(AUT毕业证书)新西兰奥克兰理工大学毕业证成绩单原版一比一
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 

Readying Web Archives to Consume and Leverage Web Bundles

  • 1. Readying Web Archives to Consume and Leverage Web Bundles Sawood Alam1 , Michele C. Weigle2 , Michael L. Nelson2 , Martin Klein3 , and Herbert Van de Sompel4 1 Internet Archive, San Francisco, California, USA 2 Old Dominion University, Norfolk, Virginia, USA 3 Los Alamos National Laboratory, New Mexico, USA 4 Data Archiving and Networked Services, Netherlands @ibnesayeed IIPC Web Archiving Conference, June 15, 2021
  • 2. @ibnesayeed 2 https://github.com/WICG/webpackage#explainers Bundled HTTP Exchanges may or may not be signed by their corresponding origins Web Bundles Resources from: Origin A Origin B Origin C Web Browser Bundled resources are treated as if they are coming from their corresponding origins A B C Content-Type: application/webbundle dist.example.com
  • 3. @ibnesayeed 3 https://www.slideshare.net/ibnesayeed/web-archive-warc-file-format Web Bundles in Terms of WARC and CDX Web Bundle WARC File CDX File Manifest Web Bundles are optimized for performance, not for the long-term preservation with provenance ● Uses CBOR format for HTTP Response Message storage ● Includes an entrypoint URI ● Lossy (e.g., HTTP headers with the same name are concatenated) ● No provision for provenance information
  • 4. @ibnesayeed 4 https://datatracker.ietf.org/doc/html/draft-yasskin-wpack-use-cases-01.txt#section-2.2.11 Web Packaging Use Cases ● Essential ○ Offline installation ○ Offline browsing ○ Save and share a web page ○ Privacy-preserving prefetch ● Nice-to-have ○ Packaged Web Publications ○ Avoiding Censorship ○ Third-party security review ○ Building packages from multiple libraries ○ Cross-CDN Serving ○ Pre-installed applications ○ Protecting Users from a Compromised Frontend ○ Installation from a self-extracting executable ○ Packages in version control ○ Subresource bundling ○ Archival
  • 5. @ibnesayeed Crawling JavaScript-driven Deferred Representations 5 PhantomJS discovers 75% more resources than Heritrix, but crawls 12 times slower Web Bundles have the potential to enable efficient and coherent crawling https://arxiv.org/abs/1508.02315
  • 6. @ibnesayeed 6 ● Crawlers can content negotiate to prefer Web Bundles, when present ○ Server has likely bundled resources that would be difficult to discover ○ A single transaction downloads multiple resources (efficient politeness) ● Media type “application/webbundle” needs to be decomposed for preservation in WARC records and indexing Servers will likely continue to serve non-bundled resources for years to come Request, Decompose, and Index Bundled Resources https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API
  • 7. @ibnesayeed Mementos That Never Existed on the Live Web Temporal Violations 7 Live leakage (Zombies) Origin Violations https://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html https://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html https://ws-dl.blogspot.com/2019/03/2019-03-18-cookie-violations-cause.html Web Packaging has the potential to eliminate these issues Cookie Violations
  • 8. @ibnesayeed An HTML Page With External Style Sheet and Image 8 https://www.cs.odu.edu/~salam/dweb/
  • 9. @ibnesayeed Archival Replay: Server-side Rewriting 9 https://web.archive.org/web/20190716035757/https://www.cs.odu.edu/~salam/dweb/ URI references (href, src, and srcset etc.) are rewritten to point to their archived versions at a nearby time
  • 10. @ibnesayeed Archival Replay: Client-side Rerouting 10 https://oduwsdl.github.io/Reconstructive/ http://ws-dl.blogspot.com/2018/01/2018-01-08-introducing-reconstructive.html ● Avoids zombies (live-leakage) ● Adds an unobtrusive archival banner (using Custom HTML Element)
  • 11. @ibnesayeed Archival Replay: Proxy-based Rerouting 11 http://oldweb.today/chrome/20190715110000/https://www.cs.odu.edu/~salam/dweb/ ● A web browser runs on a remote host ● Configured to use a web archive proxy ● Accessed via VNC ● Does not scale well Web Bundles can enable archival replay from original URIs locally without a replay proxy
  • 12. @ibnesayeed Bundled Archival Replay in Portals 12 https://web.dev/hands-on-portals/ 2020-07-09 Portal is an experimental HTML element, similar to iframe, but it allows seamless transition to become the main context when activated
  • 13. @ibnesayeed Memento TimeGate 13 $ curl -I https://www.w3.org/wiki/Main_Page HTTP/2 200 date: Tue, 16 Jul 2019 03:16:01 GMT link: <https://www.w3.org/wiki/Main_Page>; rel="original latest-version", <https://www.w3.org/wiki/Special:TimeGate/Main_Page>; rel="timegate", <https://www.w3.org/wiki/Special:TimeMap/Main_Page>; rel="timemap"; type="application/link-format"; from="Thu, 01 Jan 1970 00:00:00 GMT"; until="Fri, 16 Nov 2018 19:10:23 GMT", <https://www.w3.org/wiki/index.php?title=Main_Page&oldid=30366>; rel="first memento"; datetime="Thu, 01 Jan 1970 00:00:00 GMT", <https://www.w3.org/wiki/index.php?title=Main_Page&oldid=108148>; rel="last memento"; datetime="Fri, 16 Nov 2018 19:10:23 GMT" content-language: en vary: Accept-Encoding,Cookie cache-control: s-maxage=18000, must-revalidate, max-age=0 last-modified: Mon, 15 Jul 2019 22:16:01 GMT content-type: text/html; charset=UTF-8 $ curl -I -H "Accept-Datetime: Sat, 20 Dec 2014 12:30:00 GMT" https://www.w3.org/wiki/Special:TimeGate/Main_Page HTTP/2 302 date: Tue, 16 Jul 2019 03:16:21 GMT vary: Accept-Encoding,Cookie,Accept-Datetime location: https://www.w3.org/wiki/index.php?title=Main_Page&oldid=80125 link: <https://www.w3.org/wiki/Special:TimeMap/Main_Page>; rel="timemap"; type="application/link-format"; from="Thu, 01 Jan 1970 00:00:00 GMT"; until="Fri, 16 Nov 2018 19:10:23 GMT", <https://www.w3.org/wiki/index.php?title=Main_Page&oldid=30366>; rel="first memento"; datetime="Thu, 01 Jan 1970 00:00:00 GMT", <https://www.w3.org/wiki/index.php?title=Main_Page&oldid=108148>; rel="last memento"; datetime="Fri, 16 Nov 2018 19:10:23 GMT", <https://www.w3.org/wiki/Main_Page>; rel="original latest-version" content-type: text/html; charset=UTF-8 $ curl -I "https://www.w3.org/wiki/index.php?title=Main_Page&oldid=80125" HTTP/2 200 date: Tue, 16 Jul 2019 03:38:35 GMT x-content-type-options: nosniff memento-datetime: Sat, 20 Dec 2014 11:34:08 GMT link: <https://www.w3.org/wiki/Main_Page>; rel="original latest-version",<https://www.w3.org/wiki/Special:TimeGate/Main_Page>; rel="timegate",<https://www.w3.org/wiki/Special:TimeMap/Main_Page>; rel="timemap"; type="application/link-format"; from="Thu, 01 Jan 1970 00:00:00 GMT"; until="Fri, 16 Nov 2018 19:10:23 GMT",<https://www.w3.org/wiki/index.php?title=Main_Page&oldid=30366>; rel="first memento"; datetime="Thu, 01 Jan 1970 00:00:00 GMT",<https://www.w3.org/wiki/index.php?title=Main_Page&oldid=108148>; rel="last memento"; datetime="Fri, 16 Nov 2018 19:10:23 GMT" content-language: en vary: Accept-Encoding,Cookie expires: Thu, 01 Jan 1970 00:00:00 GMT cache-control: private, must-revalidate, max-age=0 content-type: text/html; charset=UTF-8 https://tools.ietf.org/html/rfc7089 Native TimeGate support in Exchange Loading would be great Web archives provide third-party generic TimeGate resources Currently, Web Bundles prefer the most recent version of a resource
  • 14. @ibnesayeed 14 https://developer.mozilla.org/en-US/docs/Web/API/Cache ● Distributor provides a custom cache namespace with Bundles to stash them in ● More than one related Bundles can have the same namespace ● A configurable policy determines the behavior of cache utilization for Loading ● Archive origins configure Loading policy to only load resources from bundles they delivered to prevent loading zombie resources Enable sandboxing via namespaced caches for security and coherence Namespaced Cache for Temporal Coherence Namespaced Bundle Cache 20190718154532 Content-Type: application/webbundle Bundle-Cache-Name: 20190718154532 Exchange-Loading-Policy: bundle-only
  • 15. @ibnesayeed 15 ● If a needed resources is not in a bundle or any caches of the browser, it causes live-leakage ● This allows publishers to create partial Web Bundles for distribution while serving resources like analytics, trackers, ads, etc. live ● A fallback catch-all route would be helpful in preventing Zombies in archives, but give too much power to distributors Introduce wildcard routes or Service Workers for namespaced caches to handle missing resources Catch-All Route to Prevent Zombies https://developer.mozilla.org/en-US/docs/Web/API/Service_Worker_API
  • 16. @ibnesayeed Web Archives Currently Lack Technical Means to Prove Fixity and Non-repudiation 16 https://twitter.com/Jamie_Maz/status/936349041264414721 http://blog.archive.org/2018/04/24/addressing-recent-claims-of-manipulated-blog-posts-in-the-wayback-machine/ https://ws-dl.blogspot.com/2018/04/2018-04-24-why-we-need-multiple-web.html https://arxiv.org/abs/1905.12565 ● Joy Ann Reid claimed copies of her blog in the Internet Archive has been hacked ● The Internet Archive publicly denied it ● We investigated the matter using multiple web archives and concluded Reid’s claim to be very unlikely There is strong need of verifiable web archiving
  • 17. @ibnesayeed 17 http://www.certificate-transparency.org/ ● Signed exchanges contain “Date” response header ● An archive returns “Memento-Datetime” header when delivering an HTTP Bundle ● Validate signature in the context of that historical time ○ The difference in the two times should be within an acceptable margin ● Utilize means like Certificate Transparency Logs to establish temporal validation A means to establish that the signature would have been “temporally valid” at a given time in the past Temporal Validation of Signed Exchanges
  • 18. @ibnesayeed 18 ● The origin will likely change (specs are not stable yet) ● Loses the benefits of replaying archived pages as the original domain ● Benefits from faster replay and better caching of popular mementos http://www.certificate-transparency.org/ Archives can still create Web Bundles with rewritten URIs to replay in the traditional fashion Archival Replay With Unsigned Exchanges
  • 19. @ibnesayeed Native Memento Support and Acknowledgement 19 https://docs.google.com/presentation/d/1YAQl_1sPH25ZdAiEPhE5cqj8zMomLBKDQWzPS5Avw5s/edit https://www.slideshare.net/mweigle/enabling-personal-use-of-web-archives ● Visually acknowledge presence of “Memento-Datetime” header ● Surface archival metadata from “Link” header ● Enable necessary security features for the archived web which is read-only ● Prevent live-leakage
  • 20. @ibnesayeed Conclusions 20 Web Packaging has a unique opportunity to devise a technology that supports web archiving and provides a much needed capability to verify the integrity of archived web resources Effective Bundled HTTP Exchanges Efficient and coherent crawling Temporally coherent replay TimeGate/Named Cache for Loading Temporal validation of Signed Exchanges Long-term fixity and non-repudiation https://arxiv.org/abs/1906.07104 Native Memento support in web browsers Visual archived resource identification