This document discusses client-side reconstruction of composite mementos using ServiceWorker. It introduces reconstructive.js, a ServiceWorker script that intercepts requests originating from a memento and reroutes them to the archive, preventing live-web leakage. This reconstructs the page without rewriting its content, which reduces overhead. Reconstructive.js is part of an effort to eliminate "zombies", resources that leak in from the live web when an archived page is replayed. The document reports an experiment over 500 home pages in which rewriting added a mean overhead of one-fifth in data and one-third in time, which reconstructive.js reduces by rerouting instead of rewriting.
Client-side reconstruction of mementos using ServiceWorker
1. Client-side Reconstruction of Composite Mementos Using ServiceWorker
Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson
Web Science and Digital Libraries Research Group
Old Dominion University, Norfolk, VA, 23529
@ibnesayeed
@WebSciDL
Supported in part by NSF III 1526700
JCDL 2017, June 19-23, 2017, Toronto, Ontario, Canada
2. Sawood Alam <@ibnesayeed>
2008 Memento Seen in 2017
● https://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
3. Sawood Alam <@ibnesayeed>
2008 Memento Seen in 2012
● http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
6. Sawood Alam <@ibnesayeed>
Zombies in Archive
<img src="http://xenland.alpha/images/map.png">
// Is rewritten on replay to become:
<img src="http://archive.example.org/1998/http://xenland.alpha/images/map.png">
// URLs constructed by JavaScript are harder to rewrite on replay, e.g.:
var base = 'http://xenland.alpha';
var imgdir = '/images/';
var img = document.createElement('img');
img.src = base + imgdir + 'ruler.png';
document.getElementById('ruler').appendChild(img);
// => http://xenland.alpha/images/ruler.png (not rewritten, so it is fetched from the live web)
7. Sawood Alam <@ibnesayeed>
Replay URL Resolution & Rewriting
Reference type | Example | Resolution after relocation
Relative path | images/logo.png | Potentially correct
Absolute path | /public/images/logo.png | Potentially incorrect
Absolute URL | http://example.com/public/images/logo.png | Potentially live leakage
Original page: http://example.com/public/index.html
...
<img src="/public/images/logo.png">
...
Archived page, rewritten on replay: http://archive.example.org/<datetime>/http://example.com/public/index.html
...
<img src="/<datetime>/http://example.com/public/images/logo.png">
...
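The three cases in the table can be reproduced with the standard URL constructor, which resolves a reference against a base the way a browser does. This is an illustrative sketch only: the archive host, the datetime 1998 (standing in for <datetime>), and the helper name classifyReference are assumptions, not part of any replay system.

```javascript
// Hypothetical memento URL; "1998" stands in for the <datetime> placeholder.
const replayPrefix = 'http://archive.example.org/1998/';
const mementoUrl = replayPrefix + 'http://example.com/public/index.html';

// Resolve an unrewritten reference found in the archived page the way a
// browser would, then report where the resulting request would go.
function classifyReference(ref, base = mementoUrl) {
  const resolved = new URL(ref, base);
  if (resolved.origin !== new URL(base).origin) {
    // The request leaves the archive entirely.
    return { url: resolved.href, outcome: 'live leakage' };
  }
  if (!resolved.href.startsWith(replayPrefix)) {
    // The request stays on the archive host but loses the replay prefix.
    return { url: resolved.href, outcome: 'incorrect' };
  }
  return { url: resolved.href, outcome: 'correct' };
}

const references = [
  'images/logo.png',
  '/public/images/logo.png',
  'http://example.com/public/images/logo.png',
];
for (const ref of references) {
  console.log(ref, '->', classifyReference(ref));
}
```

Only the relative path keeps the replay prefix; the absolute path lands on the archive's own root, and the absolute URL goes straight to the live site.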
9. Sawood Alam <@ibnesayeed>
ServiceWorker
● New web API (still a working draft)
● A standalone JavaScript file
● Persists in the browser independent of the window
● Acts as a proxy
● Installed by a web page under its domain at a specific path (called the scope)
● Intercepts all requests in scope
○ Requests for resources under the scope path (at any depth)
○ Secondary resource requests originating from any resource under the scope
● Allows modification of requests and responses
● Primarily used in web applications for offline access and notification support
● Requires HTTPS
● Growing browser support (73.61% as of June 8, 2017)
○ http://caniuse.com/#feat=serviceworkers
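How a page installs a ServiceWorker under a scope, and which URLs then fall in scope, can be sketched as follows. The script path /serviceworker.js, the scope /memento/, and the helper isInScope are hypothetical names chosen for illustration. Scope matching is approximated here as a prefix comparison of absolute URLs.

```javascript
// Hypothetical locations; a real deployment chooses its own.
const WORKER_SCRIPT = '/serviceworker.js';
const SCOPE = '/memento/';

// Approximate scope check: a URL is in scope when its absolute form
// starts with the absolute scope URL (so any depth under the scope matches).
function isInScope(url, scope, origin) {
  const scopeUrl = new URL(scope, origin).href;
  return new URL(url, origin).href.startsWith(scopeUrl);
}

// Page-side registration; skipped where the API is unavailable
// (unsupported browsers, or non-browser runtimes).
if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker
    .register(WORKER_SCRIPT, { scope: SCOPE })
    .then((registration) => {
      console.log('ServiceWorker registered for', registration.scope);
    })
    .catch((error) => {
      console.error('ServiceWorker registration failed:', error);
    });
}
```

With these assumed values, isInScope('/memento/2017/page.html', SCOPE, 'https://archive.example.org') is true, while a page at /about.html on the same origin is outside the scope and its requests are not intercepted.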
10. Sawood Alam <@ibnesayeed>
reconstructive.js
● https://github.com/oduwsdl/reconstructive
● A ServiceWorker script written for archival replay
● Plug-in for web archives or Memento aggregators
● Intercepts all network requests originated from a memento
● Reroutes requests to an archive (prevents live leakage & incorrect references)
● Optionally rewrites the content to add banner & to fix hyperlinks
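The rerouting idea in the bullets above can be sketched as a small fetch handler. This is not the reconstructive.js source or its API: the memento URL pattern (archive origin, numeric datetime, original URL), the function name reroute, and the use of the request's referrer to find the memento are all assumptions made for illustration. It also assumes the referrer is available in full and that every same-origin request outside the memento pattern is a misresolved absolute path.

```javascript
// Assumed memento URL layout: <archive origin>/<datetime digits>/<original URL>
const URIM_PATTERN = /^(https?:\/\/[^/]+)\/(\d{4,14})\/(.+)$/;

// Map a request issued by a memento to the corresponding archived URL.
function reroute(requestUrl, referrerUrl) {
  if (URIM_PATTERN.test(requestUrl)) return requestUrl; // already archived
  const match = URIM_PATTERN.exec(referrerUrl);
  if (!match) return requestUrl; // request did not come from a memento
  const [, archive, datetime, originalPage] = match;
  const request = new URL(requestUrl);
  // An absolute path resolves against the archive root; recover the
  // original URL by resolving that path against the original page instead.
  const original = request.origin === archive
    ? new URL(request.pathname + request.search, originalPage).href
    : request.href; // absolute URL that would otherwise hit the live web
  return `${archive}/${datetime}/${original}`;
}

// Worker-side wiring; only runs inside a ServiceWorker context.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  self.addEventListener('fetch', (event) => {
    const target = reroute(event.request.url, event.request.referrer);
    if (target !== event.request.url) {
      // Simplified: a fuller handler would preserve method and headers.
      event.respondWith(fetch(target));
    }
  });
}
```

Under these assumptions, the JavaScript-constructed http://xenland.alpha/images/ruler.png requested from a memento at https://archive.example.org/1998/http://xenland.alpha/ is fetched from https://archive.example.org/1998/http://xenland.alpha/images/ruler.png instead of the live web, without rewriting the page's markup.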
12. Sawood Alam <@ibnesayeed>
Rewriting Mementos is Expensive
Original capture (without any rewriting)
In our experiment over 500 home pages we observed:
● One-fifth mean data overhead
● One-third mean time overhead
15% more data in twice the time
14. Sawood Alam <@ibnesayeed>
Reconstruction Winners: PyWB & reconstructive.js
A. OpenWayback
B. PyWB
C. Memento Reconstruct
D. Memento for Chrome
E. reconstructive.js
15. Sawood Alam <@ibnesayeed>
Future Work
● Use “Prefer” header for original content (when archives support it)
● Add a customizable archival banner
● Add click handler for lazy rewriting of hyperlinks
● Handle archived ServiceWorkers
● Write a 404-combat ServiceWorker script for webmasters
● http://ws-dl.blogspot.co.uk/2016/08/2016-08-15-mementos-in-raw-take-two.html
16. Sawood Alam <@ibnesayeed>
Conclusions
● reconstructive.js => no zombies!
● Rerouting instead of rewriting (lazy rewriting)
● Mean overhead reduction
○ one-fifth data
○ one-third time
● 73.61% (and growing) browser support for ServiceWorker
○ http://caniuse.com/#feat=serviceworkers
● reconstructive.js
○ https://github.com/oduwsdl/reconstructive
● Archival Capture Replay Test Suite
○ https://ibnesayeed.github.io/acrts/
● In-depth recap: WADL 2017 Thursday, June 22, 3:45pm (https://fox.cs.vt.edu/wadl2017.html)