It is hard to compute fixity on archived web pages / maturban
This document discusses the challenges of computing fixity on archived web pages. It shows that archived pages can change over time due to redirects, unavailable mementos, dynamic content, transformations by archives, and changes to timemaps. Computing cryptographic hashes on archived pages to check for changes is difficult because the pages may not be consistently replayable at different times.
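The core difficulty can be illustrated with a minimal sketch (the page contents below are invented for illustration): hashing the raw bytes of a replayed memento is trivial, but archives routinely inject banners and rewrite links at replay time, so byte-level digests of the same capture can disagree.

```python
import hashlib

def fixity_digest(content: bytes) -> str:
    """Return a SHA-256 hex digest of a memento's replayed content."""
    return hashlib.sha256(content).hexdigest()

# Hypothetical replays of the same archived page at two different times:
# the archive injects a banner comment into the second replay.
replay_1 = b"<html><body>Original page content</body></html>"
replay_2 = b"<html><!-- archive banner --><body>Original page content</body></html>"

# The underlying capture is identical, but the digests disagree, so a
# naive hash comparison falsely reports that the page has changed.
assert fixity_digest(replay_1) != fixity_digest(replay_2)
```

This is why fixity checking needs to account for replay-time transformations rather than hashing the served bytes directly.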
A Framework for Aggregating Private and Public Web Archives / jcdl2018
Mat Kelly, Michael L. Nelson, and Michele C. Weigle
Old Dominion University
Web Science & Digital Libraries Research Group {mkelly, mln, mweigle}@cs.odu.edu @machawk1 • @WebSciDL
#jcdl2018
This document discusses linked open data and the Ontos LD Information Workbench. It provides examples of organizations that publish open government and cultural data as linked open data. It then describes the key components of the Ontos tool for authoring, storing, linking, exploring and managing linked data over its life cycle. These include tools for extraction, storage, linking datasets, semantic search and browsing linked data.
The Power of Semantic Technologies to Explore Linked Open Data / Ontotext
Atanas Kiryakov's (Ontotext CEO) presentation at the first edition of Graphorum (http://graphorum2017.dataversity.net/), a new forum that taps into the growing interest in graph databases and technologies. Graphorum is co-located with the Smart Data Conference, organized by the digital publishing platform Dataversity.
The presentation demonstrates the capabilities of Ontotext's own approach to contributing to the discipline of more intelligent information gathering and analysis by:
- graphically exploring the connectivity patterns in big datasets;
- building new links between identical entities residing in different data silos;
- getting insights into what types of queries can be run against various linked data sets;
- reliably filtering information based on relationships, e.g., between people and organizations, in the news;
- demonstrating the conversion of tabular data into RDF.
Learn more at http://ontotext.com/.
1) The document compares different methods for representing statement-level metadata in RDF, including RDF reification, singleton properties, and RDF*.
2) It benchmarks the storage size and query execution time of representing biomedical data using each method in the Stardog triplestore.
3) The results show that RDF* requires fewer triples but the database size is larger, and it outperforms the other methods for complex queries.
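The triple-count trade-off the benchmark measures can be sketched with plain tuples (the entity names and metadata below are illustrative, not the benchmark's actual biomedical data):

```python
# One base statement we want to annotate with provenance metadata.

# Standard RDF reification: the statement becomes four bookkeeping
# triples plus one triple per metadata property.
def reify(s, p, o, meta):
    stmt = "_:stmt1"  # blank node standing for the statement
    triples = [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
    ]
    triples += [(stmt, mp, mv) for mp, mv in meta.items()]
    return triples

# Singleton property: a unique predicate instance carries the metadata.
def singleton_property(s, p, o, meta):
    p1 = p + "#1"
    triples = [(s, p1, o), (p1, "rdf:singletonPropertyOf", p)]
    triples += [(p1, mp, mv) for mp, mv in meta.items()]
    return triples

meta = {"ex:source": "ex:trial42"}
r = reify("ex:drugA", "ex:interactsWith", "ex:drugB", meta)
s = singleton_property("ex:drugA", "ex:interactsWith", "ex:drugB", meta)

# RDF* embeds the statement directly (<< ex:drugA ex:interactsWith
# ex:drugB >> ex:source ex:trial42): one asserted triple plus one
# annotation, which is why it needs the fewest triples of the three.
assert len(r) == 5 and len(s) == 3
```

The counts grow linearly with the number of annotated statements, which is what makes the representation choice matter at biomedical-dataset scale.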
This document discusses creating a knowledge graph for Irish history as part of the Beyond 2022 project. It will include digitized records from core partners documenting seven centuries of Irish history. Entities like people, places, and organizations will be extracted from source documents and related in a knowledge graph using semantic web technologies. An ontology was created to provide historical context and meaning to the relationships between entities in Irish history. Tools will be developed to explore and search the knowledge graph to advance historical research.
ROI in Linking Content to CRM by Applying the Linked Data Stack / Martin Voigt
Today, decision makers in enterprises have to rely more and more on a variety of data sets that are available internally as well as externally, in heterogeneous formats. Therefore, intelligent processes are required to build an integrated knowledge base. Unfortunately, the adoption of the Linked Data lifecycle within enterprises, which targets the extraction, interlinking, publishing and analytics of distributed data, lags behind the public domain due to missing frameworks that are efficient to deploy and easy to use. In this paper, we present our adoption of the lifecycle through our generic, enterprise-ready Linked Data workbench. To judge its benefits, we describe its application within a real-world Customer Relationship Management scenario. It shows (1) that sales employees could significantly reduce their workload and (2) that the integration of sophisticated Linked Data tools comes with an obvious positive Return on Investment.
Data 2 Documents: Modular and Distributive Content Management in RDF / Niels Ockeloen
This document describes a system called Data 2 Documents (D2D) that aims to enable modular and distributive content management on the web using Linked Data and RDF. It discusses how D2D addresses issues with sharing content across different content management systems and websites by modeling the knowledge involved in content selection, composition and rendering. An evaluation involved experts and students performing tasks in D2D, and found that participants could complete the tasks and would consider using D2D for future website development. Future work is needed to develop graphical user interfaces and JavaScript implementations for D2D.
The slideset used to conduct an introduction/tutorial
on DBpedia use cases, concepts and implementation
aspects held during the DBpedia community meeting
in Dublin on the 9th of February 2015.
(slide creators: M. Ackermann, M. Freudenberg
additional presenter: Ali Ismayilov)
Publishing the British National Bibliography as Linked Open Data / Corine Del... / CIGScotland
Presented at Linked Open Data: current practice in libraries and archives (Cataloguing & Indexing Group in Scotland 3rd Linked Open Data Conference), Edinburgh, 18 Nov 2013
Linked data experience at Macmillan: Building discovery services for scientif... / Michele Pasin
Macmillan is developing a linked data platform and semantic data model to power discovery services for scientific content. They have created an RDF-based data model and ontology to organize over 270 million triples of metadata. They are focusing on internal use cases and have implemented a hybrid architecture using MarkLogic and a triplestore to optimize query performance and deliver content in under 200ms. Going forward, they aim to expand the ontology, enable more advanced querying, and establish the semantic data model as a core enterprise asset.
Richard Wallis, an OCLC Technology Evangelist, discusses how libraries can make their data more visible and connected on the web by publishing it as linked open data using common web vocabularies like Schema.org. Currently, library linked data exists in silos using different local vocabularies, making the data hard to discover and integrate. Adopting Schema.org could help library data reach the billions of web pages and domains that already use this general purpose vocabulary to describe things on the web.
Maximising (Re)Usability of Library metadata using Linked Data / Asuncion Gomez-Perez
This document discusses maximizing the reusability of library metadata using linked data. It motivates the use of linked data by describing the current heterogeneous data landscape with issues around language, format, and lack of interoperability. It then discusses how linked data allows for uniform access through agreed upon vocabularies and standards. Specific issues around language, provenance, license and the linked data process are covered. Uses of linked library metadata are also discussed.
This document summarizes Richard Wallis and his work. Richard Wallis is an independent consultant and founder of Data Liberate. He currently works with OCLC and Google to develop schema standards. He chairs several W3C community groups focused on developing schemas for bibliographic data and archives data using Schema.org.
DBpedia: A Public Data Infrastructure for the Web of Data / Sebastian Hellmann
The document discusses the DBpedia project, which extracts structured data from Wikipedia to build a multilingual knowledge graph. It describes DBpedia's goals of making this data openly available and supporting its community. The DBpedia Association is being formed as a non-profit to oversee the infrastructure and support contributors. Funding will come from donations and sponsorships. Upcoming events include the DBpedia Community Meeting coinciding with the SEMANTiCS conference in September.
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016 / Sergio Fernández
Sergio Fernández gave a presentation on geospatial querying in Apache Marmotta. He explained that Marmotta is an open platform for linked data that allows publishing and building applications on linked data. It includes features like a read-write linked data server and SPARQL querying. He discussed how GeoSPARQL allows representing and querying geospatial data on the semantic web by defining a vocabulary and SPARQL extension. Marmotta implements GeoSPARQL by materializing geospatial data and supports topological relations and functions through PostGIS. He demonstrated example GeoSPARQL queries on municipalities in Madrid, rivers bordering Austria, and mountain bike routes crossing cities.
[Databeers] 06/05/2014 - Boris Villazon: "Data Integration - A Linked Data ap..." / Data Beers
This document discusses using linked data approaches for data integration. It introduces linked data as a way to publish and connect disparate data sources using common identifiers and semantic web standards like URIs and RDF. This allows data to be queried and exploited as a single global database. Examples are given of applying linked data for integrating enterprise data sources and for publishing geospatial data from Ecuador using semantic representations. The benefits of linked data for data integration are that it enables querying across data silos and consuming data without complex transformations by using the graph-based RDF data model.
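The "single global database" idea reduces to a simple mechanism: when two silos describe an entity with the same identifier, merging their graphs is just a union of triples. A toy sketch with invented identifiers (not from the talk):

```python
# Two "silos" publish facts about the same entity using a shared URI.
crm = {("ex:acme", "ex:hasContact", "ex:alice")}
erp = {("ex:acme", "ex:basedIn", "ex:madrid")}

# Because both silos use the same identifier for the company,
# integration is just set union -- no schema mapping or ETL
# transformation step is required.
merged = crm | erp

def query(graph, subject):
    """Return all (predicate, object) pairs about a subject."""
    return {(p, o) for s, p, o in graph if s == subject}

# One query now spans both original silos.
assert query(merged, "ex:acme") == {
    ("ex:hasContact", "ex:alice"),
    ("ex:basedIn", "ex:madrid"),
}
```

Real deployments do the same thing with RDF stores and SPARQL, but the graph-union principle is identical.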
Contextual Computing - Knowledge Graphs & Web of Entities / Richard Wallis
Richard Wallis gave a presentation on contextual computing and knowledge graphs at the SmartData 2017 conference. He discussed how knowledge graphs powered by structured data on the web are providing global context that enables new applications of cognitive and contextual computing. Schema.org plays a key role by defining a common vocabulary and enabling a web of related entities laid out as a global graph. This graph of entities delivers context on a global scale and lays the foundation for the next revolution in computing.
The RDF Report Card: Beyond the Triple Count / Leigh Dodds
My talk from the Semtech Biz conference in London.
I argued that it is time to move beyond discussing the size of datasets and to encourage a more nuanced view of quality and utility.
The RDF Report Card is offered as one simple, high-level visualization.
Smart Data Applications powered by the Wikidata Knowledge Graph / Peter Haase
This document discusses Wikidata and how it can power smart data applications. Wikidata is a large, structured, collaborative knowledge graph containing over 15 million entities. It collects data in a structured form from Wikipedia pages and can be queried like a database using the Wikidata Query Service. The document promotes metaphacts, an enterprise knowledge graph platform that can be used to build applications using Wikidata, enrich Wikidata with private data, and enable companies to build and leverage their own knowledge graphs for various domains such as cultural heritage and pharma.
A Platform for Object-Action Semantic Web Interaction / Roberto García
Usability tests of Semantic Web applications show that their usability is seriously compromised. This motivates the exploration of alternative interaction paradigms, different from the "traditional" Web or desktop application ones. The Rhizomer platform is based on the object-action interaction paradigm, which is better suited for heterogeneous resource spaces such as those common in the Semantic Web. Resources, described by means of RDF metadata, correspond to the objects from the interaction point of view, and Rhizomer provides browsing mechanisms for them. Semantic web services, dynamically associated with these objects, correspond to the actions. Rhizomer has been applied in the context of a media house to build an audiovisual content management system. End-users of this system, journalists and archivists, are able to navigate the content repository through semantic metadata describing content pieces and the domain knowledge these pieces refer to. Those resources constitute the objects; when the user selects one of them, semantic web services dynamically associate specialized visualization and interaction views, the actions.
This document summarizes the origins and development of Schema.org, tracing its lineage from Tim Berners-Lee's 1989 conception of the World Wide Web, through the semantic web in 2001 and linked open data in 2009. Schema.org was introduced in 2011 as a joint effort between Google, Bing, Yahoo, and Yandex to create a common set of schemas for structured data on web pages. It has since grown significantly, with over 12 million websites now using Schema.org markup and over 500 types and 800 properties defined. Various communities, such as libraries, have also influenced Schema.org through extensions and standards like LRMI.
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018 / Ontotext
These are slides from a live webinar that took place in January 2018.
GraphDB™ Fundamentals builds the basis for working with graph databases that utilize the W3C standards, particularly GraphDB™. In this webinar, we demonstrated how to install and set up GraphDB™ 8.4 and how you can generate your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL, and much more.
With the help of GraphDB™, you can start smartly managing your data assets, visually represent your data model and get insights from it.
The document discusses linking XML data to the web of linked data. It provides examples of converting XML content like tables and files into linked data formats like Turtle and JSON-LD. It also demonstrates querying linked data from XML files using SPARQL and XSLT transformations and serving linked data from XML using Apache Jena Fuseki. The document aims to help integrate linked data processing into existing XML tooling and workflows.
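The XML-to-linked-data conversion described above can be sketched with the standard library alone (the element names, base URI, and vocabulary terms below are invented for illustration, not from the talk):

```python
import json
import xml.etree.ElementTree as ET

# A small XML fragment of the kind an existing toolchain might produce.
xml_src = """<books>
  <book id="b1"><title>Linked Data</title><year>2011</year></book>
</books>"""

root = ET.fromstring(xml_src)
graph = []
for book in root.findall("book"):
    # Map each XML element to a JSON-LD node: the @id comes from the
    # XML id attribute; property URIs here are an illustrative vocabulary.
    graph.append({
        "@id": "http://example.org/book/" + book.get("id"),
        "http://example.org/vocab/title": book.findtext("title"),
        "http://example.org/vocab/year": book.findtext("year"),
    })

doc = {"@graph": graph}
# Round-trip through the JSON serializer to confirm valid JSON-LD syntax.
assert json.loads(json.dumps(doc))["@graph"][0]["@id"].endswith("b1")
```

In practice the same mapping is often expressed as an XSLT transformation, which keeps the conversion inside existing XML tooling; the structural idea is the same.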
A presentation by Gordon Dunsire.
Delivered at the Cataloguing and Indexing Group Scotland (CIGS) Linked Open Data (LOD) Conference which took place Fri 21 September 2012 at the Edinburgh Centre for Carbon Innovation.
A Framework for Aggregating Public and Private Web Archives / Mat Kelly
This document proposes a framework for aggregating private and public web archives. It discusses the current state of memento aggregation and outlines ways to make timemaps and aggregation more expressive. This includes adding attributes to timemaps to provide more information about mementos without requiring full dereferencing, such as status codes, content digests, and indicators of private versus public captures. The framework aims to provide a more comprehensive view of the archived web by incorporating both personal and non-aggregated archives.
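A sketch of what such an "expressive" TimeMap entry could look like in link format, with the proposed extra attributes appended (the attribute names `status`, `digest`, and `access` are illustrative stand-ins for the paper's proposal, not a finalized syntax):

```python
def memento_link(urim, datetime_str, extras):
    """Serialize one TimeMap entry in link format (RFC 5988 style),
    appending extra attributes so clients can filter mementos without
    dereferencing each one. Extra attribute names are illustrative."""
    attrs = ['rel="memento"', f'datetime="{datetime_str}"']
    attrs += [f'{k}="{v}"' for k, v in extras.items()]
    return f"<{urim}>; " + "; ".join(attrs)

line = memento_link(
    "https://archive.example/web/20180101/http://example.com/",
    "Mon, 01 Jan 2018 00:00:00 GMT",
    {"status": "200", "digest": "sha256:abc123", "access": "private"},
)
# An aggregator can now tell, from the TimeMap alone, that this capture
# is a private 200 response with a known content digest.
assert 'status="200"' in line and 'access="private"' in line
```

The point of the extra attributes is exactly what the abstract describes: conveying status codes, content digests, and public/private provenance without forcing a dereference of every memento.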
Warcbase: Building a Scalable Platform on HBase and Hadoop - Part Two, Histor... / Ian Milligan
This was the second part of a joint presentation I did with Jimmy Lin (Maryland) at the "Web Archiving Collaboration: New Tools and Models" conference at Columbia University, New York NY on 4 June 2015.
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS... / Amazon Web Services
AWS hosts a variety of public data sets that anyone can access for free. Previously, large data sets such as satellite imagery or genomic data have required hours or days to locate, download, customize, and analyze. When data is made publicly available on AWS, anyone can analyze any volume of data without downloading or storing it themselves. In this session, the AWS Open Data Team shares tips and tricks, patterns and anti-patterns, and tools to help you effectively stage your data for analysis in the cloud.
The web has changed! Users spend more time on mobile than on desktops, and they expect an amazing user experience on both platforms. APIs are the heart of the new web as the central point of access to data, encapsulating logic and providing the same data and the same features for desktops and mobiles.
In this talk, I will show you how, in only 45 minutes, we can create a full REST API, with documentation and an admin application built with React.
The document discusses adapting the open source Nutch search engine to enable full-text search of web archive collections. Key points include:
1. Nutch was selected as the search platform and modified to index content from web archive collections rather than live web crawling.
2. The modified Nutch supports two modes - basic search similar to Google, and a Wayback Machine-like interface to return all versions of a page.
3. Indexing statistics are provided for a small test collection, taking around 40 hours to index 1.07 million documents from 37GB of archive data.
This document outlines an approach to making web links more robust and interoperable for machines called "Robust Links". It discusses problems with current links like link rot and content drift. The proposed solution involves taking snapshots of linked resources in web archives, and decorating links with metadata about the archived snapshot, original URI, and timestamp. This allows both humans and machines to access the archived versions of linked resources even if the original link breaks. The presentation advocates adopting HTTP link headers and relation types to better connect related scholarly resources on the web in a machine-readable way.
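The link decoration described above can be sketched in a few lines. This is an illustrative helper, not code from the presentation; the attribute names follow the Robust Links convention (data-originalurl, data-versionurl, data-versiondate), and the URLs below are placeholders.

```python
# Build an HTML anchor decorated with Robust Links attributes so that
# both humans and machines can reach an archived snapshot if the
# original link rots.

def robust_link(href, version_url, version_date, text):
    """Return an <a> element carrying original URI, snapshot URI, and date."""
    return ('<a href="{}" data-originalurl="{}" '
            'data-versionurl="{}" data-versiondate="{}">{}</a>').format(
                href, href, version_url, version_date, text)

link = robust_link(
    "http://example.com/page",
    "https://web.archive.org/web/20180606000000/http://example.com/page",
    "2018-06-06",
    "an example page")
```

A consuming client can fall back to `data-versionurl` when dereferencing `href` fails.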
Telling the World and Our Users What We HaveRichard Wallis
This document summarizes a presentation by Richard Wallis on discovery and discoverability. It introduces Schema.org as a vocabulary for structured data on the web and its use by major organizations like Google, OCLC, and the Library of Congress. It discusses motivations for sharing bibliographic data on the web using Schema.org, including connecting library data and reaching users. Key initiatives are summarized, such as the Schema Bib Extend community group, BiblioGraph.net extension vocabulary, and the bib.schema.org hosted extension.
Log ingestion kafka -- impala using apexApache Apex
This document discusses using Apache Apex to ingest log data from Kafka into Impala for fraud analysis. It describes using Kafka to stream JSON records from web servers into Parquet files on HDFS. These files are then queried using Impala for reporting. The document outlines the use case specifications, the initial DAG design using various operators, and an improved design to separate the active and passive directories for writing and querying. It compares batch and streaming approaches and how Apex provides built-in checkpointing and failure recovery.
This document summarizes a presentation on the Elastic Stack. It discusses the main components - Elasticsearch for storing and searching data, Logstash for ingesting data, Kibana for visualizing data. It provides examples of using Elasticsearch for search, analytics, and aggregations. It also briefly mentions new features across the Elastic Stack like update by query, ingest nodes, pipeline improvements, and APIs for management and metrics.
This document discusses various strategies and resources for archiving internet content for research purposes. It describes several existing large-scale web archives like the Internet Archive and Common Crawl, as well as national and institutional archives. It also outlines how researchers can collect targeted web archives using open-source tools or subscription-based services.
This document discusses a hackathon focused on using open agricultural data and APIs to help researchers and trainers. It describes challenges around discovering relevant resources and preparing training materials. Various data sources and APIs are presented, including those that provide search over aggregated metadata from multiple sources and harvest metadata via OAI-PMH. Services are proposed to index web resources with an agricultural thesaurus, crawl the web to discover related resources, and interlink bibliographic records with web content. The goal is to better connect users with relevant information through these data and technologies.
The document discusses facilitating the discovery of public datasets. It describes Schema.org, a collaborative project to add metadata to content using microdata, RDFa or JSON-LD formats. It also discusses challenges in identifying and relating datasets, as well as properties for describing datasets, such as name, description, URL, version, and spatial/temporal coverage. An example is given of markup for a seismic hazard zones dataset using these properties.
- The document discusses analysis of web archive data stored at the Internet Archive using tools like Apache Hadoop, Pig, Hive, Giraph and Mahout.
- It describes generating derivatives from crawled WARC files like CDX, parsed text and WAT, and storing them in HDFS for analysis using SQL-like queries.
- Various analyses are discussed including growth of content, duplication rates, breakdown by year, text analysis using TF-IDF, and link analysis to generate graphs and compute metrics like PageRank over time to understand the archived web.
An introductory tutorial for the web framework Angular, with a companion demo GitHub repository and a step-by-step GitHub tutorial repository. Presented at Northwestern WildHacks, May 17, 2017.
Web archiving challenges and opportunitiesAhmed AlSum
The document discusses challenges and opportunities in web archiving. It outlines the key stages in the web archiving lifecycle including selection of content, harvesting techniques, storage formats and infrastructure, ways to provide access, and the role of community. Specific challenges are discussed such as representing dynamic and social media content, optimizing storage solutions, and addressing limitations of current access interfaces. Opportunities exist in focusing collection efforts on underrepresented regions, leveraging existing archived data, and developing innovative services and tools to support researchers.
Linked Data (1st Linked Data Meetup Malmรถ)Anja Jentzsch
This document discusses Linked Data and outlines its key principles and benefits. It describes how Linked Data extends the traditional web by creating a single global data space using RDF to publish structured data on the web and by setting links between data items from different sources. The document outlines the growth of Linked Data on the web, with over 31 billion triples from 295 datasets as of 2011. It provides examples of large Linked Data sources like DBpedia and discusses best practices for publishing, consuming, and working with Linked Data.
Polyglot persistence is about using multiple databases in concert with one another as part of a larger datastore ecosystem. The advantage is that your database layer uses a set of specialized tools to deliver overall value and functionality while simplifying data modeling by separating command and query responsibilities. The arrival of MongoDB and its flexible schemas further increases the possibilities of polyglot architectures.
Aggregating Private and Public Web Archives Using the Mementity FrameworkMat Kelly
This document outlines Mat Kelly's PhD dissertation defense. The defense will address aggregating private and public web archives using the Mementity framework. Kelly will defend his dissertation to a committee chaired by Michele Weigle on May 7, 2019. The dissertation addresses challenges around capturing and replaying private content from the web, including content behind authentication or that requires special handling when aggregated. It proposes research questions around difficult to archive content types, comparing browser and crawler capabilities, issues with authenticated content, signaling content that needs special handling, and access controls for private archives.
Exploring Aggregation of Personal, Private, and Institutional Web ArchivesMat Kelly
Mat Kelly presented a framework for aggregating personal, private, and institutional web archives while maintaining access control. The framework includes separate timemaps for different types of captures that could be aggregated while restricting access to private captures. Kelly sought input on use cases around access control for private web archives and mechanisms for protecting archived web pages. The presentation explored challenges in replaying private archives alongside public ones from institutions and how the framework could address these issues.
JCDL 2015 Doctoral Consortium - A Framework for AggregatingPrivate and Publi...Mat Kelly
This document proposes a framework for aggregating private and public web archives. It introduces two new entities: the Memento Meta Aggregator (MMA) and the Private Web Archive Adapter (PWAA). The MMA allows for dynamic inclusion of archives and recursive construction of archive sets. The PWAA regulates access to private web archives by authenticating requests and relaying results. This framework enables private archives to be included in aggregations while preserving privacy through access control and authentication via the PWAA.
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Mat Kelly
The document describes a system for generating thumbnail summaries of large collections of web archive mementos. The system uses SimHash to identify sufficiently unique mementos based on similarities and differences in HTML markup. It calculates Hamming distance between memento SimHashes to select a subset for the summary that limits redundancy while preserving important captures. The visualizations generated by the system provide an overview of a website's evolution over time using 3-6 representative thumbnails.
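The selection idea above can be sketched briefly: compute a 64-bit SimHash per memento and keep only mementos whose Hamming distance from every already-kept memento exceeds a threshold. The `simhash` below is a toy token-based variant for illustration, not the implementation used in the system, and the threshold value is an assumption.

```python
# Toy SimHash + Hamming-distance filter for picking "sufficiently
# unique" mementos out of a collection, limiting redundant thumbnails.
import hashlib

def simhash(text, bits=64):
    """Toy SimHash over whitespace tokens of a page's markup."""
    v = [0] * bits
    for token in text.split():
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if v[i] > 0)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def select_unique(pages, threshold=8):
    """Keep pages whose SimHash differs enough from all kept pages."""
    kept = []
    for page in pages:
        h = simhash(page)
        if all(hamming(h, kh) > threshold for _, kh in kept):
            kept.append((page, h))
    return [p for p, _ in kept]
```

Near-duplicate captures hash to nearby values, so they fall below the threshold and are dropped from the summary.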
Facilitation of the A Posteriori Replication of Web Published Satellite ImageryMat Kelly
The document proposes using ResourceSync, BitTorrent, and WebRTC to facilitate the a posteriori replication of satellite imagery published on NASA web servers. It describes using a crawler to discover imagery resources and produce metadata, which is then used by adapter software to invoke a BitTorrent-based distribution of image payloads to users. The approach was constructed as a proof-of-concept to distribute data and mitigate reliance on NASA servers as the single source. Evaluation showed it was effective but temporally expensive, and future work could better integrate ResourceSync and utilize the YAML metadata.
This document introduces the Archival Acid Test, which evaluates how well web archiving tools archive modern webpages that use advanced HTML, JavaScript, and other web technologies. The test is divided into basic tests, JavaScript tests, and advanced features tests to assess different areas. Results show that archiving tools perform well on basic tests but struggle with dynamic content, asynchronous JavaScript, iframes, and other complex features. The goal of the Archival Acid Test is to create a standardized, publicly available way to evaluate how completely archiving tools archive modern webpages and identify areas for improvement.
The document provides an overview of browser-based digital preservation including:
- The current state of digital preservation which relies on web crawlers and archives like the Internet Archive. However, this approach is insufficient for preserving pages that are not popular, behind authentication, or use complex JavaScript.
- The requirements for new software to directly capture and preserve web pages from within the browser in order to address the limitations of current archival approaches.
- A proposed system called "WARCreate" that would leverage the Chrome extension API to capture web pages and resources and generate WARC files for preservation while maintaining the original browsing context.
Archive What I See Now - Archive-It Partner Meeting 2013 2013Mat Kelly
This document summarizes a presentation about enabling individual web archiving. It discusses tools like WARCreate and WAIL that allow users to archive web pages from their browser in WARC format. Issues addressed include timely capture of breaking news, preserving original context like user profiles, and uploading personal archives to institutional archives. Goals of the Archive What I See Now project are to port WARCreate to Firefox, add capabilities to upload WARCs, and implement sequential archiving of linked resources.
IEEE VIS 2013 Graph-Based Navigation of a Box Office Prediction SystemMat Kelly
This document describes a graph-based visualization system for navigating and predicting box office performance. The system represents movie data as interconnected nodes in a graph layout. Selecting different nodes allows navigation between the movie context and related contexts like actors. Node size and position encode attributes relevant to box office predictions. The system preprocesses and caches external data to make complex predictions accessible through an interactive visual interface.
The document introduces WARCreate and WAIL, tools that make web archiving easier. WARCreate allows users to archive web pages they see in their browser directly as WARC files, preserving context. WAIL packages existing tools like Heritrix and Wayback into a graphical user interface, allowing one-click archiving. Together these tools aim to make web archiving more accessible to personal archivists while still producing outputs compatible with institutional tools and standards.
Making Enterprise-Level Archive Tools Accessible for Personal Web ArchivingMat Kelly
The document describes a set of tools that make enterprise-level web archiving accessible for personal use. The tools include a crawler (Heritrix), a web archive player (Wayback Machine), and an archive inspector (WARC-Proxy) that are installed locally on a personal machine. The interface provides one-click options to set up crawls, view archived pages in the local Wayback installation, and check archive status. It aims to support personal web archiving through a graphical user interface that allows customizing crawls, starting/stopping services, and works with existing WARC files from other tools on Windows, MacOS, and Linux systems.
An Extensible Framework for Creating Personal Web Archives of Content Behind ...Mat Kelly
The document is a thesis that aims to develop an extensible framework for creating personal web archives of content behind authentication barriers. It discusses problems with current personal web archiving tools, such as breaking when sites change their hierarchies and producing suboptimal archives. The thesis seeks to remedy these issues, preserve more social media content, and make archiving outputs more optimal. It utilizes tools like Archive Facebook and WARCreate to generate navigable archives in a format compatible with replay systems like the Wayback Machine.
If Twitter is the "first draft of history", then we should be doing a better job of preserving it. For the one-year anniversary of the Egyptian revolution (2012), we revisited a sample of the shared social media content and found nearly 11% missing from the current web, and only 20% available in public web archives. Spurred by this, we sampled tweets for five other culturally important events from 2009-2012 and found similar rates for archiving and loss.
WARCreate - Create Wayback-Consumable WARC Files from Any WebpageMat Kelly
The Internet Archive's Wayback Machine is the most common way that typical users interact with web archives. The Internet Archive uses the Heritrix web crawler to transform pages on the publicly available web into Web ARChive (WARC) files, which can then be accessed using the Wayback Machine. Because Heritrix can only access the publicly available web, many personal pages (e.g., password-protected pages, social media pages) cannot be easily archived into the standard WARC format. We have created a Google Chrome extension, WARCreate, that allows a user to create a WARC file from any webpage. Using this tool, content that might have been otherwise lost in time can be archived in a standard format by any user. This tool provides a way for casual users to easily create archives of personal online content. This is one of the first steps in resolving issues of long-term storage, maintenance, and access of personal digital assets that have emotional, intellectual, and historical value to individuals.
NDIIPP/NDSA 2011 - YouTube Link RestorationMat Kelly
Creating Persistent Links to YouTube Music Videos
The document discusses the problem of links to YouTube videos becoming invalid when videos are removed. It proposes introducing a resolver service that redirects links to alternative copies of videos when the original link returns a 404 error. This service would also retrieve and publish metadata about videos to external websites to help find available copies when the initial link is broken. The goal is to create persistent links to YouTube music videos even if the specific video is removed from YouTube.
Archive Facebook is an add-on for Mozilla Firefox that allows users to create stand-alone archives of the content on their Facebook account. It preserves the look and feel of Facebook, unlike Facebook's native downloading option. The add-on lets users choose what specific types of content to archive, rather than limiting it to what Facebook allows. This ensures the archive is a true snapshot of the user's Facebook data and history. The add-on provides an easy-to-use interface to navigate and access archived content.
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...EduSkills OECD
Andreas Schleicher, Director of Education and Skills at the OECD presents at the launch of PISA 2022 Volume III - Creative Minds, Creative Schools on 18 June 2024.
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumMJDuyan
(Lesson 1) - Prelims
Discuss the EPP Curriculum in the Philippines:
- Understand the goals and objectives of the Edukasyong Pantahanan at Pangkabuhayan (EPP) curriculum, recognizing its importance in fostering practical life skills and values among students. Students will also be able to identify the key components and subjects covered, such as agriculture, home economics, industrial arts, and information and communication technology.
Explain the Nature and Scope of an Entrepreneur:
- Define entrepreneurship, distinguishing it from general business activities by emphasizing its focus on innovation, risk-taking, and value creation. Students will describe the characteristics and traits of successful entrepreneurs, including their roles and responsibilities, and discuss the broader economic and social impacts of entrepreneurial activities on both local and global scales.
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy, is anointed, and Saul is envious of him. David shows honor while Saul continues to self-destruct.
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...TechSoup
Whether you're new to SEO or looking to refine your existing strategies, this webinar will provide you with actionable insights and practical tips to elevate your nonprofit's online presence.
Gender and Mental Health - Counselling and Family Therapy Applications and In...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
How to Manage Reception Report in Odoo 17Celine George
A business may deal with both sales and purchases occasionally. They buy things from vendors and then sell them to their customers. Such dealings can be confusing at times. Because multiple clients may inquire about the same product at the same time, after purchasing those products, customers must be assigned to them. Odoo has a tool called Reception Report that can be used to complete this assignment. By enabling this, a reception report comes automatically after confirming a receipt, from which we can assign products to orders.
This presentation was provided by Racquel Jemison, Ph.D., Christina MacLaughlin, Ph.D., and Paulomi Majumder, Ph.D., all of the American Chemical Society, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two, 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Services experts provided customer-specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Services (AWS).
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body's response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing include infection, hyperpigmentation of the scar, contractures, and keloid formation.
Client-Assisted Memento Aggregation Using the Prefer Header
1. Client-Assisted Memento Aggregation
Using the Prefer Header
Mat Kelly, Sawood Alam, Michael L. Nelson, and Michele C. Weigle
Old Dominion University
Web Science & Digital Libraries Research Group
{mkelly, salam, mln, mweigle}@cs.odu.edu
@machawk1 • @WebSciDL
Web Archiving and Digital Libraries (WADL) Workshop
June 6, 2018, Fort Worth, TX
3. @machawk1
A Framework for Aggregating Private and Public Web Archives
JCDL 2018 • June 5, 2018 • Fort Worth, TX
Today's Memento Aggregation
Archives Queried (A0)
4.
Motivation
Archives Queried (A0)
> Include personal archives
> Include other non-aggregated archives
6. @machawk1
Client-Assisted Memento Aggregation Using the Prefer Header
WADL 2018 • June 6, 2018 • Fort Worth, TX
State of Aggregators' Capabilities
- Mementoweb aggregator
  - Cannot customize set of archives aggregated
  - Open source? Unavailable for individuals' deployment
- MemGator
  - Open source - https://github.com/oduwsdl/MemGator
  - Requires static set of archives on-launch
  - Still specified by server, clients have no say
- With each, the set of archives is determined on the "server".
- Neither allows client to specify set of archives aggregated.
7.
HTTP Prefer
- RFC 7240 (June 2014)
- CLIENT requests with HTTP header:
  - Prefer: foo; bar=""
- SERVER may respond with HTTP header:
  - Preference-Applied: foo
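The RFC 7240 exchange above can be sketched without a network round trip: the client sends a Prefer header; the server parses it and, if it honors a preference token, echoes it back in Preference-Applied. This is a simplified illustrative parser, not a full RFC 7240 implementation.

```python
# Parse a Prefer header value like 'foo; bar=""' into a token ->
# parameters mapping, then compute the Preference-Applied value for
# tokens the server supports.

def parse_prefer(header):
    """Parse 'foo; bar=""' into {'foo': {'bar': ''}} (simplified)."""
    prefs = {}
    for pref in header.split(","):
        token, _, params = pref.partition(";")
        kv = {}
        for p in params.split(";"):
            if "=" in p:
                k, _, v = p.partition("=")
                kv[k.strip()] = v.strip().strip('"')
        prefs[token.strip()] = kv
    return prefs

def preference_applied(prefer_header, supported=("foo",)):
    """Return the Preference-Applied header value, or None if nothing honored."""
    honored = [t for t in parse_prefer(prefer_header) if t in supported]
    return ", ".join(honored) or None
```

Because Preference-Applied is optional, a client must treat its absence as "the server may or may not have honored the preference".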
8.
OUR APPROACH:
Prefer: archives="data:application/json;charset=utf-8;base64,Ww0KIC7...NCn0="
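The archives preference shown above can be constructed as follows: the client serializes its desired archive set as JSON, base64-encodes it into a data URI, and sends that as the value of `archives` in the Prefer header. The JSON field names (`name`, `timemap`) are illustrative assumptions, not a schema from the paper.

```python
# Encode a client-chosen archive set as the base64 data-URI payload of
# a Prefer: archives="..." header value.
import base64
import json

def archives_preference(archives):
    """Return the value for a 'Prefer: archives=...' request header."""
    payload = json.dumps(archives).encode("utf-8")
    data_uri = ("data:application/json;charset=utf-8;base64,"
                + base64.b64encode(payload).decode("ascii"))
    return 'archives="{}"'.format(data_uri)

prefer_value = archives_preference([
    {"name": "Internet Archive",
     "timemap": "https://web.archive.org/web/timemap/link/"},
])
# The client would then send:  Prefer: <prefer_value>
```

A data URI keeps the preference self-describing: the aggregator can decode the archive set from the request alone, with no prior registration step.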
9.
Prefer + Memento
- S. Jones, H. Van de Sompel, et al., "Mementos in the Raw" [1]
  - Prefer: original-content, original-links, original-headers
  - Mitigates replay-system rewriting; makes "raw" information more accessible
- D.S.H. Rosenthal, "Content negotiation and Memento" [2]
  - none, screenshot, altered-dom, url-rewritten, banner-inserted
  - Additional focus on derived representations

[1] http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
[2] https://blog.dshr.org/2016/08/content-negotiation-and-memento.html
11.
Memento Meta-Aggregator (MMA) [1]
- Additional responsibilities beyond aggregation
- Provides a hierarchical querying model to other aggregators
- Advanced querying models like Precedence and Short-Circuiting
- Systematic interaction and aggregation with private and personal web archives

[1] Kelly et al., "A Framework for Aggregating Private and Public Web Archives", JCDL 2018
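The two querying models named above can be illustrated with a small sketch (ours, not the paper's code): Precedence tries archive endpoints in a fixed order, and Short-Circuiting stops at the first endpoint that returns a TimeMap. Each "archive" here is just a function returning a list of mementos, or an empty list on a miss; the endpoint names are hypothetical.

```python
# Query archives in precedence order, short-circuiting on the first
# non-empty TimeMap, so a private archive can shadow public ones.

def query_with_short_circuit(archives, uri):
    """Return the first non-empty TimeMap, honoring list (precedence) order."""
    for archive in archives:        # precedence = position in the list
        timemap = archive(uri)
        if timemap:                 # short-circuit: stop on first hit
            return timemap
    return []

# Hypothetical endpoints: a private archive consulted before a public one.
private = lambda uri: ["memento-private-1"] if "example" in uri else []
public = lambda uri: ["memento-public-1"]

result = query_with_short_circuit([private, public], "http://example.com/")
```

Ordering the private archive first means captures of authenticated content are preferred over public captures when both exist.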
21.
Potential Approaches Toward Archival Set Persistence for Subsequent Queries
1. Maintain state
   - Content-Location: /timemap/link/5bd...8e9/http://fox.cs.vt.edu/wadl2017.html
   - Not something we want to do with HTTP
2. Require re-specification with each request
   - Not portable to other users
3. Server-side set caching
   - Combinatorial explosion
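Option 1 above exposes the aggregated TimeMap at a Content-Location whose path embeds an identifier for the client's archive set. One way to sketch such an identifier (the hash scheme is our assumption; the slide's abbreviated hash is illustrative) is a digest over the canonicalized set:

```python
# Derive a stable identifier for an archive set so equal sets map to
# the same /timemap/link/<id>/<uri> path regardless of listing order.
import hashlib

def archive_set_id(archive_uris):
    """Deterministic id: sha256 over the sorted, deduplicated set."""
    canonical = "\n".join(sorted(set(archive_uris)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def timemap_path(archive_uris, target_uri):
    """Hypothetical Content-Location path for an aggregated TimeMap."""
    return "/timemap/link/{}/{}".format(archive_set_id(archive_uris), target_uri)
```

Because the identifier is derived from the set rather than stored, the server stays stateless, sidestepping the HTTP-state objection while still giving clients a shareable URI.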