Client-Assisted Memento Aggregation Using the Prefer Header

Presented at the Web Archiving and Digital Libraries (WADL) Workshop in Fort Worth, Texas on June 6, 2018

Published in: Education

  1. Client-Assisted Memento Aggregation Using the Prefer Header
     Mat Kelly, Sawood Alam, Michael L. Nelson, and Michele C. Weigle
     Old Dominion University, Web Science & Digital Libraries Research Group
     {mkelly, salam, mln, mweigle}@cs.odu.edu • @machawk1 • @WebSciDL
     Web Archiving and Digital Libraries (WADL) Workshop, June 6, 2018, Fort Worth, TX
  2. Proliferation of Personal Web Archives
     (Slide footer: @machawk1 • Client-Assisted Memento Aggregation Using the Prefer Header • WADL 2018 • June 6, 2018 • Fort Worth, TX)
  3. Today’s Memento Aggregation
     Archives Queried (A0)
     (Slide footer: @machawk1 • A Framework for Aggregating Private and Public Web Archives • JCDL 2018 • June 5, 2018 • Fort Worth, TX)
  4. Motivation
     Archives Queried (A0)
     > Include personal archives
     > Include other non-aggregated archives
  5. Motivation
     Archives Queried (A0)
     > Include personal archives
     > Include other non-aggregated archives
  6. State of Aggregators’ Capabilities
     ● Mementoweb aggregator
       ○ Cannot customize set of archives aggregated
       ○ Open source? Unavailable for individuals’ deployment
     ● MemGator
       ○ Open source ✔ https://github.com/oduwsdl/MemGator
       ○ Requires static set of archives on-launch
       ○ Still specified by server, clients have no say
     ● With each, the set of archives is determined on the “server”.
     ● Neither allows client to specify set of archives aggregated.
  7. HTTP Prefer
     ● RFC 7240 (June 2014)
     ● CLIENT requests with HTTP header:
       ○ Prefer: foo; bar=""
     ● SERVER may respond with HTTP header:
       ○ Preference-Applied: foo
  8. HTTP Prefer
     ● RFC 7240 (June 2014)
     ● CLIENT requests with HTTP header:
       ○ Prefer: foo; bar=""
     ● SERVER may respond with HTTP header:
       ○ Preference-Applied: foo
     OUR APPROACH:
     Prefer: archives="data:application/json;charset=utf-8;base64,Ww0KICB7...NCn0="
  9. Prefer + Memento
     ● S. Jones, H. Van de Sompel, et al. “Mementos in the Raw” [1]
       ○ Prefer: original-content, original-links, original-headers
       ○ Mitigate replay system rewriting, make “raw” information more accessible
     ● D.S.H. Rosenthal “Content negotiation and Memento” [2]
       ○ none, screenshot, altered-dom, url-rewritten, banner-inserted
       ○ Additional focus on derived representations
     [1] http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
     [2] https://blog.dshr.org/2016/08/content-negotiation-and-memento.html
  10. A More Capable, Transparent Aggregator
  11. Memento Meta-Aggregator (MMA) [1]
      ● Additional responsibilities beyond aggregation
      ● Provide hierarchical querying model to other aggregators
      ● Advanced querying models like Precedence and Short-Circuiting
      ● Systematic interaction and aggregation with private and personal Web archives
      [1] Kelly et al. “A Framework for Aggregating Private and Public Web Archives”, JCDL 2018
  12. Bob Prefers to Exclude IA Captures
  13. Bob Requests Supported Archives
      GET /archives → { }
  14. Bob Customizes the Set in the JSON
  15. Bob Requests CNN for His Custom Set
      Base64-encoded JSON transmitted
  16. MMA Complies or Ignores Preference
  17. Client-Side Archive Specification
  18. Respecification of archives.json
      [
        {
          "id": "ia",
          "name": "Internet Archive",
          "timemap": "http://web.archive.org/web/timemap/link/",
          "timegate": "http://web.archive.org/web/"
        },
        {
          "id": "alice",
          "name": "Alice’s Captures",
          "timemap": "http://localhost:8081/timemap/",
          "timegate": "http://localhost:8081/timegate/"
        },
        …
      ]
      Base64 encoded: Ww0KICB7...NCn0=
  19. Requesting Custom Set of Archives with curl
      > GET /timemap/link/http://fox.cs.vt.edu/wadl2017.html HTTP/1.1
      > Host: mma.cs.odu.edu
      > Prefer: archives="data:application/json;charset=utf-8;base64,Ww0KICB7...NCn0="
      < HTTP/1.1 200
      < content-type: application/link-format
      < vary: prefer
      < preference-applied: archives="data:application/json;charset=utf-8;base64,Ww0KICB7...NCn0="
      < content-location: /timemap/link/5bd...8e9/http://fox.cs.vt.edu/wadl2017.html
  20. Non-Aggregated Public Web Archives
  21. Potential Approaches Toward Archival Set Persistence for Subsequent Queries
      1. Maintain state
         ○ content-location: /timemap/link/5bd...8e9/http://fox.cs.vt.edu/wadl2017.html
         ○ Not something we want to do with HTTP
      2. Require re-specification with each request
         ○ not portable to other users
      3. Server-side set caching
         ○ combinatorial explosion
  22. Client-Assisted Memento Aggregation Using the Prefer Header (repeat of the title slide: same authors, affiliation, contacts, and venue as slide 1)
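
The client side of the exchange on slides 15, 18, and 19 can be sketched in Python: serialize the archive list to JSON, base64-encode it into a data URI, and pass it as the `archives` preference in a `Prefer` header. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper names `build_prefer_header` and `build_timemap_request` are hypothetical, the `http://` scheme for `mma.cs.odu.edu` is assumed, and the filtering step that drops the Internet Archive is only one way Bob might customize the set. The request is constructed but never sent.

```python
import base64
import json
import urllib.request

# Archive entries in the shape shown on slide 18 (values copied from the slide).
ARCHIVES = [
    {
        "id": "ia",
        "name": "Internet Archive",
        "timemap": "http://web.archive.org/web/timemap/link/",
        "timegate": "http://web.archive.org/web/",
    },
    {
        "id": "alice",
        "name": "Alice’s Captures",
        "timemap": "http://localhost:8081/timemap/",
        "timegate": "http://localhost:8081/timegate/",
    },
]


def build_prefer_header(archives):
    """Return a Prefer header value carrying the archive list as a base64 data URI."""
    payload = json.dumps(archives).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    return f'archives="data:application/json;charset=utf-8;base64,{encoded}"'


def build_timemap_request(uri_r, archives, aggregator="http://mma.cs.odu.edu"):
    """Build (but do not send) a TimeMap request for uri_r with a custom archive set.

    The aggregator base URL, including its scheme, is an assumption.
    """
    url = f"{aggregator}/timemap/link/{uri_r}"
    headers = {"Prefer": build_prefer_header(archives)}
    return urllib.request.Request(url, headers=headers)


# Bob excludes the Internet Archive, as on slide 12 (hypothetical filtering step).
bobs_set = [a for a in ARCHIVES if a["id"] != "ia"]
request = build_timemap_request("http://fox.cs.vt.edu/wadl2017.html", bobs_set)

print(request.full_url)
print("Prefer:", request.get_header("Prefer"))
```

Because the whole archive set travels in the header, every request has to carry it again, which is the re-specification cost listed as approach 2 on slide 21.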

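Slide 16 says the MMA either complies with the preference or ignores it, and slide 19 shows the response headers when it complies. The sketch below shows one way a server could make that decision. It is an illustration only: the function names, the simplified regular expression (not a full RFC 7240 parser), and the validation rule requiring each entry to have a `timemap` key are all assumptions, not details given in the slides.

```python
import base64
import json
import re

DATA_URI_PREFIX = "data:application/json;charset=utf-8;base64,"
# Simplified matcher for archives="..." in a Prefer value; not a full RFC 7240 parser.
ARCHIVES_PREF = re.compile(r'(?:^|[,;\s])archives="([^"]*)"')


def parse_archives_preference(prefer_value):
    """Return (data_uri, archives) if a usable archives preference is present, else None."""
    match = ARCHIVES_PREF.search(prefer_value or "")
    if match is None or not match.group(1).startswith(DATA_URI_PREFIX):
        return None
    data_uri = match.group(1)
    try:
        raw = base64.b64decode(data_uri[len(DATA_URI_PREFIX):], validate=True)
        archives = json.loads(raw.decode("utf-8"))
    except ValueError:
        # Bad base64, bad UTF-8, and bad JSON all raise ValueError subclasses.
        return None
    # Hypothetical validation rule: a non-empty list of objects naming a TimeMap endpoint.
    if not isinstance(archives, list) or not archives:
        return None
    if not all(isinstance(a, dict) and "timemap" in a for a in archives):
        return None
    return data_uri, archives


def choose_archives(prefer_value, default_archives):
    """Return (archives_to_query, extra_response_headers) for one request."""
    parsed = parse_archives_preference(prefer_value)
    if parsed is None:
        # Ignore the preference: fall back to the server's own archive set.
        return default_archives, {}
    data_uri, archives = parsed
    # Comply: echo the applied preference, mirroring the response on slide 19.
    return archives, {"Preference-Applied": f'archives="{data_uri}"', "Vary": "prefer"}


default = [{"id": "ia", "timemap": "http://web.archive.org/web/timemap/link/"}]
custom = [{"id": "alice", "timemap": "http://localhost:8081/timemap/"}]
encoded = base64.b64encode(json.dumps(custom).encode("utf-8")).decode("ascii")
header = f'archives="{DATA_URI_PREFIX}{encoded}"'

archives, headers = choose_archives(header, default)
print([a["id"] for a in archives], headers)
print(choose_archives(None, default))
```

Returning no `Preference-Applied` header when the preference is ignored lets the client tell which set of archives was actually queried.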