Client-Assisted Memento Aggregation Using the Prefer Header

Presented at the Web Archiving and Digital Libraries (WADL) Workshop in Fort Worth, Texas on June 6, 2018

  1. Client-Assisted Memento Aggregation Using the Prefer Header
     Mat Kelly, Sawood Alam, Michael L. Nelson, and Michele C. Weigle
     Old Dominion University, Web Science & Digital Libraries Research Group
     {mkelly, salam, mln, mweigle}@cs.odu.edu
     @machawk1 • @WebSciDL
     Web Archiving and Digital Libraries (WADL) Workshop, June 6, 2018, Fort Worth, TX
  2. Proliferation of Personal Web Archives
  3. Today’s Memento Aggregation
     Archives Queried (A0)
     [slide header: A Framework for Aggregating Private and Public Web Archives, JCDL 2018 • June 5, 2018 • Fort Worth, TX]
  4. Motivation
     Archives Queried (A0)
     > Include personal archives
     > Include other non-aggregated archives
  5. Motivation
     Archives Queried (A0)
     > Include personal archives
     > Include other non-aggregated archives
  6. State of Aggregators’ Capabilities
     ● Mementoweb aggregator
       ○ Cannot customize set of archives aggregated
       ○ Open source? Unavailable for individuals’ deployment
     ● MemGator
       ○ Open source ✔ https://github.com/oduwsdl/MemGator
       ○ Requires static set of archives on-launch
       ○ Still specified by server, clients have no say
     ● With each, the set of archives is determined on the “server”.
     ● Neither allows client to specify set of archives aggregated.
  7. HTTP Prefer
     ● RFC 7240 (June 2014)
     ● CLIENT requests with HTTP Header:
       ○ Prefer: foo; bar=""
     ● SERVER may respond with HTTP Header:
       ○ Preference-Applied: foo
  8. HTTP Prefer
     ● RFC 7240 (June 2014)
     ● CLIENT requests with HTTP Header:
       ○ Prefer: foo; bar=""
     ● SERVER may respond with HTTP Header:
       ○ Preference-Applied: foo
     ● OUR APPROACH:
       ○ Prefer: archives="data:application/json;charset=utf-8;base64,Ww0KICB7...NCn0="
  9. Prefer + Memento
     ● S. Jones, H. Van de Sompel, et al. “Mementos in the Raw” ¹
       ○ Prefer: original-content, original-links, original-headers
       ○ Mitigate replay system rewriting, make “raw” information more accessible
     ● D.S.H. Rosenthal “Content negotiation and Memento” ²
       ○ none, screenshot, altered-dom, url-rewritten, banner-inserted
       ○ Additional focus on derived representations
     ¹ http://ws-dl.blogspot.com/2016/08/2016-08-15-mementos-in-raw-take-two.html
     ² https://blog.dshr.org/2016/08/content-negotiation-and-memento.html
  10. A More Capable, Transparent Aggregator
  11. Memento Meta-Aggregator (MMA) ¹
     ● Additional responsibilities beyond aggregation
     ● Provide hierarchical querying model to other aggregators
     ● Advanced querying models like Precedence and Short-Circuiting
     ● Systematic interaction and aggregation with Private and Personal Web archives
     ¹ Kelly et al. “A Framework for Aggregating Private and Public Web Archives”, JCDL 2018
  12. Bob Prefers to Exclude IA Captures
  13. Bob Requests Supported Archives: GET /archives → JSON
  14. Bob Customizes the Set in the JSON
  15. Bob Requests CNN for His Custom Set (base64 encoded JSON transmitted)
  16. MMA Complies or Ignores Preference
  17. Client-Side Archive Specification
  18. Respecification of archives.json
        [
          {
            "id": "ia",
            "name": "Internet Archive",
            "timemap": "http://web.archive.org/web/timemap/link/",
            "timegate": "http://web.archive.org/web/"
          },
          {
            "id": "alice",
            "name": "Alice’s Captures",
            "timemap": "http://localhost:8081/timemap/",
            "timegate": "http://localhost:8081/timegate/"
          },
          …
        ]
      Base64 encoded: Ww0KICB7...NCn0=
  19. Requesting Custom Set of Archives with curl
        > GET /timemap/link/http://fox.cs.vt.edu/wadl2017.html HTTP/1.1
        > Host: mma.cs.odu.edu
        > Prefer: archives="data:application/json;charset=utf-8;base64,Ww0KICB7...NCn0="
        < HTTP/1.1 200
        < content-type: application/link-format
        < vary: prefer
        < preference-applied: archives="data:application/json;charset=utf-8;base64,Ww0KICB7...NCn0="
        < content-location: /timemap/link/5bd...8e9/http://fox.cs.vt.edu/wadl2017.html
  20. Non-Aggregated Public Web Archives
  21. Potential Approaches Toward Archival Set Persistence for Subsequent Queries
      1. Maintain state
         ○ content-location: /timemap/link/5bd...8e9/http://fox.cs.vt.edu/wadl2017.html
         ○ Not something we want to do with HTTP
      2. Require re-specification with each request
         ○ not portable to other users
      3. Server-side set caching
         ○ combinatorial explosion
  22. Client-Assisted Memento Aggregation Using the Prefer Header
     Mat Kelly, Sawood Alam, Michael L. Nelson, and Michele C. Weigle
     Old Dominion University, Web Science & Digital Libraries Research Group
     {mkelly, salam, mln, mweigle}@cs.odu.edu
     @machawk1 • @WebSciDL
     Web Archiving and Digital Libraries (WADL) Workshop, June 6, 2018, Fort Worth, TX
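
The slides on Bob's workflow and on respecifying archives.json describe a client that edits a JSON list of archives, base64-encodes it, and sends it in a Prefer header as a data URI. The Python sketch below shows one way a client might do that. The helper names (`customize_archives`, `build_prefer_header`) and the sample `supported` list are hypothetical, and the exact JSON a real `GET /archives` response returns is assumed to match the archives.json shape shown in the slides.

```python
import base64
import json

# Data URI prefix as it appears in the slides' Prefer header example.
DATA_URI_PREFIX = "data:application/json;charset=utf-8;base64,"


def customize_archives(archives, exclude_ids=(), extra=()):
    """Drop archives whose "id" is excluded, then append any extra entries."""
    excluded = set(exclude_ids)
    kept = [a for a in archives if a["id"] not in excluded]
    return kept + list(extra)


def build_prefer_header(archives):
    """Serialize the archive list as JSON and wrap it in a base64 data URI."""
    payload = json.dumps(archives, indent=2).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    return f'archives="{DATA_URI_PREFIX}{encoded}"'


# Hypothetical stand-in for the JSON a GET /archives response might carry.
supported = [
    {
        "id": "ia",
        "name": "Internet Archive",
        "timemap": "http://web.archive.org/web/timemap/link/",
        "timegate": "http://web.archive.org/web/",
    },
]

# A personal archive the aggregator does not know about (values from the slides).
alice = {
    "id": "alice",
    "name": "Alice's Captures",
    "timemap": "http://localhost:8081/timemap/",
    "timegate": "http://localhost:8081/timegate/",
}

# Bob excludes IA captures and adds a personal archive, then builds the header.
custom = customize_archives(supported, exclude_ids=["ia"], extra=[alice])
prefer_value = build_prefer_header(custom)
print("Prefer:", prefer_value)
```

The resulting value can be passed as the `Prefer` request header by any HTTP client. Because the whole set travels with each request, the header grows with the number of archives listed.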

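
The slides note that the MMA either complies with or ignores the preference, and the curl example shows a complying response echoing it in `preference-applied`. The Python sketch below illustrates how a client might check for that echo and how either side might decode the archive list from the header value. The function names are hypothetical, and treating a missing Preference-Applied header as "ignored" is a simplifying assumption, since RFC 7240 makes that header optional.

```python
import base64
import json
import re

# Matches archives="data:application/json;...;base64,<payload>" in a header value.
ARCHIVES_PREF = re.compile(r'archives="data:application/json[^,"]*;base64,([^"]+)"')


def get_header(headers, name):
    """Look up a header by name, ignoring case."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def decode_archives_preference(header_value):
    """Return the archive list carried in a Prefer-style value, or None."""
    match = ARCHIVES_PREF.search(header_value)
    if match is None:
        return None
    try:
        return json.loads(base64.b64decode(match.group(1), validate=True))
    except ValueError:  # covers malformed base64 and malformed JSON
        return None


def preference_was_applied(request_headers, response_headers):
    """True when the response echoes the requested archives preference."""
    sent = get_header(request_headers, "Prefer")
    applied = get_header(response_headers, "Preference-Applied")
    if sent is None or applied is None:
        return False
    return sent.strip() == applied.strip()


# Illustrative headers only; the payload here is a tiny made-up archive list.
payload = base64.b64encode(json.dumps([{"id": "alice"}]).encode("utf-8")).decode("ascii")
pref = f'archives="data:application/json;charset=utf-8;base64,{payload}"'
request = {"Prefer": pref}
complied = {"preference-applied": pref, "vary": "prefer"}
ignored = {"content-type": "application/link-format"}

print(preference_was_applied(request, complied))
print(preference_was_applied(request, ignored))
print(decode_archives_preference(pref))
```

Comparing the echoed value byte for byte is the strictest possible check. A more tolerant client could instead decode both values and compare the resulting archive lists.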