Memento: TimeGates, TimeBundles, and TimeMaps

Presented at the NDIIPP Partners Meeting, Arlington, VA, July 20-22, 2010

  1. Memento: TimeGates, TimeBundles, and TimeMaps
     • The Memento Team: Herbert Van de Sompel, Michael L. Nelson, Robert Sanderson, Lyudmila Balakireva, Scott Ainsworth, Harihar Shankar
     • Memento is partially funded by the Library of Congress.
  2. W3C Web Architecture: Resource – URI – Representation [Diagram: a URI identifies a Resource; dereferencing the URI returns a Representation that represents the Resource.]
  3. W3C Web Architecture: Resource – URI – Representation [Diagram: a URI identifies a Resource; dereferencing it with content negotiation returns either Representation 1 or Representation 2, each of which represents the Resource.]
  4. How does Memento achieve this? To understand how Memento adds a time dimension to the Web, we first briefly recap Transparent Content Negotiation (conneg) in HTTP, as specified in RFC 2295, Transparent Content Negotiation in HTTP: http://www.ietf.org/rfc/rfc2295.txt
  5. HTTP GET on URI A
  6. GET with conneg on URI T – Server Choice – 302 Found – Step 1 [Diagram: URI T is a transparently negotiable resource.]
  7. GET with conneg on URI T – Server Choice – 302 Found – Step 2
  8. GET with conneg on URI T – Server List – 406 Not Acceptable
  9. How does Memento do this?
     • Component 1: Navigation towards an archived resource via its original resource, by leveraging content negotiation.
  10. Terminology Intermission. We introduce the term Memento to refer to an archived version of a resource. A Memento for a resource URI-R (as it existed) at time $t_i$ is a resource $\text{URI-M}_i[\text{URI-R}@t_i]$ whose representation at any moment after its creation time $t_c$ is the same as the representation that was available from URI-R at time $t_i$, with $t_c \ge t_i$. Implicit in this definition is the notion that, once created, a Memento always keeps the same representation.
  11. DT-conneg: Content Negotiation in the datetime dimension
     • RFC 2295 introduces conneg in the following dimensions: media type, language, compression, character set. For example:
         HTTP Request:  Accept-Language: en-US
         HTTP Response: Content-Language: en-US
     • Inspired by RFC 2295, Memento introduces datetime conneg:
         HTTP Request:  Accept-Datetime: Mon, 12 Oct 2009 14:20:33 GMT
         HTTP Response: Content-Datetime: Sun, 11 Oct 2009 11:18:05 GMT
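Both Accept-Datetime and Content-Datetime carry values in the ordinary HTTP-date format, so producing and reading them needs nothing beyond Python's standard library. A minimal sketch; the helper names to_http_date and from_http_date are hypothetical, and Content-Datetime is simply the response header name used in this presentation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_http_date(moment: datetime) -> str:
    """Render an aware datetime as an HTTP-date, e.g. for Accept-Datetime."""
    # usegmt=True requires a UTC datetime, so normalise before formatting.
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def from_http_date(value: str) -> datetime:
    """Parse an HTTP-date, e.g. a Content-Datetime value, into an aware datetime."""
    return parsedate_to_datetime(value)


accept_datetime = to_http_date(datetime(2009, 10, 12, 14, 20, 33, tzinfo=timezone.utc))
content_datetime = from_http_date("Sun, 11 Oct 2009 11:18:05 GMT")

print("Accept-Datetime:", accept_datetime)
# How far the delivered Memento lies before the requested datetime.
print(from_http_date(accept_datetime) - content_datetime)
```

Comparing the two values tells a client how close the negotiated Memento came to the datetime it asked for.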
  12. Terminology Intermission. We introduce the term TimeGate to refer to a transparently negotiable resource that supports the datetime dimension. A TimeGate for an original resource URI-R is a transparently negotiable resource $\text{URI-G}[\text{URI-R}]$ for which all variant resources are Mementos $\text{URI-M}_i[\text{URI-R}@t_i]$ of the resource URI-R. Since multiple archives may host versions of URI-R, multiple TimeGates may exist for any given resource, i.e. one per archive.
  13. A Memento HTTP Navigation involving an Aggregator
     • Scenario: www.digitalpreservation.gov points at a TimeGate provided by an Aggregator.
     • URI-R, URI-G, and URI-M are on different servers.
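  14. Memento HTTP Flow [Diagram: (1) client to URI-R: HEAD R, Accept-Datetime; response: 200, Link → G. (2) client to URI-G: GET G, Accept-Datetime; response: 302 → M, Vary, TCN, Link → R, B, M. (3) client to URI-M: GET M, Accept-Datetime; response: 200, Content-Datetime, Link → R, B, M.]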
  15. Memento HTTP Flow: URI-R (HEAD R, Accept-Datetime)
        HEAD / HTTP/1.1
        Host: www.digitalpreservation.gov
        Accept-Datetime: Sat, 10 Oct 2009 00:00:00 GMT
        Connection: close
  16. Memento HTTP Flow [Diagram: same overview as slide 14]
  17. Memento HTTP Flow: Success – URI-R (200, Link → G)
        HTTP/1.1 200 OK
        Date: Thu, 21 Jan 2010 00:02:12 GMT
        Server: Apache
        Link: <http://mementoproxy.lanl.gov/aggr/timegate/http://www.digitalpreservation.gov/>; rel="timegate"
        Content-Length: 255
        Connection: close
        Content-Type: text/html; charset=iso-8859-1
  18. Memento HTTP Flow [Diagram: same overview as slide 14]
  19. Memento HTTP Flow: URI-G (GET G, Accept-Datetime)
        GET /aggr/timegate/http://www.digitalpreservation.gov/ HTTP/1.1
        Host: mementoproxy.lanl.gov
        Accept-Datetime: Sat, 10 Oct 2009 00:00:00 GMT
        Connection: close
  20. Memento HTTP Flow [Diagram: same overview as slide 14]
  21. Memento HTTP Flow: Success – URI-G (302 → M, Vary, Link → R, B, M)
        HTTP/1.1 302 Found
        Date: Thu, 21 Jan 2010 00:06:50 GMT
        Server: Apache
        TCN: choice
        Vary: negotiate, accept-datetime
        Location: http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov/
        Link: <http://www.digitalpreservation.gov/>; rel="original",
          <http://mementoproxy.lanl.gov/aggr/timebundle/http://www.digitalpreservation.gov/>; rel="timebundle",
          <http://wayback.archive-it.org/256/20051108162921/http://www.digitalpreservation.gov/>; rel="first-memento"; datetime="Tue, 08 Nov 2005 00:00:00 GMT",
          <http://webcitation.org/query?id=1257028234035091>; rel="next-memento"; datetime="Sat, 31 Oct 2009 18:30:35 GMT",
          <http://webcitation.org/query?id=1213058061345794>; rel="prev-memento"; datetime="Mon, 09 Jun 2008 20:34:23 GMT",
          <http://wayback.archive-it.org/256/20100120102000/http://www.digitalpreservation.gov/>; rel="last-memento"; datetime="Wed, 20 Jan 2010 10:20:00 GMT"
        Content-Length: 0
        Connection: close
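A client that wants to step to neighbouring Mementos has to read the rel and datetime parameters out of a Link header like the one above. A minimal regex-based sketch, not a complete Web Linking parser: it assumes every parameter value is double-quoted with plain ASCII quotes and that URIs contain no ">". The name parse_memento_links is hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime
from email.utils import parsedate_to_datetime

# A link-value: <URI> followed by zero or more ;name="value" parameters.
# Commas inside quoted datetimes are why a plain split(",") would not work.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM = re.compile(r';\s*([\w-]+)\s*=\s*"([^"]*)"')


def parse_memento_links(header: str) -> dict[str, tuple[str, datetime | None]]:
    """Map each rel value to (URI, datetime or None). Hypothetical helper."""
    links: dict[str, tuple[str, datetime | None]] = {}
    for match in LINK_VALUE.finditer(header):
        uri = match.group(1).strip()
        params = dict(PARAM.findall(match.group(2)))
        when = params.get("datetime")
        moment = parsedate_to_datetime(when) if when else None
        # A rel parameter may carry several space-separated relation types.
        for rel in params.get("rel", "").split():
            links[rel] = (uri, moment)
    return links


# Abridged Link header from the URI-G response above.
header = (
    '<http://www.digitalpreservation.gov/>; rel="original", '
    '<http://mementoproxy.lanl.gov/aggr/timebundle/http://www.digitalpreservation.gov/>; '
    'rel="timebundle", '
    '<http://webcitation.org/query?id=1213058061345794>; rel="prev-memento"; '
    'datetime="Mon, 09 Jun 2008 20:34:23 GMT", '
    '<http://webcitation.org/query?id=1257028234035091>; rel="next-memento"; '
    'datetime="Sat, 31 Oct 2009 18:30:35 GMT"'
)
links = parse_memento_links(header)
print(links["next-memento"])
```

Matching whole link-values rather than splitting on commas keeps the weekday comma inside each datetime intact.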
  22. Memento HTTP Flow [Diagram: same overview as slide 14]
  23. Memento HTTP Flow: URI-M (GET M, Accept-Datetime)
        GET /1610/20090928171405/http://www.digitalpreservation.gov/ HTTP/1.1
        Host: wayback.archive-it.org
        Accept-Datetime: Sat, 10 Oct 2009 00:00:00 GMT
        Connection: close
  24. Memento HTTP Flow [Diagram: same overview as slide 14]
  25. Memento HTTP Flow: Success – URI-M (200, Content-Datetime, Link → R, B, M)
        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        X-Archive-Orig-Accept-Ranges: bytes
        …
        Content-Type: text/html;charset=utf-8
        Content-Length: 23364
        Date: Thu, 21 Jan 2010 00:09:40 GMT
        Content-Datetime: Mon, 28 Sep 2009 17:14:05 GMT
        Link: <http://www.digitalpreservation.gov/>; rel="original",
          <http://wayback.archive-it.org/web/timebundle/http://www.digitalpreservation.gov/>; rel="timebundle",
          <http://wayback.archive-it.org/256/20051108162921/http://www.digitalpreservation.gov/>; rel="first-memento"; datetime="Tue, 08 Nov 2005 00:00:00 GMT",
          <http://wayback.archive-it.org/256/20100120102000/http://www.digitalpreservation.gov/>; rel="last-memento"; datetime="Wed, 20 Jan 2010 10:20:00 GMT"
        Connection: close
     • The Link header values are local to wayback.archive-it.org and differ from those provided by URI-G.
  26. 300 Multiple Choices
        HTTP/1.1 300 Multiple Choices
        Server: Apache
        Content-Length: 705
        Content-Type: text/html; charset=utf-8
        Date: Thu, 21 Jan 2010 00:09:40 GMT
        TCN: list
        Vary: negotiate, accept-datetime
        Link: <http://en.wikipedia.org/Special:TimeBundle/http://en.wikipedia.org/wiki/DJ_Shadow>; rel="timebundle",
          <http://en.wikipedia.org/wiki/DJ_Shadow>; rel="original",
          <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=1493688>; rel="first-memento"; datetime="Sun, 28 Sep 2003 01:42:00 GMT",
          <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=337446696>; rel="last-memento"; datetime="Tue, 12 Jan 2010 19:55:00 GMT",
          <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=322586071>; rel="prev-memento"; datetime="Wed, 28 Oct 2009 14:307:00 GMT",
          <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=326164283>; rel="next-memento"; datetime="Thu, 26 Nov 2009 23:50:00 GMT"
        Connection: close
     • Two scenarios generate a 300 at the TimeGate:
         - A client requests a 300 using the "Negotiate: 1.0" request header.
         - An archive has two or more Mementos with the same datetime (HTTP dates only support second-level granularity).
  27. 406 Not Acceptable
     • A client request for a Memento with a datetime outside the range between the first and last Mementos will generate a 406.
     • For example, a request to Wikipedia with "Accept-Datetime: Mon, 31 May 1999 00:00:00 GMT" produces:
        HTTP/1.1 406 Not Acceptable
        Server: Apache
        Content-Length: 709
        Content-Type: text/html; charset=utf-8
        Date: Thu, 21 Jan 2010 00:09:40 GMT
        Vary: negotiate, accept-datetime
        TCN: list
        Link: <http://en.wikipedia.org/Special:TimeBundle/http://en.wikipedia.org/wiki/DJ_Shadow>; rel="timebundle",
          <http://en.wikipedia.org/wiki/DJ_Shadow>; rel="original",
          <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=1493688>; rel="first-memento"; datetime="Sun, 28 Sep 2003 01:42:00 GMT",
          <http://en.wikipedia.org/w/index.php?title=DJ_Shadow&oldid=337446696>; rel="last-memento"; datetime="Tue, 12 Jan 2010 19:55:00 GMT"
        Connection: close
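The 302, 300, and 406 responses shown in this flow amount to a small decision procedure at the TimeGate. A minimal sketch under stated assumptions: the presentation does not say how the "best" Memento is chosen, so picking the Memento with the nearest datetime is an assumed policy, and Memento, negotiate, and utc are hypothetical names. The sample datetimes are taken from the URI-G and URI-M responses above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Memento:
    uri: str
    when: datetime  # the Memento's datetime


def negotiate(mementos: list[Memento], accept: datetime,
              list_requested: bool = False) -> tuple[int, list[Memento]]:
    """Hypothetical TimeGate decision: returns (status code, selected Mementos).

    Assumed policy: 406 outside [first, last]; 300 when the client asks for a
    list or several Mementos share the best datetime; otherwise 302 to the
    Memento whose datetime is nearest to Accept-Datetime.
    """
    ordered = sorted(mementos, key=lambda m: m.when)
    if not ordered or not ordered[0].when <= accept <= ordered[-1].when:
        return 406, []
    if list_requested:  # e.g. the client sent "Negotiate: 1.0"
        return 300, ordered
    # min() keeps the earlier candidate when two datetimes are equally near.
    best = min((m.when for m in ordered), key=lambda w: abs(w - accept))
    chosen = [m for m in ordered if m.when == best]
    return (300 if len(chosen) > 1 else 302), chosen


def utc(*fields: int) -> datetime:
    return datetime(*fields, tzinfo=timezone.utc)


archive = [
    Memento("http://wayback.archive-it.org/256/20051108162921/http://www.digitalpreservation.gov/", utc(2005, 11, 8)),
    Memento("http://webcitation.org/query?id=1213058061345794", utc(2008, 6, 9, 20, 34, 23)),
    Memento("http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov/", utc(2009, 9, 28, 17, 14, 5)),
    Memento("http://webcitation.org/query?id=1257028234035091", utc(2009, 10, 31, 18, 30, 35)),
    Memento("http://wayback.archive-it.org/256/20100120102000/http://www.digitalpreservation.gov/", utc(2010, 1, 20, 10, 20)),
]

status, chosen = negotiate(archive, utc(2009, 10, 10))
print(status, chosen[0].uri)
print(negotiate(archive, utc(1999, 5, 31))[0])
```

With the Accept-Datetime used in the walkthrough (Sat, 10 Oct 2009), the nearest-datetime policy happens to select the same archive-it.org Memento that the aggregator redirected to.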
  28. The Web with a Time Dimension added by Memento [Diagram]
  29. How does Memento do this?
     • There are two components to the Memento solution:
     • Component 1: Navigation towards an archived resource via its original resource, by leveraging content negotiation. (Done)
     • Component 2: A discovery API that allows a client to request from an archive a list of all archived versions it holds for a resource with a given URI.
  30. Why an API?
     • Mementos for any given URI-R are distributed across archives.
     • To get a complete picture of the available Mementos, several archives need to be consulted.
     • This can be done in distributed consultation mode (slooow), or by consulting an aggregator.
  31. Terminology Intermission. We introduce the term TimeBundle to refer to a resource via which an overview of all Mementos for an original resource URI-R is available.
     • A TimeBundle for a resource URI-R is a resource $\text{URI-B}[\text{URI-R}]$ that is an aggregation of:
         - All Mementos $\text{URI-M}_i[\text{URI-R}@t_i]$ available from an archive,
         - The archive's TimeGate URI-G for URI-R,
         - The original resource URI-R itself.
  32. [Diagram]
  33. Memento DT-conneg component [Diagram]
  34. Memento DT-conneg component [Diagram]. See OAI-ORE: http://www.openarchives.org/ore/1.0/toc/
  35. Memento DT-conneg component and Memento discovery component [Diagram]
  36. Recall the URI-G response (302 → M, Vary, Link → R, B, M)
        HTTP/1.1 302 Found
        Date: Thu, 21 Jan 2010 00:06:50 GMT
        Server: Apache
        TCN: choice
        Vary: negotiate, accept-datetime
        Location: http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov/
        Link: <http://www.digitalpreservation.gov/>; rel="original",
          <http://mementoproxy.lanl.gov/aggr/timebundle/http://www.digitalpreservation.gov/>; rel="timebundle",
          <http://wayback.archive-it.org/256/20051108162921/http://www.digitalpreservation.gov/>; rel="first-memento"; datetime="Tue, 08 Nov 2005 00:00:00 GMT",
          <http://webcitation.org/query?id=1257028234035091>; rel="next-memento"; datetime="Sat, 31 Oct 2009 18:30:35 GMT",
          <http://webcitation.org/query?id=1213058061345794>; rel="prev-memento"; datetime="Mon, 09 Jun 2008 20:34:23 GMT",
          <http://wayback.archive-it.org/256/20100120102000/http://www.digitalpreservation.gov/>; rel="last-memento"; datetime="Wed, 20 Jan 2010 10:20:00 GMT"
        Content-Length: 0
        Connection: close
  37. Dereferencing URI-B
        % telnet mementoproxy.lanl.gov 80
        Trying 204.121.6.37...
        Connected to ttt.lanl.gov.
        Escape character is '^]'.
        HEAD /aggr/timebundle/http://www.digitalpreservation.gov/ HTTP/1.1
        Host: mementoproxy.lanl.gov
        Connection: close
        HTTP/1.1 303 See Other
        Date: Wed, 21 Jul 2010 03:09:46 GMT
        Server: Apache
        Location: http://mementoproxy.lanl.gov/aggr/timemap/rdf/http://www.digitalpreservation.gov/
        Vary: Accept
        Connection: close
        Content-Type: text/plain; charset=UTF-8
        Connection closed by foreign host.
  38. RDF?! Yuck!
        % telnet mementoproxy.lanl.gov 80
        Trying 204.121.6.37...
        Connected to ttt.lanl.gov.
        Escape character is '^]'.
        HEAD /aggr/timebundle/http://www.digitalpreservation.gov/ HTTP/1.1
        Accept: application/rdf+xml; q=0.0
        Host: mementoproxy.lanl.gov
        Connection: close
        HTTP/1.1 303 See Other
        Date: Wed, 21 Jul 2010 03:12:42 GMT
        Server: Apache
        Location: http://mementoproxy.lanl.gov/aggr/timemap/link/http://www.digitalpreservation.gov/
        Vary: Accept
        Connection: close
        Content-Type: text/plain; charset=UTF-8
        Connection closed by foreign host.
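The two sessions above show the TimeBundle URI answering 303 See Other with a different TimeMap serialization depending on the Accept header. A minimal server-side sketch that reproduces only those two observed outcomes; the q-value handling is a deliberate simplification (an absent Accept header, or one that does not mention RDF/XML, is treated as accepting RDF), and rdf_quality and timemap_location are hypothetical names, not the aggregator's actual code.

```python
from __future__ import annotations

# Base URI as seen in the Location headers above.
TIMEMAP_BASE = "http://mementoproxy.lanl.gov/aggr/timemap"


def rdf_quality(accept: str | None) -> float:
    """Return the q-value the Accept header gives application/rdf+xml.

    Assumption: no Accept header, or no mention of RDF/XML, counts as q=1.0,
    which matches the two telnet sessions but is looser than full HTTP conneg.
    """
    if not accept:
        return 1.0
    for item in accept.split(","):
        media, *params = [part.strip() for part in item.split(";")]
        if media == "application/rdf+xml":
            for param in params:
                name, _, value = param.partition("=")
                if name.strip() == "q":
                    return float(value)
            return 1.0
    return 1.0


def timemap_location(original: str, accept: str | None = None) -> str:
    """Location for the 303 See Other issued by a TimeBundle (hypothetical)."""
    serialization = "rdf" if rdf_quality(accept) > 0 else "link"
    return f"{TIMEMAP_BASE}/{serialization}/{original}"


print(timemap_location("http://www.digitalpreservation.gov/"))
print(timemap_location("http://www.digitalpreservation.gov/",
                       "application/rdf+xml; q=0.0"))
```

Because the chosen Location depends on Accept, the response carries "Vary: Accept", as both sessions show.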
  39. TimeMap
        http://mementoproxy.lanl.gov/aggr/timemap/rdf/http://www.digitalpreservation.gov/
        http://mementoproxy.lanl.gov/aggr/timemap/link/http://www.digitalpreservation.gov/

        <http://mementoproxy.lanl.gov/aggr/timebundle/http://www.digitalpreservation.gov/>;rel="timebundle",
        <http://www.digitalpreservation.gov/>;rel="original",
        <http://web.archive.org/web/20020802022406/www.digitalpreservation.gov/>;rel="first-memento";datetime="Fri, 02 Aug 2002 02:24:06 GMT",
        <http://web.archive.org/web/20020921111830/www.digitalpreservation.gov/>;rel="memento";datetime="Sat, 21 Sep 2002 11:18:30 GMT",
        <http://web.archive.org/web/20020924113650/www.digitalpreservation.gov/>;rel="memento";datetime="Tue, 24 Sep 2002 11:36:50 GMT",
        <http://web.archive.org/web/20020927005417/www.digitalpreservation.gov/>;rel="memento";datetime="Fri, 27 Sep 2002 00:54:17 GMT",
        … [deletia] …
        <http://webarchive.nationalarchives.gov.uk/20080911010610/http://www.digitalpreservation.gov/>;rel="memento";datetime="Thu, 11 Sep 2008 00:00:00 GMT",
        <http://web.archive.org/web/20090516160321/www.digitalpreservation.gov/>;rel="memento";datetime="Sat, 16 May 2009 16:03:21 GMT",
        <http://web.archive.org/web/20090616162603/www.digitalpreservation.gov/>;rel="memento";datetime="Tue, 16 Jun 2009 16:26:03 GMT",
        <http://web.archive.org/web/20090716162514/www.digitalpreservation.gov/>;rel="memento";datetime="Thu, 16 Jul 2009 16:25:14 GMT",
        <http://web.archive.org/web/20090816181051/www.digitalpreservation.gov/>;rel="memento";datetime="Sun, 16 Aug 2009 18:10:51 GMT",
        <http://web.archive.org/web/20090916193533/www.digitalpreservation.gov/>;rel="memento";datetime="Wed, 16 Sep 2009 19:35:33 GMT",
        <http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov/>;rel="memento";datetime="Mon, 28 Sep 2009 00:00:00 GMT",
        <http://web.archive.org/web/20091016235112/www.digitalpreservation.gov/>;rel="memento";datetime="Fri, 16 Oct 2009 23:51:12 GMT",
        <http://webcitation.org/query?id=1257028234035091>;rel="memento";datetime="Sat, 31 Oct 2009 18:30:35 GMT",
        <http://web.archive.org/web/20091116214743/www.digitalpreservation.gov/>;rel="memento";datetime="Mon, 16 Nov 2009 21:47:43 GMT",
        <http://web.archive.org/web/20091216192113/www.digitalpreservation.gov/>;rel="memento";datetime="Wed, 16 Dec 2009 19:21:13 GMT",
        <http://web.archive.org/web/20100116192640/www.digitalpreservation.gov/>;rel="memento";datetime="Sat, 16 Jan 2010 19:26:40 GMT",
        <http://web.archive.org/web/20100216193825/www.digitalpreservation.gov/>;rel="memento";datetime="Tue, 16 Feb 2010 19:38:25 GMT",
        <http://web.archive.org/web/20100316200421/www.digitalpreservation.gov/>;rel="memento";datetime="Tue, 16 Mar 2010 20:04:21 GMT",
        <http://web.archive.org/web/20100416195253/www.digitalpreservation.gov/>;rel="memento";datetime="Fri, 16 Apr 2010 19:52:53 GMT",
        <http://web.archive.org/web/20100516200754/www.digitalpreservation.gov/>;rel="last-memento";datetime="Sun, 16 May 2010 20:07:54 GMT"
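Going the other way, an archive or aggregator that holds a list of (URI-M, datetime) pairs can emit the link format above. A minimal sketch that mirrors this TimeMap's entry layout (timebundle, original, then Mementos in chronological order with first-memento and last-memento marking the ends); render_link_timemap is a hypothetical name, and giving a lone Memento both rels is an assumption the slide does not cover.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import format_datetime


def render_link_timemap(timebundle: str, original: str,
                        mementos: list[tuple[str, datetime]]) -> str:
    """Serialize a TimeMap in the link format shown above (hypothetical helper)."""
    entries = [f'<{timebundle}>;rel="timebundle"', f'<{original}>;rel="original"']
    ordered = sorted(mementos, key=lambda m: m[1])
    for index, (uri, moment) in enumerate(ordered):
        rels = []
        if index == 0:
            rels.append("first-memento")
        if index == len(ordered) - 1:
            rels.append("last-memento")
        # Assumption: a lone Memento carries both rels, space-separated.
        rel = " ".join(rels) or "memento"
        stamp = format_datetime(moment.astimezone(timezone.utc), usegmt=True)
        entries.append(f'<{uri}>;rel="{rel}";datetime="{stamp}"')
    return ", ".join(entries)


utc = timezone.utc
timemap = render_link_timemap(
    "http://mementoproxy.lanl.gov/aggr/timebundle/http://www.digitalpreservation.gov/",
    "http://www.digitalpreservation.gov/",
    [
        ("http://web.archive.org/web/20100516200754/www.digitalpreservation.gov/",
         datetime(2010, 5, 16, 20, 7, 54, tzinfo=utc)),
        ("http://web.archive.org/web/20020802022406/www.digitalpreservation.gov/",
         datetime(2002, 8, 2, 2, 24, 6, tzinfo=utc)),
        ("http://webcitation.org/query?id=1257028234035091",
         datetime(2009, 10, 31, 18, 30, 35, tzinfo=utc)),
    ],
)
print(timemap.replace(", <", ",\n<"))
```

Sorting before assigning rels means the input order does not matter; the first and last entries are always the chronological ends.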
  40. TimeBundle API: For Discovery, Cross-Archive Services
     • Archives use common approaches to make TimeBundles/TimeMaps discoverable:
         - SiteMaps,
         - Atom Feeds,
         - OAI-PMH.
     • An Aggregator harvests and merges TimeMaps. Based on this information, the Aggregator exposes its own TimeGates, which offer:
         - Cross-archive coverage,
         - Finer datetime granularity,
         - Better chances of matching a client's datetime preference.
     • An Aggregator can become a shared redirection target for many web servers.
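The harvest-and-merge step can be pictured as a chronological union of per-archive Memento lists. A minimal sketch, assuming each archive's TimeMap has already been parsed into (datetime, URI-M) pairs; merge_timemaps is a hypothetical name, and the sample entries are copied from the aggregated TimeMap above.

```python
from __future__ import annotations

from datetime import datetime, timezone
from itertools import chain


def merge_timemaps(*timemaps: list[tuple[datetime, str]]) -> list[tuple[datetime, str]]:
    """Combine per-archive (datetime, URI-M) lists into one chronological list.

    Hypothetical aggregator step: a URI-M reported by more than one source is
    kept once, using its earliest listed datetime.
    """
    merged: list[tuple[datetime, str]] = []
    seen: set[str] = set()
    for moment, uri in sorted(chain.from_iterable(timemaps)):
        if uri not in seen:
            seen.add(uri)
            merged.append((moment, uri))
    return merged


utc = timezone.utc
internet_archive = [
    (datetime(2009, 9, 16, 19, 35, 33, tzinfo=utc),
     "http://web.archive.org/web/20090916193533/www.digitalpreservation.gov/"),
    (datetime(2009, 10, 16, 23, 51, 12, tzinfo=utc),
     "http://web.archive.org/web/20091016235112/www.digitalpreservation.gov/"),
]
archive_it = [
    (datetime(2009, 9, 28, tzinfo=utc),
     "http://wayback.archive-it.org/1610/20090928171405/http://www.digitalpreservation.gov/"),
]
webcitation = [
    (datetime(2009, 10, 31, 18, 30, 35, tzinfo=utc),
     "http://webcitation.org/query?id=1257028234035091"),
]

for moment, uri in merge_timemaps(internet_archive, archive_it, webcitation):
    print(moment.date(), uri)
```

The merged list interleaves Mementos from different archives, which is where the finer datetime granularity of an Aggregator TimeGate comes from.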
  41. How does Memento do this?
     • There are two components to the Memento solution:
     • Component 1: Navigation towards an archived resource via its original resource, by leveraging content negotiation. (Done)
     • Component 2: A discovery API that allows a client to request from an archive a list of all archived versions it holds for a resource with a given URI. (Done)
  42. Memento wants to make navigating the Web’s Past Easy
     • http://www.mementoweb.org
     • http://groups.google.com/group/memento-dev
