TimeMaps: Metadata for Memento


                                  Herbert Van de Sompel
                                   Robert Sanderson
                                    Michael L. Nelson
                                   Lyudmila Balakireva
                                     Scott Ainsworth
                                     Harihar Shankar


                              http://www.mementoweb.org/


                                 Memento is partially funded by the
                                      Library of Congress




       TimeMaps: Metadata for Memento
   GSLIS Metadata Group, UIUC, 14th July 2010
Memento wants to make Navigating the Web’s Past Easy


•    Problem Statement

•    Memento Solution
        •    Navigation not Search
        •    API for Web Archives

•    Memento Ontology for TimeMaps



                       http://www.mementoweb.org/
             http://groups.google.com/group/memento-dev
Web Resources have Different Representations over Time




Thankfully Archived Representations Exist




3 Issues with Current Access to Archives

1.  Access is via a new URI, unknown to the user.

2.  People do not like to search for archived resources, and there is no
    automated method

3.  Navigation in the past is inconsistent:
      1.  Stuck in single, necessarily incomplete archive
      2.  Or if not rewritten, URIs lead back to the present




            Comment on Popular Science article:     http://bit.ly/bWr5gP


1. Representations Archived at a Different URI

Sep 11 2001, 20:36:10 UTC
http://web.archive.org/web/20010911203610/http://www.cnn.com/
archived resource for http://cnn.com

Dec 20 2001, 4:51:00 UTC
http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
archived resource for http://en.wikipedia.org/wiki/September_11_attacks

2. Searching is Cumbersome

http://web.archive.org/web/*/http://cnn.com/
http://en.wikipedia.org/w/index.php?title=September_11_attacks&action=history


3. Inconsistent Navigation (Archives Incomplete)

[Figure label: SPACE]

Sep 11 2001, 20:36:10 UTC
http://web.archive.org/web/20010911203610/http://www.cnn.com/
archived resource for http://cnn.com

Sep 11 2001, 21:38:55 UTC
http://web.archive.org/web/20010911213855/www.cnn.com/TECH/space/


3. Inconsistent Navigation (Can't Stay in Past)

[Figure label: Pentagon]

Dec 20 2001, 4:51:00 UTC
http://en.wikipedia.org/w/index.php?title=September_11_attacks&oldid=282333
archived resource for http://en.wikipedia.org/wiki/September_11_attacks

current
http://en.wikipedia.org/wiki/The_Pentagon


Past and Current Web are Not Integrated




The Web without a Time Dimension




Accessing archived versions of a resource requires a different URI than accessing its current version.

The Web with Time Dimension added by Memento




Memento uses the URI of the current version to access archived versions: qualify it
          with a datetime and magically arrive at the correct location.

The Memento Solution



There are two components to the Memento Solution:

•    Component 1: Navigation to an archived resource
     via its original resource, by leveraging content
     negotiation.

•    Component 2: A discovery API for archives that
     enables retrieving a list of all archived versions of a
     resource for a given URI.



Content Negotiation in Time

•    Many systems support content negotiation for file format
      o  Your client by default asks for HTML and gets HTML

      o  But it could get PDF via the same URI



•    Memento proposes a new dimension for content negotiation: Time
      o  Your client by default asks for the current time, and gets it

      o  But it could get an older version via the same URI



•    Can be accomplished with only one new HTTP header in each
     direction:

      o    Accept-Datetime             Request for a particular timestamp
      o    Content-Datetime            The returned content’s timestamp

      o    These exactly mirror existing headers for Format, Language, etc.
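
As a rough illustration of the two headers, the sketch below formats a timestamp for Accept-Datetime and reads one back from Content-Datetime, using the HTTP date format that appears in the appendix. The function names are hypothetical and not part of Memento; only the two header names come from the slide.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime(when: datetime) -> dict:
    """Build the request header asking for the version that existed at `when`.

    `when` must be timezone-aware; it is converted to GMT, as HTTP dates require.
    """
    gmt = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(gmt, usegmt=True)}


def content_datetime(response_headers: dict) -> datetime:
    """Read the timestamp of the returned content from the response headers."""
    return parsedate_to_datetime(response_headers["Content-Datetime"])


request = accept_datetime(datetime(2001, 9, 11, 20, 35, tzinfo=timezone.utc))
print(request)  # {'Accept-Datetime': 'Tue, 11 Sep 2001 20:35:00 GMT'}
```

A client would send these headers to the same URI it uses for the current version, which is the point of treating time as one more negotiation dimension.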

[Figure sequence: the Original Resource www.cnn.com ("current") and its Mementos at
web.archive.org, dated Apr 10 2001, 21:39:30 UTC; Aug 15 2004, 08:45:27 UTC; and
Aug 15 2007, 19:21:58 UTC. Successive slides add a TimeGate between the Original
Resource and the Mementos.]

Link Headers: from the Original Resource to the TimeGate
Conneg with TimeGate to Mementos

[Figure: the same Original Resource, TimeGate, and Mementos arrangement for wikipedia.org.]

The Web with Time Dimension added by Memento




The Memento Solution




•    Component 2: A discovery API for archives that
     allows requesting a list of all archived versions held
     for a resource with a given URI.



Why an API?

•    Mementos for any given resource are distributed across archives.
     (What? Not just the Internet Archive?!)

•    In order to get a correct perspective of available Mementos, different
     archives need to be consulted.

•    This can be done by distributed search (slow) or by consulting an aggregator.

•    Aggregators and other services need a machine-readable description of
     archives' holdings to select the appropriate Memento for a request:
        •  Closest in time
        •  Most reliable representation
        •  Fastest responding
        •  (etc)
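
A minimal sketch of the "closest in time" criterion, assuming an aggregator already holds a list of (archive, datetime) pairs. The function name and the requested time are made up for illustration; the archive names and times are the ones listed in this presentation.

```python
from datetime import datetime


def closest_memento(holdings, requested):
    """Return the (archive, datetime) pair whose datetime is nearest to `requested`."""
    return min(holdings, key=lambda holding: abs(holding[1] - requested))


holdings = [
    ("WebCitation", datetime(2009, 5, 13, 12, 28, 39)),
    ("Archive-It", datetime(2009, 5, 14, 1, 18, 11)),
    ("BL Archive", datetime(2009, 5, 14, 7, 12, 45)),
    ("Dracos", datetime(2009, 5, 14, 13, 0, 0)),
    ("TNA", datetime(2009, 5, 14, 18, 21, 32)),
]

# Hypothetical request for 14 May 2009 06:00:00.
print(closest_memento(holdings, datetime(2009, 5, 14, 6, 0, 0))[0])  # BL Archive
```

Other criteria from the list (reliability, response speed) would need extra fields per holding, and the key function would have to weigh them against distance in time.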



WebCitation    13 May 2009 12:28:39
Archive-It     14 May 2009 01:18:11
BL Archive     14 May 2009 07:12:45
Dracos         14 May 2009 13:00:00
TNA            14 May 2009 18:21:32

And no Internet Archive…




TimeMaps
•  At its most basic: a list of the URIs of Mementos and their times
•  Expressed as Linked Data; a profile of OAI ORE Resource Maps
•  Link header from TimeGate and Memento
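
The "list of URIs and their times" view can be sketched as a small in-memory structure. This is only an illustration and not the ORE-based TimeMap format itself: the class and method names are hypothetical, and the Memento URIs below are invented by applying the web.archive.org URI pattern to the datetimes shown earlier.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


class TimeMap:
    """Mementos of one Original Resource, kept sorted by datetime."""

    def __init__(self, original, mementos):
        self.original = original
        self.entries = sorted(mementos)  # list of (datetime, uri) pairs
        self._times = [when for when, _ in self.entries]

    def first(self):
        return self.entries[0]

    def last(self):
        return self.entries[-1]

    def prev(self, when):
        """Latest Memento strictly before `when`, or None."""
        i = bisect_left(self._times, when)
        return self.entries[i - 1] if i > 0 else None

    def next(self, when):
        """Earliest Memento strictly after `when`, or None."""
        i = bisect_right(self._times, when)
        return self.entries[i] if i < len(self.entries) else None


# Illustrative URIs only.
timemap = TimeMap("http://www.cnn.com/", [
    (datetime(2004, 8, 15, 8, 45, 27), "http://web.archive.org/web/20040815084527/http://www.cnn.com/"),
    (datetime(2001, 4, 10, 21, 39, 30), "http://web.archive.org/web/20010410213930/http://www.cnn.com/"),
    (datetime(2007, 8, 15, 19, 21, 58), "http://web.archive.org/web/20070815192158/http://www.cnn.com/"),
])
```

Keeping the entries sorted makes first, last, previous, and next lookups cheap, which matches the kinds of links a TimeGate or Memento returns.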




Basic ORE Model

Aggregation (Aggr) is a set of web resources (R-1 to R-3), described in RDF or
Atom by a Resource Map (ReM).




TimeBundles

Resources of Interest in Memento:
   •  Original Resource
   •  TimeGate
   •  Mementos




TimeGates

•  Period(s) that the TimeGate covers
•  Which resource it is a TimeGate for
•  mem:TimeSpan, as a TimeGate can cover multiple distinct periods




Mementos

•  Time Period: valid for or observed over, number of observations
•  Metadata: size, format, etc (will come back to the "etc")
•  Which resource it is a Memento for




Serializations

•  RDF/XML
    •  Good for XML parsers

•  Turtle, N3 and related
    •  Good for graph parsers

•  RDFa
    •  Good for web browsers

•  Atom
    •  Good for alerting, feed readers etc (but still embeds RDF)

•  New: Link Header format
    •  Good for real-time applications
    •  Smaller file size (just the facts, ma'am)
    •  Easy to implement with existing link header parsers
    •  Servers need to produce this format anyway, so it is a non-RDF way out
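
To give a feel for how little parsing the Link Header format needs, here is a minimal sketch that reads a Link header value into a list of dictionaries. It assumes every parameter is double-quoted, as in the appendix examples, and it is not a complete parser for the Link header syntax; the function name is hypothetical.

```python
import re
from email.utils import parsedate_to_datetime

# One link: <uri> followed by zero or more ; name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_links(value):
    """Return one dict per link, with 'uri' plus its parameters."""
    links = []
    for uri, params in LINK_RE.findall(value):
        attrs = dict(PARAM_RE.findall(params))
        if "datetime" in attrs:
            # Convert the HTTP date string into a datetime object.
            attrs["datetime"] = parsedate_to_datetime(attrs["datetime"])
        links.append({"uri": uri, **attrs})
    return links


header = (
    '<http://cnn.com/>; rel="original", '
    '<http://web.archive.org/web/20010911203610/http://www.cnn.com>; '
    'rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT"'
)
links = parse_links(header)
```

The commas inside the quoted datetime values are the main trap, which is why the sketch matches quoted parameters explicitly instead of splitting the header on commas.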

Use Case: Aggregator using TimeMaps




[Figure: Link headers lead from the Original Resource to the TimeGate; conneg with the TimeGate leads to Mementos.]




Metadata Discussion Points

1.  What metadata is necessary to determine the most appropriate copy?

       •    Distance to requested time most important
       •    Quality of representation?
       •    Usage statistics for Original Resource? For Memento?
       •    User tagging of Memento for quality?
       •    Archive response speed?
       •    Need to know more information from user preferences?


2.  What other metadata is useful and available?

       •    Crawling archives have limited information
       •    Content management systems (CMSs) have much more
       •    User tags, comments, annotations
       •    Semantic information about content, e.g. title, author, subject
       •    Distribution of changes over time




Metadata Discussion Points

3.  What metadata is necessary for inter-archive synchronization?

       •    Deduplication information: digests, request headers
       •    "Significant Change" factors
       •    Crawler settings: respect no-cache, robots.txt etc


4.  What metadata can be generated by other services?

       •    Open World Model: Anyone can say anything about anything
       •    Technical metadata easy (MIX for images, etc)
       •    Time Series Analysis interesting (techtales.org)
       •    Machine Learning based approaches?
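
One way to picture the "digests" bullet under point 3: if archives exposed a content digest per Memento, identical captures could be grouped without comparing bodies. The sketch below is an assumption-laden illustration. The choice of SHA-256, the function name, and the sample URIs are all hypothetical and are not something the slides specify.

```python
import hashlib
from collections import defaultdict


def group_by_digest(captures):
    """Group capture URIs whose bodies are byte-for-byte identical.

    `captures` is an iterable of (uri, body_bytes) pairs. Returns only
    the groups that contain more than one URI.
    """
    groups = defaultdict(list)
    for uri, body in captures:
        groups[hashlib.sha256(body).hexdigest()].append(uri)
    return [uris for uris in groups.values() if len(uris) > 1]


captures = [
    ("http://archive-a.example/1", b"<html>same page</html>"),
    ("http://archive-b.example/7", b"<html>same page</html>"),
    ("http://archive-b.example/8", b"<html>changed page</html>"),
]
duplicates = group_by_digest(captures)
```

Exact digests only catch identical bytes; the "Significant Change" bullet points at the harder problem of captures that differ trivially.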




Thank You 

Rob Sanderson:
    •  azaroth42@gmail.com
    •  rsanderson@lanl.gov

This presentation:
    •    http://www.slideshare.net/azaroth42/xxx

Memento:
   •   http://www.mementoweb.org/
   •   http://groups.google.com/group/memento-dev

MementoFox:
   •   https://addons.mozilla.com/en-US/firefox/addon/100298
         aka: http://bit.ly/memfox



           Memento Enables Navigating the Past Web

Discussion Questions

1.  What metadata is necessary to determine the most appropriate copy?


2.  What other metadata is useful and available?


3.  What metadata is necessary for inter-archive synchronization?


4.  What metadata can be generated by other services?




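Appendix: Memento HTTP Flow

R = Original Resource, G = TimeGate, B = TimeBundle, M = Memento(s)

1. Client to R:  HEAD R, (Accept-Datetime)
2. R to client:  200, Link to G
3. Client to G:  GET G, Accept-Datetime
4. G to client:  302 to M, Vary, TCN, Link to R, B, M
5. Client to M:  GET M, (Accept-Datetime)
6. M to client:  200, Content-Datetime, Link to R, B, M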

Memento HTTP Flow: HEAD R, (Accept-Datetime)


HEAD http://cnn.com/ HTTP/1.1
Host: cnn.com
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close

Memento HTTP Flow: Success, Link to G (response from URI-R)

HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Link: <http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"
Content-Length: 255
Connection: close
Content-Type: text/html; charset=iso-8859-1

Memento HTTP Flow: GET G, Accept-Datetime


GET http://web.archive.org/web/timegate/http://cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close

Memento HTTP Flow: 302 to M, Vary, Link to R, B, M

HTTP/1.1 302 Found
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
TCN: choice
Vary: negotiate, accept-datetime
Location: http://web.archive.org/web/20010911203610/http://www.cnn.com
Link: <http://cnn.com/>; rel="original",
  <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
  <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento";
  datetime="Fri, 15 Sep 2000 11:28:26 GMT",
  <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento";
  datetime="Tue, 08 Jul 2008 09:34:33 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento";
  datetime="Tue, 11 Sep 2001 20:30:51 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento";
  datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Content-Length: 0
Connection: close
Content-Type: text/plain; charset=UTF-8

Memento HTTP Flow: GET M, Accept-Datetime

GET http://web.archive.org/web/20010911203610/http://www.cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close

Memento HTTP Flow: 200, Content-Datetime, Link to R, B, M

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Archive-Orig-Accept-Ranges: bytes
…
Content-Type: text/html;charset=utf-8
Content-Length: 23364
Date: Thu, 21 Jan 2010 00:09:40 GMT
Content-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
Link: <http://cnn.com/>; rel="original",
  <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
  <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento";
  datetime="Fri, 15 Sep 2000 11:28:26 GMT",
  <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento";
  datetime="Tue, 08 Jul 2008 09:34:33 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento";
  datetime="Tue, 11 Sep 2001 20:30:51 GMT",
  <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento";
  datetime="Tue, 11 Sep 2001 20:47:33 GMT"
Connection: close
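
The three request/response pairs above can be strung together on the client side roughly as follows. This is a sketch under stated assumptions, not a reference client: `fetch` is a hypothetical callable standing in for a real HTTP library, and `canned_fetch` replays header values copied from the examples in this appendix so the flow can be followed without network access.

```python
import re


def find_rel(link_header, rel):
    """Return the first URI in a Link header value carrying the given rel, or None."""
    for uri, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        if re.search(r'rel="%s"' % re.escape(rel), params):
            return uri
    return None


def resolve_memento(fetch, original_uri, accept_datetime):
    """Follow HEAD R, GET G, GET M; `fetch(method, uri, headers)` returns (status, headers)."""
    headers = {"Accept-Datetime": accept_datetime}

    # 1. HEAD R: the Original Resource advertises its TimeGate in a Link header.
    status, response = fetch("HEAD", original_uri, headers)
    timegate = find_rel(response.get("Link", ""), "timegate")
    if timegate is None:
        return None

    # 2. GET G: the TimeGate negotiates on datetime and redirects to a Memento.
    status, response = fetch("GET", timegate, headers)
    if status != 302:
        return None
    memento = response["Location"]

    # 3. GET M: the Memento reports the datetime of its content.
    status, response = fetch("GET", memento, headers)
    return memento, response.get("Content-Datetime")


# Canned responses taken from the examples above (headers trimmed).
CANNED = {
    ("HEAD", "http://cnn.com/"): (
        200, {"Link": '<http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"'}),
    ("GET", "http://web.archive.org/web/timegate/http://cnn.com"): (
        302, {"Location": "http://web.archive.org/web/20010911203610/http://www.cnn.com"}),
    ("GET", "http://web.archive.org/web/20010911203610/http://www.cnn.com"): (
        200, {"Content-Datetime": "Tue, 11 Sep 2001 20:36:10 GMT"}),
}


def canned_fetch(method, uri, headers):
    return CANNED[(method, uri)]


result = resolve_memento(canned_fetch, "http://cnn.com/", "Tue, 11 Sep 2001 20:35:00 GMT")
```

Passing `fetch` in as a parameter keeps the flow logic separate from the transport, so the same function could be driven by any HTTP client that returns a status code and a header mapping.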

TimeMaps: Metadata for Memento

  • 1.
    TimeMaps: Metadata forMemento Herbert Van de Sompel Robert Sanderson Michael L. Nelson Lyudmila Balakireva Scott Ainsworth Harihar Shankar http://www.mementoweb.org/ Memento is partially funded by the Library of Congress TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 2.
    Memento wants tomake Navigating the Web’s Past Easy •  Problem Statement •  Memento Solution •  Navigation not Search •  API for Web Archives •  Memento Ontology for TimeMaps http://www.mementoweb.org/ http://groups.google.com/group/memento-dev TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 3.
    Web Resources haveDifferent Representations over Time TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 4.
    Thankfully Archived RepresentationsExist TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 5.
    3 Issues withCurrent Access to Archives 1.  Access is via a new URI, unknown to the user. 2.  People do not like to search for archived resources, and there is no automated method 3.  Navigation in the past is inconsistent: 1.  Stuck in single, necessarily incomplete archive 2.  Or if not rewritten, URIs lead back to the present Comment on Popular Science article: http://bit.ly/bWr5gP TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 6.
    1. Representations Archivedat a Different URI Sep 11 2001, 20:36:10 UTC Dec 20 2001, 4:51:00 UTC http://en.wikipedia.org/w/index.php? http://web.archive.org/web/20010911203610/http:// title=September_11_attacks&oldid=282333 archived www.cnn.com/ archived resource for http://cnn.com resource for http://en.wikipedia.org/wiki/ September_11_attacks TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 7.
    2. Searching isCumbersome http://web.archive.org/web/*/http://cnn.com/ http://en.wikipedia.org/w/index.php? title=September_11_attacks&action=history TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 8.
    3. Inconsistent Navigation(Archives Incomplete) SPACE Sep 11 2001, 20:36:10 UTC Sep 11 2001, 21:38:55 UTC http://web.archive.org/web/20010911203610/http:// http://web.archive.org/web/20010911213855/ www.cnn.com/ archived resource for http://cnn.com www.cnn.com/TECH/space/ TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 9.
    3. Inconsistent Navigation(Can't Stay in Past) Pentagon Dec 20 2001, 4:51:00 UTC current http://en.wikipedia.org/w/index.php? title=September_11_attacks&oldid=282333 archived http://en.wikipedia.org/wiki/The_Pentagon resource for http://en.wikipedia.org/wiki/ September_11_attacks3 TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 10.
    Past and CurrentWeb are Not Integrated TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 11.
    The Web withouta Time Dimension Need to use a different URI to access archived versions of a resource and its current version TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 12.
    The Web withTime Dimension added by Memento Memento uses URI of the current version to access archived versions, but qualify it with datetime, and magically arrive at the correct location. TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 13.
    The Memento Solution Thereare two components to the Memento Solution: •  Component 1: Navigation to an archived resource via its original resource, by leveraging content negotiation. •  Component 2: A discovery API for archives that enables retrieving a list of all archived versions of a resource for a given URI. TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 14.
    Content Negotiation inTime •  Many systems support content negotiation for file format o  Your client by default asks for HTML and gets HTML o  But it could get PDF via the same URI •  Memento proposes a new dimension for content negotiation: Time o  Your client by default asks for the current time, and gets it o  But it could get an older version via the same URI •  Can be accomplished with only one new HTTP header in each direction: o  Accept-Datetime Request for a particular timestamp o  Content-Datetime The returned content’s timestamp o  These exactly mirror existing headers for Format, Language, etc. TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 15.
    Apr 10 2001,21:39:30 UTC current Aug 15 2004, 08:45:27 UTC Aug 15 2007, 19:21:58 UTC www.cnn.com web.archive.org TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 16.
    Original Mementos Resource Apr 10 2001, 21:39:30 UTC current Aug 15 2004, 08:45:27 UTC Aug 15 2007, 19:21:58 UTC www.cnn.com web.archive.org TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 17.
    Original ? Mementos Resource Apr 10 2001, 21:39:30 UTC current Aug 15 2004, 08:45:27 UTC Aug 15 2007, 19:21:58 UTC www.cnn.com web.archive.org TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 18.
    Original TimeGate Mementos Resource Apr 10 2001, 21:39:30 UTC current Aug 15 2004, 08:45:27 UTC Aug 15 2007, 19:21:58 UTC www.cnn.com web.archive.org TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 19.
    Conneg with TimeGateto Mementos Original TimeGate Mementos Resource Apr 10 2001, 21:39:30 UTC current Aug 15 2004, 08:45:27 UTC Aug 15 2007, 19:21:58 UTC www.cnn.com web.archive.org TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 20.
    Link Headers Conneg with TimeGate to Mementos Original TimeGate Mementos Resource Apr 10 2001, 21:39:30 UTC current Aug 15 2004, 08:45:27 UTC Aug 15 2007, 19:21:58 UTC www.cnn.com web.archive.org TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 21.
    Link Headers Conneg with TimeGate to Mementos Original TimeGate Mementos Resource wikipedia.org TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 22.
    The Web withTime Dimension added by Memento TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 23.
    The Memento Solution •  Component 2: A discovery API for archives that allows requesting a list of all archived versions held for a resource with a given URI. TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 24.
    Why an API? •  Mementos for any given resource are distributed across archives. (What? Not just the Internet Archive?!) •  In order to get a correct perspective of available Mementos, different archives need to be consulted. •  Can do by distributed search (slow), or by consulting an aggregator. •  Aggregator and other services need machine readable description of archives' holdings to select appropriate Memento for request •  Closest in time •  Most reliable representation •  Fastest responding •  (etc) TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 25.
    WebCitation 13 May 2009 12:28:39 TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 26.
    WebCitation 13 May 2009 12:28:39 Archive-It 14 May 2009 01:18:11 TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 27.
    WebCitation 13 May 2009 12:28:39 Archive-It 14 May 2009 01:18:11 BL Archive 14 May 2009 07:12:45 TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 28.
    WebCitation 13 May 2009 12:28:39 Archive-It 14 May 2009 01:18:11 BL Archive 14 May 2009 07:12:45 Dracos 14 May 2009 13:00:00 TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 29.
    WebCitation 13 May 2009 12:28:39 Archive-It 14 May 2009 01:18:11 BL Archive 14 May 2009 07:12:45 Dracos 14 May 2009 13:00:00 TNA 14 May 2009 18:21:32 And no Internet Archive… TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 30.
    TimeMaps •  At mostbasic: List of URIs of Mementos and their times •  Expressed as Linked Data; a profile of OAI ORE Resource Maps •  Link header from TimeGate and Memento TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 31.
    Basic ORE Model Aggregation(Aggr) is a set of web resources (R-1 to R-3), described in RDF or Atom by a Resource Map (ReM). TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 32.
    TimeBundles Resources of Interestin Memento: •  Original Resource •  TimeGate •  Mementos TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 33.
    TimeGates •  Period(s) thatthe TimeGate covers •  Which resource is it a TimeGate for •  mem:TimeSpan as can cover multiple distinct periods TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 34.
    Mementos •  Time Period:valid for or observed over, number of observations •  Metadata: size, format, etc (will come back to the "etc") •  Which resource it is a Memento for TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 35.
    Serializations •  RDF/XML •  Good for XML parsers •  Turtle, N3 and related •  Good for graph parsers •  RDFa •  Good for web browsers •  Atom •  Good for alerting, feed readers etc (but still embeds RDF) •  New: Link Header format •  Good for real-time applications •  Smaller file size (just the facts, ma'am) •  Easy to implement with existing link header parsers •  Servers need to produce format anyway, so non-rdf way out TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 36.
    Use Case: Aggregatorusing TimeMaps TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 37.
    Link Headers Conneg with TimeGate to Mementos Original TimeGate Mementos Resource TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 38.
    Metadata Discussion Points 1. What metadata is necessary to determine the most appropriate copy? •  Distance to requested time most important •  Quality of representation? •  Usage statistics for Original Resource? For Memento? •  User tagging of Memento for quality? •  Archive response speed? •  Need to know more information from user preferences? 2.  What other metadata is useful and available? •  Crawling archives have limited information •  CMS systems have much more •  User tags, comments, annotations •  Semantic information about content, eg title, author, subject •  Distribution of changes over time TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 39.
    Metadata Discussion Points 3. What metadata is necessary for inter-archive synchronization? •  Deduplication information: digests, request headers •  "Significant Change" factors •  Crawler settings: respect no-cache, robots.txt etc 4.  What metadata can be generated by other services? •  Open World Model: Anyone can say anything about anything •  Technical metadata easy (MIX for images, etc) •  Time Series Analysis interesting (techtales.org) •  Machine Learning based approaches? TimeMaps: Metadata for Memento GSLIS Metadata Group, UIUC, 14th July 2010
  • 40.
    Thank You
    Rob Sanderson:
      •  azaroth42@gmail.com
      •  rsanderson@lanl.gov
    This presentation:
      •  http://www.slideshare.net/azaroth42/xxx
    Memento:
      •  http://www.mementoweb.org/
      •  http://groups.google.com/group/memento-dev
    MementoFox:
      •  https://addons.mozilla.com/en-US/firefox/addon/100298 aka: http://bit.ly/memfox
    Memento Enables Navigating the Past Web
  • 41.
    Discussion Questions
    1.  What metadata is necessary to determine the most appropriate copy?
    2.  What other metadata is useful and available?
    3.  What metadata is necessary for inter-archive synchronization?
    4.  What metadata can be generated by other services?
  • 42.
    Appendix: Memento HTTP Flow
    (R = Original Resource, G = TimeGate, M = Memento, B = TimeBundle)
    1.  HEAD R, (Accept-Datetime)   →  Link: G
    2.  GET G, Accept-Datetime      →  302 to M, Vary, TCN, Link: R, B, M
    3.  GET M, (Accept-Datetime)    →  200, Content-Datetime, Link: R, B, M
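The three request/response steps can be sketched as one client function. This is an illustration only: `memento_flow`, `find_rel` and the injected `send` callable are hypothetical names, and `fake_send` replays canned responses copied from the appendix slides rather than contacting any server.

```python
import re


def find_rel(link_header, rel):
    """Return the first URI in a Link header value that carries relation `rel`."""
    for uri, rels in re.findall(r'<([^>]*)>\s*;\s*rel="([^"]*)"', link_header):
        if rel in rels.split():
            return uri
    return None


def memento_flow(uri_r, accept_datetime, send):
    """Walk R -> G -> M. `send(method, uri, headers)` returns (status, headers)."""
    request_headers = {"Accept-Datetime": accept_datetime}

    # Step 1: HEAD R; the response's Link header points to the TimeGate G.
    _, headers = send("HEAD", uri_r, request_headers)
    uri_g = find_rel(headers["Link"], "timegate")

    # Step 2: GET G; the 302 response's Location names the Memento M.
    status, headers = send("GET", uri_g, request_headers)
    if status != 302:
        raise RuntimeError(f"expected 302 from TimeGate, got {status}")
    uri_m = headers["Location"]

    # Step 3: GET M; the 200 response reports when M was archived.
    _, headers = send("GET", uri_m, request_headers)
    return uri_m, headers["Content-Datetime"]


# Canned responses standing in for the network (values from the slides).
CANNED = {
    ("HEAD", "http://cnn.com/"): (200, {
        "Link": '<http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"',
    }),
    ("GET", "http://web.archive.org/web/timegate/http://cnn.com"): (302, {
        "Location": "http://web.archive.org/web/20010911203610/http://www.cnn.com",
    }),
    ("GET", "http://web.archive.org/web/20010911203610/http://www.cnn.com"): (200, {
        "Content-Datetime": "Tue, 11 Sep 2001 20:36:10 GMT",
    }),
}


def fake_send(method, uri, headers):
    return CANNED[(method, uri)]


result = memento_flow("http://cnn.com/", "Tue, 11 Sep 2001 20:35:00 GMT", fake_send)
print(result)
```

Passing `send` in as a parameter keeps the flow logic separate from the HTTP library, so a real client could substitute its own transport.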
  • 44.
    Memento HTTP Flow: URI-R
    HEAD R, (Accept-Datetime)
        HEAD http://cnn.com/ HTTP/1.1
        Host: cnn.com
        Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
        Connection: close
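A small sketch of how a client might assemble this request, assuming the absolute-URI request line shown on the slide. The function name `head_request` is hypothetical; `format_datetime(..., usegmt=True)` from the Python standard library produces the date format used in the `Accept-Datetime` header.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlsplit


def head_request(uri_r, when):
    """Build the raw HEAD request for the Original Resource at datetime `when`."""
    lines = [
        f"HEAD {uri_r} HTTP/1.1",
        f"Host: {urlsplit(uri_r).netloc}",
        f"Accept-Datetime: {format_datetime(when, usegmt=True)}",
        "Connection: close",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines)


request = head_request(
    "http://cnn.com/",
    datetime(2001, 9, 11, 20, 35, 0, tzinfo=timezone.utc),
)
print(request)
```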
  • 46.
    Memento HTTP Flow: URI-R
    Success – Link: G
        HTTP/1.1 200 OK
        Date: Thu, 21 Jan 2010 00:02:12 GMT
        Server: Apache
        Link: <http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"
        Content-Length: 255
        Connection: close
        Content-Type: text/html; charset=iso-8859-1
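To show how a client could pick the TimeGate out of this response, here is a hedged sketch that parses the header block with Python's standard `email` parser and searches the `Link` value for `rel="timegate"`. The function name `timegate_uri` is hypothetical, and the regular expression only handles the simple single-relation form shown on the slide.

```python
import re
from email import message_from_string

# Header block copied from the slide (status line omitted).
RESPONSE_HEADERS = (
    "Date: Thu, 21 Jan 2010 00:02:12 GMT\n"
    "Server: Apache\n"
    'Link: <http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"\n'
    "Content-Length: 255\n"
    "Connection: close\n"
    "Content-Type: text/html; charset=iso-8859-1\n"
)


def timegate_uri(header_block):
    """Return the URI advertised with rel="timegate", or None if absent."""
    headers = message_from_string(header_block)
    for value in headers.get_all("Link", []):
        match = re.search(r'<([^>]*)>\s*;\s*rel="timegate"', value)
        if match:
            return match.group(1)
    return None


uri_g = timegate_uri(RESPONSE_HEADERS)
print(uri_g)
```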
  • 48.
    Memento HTTP Flow
    GET G, Accept-Datetime
        GET http://web.archive.org/web/timegate/http://cnn.com HTTP/1.1
        Host: web.archive.org
        Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
        Connection: close
  • 50.
    Memento HTTP Flow 302M, Vary, LinkR,B,M HTTP/1.1 302 Found Date: Thu, 21 Jan 2010 00:06:50 GMT Server: Apache TCN: choice Vary: negotiate, accept-datetime Location: http://web.archive.org/web/20010911203610/http://www.cnn.com Link: <http://cnn.com/>; rel="original", <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle”, <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel=“first-memento”; datetime=“Tue, 15 Sep 2000 11:28:26 GMT”, <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel=“last-memento”; datetime="Tue, 08 Jul 2008 09:34:33 GMT”, <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel=“prev-memento”; datetime="Tue, 11 Sep 2001 20:30:51 GMT”, <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel=“next-memento”; datetime="Tue, 11 Sep 2001 20:47:33 GMT” Content-Length: 0 Connection: close Content-Type: text/plain; charset=UTF-8
  • 52.
    Memento HTTP Flow
    GET M, Accept-Datetime
        GET http://web.archive.org/web/20010911203610/http://www.cnn.com HTTP/1.1
        Host: web.archive.org
        Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
        Connection: close
  • 54.
    Memento HTTP Flow
    200, Content-Datetime, Link: R, B, M
        HTTP/1.1 200 OK
        Server: Apache-Coyote/1.1
        X-Archive-Orig-Accept-Ranges: bytes
        …
        Content-Type: text/html;charset=utf-8
        Content-Length: 23364
        Date: Thu, 21 Jan 2010 00:09:40 GMT
        Content-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
        Link: <http://cnn.com/>; rel="original",
          <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle",
          <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel="first-memento"; datetime="Fri, 15 Sep 2000 11:28:26 GMT",
          <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel="last-memento"; datetime="Tue, 08 Jul 2008 09:34:33 GMT",
          <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="prev-memento"; datetime="Tue, 11 Sep 2001 20:30:51 GMT",
          <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel="next-memento"; datetime="Tue, 11 Sep 2001 20:47:33 GMT"
        Connection: close
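As a closing check, a client can compare the `Content-Datetime` it received with the `Accept-Datetime` it asked for. The sketch below does this with the two values from the slides; the function name `offset_seconds` is hypothetical.

```python
from email.utils import parsedate_to_datetime


def offset_seconds(accept_datetime, content_datetime):
    """Seconds by which the returned Memento is later than the requested time."""
    requested = parsedate_to_datetime(accept_datetime)
    archived = parsedate_to_datetime(content_datetime)
    return (archived - requested).total_seconds()


offset = offset_seconds(
    "Tue, 11 Sep 2001 20:35:00 GMT",  # Accept-Datetime sent by the client
    "Tue, 11 Sep 2001 20:36:10 GMT",  # Content-Datetime returned with the Memento
)
print(offset)  # 70.0
```

A negative result would mean the Memento was archived before the requested time.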