Memento: Big Leaps Towards Seamless Navigation of the Web of the Past

Memento
                                http://mementoweb.org/


                                 Herbert Van de Sompel
                                     Robert Sanderson
                                     Michael L. Nelson


Big Leaps Towards Seamless Navigation
        of the Web of the Past

                    Memento Update
        CNI Task Force Meeting, Spring 2011
Overview of Memento Framework

Deployment Progress

Memento and Data

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies


Overview of Memento Framework

Memento wants to make it easy

to access the Web of the Past.




[Figure: Tate Online today; select date March 16 2008; Tate Online as of March 16 2008, from the National Archives]


Memento achieves this by introducing

a uniform version access capability to

 integrate the present and past Web.




Content Management Systems:

                     •  Designed to be aware of all
                        versions of a resource;

                     •  Self-contained;

                     •  Variety of proprietary version
                        mechanisms;

                     •  Versions interlinked using
                        proprietary mechanisms.



World Wide Web:

                     •  Designed to forget about prior
                        versions of a resource;

                     •  Distributed.




There are resource versions on
                       the Web:

                     •  Content Management
                        Systems;

                     •  Web Archives;

                     •  Transactional archives;

                     •  Search engine caches.



But the Web architecture has a
                        hard time dealing with them:

                      •  Cannot talk about a resource
                         as it used to exist;

                      •  Cannot access a prior version
                         knowing the current one;

                      •  Cannot access the current
                         version knowing a prior one;

                      Current approaches are ad hoc
                        and localized.


Memento:

                     •  Regards the Web as a big
                        Content Management System;

                     •  Introduces a uniform
                        capability to access versions
                        on the Web;

                     •  Does not build new archives
                        but leverages all systems that
                        host versions: Web archives,
                        Content Management
                        Systems, Software Version
                        Systems, etc.

Memento’s version access
                        approach:

                      •  Is distributed: versions may
                         exist on several servers;

                      •  Uses time as a global version
                         indicator;

                      •  Is based on the primitives of
                         the Web: resource, resource
                         state, representation, content
                         negotiation, link.
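
The approach above can be illustrated with a minimal Python sketch of negotiation in the datetime dimension: a client names a desired datetime, and a TimeGate-like function picks a version using time as the only version indicator. The `Accept-Datetime` header name comes from the Memento Internet Draft; the function names, the URIs, and the "closest in time" selection rule are assumptions made for illustration only.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when):
    """Format a datetime as an HTTP-date for an Accept-Datetime request header."""
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc), usegmt=True)}


def select_memento(mementos, when):
    """Pick the (uri, datetime) pair whose datetime is closest to `when`.

    `mementos` may mix versions hosted on several servers. Choosing the
    closest version is an assumption: a real TimeGate may apply its own policy.
    """
    return min(mementos, key=lambda m: abs((m[1] - when).total_seconds()))


# Hypothetical versions of one Original Resource, spread over two archives.
versions = [
    ("http://archive-a.example/2007/page", datetime(2007, 5, 1, tzinfo=timezone.utc)),
    ("http://archive-b.example/2008/page", datetime(2008, 3, 20, tzinfo=timezone.utc)),
    ("http://archive-a.example/2010/page", datetime(2010, 1, 15, tzinfo=timezone.utc)),
]
wanted = datetime(2008, 3, 16, tzinfo=timezone.utc)
headers = accept_datetime_header(wanted)
chosen_uri, chosen_datetime = select_memento(versions, wanted)
```

Because the versions are identified only by URI and datetime, the selection works the same whether they live on one server or on several.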



Original Resource and Versions




Bridge from Present to Past




Bridge from Past to Present




Memento Framework




Multiple Archives




Memento Client-Server Interaction




Deployment Progress

Significant progress has been made towards

seamless navigation of the Web of the Past.




Standardization:

                                  •  Standardization process started
                                     via the IETF;

                                  •  Interest from IETF and W3C;

                                  •  Encouraged by major Web
                                     architects, including: Tim
                                     Berners-Lee, Mark Nottingham,
                                     Michael Hausenblas.


https://datatracker.ietf.org/doc/draft-vandesompel-memento/

Memento Clients:

                      •  Several client tools developed
                         by us and others;

                      •  Add-ons for Firefox
                         (operational) and Internet
                         Explorer (experimental);

                      •  Applications for Android
                         (operational) and iPhone/iPad
                         (in development);

                      •  Paper in next issue of Code4Lib
                         Journal.

http://www.mementoweb.org/tools/

Memento server support (1):

                      •  Memento-compliant Wayback
                         software:

                           •  Used by Internet Archive.

                           •  Available to Web archives,
                              worldwide.

                           •  Please have your favorite
                              Web Archive install this new
                              version 1.6!


http://www.mementoweb.org/tools/

Memento server support (2):

                      •  Plug-in for MediaWiki
                         (operational);

                           •  Used on W3C’s main wiki.

                      •  Please install it for your
                         MediaWiki!




http://www.mementoweb.org/tools/

Memento Server Validator

                      •  Server side client:

                           •  Attempts to perform all
                              Memento actions against a
                              given URI

                           •  Reports success/failure of
                              the interactions and
                              warnings for optional
                              aspects

                           •  Kept up to date with IETF
                              Internet Draft

http://www.mementoweb.org/tools/

Memento Proxy Support

                      •  Several systems that host
                         Mementos made Memento-
                         compliant “by proxy”:

                           •  All major Web Archives that
                              do not yet run Memento-
                              compliant Wayback software

                           •  3,000+ MediaWiki systems,
                              including Wikipedia

                      •  We want all of these to become
                         natively Memento compliant!


Memento Website:

                      •  Ongoing effort to add
                         materials that support
                         understanding and adoption:
                          •  Introduction to Memento
                          •  How to recognize
                             Mementos, TimeGates,
                             Original Resources?
                          •  Guidelines for servers that
                             host Mementos (Web
                             Archives, CMS, snapshot
                             archives, etc.)
http://www.mementoweb.org/guide/

Funding:

                      •  2007-2010: US $250K grant
                         from Library of Congress;
                          •  Approx. US $50K spent on Memento.

                      •  2010-2011: US $1 Million
                         follow-up grant from Library of
                         Congress.

                           •  For: Specification, outreach,
                              tool development, further
                              research.



Memento and Data

Memento Time Travel is really powerful.

Time-Series Data via HTTP follow-your-nose.




Memento Framework




Memento Framework & Time Series


Original Resource: http://dbpedia.org/resource/France




Time Travel across DBpedia Versions




 Data collected through HTTP Navigation

   paper at http://arxiv.org/abs/1003.3661
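
Collecting a time series through HTTP navigation can be sketched as a walk along the links between successive Mementos. In the Python sketch below an in-memory dictionary stands in for HTTP responses. The `prev-memento` relation type is taken from the Memento Internet Draft, while the function name, the URIs, and the values are invented placeholders, not data from DBpedia.

```python
def collect_time_series(start_uri, fetch):
    """Walk backwards from `start_uri` by following "prev-memento" links.

    `fetch` stands in for an HTTP GET: it returns a dict holding the
    Memento's datetime, the observed value, and its outgoing links.
    """
    series = []
    seen = set()
    uri = start_uri
    while uri is not None and uri not in seen:   # guard against link cycles
        seen.add(uri)
        response = fetch(uri)
        series.append((response["datetime"], response["value"]))
        uri = response["links"].get("prev-memento")
    return sorted(series)                        # oldest observation first


# Invented stand-in data for three Mementos of one Original Resource.
FAKE_ARCHIVE = {
    "http://archive.example/2010/France": {
        "datetime": "2010-01-01", "value": 3,
        "links": {"prev-memento": "http://archive.example/2009/France"},
    },
    "http://archive.example/2009/France": {
        "datetime": "2009-01-01", "value": 2,
        "links": {"prev-memento": "http://archive.example/2008/France"},
    },
    "http://archive.example/2008/France": {
        "datetime": "2008-01-01", "value": 1,
        "links": {},
    },
}
series = collect_time_series("http://archive.example/2010/France", FAKE_ARCHIVE.__getitem__)
```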

Memento and Discovery

Very few Web sites provide a “timegate” link.

Need additional mechanisms to support Discovery.




Batch discovery of Mementos: TimeMaps




                       A TimeMap minimally lists:

•  URI and datetime of Mementos known to an archive
•  URI of Original Resource

    TimeMaps can be aggregated across systems that host Mementos
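
A minimal Python sketch of the TimeMap described above, and of aggregation across systems, might look as follows. The class name, field names, and URIs are assumptions for illustration; the Memento work defines TimeMap serializations that are not reproduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TimeMap:
    """Minimal TimeMap: the Original Resource URI plus (datetime, Memento URI) pairs."""
    original: str
    mementos: list = field(default_factory=list)


def aggregate(timemaps):
    """Merge TimeMaps for the same Original Resource obtained from several systems."""
    originals = {tm.original for tm in timemaps}
    if len(originals) != 1:
        raise ValueError("TimeMaps must describe the same Original Resource")
    # A set drops entries reported twice; sorting orders them by datetime.
    merged = sorted({entry for tm in timemaps for entry in tm.mementos})
    return TimeMap(original=originals.pop(), mementos=merged)


# Hypothetical TimeMaps from two archives for one Original Resource.
archive_a = TimeMap("http://example.org/page", [
    (datetime(2008, 3, 16), "http://archive-a.example/20080316/page"),
])
archive_b = TimeMap("http://example.org/page", [
    (datetime(2007, 1, 5), "http://archive-b.example/2007/page"),
    (datetime(2009, 7, 1), "http://archive-b.example/2009/page"),
])
combined = aggregate([archive_a, archive_b])
```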

Batch discovery of Mementos: Feed of TimeMaps

•  A system that hosts Mementos exposes a feed (e.g. Atom) of
TimeMaps so that applications can stay in sync with its
evolving Memento collection:

   •  One Atom entry per Original Resource for which the
   system hosts Mementos;
   •  The entry provides a “timemap” link to a
   TimeMap for the Original Resource;
   •  The datetime value of the entry's updated field
   changes when an additional Memento for the Original Resource
   becomes available (i.e. the TimeMap changes);
   •  The ID of the entry is a tag URI based on the URI of the
   Original Resource.
                    Will be proposed to IIPC
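
One feed entry of the kind listed above could be built as in the Python sketch below. The exact layout of the tag URI, the use of the Original Resource URI as the entry title, and all URIs are assumptions; the slide only states that the ID is a tag URI based on the URI of the Original Resource and that the entry carries a "timemap" link.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def timemap_entry(original_uri, timemap_uri, updated, authority, tag_date):
    """Build one Atom entry describing the TimeMap of one Original Resource."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # Assumed tag URI layout: tag:<authority>,<date>:<Original Resource URI>
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"tag:{authority},{tag_date}:{original_uri}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = original_uri
    # `updated` changes whenever the TimeMap gains a Memento.
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"rel": "timemap", "href": timemap_uri})
    return entry


entry = timemap_entry(
    original_uri="http://example.org/page",
    timemap_uri="http://archive.example/timemap/http://example.org/page",
    updated="2011-04-04T12:00:00Z",
    authority="archive.example",
    tag_date="2011",
)
entry_xml = ET.tostring(entry, encoding="unicode")
```

An application that polls the feed only needs to compare the `updated` value with the one it saw last to decide whether to fetch the TimeMap again.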

Batch discovery of Mementos: robots.txt

•  robots.txt file is used by Web servers to convey
crawling policies;

•  Add a directive to support discovery of Mementos known to
the server:
     •  Pointer to a single Memento can suffice as the robot
     can crawl on from there
     •  Mementos allow for discovery of TimeMaps via HTTP
     links.
     •  e.g. jcdl.org hosts snapshot archives of prior JCDL
     conferences and adds the following to its robots.txt:

   Memento: jcdl.org/archive/2002/index.html
               Will be promoted via Internet Draft
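
A crawler could pick up the proposed directive with a few lines of Python, as sketched below. The `Memento:` line is the example from the slide; the function name, the other lines of the sample file, and the parsing rules (case-insensitive field names, `#` starting a comment) are assumptions, since the directive is only a proposal here.

```python
def discovery_directives(robots_txt, names=("memento",)):
    """Collect the values of the proposed discovery directives from robots.txt text."""
    found = {name: [] for name in names}
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments and whitespace
        if ":" not in line:
            continue
        field_name, value = line.split(":", 1)    # split on the first colon only
        field_name = field_name.strip().lower()
        if field_name in found and value.strip():
            found[field_name].append(value.strip())
    return found


# Sample file: only the Memento line comes from the slide.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Memento: jcdl.org/archive/2002/index.html
"""
directives = discovery_directives(SAMPLE_ROBOTS)
```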

Batch discovery of TimeGates: robots.txt

•  robots.txt file is used by Web servers to convey
crawling policies;

•  Add a directive to support discovery of TimeGates known
to the server:
     •  TimeGates can be on the server itself or on an external server
     •  Value for the directive is typically a regular expression
     •  e.g. example.org could point at TimeGates in its
     associated transactional archive ta.org via robots.txt:

   TimeGate: ta.org/timegate/http://example.org/*


                Will be promoted via Internet Draft
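
How a client might turn such a directive value into a concrete TimeGate URI is sketched below in Python. Two assumptions are made that the slide does not confirm: the `*` is treated as a shell-style wildcard rather than full regular-expression syntax, and the value is read as a TimeGate prefix followed by a pattern for Original Resource URIs. The function name is hypothetical.

```python
from fnmatch import fnmatchcase


def timegate_for(directive_value, original_uri):
    """Return the TimeGate URI for `original_uri`, or None if the pattern does not cover it."""
    # Assumption: the Original Resource pattern starts at the last embedded scheme.
    split_at = max(directive_value.rfind("http://"), directive_value.rfind("https://"))
    if split_at <= 0:
        return None                      # no prefix or no embedded URI pattern
    prefix = directive_value[:split_at]
    uri_pattern = directive_value[split_at:]
    if fnmatchcase(original_uri, uri_pattern):
        return prefix + original_uri
    return None


value = "ta.org/timegate/http://example.org/*"
covered = timegate_for(value, "http://example.org/news/index.html")
not_covered = timegate_for(value, "http://other.example/index.html")
```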

Discovery of Systems that Host Mementos: Registry/Feed

 •  Registry of collections of Mementos, e.g. of Web Archives,
 Transactional Archives, etc.

 •  Feed of registry records.

 •  A registry record details essential characteristics of a
 Memento collection.
       •  cf. VoID collection description for Linked Data.




                          Will be researched

Memento and Branding

Memento can recreate pages using
     resources from different archives.

This poses a branding challenge for archives.




Current Branding Practice for Web Archives

        Page and embedded resources from same Web Archive




 Branding for page and embedded resources




Branding for Web Archives in Memento Mode

       Page and embedded resources from various Web Archives

Page branding
No branding
No branding


                           Will be researched

Alternative Web Archiving Strategies

Crawl-based Archives host distinct observations.

 Transactional Archives never miss an update.




Crawl-Based Web Archives




                    Observations

For example: Heritrix crawler for Internet Archive

Crawl-Based Web Archives

•  Collect discrete observations of resources, not their entire
evolution.

•  Can be rejected (robots.txt, by user-agent, by host
IP).

•  Can be deceived (cloaking, by geo-location, by user-agent).

•  Coverage of a particular Web server depends on the crawl
strategy.
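
The first point can be made concrete with a small Python sketch: given the times at which a resource changed and the times at which a crawler visited, it computes which versions the crawler actually observed. The function name and the numbers are invented for illustration.

```python
from bisect import bisect_right


def observed_versions(change_times, crawl_times):
    """Return the indexes of the versions a crawler sees.

    `change_times` is a sorted list of times at which a new version went live;
    a crawl at time t observes the latest version whose change time is <= t.
    """
    seen = set()
    for t in crawl_times:
        index = bisect_right(change_times, t) - 1
        if index >= 0:                   # skip crawls before the first version
            seen.add(index)
    return sorted(seen)


# Invented timeline: five versions, two crawls.
changes = [0, 3, 4, 9, 10]
crawls = [5, 12]
seen = observed_versions(changes, crawls)
missed = [i for i in range(len(changes)) if i not in seen]
```

With these numbers the crawler records two of the five versions; the other three never reach the archive.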




Server-Side Transactional Web Archives




                       Change History

For example: TTApache, PageVault, Vignette Web Capture

Server-Side Transactional Web Archives

•  Collect all representations served by to-be-archived server.

•  To-be-archived server needs to cooperate.
     •  Incentives e.g. institutional memory, official record of
     Web presence.

•  Archival coverage restricted by to-be-archived server, does
not include external servers (e.g. embedded resources).

•  To-be-archived server can submit falsified information.

•  Archival collection management: what to keep, what not
(e.g. significant changes, deduplication, …).


Development of Transactional Web Archive Software
Capture:
•  Apache connection filter module (mod_ta) captures URI, headers, body;
•  Module POSTs in real-time to transactional archive’s Submit URI.




Submit:
•  Java-Grizzly-Jersey submission interface application;
•  Berkeley DB metadata store;
•  FS store for body and headers.
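
The split between a metadata store and a store for bodies and headers can be sketched in Python as below. This is not the software described on the slide, which uses Apache, Java, and Berkeley DB: dictionaries stand in for both stores, the class and method names are hypothetical, and skipping unchanged bodies is an assumed collection-management policy.

```python
import hashlib


class TransactionalStore:
    """In-memory stand-in for a metadata store plus a store for bodies and headers."""

    def __init__(self):
        self.metadata = {}   # URI -> list of (datetime, body digest), in arrival order
        self.files = {}      # body digest -> (headers, body)

    def submit(self, uri, when, headers, body):
        """Record one served representation; return False if it was deduplicated."""
        digest = hashlib.sha256(body).hexdigest()
        history = self.metadata.setdefault(uri, [])
        if history and history[-1][1] == digest:
            return False                 # same body as the previous capture
        self.files.setdefault(digest, (headers, body))
        history.append((when, digest))
        return True


store = TransactionalStore()
page = "http://example.org/"
html = {"Content-Type": "text/html"}
first = store.submit(page, "2011-04-04T10:00:00Z", html, b"<p>v1</p>")
repeat = store.submit(page, "2011-04-04T10:05:00Z", html, b"<p>v1</p>")
second = store.submit(page, "2011-04-04T11:00:00Z", html, b"<p>v2</p>")
```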

Development of Transactional Web Archive Software
Access:
•  Transactional archive natively supports Memento;
•  Immediate availability of archived content;
•  Export of WARC, e.g. for long-term archiving in other environment.




Development timeline:
•  Ongoing development (LANL) and testing (ODU);
•  Submit/Access finalized; development focus on collection management.
•  Expected release as open source, 3rd Quarter 2011.

Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE79 views
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue119 views
DRBD Deep Dive - Philipp Reisner - LINBIT by ShapeBlue
DRBD Deep Dive - Philipp Reisner - LINBITDRBD Deep Dive - Philipp Reisner - LINBIT
DRBD Deep Dive - Philipp Reisner - LINBIT
ShapeBlue180 views
Digital Personal Data Protection (DPDP) Practical Approach For CISOs by Priyanka Aash
Digital Personal Data Protection (DPDP) Practical Approach For CISOsDigital Personal Data Protection (DPDP) Practical Approach For CISOs
Digital Personal Data Protection (DPDP) Practical Approach For CISOs
Priyanka Aash158 views
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue by ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
ShapeBlue203 views
NTGapps NTG LowCode Platform by Mustafa Kuğu
NTGapps NTG LowCode Platform NTGapps NTG LowCode Platform
NTGapps NTG LowCode Platform
Mustafa Kuğu423 views
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P... by ShapeBlue
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
Developments to CloudStack’s SDN ecosystem: Integration with VMWare NSX 4 - P...
ShapeBlue194 views
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... by ShapeBlue
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
ShapeBlue166 views
Business Analyst Series 2023 - Week 4 Session 7 by DianaGray10
Business Analyst Series 2023 -  Week 4 Session 7Business Analyst Series 2023 -  Week 4 Session 7
Business Analyst Series 2023 - Week 4 Session 7
DianaGray10139 views
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT by ShapeBlue
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBITUpdates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
ShapeBlue206 views
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue by ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlueCloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
CloudStack Object Storage - An Introduction - Vladimir Petrov - ShapeBlue
ShapeBlue138 views
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit... by ShapeBlue
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
ShapeBlue159 views
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue by ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
ShapeBlue147 views

Memento: Big Leaps Towards Seamless Navigation of the Web of the Past

  • 1. Memento http://mementoweb.org/ Herbert Van de Sompel Robert Sanderson Michael L. Nelson Big Leaps Towards Seamless Navigation of the Web of the Past Memento Update CNI Task Force Meeting, Spring 2011 1
  • 2. Overview of Memento Framework Deployment Progress Memento and Data Memento and Discovery Memento and Branding Alternative Web Archiving Strategies
  • 3. Overview of Memento Framework Progress Memento and Data Memento and Discovery Memento and Branding Alternative Web Archiving Strategies
  • 4. Memento wants to make it easy to access the Web of the Past.
  • 5. Tate Online Select Date Tate Online Today March 16 2008 March 16 2008 From National Archives
  • 6. Memento achieves this by introducing a uniform version access capability to integrate the present and past Web.
  • 7. Content Management Systems: • Designed to be aware of all versions of a resource; • Self-contained; • Variety of proprietary version mechanisms; • Versions interlinked using proprietary mechanisms.
  • 8. World Wide Web: • Designed to forget about prior versions of a resource; • Distributed.
  • 9. There are resource versions on the Web: • Content Management Systems; • Web Archives; • Transactional archives; • Search engine caches.
  • 10. But the Web architecture has a hard time dealing with them: • Cannot talk about a resource as it used to exist; • Cannot access a prior version knowing the current one; • Cannot access the current version knowing a prior one. Current approaches are ad hoc and localized.
  • 11. Memento: • Regards the Web as a big Content Management System; • Introduces a uniform capability to access versions on the Web; • Does not build new archives but leverages all systems that host versions: Web archives, Content Management Systems, Software Version Systems, etc.
  • 12. Memento’s version access approach: • Is distributed: versions may exist on several servers; • Uses time as a global version indicator; • Is based on the primitives of the Web: resource, resource state, representation, content negotiation, link.
  • 13. Original Resource and Versions
  • 14. Bridge from Present to Past
  • 15. Bridge from Past to Present
  • 16. Memento Framework
  • 17. Multiple Archives
  • 18. Memento Client-Server Interaction
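
The client-server interaction rests on content negotiation with time as the version indicator. A minimal Python sketch of the client side follows. It only builds the request and sends nothing. The `Accept-Datetime` header name comes from the Memento specification rather than from these slides, and the TimeGate URI and the function names are hypothetical, chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def accept_datetime_value(when: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    # format_datetime(usegmt=True) only accepts datetimes whose tzinfo is UTC.
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def build_timegate_request(timegate_uri: str, when: datetime) -> Request:
    """Build (but do not send) a datetime-negotiation request for a TimeGate."""
    headers = {"Accept-Datetime": accept_datetime_value(when)}
    return Request(timegate_uri, headers=headers, method="HEAD")


# Hypothetical TimeGate URI; the date mirrors the Tate Online example.
request = build_timegate_request(
    "http://timegate.example.org/http://www.tate.org.uk/",
    datetime(2008, 3, 16, tzinfo=timezone.utc),
)
print(request.get_method(), request.full_url)
print(accept_datetime_value(datetime(2008, 3, 16, tzinfo=timezone.utc)))
```

A server that supports the framework would be expected to answer such a request by pointing at the version closest to the requested datetime; how it does so is outside this sketch.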
  • 19. Overview of Memento Framework Deployment Progress Memento and Data Memento and Discovery Memento and Branding Alternative Web Archiving Strategies
  • 20. Significant progress has been made towards seamless navigation of the Web of the Past.
  • 21. Standardization: • Standardization process started via the IETF; • Interest from IETF and W3C; • Encouraged by major Web architects, including: Tim Berners-Lee, Mark Nottingham, Michael Hausenblas. https://datatracker.ietf.org/doc/draft-vandesompel-memento/
  • 22. Memento Clients: • Several client tools developed by us and others; • Add-ons for Firefox (operational) and Internet Explorer (experimental); • Applications for Android (operational) and iPhone/iPad (in development); • Paper in next issue of Code4Lib Journal. http://www.mementoweb.org/tools/
  • 23. Memento server support (1): • Memento-compliant Wayback software: • Used by Internet Archive. • Available to Web archives, worldwide. • Please have your favorite Web Archive install this new version 1.6! http://www.mementoweb.org/tools/
  • 24. Memento server support (2): • Plug-in for MediaWiki (operational); • Used on W3C’s main wiki. • Please install it for your MediaWiki! http://www.mementoweb.org/tools/
  • 25. Memento Server Validator • Server side client: • Attempts to perform all Memento actions against a given URI • Reports success/failure of the interactions and warnings for optional aspects • Kept up to date with IETF Internet Draft http://www.mementoweb.org/tools/
  • 26. Memento Proxy Support • Several systems that host Mementos made Memento-compliant “by proxy”: • All major Web Archives that do not yet run Memento-compliant Wayback software • 3,000+ MediaWiki systems, including Wikipedia • We want all of these to become natively Memento compliant!
  • 27. Memento Website: • Ongoing effort to add materials that support understanding and adoption: • Introduction to Memento • How to recognize Mementos, TimeGates, Original Resources? • Guidelines for servers that host Mementos (Web Archives, CMS, snapshot archives, etc.) http://www.mementoweb.org/guide/
  • 28. Funding: • 2007-2010: US $250K grant from Library of Congress; • Approx. 50K on Memento. • 2010-2011: US $1 Million follow-up grant from Library of Congress. • For: Specification, outreach, tool development, further research.
  • 29. Overview of Memento Framework Deployment Progress Memento and Data Memento and Discovery Memento and Branding Alternative Web Archiving Strategies
  • 30. Memento Time Travel is really powerful. Time-Series Data via HTTP follow-your-nose.
  • 31. Memento Framework
  • 32. Memento Framework & Time Series Original Resource: http://dbpedia.org/resource/France
  • 33. Time Travel across DBpedia Versions Data collected through HTTP Navigation paper at http://arxiv.org/abs/1003.3661
  • 34. Overview of Memento Framework Deployment Progress Memento and Data Memento and Discovery Memento and Branding Alternative Web Archiving Strategies
  • 35. Very few Web sites provide a “timegate” link. Need additional mechanisms to support Discovery.
  • 36. Batch discovery of Mementos: TimeMaps A TimeMap minimally lists: • URI and datetime of Mementos known to an archive • URI of Original Resource TimeMaps can be aggregated across systems that host Mementos
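
The minimal TimeMap content and its aggregation across systems can be sketched in Python as below. This is an in-memory illustration only: the class names, the `aggregate` function, and all URIs are hypothetical, and no serialization format is implied.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Memento:
    uri: str            # URI of the archived version
    captured: datetime  # datetime associated with that version


@dataclass
class TimeMap:
    original: str                                 # URI of the Original Resource
    mementos: list = field(default_factory=list)  # Mementos known to one system


def aggregate(timemaps):
    """Merge TimeMaps for the same Original Resource into one, ordered by time."""
    if not timemaps:
        raise ValueError("need at least one TimeMap")
    original = timemaps[0].original
    if any(tm.original != original for tm in timemaps):
        raise ValueError("all TimeMaps must describe the same Original Resource")
    # Key by Memento URI so a Memento listed by two systems appears only once.
    merged = {m.uri: m for tm in timemaps for m in tm.mementos}
    return TimeMap(original, sorted(merged.values(), key=lambda m: m.captured))


# Hypothetical archives and URIs.
archive_a = TimeMap("http://example.org/", [
    Memento("http://archive-a.example/2009/http://example.org/",
            datetime(2009, 5, 1, tzinfo=timezone.utc)),
])
archive_b = TimeMap("http://example.org/", [
    Memento("http://archive-b.example/2008/http://example.org/",
            datetime(2008, 3, 16, tzinfo=timezone.utc)),
])
combined = aggregate([archive_a, archive_b])
print([m.uri for m in combined.mementos])
```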
  • 37. Batch discovery of Mementos: Feed of TimeMaps • A system that hosts Mementos exposes a Feed (e.g. Atom) of TimeMaps to allow applications to remain in sync with its evolving Memento collection: • One Atom entry per Original Resource for which the system hosts Mementos; • The entry provides a “timemap” link to a TimeMap for the Original Resource; • The datetime value of the updated field of the entry changes when an additional Memento for the Original Resource becomes available (i.e. the TimeMap changes); • The ID of the entry is a tag URI based on the URI of the Original Resource. Will be proposed to IIPC.
  • 38. Batch discovery of Mementos: robots.txt • robots.txt file is used by Web servers to convey crawling policies; • Add a directive to support discovery of Mementos known to the server: • Pointer to a single Memento can suffice as the robot can crawl on from there • Mementos allow for discovery of TimeMaps via HTTP links. • e.g. jcdl.org hosts snapshot archives of prior JCDL conferences and adds the following to its robots.txt: Memento: jcdl.org/archive/2002/index.html Will be promoted via Internet Draft.
  • 39. Batch discovery of TimeGates: robots.txt • robots.txt file is used by Web servers to convey crawling policies; • Add a directive to support discovery of TimeGates known to the server: • TimeGates can be on the server itself or on an external server • Value for the directive is typically a regular expression • e.g. example.org could point at TimeGates in its associated transactional archive ta.org via robots.txt: TimeGate: ta.org/timegate/http://example.org/* Will be promoted via Internet Draft.
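
The two proposed robots.txt directives could be read by a crawler along the following lines. This Python sketch assumes the `Memento:` and `TimeGate:` directive names exactly as shown in the slide examples; the function name and the sample file are hypothetical, and the directives are a proposal, not part of any existing robots.txt parser.

```python
def parse_memento_directives(robots_txt: str) -> dict:
    """Collect the values of Memento: and TimeGate: lines from robots.txt text."""
    found = {"memento": [], "timegate": []}
    for raw_line in robots_txt.splitlines():
        # Drop robots.txt comments, which start with '#'.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only: the value may itself contain 'http://'.
        name, separator, value = line.partition(":")
        key = name.strip().lower()
        if separator and key in found and value.strip():
            found[key].append(value.strip())
    return found


sample = """\
User-agent: *
Disallow: /private/
Memento: jcdl.org/archive/2002/index.html
TimeGate: ta.org/timegate/http://example.org/*
"""
directives = parse_memento_directives(sample)
print(directives["memento"])
print(directives["timegate"])
```

Splitting on the first colon keeps the embedded `http://` of the TimeGate pattern intact, which a naive split on every colon would break.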
  • 40. Discovery of Systems that Host Mementos: Registry/Feed • Registry of collections of Mementos, e.g. of Web Archives, Transactional Archives, etc. • Feed of registry records. • A registry record details essential characteristics of a Memento collection. • cf VOiD collection description for Linked Data. Will be researched
  • 41. Overview of Memento Framework Deployment Progress Memento and Data Memento and Discovery Memento and Branding Alternative Web Archiving Strategies
  • 42. Memento can recreate pages using resources from different archives. This poses a branding challenge for archives.
  • 43. Current Branding Practice for Web Archives Page and embedded resources from same Web Archive Branding for page and embedded resources
  • 44. Branding for Web Archives in Memento Mode Page and embedded resources from various Web Archives Page branding No branding No branding Will be researched
  • 45. Overview of Memento Framework Deployment Progress Memento and Data Memento and Discovery Memento and Branding Alternative Web Archiving Strategies
  • 46. Crawl-based Archives host distinct observations. Transactional Archives never miss an update.
  • 47. Crawl-Based Web Archives Observations For example: Heritrix crawler for Internet Archive
  • 48. Crawl-Based Web Archives • Collect discrete observations of resources, not their entire evolution. • Can be rejected (robots.txt, by user-agent, by host IP). • Can be deceived (cloaking, by geo-location, by user-agent). • Coverage of a particular Web server depends on the crawl strategy.
  • 49. Server-Side Transactional Web Archives Change History For example: TTApache, PageVault, Vignette Web Capture
  • 50. Server-Side Transactional Web Archives • Collect all representations served by to-be-archived server. • To-be-archived server needs to cooperate. • Incentives e.g. institutional memory, official record of Web presence. • Archival coverage restricted by to-be-archived server, does not include external servers (e.g. embedded resources). • To-be-archived server can submit falsified information. • Archival collection management: what to keep, what not (e.g. significant changes, deduplication, …).
  • 51. Development of Transactional Web Archive Software Capture: • Apache connection filter module (mod_ta) captures URI, headers, body; • Module POSTs in real-time to transactional archive’s Submit URI. Submit: • Java-Grizzly-Jersey submission interface application; • Berkeley DB metadata store; • FS store for body and headers.
  • 52. Development of Transactional Web Archive Software Access: • Transactional archive natively supports Memento; • Immediate availability of archived content; • Export of WARC, e.g. for long-term archiving in other environment. Development timeline: • Ongoing development (LANL) and testing (ODU); • Submit/Access finalized; development focus on collection management. • Expected release as open source, 3rd Quarter 2011.
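
The idea of a transactional archive, which records every representation served and can answer with the state at a given datetime, can be illustrated with a toy in-memory store in Python. This is not the mod_ta, Grizzly/Jersey, or Berkeley DB implementation described above. The class name, method names, and the rule of returning the latest capture at or before the requested datetime are assumptions made for illustration.

```python
import bisect
from datetime import datetime, timezone


class TransactionalStore:
    """Toy change history: every served representation is kept, per URI."""

    def __init__(self):
        self._times = {}   # URI -> sorted list of capture datetimes
        self._bodies = {}  # URI -> bodies, in the same order as _times

    def record(self, uri, served_at, body):
        """Store one served representation, keeping captures sorted by time."""
        times = self._times.setdefault(uri, [])
        bodies = self._bodies.setdefault(uri, [])
        position = bisect.bisect_right(times, served_at)
        times.insert(position, served_at)
        bodies.insert(position, body)

    def as_of(self, uri, when):
        """Return the representation in effect at `when`, or None if none yet."""
        times = self._times.get(uri, [])
        position = bisect.bisect_right(times, when)
        if position == 0:
            return None
        return self._bodies[uri][position - 1]


store = TransactionalStore()
store.record("http://example.org/", datetime(2011, 1, 1, tzinfo=timezone.utc), "v1")
store.record("http://example.org/", datetime(2011, 2, 1, tzinfo=timezone.utc), "v2")
print(store.as_of("http://example.org/", datetime(2011, 1, 15, tzinfo=timezone.utc)))
```

Collection management concerns named on the slides, such as deduplication or keeping only significant changes, are deliberately left out of this sketch.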
  • 53. Memento http://mementoweb.org/ Herbert Van de Sompel Robert Sanderson Michael L. Nelson Big Leaps Towards Seamless Navigation of the Web of the Past