Memento: Big Leaps Towards Seamless Navigation of the Web of the Past

These slides provide an explanation of the Memento Framework (time travel for the Web) from the perspective of resource versioning. It also details progress that has been made with deploying the framework since it was first introduced in November 2009, including standardization, development of tools, and advocacy. In addition, it touches upon new challenges (discovery, branding) and announces plans to make transactional Web archiving software available in the course of 2011.

Presentation Transcript

  • Memento: Big Leaps Towards Seamless Navigation of the Web of the Past. Herbert Van de Sompel, Robert Sanderson, Michael L. Nelson. http://mementoweb.org/ Memento Update, CNI Task Force Meeting, Spring 2011.
  • Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding; Alternative Web Archiving Strategies.
  • Overview of Memento Framework
  • Memento wants to make it easy to access the Web of the Past.
  • [Figure: Tate Online with a "Select Date" control, shown as it is today and as of March 16, 2008 (from the National Archives).]
  • Memento achieves this by introducing a uniform version access capability to integrate the present and past Web.
  • Content Management Systems:
      • Designed to be aware of all versions of a resource;
      • Self-contained;
      • Variety of proprietary version mechanisms;
      • Versions interlinked using proprietary mechanisms.
  • World Wide Web:
      • Designed to forget about prior versions of a resource;
      • Distributed.
  • There are resource versions on the Web:
      • Content Management Systems;
      • Web Archives;
      • Transactional archives;
      • Search engine caches.
  • But the Web architecture has a hard time dealing with them:
      • Cannot talk about a resource as it used to exist;
      • Cannot access a prior version knowing the current one;
      • Cannot access the current version knowing a prior one.
    Current approaches are ad hoc and localized.
  • Memento:
      • Regards the Web as a big Content Management System;
      • Introduces a uniform capability to access versions on the Web;
      • Does not build new archives but leverages all systems that host versions: Web archives, Content Management Systems, Software Version Systems, etc.
  • Memento’s version access approach:
      • Is distributed: versions may exist on several servers;
      • Uses time as a global version indicator;
      • Is based on the primitives of the Web: resource, resource state, representation, content negotiation, link.
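As a rough illustration of using time as the version indicator in content negotiation, the sketch below builds the request headers a client might send to a TimeGate. It assumes the `Accept-Datetime` request header as named in the published Memento specification (RFC 7089); the 2011 Internet Draft may differ in detail. The function names are hypothetical and no network request is made.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_value(when: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP-date in GMT."""
    # format_datetime(usegmt=True) requires a datetime whose tzinfo is UTC.
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def timegate_headers(when: datetime) -> dict:
    """Headers for a datetime-negotiation request (assumed header name)."""
    return {"Accept-Datetime": accept_datetime_value(when)}


# The date used in the Tate Online example earlier in the slides.
desired = datetime(2008, 3, 16, tzinfo=timezone.utc)
print(timegate_headers(desired))
# → {'Accept-Datetime': 'Sun, 16 Mar 2008 00:00:00 GMT'}
```

A client would attach these headers to a request for the TimeGate of an Original Resource and follow the response to the version closest to the desired datetime.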
  • Original Resource and Versions
  • Bridge from Present to Past
  • Bridge from Past to Present
  • Memento Framework
  • Multiple Archives
  • Memento Client-Server Interaction
  • Deployment Progress
  • Significant progress has been made towards seamless navigation of the Web of the Past.
  • Standardization:
      • Standardization process started via the IETF;
      • Interest from IETF and W3C;
      • Encouraged by major Web architects, including Tim Berners-Lee, Mark Nottingham, and Michael Hausenblas.
    https://datatracker.ietf.org/doc/draft-vandesompel-memento/
  • Memento Clients:
      • Several client tools developed by us and others;
      • Add-ons for Firefox (operational) and Internet Explorer (experimental);
      • Applications for Android (operational) and iPhone/iPad (in development);
      • Paper in next issue of Code4Lib Journal.
    http://www.mementoweb.org/tools/
  • Memento server support (1):
      • Memento-compliant Wayback software:
          • Used by Internet Archive.
          • Available to Web archives, worldwide.
      • Please have your favorite Web Archive install this new version 1.6!
    http://www.mementoweb.org/tools/
  • Memento server support (2):
      • Plug-in for MediaWiki (operational);
      • Used on W3C’s main wiki.
      • Please install it for your MediaWiki!
    http://www.mementoweb.org/tools/
  • Memento Server Validator:
      • Server-side client:
          • Attempts to perform all Memento actions against a given URI;
          • Reports success/failure of the interactions and warnings for optional aspects;
          • Kept up to date with the IETF Internet Draft.
    http://www.mementoweb.org/tools/
  • Memento Proxy Support:
      • Several systems that host Mementos made Memento-compliant “by proxy”:
          • All major Web Archives that do not yet run Memento-compliant Wayback software;
          • 3,000+ MediaWiki systems, including Wikipedia.
      • We want all of these to become natively Memento compliant!
  • Memento Website:
      • Ongoing effort to add materials that support understanding and adoption:
          • Introduction to Memento;
          • How to recognize Mementos, TimeGates, Original Resources?
          • Guidelines for servers that host Mementos (Web Archives, CMS, snapshot archives, etc.).
    http://www.mementoweb.org/guide/
  • Funding:
      • 2007-2010: US $250K grant from Library of Congress;
          • Approx. 50K on Memento.
      • 2010-2011: US $1 Million follow-up grant from Library of Congress.
          • For: specification, outreach, tool development, further research.
  • Memento and Data
  • Memento Time Travel is really powerful. Time-Series Data via HTTP follow-your-nose.
  • Memento Framework
  • Memento Framework & Time Series. Original Resource: http://dbpedia.org/resource/France
  • Time Travel across DBpedia Versions. Data collected through HTTP navigation. Paper at http://arxiv.org/abs/1003.3661
  • Memento and Discovery
  • Very few Web sites provide a “timegate” link. Additional mechanisms are needed to support discovery.
  • Batch discovery of Mementos: TimeMaps
    A TimeMap minimally lists:
      • URI and datetime of Mementos known to an archive;
      • URI of the Original Resource.
    TimeMaps can be aggregated across systems that host Mementos.
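The minimal TimeMap content and its aggregation across systems can be sketched as a small data structure. This is an illustration only: the class and function names, and all URIs, are hypothetical and do not reflect any serialization defined by Memento.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Memento:
    uri: str
    when: datetime  # datetime of the archived version


@dataclass
class TimeMap:
    original: str   # URI of the Original Resource
    mementos: list  # Mementos known to one system


def aggregate(timemaps):
    """Merge TimeMaps for one Original Resource held by several systems."""
    originals = {tm.original for tm in timemaps}
    if len(originals) != 1:
        raise ValueError("all TimeMaps must describe the same Original Resource")
    merged = {m for tm in timemaps for m in tm.mementos}  # drops duplicates
    return TimeMap(originals.pop(), sorted(merged, key=lambda m: (m.when, m.uri)))


def closest(timemap, target):
    """Pick the Memento whose datetime is nearest to the target datetime."""
    return min(timemap.mementos, key=lambda m: abs(m.when - target))


utc = timezone.utc
archive_a = TimeMap("http://example.org/", [
    Memento("http://archive-a.example/2007/http://example.org/", datetime(2007, 5, 1, tzinfo=utc)),
])
archive_b = TimeMap("http://example.org/", [
    Memento("http://archive-b.example/2008/http://example.org/", datetime(2008, 3, 10, tzinfo=utc)),
])
combined = aggregate([archive_a, archive_b])
best = closest(combined, datetime(2008, 3, 16, tzinfo=utc))
print(best.uri)
```

Because time is the only version indicator, the merged list needs no knowledge of how each system stores its versions.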
  • Batch discovery of Mementos: Feed of TimeMaps
      • A system that hosts Mementos exposes a feed (e.g. Atom) of TimeMaps to allow applications to remain in sync with its evolving Memento collection:
          • One Atom entry per Original Resource for which the system hosts Mementos;
          • The entry provides a “timemap” link to a TimeMap for the Original Resource;
          • The datetime value of the updated field of the entry changes when an additional Memento for the Original Resource becomes available (i.e. the TimeMap changes);
          • The ID of the entry is a tag URI based on the URI of the Original Resource.
    Will be proposed to IIPC.
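One way such a feed entry might look is sketched below with Python's standard `xml.etree.ElementTree`. The slides do not specify how the tag URI is derived, so the `tag:authority,date:original-URI` layout, the function name, and all example URIs are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"


def timemap_entry(original_uri, timemap_uri, updated, tag_authority, tag_date):
    """Build one Atom entry advertising the TimeMap of an Original Resource."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    # Assumed tag URI layout; the real derivation is not given in the slides.
    ET.SubElement(entry, f"{{{ATOM}}}id").text = f"tag:{tag_authority},{tag_date}:{original_uri}"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = original_uri
    # 'updated' changes whenever a new Memento is added to the TimeMap.
    stamp = updated.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = stamp
    ET.SubElement(entry, f"{{{ATOM}}}link", rel="timemap", href=timemap_uri)
    return entry


entry = timemap_entry(
    "http://example.org/page",
    "http://archive.example.org/timemap/http://example.org/page",
    datetime(2011, 4, 4, 12, 0, tzinfo=timezone.utc),
    tag_authority="archive.example.org",
    tag_date="2011",
)
print(ET.tostring(entry, encoding="unicode"))
```

An application polling the feed would compare each entry's updated value with the one it last saw and re-fetch only the TimeMaps that changed.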
  • Batch discovery of Mementos: robots.txt
      • The robots.txt file is used by Web servers to convey crawling policies;
      • Add a directive to support discovery of Mementos known to the server:
          • A pointer to a single Memento can suffice, as the robot can crawl on from there;
          • Mementos allow for discovery of TimeMaps via HTTP links;
          • e.g. jcdl.org hosts snapshot archives of prior JCDL conferences and adds the following to its robots.txt:
            Memento: jcdl.org/archive/2002/index.html
    Will be promoted via Internet Draft.
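A crawler could pick up the proposed directive with a few lines of parsing. The sketch below is not part of any robots.txt standard; it simply assumes the `Memento:` line format shown in the jcdl.org example, and the function name is hypothetical.

```python
def directive_values(robots_txt: str, directive: str) -> list:
    """Return the values of every 'directive: value' line in a robots.txt body."""
    wanted = directive.strip().lower()
    values = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop robots.txt comments
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep and name.strip().lower() == wanted and value.strip():
            values.append(value.strip())
    return values


robots = """\
User-agent: *
Disallow: /private/
# proposed extension, as on the slide
Memento: jcdl.org/archive/2002/index.html
"""
print(directive_values(robots, "Memento"))
# → ['jcdl.org/archive/2002/index.html']
```

Splitting on the first colon only keeps values that themselves contain colons, such as embedded URIs, intact.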
  • Batch discovery of TimeGates: robots.txt
      • The robots.txt file is used by Web servers to convey crawling policies;
      • Add a directive to support discovery of TimeGates known to the server:
          • TimeGates can be on the server itself or on an external server;
          • The value for the directive is typically a regular expression;
          • e.g. example.org could point at TimeGates in its associated transactional archive ta.org via robots.txt:
            TimeGate: ta.org/timegate/http://example.org/*
    Will be promoted via Internet Draft.
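How a client might turn such a directive value into a TimeGate URI for a given Original Resource is sketched below. The interpretation is an assumption: the part of the value starting at the embedded `http://` is treated as a wildcard pattern for Original Resource URIs (matched with `fnmatch` rather than a full regular expression), and the text before it as the TimeGate prefix. The function name is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional


def timegate_for(original_uri: str, directive_value: str) -> Optional[str]:
    """Map an Original Resource URI to a TimeGate URI, or None if not covered."""
    start = directive_value.find("http://")
    if start <= 0:
        return None  # no TimeGate prefix in front of the embedded pattern
    prefix, pattern = directive_value[:start], directive_value[start:]
    if not fnmatchcase(original_uri, pattern):
        return None  # this directive does not cover the given resource
    # Assumed convention: TimeGate URI is the prefix followed by the original URI.
    return "http://" + prefix + original_uri


value = "ta.org/timegate/http://example.org/*"
print(timegate_for("http://example.org/news/index.html", value))
# → http://ta.org/timegate/http://example.org/news/index.html
print(timegate_for("http://other.example.net/", value))
# → None
```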
  • Discovery of Systems that Host Mementos: Registry/Feed
      • Registry of collections of Mementos, e.g. of Web Archives, Transactional Archives, etc.
      • Feed of registry records.
      • A registry record details essential characteristics of a Memento collection.
      • cf. VOiD collection description for Linked Data.
    Will be researched.
  • Memento and Branding
  • Memento can recreate pages using resources from different archives. This poses a branding challenge for archives.
  • Current Branding Practice for Web Archives: page and embedded resources come from the same Web Archive, with branding for the page and its embedded resources.
  • Branding for Web Archives in Memento Mode: page and embedded resources come from various Web Archives. [Figure labels: page branding; no branding; no branding.] Will be researched.
  • Alternative Web Archiving Strategies
  • Crawl-based Archives host distinct observations. Transactional Archives never miss an update.
  • Crawl-Based Web Archives: Observations. For example: Heritrix crawler for Internet Archive.
  • Crawl-Based Web Archives:
      • Collect discrete observations of resources, not their entire evolution.
      • Can be rejected (robots.txt, by user-agent, by host IP).
      • Can be deceived (cloaking, by geo-location, by user-agent).
      • Coverage of a particular Web server depends on the crawl strategy.
  • Server-Side Transactional Web Archives: Change History. For example: TTApache, PageVault, Vignette Web Capture.
  • Server-Side Transactional Web Archives:
      • Collect all representations served by the to-be-archived server.
      • The to-be-archived server needs to cooperate.
          • Incentives, e.g. institutional memory, official record of Web presence.
      • Archival coverage is restricted to the to-be-archived server and does not include external servers (e.g. embedded resources).
      • The to-be-archived server can submit falsified information.
      • Archival collection management: what to keep, what not (e.g. significant changes, deduplication, …).
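The deduplication mentioned under collection management could, for instance, keep a new version only when the served body differs from the last one stored for that URI. The sketch below illustrates that idea with an in-memory store and a SHA-256 digest; it is an assumption for illustration, not a description of how the transactional archive software works, and the class name is hypothetical.

```python
import hashlib
from datetime import datetime, timezone


class DedupStore:
    """Keep a version only when its body differs from the previous one."""

    def __init__(self):
        self._last_digest = {}  # URI -> digest of the most recent stored body
        self.versions = {}      # URI -> list of (datetime served, digest)

    def submit(self, uri: str, body: bytes, served_at: datetime) -> bool:
        digest = hashlib.sha256(body).hexdigest()
        if self._last_digest.get(uri) == digest:
            return False  # same representation as last time; nothing stored
        self._last_digest[uri] = digest
        self.versions.setdefault(uri, []).append((served_at, digest))
        return True


store = DedupStore()
now = datetime(2011, 4, 4, tzinfo=timezone.utc)
results = [
    store.submit("http://example.org/", b"<html>v1</html>", now),
    store.submit("http://example.org/", b"<html>v1</html>", now),
    store.submit("http://example.org/", b"<html>v2</html>", now),
]
print(results)
# → [True, False, True]
```

Deciding what counts as a "significant change" would need more than byte-level comparison, for example ignoring timestamps or session tokens embedded in a page.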
  • Development of Transactional Web Archive Software
    Capture:
      • Apache connection filter module (mod_ta) captures URI, headers, body;
      • Module POSTs in real-time to the transactional archive’s Submit URI.
    Submit:
      • Java-Grizzly-Jersey submission interface application;
      • Berkeley DB metadata store;
      • FS store for body and headers.
  • Development of Transactional Web Archive Software
    Access:
      • Transactional archive natively supports Memento;
      • Immediate availability of archived content;
      • Export of WARC, e.g. for long-term archiving in another environment.
    Development timeline:
      • Ongoing development (LANL) and testing (ODU);
      • Submit/Access finalized; development focus on collection management.
      • Expected release as open source, 3rd Quarter 2011.