Robert Sanderson Herbert Van de Sompel
rsanderson@lanl.gov herbertv@lanl.gov
azaroth42@gmail.com hvdsomp@gmail.com
Digital Library Research
and Prototyping Team
Los Alamos National Laboratory,
USA
Persistent Web Annotations
Rob Sanderson, Herbert Van de Sompel
JCDL 2010, June 21-25, Surfers Paradise, Australia
10+ Years of Annotation Research
Our 2020 Vision
web
Web Resources Change …
Google Sidewiki Annotation on http://news.bbc.co.uk/ as of 2010-06-14
Archived Copy, But No Annotations
Archived page from:
http://www.dracos.co.uk/work/bbc-news-archive/2010/03/08/07.05.html
Can We Fix This? Automatically?
The desired outcome:
Display the correct representation of the Web Resource with the Annotation.
Previous Annotation Persistence Methods
• Migrate annotations from one version to the next:
• Seek to discover new location of old target segment
• Otherwise discard the annotation as no longer relevant
• Treats the Annotation as of secondary importance
• Focused on heuristics:
• Cross format, cross location
• Edited text in same document
• Dynamically scaling target areas, marks of annotation
• …
Persistent and Web‐Centric?
• OAC: Describe Annotations in a Web‐centric Model
+
• Memento: Make Navigating the Past Web Easy
=
• Given an Annotation, display appropriate archived Web Resource?
• Given an archived Web Resource, display appropriate Annotations?
Open Annotation Collaboration
• Focus: Interoperability between systems to enable sharing
• Foundation: Architecture of the World Wide Web
• Framework: Linked Data Guidelines
• Funding: Mellon Foundation for 18 months
OAC Data Model: Basics
• An oac:Annotation is an ore:Aggregation of two or more resources,
such that one (oac:Body) annotates at least one other (oac:Target)
• We get OAI‐ORE entities for free (ore:ResourceMap, ore:Proxy)
• All resources are regular web resources
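The aggregation described above can be sketched as a handful of triples. This is only an illustration using plain Python tuples rather than an RDF library: the example.org URIs and the helper name `make_annotation` are hypothetical, and the class and property names follow the bullets above and the OAI-ORE vocabulary (`ore:aggregates`) rather than being copied from the OAC specification.

```python
# Hypothetical URIs throughout; a triple is a plain (subject, predicate, object) tuple.
RDF_TYPE = "rdf:type"


def make_annotation(annotation_uri, body_uri, target_uris):
    """Return triples for an annotation aggregating one body and one or more targets."""
    if not target_uris:
        raise ValueError("an annotation needs at least one target")
    triples = [
        # Both types are stated explicitly here; a real model could infer the second.
        (annotation_uri, RDF_TYPE, "oac:Annotation"),
        (annotation_uri, RDF_TYPE, "ore:Aggregation"),
        (annotation_uri, "ore:aggregates", body_uri),
        (body_uri, RDF_TYPE, "oac:Body"),
    ]
    for target_uri in target_uris:
        triples.append((annotation_uri, "ore:aggregates", target_uri))
        triples.append((target_uri, RDF_TYPE, "oac:Target"))
    return triples


triples = make_annotation(
    "http://example.org/anno/1",
    "http://example.org/notes/comment.html",
    ["http://www.cnn.com/"],
)
for triple in triples:
    print(triple)
```

Body and targets are ordinary URIs here, which mirrors the point that all resources are regular web resources.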
OAC Data Model: Segments
• Most Annotations are about part of
a resource
• Resources are atomic, in terms of
identification (by a URI)
• Segments of the resource apply in
the context of the Annotation
• Solution: attach a Description of the
Segment of interest to an ORE
Proxy for the resource
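A rough sketch of the proxy idea, again as plain tuples. `ore:Proxy`, `ore:proxyFor` and `ore:proxyIn` are OAI-ORE terms; the predicate `ex:segmentDescription`, the function name and all example.org URIs are placeholders invented for this illustration and are not OAC vocabulary.

```python
# "ex:segmentDescription" is a placeholder predicate, not an OAC term.
def segment_proxy(proxy_uri, annotation_uri, target_uri, description_uri):
    """Triples for an ORE Proxy that stands for target_uri inside one annotation."""
    return [
        (proxy_uri, "rdf:type", "ore:Proxy"),
        (proxy_uri, "ore:proxyFor", target_uri),
        (proxy_uri, "ore:proxyIn", annotation_uri),
        (proxy_uri, "ex:segmentDescription", description_uri),
    ]


target = "http://www.cnn.com/"

# Two annotations on the same atomic resource, each about a different segment.
proxy_a = segment_proxy(
    "http://example.org/anno/1/proxy", "http://example.org/anno/1",
    target, "http://example.org/anno/1/segment",
)
proxy_b = segment_proxy(
    "http://example.org/anno/2/proxy", "http://example.org/anno/2",
    target, "http://example.org/anno/2/segment",
)

# The target keeps its single URI; only the per-annotation proxies carry segment data.
targets = {obj for (_, pred, obj) in proxy_a + proxy_b if pred == "ore:proxyFor"}
print(targets)
```

Because the segment description hangs off the proxy, it applies only in the context of that one annotation, while the target itself stays identified by a single URI.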
OAC Data Model: Time
• As regular web resources, Resource Map, Body and Target have
representations that can change over time
• The Resource Map, Body and Target can change independently of
each other
• If an Annotation involves resources as they existed at a particular
point in time, this needs to be recorded
• Three different Time models are possible…
Timeless Annotations
• The Annotation is always applicable, regardless of the
representation served from the URIs of the Body and Targets.
• Example: "This is the home page of CNN"
• Timeless Annotations do not need a special timestamp.
Uniform Time Annotations
• The Annotation is not always applicable, but pertains to the state
of the Body and Target at a single moment in time.
• Example: Tweet is about contemporary state of a web page.
• Add mem:when property to Annotation
Varied Time Annotations
• The Annotation is not always applicable, but pertains to the state
of the Body and Target at different moments in time.
• Example: Blog post is about previous day's state of a web page
• Add mem:when property to Proxies for resources
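The three time models can be compared side by side with a small sketch. The dictionary layout is hypothetical: the key `when` stands in for mem:when on the Annotation, `proxy_when` stands in for mem:when on the Proxies, and the timestamps and example.org URI are made up. Letting a proxy-level value take precedence over an annotation-level one is an assumption of this sketch.

```python
def view_datetimes(annotation):
    """Map each resource URI to the datetime it should be viewed at (None = any time)."""
    proxy_when = annotation.get("proxy_when", {})
    default = annotation.get("when")
    # Assumption: a value on the proxy overrides the annotation-level value.
    return {uri: proxy_when.get(uri, default) for uri in annotation["resources"]}


BODY = "http://example.org/blog/post"
TARGET = "http://www.cnn.com/"

# Timeless: no timestamp anywhere.
timeless = {"resources": [BODY, TARGET]}
# Uniform Time: one timestamp on the annotation covers Body and Target.
uniform = {"resources": [BODY, TARGET], "when": "2010-06-14T12:00:00Z"}
# Varied Time: each proxy carries its own timestamp.
varied = {
    "resources": [BODY, TARGET],
    "proxy_when": {BODY: "2010-06-14T12:00:00Z", TARGET: "2010-06-13T12:00:00Z"},
}

for name, anno in [("timeless", timeless), ("uniform", uniform), ("varied", varied)]:
    print(name, view_datetimes(anno))
```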
Issues with the Past Web
• New names for archived resources
• What was cnn.com, becomes archive.org/web/20010120…
• … And lots of other names
• With no way to discover them without searching by hand
• People do not like to search
• Especially when a computer could do it.
• Navigation is inconsistent
• Stuck in web archive content silo (URIs rewritten)
• Or end up back in present (URIs not rewritten)
The Web without Time Dimension
eg: http://www.cnn.com/
eg: http://web.archive.org/web/20020209001709rn_1/www.cnn.com/?
Figure (built up over four slides): the Original Resource (www.cnn.com, current) uses Link Headers to point to a TimeGate; conneg with the TimeGate leads to the Mementos at web.archive.org (Apr 10 2001, 21:39:30 UTC; Aug 15 2004, 08:45:27 UTC; Aug 15 2007, 19:21:58 UTC).
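To make the TimeGate's role concrete, the sketch below picks one of the three Memento datetimes shown in the figure for a requested datetime. Choosing the Memento closest in time is an assumption of this sketch; a real TimeGate may apply a different policy, and the function name `select_memento` is hypothetical.

```python
from datetime import datetime, timezone

# Memento datetimes from the figure (web.archive.org copies of www.cnn.com).
MEMENTOS = [
    datetime(2001, 4, 10, 21, 39, 30, tzinfo=timezone.utc),
    datetime(2004, 8, 15, 8, 45, 27, tzinfo=timezone.utc),
    datetime(2007, 8, 15, 19, 21, 58, tzinfo=timezone.utc),
]


def select_memento(requested, mementos=MEMENTOS):
    """Return the Memento datetime closest to the requested datetime."""
    return min(mementos, key=lambda memento: abs(memento - requested))


wanted = datetime(2005, 1, 1, tzinfo=timezone.utc)
print(select_memento(wanted))  # 2004-08-15 08:45:27+00:00
```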
Experiments
1. Given an annotation, find the appropriate representations
• Create annotation on resource known to change
• Can we use the information from the annotation to faithfully
recreate the environment through Memento?
2. Given an archived resource, find the appropriate annotations
• Create annotations at different times on resource known to
be archived
• Can we use the information from Memento to find the
appropriate annotations?
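For the first experiment, the bridging step is turning the timestamp recorded with an annotation into a request header. A minimal sketch, assuming the timestamp is stored as an ISO 8601 string in UTC and that Accept-Datetime takes an HTTP-date value; the function name and the sample timestamp are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime(when_iso):
    """Convert an ISO 8601 UTC timestamp into an HTTP-date header value."""
    when = datetime.strptime(when_iso, "%Y-%m-%dT%H:%M:%SZ")
    # Mark the parsed value as UTC so it can be rendered with a GMT zone.
    return format_datetime(when.replace(tzinfo=timezone.utc), usegmt=True)


headers = {"Accept-Datetime": accept_datetime("2010-06-14T12:00:00Z")}
print(headers)
```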
Experiment 1
Retrieve
Experiment 1
Reconstruct
Experiment 1: Create Annotation
Experiment 1: Test without Memento
Experiment 1: Test with Memento
Experiment 2
• Need to find Original URI, start, end time of representation
• Need searchable collection of annotations
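The two requirements above suggest a simple lookup: given the Original URI and the start and end time of the archived representation, filter a collection of annotations. The list-of-dicts store, the field names, and the half-open interval are assumptions of this sketch; annotations without a timestamp are treated as timeless and therefore always returned.

```python
from datetime import datetime, timezone


def annotations_for(original_uri, start, end, annotations):
    """Annotations on original_uri that are timeless or dated within [start, end)."""
    return [
        anno for anno in annotations
        if anno["target"] == original_uri
        and (anno["when"] is None or start <= anno["when"] < end)
    ]


def utc(*args):
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical annotation collection.
store = [
    {"id": "a1", "target": "http://www.cnn.com/", "when": utc(2010, 3, 8, 7, 30)},
    {"id": "a2", "target": "http://www.cnn.com/", "when": utc(2010, 6, 14, 9, 0)},
    {"id": "a3", "target": "http://www.cnn.com/", "when": None},
    {"id": "a4", "target": "http://news.bbc.co.uk/", "when": utc(2010, 3, 8, 7, 30)},
]

found = annotations_for("http://www.cnn.com/", utc(2010, 3, 8, 7, 5), utc(2010, 3, 8, 8, 5), store)
print([anno["id"] for anno in found])  # ['a1', 'a3']
```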
Experiment 2: Create Annotations
Experiment 2: Test without Memento
Experiment 2: Test with Memento
Conclusions
• Annotation, as a core scholarly practice, is increasingly web‐based
• We propose using OAC and Memento to provide a solution for
persistence of annotations, by displaying annotations in their
original context and displaying relevant annotations for archived
resources
• Archiving of annotated and annotating resources is important
• Interesting research question of whether annotation spans
multiple archived resources
Thank You
• Authors:
• azaroth42@gmail.com / rsanderson@lanl.gov
• hvdsomp@gmail.com / herbertv@lanl.gov
• OAC:
• http://www.openannotation.org/
• http://groups.google.com/group/oac-discuss
• Memento:
• http://www.mementoweb.org/
• http://groups.google.com/group/memento-dev
• Thanks To:
• Scott Ainsworth, Luda Balakireva, Tim Cole, Anna Gerber, Bernhard
Haslhofer, Eric Hetzner, Jane Hunter, Cliff Lynch, Michael Nelson, Doug
Reside, Harihar Shankar
Memento HTTP Flow
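Request: HEAD R, (Accept-Datetime)
Response: Link to G
Request: GET G, Accept-Datetime
Response: 302 to M, Vary, TCN, Link to R, B, M
Request: GET M, (Accept-Datetime)
Response: 200, Content-Datetime, Link to R, B, M
(R = Original Resource, G = TimeGate, M = Memento)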
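The three request/response pairs can be walked through against a stand-in server. Nothing here touches the network: `fake_server`, all example.org URIs, the header values, and the Link relation names `timegate` and `original` are assumptions chosen for illustration, and B is not modeled. It is a sketch of the flow, not the Memento reference client.

```python
import re

# Hypothetical URIs: R (Original Resource), G (TimeGate), M (Memento).
R = "http://example.org/page"
G = "http://archive.example.org/timegate/http://example.org/page"
M = "http://archive.example.org/20100308/http://example.org/page"


def fake_server(method, uri, headers):
    """Stand-in for the network: returns (status, response_headers)."""
    if uri == R:
        return 200, {"Link": f'<{G}>; rel="timegate"'}
    if uri == G:
        return 302, {"Location": M, "Vary": "negotiate, accept-datetime",
                     "TCN": "choice", "Link": f'<{R}>; rel="original"'}
    if uri == M:
        return 200, {"Content-Datetime": "Mon, 08 Mar 2010 07:05:00 GMT",
                     "Link": f'<{R}>; rel="original"'}
    return 404, {}


def find_link(link_header, rel):
    """Return the first URI in a Link header whose rel includes the given value."""
    for uri, params in re.findall(r"<([^>]*)>([^<]*)", link_header):
        match = re.search(r'rel="([^"]*)"', params)
        if match and rel in match.group(1).split():
            return uri
    return None


def fetch_memento(original_uri, accept_datetime, send=fake_server):
    """Follow R -> G -> M and return (memento_uri, status, content_datetime)."""
    headers = {"Accept-Datetime": accept_datetime}
    _, response = send("HEAD", original_uri, headers)      # HEAD R
    timegate = find_link(response["Link"], "timegate")     # Link to G
    status, response = send("GET", timegate, headers)      # GET G
    if status != 302:
        raise RuntimeError("TimeGate did not redirect")
    memento = response["Location"]                         # 302 to M
    status, response = send("GET", memento, headers)       # GET M
    return memento, status, response["Content-Datetime"]


print(fetch_memento(R, "Mon, 14 Jun 2010 12:00:00 GMT"))
```

Passing a different `send` callable with the same signature would let the same three steps run against a real HTTP client.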