This is the slide deck of the presentation given to the RRAC national group meeting on 10-20-2010. It is a summary of the research efforts in Digital Preservation at ODU.
Digital Preservation - ODU
1. Digital Preservation Research
at Old Dominion University
Justin F. Brunelle
The MITRE Corporation
Old Dominion University
(And hopefully MITRE, soon)
2. Why are we listening?
• Overview of the problem
• BRIEF introduction to ODU WSDL group research
• Memento
• I'll be skipping around, so don't hesitate to interrupt me
3. Digital Preservation
• Using the past Web
– Focus of our research
• Temporal Browsing
– Sessions in the past
• Recovering Lost Pages
– Is it really gone?
• 404s
– How to fix broken links?
4. Change on the Web
• Case 1: the same URI maps to the same or very similar content at a later time (time A: U1 → C1; time B: U1 → C1)
• Case 2: the same URI maps to different content at a later time (time A: U1 → C1; time B: U1 → C2)
• Case 3: a different URI maps to the same or very similar content at the same or at a later time (time A: U1 → C1; time B: U1 → 404, U2 → C1)
• Case 4: the content cannot be found at any URI (time A: U1 → C1; time B: U1 → ??)
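The four cases can be written down as a small classification function. This is an illustrative sketch only: the name `classify_change`, its arguments, and the precedence between overlapping cases are assumptions, and "same or very similar" is reduced to exact equality to keep the example short.

```python
from typing import Dict, Optional


def classify_change(uri: str, old_content: str,
                    web_now: Dict[str, Optional[str]]) -> int:
    """Return the case number (1-4) for content first seen at `uri`.

    `web_now` is a hypothetical snapshot of the Web at time B: it maps
    each URI to its current content, or to None when the URI returns 404.
    """
    current = web_now.get(uri)
    # Case 1: same URI, same content.
    if current == old_content:
        return 1
    # Case 3: the content survives, but at a different URI.
    # (Assumption: this takes precedence over case 2 when both apply.)
    if any(c == old_content for u, c in web_now.items() if u != uri):
        return 3
    # Case 2: same URI still resolves, but to different content.
    if current is not None:
        return 2
    # Case 4: the URI is gone and the content is nowhere to be found.
    return 4
```

For example, `classify_change("U1", "C1", {"U1": None, "U2": "C1"})` falls into case 3, the situation where a broken link can in principle be repaired.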
5. Time to Talk About Saving Everything?
Dinner for one or two costs more than a 1TB disk
Wikis have popularized versioning
Cool URIs (http://www.w3.org/Provider/Style/URI.html) are widely adopted, e.g.:
http://news.yahoo.com/s/ap/20100920/ap_on_el_se/us_alaska_senate
http://d.yimg.com/a/p/ap/20100918/capt.67567dbc0a874b689f0b4a5c392f379c-67567dbc0a874b689f0b4a5c392f379c-0.jpg
http://d.yimg.com/a/p/afp/20100918/thumb.photo_1284846332993-1-0.jpg
Also related projects with cool URI / permalink focus:
http://www.citability.org/
http://data.gov/
http://data.gov.uk/
6. Fortress Model
• Get a lot of money
• Buy lots of storage
• Hire lots of people
• "Look upon my archive ye Mighty, and despair!"
7. Alternate Methods
• Lazy Preservation (McCown)
– "How much preservation do I get if I do absolutely nothing?"
• Just-In-Time Preservation (Klein)
– Wait for it to disappear, then find a "good 'nuff" version
• Shared Infrastructure Preservation
– Push content to sites that might preserve it
• arXiv.org, IA, WebCite…
• Server Enhanced Preservation
– Create archival-ready resources
8. And Soon…
• Social Preservation
– Preserving resources using 3rd party Web Services
– Repository for OAI-ORE ReMs
– Social network feel
– Lazy-esque, server-side reconstruction
9. But I digress…
• Few years away…
• Preliminary research
• And now back to the prior research…
24. Finding Archived Resources
Go to http://www.archive.org/ and search http://cnn.com
On http://web.archive.org/web/*/http://cnn.com, select desired datetime
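The manual lookup above amounts to building a URI. The sketch below assumes the Wayback Machine's convention of a 14-digit `YYYYMMDDhhmmss` timestamp in place of the `*` wildcard; the function name `wayback_uri` is hypothetical.

```python
from datetime import datetime
from typing import Optional

WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_uri(original_uri: str, when: Optional[datetime] = None) -> str:
    """Build a Wayback Machine URI for `original_uri`.

    With no datetime, return the `*` listing page shown on the slide.
    With a datetime, return the URI of the capture nearest that moment
    (assumes the 14-digit timestamp convention, interpreted as UTC).
    """
    if when is None:
        return WAYBACK_PREFIX + "*/" + original_uri
    return WAYBACK_PREFIX + when.strftime("%Y%m%d%H%M%S") + "/" + original_uri
```

This is exactly the kind of archive-specific knowledge a user currently needs to move from the current Web to the past Web.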
27. Current and Past Web are Not Integrated
• Current and Past Web based on same technology.
• But, going from Current to Past Web is a matter of (manual) discovery.
• Memento wants to make going from Current to Past Web a (HTTP) protocol matter.
• Memento wants to integrate Current and Past Web.
29. Memento HTTP Flow
HEAD R, Accept-Datetime
Link→G
GET G, Accept-Datetime
302→M, Vary, TCN, Link→R,B,M
GET M, Accept-Datetime
200, Content-Datetime, Link→R,B,M
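Every hop in this flow depends on reading a Link header: the response for R points at G, and the responses from G and M point back at R, B and M. A minimal parsing sketch follows, assuming the usual `<uri>; rel="..."` Link syntax. The function name `parse_link_header` is hypothetical, and the relation names in the usage note (`original`, `timegate`, `memento`) are assumptions based on Memento conventions rather than anything shown on the slide.

```python
import re
from typing import Dict, List

# One link-value: <uri> followed by zero or more ;name=value parameters.
# Quoted values may contain commas (e.g. a datetime), so they are matched whole.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+))*)')
_PARAM = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value: str) -> Dict[str, List[str]]:
    """Map each relation type in a Link header to the URIs that carry it."""
    by_rel: Dict[str, List[str]] = {}
    for uri, params in _LINK.findall(value):
        for name, quoted, bare in _PARAM.findall(params):
            if name.lower() == "rel":
                # A single rel attribute may list several relation types.
                for rel in (quoted or bare).split():
                    by_rel.setdefault(rel, []).append(uri)
    return by_rel
```

A client would call `parse_link_header(...)` on the response for R, look up the `timegate` entry, and send its next request (with the same Accept-Datetime) to that URI.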
30. One Memento HTTP Navigation Scenario
• cnn.com includes Link to TimeGate at Internet Archive
• URI-R on one server, URI-G & URI-M on another
32. Memento HTTP Flow: URI-R
HEAD R, Accept-Datetime
HEAD http://cnn.com/ HTTP/1.1
Host: cnn.com
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
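The request above can be assembled with Python's standard library. This is a sketch, not a reference client: the helper names `accept_datetime_value` and `build_head_request` are hypothetical, and the request is only built here, not sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def accept_datetime_value(when: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    # Convert to UTC first; pass an aware datetime, since a naive one
    # would be interpreted as local time by astimezone().
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def build_head_request(uri: str, when: datetime) -> Request:
    """Build (but do not send) a HEAD request carrying Accept-Datetime."""
    headers = {
        "Accept-Datetime": accept_datetime_value(when),
        "Connection": "close",
    }
    return Request(uri, method="HEAD", headers=headers)


req = build_head_request(
    "http://cnn.com/", datetime(2001, 9, 11, 20, 35, tzinfo=timezone.utc)
)
```

Passing `req` to `urllib.request.urlopen` would issue the request; a server that does not understand Accept-Datetime should simply ignore the header.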
43. What does it all mean?
• Cutting edge technology
• Existing Infrastructure
• Redefining Web surfing
• MAJOR "real world" implications
44. Closing Thoughts
Preservation not for privileged priesthood
http://doi.acm.org/10.1145/1592761.1592794
http://booktwo.org/notebook/wikipedia-historiography/
no more hoary stories about format obsolescence:
http://blog.dshr.org/2010/09/reinforcing-my-point.html
Don't desiccate resources; leave them on the web
Endless metadata is not preservation…
archiving as branded service, not infrastructure
http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
45. Acknowledgements
• Slides borrowed from:
• Dr. Michael L. Nelson:
– http://www.slideshare.net/phonedude/my-point-of-view-michael-l-nelson-web-archiving-cooperative
– http://www.slideshare.net/phonedude/review-of-web-archiving
– http://www.slideshare.net/phonedude/memento-time-travel-for-the-web
• Martin Klein:
– http://www.slideshare.net/phonedude/synchronicity-justintime-discovery-of-lost-web-pages