This is the slide deck of the presentation given to the RRAC national group meeting on 10-20-2010. It is a summary of the research efforts in Digital Preservation at ODU.
1. Digital Preservation Research
at Old Dominion University
Justin F. Brunelle
The MITRE Corporation
Old Dominion University
(And hopefully MITRE, soon)
2. Why are we listening?
• Overview of the problem
• BRIEF introduction to ODU WSDL group
research
• Memento
• I’ll be skipping around, so don’t hesitate to
interrupt me
3. Digital Preservation
• Using the past Web
– Focus of our research
• Temporal Browsing
– Sessions in the past
• Recovering Lost Pages
– Is it really gone?
• 404s
– How to fix broken links?
4. 1
same URI
maps to same
or very similar
content at a
later time
2
same URI
maps to
different
content at a
later time
3
different URI
maps to same
or very similar
content at the
same or at a
later time
4
the content
can not be
found at
any URI
U1
C1
U1
C1
timeA B
U1
C2
U1
C1
timeA B
U2
C1
U1
C1
U1
404
timeA B
U1
??
U1
C1
timeA B
Change on the Web
5. Time to Talk About Saving
Everything?
Dinner for one or two costs more than 1TB disk Wikis have popularized versioning
Cool URIs (http://www.w3.org/Provider/Style/URI.html) are widely adopted, e.g.:
http://news.yahoo.com/s/ap/20100920/ap_on_el_se/us_alaska_senate
http://d.yimg.com/a/p/ap/20100918/capt.67567dbc0a874b689f0b4a5c392f379c-67567dbc0a874b689f0b4a5c392f379c-0.jpg
http://d.yimg.com/a/p/afp/20100918/thumb.photo_1284846332993-1-0.jpg
Also related projects with cool URI / permalink focus:
http://www.citability.org/
http://data.gov/
http://data.gov.uk/
6. Fortress Model
• Get a lot of money
• Buy lots of storage
• Hire lots of people
• “Look upon my archive ye Mighty, and
despair!”
7. Alternate Methods
• Lazy Preservation (McCown)
– “How much preservation do I get if I do absolutely
nothing?”
• Just-In-Time Preservation (Klein)
– Wait for it to disappear, then find a “good ‘nuff”
version
• Shared Infrastructure Preservation
– Push content to sites that might preserve it
• arXiv.org, IA, WebCite…
• Server Enhanced Preservation
– Create archival-ready resources
8. And Soon…
• Social Preservation
– Preserving resources using 3rd
party Web Services
– Repository for OAI-ORE ReMs
– Social network feel
– Lazy-esque, server-side reconstruction
9. But I digress…
• Few years away…
• Preliminary research
• And now back to the prior research…
24. Finding Archived Resources
Go to http://www.archive.org/ and search
http://cnn.com
On http://web.archive.org/web/*/http://cnn.com, select
desired datetime
24
27. Current and Past Web are Not
Integrated
27
• Current and Past Web based on
same technology.
• But, going from Current to
Past Web is a matter of (manual)
discovery.
• Memento wants to make going
from Current to Past Web a
(HTTP) protocol matter.
• Memento wants to integrate
Current And Past Web.
43. What does it all mean?
• Cutting edge technology
• Existing Infrastructure
• Redefining Web surfing
• MAJOR “real world” implications
44. Closing Thoughts
Preservation not for
privileged priesthood
http://doi.acm.org/10.1145/1592761.1592794
http://booktwo.org/notebook/wikipedia-historiography/
no more hoary stories
about format obsolescence:
http://blog.dshr.org/2010/09/reinforcing-my-point.html
Don't dessicate resources;
leave them on the web
Endless metadata is not
preservation…
archiving as branded service,
not infrastructure
http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
45. Acknowledgements
• Slides borrowed from:
• Dr. Michael L. Nelson:
– http://www.slideshare.net/phonedude/my-point-of-view-
michael-l-nelson-web-archiving-cooperative
– http://www.slideshare.net/phonedude/review-of-web-
archiving
– http://www.slideshare.net/phonedude/memento-time-
travel-for-the-web
• Martin Klein:
– http://www.slideshare.net/phonedude/synchronicity-
justintime-discovery-of-lost-web-pages