SlideShare a Scribd company logo
1 of 45
Digital Preservation Research
at Old Dominion University
Justin F. Brunelle
The MITRE Corporation
Old Dominion University
(And hopefully MITRE, soon)
Why are we listening?
• Overview of the problem
• BRIEF introduction to ODU WSDL group
research
• Memento
• I’ll be skipping around, so don’t hesitate to
interrupt me
Digital Preservation
• Using the past Web
– Focus of our research
• Temporal Browsing
– Sessions in the past
• Recovering Lost Pages
– Is it really gone?
• 404s
– How to fix broken links?
1
same URI
maps to same
or very similar
content at a
later time
2
same URI
maps to
different
content at a
later time
3
different URI
maps to same
or very similar
content at the
same or at a
later time
4
the content
can not be
found at
any URI
U1
C1
U1
C1
timeA B
U1
C2
U1
C1
timeA B
U2
C1
U1
C1
U1
404
timeA B
U1
??
U1
C1
timeA B
Change on the Web
Time to Talk About Saving
Everything?
Dinner for one or two costs more than 1TB disk Wikis have popularized versioning
Cool URIs (http://www.w3.org/Provider/Style/URI.html) are widely adopted, e.g.:
http://news.yahoo.com/s/ap/20100920/ap_on_el_se/us_alaska_senate
http://d.yimg.com/a/p/ap/20100918/capt.67567dbc0a874b689f0b4a5c392f379c-67567dbc0a874b689f0b4a5c392f379c-0.jpg
http://d.yimg.com/a/p/afp/20100918/thumb.photo_1284846332993-1-0.jpg
Also related projects with cool URI / permalink focus:
http://www.citability.org/
http://data.gov/
http://data.gov.uk/
Fortress Model
• Get a lot of money
• Buy lots of storage
• Hire lots of people
• “Look upon my archive ye Mighty, and
despair!”
Alternate Methods
• Lazy Preservation (McCown)
– “How much preservation do I get if I do absolutely
nothing?”
• Just-In-Time Preservation (Klein)
– Wait for it to disappear, then find a “good ‘nuff”
version
• Shared Infrastructure Preservation
– Push content to sites that might preserve it
• arXiv.org, IA, WebCite…
• Server Enhanced Preservation
– Create archival-ready resources
And Soon…
• Social Preservation
– Preserving resources using 3rd
party Web Services
– Repository for OAI-ORE ReMs
– Social network feel
– Lazy-esque, server-side reconstruction
But I digress…
• Few years away…
• Preliminary research
• And now back to the prior research…
Web Infrastructure (McCown, 2007)
WayBack Machine
http://web.archive.org/web/*/http://www.thecribs.com/
http://mementoproxy.cs.odu.edu/aggr/timemap/link/http://www.thecribs.com/
from these we can
create time-based:
• indexes
• IDF values
• PageRank
Batch Recovery For Sites
http://warrick.cs.odu.edu/
Free limo rides for life?!
13
Reconstruction Diagram
added
20%
identical
50%
changed
33%
missing
17%
Real-Time Recovery for URIs
Synchronicity - www.cs.odu.edu/~mklein/
Memento wants to make navigating the
Web’s Past Easy
15
http://www.mementoweb.org
http://groups.google.com/group/memento-dev
What are you talking about?
• Universal Resource Identifier (URI) ~= URL
• Resource:
– <HTML>
• Representation
W3C Web Architecture: Resource –
URI - Representation
Resource
Representation
Represents
URI
Identifies
dereference
17
dereference content negotiation
W3C Web Architecture: Resource –
URI - Representation
Resource
URI
Identifies
Representation 1
Represents
Representation 2Represents
18
Resources
19
Resources have Representations
20
Resources have Representations that
Change over Time
21
Only the Current Representation is
Available from a Resource
22
Old Representations are Lost
Forever
23
Finding Archived Resources
Go to http://www.archive.org/ and search
http://cnn.com
On http://web.archive.org/web/*/http://cnn.com, select
desired datetime
24
Archived Resources
http://web.archive.org/web/20010911203610/http://www.c
nn.com/ archived resource for http://cnn.com
http://en.wikipedia.org/w/index.php?
title=September_11_attacks&oldid=282333 archived
resource for
http://en.wikipedia.org/wiki/September_11_attacks
Sep 11 2001, 20:36:10 UTC Dec 20 2001, 4:51:00 UTC
25
Navigating Archived Resources
http://en.wikipedia.org/w/index.php?
title=September_11_attacks&oldid=282333 archived
resource for
http://en.wikipedia.org/wiki/September_11_attacks3
Dec 20 2001, 4:51:00 UTC
http://en.wikipedia.org/wiki/The_Pentagon
current
Pentagon
26
Current and Past Web are Not
Integrated
27
• Current and Past Web based on
same technology.
• But, going from Current to
Past Web is a matter of (manual)
discovery.
• Memento wants to make going
from Current to Past Web a
(HTTP) protocol matter.
• Memento wants to integrate
Current And Past Web.
One Memento HTTP Navigation
28
Memento HTTP Flow
HEAD R, Accept-Datetime
LinkG
302M, Vary, TCN, LinkR,B,M
200, Content-Datetime, LinkR,B,M
GET G, Accept-Datetime
GET M, Accept-Datetime
One Memento HTTP Navigation
30
Scenario
• cnn.com includes Link to TimeGate at Internet Archive
• URI-R on one server, URI-G & URI-M on another
Memento HTTP Flow
HEAD R, Accept-Datetime
LinkG
302M, Vary, TCN, LinkR,B,M
200, Content-Datetime, LinkR,B,M
GET G, Accept-Datetime
GET M, Accept-Datetime
Memento HTTP Flow: URI-R
HEAD R, Accept-Datetime
HEAD http://cnn.com/ HTTP/1.1
Host: cnn.com
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
32
Memento HTTP Flow
HEAD R, Accept-Datetime
LinkG
302M, Vary, TCN, LinkR,B,M
200, Content-Datetime, LinkR,B,M
GET G, Accept-Datetime
GET M, Accept-Datetime
Memento HTTP Flow: Success –
URI-R
LinkG
HTTP/1.1 200 OK
Date: Thu, 21 Jan 2010 00:02:12 GMT
Server: Apache
Link: <http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate"
Content-Length: 255
Connection: close
Content-Type: text/html; charset=iso-8859-1
34
Memento HTTP Flow
HEAD R, Accept-Datetime
LinkG
302M, Vary, TCN, LinkR,B,M
200, Content-Datetime, LinkR,B,M
GET G, Accept-Datetime
GET M, Accept-Datetime
GET G, Accept-Datetime
Memento HTTP Flow: URI-G
GET http://web.archive.org/web/timegate/http://cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
36
Memento HTTP Flow
HEAD R, Accept-Datetime
LinkG
302M, Vary, TCN, LinkR,B,M
200, Content-Datetime, LinkR,B,M
GET G, Accept-Datetime
GET M, Accept-Datetime
Memento HTTP Flow: Success –
URI-G
302M, Vary, LinkR,B,M
HTTP/1.1 302 Found
Date: Thu, 21 Jan 2010 00:06:50 GMT
Server: Apache
TCN: choice
Vary: negotiate, accept-datetime
Location: http://web.archive.org/web/20010911203610/http://www.cnn.com
Link: <http://cnn.com/>; rel="original",
<http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle”,
<http://web.archive.org/web/20000915112826/http://www.cnn.com>;
rel=“first- memento”; datetime=“Tue, 15 Sep 2000 11:28:26 GMT”,
<http://web.archive.org/web/20080708093433/http://www.cnn.com>;
rel=“last-memento”; datetime="Tue, 08 Jul 2008 09:34:33 GMT”,
<http://web.archive.org/web/20010911203610/http://www.cnn.com>;
rel=“prev-memento”; datetime="Tue, 11 Sep 2001 20:30:51 GMT”,
<http://web.archive.org/web/20010911203610/http://www.cnn.com>;
rel=“next-memento”; datetime="Tue, 11 Sep 2001 20:47:33 GMT”
Content-Length: 0
Connection: close
Content-Type: text/plain; charset=UTF-8
38
Memento HTTP Flow
HEAD R, Accept-Datetime
LinkG
302M, Vary, TCN, LinkR,B,M
200, Content-Datetime, LinkR,B,M
GET G, Accept-Datetime
GET M, Accept-Datetime
GET M, Accept-Datetime
Memento HTTP Flow: URI-M
GET http://web.archive.org/web/20010911203610/http://www.cnn.com HTTP/1.1
Host: web.archive.org
Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT
Connection: close
40
Memento HTTP Flow
HEAD R, Accept-Datetime
LinkG
302M, Vary, TCN, LinkR,B,M
200, Content-Datetime, LinkR,B,M
GET G, Accept-Datetime
GET M, Accept-Datetime
Memento HTTP Flow: Success –
URI-M
200, Content-Datetime, LinkR,B,M
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Archive-Orig-Accept-Ranges: bytes
…
Content-Type: text/html;charset=utf-8
Content-Length: 23364
Date: Thu, 21 Jan 2010 00:09:40 GMT
Content-Datetime: Tue, 11 Sep 2001 20:36:10 GMT
Link: <http://cnn.com/>; rel="original",
<http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle”,
<http://web.archive.org/web/20000915112826/http://www.cnn.com>;
rel=“first-memento”; datetime=“Tue, 15 Sep 2000 11:28:26 GMT”,
<http://web.archive.org/web/20080708093433/http://www.cnn.com>;
rel=“last-memento”; datetime="Tue, 08 Jul 2008 09:34:33 GMT”,
<http://web.archive.org/web/20010911203610/http://www.cnn.com>;
rel=“prev-memento”; datetime="Tue, 11 Sep 2001 20:30:51 GMT”,
<http://web.archive.org/web/20010911203610/http://www.cnn.com>;
rel=“next-memento”; datetime="Tue, 11 Sep 2001 20:47:33 GMT”
Connection: close
What does it all mean?
• Cutting edge technology
• Existing Infrastructure
• Redefining Web surfing
• MAJOR “real world” implications
Closing Thoughts
Preservation not for
privileged priesthood
http://doi.acm.org/10.1145/1592761.1592794
http://booktwo.org/notebook/wikipedia-historiography/
no more hoary stories
about format obsolescence:
http://blog.dshr.org/2010/09/reinforcing-my-point.html
Don't dessicate resources;
leave them on the web
Endless metadata is not
preservation…
archiving as branded service,
not infrastructure
http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
Acknowledgements
• Slides borrowed from:
• Dr. Michael L. Nelson:
– http://www.slideshare.net/phonedude/my-point-of-view-
michael-l-nelson-web-archiving-cooperative
– http://www.slideshare.net/phonedude/review-of-web-
archiving
– http://www.slideshare.net/phonedude/memento-time-
travel-for-the-web
• Martin Klein:
– http://www.slideshare.net/phonedude/synchronicity-
justintime-discovery-of-lost-web-pages

More Related Content

Similar to Digital Preservation - ODU

Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMichael Nelson
 
Summarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSummarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSawood Alam
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
Herbert Van De Sompel - Time Travel for the Web
Herbert Van De Sompel - Time Travel for the WebHerbert Van De Sompel - Time Travel for the Web
Herbert Van De Sompel - Time Travel for the WebiMinds conference
 
Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...EDINA, University of Edinburgh
 
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...EDINA, University of Edinburgh
 
Introduction to Web Programming - first course
Introduction to Web Programming - first courseIntroduction to Web Programming - first course
Introduction to Web Programming - first courseVlad Posea
 
Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupSawood Alam
 
Detecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCDetecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCMichele Weigle
 
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoAvoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoShawn Jones
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web PagesMichael Nelson
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesMichael Nelson
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingMichael Nelson
 
ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013Ahmed AlSum
 
Useful Websites (Report in Online Journalism)
Useful Websites (Report in Online Journalism)Useful Websites (Report in Online Journalism)
Useful Websites (Report in Online Journalism)Kim Ortego
 
Web decay and Internet Archive
Web decay and Internet ArchiveWeb decay and Internet Archive
Web decay and Internet ArchiveDr. B T Sampath Kumar
 
Filling in the Blanks: Capturing Dynamically Generated Content
Filling in the Blanks: Capturing Dynamically Generated ContentFilling in the Blanks: Capturing Dynamically Generated Content
Filling in the Blanks: Capturing Dynamically Generated ContentJustin Brunelle
 

Similar to Digital Preservation - ODU (20)

Memento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMapsMemento: TimeGates, TimeBundles, and TimeMaps
Memento: TimeGates, TimeBundles, and TimeMaps
 
The Memento protocol
The Memento protocolThe Memento protocol
The Memento protocol
 
Summarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMapSummarize Your Archival Holdings With MementoMap
Summarize Your Archival Holdings With MementoMap
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital Preservation
 
Herbert Van De Sompel - Time Travel for the Web
Herbert Van De Sompel - Time Travel for the WebHerbert Van De Sompel - Time Travel for the Web
Herbert Van De Sompel - Time Travel for the Web
 
Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...Prototypes of pro-active approaches to support the archiving of web reference...
Prototypes of pro-active approaches to support the archiving of web reference...
 
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
Hiberlink: Prototypes of pro-active approaches to support the archiving of we...
 
Memento 101
Memento 101Memento 101
Memento 101
 
Introduction to Web Programming - first course
Introduction to Web Programming - first courseIntroduction to Web Programming - first course
Introduction to Web Programming - first course
 
Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research Group
 
Detecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCDetecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARC
 
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoAvoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
 
Internet content as research data
Internet content as research dataInternet content as research data
Internet content as research data
 
(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages(Re-) Discovering Lost Web Pages
(Re-) Discovering Lost Web Pages
 
Synchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web PagesSynchronicity: Just-In-Time Discovery of Lost Web Pages
Synchronicity: Just-In-Time Discovery of Lost Web Pages
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013ArcLink - IIPC GA 2013
ArcLink - IIPC GA 2013
 
Useful Websites (Report in Online Journalism)
Useful Websites (Report in Online Journalism)Useful Websites (Report in Online Journalism)
Useful Websites (Report in Online Journalism)
 
Web decay and Internet Archive
Web decay and Internet ArchiveWeb decay and Internet Archive
Web decay and Internet Archive
 
Filling in the Blanks: Capturing Dynamically Generated Content
Filling in the Blanks: Capturing Dynamically Generated ContentFilling in the Blanks: Capturing Dynamically Generated Content
Filling in the Blanks: Capturing Dynamically Generated Content
 

More from Justin Brunelle

Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Justin Brunelle
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...Justin Brunelle
 
Not All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing MementosNot All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing MementosJustin Brunelle
 
How I spend my summer vacations
How I spend my summer vacationsHow I spend my summer vacations
How I spend my summer vacationsJustin Brunelle
 
An Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMapsAn Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMapsJustin Brunelle
 
Day in the Life of a Computer Scientist
Day in the Life of a Computer ScientistDay in the Life of a Computer Scientist
Day in the Life of a Computer ScientistJustin Brunelle
 
Agile Engineering - ODU ACM
Agile Engineering - ODU ACMAgile Engineering - ODU ACM
Agile Engineering - ODU ACMJustin Brunelle
 
Digital Preservation at ODU
Digital Preservation at ODUDigital Preservation at ODU
Digital Preservation at ODUJustin Brunelle
 

More from Justin Brunelle (9)

Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
Scripts in a Frame: A Two-Tiered Crawling Approach to Archiving Deferred Repr...
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
 
Not All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing MementosNot All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Not All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
 
How I spend my summer vacations
How I spend my summer vacationsHow I spend my summer vacations
How I spend my summer vacations
 
An Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMapsAn Evaluation of Caching Policies for Memento TimeMaps
An Evaluation of Caching Policies for Memento TimeMaps
 
Day in the Life of a Computer Scientist
Day in the Life of a Computer ScientistDay in the Life of a Computer Scientist
Day in the Life of a Computer Scientist
 
Agile Engineering - ODU ACM
Agile Engineering - ODU ACMAgile Engineering - ODU ACM
Agile Engineering - ODU ACM
 
Records expo
Records expoRecords expo
Records expo
 
Digital Preservation at ODU
Digital Preservation at ODUDigital Preservation at ODU
Digital Preservation at ODU
 

Recently uploaded

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.MateoGardella
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Dr. Mazin Mohamed alkathiri
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 

Recently uploaded (20)

Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 

Digital Preservation - ODU

  • 1. Digital Preservation Research at Old Dominion University Justin F. Brunelle The MITRE Corporation Old Dominion University (And hopefully MITRE, soon)
  • 2. Why are we listening? • Overview of the problem • BRIEF introduction to ODU WSDL group research • Memento • I’ll be skipping around, so don’t hesitate to interrupt me
  • 3. Digital Preservation • Using the past Web – Focus of our research • Temporal Browsing – Sessions in the past • Recovering Lost Pages – Is it really gone? • 404s – How to fix broken links?
  • 4. 1 same URI maps to same or very similar content at a later time 2 same URI maps to different content at a later time 3 different URI maps to same or very similar content at the same or at a later time 4 the content can not be found at any URI U1 C1 U1 C1 timeA B U1 C2 U1 C1 timeA B U2 C1 U1 C1 U1 404 timeA B U1 ?? U1 C1 timeA B Change on the Web
  • 5. Time to Talk About Saving Everything? Dinner for one or two costs more than 1TB disk Wikis have popularized versioning Cool URIs (http://www.w3.org/Provider/Style/URI.html) are widely adopted, e.g.: http://news.yahoo.com/s/ap/20100920/ap_on_el_se/us_alaska_senate http://d.yimg.com/a/p/ap/20100918/capt.67567dbc0a874b689f0b4a5c392f379c-67567dbc0a874b689f0b4a5c392f379c-0.jpg http://d.yimg.com/a/p/afp/20100918/thumb.photo_1284846332993-1-0.jpg Also related projects with cool URI / permalink focus: http://www.citability.org/ http://data.gov/ http://data.gov.uk/
  • 6. Fortress Model • Get a lot of money • Buy lots of storage • Hire lots of people • “Look upon my archive ye Mighty, and despair!”
  • 7. Alternate Methods • Lazy Preservation (McCown) – “How much preservation do I get if I do absolutely nothing?” • Just-In-Time Preservation (Klein) – Wait for it to disappear, then find a “good ‘nuff” version • Shared Infrastructure Preservation – Push content to sites that might preserve it • arXiv.org, IA, WebCite… • Server Enhanced Preservation – Create archival-ready resources
  • 8. And Soon… • Social Preservation – Preserving resources using 3rd party Web Services – Repository for OAI-ORE ReMs – Social network feel – Lazy-esque, server-side reconstruction
  • 9. But I digress… • Few years away… • Preliminary research • And now back to the prior research…
  • 12. Batch Recovery For Sites http://warrick.cs.odu.edu/ Free limo rides for life?!
  • 14. Real-Time Recovery for URIs Synchronicity - www.cs.odu.edu/~mklein/
  • 15. Memento wants to make navigating the Web’s Past Easy 15 http://www.mementoweb.org http://groups.google.com/group/memento-dev
  • 16. What are you talking about? • Universal Resource Identifier (URI) ~= URL • Resource: – <HTML> • Representation
  • 17. W3C Web Architecture: Resource – URI - Representation Resource Representation Represents URI Identifies dereference 17
  • 18. dereference content negotiation W3C Web Architecture: Resource – URI - Representation Resource URI Identifies Representation 1 Represents Representation 2Represents 18
  • 21. Resources have Representations that Change over Time 21
  • 22. Only the Current Representation is Available from a Resource 22
  • 23. Old Representations are Lost Forever 23
  • 24. Finding Archived Resources Go to http://www.archive.org/ and search http://cnn.com On http://web.archive.org/web/*/http://cnn.com, select desired datetime 24
  • 25. Archived Resources http://web.archive.org/web/20010911203610/http://www.c nn.com/ archived resource for http://cnn.com http://en.wikipedia.org/w/index.php? title=September_11_attacks&oldid=282333 archived resource for http://en.wikipedia.org/wiki/September_11_attacks Sep 11 2001, 20:36:10 UTC Dec 20 2001, 4:51:00 UTC 25
  • 26. Navigating Archived Resources http://en.wikipedia.org/w/index.php? title=September_11_attacks&oldid=282333 archived resource for http://en.wikipedia.org/wiki/September_11_attacks3 Dec 20 2001, 4:51:00 UTC http://en.wikipedia.org/wiki/The_Pentagon current Pentagon 26
  • 27. Current and Past Web are Not Integrated 27 • Current and Past Web based on same technology. • But, going from Current to Past Web is a matter of (manual) discovery. • Memento wants to make going from Current to Past Web a (HTTP) protocol matter. • Memento wants to integrate Current And Past Web.
  • 28. One Memento HTTP Navigation 28
  • 29. Memento HTTP Flow HEAD R, Accept-Datetime LinkG 302M, Vary, TCN, LinkR,B,M 200, Content-Datetime, LinkR,B,M GET G, Accept-Datetime GET M, Accept-Datetime
  • 30. One Memento HTTP Navigation 30 Scenario • cnn.com includes Link to TimeGate at Internet Archive • URI-R on one server, URI-G & URI-M on another
  • 31. Memento HTTP Flow HEAD R, Accept-Datetime LinkG 302M, Vary, TCN, LinkR,B,M 200, Content-Datetime, LinkR,B,M GET G, Accept-Datetime GET M, Accept-Datetime
  • 32. Memento HTTP Flow: URI-R HEAD R, Accept-Datetime HEAD http://cnn.com/ HTTP/1.1 Host: cnn.com Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT Connection: close 32
  • 33. Memento HTTP Flow HEAD R, Accept-Datetime LinkG 302M, Vary, TCN, LinkR,B,M 200, Content-Datetime, LinkR,B,M GET G, Accept-Datetime GET M, Accept-Datetime
  • 34. Memento HTTP Flow: Success – URI-R LinkG HTTP/1.1 200 OK Date: Thu, 21 Jan 2010 00:02:12 GMT Server: Apache Link: <http://web.archive.org/web/timegate/http://cnn.com>; rel="timegate" Content-Length: 255 Connection: close Content-Type: text/html; charset=iso-8859-1 34
  • 35. Memento HTTP Flow HEAD R, Accept-Datetime LinkG 302M, Vary, TCN, LinkR,B,M 200, Content-Datetime, LinkR,B,M GET G, Accept-Datetime GET M, Accept-Datetime
  • 36. GET G, Accept-Datetime Memento HTTP Flow: URI-G GET http://web.archive.org/web/timegate/http://cnn.com HTTP/1.1 Host: web.archive.org Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT Connection: close 36
  • 37. Memento HTTP Flow HEAD R, Accept-Datetime LinkG 302M, Vary, TCN, LinkR,B,M 200, Content-Datetime, LinkR,B,M GET G, Accept-Datetime GET M, Accept-Datetime
  • 38. Memento HTTP Flow: Success – URI-G 302M, Vary, LinkR,B,M HTTP/1.1 302 Found Date: Thu, 21 Jan 2010 00:06:50 GMT Server: Apache TCN: choice Vary: negotiate, accept-datetime Location: http://web.archive.org/web/20010911203610/http://www.cnn.com Link: <http://cnn.com/>; rel="original", <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle”, <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel=“first- memento”; datetime=“Tue, 15 Sep 2000 11:28:26 GMT”, <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel=“last-memento”; datetime="Tue, 08 Jul 2008 09:34:33 GMT”, <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel=“prev-memento”; datetime="Tue, 11 Sep 2001 20:30:51 GMT”, <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel=“next-memento”; datetime="Tue, 11 Sep 2001 20:47:33 GMT” Content-Length: 0 Connection: close Content-Type: text/plain; charset=UTF-8 38
  • 39. Memento HTTP Flow HEAD R, Accept-Datetime LinkG 302M, Vary, TCN, LinkR,B,M 200, Content-Datetime, LinkR,B,M GET G, Accept-Datetime GET M, Accept-Datetime
  • 40. GET M, Accept-Datetime Memento HTTP Flow: URI-M GET http://web.archive.org/web/20010911203610/http://www.cnn.com HTTP/1.1 Host: web.archive.org Accept-Datetime: Tue, 11 Sep 2001 20:35:00 GMT Connection: close 40
  • 41. Memento HTTP Flow HEAD R, Accept-Datetime LinkG 302M, Vary, TCN, LinkR,B,M 200, Content-Datetime, LinkR,B,M GET G, Accept-Datetime GET M, Accept-Datetime
  • 42. Memento HTTP Flow: Success – URI-M 200, Content-Datetime, LinkR,B,M HTTP/1.1 200 OK Server: Apache-Coyote/1.1 X-Archive-Orig-Accept-Ranges: bytes … Content-Type: text/html;charset=utf-8 Content-Length: 23364 Date: Thu, 21 Jan 2010 00:09:40 GMT Content-Datetime: Tue, 11 Sep 2001 20:36:10 GMT Link: <http://cnn.com/>; rel="original", <http://web.archive.org/web/timebundle/http://cnn.com/>; rel="timebundle”, <http://web.archive.org/web/20000915112826/http://www.cnn.com>; rel=“first-memento”; datetime=“Tue, 15 Sep 2000 11:28:26 GMT”, <http://web.archive.org/web/20080708093433/http://www.cnn.com>; rel=“last-memento”; datetime="Tue, 08 Jul 2008 09:34:33 GMT”, <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel=“prev-memento”; datetime="Tue, 11 Sep 2001 20:30:51 GMT”, <http://web.archive.org/web/20010911203610/http://www.cnn.com>; rel=“next-memento”; datetime="Tue, 11 Sep 2001 20:47:33 GMT” Connection: close
  • 43. What does it all mean? • Cutting edge technology • Existing Infrastructure • Redefining Web surfing • MAJOR “real world” implications
  • 44. Closing Thoughts Preservation not for privileged priesthood http://doi.acm.org/10.1145/1592761.1592794 http://booktwo.org/notebook/wikipedia-historiography/ no more hoary stories about format obsolescence: http://blog.dshr.org/2010/09/reinforcing-my-point.html Don't dessicate resources; leave them on the web Endless metadata is not preservation… archiving as branded service, not infrastructure http://blog.dshr.org/2010/06/jcdl-2010-keynote.html
  • 45. Acknowledgements • Slides borrowed from: • Dr. Michael L. Nelson: – http://www.slideshare.net/phonedude/my-point-of-view- michael-l-nelson-web-archiving-cooperative – http://www.slideshare.net/phonedude/review-of-web- archiving – http://www.slideshare.net/phonedude/memento-time- travel-for-the-web • Martin Klein: – http://www.slideshare.net/phonedude/synchronicity- justintime-discovery-of-lost-web-pages