On the Change in Archivability of Websites Over Time
Mat Kelly, Justin F. Brunelle, Michele C. Weigle, and Michael L. Nelson
Old Dominion University
{mkelly,jbrunelle,mweigle,mln}@cs.odu.edu
TPDL 2013, Valletta, Malta,
September 23, 2013
Preserving Web Pages
• Identify page content needed for re-render
• Save page contents
– HTML, Images, CSS, JavaScript
• Rewrite inter-resource references for
replay
From M. Klein, J. F. Brunelle, WADL 2013
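The rewriting step can be sketched as follows. This is a minimal illustration, not the archive's actual rewriter: the function name `rewriteForReplay` is hypothetical, and the prefix simply follows the `http://web.archive.org/web/<timestamp>/<URI>` pattern seen in the archived URIs quoted in these slides.

```javascript
// Prefix taken from the archived URIs quoted in these slides.
const ARCHIVE_PREFIX = "http://web.archive.org/web/";

// Hypothetical helper: point an embedded-resource reference at the archived
// copy instead of the live Web.
function rewriteForReplay(resourceUri, pageUri, timestamp) {
  // Resolve relative references against the URI of the page being archived.
  const absolute = new URL(resourceUri, pageUri).href;
  return ARCHIVE_PREFIX + timestamp + "/" + absolute;
}

// Example page URI and timestamp are made up for illustration.
console.log(rewriteForReplay("meatball-logo.png", "http://www.example.com/", "20130923000000"));
```

Resolving against the page URI first means relative and absolute references are handled the same way.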
Ease of Archiving
• HTML – text-based, references other resources
• Images
• CSS
• JavaScript
EXAMPLE
<html><body>
<img src="meatball-logo.png" />
</body></html>
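Because the reference sits in plain markup, a crawler can find it without running anything. The sketch below is an assumption-laden illustration (the function name `extractImageUris` is hypothetical, and a regular expression stands in for a real HTML parser):

```javascript
// Hypothetical static extractor: collect the src attribute of every <img> tag.
function extractImageUris(html) {
  const pattern = /<img\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

const page = '<html><body><img src="meatball-logo.png" /></body></html>';
// The logo URI is visible in the markup itself, with no script execution.
console.log(extractImageUris(page));
```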
Ease of Archiving
• HTML
• Images – binary data, no embedded URIs
• CSS
• JavaScript
Ease of Archiving
• HTML
• Images
• CSS – text-based, references other resources
• JavaScript
EXAMPLE
body {
margin: 5px;
color: black;
background-image: url('outerSpace.png');
}
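CSS references are also recoverable from the text alone. A rough sketch, assuming a simple regular expression is enough for the `url(...)` forms shown here (the name `extractCssUris` is hypothetical, and real stylesheets have more cases, such as `@import`):

```javascript
// Hypothetical static extractor for url(...) references in a stylesheet.
// Handles single-quoted, double-quoted, and unquoted forms.
function extractCssUris(css) {
  const pattern = /url\(\s*["']?([^"')]+?)["']?\s*\)/gi;
  return Array.from(css.matchAll(pattern), (m) => m[1]);
}

const stylesheet =
  "body { margin: 5px; color: black; background-image: url('outerSpace.png'); }";
console.log(extractCssUris(stylesheet));
```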
Ease of Archiving
• HTML
• Images
• CSS
• JavaScript – text-based, references other resources,
URIs (not necessarily known until runtime)
EXAMPLE
$.ajax({
  url: "meatball-logo" + "." + "png" // build logo URI at runtime
});
Ease of Archiving
• HTML
• Images
• CSS
• JavaScript – text-based, references other
resources, URIs (not necessarily known until
runtime)
EXAMPLE
$.ajax({
  url: "wormlogo" + "." + "png" // build logo URI at runtime
});
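The difference from HTML and CSS can be made concrete. In the sketch below, a static scan of the script text finds nothing, while evaluating the same expression yields the URI. The scanner `findImageLiterals` is a hypothetical stand-in for a crawler's URI extraction, not any crawler's real logic.

```javascript
// Script text as seen by a crawler that does not execute JavaScript.
const scriptSource = '$.ajax({ url: "meatball-logo" + "." + "png" });';

// Hypothetical static scan: string literals ending in a known image extension.
function findImageLiterals(source) {
  const pattern = /["']([^"']+\.(?:png|jpg|gif))["']/gi;
  return Array.from(source.matchAll(pattern), (m) => m[1]);
}

// What a browser computes when the script actually runs.
const runtimeUri = "meatball-logo" + "." + "png";

console.log(findImageLiterals(scriptSource)); // static scan of the source text
console.log(runtimeUri);                      // value known only at runtime
```

No single literal in the source contains the complete URI, so the scan has nothing to match.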
BIG Problem
• JavaScript is meant for the browser
– Browsers are capable of JS execution
• Crawlers only recently became capable of executing JavaScript
• Crawlers are getting smarter at extracting URIs
– Still nowhere near perfect
Identifying Missing Resources
• Sometimes missing resources are subtle
Screenshots: from the live Web; from the Internet Archive, Apr. 30, 2012
http://web.archive.org/web/20120430235712/http://maps.google.com
• Archived version not interactive & resources are missing
• Other times, a failed AJAX call prevents other
resources from loading
YouTube (2011) in archives with failed Ajax call
http://web.archive.org/web/20110420002216/http://www.youtube.com/
Why JavaScript makes it difficult
• Archival crawlers do not interpret the DOM; they are built for capture and are thus faster
Same JS Code Different Agents
functionality1()
functionality2()
functionality3()
Different functionality
(Recent versions)
• Interprets JavaScript
• Enabled pages w/ JS to
be archived better!
JavaScript and Accessibility
• Displaying content should not depend on script execution
• U.S. Government
– required to comply with accessibility standards
– Content should be available to a crawler without JS
– Therefore, government sites are better preserved
• Right?
If not Accessible then not
Archivable
NASA.gov in Internet Archive, 2004
http://web.archive.org/web/20041014205942/http://www.nasa.gov
Finding out where NASA went
wrong
Step 1: Query archives
• nasa.gov in 1996, nasa.gov in 1997, …, nasa.gov in 2012
Step 2: Capture with WebKit
• Modern rendering
• Browser-like JS behavior
• Captured: DOM, HTTP requests for resources, screenshot of Web page (.jpg)
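Step 1 could look roughly like the sketch below. The slides do not say how each yearly memento was chosen, so taking the earliest capture in each year is an assumption, as are the function names. The timestamps are 14-digit `YYYYMMDDhhmmss` values, like those in the archived URIs quoted in these slides.

```javascript
// Hypothetical sampler: keep the earliest capture timestamp for each year.
function sampleOnePerYear(timestamps) {
  const firstByYear = new Map();
  // Fixed-width digit strings sort chronologically under plain string order.
  for (const ts of [...timestamps].sort()) {
    const year = ts.slice(0, 4);
    if (!firstByYear.has(year)) firstByYear.set(year, ts);
  }
  return [...firstByYear.values()];
}

// Build the archived URI for a sampled capture.
function mementoUri(timestamp, originalUri) {
  return "http://web.archive.org/web/" + timestamp + "/" + originalUri;
}

// The first three timestamps appear in these slides; the last is made up.
const captures = ["20041014205942", "19991109105105", "20071011055607", "20040101000000"];
for (const ts of sampleOnePerYear(captures)) {
  console.log(mementoUri(ts, "http://www.nasa.gov"));
}
```

Each resulting URI would then be loaded in the WebKit-based capture of Step 2.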
NASA.gov over time
[Figure: nasa.gov screenshots, one per year, 1996 to 2012]
Does this apply to popular sites?
• Alexa's Top 10 websites
• No mementos for Facebook, Baidu, or Twitter: robots.txt exclusion prevents crawl

Alexa Rank  Web Site Name  Sampled Mementos
1           Facebook.com   No memento (robots.txt exclusion)
2           Google.com     15 mementos, 1998 to 2012
3           YouTube.com    7 mementos, 2006 to 2012
4           Yahoo.com      16 mementos, 1997 to 2012
5           Baidu.com      No memento (robots.txt exclusion)
6           Wikipedia.org  12 mementos, 2001 to 2012
7           Live.com       15 mementos, 1999 to 2012
8           Amazon.com     14 mementos, 1999 to 2012
9           QQ.com         15 mementos, 1998 to 2012
10          Twitter.com    No memento (robots.txt exclusion)
all thumbnails at: http://www.cs.odu.edu/~mkelly/semester/2013_spring/20130127alexatop10/
Case Study with Ajax emphasis:
YouTube
Ajax's Effects on the Archivability of YouTube
• Recently Viewed content missing
• How much of this is because of JavaScript?
2006 YouTube.com from Internet Archive
http://web.archive.org/web/20060427213420/http://youtube.com
YouTube as captured without JavaScript capability
• Titles fetched with JavaScript
• Primary content (preview) still missing
2006 YouTube.com from Internet Archive
http://web.archive.org/web/20060427213420/http://youtube.com
Dependent Loading of Resources
GET http://web.archive.org/web/20121208145112cs_/http://s.ytimg.com/yt/cssbin/www-core-vfl_OJqFG.css 404 (Not Found) www.youtube.com:15
GET http://web.archive.org/web/20121208145115js_/http://s.ytimg.com/yt/jsbin/www-core-vfl8PDcRe.js 404 (Not Found) www.youtube.com:45
Uncaught TypeError: Object #<Object> has no method 'setConfig' www.youtube.com:56
Uncaught TypeError: Cannot read property 'home' of undefined www.youtube.com:76
Uncaught TypeError: Cannot read property 'ajax' of undefined www.youtube.com:86
Uncaught TypeError: Object #<Object> has no method 'setConfig' www.youtube.com:101
Uncaught ReferenceError: _gel is not defined www.youtube.com:1784
Uncaught TypeError: Object #<Object> has no method 'setConfig' www.youtube.com:1929
Uncaught TypeError: Cannot read property 'home' of undefined www.youtube.com:524
GET http://web.archive.org/web/20130101024721im_/http://i2.ytimg.com/vi/1f7neSzDqvc/default.jpg 404 (Not Found)
Missing CSS
Missing JS
Missing JS-dependent resources
YouTube 2011
http://web.archive.org/web/20110420002216/http://www.youtube.com/
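The failed requests in a console log like this one can be sorted by resource type using the modifier that follows the timestamp in each archived URI (`cs_`, `js_`, `im_`). The sketch below is illustrative only: `classifyMissing` is a hypothetical name, and each log entry is assumed to be on a single line.

```javascript
// Modifiers seen in the archived URIs of the log: stylesheet, script, image.
const KINDS = { cs_: "CSS", js_: "JavaScript", im_: "image" };

// Hypothetical classifier: returns the resource type for a failed (404) GET,
// or null for lines that are not failed archive requests (e.g. script errors).
function classifyMissing(logLine) {
  const pattern = /GET\s+http:\/\/web\.archive\.org\/web\/\d{14}([a-z]{2}_)\/\S+\s+404\b/;
  const match = pattern.exec(logLine);
  if (!match) return null;
  return KINDS[match[1]] || "other";
}

const logLines = [
  "GET http://web.archive.org/web/20121208145112cs_/http://s.ytimg.com/yt/cssbin/www-core-vfl_OJqFG.css 404 (Not Found) www.youtube.com:15",
  "GET http://web.archive.org/web/20121208145115js_/http://s.ytimg.com/yt/jsbin/www-core-vfl8PDcRe.js 404 (Not Found) www.youtube.com:45",
  "Uncaught TypeError: Cannot read property 'ajax' of undefined www.youtube.com:86",
  "GET http://web.archive.org/web/20130101024721im_/http://i2.ytimg.com/vi/1f7neSzDqvc/default.jpg 404 (Not Found)",
];
console.log(logLines.map(classifyMissing));
```

The `Uncaught` errors are not classified here; in the log they follow the missing script, which is what makes later resources JS-dependent.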
Live Web Leaks Into Archive Via JavaScript
see: http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
Sept 3, 2008 memento alongside 2012 content
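One way to spot such leaks during replay is to check where each request actually goes. The sketch below is a simplification under stated assumptions: the function name `isLiveWebLeak` is hypothetical, and it treats any request whose host is not the archive's as a leak.

```javascript
const ARCHIVE_HOST = "web.archive.org";

// Hypothetical check: during replay, every request should go to the archive.
// A request to any other host means the page reached into the live Web.
function isLiveWebLeak(requestUri) {
  return new URL(requestUri).hostname !== ARCHIVE_HOST;
}

// Both URIs are taken from these slides; the second has its archive prefix removed.
console.log(isLiveWebLeak("http://web.archive.org/web/20110420002216/http://www.youtube.com/"));
console.log(isLiveWebLeak("http://i2.ytimg.com/vi/1f7neSzDqvc/default.jpg"));
```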
Stepping Back, NASA 1999
• Few embedded resources
• Little to no JavaScript
– If some, not Ajax
• No resources dependent on JavaScript
http://web.archive.org/web/19991109105105/http://www.nasa.gov
Stepping Back, NASA 2003
• Content requires JS
• Accessibility goes awry
• Archivability decreases
var fstr = '';
if (hasFlash(6)) {
  fstr += '<object id="screenreader.swf">...</object>';
  window.status = 'Flash 6 Detected...';
} else {
  fstr += '...To view the enhanced version of NASA.gov, you must have Flash 6 installed....';
}
with (document) { open('text/html'); write(fstr); close(); }
http://web.archive.org/web/20041014205942/http://www.nasa.gov
Stepping Back, NASA 2007
• JS check removed
• Content is Accessible
• Archivability Goes Up
• Accessibility and archivability rise together
http://web.archive.org/web/20071011055607/http://www.nasa.gov
Reinforcing Case: Wikipedia
[Figure: wikipedia.org screenshots, one per year, 2001 to 2012]
Summary
• JavaScript: good for interaction, bad for archiving
– crawlers miss URIs at crawl time
– rendering yesterday's pages causes them to reach into today's Web
• Different trends of archivability over time:
– YouTube: bad --> worse
– NASA: good --> bad --> good
– Wikipedia: good
– see all: http://www.cs.odu.edu/~mkelly/semester/2013_spring/20130127alexatop10/
• Overall, archivability is getting worse
– 24% increase in missing embedded resources from 2006 to 2010 due to JavaScript

More Related Content

What's hot

More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better Michael Nelson
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsJustin Brunelle
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesMichael Nelson
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationMartin Klein
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerSawood Alam
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Mat Kelly
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesMichael Nelson
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesYasmin AlNoamany, PhD
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesYasmin AlNoamany, PhD
 
Recommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URIRecommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URILulwahMA
 
An Introduction to Linked Data for Librarians (2018-06-28)
An Introduction to Linked Data for Librarians (2018-06-28)An Introduction to Linked Data for Librarians (2018-06-28)
An Introduction to Linked Data for Librarians (2018-06-28)Cliff Landis
 
Case Study: UNC Eshelman School of Pharmacy Web Site
Case Study: UNC Eshelman School of Pharmacy Web SiteCase Study: UNC Eshelman School of Pharmacy Web Site
Case Study: UNC Eshelman School of Pharmacy Web Sitejzheel
 
Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.ismaturban
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?Michael Nelson
 
It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011Ross Singer
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitShawn Jones
 

What's hot (20)

More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better
 
The Web We Want
The Web We WantThe Web We Want
The Web We Want
 
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred RepresentationsScripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
Scripts in a Frame: A Two-Tiered Approach for Archiving Deferred Representations
 
Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly Communication
 
Avoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorkerAvoiding Zombies in Archival Replay Using ServiceWorker
Avoiding Zombies in Archival Replay Using ServiceWorker
 
Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count Impact of URI Canonicalization on Memento Count
Impact of URI Canonicalization on Memento Count
 
Summarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniques
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
PID Signposting Pattern
PID Signposting PatternPID Signposting Pattern
PID Signposting Pattern
 
Detecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web Archives
 
Recommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URIRecommending Archived Webpages Using Only The URI
Recommending Archived Webpages Using Only The URI
 
An Introduction to Linked Data for Librarians (2018-06-28)
An Introduction to Linked Data for Librarians (2018-06-28)An Introduction to Linked Data for Librarians (2018-06-28)
An Introduction to Linked Data for Librarians (2018-06-28)
 
Case Study: UNC Eshelman School of Pharmacy Web Site
Case Study: UNC Eshelman School of Pharmacy Web SiteCase Study: UNC Eshelman School of Pharmacy Web Site
Case Study: UNC Eshelman School of Pharmacy Web Site
 
Quantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.isQuantifying Orphaned Annotations in Hypothes.is
Quantifying Orphaned Annotations in Hypothes.is
 
Can’t Find Your 404s?
Can’t Find Your 404s?Can’t Find Your 404s?
Can’t Find Your 404s?
 
It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011It's not rocket surgery - Linked In: ALA 2011
It's not rocket surgery - Linked In: ALA 2011
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento Toolkit
 

Viewers also liked

Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionSawood Alam
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015Michael Nelson
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptMichael Nelson
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesMichael Nelson
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Michael Nelson
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple ArchivesMichael Nelson
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesMichael Nelson
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web ArchivesMichael Nelson
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web ArchivesMichael Nelson
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?Michael Nelson
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Michael Nelson
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingUsing Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingYasmin AlNoamany, PhD
 
Software as a Well-Formed Research Object
Software as a Well-Formed Research ObjectSoftware as a Well-Formed Research Object
Software as a Well-Formed Research ObjectYasmin AlNoamany, PhD
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange ProjectMichael Nelson
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?Michael Nelson
 

Viewers also liked (15)

Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief Introduction
 
@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015@WebSciDL PhD Student Project Reviews August 5&6, 2015
@WebSciDL PhD Student Project Reviews August 5&6, 2015
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived Pages
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
 
We Need Multiple, Independent Web Archives
We Need Multiple, Independent Web ArchivesWe Need Multiple, Independent Web Archives
We Need Multiple, Independent Web Archives
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web Archives
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingUsing Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
 
Software as a Well-Formed Research Object
Software as a Well-Formed Research ObjectSoftware as a Well-Formed Research Object
Software as a Well-Formed Research Object
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
 

Similar to On the Change in Archivability of Websites Over Time

Introducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupIntroducing Web Archiving and WSDL Research Group
Introducing Web Archiving and WSDL Research GroupSawood Alam
 
Archive What I See Now - Archive-It Partner Meeting 2013 2013
Archive What I See Now - Archive-It Partner Meeting 2013 2013Archive What I See Now - Archive-It Partner Meeting 2013 2013
Archive What I See Now - Archive-It Partner Meeting 2013 2013Mat Kelly
 
Mojo+presentation+1
Mojo+presentation+1Mojo+presentation+1
Mojo+presentation+1Craig Condon
 
MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingMementoMap: A Web Archive Profiling Framework for Efficient Memento Routing
MementoMap: A Web Archive Profiling Framework for Efficient Memento RoutingSawood Alam
 
Automated Interactions With WorldCat: A Look at OCLC’s WorldCat Metadata API
Automated Interactions With WorldCat:  A Look at OCLC’s WorldCat Metadata APIAutomated Interactions With WorldCat:  A Look at OCLC’s WorldCat Metadata API
Automated Interactions With WorldCat: A Look at OCLC’s WorldCat Metadata APITerry Reese
 
SEO AJAX Crawlability in a Responsive Publisher World
SEO AJAX Crawlability in a Responsive Publisher WorldSEO AJAX Crawlability in a Responsive Publisher World
SEO AJAX Crawlability in a Responsive Publisher WorldEric Wu
 
Browser-Based Digital Preservation
Browser-Based Digital PreservationBrowser-Based Digital Preservation
Browser-Based Digital PreservationMat Kelly
 
Discovery at the RLUK conference 2012
Discovery at the RLUK conference 2012Discovery at the RLUK conference 2012
Discovery at the RLUK conference 2012andymcg
 
W3C WAI update at OZeWAI conference 2013
W3C WAI update at OZeWAI conference 2013W3C WAI update at OZeWAI conference 2013
W3C WAI update at OZeWAI conference 2013Andrew Arch
 
Readying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesReadying Web Archives to Consume and Leverage Web Bundles
Readying Web Archives to Consume and Leverage Web BundlesSawood Alam
 
Preserving the web
Preserving the webPreserving the web
Preserving the webJeremy Floyd
 
Drupal 8 Preview for Site Builders
Drupal 8 Preview for Site BuildersDrupal 8 Preview for Site Builders
Drupal 8 Preview for Site BuildersAcquia
 
jQuery - Chapter 1 - Introduction
 jQuery - Chapter 1 - Introduction jQuery - Chapter 1 - Introduction
jQuery - Chapter 1 - IntroductionWebStackAcademy
 
Time -Travel on the Internet
Time -Travel on the InternetTime -Travel on the Internet
Time -Travel on the InternetIRJET Journal
 
Detecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCDetecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCMichele Weigle
 
Making the Switch, Part 1: Top 5 Things to Consider When Evaluating Drupal
Making the Switch, Part 1: Top 5 Things to Consider When Evaluating DrupalMaking the Switch, Part 1: Top 5 Things to Consider When Evaluating Drupal
Making the Switch, Part 1: Top 5 Things to Consider When Evaluating DrupalAcquia
 
Link your Way to Successful Content Management with MadCap Flare
Link your Way to Successful Content Management with MadCap FlareLink your Way to Successful Content Management with MadCap Flare
Link your Way to Successful Content Management with MadCap FlareDenise Kadilak
 
Online Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and MuseumsOnline Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and Museumsmherbison
 
How to Optimize Your Drupal Site with Structured Content
How to Optimize Your Drupal Site with Structured ContentHow to Optimize Your Drupal Site with Structured Content
How to Optimize Your Drupal Site with Structured ContentAcquia
 
Best Practices in Widget Development - Examples and Counterexamples
Best Practices in Widget Development  - Examples and CounterexamplesBest Practices in Widget Development  - Examples and Counterexamples
Best Practices in Widget Development - Examples and CounterexamplesROLE Project
 

On the Change in Archivability of Websites Over Time

  • 1. On the Change in Archivability of Websites Over Time Mat Kelly, Justin F. Brunelle, Michele C. Weigle, and Michael L. Nelson Old Dominion University {mkelly,jbrunelle,mweigle,mln}@cs.odu.edu TPDL 2013, Valletta, Malta, September 23, 2013 1
  • 2. Preserving Web Pages • Identify page content needed for re-render • Save page contents – HTML, Images, CSS, JavaScript • Rewrite inter-resource references for replay (From M. Klein, J. F. Brunelle, WADL 2013)
  • 3. Ease of Archiving • HTML – text-based, references other resources • Images • CSS • JavaScript EXAMPLE: <html><body> <img src="meatball-logo.png" /> </body></html>
  • 4. Ease of Archiving • HTML • Images – binary data, no embedded URIs • CSS • JavaScript EXAMPLE: [image]
  • 5. Ease of Archiving • HTML • Images • CSS – text-based, references other resources • JavaScript EXAMPLE: body { margin: 5px; color: black; background-image: url('outerSpace.png'); }
  • 6. Ease of Archiving • HTML • Images • CSS • JavaScript – text-based, references other resources, URIs not necessarily known until runtime EXAMPLE: $.ajax({ url: "meatball-logo" + "." + "png" }); // build logo URI at runtime
  • 7. Ease of Archiving • HTML • Images • CSS • JavaScript – text-based, references other resources, URIs not necessarily known until runtime EXAMPLE: $.ajax({ url: "wormlogo" + "." + "png" }); // build logo URI at runtime
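The contrast drawn in slides 3 to 7 can be sketched in a few lines. This is only an illustration, not the crawler logic used by any real archive: `extractStaticUris` is a hypothetical helper that finds URIs written literally in HTML attributes or CSS `url(...)` values, roughly what a crawler can do without executing JavaScript. The sample strings mirror the slide examples.

```javascript
// Hypothetical helper: collect URIs that appear literally in HTML or CSS text.
function extractStaticUris(text) {
  const uris = [];
  // src="..." or href="..." attributes in HTML
  const attrPattern = /\b(?:src|href)\s*=\s*["']([^"']+)["']/gi;
  // url(...) references in CSS, with optional quotes
  const cssPattern = /url\(\s*["']?([^"')]+)["']?\s*\)/gi;
  for (const pattern of [attrPattern, cssPattern]) {
    for (const match of text.matchAll(pattern)) {
      uris.push(match[1]);
    }
  }
  return uris;
}

const html = '<html><body><img src="meatball-logo.png" /></body></html>';
const css = "body { background-image: url('outerSpace.png'); }";
const js = '$.ajax({ url: "meatball-logo" + "." + "png" });';

console.log(extractStaticUris(html)); // the image URI is found
console.log(extractStaticUris(css));  // the background image URI is found
console.log(extractStaticUris(js));   // nothing: the URI only exists once the script runs
```

The JavaScript case comes back empty because the full URI never appears in the source text; it is assembled by string concatenation at runtime.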
  • 8. BIG Problem • JavaScript is meant for browsers, which are capable of JS execution • Crawlers only recently became capable of executing JavaScript • Crawlers are getting smarter at extracting URIs – Nowhere near perfect
  • 9. Identifying Missing Resources • Sometimes missing resources are subtle [Figure: from the live Web vs. from the Internet Archive, Apr. 30, 2012] http://web.archive.org/web/20120430235712/http://maps.google.com
  • 10. Identifying Missing Resources • Archived version is not interactive and resources are missing [Figure: from the live Web vs. from the Internet Archive, Apr. 30, 2012] http://web.archive.org/web/20120430235712/http://maps.google.com
  • 11. Identifying Missing Resources • Other times, a failed Ajax call prevents other resources from loading [Figure: YouTube (2011) in the archives with a failed Ajax call] http://web.archive.org/web/20110420002216/http://www.youtube.com/
  • 12. Why JavaScript makes it difficult • Archival crawlers don't interpret the DOM; they are made for capture and are thus faster • Same JS code, different agents: functionality1(), functionality2(), functionality3() yield different functionality • (Recent versions) Interprets JavaScript • Enabled pages w/ JS to be archived better!
  • 13. JavaScript and Accessibility • Content display should not depend on script execution • U.S. Government – required to comply with accessibility standards – Content should be available to crawlers w/o JS – Therefore, government sites are better preserved • Right?
  • 14. If not Accessible then not Archivable [Figure: NASA.gov in the Internet Archive, 2004] http://web.archive.org/web/20041014205942/http://www.nasa.gov
  • 15. Finding out where NASA went wrong • Step 1: Query archives (nasa.gov in 1996, nasa.gov in 1997, … nasa.gov in 2012) • Step 2: Capture with WebKit (modern rendering, browser-like JS behavior): DOM, HTTP requests for resources, screenshot of Web page (.jpg)
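Step 1 on slide 15 can be pictured as building one archive request per year. The sketch below is an assumption-laden illustration: the slides do not say how mementos were sampled, so the mid-year timestamp and the function name `yearlyMementoRequests` are hypothetical. Only the `http://web.archive.org/web/<14-digit timestamp>/<URI>` shape is taken from the URIs shown in the deck, and the sketch assumes the archive serves the capture closest to the requested time.

```javascript
// Hypothetical sketch: one Wayback-style request URI per year for a site.
// The July 1 timestamp is an arbitrary choice, not the authors' sampling rule.
function yearlyMementoRequests(site, firstYear, lastYear) {
  const requests = [];
  for (let year = firstYear; year <= lastYear; year++) {
    requests.push(`http://web.archive.org/web/${year}0701000000/${site}`);
  }
  return requests;
}

const nasaRequests = yearlyMementoRequests("http://www.nasa.gov", 1996, 2012);
console.log(nasaRequests.length); // one request per year in the range
console.log(nasaRequests[0]);
```

Each resulting URI would then be handed to the Step 2 capture tool, which is not sketched here.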
  • 16–17. NASA.gov over time [Figure: thumbnails by year] 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
  • 18. Does this apply to popular sites? • Alexa's Top 10 websites: 1 Facebook.com, 2 Google.com, 3 YouTube.com, 4 Yahoo.com, 5 Baidu.com, 6 Wikipedia.org, 7 Live.org, 8 Amazon.com, 9 QQ.com, 10 Twitter.com
  • 19. Does this apply to popular sites? • Alexa's Top 10 websites • No Mementos • robots.txt exclusion prevents crawl
  • 20. Does this apply to popular sites?
      Alexa Rank | Web Site Name | Sampled Mementos
      1 | Facebook.com | No memento (robots.txt exclusion)
      2 | Google.com | 15 mementos, 1998 to 2012
      3 | YouTube.com | 7 mementos, 2006 to 2012
      4 | Yahoo.com | 16 mementos, 1997 to 2012
      5 | Baidu.com | No memento (robots.txt exclusion)
      6 | Wikipedia.org | 12 mementos, 2001 to 2012
      7 | Live.org | 15 mementos, 1999 to 2012
      8 | Amazon.com | 14 mementos, 1999 to 2012
      9 | QQ.com | 15 mementos, 1998 to 2012
      10 | Twitter.com | No memento (robots.txt exclusion)
      all thumbnails at: http://www.cs.odu.edu/~mkelly/semester/2013_spring/20130127alexatop10/
  • 21. Case Study with Ajax emphasis: YouTube (Alexa rank 3, YouTube.com, 7 mementos, 2006 to 2012)
  • 22. Ajax's Effects on the Archivability of YouTube • Recently Viewed content missing [Figure: 2006 YouTube.com from the Internet Archive] http://web.archive.org/web/20060427213420/http://youtube.com
  • 23. Ajax's Effects on the Archivability of YouTube • Recently Viewed content missing • How much of this is because of JavaScript? [Figure: 2006 YouTube.com from the Internet Archive] http://web.archive.org/web/20060427213420/http://youtube.com
  • 24. YouTube as captured without JavaScript capability • Titles fetched with JavaScript • Primary content (preview) still missing [Figure: 2006 YouTube.com from the Internet Archive] http://web.archive.org/web/20060427213420/http://youtube.com
  • 25–28. Dependent Loading of Resources (browser console output for YouTube 2011, http://web.archive.org/web/20110420002216/http://www.youtube.com/)
      GET http://web.archive.org/web/20121208145112cs_/http://s.ytimg.com/yt/cssbin/www-core-vfl_OJqFG.css 404 (Not Found) www.youtube.com:15
      GET http://web.archive.org/web/20121208145115js_/http://s.ytimg.com/yt/jsbin/www-core-vfl8PDcRe.js 404 (Not Found) www.youtube.com:45
      Uncaught TypeError: Object #<Object> has no method 'setConfig' www.youtube.com:56
      Uncaught TypeError: Cannot read property 'home' of undefined www.youtube.com:76
      Uncaught TypeError: Cannot read property 'ajax' of undefined www.youtube.com:86
      Uncaught TypeError: Object #<Object> has no method 'setConfig' www.youtube.com:101
      Uncaught ReferenceError: _gel is not defined www.youtube.com:1784
      Uncaught TypeError: Object #<Object> has no method 'setConfig' www.youtube.com:1929
      Uncaught TypeError: Cannot read property 'home' of undefined www.youtube.com:524
      GET http://web.archive.org/web/20130101024721im_/http://i2.ytimg.com/vi/1f7neSzDqvc/default.jpg 404 (Not Found)
      Slide annotations: Missing CSS • Missing JS • Missing JS-dependent resources
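The request URIs in the console output carry their own capture timestamps, which can be compared with the timestamp of the page itself. The sketch below is an illustration only: `parseMementoUri` is a hypothetical helper, and treating the two-letter suffix after the timestamp (`cs_`, `js_`, `im_`) as a resource-type modifier is a reading of the URIs shown on the slides rather than something the deck states.

```javascript
// Hypothetical helper: split a Wayback-style memento URI into its capture
// time, optional two-letter modifier, and original URI.
function parseMementoUri(uriM) {
  const match = /^https?:\/\/web\.archive\.org\/web\/(\d{14})([a-z]{2}_)?\/(.+)$/.exec(uriM);
  if (match === null) {
    return null;
  }
  const [, ts, modifier, original] = match;
  // Timestamp layout: YYYYMMDDhhmmss, interpreted here as UTC.
  const capturedAt = new Date(Date.UTC(
    Number(ts.slice(0, 4)), Number(ts.slice(4, 6)) - 1, Number(ts.slice(6, 8)),
    Number(ts.slice(8, 10)), Number(ts.slice(10, 12)), Number(ts.slice(12, 14))
  ));
  return { capturedAt, modifier: modifier || null, original };
}

const page = parseMementoUri(
  "http://web.archive.org/web/20110420002216/http://www.youtube.com/"
);
const stylesheet = parseMementoUri(
  "http://web.archive.org/web/20121208145112cs_/http://s.ytimg.com/yt/cssbin/www-core-vfl_OJqFG.css"
);

const msPerDay = 24 * 60 * 60 * 1000;
const daysApart = Math.round((stylesheet.capturedAt - page.capturedAt) / msPerDay);
console.log(stylesheet.modifier, stylesheet.original);
console.log(`stylesheet requested ${daysApart} days after the page capture`);
```

Run against the slide's URIs, this shows the 2011 page asking for a stylesheet with a late-2012 timestamp, a gap of well over a year between the page and the resource it depends on.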
  • 29. Live Web Leaks Into Archive Via JavaScript
  • 30. [Figure: Sept 3, 2008 vs. 2012] see: http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
  • 31. Stepping Back, NASA 1999 • Few embedded resources • Little to no JavaScript – If some, not Ajax • No resources dependent on JavaScript http://web.archive.org/web/19991109105105/http://www.nasa.gov
  • 32. Stepping Back, NASA 2003 • Content requires JS • Accessibility goes awry • Archivability decreases EXAMPLE: var fstr = ''; if (hasFlash(6)) { fstr += '<object id="screenreader.swf">...</object>'; window.status = 'Flash 6 Detected...'; } else { fstr += '...To view the enhanced version of NASA.gov, you must have Flash 6 installed....'; } with (document) { open('text/html'); write(fstr); close(); } http://web.archive.org/web/20041014205942/http://www.nasa.gov
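One rough way to see why the NASA 2003 pattern hurts both accessibility and archivability is to look at what text is left when scripts are ignored. This is a simplified sketch under stated assumptions: `textWithoutScripts` is a hypothetical helper, the regular expressions are not a real HTML parser, and the two sample pages are invented for illustration rather than taken from NASA.gov.

```javascript
// Hypothetical helper: approximate the text a client sees if it never runs scripts.
function textWithoutScripts(html) {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ") // drop script elements and their code
    .replace(/<[^>]+>/g, " ")                       // drop the remaining tags
    .replace(/\s+/g, " ")                           // collapse whitespace
    .trim();
}

// Invented sample pages: one writes its content with document.write, one ships it as markup.
const scriptOnlyPage =
  '<html><body><script>document.write("<p>Welcome to the site</p>");</script></body></html>';
const staticPage = "<html><body><p>Welcome to the site</p></body></html>";

console.log(JSON.stringify(textWithoutScripts(scriptOnlyPage))); // empty: content exists only after JS runs
console.log(JSON.stringify(textWithoutScripts(staticPage)));     // the welcome text survives
```

A page of the first kind gives a non-executing crawler, or a screen reader without script support, nothing to work with, which is the link between accessibility and archivability that the NASA slides describe.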
  • 33. Stepping Back, NASA 2007 • JS check removed • Content is Accessible • Archivability Goes Up • AccessibilityArchivability http://web.archive.org/web/20071011055607/http://www.nasa.gov
  • 34. Reinforcing Case: Wikipedia [Figure: thumbnails by year] 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
  • 35. Summary • JavaScript: good for interaction, bad for archiving – crawlers miss URIs at crawl time – rendering yesterday's pages causes them to reach into today's web • Different trends of archivability over time: – YouTube: bad-->worse – NASA: good-->bad-->good – Wikipedia: good – see all: http://www.cs.odu.edu/~mkelly/semester/2013_spring/20130127alexatop10/ • Overall, archivability is getting worse – 24% increase in missing embedded resources from 2006 to 2010 due to JavaScript