We Need Multiple, Independent Web Archives
Panel 4: Social Media Research Data, Tools, and Methodologies
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
www.cs.odu.edu/~mln/
@phonedude_mln
With:
ODU: Michele C. Weigle
Los Alamos National Laboratory: Herbert Van de Sompel
timetravel.mementoweb.org
http://timetravel.mementoweb.org/list/20140525002314/http://www.bbc.co.uk/
e.g., bbc.co.uk in six different archives…
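The Time Travel URI above looks like a base path, a 14-digit UTC datetime, and the original URI. A minimal sketch that builds such a listing URI, assuming the `/list/<datetime>/<URI>` layout inferred from that one example generalizes (the function name is hypothetical):

```python
from datetime import datetime, timezone

# Base taken from the example URI on the slide; the /list/<datetime>/<URI>
# layout is inferred from that single example, not from a specification.
TIMETRAVEL_LIST = "http://timetravel.mementoweb.org/list"


def timetravel_list_uri(original_uri: str, when: datetime) -> str:
    """Build a cross-archive listing URI for original_uri near `when`.

    `when` should be timezone-aware; it is converted to UTC before formatting.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{TIMETRAVEL_LIST}/{stamp}/{original_uri}"


if __name__ == "__main__":
    when = datetime(2014, 5, 25, 0, 23, 14, tzinfo=timezone.utc)
    print(timetravel_list_uri("http://www.bbc.co.uk/", when))
```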
Segal’s Law
A man with a watch knows what time it is.
A man with two watches is never sure.
How to resolve conflicting archives?
Personalization, GeoIP, mobile vs. desktop, etc.
mean “the” page rarely exists, only “a” page.
Mat Kelly, Justin F. Brunelle, Michele C. Weigle, and Michael L. Nelson,
A Method for Identifying Personalized Representations in Web Archives,
D-Lib Magazine, 19(11/12), 2013.
http://www.dlib.org/dlib/november13/kelly/11kelly.html
Why we need multiple,
independent archives…
A single archive is vulnerable
http://www.bbc.com/news/uk-politics-24924185
http://ws-dl.blogspot.com/2013/11/2013-11-21-conservative-party-speeches.html
Houston, Tranquility Base here. The Eagle has landed.
see also: http://ws-dl.blogspot.com/2013/03/2013-03-22-ntrs-web-archives-and-why-we.html
http://www.theguardian.com/technology/2015/feb/19/google-acknowledges-some-people-want-right-to-be-forgotten
$ curl -I "http://www.thedailybeast.com/articles/2016/08/11/i-got-three-grindr-dates-in-an-hour-in-the-olympic-village.html"
HTTP/1.1 301 Moved Permanently
Access-Control-Allow-Origin: *
Age: 0
Cache-Control: max-age=60
Content-Type: text/html; charset=iso-8859-1
Date: Thu, 18 Aug 2016 01:13:46 GMT
Location: http://www.thedailybeast.com/articles/2016/08/11/a-note-from-the-editors.html
RealAge: 0
Server: Apache
Vary: Accept-Encoding, User-Agent
Via: 1.1 varnish
X-BackEnd: default
X-Cache: MISS
X-Cacheable: YES
X-Restarts: 0
X-UA-Device: pc
X-Varnish: 995407903
Connection: keep-alive
http://www.usnews.com/news/articles/2016-08-17/wayback-machine-wont-censor-archive-for-taste-director-says-after-olympics-article-scrubbed
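The header dump above shows the original article URL answering with a 301 to an editors' note. A minimal sketch of detecting that kind of silent replacement from a saved response head; the function names are hypothetical, and the sample text is an abbreviated copy of the headers above:

```python
from email.parser import Parser

REDIRECT_CODES = (301, 302, 303, 307, 308)


def parse_response_head(raw: str):
    """Split a raw HTTP response head into (status_code, headers)."""
    status_line, _, header_block = raw.strip().partition("\n")
    status_code = int(status_line.split()[1])
    # HTTP headers share the "Name: value" syntax handled by email.parser.
    headers = Parser().parsestr(header_block, headersonly=True)
    return status_code, headers


def redirect_target(raw: str):
    """Return the Location value if the response is a redirect, else None."""
    status_code, headers = parse_response_head(raw)
    if status_code in REDIRECT_CODES:
        return headers.get("Location")
    return None


SAMPLE = """HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=iso-8859-1
Location: http://www.thedailybeast.com/articles/2016/08/11/a-note-from-the-editors.html
Server: Apache
"""

if __name__ == "__main__":
    print(redirect_target(SAMPLE))
```

A crawler that only records the final 200 response would archive the editors' note under the article's URI, which is one reason the redirect itself is worth capturing.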
But who pays for those extra archives?
1TB endowment = ~$4700: http://blog.dshr.org/2011/02/paying-for-long-term-storage.html
see also: http://blog.dshr.org/2011/01/memento-marketplace-for-archiving.html
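A back-of-the-envelope sketch of what "extra archives" cost under the ~$4700 per TB endowment figure above. It assumes the figure scales linearly with size and that every independent archive holds its own full copy; both are simplifications, and the function name is hypothetical:

```python
ENDOWMENT_PER_TB_USD = 4700  # approximate figure quoted on the slide


def endowment_cost(terabytes: int, archives: int = 1,
                   per_tb: int = ENDOWMENT_PER_TB_USD) -> int:
    """Estimated endowment (USD) to keep `terabytes` in `archives` independent copies."""
    if terabytes < 0 or archives < 0:
        raise ValueError("terabytes and archives must be non-negative")
    return terabytes * archives * per_tb


if __name__ == "__main__":
    # Illustrative numbers only: a 10 TB collection held by 3 independent archives.
    print(endowment_cost(10, archives=3))
```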
Archives Aren’t Magic Web Sites
They’re Just Web Sites.
If you used Mummify, you’re now left with a bunch of defunct, shortened links like:
https://mummify.it/XbmcMfE3
Don’t throw away link semantics! See: http://robustlinks.mementoweb.org
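A shortened archive link hides both the original URI and the capture date, so nothing is left once the shortener dies. A minimal sketch of a link that keeps them, using the `data-versionurl` and `data-versiondate` attribute names as I recall them from the Robust Links guidance (treat the names as an assumption and check them against the site above); the helper and the example URLs are hypothetical:

```python
from html import escape


def robust_link(original_url: str, snapshot_url: str,
                snapshot_date: str, text: str) -> str:
    """Return an HTML anchor that keeps the original URI, the archived copy
    and the capture date (YYYY-MM-DD) together in one link."""
    return '<a href="{o}" data-versionurl="{v}" data-versiondate="{d}">{t}</a>'.format(
        o=escape(original_url, quote=True),
        v=escape(snapshot_url, quote=True),
        d=escape(snapshot_date, quote=True),
        t=escape(text),
    )


if __name__ == "__main__":
    print(robust_link(
        "http://example.com/page",
        "http://archive.example/20150101000000/http://example.com/page",
        "2015-01-01",
        "a page",
    ))
```

If the archive in `data-versionurl` disappears, the original URI and date are still enough to look for a copy in some other archive.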
Economics Working Against Archives
In the paper world in order to monetize their content the
copyright owner had to maximize the number of copies
of it. In the Web world, in order to monetize their content
the copyright owner has to minimize the number of copies.
Thus the fundamental economic motivation for Web
content militates against its preservation in the ways
that Herbert and I would like.
--David Rosenthal
http://blog.dshr.org/2015/02/the-evanescent-web.html
“We’ll use the cloud!”
https://www.chriswatterston.com/blog/my-there-no-cloud-sticker
http://www.bbc.com/future/story/20120927-the-decaying-web
On January 28 2011, three days into the fierce protests that would
eventually oust the Egyptian president Hosni Mubarak, a Twitter
user called Farrah posted a link to a picture that supposedly showed
an armed man as he ran on a “rooftop during clashes between police
and protesters in Suez”. I say supposedly, because both the tweet
and the picture it linked to no longer exist. Instead they have
been replaced with error messages that claim the message – and its
contents – “doesn’t exist”.
Missing Tweet & Pic
https://twitter.com/Farrah3m/status/31727870736859137 http://twitpic.com/3uvo6z
http://ws-dl.blogspot.com/2013/05/2013-05-07-who-is-archiving-your-tweets.html
In May 2013, not completely missing…
In February 2015, completely missing.
http://topsy.com/http://twitpic.com/3uvo6z
In 2016, Redirecting
http://topsy.com/http://twitpic.com/3uvo6z
No Server == No HTTP Event == Nothing to Archive
http://topsy.com/http://twitpic.com/3uvo6z
Hany M. SalahEldeen, Michael L. Nelson, Losing My Revolution: How Many Resources Shared on Social Media Have Been
Lost?, Proceedings of TPDL 2012. http://arxiv.org/abs/1209.3026
Hany SalahEldeen, Michael L. Nelson, Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to
the Disappearing Web, Proceedings of TPDL 2013. http://arxiv.org/abs/1309.2648
Missing: 11% year 1, 7%/year afterwards
Archived: 7% year 1, 15%/year afterwards
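The two rates above can be read as a simple piecewise-linear estimate. A sketch under that reading: the linear form, the proportional ramp inside the first year, and the cap at 100% are assumptions made here for illustration, not the fitted model from the papers:

```python
def percent_after(years: float, first_year: float, per_year_after: float) -> float:
    """Estimated percentage after `years`, given a first-year rate and a
    constant yearly rate afterwards (both in percent)."""
    if years <= 0:
        return 0.0
    if years <= 1:
        # Assumption: the first-year figure accrues evenly over the year.
        return first_year * years
    return min(100.0, first_year + per_year_after * (years - 1))


def percent_missing(years: float) -> float:
    """Missing: 11% in year 1, 7% per year afterwards."""
    return percent_after(years, 11.0, 7.0)


def percent_archived(years: float) -> float:
    """Archived: 7% in year 1, 15% per year afterwards."""
    return percent_after(years, 7.0, 15.0)


if __name__ == "__main__":
    for y in (1, 2, 3):
        print(y, percent_missing(y), percent_archived(y))
```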
Malaysia Airlines Flight 17 (MH17)
http://web.archive.org/web/20140717152222/http://vk.com/strelkov_info
http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video
http://www.newyorker.com/magazine/2015/01/26/cobweb
(not really archived as well as you think)
Ed and I Discuss Who Has What…
https://twitter.com/phonedude_mln/status/490171976389238784
Remember MH17?
https://twitter.com/phonedude_mln/status/490171976389238784
Alex is now 404.
Would multiple archives have convinced him?
https://twitter.com/quicknquiet
Do we really have
“a perfect tool to produce `evidence’ of any kind”?
@AstroKatie Schools @gary4205
https://twitter.com/AstroKatie/status/765344020184739840
But can you prove he didn’t say this?
Or that she didn’t say this?
(remember: black hats can use tools created by white hats)
Mutt and Jeff
http://quoteinvestigator.com/2013/04/11/better-light/
Hey #Twitter, did you know there’s flooding in LA…
https://www.facebook.com/KevinFreyTV/photos/a.1678627819032359.1073741829.1675465999348541/1834217933473346/?type=1&theater
Reminder: Facebook ~5X Larger Than Twitter
Summary
• Segal’s Law has come to web archiving
– Learn more about archive interoperability:
http://mementoweb.org/
• Archived web is incomplete, unstable, unreliable, and
unevenly distributed
– Always true for archives, but shouldn’t we expect better?
– Learn more about archival verifiability:
https://mellon.org/grants/grants-database/grants/old-dominion-university/11600663/
