Your SlideShare is downloading. ×
Who and What Links to the Internet Archive
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Who and What Links to the Internet Archive

1,743

Published on

Published in: Technology, Design
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,743
On Slideshare
0
From Embeds
0
Number of Embeds
38
Actions
Shares
0
Downloads
3
Comments
0
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Who and What Links to the Internet Archive Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, Michael L. Nelson Computer Science Department Old Dominion University, Norfolk, VA yasmin@cs.odu.edu
  • 2. 2Access Patterns for Robots and Humans in Web Archives Motivation • No prior study has answered the following questions: – What do web archive users look for in terms of the content language of requested pages? – How do people reach web archives? – Who links to web archives? – How do sites link to web archives? – Why do sites link to the past?
  • 3. 3Access Patterns for Robots and Humans in Web Archives Dataset Sampling
  • 4. 4Access Patterns for Robots and Humans in Web Archives English pages are the most, followed by the European languages
  • 5. 5Access Patterns for Robots and Humans in Web Archives European languages represent 22% of the Unarchived requested pages
  • 6. 6Access Patterns for Robots and Humans in Web Archives Most languages self-link
  • 7. 7Access Patterns for Robots and Humans in Web Archives 82% of human sessions connect to the Wayback Machine via referrals WebSite Percentage Description en.wikipedia.org 12.9% Wikipedia archive.org 11.9% IA Home Page reddit.com 10.2% Social News Web Site google.TLD 9.9% Search Engine info-poland.bualo.edu 1.5% Polish Studies de.wikipedia.org 1.4% Wikipedia cracked.com 1.2% Humor Site snopes.com 1.1% Urban Legends Reference Pages facebook.com 0.9% Social Media crochetpatterncentral.com 0.9% Crocheting Hobbies
  • 8. 8Access Patterns for Robots and Humans in Web Archives Most of the links (86%) are to mementos
  • 9. 9Access Patterns for Robots and Humans in Web Archives 83% of the mementos that have links from outside the archive do not currently exist on the live web
  • 10. 10Access Patterns for Robots and Humans in Web Archives Conclusions • We provided analysis of the distributions of languages to gain insight about what users look for on the Wayback Machine – English is the most used language, followed by many European languages – The languages are linking mainly to themselves and to English • We provided analysis for the human referrers to discover where Wayback Machine users come from: – 86% of the referrer web pages link deeply to mementos – More than 82% of the links to these mementos are because their corresponding URI-Rs do not exist on the live web

×