Who and What Links to the Internet Archive

  • 1,646 views
Uploaded on

 

More in: Technology , Design
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,646
On Slideshare
0
From Embeds
0
Number of Embeds
38

Actions

Shares
Downloads
3
Comments
0
Likes
1

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Who and What Links to the Internet Archive Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, Michael L. Nelson Computer Science Department Old Dominion University, Norfolk, VA yasmin@cs.odu.edu
  • 2. 2Access Patterns for Robots and Humans in Web Archives Motivation • No prior study has answered the following questions: – What do web archive users look for in terms of the content language of requested pages? – How do people reach web archives? – Who links to web archives? – How do sites link to web archives? – Why do sites link to the past?
  • 3. 3Access Patterns for Robots and Humans in Web Archives Dataset Sampling
  • 4. 4Access Patterns for Robots and Humans in Web Archives English pages are the most, followed by the European languages
  • 5. 5Access Patterns for Robots and Humans in Web Archives European languages represent 22% of the Unarchived requested pages
  • 6. 6Access Patterns for Robots and Humans in Web Archives Most languages self-link
  • 7. 7Access Patterns for Robots and Humans in Web Archives 82% of human sessions connect to the Wayback Machine via referrals WebSite Percentage Description en.wikipedia.org 12.9% Wikipedia archive.org 11.9% IA Home Page reddit.com 10.2% Social News Web Site google.TLD 9.9% Search Engine info-poland.bualo.edu 1.5% Polish Studies de.wikipedia.org 1.4% Wikipedia cracked.com 1.2% Humor Site snopes.com 1.1% Urban Legends Reference Pages facebook.com 0.9% Social Media crochetpatterncentral.com 0.9% Crocheting Hobbies
  • 8. 8Access Patterns for Robots and Humans in Web Archives Most of the links (86%) are to mementos
  • 9. 9Access Patterns for Robots and Humans in Web Archives 83% of the mementos that have links from outside the archive do not currently exist on the live web
  • 10. 10Access Patterns for Robots and Humans in Web Archives Conclusions • We provided analysis of the distributions of languages to gain insight about what users look for on the Wayback Machine – English is the most used language, followed by many European languages – The languages are linking mainly to themselves and to English • We provided analysis for the human referrers to discover where Wayback Machine users come from: – 86% of the referrer web pages link deeply to mementos – More than 82% of the links to these mementos are because their corresponding URI-Rs do not exist on the live web