Who and What Links to the Internet Archive
Upcoming SlideShare
Loading in...5
×
 

Who and What Links to the Internet Archive

on

  • 1,734 views

 

Statistics

Views

Total Views
1,734
Views on SlideShare
492
Embed Views
1,242

Actions

Likes
1
Downloads
2
Comments
0

38 Embeds 1,242

http://ws-dl.blogspot.com 714
http://www.cs.odu.edu 286
http://justinhome2 30
http://ws-dl.blogspot.ca 22
http://192.168.1.9 19
http://ws-dl.blogspot.ru 19
http://ws-dl.blogspot.com.au 16
http://ws-dl.blogspot.in 15
http://ws-dl.blogspot.co.uk 14
http://ws-dl.blogspot.de 14
http://ws-dl.blogspot.it 11
https://twitter.com 11
http://ws-dl.blogspot.gr 10
http://ws-dl.blogspot.co.nz 9
http://cloud.feedly.com 8
http://ws-dl.blogspot.nl 8
http://ws-dl.blogspot.fr 5
http://ws-dl.blogspot.sg 4
http://ws-dl.blogspot.jp 3
http://ws-dl.blogspot.hu 2
http://ws-dl.blogspot.se 2
http://ws-dl.blogspot.ch 2
http://ws-dl.blogspot.com.es 2
http://ws-dl.blogspot.hk 2
http://kred.com 1
http://ws-dl.blogspot.ro 1
http://ws-dl.blogspot.ae 1
http://ws-dl.blogspot.pt 1
http://ws-dl.blogspot.com.br 1
http://ws-dl.blogspot.cz 1
http://ws-dl.blogspot.com.ar 1
http://localhost 1
http://ws-dl.blogspot.tw 1
http://www.ws-dl.blogspot.com 1
http://ws-dl.blogspot.kr 1
http://ws-dl.blogspot.be 1
http://newsblur.com 1
http://feedly.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Who and What Links to the Internet Archive Who and What Links to the Internet Archive Presentation Transcript

    • Who and What Links to the Internet Archive Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, Michael L. Nelson Computer Science Department Old Dominion University, Norfolk, VA yasmin@cs.odu.edu
    • 2Access Patterns for Robots and Humans in Web Archives Motivation • No prior study has answered the following questions: – What do web archive users look for in terms of the content language of requested pages? – How do people reach web archives? – Who links to web archives? – How do sites link to web archives? – Why do sites link to the past?
    • 3Access Patterns for Robots and Humans in Web Archives Dataset Sampling
    • 4Access Patterns for Robots and Humans in Web Archives English pages are the most, followed by the European languages
    • 5Access Patterns for Robots and Humans in Web Archives European languages represent 22% of the Unarchived requested pages
    • 6Access Patterns for Robots and Humans in Web Archives Most languages self-link
    • 7Access Patterns for Robots and Humans in Web Archives 82% of human sessions connect to the Wayback Machine via referrals WebSite Percentage Description en.wikipedia.org 12.9% Wikipedia archive.org 11.9% IA Home Page reddit.com 10.2% Social News Web Site google.TLD 9.9% Search Engine info-poland.bualo.edu 1.5% Polish Studies de.wikipedia.org 1.4% Wikipedia cracked.com 1.2% Humor Site snopes.com 1.1% Urban Legends Reference Pages facebook.com 0.9% Social Media crochetpatterncentral.com 0.9% Crocheting Hobbies
    • 8Access Patterns for Robots and Humans in Web Archives Most of the links (86%) are to mementos
    • 9Access Patterns for Robots and Humans in Web Archives 83% of the mementos that have links from outside the archive do not currently exist on the live web
    • 10Access Patterns for Robots and Humans in Web Archives Conclusions • We provided analysis of the distributions of languages to gain insight about what users look for on the Wayback Machine – English is the most used language, followed by many European languages – The languages are linking mainly to themselves and to English • We provided analysis for the human referrers to discover where Wayback Machine users come from: – 86% of the referrer web pages link deeply to mementos – More than 82% of the links to these mementos are because their corresponding URI-Rs do not exist on the live web