Web Archiving Profile - WADL 2013

Presentation Transcript

  • Web Archiving Profile Overview
    Ahmed AlSum, PhD Candidate, Old Dominion University
    Web Archiving and Digital Libraries (WADL 2013), a workshop at JCDL 2013, July 25-26, 2013, Indianapolis, Indiana, USA
  • What is the problem?
    • Web archives are black boxes: they are accessible only through a search box (full-text search or URI lookup).
    • We need to profile (characterize) the web archives around the world along dimensions such as:
      o Age
      o Top-level domains
      o Languages
      o Growth rate
  • Why?
    • To optimize query routing for the Memento Aggregator.
    • To determine which parts of the web are missing from the archives.
  • Who
      Archive                      Full-text  URI-lookup
      Internet Archive                            x
      Library of Congress                         x
      Icelandic Web Archive                       x
      Library and Archives Canada      x          x
      British Library                  x          x
      UK National Library              x          x
      Portuguese Web Archive           x          x
      Web Archive of Catalonia         x          x
      Croatian Web Archive             x          x
      Archive of the Czech Web         x          x
      National Taiwan University       x          x
      Archive-It                       x          x
  • How?
    • Sample URIs from different sources.
    • Retrieve the TimeMap for each sampled URI from each archive.
    • Analyze the TimeMaps.
  • URI Sample Sources
    Web
      1. DMOZ: random sample
      2. DMOZ by TLD: 2% of each TLD in DMOZ (.com, .org, .jp, etc.; 52 TLDs)
      3. DMOZ by language: 100 URIs for each language (24 languages)
    Web Archives
      4. Top 1-grams from Bing
      5. Top 1,000 query terms from Yahoo in 9 languages
    User requests
      6. Internet Archive (IA) Wayback Machine log files
      7. Memento Aggregator log files
    * We used hostnames only.
  • General Coverage
  • Web Archive Growth Rate
  • TLD Sample Coverage
  • TLD per archive (TLD Sample)
  • TLD per archive (full-text search)
  • TLD across archives
  • Language distribution per archive
  • Query Routing Evaluation
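
The sampling slide describes drawing 2% of each TLD from DMOZ and then keeping hostnames only. The slides do not say how that stratified draw was done, so the following is only a minimal sketch under stated assumptions: the input is a plain list of URIs, duplicate hostnames are collapsed before sampling, the TLD is naively taken as the label after the last dot, and every TLD contributes at least one host. The function names `hostname` and `sample_per_tld` and the example URIs are hypothetical.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def hostname(uri: str) -> str:
    """Reduce a URI to its lowercased hostname, e.g. 'http://www.example.jp/a' -> 'www.example.jp'."""
    return urlsplit(uri).hostname or ""


def sample_per_tld(uris, fraction=0.02, seed=0):
    """Hypothetical stratified sample: `fraction` of the distinct hostnames in each TLD (at least one)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_tld = defaultdict(set)
    for uri in uris:
        host = hostname(uri)
        if host:
            # Assumption: naive TLD = label after the last dot ('jp', not 'co.jp').
            by_tld[host.rsplit(".", 1)[-1]].add(host)
    sample = {}
    for tld, hosts in sorted(by_tld.items()):
        k = max(1, round(len(hosts) * fraction))
        sample[tld] = sorted(rng.sample(sorted(hosts), k))
    return sample


# Hypothetical input: 100 .com hosts and a single .jp host.
uris = [f"http://site{i}.example.com/page" for i in range(100)]
uris.append("http://www.example.jp/index.html")
sample = sample_per_tld(uris)
print({tld: len(hosts) for tld, hosts in sample.items()})  # {'com': 2, 'jp': 1}
```

Sampling per TLD rather than from the whole list keeps small TLDs represented; a uniform 2% draw over this input would usually miss the lone .jp host.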
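
The "How" slide reduces the analysis to retrieving and analyzing TimeMaps. The slides do not show that step, so here is a minimal sketch. It assumes each archive returns a TimeMap in the `application/link-format` serialization used by the Memento protocol (RFC 7089), with quoted parameter values. From that TimeMap it derives two of the profile dimensions listed in the problem statement: age (earliest capture) and captures per year, which serves as a rough growth signal. `parse_timemap`, `profile`, and the sample URIs are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime
from email.utils import parsedate_to_datetime

# Assumption: each link-value looks like <URI>; param="value"; param="value"
LINK_RE = re.compile(r"<([^>]*)>([^<]*)")
PARAM_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_timemap(text: str) -> list[tuple[str, datetime]]:
    """Return (memento URI, capture datetime) pairs from a link-format TimeMap."""
    mementos = []
    for uri, params_text in LINK_RE.findall(text):
        params = dict(PARAM_RE.findall(params_text))
        rels = params.get("rel", "").split()
        # rel may be "memento", "first memento", "last memento", ...
        if "memento" in rels and "datetime" in params:
            mementos.append((uri, parsedate_to_datetime(params["datetime"])))
    return mementos


def profile(mementos):
    """Summarize one TimeMap: memento count, earliest capture (age), captures per year."""
    if not mementos:
        return {"count": 0, "earliest": None, "per_year": {}}
    dates = sorted(dt for _, dt in mementos)
    per_year = Counter(dt.year for dt in dates)
    return {"count": len(dates), "earliest": dates[0],
            "per_year": dict(sorted(per_year.items()))}


# Hypothetical TimeMap for one sampled hostname.
SAMPLE = """<http://example.com/>; rel="original",
<http://archive.example/timemap/link/http://example.com/>; rel="self"; type="application/link-format",
<http://archive.example/web/20010405000000/http://example.com/>; rel="first memento"; datetime="Thu, 05 Apr 2001 00:00:00 GMT",
<http://archive.example/web/20011120000000/http://example.com/>; rel="memento"; datetime="Tue, 20 Nov 2001 00:00:00 GMT",
<http://archive.example/web/20120102000000/http://example.com/>; rel="last memento"; datetime="Mon, 02 Jan 2012 00:00:00 GMT"
"""
print(profile(parse_timemap(SAMPLE))["per_year"])  # {2001: 2, 2012: 1}
```

The example does not split on commas, because the HTTP-date values in `datetime` contain commas. Each link is anchored at its `<URI>` instead. Aggregating these per-URI profiles across a whole sample for each archive would give per-archive coverage, age, and growth figures.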