Web Archiving Profile
Overview
Ahmed AlSum
PhD Candidate
Old Dominion University
Web Archiving and Digital Libraries (WADL 2013)
A Workshop at JCDL 2013
July 25-26, 2013
Indianapolis, Indiana, USA
What is the problem?
• Web archives are black boxes: they are accessible only
through a search box (full-text or URI-lookup)
• We need to profile/characterize the web archives
around the world such as:
o Age
o Top-level domains
o Languages
o Growth rate
Why
• To optimize query routing for the Memento
Aggregator.
• To determine the missing parts of the web.
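The query routing goal can be sketched as follows. This is a minimal illustration, not the Memento Aggregator's actual logic: the `route` function, the archive names, and the coverage numbers in `PROFILES` are all hypothetical placeholders, not results from this talk. The idea is that a per-archive TLD profile lets the aggregator query only the archives likely to hold a given URI.

```python
from urllib.parse import urlsplit

# Hypothetical profiles: fraction of sampled URIs per TLD that each archive
# held. These numbers are invented for illustration only.
PROFILES = {
    "archive_a": {"com": 0.80, "org": 0.75, "uk": 0.40},
    "archive_b": {"uk": 0.90, "com": 0.05},
    "archive_c": {"pt": 0.85},
}

def tld_of(uri):
    """Return the last label of the URI's hostname, e.g. 'uk'."""
    host = urlsplit(uri).hostname or ""
    return host.rsplit(".", 1)[-1].lower()

def route(uri, profiles, k=2):
    """Pick up to k archives with the highest profiled coverage of the URI's TLD."""
    tld = tld_of(uri)
    scored = [(p.get(tld, 0.0), name) for name, p in profiles.items()]
    # Highest coverage first; ties broken by archive name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]

if __name__ == "__main__":
    print(route("http://www.example.co.uk/page", PROFILES))
    print(route("http://example.pt/", PROFILES))
```

Archives with no profiled coverage for the TLD are skipped entirely, which is where the saving in aggregator requests would come from.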
Who
Archive                        Full text   URI-lookup
Internet Archive                           x
Library of Congress                        x
Icelandic Web Archive                      x
Library and Archives Canada    x           x
British Library                x           x
UK National Library            x           x
Portuguese Web Archive         x           x
Web Archive of Catalonia       x           x
Croatian Web Archive           x           x
Archive of the Czech Web       x           x
National Taiwan University     x           x
Archive-It                     x           x
How
• Sampling from different sources
• Retrieve the TimeMap from each archive
• Analyze the TimeMaps
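The TimeMap analysis step can be sketched as below. This is a minimal example under stated assumptions: it assumes the TimeMap is serialized in the link format defined by the Memento specification (RFC 7089), it uses a simple regular expression rather than a full link-format parser, and the `archive.example` URIs and the function names are hypothetical. It extracts the mementos and reports how many there are and when the first and last were captured, which relates to the age attribute of a profile.

```python
import re
from email.utils import parsedate_to_datetime

# One link entry: <target-uri>; attr="value"; attr="value"
LINK_ENTRY = re.compile(r'<([^>]+)>\s*;([^<]*)')
REL = re.compile(r'rel="([^"]*)"')
DATETIME = re.compile(r'datetime="([^"]*)"')

def parse_timemap(text):
    """Return (memento_uri, datetime) pairs sorted by capture time."""
    mementos = []
    for uri, attrs in LINK_ENTRY.findall(text):
        rel = REL.search(attrs)
        when = DATETIME.search(attrs)
        # Keep entries whose rel includes "memento" (e.g. "first memento").
        if rel and when and "memento" in rel.group(1).split():
            mementos.append((uri, parsedate_to_datetime(when.group(1))))
    return sorted(mementos, key=lambda m: m[1])

def summarize(mementos):
    """Count the mementos and report the first and last capture times."""
    if not mementos:
        return {"count": 0, "first": None, "last": None, "span_days": 0}
    first, last = mementos[0][1], mementos[-1][1]
    return {"count": len(mementos), "first": first, "last": last,
            "span_days": (last - first).days}

# Hypothetical TimeMap for illustration; not data from any real archive.
SAMPLE = '''<http://example.com/>; rel="original",
<http://archive.example/timemap/http://example.com/>; rel="self"; type="application/link-format",
<http://archive.example/20010101000000/http://example.com/>; rel="first memento"; datetime="Mon, 01 Jan 2001 00:00:00 GMT",
<http://archive.example/20050615120000/http://example.com/>; rel="memento"; datetime="Wed, 15 Jun 2005 12:00:00 GMT",
<http://archive.example/20130725000000/http://example.com/>; rel="last memento"; datetime="Thu, 25 Jul 2013 00:00:00 GMT"
'''

if __name__ == "__main__":
    print(summarize(parse_timemap(SAMPLE)))
```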
URI Sample Sources
Web
1. DMOZ – Random sample
2. DMOZ – TLD: 2% of each
TLD from DMOZ (.com,
.org, .jp, etc.; 52 TLDs)
3. DMOZ – Languages: 100
URIs for each language (24
languages)
Web Archives
4. Top 1-grams from Bing
5. Top 1000 query terms
from Yahoo in 9 languages
User requests
6. IA Wayback Machine log files
7. Memento Aggregator log files
* We used hostnames only
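The hostname reduction and the per-TLD sample can be sketched as below. This is an illustration only: the function names, the fixed random seed, and the rule of keeping at least one hostname per TLD are assumptions, not details from the talk. The 2% fraction is the one quoted above for the DMOZ TLD sample.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit

def to_hostname(uri):
    """Reduce a URI to its lowercase hostname (None if it has none)."""
    if "://" not in uri:
        uri = "//" + uri  # let urlsplit treat a bare host as the network location
    host = urlsplit(uri).hostname
    return host.lower() if host else None

def sample_by_tld(uris, fraction=0.02, seed=0):
    """Draw a random sample of hostnames from each TLD group."""
    groups = defaultdict(set)
    for uri in uris:
        host = to_hostname(uri)
        if host and "." in host:
            groups[host.rsplit(".", 1)[-1]].add(host)
    rng = random.Random(seed)  # fixed seed (assumption) so runs are repeatable
    sample = {}
    for tld, hosts in groups.items():
        # Assumption: keep at least one hostname even for very small TLDs.
        size = max(1, round(len(hosts) * fraction))
        sample[tld] = rng.sample(sorted(hosts), size)
    return sample

if __name__ == "__main__":
    uris = [f"http://site{i}.com/page" for i in range(100)]
    uris += [f"http://group{i}.org/" for i in range(50)]
    uris += ["http://example.jp/index.html"]
    for tld, hosts in sorted(sample_by_tld(uris).items()):
        print(tld, hosts)
```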
General Coverage
Web Archive Growth Rate
TLD Sample Coverage
TLD per archive (TLD Sample)
TLD per archive (Fulltext search)
TLD across archives
Languages distribution per archive
Query Routing Evaluation

Web Archiving Profile - WADL 2013