User Access Patterns in Web Archives
Robot sessions outnumber human sessions 10:1 in the Internet Archive
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson
{yasmin, mweigle, mln}@cs.odu.edu
How do Users access Web Archives?

Although user patterns on the live web are well understood, there has been no corresponding study of how
users, both humans and robots, access web archives.

Methodology

Data Set: random sample of 2M requests from the Internet Archive’s Wayback Machine access logs of Feb. 2,
2012.

Abstract Models for Accessing Web Archives

Robots vs Humans

User    Raw Requests       Filtered Requests  Sessions        MBs Transferred
Robots  1,002,573 (50.1%)  396,627 (93.0%)    34,203 (90.9%)  20,010
Humans  810,049 (40.5%)    29,690 (7.0%)      3,431 (9.1%)    4,459

Results
[Figure: two bar charts, one for Robots and one for Humans. Y-axis: Percentage. X-axis: access pattern (Dip, Dive, Slide and Dive, Skim, Slide). Legend: TimeMap, Memento.]

Robots and humans exhibit different access patterns.

Conclusion
• Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1
in terms of MB transferred.
• Robots mainly exhibit the Dip and Skim patterns, with about 49% of their sessions for each pattern,
and they access TimeMaps almost exclusively.
• Humans mainly exhibit the Dip pattern (39% of their sessions) and the Dive pattern (30%). Unlike
robots, humans mainly access archived pages rather than TimeMaps.
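
The ratios in the first conclusion can be checked against the counts in the Robots vs Humans table. The following is a minimal sketch of that arithmetic, not the authors' analysis code; the dictionary keys and function names are illustrative assumptions, and only the numbers come from the poster.

```python
# Counts copied from the "Robots vs Humans" table: (robots, humans).
counts = {
    "raw_requests": (1_002_573, 810_049),
    "filtered_requests": (396_627, 29_690),
    "sessions": (34_203, 3_431),
    "mb_transferred": (20_010, 4_459),
}


def robot_to_human_ratio(robots: int, humans: int) -> float:
    """How many robot units there are per human unit."""
    return robots / humans


def robot_share(robots: int, humans: int) -> float:
    """Robot percentage of the robots + humans total."""
    return 100 * robots / (robots + humans)


ratios = {name: robot_to_human_ratio(r, h) for name, (r, h) in counts.items()}
shares = {name: robot_share(r, h) for name, (r, h) in counts.items()}

for name in counts:
    print(f"{name:18s} ratio {ratios[name]:6.2f}:1   robot share {shares[name]:5.1f}%")
```

The session ratio comes out just under 10:1 and the raw-request ratio near 1.24:1, which matches the stated 10:1 and 5:4. The MB ratio is closer to 4.5:1, which the poster rounds to 4:1. The robot shares of filtered requests and sessions reproduce the 93.0% and 90.9% in the table. The raw-request percentages in the table (50.1% and 40.5%) do not sum to 100%, so they appear to be relative to the full 2M-request sample rather than to robots plus humans.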

References
1. Access Patterns for Robots and Humans in Web Archives. Yasmin AlNoamany, Michele C. Weigle, and
Michael L. Nelson. IEEE/ACM Joint Conference on Digital Libraries, 2013.
