Access Patterns for Robots and Humans in Web Archives
Robot sessions outnumber human sessions 10:1 in the Internet Archive
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson
{yasmin, mweigle, mln}@cs.odu.edu
How do Users access Web Archives?
Although user access patterns on the live web are well understood, there has been no corresponding study of how
users, both humans and robots, access web archives.
Methodology
Data Set: a random sample of 2M requests from the Internet Archive’s Wayback Machine access logs for Feb. 2,
2012.
Abstract Models for Accessing Web Archives
Robots vs Humans
User               Robots               Humans
Raw Requests       1,002,573 (50.1%)    810,049 (40.5%)
Filtered Requests  396,627 (93.0%)      29,690 (7.0%)
Sessions           34,203 (90.9%)       3,431 (9.1%)
MBs Transferred    20,010               4,459
Results
[Figure: two bar-chart panels, Robots and Humans. Horizontal axis: access pattern (Dip, Dive, Slide and Dive, Skim, Slide). Vertical axis: Percentage. Legend: TimeMap, Memento.]
Robots and humans exhibit different access patterns.
Conclusion
• Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1
in terms of MB transferred.
• Robots mainly exhibit the Dip and Skim patterns, each accounting for about 49% of their sessions,
and they access TimeMaps almost exclusively.
• Humans mainly exhibit the Dip pattern (39% of their sessions) and the Dive pattern (30%). Unlike
robots, humans mainly access archived pages rather than TimeMaps.
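The ratios quoted in the first bullet can be checked against the counts in the Robots vs Humans table. The following is a minimal sketch of that arithmetic only. The names TABLE, robot_to_human_ratio, and robot_share are hypothetical helpers introduced here for illustration and are not part of the authors' analysis.

```python
# Counts copied from the "Robots vs Humans" table: (robots, humans).
# TABLE and both helper functions are illustrative names, not the authors' code.
TABLE = {
    "Raw Requests": (1_002_573, 810_049),
    "Filtered Requests": (396_627, 29_690),
    "Sessions": (34_203, 3_431),
    "MBs Transferred": (20_010, 4_459),
}


def robot_to_human_ratio(robots: int, humans: int) -> float:
    """Return how many robot units there are per human unit."""
    return robots / humans


def robot_share(robots: int, humans: int) -> float:
    """Return the robot percentage of the combined robot + human total."""
    return 100.0 * robots / (robots + humans)


if __name__ == "__main__":
    for label, (robots, humans) in TABLE.items():
        ratio = robot_to_human_ratio(robots, humans)
        share = robot_share(robots, humans)
        print(f"{label:<18} ratio {ratio:5.2f}:1   robot share {share:5.1f}%")
```

Under these counts the session ratio comes out near 9.97:1 and the raw-request ratio near 1.24:1, matching the rounded 10:1 and 5:4 figures. The MB ratio comes out near 4.49:1, so the 4:1 figure is a coarser rounding. Note that the raw-request percentages in the table are taken over all 2M sampled requests, while robot_share divides by the robot plus human total only.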
References
1. Access Patterns for Robots and Humans in Web Archives. Yasmin AlNoamany, Michele C. Weigle, and
Michael L. Nelson. IEEE/ACM Joint Conference on Digital Libraries, 2013.