User Access Patterns in Web Archives
Robot sessions outnumber human sessions 10:1 in the Internet Archive
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson
{yasmin, mweigle, mln}@cs.odu.edu
How Do Users Access Web Archives?

Methodology

Although user access patterns on the live web are well understood, there has been no corresponding study of how users, both humans and robots, access web archives.

Data Set: a random sample of 2M requests from the Internet Archive’s Wayback Machine access logs of Feb. 2, 2012.
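
The poster reports session counts but does not say how requests were grouped into sessions. A minimal sketch, assuming a common approach rather than the authors' exact procedure: requests are keyed by client IP and user agent, and a new session starts after an inactivity gap longer than a threshold. The `Request` record, its fields, and the 30-minute `SESSION_TIMEOUT` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Request:
    """Hypothetical, simplified access-log record."""
    client_ip: str
    user_agent: str
    time: datetime
    path: str


# Assumed inactivity threshold; the poster does not state one.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group requests into sessions keyed by (client IP, user agent).

    A client's next request joins its current session if it arrives within
    `timeout` of that client's previous request; otherwise it starts a new one.
    """
    sessions = []
    last_seen = {}  # client key -> (time of last request, index of its session)
    for req in sorted(requests, key=lambda r: r.time):
        key = (req.client_ip, req.user_agent)
        prev = last_seen.get(key)
        if prev is not None and req.time - prev[0] <= timeout:
            idx = prev[1]
            sessions[idx].append(req)
        else:
            sessions.append([req])
            idx = len(sessions) - 1
        last_seen[key] = (req.time, idx)
    return sessions


# Example: one client returns after a two-hour gap, so it gets two sessions.
t0 = datetime(2012, 2, 2, 10, 0)
log = [
    Request("192.0.2.1", "Mozilla/5.0", t0, "/web/*/http://example.com/"),
    Request("192.0.2.1", "Mozilla/5.0", t0 + timedelta(minutes=5),
            "/web/20120101000000/http://example.com/"),
    Request("192.0.2.1", "Mozilla/5.0", t0 + timedelta(hours=2),
            "/web/*/http://example.org/"),
    Request("198.51.100.7", "ExampleBot/1.0", t0, "/web/*/http://example.net/"),
]
sessions = sessionize(log)
print([len(s) for s in sessions])  # prints [2, 1, 1]
```

The timeout is the main design knob: a longer threshold merges bursts of activity into fewer, longer sessions, which would shift the session counts reported in the table.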

Abstract Models for Accessing Web Archives

Robots vs Humans

Metric                  Robots               Humans
Raw Requests            1,002,573 (50.1%)    810,049 (40.5%)
Filtered Requests       396,627 (93.0%)      29,690 (7.0%)
Sessions                34,203 (90.9%)       3,431 (9.1%)
Data Transferred (MB)   20,010               4,459
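
The robot-to-human ratios stated in the Conclusion can be checked against the table. This sketch recomputes the ratio for each row and the robot share of filtered requests and sessions; treating the 2M-request sample as the denominator for the raw-request percentages (`SAMPLE_SIZE`) is an assumption.

```python
# Robot and human figures copied from the Robots vs Humans table.
robots = {"raw_requests": 1_002_573, "filtered_requests": 396_627,
          "sessions": 34_203, "mb_transferred": 20_010}
humans = {"raw_requests": 810_049, "filtered_requests": 29_690,
          "sessions": 3_431, "mb_transferred": 4_459}

# Assumed denominator for the raw-request percentages: the 2M-request sample.
SAMPLE_SIZE = 2_000_000


def ratio(metric):
    """Robot-to-human ratio for one row of the table."""
    return robots[metric] / humans[metric]


def robot_share(metric):
    """Robots' percentage of the robot + human total, to one decimal place."""
    return round(100 * robots[metric] / (robots[metric] + humans[metric]), 1)


for metric in robots:
    print(f"{metric}: {ratio(metric):.2f} : 1")

print(robot_share("sessions"), robot_share("filtered_requests"))
print(round(100 * robots["raw_requests"] / SAMPLE_SIZE, 1),
      round(100 * humans["raw_requests"] / SAMPLE_SIZE, 1))
```

With these figures the session ratio is about 10:1 and the raw-request ratio about 1.24:1 (roughly 5:4); the data-transfer ratio comes out near 4.5:1, somewhat above the 4:1 stated in the Conclusion.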

Results
[Figure: percentage of robot sessions and of human sessions in each access pattern (Dip, Dive, Slide, Skim, Slide and Dive); legend: TimeMap, Memento.]

Robots and humans exhibit different access patterns.
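
The figure compares how often each access pattern occurs, but the poster does not define the patterns. The sketch below assumes simplified rules in the spirit of the pattern names: Dip touches a single page, Skim looks only at TimeMaps (lists of archived versions) of several pages, Slide revisits one page at different archive times, and Dive views archived versions of several different pages. These rules and the `Access` record are hypothetical simplifications, not the authors' classification criteria.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Access:
    """Hypothetical per-request record within one session."""
    uri_r: str                       # original (live-web) URL
    memento_dt: Optional[datetime]   # archive datetime; None for a TimeMap request


def classify_session(accesses):
    """Label one session with an access pattern using simplified, assumed rules."""
    pages = {a.uri_r for a in accesses}
    mementos = {(a.uri_r, a.memento_dt) for a in accesses if a.memento_dt is not None}
    if len(pages) == 1 and len(mementos) <= 1:
        return "Dip"    # a single page: its TimeMap and/or one archived version
    if not mementos:
        return "Skim"   # TimeMaps of several pages, no archived versions viewed
    times_per_page = {}
    for uri_r, dt in mementos:
        times_per_page.setdefault(uri_r, set()).add(dt)
    slide = any(len(times) > 1 for times in times_per_page.values())
    dive = len(times_per_page) > 1
    if slide and dive:
        return "Slide and Dive"
    if slide:
        return "Slide"
    if dive:
        return "Dive"
    return "Skim"       # one archived version plus TimeMaps of other pages


def pattern_shares(sessions):
    """Percentage of sessions assigned to each pattern."""
    counts = Counter(classify_session(s) for s in sessions)
    total = sum(counts.values())
    return {pattern: 100 * n / total for pattern, n in counts.items()}


d1, d2 = datetime(2005, 1, 1), datetime(2010, 1, 1)
examples = [
    [Access("http://a.example/", None)],
    [Access("http://a.example/", None), Access("http://b.example/", None)],
    [Access("http://a.example/", d1), Access("http://a.example/", d2)],
    [Access("http://a.example/", d1), Access("http://b.example/", d1)],
]
print([classify_session(s) for s in examples])  # prints ['Dip', 'Skim', 'Slide', 'Dive']
print(pattern_shares(examples))
```

Percentages like those in the Conclusion would come from running `pattern_shares` separately over robot sessions and human sessions.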

Conclusion
• Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1
in terms of MB transferred.
• Robots mainly exhibit the Dip and Skim patterns (about 49% of their sessions each) and access TimeMaps almost exclusively.
• Humans mainly exhibit the Dip (39% of their sessions) and Dive (30%) patterns. Unlike robots, humans mainly access archived pages (mementos) rather than TimeMaps.
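
The Conclusion contrasts TimeMap requests (lists of archived versions of a page) with requests for archived pages themselves (mementos). A minimal sketch of separating the two by URL shape, assuming simplified Wayback Machine path patterns: `/web/<timestamp>/<URI-R>` for a memento and `/web/*/<URI-R>` or `/web/timemap/<format>/<URI-R>` for a TimeMap. These regular expressions are assumptions, not the rules used in the study.

```python
import re
from collections import Counter

# Assumed, simplified Wayback Machine path shapes:
#   memento: /web/<1-14 digit timestamp>[optional "xx_" modifier]/<URI-R>
#   TimeMap: /web/*/<URI-R>  or  /web/timemap/<format>/<URI-R>
MEMENTO_RE = re.compile(r"^/web/(\d{1,14})(?:[a-z]{2}_)?/(.+)$")
TIMEMAP_RE = re.compile(r"^/web/(?:\*|timemap/[^/]+)/(.+)$")


def request_kind(path):
    """Classify a request path as 'TimeMap', 'Memento', or 'Other'."""
    if TIMEMAP_RE.match(path):
        return "TimeMap"
    if MEMENTO_RE.match(path):
        return "Memento"
    return "Other"  # e.g. site assets or search pages outside these shapes


def timemap_fraction(paths):
    """Fraction of TimeMap + Memento requests that are TimeMap requests."""
    kinds = Counter(request_kind(p) for p in paths)
    archive_requests = kinds["TimeMap"] + kinds["Memento"]
    return kinds["TimeMap"] / archive_requests if archive_requests else 0.0


paths = [
    "/web/*/http://example.com/",
    "/web/timemap/link/http://example.org/",
    "/web/20120202123456/http://example.com/",
    "/favicon.ico",
]
print([request_kind(p) for p in paths])  # prints ['TimeMap', 'TimeMap', 'Memento', 'Other']
print(timemap_fraction(paths))
```

Applied separately to robot and human requests, a fraction near 1.0 would correspond to the "almost exclusively TimeMaps" behavior reported for robots.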

References
1. Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson. Access Patterns for Robots and Humans in Web Archives. IEEE/ACM Joint Conference on Digital Libraries, 2013.
