Access Patterns for Robots and Humans in Web Archives
Robot sessions outnumber human sessions 10:1 in the Internet Archive
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson
{yasmin, mweigle, mln}@cs.odu.edu

How do Users access Web Archives?
Although user patterns in the live web are well-understood, there has been no corresponding study of how users, both humans and robots, access web archives.

Methodology
Data Set: random sample of 2M requests from Internet Archive’s Wayback Machine access logs of Feb. 2, 2012.

Abstract Models for Accessing Web Archives

Robots vs Humans
User     Raw Requests        Filtered Requests   Sessions         MBs Transferred
Robots   1,002,573 (50.1%)   396,627 (93.0%)     34,203 (90.9%)   20,010
Humans   810,049 (40.5%)     29,690 (7.0%)       3,431 (9.1%)     4,459
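
The percentages in the table can be checked from the counts. The sketch below is an illustration, not part of the original analysis. It assumes that "2M requests" means exactly 2,000,000, that the raw-request percentages are taken relative to that sample, and that the filtered-request and session percentages are taken relative to the robots + humans total. All variable and function names are hypothetical.

```python
# Counts copied from the "Robots vs Humans" table.
SAMPLE_SIZE = 2_000_000  # assumption: "2M requests" read as exactly 2,000,000

raw = {"robots": 1_002_573, "humans": 810_049}
filtered = {"robots": 396_627, "humans": 29_690}
sessions = {"robots": 34_203, "humans": 3_431}


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def shares_of_total(counts):
    """Percentage of each class relative to the robots + humans total."""
    total = sum(counts.values())
    return {name: share(n, total) for name, n in counts.items()}


# Raw requests are compared against the whole sample (assumption, see above).
raw_shares = {name: share(n, SAMPLE_SIZE) for name, n in raw.items()}
filtered_shares = shares_of_total(filtered)
session_shares = shares_of_total(sessions)

print(raw_shares)       # {'robots': 50.1, 'humans': 40.5}
print(filtered_shares)  # {'robots': 93.0, 'humans': 7.0}
print(session_shares)   # {'robots': 90.9, 'humans': 9.1}
```

Under these assumptions the computed values match the percentages printed in the table.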


Results
[Figure: Percentage (y-axis) by access pattern (Dip, Dive, Slide and Dive, Skim, Slide), with TimeMap and Memento series, in two panels: Robots and Humans.]
Robots and humans exhibit different access patterns.

Conclusion
• Robots outnumber humans 10:1 in terms of sessions, 5:4 in terms of raw HTTP accesses, and 4:1 in terms of MB transferred.
• Robots mainly exhibit the Dip and Skim patterns, each accounting for about 49% of their sessions, and they access TimeMaps almost exclusively.
• Humans mainly exhibit the Dip pattern (39% of their sessions) and the Dive pattern (30%). Unlike robots, humans mainly access archived pages rather than TimeMaps.
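
The robot-to-human ratios in the first bullet can be recomputed from the "Robots vs Humans" table. The following is a minimal sketch for illustration only; the dictionary keys and the variable `ratios` are hypothetical names, and the counts are copied from the table.

```python
# Per-class totals copied from the "Robots vs Humans" table.
robots = {"sessions": 34_203, "raw_requests": 1_002_573, "mb": 20_010}
humans = {"sessions": 3_431, "raw_requests": 810_049, "mb": 4_459}

# Robot-to-human ratio for each metric, rounded to two decimal places.
ratios = {
    metric: round(robots[metric] / humans[metric], 2)
    for metric in robots
}

for metric, value in ratios.items():
    print(f"{metric}: {value}:1")
# sessions: 9.97:1
# raw_requests: 1.24:1
# mb: 4.49:1
```

The session and raw-request ratios agree with the stated 10:1 and 5:4 (1.25:1). The MB ratio computed from the table is about 4.5:1, which the bullet gives as 4:1.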

References
1- Access Patterns for Robots and Humans in Web Archives. Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson. IEEE/ACM Joint Conference on Digital Libraries, 2013.
