Impact of URI Canonicalization on Memento Count

By Mat Kelly, Lulwah M. Alkwai, Sawood Alam, Michael L. Nelson, Michele C. Weigle, and Herbert Van de Sompel

Published in: Technology

  1. Impact of URI Canonicalization on Memento Count. Mat Kelly¹, Lulwah M. Alkwai¹, Sawood Alam¹, Michael L. Nelson¹, Michele C. Weigle¹, and Herbert Van de Sompel². ¹ Web Science and Digital Libraries (WS-DL) Research Group, Old Dominion University, Norfolk, Virginia, USA (ws-dl.cs.odu.edu • @WebSciDL). ² Los Alamos National Laboratory, Los Alamos, New Mexico, USA (@hvdsomp). Web Archiving and Digital Libraries (WADL) Workshop 2017, June 22-23, 2017, Toronto, Canada. https://arxiv.org/abs/1703.03302
  2. Memento COUNT from a Web Interface
  3. Memento COUNT from a TimeMap: |TM|_rel
  4. Memento COUNT from a CDX Endpoint
  5. https://arxiv.org/abs/1703.03302
  6. Accurate Counting Impossible without Dereferencing
  7. Google Redirection Patterns
  8. How Bad Is It? -- A Metric: naive counting solely using contents of TimeMap
  9. How Bad Is It? -- A Metric
  10. Google Redirection Over Time: DI < 1 → more 3xxs than 200s; DI = 1 → one 3xx for every 200; DI > 1 → more 200s than 3xxs
  11. Google DI Compared to Other URI-Rs
  12. % Redirects Over Time ● Revisits (no content change) ● Scheme switch ● Subdomain switch ● Slash-added ● others...
  13. HTTPS Adoption? ● Early, quick redirects attributed to slash-added pattern ● Crawl rate increase → Fewer changes → More revisits ● Δtime for HTTP → HTTPS redirect by year (2012, 2014, 2016): datetime between two URI-Ms is ≤ 2 sec. (google.com, collected May 2016)
  14. Impact of URI Canonicalization on Memento Count https://arxiv.org/abs/1703.03302 http://ws-dl.blogspot.com/2017/03/2017-03-24-impact-of-uri.html
  15. Backup Slides
  16. URI Canonicalization ● http://www.example.com ● https://www.example.com ● http://example.com/ ● http://example.com/index.html ● http://example.com/#articles canonicalize to... example.com
  17. Google Redirection Patterns
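
The URI Canonicalization backup slide lists five URI variants that collapse to example.com. A minimal sketch of that idea follows. It is an illustration only, not the canonicalization an archive actually applies; the function name `canonicalize` and the specific rules (drop scheme, leading `www.`, fragment, trailing slash, and a trailing `index.html`) are assumptions chosen to reproduce the slide's example.

```python
from urllib.parse import urlsplit


def canonicalize(uri: str) -> str:
    """Hypothetical, simplified canonicalizer for illustration only."""
    parts = urlsplit(uri)  # scheme and fragment are parsed out and ignored
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]  # treat www.example.com like example.com
    path = parts.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # /index.html behaves like /
    path = path.rstrip("/")  # ignore a trailing slash
    return host + path


variants = [
    "http://www.example.com",
    "https://www.example.com",
    "http://example.com/",
    "http://example.com/index.html",
    "http://example.com/#articles",
]
for uri in variants:
    print(uri, "->", canonicalize(uri))
```

Because all five variants share one key under rules like these, an archive's index can list captures of any of them under a single URI-R, which is why a count read from a TimeMap may include redirects between variants.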

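The "Google Redirection Over Time" slide reads DI as a comparison of 200 responses against 3xx responses, but the transcript does not reproduce the formula itself. The sketch below assumes a plain ratio of 200s to 3xxs, which matches the three readings on that slide; the paper's own definition should be consulted before relying on it. The function name `ratio_200_to_3xx` and the sample status lists are hypothetical.

```python
from typing import Iterable, Optional


def ratio_200_to_3xx(statuses: Iterable[int]) -> Optional[float]:
    """Assumed metric: count of 200s divided by count of 3xxs.

    `statuses` holds the HTTP status codes observed when dereferencing
    each URI-M listed in a TimeMap. Returns None when there are no
    redirects, since the ratio is undefined in that case.
    """
    codes = list(statuses)
    ok = sum(1 for code in codes if code == 200)
    redirects = sum(1 for code in codes if 300 <= code < 400)
    if redirects == 0:
        return None
    return ok / redirects


# Hypothetical status codes gathered by dereferencing URI-Ms.
print(ratio_200_to_3xx([200, 301, 302]))  # below 1: more 3xxs than 200s
print(ratio_200_to_3xx([200, 301]))       # exactly 1: one 3xx per 200
print(ratio_200_to_3xx([200, 200, 302]))  # above 1: more 200s than 3xxs
```

The status codes are only available after dereferencing each URI-M, which is the point of the "Accurate Counting Impossible without Dereferencing" slide: the TimeMap alone does not say which entries are redirects.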