Archival HTTP Redirection Retrieval Policies - TemporalWeb 2013

This is my presentation at the TemporalWeb 2013 workshop, co-located with WWW 2013.

Published in: Technology, Business

Transcript of "Archival HTTP Redirection Retrieval Policies - TemporalWeb 2013"

  1. Archival HTTP Redirection Retrieval Policies
     Temporal Web Analytics Workshop 2013, Rio de Janeiro
     Ahmed AlSum, Michael L. Nelson, Old Dominion University, Norfolk VA, USA, {aalsum,mln}@cs.odu.edu
     Robert Sanderson, Herbert Van de Sompel, Los Alamos National Laboratory, Los Alamos NM, USA, {rsanderson,herbertv}@lanl.gov
  2. Agenda
     • Introduction
     • Abstract Model
     • Experiment and Results
     • Retrieval Policies
  3. Memento Terminology
     • Original Resource (URI-R, R): http://www.amazon.com
     • Memento (URI-M, M): http://web.archive.org/web/20110411070244/http://amazon.com
     • TimeMap (URI-T, TM)
  4. Live Redirect
     http://bit.ly/r9kIfC redirects to http://www.cs.odu.edu
         % curl -I http://bit.ly/r9kIfC
         HTTP/1.1 301 Moved
         ….
         Location: http://www.cs.odu.edu/…
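
The `curl -I` check on slide 4 can be mirrored offline by reading the status line and the `Location` header from a raw response head. The following is a minimal sketch, not part of the original slides: the function name `parse_redirect` and the sample response text are illustrative assumptions modeled on the slide's output.

```python
from typing import Optional, Tuple


def parse_redirect(raw_head: str) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) from a raw HTTP response head.

    Hypothetical helper for illustration; redirect_target is None unless
    the response is a 3xx that carries a Location header.
    """
    lines = raw_head.strip().splitlines()
    # The status line looks like: HTTP/1.1 301 Moved
    status = int(lines[0].split()[1])
    location = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            location = value.strip()
            break
    # Only 3xx responses are treated as redirects.
    if not 300 <= status < 400:
        location = None
    return status, location


# Made-up sample head, shaped like the curl output on the slide.
sample = "HTTP/1.1 301 Moved\r\nLocation: http://www.cs.odu.edu/\r\n\r\n"
status, target = parse_redirect(sample)
print(status, target)
```

Parsing a captured head rather than issuing a live request keeps the example reproducible, since short-link targets can change over time.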
  5. Live Redirect
     http://bit.ly/r9kIfC redirects to http://www.cs.odu.edu
  6. Archived Redirect
     www.draculathemusical.co.uk redirects to www.dracula-uk.com/index.html
     Archived: http://api.wayback.archive.org/memento/20020212194020/http://www.draculathemusical.co.uk/ redirects to http://api.wayback.archive.org/memento/20020212194020/http://www.geocities.com/draculathemusical
  7. Abstract Model
  8. Abstract Model
     M1, M2, M3
  9. URI Stability
     • A URI's stability is a count of the changes in HTTP responses across time (200, 3xx, or 4xx) and the number of different URIs in the "Location" header for 3xx status codes.
     High stability = 1; no stability = 0
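
Slide 9 describes stability only in words, so the scoring below is an assumption rather than the authors' formula. As a rough sketch, it counts how often consecutive observations change their status class or their 3xx `Location`, then scales the result so that 1 means no change and 0 means every step changed (or no mementos exist). The name `stability_sketch` and the `(status, location)` tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# One observation per memento, in time order: (HTTP status, Location or None).
Observation = Tuple[int, Optional[str]]


def stability_sketch(observations: List[Observation]) -> float:
    """Illustrative stability score in [0, 1]; NOT the formula from the study."""
    if not observations:
        return 0.0  # assumption: a URI with no mementos scores 0
    if len(observations) == 1:
        return 1.0  # a single observation cannot change
    # Compare status class (2, 3, 4, ...) and, for 3xx only, the redirect target.
    keys = [
        (status // 100, location if status // 100 == 3 else None)
        for status, location in observations
    ]
    changes = sum(1 for a, b in zip(keys, keys[1:]) if a != b)
    return 1.0 - changes / (len(keys) - 1)


print(stability_sketch([(200, None), (301, "http://a.example/"), (301, "http://a.example/")]))
```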
  10. Timemap Redirection Categories
      • Category 1: All Mementos have 200 HTTP status code.
  11. Timemap Redirection Categories
      • Category 2: All Mementos have redirection to the same URI.
  12. Timemap Redirection Categories
      • Category 3: All Mementos have redirection to different URIs.
  13. Timemap Redirection Categories
      • Category 4: Mementos have different HTTP status codes.
  14. Timemap Redirection Categories
      • All Mementos have 200 HTTP status code.
      • All Mementos have redirection to the same URI.
      • All Mementos have redirection to different URIs.
      • Mementos have different HTTP status codes.
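
The four categories on slides 10 to 14 can be expressed as a small classifier over the mementos in a TimeMap. This is a sketch under assumptions: the function name `timemap_category` and the `(status, location)` input layout are hypothetical, and a TimeMap that fits none of the four categories (for example, every memento returning 404) yields `None` because the slides do not say how to label it.

```python
from typing import List, Optional, Tuple

# One entry per memento: (HTTP status, Location header value or None).
Observation = Tuple[int, Optional[str]]


def timemap_category(observations: List[Observation]) -> Optional[int]:
    """Return the slide category (1 to 4), or None if no category applies."""
    if not observations:
        return None
    statuses = [status for status, _ in observations]
    if all(status == 200 for status in statuses):
        return 1  # all mementos have 200
    if all(300 <= status < 400 for status in statuses):
        targets = {location for _, location in observations}
        # Same redirect target everywhere: category 2; otherwise category 3.
        return 2 if len(targets) == 1 else 3
    if len(set(statuses)) > 1:
        return 4  # mixed status codes
    return None  # e.g. all 404: not covered by the four categories


print(timemap_category([(200, None), (302, "http://a.example/")]))
```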
  15. URI Reliability
      [Diagram: mementos M1, M2, and M3 each return 3xx; each redirect (rel=original R`) leads to a memento M whose status is unknown (?): 200, 404, or 3xx.]
  16. HTTP Redirection Relationship between URI-R & URI-M
                              Live Web URI-R
      Web Archive URI-M       OK          Redirection
      OK                      Case 1      Case 5
      Redirection             Case 2      Cases 3, 4
      [Diagrams: Case 1, Case 2, Case 3, Case 4, Case 5]
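
The table on slide 16 maps a pair of statuses (live URI-R, archived URI-M) to a case number. A minimal lookup sketch follows; the function name `redirection_case` is hypothetical, and because the transcript does not say what separates Case 3 from Case 4, the sketch returns both for that cell instead of guessing.

```python
def redirection_case(live_status: int, memento_status: int) -> str:
    """Map (live URI-R status, URI-M status) to the case label on slide 16."""

    def kind(status: int) -> str:
        if status == 200:
            return "ok"
        if 300 <= status < 400:
            return "redirect"
        raise ValueError(f"status {status} is outside the slide's table")

    table = {
        ("ok", "ok"): "Case 1",
        ("ok", "redirect"): "Case 2",
        # The transcript does not distinguish Case 3 from Case 4.
        ("redirect", "redirect"): "Case 3 or 4",
        ("redirect", "ok"): "Case 5",
    }
    return table[(kind(live_status), kind(memento_status))]


print(redirection_case(301, 200))
```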
  17. Experiment & Results
  18. Experiment
      • Dataset: 10,000 sample URIs from
      • The dataset has no bit.ly or doi URIs.
      • The experiment focused on the root page (no embedded resources).
      URIs' live HTTP status code:
        HTTP Status/Code (10,000 URI-R)
        OK (200)                  82.83%
        Redirection (3xx)         14.71%
          Redirection (301)       8.4%
          Redirection (302)       6.1%
          Redirection (others)    0.2%
        Not-Found (4xx)           1.18%
        Others                    1.28%
      Memento HTTP status code:
        HTTP Status/Code (894,717 URI-M)
        OK (200)                  93.46%
        Redirection (3xx)         5.69%
        Not-Found (4xx)           0.26%
        Others                    0.59%
  19. [Figures: Time span; Number of Mementos]
  20. URI Stability
      TimeMap Category                     Percentage    Stability
      All Mementos have OK                 52%           1
      Mementos have mix status code        36%           0.91
      All Mementos have Redirection        0.92%         0.85
        Redirection to the same URI        0.62%
        Redirection to different URIs      0.30%
      URI has no Mementos at all           10.97%        0
      [Figures: Stability in semi-log scale; Stability for |TM(R)| < 300]
  21. URI Reliability
      • 23% of the mementos did not lead to a successful memento at the end.
      [Figures: Reliability in semi-log scale; Reliability for |TM(R)| < 300]
  22. HTTP Redirection Relationship between URI-R & URI-M
                              Live Web URI-R
      Web Archive URI-M       OK          Redirection
      OK                      Case 1      Case 5
      Redirection             Case 2      Cases 3, 4
      [Diagrams: Case 1, Case 2, Case 3, Case 4, Case 5. Percentages shown on the slide: 80.8%, 2.74%, 1.34%, 1.33%, 13.7%]
  23. Retrieval Policies: Archived HTTP Redirection Retrieval Policies
  24. Current Wayback Machine Policy
  25. Policy one: URI-R with HTTP redirection
      [Flowchart: Retrieve the memento M for R; decision nodes Status(M) = 200 and Status(M) = 3xx; outcomes Stop, Go to Policy 2, Stop; branches labeled Yes/No.]
  26. Policy one: URI-R with HTTP redirection
      • Evaluation:
        o Policy scope: 1471 URIs (that have live redirection)
        o 77 out of 1471 have no mementos at all
        o For 17 out of 77, mementos were retrieved based on the live redirection
      • Implementation
        Tool                    Comment
        MementoFox v 9.6+
        mcurl v 1.0
        IA Wayback Machine      For bit.ly URIs only
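
The evaluation on slide 26 suggests that policy one falls back to the live redirect target when a URI-R has no mementos of its own. The sketch below is one reading of that idea, not the authors' implementation: `policy_one_sketch`, `get_mementos`, and `live_redirect_target` are hypothetical names, and the two lookups are passed in as plain functions so the logic can be exercised without network access.

```python
from typing import Callable, List, Optional


def policy_one_sketch(
    uri_r: str,
    get_mementos: Callable[[str], List[str]],
    live_redirect_target: Callable[[str], Optional[str]],
) -> List[str]:
    """Return mementos for uri_r, falling back to its live redirect target.

    Assumed reading of policy one; both callables are stand-ins for an
    archive lookup and a live-web HEAD request.
    """
    mementos = get_mementos(uri_r)
    if mementos:
        return mementos  # the archive already holds this URI-R
    target = live_redirect_target(uri_r)
    if target is None:
        return []  # no mementos and no live redirect to follow
    return get_mementos(target)  # try the URI the live web redirects to


# Toy data standing in for an archive and the live web.
archive = {"http://new.example/": ["M1", "M2"]}
redirects = {"http://old.example/": "http://new.example/"}
found = policy_one_sketch(
    "http://old.example/",
    lambda uri: archive.get(uri, []),
    lambda uri: redirects.get(uri),
)
print(found)
```

Injecting the lookups keeps the decision logic separate from any particular archive API, which the slides do not specify.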
  27. Policy two: URI-M with HTTP redirection
      http://www.cnn.com/
      Accept-Datetime: Sun, 13 May 2006
      http://www.cnn.com/
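
Slide 27 shows an `Accept-Datetime` request header, which Memento datetime negotiation expects in HTTP-date format. As a small sketch, the header value can be built with Python's standard library; the function name `accept_datetime_header` is hypothetical, and the weekday in the output is computed by the library from the date rather than typed by hand.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def accept_datetime_header(when: datetime) -> Dict[str, str]:
    """Build an Accept-Datetime header from a timezone-aware datetime."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    # Normalize to UTC so format_datetime can emit the 'GMT' suffix.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


headers = accept_datetime_header(datetime(2006, 5, 13, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])
```

The resulting dictionary could be passed as request headers to an HTTP client when asking an archive for the memento closest to that date.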
  28. Policy two: URI-M with HTTP redirection
      • Evaluation:
        o Policy scope: 2980 TimeMaps (that showed an HTTP redirection status code in at least one memento)
        o Success criterion: using policy two contributed to the original TimeMap
        o Success percentage: 58% of the cases
  29. Conclusion
      • Quantitative study with 10,000 URIs.
      • 48% were not fully stable through time.
      • 27% were not perfectly reliable through time.
      • New archival retrieval policy:
        o Policy one: successfully retrieved mementos for 17 out of 77
        o Policy two: expanded the TimeMap for 58% of cases.
      • aalsum@cs.odu.edu
      • @aalsum