Quantifying Orphaned Annotations in Hypothes.is

Web annotation has been receiving increased attention recently with the organization of the Open Annotation Collaboration and new tools for open annotation, such as Hypothes.is. In this paper, we investigate the prevalence of orphaned annotations, where a live Web page no longer contains the text that had previously been annotated, in the Hypothes.is annotation system (containing 20,953 highlighted text annotations).

Published in: Science

  1. Quantifying Orphaned Annotations in Hypothes.is. Mohamed Aturban, Michael L. Nelson, and Michele C. Weigle, Department of Computer Science, Old Dominion University, Norfolk, VA 23529. TPDL 2015, Poznan, Poland, September 13-17, 2015
  2. What is Web Annotation? Handwritten annotations and Web annotations.
     Sources: Haslhofer, B., Simon, R., Sanderson, R., Van de Sompel, H.: The Open Annotation Collaboration (OAC) model. In: Proceedings of the IEEE Workshop on Multimedia on the Web (MMWeb). pp. 5–9. IEEE (2011); http://networkedlearningcollaborative.com/wp-content/uploads/2015/07/53e12bf10cf2d79877a53311.pdf
  3-4. What is Web Annotation?
     • The Open Annotation Collaboration (OAC) group defines an annotation as a set of connected resources
  5. Why is Web Annotation Important?
     • A collaborative tool
     • Social criticism
     • Education for students and teachers
     • Scholarly and academic purposes
     • Editing and publishing
     Web annotations do not modify the original resource.
  6. Annotating Web Resources Using Hypothes.is: highlights, notes, replies, tags, and the target URI
  7. The Annotation is Attached to the Live Web
     • Target, August 2015 (from live web): http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web/viewgallery/332234
     • Annotation (with tags), made in February 2014: https://hypothes.is/a/tp5hnn4PTjuSg7hbc7_cxQ
  8. A Review of Annotation Attachment to the Live & Archived Web
     Case | Does the annotation attach to its target's live webpage? | Does the annotation attach to any archived copies (mementos) of the target webpage?
     1 | Yes | Yes
     2 | Yes | No
     3 | No | Yes
     4 | No | No
  9. The Annotation is Attached to the Live Web and to an Archived Copy (Memento) of the Webpage
     • Memento, February 2014 (an archived copy): https://web.archive.org/web/20140207083733/http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web/viewgallery/332234
     • Target, August 2015 (from live web): http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web/viewgallery/332234
     • Annotation made in February 2014: https://hypothes.is/a/tp5hnn4PTjuSg7hbc7_cxQ
  10. The Annotation is Attached to the Live Web but No Mementos Are Available
     • Target, August 2015 (from live web): http://tkbr.ccsp.sfu.ca/pub802/2015/01/more-horsepower-to-wattpad/
     • Annotations made in February 2015: https://hypothes.is/a/o_2W8QwZRDm8w0F1dqRxUQ and https://hypothes.is/a/NE8AT6R3Tn6Qg6FeYy1C0w
     The annotation is in danger of being orphaned.
  11. The Annotation is Not Attached to the Live Web but It is Attached to Mementos
     • Memento, December 2014 (an archived copy): https://web.archive.org/web/20141210121018/http://climatefeedback.org/
     • Annotation made in December 2014 on climatefeedback.org
     • Target, August 2015 (from live web): climatefeedback.org
  12. The Annotation is Not Attached to the Live Web and No Mementos Are Available
     • Target, August 2015 (from live web): http://renaissancejohnson.weebly.com/spensers-wordcloud.html
     • Annotation made in July 2015: https://hypothes.is/a/wFLZKLGqS8S3Zyfr_Rmu4w
     The annotation is orphaned.
  13. Four Different Cases For Annotation Attachment
     Case | Attaches to its target's live webpage? | Attaches to any archived copies (mementos) of the target webpage? | Status
     1 | Yes | Yes | Safe
     2 | Yes | No | In danger of being orphaned
     3 | No | Yes | Can reattach to memento
     4 | No | No | Orphaned
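The four cases reduce to a lookup on two yes/no answers. The sketch below is one way to write that down; the function name and the returned label strings are illustrative choices, not something defined in the slides.

```python
def attachment_status(attached_live: bool, attached_memento: bool) -> str:
    """Map the two attachment checks onto the four cases from slide 13."""
    if attached_live and attached_memento:
        return "safe"                          # case 1
    if attached_live:
        return "in danger of being orphaned"   # case 2: live only
    if attached_memento:
        return "can reattach to memento"       # case 3: archived copy only
    return "orphaned"                          # case 4: neither


if __name__ == "__main__":
    for live in (True, False):
        for memento in (True, False):
            print(live, memento, "->", attachment_status(live, memento))
```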
  14. We Studied Hypothes.is Annotations
     • How many annotations are orphaned?
     • How many annotations are in danger of being orphaned?
     • How many annotations can be reattached to mementos in public web archives?
  15. Related Work
     • OAC introduced the idea of making annotations reusable across different systems
       • Annotations, as web resources, have unique URIs
     • Sanderson and Van de Sompel introduced a framework to make web annotations persistent over time
       • Integrating features in the Open Annotation Data Model with the Memento framework
       • Reconstructing annotations for a given memento
       • Retrieving mementos for a given annotation
     • Kreymer's Browsertrix provides on-demand web archiving
       • Whenever an annotation is created, a copy of the related webpage could be archived automatically
       • Funded by Hypothes.is
     Sources: Sanderson, R., Van de Sompel, H.: Making web annotations persistent over time. In: Proceedings of the 10th ACM/IEEE Joint Conference on Digital Libraries (JCDL). pp. 1–10. ACM (2010); http://blog.webrecorder.io/2015/06/open-annotation-fund-project.html; https://hypothes.is/blog/fund-on-demand-web-archiving-completion/
  16. Annotations in Hypothes.is Are Increasing
     • January 2015 (7744 annotations): dataset used in the TPDL 2015 paper
     • August 2015 (33,946 annotations): dataset presented here and in the arXiv version
  17-18. Annotation Types in Hypothes.is
     Columns: Highlighted Text, Note, Tags, Number of Annotations
     Number of annotations per row: 11,289; 9858; 9252; 1835; 1356; 348; 8 (total: 33,946)
     We studied the 20,953 annotations that contain highlighted text
  19. Several Academic Sites Use Hypothes.is Widely
     Number of annotations containing highlighted text | Host
     1222 | caseyboyle.net
     1191 | www.perseus.tufts.edu
     887 | rhetoric.eserver.org
     875 | networkedlearningcollaborative.com
     749 | sosol.perseids.org
     733 | tkbr.ccsp.sfu.ca
     526 | shakespeare.mit.edu
     391 | hypothes.is
     356 | renaissancejohnson.weebly.com
     336 | moodle2.wesleyan.edu
  20. We Issued HTTP HEAD Requests for Target URIs of All Annotations
     Number of annotations | Status code | Example
     18,167 | 200 OK | http://www.w3.org/Talks/9704WWW6-tbl/slide16.htm
     820 | Unresolvable URIs | file:///Users/peggy/Desktop/CIRCLE-youthvoting-individualPages.pdf
     778 | Timeout | http://testbelfastgroup.digitalscholarship.emory.edu/
     666 | 404 | https://www.facebook.com/manunymous
     190 | Soft 4XX | http://www.transmography.net/braineryworkshop/camperforce-by-joseph/
     87 | 401 | http://wiki.shuttleworthfoundation.org/~shuttlew/wiki/index.php?title=Nov_2013_DW_Dogfood_prep
     80 | 503 | https://via.hypothes.is/http://b.pagekite.me/blog/2015-04-27_Roadmap_to_v1.html
     68 | 403 | http://onlinelibrary.wiley.com/store/10.1002/2013EF000191/asset/eft214.pdf?v=1&t=hppouayu&s=e3a980c9e2c6317987306d4e1d76c690c29fe758
     48 | 410 | https://www.scribd.com/word/removal/31126999
     21 | 406 | https://www.scribd.com/deleted/95457320
     19 | 500 | http://androidfrat.com/2015/01/the-new-usb-announcement-just-killed-the-usb-super-position/
     9 | 400, 416, 504, 520 | https://ibpublishing.ibo.org/live-exist/rest/app/pub.xql?doc=EX_Instructions_2013_e&part=10&chapter=4&page=1
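A HEAD request of the kind tallied above can be sketched with the Python standard library. This is only an illustration under stated assumptions, not the authors' tooling: the function name, the 10-second timeout, and the rule that any non-HTTP(S) scheme counts as unresolvable are all assumptions, and the sketch makes no attempt to detect "soft 4XX" pages, which return 200 and need a separate content check.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def head_status(uri: str, timeout: float = 10.0) -> str:
    """Return a coarse status label for a target URI using an HTTP HEAD request."""
    # Assumption: URIs such as file:///... cannot be dereferenced on the web.
    if urlsplit(uri).scheme.lower() not in ("http", "https"):
        return "unresolvable"
    request = urllib.request.Request(uri, method="HEAD")
    try:
        # urlopen follows redirects, so this is the status of the final response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as err:      # 4xx and 5xx responses
        return str(err.code)
    except (socket.timeout, TimeoutError):     # timeout while reading
        return "timeout"
    except urllib.error.URLError as err:       # DNS failure, refused connection, ...
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "unresolvable"


if __name__ == "__main__":
    print(head_status("file:///Users/peggy/Desktop/CIRCLE-youthvoting-individualPages.pdf"))
```

Collecting the returned labels in a `collections.Counter` would give a tally shaped like the table on slide 20.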
  21. Out of 33,946 Annotations, We Investigated 20,133 Target URIs
  22. An Annotation: JSON (L) and Visualized (R)
     {
       "updated": "2014-02-10T22:51:03.920650+00:00",
       "target": [
         {
           "source": "http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web",
           "selector": [
             {
               "endContainer": "/div[3]/form[1]/div[2]/article[1]/div[1]/div[1]/div[2]/div[1]/div[1]/p[1]",
               "endOffset": 187,
               "type": "RangeSelector",
               "startOffset": 0,
               "startContainer": "/div[3]/form[1]/div[2]/article[1]/div[1]/div[1]/div[2]/div[1]/div[1]/p[1]"
             },
             {
               "exact": "Twenty-five years on from the web's inception, its creator has urged the public to re-engage with its original design: a decentralised internet that at its very core, remains open to all.",
               "prefix": "anChris Woods / chrismwoods.com",
               "type": "TextQuoteSelector",
               "suffix": "Speaking with Wired editor Davi"
             },
             { "start": 307, "end": 494, "type": "TextPositionSelector" }
           ]
         }
       ],
       "tags": ["w3", "re-decentrilization"],
       "text": "",
       "created": "2014-02-10T22:51:03.920636+00:00",
       "uri": "http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web",
       "user": "acct:aculich@hypothes.is",
       "consumer": "00000000-0000-0000-0000-000000000000",
       "id": "tp5hnn4PTjuSg7hbc7_cxQ",
       "permissions": { "admin": [ "acct:aculich@hypothes.is" ], …
     Visualized on the target, August 2015: http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web/viewgallery/332234
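To make the structure of the record concrete, here is a minimal sketch that pulls the `TextQuoteSelector` entries (the highlighted text plus its prefix and suffix) out of an annotation. The sample is trimmed from the JSON on slide 22; the function name `text_quote_selectors` is a hypothetical helper, not part of any Hypothes.is library.

```python
import json

# Trimmed copy of the annotation shown on slide 22 (fields not needed here are omitted).
SAMPLE_ANNOTATION = """
{
  "id": "tp5hnn4PTjuSg7hbc7_cxQ",
  "created": "2014-02-10T22:51:03.920636+00:00",
  "uri": "http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web",
  "target": [
    {
      "source": "http://www.wired.co.uk/news/archive/2014-02/06/tim-berners-lee-reclaim-the-web",
      "selector": [
        {"type": "TextQuoteSelector",
         "exact": "Twenty-five years on from the web's inception, its creator has urged the public to re-engage with its original design: a decentralised internet that at its very core, remains open to all.",
         "prefix": "anChris Woods / chrismwoods.com",
         "suffix": "Speaking with Wired editor Davi"},
        {"type": "TextPositionSelector", "start": 307, "end": 494}
      ]
    }
  ]
}
"""


def text_quote_selectors(annotation: dict) -> list:
    """Collect every TextQuoteSelector from every target of one annotation."""
    return [
        selector
        for target in annotation.get("target", [])
        for selector in target.get("selector", [])
        if selector.get("type") == "TextQuoteSelector"
    ]


if __name__ == "__main__":
    for quote in text_quote_selectors(json.loads(SAMPLE_ANNOTATION)):
        print(quote["exact"])
```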
  23. Methodology
     • Compare an annotation's highlighted text with a live webpage's content:
       • Download the annotation JSON from Hypothes.is
       • Extract the text from the annotation's target URI
       • If the annotation's highlighted text is found in the webpage, the annotation is attached to the live web
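The comparison step can be sketched as a substring test on whitespace-normalized page text. The slides do not say how text was extracted or matched, so everything here is an assumption: the standard-library `HTMLParser`, skipping `script` and `style` content, and exact matching after collapsing whitespace. A highlight that spans block elements with no whitespace between them may be missed by this simple version.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not affect matching."""
    return re.sub(r"\s+", " ", text).strip()


def page_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return normalize("".join(parser.chunks))


def is_attached(highlighted_text: str, html: str) -> bool:
    """True if the annotation's highlighted text still occurs in the page."""
    needle = normalize(highlighted_text)
    return bool(needle) and needle in page_text(html)
```

The same check applies unchanged to the HTML of an archived copy, which is how attachment to a memento is described later in the slides.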
  24. Only 78% of Highlighted Text Annotations Attach to the Live Web
  25. Discovering Mementos for All Resolvable Target URIs
     • Using the LANL Memento Aggregator
     • Considering only mementos with a datetime immediately before or after the annotation's creation date
     • Four cases regarding the availability of mementos for the annotation's target URI:
       • Mementos exist before and after the annotation's creation date
       • Mementos exist only before the annotation's creation date
       • Mementos exist only after the annotation's creation date
       • No mementos exist
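Given the capture datetimes of the mementos for one target URI, picking the closest memento on each side of the creation date is a sorted-list lookup. The sketch below assumes the datetimes have already been obtained from an aggregator (that step is not shown), and the function names and case labels are illustrative. A memento captured at exactly the creation instant counts as neither before nor after in this version.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timezone


def closest_mementos(memento_datetimes, created):
    """Return (closest memento before, closest memento after) the creation datetime."""
    ordered = sorted(memento_datetimes)
    i = bisect_left(ordered, created)    # entries before index i are strictly earlier
    j = bisect_right(ordered, created)   # entries from index j on are strictly later
    before = ordered[i - 1] if i > 0 else None
    after = ordered[j] if j < len(ordered) else None
    return before, after


def availability_case(before, after) -> str:
    """Name the memento-availability case for one annotation."""
    if before and after:
        return "before and after"
    if before:
        return "only before"
    if after:
        return "only after"
    return "no mementos"


if __name__ == "__main__":
    created = datetime(2014, 2, 10, 22, 51, 3, tzinfo=timezone.utc)
    captures = [
        datetime(2015, 1, 1, tzinfo=timezone.utc),   # hypothetical later capture
        datetime(2014, 2, 7, 8, 37, 33, tzinfo=timezone.utc),
    ]
    print(availability_case(*closest_mementos(captures, created)))
```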
  26. Mementos Exist Before and After the Annotation Creation Date: 25% (4986) of resolvable target URIs
  27. Mementos Exist Only Before the Annotation Creation Date: 12% (2477) of resolvable target URIs
  28. Mementos Exist Only After the Annotation Creation Date: 7% (1397) of resolvable target URIs
  29. No Mementos Exist: 56% (11,273) of resolvable target URIs
  30. Are Annotations Attached to Existing Mementos?
     • Similar to checking if an annotation is attached to the live web
     • If the highlighted text is found in a memento, the annotation is attached and could be recovered
     • 8860 annotations have a target URI with at least one memento
     • Of these, 90% (7963) can be attached to a memento
  31-32. Annotation Targets with Existing Mementos Before and After the Annotation Creation Date
     Attached to live web page | Attached to memento (before) | Attached to memento (after) | Number of annotations
     Yes | Yes | Yes | 4091
     Yes | Yes | No | 93
     Yes | No | Yes | 100
     Yes | No | No | 182
     No | Yes | Yes | 251
     No | Yes | No | 69
     No | No | Yes | 44
     No | No | No | 156
     Total | | | 4986
     In danger: the Yes/No/No row (182). Orphaned: the No/No/No row (156).
  33-34. Annotation Targets with Existing Mementos Only Before the Annotation Creation Date
     Attached to live web page | Attached to memento (before) | Number of annotations
     Yes | Yes | 1984
     Yes | No | 235
     No | Yes | 133
     No | No | 125
     Total | | 2477
     In danger: the Yes/No row (235). Orphaned: the No/No row (125).
  35-36. Annotation Targets with Existing Mementos Only After the Annotation Creation Date
     Attached to live web page | Attached to memento (after) | Number of annotations
     Yes | Yes | 1148
     Yes | No | 101
     No | Yes | 50
     No | No | 98
     Total | | 1397
     In danger: the Yes/No row (101). Orphaned: the No/No row (98).
  37-38. Annotation Targets with No Existing Mementos
     Attached to live web page | Number of annotations
     Yes | 7839
     No | 3434
     Total | 11,273
     In danger: the Yes row (7839). Orphaned: the No row (3434).
  39. How Many Orphaned Annotations Does Hypothes.is Have?
     • 19% (3813) of annotations are orphaned
     • 41% (8357) of annotations are in danger of being orphaned
     • In total, 60% (12,170) of annotations are either orphaned or in danger of being orphaned
  40. How Many Annotations Can Be Reattached Using Web Archives?
     • Only 3% (547) of annotations are no longer attached to the live web but can be reattached to a memento; without web archives they would be orphaned
     • 37% (7416) of annotations are safe: attached to the live web and also attached to one or more mementos
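The counts on slides 39 and 40 can be rebuilt from the per-case tables on slides 31 to 38. The sketch below transcribes each table row as (attached to live web, attached to at least one memento, count); collapsing the before/after columns into a single "any memento" flag is a simplification made here, and the label strings are illustrative.

```python
from collections import Counter

# (attached_live, attached_to_any_memento, count), transcribed from slides 31-38.
ROWS = [
    # Mementos before and after the creation date (4986 annotations)
    (True, True, 4091), (True, True, 93), (True, True, 100), (True, False, 182),
    (False, True, 251), (False, True, 69), (False, True, 44), (False, False, 156),
    # Mementos only before (2477)
    (True, True, 1984), (True, False, 235), (False, True, 133), (False, False, 125),
    # Mementos only after (1397)
    (True, True, 1148), (True, False, 101), (False, True, 50), (False, False, 98),
    # No mementos (11,273)
    (True, False, 7839), (False, False, 3434),
]

LABELS = {
    (True, True): "safe",
    (True, False): "in danger",
    (False, True): "can reattach",
    (False, False): "orphaned",
}


def totals(rows):
    """Sum annotation counts per attachment status."""
    counts = Counter()
    for live, memento, count in rows:
        counts[LABELS[(live, memento)]] += count
    return counts


if __name__ == "__main__":
    result = totals(ROWS)
    grand_total = sum(result.values())
    for label, count in result.items():
        print(f"{label}: {count} ({100 * count / grand_total:.1f}%)")
```

The four sums add up to the 20,133 annotations with resolvable target URIs, and "safe" plus "can reattach" gives the 7963 annotations that attach to a memento.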
  41. Archives Used to Attach Annotations to Mementos
     Archive | Attached to live web | Not attached to live web
     web.archive.org | 6997 (94.3%) | 455 (83.1%)
     archive.is | 679 (9.15%) | 39 (7.12%)
     wayback.archive-it.org | 562 (7.57%) | 47 (8.59%)
     github.com | 80 (1.07%) | 21 (3.83%)
     wayback.vefsafn.is | 71 (0.95%) | 53 (9.68%)
     arxiv.org | 18 (0.24%) | 0
     webarchive.loc.gov | 3 (0.04%) | 0
     webarchive.org.uk | 4 (0.05%) | 0
     webarchive.nationalarchives.gov.uk | 2 (0.02%) | 0
     discordia.wikia.com | 1 (0.01%) | 0
     Total | 8417 (113.4%) | 615 (112.32%)
     The totals exceed 100% because a single annotation may reattach to mementos from multiple archives.
  42. The Status of Current Hypothes.is Annotations. Highlighted text annotations with resolvable target URIs: 20,133
  43. Conclusion
     • We analyzed the attachment of 20,953 highlighted text annotations in Hypothes.is
     • 60% of annotations are orphaned or in danger of being orphaned
     • Mementos from 10 different web archives could be used to keep the remaining 40% (7963) of annotations attached
     Archiving webpages at the time of annotation is important to avoid orphaned annotations.
