
Storytelling With Web Archives

This CEDWARC presentation highlights AlNoamany's Algorithm and other research done to apply social media storytelling techniques to web archives.

Published in: Technology

  1. 1. @shawnmjones @WebSciDL Storytelling With Web Archives: Giving visitors a taste of a huge collection. Shawn M. Jones, Web Science and Digital Libraries Research Group, Old Dominion University. Thanks to: RE-70-18-0005-18
  2. 2. Storytelling With Web Archives Has Multiple Use Cases…
        1. Promotion of the collection: storytelling allows a curator to promote a collection, making others aware of it.
        2. Exploring aspects of the collection: users can explore a collection and expose specific sides of a news story or focus on specific people or places.
        3. Summarization: web archive collections are too large for manual review, so we need a summary to understand what they contain.
  3. 3. Lesson Goals
        • Introduce social media storytelling
        • Identify the two actions of conducting storytelling with web archives
        • Provide an overview of AlNoamany’s Algorithm, used for generating the resources for a story
        • Highlight tools from the Dark and Stormy Archives project for producing stories
  4. 4. Storytelling in literature consists of elements…
        Story elements: setting, characters, sequence, exposition, conflict, climax, resolution.
        Wikipedia contributors. (2019, August 16). Dramatic structure. In Wikipedia, The Free Encyclopedia. Retrieved 15:56, August 16, 2019, from https://en.wikipedia.org/w/index.php?title=Dramatic_structure&oldid=911098220
        Annenberg Foundation. (2017). Interactives: Elements of a Story. In Annenberg Learner: Teacher resources and professional development across the curriculum. Retrieved 15:59, August 16, 2019, from http://www.learner.org/interactives/story/
  5. 5. Storytelling in social media consists of resources… A sampling and arrangement of web resources for summarization.
  6. 6. Social cards are summaries of web resources
        • URLs can be difficult for people to comprehend
        • Social cards provide a title, small text snippet, and striking image from the web page behind a URL
        Long URL: https://www.google.com/maps/dir/Old+Dominion+University,+Norfolk,+VA/Los+Alamos+National+Laboratory,+New+Mexico/@35.3644614,-109.356967,4z/data=!3m1!4b1!4m13!4m12!1m5!1m1!1s0x89ba99ad24ba3945:0xcd2bdc432c4e4bac!2m2!1d-76.3067676!2d36.8855515!1m5!1m1!1s0x87181246af22e765:0x7f5a90170c5df1b4!2m2!1d-106.287162!2d35.8440582
        vs. the same URL represented as a social card (image on slide).
  7. 7. Social media storytelling uses groups of social cards to provide a “summary of summaries”
        Figures: 2 resources are shown in this Wakelet story; 6 resources are shown in this Storify story.
        Each social card summarizes a web resource. Each story groups the social cards, summarizing the topic. Social cards contain the same information in the same place on each card, allowing for easy comparison. We want to use this technique to summarize web archive collections because users are already familiar with this visualization paradigm.
  8. 8. AlNoamany analyzed popular social media stories…
        • AlNoamany discovered that popular social media stories:
            • Contain around 28 elements
            • Contain mostly social cards
            • This means that they are mostly links to other content
        • Because they are mostly links, popular stories help users by reducing a topic down to a small number of items: a summary
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson, “Characteristics of social media stories: What makes a good story?,” International Journal on Digital Libraries, vol. 17, no. 3, pp. 239–256, 2016. https://doi.org/10.1007/s00799-016-0185-3.
  9. 9. Existing storytelling services are bookmarking, not preserving!
  10. 10. Web archive collections consist of mementos: different versions of the same page over time
        Figures: University of Utah Office of Admissions from the University of Utah Web Archive Collection (2013, 2015, 2018); Tumblr Black Lives Matter Blog from the #blacklivesmatter Collection (2/12/2015, 3/5/2015, 4/1/2015).
  11. 11. Mementos are the documents in web archive collections
        Mementos are the versions of pages from the time of the crawl. The mementos are the documents in our collections. Unlike most document collections, web archives consist of many different versions of the same document. For summarization, web archive collections therefore require different handling than other types of document collections.
  12. 12. Building stories from web archives consists of two main actions…
        1. Select a small subset of mementos from the web archive collection
        2. Visualize that subset via a social media storytelling tool
  13. 13. Storytelling with web archives: action 1
  14. 14. Action 1: Sample k ≈ 28 mementos from N mementos of the collection to create a summary story
        Web sites may group some content, but curators theme some of this content into collections, which we can reduce to stories.
  15. 15. Selecting our k ≈ 28 mementos manually requires exploring our web archive collection and deciding on our story…
        • What people, places, ideas are in the collection?
        • Decide on the story
            • What story do we want to tell?
            • What events are the story centered on?
            • Do we want to address the 5 Ws and how: who, what, when, where, why, and how?
        • Select the k mementos:
            • Use the web archive collection search engine to find the people, places, etc. that reflect the story you wish to tell
            • Record the URLs of these mementos
            • We choose k ≈ 28 from the N ≈ 1000s of mementos
        M. Praetzellis. (2018). Browse and search on archive-it.org. In Archive-It Help Center. Retrieved 16:05, August 16, 2019, from https://support.archive-it.org/hc/en-us/articles/208002196-Browse-and-search-on-archive-it-org
  16. 16. We can select our k ≈ 28 mementos automatically using AlNoamany’s Algorithm…
        Algorithm steps (from the slide diagram), informed by the characteristics of human-generated stories and the characteristics of Archive-It collections:
        • Exclude duplicates
        • Exclude off-topic pages
        • Exclude non-English language
        • Dynamically slice the collection
        • Cluster the pages in each slice
        • Select high-quality pages from each cluster
        • Order pages by time
        • Visualize
        We developed the Off-Topic Memento Toolkit (OTMT) to execute this process. The OTMT is part of the Dark and Stormy Archives project. Parts of this algorithm are useful for manually reviewing web archive collections, too.
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson. 2017. Generating Stories From Archived Collections. In Proceedings of the 2017 ACM on Web Science Conference, 309–318. http://doi.org/10.1145/3091478.3091508
        S. M. Jones, M. C. Weigle, and M. L. Nelson. 2018. The Off-Topic Memento Toolkit. In International Conference on Digital Preservation (iPRES) 2018. https://doi.org/10.17605/OSF.IO/UBW87
        Dark and Stormy Archives. https://oduwsdl.github.io/dsa/
  17. 17. Step 1: Identifying Off-topic Mementos
        Collections have a theme. Seeds are selected to support that theme. Mementos are versions of seeds. Some of these versions are off-topic. Identifying these off-topic mementos is key to summarization.
        Figures (examples of off-topic mementos): Hacked; Moved on from topic; Web Page Gone; Account Suspension.
        S. M. Jones, M. C. Weigle, and M. L. Nelson. 2018. The Off-Topic Memento Toolkit. In International Conference on Digital Preservation (iPRES) 2018. https://doi.org/10.17605/OSF.IO/UBW87
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson, “Detecting off-topic pages within TimeMaps in Web archives,” International Journal on Digital Libraries, 2016. https://doi.org/10.1007/s00799-016-0183-5
  18. 18. First We Identify and Exclude Off-Topic Mementos, But Why?
        We want to identify and exclude (not remove) off-topic mementos because they do not make for good summaries. Things happen to web pages that make them go off-topic. Mementos are observations of seeds at different points in time.
        Figure legend: red is off-topic, green is on-topic.
        S. M. Jones, M. C. Weigle, and M. L. Nelson. 2018. The Off-Topic Memento Toolkit. In International Conference on Digital Preservation (iPRES) 2018. https://doi.org/10.17605/OSF.IO/UBW87
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson, “Detecting off-topic pages within TimeMaps in Web archives,” International Journal on Digital Libraries, 2016. https://doi.org/10.1007/s00799-016-0183-5
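
The off-topic check described in slides 17 and 18 can be sketched as follows. This is a minimal illustration, not the OTMT's actual implementation: it assumes the earliest memento of a seed is on-topic, uses cosine similarity over word counts as one plausible measure, and the `threshold` value and function names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_off_topic(memento_texts, threshold=0.15):
    """Return one boolean per memento (True = off-topic).

    memento_texts: page texts for ONE seed, oldest first.
    Assumption: the first memento is on-topic and serves as the baseline.
    The threshold is a hypothetical value chosen for illustration.
    """
    if not memento_texts:
        return []
    baseline = memento_texts[0]
    return [cosine_similarity(baseline, text) < threshold for text in memento_texts]


texts = [
    "black lives matter protest march news",
    "protest march news update black lives matter",
    "this account has been suspended",
]
flags = flag_off_topic(texts)
```

Flagged mementos are only excluded from the sample; they stay in the collection, matching the "exclude (not remove)" wording on the slide.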
  19. 19. Step 2: remove duplicate mementos
        • Remember: a memento is an observation at a particular point in time
        • Sometimes the web page did not change
        • These duplicates are extras that we do not need in our story
        Figure: thumbnails of duplicate mementos, grouped by color. Mementos outlined in red are the same, green are the same, etc.
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson. 2017. Generating Stories From Archived Collections. In Proceedings of the 2017 ACM on Web Science Conference, 309–318. http://doi.org/10.1145/3091478.3091508
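
A later slide mentions that the deduplication step produces Simhash scores, so duplicate detection can be sketched with a small Simhash fingerprint and a Hamming-distance comparison. This is a simplified illustration under stated assumptions: the tokenization, the use of MD5 as the per-token hash, the 64-bit width, and the `max_distance` cutoff are all choices made for this sketch rather than details confirmed by the slides.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute a simple Simhash fingerprint over word tokens."""
    weights = [0] * bits
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # A bit is set when more tokens voted for 1 than for 0.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def drop_duplicates(mementos, max_distance=0):
    """Keep the first memento of each group of (near-)identical pages.

    mementos: list of (urim, text) pairs, oldest first.
    max_distance: hypothetical cutoff; 0 keeps only exact fingerprint matches out.
    """
    kept, kept_prints = [], []
    for urim, text in mementos:
        fingerprint = simhash(text)
        if any(hamming(fingerprint, other) <= max_distance for other in kept_prints):
            continue
        kept.append(urim)
        kept_prints.append(fingerprint)
    return kept


pages = [
    ("u1", "Office of Admissions spring 2015"),
    ("u2", "Office of Admissions spring 2015"),
    ("u3", "This site is down for maintenance"),
]
kept = drop_duplicates(pages)
```

Keeping the fingerprints around is useful because the clustering step can reuse them instead of rehashing every page.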
  20. 20. Step 3: only consider pages using the language of our story
        We typically want to tell stories with a single language.
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson. 2017. Generating Stories From Archived Collections. In Proceedings of the 2017 ACM on Web Science Conference, 309–318. http://doi.org/10.1145/3091478.3091508
  21. 21. Step 4: slice the collection so we cover the spread across time
        • To ensure we account for the spread across time, we slice the collection dynamically and distribute the mementos equally across the slices.
        • For N mementos:
            • If |N| <= 28, then the number of slices is |N|
            • If |N| > 28, then the number of slices is given by a formula (shown as an image on the slide and not captured in this transcript)
        • This way the size of the story grows slowly as needed for large collections
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson. 2017. Generating Stories From Archived Collections. In Proceedings of the 2017 ACM on Web Science Conference, 309–318. http://doi.org/10.1145/3091478.3091508
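
The slicing step can be sketched as below. The formula for collections larger than 28 mementos is missing from this transcript; the sketch assumes logarithmic growth, ceil(28 + log10(N)), which fits the slide's remark that the story grows slowly but should be verified against the cited paper. The function names are hypothetical.

```python
from math import ceil, log10


def number_of_slices(n, story_size=28):
    """Number of time slices for a collection of n mementos."""
    if n <= story_size:
        return n
    # ASSUMPTION: logarithmic growth; the slide's formula was not captured
    # in the transcript, so check this against the cited paper.
    return ceil(story_size + log10(n))


def slice_by_time(mementos, num_slices):
    """Split mementos into num_slices groups of near-equal size, in time order.

    mementos: list of (memento_datetime, urim) pairs; any sortable datetime works.
    """
    if num_slices <= 0:
        return []
    ordered = sorted(mementos)
    base, extra = divmod(len(ordered), num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        size = base + (1 if i < extra else 0)  # spread the remainder over early slices
        slices.append(ordered[start:start + size])
        start += size
    return slices


sample = [(i, "urim-%d" % i) for i in range(10)]
sizes = [len(s) for s in slice_by_time(sample, 3)]
```

Under the assumed formula, a collection of 1,000 mementos yields 31 slices and one of 100,000 yields 33, so the story stays close to the 28-element size observed in popular stories.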
  22. 22. Step 5: cluster each slice for novelty
        • To ensure we find novel mementos, we reuse the Simhash scores from the deduplication step
        • Each cluster is built from the distance between these Simhash scores using the DBSCAN algorithm
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson. 2017. Generating Stories From Archived Collections. In Proceedings of the 2017 ACM on Web Science Conference, 309–318. http://doi.org/10.1145/3091478.3091508
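
To make the clustering step concrete, here is a small pure-Python DBSCAN over Hamming distances between fingerprints. It is a sketch only: the slide names DBSCAN and Simhash distance, but the `eps` and `min_pts` values and the integer fingerprints below are hypothetical, and a production system would more likely rely on an existing library.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def dbscan(fingerprints, eps, min_pts):
    """Cluster fingerprints with DBSCAN; returns one label per item (-1 = noise)."""
    noise = -1
    labels = [None] * len(fingerprints)  # None means "not visited yet"

    def neighbors(i):
        return [j for j, f in enumerate(fingerprints)
                if hamming(fingerprints[i], f) <= eps]

    cluster = -1
    for i in range(len(fingerprints)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


prints = [0b0000, 0b0001, 0b0011, 0b11110000, 0b11110001, 0b111111110000000000]
labels = dbscan(prints, eps=1, min_pts=2)
```

Items labeled -1 match nothing else in their slice; how such noise points are treated when picking story candidates is a design decision this sketch leaves open.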
  23. 23. Before moving to step 6 we must understand Memento Damage…
        • Sometimes, when crawling, a web archive does not acquire all of the images, stylesheets, or JavaScript needed to render a page
        • This lack of resources is called damage
        • Note that calculating memento damage takes a long time, so this next step will take a while
        J. F. Brunelle, M. Kelly, H. SalahEldeen, M. C. Weigle, and M. L. Nelson, “Not all mementos are created equal: measuring the impact of missing resources,” International Journal on Digital Libraries, vol. 16, no. 3-4, 2015. https://doi.org/10.1007/s00799-015-0150-6.
        E. Siregar, “Deploying the Memento-Damage Service,” https://ws-dl.blogspot.com/2017/11/2017-11-22-deploying-memento-damage.html, 2017.
  24. 24. Step 6: select high-quality mementos
        • We favor pages with the following features:
            • News over social media, because social media posts produce poorer cards
            • Longer URLs with deeper paths, because they contain more unique information and thus produce better cards
            • Low memento damage
        Y. AlNoamany, M. C. Weigle, and M. L. Nelson. 2017. Generating Stories From Archived Collections. In Proceedings of the 2017 ACM on Web Science Conference, 309–318. http://doi.org/10.1145/3091478.3091508
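
One way to combine the three preferences from step 6 is a weighted score per candidate, sketched below. Everything numeric here is an assumption made for illustration: the weights, the depth cap, the list of social media hosts, and the convention that damage ranges from 0 (undamaged) to 1. The sketch also simplifies "news over social media" to "anything that is not a listed social media host".

```python
from urllib.parse import urlparse

# Hypothetical, illustrative list; a real system would need a fuller one.
SOCIAL_MEDIA_HOSTS = {"twitter.com", "facebook.com", "tumblr.com"}


def path_depth(url):
    """Count the non-empty path segments of a URL."""
    return len([part for part in urlparse(url).path.split("/") if part])


def is_social_media(url):
    host = urlparse(url).netloc.lower()
    return any(host == h or host.endswith("." + h) for h in SOCIAL_MEDIA_HOSTS)


def quality_score(original_url, damage, w_damage=0.5, w_depth=0.3, w_category=0.2):
    """Higher is better. The weights are hypothetical, not taken from the paper."""
    category = 0.0 if is_social_media(original_url) else 1.0
    depth = min(path_depth(original_url), 5) / 5  # cap so very deep paths do not dominate
    return w_damage * (1 - damage) + w_depth * depth + w_category * category


def best_in_cluster(candidates):
    """candidates: list of (original_url, damage) pairs from one cluster."""
    return max(candidates, key=lambda c: quality_score(*c))


cluster = [
    ("https://twitter.com/user/status/1", 0.1),
    ("https://example.com/", 0.1),
    ("https://example.com/news/2015/03/story.html", 0.1),
    ("https://example.com/news/2015/03/story.html", 0.9),
]
best = best_in_cluster(cluster)
```

Because damage is expensive to compute, it may be worth scoring only the candidates that survive the earlier steps.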
  25. 25. Storytelling with web archives: action 2
  26. 26. Action 2: visualize our selection of k items
        Figures: a story of 15 mementos visualized as a Blogger post; the same 15 mementos visualized as a Twitter thread.
  27. 27. Storify shut down on May 16, 2018
        Originally we visualized stories from web archive collections by using Storify (https://storify.com/). Because Storify shut down in 2018, we need to visualize the stories with alternative tools.
  28. 28. Creating a Web Archive Summary Story on Facebook
        1. Create a Facebook post with the title of your story as the text
        2. Create a comment
        3. Take the first URL from your story resources and place it in the comment
        4. Wait for the card to appear
        5. Repeat for each additional URL, in order
  29. 29. Problems with Summary Stories on Facebook
        • Facebook does not always generate a card for a link.
            • You can fix this by editing the comment and inserting your own text and image.
        • If a logged-in Facebook user clicks on a card, they may not get to the memento.
            • Facebook adds extra “stuff” to the end of a URL for logged-in users, and web archives may consider this “stuff” to be part of the URL, so your viewer will get a 404.
        • Comments do not always appear in the order you inserted them.
  30. 30. Existing platforms do not reliably produce social cards for mementos…
        If we cannot rely upon the service to generate a social card for a memento, our system must then do the work to create our own.
        S. M. Jones. “Where Can We Post Stories Summarizing Web Archive Collections?” https://ws-dl.blogspot.com/2017/08/2017-08-11-where-can-we-post-stories.html, 2017.
  31. 31. Some services have stories, but not long-term storytelling
        Figures:
        • Facebook stories (image ref: https://techcrunch.com/2018/04/05/facebook-stories-default/)
        • Snapchat stories (image ref: https://techcrunch.com/2013/10/03/snapchat-gets-its-own-timeline-with-snapchat-stories-24-hour-photo-video-tales/)
        • Instagram stories (image ref: https://buffer.com/library/instagram-stories)
        These platforms delete the user’s stories 24 hours after they are posted. This form of social media storytelling is the opposite of what we are looking for. We want the stories to be artifacts themselves.
  32. 32. Existing card services create a confusing experience for mementos
        Who published these resources? Archive-It? CNN? Is the story author sharing fake news?
        Figures: embed.rocks social card; embed.ly social card.
        S. M. Jones. “A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages.” https://ws-dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html, 2018.
  33. 33. Neither social media services nor card services were reliable for storytelling, so we created MementoEmbed…
        Information in the MementoEmbed social card is separated to avoid confusion about attribution. MementoEmbed is archive-aware. It can locate information about the memento that is not available in other cards.
        S. M. Jones. “A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages.” https://ws-dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html, 2018.
  34. 34. MementoEmbed just produces cards and information for mementos, so we created Raintale to tell stories…
        • Raintale uses MementoEmbed to generate raw HTML stories or publish them to services like Twitter.
        • Raintale takes a text file consisting of the URLs for the story.
            • This file could be generated by the Off-Topic Memento Toolkit
            • This file could also be generated by you! You can manually select memento URLs to insert into the story.
        S. M. Jones. “Raintale – A Storytelling Tool for Web Archives.” https://ws-dl.blogspot.com/2019/07/2019-07-11-raintale-storytelling-tool.html, 2019.
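
Producing the story file by hand or by script can be as simple as the sketch below, which also applies the "order pages by time" step from the algorithm. It assumes one memento URL per line, which the slide does not spell out, and the memento URLs and the function name are hypothetical placeholders.

```python
import io
from datetime import datetime


def write_story_file(mementos, out):
    """Write memento URLs to `out`, oldest first, one per line.

    mementos: iterable of (memento_datetime, urim) pairs.
    out: any writable text stream, such as an open file.
    ASSUMPTION: one URL per line is the expected layout.
    """
    for _, urim in sorted(mementos, key=lambda m: m[0]):
        out.write(urim + "\n")


# Hypothetical memento URLs used only for illustration.
selected = [
    (datetime(2015, 4, 1), "https://archive.example/20150401000000/http://example.com/c"),
    (datetime(2015, 2, 12), "https://archive.example/20150212000000/http://example.com/a"),
    (datetime(2015, 3, 5), "https://archive.example/20150305000000/http://example.com/b"),
]
buffer = io.StringIO()
write_story_file(selected, buffer)
story_lines = buffer.getvalue().splitlines()
```

Passing an open file instead of the in-memory buffer, for example `open("story-mementos.txt", "w")`, would produce a file on disk.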
  35. 35. Raintale uses templates to let you customize the look and destination media of your story
        Figures: extracted components as Bootstrap cards used by the fictional “My Archive”; MementoEmbed cards in Blogger; extracted components via Twitter thread; extracted components via MediaWiki pages.
        S. M. Jones. “Raintale – A Storytelling Tool for Web Archives.” https://ws-dl.blogspot.com/2019/07/2019-07-11-raintale-storytelling-tool.html, 2019.
  36. 36. Summary
        • We introduced social media storytelling
        • Storytelling with web archives consists of 2 actions:
            1. selecting k mementos from the collection of N mementos where k << N
            2. visualizing our selection of k mementos via a social media story
        • Per action 1, we covered:
            • the challenges of manually selecting mementos for stories
            • the steps of AlNoamany’s Algorithm for automatically selecting mementos
        • Per action 2, we highlighted:
            • the need for archive-aware cards via MementoEmbed
            • the Raintale storytelling tool for web archives
        • Thus, we covered how to conduct storytelling with web archives
  37. 37. For More Information on Dark and Stormy Archives
        • Dark and Stormy Archives Project: https://oduwsdl.github.io/dsa/
        • Laboratory exercises: https://github.com/oduwsdl/dsa/tree/master/tutorials/CEDWARC-2019
        • Produce the mementos for your story with the Off-Topic Memento Toolkit (OTMT):
            • Distribution Page: https://pypi.org/project/otmt/
            • Report Issues: https://github.com/oduwsdl/off-topic-memento-toolkit/issues
        • Create social cards of mementos with MementoEmbed:
            • Documentation: https://mementoembed.readthedocs.io/en/latest/
            • Report Issues: https://github.com/oduwsdl/MementoEmbed/issues
        • Generate and publish your story with Raintale:
            • Website: https://oduwsdl.github.io/raintale/
            • Documentation: https://raintale.readthedocs.io/en/latest/
            • Report Issues: https://github.com/oduwsdl/raintale/issues
  38. 38. Storytelling With Web Archives: An overview of building a small story from a large collection. Shawn M. Jones, Web Science and Digital Libraries Research Group, Old Dominion University. Thanks to: RE-70-18-0005-18
