This document discusses improving understanding of collections in web archives. It notes that web archive collections contain multiple versions of pages that allow observing changes over time. However, metadata for collections is often missing or inconsistent, making it difficult for users to understand collections. The document proposes visualizing representative mementos from collections as a summary to help users understand collections at low cost compared to manually reviewing all content. Prior work on visualizing collections and generating summaries is also discussed.
Combining Social Media Storytelling With Web Archives - Shawn Jones
(This was a guest presentation for CS6604 - Digital Libraries - Fall 2019 - taught by Edward A. Fox)
Web archive collections consist of thousands of documents. Manually making sense of collections at this scale is difficult. We propose using social media storytelling to aid in summarizing web archive collections. We discuss AlNoamany's Algorithm for generating a representative sample from these collections and highlight how to use the Dark and Stormy Archives toolkit.
Improving Understanding of Web Archive Collections Through Storytelling - PhD... - Shawn Jones
With web archives, journalists find evidence and information to back up their stories, historians store information for later users, and social scientists can study the actions of humans during specific time periods. These different groups gain value not only from creating their own collections but from using the collections of others. Web archive collections store the content that would otherwise be lost. As users, we currently have no efficient way of understanding what is in each collection without manually reviewing all of its items. Web archives intentionally consist of different versions of the same document. With these multiple versions, we can watch the evolution of a single resource over time, following the changes to an organization or how the public learns the details of an unfolding news story. As aggregations of archived web pages, or mementos, these collections become resources unto themselves. While past work has used mementos for studying how web resources change over time or evaluated the changes to various industries, there is still theoretical work to be done in improving the usability of web archive collections. Our goal is to help collection creators and the public at large to make better use of these collections through improvements to collection understanding. We build upon the work of AlNoamany by using visualizations from social media storytelling. Our goal is to produce a story for each web archive collection. Each story consists of representative mementos selected from the web archive collection that are then individually visualized as surrogates (e.g., screenshots, cards containing a summary of the page). This solution has the benefit of using visualization paradigms familiar to users. In this work, we provide background on the problem, analyze previous work in this area, and highlight our preliminary work before providing a plan for future research.
I presented this at iPres 2018. It consists of an analysis of some structural features found in Archive-It collections. We also categorize Archive-It collections into four semantic categories and then use the structural features to predict these categories with a Random Forest classifier.
I presented this paper at iPres 2018. Here, we introduce the Off-Topic Memento Toolkit, used to detect versions of web pages that have drifted off topic from the general topic of a collection.
Where Can We Post Stories Summarizing Web Archive Collections - Shawn Jones
This is a presentation of social media storytelling tools that were covered in a blog post written for the Web Science and Digital Libraries research group: http://ws-dl.blogspot.com/2017/08/2017-08-11-where-can-we-post-stories.html
Social Cards Probably Provide For Better Understanding Of Web Archive Collect... - Shawn Jones
Presented at ACM CIKM 2019. Used by a variety of researchers, web archive collections have become invaluable sources of evidence. If a researcher is presented with a web archive collection that they did not create, how do they know what is inside so that they can use it for their own research? Search engine results and social media links are represented as surrogates, small, easily digestible summaries of the underlying page. Search engines and social media have a different focus, and hence produce different surrogates than web archives. Search engine surrogates help a user answer the question "Will this link meet my information need?" Social media surrogates help a user decide "Should I click on this?" Our use case is subtly different. We hypothesize that groups of surrogates together are useful for summarizing a collection. We want to help users answer the question of "What does the underlying collection contain?" But which surrogate should we use? With Mechanical Turk participants, we evaluate six different surrogate types against each other. We find that the type of surrogate does not influence the time to complete the task we presented to the participants. Of particular interest are social cards, surrogates typically found on social media, and browser thumbnails, screen captures of web pages rendered in a browser. At p=0.0569 and p=0.0770, respectively, we find that social cards and social cards paired side-by-side with browser thumbnails probably provide better collection understanding than the surrogates currently used by the popular Archive-It web archiving platform. We measure user interactions with each surrogate and find that users interact with social cards less than other types. The results of this study have implications for our web archive summarization work, live web curation platforms, social media, and more.
Summarizing archival collections using storytelling techniques - Michael Nelson
Summarizing archival collections using storytelling techniques
Yasmin AlNoamany
Michele C. Weigle
Michael L. Nelson
Old Dominion University
Web Science & Digital Libraries Research Group
www.cs.odu.edu/~mln/
@phonedude_mln
Research Funded by IMLS LG-71-15-0077-15
Dodging the Memory Hole
Los Angeles, CA, 2016-10-14
Yasmin AlNoamany
Michele C. Weigle
Michael L. Nelson
Old Dominion University
Web Science and Digital Libraries Group
ws-dl.cs.odu.edu
@WebSciDL
This work is supported in part by IMLS LG-71-15-0077
Old Dominion University ECE Department Colloquium
2015-11-13
Storytelling for Summarizing Collections in Web Archives - Michael Nelson
Yasmin AlNoamany
Michele C. Weigle
Michael L. Nelson
Old Dominion University
Web Science and Digital Libraries Group
@WebSciDL
This work is supported in part by IMLS LG-71-15-0077
CNI Spring 2016
2016-04-05
Brief overview of linked data and RDF followed by use in libraries and archives. Originally delivered at OLITA Digital Odyssey 2014. Revised for the OLA Superconference 2015
The Power of Sharing Linked Data - ELAG 2014 Workshop - Richard Wallis
Presentation to set the scene and stimulate discussion in the Workshop "The Power of Sharing Linked Data" at ELAG 2014 - Bath University, UK June 10/11 2014
Improving Collection Understanding For Web Archives With Storytelling: Shinin... - Shawn Jones
Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections themselves are vast, some containing hundreds of thousands of documents. There are also thousands of collections, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with not enough metadata makes collection understanding an expensive proposition.
This dissertation establishes a five-process model to assist with web archive collection understanding. This model aims to automatically produce a social media story -- a visualization paradigm with which most web users are already familiar. Each social media story contains surrogates which are summaries of individual documents. These surrogates, when collected together, summarize the overall topic of the story. After applying our storytelling model, they summarize the topic of a web archive collection.
We develop and test a framework to select the best exemplars that represent a collection. We establish that algorithms produced from these primitives select exemplars that are otherwise undiscoverable using conventional search engine methods. We generate story metadata to improve the information scent of a story so users can understand it better. After an analysis showing that existing platforms perform poorly for web archives and a user study establishing the best surrogate type, we generate document metadata for the exemplars with machine learning. We then visualize the story and document metadata together and distribute it to satisfy the information needs of multiple personas who benefit from our model.
Our tools serve as a reference implementation of our Dark and Stormy Archives storytelling model. Hypercane selects exemplars and generates story metadata. MementoEmbed generates document metadata. Raintale visualizes and distributes the story based on the story metadata and the document metadata of these exemplars. By providing understanding at a glance, our stories save users the time and effort of reading thousands of documents and, most importantly, help them understand web archive collections.
Linked Data in Production: Moving Beyond Ontologies - David Newbury
Presented at the Coalition of Networked Information (CNI) Spring 2024 Project Briefings.
Over the past six years, Getty has been engaged in a project to transform and unify its complex digital infrastructure for cultural heritage information. One of the project's core goals was to provide validation of the impact and value of the use of linked data throughout this process. With museum, archival, media, and vocabularies in production and others underway, this session shares some of the practical implications (and pitfalls) of this work, particularly as it relates to interoperability, discovery, staffing, stakeholder engagement, and complexity management. The session will also share examples of how other organizations can streamline their own, similar work going forward.
web 2.0, library systems and the library system - lisld
The Web 2.0 environment is characterized by concentration and diffusion. Library services are not well matched to this environment: they are fragmented and difficult to mobilize in user workflows. This presentation analyzes this situation and suggests some directions.
The Unreasonable Effectiveness of Metadata - James Hendler
Invited talk at VIVO 2017 conference - explores the view of the semantic web as enriched metadata, and how that kind of information can be used in new and interesting ways.
This is an informal overview of Linked Data and the usage made of it for the project http://res.space (presented on August 11th 2016 during a team meeting)
Linked Open Data for Libraries, Archives, and Museums: An Aggregators View - Richard Urban
Presented at the American Association of Museums 2012
An accompanying handout can be found here:
http://dl.dropbox.com/u/3881880/aam2012/aam_handout.pdf
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy - PRELIDA Project
Peter Burnhill (EDINA, University of Edinburgh), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
OCLC Research @ U of Calgary: New directions for metadata workflows across li... - OCLC Research
Presentation used as scene setting for two days' worth of discussion around library, archive & museum convergence, metadata workflows and single search at the University of Calgary.
Library catalogues are the key tool for obtaining resources from libraries around the world. Anyone can browse almost all of the world's literature through WorldCat over the Internet.
A 2015 update to the 2012 "Data Big and Broad" talk - http://www.slideshare.net/jahendler/data-big-and-broad-oxford-2012 - extends coverage, brings more in context of recent "big data" work.
Similar to Improving Collection Understanding in Web Archives (20)
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea... - Shawn Jones
Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer vision and machine learning have led to the rise of reverse image search engines. Where conventional search engines accept a text query and return a set of document results, including images, a reverse image search accepts an image as a query and returns a set of images as results. This paper evaluates how well common reverse image search engines discover abstract images. We conducted an experiment leveraging images from Wikimedia Commons, a website known to be well indexed by Baidu, Bing, Google, and Yandex. We measure how difficult an image is to find again (retrievability), what percentage of images returned are relevant (precision), and the average number of results a visitor must review before finding the submitted image (mean reciprocal rank). When trying to discover the same image again among similar images, Yandex performs best. When searching for pages containing a specific image, Google and Yandex outperform the others when discovering photographs with precision scores ranging from 0.8191 to 0.8297, respectively. In both of these cases, Google and Yandex perform better with natural images than with abstract ones achieving a difference in retrievability as high as 54% between images in these categories. These results affect anyone applying common web search engines to search for technical documents that use abstract images.
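Two of the measures named in this abstract, precision and mean reciprocal rank, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names and the toy result lists are hypothetical, and the sketch assumes the standard textbook definitions: precision is the share of returned results that are relevant, and mean reciprocal rank is the mean of 1/rank of the first matching result over all queries, counting a miss as 0.

```python
from typing import Hashable, Sequence, Set, Tuple


def precision(results: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Fraction of returned results that are relevant (0.0 for an empty list)."""
    if not results:
        return 0.0
    hits = sum(1 for item in results if item in relevant)
    return hits / len(results)


def reciprocal_rank(results: Sequence[Hashable], target: Hashable) -> float:
    """1/rank of the first result equal to target, or 0.0 if it never appears."""
    for position, item in enumerate(results, start=1):
        if item == target:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    runs: Sequence[Tuple[Sequence[Hashable], Hashable]],
) -> float:
    """Average reciprocal rank over (results, target) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(results, target) for results, target in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical result lists; the identifiers stand in for returned images.
    print(precision(["a", "b", "c", "d"], {"a", "c"}))
    print(mean_reciprocal_rank([(["a", "b"], "a"), (["a", "b"], "b")]))
```

Under these definitions, a higher mean reciprocal rank means the submitted image tends to appear nearer the top of the result list.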
DIRA 2022 Poster -- Abstract Images Have Different Levels of Retrievability P... - Shawn Jones
Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer vision and machine learning have led to the rise of reverse image search engines. Where conventional search engines accept a text query and return a set of document results, including images, a reverse image search accepts an image as a query and returns a set of images as results. This paper evaluates how well common reverse image search engines discover abstract images. We conducted an experiment leveraging images from Wikimedia Commons, a website known to be well indexed by Baidu, Bing, Google, and Yandex. We measure how difficult an image is to find again (retrievability), what percentage of images returned are relevant (precision), and the average number of results a visitor must review before finding the submitted image (mean reciprocal rank). When trying to discover the same image again among similar images, Yandex performs best. When searching for pages containing a specific image, Google and Yandex outperform the others when discovering photographs with precision scores ranging from 0.8191 to 0.8297, respectively. In both of these cases, Google and Yandex perform better with natural images than with abstract ones achieving a difference in retrievability as high as 54% between images in these categories. These results affect anyone applying common web search engines to search for technical documents that use abstract images.
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G... - Shawn Jones
In a perfect world, all articles consistently contain sufficient metadata to describe the resource. We know this is not the reality, so we are motivated to investigate the evolution of the metadata that is present when authors and publishers supply their own. Because applying metadata takes time, we recognize that each news article author has a limited metadata budget with which to spend their time and effort. How are they spending this budget? What are the top metadata categories in use? How did they grow over time? What purpose do they serve? We also recognize that not all metadata fields are used equally. What is the growth of individual fields over time? Which fields experienced the fastest adoption? In this paper, we review 227,726 HTML news articles from 29 outlets captured by the Internet Archive between 1998 and 2016. Upon reviewing the metadata fields in each article, we discovered that 2010 began a metadata renaissance as publishers embraced metadata for improved search engine ranking, search engine tracking, social media tracking, and social media sharing. When analyzing individual fields, we find that one application of metadata stands out above all others: social cards -- the cards generated by platforms like Twitter when one shares a URL. Once a metadata standard was established for cards in 2010, its fields were adopted by 20% of articles in the first year and reached more than 95% adoption by 2016. This rate of adoption surpasses efforts like schema.org and Dublin Core by a fair margin. When confronted with these results on how news publishers spend their metadata budget, we must conclude that it is all about the cards.
Automatically Selecting Striking Images for Social Cards - Shawn Jones
To allow previewing a web page, social media platforms have developed social cards: visualizations consisting of vital information about the underlying resource. At a minimum, social cards often include features such as the web resource's title, text summary, striking image, and domain name. News and scholarly articles on the web are frequently subject to social card creation when being shared on social media. However, we noticed that not all web resources offer sufficient metadata elements to enable appealing social cards. For example, the COVID-19 emergency has made it clear that scholarly articles, in particular, are at an aesthetic disadvantage in social media platforms when compared to their often more flashy disinformation rivals. Also, social cards are often not generated correctly for archived web resources, including pages that lack or predate standards for specifying striking images. With these observations, we are motivated to quantify the levels of inclusion of required metadata in web resources, its evolution over time for archived resources, and create and evaluate an algorithm to automatically select a striking image for social cards. We find that more than 40% of archived news articles sampled from the NEWSROOM dataset and 22% of scholarly articles sampled from the PubMed Central dataset fail to supply striking images. We demonstrate that we can automatically predict the striking image with a Precision@1 of 0.83 for news articles from NEWSROOM and 0.78 for scholarly articles from the open access journal PLOS ONE.
A presentation of the work I had done with the Research Library Prototyping Team at Los Alamos National Laboratory given to the local chapter of the Special Libraries Association in New Mexico.
Avoiding Spoilers On MediaWiki Fan Sites Using Memento - Shawn Jones
A variety of fan-based wikis about episodic fiction (e.g., television shows, novels, movies) exist on the World Wide Web. These wikis provide a wealth of information about complex stories, but if readers are behind in their viewing they run the risk of encountering "spoilers" -- information that gives away key plot points before the intended time of the show's writers. Enterprising readers might browse the wiki in a web archive so as to view the page prior to a specific episode date and thereby avoid spoilers. Unfortunately, due to how web archives choose the "best" page, it is still possible to see spoilers (especially in sparse archives).
In this presentation we highlight the issues with avoiding spoilers using Memento. We show that for a sample of fan wiki pages there is as much as a 66% chance of encountering a spoiler. We also find, using logs from the Internet Archive, that 19% of actual requests to the Wayback Machine for wikia.com end in spoilers. We suggest a different heuristic for use with wikis and unveil the Memento MediaWiki Extension as a solution.
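The difference between two ways of choosing a capture can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the Memento MediaWiki Extension's code. It assumes that the "best" page means the capture closest in time to the requested date, and that an alternative heuristic would return the most recent capture that does not postdate the requested date. The function names and the sample dates are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence


def closest_capture(
    captures: Sequence[datetime], target: datetime
) -> Optional[datetime]:
    """Capture with the smallest time distance to target, before or after it."""
    if not captures:
        return None
    return min(captures, key=lambda capture: abs(capture - target))


def latest_capture_not_after(
    captures: Sequence[datetime], target: datetime
) -> Optional[datetime]:
    """Most recent capture at or before target; captures must be sorted ascending."""
    index = bisect_right(captures, target)
    return captures[index - 1] if index > 0 else None


if __name__ == "__main__":
    # A sparse, hypothetical archive with only two captures of a wiki page.
    captures = [datetime(2014, 1, 1), datetime(2014, 3, 1)]
    # The reader wants the page as it was on this date.
    wanted = datetime(2014, 2, 20)
    print(closest_capture(captures, wanted))
    print(latest_capture_not_after(captures, wanted))
```

With a sparse archive like this one, the closest capture can postdate the requested date and may therefore contain later plot details, while the second function never returns a capture made after that date.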
Reconstructing the past with media wiki - Shawn Jones
The Internet Archive attempts to reconstruct web pages via snapshots (Mementos) that are taken of pages at various points in time. Many pages change more frequently than the Internet Archive can capture them, meaning that some revisions of a given web page are lost forever. Mediawiki, however, has all past revisions of a given page, and also its associated external resources. This inspired the development of the Memento Mediawiki Extension as an improvement over the Internet Archive's "drive by" method of digital preservation where Mediawiki sites are involved.
While working on the Memento Mediawiki Extension, effort was put into reconstructing past revisions of each Wiki page. The existing software reconstructs the page text as per RFC 7089, but does not try to pull in past versions of images, JavaScript, CSS, and other external resources, because Mediawiki, as it exists, makes it difficult or impossible to load these resources at page generation time.
This curated talk will explore the problems of page reconstruction on the main web and detail the issues within the Mediawiki code that currently prevent and/or make it difficult to reconstruct the page in its totality as it looked at that revision.
2. @shawnmjones @WebSciDL
Researchers Create Their Own Web Archive Collections
Archived web pages, or mementos, are used by journalists, sociologists, and historians.
Tucson Shootings, 2008 Olympics, University of Utah
3. @shawnmjones @WebSciDL
Web Archive Collections Have Many Versions of the Same Page
University of Utah Office of Admissions, from the University of Utah Web Archive Collection: 2013, 2015, 2018
Tumblr Black Lives Matter Blog, from the #blacklivesmatter Collection: 2/12/2015, 3/5/2015, 4/1/2015
4. @shawnmjones @WebSciDL
Different Versions Allow Us to See an Unfolding News Story
Memento from April 19, 2013 17:12: Searching for Suspects, City on Lockdown
Memento from April 19, 2013 17:59: Officer Donahue in hospital, Lockdown loosened, Will the Red Sox game be cancelled?
Memento from April 11, 2013 2:24: Suspect Found, Officer Collier Lost Life, Obama speaks
6. @shawnmjones @WebSciDL
Archive-It Provides For Easy Collection Creation
Archive-It was created by the Internet Archive as a consistent user interface for constructing web archive collections. Curators can supply live web resources as seeds and establish crawling schedules of those seeds to create mementos.
7. @shawnmjones @WebSciDL
The Problem of Collection Understanding
What is the difference between these two Archive-It collections about the South Louisiana Flood of 2016?
Which one should a researcher use?
8. @shawnmjones @WebSciDL
31 Archive-It collections match the search query “human rights”
How are they different from each other?
Which one is best for my needs?
11. @shawnmjones @WebSciDL
But, alas the metadata does not help
Because metadata is optional it is not always present.
Metadata on Archive-It collections:
• many different curators
• different organizations
• different content standards
• different rules of interpretation
• it is inconsistently applied
This means that a user cannot reliably compare metadata fields to understand the differences between collections.
Examples shown: 132,599 seeds with no metadata; 9 seeds with metadata
Paradox of metadata: More seeds = more effort
12. @shawnmjones @WebSciDL
Reviewing mementos manually is costly
This collection has 132,599 seeds, many with multiple mementos
Some collections have 1000s of seeds
Each seed can have many mementos
In some cases, this can require reviewing 100,000+ documents to understand the collection
13. @shawnmjones @WebSciDL
More Archive-It collections are added every year
More than 8000 collections exist as of the end of 2016
15. @shawnmjones @WebSciDL
The problem, summarized
There are multiple collections about the same concept.
The metadata for each collection is non-existent, or inconsistently applied.
Many collections have 1000s of seeds with multiple mementos.
There are more than 8000 collections.
Human review of these mementos for collection understanding is an expensive proposition.
16. @shawnmjones @WebSciDL
The proposal: a visualization made of representative mementos
Our visualization is a summary that will act like an abstract
Pirolli and Card’s Information Foraging Theory:
• maximize the value of the information gained from our summaries
• minimize the cost of interacting with the collection
• ensure that our representative mementos have good information scent: they contain cues that the memento will address a user’s needs
From this: 318 seeds with 2421 mementos
To something like this: a visualization of ~28 social cards
Peter Pirolli. 2005. Rational Analyses of Information Foraging on the Web. Cognitive Science 29, 3 (May 2005), 343–373. DOI:10.1207/s15516709cog0000_20
18. @shawnmjones @WebSciDL
Looking at Archive-It collections from the outside
• Curators select seeds, which are captured as seed mementos
• Deep mementos are created from other pages linked to seeds
• Each seed has a corresponding TimeMap listing all of that seed’s mementos and capture times, their memento-datetimes
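To make the TimeMap idea concrete, here is a minimal sketch of reading one, assuming the link-format serialization described in RFC 7089. The sample TimeMap and every URI in it are invented for illustration, and parse_timemap is a hypothetical helper, not part of any toolkit named in these slides.

```python
import re
from email.utils import parsedate_to_datetime

# Hypothetical TimeMap in link format; all URIs are invented for illustration.
SAMPLE_TIMEMAP = """\
<http://example.com/>; rel="original",
<http://archive.example/timemap/link/http://example.com/>; rel="self"; type="application/link-format",
<http://archive.example/20130419175900/http://example.com/>; rel="last memento"; datetime="Fri, 19 Apr 2013 17:59:00 GMT",
<http://archive.example/20130419171200/http://example.com/>; rel="first memento"; datetime="Fri, 19 Apr 2013 17:12:00 GMT"
"""

# One entry: <uri> followed by zero or more ; name="value" attributes.
ENTRY = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text):
    """Return (memento URI, memento-datetime) pairs sorted by capture time."""
    mementos = []
    for uri, attr_text in ENTRY.findall(text):
        attrs = dict(ATTR.findall(attr_text))
        # rel can hold several space-separated values, e.g. "first memento".
        if "memento" in attrs.get("rel", "").split():
            mementos.append((uri, parsedate_to_datetime(attrs["datetime"])))
    return sorted(mementos, key=lambda pair: pair[1])


mementos = parse_timemap(SAMPLE_TIMEMAP)
for uri, captured in mementos:
    print(captured.isoformat(), uri)
```

Sorting by memento-datetime gives the ordered list of captures for one seed, which is the starting point for any per-seed analysis over time.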
19. @shawnmjones @WebSciDL
Document collections have aspects
Metadata on a publication:
• used as a surrogate for understanding
• answers anticipated questions
Aspects:
• The central concepts of the corpus
• For example, aspects about a disaster: time, place, cause, countermeasures
Aspects correspond to the questions that a user might have about a collection
Renxian Zhang, Wenjie Li, and Dehong Gao. 2012. Generating Coherent Summaries with Textual Aspects. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI’12), 1727–1733.
20. @shawnmjones @WebSciDL
How can we surface aspects?
Named Entity Recognition can answer questions of who or where?
Natural Language Processing can answer questions of what time period?
Topic modeling can surface general concepts from the corpus
And we have to be cognizant of these concepts over time
Examples shown: Archive-It Collection 8121, “The Obama White House”; Archive-It Collection 8513, “Donald J Trump White House”
21. @shawnmjones @WebSciDL
Visualizing web resources (surrogates)
Thumbnail (example from UK Web Archive)
Text snippet (example from Bing)
Social Card (example from Facebook)
Text + Thumbnail (example from Internet Archive)
23. @shawnmjones @WebSciDL
Which surrogate is best for web resources?
Studies on visualizing web resources have focused primarily on determining search engine result relevance and not collection understanding.
• Li (2008): social cards > text snippets in performance
• Dziadosz (2002): text + thumbnail > text snippet; text snippet > thumbnail in performance
• Woodruff (2001): thumbnails > text snippets in performance
• Teevan (2009): text snippets > thumbnails in performance
• Aula (2010): text snippets ~= thumbnails in performance
• Loumakis (2011): text snippets ~= social cards in performance; social cards > text snippets in information scent and user preference
• Capra (2013): social cards > text snippets in performance (barely statistically significant)
• Al Maqbali (2010): text + thumbnail ~= social card; text snippet ~= social card; text + thumbnail ~= text snippet in performance
https://ws-dl.blogspot.com/2018/04/2018-04-24-lets-get-visual-and-examine.html
24. @shawnmjones @WebSciDL
Visualizing Archive-It Collections
Other attempts at visualizing Archive-It collections tried to visualize everything.
Kalpesh Padia, Yasmin AlNoamany, and Michele C. Weigle. 2012. Visualizing digital collections at archive-it. In Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries (JCDL ’12), 15–18. DOI:10.1145/2232817.2232821
http://ws-dl.blogspot.com/2012/08/2012-08-10-ms-thesis-visualizing.html
25. @shawnmjones @WebSciDL
Prior work by AlNoamany
Visualized summaries via the storytelling platform Storify
Proved that test participants could not detect the difference between her automated summaries and human-generated summaries
Informed by: characteristics of human-generated stories; characteristics of Archive-It collections
Steps:
1. Exclude duplicates
2. Exclude off-topic pages
3. Exclude non-English language pages
4. Dynamically slice the collection
5. Cluster the pages in each slice
6. Select high-quality pages from each cluster
7. Order pages by time
8. Visualize
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson. 2017. Generating Stories From Archived Collections. In Proceedings of the 2017 ACM on Web Science Conference, 309–318. DOI:10.1145/3091478.3091508
http://ws-dl.blogspot.com/2016/09/2016-09-20-promising-scene-at-end-of.html
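The ordering of the steps on this slide can be sketched as a small pipeline. This is only an illustration of the step order under loud assumptions: the slide does not say how each step is measured, so word-overlap (Jaccard) similarity, equal-size time slices, a precomputed quality score, and every name below (Memento, summarize, the thresholds) are stand-ins invented here, not AlNoamany's actual measures.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Memento:
    uri: str
    captured: datetime
    text: str
    language: str   # assumed to be detected upstream
    quality: float  # assumed precomputed score, higher is better


def jaccard(a, b):
    """Word-overlap similarity; a stand-in for a real similarity measure."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def summarize(mementos, topic_text, slice_count=2,
              on_topic_threshold=0.2, cluster_threshold=0.3):
    ordered = sorted(mementos, key=lambda m: m.captured)
    # 1. Exclude duplicates (stand-in: identical text).
    seen, unique = set(), []
    for m in ordered:
        if m.text not in seen:
            seen.add(m.text)
            unique.append(m)
    # 2. Exclude off-topic pages (stand-in: overlap with a topic description).
    on_topic = [m for m in unique
                if jaccard(m.text, topic_text) >= on_topic_threshold]
    # 3. Exclude non-English pages.
    english = [m for m in on_topic if m.language == "en"]
    if not english:
        return []
    # 4. Slice the collection by time (stand-in: equal-size slices).
    size = -(-len(english) // slice_count)  # ceiling division
    slices = [english[i:i + size] for i in range(0, len(english), size)]
    story = []
    for chunk in slices:
        # 5. Cluster the pages in each slice (stand-in: greedy single pass).
        clusters = []
        for m in chunk:
            for cluster in clusters:
                if jaccard(m.text, cluster[0].text) >= cluster_threshold:
                    cluster.append(m)
                    break
            else:
                clusters.append([m])
        # 6. Select the highest-quality page from each cluster.
        story.extend(max(c, key=lambda m: m.quality) for c in clusters)
    # 7. Order pages by time; visualization (step 8) happens downstream.
    return sorted(story, key=lambda m: m.captured)


sample = [
    Memento("a", datetime(2013, 4, 19, 17, 12), "boston lockdown search for suspects", "en", 0.9),
    Memento("b", datetime(2013, 4, 19, 17, 30), "boston lockdown search for suspects", "en", 0.5),
    Memento("c", datetime(2013, 4, 19, 17, 59), "boston lockdown loosened suspects still sought", "en", 0.7),
    Memento("d", datetime(2013, 4, 19, 18, 10), "domain for sale buy now", "en", 0.9),
    Memento("e", datetime(2013, 4, 20, 2, 24), "boston marathon suspects arrêté", "fr", 0.8),
    Memento("f", datetime(2013, 4, 20, 2, 24), "boston marathon bombing suspects found", "en", 0.6),
]
story = summarize(sample, "boston marathon bombing suspects lockdown")
print([m.uri for m in story])
```

In this toy input, b is dropped as a duplicate, d as off-topic, and e as non-English; c falls into the same cluster as a and loses on quality, so the story keeps one page per cluster in time order.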
26. @shawnmjones @WebSciDL
Prior work by AlNoamany
Visualized summaries via the storytelling platform Storify – which is no longer in service
http://ws-dl.blogspot.com/2017/12/2017-12-14-storify-will-be-gone-soon-so.html
27. @shawnmjones @WebSciDL
Prior work by AlNoamany
Did not evaluate if the resulting summaries were effective tools for collection understanding
Focused on summarizing collections about events
There are other types of Archive-It collections
29. @shawnmjones @WebSciDL
Growth curves for understanding collection creation behavior
• Skew of the collection’s holdings: indicates temporality of collection
• Skew of the curatorial involvement with the collection: when seeds were added, when interest was lost or regained
Figure: example growth curves labeled with positive and negative skew
Shawn M. Jones, Alexander Nwala, Michele C. Weigle, and Michael L. Nelson. 2018. The Many Shapes of Archive-It: Characteristics of Archive-It. In International Conference on Digital Preservation (iPRES) 2018.
30. @shawnmjones @WebSciDL
Structural features of Archive-It collections
• difference between seed curve AUC and diagonal
• difference between seed memento curve AUC and diagonal
• difference between seed memento curve AUC and seed curve AUC
• number of seeds
• number of mementos
• seed URI domain diversity
• seed URI path depth diversity
• most frequent seed URI path depth
• % query string usage in seed URIs
• lifespan of collection
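A few of these features can be sketched from a list of seed URIs and timestamps. The slide does not define them precisely, so this sketch assumes that diversity means unique values divided by number of seeds, and that a growth curve plots the fraction of items added against the fraction of the collection lifespan elapsed, with the diagonal having an area of 0.5. All function names and sample URIs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def path_depth(uri):
    """Number of non-empty path segments in a URI."""
    return len([part for part in urlparse(uri).path.split("/") if part])


def seed_uri_features(seed_uris):
    """Structural features of seed URIs; diversity = unique / total (assumption)."""
    n = len(seed_uris)
    domains = {urlparse(u).netloc.lower() for u in seed_uris}
    depths = [path_depth(u) for u in seed_uris]
    with_query = sum(1 for u in seed_uris if urlparse(u).query)
    return {
        "number_of_seeds": n,
        "domain_diversity": len(domains) / n,
        "path_depth_diversity": len(set(depths)) / n,
        "most_frequent_path_depth": Counter(depths).most_common(1)[0][0],
        "pct_query_string": 100 * with_query / n,
    }


def growth_curve_auc_difference(timestamps):
    """Area under the normalized growth curve minus the diagonal's area (0.5).

    timestamps: sorted numeric times at which items were added.
    """
    start, span = timestamps[0], timestamps[-1] - timestamps[0]
    if span == 0:
        return 0.0
    n = len(timestamps)
    points = [(0.0, 0.0)] + [((t - start) / span, (i + 1) / n)
                             for i, t in enumerate(timestamps)]
    # Trapezoidal rule over consecutive points.
    auc = sum((x2 - x1) * (y1 + y2) / 2
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return auc - 0.5


seeds = [
    "http://example.com/",
    "http://example.com/a/b?x=1",
    "http://example.org/a",
    "http://example.org/b",
]
features = seed_uri_features(seeds)
early = growth_curve_auc_difference([0, 1, 2, 10])  # most items added early
late = growth_curve_auc_difference([0, 8, 9, 10])   # most items added late
print(features, early, late)
```

Under these assumptions, a collection whose items arrive early in its lifespan sits above the diagonal and one whose items arrive late sits below it, which is the kind of signal the growth-curve features are meant to capture.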
31. @shawnmjones @WebSciDL
Semantic categories of Archive-It collections
Self-Archiving; Subject-based; Time Bounded – Expected; Time Bounded – Spontaneous
In a study of 3,382 Archive-It collections
37. @shawnmjones @WebSciDL
Semantic categories of Archive-It collections
Self-Archiving: 54.1% of collections
Subject-based: 27.6% of collections
Time Bounded – Expected: 14.1% of collections
Time Bounded – Spontaneous: 4.2% of collections
Some evaluated by AlNoamany
Using the structural features listed on slide 30, we can predict these semantic categories with a Random Forest classifier with F1 = 0.720
39. @shawnmjones @WebSciDL
Developing a Flexible Framework
Dark and Stormy Archives (DSA) 2.0
A framework based on AlNoamany’s work
Two concepts are embodied in this framework:
1. Selecting representative mementos
2. Visualizing those mementos
Components shown in the framework diagram: Off-Topic Memento Toolkit; Representative Memento Selection Utilities; Archive-It Utilities; MementoEmbed; DSA Visualization Interface
Input: Web Archive Collection. Output: Visualized Summary.
Shawn M. Jones, Michele C. Weigle, and Michael L. Nelson. 2018. The Off-Topic Memento Toolkit. In International Conference on Digital Preservation (iPRES) 2018.
40. @shawnmjones @WebSciDL
Not just Archive-It
Our methods will be applicable to any web archive collection, like those developed by Rhizome’s Webrecorder.
44. @shawnmjones @WebSciDL
Evaluation
1. Choose target collections for study
2. Develop user tasks for each collection
3. How well do users complete the tasks?
Example user tasks: Who is X? Where is Y? When does Z take place?
45. @shawnmjones @WebSciDL
RQ1: How do we select representative mementos for the different semantic types of collections?
Summarizing a collection involves:
1. Grouping the mementos by their commonalities
2. Selecting the highest quality mementos from each group
Different semantic categories may require different algorithms
We want to reuse existing tools where possible:
• Stanford NLP
• Archives Unleashed Toolkit
• gensim
• spaCy
46. @shawnmjones @WebSciDL
RQ1 Evaluation
1. How many user tasks were addressed by the mementos chosen? How many user tasks failed?
2. How many mementos produced are not useful for any user task?
3. Which algorithm surfaces aspects satisfying the highest mean number of user tasks for a given collection type?
4. What is the mean minimum number of mementos necessary to address the most user tasks?
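The first, second, and fourth questions can be illustrated with a small calculation. This is a sketch under assumptions: it supposes that a study has already recorded which user tasks each selected memento addresses, and it approximates the minimum number of mementos with a greedy set-cover heuristic, which is not guaranteed to be optimal. The task names, memento labels, and function names are invented.

```python
def evaluate_selection(task_ids, addressed_by):
    """Count tasks addressed and failed, and list mementos useful for no task.

    addressed_by maps a memento label to the set of task ids it addresses.
    """
    tasks = set(task_ids)
    covered = set().union(*addressed_by.values()) & tasks
    unused = sorted(label for label, hits in addressed_by.items()
                    if not (hits & tasks))
    return {
        "addressed": len(covered),
        "failed": len(tasks - covered),
        "unused_mementos": unused,
    }


def greedy_minimum_mementos(task_ids, addressed_by):
    """Greedy approximation of the smallest memento set covering the tasks."""
    remaining = set(task_ids)
    chosen = []
    while remaining and addressed_by:
        # Pick the memento that addresses the most still-uncovered tasks.
        best = max(addressed_by, key=lambda label: len(addressed_by[label] & remaining))
        gain = addressed_by[best] & remaining
        if not gain:
            break  # the remaining tasks cannot be addressed by any memento
        chosen.append(best)
        remaining -= gain
    return chosen


tasks = ["who", "where", "when", "why"]
judgments = {
    "m1": {"who", "where"},
    "m2": {"where"},
    "m3": set(),
    "m4": {"when"},
}
report = evaluate_selection(tasks, judgments)
minimal = greedy_minimum_mementos(tasks, judgments)
print(report, minimal)
```

Averaging these numbers over the collections of one semantic category would give the per-algorithm means that the third and fourth questions ask about.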
47. @shawnmjones @WebSciDL
RQ2: What visualizations (surrogates) work best for understanding individual mementos?
There are many different possibilities for surrogates
Does the choice in surrogate change depending on the collection’s semantic category?
48. @shawnmjones @WebSciDL
RQ2 Evaluation
1. Does the depth, domain, or category of the URI play a factor in which surrogate performs better?
2. Do different surrogates work better for different semantic categories?
3. For social cards, which elements of the social card need to be present to understand the underlying memento?
4. For thumbnails, what size thumbnail works best for understanding? How much of the web page needs to be rendered for a thumbnail to be useful for understanding?
49. @shawnmjones @WebSciDL
RQ3: How well do visualizations of groups of mementos produced by different summarization algorithms work for collection understanding?
Once we have:
• Candidate summarization algorithms
• Evaluated surrogates for individual mementos
We can then evaluate the combination of summarization and visualization.
There are many options:
• arranging surrogates
• headings
• metadata
Diagram: RQ1 (Summarization Algorithms) and RQ2 (Visualization Elements) lead into RQ3 (Visualization of Summary)
50. @shawnmjones @WebSciDL
RQ3 Evaluation
1. How many user tasks are addressed by the visualization chosen? How many fail?
2. How many visualized mementos were not needed for any given user task?
3. Given an aspect of the collection, can the user address a user task concerning it by visually scanning the visualization?
4. Given multiple aspects of the collection, can the user successfully compare different individual memento visualizations to address a user task?
5. Which visualizations work better for certain semantic types?
51. @shawnmjones @WebSciDL
Research Plan
Timeline: 03/2017, 05/2017, 08/2017, 11/2017, 02/2018, 05/2018, 08/2018, 11/2018, 02/2019, 05/2019, 08/2019, 11/2019, 02/2020, 05/2020
• Preliminary work
• Implement a flexible framework
• Addressing RQ1: Develop new algorithms for selecting representative mementos
• Addressing RQ2: Evaluation of individual memento visualizations
• Dissertation Candidacy Exam
• Addressing RQ1: Evaluation of algorithms for selecting representative mementos
• Addressing RQ3: Develop candidate visualizations of groups of mementos
• Addressing RQ3: Evaluation of visualization of groups of mementos
• Dissertation Composition
• Dissertation Defense
Venues shown: SIGIR 2020, CHI 2020, iPres 2018, iPres 2019, JCDL 2019, CHI 2020, JCDL 2020, JCDL 2021
53. @shawnmjones @WebSciDL
Summary
Collection understanding is a problem with web archive collections
• Inconsistent metadata
• 1000s of mementos
• 1000s of collections
• Costly for human review
We intend to produce a visualization that serves as an abstract to assist in collection understanding
Prior work in this area:
• did not evaluate how well this method works for collection understanding
• only focused on collections about events
54. @shawnmjones @WebSciDL
Contributions
Existing work:
Semantic categories of web archive collections in Archive-It
Categories can be predicted by using structural features
Most collections are not about events
Future work:
Investigate new ways of surfacing representative mementos
Contribute knowledge of collection understanding in web archive collections
Which visualization methods work best for understanding mementos in a collection
New algorithms for use in collection understanding
Thanks: