Social Cards Probably Provide For Better Understanding Of Web Archive Collections

Shawn Jones
Shawn JonesResearch Assistant at Old Dominion University
Shawn M. Jones Michele C. Weigle Michael L. Nelson
sjone@cs.odu.edu mweigle@cs.odu.edu mln@cs.odu.edu
@shawnmjones @weiglemc @phonedude_mln
Social Cards Probably Provide
For Better Understanding Of
Web Archive Collections
Old Dominion University
Web Science and Digital Libraries Research Group
@WebSciDL
Thanks to:
@shawnmjones @WebSciDL
Curators Build Web Archive Collections
2
Archived web pages, or mementos, are used by journalists, sociologists, and historians.
Tucson Shootings2008 OlympicsUniversity of Utah
@shawnmjones @WebSciDL
Web archive collections consist of mementos
– different versions of the same page over time
3
2013
2015
2018
University of Utah Office of Admissions
from the University of Utah Web Archive Collection
4/1/2015
3/5/2015
Tumblr Black Lives Matter Blog
from the #blacklivesmatter Collection
2/12/2015
@shawnmjones @WebSciDL
Archive-It allows curators to easily create
web archive collections
Archive-It was created by the Internet Archive as a consistent user interface for constructing web archive
collections. Curators can supply live web resources as seeds and establish crawling schedules of those
seeds to create mementos.
4
@shawnmjones @WebSciDL
… and these collections are used by other researchers
5
The collection curator is not the only user of the
collection!
These collections live a life after their curator
has stopped adding to them.
@shawnmjones @WebSciDL
The problem…
 There are multiple collections
about the same concept.
 The metadata for each collection is
non-existent, or inconsistently
applied.
 Many collections have
1000s of seeds with multiple
mementos.
 There are more than 8000
collections.
 Human review of these
mementos for collection
understanding is an expensive
proposition.
6
@shawnmjones @WebSciDL
Collections provide web pages based on a theme – we can
summarize those collections by using the best mementos
supporting that theme…
7
Web sites may group some
content, but curators theme some
of this content into collections
which we can reduce to stories.
Our stories consist of ~28
representative mementos, making
them much smaller than the
collections from which they come.
@shawnmjones @WebSciDL
Surrogates provide a visual summary of the content
behind a URI…
8
https://www.google.com/maps/dir/Old+Dominion+University,+Norfolk,+VA/Los+Alamos+National+Laboratory,+New+Mexico/@35 .3644614,-
109.356967,4z/data=!3m1!4b1!4m13!4m12!1m5!1m1!1 s0x89ba99ad24ba3945:0xcd2bdc432c4e4bac!2m2!1d-76.3067676!2d36
.8855515!1m5!1m1!1s0x87181246af22e765:0x7f5a90170c5df1b4!2m2!1 d-106.287162!2d35.8440582
Long URI:
The same URI represented by a
browser thumbnail surrogate:
The same URI represented by a
social card surrogate:
@shawnmjones @WebSciDL
Social media storytelling uses surrogates to provide a
“summary of summaries”
9
2 resources are shown from this Wakelet story6 resources are shown from this Storify story
Each surrogate summarizes a
web resource.
Each story groups the
surrogates, summarizing the
topic.
We want to use this technique
to summarize web archive
collections because users are
already familiar with this
visualization paradigm.
@shawnmjones @WebSciDL
We want to help the user answer the question of
“What does the underlying collection contain?”
10
There are many types of surrogates. Which one
best conveys the concepts of the underlying
collection?
If we already have the mementos describing a
collection, we need to visualize them with some
type of surrogate.
@shawnmjones @WebSciDL
Storytelling implemented
11
From this: 31,863 seed mementos of 955 seeds,
potentially 31K+ documents to review
To this: a story built from a representative sample of
~30 mementos visualized as surrogates
@shawnmjones @WebSciDL
Archive-It provides fields for metadata
12
Collection-wide metadata Metadata on individual seeds
Dublin
Core
+
Custom
Fields
@shawnmjones @WebSciDL
But, alas the metadata does not help…
13
A variety of metadata fields can be used for surrogates,
with the most popular field being Title.
The metadata on seeds and collections is
optional.
As the number of seeds increases, the less
metadata is present per seed.Even though this is the case, 54.60% of Archive-It
surrogates consist of only the seed URL and capture
dates.
@shawnmjones @WebSciDL
Understanding Archive-It Collections… Manually
14
Step 1: Decide upon search terms
and find a collection
via the search engine
Step 0:
Decide on your
information need
Step 2: Select a collection from
the many results available
@shawnmjones @WebSciDL
Understanding Archive-It Collections… Manually
15
Step 3: View the seed-centric collection page
Step 4: Choose a seed from this list that may meet the
information need. This collection contains 1,149 seeds.
How do you choose one without metadata to guide
you?
@shawnmjones @WebSciDL
Understanding Archive-It Collections… Manually
16
Step 5: View the mementos associated with that
seed.
This seed has 923 seed mementos. There are
more mementos linked from these mementos
Step 6: Read the text of the memento to learn about its
contents.
@shawnmjones @WebSciDL
Understanding Archive-It Collections… Manually
17
Step 7: Follow links and review more
content until you reach a page that was
not archived.
Step 8: Repeat steps 4-7 until enough information about has
been amassed to determine if the collection meets your
information need. This collection contained 80,484 seed
mementos.
@shawnmjones @WebSciDL
In spite of the lack of information in Archive-It surrogates,
some information can be gleaned from the URI…
18
Thus, surrogates like these may still yield enough
information for collection understanding.
@shawnmjones @WebSciDL
Existing surrogate services create a confusing
experience for mementos
19
Who published these resources?
Archive-It?
CNN?
Is the story author sharing fake news?
S. M. Jones. “A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages.” https://ws-
dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html, 2018.
embed.rocks social card
embed.ly social card
@shawnmjones @WebSciDL
Neither social media services nor card services were
reliable for storytelling, so we created MementoEmbed…
20
Information in the
MementoEmbed social
card is separated to
avoid issues of
confusion about
attribution.
MementoEmbed is
archive-aware. It can
locate information
about the memento
that is not available in
other cards.
S. M. Jones. “A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages.” https://ws-
dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html, 2018.
@shawnmjones @WebSciDL
We reused 4 stories generated by human Archive-It
curators from their own collections…
21
@shawnmjones @WebSciDL
We shared these stories, rendered using 6 different surrogate
types, with MT participants…
22
Archive-It like
Social Card
Browser thumbnails
Social Card With
Thumbnail as Image
(sc/t)
Social Card With
Thumbnail to
Right (sc+t)
Social Card with
Thumbnail on
Hover (sc^t)
• 4 stories of 15-17 mementos selected by human
Archive-It curators from their collections
• 6 different surrogate types
• Social cards and thumbnails were produced by
MementoEmbed
• 24 different story-surrogate combinations
• 120 MT participants
• They were given 30 seconds to view each story
@shawnmjones @WebSciDL
And then we asked them which of 2 of 6 mementos come
from the same collection…
23
• Each participant was shown a list of 6 surrogates of the same type as the story they just viewed.
• They were asked to choose the 2 that they thought came from the same collection.
• They were given as much time as they wished to answer the question.
• This is similar to the Sentence Verification Task from reading comprehension studies.
@shawnmjones @WebSciDL
Response times per surrogate had interesting means,
but p-values were not statistically significant at p < 0.05
24
p = 0.190
p = 0.202
@shawnmjones @WebSciDL
Correct answers per surrogate indicate that social
cards probably outperform the Archive-It surrogate
25
0 0.5 1 1.5 2 2.5
Archive-It Facsimile
Browser Thumbnails
Social Cards
sc+t
sc/t
sc^t
Correct Answers Per Surrogate
Median Mean
p = 0.0569
p = 0.0770
@shawnmjones @WebSciDL
Whenever thumbnails are present, more users interact
with them
26
We could not detect if participants were zooming in to view thumbnails, but most hovered when confronted
with a thumbnail, regardless of surrogate.
For browser thumbnails alone, most of the participants clicked the link to view the actual memento behind the
surrogate.
@shawnmjones @WebSciDL
Conclusions
 54.60% of Archive-It surrogates have only a URL and capture dates
 in spite of this, some information could still be gleaned from the URL
 Results from our MT study with 120 participants:
 response times were not statistically significant at p < 0.05
 for correct answers: social cards probably outperform the existing Archive-It surrogate at p = 0.0569
 when thumbnails are present, more participants interacted with them
 when thumbnails are present, more participants click through to view the page underneath rather than relying
upon the surrogate
 Conclusions:
 thumbnails encourage more interaction, specifically clicking through to the underlying page, than social cards
 social cards outperform the existing status quo Archive-It surrogate in terms of correct answers at p = 0.0569
 social cards probably provide for better understanding of web archive collections
 For more information, see https://arxiv.org/abs/1905.11342
27
1 of 27

Recommended

Improving Understanding of Web Archive Collections Through Storytelling - PhD... by
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Shawn Jones
868 views86 slides
Characteristics of Social Media Stories by
Characteristics of Social Media StoriesCharacteristics of Social Media Stories
Characteristics of Social Media StoriesYasmin AlNoamany, PhD
3.8K views49 slides
Detecting Off-Topic Pages in Web Archives by
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesYasmin AlNoamany, PhD
2.1K views26 slides
Combining Storytelling and Web Archives by
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web ArchivesMichael Nelson
1.6K views102 slides
csvconfyasmin2017_05_03 by
csvconfyasmin2017_05_03csvconfyasmin2017_05_03
csvconfyasmin2017_05_03Yasmin AlNoamany, PhD
4.5K views75 slides
The Many Shapes of Archive-It by
The Many Shapes of Archive-ItThe Many Shapes of Archive-It
The Many Shapes of Archive-ItShawn Jones
1.3K views95 slides

More Related Content

What's hot

Storytelling With Web Archives by
Storytelling With Web ArchivesStorytelling With Web Archives
Storytelling With Web ArchivesShawn Jones
1.3K views38 slides
Improving Collection Understanding in Web Archives by
Improving Collection Understanding in Web ArchivesImproving Collection Understanding in Web Archives
Improving Collection Understanding in Web ArchivesShawn Jones
1.9K views55 slides
Combining Social Media Storytelling With Web Archives by
Combining Social Media Storytelling With Web ArchivesCombining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesShawn Jones
577 views38 slides
Summarizing archival collections using storytelling techniques by
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniquesMichael Nelson
2.9K views91 slides
Using Web Archives to Enrich the Live Web Experience Through Storytelling by
Using Web Archives to Enrich  the Live Web Experience Through StorytellingUsing Web Archives to Enrich  the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingYasmin AlNoamany, PhD
2.9K views82 slides
Detecting Off-Topic Pages in Web Archives by
Detecting Off-Topic Pages in Web ArchivesDetecting Off-Topic Pages in Web Archives
Detecting Off-Topic Pages in Web ArchivesYasmin AlNoamany, PhD
2.9K views39 slides

What's hot(20)

Storytelling With Web Archives by Shawn Jones
Storytelling With Web ArchivesStorytelling With Web Archives
Storytelling With Web Archives
Shawn Jones1.3K views
Improving Collection Understanding in Web Archives by Shawn Jones
Improving Collection Understanding in Web ArchivesImproving Collection Understanding in Web Archives
Improving Collection Understanding in Web Archives
Shawn Jones1.9K views
Combining Social Media Storytelling With Web Archives by Shawn Jones
Combining Social Media Storytelling With Web ArchivesCombining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web Archives
Shawn Jones577 views
Summarizing archival collections using storytelling techniques by Michael Nelson
Summarizing archival collections using storytelling techniquesSummarizing archival collections using storytelling techniques
Summarizing archival collections using storytelling techniques
Michael Nelson2.9K views
Using Web Archives to Enrich the Live Web Experience Through Storytelling by Yasmin AlNoamany, PhD
Using Web Archives to Enrich  the Live Web Experience Through StorytellingUsing Web Archives to Enrich  the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Information Visualization - Visualizing Digital Collections at Archive-It by Michele Weigle
Information Visualization - Visualizing Digital Collections at Archive-ItInformation Visualization - Visualizing Digital Collections at Archive-It
Information Visualization - Visualizing Digital Collections at Archive-It
Michele Weigle2.6K views
Storytelling for Summarizing Collections in Web Archives by Michael Nelson
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
Michael Nelson2.8K views
Telling Stories with Web Archives by Michele Weigle
Telling Stories with Web ArchivesTelling Stories with Web Archives
Telling Stories with Web Archives
Michele Weigle4.5K views
Bootstrapping Web Archive Collections of Stories from Micro-collections in S... by Alexander Nwala
Bootstrapping Web Archive Collections  of Stories from Micro-collections in S...Bootstrapping Web Archive Collections  of Stories from Micro-collections in S...
Bootstrapping Web Archive Collections of Stories from Micro-collections in S...
Alexander Nwala1.1K views
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P... by Yasmin AlNoamany, PhD
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...
Open the Door, Let \'em In: Virtual School Libraries by Joyce Kasman Valenza
Open the Door, Let \'em In: Virtual School LibrariesOpen the Door, Let \'em In: Virtual School Libraries
Open the Door, Let \'em In: Virtual School Libraries
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App... by Justin Brunelle
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
Justin Brunelle3K views
A Framework for Verifying the Fixity of Archived Web Resources by maturban
A Framework for Verifying the Fixity of Archived Web ResourcesA Framework for Verifying the Fixity of Archived Web Resources
A Framework for Verifying the Fixity of Archived Web Resources
maturban460 views
Increase your readers' advisory knowledge with book blogs by Alexandra Yarrow
Increase your readers' advisory knowledge with book blogsIncrease your readers' advisory knowledge with book blogs
Increase your readers' advisory knowledge with book blogs
Alexandra Yarrow1.4K views
Wikipedia: Why? Who? and How? by Don Boozer
Wikipedia: Why? Who? and How?Wikipedia: Why? Who? and How?
Wikipedia: Why? Who? and How?
Don Boozer283 views
Ict application in sociology research by Dr.Amol Ubale
Ict application in sociology researchIct application in sociology research
Ict application in sociology research
Dr.Amol Ubale1.3K views
Web Archives at the Nexus of Good Fakes and Flawed Originals by Michael Nelson
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
Michael Nelson5.9K views

Similar to Social Cards Probably Provide For Better Understanding Of Web Archive Collections

Enabling Personal Use of Web Archives by
Enabling Personal Use of Web ArchivesEnabling Personal Use of Web Archives
Enabling Personal Use of Web ArchivesMichele Weigle
2.9K views92 slides
Linked Open Data for Archives by
Linked Open Data for ArchivesLinked Open Data for Archives
Linked Open Data for ArchivesCliff Landis
197 views48 slides
2014 TheNextWeb-Mapping connections with NodeXL by
2014 TheNextWeb-Mapping connections with NodeXL2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXLMarc Smith
8K views73 slides
Radically Open at the National Archives by
Radically Open at the National ArchivesRadically Open at the National Archives
Radically Open at the National ArchivesJon Voss
1.2K views84 slides
Socialmediaandweb2.0 by
Socialmediaandweb2.0Socialmediaandweb2.0
Socialmediaandweb2.0shwetanema
300 views126 slides
WS-DL’s Work towards Enabling Personal Use of Web Archives by
WS-DL’s Work towards Enabling Personal Use of Web ArchivesWS-DL’s Work towards Enabling Personal Use of Web Archives
WS-DL’s Work towards Enabling Personal Use of Web ArchivesMichele Weigle
589 views92 slides

Similar to Social Cards Probably Provide For Better Understanding Of Web Archive Collections(20)

Enabling Personal Use of Web Archives by Michele Weigle
Enabling Personal Use of Web ArchivesEnabling Personal Use of Web Archives
Enabling Personal Use of Web Archives
Michele Weigle2.9K views
Linked Open Data for Archives by Cliff Landis
Linked Open Data for ArchivesLinked Open Data for Archives
Linked Open Data for Archives
Cliff Landis197 views
2014 TheNextWeb-Mapping connections with NodeXL by Marc Smith
2014 TheNextWeb-Mapping connections with NodeXL2014 TheNextWeb-Mapping connections with NodeXL
2014 TheNextWeb-Mapping connections with NodeXL
Marc Smith8K views
Radically Open at the National Archives by Jon Voss
Radically Open at the National ArchivesRadically Open at the National Archives
Radically Open at the National Archives
Jon Voss1.2K views
Socialmediaandweb2.0 by shwetanema
Socialmediaandweb2.0Socialmediaandweb2.0
Socialmediaandweb2.0
shwetanema300 views
WS-DL’s Work towards Enabling Personal Use of Web Archives by Michele Weigle
WS-DL’s Work towards Enabling Personal Use of Web ArchivesWS-DL’s Work towards Enabling Personal Use of Web Archives
WS-DL’s Work towards Enabling Personal Use of Web Archives
Michele Weigle589 views
The Memento Protocol and Research Issues With Web Archiving by Michael Nelson
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
Michael Nelson3.7K views
Online Collections Crawlability for Libraries, Archives, and Museums by mherbison
Online Collections Crawlability for Libraries, Archives, and MuseumsOnline Collections Crawlability for Libraries, Archives, and Museums
Online Collections Crawlability for Libraries, Archives, and Museums
mherbison1.7K views
Think Link: Network Insights with No Programming Skills by Marc Smith
Think Link: Network Insights with No Programming SkillsThink Link: Network Insights with No Programming Skills
Think Link: Network Insights with No Programming Skills
Marc Smith6K views
Weaponized Web Archives: Provenance Laundering of Short Order Evidence by Michael Nelson
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Michael Nelson1K views
Web 2.0 Excerpt for Troy Teachers by sraslim
Web 2.0 Excerpt for Troy TeachersWeb 2.0 Excerpt for Troy Teachers
Web 2.0 Excerpt for Troy Teachers
sraslim1.6K views
Socialmediaandweb2.0 by shwetanema
Socialmediaandweb2.0Socialmediaandweb2.0
Socialmediaandweb2.0
shwetanema260 views
Web Archiving in the Year eaee1902f186819154789ee22ca30035 by Michael Nelson
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035
Michael Nelson249 views
Weaponized Web Archives: Provenance Laundering of Short Order Evidence by Michael Nelson
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Michael Nelson2.8K views
Representing the world: How web users become web thinkers and web makers by judell
Representing the world: How web users become web thinkers and web makersRepresenting the world: How web users become web thinkers and web makers
Representing the world: How web users become web thinkers and web makers
judell1.6K views
Improving Collection Understanding For Web Archives With Storytelling: Shinin... by Shawn Jones
Improving Collection Understanding For Web Archives With Storytelling: Shinin...Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Improving Collection Understanding For Web Archives With Storytelling: Shinin...
Shawn Jones355 views

More from Shawn Jones

Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea... by
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Shawn Jones
112 views12 slides
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea... by
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Shawn Jones
10 views12 slides
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G... by
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...Shawn Jones
321 views19 slides
Automatically Selecting Striking Images for Social Cards by
Automatically Selecting Striking Images for Social CardsAutomatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social CardsShawn Jones
151 views20 slides
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration) by
SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)Shawn Jones
259 views13 slides
Reference Rot by
Reference RotReference Rot
Reference RotShawn Jones
243 views44 slides

More from Shawn Jones(10)

Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea... by Shawn Jones
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Shawn Jones112 views
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea... by Shawn Jones
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Abstract Images Have Different Levels of Retrievability Per Reverse Image Sea...
Shawn Jones10 views
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G... by Shawn Jones
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
It’s All About The Cards: Sharing on Social Media Encouraged HTML Metadata G...
Shawn Jones321 views
Automatically Selecting Striking Images for Social Cards by Shawn Jones
Automatically Selecting Striking Images for Social CardsAutomatically Selecting Striking Images for Social Cards
Automatically Selecting Striking Images for Social Cards
Shawn Jones151 views
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration) by Shawn Jones
SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)SHARI(StoryGraph Hypercane ArchiveNow Raintale Integration)
SHARI (StoryGraph Hypercane ArchiveNow Raintale Integration)
Shawn Jones259 views
Avoiding Spoilers On MediaWiki Fan Sites Using Memento by Shawn Jones
Avoiding Spoilers On MediaWiki Fan Sites Using MementoAvoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
Shawn Jones2.8K views
Continuous Integration: Finding problems soonest by Shawn Jones
Continuous Integration: Finding problems soonestContinuous Integration: Finding problems soonest
Continuous Integration: Finding problems soonest
Shawn Jones1.2K views
A Brief Introduction to Test-Driven Development by Shawn Jones
A Brief Introduction to Test-Driven DevelopmentA Brief Introduction to Test-Driven Development
A Brief Introduction to Test-Driven Development
Shawn Jones1.3K views
Reconstructing the past with media wiki by Shawn Jones
Reconstructing the past with media wikiReconstructing the past with media wiki
Reconstructing the past with media wiki
Shawn Jones2.2K views

Recently uploaded

EVALUATION OF HEPATOPROTECTIVE ACTIVITY OF SALIX SUBSERRATA IN PARACETAMOL IN... by
EVALUATION OF HEPATOPROTECTIVE ACTIVITY OF SALIX SUBSERRATA IN PARACETAMOL IN...EVALUATION OF HEPATOPROTECTIVE ACTIVITY OF SALIX SUBSERRATA IN PARACETAMOL IN...
EVALUATION OF HEPATOPROTECTIVE ACTIVITY OF SALIX SUBSERRATA IN PARACETAMOL IN...gynomark
9 views15 slides
Pollination By Nagapradheesh.M.pptx by
Pollination By Nagapradheesh.M.pptxPollination By Nagapradheesh.M.pptx
Pollination By Nagapradheesh.M.pptxMNAGAPRADHEESH
12 views9 slides
Ecology by
Ecology Ecology
Ecology Abhijith Raj.R
6 views10 slides
PRINCIPLES-OF ASSESSMENT by
PRINCIPLES-OF ASSESSMENTPRINCIPLES-OF ASSESSMENT
PRINCIPLES-OF ASSESSMENTrbalmagro
9 views12 slides
DATABASE MANAGEMENT SYSTEM by
DATABASE MANAGEMENT SYSTEMDATABASE MANAGEMENT SYSTEM
DATABASE MANAGEMENT SYSTEMDr. GOPINATH D
5 views50 slides
"How can I develop my learning path in bioinformatics? by
"How can I develop my learning path in bioinformatics?"How can I develop my learning path in bioinformatics?
"How can I develop my learning path in bioinformatics?Bioinformy
17 views13 slides

Recently uploaded(20)

EVALUATION OF HEPATOPROTECTIVE ACTIVITY OF SALIX SUBSERRATA IN PARACETAMOL IN... by gynomark
EVALUATION OF HEPATOPROTECTIVE ACTIVITY OF SALIX SUBSERRATA IN PARACETAMOL IN...EVALUATION OF HEPATOPROTECTIVE ACTIVITY OF SALIX SUBSERRATA IN PARACETAMOL IN...
EVALUATION OF HEPATOPROTECTIVE ACTIVITY OF SALIX SUBSERRATA IN PARACETAMOL IN...
gynomark9 views
Pollination By Nagapradheesh.M.pptx by MNAGAPRADHEESH
Pollination By Nagapradheesh.M.pptxPollination By Nagapradheesh.M.pptx
Pollination By Nagapradheesh.M.pptx
MNAGAPRADHEESH12 views
PRINCIPLES-OF ASSESSMENT by rbalmagro
PRINCIPLES-OF ASSESSMENTPRINCIPLES-OF ASSESSMENT
PRINCIPLES-OF ASSESSMENT
rbalmagro9 views
"How can I develop my learning path in bioinformatics? by Bioinformy
"How can I develop my learning path in bioinformatics?"How can I develop my learning path in bioinformatics?
"How can I develop my learning path in bioinformatics?
Bioinformy17 views
Physical Characterization of Moon Impactor WE0913A by Sérgio Sacani
Physical Characterization of Moon Impactor WE0913APhysical Characterization of Moon Impactor WE0913A
Physical Characterization of Moon Impactor WE0913A
Sérgio Sacani42 views
A training, certification and marketing scheme for informal dairy vendors in ... by ILRI
A training, certification and marketing scheme for informal dairy vendors in ...A training, certification and marketing scheme for informal dairy vendors in ...
A training, certification and marketing scheme for informal dairy vendors in ...
ILRI8 views
Matthias Beller ChemAI 231116.pptx by Marco Tibaldi
Matthias Beller ChemAI 231116.pptxMatthias Beller ChemAI 231116.pptx
Matthias Beller ChemAI 231116.pptx
Marco Tibaldi82 views
himalay baruah acid fast staining.pptx by HimalayBaruah
himalay baruah acid fast staining.pptxhimalay baruah acid fast staining.pptx
himalay baruah acid fast staining.pptx
HimalayBaruah5 views
Workshop LLM Life Sciences ChemAI 231116.pptx by Marco Tibaldi
Workshop LLM Life Sciences ChemAI 231116.pptxWorkshop LLM Life Sciences ChemAI 231116.pptx
Workshop LLM Life Sciences ChemAI 231116.pptx
Marco Tibaldi96 views
Women in the Workplace Industry Insights for Life Sciences by workplacesurvey
Women in the Workplace Industry Insights for Life SciencesWomen in the Workplace Industry Insights for Life Sciences
Women in the Workplace Industry Insights for Life Sciences
workplacesurvey5 views
Distinct distributions of elliptical and disk galaxies across the Local Super... by Sérgio Sacani
Distinct distributions of elliptical and disk galaxies across the Local Super...Distinct distributions of elliptical and disk galaxies across the Local Super...
Distinct distributions of elliptical and disk galaxies across the Local Super...
Sérgio Sacani30 views
Workshop Chemical Robotics ChemAI 231116.pptx by Marco Tibaldi
Workshop Chemical Robotics ChemAI 231116.pptxWorkshop Chemical Robotics ChemAI 231116.pptx
Workshop Chemical Robotics ChemAI 231116.pptx
Marco Tibaldi90 views
Metatheoretical Panda-Samaneh Borji.pdf by samanehborji
Metatheoretical Panda-Samaneh Borji.pdfMetatheoretical Panda-Samaneh Borji.pdf
Metatheoretical Panda-Samaneh Borji.pdf
samanehborji16 views
Ethical issues associated with Genetically Modified Crops and Genetically Mod... by PunithKumars6
Ethical issues associated with Genetically Modified Crops and Genetically Mod...Ethical issues associated with Genetically Modified Crops and Genetically Mod...
Ethical issues associated with Genetically Modified Crops and Genetically Mod...
PunithKumars618 views

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

  • 1. Shawn M. Jones Michele C. Weigle Michael L. Nelson sjone@cs.odu.edu mweigle@cs.odu.edu mln@cs.odu.edu @shawnmjones @weiglemc @phonedude_mln Social Cards Probably Provide For Better Understanding Of Web Archive Collections Old Dominion University Web Science and Digital Libraries Research Group @WebSciDL Thanks to:
  • 2. @shawnmjones @WebSciDL Curators Build Web Archive Collections 2 Archived web pages, or mementos, are used by journalists, sociologists, and historians. Tucson Shootings2008 OlympicsUniversity of Utah
  • 3. @shawnmjones @WebSciDL Web archive collections consist of mementos – different versions of the same page over time 3 2013 2015 2018 University of Utah Office of Admissions from the University of Utah Web Archive Collection 4/1/2015 3/5/2015 Tumblr Black Lives Matter Blog from the #blacklivesmatter Collection 2/12/2015
  • 4. @shawnmjones @WebSciDL Archive-It allows curators to easily create web archive collections Archive-It was created by the Internet Archive as a consistent user interface for constructing web archive collections. Curators can supply live web resources as seeds and establish crawling schedules of those seeds to create mementos. 4
  • 5. @shawnmjones @WebSciDL … and these collections are used by other researchers 5 The collection curator is not the only user of the collection! These collections live a life after their curator has stopped adding to them.
  • 6. @shawnmjones @WebSciDL The problem…  There are multiple collections about the same concept.  The metadata for each collection is non-existent, or inconsistently applied.  Many collections have 1000s of seeds with multiple mementos.  There are more than 8000 collections.  Human review of these mementos for collection understanding is an expensive proposition. 6
  • 7. @shawnmjones @WebSciDL Collections provide web pages based on a theme – we can summarize those collections by using the best mementos supporting that theme… 7 Web sites may group some content, but curators theme some of this content into collections which we can reduce to stories. Our stories consist of ~28 representative mementos, making them much smaller than the collections from which they come.
  • 8. @shawnmjones @WebSciDL Surrogates provide a visual summary of the content behind a URI… 8 https://www.google.com/maps/dir/Old+Dominion+University,+Norfolk,+VA/Los+Alamos+National+Laboratory,+New+Mexico/@35 .3644614,- 109.356967,4z/data=!3m1!4b1!4m13!4m12!1m5!1m1!1 s0x89ba99ad24ba3945:0xcd2bdc432c4e4bac!2m2!1d-76.3067676!2d36 .8855515!1m5!1m1!1s0x87181246af22e765:0x7f5a90170c5df1b4!2m2!1 d-106.287162!2d35.8440582 Long URI: The same URI represented by a browser thumbnail surrogate: The same URI represented by a social card surrogate:
  • 9. @shawnmjones @WebSciDL Social media storytelling uses surrogates to provide a “summary of summaries” 9 2 resources are shown from this Wakelet story6 resources are shown from this Storify story Each surrogate summarizes a web resource. Each story groups the surrogates, summarizing the topic. We want to use this technique to summarize web archive collections because users are already familiar with this visualization paradigm.
  • 10. @shawnmjones @WebSciDL We want to help the user answer the question of “What does the underlying collection contain?” 10 There are many types of surrogates. Which one best conveys the concepts of the underlying collection? If we already have the mementos describing a collection, we need to visualize them with some type of surrogate.
  • 11. @shawnmjones @WebSciDL Storytelling implemented 11 From this: 31,863 seed mementos of 955 seeds, potentially 31K+ documents to review To this: a story built from a representative sample of ~30 mementos visualized as surrogates
  • 12. @shawnmjones @WebSciDL Archive-It provides fields for metadata 12 Collection-wide metadata Metadata on individual seeds Dublin Core + Custom Fields
  • 13. @shawnmjones @WebSciDL But, alas the metadata does not help… 13 A variety of metadata fields can be used for surrogates, with the most popular field being Title. The metadata on seeds and collections is optional. As the number of seeds increases, the less metadata is present per seed.Even though this is the case, 54.60% of Archive-It surrogates consist of only the seed URL and capture dates.
  • 14. @shawnmjones @WebSciDL Understanding Archive-It Collections… Manually 14 Step 1: Decide upon search terms and find a collection via the search engine Step 0: Decide on your information need Step 2: Select a collection from the many results available
  • 15. @shawnmjones @WebSciDL Understanding Archive-It Collections… Manually 15 Step 3: View the seed-centric collection page Step 4: Choose a seed from this list that may meet the information need. This collection contains 1,149 seeds. How do you choose one without metadata to guide you?
  • 16. @shawnmjones @WebSciDL Understanding Archive-It Collections… Manually 16 Step 5: View the mementos associated with that seed. This seed has 923 seed mementos. There are more mementos linked from these mementos Step 6: Read the text of the memento to learn about its contents.
  • 17. @shawnmjones @WebSciDL Understanding Archive-It Collections… Manually 17 Step 7: Follow links and review more content until you reach a page that was not archived. Step 8: Repeat steps 4-7 until enough information about has been amassed to determine if the collection meets your information need. This collection contained 80,484 seed mementos.
  • 18. @shawnmjones @WebSciDL In spite of the lack of information in Archive-It surrogates, some information can be gleaned from the URI… 18 Thus, surrogates like these may still yield enough information for collection understanding.
  • 19. @shawnmjones @WebSciDL Existing surrogate services create a confusing experience for mementos 19 Who published these resources? Archive-It? CNN? Is the story author sharing fake news? S. M. Jones. “A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages.” https://ws- dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html, 2018. embed.rocks social card embed.ly social card
  • 20. @shawnmjones @WebSciDL Neither social media services nor card services were reliable for storytelling, so we created MementoEmbed… 20 Information in the MementoEmbed social card is separated to avoid issues of confusion about attribution. MementoEmbed is archive-aware. It can locate information about the memento that is not available in other cards. S. M. Jones. “A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages.” https://ws- dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html, 2018.
  • 21. @shawnmjones @WebSciDL We reused 4 stories generated by human Archive-It curators from their own collections… 21
  • 22. @shawnmjones @WebSciDL We shared these stories, rendered using 6 different surrogate types, with MT participants… 22 Archive-It like Social Card Browser thumbnails Social Card With Thumbnail as Image (sc/t) Social Card With Thumbnail to Right (sc+t) Social Card with Thumbnail on Hover (sc^t) • 4 stories of 15-17 mementos selected by human Archive-It curators from their collections • 6 different surrogate types • Social cards and thumbnails were produced by MementoEmbed • 24 different story-surrogate combinations • 120 MT participants • They were given 30 seconds to view each story
  • 23. @shawnmjones @WebSciDL And then we asked them which of 2 of 6 mementos come from the same collection… 23 • Each participant was shown a list of 6 surrogates of the same type as the story they just viewed. • They were asked to choose the 2 that they thought came from the same collection. • They were given as much time as they wished to answer the question. • This is similar to the Sentence Verification Task from reading comprehension studies.
  • 24. @shawnmjones @WebSciDL Response times per surrogate had interesting means, but p-values were not statistically significant at p < 0.05 24 p = 0.190 p = 0.202
  • 25. @shawnmjones @WebSciDL Correct answers per surrogate indicate that social cards probably outperform the Archive-It surrogate 25 0 0.5 1 1.5 2 2.5 Archive-It Facsimile Browser Thumbnails Social Cards sc+t sc/t sc^t Correct Answers Per Surrogate Median Mean p = 0.0569 p = 0.0770
  • 26. @shawnmjones @WebSciDL Whenever thumbnails are present, more users interact with them 26 We could not detect if participants were zooming in to view thumbnails, but most hovered when confronted with a thumbnail, regardless of surrogate. For browser thumbnails alone, most of the participants clicked the link to view the actual memento behind the surrogate.
  • 27. @shawnmjones @WebSciDL Conclusions  54.60% of Archive-It surrogates have only a URL and capture dates  in spite of this, some information could still be gleaned from the URL  Results from our MT study with 120 participants:  response times were not statistically significant at p < 0.05  for correct answers: social cards probably outperform the existing Archive-It surrogate at p = 0.0569  when thumbnails are present, more participants interacted with them  when thumbnails are present, more participants click through to view the page underneath rather than relying upon the surrogate  Conclusions:  thumbnails encourage more interaction, specifically clicking through to the underlying page, than social cards  social cards outperform the existing status quo Archive-It surrogate in terms of correct answers at p = 0.0569  social cards probably provide for better understanding of web archive collections  For more information, see https://arxiv.org/abs/1905.11342 27