Social Cards Probably Provide For Better Understanding Of Web Archive Collections

Shawn M. Jones Michele C. Weigle Michael L. Nelson
sjone@cs.odu.edu mweigle@cs.odu.edu mln@cs.odu.edu
@shawnmjones @weiglemc @phonedude_mln
Social Cards Probably Provide
For Better Understanding Of
Web Archive Collections
Old Dominion University
Web Science and Digital Libraries Research Group
@WebSciDL
Thanks to:

@shawnmjones @WebSciDL
Curators Build Web Archive Collections
2
Archived web pages, or mementos, are used by journalists, sociologists, and historians.
Tucson Shootings2008 OlympicsUniversity of Utah

Web archive collections consist of mementos
– different versions of the same page over time
3
2013
2015
2018
University of Utah Office of Admissions
from the University of Utah Web Archive Collection
4/1/2015
3/5/2015
Tumblr Black Lives Matter Blog
from the #blacklivesmatter Collection
2/12/2015

Archive-It allows curators to easily create
web archive collections
Archive-It was created by the Internet Archive as a consistent user interface for constructing web archive
collections. Curators can supply live web resources as seeds and establish crawling schedules of those
seeds to create mementos.
4

… and these collections are used by other researchers
5
The collection curator is not the only user of the
collection!
These collections live a life after their curator
has stopped adding to them.

The problem…
 There are multiple collections
about the same concept.
 The metadata for each collection is
non-existent, or inconsistently
applied.
 Many collections have
1000s of seeds with multiple
mementos.
 There are more than 8000
collections.
 Human review of these
mementos for collection
understanding is an expensive
proposition.
6

Collections provide web pages based on a theme – we can
summarize those collections by using the best mementos
supporting that theme…
7
Web sites may group some
content, but curators theme some
of this content into collections
which we can reduce to stories.
Our stories consist of ~28
representative mementos, making
them much smaller than the
collections from which they come.

Surrogates provide a visual summary of the content
behind a URI…
8
https://www.google.com/maps/dir/Old+Dominion+University,+Norfolk,+VA/Los+Alamos+National+Laboratory,+New+Mexico/@35 .3644614,-
109.356967,4z/data=!3m1!4b1!4m13!4m12!1m5!1m1!1 s0x89ba99ad24ba3945:0xcd2bdc432c4e4bac!2m2!1d-76.3067676!2d36
.8855515!1m5!1m1!1s0x87181246af22e765:0x7f5a90170c5df1b4!2m2!1 d-106.287162!2d35.8440582
Long URI:
The same URI represented by a
browser thumbnail surrogate:
The same URI represented by a
social card surrogate:

Social media storytelling uses surrogates to provide a
“summary of summaries”
9
2 resources are shown from this Wakelet story6 resources are shown from this Storify story
Each surrogate summarizes a
web resource.
Each story groups the
surrogates, summarizing the
topic.
We want to use this technique
to summarize web archive
collections because users are
already familiar with this
visualization paradigm.

We want to help the user answer the question of
“What does the underlying collection contain?”
10
There are many types of surrogates. Which one
best conveys the concepts of the underlying
collection?
If we already have the mementos describing a
collection, we need to visualize them with some
type of surrogate.

Storytelling implemented
11
From this: 31,863 seed mementos of 955 seeds,
potentially 31K+ documents to review
To this: a story built from a representative sample of
~30 mementos visualized as surrogates

Archive-It provides fields for metadata
12
Collection-wide metadata Metadata on individual seeds
Dublin
Core
+
Custom
Fields

But, alas the metadata does not help…
13
A variety of metadata fields can be used for surrogates,
with the most popular field being Title.
The metadata on seeds and collections is
optional.
As the number of seeds increases, the less
metadata is present per seed.Even though this is the case, 54.60% of Archive-It
surrogates consist of only the seed URL and capture
dates.

Understanding Archive-It Collections… Manually
14
Step 1: Decide upon search terms
and find a collection
via the search engine
Step 0:
Decide on your
information need
Step 2: Select a collection from
the many results available

15
Step 3: View the seed-centric collection page
Step 4: Choose a seed from this list that may meet the
information need. This collection contains 1,149 seeds.
How do you choose one without metadata to guide
you?

16
Step 5: View the mementos associated with that
seed.
This seed has 923 seed mementos. There are
more mementos linked from these mementos
Step 6: Read the text of the memento to learn about its
contents.

17
Step 7: Follow links and review more
content until you reach a page that was
not archived.
Step 8: Repeat steps 4-7 until enough information about has
been amassed to determine if the collection meets your
information need. This collection contained 80,484 seed
mementos.

In spite of the lack of information in Archive-It surrogates,
some information can be gleaned from the URI…
18
Thus, surrogates like these may still yield enough
information for collection understanding.

Existing surrogate services create a confusing
experience for mementos
19
Who published these resources?
Archive-It?
CNN?
Is the story author sharing fake news?
S. M. Jones. “A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages.” https://ws-
dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html, 2018.
embed.rocks social card
embed.ly social card

Neither social media services nor card services were
reliable for storytelling, so we created MementoEmbed…
20
Information in the
MementoEmbed social
card is separated to
avoid issues of
confusion about
attribution.
MementoEmbed is
archive-aware. It can
locate information
about the memento
that is not available in
other cards.
S. M. Jones. “A Preview of MementoEmbed: Embeddable Surrogates for Archived Web Pages.” https://ws-
dl.blogspot.com/2018/08/2018-08-01-preview-of-mementoembed.html, 2018.

We reused 4 stories generated by human Archive-It
curators from their own collections…
21

We shared these stories, rendered using 6 different surrogate
types, with MT participants…
22
Archive-It like
Social Card
Browser thumbnails
Social Card With
Thumbnail as Image
(sc/t)
Social Card With
Thumbnail to
Right (sc+t)
Social Card with
Thumbnail on
Hover (sc^t)
• 4 stories of 15-17 mementos selected by human
Archive-It curators from their collections
• 6 different surrogate types
• Social cards and thumbnails were produced by
MementoEmbed
• 24 different story-surrogate combinations
• 120 MT participants
• They were given 30 seconds to view each story

And then we asked them which of 2 of 6 mementos come
from the same collection…
23
• Each participant was shown a list of 6 surrogates of the same type as the story they just viewed.
• They were asked to choose the 2 that they thought came from the same collection.
• They were given as much time as they wished to answer the question.
• This is similar to the Sentence Verification Task from reading comprehension studies.

Response times per surrogate had interesting means,
but p-values were not statistically significant at p < 0.05
24
p = 0.190
p = 0.202

Correct answers per surrogate indicate that social
cards probably outperform the Archive-It surrogate
25
0 0.5 1 1.5 2 2.5
Archive-It Facsimile
Browser Thumbnails
Social Cards
sc+t
sc/t
sc^t
Correct Answers Per Surrogate
Median Mean
p = 0.0569
p = 0.0770

Whenever thumbnails are present, more users interact
with them
26
We could not detect if participants were zooming in to view thumbnails, but most hovered when confronted
with a thumbnail, regardless of surrogate.
For browser thumbnails alone, most of the participants clicked the link to view the actual memento behind the
surrogate.

Conclusions
 54.60% of Archive-It surrogates have only a URL and capture dates
 in spite of this, some information could still be gleaned from the URL
 Results from our MT study with 120 participants:
 response times were not statistically significant at p < 0.05
 for correct answers: social cards probably outperform the existing Archive-It surrogate at p = 0.0569
 when thumbnails are present, more participants interacted with them
 when thumbnails are present, more participants click through to view the page underneath rather than relying
upon the surrogate
 Conclusions:
 thumbnails encourage more interaction, specifically clicking through to the underlying page, than social cards
 social cards outperform the existing status quo Archive-It surrogate in terms of correct answers at p = 0.0569
 social cards probably provide for better understanding of web archive collections
 For more information, see https://arxiv.org/abs/1905.11342
27

Social Cards Probably Provide For Better Understanding Of Web Archive Collections

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Social Cards Probably Provide For Better Understanding Of Web Archive Collections

Similar to Social Cards Probably Provide For Better Understanding Of Web Archive Collections (20)

More from Shawn Jones

More from Shawn Jones (11)

Recently uploaded

Recently uploaded (20)

Social Cards Probably Provide For Better Understanding Of Web Archive Collections