SlideShare a Scribd company logo
Combining Storytelling
and Web Archives
Yasmin AlNoamany
Michele C. Weigle
Michael L. Nelson
Old Dominion University
Web Science and Digital Libraries Group
ws-dl.cs.odu.edu
@WebSciDL
This work is supported in part by IMLS LG-71-15-0077
Old Dominion University ECE Department Colloquium
2015-11-13
IMLS-Funded Research
1. Use small “stories” to summarize much larger
collections of archived web pages
– big  small
2. Generate web archive collections by mining
user-generated stories for seed URIs
• small  big
http://ws-dl.blogspot.com/2015/10/2015-10-07-imls-and-nsf-fund-web.html
The Web is important for
our cultural heritage.
How to preserve the
Web??!!!
3
What are web archives?
2002
2005
2009
2011 4
Memento: an archived snapshot as
it appeared in the past
2002
2005
2009
2011 5
What are some web archives?
• The entire web
• On-demand free services
• National libraries
• Theme-based collections
6
For more information: https://en.wikipedia.org/wiki/Web_archiving
http://mementoweb.org/
Archive-It, a subscription-based service,
hosts curated web collections
7
> 3,000
collections
~340
institutions
> 10B archived
pages
8
Collection
title
Collection
categorization
based on the
curator
Seed URI
Metadata
about the
collection
Text search
box
The group that
the resource
belongs to
List of
the seed
URIs
Timespan of the
resource
and the number
of times it has
been captured
What is the problem with the archived
collections?
9
There is more than one collection about
“Egyptian Revolution”
10
• “2010-2011 Arab Spring” https://archive-it.org/collections/3101
• “North Africa & the Middle East 2011-2013” https://archive-it.org/collections/2349
• “Egypt Revolution and Politics” https://archive-it.org/collections/2358
Collection understanding and
collection summarization are not
supported currently
Not easy to answer “what’s in that collection?”
11
Our Early Attempts at Collection
Understanding Tried To Include
Everything…
15
http://ws-dl.blogspot.com/2012/08/2012-08-10-ms-thesis-visualizing.html
Timelines
Treemaps
Treemaps
(1000s of Seeds X 1000s of Mementos) +
Dimension of Time ==
Conventional Viz Methods Not Applicable
19
Idea:
Storytelling
20
Stories in Literature
Story elements: setting, characters, sequence, exposition, conflict,
climax, resolution
21
Once upon a time
http://www.learner.org/interactives/story/
Stories in social media
22
“It's hard to define a story, but I know it when I see it” (Alexander, 2008)
A loose context of the conventional definition in Literature
“storytelling” is becoming a popular
technique in social media
23
There is interest in
choosing k items from N
where k << N
24
I have 1000s of posts, images, friends’
posts, etc. in Facebook
25
Facebook reduces this to 1 minute
video (Facebook Look Back)
• https://www.youtube.com/watch?v=AWoBAjo
vHT4
26
Interest from Twitter in storytelling
27
1 Second Everyday
28
What are the limitations of
storytelling services?
29
30
The Egyptian Revolution on Storify
31
Bookmarking, not preserving!
Social media can go off-topic
or disappear
32
A year after publication, about 11% of content
shared on social media will be gone (SalahEldeen, 2012)*
* SalahEldeen, H.M., Nelson, M.L.: Losing my revolution: How many resources shared on social media have been lost? TPDL’12.
Despite these limitations, how do we
combine storytelling & archives?
33
We sample k mementos from N pages of
the collection to create a summary story
S
1
S
2
S
3
S
4
S
2
S
1
S
3
Collection Y
S
3
S
2
S
1
Collection Z
Collection X
Use interface people already know how to use
to summarize collections
35
Archived collectionsStorytelling services
Archived enriched
stories
Research Question
Can we (semi-)automatically identify,
evaluate, and select candidate archived
web pages from archived collections for
generating stories that summarize the
collection?
36
The archived collection has
two dimensions
37
Time
URI
The Two Dimensions
Applied to Stories
Fixed Page – Fixed Time:
differences in GeoIP,
mobile, etc. (Kelly, 2013)
Fixed Page – Sliding Time:
evolution of a single page
(or domain) through time
Sliding Page – Fixed Time:
different perspectives on a
point in time
Sliding Page – Sliding Time:
broadest possible coverage
of a collection
38
same
Time
different
URI
same
different
Fixed Pages, Fixed Time
R1
R1
R1
R1
t1 t3t2 t5t4 t6
39
Fixed Page, Fixed Time
40
A desktop Chrome user-agent
http://www.cnn.com/2014/02/24/world/africa/egypt-
politics/index.html?hpt=wo_c2
Andriod Chrome user-agent
http://www.cnn.com/2014/02/24/world/africa/egypt-
politics/index.html?hpt=wo_c2
Richard Schneider and Frank McCown, “First Steps in Archiving the Mobile Web: Automated Discovery of Mobile Websites”, JCDL 2013.
Fixed Page, Sliding Time
R R R R R R
t1 t3t2 t5t4 t6
41
Feb 1 Feb 1 Feb 2
Feb 4 Feb 5 Feb 7
Feb 9 Feb 11
Feb 11
Sliding Page, Fixed Time
R1
R2
R3
R4
t1 t3t2 t5t4 t6
43
Feb. 11, 2011
Mubarak resigns
44
Sliding Page, Sliding Time
R1
R2
R1
R3
R4
R2
t1 t3t2 t5t4 t6
45
Jan 27 Jan 31
Feb 7Feb 4
Feb 11 Feb 11
Feb 2
Jan 25
Feb 10
Handcrafted Stories in Storify
47
Sliding Page, Sliding TimeSliding Page, Fixed Time
http://storify.com/yasmina_anwar/the-egyptian-revolution-on-archive-it-
collection
http://storify.com/yasmina_anwar/the-egyptian-revolution-story-
from-archive-it-hete
Other Interfaces Possible…
48
Replaying Story of Egyptian Revolution
http://www.dipity.com/YasminAlnoamany/Jan-25-Egyptian-Revolution/
Yasmin’s Research Status
1. Establishing a baseline
a. For archived collections (Visualizing Digital Collections at Archive-It, JCDL 2012)
b. For human stories (Characteristics of Social Media Stories, TPDL 2015)
2. Calculate the aboutness of the collections
3. Find off-topic URIs (Detecting Off-Topic Pages in Web Archives, TPDL 2015)
4. Identify the best k mementos that summarize the
collection for desired story type
5. Evaluate & visualize the generated stories
49
What about using stories to seed web
archive collections?
50
Creating collections is hard.
Requires combination of awareness,
domain knowledge, crowdsourcing, etc.
51
Ukrainian Themed Collections, Prior to February 2014
Ukrainian Conflict Collection Started February 2014
Crowdsourcing Collection of Seed URIs
What about social media, especially
curated social media?
55
https://storify.com/TheInkBee/riots-persist-in-kyiv
(late January, 2014)
https://storify.com/jesseoriordan/ukraine-fight-for-their-rights
(late January, 2014)
https://storify.com/jesseoriordan/ukraine-fight-for-their-rights
(late January, 2014)
Compelling narratives are created by
knowledgeable (and biased) users arranging news
stories, images, social media, commentary, etc.
Intuitively, this is a “good story”.
How do we quantify this?
Characteristics of
Social Media Stories
Yasmin AlNoamany, Michele C. Weigle, and
Michael L. Nelson
Old Dominion University
Web Science and Digital Libraries Group
http://ws-dl.cs.odu.edu/, @WebSciDL
59
TPDL 2015
16.09.2015, Poznań, Poland
http://www.cs.odu.edu/~mln/pubs/tpdl-2015/tpdl-2015-stories.pdf
“Storytelling” is becoming a
popular technique in social media
60
Story creation interface on Storify
61
Research Question
62
We’d like to automatically generate stories,
but we don’t yet know what constitutes a
“good” story.
Q: What are the structural characteristics
of popular (i.e., receiving the most views)
human-generated stories?
What is the length of a story
(the number of resources per story)?
• This story has
31 resources
63
1
3
2
What are the types of resources
that compose a story?
• This story has
– 19 quotes
– 8 images
– 4 videos
64
Quotes
Video
What are the most frequently
used domains?
• 90% twitter.com
• 7% instagram.com
• 3% facebook.com
65
Twitter.com
Twitter.com
Twitter.com
The popularity of the most used
domains in the stories
66
mr_sedivy.tripod.com is ranked
3,084,064 by Alexa global rank
Twitter.com is ranked 9 by Alexa
global rank
What is the timespan (editing
time) of the stories?
• This story covers ~4
hours
(from 3:12 PM – 7:27 PM)
• Is there a relation
between the
timespan and the
features of the
story?
67
http://storify.com/yahoonews/boston-marathon-bombing
What is the “HTTP 404” rate
of the resources?
68
68
Can we find these missing
resources in the archives?
69
http://www.nzherald.co.nz/world/news/article.cfm?c_id=2&objectid=10705523
What differentiates a
popular story?
70
19,795 views 64 views
Data set construction
1. Queried Storify Search API with the top 1000
English keywords as seen by Yahoo
2. Downloaded 145,682 stories in JSON on Feb.
2015
3. Considered stories authored in 2014 or earlier,
resulting in 37,486 stories
4. Eliminated stories with only zero or one
elements or zero views, resulting in:
a. 14,568 unique stories
b. 10,199 unique users
c. 1,251,160 web and text elements
71
Characteristics of human-
generated stories
Features Views Web elements Text elements Subscribers Timespan
in hours
25th percentile 14 10 0 0 0.18
50th percentile 51 23 1 4 3
75th percentile 268 69 9 21 120
90th percentile 1949 210 19 85 1747
Maximum 11,284,896 2,216 559 1,726,143 36,111
72
Characteristics of human-
generated stories
Features Views Web elements Text elements Subscribers Timespan
in hours
25th percentile 14 10 0 0 0.18
50th percentile 51 23 1 4 3
75th percentile 268 69 9 21 120
90th percentile 1949 210 19 85 1747
Maximum 11,284,896 2,216 559 1,726,143 36,111
73
44 % of the stories have no text elements at all
What is the Average Timespan
for Stories?
74
What is the Average Timespan
for Stories?
Intervals Percentage Median web
elements
Median text
elements
Median views
(normalized by
time)
0-60 seconds 14.0% 15 0 23
1-60 minutes 26.7% 19 0 53
1-24 hours 23.4% 25 5 110
1-7 days 13.5% 26 7 78
1-4 weeks 8.4% 26 9 80
1-12 months 10.9% 38 2 129
1-4 years 3.1% 56 15 156
75
There is nearly linear relation between the time length of the story and the
number of elements
What is the Average Timespan
for Stories?
76
The story with the longest timespan in our data set covers more than 4 years
and with more than 13,000 views. It had only 33 web elements and 51 total
elements.
Intervals Percentage Median web
elements
Median text
elements
Median views
(normalized by
time)
0-60 seconds 14.0% 15 0 23
1-60 minutes 26.7% 19 0 53
1-24 hours 23.4% 25 5 110
1-7 days 13.5% 26 7 78
1-4 weeks 8.4% 26 9 80
1-12 months 10.9% 38 2 129
1-4 years 3.1% 56 15 156
The distribution of the elements
(1,251,160)
Resource Proportion
links 70.8%
images 18.4%
text 8.1%
video 2.0%
quotes 0.7%
77
Text elements are relatively rare, meaning that few users choose to annotate the
web elements in their story.
What are the most frequently
used domains?
78
The most frequent domains
in the stories
• Domain Canonicalization
– www.cnn.com  cnn.com
• Dereference all the shortened URIs
– For example: t.co, bit.ly
• 25,947 unique domains
79
The most frequent domains
in the stories represent ~92%
Host Frequency Percentage Alexa Global Rank
as of 2015-03
Category
twitter.com 943,859 82.05% 8 Social media
instagram.com 45,188 3.93% 25 Photos
youtube.com 22,076 1.92% 3 Videos
facebook.com 13,930 1.21% 2 Social media
flickr.com 7,317 0.64% 126 Photos
patch.com 5,783 0.50% 2,096 News
plus.google.com 3,413 0.30% 1 Social media
tumblr.com 3,066 0.27% 31 Blogs
blogspot.com 1,857 0.16% 18 Blogs
imgur.com 1,756 0.15% 36 Photos
coolpile.com 1,706 0.15% 149,281 Entertainment
wordpress.com 1,615 0.14% 33 Blogs
giphy.com 1,055 0.09% 1,604 Photos
bbc.com 966 0.08% 156 News
lastampa.it 927 0.08% 2,440 News
pinterest.com 892 0.08% 32 Photos
softandapps.info 861 0.07% 160,980 News
photobucket.com 768 0.07% 341 Photos
nytimes.com 744 0.06% 97 News
soundcloud.com 736 0.06% 167 Audio
80
The most frequent domains
in the stories represent ~92%
Host Frequency Percentage Alexa Global Rank
as of 2015-03
Category
twitter.com 943,859 82.05% 8 Social media
instagram.com 45,188 3.93% 25 Photos
youtube.com 22,076 1.92% 3 Videos
facebook.com 13,930 1.21% 2 Social media
flickr.com 7,317 0.64% 126 Photos
patch.com 5,783 0.50% 2,096 News
plus.google.com 3,413 0.30% 1 Social media
tumblr.com 3,066 0.27% 31 Blogs
blogspot.com 1,857 0.16% 18 Blogs
imgur.com 1,756 0.15% 36 Photos
coolpile.com 1,706 0.15% 149,281 Entertainment
wordpress.com 1,615 0.14% 33 Blogs
giphy.com 1,055 0.09% 1,604 Photos
bbc.com 966 0.08% 156 News
lastampa.it 927 0.08% 2,440 News
pinterest.com 892 0.08% 32 Photos
softandapps.info 861 0.07% 160,980 News
photobucket.com 768 0.07% 341 Photos
nytimes.com 744 0.06% 97 News
soundcloud.com 736 0.06% 167 Audio
81
The most frequent domains
in the stories represent ~92%
Host Frequency Percentage Alexa Global Rank
as of 2015-03
Category
twitter.com 943,859 82.05% 8 Social media
instagram.com 45,188 3.93% 25 Photos
youtube.com 22,076 1.92% 3 Videos
facebook.com 13,930 1.21% 2 Social media
flickr.com 7,317 0.64% 126 Photos
patch.com 5,783 0.50% 2,096 News
plus.google.com 3,413 0.30% 1 Social media
tumblr.com 3,066 0.27% 31 Blogs
blogspot.com 1,857 0.16% 18 Blogs
imgur.com 1,756 0.15% 36 Photos
coolpile.com 1,706 0.15% 149,281 Entertainment
wordpress.com 1,615 0.14% 33 Blogs
giphy.com 1,055 0.09% 1,604 Photos
bbc.com 966 0.08% 156 News
lastampa.it 927 0.08% 2,440 News
pinterest.com 892 0.08% 32 Photos
softandapps.info 861 0.07% 160,980 News
photobucket.com 768 0.07% 341 Photos
nytimes.com 744 0.06% 97 News
soundcloud.com 736 0.06% 167 Audio
82
The most frequent domains
in the stories represent ~92%
Host Frequency Percentage Alexa Global Rank
as of 2015-03
Category
twitter.com 943,859 82.05% 8 Social media
instagram.com 45,188 3.93% 25 Photos
youtube.com 22,076 1.92% 3 Videos
facebook.com 13,930 1.21% 2 Social media
flickr.com 7,317 0.64% 126 Photos
patch.com 5,783 0.50% 2,096 News
plus.google.com 3,413 0.30% 1 Social media
tumblr.com 3,066 0.27% 31 Blogs
blogspot.com 1,857 0.16% 18 Blogs
imgur.com 1,756 0.15% 36 Photos
coolpile.com 1,706 0.15% 149,281 Entertainment
wordpress.com 1,615 0.14% 33 Blogs
giphy.com 1,055 0.09% 1,604 Photos
bbc.com 966 0.08% 156 News
lastampa.it 927 0.08% 2,440 News
pinterest.com 892 0.08% 32 Photos
softandapps.info 861 0.07% 160,980 News
photobucket.com 768 0.07% 341 Photos
nytimes.com 744 0.06% 97 News
soundcloud.com 736 0.06% 167 Audio
83
The embedded resources
of the tweets
84
https://pbs.twimg.com/media/BfnnY-0CcAA56oy.png
The embedded resources
of the tweets
85
Domain Percentage Category
twimg.com 46.17% Images
instagram.com 4.28% Images
youtube.com 2.82% Videos
linkis.com 2.04% Media sharing
facebook.com 1.40% Social Media
wordpress.com 0.61% Blogs
vine.co 0.53% Videos
blogspot.com 0.52% Blogs
storify.com 0.49% Social Network
bbc.com 0.44% News
• We sampled 5% of
47,512 tweets:
15,217
• The unique URIs:
14,616
The embedded resources
of the tweets
Domain Percentage Category
twimg.com 46.17% Images
instagram.com 4.28% Images
youtube.com 2.82% Videos
linkis.com 2.04% Media sharing
facebook.com 1.40% Social Media
wordpress.com 0.61% Blogs
vine.co 0.53% Videos
blogspot.com 0.52% Blogs
storify.com 0.49% Social Network
bbc.com 0.44% News
86
• We sampled 5% of
47,512 tweets:
15,217
• The unique URIs:
14,616
• 46% are photos from
twitter.com
The embedded resources
of the tweets
Domain Percentage Category
twimg.com 46.17% Images
instagram.com 4.28% Images
youtube.com 2.82% Videos
linkis.com 2.04% Media sharing
facebook.com 1.40% Social Media
wordpress.com 0.61% Blogs
vine.co 0.53% Videos
blogspot.com 0.52% Blogs
storify.com 0.49% Social Network
bbc.com 0.44% News
87
• We sampled 5% of
47,512 tweets:
15,217
• The unique URIs:
14,616
• 0.49% of the stories
point to other stories
in Storify
Is there a correlation between
Alexa global rank and rank
within Storify?
88
Most of the time, the highly ranked
resources correlate the most used
resources in human-generated stories
n 10 15 25 50 100
Kendal 0.1555 0.4476 0.3372 0.3194 0.2485
89
Most of the time, the highly ranked
resources correlate the most used
resources in human-generated stories
n 10 15 25 50 100
Kendal 0.1555 0.4476 0.3372 0.3194 0.2485
90
This is in contrast to the usage
of resources in Pinterest [1]
[1] Zhong, C., Shah, S., Sundaravadivelan, K., Sastry, N.: Sharing the Loves: Understanding the How
and Why of Online Content Curation. ICWSM 2013
Image source: http://workmonk.in/testblog/wp-content/uploads/2014/01/pinterest.jpg
How many of the resources in
these stories disappear
every year?
Can we find these missing
resources in the archives?
91
The existence of the resources
• We checked the live web and public web archives
for 265,181 URIs
– 202,452 URIs from story web elements
– 47,512 randomly sampled tweet URIs
– 15,217 URIs of embedded resources in those tweets
• The unique URIs are 253,978
• We examined the results of the five most
frequent domains in the stories
– twitter.com, instagram.com, youtube.com,
facebook.com, flickr.com
92
The decay rate of the resources
in the stories
93
40.8% of the stories
contain missing
resources with an
average value of
10.3% per story
The decay rate of the resources
in the stories
94
There is a nearly
linear decay rate of
resources through
time. This matches
the finding by
SalahEldeen [2].
[2] SalahEldeen, H.M., Nelson, M.L.: Losing my revolution: How many resources shared on social media have been lost? TPDL’12.
Existence on the live web and
in the archives
95
Of all the unique URIs, 11.8% are missing on the live web.
Existence on the live web and
in the archives
96
• The social media is not well-archived like the regular web
• Facebook uses robots.txt to block web archiving by the Internet Archive
What differentiates a
popular story?
97
Specifying the popular and the
unpopular stories
• The stories are divided based on the number
of their views, then normalized by the amount
of time they were available on the web.
• Popular stories
– the top 25% of stories that have the most views
– 3,642 stories
98
The distributions for the features
of the stories
99
• Based on Kruskal-Wallis test, at the p ≤ 0.05 significance level, the popular and the
unpopular stories are different in terms of most of the features
• Popular stories tend to have:
• more web elements (medians of 28 vs. 21)
• longer timespan (5 hours vs. 2 hours) than the unpopular stories
Do Popular Stories have a Lower
Decay Rate?
100
The 75th percentile of decay rate per popular story is 10% of the resources,
while it is 15% in the unpopular stories
The distributions for the elements
of the stories
101
Conclusions
• We analyzed 14,568 stories from Storify comprising 1,251,160
elements.
• Popular stories have a min/median/max value of 2/28/1950
elements, with the unpopular stories having 2/21/2216
• Popular stories have a median of 12 multimedia resources (the
unpopular stories have a median of 7)
• Of the popular stories, 38% receive continuing edits (as
opposed to 35%)
• In popular stories, only 11% of web elements are missing on the
live web (as opposed to 13%)
• The percentage of the missing resources is proportional with
the age of the stories.
102

More Related Content

What's hot

Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
Michael Nelson
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento Toolkit
Shawn Jones
 
Characteristics of Social Media Stories
Characteristics of Social Media StoriesCharacteristics of Social Media Stories
Characteristics of Social Media Stories
Yasmin AlNoamany, PhD
 
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Shawn Jones
 
Storytelling With Web Archives
Storytelling With Web ArchivesStorytelling With Web Archives
Storytelling With Web Archives
Shawn Jones
 
The Many Shapes of Archive-It
The Many Shapes of Archive-ItThe Many Shapes of Archive-It
The Many Shapes of Archive-It
Shawn Jones
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
Michael Nelson
 
Where Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive CollectionsWhere Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive Collections
Shawn Jones
 
Telling Stories with Web Archives
Telling Stories with Web ArchivesTelling Stories with Web Archives
Telling Stories with Web Archives
Michele Weigle
 
Improving Collection Understanding in Web Archives
Improving Collection Understanding in Web ArchivesImproving Collection Understanding in Web Archives
Improving Collection Understanding in Web Archives
Shawn Jones
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
Justin Brunelle
 
Combining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesCombining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web Archives
Shawn Jones
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
Michael Nelson
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...
Yasmin AlNoamany, PhD
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Michael Nelson
 
Information Visualization - Visualizing Digital Collections at Archive-It
Information Visualization - Visualizing Digital Collections at Archive-ItInformation Visualization - Visualizing Digital Collections at Archive-It
Information Visualization - Visualizing Digital Collections at Archive-It
Michele Weigle
 
Virtual Libraries
Virtual LibrariesVirtual Libraries
Virtual Libraries
Joyce Kasman Valenza
 
Learning & Web 2.0: It's all about Play!
Learning & Web 2.0:  It's all about Play!Learning & Web 2.0:  It's all about Play!
Learning & Web 2.0: It's all about Play!hblowers
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Michael Nelson
 
Open the Door, Let \'em In: Virtual School Libraries
Open the Door, Let \'em In: Virtual School LibrariesOpen the Door, Let \'em In: Virtual School Libraries
Open the Door, Let \'em In: Virtual School LibrariesJoyce Kasman Valenza
 

What's hot (20)

Storytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web ArchivesStorytelling for Summarizing Collections in Web Archives
Storytelling for Summarizing Collections in Web Archives
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento Toolkit
 
Characteristics of Social Media Stories
Characteristics of Social Media StoriesCharacteristics of Social Media Stories
Characteristics of Social Media Stories
 
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...Improving Understanding of Web Archive Collections Through Storytelling - PhD...
Improving Understanding of Web Archive Collections Through Storytelling - PhD...
 
Storytelling With Web Archives
Storytelling With Web ArchivesStorytelling With Web Archives
Storytelling With Web Archives
 
The Many Shapes of Archive-It
The Many Shapes of Archive-ItThe Many Shapes of Archive-It
The Many Shapes of Archive-It
 
Why We Need Multiple Archives
Why We Need Multiple ArchivesWhy We Need Multiple Archives
Why We Need Multiple Archives
 
Where Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive CollectionsWhere Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive Collections
 
Telling Stories with Web Archives
Telling Stories with Web ArchivesTelling Stories with Web Archives
Telling Stories with Web Archives
 
Improving Collection Understanding in Web Archives
Improving Collection Understanding in Web ArchivesImproving Collection Understanding in Web Archives
Improving Collection Understanding in Web Archives
 
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
iPRES2015: Archiving Deferred Representations Using a Two-Tiered Crawling App...
 
Combining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web ArchivesCombining Social Media Storytelling With Web Archives
Combining Social Media Storytelling With Web Archives
 
The Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web ArchivingThe Memento Protocol and Research Issues With Web Archiving
The Memento Protocol and Research Issues With Web Archiving
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...
Using Web Archives to Enrich the Live Web Experience Through Storytelling - P...
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Information Visualization - Visualizing Digital Collections at Archive-It
Information Visualization - Visualizing Digital Collections at Archive-ItInformation Visualization - Visualizing Digital Collections at Archive-It
Information Visualization - Visualizing Digital Collections at Archive-It
 
Virtual Libraries
Virtual LibrariesVirtual Libraries
Virtual Libraries
 
Learning & Web 2.0: It's all about Play!
Learning & Web 2.0:  It's all about Play!Learning & Web 2.0:  It's all about Play!
Learning & Web 2.0: It's all about Play!
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 
Open the Door, Let \'em In: Virtual School Libraries
Open the Door, Let \'em In: Virtual School LibrariesOpen the Door, Let \'em In: Virtual School Libraries
Open the Door, Let \'em In: Virtual School Libraries
 

Viewers also liked

Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Michael Nelson
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over Time
Michael Nelson
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Michael Nelson
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Michael Nelson
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web Archives
Michael Nelson
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet Archive
Michael Nelson
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingUsing Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Yasmin AlNoamany, PhD
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?
Michael Nelson
 
Software as a Well-Formed Research Object
Software as a Well-Formed Research ObjectSoftware as a Well-Formed Research Object
Software as a Well-Formed Research Object
Yasmin AlNoamany, PhD
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member
Michael Nelson
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief Introduction
Sawood Alam
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Michael Nelson
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
Michael Nelson
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better
Michael Nelson
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Michael Nelson
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived Pages
Michael Nelson
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Michael Nelson
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
Michael Nelson
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
Michael Nelson
 

Viewers also liked (19)

Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench ToolEvaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
Evaluating the SiteStory Transactional Web Archive with the ApacheBench Tool
 
On the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over TimeOn the Change in Archivability of Websites Over Time
On the Change in Archivability of Websites Over Time
 
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web ArchivingWho Will Archive the Archives? Thoughts About the Future of Web Archiving
Who Will Archive the Archives? Thoughts About the Future of Web Archiving
 
Profiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content LanguageProfiling Web Archive Coverage for Top-Level Domain and Content Language
Profiling Web Archive Coverage for Top-Level Domain and Content Language
 
Assessing the Quality of Web Archives
Assessing the Quality of Web ArchivesAssessing the Quality of Web Archives
Assessing the Quality of Web Archives
 
Who and What Links to the Internet Archive
Who and What Links to the Internet ArchiveWho and What Links to the Internet Archive
Who and What Links to the Internet Archive
 
Using Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through StorytellingUsing Web Archives to Enrich the Live Web Experience Through Storytelling
Using Web Archives to Enrich the Live Web Experience Through Storytelling
 
When Should I Make Preservation Copies of Myself?
When Should I Make Preservation Copies of Myself?�When Should I Make Preservation Copies of Myself?�
When Should I Make Preservation Copies of Myself?
 
Software as a Well-Formed Research Object
Software as a Well-Formed Research ObjectSoftware as a Well-Formed Research Object
Software as a Well-Formed Research Object
 
Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member Old Dominion University Computer Science IIPC New Member
Old Dominion University Computer Science IIPC New Member
 
Web Archiving: A Brief Introduction
Web Archiving: A Brief IntroductionWeb Archiving: A Brief Introduction
Web Archiving: A Brief Introduction
 
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with JavascriptCombining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
Combining Heritrix and PhantomJS for Better Crawling of Pages with Javascript
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
More Archives, More Better
More Archives, More Better More Archives, More Better
More Archives, More Better
 
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
Resurrecting My Revolutionsing Social Link Neighborhood in Bringing Context t...
 
Evaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived PagesEvaluating the Temporal Coherence of Archived Pages
Evaluating the Temporal Coherence of Archived Pages
 
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
Web Archiving Activities of ODU’s Web Science and Digital Library Research G...
 
Why Care About the Past?
Why Care About the Past?Why Care About the Past?
Why Care About the Past?
 
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
OAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange ProjectOAI-ORE:  The Open Archives Initiative  Object Reuse and Exchange Project
OAI-ORE: The Open Archives Initiative Object Reuse and Exchange Project
 

Similar to Combining Storytelling and Web Archives

Presentation to Northern Sydney District Teacher Librarian Association
Presentation to Northern Sydney District Teacher Librarian Association Presentation to Northern Sydney District Teacher Librarian Association
Presentation to Northern Sydney District Teacher Librarian Association Roxanne Missingham
 
Carrying the Banner: Reinventing News on Your University Website
Carrying the Banner: Reinventing News on Your University WebsiteCarrying the Banner: Reinventing News on Your University Website
Carrying the Banner: Reinventing News on Your University Website
Georgiana Cohen
 
Visualization notes
Visualization notesVisualization notes
Visualization notes
University of South Australlia
 
Closing the Digital Skills Gap
Closing the Digital Skills GapClosing the Digital Skills Gap
Closing the Digital Skills Gap
Bobbi Newman
 
Let the trumpet sound 2003 version
Let the trumpet sound 2003 versionLet the trumpet sound 2003 version
Let the trumpet sound 2003 versionJohan Koren
 
EMMA Summer School - E. Bruno, I. Merciai, M. Tizzani - MOOC Production autho...
EMMA Summer School - E. Bruno, I. Merciai, M. Tizzani - MOOC Production autho...EMMA Summer School - E. Bruno, I. Merciai, M. Tizzani - MOOC Production autho...
EMMA Summer School - E. Bruno, I. Merciai, M. Tizzani - MOOC Production autho...
EUmoocs
 
2009 Poster - Masters research on YouTube and cultural heritage
2009 Poster - Masters research on YouTube and cultural heritage2009 Poster - Masters research on YouTube and cultural heritage
2009 Poster - Masters research on YouTube and cultural heritage
Leisa Gibbons
 
Web 2.0...it’s okay to play!
Web 2.0...it’s okay to play!Web 2.0...it’s okay to play!
Web 2.0...it’s okay to play!
daveyp
 
The web is a mess: how I learnt to stop worrying and love web archiving. Kris...
The web is a mess: how I learnt to stop worrying and love web archiving. Kris...The web is a mess: how I learnt to stop worrying and love web archiving. Kris...
The web is a mess: how I learnt to stop worrying and love web archiving. Kris...
Biblioteca Nacional de España
 
Jane Finnis Keynote NDF2009 Part Two (see Part One)
Jane Finnis Keynote NDF2009  Part Two (see Part One)Jane Finnis Keynote NDF2009  Part Two (see Part One)
Jane Finnis Keynote NDF2009 Part Two (see Part One)
Jane Finnis
 
Micro Blogs A Moblos Y Herramientas Web Para Aprendizaje
Micro Blogs A Moblos Y Herramientas Web Para AprendizajeMicro Blogs A Moblos Y Herramientas Web Para Aprendizaje
Micro Blogs A Moblos Y Herramientas Web Para Aprendizajecampus party
 
Improving Organizational Efficiency with Wiki-based Intranets
Improving Organizational Efficiency with Wiki-based IntranetsImproving Organizational Efficiency with Wiki-based Intranets
Improving Organizational Efficiency with Wiki-based Intranets
Thomas Siegers
 
Introduction to the Social Web and its applications
Introduction to the Social Web and its applicationsIntroduction to the Social Web and its applications
Introduction to the Social Web and its applications
mdabrowski
 
Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0
Judy O'Connell
 
Wesleyan2.0
Wesleyan2.0Wesleyan2.0
Wesleyan2.0
sbclapp
 
Social Media For Public Libraries: Basics and Beyond
Social Media For Public Libraries: Basics and BeyondSocial Media For Public Libraries: Basics and Beyond
Social Media For Public Libraries: Basics and Beyond
Elise C. Cole
 
Wrangling Wikipedia
Wrangling WikipediaWrangling Wikipedia
Wrangling Wikipedia
moniquekclark
 
Presentation for wellington sept 12
Presentation for wellington sept 12Presentation for wellington sept 12
Presentation for wellington sept 12
Rosalind Dibley
 
Presentation for wellington sept 12
Presentation for wellington sept 12Presentation for wellington sept 12
Presentation for wellington sept 12
Rosalind Dibley
 
OCLC Research update: Emerging trends.
OCLC Research update: Emerging trends.OCLC Research update: Emerging trends.
OCLC Research update: Emerging trends.
Lynn Connaway
 

Similar to Combining Storytelling and Web Archives (20)

Presentation to Northern Sydney District Teacher Librarian Association
Presentation to Northern Sydney District Teacher Librarian Association Presentation to Northern Sydney District Teacher Librarian Association
Presentation to Northern Sydney District Teacher Librarian Association
 
Carrying the Banner: Reinventing News on Your University Website
Carrying the Banner: Reinventing News on Your University WebsiteCarrying the Banner: Reinventing News on Your University Website
Carrying the Banner: Reinventing News on Your University Website
 
Visualization notes
Visualization notesVisualization notes
Visualization notes
 
Closing the Digital Skills Gap
Closing the Digital Skills GapClosing the Digital Skills Gap
Closing the Digital Skills Gap
 
Let the trumpet sound 2003 version
Let the trumpet sound 2003 versionLet the trumpet sound 2003 version
Let the trumpet sound 2003 version
 
EMMA Summer School - E. Bruno, I. Merciai, M. Tizzani - MOOC Production autho...
EMMA Summer School - E. Bruno, I. Merciai, M. Tizzani - MOOC Production autho...EMMA Summer School - E. Bruno, I. Merciai, M. Tizzani - MOOC Production autho...
EMMA Summer School - E. Bruno, I. Merciai, M. Tizzani - MOOC Production autho...
 
2009 Poster - Masters research on YouTube and cultural heritage
2009 Poster - Masters research on YouTube and cultural heritage2009 Poster - Masters research on YouTube and cultural heritage
2009 Poster - Masters research on YouTube and cultural heritage
 
Web 2.0...it’s okay to play!
Web 2.0...it’s okay to play!Web 2.0...it’s okay to play!
Web 2.0...it’s okay to play!
 
The web is a mess: how I learnt to stop worrying and love web archiving. Kris...
The web is a mess: how I learnt to stop worrying and love web archiving. Kris...The web is a mess: how I learnt to stop worrying and love web archiving. Kris...
The web is a mess: how I learnt to stop worrying and love web archiving. Kris...
 
Jane Finnis Keynote NDF2009 Part Two (see Part One)
Jane Finnis Keynote NDF2009  Part Two (see Part One)Jane Finnis Keynote NDF2009  Part Two (see Part One)
Jane Finnis Keynote NDF2009 Part Two (see Part One)
 
Micro Blogs A Moblos Y Herramientas Web Para Aprendizaje
Micro Blogs A Moblos Y Herramientas Web Para AprendizajeMicro Blogs A Moblos Y Herramientas Web Para Aprendizaje
Micro Blogs A Moblos Y Herramientas Web Para Aprendizaje
 
Improving Organizational Efficiency with Wiki-based Intranets
Improving Organizational Efficiency with Wiki-based IntranetsImproving Organizational Efficiency with Wiki-based Intranets
Improving Organizational Efficiency with Wiki-based Intranets
 
Introduction to the Social Web and its applications
Introduction to the Social Web and its applicationsIntroduction to the Social Web and its applications
Introduction to the Social Web and its applications
 
Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0 Preparing for the Impact of Web 3.0
Preparing for the Impact of Web 3.0
 
Wesleyan2.0
Wesleyan2.0Wesleyan2.0
Wesleyan2.0
 
Social Media For Public Libraries: Basics and Beyond
Social Media For Public Libraries: Basics and BeyondSocial Media For Public Libraries: Basics and Beyond
Social Media For Public Libraries: Basics and Beyond
 
Wrangling Wikipedia
Wrangling WikipediaWrangling Wikipedia
Wrangling Wikipedia
 
Presentation for wellington sept 12
Presentation for wellington sept 12Presentation for wellington sept 12
Presentation for wellington sept 12
 
Presentation for wellington sept 12
Presentation for wellington sept 12Presentation for wellington sept 12
Presentation for wellington sept 12
 
OCLC Research update: Emerging trends.
OCLC Research update: Emerging trends.OCLC Research update: Emerging trends.
OCLC Research update: Emerging trends.
 

More from Michael Nelson

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035
Michael Nelson
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pages
Michael Nelson
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
Michael Nelson
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
Michael Nelson
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Michael Nelson
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Michael Nelson
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Michael Nelson
 

More from Michael Nelson (7)

Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035Web Archiving in the Year eaee1902f186819154789ee22ca30035
Web Archiving in the Year eaee1902f186819154789ee22ca30035
 
Uncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pagesUncertainty in replaying archived Twitter pages
Uncertainty in replaying archived Twitter pages
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Web Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed OriginalsWeb Archives at the Nexus of Good Fakes and Flawed Originals
Web Archives at the Nexus of Good Fakes and Flawed Originals
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web PagesBlockchain Can Not Be Used To Verify Replayed Archived Web Pages
Blockchain Can Not Be Used To Verify Replayed Archived Web Pages
 
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence Weaponized Web Archives: Provenance Laundering of Short Order Evidence
Weaponized Web Archives: Provenance Laundering of Short Order Evidence
 

Recently uploaded

GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
AlguinaldoKong
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
ssuserbfdca9
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
AADYARAJPANDEY1
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
subedisuryaofficial
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
YOGESH DOGRA
 

Recently uploaded (20)

GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 

Combining Storytelling and Web Archives

  • 1. Combining Storytelling and Web Archives Yasmin AlNoamany Michele C. Weigle Michael L. Nelson Old Dominion University Web Science and Digital Libraries Group ws-dl.cs.odu.edu @WebSciDL This work is supported in part by IMLS LG-71-15-0077 Old Dominion University ECE Department Colloquium 2015-11-13
  • 2. IMLS-Funded Research 1. Use small “stories” to summarize much larger collections of archived web pages – big  small 2. Generate web archive collections by mining user-generated stories for seed URIs • small  big http://ws-dl.blogspot.com/2015/10/2015-10-07-imls-and-nsf-fund-web.html
  • 3. The Web is important for our cultural heritage. How to preserve the Web??!!! 3
  • 4. What are web archives? 2002 2005 2009 2011 4
  • 5. Memento: an archived snapshot as it appeared in the past 2002 2005 2009 2011 5
  • 6. What are some web archives? • The entire web • On-demand free services • National libraries • Theme-based collections 6 For more information: https://en.wikipedia.org/wiki/Web_archiving http://mementoweb.org/
  • 7. Archive-It, a subscription-based service, hosts curated web collections 7 > 3,000 collections ~340 institutions > 10B archived pages
  • 8. 8 Collection title Collection categorization based on the curator Seed URI Metadata about the collection Text search box The group that the resource belongs to List of the seed URIs Timespan of the resource and the number of times it has been captured
  • 9. What is the problem with the archived collections? 9
  • 10. There is more than one collection about “Egyptian Revolution” 10 • “2010-2011 Arab Spring” https://archive-it.org/collections/3101 • “North Africa & the Middle East 2011-2013” https://archive-it.org/collections/2349 • “Egypt Revolution and Politics” https://archive-it.org/collections/2358
  • 11. Collection understanding and collection summarization are not supported currently Not easy to answer “what’s in that collection?” 11
  • 12.
  • 13.
  • 14.
  • 15. Our Early Attempts at Collection Understanding Tried To Include Everything… 15 http://ws-dl.blogspot.com/2012/08/2012-08-10-ms-thesis-visualizing.html
  • 19. (1000s of Seeds X 1000s of Mementos) + Dimension of Time == Conventional Viz Methods Not Applicable 19
  • 21. Stories in Literature Story elements: setting, characters, sequence, exposition, conflict, climax, resolution 21 Once upon a time http://www.learner.org/interactives/story/
  • 22. Stories in social media 22 “It's hard to define a story, but I know it when I see it” (Alexander, 2008) A loose context of the conventional definition in Literature
  • 23. “storytelling” is becoming a popular technique in social media 23
  • 24. There is interest in choosing k items from N where k << N 24
  • 25. I have 1000s of posts, images, friends’ posts, etc. in Facebook 25
  • 26. Facebook reduces this to 1 minute video (Facebook Look Back) • https://www.youtube.com/watch?v=AWoBAjo vHT4 26
  • 27. Interest from Twitter in storytelling 27
  • 29. What are the limitations of storytelling services? 29
  • 32. Social media can go off-topic or disappear 32 A year after publication, about 11% of content shared on social media will be gone (SalahEldeen, 2012)* * SalahEldeen, H.M., Nelson, M.L.: Losing my revolution: How many resources shared on social media have been lost? TPDL’12.
  • 33. Despite these limitations, how do we combine storytelling & archives? 33
  • 34. We sample k mementos from N pages of the collection to create a summary story S 1 S 2 S 3 S 4 S 2 S 1 S 3 Collection Y S 3 S 2 S 1 Collection Z Collection X
  • 35. Use interface people already know how to use to summarize collections 35 Archived collectionsStorytelling services Archived enriched stories
  • 36. Research Question Can we (semi-)automatically identify, evaluate, and select candidate archived web pages from archived collections for generating stories that summarize the collection? 36
  • 37. The archived collection has two dimensions 37 Time URI
  • 38. The Two Dimensions Applied to Stories Fixed Page – Fixed Time: differences in GeoIP, mobile, etc. (Kelly, 2013) Fixed Page – Sliding Time: evolution of a single page (or domain) through time Sliding Page – Fixed Time: different perspectives on a point in time Sliding Page – Sliding Time: broadest possible coverage of a collection 38 same Time different URI same different
  • 39. Fixed Pages, Fixed Time R1 R1 R1 R1 t1 t3t2 t5t4 t6 39
  • 40. Fixed Page, Fixed Time 40 A desktop Chrome user-agent http://www.cnn.com/2014/02/24/world/africa/egypt- politics/index.html?hpt=wo_c2 Andriod Chrome user-agent http://www.cnn.com/2014/02/24/world/africa/egypt- politics/index.html?hpt=wo_c2 Richard Schneider and Frank McCown, “First Steps in Archiving the Mobile Web: Automated Discovery of Mobile Websites”, JCDL 2013.
  • 41. Fixed Page, Sliding Time R R R R R R t1 t3t2 t5t4 t6 41
  • 42. Feb 1 Feb 1 Feb 2 Feb 4 Feb 5 Feb 7 Feb 9 Feb 11 Feb 11
  • 43. Sliding Page, Fixed Time R1 R2 R3 R4 t1 t3t2 t5t4 t6 43
  • 44. Feb. 11, 2011 Mubarak resigns 44
  • 45. Sliding Page, Sliding Time R1 R2 R1 R3 R4 R2 t1 t3t2 t5t4 t6 45
  • 46. Jan 27 Jan 31 Feb 7Feb 4 Feb 11 Feb 11 Feb 2 Jan 25 Feb 10
  • 47. Handcrafted Stories in Storify 47 Sliding Page, Sliding TimeSliding Page, Fixed Time http://storify.com/yasmina_anwar/the-egyptian-revolution-on-archive-it- collection http://storify.com/yasmina_anwar/the-egyptian-revolution-story- from-archive-it-hete
  • 48. Other Interfaces Possible… 48 Replaying Story of Egyptian Revolution http://www.dipity.com/YasminAlnoamany/Jan-25-Egyptian-Revolution/
  • 49. Yasmin’s Research Status 1. Establishing a baseline a. For archived collections (Visualizing Digital Collections at Archive-It, JCDL 2012) b. For human stories (Characteristics of Social Media Stories, TPDL 2015) 2. Calculate the aboutness of the collections 3. Find off-topic URIs (Detecting Off-Topic Pages in Web Archives, TPDL 2015) 4. Identify the best k mementos that summarize the collection for desired story type 5. Evaluate & visualize the generated stories 49
  • 50. What about using stories to seed web archive collections? 50
  • 51. Creating collections is hard. Requires combination of awareness, domain knowledge, crowdsourcing, etc. 51
  • 52. Ukrainian Themed Collections, Prior to February 2014
  • 53. Ukrainian Conflict Collection Started February 2014
  • 55. What about social media, especially curated social media? 55
  • 58. https://storify.com/jesseoriordan/ukraine-fight-for-their-rights (late January, 2014) Compelling narratives are created by knowledgeable (and biased) users arranging news stories, images, social media, commentary, etc. Intuitively, this is a “good story”. How do we quantify this?
  • 59. Characteristics of Social Media Stories Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson Old Dominion University Web Science and Digital Libraries Group http://ws-dl.cs.odu.edu/, @WebSciDL 59 TPDL 2015 16.09.2015, Poznań, Poland http://www.cs.odu.edu/~mln/pubs/tpdl-2015/tpdl-2015-stories.pdf
  • 60. “Storytelling” is becoming a popular technique in social media 60
  • 61. Story creation interface on Storify 61
  • 62. Research Question 62 We’d like to automatically generate stories, but we don’t yet know what constitutes a “good” story. Q: What are the structural characteristics of popular (i.e., receiving the most views) human-generated stories?
  • 63. What is the length of a story (the number of resources per story)? • This story has 31 resources 63 1 3 2
  • 64. What are the types of resources that compose a story? • This story has – 19 quotes – 8 images – 4 videos 64 Quotes Video
  • 65. What are the most frequently used domains? • 90% twitter.com • 7% instagram.com • 3% facebook.com 65 Twitter.com Twitter.com Twitter.com
  • 66. The popularity of the most used domains in the stories 66 mr_sedivy.tripod.com is ranked 3,084,064 by Alexa global rank Twitter.com is ranked 9 by Alexa global rank
  • 67. What is the timespan (editing time) of the stories? • This story covers ~4 hours (from 3:12 PM – 7:27 PM) • Is there a relation between the timespan and the features of the story? 67 http://storify.com/yahoonews/boston-marathon-bombing
  • 68. What is the “HTTP 404” rate of the resources? 68 68
  • 69. Can we find these missing resources in the archives? 69 http://www.nzherald.co.nz/world/news/article.cfm?c_id=2&objectid=10705523
  • 70. What differentiates a popular story? 70 19,795 views 64 views
  • 71. Data set construction 1. Queried Storify Search API with the top 1000 English keywords as seen by Yahoo 2. Downloaded 145,682 stories in JSON on Feb. 2015 3. Considered stories authored in 2014 or earlier, resulting in 37,486 stories 4. Eliminated stories with only zero or one elements or zero views, resulting in: a. 14,568 unique stories b. 10,199 unique users c. 1,251,160 web and text elements 71
  • 72. Characteristics of human- generated stories Features Views Web elements Text elements Subscribers Timespan in hours 25th percentile 14 10 0 0 0.18 50th percentile 51 23 1 4 3 75th percentile 268 69 9 21 120 90th percentile 1949 210 19 85 1747 Maximum 11,284,896 2,216 559 1,726,143 36,111 72
  • 73. Characteristics of human- generated stories Features Views Web elements Text elements Subscribers Timespan in hours 25th percentile 14 10 0 0 0.18 50th percentile 51 23 1 4 3 75th percentile 268 69 9 21 120 90th percentile 1949 210 19 85 1747 Maximum 11,284,896 2,216 559 1,726,143 36,111 73 44 % of the stories have no text elements at all
  • 74. What is the Average Timespan for Stories? 74
  • 75. What is the Average Timespan for Stories? Intervals Percentage Median web elements Median text elements Median views (normalized by time) 0-60 seconds 14.0% 15 0 23 1-60 minutes 26.7% 19 0 53 1-24 hours 23.4% 25 5 110 1-7 days 13.5% 26 7 78 1-4 weeks 8.4% 26 9 80 1-12 months 10.9% 38 2 129 1-4 years 3.1% 56 15 156 75 There is nearly linear relation between the time length of the story and the number of elements
  • 76. What is the Average Timespan for Stories? 76 The story with the longest timespan in our data set covers more than 4 years and with more than 13,000 views. It had only 33 web elements and 51 total elements. Intervals Percentage Median web elements Median text elements Median views (normalized by time) 0-60 seconds 14.0% 15 0 23 1-60 minutes 26.7% 19 0 53 1-24 hours 23.4% 25 5 110 1-7 days 13.5% 26 7 78 1-4 weeks 8.4% 26 9 80 1-12 months 10.9% 38 2 129 1-4 years 3.1% 56 15 156
  • 77. The distribution of the elements (1,251,160) Resource Proportion links 70.8% images 18.4% text 8.1% video 2.0% quotes 0.7% 77 Text elements are relatively rare, meaning that few users choose to annotate the web elements in their story.
  • 78. What are the most frequently used domains? 78
  • 79. The most frequent domains in the stories • Domain Canonicalization – www.cnn.com  cnn.com • Dereference all the shortened URIs – For example: t.co, bit.ly • 25,947 unique domains 79
  • 80. The most frequent domains in the stories represent ~92% Host Frequency Percentage Alexa Global Rank as of 2015-03 Category twitter.com 943,859 82.05% 8 Social media instagram.com 45,188 3.93% 25 Photos youtube.com 22,076 1.92% 3 Videos facebook.com 13,930 1.21% 2 Social media flickr.com 7,317 0.64% 126 Photos patch.com 5,783 0.50% 2,096 News plus.google.com 3,413 0.30% 1 Social media tumblr.com 3,066 0.27% 31 Blogs blogspot.com 1,857 0.16% 18 Blogs imgur.com 1,756 0.15% 36 Photos coolpile.com 1,706 0.15% 149,281 Entertainment wordpress.com 1,615 0.14% 33 Blogs giphy.com 1,055 0.09% 1,604 Photos bbc.com 966 0.08% 156 News lastampa.it 927 0.08% 2,440 News pinterest.com 892 0.08% 32 Photos softandapps.info 861 0.07% 160,980 News photobucket.com 768 0.07% 341 Photos nytimes.com 744 0.06% 97 News soundcloud.com 736 0.06% 167 Audio 80
  • 81. The most frequent domains in the stories represent ~92% Host Frequency Percentage Alexa Global Rank as of 2015-03 Category twitter.com 943,859 82.05% 8 Social media instagram.com 45,188 3.93% 25 Photos youtube.com 22,076 1.92% 3 Videos facebook.com 13,930 1.21% 2 Social media flickr.com 7,317 0.64% 126 Photos patch.com 5,783 0.50% 2,096 News plus.google.com 3,413 0.30% 1 Social media tumblr.com 3,066 0.27% 31 Blogs blogspot.com 1,857 0.16% 18 Blogs imgur.com 1,756 0.15% 36 Photos coolpile.com 1,706 0.15% 149,281 Entertainment wordpress.com 1,615 0.14% 33 Blogs giphy.com 1,055 0.09% 1,604 Photos bbc.com 966 0.08% 156 News lastampa.it 927 0.08% 2,440 News pinterest.com 892 0.08% 32 Photos softandapps.info 861 0.07% 160,980 News photobucket.com 768 0.07% 341 Photos nytimes.com 744 0.06% 97 News soundcloud.com 736 0.06% 167 Audio 81
  • 82. The most frequent domains in the stories represent ~92% Host Frequency Percentage Alexa Global Rank as of 2015-03 Category twitter.com 943,859 82.05% 8 Social media instagram.com 45,188 3.93% 25 Photos youtube.com 22,076 1.92% 3 Videos facebook.com 13,930 1.21% 2 Social media flickr.com 7,317 0.64% 126 Photos patch.com 5,783 0.50% 2,096 News plus.google.com 3,413 0.30% 1 Social media tumblr.com 3,066 0.27% 31 Blogs blogspot.com 1,857 0.16% 18 Blogs imgur.com 1,756 0.15% 36 Photos coolpile.com 1,706 0.15% 149,281 Entertainment wordpress.com 1,615 0.14% 33 Blogs giphy.com 1,055 0.09% 1,604 Photos bbc.com 966 0.08% 156 News lastampa.it 927 0.08% 2,440 News pinterest.com 892 0.08% 32 Photos softandapps.info 861 0.07% 160,980 News photobucket.com 768 0.07% 341 Photos nytimes.com 744 0.06% 97 News soundcloud.com 736 0.06% 167 Audio 82
  • 83. The most frequent domains in the stories represent ~92% Host Frequency Percentage Alexa Global Rank as of 2015-03 Category twitter.com 943,859 82.05% 8 Social media instagram.com 45,188 3.93% 25 Photos youtube.com 22,076 1.92% 3 Videos facebook.com 13,930 1.21% 2 Social media flickr.com 7,317 0.64% 126 Photos patch.com 5,783 0.50% 2,096 News plus.google.com 3,413 0.30% 1 Social media tumblr.com 3,066 0.27% 31 Blogs blogspot.com 1,857 0.16% 18 Blogs imgur.com 1,756 0.15% 36 Photos coolpile.com 1,706 0.15% 149,281 Entertainment wordpress.com 1,615 0.14% 33 Blogs giphy.com 1,055 0.09% 1,604 Photos bbc.com 966 0.08% 156 News lastampa.it 927 0.08% 2,440 News pinterest.com 892 0.08% 32 Photos softandapps.info 861 0.07% 160,980 News photobucket.com 768 0.07% 341 Photos nytimes.com 744 0.06% 97 News soundcloud.com 736 0.06% 167 Audio 83
  • 84. The embedded resources of the tweets 84 https://pbs.twimg.com/media/BfnnY-0CcAA56oy.png
  • 85. The embedded resources of the tweets 85 Domain Percentage Category twimg.com 46.17% Images instagram.com 4.28% Images youtube.com 2.82% Videos linkis.com 2.04% Media sharing facebook.com 1.40% Social Media wordpress.com 0.61% Blogs vine.co 0.53% Videos blogspot.com 0.52% Blogs storify.com 0.49% Social Network bbc.com 0.44% News • We sampled 5% of 47,512 tweets: 15,217 • The unique URIs: 14,616
  • 86. The embedded resources of the tweets Domain Percentage Category twimg.com 46.17% Images instagram.com 4.28% Images youtube.com 2.82% Videos linkis.com 2.04% Media sharing facebook.com 1.40% Social Media wordpress.com 0.61% Blogs vine.co 0.53% Videos blogspot.com 0.52% Blogs storify.com 0.49% Social Network bbc.com 0.44% News 86 • We sampled 5% of 47,512 tweets: 15,217 • The unique URIs: 14,616 • 46% are photos from twitter.com
  • 87. The embedded resources of the tweets Domain Percentage Category twimg.com 46.17% Images instagram.com 4.28% Images youtube.com 2.82% Videos linkis.com 2.04% Media sharing facebook.com 1.40% Social Media wordpress.com 0.61% Blogs vine.co 0.53% Videos blogspot.com 0.52% Blogs storify.com 0.49% Social Network bbc.com 0.44% News 87 • We sampled 5% of 47,512 tweets: 15,217 • The unique URIs: 14,616 • 0.49% of the stories point to other stories in Storify
  • 88. Is there a correlation between Alexa global rank and rank within Storify? 88
  • 89. Most of the time, the highly ranked resources correlate the most used resources in human-generated stories n 10 15 25 50 100 Kendal 0.1555 0.4476 0.3372 0.3194 0.2485 89
  • 90. Most of the time, the highly ranked resources correlate the most used resources in human-generated stories n 10 15 25 50 100 Kendal 0.1555 0.4476 0.3372 0.3194 0.2485 90 This is in contrast to the usage of resources in Pinterest [1] [1] Zhong, C., Shah, S., Sundaravadivelan, K., Sastry, N.: Sharing the Loves: Understanding the How and Why of Online Content Curation. ICWSM 2013 Image source: http://workmonk.in/testblog/wp-content/uploads/2014/01/pinterest.jpg
  • 91. How many of the resources in these stories disappear every year? Can we find these missing resources in the archives? 91
  • 92. The existence of the resources • We checked the live web and public web archives for 265,181 URIs – 202,452 URIs from story web elements – 47,512 randomly sampled tweet URIs – 15,217 URIs of embedded resources in those tweets • The unique URIs are 253,978 • We examined the results of the five most frequent domains in the stories – twitter.com, instagram.com, youtube.com, facebook.com, flickr.com 92
  • 93. The decay rate of the resources in the stories 93 40.8% of the stories contain missing resources with an average value of 10.3% per story
  • 94. The decay rate of the resources in the stories 94 There is a nearly linear decay rate of resources through time. This matches the finding by SalahEldeen [2]. [2] SalahEldeen, H.M., Nelson, M.L.: Losing my revolution: How many resources shared on social media have been lost? TPDL’12.
  • 95. Existence on the live web and in the archives 95 Of all the unique URIs, 11.8% are missing on the live web.
  • 96. Existence on the live web and in the archives 96 • The social media is not well-archived like the regular web • Facebook uses robots.txt to block web archiving by the Internet Archive
  • 98. Specifying the popular and the unpopular stories • The stories are divided based on the number of their views, then normalized by the amount of time they were available on the web. • Popular stories – the top 25% of stories that have the most views – 3,642 stories 98
  • 99. The distributions for the features of the stories 99 • Based on Kruskal-Wallis test, at the p ≤ 0.05 significance level, the popular and the unpopular stories are different in terms of most of the features • Popular stories tend to have: • more web elements (medians of 28 vs. 21) • longer timespan (5 hours vs. 2 hours) than the unpopular stories
  • 100. Do Popular Stories have a Lower Decay Rate? 100 The 75th percentile of decay rate per popular story is 10% of the resources, while it is 15% in the unpopular stories
  • 101. The distributions for the elements of the stories 101
  • 102. Conclusions • We analyzed 14,568 stories from Storify comprising 1,251,160 elements. • Popular stories have a min/median/max value of 2/28/1950 elements, with the unpopular stories having 2/21/2216 • Popular stories have a median of 12 multimedia resources (the unpopular stories have a median of 7) • Of the popular stories, 38% receive continuing edits (as opposed to 35%) • In popular stories, only 11% of web elements are missing on the live web (as opposed to 13%) • The percentage of the missing resources is proportional with the age of the stories. 102

Editor's Notes

  1. What we mean here by Storytelling here is using visualizations to put a set of web pages from web archives in a narrative structure, ordered by time
  2. Institutions that maintain taking periodic snapshots of pages to archive of the entire Web
  3. There are different kinds of archives depending on their purpose: Foe example there are web arhicves tht archive the whole web, an other aredomain specific, or collection based archives There are web archives that archive the whole web, such as IA. The largest web archive, in all of the above cases, the goal is to summarize or abstract a *large* collection with a few, representative selections
  4. First deployed in 2006, Archive-It is a subscription web archiving service from the Internet Archive that helps organizations to harvest, build, and preserve collections of digital content. 
  5. Lori created the collections and entered metadata about them,description, title, etc Collection level metadata but it doesn’t help a lot Archive-It provides faceted browsing and search services on the resulting collection
  6. , there are about 3 or 4 collections about egyptian revolution in Archive-it, If I want to know about the egy rev, which collection should I browse?? Collection is two dimensions <<URIs, and copies of these URIs>> Historian with more than one collection will not know where to start
  7. Every story is made up of a set of events. We use ``story'' in its current, loose context of social media, which is sometimes missing elements from the more formal literary tradition of dramatic structure, morality, humor, improvisation, etc What we mean here by Storytelling here is using visualizations to put a set of web pages from web archives in a narrative structure, ordered by time
  8. Story def. in social media much looser and more relaxed. Storytelling may be seen as the set of cultural practices for representing events chronologically.
  9. Because of the sheer volume of information on the web, “storytelling” is becoming a popular technique in social media for selecting representative tweets, videos, web pages, etc. and arranging them in chronological order to support a particular narrative or “story”4. We use “story” in its current, loose context of social media, which is sometimes missing elements from the more formal literary tradition of dramatic structure, morality, humor, improvisation, etc
  10. Facebook Timeline is one of the most popular site that the people use I have thousands of posts, images, friends posts, and so on in facebook To reduce this o a 1 minute video
  11. Highlights the years on facebook
  12. https://stories.twitter.com/index_en.html
  13. From the videos we take everyday, this app allow selecting and cutting 1 second from each video, and combining these seconds together, so u can review yr life. but a new perspective on how he lived day to day.
  14. Storify is a storytelling service that lets the user create stories or timelines using social media and web pages. Storify was launched in September 2010, and has been open to the public since April 2011. storytelling” is best typified by the company Storify http://storify.com/nzherald/mu http://storify.com/nzherald/mu http://www.nzherald.co.nz/world/news/article.cfm?c_id=2&objectid=10705546
  15. The problem is that storify operate as bookmarking, it doesn’t preserve the links You have no clue of what the person is saying about the link http://www.nzherald.co.nz/world/news/article.cfm?c_id=2&objectid=10705523 http://storify.com/nzherald/mu
  16. https://twitter.com/LamyaeA/statuses/36095702761218048 http://twitpic.com/3yo7e9
  17. So if this is the web, the archived collections are subsets from the web, we will sample from these collections to create a story…..
  18. So what we want to do is to create persistent stories then visualize them using storytelling tool that users already know about, such as storify. So we will integrate the story telling servises and the archived collections to generate archived enriched stories.
  19. The main research question here is How to identify the set of pages that best summarize the collection ? How do we integrate web archives into live web for repalying news stories as they captured?
  20. There are four basic story types We have pages and time both can be sliding or fixed Let me walk you through example for each type…
  21. http://www.cnn.com/2014/02/24/world/africa/egypt-politics/index.html?hpt=wo_c2 http://america.aljazeera.com/ Personalized Web resources offer different representations based on the user-agent string and other values in the HTTP request headers, GeoIP, and other environmental factors. Currently web archives don’t support browsing different representation. This means Web crawlers capturing content for archives may receive representations based on the crawl environment which will differ from the representations returned to the interactive users.
  22. http://wayback.archive-it.org/2358/20110211191423/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/ http://wayback.archive-it.org/2358/*/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/
  23. Here
  24. Here is feb 11 from different news sites https://wayback.archive-it.org/2358/20110211074248/http://www.globalpost.com/dispatch/egypt/110210/mubarak-resign-obama-egypt https://wayback.archive-it.org/2358/20110211191445/http://www.cnn.com/ https://wayback.archive-it.org/2358/20110211192204/http://www.bbc.co.uk/news/world-middle-east-12433045 https://wayback.archive-it.org/2358/20110211192142/http://www.modernegypt.info/ https://wayback.archive-it.org/2358/20110211191423/http://news.blogs.cnn.com/category/world/egypt-world-latest-news/ https://wayback.archive-it.org/2358/20110211191423/http://www.arabist.net/ https://wayback.archive-it.org/2358/20110211194239/http://www.globalpost.com/dispatch/egypt/110211/mubarak-quits-resigns-egypt-cairo
  25. And here I want to get the broadest coverage possible for the egyptian revolution
  26. http://storify.com/yasmina_anwar/the-egyptian-revolution-story-from-archive-it-hete
  27. Because of the sheer volume of information on the web, “storytelling” is becoming a popular technique in social media for selecting representative tweets, videos, web pages, etc. and arranging them in chronological order to support a particular narrative or “story”4.
  28. Storify is a social networking service launched in 2010 that allows users to create a “story” of their own choosing, consisting of manually chosen web resources, arranged with a visually attractive interface, clustered together with a single URI and suitable for sharing. It provides a graphical interface for selecting URIs of web resources and arranging the resulting snippets and previews
  29. Our research question is What are the structural characteristics of popular (i.e., receiving the most views) human-generated stories? We answer the following questions:
  30. http://storify.com/RuBegonia/richardgrenell-tweets-about-cnn-as-story-unfolds http://storify.com/OwenCobb/do-you-hear-the-people-type-1
  31. The loss (when a web page give HTTP 404) rate of the resources that compose Storify stories
  32. the top 25% of views, normalized by time available on the web
  33. . Due to the large range of values we believe median is a better indicator of “typical values” rather than mean.
  34. . Due to the large range of values we believe median is a better indicator of “typical values” rather than mean.
  35. The first two intervals represent the stories that were created and modified, then published with no continuing edits. We see that the majority of the stories in the data set were created and edited in the span of one day. There are 14 % of Storify users who update their stories over a long period of time, with the longest timespan in our data set covering more than four years and with more than 13,000 views. Curiously, it had only 33 web elements and 51 total elements. Although the story with the longest timespan did not have the largest number of elements, from Table 5 we can see that based on the median number of elements in each interval there is nearly linear relation between the time length of the story and the number of elements.
  36. The first two intervals represent the stories that were created and modified, then published with no continuing edits. We see that the majority of the stories in the data set were created and edited in the span of one day. There are 14 % of Storify users who update their stories over a long period of time,
  37. Using Storify-defined categories reflected in the UI the 1,251,160 elements consist of:
  38. This is the most frequent domains used in the stories… We manually categorized these domains in a more fine-grained manner than Storify provides with its “links, images, text, videos, quotes” descriptions
  39. Twitter is the most popular domain
  40. Note that plus.google.com has rank one because Alexa does not differentiate plus.google.com from google.com.
  41. Although the top 25 list of domains appearing in the stories is dominated by globally popular web sites (e.g., Twitter, Instagram, Youtube, Facebook), the long-tailed distribution results in the presence of many globaly lesser known sites.
  42. https://storify.com/acranger/facebook-turns-10 The Embedded Resources of twitter.com Since Twitter is the most popular domain (> 82% of web elements), we investigate if the tweets have embedded resources of their own.
  43. 46% are photos from twitter.com
  44. Note that some Storify stories (0.49%) point to other stories in Storify.
  45. most of the time the highly ranked real-world resources, such as twitter.com, are correspondingly the most used in human-generated stories. The table Kendall’s Tau correlation coefficient for the most frequent n domains. Statistically significant (p < 0.05) correlations are bolded. The highest correlation is 0.45 for list of 15 domains.
  46. In Pinterest, users pin photos or videos of interest to create theme-based image/video collections such as hobbies, fashion, events. The most used subject areas that are being used by Pinterest users are food and drinks, décor and design, and Apparel and Accessories [5]. Most of the pins on Pinterest come from blogs, or uploaded by users. In Storify, the people tend to use social media and web resources to create their narratives about events, or something of interest.
  47. the percentage of the missing resources in the stories over time. a nearly linear decay rate of resources through time
  48. the percentage of the missing resources in the stories over time.
  49. the left two columns of Table 6 contain the results of checking the status of the web pages on the live web. The table also contains the results of the five most frequent domains and all other URIs. We also in we conclude that the decay rate of social media content is lower than the decay rate of the regular web content and websites. the social media is not well-archived like the regular web
  50. Of all the unique URIs, 11.8% are missing on the live web. The table also contains the results of the five most frequent domains and all other URIs.
  51. Popular stories tend to have: more web elements (medians of 28 vs. 21) longer timespan (5 hours vs. 2 hours) than the unpopular stories longer editing time intervals than the unpopular stories
  52. It shows that the resources of the pop- ular stories tends to stay longer than the resources of the unpopular.
  53. the distribution of the images in popular stories is higher than the distribution for the images in the unpopular stories. The median number of images in popular stories is 10, while it is 5 in the unpopular stories Although the unpopular stories tend to use links more than the popular stories, the median of the links in popular stories (20 links) is higher than the unpopular stories (16 links).