I presented this at iPres 2018. It consists of an analysis of some structural features found in Archive-It collections. We also categorize Archive-It collections into 4 different semantic categories and then use the structural features to predict these categories with a Random Forest Classifier.
1. The Many Shapes of Archive-It
Shawn M. Jones Alexander Nwala Michele C. Weigle Michael L. Nelson
Old Dominion University
Web Science and Digital Libraries Research Group
@WebSciDL
Shawn M. Jones: sjone@cs.odu.edu, @shawnmjones
Alexander Nwala: anwala@cs.odu.edu, @acnwala
Michele C. Weigle: mweigle@cs.odu.edu, @weiglemc
Michael L. Nelson: mln@cs.odu.edu, @phonedude_mln
2. @shawnmjones @WebSciDL
Researchers Create Their Own Web Archive Collections
Archived web pages, or mementos, are used by journalists, sociologists, and historians.
Example collections: Tucson Shootings, 2008 Olympics, University of Utah
3. Web Archive Collections Have Many Versions of the Same Page
Figure: mementos of the University of Utah Office of Admissions page, from the University of Utah Web Archive Collection (2013, 2015, 2018), and of the Tumblr Black Lives Matter Blog, from the #blacklivesmatter Collection (2/12/2015, 3/5/2015, 4/1/2015)
4. Different Versions Allow Us to See an Unfolding News Story
• Memento from April 19, 2013 17:12: Searching for Suspects, City on Lockdown
• Memento from April 19, 2013 17:59: Officer Donahue in hospital, Lockdown loosened, Will the Red Sox game be cancelled?
• Memento from April 21, 2013 2:24: Suspect Found, Officer Collier Lost Life, Obama speaks
6. The Internet Archive created Archive-It so organizations could create their own web archive collections
Curators supply live web resources as seeds and establish crawling schedules for those seeds, creating mementos of the seeds at different points in time.
7. But this is the interface available for browsing those collections…
How do we tell the difference without going through them all?
What types of collections exist?
9. We Can Understand It Based On Metadata
Collection-wide metadata and metadata on individual seeds: Dublin Core + custom fields
10. We Can Understand It Based On Metadata, but the Metadata Does Not Always Help…
Example: 132,599 seeds with no metadata; 9 seeds with metadata
Because metadata is optional, it is not always present.
12. We Can Understand It Based On Metadata, but the Metadata Does Not Always Help…
Because metadata is optional, it is not always present.
When it is present, metadata on Archive-It collections is:
• generated by many different curators
• from different organizations
• with different content standards
• and different rules of interpretation
It is inconsistently applied!
This means that a user cannot reliably compare metadata fields to understand the differences between collections.
13. We Can Understand It Based on Content
We can use techniques such as text mining and network analysis.
Figure: the same collection in the Archives Unleashed Cloud (https://archivesunleashed.org)
17. We Can Understand It Based on Content, but All of That Content Must Be Dereferenced…
There are 486,227 seed mementos that must be downloaded and processed to understand this collection.
Remember:
• Each result is a seed
• Each seed has multiple mementos
These 333 seeds correspond to 278,306 seed mementos. They must be downloaded and processed.
18. And what if we do not know the language?
Figure: ??? | About University of Utah (English) | What non-German speakers can discern: about shootings in Tucson
20. What kinds of questions can be answered with structural features?
Using only structural features is advantageous because it saves one from having to dereference all of the URIs in a collection.
These structural features also give us insights different from those provided by text analysis or metadata.
Example collection: 81,014 seeds; 486,227 seed mementos
21. Does most of the collection exist earlier or later in its life?
This collection was created in March 2010.
Most of its mementos come from 2016 – 2018.
Most of this collection exists later in its life.
22. When did the curator select and archive a collection’s contents?
This collection was created in March 2006.
Some of the seeds were selected in 2006.
Many of the seeds were selected throughout its life.
It has mementos as recent as July 2018.
23. Did the curator create a collection intended to archive new versions of the same web pages repeatedly?
This collection was created in June 2014.
The seeds were selected at the beginning of its life.
Mementos were captured throughout its life.
24. Was the collection built from web sites belonging to one domain or many?
Figure panels: many domains; one domain
25. Were most of the web pages in the collection top-level pages or specific articles deeper in a web site?
Figure panels: top-level pages; deeper links
26. Other questions answered by structural features:
• Was there renewed interest at some point later in the collection’s life?
• Did the curator nurture the selected web pages throughout the collection’s life and add content continuously?
• What time period does the collection span?
• What is the temporal skew of the collection?
• What is the lifetime of the collection?
27. Can we bridge the structural to the descriptive?
We can categorize Archive-It’s collections into four main semantic categories.
We can predict these categories with a Random Forest classifier trained on structural features.
29. Looking at Archive-It collections from the outside
• Curators select seeds, which are captured as seed mementos
• Deep mementos are created from other pages linked to seeds
• In this work, we focus on seeds and seed mementos
30. TimeMaps from the Memento Protocol
<http://a.example.org>;rel="original",
<http://arxiv.example.net/timemap/http://a.example.org>; rel="self";
type="application/link-format"
; from="Tue, 20 Jun 2000 18:02:59 GMT"
; until="Wed, 21 Jun 2000 04:41:56 GMT",
<http://arxiv.example.net/timegate/http://a.example.org>; rel="timegate",
<http://arxiv.example.net/web/20000620180259/http://a.example.org>; rel="first memento";
datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20091027204954/http://a.example.org>; rel="last memento";
datetime="Tue, 27 Oct 2009 20:49:54 GMT",
<http://arxiv.example.net/web/20000621011731/http://a.example.org>; rel="memento";
datetime="Wed, 21 Jun 2000 01:17:31 GMT",
<http://arxiv.example.net/web/20000621044156/http://a.example.org>; rel="memento";
datetime="Wed, 21 Jun 2000 04:41:56 GMT"
…
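Each seed has a corresponding TimeMap listing all of that seed’s mementos along with their capture times, known as memento-datetimes.
Callouts: the rel="original" entry gives the original resource URI; the rel="self" entry gives the TimeMap URI (URI-T); the remaining entries are entries for mementos, each giving a Memento URI (URI-M) and its memento-datetime.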
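A minimal Python sketch of reading a TimeMap like the one above into (URI-M, memento-datetime) pairs. It assumes well-formed link-format and that no quoted parameter value contains "<"; the helper name `parse_timemap` and its regular expressions are hypothetical, and a production tool would use a full link-format parser. Entries whose rel includes "memento" (so also "first memento" and "last memento") are kept.

```python
import re
from email.utils import parsedate_to_datetime

# One link-format entry: "<URI>" followed by its ";"-separated parameters.
# Simplifying assumption: no quoted parameter value contains "<".
ENTRY_RE = re.compile(r"<([^>]+)>([^<]*)")
# A single parameter such as rel="first memento" or datetime="...".
PARAM_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_timemap(text):
    """Return (URI-R, [(URI-M, memento-datetime), ...]) sorted by datetime."""
    original = None
    mementos = []
    for uri, params_text in ENTRY_RE.findall(text):
        params = dict(PARAM_RE.findall(params_text))
        rels = params.get("rel", "").split()  # rel values are space-separated
        if "original" in rels:
            original = uri
        if "memento" in rels:  # also matches "first memento" and "last memento"
            mementos.append((uri, parsedate_to_datetime(params["datetime"])))
    mementos.sort(key=lambda entry: entry[1])
    return original, mementos


# Abridged copy of the example TimeMap on this slide.
SAMPLE = """<http://a.example.org>;rel="original",
<http://arxiv.example.net/timemap/http://a.example.org>; rel="self";
type="application/link-format"
; from="Tue, 20 Jun 2000 18:02:59 GMT"
; until="Wed, 21 Jun 2000 04:41:56 GMT",
<http://arxiv.example.net/timegate/http://a.example.org>; rel="timegate",
<http://arxiv.example.net/web/20000620180259/http://a.example.org>; rel="first memento";
datetime="Tue, 20 Jun 2000 18:02:59 GMT",
<http://arxiv.example.net/web/20091027204954/http://a.example.org>; rel="last memento";
datetime="Tue, 27 Oct 2009 20:49:54 GMT",
<http://arxiv.example.net/web/20000621011731/http://a.example.org>; rel="memento";
datetime="Wed, 21 Jun 2000 01:17:31 GMT"
"""

uri_r, mementos = parse_timemap(SAMPLE)
for uri_m, mdt in mementos:
    print(mdt.isoformat(), uri_m)
```

Sorting by memento-datetime matters because TimeMaps (like this one) do not necessarily list mementos in chronological order.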
38. Related Work
Prior work: Nwala (2018), Mull (2014), Wang (2016), Ogden (2017), Fenlon (2017), Milligan (2016), Crook (2009), Slania (2013), Deutch (2016), Niu (2012)
Topics covered: features of digital collections; selecting seeds for web archive collections; motivations for creating collections; behavior of web archivists; studies of using Archive-It; capabilities of web archive user interfaces
Our work:
• We focus on web archive collections
• We examine web archive collections after they have been created
• We look to structural features of web archives rather than user studies of live web curation platforms
• We focus on the output of web archivists rather than studying their behavior in real time
• We focus on structural features rather than challenges with using Archive-It as a tool
• We focus on structural features of the archives rather than their user interfaces
44. Related Work
Prior work: Sağlam (2014), Abramson (2012), AlSum (2014), AlNoamany (2016), AlNoamany (2016)
Topics covered: metadata standards (EAD, Dublin Core); topics in web archive collections; classification of URIs; web archive growth analysis; studies of using Archive-It
Our work:
• We look at the structural features rather than metadata
• We are not looking at the content, but at the structural features of collections
• We examine different features of URIs, like domain and path depth
• We apply AlSum’s methods to specific collections rather than entire archives
• We look at collections as units rather than analyzing Archive-It as a whole
46. Acquiring 9,351 Archive-It collections
We used BeautifulSoup to scrape the web pages of 9,351 Archive-It collections.
From this scraping we discovered:
• Whether the collection was public or private
• Seed URIs
Using the seed URIs, we discovered TimeMaps listing all seed mementos and their memento-datetimes.
51. Remove 357 collections with a single memento
Singletons consist of a single seed with a single memento, offering no behavior to study.
52. Remove 21 instantaneous collections
Instantaneous collections were captured within a single second, offering no behavior over time to study.
53. Remove 32 test collections
Collections clearly marked as test or trial do not represent regular collection behavior.
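The three removal rules above can be sketched in Python as follows. The data model (a dict with a "title" and "timemaps", a mapping from seed URI to memento-datetimes) is hypothetical, and so is the test-collection check: the slides only say such collections were "clearly marked", so this sketch assumes the words "test" or "trial" appear in the title.

```python
from datetime import datetime, timedelta

# Hypothetical data model: a collection is a dict with a "title" and
# "timemaps", a mapping of seed URI -> list of memento-datetimes.


def memento_datetimes(collection):
    """All memento-datetimes across the collection's seed TimeMaps."""
    return [dt for dts in collection["timemaps"].values() for dt in dts]


def is_singleton(collection):
    """A single seed with a single memento."""
    return len(collection["timemaps"]) == 1 and len(memento_datetimes(collection)) == 1


def is_instantaneous(collection):
    """More than one memento, all sharing the same memento-datetime.

    Memento-datetimes have one-second granularity, so equal datetimes
    mean everything was captured within a single second.
    """
    dts = memento_datetimes(collection)
    return len(dts) > 1 and min(dts) == max(dts)


def is_test(collection):
    """Hypothetical heuristic: "test" or "trial" appears as a word in the title."""
    words = collection["title"].lower().split()
    return "test" in words or "trial" in words


def keep(collection):
    return not (is_singleton(collection) or is_instantaneous(collection) or is_test(collection))


t = datetime(2016, 7, 10, 12, 0, 0)
collections = [
    {"title": "Singleton", "timemaps": {"http://a.example/": [t]}},
    {"title": "Instant", "timemaps": {"http://a.example/": [t], "http://b.example/": [t]}},
    {"title": "Trial crawl", "timemaps": {"http://a.example/": [t, t + timedelta(days=1)]}},
    {"title": "Election 2016", "timemaps": {"http://a.example/": [t, t + timedelta(days=1)]}},
]
print([c["title"] for c in collections if keep(c)])  # ['Election 2016']
```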
54. We study the remaining 3,382 collections
This leaves us with 3,382 collections for study, with a total of:
• 700,835 seeds
• 6,943,677 seed mementos
56. Growth curves help us understand collection growth, but require normalization for comparison
We want to compare memento count
• “2014 Primaries” has 219,084 mementos
• “The Obama White House” has 140
• We normalize the number as a percentage
We want to compare seed count
• “Hurricane Sandy” has 174,884 seeds
• “Scottish Politics” has 58 seeds
• We normalize the number as a percentage
We want to compare time
• “Indiana: State and Local Documents” spans 2005 – 2018
• “Japan: Election 2016 House of Councilors” spans less than 2 days in July 2016
• We normalize time as a percentage of the lifespan of the collection, from the first memento-datetime to the last
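A minimal Python sketch of the normalization described above, expressed as fractions in [0, 1] rather than percentages. Following the appendix slide on growth curves, the seed curve uses each seed's first memento and the seed memento curve uses every memento. The input shape (seed URI mapped to memento-datetimes) and all function names are hypothetical.

```python
from datetime import datetime

# Hypothetical input: seed URI -> list of memento-datetimes from its TimeMap.


def normalized_times(datetimes, start, end):
    """Map each datetime to a fraction of the collection lifespan, sorted."""
    span = (end - start).total_seconds()
    # Assumption: a zero-length lifespan places every item at x = 0.
    return sorted((dt - start).total_seconds() / span if span else 0.0 for dt in datetimes)


def growth_curve(xs):
    """Cumulative curve: after the i-th item (of n), y = i / n."""
    n = len(xs)
    return [(x, i / n) for i, x in enumerate(xs, start=1)]


def collection_curves(timemaps):
    all_dts = [dt for dts in timemaps.values() for dt in dts]
    # Lifespan runs from the first memento-datetime to the last.
    start, end = min(all_dts), max(all_dts)
    # Seed curve: when each seed first appears (its first memento).
    seed_xs = normalized_times([min(dts) for dts in timemaps.values()], start, end)
    # Seed memento curve: every memento in every TimeMap.
    memento_xs = normalized_times(all_dts, start, end)
    return growth_curve(seed_xs), growth_curve(memento_xs)


timemaps = {
    "http://a.example/": [datetime(2016, 1, 1), datetime(2016, 1, 6), datetime(2016, 1, 11)],
    "http://b.example/": [datetime(2016, 1, 6)],
}
seed_curve, memento_curve = collection_curves(timemaps)
print(seed_curve)     # [(0.0, 0.5), (0.5, 1.0)]
print(memento_curve)  # [(0.0, 0.25), (0.5, 0.5), (0.5, 0.75), (1.0, 1.0)]
```

Because both axes are fractions, a collection with 58 seeds over two days and one with 174,884 seeds over a decade end up on the same unit square.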
57. Once normalized, we can compare behavior in the seed growth…
• Skew of the curator’s involvement with the collection
• When seeds were added
• When interest was lost or regained
Figure panels: seeds added all up front; seeds added early, but not all up front
58. And, we can compare behavior in the memento growth…
• Built from all mementos in the collection’s TimeMaps
• Skew of the collection’s holdings
• Indicates temporality of collection
Figure panels: mementos crawled all along; mementos crawled later
59. We can classify different behaviors of growth curves
Using two features:
• Area under the seed curve (AUCseed)
• Area under the seed memento curve (AUCsmem)
We can classify a collection’s growth curve into 9 categories:
• If AUC > 0.55, then those points occur early
• If AUC < 0.45, then those points occur late
• If 0.45 < AUC < 0.55, then those points occur continuously
Seeds Early (AUCseed > 0.55), Seeds Continuously (0.45 < AUCseed < 0.55), or Seeds Late (AUCseed < 0.45) combine with Seed Mementos Early (AUCsmem > 0.55), Seed Mementos Continuously (0.45 < AUCsmem < 0.55), or Seed Mementos Late (AUCsmem < 0.45), giving 3 × 3 = 9 categories.
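A minimal Python sketch of the 9-way classification above. The slides do not say how the AUC is computed, so this sketch assumes each curve is a step function (the cumulative count stays flat between captures); under that assumption the AUC reduces to the mean of (1 − x) over the normalized capture times x. Function names are hypothetical, and values of exactly 0.45 or 0.55, which the thresholds leave open, are treated as continuous.

```python
def step_auc(xs):
    """Area under a normalized cumulative step curve.

    xs are normalized times in [0, 1]. Each of the n items raises the curve
    by 1/n from its time x until x = 1, contributing (1 - x) / n of area,
    so the total is the mean of (1 - x).
    """
    return sum(1.0 - x for x in xs) / len(xs)


def timing(auc):
    """Thresholds from the slide. Values of exactly 0.45 or 0.55 are not
    covered there; this sketch assumes they count as continuous."""
    if auc > 0.55:
        return "Early"
    if auc < 0.45:
        return "Late"
    return "Continuously"


def growth_category(seed_xs, memento_xs):
    return f"Seeds {timing(step_auc(seed_xs))}, Seed Mementos {timing(step_auc(memento_xs))}"


# Seeds mostly chosen up front; mementos spread across the lifespan.
print(growth_category([0.0, 0.0, 0.1], [0.2, 0.5, 0.8]))
# Seeds Early, Seed Mementos Continuously
```

As a sanity check, items spread evenly from 0 to 1 give an AUC of 0.5, the area under the diagonal.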
60. Seeds Early
The curators added most of the seeds at the beginning of the collection’s life and then crawled them on different schedules.
63. From These Growth Curves we have some simple Structural Features
• Number of Seeds
• Number of Seed Mementos
• Collection Lifespan: time between the first and last memento
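The three simple features above can be computed directly from the seed TimeMaps. This Python sketch assumes a hypothetical input mapping each seed URI to its memento-datetimes and reports the lifespan in days.

```python
from datetime import datetime

# Hypothetical input: seed URI -> list of memento-datetimes from its TimeMap.


def simple_features(timemaps):
    all_dts = [dt for dts in timemaps.values() for dt in dts]
    return {
        "number_of_seeds": len(timemaps),
        "number_of_seed_mementos": len(all_dts),
        # Collection lifespan: time between the first and last memento.
        "lifespan_days": (max(all_dts) - min(all_dts)).total_seconds() / 86400,
    }


timemaps = {
    "http://a.example/": [datetime(2016, 1, 1), datetime(2016, 1, 2)],
    "http://b.example/": [datetime(2016, 1, 3)],
}
print(simple_features(timemaps))
# {'number_of_seeds': 2, 'number_of_seed_mementos': 3, 'lifespan_days': 2.0}
```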
64. We also have complex Growth Curve Features: Difference of Seed Curve AUC and Diagonal
Subtracting the AUC of the diagonal from the AUC of the seed curve:
• We can more easily see if the seed curve is early or late
• Early is positive
• Late is negative
• “Close” to 0 means continuous
65. More complex Growth Curve Features: Difference of Seed Memento Curve AUC and Diagonal
Subtracting the AUC of the diagonal from the AUC of the seed memento curve:
• We can more easily see if the seed memento curve is early or late
• Early is positive
• Late is negative
• “Close” to 0 means continuous
66. More complex Growth Curve Features: Difference of Seed Curve AUC and Seed Memento Curve AUC
The difference between the seed curve AUC and the seed memento curve AUC indicates how close the two curves are.
A value of 0 means that the two overlap, likely meaning that there is one memento per seed.
A positive value means that the seeds are added earlier than the seed mementos.
A negative value means that the seed memento growth has overtaken the seed growth.
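The three difference features can be sketched in Python as follows. The diagonal y = x has an area of 0.5 on the unit square. As with the classification sketch, the AUC here assumes a step-function curve (mean of 1 − x over normalized capture times); that choice and the function names are assumptions, not necessarily the authors' implementation.

```python
DIAGONAL_AUC = 0.5  # area under the diagonal y = x on [0, 1]


def step_auc(xs):
    """AUC of a normalized cumulative step curve: the mean of (1 - x)."""
    return sum(1.0 - x for x in xs) / len(xs)


def difference_features(seed_xs, memento_xs):
    """seed_xs / memento_xs: normalized times (0 = first memento, 1 = last)."""
    auc_seed = step_auc(seed_xs)
    auc_smem = step_auc(memento_xs)
    return {
        # Positive: early; negative: late; close to 0: continuous.
        "seed_minus_diagonal": auc_seed - DIAGONAL_AUC,
        "smem_minus_diagonal": auc_smem - DIAGONAL_AUC,
        # Positive: seeds were added earlier than the seed mementos.
        "seed_minus_smem": auc_seed - auc_smem,
    }


# Two seeds, each captured once at the start and once at the end.
print(difference_features([0.0, 0.0], [0.0, 0.0, 1.0, 1.0]))
# {'seed_minus_diagonal': 0.5, 'smem_minus_diagonal': 0.0, 'seed_minus_smem': 0.5}
```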
68. Seed URI domain diversity
Alexander Nwala. (2018 May) An Exploration of URL Diversity Measures. Web Science and Digital Libraries Research Group Blog.
http://ws-dl.blogspot.com/2018/05/2018-05-04-exploration-of-url-diversity.html
Domain diversity: 0
(duplicate cnn.com hosts)
http://www.cnn.com/path/to/story0
http://news.cnn.com/path/to/story1
http://top.cnn.com/path/to/story2
Domain diversity: 1
(no duplicate domains)
http://www.cnn.com/path/to/story0
http://www.vox.com/path/to/story
http://www.foxnews.com/path/to/story
Domain diversity: 0.5
(1 duplicate cnn.com host)
http://www.cnn.com/path/to/story0
http://www.cnn.com/path/to/story1
http://www.vox.com/path/to/story
U = # of unique domains
C = number of seeds
D = diversity
D’ = normalized diversity
* Now known as the WSDL Diversity Index
Observation: Some collections only archive a single domain while others have more variety.
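The formula itself did not survive extraction from the slide, but the normalized diversity D′ = (U − 1) / (C − 1) reproduces all three worked examples above; treat it as an inference from those examples rather than a quote of Nwala's definition. In this Python sketch, `registered_domain` is a hypothetical simplification that keeps the last two hostname labels, so www.cnn.com and news.cnn.com both count as cnn.com; suffixes such as .co.uk would need the Public Suffix List instead.

```python
from urllib.parse import urlparse


def registered_domain(uri):
    """Simplification (assumption): keep the last two hostname labels."""
    host = urlparse(uri).hostname or ""
    return ".".join(host.split(".")[-2:])


def wsdl_diversity(items):
    """Normalized diversity D' = (U - 1) / (C - 1), inferred from the examples."""
    c = len(items)           # C: number of seeds
    u = len(set(items))      # U: number of unique values
    # Assumption: a single seed has no diversity to measure.
    return (u - 1) / (c - 1) if c > 1 else 0.0


examples = [
    ["http://www.cnn.com/path/to/story0", "http://news.cnn.com/path/to/story1",
     "http://top.cnn.com/path/to/story2"],
    ["http://www.cnn.com/path/to/story0", "http://www.vox.com/path/to/story",
     "http://www.foxnews.com/path/to/story"],
    ["http://www.cnn.com/path/to/story0", "http://www.cnn.com/path/to/story1",
     "http://www.vox.com/path/to/story"],
]
for seeds in examples:
    print(wsdl_diversity([registered_domain(s) for s in seeds]))  # 0.0, 1.0, 0.5
```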
69. Path Depth
Path depth measures how many segments exist in a URI’s path.
Based on McCown’s work, we also add 1 for any path containing a query string:
Example URI | Path Depth
http://example.com/ | 0
http://example.com/directory | 1
http://example.com/dir1/dir2/dir3/dir4 | 4
http://example.com/dir1/file2?key1=val1&key2=val2&key3=val3 | 3
Observation: Top-level pages tend to have more general information, whereas deeper pages tend to have a more specific focus.
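A minimal Python sketch of the path depth rule, checked against the example table. It assumes empty segments (from a trailing slash or a doubled slash) are not counted, which matches "http://example.com/" having depth 0; the function name is hypothetical.

```python
from urllib.parse import urlparse


def path_depth(uri):
    parsed = urlparse(uri)
    # Count non-empty path segments: "/" -> 0, "/directory" -> 1.
    depth = len([segment for segment in parsed.path.split("/") if segment])
    # Following the slide (after McCown), a query string adds one level.
    if parsed.query:
        depth += 1
    return depth


for uri in [
    "http://example.com/",
    "http://example.com/directory",
    "http://example.com/dir1/dir2/dir3/dir4",
    "http://example.com/dir1/file2?key1=val1&key2=val2&key3=val3",
]:
    print(path_depth(uri), uri)  # depths 0, 1, 4, 3
```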
70. Seed URI Path Depth Diversity
Path depth diversity: 0
(All path depths are 3)
http://www.cnn.com/path/to/story0
http://news.vox.com/path/to/story1
http://top.cnn.com/path/to/story2
Path depth diversity: 1
(all completely different path depths)
http://www.cnn.com/
http://news.vox.com/path/
http://top.cnn.com/path/to/story
Path depth diversity: 0.5
(1 path depth of 0, 2 with depth of 3)
http://www.cnn.com/
http://news.vox.com/path/to/story1
http://top.cnn.com/path/to/story2
Observation: Some collections only have seeds at the top level where others only link to deeper articles.
We reuse the WSDL Diversity Index, but this time apply it to path depth.
71. Other Seed Features
Most Frequent Path Depth
The path depth that appears most often in the seed URIs
Observation: For some collections, most seeds exist at the top level, while others link to deeper articles.
% Query String Usage
The percentage of seed URIs that contain a query string
Observation: Some collections have many URIs with query strings, while others have none.
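The seed path features (path depth diversity, most frequent path depth, and % query string usage) can be sketched together in Python. Path depth diversity reuses the same normalized diversity (U − 1) / (C − 1), which is inferred from the worked examples rather than quoted from the source. Ties for the most frequent depth go to the depth seen first, which is how `Counter.most_common` orders equal counts; that tie-breaking is an assumption the slides do not address.

```python
from collections import Counter
from urllib.parse import urlparse


def path_depth(uri):
    """Non-empty path segments, plus one if there is a query string."""
    parsed = urlparse(uri)
    depth = len([segment for segment in parsed.path.split("/") if segment])
    return depth + 1 if parsed.query else depth


def seed_path_features(seed_uris):
    depths = [path_depth(uri) for uri in seed_uris]
    c, u = len(depths), len(set(depths))
    with_query = sum(1 for uri in seed_uris if urlparse(uri).query)
    return {
        # WSDL Diversity Index applied to path depths (same assumed formula).
        "path_depth_diversity": (u - 1) / (c - 1) if c > 1 else 0.0,
        # Ties go to the depth seen first (Counter.most_common ordering).
        "most_frequent_path_depth": Counter(depths).most_common(1)[0][0],
        "pct_query_string": 100.0 * with_query / c,
    }


seeds = [
    "http://www.cnn.com/",
    "http://news.vox.com/path/to/story1",
    "http://top.cnn.com/path/to/story2",
]
print(seed_path_features(seeds))
# {'path_depth_diversity': 0.5, 'most_frequent_path_depth': 3, 'pct_query_string': 0.0}
```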
73. At first, we tried to map the structural features to metadata directly…
We tried using machine learning to predict the topics found in the metadata of a collection.
There are problems with this approach:
• Not all collections have topics.
• Many collections have multiple topics.
• Many collections have user-supplied topics.
74. Instead, we established semantic categories of Archive-It collections
We reviewed the descriptions of 3,382 Archive-It collections.
Based on their metadata and seeds, we placed them into 4 semantic categories.
76. We can predict the semantic category with structural features, without processing the page content
Figure panels: Random Forest results by semantic category; results for different machine learning algorithms
We found that a Random Forest classifier was best able to predict the semantic category using a collection’s structural features.
The Random Forest classifier works best with collections in the Self-Archiving category.
77. We optimized our prediction
Using Kendall Tau, we were able to determine which features had a strong correlation with the semantic category.
Removing the “number of mementos” feature improved F1 scores for all categories except Self-Archiving.
Figure panels: original; with feature removed
79. Future Work
• We will adapt these structural features for our collection summarization work
• The skew of growth curves may affect which mementos are chosen for review
• The seed analysis features will help us better choose which seeds to include
• We can incorporate this classifier to tailor summarization algorithms to specific semantic categories
• We intend to work further with Archive-It to make metadata and other data more accessible so that screen-scraping is not necessary
81. We adapted Growth Curves for collections
We can normalize & visualize curator engagement with the collection.
82. We introduced Seed Features
Seed features also help us understand the curation strategy of a collection:
• Are most of the seeds from the same domain?
• Are most of the seeds top-level pages or deeper pages?
84. We can understand web archive collections using only structural features
Metadata scraping code available: https://github.com/oduwsdl/archiveit_utilities
86. Growth curves allow us to understand collection curation behavior
Seed memento curve:
• Built from all mementos in the collection’s TimeMaps
• Skew of the collection’s holdings
• Indicates temporality of collection
Seed curve:
• Built from the first memento for each seed in the collection’s TimeMaps
• Skew of the curatorial involvement with the collection
• When seeds were added
• When interest was lost or regained
87. Seeds Early, Seed Mementos Early
Most curatorial decisions were made early in this collection’s life.
Most crawling was done early in its life.
The temporality of these collections skews early.
AUCseed > 0.55
AUCsmem > 0.55
88. Seeds Early, Seed Mementos Continuously
Most curatorial decisions were made early in this collection’s life.
Seed mementos were added continuously.
The temporality of these collections spreads throughout their lives.
AUCseed > 0.55
0.55 > AUCsmem > 0.45
89. Seeds Early, Seed Mementos Late
Seed mementos were added later.
The temporality of these collections skews more recent.
Most curatorial decisions were made early in this collection’s life.
AUCseed > 0.55
AUCsmem < 0.45
90. Seeds Continuously, Seed Mementos Early
0.55 > AUCseed > 0.45
AUCsmem > 0.55
Seeds are added throughout a collection’s life.
Seed mementos were added earlier.
This means that most of the content of the collection comes from earlier in its life.
91. Seeds Continuously, Seed Mementos Continuously
0.55 > AUCseed > 0.45
0.55 > AUCsmem > 0.45
Seeds are added throughout, and their seed mementos are collected continuously.
These collections have a lot of curatorial involvement throughout their life.
Their contents are spread throughout their life.
92. Seeds Continuously, Seed Mementos Late
0.55 > AUCseed > 0.45
AUCsmem < 0.45
Seeds are added throughout, but the collection is built from mementos that were collected later.
93. Seeds Late, Seed Mementos Early
AUCseed < 0.45
AUCsmem > 0.55
Most curatorial decisions were made later in this collection’s life, but most of the mementos were added earlier.
The temporality of the collection skews earlier.
Most of the mementos belong to the early seeds.
94. Seeds Late, Seed Mementos Continuously
AUCseed < 0.45
0.55 > AUCsmem > 0.45
The collection’s contents are spread throughout its life, but many seeds were added later.
This means that some of the early seeds have more mementos.
95. Seeds Late, Seed Mementos Late
AUCseed < 0.45
AUCsmem < 0.45
In these cases, the collection appears to have experienced a “resurgence in interest” later in its life.