2. RLG Programs Assessing Uniqueness in the System-wide Book Collection
RLG Webinar – 24 April 2008
Why investigate unique print books?
Future of library print collections is in question
We need better “management intelligence” about where
continued investment in print collections – both legacy
holdings and future acquisitions – should be directed
Uniquely-held content may be an asset or liability
Institutional assets that may be leveraged through
digitization and resource-sharing agreements
Potential preservation risks, if the content is not
adequately cared for
Size, character and distribution of aggregate
collection has broad implications
Digitization – identifying distinctive collections
Disclosure – maximizing discoverability
Distributed print archiving – sizing the need
Who’s Involved:
OCLC Programs & Research
Constance Malpas, Program Officer
Ed O’Neill, Senior Research Scientist
Brian Lavoie, Research Scientist
RLG Partners
Arizona State University
Columbia University
Duke University
Florida State University
Harvard University
Indiana University
Library of Congress
New York Public Library
New York University
University of Alberta
University of Arizona
University of California, Berkeley
University of California, Los Angeles
University of Chicago
University of Michigan
University of Minnesota
University of Pennsylvania
University of Texas, Austin
Yale University…
among others
Unique vs. rare: a distinction with a difference
“Unique” = single holding attached to master
record in WorldCat describing a distinct
manifestation / edition
some uniquely held titles may be associated with
multiple local copies
“Rare” typically describes material that is in
limited supply and has special value to particular
audience
Few copies were produced
Few remaining copies available on the market
Distinctive intellectual content or artifactual features
(binding, signatures)
Growth of Unique Holdings in WorldCat
[Chart: proportion of master records with a single holding, by date of snapshot]
Jan-03: 42%
Jan-05: 44%
Jan-07: 49%
Jan-08: 50%
Proportion of master records with a single holding
has increased 8 percentage points since 2003
Background
Anatomy of Aggregate Collections (2005)
Thin duplication of book holdings across “Google Five” libraries
(~40%) and between aggregate collection and rest of
WorldCat (~30%)
Proportion of uniquely held titles decreases as publication date
advances – until 1980s
Books without Boundaries (2006)
9.5M uniquely held works representing 36% of works in
WorldCat; preservation implications
Unique titles in WorldCat represent ~2/3 of total print
production; significant collection gap
Last Copies: What’s at Risk? (2006)
“last expressions” – a conceptual model
26K unique titles at Vanderbilt; typically “old, foreign, short”
Global Resources Report (2007)
Limited redundancy in ARL holdings of non-North American
imprints (~3 to ~6 holdings per title)
Importance of FRBR
Measuring duplication at the “work” or expression
level provides maximum measure of overlap for
intellectual content
Uniquely-held manifestations may represent
artifactual treasures
Book history – bindings, printers
Provenance – autographs, annotations
Implications for collection management
Unique works represent distinctive intellectual assets
Unique manifestations may require curatorial care
FRBR: Group One Entities
Work: A distinct intellectual or artistic creation
is realized through
Expression: The intellectual or artistic realization of a work
is embodied in
Manifestation: The physical embodiment of an expression
is exemplified by
Item: A single exemplar of a manifestation
Goals of current last copies work
Evaluate relative proportion of unique works in a
representative and statistically significant sample
Application of FRBR
Characterize material and content types
“old, foreign, short”
Examine distribution of holdings by library-type
preservation infrastructure
Assess preservation status and circulation history
of selected titles
In 1995 study of titles published 1850-1940, 12% were
not available for study – missing, not on shelf
Sample Characteristics
Fractional sample of 250 records representing:
January 2007 snapshot of WorldCat
74.5M bibliographic records
Master records with a single holding symbol
36.8M records
Monographic language-based titles, excluding non-print
formats (electronic resources, microforms, braille)
14.7M records
Further limits were applied to facilitate analysis:
English-language cataloging only
Common descriptive standards
Titles published before Y2000
Avoid ‘first copy’ (cataloging lag) problem
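The record counts on this slide imply a few proportions that are worth making explicit. The following is a minimal Python sketch of that arithmetic; the constant and function names are illustrative, and only the three record counts come from the slide. The size of the final sampling frame after the further limits (English-language cataloging, pre-2000 titles) is not stated, so it is not computed here.

```python
# Record counts as stated on the slide (January 2007 WorldCat snapshot).
TOTAL_RECORDS = 74_500_000            # all bibliographic records
SINGLE_HOLDING_RECORDS = 36_800_000   # master records with a single holding symbol
PRINT_MONOGRAPH_RECORDS = 14_700_000  # single-holding print monographs


def share(part: int, whole: int) -> float:
    """Return part as a fraction of whole."""
    return part / whole


# Share of all records that carry a single holding symbol.
single_share = share(SINGLE_HOLDING_RECORDS, TOTAL_RECORDS)

# Share of single-holding records that are print monographs.
print_share_of_single = share(PRINT_MONOGRAPH_RECORDS, SINGLE_HOLDING_RECORDS)

# Share of all records that are single-holding print monographs.
print_share_of_total = share(PRINT_MONOGRAPH_RECORDS, TOTAL_RECORDS)

print(f"Single-holding share of WorldCat: {single_share:.1%}")
print(f"Print monographs among single-holding records: {print_share_of_single:.1%}")
print(f"Single-holding print monographs among all records: {print_share_of_total:.1%}")
```

On these figures, roughly half of all records have a single holding, and about two in five of those describe print monographs.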
Research Methods
Independent assessment followed by team review
Combination of machine- and manual analysis
Connexion, FirstSearch, MARCView
Level of uniqueness
work: content is not duplicated within WorldCat
expression: distinctive expression of duplicated content
manifestation: alternate editions available in WorldCat
analytic: content is part of a larger published work
duplicate record found: cataloging anomalies
Material / content types
Non-fiction books; technical reports; language /
literature; archival materials; ephemera
Theses and dissertations (baccalaureate, masters, PhD)
Government documents (national, state, local)
Levels of Uniqueness within Sample
[Chart: share of sample by level of uniqueness; categories: non-unique, unique analytics, unique manifestations, unique expressions, unique works]
N = 250 records
>60% of titles in sample represent unique intellectual content
Chart annotation: cataloging shortfalls
Content and Material Types
Non-fiction published books: 33%
Theses and dissertations: 20%
Technical reports: 15%
Serials: 10%
Literature, poetry: 7%
Archival materials: 3%
Other (ephemera, catalogs, manuals, directories, etc.): 12%
N = 250 records
Academic and technical content predominates . . .
Range of Unique Works by Material Type
Material types representing >5% of titles in sample
“grey literature” contains
greatest proportion of
unique intellectual content
more
manifestations
Theses and Dissertations
[Chart: number of theses and dissertations by degree level (Masters, Doctoral, Baccalaureate); series: total in sample, unique works, held by issuing institution]
N = 49 records
75% are unique works
Language of Publication
Non-English publications account for
<40% of uniquely held books in sample
vs. ~75% of uniquely held books in
Vanderbilt study
N = 250 records
Place of Publication
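[Chart: uniquely held print books] US imprint: 32%; Non-US imprint: 68%
A majority of uniquely held print books were published
outside the United States
Chart annotation: 5% more than print books with multiple holdings
[Chart: print books with multiple holdings] segments of 63% and 37%; legend: US, Non-US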
Subject Access
[Chart: share of titles with and without subject cataloging, unique works vs. multiple holdings]
No subject cataloging: 19% of unique works; 9% of titles with multiple holdings
~20% of unique print books lack subject cataloging
NB: unique works do not benefit from FRBR-enhanced discoverability; no related manifestations
Sample Holdings by Institution Type
54% of sample
23% of sample
Academic and
research libraries
hold the greatest
share of unique
print books
N = 250 records
Non-ARL academic libraries
have the greatest number of
aggregate holdings in
WorldCat – but are less
likely than ARL institutions
to hold unique titles
Age Distribution of Unique Titles
N = 250 records
>70% of titles in sample
were produced after 1950
Relative proportion of
unique works increases in
post-WWII period
increased print production?
rise of scientific and technical
enterprise?
increased library collecting activity?
[Chart axes: date of publication (x); percentage of titles (records) in sample (y)]
Characterizing Unique Works
Foreign, but accessible
Limited discoverability
Challenging inventory control
In Sum . . .
Uniquely-held print books containing unique intellectual content are typically:
Non-US imprints
English language titles
Produced after 1950
Technical, non-fiction content
Sparsely described
Short (~100 pages in length)
Held by academic and research libraries
Preservation and circulation status
Surveyed 27 RLG partners regarding shelf status,
condition and circulation history of selected titles
from ‘only copy’ sample
Responses (to date) from:
Columbia University
Harvard University
Indiana University
New York Public Library
University of Alberta
University of Arizona
University of California, Los Angeles
University of Chicago
University of Minnesota, Twin Cities
University of Pennsylvania
University of Texas, Austin
Subset representative of larger sample:
~70% unique works / expressions
Survey Results (to date)
Inventory control and item condition
100% of requested titles were available for examination
Multiple copies held for 3 titles in sample, all theses
None had significant condition problems
Location and status
50% housed in off-site shelving facility
Mostly transferred in the 1990s
50% non-circulating (local or off-site)
Some availability via SHARES
Use (value, discoverability?)
None requested or circulated in past 5 years
Limited usage data for non-circulating collections
Implications
Preservation
~50% of uniquely held works are potentially at risk in on-site, circulating collections
Limited discoverability and low use of these titles
diminish relative risk
Recent publications less likely to have inherent condition
problems
Access
Preponderance of recent publications, and non-North
American imprints, is likely to limit potential impact of mass
digitization
Inter-institutional access and borrowing programs (e.g.
SHARES) will test the limits of cooperative collection
management
Effective disclosure (holdings, condition, policies) may
require additional investment
Opportunities for Joint Action
Cooperative access agreements
Increase the mobility of scarcely-held content; empower resource-sharing
networks to lend and borrow unique holdings
Distributed print archiving
Leverage existing on- and off-site storage infrastructure as
network resource
Shared digitization infrastructure
Reposition off-site repositories as digital delivery hubs
Continue to build new uniqueness into system-wide
holdings…strategically
Local collection development priorities will be trumped by
economic realities; plan accordingly.
Short, foreign …
and competing for attention
Questions, Comments?
OCLC Programs & Research Agenda
Managing the Collective Collection
Constance Malpas
malpasc@oclc.org
Editor's Notes
The purpose of today’s session is to share preliminary results of some recent research on unique print book titles, acknowledge the contributions that RLG partners have made to this effort, and offer attendees an opportunity to help shape the final report for this project.
Most rare book (RB) collections consist of early printed works: volumes printed before 1850 in the Americas, and before 1775 in Europe and the other continents.
WorldCat (excluding article-level metadata) has nearly doubled in size over the last 5 years. But while global coverage has increased significantly, the total proportion of unique holdings in the database has continued to grow.
I want to start by establishing some context for our current work on unique titles. Last expressions: “the only known manifestation of specific intellectual or artistic content”. Measuring duplication at the title or manifestation level is inadequate; we must consider the relative uniqueness of content.
Unique can be defined in absolute terms; “rare” is relative to a particular set of curatorial interests. In the mid ’80s, Ross Atkinson proposed a “materialistic” typology of preservation priorities, each with a distinctive kind of value: Class 1 materials represented rare books and manuscripts with high economic and research value; Class 2 materials represented heavily-used content that was at risk of physical deterioration; Class 3 materials represented infrequently used content that had enduring scholarly value but little economic value
Work: A distinct intellectual or artistic creation. Modifications involving a significant degree of independent intellectual effort, such as paraphrases, rewritings, adaptations for children, parodies, abstracts, digests, and summaries, are considered to be different works.
Expression: The intellectual or artistic realization of a work. The boundaries of an expression are defined to exclude aspects of physical form (typeface, page layout, etc.). Revisions, updates, abridgements, enlargements, and translations are different expressions of the same work. Any revision or modification, no matter how minor, is considered to be a new expression.
Manifestation: The physical embodiment of an expression of a work. A manifestation represents all the physical objects that bear the same intellectual and physical characteristics. Changes in typeface, size of font, page layout, or change of publisher will result in a new manifestation. New printings are not considered to be a new manifestation unless other significant changes are also made. The same manifestation may have different bindings (hardcover vs. paperback), types of paper (regular or acid-free), or other variations (thumb-indexed) that do not significantly affect the printed image.
Item: A single exemplar of a manifestation. All changes that occur after the manufacturing process (defacement, rebinding, etc.) are considered changes to the item and do not result in a new manifestation.
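The four Group One entities defined above nest naturally, and the difference between a uniquely held manifestation and a unique work can be illustrated with a small data model. The following is a rough Python sketch under stated assumptions: the class names mirror the FRBR entities, but the helper functions, library names, and example titles are hypothetical and are not the project's actual tooling (the study's assessments were made by a combination of machine and manual analysis).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Item:
    holding_library: str  # hypothetical stand-in for a holding symbol


@dataclass
class Manifestation:
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)


def holding_libraries(m: Manifestation) -> Set[str]:
    """Distinct libraries holding at least one item of the manifestation."""
    return {item.holding_library for item in m.items}


def is_uniquely_held(m: Manifestation) -> bool:
    """One holding library, regardless of how many local copies it has."""
    return len(holding_libraries(m)) == 1


def is_unique_work(w: Work) -> bool:
    """Assumed reading: the work exists in only one uniquely held manifestation."""
    all_manifestations = [m for e in w.expressions for m in e.manifestations]
    return len(all_manifestations) == 1 and is_uniquely_held(all_manifestations[0])


# Hypothetical thesis: one manifestation, two copies at the same library.
thesis = Work("Hypothetical master's thesis", [
    Expression("original text", [
        Manifestation("typescript", [Item("Library A"), Item("Library A")]),
    ]),
])

# Hypothetical novel: widely held original plus a uniquely held translation.
novel = Work("Hypothetical novel", [
    Expression("original text", [
        Manifestation("first edition", [Item("Library A"), Item("Library B")]),
    ]),
    Expression("translation", [
        Manifestation("translated edition", [Item("Library C")]),
    ]),
])
translated_edition = novel.expressions[1].manifestations[0]
```

In this sketch the translated edition counts as a uniquely held manifestation, yet the novel is not a unique work because its content is available in another edition; the thesis is unique at the work level even though the holding library has two copies.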
(1) Record type = ‘a’ or ‘t’ (language material, printed or manuscript)
(2) Bibliographic level = ‘m’ (monograph)
(3) Encoding level not = ‘8’ (no CIP)
(4) 245 subfield h not = "microform" or "electronic resource"
(5) 533 subfield a not = "microfilm", "microopaque", "micro opaque", "microfiche", "microprint", "microcard", "microform", "electronic reproduction", "electronic resource", or "braille"
(6) No 856 subfield 3
(7) Published before 2000: (a) for date types e, r, s, t: date1 < 2000; (b) for date types m, q: date2 < 2000
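The selection criteria above can be expressed as a single predicate over a record. The following is a minimal Python sketch, not the project's actual extraction code. It assumes a simplified, hypothetical `Record` structure with the relevant MARC values already pulled out, and it reads the first two criteria as MARC Leader/06 (type of record) ‘a’ or ‘t’ and Leader/07 (bibliographic level) ‘m’. Treating records with other date types or missing dates as out of scope is also an assumption, since the notes do not say how those were handled.

```python
from dataclasses import dataclass
from typing import Optional

# Terms excluded in 245 $h and 533 $a, as listed in the criteria.
GMD_TERMS = ("microform", "electronic resource")
REPRODUCTION_TERMS = (
    "microfilm", "microopaque", "micro opaque", "microfiche", "microprint",
    "microcard", "microform", "electronic reproduction",
    "electronic resource", "braille",
)


@dataclass
class Record:
    """Hypothetical flattened view of one bibliographic record."""
    record_type: str          # Leader/06
    bib_level: str            # Leader/07
    encoding_level: str       # Leader/17
    date_type: str            # 008/06
    date1: Optional[int]      # 008/07-10
    date2: Optional[int]      # 008/11-14
    gmd: str = ""             # 245 $h
    reproduction: str = ""    # 533 $a
    has_856_sub3: bool = False


def published_before_2000(rec: Record) -> bool:
    """Criterion 7: choose date1 or date2 according to the date type."""
    if rec.date_type in {"e", "r", "s", "t"}:
        return rec.date1 is not None and rec.date1 < 2000
    if rec.date_type in {"m", "q"}:
        return rec.date2 is not None and rec.date2 < 2000
    return False  # assumption: other date types are excluded


def in_scope(rec: Record) -> bool:
    """Apply criteria 1 through 7 in order."""
    if rec.record_type not in {"a", "t"}:
        return False
    if rec.bib_level != "m":
        return False
    if rec.encoding_level == "8":  # prepublication (CIP) records
        return False
    if any(term in rec.gmd.lower() for term in GMD_TERMS):
        return False
    if any(term in rec.reproduction.lower() for term in REPRODUCTION_TERMS):
        return False
    if rec.has_856_sub3:
        return False
    return published_before_2000(rec)


# Hypothetical example records.
print_book = Record("a", "m", " ", "s", 1962, None)
cip_record = Record("a", "m", "8", "s", 1995, None)
microfilm_copy = Record("a", "m", " ", "s", 1950, None, reproduction="Microfilm.")
multi_date = Record("a", "m", " ", "m", 1990, 2003)
```

Here only `print_book` passes the filter; the other three fail on encoding level, reproduction note, and closing date respectively.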
Previous study of Vanderbilt’s uniquely held books identified ‘last expressions’ as a class of material deserving careful scrutiny. Our current project confirms that a significant number of unique holdings represent unique intellectual content, i.e. content for which a single expression exists within the aggregate collection of WorldCat libraries.
Both in absolute terms (total number of titles/records in sample)
And in relative terms, with theses/dissertations and technical reports representing the greatest proportion of unique works.
Theses and dissertations account for 20% of the titles in our sample and more than a quarter of titles identified as unique works. Most of the durable uniqueness can be attributed to masters theses, which rarely have more than a single institutional holding in any format. Theses and dissertations are of particular interest, as they represent a source of “locally produced” uniqueness for university libraries.
Books without Boundaries also found ca. 50% non-English titles in sample of uniquely held works. Language distribution for unique works vs. others not substantially different.
Nonetheless, most of the imprints in our sample were published outside of the United States.
This answers the question posed in Books without Boundaries regarding the institutional distribution of unique books, confirming that institutions with a strong preservation mission hold the greatest proportion of such titles. Ranked order of institution types by total holdings in WorldCat: non-ARL academic, ARL, public, special, government, school, state and national.
Similarly, in Vanderbilt study, more than half of the titles in sample were published after 1950 – though the relative proportion of unique titles was highest for earlier period. I.e., as Google Books analysis suggests, duplication of holdings is inversely proportional to age of book – until the 1980s, when holdings become relatively scarce again.
http://www.flickr.com/photos/library_of_congress/2162895505/ Bain News Service, publisher. Greece in N.Y. 4th of July Parade [between 1910 and 1915]