Since the early days of e-resource management, holdings maintenance for electronic resources has been a time-consuming, manual process. While the emergence of electronic resource management systems (ERMS) has improved this process significantly, holdings maintenance tasks remain labor-intensive due to the increased volume of electronic content to manage, as well as issues related to metadata quality. To address many of the problems associated with managing electronic resources, and in recognition of a need for greater accuracy and efficiency, some knowledgebase providers are beginning to offer libraries options to automate holdings maintenance. In 2014, OCLC developed a service to provide automated holdings management for a select group of content providers. Within the WorldCat knowledge base, library-specific holdings for e-book and e-serial collections can be managed without manual intervention by library staff. At the University of Toronto Libraries, we decided to take OCLC's automated holdings management service for a test drive. For three vendor packages, we conducted an ongoing comparison between the library's holdings list and the title listing supplied by the automated service. This presentation will outline the results of this investigation, highlighting the benefits and drawbacks of automated holdings maintenance. The talk will also provide a vision of what the automated holdings management service could look like in the future.
Speaker: Marlene van Ballegooie, Metadata Librarian, University of Toronto
2. Outline
• Flashback to the dawn of NASIG – What
were we thinking about e-resource holdings
management then?
• Current state of ERM
• OCLC’s automated holdings management
services
• The study results
• Benefits/challenges of the service
• A look to the future of e-resource holdings
management
4. The Hits…
“In ten years, the library that we know today
will be augmented by virtual libraries...
Resources that seem to be locally available will
actually be held at remote locations…A library’s
holdings will be defined by access, not by
possession.”
Lucy Seifert Wegner, “The
Research Library and Emerging
Information Technology.” (1992)
5. “Staff will need to change from pointers
and retrievers to organizers and
facilitators. They must accept that the
library must change from a fortress to a
pipeline and realize that the collections
must be dealt with “en masse” rather
than one at a time.”
Kenneth E. Dowlin, “The Neographic Library: A 30-Year
Perspective on Public Libraries.” (1993)
6. “As in-house technical processing recedes into
the afterglow of shared-cataloging nirvana,
catalogers and other technical processing staff
will move toward being managers – rather
than producers – of online records.”
Richard D. Hacken,
“Tomorrow’s research
library: vigor or rigor
mortis?” (1988)
7. “Providing cataloging descriptions for
‘moving targets’ will soon become a
familiar problem.”
Karen L. Horny, “New Turns for a New Century:
Library Services in the Information Age.” (1987)
“Cataloging may not take place entirely within
libraries. Publishers of electronic manuscripts
may have their own staffs provide standardized
bibliographic records with a variety of subject
access points.”
8. And the Misses…
“Few of these new kinds of journals will
come from existing journal publishers, at
least not if the new journals would
compete with existing products.”
“Librarians’ favorite media after print
will continue to be microform…”
Brett Butler, “Scholarly Journals, Electronic Publishing,
and Library Networks: From 1986 to 2000.” (1986)
“Primary research – journals articles, proceedings, reports,
and other published literature – that is the province of
today’s research library does not have a good channel for
distribution of electronic information.”
9. “It would be a mistake, however, to believe that
electronic journals are going to replace present
printed journals, anymore than television
replaced motion pictures … While a few new
electronic journals have appeared, they are being
created at the very margins of scholarship.”
Harold Billings, “Romancing the information flow:
solving the information crisis.” (1991)
10. “If one assumed that the number of electronic
journals would grow to 100 by 1995 and 1,000 by
the year 2000, they will still account for only a
small proportion of the estimated 7,000 to
15,000 scholarly journals in existence. This is not
something … that is going to inundate us anytime
soon.”
Martin J. Dillon cited by Kim McDonald. “Despite
benefits, electronic journals will not replace print,
experts say.” (1991)
12. Proliferation of E-Content in
Libraries
• University of Toronto
Libraries
– $29 million acquisition
budget
– $17.5 million devoted to
electronic resources
(60% of total acquisition
budget)
– Ongoing electronic
subscriptions (serials,
databases, etc.)
$15 million
(86% of e-resource
budget)
• Libraries are making
substantial investments
in electronic resources
13. A Changing Environment
• Several players in providing access to e-resources
– Libraries
– Content providers
– Knowledgebase vendors / Link resolver vendors
– Subscription agents
• More interdependencies than ever…all based on…
14. E-Resource Data Supply Chain
Library activates purchased content in KB
to make content available for discovery
Content provider supplies
knowledgebase provider with
metadata for all electronic
content available for purchase
Library purchases electronic
resources. Content provider
supplies library with title list
of purchased materials.
(hopefully!)
Content
Provider
Knowledgebase
Provider
Library
16. Manual Processing
• Holdings maintenance is a time-consuming,
manual process
• Constant ‘tweaking’ of metadata in ERM
– Serial coverage dates
– Individual title purchases
– Non-standard packages
17. Problematic Metadata
Metadata supplied by
content providers is
often incomplete or
erroneous
• Title changes
• Title transfers
• Ceased titles
18. Time Lags
• Getting content provider metadata
into knowledgebase
• Getting title list from content
provider
• Getting holdings registered in ERM
• The more time goes by, the greater
the chance that a task gets neglected
19. Too Many Intermediaries
Electronic resources exist in remote locations, yet we
rely on people in libraries to pass around information
about their holdings.
Metadata is passed through many hands…
Sometimes, the baton gets dropped…
20. To overcome current shortcomings in ERM, we
need to change the way the data flows.
How should data travel?
…As the crow flies
21. Automated Holdings Management
Content
Provider
Library
Content provider supplies
knowledgebase provider with
metadata for all electronic
content available for purchase
Knowledgebase
Provider
Content provider supplies
knowledgebase provider with
metadata for institution-specific
holdings.
Knowledgebase provider activates
institution-specific holdings in
content packages.
Electronic resources are available
for discovery without library
intervention.
22. Behind the Curtain
• To enable autoload, providers supply
OCLC with the following files:
– Collections File: KBART format file for
each collection/package offered by
the content provider
– If applicable, KBART format file for
PDA e-books
– Collections Description File: Listing of
all collections being transferred
– Holdings Data File: Includes the
institution holdings by collection/title
with customer identifier
– Customer Map: Includes the
provider’s customer identifier and the
corresponding OCLC cataloging
symbol
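How the Holdings Data File and the Customer Map fit together can be sketched in a few lines of Python. This is an illustration only: the column names (`customer_id`, `collection`, `title_id`, `oclc_symbol`), the sample rows, the placeholder symbols and the `holdings_by_symbol` helper are all hypothetical and do not reflect OCLC's actual file layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; real files follow the provider's own layout.
HOLDINGS_DATA = """customer_id,collection,title_id
C001,Journals,T1
C001,Books,T2
C002,Journals,T1
C999,Books,T3
"""

# Hypothetical customer map; LIB_A and LIB_B are placeholder symbols.
CUSTOMER_MAP = """customer_id,oclc_symbol
C001,LIB_A
C002,LIB_B
"""


def holdings_by_symbol(holdings_csv, customer_map_csv):
    """Group (collection, title_id) pairs by library symbol.

    Customer identifiers missing from the map are returned separately
    so they can be reported rather than silently dropped.
    """
    symbol_for = {
        row["customer_id"]: row["oclc_symbol"]
        for row in csv.DictReader(io.StringIO(customer_map_csv))
    }
    holdings = defaultdict(set)
    unmapped = []
    for row in csv.DictReader(io.StringIO(holdings_csv)):
        symbol = symbol_for.get(row["customer_id"])
        if symbol is None:
            unmapped.append(row["customer_id"])
            continue
        holdings[symbol].add((row["collection"], row["title_id"]))
    return dict(holdings), unmapped


holdings, unmapped = holdings_by_symbol(HOLDINGS_DATA, CUSTOMER_MAP)
```

Returning the unmapped identifiers alongside the holdings is a design choice made for this sketch: a customer missing from the map is exactly the kind of silent failure a library would want surfaced.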
25. Research Questions
• How well do automated loads reflect the
library’s purchased electronic content?
• What types of collections are ideal for
automated holdings maintenance?
• How quickly do titles get in the system
using the automated service?
• How is the loaded content organized in
relation to the library’s licensing
agreements?
• Does the service provide adequate
reporting to enable libraries to monitor
their collections?
26. The Study
• Study duration: September 2014 – May 2015
• Signed up for as many automated feeds as possible,
no matter how big or small
• Each time a file was uploaded in WorldCat
knowledge base, a corresponding access report was
retrieved from the content provider site
• Data uploaded to a MySQL database and
manipulated to make it suitable for comparison
• Custom scripting to determine matched and
non-matched titles
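The matching step might look roughly like the sketch below. It is not the study's actual script: an in-memory SQLite database stands in for MySQL, and the table names, column names, `normalize` helper and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database used in the study.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE access_report (identifier TEXT, title TEXT);
    CREATE TABLE kb_upload (identifier TEXT, title TEXT);
""")


def normalize(identifier):
    """Strip hyphens and spaces and uppercase, so ISBN/ISSN variants compare equal."""
    return identifier.replace("-", "").replace(" ", "").upper()


# Hypothetical rows: the provider's access report and the knowledge base upload.
report_rows = [
    ("978-0-00-000000-2", "Title A"),
    ("1234-567x", "Title B"),
    ("978-0-00-000001-9", "Title C"),
]
kb_rows = [
    ("9780000000002", "Title A"),
    ("1234567X", "Title B"),
]

conn.executemany(
    "INSERT INTO access_report VALUES (?, ?)",
    [(normalize(i), t) for i, t in report_rows],
)
conn.executemany(
    "INSERT INTO kb_upload VALUES (?, ?)",
    [(normalize(i), t) for i, t in kb_rows],
)

# Titles in the provider's report with no counterpart in the upload.
non_matched = conn.execute("""
    SELECT r.identifier, r.title
    FROM access_report AS r
    LEFT JOIN kb_upload AS k ON k.identifier = r.identifier
    WHERE k.identifier IS NULL
    ORDER BY r.title
""").fetchall()

# Titles present in both lists.
matched_count = conn.execute("""
    SELECT COUNT(*)
    FROM access_report AS r
    JOIN kb_upload AS k ON k.identifier = r.identifier
""").fetchone()[0]
```

Normalizing identifiers before loading keeps the SQL simple; matching on titles instead would need fuzzier comparison.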
30. ebrary Observations
• Irregular frequency (between Sept 2014 and
May 2015, only 10 uploads)
• Single title orders are often the most anxiously
awaited…monthly load too long to wait
• Majority of missing titles showed up in the
next upload
• KB initially represented a fraction of our
ebrary titles…later additional collections were
added to the knowledgebase
33. MyiLibrary Observations
• Load frequency does not live up to
expectations (between Sept 2014 and May
2015 there were 3 uploads)
• List provided by content provider missing a
large number of purchased titles
(approximately 30,000 titles uploaded; 39,636
titles available on website)
• All MyiLibrary content in one collection. Does
not account for separately licensed content.
34. Postscript to MyiLibrary Story
• After contacting MyiLibrary about the missing
titles, a list was produced containing ALL 39,636
titles we subscribe to on the platform.
…for the MyiLibrary collection to
be updated in the WorldCat
Knowledge Base…
35. EBL Ebook Library
• Service Profile
– Collection in KB: Ebook Library Catalogue
– Frequency: Once a week
– OCLC number coverage: 99.8%
– Available for PDA: Yes
37. EBL Ebook Library Observations
• New content provider for University of
Toronto Libraries
• Perfect results, though sample was extremely
small
• Close to weekly uploads (three loads in a one
month span, though nothing since end of
March)
38. Elsevier ScienceDirect
• Service Profile
– Collections in KB:
• Elsevier ScienceDirect Journals
• ScienceDirect Book Series
• ScienceDirect All Books
– Frequency: Weekly
– OCLC number coverage:
• Elsevier ScienceDirect Journals – 91.6%
• ScienceDirect Book Series – 96.7%
• ScienceDirect All Books – 98.9%
– Available for PDA: No
39. ScienceDirect Access Report
• The ScienceDirect access report includes:
– Subscribed titles
– Complimentary titles
– Free-to-read titles
– Non-Subscribed titles
• Much duplication in report, mainly attributed to
differing access types.
• All categories, except for the non-subscribed
titles, are represented in the data feed to OCLC.
40. Six Publication Types – Three
Collections
• Journal
• Book
• Book Series
• Book Series Volume
• Reference Work
• Handbooks Series
Books
Book Series
Journals
41. ScienceDirect Analysis
A Game of Hide and Seek
• Over the course of the study, some
content was missing or moved
from one collection to another.
– Many book series volumes missing
from collections
– Handbook series moved from book
series collection to serials collection
– E-books were often contained in
more than one package
42. Changing Directions
• Due to difficulties in data
matching through time, a
new approach was needed
• Treat ScienceDirect as a
single collection and
compare distinct URLs
• Led to a more accurate
picture of the uploaded
content
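A comparison of distinct URLs could be sketched as follows. The normalization rules (ignore scheme, query string and fragment; lowercase the host; drop a trailing slash) are assumptions chosen for illustration, as are the helper names and the example.org URLs.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to lowercase host plus path, without a trailing slash.

    Scheme, query string and fragment are ignored. This is an assumed
    rule for the sketch, not necessarily the one used in the study.
    """
    parts = urlsplit(url.strip())
    return parts.netloc.lower() + parts.path.rstrip("/")


def unmatched_urls(report_urls, kb_urls):
    """Return distinct report URLs that have no counterpart in the KB."""
    report = {normalize_url(u) for u in report_urls}
    kb = {normalize_url(u) for u in kb_urls}
    return sorted(report - kb)


# Hypothetical URLs; duplicates in the access report collapse to one entry.
report_urls = [
    "http://www.example.org/book/1",
    "http://www.example.org/book/1/",
    "http://www.example.org/book/2",
]
kb_urls = ["https://WWW.EXAMPLE.ORG/book/1"]

missing = unmatched_urls(report_urls, kb_urls)
```

Working with sets of URLs sidesteps both the duplication in the access report and the movement of titles between collections, since collection membership is no longer part of the comparison.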
44. Elsevier ScienceDirect Results
[Chart: unmatched URLs per upload; axis from 0 to 16,000]
Upload dates: 9/17/2014, 10/25/2014, 11/2/2014, 11/16/2014, 11/30/2014, 12/10/2014, 12/14/2014, 12/22/2014, 1/11/2015, 1/19/2015, 1/25/2015, 2/8/2015, 2/15/2015, 3/2/2015, 3/8/2015, 3/16/2015, 3/22/2015, 3/29/2015, 4/5/2015, 4/12/2015, 4/18/2015, 4/29/2015
Unmatched URLs: 1129, 1208, 1234, 69, 116, 182, 89, 108, 46, 72, 159, 46, 19, 93, 110, 123, 66, 71, 51, 97, 42, 53
What we really want to know is
how many titles DID NOT get
into the knowledgebase.
45. ScienceDirect Observations
• In early uploads, many book series volumes did
not get loaded into the knowledgebase
• Change in definition of ‘ScienceDirect Book
Series’ collection largely resolved missing title
issue
• In most cases, e-resources that were missing in
one load showed up in the subsequent load
• Frequency is generally consistent, with a few
minor hiccups
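The "hiccups" can be made visible by measuring the gap between consecutive uploads. The sketch below uses the upload dates from slide 44; the 14-day threshold (twice the stated weekly frequency) and the `upload_gaps` helper are assumptions for illustration.

```python
from datetime import datetime

# Upload dates as listed on slide 44.
UPLOAD_DATES = [
    "9/17/2014", "10/25/2014", "11/2/2014", "11/16/2014", "11/30/2014",
    "12/10/2014", "12/14/2014", "12/22/2014", "1/11/2015", "1/19/2015",
    "1/25/2015", "2/8/2015", "2/15/2015", "3/2/2015", "3/8/2015",
    "3/16/2015", "3/22/2015", "3/29/2015", "4/5/2015", "4/12/2015",
    "4/18/2015", "4/29/2015",
]


def upload_gaps(dates, max_days):
    """Return (previous, next, days) for consecutive uploads more than max_days apart."""
    parsed = [datetime.strptime(d, "%m/%d/%Y").date() for d in dates]
    gaps = []
    for earlier, later in zip(parsed, parsed[1:]):
        days = (later - earlier).days
        if days > max_days:
            gaps.append((earlier.isoformat(), later.isoformat(), days))
    return gaps


# Assumed threshold: flag anything longer than two weeks.
late = upload_gaps(UPLOAD_DATES, max_days=14)
for previous, following, days in late:
    print(f"{days} days between {previous} and {following}")
```

With this threshold, three intervals are flagged: 38 days after the first upload, 20 days over the December holidays, and 15 days in late February.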
46. Of all the titles NOT matched throughout the study…
…there were only 20 titles not represented in the KB…
…That’s only 0.1% of all titles in our Elsevier account…
48. Autoload vs. ‘Traditional’ ERM
Techniques
• Comparison between UTL’s ‘subscribed’
ScienceDirect titles in ERM and Elsevier
entitlements
• Misalignment between selected packages and
actual purchases
– 879 titles we are entitled to were not represented
in subscribed content packages
– 247 titles in the subscribed packages were titles
we did not have access to
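The two kinds of misalignment are the two directions of a set difference. A minimal sketch, with hypothetical title identifiers and a hypothetical `compare_entitlements` helper:

```python
def compare_entitlements(erm_titles, entitled_titles):
    """Compare titles activated in the ERM with the provider's entitlement list."""
    erm = set(erm_titles)
    entitled = set(entitled_titles)
    return {
        # Paid for, but not discoverable through the selected packages.
        "entitled_not_in_erm": sorted(entitled - erm),
        # Discoverable, but users will hit a paywall.
        "in_erm_not_entitled": sorted(erm - entitled),
        # Both sources agree.
        "agree": sorted(erm & entitled),
    }


# Hypothetical identifiers for illustration only.
erm_titles = ["J1", "J2", "J3"]
entitled_titles = ["J2", "J3", "J4", "J5"]

result = compare_entitlements(erm_titles, entitled_titles)
```

In the study, the first bucket held 879 titles and the second 247.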
51. An ERM Promise Fulfilled?
• Time saving for librarians
• Well suited for “cherry-picked”
collections where manual
selection is necessary (e.g.
aggregator platforms)
• Increased accuracy
• Excellent compatibility with PDA
programs
52. Some Remaining Challenges
• Completely reliant on accuracy of
content provider metadata
– Any problems need to be addressed by
the content provider
– Manual corrections will be overwritten
each time data is reloaded
• Length of time between uploads can
be long (monthly or more)
• Difficult to spot when things do go
wrong and content does not get
loaded.
54. Seamless Updates
• Will there ever be a time when activation on
content provider site and knowledgebase is
synched daily?
55. Better Reporting Capabilities
• Increased reporting capabilities
– Alerts/notifications when uploads occur
– Libraries need to know what content could not
be loaded
• Feedback loop
– Ability to analyze data
and report inconsistencies
leads to better product
development
56. Help With Single Journal
Subscriptions
• Managing single e-journals is like trying to
herd cats
– Consolidation of registration/activation
• Do I really need to activate a title on the vendor site
AND in the ERM system?
– New opportunity for subscription agents?
57. Concurrent Users
• Ability to determine concurrent user limit
• Particularly important for aggregator
packages that have multiple purchasing
options
– e.g. ebrary MUPO and SUPO collections
58. Greater Participation
• This is only the tip of the
iceberg
• Libraries need to advocate
for autoloaded collections
…LOUDLY!
59. How Do We Get There From Here?
Standardization
Technological
Sophistication
Co-operation
Customer
Feedback
Progressive
Licensing Terms
Data Integrity
60. A Common Purpose
“Above all, perhaps, librarians and publishers
[and knowledgebase providers]
should sit down at a table of common
purpose and join again in what has always
been a necessary partnership: to publish
and make available the ideas and creative
works of authors.”
Harold Billings, “Supping with the devil:
new library alliances in the information
age.” (1993)