1) The document provides an introduction to WorldCat, the world's largest bibliographic database maintained by OCLC, including its structure, contributing methods, cataloguing clients, formats and standards, quality control processes, and tools.
2) It discusses how matching and merging of records is done in WorldCat, focusing on factors like title, publisher, and extent that are considered to determine if records should be merged.
3) The presentation concludes by answering questions from attendees about issues like retaining records without holdings, data standards for integrating museum library collections, and searching capabilities in WorldCat.
1. CIG 2018, Edinburgh
Paul Shackleton
Everything you always
wanted to know about
WorldCat (but were
afraid to ask)
Senior Database Specialist, OCLC UK
2. A quick introduction to WorldCat:
• Background and the basic idea
• Tools and services
• Quick access to key information
• Your chance to ask questions
Aims and Objectives
3. Paul Shackleton:
• Librarian/Systems administrator
• Not a cataloguer (but I have catalogued)
• With OCLC for over 4 years, based in Sheffield
• Specialist in loading and updating data in
WorldCat, including migrating library
collections to WorldShare
Introductions
5. The world’s largest and most
consulted bibliographic database
• 2.6 Billion holdings
• 425 Million bibliographic
records
• Collective of collections
Free to use - worldcat.org
6. • De-duplicated, single Master Record model
• ‘Duplicates’ permitted for:
– Different cataloguing language
– Different Editions
– Different Formats
• Unique/local information should not be added
to a Master Record
• ‘Set Holdings’ – a flag saying ‘I have this’
WorldCat structure
7. • Registered libraries are assigned a unique
symbol, 3 or 5 characters long
• Each library has a Registry record
• Registry records should be maintained via
Service Configuration
WorldCat structure
8. OCLC number:
• Unique to each bibliographic record and
trusted
• Stored in WorldCat in the 001 but distributed
in the 035
• Prefixes – ocm, ocn, on, (OCoLC)
• Old OCNs never die, stored in the 019 and
indexed
WorldCat structure
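Because the same OCN can arrive with different prefixes, and a merged record still answers to its old numbers via the 019, it can help to normalise OCNs before comparing them. The following is a minimal Python sketch, not OCLC code. The function names and the plain-dict record layout are hypothetical, and it assumes the common prefix forms ocm, ocn, on and (OCoLC), followed by digits that may be zero-padded.

```python
import re

# Optional "(OCoLC)" marker, optional alphabetic prefix, optional zero padding, then the digits.
_OCN_PATTERN = re.compile(
    r"^\s*(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?\s*0*(\d+)\s*$", re.IGNORECASE
)


def normalise_ocn(value):
    """Return the bare OCN digits, or None if the value does not look like an OCN."""
    match = _OCN_PATTERN.match(value)
    return match.group(1) if match else None


def record_answers_to(record, ocn):
    """True if the OCN is the record's current number (001) or an old one (019).

    `record` is an assumed layout: {"001": "<ocn>", "019": ["<old ocn>", ...]}.
    """
    wanted = normalise_ocn(ocn)
    if wanted is None:
        return False
    known = [record.get("001", "")] + list(record.get("019", []))
    return wanted in {normalise_ocn(v) for v in known}
```

For example, `normalise_ocn("ocm00012345")` and `normalise_ocn("(OCoLC)12345")` both reduce to the same digits, so the two forms compare as equal.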
9. 1. Interactive cataloguing using one of our
cataloguing clients
2. Bulk upload of additions and deletions via
data sync
Contributing to WorldCat
10. 1. Connexion client
2. Connexion browser
3. Record Manager
4. Z39.50
5. Metadata API
You can export records with any of these clients.
Cataloguing clients
11. • Old school client software, Windows only
• Seriously powerful – batch functions and bulk
editing and setting/removing of holdings for
up to 9999 records at a time
• Online and offline save files to aid workflow
• Direct integration with your ILS via TCP/IP or
OCLC Gateway
• No access to local data
Connexion client
12. • Web based: http://connexion.oclc.org/
• Shares the online save file with Connexion
client
• Access to Local Holding Records (LHRs)
• No batch/bulk options
Connexion browser
13. • Part of the WorldShare platform, a web
browser based application
• Limited batch options and the place for future
developments
• Supports LHR records and local bibliographic
data (LBD)
• Got a cataloguing subscription? Ask UK support
for access to Record Manager
Record Manager
14. • Old school but probably the most popular
method of accessing WorldCat in the UK
• Direct integration with your local system
• In June extended services were implemented
to allow for the deletion of holdings – ask your
system supplier about implementing this so
you can update WorldCat when you delete a
record or the last item in your system.
Z39.50
15. • Build your own interface with WorldCat
• Add and edit bibliographic records
• Set and delete holdings
• Add and edit LBD records
• Open to all with a cataloguing subscription
• Available to all via MarcEdit
• Any organisation can build tools with the API
Metadata API
16. • Send records via sFTP or Collection Manager
• Complete collection load = reclamation
• Send updates – additions, amendments and
deletions
• Own record replace
• Field transfer for record enhancement
• Adding records is optional
• Add security to your collections and access
Bulk upload via data sync
17. • Part of WorldShare metadata management
• Upload and download records and reports
• Receive WorldCat updates for record changes
• Manage e-resources and MARC record delivery
• Rolled out to all Ebsco customers for MARC
record delivery
• Query collections allow targeted selection and
delivery of records
Collection Manager
18. • MARC 21 only for now, can export in MARCXML,
Dublin Core and MODS
• Manuals for bibliographic, LHR and authority formats
• BIBFRAME is actively supported and developed by
OCLC
• AACR2, hybrid and RDA compliant records supported
• Records are being upgraded to include 3XX data
Formats and Standards
19. • Program for Cooperative Cataloging (PCC) records are
protected but can be enhanced, indicated by 042 $a
pcc
• Library of Congress bibliographic and authority
records are synchronised on a daily and weekly basis
• QC department is dedicated to maintaining quality
• Duplicate Detection and Resolution (DDR) software
constantly seeks and removes duplicate records
• Request changes via bibchange@oclc.org
Quality Control
20. • FRBR clustering has been implemented within workid
groups
• This clustering is visible in the editions and formats
grouping in WorldCat and Discovery
• Subfield $0 has been implemented in authorities in
preparation for linked data triples
• More authority record sets are being added
• Classify – find call numbers based on WorldCat data
• Command line searching – highly addictive
Tools and good to know
21. • Support UK: support-uk@oclc.org
• Community Centre:
https://www.oclc.org/community/home.en.html
• Help pages: https://help.oclc.org/
• Training: https://help.oclc.org/Librarian_Toolbox
Support and Training
22. Q. Is it OCLC policy to retain Bib records in WorldCat if
there are no library holdings attached to them?
A. No, but you can request that individual or groups of
records be deleted by contacting QC. However we do
sometimes clean up after particular projects, such as a
WMS migration where we know that the old records,
not matched by the load, will no longer be needed.
Your Questions
23. Q. We are currently integrating around 45,000 items
across 6 museum libraries and I would be interested in
knowing more about data standards & formats.
A. If your catalogue originates from multiple source
catalogues, and has duplicate bib records, setting
holdings in WorldCat as part of a group project can help
you get a de-duplicated view of your holdings and the
spread of copies across your sites.
Your Questions
24. Q. How is the matching and merging of records done,
and what factors are taken into account, with particular
emphasis on differences in the 245 (especially the $c
subfield), the 260/264 and 300 fields, and to what extent
can these prevent a record from matching?
Your Questions
25. Matching:
• Every record in WorldCat has a set of
‘fingerprints’ (aka ‘keys’)
• Every record ingested has a set of fingerprints
generated and these are used to compare with
existing fingerprints to find candidates for
matching
Your Questions
26. Matching:
Data elements used as the primary source of retrieval and comparison for
matching include, but are not limited to the following:
• Standard Numbers including OCLC Numbers, ISSN, and others.
• Material Types
• Dates of Publication
• Language of Cataloging
• Title
• Author
• Edition
• Publisher
• Extent
Your Questions
27. Matching:
• Process is iterative, bibliographic Kerplunk!
• Certain data elements carry a higher weighting but can be invalidated by
certain criteria, such as an OCN matching a record that has a different
language of cataloguing or record type.
• The fingerprint matching is initially ‘fuzzy’ to allow a closer analysis of a set
of candidates.
• All elements of a record’s core description, including some 5XX notes fields
under certain circumstances, are always evaluated.
• Completeness is key.
• If in doubt the record is added, then Duplicate Detection and Resolution
(DDR) will evaluate the record again.
Your Questions
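OCLC's matching software is proprietary, so the following Python sketch is only an illustration of the general idea described above, not the real algorithm: build fuzzy keys for each record, use them to find candidates, let a mismatch in language of cataloguing or material type veto even an OCN match, and add the record when in doubt. Every name, field and normalisation rule here is an assumption made for the example.

```python
import re


def fingerprints(record):
    """Build a small set of fuzzy match keys from a record (a plain dict)."""

    def squash(text):
        # Lower-case and drop everything except ASCII letters and digits.
        return re.sub(r"[^a-z0-9]", "", text.lower())

    keys = set()
    if record.get("ocn"):
        keys.add("ocn:" + record["ocn"])
    if record.get("isbn"):
        keys.add("isbn:" + squash(record["isbn"]))
    if record.get("title"):
        keys.add("title:" + squash(record["title"])[:20] + ":" + record.get("date", ""))
    return keys


def compatible(incoming, existing):
    """Checks that can veto a candidate even when a strong key matched."""
    return (
        incoming.get("cat_lang") == existing.get("cat_lang")
        and incoming.get("material") == existing.get("material")
    )


def ingest(incoming, database):
    """Return the matched record, or add the incoming record when in doubt."""
    incoming_keys = fingerprints(incoming)
    candidates = [r for r in database if fingerprints(r) & incoming_keys]
    matches = [r for r in candidates if compatible(incoming, r)]
    if len(matches) == 1:
        return matches[0]
    # No single confident match: add the record and leave it to later de-duplication.
    database.append(incoming)
    return incoming
```

In this sketch a record carrying an existing OCN but a different language of cataloguing is added as new rather than matched, which mirrors the behaviour described on the slide.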
28. Q. As some cataloguers have been downloading foreign
agencies’ records and editing them and turning them
into English, is there going to be any issue in the
matching procedure if the OCN in the 035 field
of the downloaded record remains? Does this point to
the foreign record?
A. No, the mismatch of language and OCN will prevent a
match, the 035 will be removed and either a correct
match will be found or the record will be added. Always
remove the OCN when copy cataloguing.
Your Questions
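The advice to remove the OCN when copy cataloguing can be sketched as a small filter. This is a hedged Python illustration, not part of any OCLC tool: the record is modelled as a hypothetical list of (tag, value) pairs, and it assumes the OCN sits in an 035 whose value starts with (OCoLC), whereas in practice it could be in another field or subfield.

```python
def strip_ocn(fields):
    """Return the fields without any 035 that carries an OCLC number.

    `fields` is an assumed layout: a list of (tag, value) tuples.
    """
    return [
        (tag, value)
        for tag, value in fields
        if not (tag == "035" and value.strip().lower().startswith("(ocolc)"))
    ]
```

Other 035 values, such as control numbers from another agency, are left untouched.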
29. Q. I would like to understand how the indexes help in
searching for items, particularly non-standard searches
outside the usual author/title/date.
A. See the help pages for indexes:
https://help.oclc.org/Librarian_Toolbox/Searching_WorldCat_Indexes
Your Questions
30. Can I search, for instance, for all engravers in my
collections, if I have used $e engraver for
engravers?
Yes, using the rx index which indexes
relationship, e.g. li:yt1 AND rx:engraver
Your Questions
31. Can I search for materials from a date range (particularly
when dates look like 18--, 165?, or between 1849 and
1853), or for works by language (English, German...), or
for country of publication (taken from the 008/15-17),
for items with illustrations etc.?
Yes, you can refine searches with qualifiers:
yr:201?
/1914-1918
la:ger (this is a default facet in most WorldCat clients)
Your Questions
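Command line searches of this kind are plain strings, so they are easy to assemble in a script, for example when preparing a query collection. The helper below is a minimal Python sketch with a hypothetical name. It only joins index:value pairs with the AND and NOT operators shown in this deck and does not validate the indexes against WorldCat.

```python
def build_query(terms, exclude=()):
    """Join index:value pairs with AND, then append a NOT clause per exclusion."""
    query = " AND ".join(f"{index}:{value}" for index, value in terms)
    for index, value in exclude:
        query += f" NOT {index}:{value}"
    return query


# Engravers in the holdings of the library with symbol yt1.
print(build_query([("li", "yt1"), ("rx", "engraver")]))
```

Passing `exclude=[("mt", "url")]` appends `NOT mt:url`, which is how the notes exclude e-resources from a holdings search.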
32. Q. I would like to know whether, when a library matches
its data to OCLC, all headings can be batch authorised
at the same time with equivalent LC
headings.
A. Yes, software was introduced in 2012 to
automatically control headings in records when
new records are added and retrospectively.
Your Questions
If reading this as a handout, note that links in the notes are often hot links in the slides – move your cursor around to find the links.
First UK metadata specialist, part of a metadata team that is based in Leiden, the Netherlands.
Our headquarters are in Dublin, Ohio. OCLC was established in 1967 to support library cooperation and the result was WorldCat which began life in 1971. OCLC? Originally the Ohio College Library Center, now the Online Computer Library Center but OCLC for short.
Anyone who has a subscription is a member, and all members can vote and gain a place on the board of trustees or the regional council:
https://www.oclc.org/en/membership/councils/directory/emea.html
Some stats: https://www.oclc.org/en/worldcat/inside-worldcat.html
Bibliographic records, a Knowledge Base and a Central Index
https://www.oclc.org/en/worldcat/watch-worldcat-grow-popup.html
https://www.worldcat.org/
Note that worldcat.org includes more than just the WorldCat bibliographic database; it also includes the central index and KB content: https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/FAQ/What_is_the_difference_between_WorldCat_and_WorldCat.org%3F_What_do_I_need_to_know_about_the_central_index%3F
But today I just want to talk about collection cataloguing and WorldCat records.
Actual duplicates do occur. Hybrid records are not permitted, e.g. trying to use a print record for the electronic version as well. We need the record to describe the actual item in whatever form it exists, while we aim to make the user experience simpler by FRBRization in the discovery interface.
Some local data can be registered in subfield $5 where this is permitted by the WorldCat manual Bibliographic Formats and Standards.
https://www.worldcat.org/registry/Institutions
https://worldcat.org/config/ - contact support if you need to establish access
Where records are merged the OCN is retained in the 019 of the retained record.
WorldCat does not represent a threat to your work, quite the opposite. We rely on individual cataloguers to do the work of contributing original records and completing and correcting the records that exist. OCLC cannot do the cataloguing, we don’t have the items to hand, but we do work hard to maintain quality to support you. Many records are now supplied directly by publishers.
Yet another client does exist, Cat Express, designed for copy cataloguing only, and this is a cut down version of Connexion browser.
https://www.oclc.org/support/services/connexion.en.html
No screen shot but it’s the grey one, quite a retro look these days.
LBD – when you load records via data sync your local system number will get recorded in a LBD record
https://help.oclc.org/Metadata_Services/WorldShare_Record_Manager
Demonstrate WorldShare here.
Community centre - https://www.oclc.org/community/record_manager.en.html note release notes and roadmap.
https://www.oclc.org/support/services/z3950.en.html
An obvious workflow, but it does not offer a means to get your corrections and additions back into WorldCat. If you use this method, think about sending regular updates via data sync and having one of the clients to hand to fix anything that is wrong; otherwise, when you set holdings in WorldCat, that record might not be as complete as you would want.
If you need help getting set up with WorldCat API functionality in MarcEdit please open a support call.
https://help.oclc.org/Metadata_Services/WorldShare_Collection_Manager/Understand_record_processing/Data_sync_processing
https://help.oclc.org/Metadata_Services/WorldShare_Collection_Manager/Understand_record_processing/Data_sync_processing#Transfer_of_bibliographic_data
You get more than just holdings maintenance – a metadata specialist is assigned and you will get feedback, reports on validation and record cleaning, plus reports of your holdings with the matching OCNs.
If you maintain your holdings in WorldCat you have a ready-made disaster plan: some libraries have pointed users to WorldCat when their local systems have been down. All records can be delivered back to you via a Collection Manager query collection.
Collection Manager does a number of jobs but mostly it is where KB collections are managed – select collections, set holdings, receive MARC records for upload to your local system.
Query collections use command line search syntax, more on that later.
OCLC manuals are ubiquitous in USA cataloguing departments.
We are taking a pragmatic approach to RDA, see the links to the RDA information page and the OCLC statement from bib formats and standards.
Any cataloguing rules are acceptable so long as the resulting records are well formed and valid.
We actively support PCC, CONSER, NACO and SACO. NACO/SACO membership should be considered if you want to steer authority record creation for the UK; forming a UK funnel might be a good idea.
https://www.loc.gov/aba/pcc/naco/
https://www.oclc.org/bibformats/en/quality.html#qualityassurance
Note the Ask QC email address: askqc@oclc.org
Note the excellent article by Jay Weitz on how to work with DDR.
You can see the worked grouping at the top of the record edit view in Record Manager.
https://help.oclc.org/Metadata_Services/WorldShare_Record_Manager/Authority_records – recently added German names, Maori subject headings, but LC names and authorities are still the default. In future name and subject headings will be contextual based on a user’s language, i.e. a German user would see the German headings and names rather than LC. If you want to add or edit authority records you need to be part of one of the authority programmes, such as NACO. If you use authorities that WorldCat does not support the headings are permitted but there will be no authority record linked to the heading; we call these ‘uncontrolled’ headings.
https://help.oclc.org/Librarian_Toolbox/Searching_WorldCat_Indexes/Indexes – example command line search, excluding e-resources from the results, as follows (the ‘li’ index, which finds holdings for the symbol supplied, is not valid in the ‘worldcat.org’ web service): li:yt1 AND x0:books NOT mt:url
Access to the Community pages will require a WorldShare login and is limited by what modules you have access to.
The overview should have covered many of the aspects of WorldCat holdings maintenance that may be of interest to you. If your catalogue originates from multiple source catalogues, and has duplicate bib records, setting holdings in WorldCat as part of a group project can help you get a de-duplicated view of your holdings and the spread of copies across your sites. You will need to ensure that your branches each have symbols, or request branch level symbols to create basic LHR entries, to ensure that sites can be profiled in WorldCat.
This lists the kind of data used to build a ‘fingerprint’ or match key.
We can’t share any more detail, this is complex proprietary software.
DDR picks up on looking for duplicates within WorldCat while ingest matching is trying to find a match before resorting to adding a new record.
The answer given assumes that the OCN is supplied in the 035 $a but it could be any field and subfield.
Use Record Manager for the li (holding library symbol) search, NOT worldcat.org. ‘rx’ is the index for relationship and covers a range of tags and subfields.
Have not found an option to search for illustrations yet.
Automated heading control announcement: https://www.oclc.org/en/news/announcements/2012/announcement1.html