RDA, URIs, and Linked Open Data: optimizing the catalogues we have to create the connections we envision / Casey Cheney
1. RDA, URIS, AND LINKED OPEN DATA
OPTIMIZING THE CATALOGS WE HAVE TO CREATE THE CONNECTIONS WE ENVISION
Presented by Casey Cheney
CILIP – CIG Conference
5-7 September 2018
Edinburgh, Scotland
9. AACR2 TO RDA CONVERSION
Manual upgrading or searching (time intensive)
Automated batch searching for updated copy (expensive)
Third-party automated conversion (cost effective, minimal time commitments)
20. UNIFORM RESOURCE IDENTIFIER (URI)
Long, Richard,▼d1945- links to the Library of Congress Name Authority File:
▼0http://id.loc.gov/authorities/names/n84090805
21. REAL WORLD OBJECT (RWO)
Long, Richard,▼d1945- links to the Virtual International Authority File:
▼1http://viaf.org/viaf/95766060
22. REAL WORLD OBJECT (RWO)
https://bit.ly/2MmWhOc
BBC Things
CERL Thesaurus
DBpedia
GeoNames
GND (excluding Subject Headings)
ISNI (URI only)
LC (/rwo/agents style URI)
MusicBrainz
ORCID
TGN (only certain styles of URI)
ULAN (only certain styles of URI)
VIAF
WikiData
24. <image>
<image_title>Lister Institute Group Photograph with names</image_title>
<image_iap_image_no>L0068570</image_iap_image_no>
<image_keywords_unauth>
<_>Institute;</_>
<_>International ACM SIGGROUP Conference on Supporting Group Work;</_>
<_>ACM SIGCHI International Conference on Supporting Group Work;</_>
<_>members;</_>
<_>fellows;</_>
</image_keywords_unauth>
</image>
<uris>
<uri_lc_name>http://id.loc.gov/authorities/names/no98079700</uri_lc_name>
<uri_viaf>http://viaf.org/viaf/312890028</uri_viaf>
<uri_lc_name>http://id.loc.gov/authorities/names/no2009138784</uri_lc_name>
<uri_viaf>http://viaf.org/viaf/160752016</uri_viaf>
</uris>
25. WHAT'S NEXT FOR BACKSTAGE?
Fine-tuning 34X addition
Control terminology within 33X, 34X, 38X fields
Automatically splitting 3XX fields with multiple vocabularies in one field
Fine-tuning current RDA processing for print and recorded music
Incorporate other URI/RWO sources
26. THANK YOU!
Casey Cheney, PMP, Vice President of Automation Services
ccheney@bslw.com
Editor's Notes
The tides are changing for how bibliographic data is recorded, packaged, and used. Within the last decade we saw a shift in recommendations for cataloging: a move from AACR2 to RDA. RDA was designed with a greater purpose in mind, a purpose that reaches beyond MARC. So it's only natural that, around the time this discussion started, the idea of library data sets linking together on a global scale was also introduced. Linked open data, as it relates to libraries, has been touted as a way to step away from the antiquated MARC model and move into the future: hosting a different format of data more openly, so it can be searched on the World Wide Web and discovered by users around the world.
So my question to you is, where are you sitting? Are you in a boat that is moving with the flow of the changing tide? Or are you in a boat that is struggling to decide where you want to be and therefore you may be stuck on a sandbar waiting for the tide to inevitably sweep you away?
As a vendor, Backstage Library Works has a unique perspective on how Library Linked Data is perceived, and the reaction is mixed. We have seen some institutions jump headfirst into figuring out how to make both RDA and Linked Open Data usable. Others are quietly praying for retirement so they don't have to think about the changing landscape, or they're waiting for the greater community to decide that these are not the avenues to take. Some want to stay on top of the changes but don't know where to begin. Regardless of how we feel about either RDA or Linked Data, it is necessary to be aware of the developments being made so we can make decisions that will be best for ourselves and for our patrons.
One of the questions that Backstage has received most often regarding linked data is “Where do I even begin?” Other questions we should be asking are “What kind of experience do I want my patron to have?” and “How will RDA & Linked Data help my patron have a better experience?” Though overly simplified, the end goal of both RDA and Linked Open Data is an environment in which patrons can more easily access the amazing collections that are often hidden within our libraries. We want to show off these collections!
Once we decide that we do want a better experience for our patrons we may be wondering, well do I jump into Linked Data by transforming my data in Bibframe or another schema or do I start smaller?
As has been mentioned at previous American Library Association conferences, a very simple place to start is enriching your bibliographic records with URIs from various sources, such as those presented here.
I would personally argue that before you even get to that stage you should consider retrospective RDA Enrichment (if your catalog is not entirely compliant with RDA).
In March 2010 Barbara Tillett, of the Library of Congress, also pushed the point that before we can link data globally through URIs and Linked Open Data, the data needs to be properly organized in a more usable format, meaning using RDA standards for our current bibliographic data. (Tillett, Barbara. "RDA: Looking to the future". https://bit.ly/2OLTf2B)
So let's start with a little discussion about RDA.
We know that the RDA standards are still very dynamic; they’re being updated and refined constantly. And yes, learning a new cataloging standard is difficult for some and time consuming for all, but this doesn’t mean we shouldn’t keep moving forward. If we look at the objectives of RDA as illustrated by the RDA Joint Steering Committee, we’ll see that the primary goal is to increase the findability of works, expressions, manifestations, and items. Again, we need to start with the "end goal in mind" when it comes to our decisions regarding RDA. Do we want our patrons to be able to find more within our collections?
Theoretically, through the use of RDA in a Library Management System or a discovery layer, a search by a patron should help that individual find items that are related to each other, whether through subject matter, creator, format, inclusion within a series, and so forth.
RDA assists in this increased findability by attempting to facet data that were previously coded in unsearchable fields by placing them in fields that have the potential to be indexed and searched.
The majority of the faceted terminology within RDA is also controlled by various registries and term lists, meaning that every institution using a particular registry will use the same vocabulary for certain metadata. This also opens the possibility of future automated control of these terms within your catalog, as well as linking through Uniform Resource Identifiers (URIs).
Whether or not the LMS and/or Discovery layer can make use of this faceted information at the moment is still inconsistent, but if we keep pushing for changes we want to see, hopefully the changes will keep coming, especially since usable Linked Data is still a number of years down the road.
If you’re like many other institutions you’ve probably started cataloging in RDA for newly acquired materials. You probably also realize that the rest of your catalog may not be consistent with your current cataloging practices, but you aren't sure how to go about making all records consistent. So here are a few options (and I'm sure you have thought of other solutions for your particular circumstances).
You could start a project to have interns, students, volunteers, etc. begin re-searching titles in national/international databases such as OCLC or the British Library to find RDA copy. This is extremely time intensive and costly, especially if you don't have ways of instituting batch search processes.
There are some vendors who, for a fee, could do this searching through automated batch processing. There may be some limitations due to subscription requirements for these external databases, or you may still end up with AACR2 copy and have to manually upgrade. Now don't get me wrong, automated batch searching is phenomenal for certain projects and very cost beneficial, but for the sole purpose of RDA Enrichment, it's not the best solution.
Something you may have never considered, however, is that you could use a third-party vendor to do a global, automated enrichment of your data. This doesn’t re-code the 040 field as "rda", but it does create a hybrid record with RDA elements. It is a very quick and cost-effective process, and the most time you'll probably spend in review is during initial sampling to ensure that conversion happens as you expect for certain types of records.
Nowadays, vendors that offer bibliographic or authority control services have developed tools to globally change this data on behalf of their library clients. Backstage Library Works was fortunate to be part of the Library of Congress RDA test in 2011 and has since had staff members on various standards committees in order to develop and maintain a service that fits PCC recommendations while still giving libraries the flexibility they require, especially if there are aspects of RDA with which they may not particularly agree.
So, what does an automated RDA Enrichment service look like? What fields are edited or added? Because we don’t have time to go into every alteration our specific process handles, I’ll give you a few key examples.
First, abbreviations. Because Appendix B in the original RDA Toolkit states that only certain elements within a bibliographic record are allowed to be abbreviated, we needed to expand abbreviations in other fields, such as the 300 physical description, the 504 bibliography note, and the 700 relator term.
As you can see in this example the abbreviations for "pages", "illustrations", and "editor" were expanded to their fuller forms.
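A toy sketch of what that expansion step amounts to, in Python. The abbreviation table here is illustrative, not Backstage's actual rule set:

```python
import re

# Illustrative AACR2 abbreviation -> RDA expansion table (not exhaustive).
EXPANSIONS = {
    r"\bp\.": "pages",
    r"\bill\.": "illustrations",
    r"\bed\.": "editor",
}

def expand_abbreviations(field_text):
    """Expand AACR2-style abbreviations in a field such as the 300 or 504."""
    for pattern, full in EXPANSIONS.items():
        field_text = re.sub(pattern, full, field_text)
    return field_text

print(expand_abbreviations("xi, 250 p. : ill. ; 24 cm"))
# xi, 250 pages : illustrations ; 24 cm
```

A production process also has to guard against false matches (an abbreviation-like string inside a name or title, for instance), which is one reason the initial review sampling mentioned earlier matters.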
There has been some debate whether an automated 260 to 264 conversion is possible, or even acceptable. For many years the PCC has recommended NOT making this conversion through an automated process, but we have found that this conversion is more straightforward than originally thought. As a result, while it is one of our default settings, our client libraries are free to decide whether or not they want this conversion made.
Another major change from AACR2 to RDA is the conversion of the 245$h (GMD) into RDA compatible fields. By utilizing a complex conversion table that is built into our programming we're able to make this particular conversion as well as a few others for audio-visual materials.
As you can see here, we've removed the 245$h and replaced it with the 336, 337, and 338 "content, media, and carrier" fields. We've also been able to add the 340, 344, and 346 fields.
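The conversion table idea can be sketched roughly like this. The two GMD entries below are a simplified, illustrative subset; the terms and codes come from the rdacontent/rdamedia/rdacarrier vocabularies, and a real table also consults other fields, since the carrier in particular cannot be derived from the GMD alone:

```python
# Simplified GMD -> 336/337/338 mapping (illustrative subset only).
GMD_TO_RDA = {
    "videorecording": {
        "336": "two-dimensional moving image $b tdi $2 rdacontent",
        "337": "video $b v $2 rdamedia",
        "338": "videodisc $b vd $2 rdacarrier",
    },
    "sound recording": {
        "336": "performed music $b prm $2 rdacontent",
        "337": "audio $b s $2 rdamedia",
        "338": "audio disc $b sd $2 rdacarrier",
    },
}

def convert_gmd(gmd):
    """Return the 33X fields to add when stripping a 245 $h GMD."""
    return GMD_TO_RDA.get(gmd.strip("[] ").lower(), {})

print(convert_gmd("[videorecording]")["338"])
# videodisc $b vd $2 rdacarrier
```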
Another change comes to printed music. The data from the 240 $m and $n can now be expressed elsewhere within the bibliographic record.
The 382 and 383 fields are added based on the 240 field while the 348 is added based on the 300 & 008 fixed field coding.
Standard cataloging practice dictates that only one language of an expression may be present in an authorized access point (RDA LC-PCC PS 6.2.2.4). As such, we have developed a process that will create separate access point entries for each language expression.
As we all know, parallel titles are often forgotten about and not added during original cataloging, and it's difficult to find every instance if you're working in your LMS. Our processing will automatically add those parallel titles for you, allowing even greater discovery by your patrons.
Once the RDA Enrichment is complete you’ll be in a prime spot to do automated global URI Enrichment.
When it comes to discussions about linked data, we've all seen "Linked Open Data" clouds like this that show relationships between entities. But how are these linking relationships created? They're created by the use of URIs in some sort of RDF-type schema. I'm not here today to talk about all of the various links that our Bibliographic data will create, nor where those "Bibframe" related links come from. I want to talk on a more basic level about adding URI links to our existing Bibliographic data. This is preparatory to any true Linked Data connections.
Generally speaking, a URI that is added in Bibliographic data will link out to the identity of what is being described in the Bibliographic heading. For instance, if we’re adding a URI to a personal name heading, the URI will likely be that of the authority record from the Library of Congress Name Authority File, as in this example. The $0 with the URI is added to the end of the string of your bibliographic heading.
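In code terms, the enrichment amounts to appending a $0 to the matched heading. Here is a toy sketch in which a one-entry dictionary (using the example from the slide) stands in for real matching against the full Name Authority File:

```python
# Stand-in authority "file": heading string -> LC NAF URI. A production
# service matches against the entire Name Authority File, not a dictionary.
NAF_URIS = {
    "Long, Richard, 1945-": "http://id.loc.gov/authorities/names/n84090805",
}

def add_naf_uri(heading):
    """Append a $0 URI to a heading when an exact authority match exists."""
    uri = NAF_URIS.get(heading)
    return f"{heading} $0 {uri}" if uri else heading

print(add_naf_uri("Long, Richard, 1945-"))
# Long, Richard, 1945- $0 http://id.loc.gov/authorities/names/n84090805
```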
The same goes for authorized Subject Terms and Title entries.
Another version of “URI” linking takes the form of Real World Object links, which appear in the $1 of the bibliographic heading instead of the $0. The difference between a URI and an RWO is that the RWO is supposed to represent the actual “Thing” represented in the heading while a URI links to a description of the thing in the heading.
You may be wondering how to know whether a URI belongs in $0 or is really an RWO that belongs in $1. In February 2018 the PCC Task Group on URIs in MARC issued a document entitled “Formulating and obtaining URIs: A guide to commonly used vocabularies and reference sources”, which gives instruction on which source URIs belong in $0 and which are RWO URIs belonging in $1. The short list I have here comprises those they designate as RWOs.
You'll notice a couple of oddities. First, the GND has a note that URIs for Subject Headings are entered in $0, while all other heading types go in $1. ISNI is a special case: if you want to add just the standard number instead of the entire URI, it is added in $0 as well. And LC, TGN, and ULAN have certain styles of URIs that belong in $0 and others that belong in $1. You can get to the PDF of the document from the link on this slide, and it gives examples of the exceptions.
I know that this document is still being updated but it is a really great resource as it currently stands!
Source: https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf
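The $0 versus $1 routing described above can be sketched as a simple host check. The RWO host set here is an abbreviated, illustrative subset of the PCC guide's designations, and the GND/ISNI/TGN/ULAN exceptions are omitted:

```python
from urllib.parse import urlparse

# Abbreviated, illustrative subset of sources the PCC guide treats as RWOs.
RWO_HOSTS = {"viaf.org", "www.wikidata.org", "musicbrainz.org",
             "orcid.org", "isni.org"}

def subfield_for(uri):
    """Route a URI to $0 (description of the thing) or $1 (the thing itself)."""
    host = urlparse(uri).netloc
    if host == "id.loc.gov":
        # LC's /rwo/agents style URIs are RWOs; other LC URIs stay in $0.
        return "$1" if "/rwo/agents/" in uri else "$0"
    return "$1" if host in RWO_HOSTS else "$0"

print(subfield_for("http://viaf.org/viaf/95766060"))                  # $1
print(subfield_for("http://id.loc.gov/authorities/names/n84090805"))  # $0
```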
Backstage had the privilege to be part of early conversations with the PCC Task Group on URIs in MARC to begin building our own Automated URI Enrichment service. We have continued testing of adding URIs to additional fields with institutions that are at the forefront of Linked Data exploration. We continue to stay on top of discussions and proposals for URIs in both Bibliographic records and Authority records.
As a result of these discussions and close work with Stanford University, we've been able to add the 043, 050, 336, 337, and 338 fields to our URI Enrichment processing in addition to your standard access point fields. Please note that the recommendation for adding a URI to a 6XX subject term is to only add one if there is an exact match against an Authority Record for the entire heading string, so if you have a subject heading where maybe the $v Form subdivision does not appear in the Authority Record with the rest of the heading, a URI would not be added.
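The exact-match rule for subjects can be expressed as a strict lookup on the full heading string. The heading and URI below are hypothetical examples, not real authority data:

```python
# Hypothetical authority entries keyed on the FULL subject heading string.
AUTHORIZED_SUBJECTS = {
    "Sculpture, British $y 20th century":
        "http://id.loc.gov/authorities/subjects/sh99999999",  # hypothetical URI
}

def subject_uri(heading):
    """Return a URI only when the entire 6XX string matches an authority record."""
    # A partial match (e.g. an extra $v form subdivision that does not appear
    # in the authority record) gets no URI at all.
    return AUTHORIZED_SUBJECTS.get(heading)
```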
Doing this URI and RWO enrichment while we’re still in a MARC Environment allows you to start looking at what URIs can (or cannot) do in a traditional Library Management System. It also puts you in a good place to start testing Linked Data transformations.
Performing URI Enrichment prior to any Linked Data transformation helps lessen the amount of manual reconciliation after the transformation is complete. If you’re using a vendor for automated Authority Control, you should be able to add URI Enrichment to your processing specifications. If you’re not currently using an Authority Control vendor, it is worth considering a large retrospective project to insert these URIs into your bibliographic data. Using an Authority Control vendor for this process will first ensure that the proper form of the heading is inserted into your bibliographic record, resulting in a significantly increased number of correct URIs being added. I am aware that there are other tools available to add URIs, but an Authority Control vendor will give you an edge by correcting the heading's form, and we are continually exploring other MARC fields and other sources that would benefit from URI Enrichment.
Within the last few years we’ve also had the pleasure of working with the Wellcome Library to perform Authority Control and URI Enrichment on non-MARC metadata; the work we do is not limited to MARC! Each non-MARC project requires additional tailoring of our programming that is not needed with MARC data, but it certainly opens the door for more standardization across a broader metadata landscape.
Backstage is always looking for other sources that could be tapped into, such as WikiData or MusicBrainz, and other fields that could be enhanced by the presence of URIs or RWOs. And we’re always looking for institutions that would like to be part of fine-tuning our non-MARC process or help develop a new, complementary service.
So what are we working on next? There are a few different things related specifically to our RDA & URI Enrichment processes that I wanted to highlight.
1. Within the last year we added the capability of adding 34X fields for audio-visual materials, and we’re fine-tuning this now based on feedback from a recently completed sample.
2. This year we’ve been working on actually creating a protocol that will control & change terminology within the 33X, 34X, and 38X fields, and will hopefully be adding URI linking capabilities shortly for the 34X and 38X fields.
3. We’ll also be starting to explore automatically splitting existing 33X and 34X fields that currently contain multiple vocabulary sources so they can be better used and controlled.
4. We’re also looking forward to fine-tuning existing processing for print and recorded music, as well as seeing what other field additions we might be able to automate.
5. As always, we're looking for other URI/RWO sources to incorporate into our processing. So far there haven't been many requests for sources other than VIAF and ISNI, but we know there will be!
It’s been an exciting year so far for developments and we’re excited to see where the next year takes us.
While Backstage is not involved in actual Linked Data transformation, we are focusing our efforts on preparing our library clients for the changing tide. The RDA & URI Enrichment services that we provide (which include our extensive data cleanup protocols) will help ensure that the bibliographic data is ready for the move into the next generation of library data.