MarcEdit task lists and vendor-supplied metadata : revisiting the subscriber-publisher relationship at the University of Leeds / Ceilan Hunter-Green (University of Leeds).
Like many institutions, the University of Leeds began purchasing and subscribing to streaming video services during the Covid lockdowns, including the new offering of a British Film Institute institutional subscription. This BFI subscription offered an excellent selection, but there were barriers to discoverability and analytics with no vendor-supplied records and a high staffing cost for manual creation of individual records. Bare bones scratch records could be created quickly with the limited metadata provided by the vendor every month, but then had to be manually supplemented with copied metadata from the streaming platform BFI Player.
After about 18 months of this labour-intensive arrangement, a chance conversation with a colleague prompted a re-examination of our vendor-subscriber partnership. Instead of creating records just for our institution, why not share them with other subscribers and get a discount for our subscription?
A MarcEdit task list was developed that would enable creation of full RDA-compliant MARC records on the condition of BFI supplying a .csv file of comprehensive metadata from BFI Player. It was a fantastic learning opportunity to develop skills in MarcEdit and to update the team’s knowledge of video streaming cataloguing. Much of this learning was done through the Library Juice Academy’s video streaming and MarcEdit courses, as well as consulting the NISO video and audio metadata guidelines to ensure that the records we would provide to our community would be the most comprehensive possible. Just as we’d hoped, this new arrangement allowed the negotiation of a subscription discount for the University in exchange for sharing these monthly addition and deletion records with other subscribing institutions at no extra cost.
While the task list creation process was a technical challenge, the community impact of the new arrangement has great potential to benefit our fellow subscribing institutions. Subscribers are now receiving records for individual films rather than relying on a single platform record, which will allow for greater analysis of collection usage, direct reading list linking for fellow academic institutions, and improved accessibility faceting through the discovery layer with the newly generated 341 and 655 fields. This presentation will serve both as a practical demonstration of MarcEdit task lists and regular expressions to normalise and enhance vendor metadata - including populating the 008 field with production date, runtime and language information, creating conditional 655 fields for Short/feature film and Fiction/nonfiction film, and adding enhanced accessibility fields for closed captioning and audio descriptions in the 341, 532 and 655 fields - and as an exploration of the potential for institutions with greater staffing power to facilitate community access to vendor content.
Paper presented at the Metadata & Discovery Group Conference & RDA Day (6th - 8th Sept 2023 at IET Austin Court, Birmingham)
1. MARCEDIT TASK LISTS AND VENDOR-SUPPLIED METADATA:
REVISITING THE SUBSCRIBER-PUBLISHER RELATIONSHIP
AT THE UNIVERSITY OF LEEDS
Ceilan Hunter-Green
Metadata and Discovery Coordinator
University of Leeds
CILIP Metadata & Discovery Group Conference 2023
#CILIPMDG2023
2. TIMELINE
MARCH 2020: First COVID lockdown
JAN 2021: Purchase individual Kanopy titles
SEPT 2021: BFI institutional subscription
AUG 2022: Begin negotiations to provide records; start developing tasklist
JUNE 2023: First batch of all records sent to subscribing institutions
3. BACKGROUND:
WHY THE BRITISH
FILM INSTITUTE?
• Strength of British Film Institute collection, particularly foreign-
language and early film history material
• New offering of institutional subscription
• Challenges:
o Around 600 active titles, comparatively small offering vs.
Box of Broadcasts (over 30,000 programmes) and
Kanopy (currently around 21,000 films)
o Incomplete metadata from streaming video platform
MarcEdit task lists and vendor-supplied metadata: revisiting the
subscriber-publisher relationship at the University of Leeds
4. BACKGROUND:
CHALLENGES
• Metadata received in the form of an 8-column
spreadsheet:
o Internal BFI ID, Title, Access start date, Access
end date, Country of origin, Year of release,
Genre 1, Genre 2
• Director names added in February 2022
• Basic Alma import profile and MarcEdit Delimited Text
Translator to create basic records
• Manually filled out remaining fields (Cast, Runtime,
Language, Accessibility, Summary, etc.) by copying and
pasting from BFI Player website
5. RELATIONSHIP
• Acquisitions colleagues about to renegotiate 2022-
2023 subscription terms
• BFI agreed to a subscription discount on condition of
Leeds providing monthly record files
• We requested all metadata held by the BFI Player
streaming video on demand platform
6. VENDOR SPREADSHEET, FEB 2023
7. VENDOR SPREADSHEET, MAY 2023
8. IMPORT TEMPLATE
9. TASK LIST 1
10. TASK LIST 1
11. TASK LIST 2
12. TASK LIST 2
13. TASK LIST 2
14. TASK LIST 2
15. TASK LIST 2
TASK LIST 1
16. DEVELOPMENTS
AND POTENTIAL
• Addition of Edited column to streamline extension ID workflow
• Addition of LoC URIs to authority entries
• Fine-tuning handover
• Investigating Free package records offer
17. IMPACT
• Streamlined our processes—a fraction of the staffing
resource
• Huge increase in knowledge around Regular Expressions,
MarcEdit, and streaming video cataloguing standards
• Value offer to other subscribing institutions at no additional
cost to them
• Discount on subscription for our University
• Enhanced accessibility cataloguing which improves user
experience
18. IMPACT
“Our partnership with the University of Leeds has helped us to deliver a much-requested resource by our BFI player subscribing institutions. I have come to learn how crucial MARC records are in aiding discoverability, which is of the utmost importance to us, as our aim is for students and staff to use their BFI player subscriptions to engage with the cultural value of film and support their studies. We didn't have the expertise to create them in-house, and the insight of [the UoL team] has been beyond valuable.”
Simone Pyne
Senior Business Development Manager, BFI
simone.pyne@bfi.org.uk
19. QUESTIONS?
Ceilan Hunter-Green
Metadata and Discovery Coordinator
University of Leeds
c.hunter-green@leeds.ac.uk
Editor's Notes
Good morning. My name is Ceilan Hunter-Green and I’m one of the Metadata and Discovery Coordinators at the University of Leeds Libraries.
I’m going to be talking today about MarcEdit task lists and vendor-supplied metadata: revisiting the subscriber-publisher relationship at the University of Leeds. Specifically I’ll be going over a project that delivered a new process for creating MARC records for the British Film Institute’s subscription package of streaming videos.
So let me jump straight in to some quick background on this project. Like all other academic libraries, we needed to hugely increase our electronic provision after the first Covid lockdown in March 2020. Most of the pressure was on providing e-textbooks and e-books, but there were also many modules that suddenly needed access to streaming video. At our library the purchasing is handled by a separate Acquisitions and Reading Lists team, who were absolute heroes at identifying and negotiating with new suppliers, and my Metadata and Discovery team were responsible for handling the access and the discoverability of these resources once we had them.
At first we were handling streaming videos one by one from suppliers like Kanopy, until September of 2021 when we started subscribing to BFI’s institutional offer. For that first year we handled the records like any other subscriber and it wasn’t until the following August that we started looking into the options for providing these records to other subscribers. At the end of May we sent out our first batch of records to reflect all films that would be active on the platform as of early June. And now we’re over a year on from that initial negotiation and have been providing the records for four months.
So why did this vendor/subscriber relationship develop with the British Film Institute in particular?
We knew we wanted access to the BFI streaming video collection as it’s strong in areas that other suppliers aren’t, and their films are particularly needed on Japanese and history of film modules where foundational material is difficult to come by otherwise. There were some challenges, though, as their institutional subscription was new at the time and the streaming video service’s metadata format was very different to what we were used to as librarians. They also had a comparatively small collection which is updated very frequently compared to other more static collections. Most significantly to us, they didn’t offer MARC records, which meant that we spent a lot of time on maintaining our local collection of their films. But that became an opportunity for us to offer something back to them.
For the first year of our subscription we, like all the other subscribing institutions, received metadata for the streaming video collection’s films in the form of a very basic spreadsheet, originally 10 columns and upgraded to 12 columns in February 2022 with the addition of director names.
We used MarcEdit’s Delimited Text Translator to create bare-bones records, and then as an Ex Libris library running Alma as our LMS we used our import profiles to run certain normalization rules on those basic records to get them into our system in semi-decent MARC shape. But the remaining metadata that we needed wasn’t provided, so it was copied from the BFI Player streaming platform field by field into Alma’s metadata editor with each monthly update. And yes, this was as time-consuming as it sounds.
So, for about a year we were creating records by copying data mostly manually, in order to have an accurate view of the subscription contents and to facilitate reading list linking, which is helped by having a record for each title in a collection. BFI had asked about the process of us providing the records but it didn’t feel practical when the process was so time-consuming.
But as we went into our second year of subscription, with more familiarity with the metadata from BFI Player, more experience with MarcEdit and more confidence in the value and usefulness of our records to the community, we thought we would be able to take another look at the process of record creation.
To cut a long story and a lot of hard work short, our Acquisitions and Reading list colleagues were successful in negotiating a discount, so now we just had to create the records in a much more efficient way than we had been, we needed much more metadata to start with, and the records had to be an even higher standard if they were going to be shared with other institutions.
So just to illustrate the condition of the records we’d been creating by hand, this is the spreadsheet we were getting for the first year of our subscription. We had the title, an ID number, the start and end dates in two different formats, the country of origin, release year, two Genre terms and two Director names, though usually the films just had one director.
Again, all other metadata was copied by hand, one by one, for each record in a monthly update. The average monthly addition was 22 films, but in practice this meant some months had five additions and some months had fifty, so it was an unpredictable demand on our staffing resource.
Apologies for the small text! This is the spreadsheet that BFI now send us, ever since we started providing their records. It’s 56 columns wide and is a huge improvement on that previous spreadsheet. It’s now got all cast members in separate columns, has a unique system identifier from BFI, has runtime, rating, everything we need which previously had to be copied from the BFI Player site in order to fill out our records. It’s also got some additional data like original language titles in addition to English titles, which isn’t on the public BFI Player site, and has a direct URL instead of the general landing page URL we added before so reading lists now link straight to the film rather than to BFI Player where you would need to perform your search again.
The challenge was how to translate a 56-column spreadsheet into a MARC record.
So first can I ask for a show of hands if you use MARC? I think that’s many of us even if you also use archival standards at your library. And keep your hands up if you use MarcEdit?
Building on Claire’s great lightning talk about creating records for archival collections, we start with the MarcEdit import template, which brings that mammoth spreadsheet into MarcEdit through the Delimited Text Translator. That’s mostly a one-to-one import except a couple of places where data from a single column would be added to two fields—like the Cast columns are added both to a concatenated 511 and to individual 700 fields, and the year of release is added both to the 501 and the 046. Some fields are populated as placeholders as you can see in this record, like the 501 for the year of release will be moved to a 500 later, and the director is added to a 701 until we can add a relator term, then they’re moved to a 700 as well. Most of the data as you can see is pretty blunt, just years and dates and yesses and nos. So then we run the task lists.
So for anyone who doesn’t use MarcEdit much, its Task List functions let you join together lists of tasks that you always want it to run the same way. Instead of having to create the tasks from scratch and type out the replacements you want it to make every time, the program will remember the tasks and the order and you can run them all at once. It’s incredibly useful.
For our BFI data we run two task lists. This first set of tasks adds the Library of Congress fields for Short or Feature film depending on the run time in the 300. It’s a bit silly that it has to be a separate task list, but when I combined it with the second list, the second list started to delete the 856 field from every 13th record 🥴 so I decided to let MarcEdit win and stopped fighting it, and let the lists be separate.
This task list has a simple effect but uses a kind of logical puzzle to get there.
First it copies the text of the 300$a into a new 655 if the number in the 300 is over 40, which is the Library of Congress run time threshold for short films.
Then it replaces the text of any 655 with Feature Film, because it’s the only 655 in the record at this point.
It then adds a new 655 field for Short films to all of the records, then deletes that Short film field if a Feature films field is also present in the record. So it goes around the houses but gets there in the end.
The second task list does something similar for Fiction and Nonfiction films based on the presence of the genre term ‘Documentary.’
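For anyone who would rather see that logic outside MarcEdit, here is a minimal Python sketch of the same decisions written as direct conditionals. It is not the task list itself: the function name and the exact heading wording are assumptions for illustration, and how a film of exactly 40 minutes should be treated is also an assumption (the sketch follows the "over 40" rule described above).

```python
# Hypothetical helper: chooses the 655 genre/form terms for one film.
# The threshold follows the "over 40 minutes" rule described in the talk;
# the heading strings are assumed wording, not copied from the task list.
FEATURE_THRESHOLD_MINUTES = 40


def form_genre_terms(runtime_minutes, genre_terms):
    """Return the 655 terms for a film given its runtime and vendor genres."""
    terms = []

    # Short vs feature depends only on the runtime taken from the 300.
    if runtime_minutes > FEATURE_THRESHOLD_MINUTES:
        terms.append("Feature films")
    else:
        terms.append("Short films")

    # Fiction vs nonfiction depends on a Documentary genre term being present.
    is_documentary = any("documentary" in g.lower() for g in genre_terms)
    terms.append("Nonfiction films" if is_documentary else "Fiction films")

    return terms


print(form_genre_terms(136, ["Drama"]))
```

The task list reaches the same result by adding and then deleting fields, because it can only act on what is already in the record; a scripting language can simply branch on the values.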
Here’s what our record looks like after running that first task list. The run time in the 300 field is 136 minutes, so this record needs a Feature Films field. The rest of the record is the same as before, only that new 655 has been added.
The second, more robust task list is 179 tasks long and handles all of the rest of the transformation from essentially a spreadsheet in MarcEdit form, into proper records. It does a few different types of tasks from really basic ones to more nimble ones that cross-reference multiple fields. I won’t get into every single one, but I am doing a poster session and am really happy to share the full list or go into detail if you’re curious about anything I don’t touch on in the next few slides.
So the basic tasks that I mentioned are the standard additions that every record needs and which aren’t conditional on any other fields and don’t involve altering the order or the content of the field text. So with these tasks we add things like the 006 and 007, the 336, 337 and 338, a 264 for BFI’s distribution, a 506 field to say that access is limited to within the UK, and a 588 field to indicate that we’ve constructed these records from vendor-supplied metadata. That’s all straightforward.
It then does some indelicate, brute-force amendments like the ones shown here. Library of Congress wants the Country of Production to be present in a 257 field and those countries should have standardized names according to the Name Authority File source, some of which are not how BFI provides them, so the task list converts them into the correct format. For example BFI might say the country of origin for a particular film was the USSR, but the NAF name is Soviet Union, so this set of tasks standardizes those in the 257 field. They stay in the original BFI phrasing in the 500 field which we’ll look at in a minute. Luckily the University of Leeds doesn’t have to make any geopolitical decisions here.
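As a rough illustration of that kind of lookup, the sketch below keeps the vendor's wording for the 500 note and swaps in the authority form for the 257. The dictionary holds only the one pair mentioned here; the function name and the dictionary-of-fields return shape are assumptions for the example, not how MarcEdit stores anything.

```python
# Vendor spelling -> Name Authority File form. Only the USSR example comes
# from the talk; a real table would list every variant the vendor uses.
NAF_COUNTRY_NAMES = {
    "USSR": "Soviet Union",
}


def country_fields(vendor_country):
    """Return the 257 and 500 text for one vendor-supplied country value."""
    # Fall back to the vendor's own wording when no mapping is listed.
    authorised = NAF_COUNTRY_NAMES.get(vendor_country, vendor_country)
    return {
        "257": authorised,
        "500": f"Country of origin: {vendor_country}",
    }


print(country_fields("USSR"))
```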
It then runs a similar batch of edits to delete initial articles from the Original Title field, the 130. MarcEdit isn’t smart enough to know what’s an article in the 130 so I added them in English, German, Spanish, French, and Italian, since they’re the bulk of the films in the collection. I also edited the indicators in the 245 for foreign-language titles in those languages, so there’s a margin of error there for other languages.
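A hedged sketch of the same idea in Python: one function that returns both the number of non-filing characters (for the 245 second indicator) and the title with its article removed (for the 130). The article list is my own short illustrative selection, not the list used in the task list, and the function name is hypothetical.

```python
# Illustrative articles for English, German, Spanish, French and Italian.
# Each entry includes its trailing space (or apostrophe) so that whole
# words are matched; this is an assumed list, not the one in the task list.
INITIAL_ARTICLES = (
    "the ", "an ", "a ",
    "der ", "die ", "das ",
    "el ", "los ", "las ", "la ",
    "les ", "le ", "l'",
    "il ", "lo ", "gli ",
)


def split_initial_article(title):
    """Return (non-filing character count, title without the article)."""
    lowered = title.lower()
    for article in INITIAL_ARTICLES:
        if lowered.startswith(article):
            remainder = title[len(article):]
            # Capitalise the new first letter for use in the 130.
            return len(article), remainder[:1].upper() + remainder[1:]
    return 0, title


print(split_initial_article("La dolce vita"))
print(split_initial_article("Seven Samurai"))
```

Like the task list, this has no idea what language a title is in, so an English title beginning with "Die" would be treated as if it had a German article. That is the same margin of error as above.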
There’s another brute-force task to handle the language code in the 008. When we first received the bulk of BFI’s records in May, I went through that spreadsheet and pulled out every language represented in their collection, regardless of whether the film was active on the streaming platform, and added these all into the task list cross-referenced with the MARC language codes. I didn’t have to do anything clever with the 008 positions because my data is really consistent, so instead I could just search for any instance of ||und|| and replace it with the correct language code depending on the text of the 546. So in this top right example, any record in which there’s a 546 for Yoruba will have a find and replace run on every instance of ||und||, and because that only appears in the 008, it doesn’t have to be any smarter than that. This is the opposite of the beautiful machine learning that Alan described in yesterday’s keynote: we luckily already have the language, so all we have to do is translate it into MARC. This would also be possible to do in Excel before importing the data but this is just how I chose to do it to limit the amount of manual manipulations we have to do with each new spreadsheet.
Since our first big upload of all active titles at the end of May, we’ve had new films in one or two languages not present in this list. But it’s simple enough to search the file for that ||und|| and visually check it against the 546. Some films really do have no dialogue so those are fine to leave, we just check there’s no language there, and replace it if need be.
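The find and replace can be sketched in a few lines of Python. The language table below is a small illustrative subset with standard MARC language codes, the function name is hypothetical, and the 008 value in the example is a shortened placeholder rather than a full 40-character field.

```python
# Language name as it appears in the 546 -> MARC language code.
# A small illustrative subset; the real task list covers every language
# found in the vendor spreadsheet.
MARC_LANGUAGE_CODES = {
    "Yoruba": "yor",
    "French": "fre",
    "German": "ger",
    "Japanese": "jpn",
}


def set_language_code(field_008, language_546):
    """Swap the 'und' placeholder in the 008 for the code matching the 546."""
    code = MARC_LANGUAGE_CODES.get(language_546)
    if code is None:
        # Unlisted language (or no dialogue): leave 'und' for a manual check.
        return field_008
    return field_008.replace("||und||", f"||{code}||")


# Shortened placeholder 008, not a complete field.
print(set_language_code("136 o vl||und||", "Yoruba"))
```

Records for a language that is not in the table come back unchanged, which is what makes the later search for any remaining ||und|| a useful check.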
Another task copies this new 008 language code into a 041 language field. The final language task in the screenshot on the bottom right amends the 546 text to add In [blank] with English subtitles, since all of the streaming video collection is accessible in English.
The list also does some other nifty things, sometimes needing regular expressions and sometimes not, to standardize the data provided in the spreadsheet. Earlier I mentioned the country of origin in the 500 field, so this will stay the way the BFI phrases it and a task changes the field from, for example, ‘United States’ to ‘Country of origin: United States’.
Same with adding ‘Access ends on’ to the beginning of the 506, which otherwise just has a date, adding ‘1 online resource’ and ‘mins’ to the 300 field, adding BBFC Certificate to the rating in the 521 field, changing Not Rated to ‘Certification status unknown,’ and tidying up the accessibility fields that previously just said Yes and No. If they say No, they’re deleted, and depending on the field, if they say Yes they’re amended to Closed Captioning or Audio Descriptions in English, and then all moved into the 532 field.
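To make the Yes/No tidy-up concrete, here is a small Python sketch. The column names are hypothetical stand-ins for the vendor spreadsheet headings, and the note wording is approximate; only the rule itself (drop a No, turn a Yes into readable text for the 532) comes from the process described above.

```python
# Hypothetical spreadsheet column -> text for the 532 accessibility note.
ACCESSIBILITY_NOTES = {
    "closed_captions": "Closed captioning in English",
    "audio_description": "Audio descriptions in English",
}


def accessibility_notes(flags):
    """Return 532 note texts for every accessibility column marked Yes."""
    notes = []
    for column, note in ACCESSIBILITY_NOTES.items():
        # Anything other than Yes (including No or a blank) produces no field.
        if flags.get(column, "").strip().lower() == "yes":
            notes.append(note)
    return notes


print(accessibility_notes({"closed_captions": "Yes", "audio_description": "No"}))
```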
Here’s another look at how the task list manoeuvers data out of and into the 008 field in the screenshot on the top left. We’ve added the runtime to the 008 position 18 from the 300, but the 008 will only know what to do with that if it’s a three-digit runtime. So there are two more tasks underneath that, to add initial 0s if the runtime is only one or two digits.
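The padding step is the easiest part to show in code. This sketch assumes the runtime arrives as a plain string of one to three digits, as it does in the spreadsheet; the function name is hypothetical, and longer or non-numeric values are rejected rather than guessed at.

```python
import re


def pad_runtime(runtime):
    """Left-pad a runtime in minutes to the three digits used in 008/18-20."""
    digits = runtime.strip()
    # Only handle the simple case the task list deals with: 1 to 3 digits.
    if not re.fullmatch(r"\d{1,3}", digits):
        raise ValueError(f"Unexpected runtime value: {runtime!r}")
    return digits.zfill(3)


print(pad_runtime("7"))    # 007
print(pad_runtime("45"))   # 045
print(pad_runtime("136"))  # 136
```

In MarcEdit this takes two extra tasks, one for one-digit and one for two-digit runtimes, because each find and replace can only add a fixed number of zeros.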
There’s another task in the bottom screenshot to add relator terms depending on a genre field. As I said the 700 is initially only for actors, so the task will add a $e actor for all 700s, and then this task runs to change that subfield to on-screen participant if the Genre term in the 653 is Documentary or Short Documentary. The relator term ‘director’ is added to the placeholder 701 field, and then those are all moved to a 700 as well.
A final task in the screenshot on the right reverses the order of the names in all 700s to Last comma First comma relator term. This is not an exact science, as many Chinese and Korean names are already in the Last First order in the spreadsheet and the 511, but it takes less time to make those incorrect first and then correct them when we validate the headings. There’s potential here to add another task to only reverse the order of the names if the language codes KOR or CHI are not present in the 008.
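Here is a sketch of what that reversal, plus the possible language check, could look like in Python. It is an assumption about one way to do it, not the regular expression from the task list; the function name is hypothetical, and it only decides whether to invert, so names left alone still need checking at validation.

```python
import re

# MARC language codes for which the vendor's names are assumed to be
# surname-first already (the possible enhancement mentioned above).
SURNAME_FIRST_LANGUAGES = {"kor", "chi"}


def invert_name(name, relator, language_code=""):
    """Return 'Last, First, relator' unless the film's language suggests
    the name is already surname-first."""
    name = name.strip()
    if language_code.lower() not in SURNAME_FIRST_LANGUAGES:
        # Move the final word to the front: 'Akira Kurosawa' -> 'Kurosawa, Akira'.
        # Single-word names do not match and are left as they are.
        name = re.sub(r"^(.+)\s+(\S+)$", r"\2, \1", name)
    return f"{name}, {relator}"


print(invert_name("Akira Kurosawa", "director", "jpn"))
print(invert_name("Bong Joon-ho", "director", "kor"))
```

The language of the film is only a rough guide to how a name is ordered, so this reduces the corrections needed at validation rather than removing them.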
So just to remind you what our record looked like after importing it using our import profile and running the first task list on it, that’s the example on the left. Many of the fields are still pretty indecipherable.
And after we’ve run the second task list, apologies for the small size, but you can see our record’s effectively doubled. Those accessibility fields are now analysable and also discoverable because they’re present correctly in the 341, 347, 532 and also the 655. The relator terms have been added and the 008 has been updated with runtime, year of distribution, year of release, country of distribution, language, form of item and type of visual material. The 700s have been reversed to Last name First name. And we’ve also added our local information into the 040 for when this record is shared with other institutions.
After this task list process is complete—which takes about ten seconds, in spite of how much talking I’ve just done—we do some spot checking to make sure it all appears as it should, and then we use MarcEdit to validate the 700 and 130 headings and correct most of them since the names and the original film titles are not in the Library of Congress authority format. The validation still takes the longest. We leave the subject headings as we’re given them from BFI, but indicate that they’re local headings. And then we create a much shorter and less detailed delete file which matches on the BFI unique identifier in the 024 to delete films whose access has expired, and we send those two files off to BFI to send to their subscribers.
So all of that was a huge amount of work to get done before we started providing the records to other institutions back at the end of May. But there have also been a few improvements since then as we continue to develop our relationship with BFI.
First, our major pain point was identifying films whose access had been extended. They wouldn’t be obvious in the spreadsheet since we add new films based on the access start date, but sometimes older films that had already expired would be extended without the start date being edited. So BFI have added an additional column to indicate which access dates have been edited in the past month, and this has made the process of identifying those extensions so much more efficient.
Another enhancement we’ve made is to start adding Library of Congress URIs to our validated 700 entries, thanks to MarcEdit again—there’s an option to do this when you’re validating the headings.
At the moment, we receive this spreadsheet of titles in the last week of the month and have a few working days to turn around the files before they’re then emailed out to other institutions. Most suppliers host files like these for institutions to download rather than emailing them out, so there’s a potential there for streamlining the supplying process. We’re also looking for more clarity on how many other subscribing institutions use Ex Libris Alma for their LMS like we do, because there’s potential to use Alma’s Community Zone to share records, but of course that’s only helpful with a critical mass of Alma users.
And finally BFI offer a package of freely available material in addition to the subscription films, and some subscribing institutions have expressed interest in getting records for those films as well. We’re currently in talks with BFI to understand the turnover and demand for those Free titles and may be able to offer this in future.
The biggest impact on my team is that a process that used to take at least a day, up to four days sometimes, is now maximum a few hours, including the time it takes to validate the Library of Congress authority headings.
It’s also been an incredibly useful exercise for us, both in terms of a huge stretch project for our team and my personal understanding of MarcEdit and regular expressions, and also in developing our knowledge of streaming video cataloguing standards.
We’ve been able to offer value to our fellow UK HE institutions who no longer have to create their own records in the painstaking way that we were. These MARC records are also now provided with no price increase for subscribers which is great and fairly rare I believe. Plus of course the discount for our own subscription is a bonus!
And of course it's had a positive impact for students and other catalogue users now that the records are provided faster and to a higher standard, especially for things like the accessibility fields which are much more user-friendly now.
From the BFI side it’s also been a useful partnership. We’ve gotten some really encouraging feedback from them, including this from Simone Pyne who is BFI’s Senior Business Development Manager.
Simone said “Our partnership with the University of Leeds has helped us to deliver a much-requested resource by our BFI player subscribing institutions. I have come to learn how crucial MARC records are in aiding discoverability, which is of the utmost importance to us, as our aim is for students and staff to use their BFI player subscriptions to engage with the cultural value of film and support their studies. We didn't have the expertise to create these records in-house, and the insight of the UoL team has been beyond valuable.”
So we’re really pleased to be ambassadors for library metadata standards and for MARC records, and of course also thrilled to have the feedback that the relationship is mutually beneficial!
Simone’s also happy to hear from anyone interested in the BFI Player institutional subscription, so please feel free to direct any questions or inquiries about that to her, her email’s simone.pyne@bfi.org.uk.
So there we have it! This project has been an incredibly useful exercise for me and my colleagues, but it’s also opened a door for us in terms of agency within our supplier-vendor relationships, and has provided value to our vendor BFI. We know there are still areas for potential development to come, but for the time being we’re very happy with the work we’ve done so far.
If there are any questions I’m happy to answer them now, or I’ll be loitering around my poster for the next few breaks and sessions. Please do also feel free to email me with any other questions (or suggestions, especially if you’re one of the institutions who receive these records!)
Thanks so much for your time.
***if no questions, Library Juice courses on MarcEdit and streaming cataloguing taught me reg ex and the new fields we’d need to add***