
Design for Context: Cataloging and Linked Data for Exposing National Educational Television (NET) Content


A poster presented by Rachel Curtis and Sadie Roosa at the Association of Moving Image Archivists Conference 2017.



[Poster image caption: An entry for Public Schools of England from the inventory of PBS titles donated in 1994.]

Feasibility of Using Linked Data to Expose NET Content and Aid Metadata Sharing

The Library is currently working on a linked data feasibility report that uses records generated during this project as a dataset for conversion to RDF-standardized linked data. Linked data is a decentralized method of structuring data so that it can be reused in other contexts, manipulated by machine processing, and linked to other data in a web of semantically defined relationships. LC hopes to exploit these capabilities to ease the challenges of sharing data by treating titles as labels for content rather than as authoritative identifiers for it, as they are in most cataloging and discovery systems, where the concern is the indexing, collocation, and disambiguation of search results.

RDF (Resource Description Framework) is a data model for linked data. It expresses data both as statements relating subjects to objects through predicates and as a graph of the relations those statements define. Statements take the form SUBJECT-PREDICATE-OBJECT and are frequently called triples. Each subject, predicate, and object is ideally represented by a Uniform Resource Identifier (URI), an identifier that helps resolve several of the challenges of data sharing: matching resources across systems becomes a matter of matching unique identifiers rather than authorized headings, while the shift from record-based metadata exchange to data-based exchange facilitates automated reuse of authorities, summaries, and other values that populate descriptions of content.
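The triple model described above can be sketched in a few lines of Python. The NET URIs below are invented placeholders (in the project's plan, real identifiers would come from a registry such as EIDR), while the two predicates are actual Dublin Core terms.

```python
# A minimal sketch of RDF triples as Python tuples. The example.org URIs
# are hypothetical placeholders; the predicates are real Dublin Core terms.
DCTERMS_TITLE = "http://purl.org/dc/terms/title"
DCTERMS_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"

triples = [
    # (SUBJECT, PREDICATE, OBJECT)
    ("https://example.org/net/episode/123", DCTERMS_TITLE,
     "A Generation of Leaves"),
    ("https://example.org/net/episode/123", DCTERMS_IS_PART_OF,
     "https://example.org/net/series/net-playhouse"),
    ("https://example.org/net/series/net-playhouse", DCTERMS_TITLE,
     "NET Playhouse"),
]

def objects(graph, subject, predicate):
    """Return every OBJECT whose SUBJECT and PREDICATE match."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# Because the URI, not the title, identifies the content, several variant
# title strings could be attached to the same subject without conflict.
print(objects(triples, "https://example.org/net/series/net-playhouse",
              DCTERMS_TITLE))
```

The point of the sketch is the last comment: the title becomes just one label hanging off a stable identifier, which is the shift the feasibility report describes.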
• EIDR (Entertainment Identifier Registry) IDs will be minted for NET titles to test the feasibility of using EIDR IDs as URIs representing NET content at all levels of description, including archival description.
• NET XML records will be converted to RDF-standardized linked data in order to document the workflow and its challenges for cultural heritage institutions, providing a roadmap that others can use for their own projects.

This project was funded by the Council on Library and Information Resources.

Sources of Information

The primary source of information for this project is the NET microfiche. Only three copies of this document exist, one of which was recently discovered at the Library of Congress. PBS compiled the document in the 1980s, well after NET ceased to exist. Winter Shanck (archivist at WNET) and Sadie Roosa (archivist at WGBH) transcribed the microfiche into a Word document and shared it with all project partners.

The Library also has an inventory of reels received from PBS in 1994. This donation included many NET titles, though they were not distinguished from the PBS materials. The NET catalogers at the Library of Congress are comparing title entries on the spreadsheet to the microfiche to determine what falls under the PBS umbrella. Information for this project also comes from metadata recorded on the physical reels and tapes held at the Library, and from other internal inventories and databases. WGBH is researching copyright information from sources at the Library of Congress, Indiana University, and the University of Wisconsin-Madison.

[Poster image caption: An abbreviated entry for British Public School from the NET microfiche.]

The Challenges of Sharing Data

Devising a method of efficiently sharing data between two very different databases has been one of the biggest challenges of this project. The Library's metadata is stored in MAVIS, its internal inventory database.
MAVIS is a proprietary system that cannot easily export information. The Library exports its metadata as PBCore XML and adds an identifier in WGBH's FileMaker database, which allows WGBH to match which records belong where. The compiled, authoritative data will eventually be exported from the FileMaker staging database and imported into the AAPB's Archival Management System (AMS). The matching work is being done manually; the feasibility report will illustrate how linked data concepts can be leveraged to address these problems.

Title matching

Under Sources of Information, notice how the same program has a different title in each source document. Title construction also differs between each institution's database:

MAVIS: NET Playhouse. A generation of leaves. [No. 1], American, Inc.
AMS: Series: NET Playhouse; Episode: A Generation of Leaves, Part 1: America Incorporated

There are no identifiers unique to the titles themselves that would allow an automated approach, and the NOLA codes (assigned by PBS after it absorbed NET) are not used consistently enough to fulfill this purpose. Titles must be reconciled manually.

Adding identifiers

Library staff add MAVIS IDs to WGBH's database. This allows the MAVIS XML exports to be matched to the correct records and minimizes manual entry of any other element into the AMS records.

Exporting Metadata

While MAVIS is not built on the PBCore schema, its metadata can be exported as PBCore XML. The Library sends batches of XML exports to WGBH, which extracts the instantiations from the PBCore XML and adds them to the records in the AAPB's AMS. Because pbcoreInstantiation elements are repeatable, one record in the AMS can have as many instantiations as are needed. Once these are added, anyone researching in the AAPB will know that a copy of the program exists in the Library's collection, what format it is recorded on, and the identifier that will help them locate the copy if they want to access it.
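The instantiation-extraction step just described can be sketched roughly as follows. The record below is a heavily simplified, invented PBCore export (real MAVIS exports carry the PBCore namespace and many more elements), though the element names are taken from the published PBCore schema.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical PBCore export for one program. A real export
# would be namespaced and far richer than this sketch.
record = """
<pbcoreDescriptionDocument>
  <pbcoreTitle titleType="Series">NET Playhouse</pbcoreTitle>
  <pbcoreInstantiation>
    <instantiationIdentifier source="MAVIS">1234-5</instantiationIdentifier>
    <instantiationPhysical>2 inch videotape</instantiationPhysical>
    <instantiationLocation>Library of Congress</instantiationLocation>
  </pbcoreInstantiation>
</pbcoreDescriptionDocument>
"""

root = ET.fromstring(record)

# Pull each repeatable pbcoreInstantiation out of the description document,
# keeping the fields a researcher needs to locate a copy.
instantiations = [
    {
        "identifier": inst.findtext("instantiationIdentifier"),
        "format": inst.findtext("instantiationPhysical"),
        "location": inst.findtext("instantiationLocation"),
    }
    for inst in root.iter("pbcoreInstantiation")
]

print(instantiations)
```

Because the loop walks every pbcoreInstantiation in the document, a record with several copies yields several holdings entries, matching the repeatable design of the element.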
They will also know about any other copies we record in the catalog.

Authorities

The NET catalogers at the Library do authority work to add names to credits. They work from the LCNAF but sometimes have to create new authorities because of the obscurity of those involved in NET productions. As an added complication, MAVIS operates as a silo even within the Library: while its authorities reference LCCN IDs when available, they do not sync with the LCNAF. WGBH catalogers originally stored names as LastName, FirstName. Using a reconciliation service through OpenRefine, they have conducted authority work and added URIs for LCNAF entries to many of the NET records.

[Poster image caption: An example of how the holdings information will appear on the AAPB website.]
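The LastName, FirstName cleanup that precedes reconciliation can be sketched as below. The name and LCNAF URI are invented placeholders, and in practice the matching is performed by the OpenRefine reconciliation service against id.loc.gov, not by a local lookup like this one.

```python
# A sketch of the name cleanup that precedes LCNAF reconciliation.
# The name and URI are hypothetical placeholders.

def invert_name(heading: str) -> str:
    """Turn an inverted 'LastName, FirstName' heading into direct order."""
    last, sep, first = heading.partition(", ")
    return f"{first} {last}" if sep else last

# Hypothetical reconciliation results, keyed by direct-order name. A real
# URI would be returned by the reconciliation service.
lcnaf_uris = {"Jane Smith": "http://id.loc.gov/authorities/names/n00000000"}

stored = "Smith, Jane"
print(invert_name(stored), lcnaf_uris.get(invert_name(stored)))
```

Names without a comma (single-word stage names, for instance) pass through unchanged, which is why the function checks for the separator before reordering.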