Design for Context: Cataloging and Linked Data for Exposing National Educational Television (NET) Content
An entry for Public Schools of England from the inventory of PBS titles donated in 1994.
Feasibility of Using Linked Data to Expose NET Content and Aid Metadata Sharing
The Library is currently working on a linked data feasibility report that uses records generated during this project as a dataset for conversion to RDF-standardized linked data.
Linked data is a decentralized method of structuring data so that it can be re-used in other contexts, manipulated by machine processing, and linked to other data in a Web of semantically defined, interlinked relationships. LC hopes to exploit these capabilities of linked data to alleviate the challenges of sharing data by treating titles as labels for content rather than as authoritative identifiers for it, as they are in most cataloging and discovery systems, where the concern is the indexing, collocation, and disambiguation of search results.
RDF stands for Resource Description Framework, a data model for linked data. It defines linked data both as statements relating subjects to objects through predicates and as a graph of the relations those statements express. Such statements take the form SUBJECT-PREDICATE-OBJECT and are frequently called triples.
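The triple model above can be sketched in a few lines of code. This is a minimal illustration, not the project's actual data: the URIs below are hypothetical placeholders (the Dublin Core predicates are real vocabulary terms, but the `example.org` subjects are invented for the sketch).

```python
# A triple is simply (subject, predicate, object), each ideally a URI.
# All example.org URIs below are hypothetical placeholders.
triples = [
    ("http://example.org/net/episode/123",        # subject: a NET episode
     "http://purl.org/dc/terms/title",            # predicate: has title
     "A Generation of Leaves, Part 1"),           # object: a literal label
    ("http://example.org/net/episode/123",
     "http://purl.org/dc/terms/isPartOf",
     "http://example.org/net/series/NETPlayhouse"),
]

def objects_of(subject, predicate):
    """Return every object asserted for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("http://example.org/net/episode/123",
                 "http://purl.org/dc/terms/title"))
```

Note that the title here is just one label attached to the subject's URI; matching happens on the URI, not on the title string.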
Each SUBJECT, PREDICATE, and OBJECT in linked data is ideally represented by a Uniform Resource Identifier (URI), an identifier that resolves several of the challenges of data sharing. The work of matching resources in different systems is done by matching unique identifiers rather than authorized headings, while the shift from record-based metadata exchange to data-based exchange facilitates automated re-use of authorities, summaries, and other values populating descriptions of resources.
• EIDR (Entertainment Identifier Registry) IDs will be minted for NET titles to test the feasibility of using EIDR IDs to represent NET content at all levels of description as URIs, including archival description.
• NET XML records will be converted to RDF-standardized linked data in order to provide an understanding of the workflow and challenges of this work for cultural heritage institutions, while providing a roadmap that others can use for their own projects.
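The XML-to-RDF conversion step in the second bullet can be sketched with the standard library. This is a deliberately simplified illustration, not the project's mapping: the record below is a stripped-down stand-in for real PBCore XML (which is namespaced and much richer), the EIDR suffix is a placeholder, and the Dublin Core predicate is used as a stand-in for whatever ontology the feasibility report adopts.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical record; real PBCore XML is namespaced and richer.
# The EIDR suffix "XXXX-XXXX" is a placeholder, not a real identifier.
record = ET.fromstring("""
<pbcoreDescriptionDocument>
  <pbcoreIdentifier source="EIDR">10.5240/XXXX-XXXX</pbcoreIdentifier>
  <pbcoreTitle titleType="Episode">A Generation of Leaves, Part 1</pbcoreTitle>
</pbcoreDescriptionDocument>
""")

def to_triples(doc):
    """Emit (subject, predicate, object) tuples from a simplified record."""
    subject = doc.findtext("pbcoreIdentifier")  # EIDR ID as the URI stand-in
    out = []
    for title in doc.findall("pbcoreTitle"):
        out.append((subject, "http://purl.org/dc/terms/title", title.text))
    return out

print(to_triples(record))
```

The key design point the bullets describe is visible even in this sketch: the EIDR ID, not the title string, becomes the subject of every statement about the program.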
This project was funded by the Council on Library and Information Resources.
Sources of Information
The primary source of information for this
project comes from the NET microfiche (left).
There are only three copies of this document,
one of which was recently discovered at the
Library of Congress. PBS compiled this
document in the 1980s, well after NET ceased
to exist. Winter Shanck (archivist at WNET) and
Sadie Roosa (archivist at WGBH) transcribed
the microfiche into a Word document and
shared it with all project partners.
The Library also has an inventory of reels
received from PBS in 1994 (see below). This
donation included many NET titles, though
they were not distinguished from the PBS
materials. The NET catalogers at the Library of Congress
are comparing title entries on the spreadsheet
to the microfiche to determine what falls
under the PBS umbrella.
Information for this project also comes from
metadata recorded on the physical reels and
tape held at the Library, and other internal
inventories and databases. WGBH is
researching copyright information from
sources at the Library of Congress, Indiana
University, and the University of Wisconsin-Madison.
The Challenges of Sharing Data
An abbreviated entry for British Public School from the NET microfiche.
Devising a method of efficiently sharing data between two very different databases has been one of the biggest challenges of this project. The Library’s metadata is stored in
MAVIS, its internal inventory database. MAVIS is a proprietary system that cannot easily export information. The Library exports its metadata in PBCore XML and adds an
identifier in WGBH’s FileMaker database, which allows WGBH to match which records belong where. The compiled, authoritative data will eventually be exported from the
FileMaker staging database and imported into the AAPB’s Archival Management System (AMS). The matching work is being done manually; the feasibility report will
illustrate how Linked Data concepts can be leveraged to address these problems.
Under Sources of Information, notice how the same program has a
different title in each source document. Title construction also differs
between the institutions’ databases:
NET Playhouse. A generation of leaves. [No. 1], American, Inc.
Series: NET Playhouse
Episode: A Generation of Leaves, Part 1: America Incorporated
There are no identifiers unique to the titles themselves that would
allow an automated approach. The NOLA codes (assigned by PBS after
it absorbed NET) are not used consistently enough to fulfill this
purpose. Titles must be reconciled manually.
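The kind of comparison a cataloger performs when reconciling these titles can be approximated in code, even though no algorithm replaces the manual work. The sketch below is purely illustrative: it uses the standard library's `difflib`, and the 0.6 threshold is an arbitrary demonstration value, not a figure from the project.

```python
import difflib

def similarity(a, b):
    """Ratio in [0, 1] between two normalized title strings."""
    norm = lambda t: "".join(c for c in t.lower() if c.isalnum() or c.isspace())
    return difflib.SequenceMatcher(None, norm(a), norm(b)).ratio()

# The two renderings of the same program shown above:
mavis = "NET Playhouse. A generation of leaves. [No. 1], American, Inc."
ams = "NET Playhouse: A Generation of Leaves, Part 1: America Incorporated"

# High ratios flag candidate matches for a cataloger to confirm by hand;
# 0.6 is an arbitrary demonstration threshold.
print(similarity(mavis, ams) > 0.6)
```

Even a crude string comparison surfaces these two entries as likely the same program, but only a human can confirm the match, which is why the reconciliation remains manual.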
Library staff add MAVIS IDs to WGBH’s database. This allows the
MAVIS XML exports to be matched to the correct record and
minimizes manual entry of any other element to the AMS records.
While MAVIS is not built on the PBCore schema, its metadata can be
exported as PBCore XML. The Library sends batches of XML exports to WGBH.
WGBH takes the PBCore XML exported from MAVIS and extracts the
instantiations to add to the records in the AAPB’s AMS. Because
pbcoreInstantiation elements are repeatable, one record in the AMS can
have as many instantiations as are needed.
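Because pbcoreInstantiation is repeatable, appending a holding to a record is structurally simple. The sketch below uses real PBCore element names (`pbcoreInstantiation`, `instantiationIdentifier`, `instantiationPhysical`) but a stripped-down record and invented identifier values; it is an illustration of the repeatable-element idea, not the project's actual merge code.

```python
import xml.etree.ElementTree as ET

# A stripped-down AMS record; real records carry a full PBCore description.
record = ET.fromstring(
    "<pbcoreDescriptionDocument>"
    "<pbcoreTitle>A Generation of Leaves, Part 1</pbcoreTitle>"
    "</pbcoreDescriptionDocument>")

def add_instantiation(doc, source, fmt, local_id):
    """Append one pbcoreInstantiation describing a physical copy."""
    inst = ET.SubElement(doc, "pbcoreInstantiation")
    ET.SubElement(inst, "instantiationIdentifier", source=source).text = local_id
    ET.SubElement(inst, "instantiationPhysical").text = fmt

# One record, as many instantiations as there are copies.
# The formats and IDs below are hypothetical examples.
add_instantiation(record, "Library of Congress MAVIS", "2 inch videotape", "12345")
add_instantiation(record, "Library of Congress MAVIS", "16mm film", "67890")
print(len(record.findall("pbcoreInstantiation")))  # 2
```

Each appended instantiation carries exactly the information the next paragraph describes: who holds the copy, what format it is on, and the identifier needed to locate it.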
Once these are added, anyone researching in the AAPB will know
that a copy of the program exists in the Library’s collection, what
format it is recorded on, and the identifier that will help them
locate the copy if they want to access it. They will also know about
any other copies we record in the catalog.
The NET catalogers at the Library do authority work to
add names to credits. They work from the LCNAF, but
sometimes have to create new authorities because of
the obscurity of those involved with NET productions.
As an added complication, MAVIS operates as a silo
even within the Library. While the authorities do
reference LCCN IDs when available, they do not sync with the LCNAF.
WGBH catalogers originally stored names as LastName,
FirstName. Using a reconciliation service through OpenRefine,
they have conducted authority work and added
URIs to LCNAF entries for many of the NET records.
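Converting inverted name strings to direct order before reconciliation can be sketched simply. This is an illustrative helper, not WGBH's actual workflow; as the fallthrough shows, corporate names and names with suffixes would still need human review.

```python
def direct_order(name):
    """Convert 'LastName, FirstName' to 'FirstName LastName' when possible."""
    parts = [p.strip() for p in name.split(",")]
    if len(parts) == 2 and all(parts):
        return f"{parts[1]} {parts[0]}"
    # Corporate names, single names, or multi-comma strings pass through
    # unchanged for a cataloger to handle.
    return name

print(direct_order("Glenn, John"))    # inverted personal name is flipped
print(direct_order("NET Playhouse"))  # no comma: left unchanged
```

A reconciliation service matches more reliably against the direct-order form, but the ambiguous cases the function leaves alone are exactly where the authority work described above remains manual.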
An example of how the holdings information will appear on the AAPB website.