NISO Webinar:
Experimenting with BIBFRAME:
Reports from Early Adopters
of Granular Discovery
Wednesday, April 8, 2015
Speakers:
Nancy Fallgren
Metadata Specialist Librarian, National Library of Medicine,
National Institutes of Health, US Department of Health and Human Services (DHHS)
Jeremy Nelson
Metadata and Systems Librarian, Colorado College
Nancy Lorimer
Head, Metadata Dept, Stanford University Libraries
http://www.niso.org/news/events/2015/webinars/bibframe_adopters/
Nancy Fallgren
Cataloging and Metadata Management Section, TSD
National Library of Medicine
National Institutes of Health
U.S. Department of Health and Human Services
NISO Webinar
April 8, 2015
Per the LC Working Group on the Future of Bibliographic
Control, the U.S. RDA Test Coordinating Committee, and the
BIBFRAME primer --
Web based
Rule agnostic
Flexible
Extensible
Useful beyond the bibliographic
cataloging community
Broadly understandable
Usable
• LC Early Experimenters, October 2012-November 2013
• LC Early Implementers Registration, April 2014
• Analysis of the Library of Congress' published BIBFRAME vocabulary using LC's BIBFRAME editor (BFE) and LC's MARC2BF conversion
• Analysis of Zepheira's BIBFRAME vocabulary using its Scribe and MARC2BF conversion
• LC Early Implementers Registration update, November 2014
• Develop a BIBFRAME vocabulary based on generating new data, rather than legacy data conversion
• Create flexibility and extensibility for broad adoption with a 'modular' approach to BIBFRAME
• Develop a core BIBFRAME vocabulary that can be extended with data elements or properties from existing descriptive metadata schemas
• BIBFRAME as a data interchange format
Use the descriptive standards already
developed by resource experts
Not focused on any one existing standard
Connect equivalent data across
descriptive standards
Flex with changes to descriptive
standards
RDA
VRA
PRESSoo
EAD
MODS
Dublin Core
Local Schema
. . .
BIBFRAME Core
Data Elements
bf:title
bf:creator
bf:subject
bf:identifier
bf:date
. . .
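The modular idea on this slide can be sketched in plain Python (this is an illustration, not a real BIBFRAME tool): a small shared core of data elements, extended per community with properties from other schemas such as RDA or MODS RDF. The sample record is invented.

```python
# The five core elements named on the slide.
BIBFRAME_CORE = {"bf:title", "bf:creator", "bf:subject", "bf:identifier", "bf:date"}

# An invented description mixing core elements with one extension
# property drawn from another schema (here MODS RDF).
record = {
    "bf:title": "An example title",
    "bf:creator": "Example, Author",
    "bf:date": "2015",
    "modsrdf:subject": "Linked data",  # community extension, not core
}

# Partition the description into shared core and community extensions.
core = {k: v for k, v in record.items() if k in BIBFRAME_CORE}
extensions = {k: v for k, v in record.items() if k not in BIBFRAME_CORE}
```

Any consumer that understands only the core still gets a usable minimal description; communities that know the extension schemas get the richer one.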
1. Compared and mapped Zepheira’s BF Lite vocabulary to
PCC/RDA BIBCO Standard Record (PCC/RDA Core,
updated 2/2015) as applicable to print monographs
2. Removed PCC/RDA Core elements that we believed
would not be broadly used across cultural heritage
communities, e.g., date of Expression
3. Mapped the resulting list to RDA RDF and other schema
as needed (e.g., MODS RDF and Schema.org)
4. Added or revised definitions to enhance understanding
BF Property | BF Definition | RDA RDF/Other Schema Property
bf:title | Title of the resource | rdaw:preferredTitleForTheWork
bf:startDate | First date associated with the resource | rdaw:dateOfWork
bf:language | Language(s) associated with the resource | rdae:languageOfTheContent
bf:creator or Agent + bf:role | An entity (e.g., person, organization) associated with a resource | rdaw:creator, rdaw:otherPFCWork, and other Work creator roles
bf:contributor or Agent + bf:role | An entity (e.g., person, organization) associated with a resource | rdae:contributor and other appropriate contributor roles
bf:related | A resource related to the origin resource | rdaw:relatedWork
bf:authoritylink | Actionable IRI linking to an authoritative controlled vocabulary | rdaw:identifierForTheWork
bf:genre | The 'is-ness' of the resource | rdaw:formOfWork
bf:description | Description of the content of the resource | rdae:summarizationOfTheContent
bf:subject | A term or representative alphanumeric code which captures the 'aboutness' of a resource | modsrdf:subject
bf:audience | Class of user for which the content of a resource is intended, or for whom the content is considered suitable | rdaw:intendedAudience
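One way such equivalences are often published is as machine-readable subproperty statements. The sketch below is hypothetical (the deck does not say NLM used rdfs:subPropertyOf); the CURIEs match the table, but the triples and helper function are invented for illustration.

```python
# Selected table rows expressed as (subject, predicate, object) triples,
# treating each schema-specific property as a subproperty of a BF core one.
MAPPINGS = [
    ("rdaw:preferredTitleForTheWork", "rdfs:subPropertyOf", "bf:title"),
    ("rdaw:dateOfWork", "rdfs:subPropertyOf", "bf:startDate"),
    ("rdae:languageOfTheContent", "rdfs:subPropertyOf", "bf:language"),
    ("rdae:summarizationOfTheContent", "rdfs:subPropertyOf", "bf:description"),
]

def core_equivalents(prop, triples=MAPPINGS):
    """Return the core BIBFRAME properties a schema-specific property maps up to."""
    return [o for s, p, o in triples if s == prop and p == "rdfs:subPropertyOf"]
```

Stating the mapping as data rather than code is what lets the core "flex with changes to descriptive standards": a new standard only needs new triples.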
• BIBFRAME Core is not
◦ Perfect
◦ Complete
• Print monographs only
• No items, holdings, or annotations
• BIBFRAME Core is good enough to test the viability of a modular approach
• If a modular vocabulary approach is adopted, we propose that a BIBFRAME Core vocabulary should be developed iteratively and collaboratively by multiple communities
• Collaborating with Zepheira and UC Davis to design a BIBFRAME cataloging user interface
• Labels use RDA terminology, where it exists
• Catalogers add RDA RDF extensions from the RDA registry as needed
• Extend BIBFRAME and RDA with other schemas as needed, e.g., modsrdf:subject
• Mapping to BIBFRAME takes place under the hood
Work Data
Preferred Title For The Work
Date of Work
Language Of The Content
Creator
Expression Data
+ Add data elements
Summarization Of The Content
Subject
Place Of Origin Of The Work
Contributor
+ Add data elements
Work Data
rdaw:preferredTitleForTheWork
rdaw:dateOfWork
rdae:languageOfTheContent
rdaw:creator
Expression Data
+ Add data elements
rdae:summarizationOfTheContent
modsrdf:subject
rdaw:placeOfOriginOfTheWork
rdae:contributor
+ Add data elements
BF Work
bf:title
bf:date
bf:creator
bf:subject
rdaw:placeOfOriginOfTheWork
bf:language
bf:description
bf:contributor
BF Instance
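The "under the hood" mapping shown across these slides can be sketched as a simple lookup (a minimal sketch assuming a flat key-value form; the lookup table mirrors the slide, but the function and its name are invented): catalogers fill in RDA-labelled fields, and the editor stores the BIBFRAME core property where one exists, keeping the extension property itself (e.g., rdaw:placeOfOriginOfTheWork) where none does.

```python
# RDA RDF (and other schema) properties with a BIBFRAME core equivalent,
# as shown on the slide.
RDA_TO_BF = {
    "rdaw:preferredTitleForTheWork": "bf:title",
    "rdaw:dateOfWork": "bf:date",
    "rdaw:creator": "bf:creator",
    "modsrdf:subject": "bf:subject",
    "rdae:languageOfTheContent": "bf:language",
    "rdae:summarizationOfTheContent": "bf:description",
    "rdae:contributor": "bf:contributor",
}

def to_bibframe(form_data):
    """Map cataloger-entered properties to BF core, passing extensions through."""
    return {RDA_TO_BF.get(prop, prop): value for prop, value in form_data.items()}
```

The cataloger never sees the mapping; the stored description is BIBFRAME core plus any unmapped extension properties, exactly as on the BF Work side of the slide.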
Jackie Shieh, George Washington University
• BIBFRAME Core vocabulary development and data modeling
• Marshall Nirenberg proof of concept project
Zepheira and UC Davis / BIBFLOW project
• BF Lite / BIBFRAME Core vocabulary
• Kuali OLE cataloging module
• Cataloging user interface design
Library of Congress
• Parallel experimentation using BIBFRAME in the creation of new bibliographic data, expected Summer 2015
BIBFRAME IS A
WORK IN PROGRESS
fallgrennj@mail.nlm.nih.gov
BIBFRAME for discovery;
BIBFRAME for production…
LINKED DATA FOR LIBRARIES; LINKED DATA FOR
TECHNICAL SERVICES PRODUCTION
Nancy Lorimer
Stanford University
Experimenting with BIBFRAME
(NISO webinar)
April 8, 2015
Linked Data for
Libraries
LOD CONNECTING ACADEMIC INFORMATION
RESOURCES
LD4L Project Outcomes
Create an open source extensible LD4L ontology for scholarly resources
◦ Encompasses traditional MARC metadata, non-traditional metadata from
digital repositories and special collections, joined with contextual elements
indicating community engagement
Create semantic editing, display, and discovery systems that will
support incremental ingest from multiple information sources
Project Hydra based interface, supporting search across multiple LD4L
instances
Bibliographic
Data
• MARC
• MODS
• VRA
• EAD
Person Data
VIVO
ORCID
ISNI
VIAF
Usage Data
Circulation
Citation
Curation
Exhibits
Research Guides
Syllabi
Tags
LD4L Data Sources
Ontology guiding principles
Be sufficiently expressive to encompass traditional catalog metadata of
the 3 partners
Reuse appropriate parts of currently available ontologies rather than
building a new, self-contained ontology
Prioritize the ability to convert references within library metadata
records from “strings” to “things”
Seek out persistent global identifiers whenever possible
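The "strings to things" principle above amounts to a reconciliation step that swaps a name literal for a persistent identifier. This is illustrative only: the VIAF ID and the lookup table are made up for the example.

```python
# Hypothetical authority lookup: name string -> persistent identifier URI.
AUTHORITY = {"Doe, Jane": "http://viaf.org/viaf/000000000"}  # invented ID

def reconcile(record, authority=AUTHORITY):
    """Replace a creator name literal with an identifier URI where one is known."""
    out = dict(record)
    name = out.get("creator")
    if name in authority:
        out["creator"] = authority[name]  # a "thing", no longer a "string"
    return out
```

Once the value is a URI rather than a literal, it can be linked to and disambiguated across datasets, which is why the project prioritizes this conversion.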
Conversion of bibliographic
data
MARC bibliographic data -> BIBFRAME (LC converter)
Non-MARC bibliographic data -> BIBFRAME (LD4L conversion)
Other ontologies
Library resources: BIBFRAME
Additional bibliographic types and partonomy relationships: FaBiO,
Music Ontology, Schema.org
People/Organizations: VIVO-ISF (includes FOAF)
Annotations: OpenAnnotation
Provenance: PAV
Virtual Collections and Structured Relationships: OAI-ORE
Concepts: SKOS (or vocabularies such as Getty with stable URIs)
Many identifiers: VIAF, ORCID, ISNI, OCLC Works
Use Cases
Annotations (Bibliographic + Curation data)
◦ 1.1 Build a Virtual Collection
◦ 1.2 Tag Scholarly Resources to Support Reuse
Authorities (Bibliographic + Person data)
◦ 2.1 Discover Works via People and their Relationships
◦ 3.1. Discover Works via Locations and their Relationships
◦ 3.2. Discover Works via Concepts and their Relationships
Linked Open Data (Leveraging External data)
◦ 4.1 Leverage the Deeper graph
◦ 5.1 Leverage Usage Data
◦ 6.1 Cross-Institution Discovery
Linking the Cornell University
Catalog and VIVO (Use Case 2)
Discover Works via People and their Relationships
See and search on works by people to discover works of interest based
on connection to people, and to understand people based on their
relation to works
• Demonstrate links between catalog and VIVO
• Round-trip from catalog to VIVO and back to catalog
• Sample data: Cornell thesis records
Linking hip-hop flyer data to
MusicBrainz/LinkedBrainz (Use case 4)
Use Case 4: Leveraging the deeper graph
…making use of complex graph relationships via queries or patterns
(rather than direct connections) to allow discovery that would not be
possible without the semantics of different relationships between items
and types of items included in the graph
Model non-MARC metadata to RDF
Use of LinkedBrainz URIs for performers to discover relationships to
other entities…
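"Leveraging the deeper graph" can be sketched schematically: once a flyer's performer is identified with a MusicBrainz/LinkedBrainz URI, relationships in that graph lead to entities the flyer itself never mentions. The URIs, predicates, and traversal helper below are all invented mock-ups; LinkedBrainz actually models such relations with the Music Ontology.

```python
# Mock graph: performer belongs to a group, which is tied to another event.
GRAPH = [
    ("mb:artist/1234", "mo:member_of", "mb:artist/abcd"),   # performer -> group
    ("mb:artist/abcd", "mo:performance", "mb:event/5678"),  # group -> other event
]

def reachable(start, triples):
    """All nodes reachable from start by following triples outward."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, _, o in triples:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen
```

The discovery here is indirect: nothing links the flyer to the second event, yet the traversal finds it through the shared performer, which is exactly what a direct-connection model cannot do.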
Ontology decision: Flyers
Ontology decisions: events, performers
Ontology challenges
Limitations of a work-centric model (BIBFRAME)
Pulling in other vocabularies
Granularity distinctions
Instability of the BIBFRAME model/vocabulary
From Data Conversion to Data
Production
LD4L based on conversion from some other format
◦ Not about original cataloging (where will we get that data in the future?)
◦ Conversion is a complex process, with much post processing
◦ Internal links in MARC lack formal structure and convert poorly
The next logical step:
WORK NATIVELY IN BIBFRAME!
◦ Adapt technical services workflow to integrate linked data creation
Linked Data for Technical Services Production
CREATING LINKED OPEN DATA WITH BIBFRAME
The Group of Six (Les Six)
Stanford
Cornell
Columbia
Harvard
Princeton
Library of Congress
Kuali OLE
Partners
Cataloging vendors
◦ What can they do to supply linked data?
◦ Can they enhance their MARC records to make conversion better?
Authority vendors
◦ Can we send them a linked data graph and receive URIs?
General Acquisitions vendors
◦ EDI, shelf-ready?
Our ILS
◦ How do we link with our acquisitions data?
OCLC
◦ Sharing our data
Individual institutional
interests
Stanford
◦ Copy cataloging using vendor-supplied data
◦ Original cataloging in most formats
Cornell
◦ Hip-hop recordings, mostly non-commercial
Harvard
◦ Geospatial data
◦ Law
Columbia
◦ Art objects
Stanford’s focus
Copy cataloging
◦ Most of our material comes this way
◦ Will still require conversion for some time to come, and that will require
manual remediation
◦ Who will do this?
◦ How do we link to ILS?
Original cataloging
◦ All our catalogers want to participate!
so…
◦ All formats!
The Cloudspace
Possible Outcomes
Specifications for needed infrastructure
◦ Cloud space
◦ Tools
Best practices for metadata creation
◦ Profiles
◦ Use of cataloging standards in the linked data environment
◦ How BIBFRAME coexists with other ontologies
◦ Metrics for assessment of effort, added value, staffing levels & skills
A large pool of native BIBFRAME data for developers to use
Thank you…
LD4L demos (from 2015 workshop):
https://wiki.duraspace.org/display/ld4l/LD4L+Workshop+Agenda
NISO Webinar • April 8, 2015
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2015/webinars/bibframe_adopters/
NISO Webinar
Experimenting with BIBFRAME:
Reports from Early Adopters
Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU


Editor's Notes

  • #18 Linked data has been championed as the route to improve our users' ability to find our resources via the web, and BIBFRAME was created as a means of expressing our bibliographic data in a linked data framework. It has been difficult, however, for us to see these promises in action, and to understand how BIBFRAME specifically, and linked data in general, will impact library workflows. My talk today will focus on two collaborative linked data projects that Stanford is invested in, both of which make use of BIBFRAME for expressing library data. The first of these is Linked Data for Libraries, a Mellon-funded initiative. The second is Linked Data for Production, a project focused on the changes needed in traditional Technical Services workflows to work natively with linked data, which we will be developing over the course of this year.
  • #19 At the beginning of 2014, Cornell University, Stanford and Harvard were awarded a grant by the Mellon Foundation to create a Semantic Information Store model for scholarly resources. This model will work both within individual institutions and through a coordinated, extensible network of Linked Open Data to capture the intellectual value that librarians and other domain experts and scholars add to information resources. To that end the project is leveraging existing work from other projects, particularly the VIVO Project at Cornell and the Hydra Partnership.
  • #20 The planned outcomes of the Linked Data for Libraries project are: an open source LD4L ontology compatible with Bibframe and other LOD efforts that encompasses traditional MARC metadata, non-traditional metadata from digital repositories and special collections, joined with contextual elements indicating community engagement; open source LD4L semantic editing, display, and discovery systems that support incremental ingest from multiple information sources; and a Project Hydra-based interface, supporting search across multiple LD4L instances. In this talk, I will concentrate primarily on the LD4L ontology.
  • #21 LD4L data sources. LD4L seeks to link three large repositories of data about scholarly resources: 1. Bibliographic data, mostly in MARC, but also in MODS and other schemas; 2. Person data, much of it from faculty profiles retained by each institution (VIVO for Cornell, CAP for Stanford, FacultyFinder at Harvard), as well as from identifiers, both local and international, of those persons, such as VIAF and ISNI; and 3, what we have loosely termed usage data, or data that indicates active community engagement with a resource, such as circulation and citation statistics and the use of bibliographic and person data in various curatorial activities such as course guides, exhibits, and tagging.
  • #22 A number of guiding principles and goals have influenced the ontology work. First, the ontology must be sufficiently expressive to encompass traditional catalog metadata. Second, rather than build a new, self-contained LD4L ontology, the team would reuse appropriate parts of currently available ontologies. Making use as much as possible of ontologies that have already achieved significant adoption or are on the way to doing so, prevents continuous reinventing of the wheel, and lessens the need for crosswalking between ontologies. The team also is prioritizing the ability to convert references within library metadata records from “strings” or literals to “things” expressed by URIs, by adopting identifiers as a primary means of disambiguation; and finally to seek out persistent global identifiers whenever possible.
  • #23 Bibliographic data in the form of MARC records is by far the largest set of data that LD4L is working with. To make this data interoperable as linked data requires converting MARC to RDF using one or more ontologies. LD4L has focused first on Bibframe as the basis for the LD4L ontology. Bibframe is the leading contender as a replacement for MARC and seemed most likely to provide a fairly complete conversion of MARC data. At the same time it seemed potentially flexible enough to encompass other communication formats such as MODS and VRA, also commonly used in library catalogs. Also, the Library of Congress has developed a MARC-to-BIBFRAME converter, and it will likely remain a primary conversion tool for libraries as it is further refined. LD4L has explored pre-conversion record enrichment and post-conversion processing: primarily, adding identifiers to MARC records to reduce the need for entity resolution during or after production, and changes to the output RDF to produce better linked data. For the non-MARC metadata currently being worked with, on the other hand, the team has needed to develop its own conversion tools.
  • #24 A reminder: BIBFRAME is the basis for the LD4L ontology and the primary carrier of bibliographic data, but by necessity it makes use of other ontologies to integrate and enrich library bibliographic data with data from other sources and with real world objects. Some of these ontologies are inherited from the VIVO ontology, Cornell’s semantically based faculty profile system.
  • #25 The ontology and engineering work has been guided by a distinct set of use cases. 42 raw use cases or "stories" were developed, each illustrating a realistic user question about how to leverage scholarly data. These raw use cases were then reduced to 12 refined use cases in 6 clusters based on the necessary data, and finally grouped into 3 project phases, namely "Annotations", "Authorities", and "Linked Open Data". Use cases 1.1 and 1.2 have focused more on the engineering end of things, but the other use cases have included ontology selection and design, as well as software design and pipelines to support conversion services and entity resolution. As a proof of concept, the ontology group has been developing pilot projects as demos based on the LD4L use cases. Preliminary results for these were presented at the project's workshop at Stanford University back in February. The presentations for all these demos are available on the LD4L public wiki. Here I will simply give an illustration of the use of BIBFRAME in two use cases, Use case 2: Discover works via people and their relationships, and Use case 4: Leverage the deeper graph, and how Bibframe was supplemented from other ontologies as needed.
  • #26 Use case 2 is "Discover Works via People and their Relationships." We want to see and search on works by people to discover works of interest based on connection to people, and to understand people based on their relation to works. A demo for this was developed by Rebecca Younes at Cornell, in which she links converted data from the Cornell University catalog with VIVO, the semantic faculty profile system. The trick is to connect the catalog data, which is bibliocentric, with the profile system, which is people-centered, and to be able to round trip between the two. The sample data were Cornell thesis records, all of which contain faculty advisors as access points.
  • #27 And here is a pictorial view of the graph created to link the two datasets. The area I have circled represents the MARC catalog record converted to Bibframe, with the thesis as a Bibframe work with two instances, one manuscript and one electronic.
  • #28 On the upper left-hand side, the bf:Organization and the advisor, a bf:Person, are linked to a VIVO URL through bf:Authority, then to the real-world objects through FOAF. These in turn link back to other authored resources in the Cornell catalog.
  • #29 In addition, as you can see in the upper right, person data is also linked to international identifiers, allowing further connections, and the work is linked to an OCLC work. This closely links the two predominant data sources, while allowing the graph to link out to other data that may be out there.
  • #30 The second demo pilot I’ll introduce you to, is that for Use Case 4. Use case 4 is about “making use of complex graph relationships via queries or patterns (rather than direct connections) to allow discovery that would not be possible without the semantics of different relationships between items and types of items included in the graph. In this pilot, developed by Steven Folsom at Cornell, a set of flyers advertising hip-hop concerts is modeled and linked to LinkedBrainz, the linked data version of MusicBrainz, a rich source of recording issue data. The pilot required the modelling of non-MARC metadata to RDF, the metadata for the flyers having been created in ARTStor in a VRA-like schema. Then the performers and the performing groups would be linked to LinkedBrainz to discover further relationships. Many ontology decisions were necessary here—the flyers basically have as a subject a performance, which is an event, and multiple performers who might be connected to multiple musical groups, but who are otherwise unrelated to the actual flyer. It is a bit too complex to explain fully here, but I’ll just quickly show you the results of some of the decision making.
  • #31 First the flyers themselves, which became Bibframe works and instances (again one for the physical; one for the electronic). They are also mapped to multiple rdf:types, either Bibframe types or terms from AAT, the Art & Architecture Thesaurus. AAT terms were brought in as types, since they have a much higher degree of granularity than do Bibframe types, something that is important for art works. Then we run into the problem of the event, a concept whose place is rather unclear in Bibframe, and which does not at this point seem to be able to act as a bf:subject, which is what it is here. Instead, Steven has created an ld4l domain term of "hasEvent". This might be replaced if Event is remodeled in Bibframe.
  • #32 The ld4l event is further defined as a Performance, a term borrowed from the Music Ontology. Terms for performers and performing groups are also taken from the Music Ontology, since it defines the relationships between performers and the groups to which they belong, is the basis of LinkedBrainz as well, and so permits more straightforward queries. Schema.org vocabulary is then used to map information about the event’s venue: the club, address, geocoordinates, and so on. As you see, to get there we have moved well outside of the BIBFRAME ontology; this may well be necessary even within the library sphere to describe our resources sufficiently.
  • #33 These two pilots demonstrate some of the issues in developing the ontology and using BIBFRAME as its basis. Especially with the hip-hop flyers, it is clear there are limitations in describing some resources with a work-centric model such as BIBFRAME. Event-centric resources are difficult to model, and their modeling mostly occurs outside of the BIBFRAME vocabulary (at least currently). Pulling in other ontologies can also be tricky, since some basic BIBFRAME classes such as bf:Person and bf:Authority are not strongly defined and thus may be interpreted in several ways. It is also difficult to know where the right point of crossover is from BIBFRAME to existing content standards such as RDA, or to vocabularies such as AAT. Another issue is that all the ontology work is being done with the knowledge that the BIBFRAME model and vocabulary are still unstable. While the vocabulary has been frozen over the last year, we know there are likely to be substantial changes to it in the next few months. And finally, conversion from MARC or any other schema tends to be messy, resulting at times in missing or misascribed data.
  • #34 Conversion of data is a complex process, often entailing loss of data and extensive post-processing. Converting MARC to linked data is especially difficult in that internal links in the record often lack formal structure, depending instead on human interpretation, and thus convert poorly. Think of a MARC record for a sound recording containing multiple works. While there may be access points for specific performers and works, and notes that describe the specific links between them, there is no machine-actionable linking to convert the data cleanly. The work of Linked Data for Libraries concentrates on leveraging existing data from various scholarly resources and on better linking and exposing it through linked data, freeing data from restrictive silos. The emphasis is on the conversion of data from some other format. It is not about original cataloging, nor does it move catalogers away from their traditional ways of working. Given the drawbacks of relying on conversion, and our own wish to move beyond the limitations of MARC, the next logical step is to work natively in a linked data environment. To do so, we will need to adapt our current technical services workflows to integrate linked data creation.
  • #35 And so we come to our second project, Linked Data for Technical Services Production, or LD4P. LD4P is in part an outgrowth of LD4L and a response to the messiness of converting MARC data to BIBFRAME, but it is also a move toward creating our own linked data assertions within our technical services environments. If we are to use linked data in general or BIBFRAME specifically, we have to have viable workflows that allow us to contribute to the data graph, but that still link us (at least for now) to everything our current MARC environment does in a database structure: bringing in vendor records, recording acquisitions and payment data, circulating materials, and everything else that is now centered around the MARC record in our ILSes. We have been looking at BIBFRAME in isolation, but we now need to see how it works in the library workflow environment.
  • #36 Phil Schreur, Interim AUL for Technical Services at Stanford, has brought together a group of six academic libraries to focus on technical services and linked data, with representation from Kuali OLE (an open source library management system). Phil and I soon adopted the nickname Les Six, after a loosely knit group of early 20th-century French composers. The participants will explore production in a variety of formats and institutional workflows; develop and/or improve tools for linked data cataloging; and test and, as necessary, extend the BIBFRAME vocabulary, whether by augmenting BIBFRAME itself or by bringing in vocabularies from other ontologies. We are currently in the planning stage of a proposal for a 12-month grant focused on transitioning the nuts and bolts of our technical services workflow to one based in LOD. We realize this is an enormous task that cannot be completed within the time frame of a single grant, but we hope that by forcing ourselves to face the actual needs of production, we will at the very least have developed some explicit statement of the infrastructure and tools we need. In some ways, the musical-historical reference to Les Six is apt. They were forward-looking and experimented with new musical frameworks. Journalists styled them as avant-garde, along with the surrealist and cubist artists and poets of the time, though in reality they each followed disparate musical ideas of varying levels of audacity. The photo on the slide, by the way, is of five members of the group, plus the artist Jean Cocteau, who acted as a sort of mentor. The sixth member, Georges Auric, is represented by the drawing on the wall above.
  • #37 So much of what we do in technical services is outsourced fully or partially to essential partners, and we have all agreed that they need to be part of this project in one way or another from the beginning if it is to work. To that end, we have already begun outreach to potential partners—Casalini, MARCNow, and Backstage Library Works for vendor cataloging; Backstage and LTI for authorities; EDI vendors such as Harrassowitz; our ILS vendors—to see how they can help us. So many questions. Can our cataloging vendors enhance MARC records for better conversion, or better still, can they provide linked data? Can we send our authority vendors a linked data graph and receive URIs in return? How do we work with acquisitions partners like Harrassowitz for EDI and shelf-ready materials? And what can our ILSes do? Can they ingest linked data? Beyond these links to technical services vendors, we also want to share our data with others; it is linked open data, after all. So it is important to explore how we can interact with OCLC, with standards groups such as the PCC, and with domain experts such as the Music Library Association.
  • #38 Early on, we recognized that the six institutions are very different from one another, despite all being large academic libraries, and that each has different interests to pursue and staff to pursue them. So instead of trying to decide on one domain or workflow for all of us to work on together, we decided to develop one common cloud environment to work in and to follow our own interests there. This has the added benefit of being much more of a real-world test than if we had set heavy restrictions. Currently we cover the interests listed here, though as we are still in the early days of our exploration, they are subject to change.
  • #39 A little more about Stanford’s focus. First, we intend to explore integrating linked data with vendor-supplied copy cataloging. This will be only a partial shift toward our goal of native linked data creation, since for some time to come, at least, we will need to convert supplied MARC records and perform some manual remediation to make them useful. But since the majority of materials come to us this way, it is important that this workflow can shift to the new environment. It is also an interesting problem in that most of the processing of these materials is done by non-professional staff in our Acquisitions Department, so we have to explore who will do the remediation work. We will also explore how to connect our linked data with the ILS acquisitions data, which will necessarily still be in MARC. Our second focus is original cataloging. Our original catalogers are all fairly well versed in linked data and BIBFRAME (to the extent they can be), and all want to participate. They are also quite adamant that we work in all formats—it is more realistic and treats all formats equally. It will not be easy…
  • #40 While we are working on separate projects, the LD4P group also realized there was a very important need common to us all: a cloud space in which to work. Designing this shared space will be one of the most important parts of the project. It will hold our common data and the tools with which we work, and it will link to local triple stores for the preservation of our data. It is likely that this local triple store is what will be used to power discovery at our local institutions, and it will have to link to the ILS for information such as circulation. One institution also intends to test working completely within a partitioned area of the cloud space, to explore the possibility of that model for institutions that cannot support a local triple store. We will be developing the requirements for this space in the coming months.
  • #41 So what are the possible outcomes of this project? Obviously, we will not be able to develop a full-fledged infrastructure and workflow for linked data technical services. We hope we may be able to provide some required specifications for the needed infrastructure—for the cloud space, for tools (including any new tools needed), and for how these all might interact. We also hope to at least lay the basis for some best practices for metadata creation: beginning the development of profiles for various formats, determining how current cataloging standards work in an open linked data environment and how BIBFRAME will coexist and combine with other ontologies, and—important for our administrators—developing some metrics to assess the effort, the added value, and the staffing levels and skills necessary. Finally, we intend to make available a large pool of native BIBFRAME data that we and other developers can make use of in future explorations. With our pilot projects, the Group of Six hopes to begin in earnest to develop the workflows, tools, and infrastructure needed to implement BIBFRAME and other linked data in the technical services workflow. These will be practical pilots, dealing with the day-to-day workflow of technical services staff. The project will also serve as a complement to the Linked Data for Libraries project, balancing the end-user empowerment of semantic discovery with the behind-the-scenes creation of linked data in technical services. In creating native BIBFRAME-based data, rather than relying on inadequate transformations of outdated data models, we hope to move ourselves out of the MARC universe into the alternate and much larger, if more surreal, universe of linked data.