LINKED DATA: WHY BOTHER?

JENNIFER BOWEN, UNIVERSITY OF ROCHESTER
NOTSL MEETING, KENT STATE UNIVERSITY
NOVEMBER 22, 2013
My Topics Today
2








The “Vision” piece: Why should libraries care
about linked data?
A few linked data use cases for libraries
Can libraries achieve their metadata-related
goals WITHOUT linked data?
Lessons learned from developing the
eXtensible Catalog and what that has to do
with linked data
eXtensibleCatalog.org
3
XC User Research
Partners:
Cornell University
Ohio State University
University of Rochester
Yale University

4

Studying scholars at the UR…
Scholars want to read everything
on the topic that they are researching
5
6

They want to be in the middle of
everything they need, all organized
so it is findable and usable
Scholars want their research
to be findable and usable by others.
7
“These other researchers cite
MY research…”
8
Scholars want to connect to people
whose work is interesting and useful to them.
9
Scholars don’t care what the technology is,
as long as it helps them do their work
10
A shift in how people
seek and use information
11







Systems that libraries provide (websites,
catalogs, databases) are bypassed
…not just in favor of Google and the Web in
general
…but also in favor of tailored desktop, mobile,
and web applications
Beyond library finding tools
12

“Even scholars who continue to use library
finding tools are turning to new
applications to aggregate and analyze
information in ways that extend their
scholarship beyond what manual
searching and analyzing allows.”
-- Nancy Fried Foster
Senior Anthropologist, Ithaka S+R
Vision for how to address this…
13



Make library resources discoverable on the
open web, through applications that potential
readers are already using:
Search engines
Mobile apps
Social media
AN EXAMPLE…
An example…Mt. Hope Cemetery
15

Photo credits: ROCHESTER’S SPEAKING STONES By Th. Emil Homerin; University of Rochester
Department of Religion and Classics
http://www.rochester.edu/College/REL/faculty/homerin/REL167/reports.htm
An example…Mt. Hope Cemetery
16

Photo credit: www.findagrav.com/cgibin/fg.cgi?page=pv&GRid=31&PIpi=76016
17

Photo credits: University of Rochester. River Campus Libraries. Department of Rare Books and Special
Collections. http://www.lib.rochester.edu/index.cfm?PAGE=4119
What’s the role of linked data?
18





Tools like this are possible today with
dedicated programming.
Linked Data will enable library resources to be
included in applications like this by giving
application developers access to “…a store
of machine-actionable data on which improved
services can be built” (Linked Open Data
value statement).
THREE INITIATIVES
RELATED TO LINKED
DATA AND LIBRARIES
Stanford Linked Data Workshop (2011)
20

Linked Open
Data Value
Statements
http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf
Linked Open Data Value
Statements
21










Linked Open Data (LOD) puts information
where people are looking for it: on the web
LOD can expand discoverability of our content
LOD opens opportunities for creative
innovation in digital scholarship and
participation
LOD allows for open continuous improvement
of data
LOD creates a store of machine-actionable
data on which improved services can be built
http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf
More Linked Open Data Value
Statements
22





Library LOD might facilitate the breakdown of
the tyranny of domain silos
LOD can provide direct access to data in ways
that are not currently possible, and provides
unanticipated benefits that will emerge later as
the stores of LOD expand

http://www.clir.org/pubs/reports/pub152/LinkedDataWorkshop.pdf
Another library linked data
initiative: BIBFRAME
23

www.loc.gov/bibframe/
What is BIBFRAME?
24





Library of Congress-led effort to replace
MARC 21 with a new bibliographic model
based upon linked data
“Determine a transition path for the MARC 21
exchange format in order to reap the benefits
of newer technology while preserving a robust
data exchange that has supported resource
sharing and cataloging cost savings in recent
decades.”
More goals of LC’s BIBFRAME
25







Differentiate between conceptual content and
physical manifestations (works and instances)
Focus on unambiguously identifying
information entities (e.g. authorities)
Leverage and expose relationships between
and among entities

http://bibframe.org/
26

Potential Issues with
BIBFRAME






Conceptual model doesn’t fully conform to
either FRBR or RDA (e.g. no “expression”
level) – is this a problem?
Will organizations that have already
implemented linked data use BIBFRAME once
it is finished?
Do we really need a new serialization of
MARC dictated by LC?
http://bibframe.org/
Let’s get a little more specific…

WHAT CAN LIBRARIES ACTUALLY
DO WITH LINKED DATA?
62 Use Cases for Library Linked Data!

“The mission of the Library Linked Data incubator
group is to help increase global interoperability of
library data on the Web, by bringing together people
involved in Semantic Web activities—focusing on
Linked Data—in the library community and beyond,
building on existing initiatives, and identifying
collaboration tracks for the future.”
28
W3C Library Linked Data (LLD) Incubator
Group Use Case Areas
29












Bibliographic data
Authority data
Vocabulary alignment
Archives and heterogeneous data
Citations
Digital objects
Collections
Social and new uses
Source: Library Linked Data Use Cases
30

Some Sample Use Cases

W3C Library Linked Data Incubator Group:
http://www.w3.org/2005/Incubator/lld/XGR-lld-usecase-20111025/
31

Bibliographic Data Use Case:
Deduplication and Unification of Library
Records








Enable matching not based upon data from a
central provider
More reference data for
matching/deduplication would be available
openly for any library to use
Non-MARC metadata also need deduplication
and unification
Using linked data would result in more trusted
matches, more opportunities to automate the
matching process
Source: Library Linked Data Use Cases
Deduplication/Merging of Metadata With and
Without Linked Data

32

Deduping Records (without linked data):
Record for Resource A: Match Point 2
Record for Resource B: Match Point 1, Match Point 2
If records match on a designated match point, one record overlays the other, or a merge algorithm can keep data from both records.

Using Linked Data:
Graph for Resource A: a set of URIs
Graph for Resource B: a set of URIs
An algorithm could look at all URIs representing two resources to determine a “match” and combine all URIs into a single graph.
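
The linked data approach on this slide can be sketched in a few lines of Python. This is only an illustration: the slide does not say how a "match" would be scored, so the Jaccard overlap measure, the 0.5 threshold, and the example.org URIs are all assumptions made for the example.

```python
def uri_overlap(graph_a, graph_b):
    """Jaccard overlap between the sets of URIs describing two resources."""
    a, b = set(graph_a), set(graph_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_if_match(graph_a, graph_b, threshold=0.5):
    """Return one combined URI set when the overlap meets the threshold, else None.

    The threshold is an assumed tuning value, not something from the slides.
    """
    if uri_overlap(graph_a, graph_b) >= threshold:
        return set(graph_a) | set(graph_b)
    return None


# Hypothetical URIs standing in for the "URI:" placeholders in the diagram.
resource_a = {
    "http://example.org/work/1",
    "http://example.org/person/42",
    "http://example.org/subject/7",
}
resource_b = {
    "http://example.org/work/1",
    "http://example.org/person/42",
    "http://example.org/place/3",
}

merged = merge_if_match(resource_a, resource_b)
```

Unlike the single-match-point overlay, every shared URI contributes to the decision here, and nothing from either description is lost in the merge.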
Authority Data Use Case: Authority
Data Enrichment (VIAF)
33









Enrich already existing authority data with
additional information from external data sets
by linking instead of copying & merging
Enables VIAF (Virtual International Authority
File) to be expanded with huge amounts of
data from all over the world
Align different representations of the same
real-world resource
Linked data allows the usage of remote data in
applications
Source: Library Linked Data Use Cases
Vocabulary Alignment Use Case:
Vocabulary Merging
34







Users expect to be able to search for subjects
using their own language and terms in an
unambiguous, contextualized manner.
Linked Data technologies could provide the
underlying infrastructure by semantic mapping
or merging of concepts across vocabularies.
Allow vocabularies defined by different
sources to organize (classify, index ...) legacy
data to be used together
Source: Library Linked Data Use Cases
35

http://aep.lib.rochester.edu/home
36
Keyword = “arrow”

37
LCSH via
id.loc.gov
38
Vocabulary Merging:
Rochester AIDS Posters vs. LCSH
39

Arrow [URI for UR’s vocabulary AIDS poster terms]
Same as
Arrow (Symbol) id.loc.gov/authorities/subjects/sh2013000524
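
As a rough illustration of what a "same as" link buys, the Python sketch below expands a search term to every equivalent term across vocabularies. The local-vocabulary URI is hypothetical (the slide only shows a placeholder), the LCSH identifier is the slide's line-broken one joined back together, and the function name is invented for this example.

```python
# Hypothetical URI for the term in UR's local AIDS poster vocabulary.
UR_ARROW = "http://example.org/ur-aids-posters/terms/arrow"
# LCSH identifier as shown on the slide, with the line break removed.
LCSH_ARROW = "http://id.loc.gov/authorities/subjects/sh2013000524"

# Each pair asserts that two URIs identify the same concept.
same_as = [(UR_ARROW, LCSH_ARROW)]


def equivalents(term, links):
    """Return the term plus every URI reachable through symmetric 'same as' links."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for left, right in links:
            for src, dst in ((left, right), (right, left)):
                if src == current and dst not in found:
                    found.add(dst)
                    frontier.append(dst)
    return found


search_terms = equivalents(LCSH_ARROW, same_as)
```

A search that starts from the LCSH heading would then also retrieve posters indexed only with the local term, and the reverse.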
Archives and Heterogeneous Data
Use Case: Semantic Connections
40





A group of archives would like to better share
information about their holdings. They have
separate catalogs and these catalogs do not
necessarily use the same data formats.
Exporting and sharing their data in Linked
Data format would allow them to make
connections between the collections using
topics, names, place names, and other
information contained in their metadata.
Source: Library Linked Data Use Cases
41
LCSH: AIDS (Disease)-Prevention

42
UR Local vocabulary: AIDS
Prevention

43
44

Rochester AIDS Posters vs. UCLA
AIDS Posters: Semantic Connections

AIDS prevention
[URI for UR’s
vocabulary AIDS
prevention]

Same as

AIDS (Disease)—
Prevention [URI for
LCSH term]
Social and New Uses Use Case:
Search Engine Optimization
45



Make library data searchable through Web
search engines by:
- Adopting an architecture that is compatible with
  web crawling by bots, and
- Optimizing the available content so that search
  engines can process it efficiently
Adding structured metadata (e.g. RDFa) to
library online catalogs could increase the
visibility and accessibility of their data.
Source: Library Linked Data Use Cases
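
To make "adding structured metadata (e.g. RDFa)" concrete, here is a minimal Python sketch that renders a catalog record as an HTML fragment carrying RDFa Lite attributes. The use of the schema.org vocabulary, the choice of the `Book` type, the function name, and the example record URL are assumptions for illustration; the use case itself names only RDFa.

```python
from html import escape


def rdfa_book(title, author, record_url):
    """Render one catalog record as an HTML fragment with RDFa Lite attributes."""
    return (
        '<div vocab="http://schema.org/" typeof="Book" '
        f'resource="{escape(record_url, quote=True)}">\n'
        f'  <span property="name">{escape(title)}</span>\n'
        f'  <span property="author">{escape(author)}</span>\n'
        '</div>'
    )


# Example record; the URL is hypothetical.
fragment = rdfa_book(
    "The divine comedy",
    "Dante Alighieri",
    "http://example.org/catalog/record/1",
)
```

The visible page is unchanged for human readers, while a crawler that understands RDFa can read the title and author as typed properties of the record.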
46

“…the entire publicly available version of WorldCat
is now available for use by intelligent Web
crawlers, like Google and Bing, that can make use
of this metadata in search indexes and other
applications.”
LET’S TURN
EVERYTHING ON ITS
HEAD…
48

Envisioning The Future
Without Linked Data

Or,
What we learned from developing
eXtensible Catalog (XC) software
What is XC software?
49

eXtensible Catalog (XC) is open
source, user-centered, next generation
software for libraries.

XC provides a discovery system and
a set of tools for libraries to manage
metadata and build applications.
eXtensible Catalog Funders and
Contributors
50

Major Funding

Andrew W. Mellon Foundation
Major Contributors

Consortium of Academic and Research Libraries
in Illinois (CARLI)
Kyushu University

University of Rochester
Why Did We Build XC?
51

Empower libraries to have control
over their discovery environment
Put results of user research
into practice
Extremely customizable user
interface
Why Did We Build XC?
52









Create a new
metadata
management
platform
Implement a
FRBR-based
record structure
Facilitate RDA
implementation
Repurpose
MARC 21
records
“FRBRized” MARC records
53

Parsing MARCXML records into linked
FRBR-based XC Schema records

A MARCXML Bibliographic record is parsed into:
XC Work
XC Expression (“Work Expressed” uplink to the XC Work)
XC Manifestation (“Expression Manifested” uplink to the XC Expression)

“Uplink” = Record ID of the parent record created during OAI-PMH harvest.
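
The uplink idea can be pictured with a small Python sketch in which each record stores the ID of its parent and a helper walks up the chain. The class, field, and function names are invented for this example and are not the XC Schema's actual element names or the XC software's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class XCRecord:
    record_id: str
    level: str                    # "work", "expression", or "manifestation"
    uplink: Optional[str] = None  # record ID of the parent record, if any


def chain_to_work(record_id, records):
    """Follow uplinks from a record toward its work; return the IDs visited."""
    path = []
    current = records.get(record_id)
    while current is not None:
        path.append(current.record_id)
        current = records.get(current.uplink) if current.uplink else None
    return path


# Hypothetical record IDs for one bibliographic record split into three levels.
records = {
    "w1": XCRecord("w1", "work"),
    "e1": XCRecord("e1", "expression", uplink="w1"),
    "m1": XCRecord("m1", "manifestation", uplink="e1"),
}

path = chain_to_work("m1", records)
```

Because each link is just a stored record ID, any new, changed, or deleted parent record means the stored IDs in its children have to be revisited.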
Facilitating RDA Implementation
XC transforms MARC data into a
FRBR-informed “transitional” XML
schema
The “XC Schema” uses a subset of
RDA elements and roles alongside
Dublin Core and some XC data elements
More RDA elements can be added to
the schema in the future
54
Repurposing MARC 21 records
55

- Converts MARC codes to vocabulary
  values
- Removes extraneous data
- Normalizes inconsistencies
- Maps most MARC fields/subfields
  and parses them to appropriate FRBR
  Group 1 entity records

56

How XC Software Works

(in a nutshell…)
How XC software works
57









Harvests a copy of metadata records in an
existing repository
Processes (cleans up, transforms) those
records
Makes records available for use in other
applications
Synchronizes records in XC with records in
original repositories
…it’s all about metadata records!
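
The harvest and synchronize steps rely on OAI-PMH, which later slides mention by name. As a hedged sketch of what such a harvest request looks like, the Python below builds `ListRecords` request URLs. The base URL and the `marc21` metadata prefix are assumptions (prefixes are defined by each repository), and this is not XC's own harvester code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, from_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A resumption token is an exclusive argument in OAI-PMH, so when one is
    given it is sent alone with the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Only records changed since this date: the basis for synchronizing.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint.
first_request = list_records_url("http://example.org/oai", "marc21", "2013-11-01")
next_request = list_records_url("http://example.org/oai", resumption_token="abc123")
```

Repeating the request with a later `from` date is what keeps the harvested copy in step with the original repository.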
eXtensible Catalog Architecture
58

Drupal Toolkit: User Interface
- Search
- Browse
MST Toolkit: Metadata Services
- Cleanup
- Format Convert
OAI Toolkit: ILS Connectivity
- Synchronize data with XC
NCIP Toolkit: ILS Connectivity
- Circ. status
- Account info

Other diagram labels: ILS “Driver”; Digital Repository; ILS; User Interface; Metadata; Live Circ. Data
eXtensible Catalog Architecture
59

Insert your Application with OAI-PMH Harvester here! (shown over the Drupal Toolkit User Interface: Search, Browse)

MST Toolkit: Metadata Services
- Cleanup
- Format Convert
OAI Toolkit: ILS Connectivity
- Synchronize data with XC
NCIP Toolkit: ILS Connectivity
- Circ. status
- Account info

Other diagram labels: ILS “Driver”; Digital Repository; ILS; User Interface; Metadata; Live Circ. Data
60

What we learned from “FRBRizing”
MARC in a live production system

…three
issues…
“FRBRizing” MARC records
61

Parsing MARCXML records into linked
FRBR-based XC Schema records

A MARCXML Bibliographic record is parsed into:
XC Work
XC Expression (“Work Expressed” uplink to the XC Work)
XC Manifestation (“Expression Manifested” uplink to the XC Expression)

“Uplink” = Record ID of the parent record created during OAI-PMH harvest.
Linked Work, Expression
and
Manifestation Records in
XC

62
63
“Uplinks” between FRBR levels

64
65

Issue 1: Managing
Relationships
Parses MARCXML records into
linked FRBR-based records
How many FRBR entity relationships
can we support with XC software?

Diagram: a MARCXML Bibliographic record parsed into XC Work, XC Expression, and XC Manifestation records.
“Uplink” = Record ID of the parent record created during OAI-PMH harvest.
66

Issue 1: Managing
Relationships
MARC bibliographic records can refer to
multiple FRBR entities of the same type
(analytics that represent multiple
works/expressions, e.g. tracks on a CD)
Diagram: one MARCXML Bibliographic record and its XC Manifestation, linked to three XC Expression records and three XC Work records.
Issue 2: Beyond FRBR Group 1
Entities
67

MARC ―Alternate Graphic Representation‖ (880
fields) can contain data that belong in records
for Group 2 and Group 3 entities
Contributor:
700 1 ‡6 880-08 ‡a Vasil’ev, Maksim.
880 1 ‡6 700-08 ‡a Васильев, Максим.
Subject:
600 10 ‡6 880-06 ‡a Putin, Vladimir Vladimirovich, ‡d 1952-
880 10 ‡6 600-06 ‡a Путин, Владимир Владимирович, ‡d 1952-
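
The ‡6 linkage in these fields pairs each regular field with its 880 counterpart through a linked tag and a two-digit occurrence number. The Python sketch below shows one way to pair them, using the contributor example above. The input format (tag plus a dictionary of subfields) and the slot names are invented for this illustration, and a real parser would also need to handle script codes and unlinked 880 fields.

```python
import re

# Subfield 6 begins with the linked tag and a two-digit occurrence number.
LINK = re.compile(r"^(\d{3})-(\d{2})")


def pair_880(fields):
    """Pair regular fields with their linked 880 fields.

    fields: list of (tag, subfields) where subfields maps codes to values.
    Returns {(tag, occurrence): {"regular": ..., "alternate": ...}}.
    """
    pairs = {}
    for tag, subfields in fields:
        match = LINK.match(subfields.get("6", ""))
        if not match:
            continue
        linked_tag, occurrence = match.groups()
        if tag == "880":
            # An 880 points back at the tag of the field it transcribes.
            key, slot = (linked_tag, occurrence), "alternate"
        else:
            key, slot = (tag, occurrence), "regular"
        pairs.setdefault(key, {})[slot] = subfields.get("a")
    return pairs


fields = [
    ("700", {"6": "880-08", "a": "Vasil’ev, Maksim."}),
    ("880", {"6": "700-08", "a": "Васильев, Максим."}),
]

paired = pair_880(fields)
```

Once paired, both forms of the name describe one contributor, which is why they belong in a record for that person rather than in the Group 1 records.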
Issue 2: Beyond FRBR Group 1
Entities
68

If we were to parse this 880 data correctly, we
would need to create and link to two additional
records for Contributor and Subject that include
the alternate scripts
Contributor record (alternate forms from 880):
•Contributor in Cyrillic characters
•Contributor in Roman characters
Subject record (alternate forms from 880):
•Subject in Cyrillic characters
•Subject in Roman characters

Diagram: these two records shown alongside the XC Work, XC Expression, and XC Manifestation records parsed from the MARCXML Bibliographic record.
69

Issue 3: Related Group 1
Entities
Language attribute for a related expression
041 1 ‡a eng ‡h ita
100 0 ‡a Dante Alighieri, ‡d 1265-1321.
240 10 ‡a Divina commedia. ‡l English
245 14 ‡a The divine comedy / ‡c Dante ; a new verse translation by C.H. Sisson.
500 ‡a Translation of: Divina commedia.
Managing Relationships
70

If we were to parse the original language from
041 ‡h, we would need to create and link to
another “based on” expression record (if we
even have enough information to create it)

Diagram: Contributor and Subject records (alternate forms from 880, in Cyrillic and Roman characters) and a “Based on” Expression record (from 041 ‡h), shown alongside the XC Work, XC Expression, and XC Manifestation records parsed from the MARCXML Bibliographic record.
71

What XC has taught us about
FRBR…


The GOOD news: MARC data is very rich,
and contains data about MANY relationships
described in FRBR and related data models

There are hundreds of
RDA relationships
between FRBR
entities!
What XC has taught us about FRBR
72

Maintaining links between separate
FRBR entity records in a production
environment is likely not scalable if we
continue to manipulate records.
Changes to track across linked XC Work, XC Expression, and XC Manifestation records:
•new records
•changed records
•deleted records
•changed relationships
73

What XC has taught us about
FRBR…


The GOOD news: MARC data is very
rich, and contains data about MANY
relationships described in FRBR and related
data models



The BAD news: managing all of these
relationships in a record-based system is
probably not feasible
RDA Implementation Scenario 1
(2007)
74
XC AND LINKED DATA:
OUR “AHA!” MOMENTS!
Our first “Aha! Moment”
76



It would be much easier to
“FRBRize” MARC data using
Linked Data than by creating and
maintaining links between
separate metadata records that
have FRBR-related relationships
to each other!
A Second “Aha” Moment!
77

Creating Linked Data triples that refer to
FRBR entities would be more meaningful
than creating triples that refer to MARC
records
XC handles the interim step of converting
MARC data to FRBR entities
RDF triple
78

Subject: This resource
Predicate: has creator
Object: J. K. Rowling
With and without FRBR
79












Without FRBR:
<MARCBibRecord-number> has_author “J K Rowling”
With FRBR:
<Work-id> has_creator “J K Rowling”
<Expression-id> has_language “English”
<Expression-id> has_parent_work <Work-id>
<Manifestation-id> has_isbn <ISBN-number>
<Manifestation-id> has_parent_expression <Expression-id>
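
The "With FRBR" triples can be tried out as plain Python tuples. The sketch below stores them in a list and follows the parent links from a manifestation up to its work's creator. The identifiers, the ISBN placeholder, and the helper functions are invented for illustration; the predicate names are the ones on the slide.

```python
# Triples from the "With FRBR" example, with placeholder identifiers.
triples = [
    ("work:1", "has_creator", "J K Rowling"),
    ("expression:1", "has_language", "English"),
    ("expression:1", "has_parent_work", "work:1"),
    ("manifestation:1", "has_isbn", "isbn:EXAMPLE"),
    ("manifestation:1", "has_parent_expression", "expression:1"),
]


def objects(subject, predicate):
    """All objects asserted for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def creators_of_manifestation(manifestation_id):
    """Walk manifestation -> expression -> work and collect the work's creators."""
    creators = []
    for expression in objects(manifestation_id, "has_parent_expression"):
        for work in objects(expression, "has_parent_work"):
            creators.extend(objects(work, "has_creator"))
    return creators


found = creators_of_manifestation("manifestation:1")
```

The creator is stated once, on the work, and every expression and manifestation reaches it by traversal instead of carrying its own copy.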
80

Why use FRBR for Linked
Data?






User research shows that users want to see
the relationships between resources, etc.
With XC, we can explore when/how FRBR
might be useful for linked data
Other data models may be more appropriate in
some contexts and those can be explored as
well.
Another not-quite “AHA! Moment”…
81

XC can serve as an interim step to create
Linked Data because XC’s underlying
schema uses elements from registered
element sets (i.e. data elements already
have URIs)
RDF Triple - Registered Data
Elements
82

Subject (This resource): oai:mst.rochester.edu:MST/MARCToXCTransformation/10081
Predicate (has subject): http://www.extensiblecatalog.info/Elements/subject
Object (Poets, American): http://id.loc.gov/authorities/sh85103735#concept
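
Because all three parts of this triple are already URIs, it can be written out directly in a standard RDF syntax. The Python sketch below serializes it as one N-Triples line. The function name and the character check are assumptions for this example, and the sketch handles only triples whose terms are all IRIs (no literals or blank nodes).

```python
# Characters that cannot appear unescaped inside an N-Triples IRI reference.
FORBIDDEN = set('<>"{}|^`\\ ')


def ntriple(subject, predicate, obj):
    """Serialize one triple whose three terms are all IRIs as an N-Triples line."""
    for term in (subject, predicate, obj):
        if any(ch in FORBIDDEN for ch in term):
            raise ValueError(f"not usable as an IRI without escaping: {term!r}")
    return f"<{subject}> <{predicate}> <{obj}> ."


# Identifiers as shown on the slide, with its line breaks removed.
line = ntriple(
    "oai:mst.rochester.edu:MST/MARCToXCTransformation/10081",
    "http://www.extensiblecatalog.info/Elements/subject",
    "http://id.loc.gov/authorities/sh85103735#concept",
)
```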
XC Schema Properties
83

Dublin Core terms (all)
RDA – subset of elements and
role designators
XC elements (newly-defined) –
when necessary
All properties are from
registered element sets and
thus already have URIs

Diagram labels: DC, RDA, XC
84

XC and Linked Data: What’s
Next?
XC facilitates associating metadata with FRBR
Group 1 entities using data elements (mostly
from RDA and Dublin Core)
Implementing FRBR may help us create more
meaningful Linked Data in some situations
How can we make XC actually output Linked
Data?
http://estc.bl.uk/

85
http://estc21.wordpress.com/

86
eXtensible Catalog Architecture
87

New ESTC Interface to be built on Collex software (shown over the Drupal Toolkit User Interface: Search, Browse)

MST Toolkit: Metadata Services
- Cleanup
- Format Convert
OAI Toolkit: ILS Connectivity
- Synchronize data with XC
NCIP Toolkit: ILS Connectivity
- Circ. status
- Account info

Other diagram labels: ILS “Driver”; Digital Repository; ILS; User Interface; Metadata; Live Circ. Data
ESTC Linked Data Benefits
88












Make data available for computational use
Transform data back to MARC for reuse in library
systems
More granularity of data (e.g. date ranges)
Collect new types of information, some not
supported by MARC
Incorporate information from other projects (VIAF)
Make ESTC data more amenable to reuse by
other projects, including discrete bits of data
http://estc21.wordpress.com/data/
LINKED DATA
CHALLENGES (WHY WE
SHOULDN’T CREATE
LINKED DATA?)
“We won’t be able to control our data!”
90
Linked OPEN Data?
91









How much data to make available?
Concerns about jeopardizing future business
models
Can we predict now how much data will be
needed to fulfill future use cases?
Metadata licensing issues
Rights management
How will we assess quality?
92






Provenance: where did this data come from?
Should “triples” become “quadruples” so we
can tell “who said this”?
Is the data accurate?
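
One way to picture the "quadruple" idea is to add a fourth element to each statement that names its source. In the Python sketch below, the source identifiers, the sample statements, and the function name are all invented for illustration.

```python
from collections import defaultdict

# (subject, predicate, object, source): the fourth element records who said it.
quads = [
    ("work:1", "has_creator", "J K Rowling", "source:library-a"),
    ("work:1", "has_creator", "Rowling, J. K.", "source:library-b"),
    ("work:1", "has_language", "English", "source:library-a"),
]


def who_said(subject, predicate, statements):
    """Group the asserted objects for one subject and predicate by source."""
    by_source = defaultdict(list)
    for s, p, o, source in statements:
        if s == subject and p == predicate:
            by_source[source].append(o)
    return dict(by_source)


creator_claims = who_said("work:1", "has_creator", quads)
```

With the source kept alongside each statement, conflicting values can be compared, and data from a source judged unreliable can be filtered out without touching the rest.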
How can we maintain/improve
quality?
93









How to manage data coming from multiple
sources?
What are best practices for improving it?
Can we take advantage of information in
application profiles?
How/when should we aggregate metadata?
What tools will we need?
94

Next Steps: Continue the
discussion!
Thank you!
Additional photo credits:
University of Rochester Photographic Services
www.publicdomainpictures.net/view-image.php?image=54374&picture=runningbulls-12
www.dreamstime.com/stock-photos-group-kids-children-running-image5855523
www.publicdomainpictures.net/view-image.php?image=49200&picture=herd-ofhorses
www.publicdomainpictures.net/view-image.php?image=42311&picture=oceanthrough-window-frame
www.publicdomainpictures.net/view-image.php?image=10217&picture=goldenstar
www.publicdomainpictures.net/view-image.php?image=27317&picture=hand-tools
www.publicdomainpictures.net/view-image.php?image=27274&picture=stair-steps

JENNIFER BOWEN
JBOWEN@LIBRARY.ROCHESTER.EDU

Linked Data: Why Bother?

  • 1.
    LINKED DATA: WHY BOTHER? JENNIFERBOWEN, UNIVERSITY OF ROCHESTER NOTSL MEETING, KENT STATE UNIVERSITY NOVEMBER 22, 2013
  • 2.
    My Topics Today 2     The―Vision‖ piece: Why should libraries care about linked data? A few linked data use cases for libraries Can libraries achieve their metadata-related goals WITHOUT linked data? Lessons learned from developing the eXtensible Catalog and what that has to do with linked data
  • 3.
  • 4.
    XC User Research Partners: CornellUniversity Ohio State University University of Rochester Yale University 4 Studying scholars at the UR…
  • 5.
    Scholars want toread everything on the topic that they are researching 5
  • 6.
    6 They want tobe in the middle of everything they need, all organized so it is findable and usable
  • 7.
    Scholars want theirresearch to be findable and usable by others. 7
  • 8.
    “These other researcherscite MY research…” 8
  • 9.
    Scholars want toconnect to people whose work is interesting and useful to them. 9
  • 10.
    Scholars don’t carewhat the technology is, as long as it helps them do their work 10
  • 11.
    A shift inhow people seek and use information 11    Systems that libraries provide (websites, catalogs, databases) are bypassed …not just in favor of Google and the Web in general …but also in favor of tailored desktop, mobile, and web applications
  • 12.
    Beyond library findingtools 12 ―Even scholars who continue to use library finding tools are turning to new applications to aggregate and analyze information in ways that extend their scholarship beyond what manual searching and analyzing allows.‖ -- Nancy Fried Foster Senior Anthropologist, Ithaka S+R
  • 13.
    Vision for howto address this… 13  Make library resources discoverable on the open web, through applications that potential readers are already using: Search engines Mobile apps Social media
  • 14.
  • 15.
    An example…Mt. HopeCemetery 15 Photo credits: ROCHESTER’S SPEAKING STONES By Th. Emil Homerin; University of Rochester Department of Religion and Classics http://www.rochester.edu/College/REL/faculty/homerin/REL167/reports.htm
  • 16.
    An example…Mt. HopeCemetery 16 Photo credit: www.findagrav.com/cgibin/fg.cgi?page=pv&GRid=31&PIpi=76 016
  • 17.
    17 Photo credits: Universityof Rochester. River Campus Libraries. Department of Rare Books and Special Collections. http://www.lib.rochester.edu/index.cfm?PAGE=4119
  • 18.
    What’s the roleof linked data? 18   Tools like this are possible today with dedicated programming. Linked Data will enable library resources to be included in applications like this by allowing application developers access to a “…a store of machine-actionable data on which improved services can be built”. (Linked Open Data value statement)
  • 19.
    THREE INITIATIVES RELATED TOLINKED DATA AND LIBRARIES
  • 20.
    Stanford Linked DataWorkshop (2011) 20 Linked Open Data Value Statements http://www.clir.org/pubs/reports/pub152/LinkedData Workshop.pdf
  • 21.
    Linked Open DataValue Statements 21      Linked Open Data (LOD) puts information where people are looking for it: on the web LOD can expand discoverability of our content LOD opens opportunities for creative innovation in digital scholarship and participation LOD allows for open continuous improvement of data LOD creates a store of machine-actionable http://www.clir.org/pubs/reports/pub152/LinkedData data on which improved services can be built Workshop.pdf
  • 22.
    More Linked OpenData Value Statements 22   Library LOD might facilitate the breakdown of the tyranny of domain silos LOD can provide direct access to data in ways that are not currently possible, and provides unanticipated benefits that will emerge later as the stores of LOD expand http://www.clir.org/pubs/reports/pub152/LinkedData Workshop.pdf
  • 23.
    Another library linkeddata initiative: BIBFRAME 23 www.loc.gov/bibframe/
  • 24.
    What is BIBFRAME? 24   Libraryof Congress-led effort to replace MARC 21 with a new bibliographic model based upon linked data ―Determine a transition path for the MARC 21 exchange format in order to reap the benefits of newer technology while preserving a robust data exchange that has supported resource sharing and cataloging cost savings in recent decades.‖
  • 25.
    More goals ofLC’s BIBFRAME 25    Differentiate between conceptual content and physical manifestations (works and instances) Focus on unambiguously identifying information entities (e.g. authorities) Leverage and expose relationships between and among entities http://bibframe.org/
  • 26.
    26 Potential Issues with BIBFRAME    Conceptualmodel doesn’t fully conform to either FRBR or RDA (e.g. no ―expression‖ level) – is this a problem? Will organizations that have already implemented linked data use BIBFRAME once it is finished? Do we really need a new serialization of MARC dictated by LC? http://bibframe.org/
  • 27.
    Let’s get alittle more specific… WHAT CAN LIBRARIES ACTUALLY DO WITH LINKED DATA?
  • 28.
    62 Use Casesfor Library Linked Data! ―The mission of the Library Linked Data incubator group is to help increase global interoperability of library data on the Web, by bringing together people involved in Semantic Web activities—focusing on Linked Data—in the library community and beyond, building on existing initiatives, and identifying collaboration tracks for the future.‖ 28
  • 29.
    W3C Library LinkedData (LLD) Incubator Group Use Case Areas 29         Bibliographic data Authority data Vocabulary alignment Archives and heterogeneous data Citations Digital objects Collections Social and new uses Source: Library Linked Data Use Cases
  • 30.
    30 Some Sample UseCases W3C Library Linked Data Incubator Group: http://www.w3.org/2005/Incubator/lld/XGR-lld-usecase-20111025/
  • 31.
    31 Bibliographic Data UseCase: Deduplication and Unification of Library Records     Enable matching not based upon data from a central provider More reference data for matching/deduplication would be available openly for any library to use Non-MARC metadata also need deduplication and unification Using linked data would result in more trusted matches, more opportunities to automate the Source: Library matching process Linked Data Use Cases
  • 32.
    Deduplication/Merging of MetadataWith and Without Linked Data Using Linked Data 32 Record for Resource B: Match Point 1 Match Point 2 Deduping Records Record for Resource A: Match Point 2 Graph for Resource A URI: URI: URI: URI: URI: Graph for Resource B URI: URI: URI: URI: URI: If records match on a designated match point, one record overlays the other or a merge algorithm can keep data from both records Algorithm could look at all URIs representing two resources to determine a ―match‖ and combine all URIs into a single graph
  • 33.
    Authority Data UseCase: Authority Data Enrichment (VIAF) 33     Enrich already existing authority data with additional information from external data sets by linking instead of copying & merging Enables VIAF (Virtual International Authority File) to be expanded with huge amounts of data from all over the world Align different representations of the same real-world resource Linked data allows the usage of remote data in applications Source: Library Linked Data Use Cases
  • 34.
    Vocabulary Alignment UseCase: Vocabulary Merging 34    Users expect to be able to search for subjects using their own language and terms in an unambiguous, contextualized manner. Linked Data technologies could provide the underlying infrastructure by semantic mapping or merging of concepts across vocabularies. Allow vocabularies defined by different sources to organize (classify, index ...) legacy data to be used together Source: Library Linked Data Use Cases
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
    Vocabulary Merging: Rochester AIDSPosters vs. LCSH 39 Arrow [URI for UR’s vocabulary AIDS poster terms] Same as Arrow (Symbol) id.loc.gov/authorities/ subjects/sh20130005 24
  • 40.
    Archives and HeterogeneousData Use Case: Semantic Connections 40   A group of archives would like to better share information about their holdings. They have separate catalogs and these catalogs do not necessarily use the same data formats. Exporting and sharing their data in Linked Data format would allow them to make connections between the collections using topics, names, place names, and other information contained in their metadata. Source: Library Linked Data Use Cases
  • 41.
  • 42.
  • 43.
    UR Local vocabulary:AIDS Prevention 43
  • 44.
    44 Rochester AIDS Postersvs. UCLA AIDS Posters: Semantic Connections AIDS prevention [URI for UR’s vocabulary AIDS prevention] Same as AIDS (Disease)— Prevention [URI for LCSH term]
  • 45.
    Social and NewUses Use Case: Search Engine Optimization 45  Make library data searchable through Web search engines by:  Adopting an architecture that is compatible with web crawling by bots, and  Optimizing the available content so that search engines can process it efficiently  Adding structured metadata (e.g. RDFa) to library online catalogs could increase the visibility and accessibility of their data. Source: Library Linked Data Use Cases
  • 46.
    46 ―…the entire publiclyavailable version of WorldCat is now available for use by intelligent Web crawlers, like Google and Bing, that can make use of this metadata in search indexes and other applications. ‖
  • 47.
  • 48.
    48 Envisioning The Future WithoutLinked Data Or, What we learned from developing eXtensible Catalog (XC) software
  • 49.
    What is XCsoftware? 49 eXtensible Catalog (XC) is open source, user-centered, next generation software for libraries. XC provides a discovery system and a set of tools for libraries to manage metadata and build applications.
  • 50.
    eXtensible Catalog Fundersand Contributors 50 Major Funding Andrew W. Mellon Foundation Major Contributors Consortium of Academic and Research Libraries in Illinois (CARLI) Kyushu University University of Rochester
  • 51.
    Why Did WeBuild XC? 51 Empower libraries to have control over their discovery environment Put results of user research into practice Extremely customizable user interface
  • 52.
    Why Did WeBuild XC? 52     Create a new metadata management platform Implement a FRBR-based record structure Facilitate RDA implementation Repurpose MARC 21 records
  • 53.
    ―FRBRized‖ MARC records 53 ParsingMARCXML records into linked FRBR-based XC Schema records XC Work Work Expressed XC Expression MARCXML Bibliographic ―Uplink‖= Record ID of the parent record created during OAIPMH harvest. Expression Manifested XC Manifestation
  • 54.
    Facilitating RDA Implementation XCtransforms MARC data into a FRBR-informed ―transitional‖ XML schema The ―XC Schema‖ uses a subset of RDA elements and roles alongside Dublin Core, some XC data elements More RDA elements can be added to the schema in the future 54
  • 55.
    Repurposing MARC 21records 55 Converts MARC codes to vocabulary values  Removes extraneous data  Normalizes inconsistencies  Maps most MARC fields/subfields and parse to appropriate FRBR Group 1 entity records 
  • 56.
    56 How XC SoftwareWorks (in a nutshell…)
  • 57.
    How XC softwareworks 57     Harvests a copy of metadata records in an existing repository Processes (cleans up, transforms) those records Makes records available for use in other applications Synchronize records in XC with records in original repositories …it’s all about metadata records!
  • 58.
    eXtensible Catalog Architecture 58 Drupal MST OAI NCIP Toolkit Toolkit Toolkit Toolkit MetadataServices - Cleanup - Format Convert ILS Connectivity Synchronize data with XC User Interface - Search - Browse ILS Connectivity - Circ. status - Account info ILS ―Driver‖ Digital Repository ILS User Interface Metadata Live Circ. Data ILS ―Driver‖
  • 59.
    eXtensible Catalog Architecture. [Diagram: the XC architecture with a callout at the user interface (search, browse) layer: “Insert your application with OAI-PMH harvester / user interface here!” The Drupal Toolkit, MST, OAI Toolkit, and NCIP Toolkit are shown with metadata services (cleanup, format conversion) and ILS connectivity (synchronize data with XC; circulation status, account info), connected through ILS “drivers” to the ILS and the digital repository.]
  • 60.
    What we learned from “FRBRizing” MARC in a live production system: …three issues…
  • 61.
    “FRBRizing” MARC records. Parsing MARCXML records into linked FRBR-based XC Schema records. [Diagram: a MARCXML bibliographic record is parsed into XC Work, XC Expression, and XC Manifestation records; the links between them are labeled “Work Expressed” and “Expression Manifested”.] “Uplink” = record ID of the parent record created during OAI-PMH harvest.
  • 62.
  • 63.
  • 64.
  • 65.
    Issue 1: Managing Relationships. Parses MARCXML records into linked FRBR-based records. How many FRBR entity relationships can we support with XC software? [Diagram: MARCXML bibliographic record parsed into XC Work, XC Expression, and XC Manifestation.] “Uplink” = record ID of the parent record created during OAI-PMH harvest.
  • 66.
    Issue 1: Managing Relationships. MARC bibliographic records can refer to multiple FRBR entities of the same type (analytics that represent multiple works/expressions, e.g. tracks on a CD). [Diagram: one MARCXML bibliographic record and one XC Manifestation linked to three XC Expressions and three XC Works.]
  • 67.
    Issue 2: Beyond FRBR Group 1 Entities. MARC “Alternate Graphic Representation” (880 fields) can contain data that belong in records for Group 2 and Group 3 entities.
    Contributor:
      700 1  ‡6 880-08 ‡a Vasil’ev, Maksim.
      880 1  ‡6 700-08 ‡a Васильев, Максим.
    Subject:
      600 10 ‡6 880-06 ‡a Putin, Vladimir Vladimirovich, ‡d 1952-
      880 10 ‡6 600-06 ‡a Путин, Владимир Владимирович, ‡d 1952-
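The ‡6 linkage in the example above pairs each regular field with its 880 counterpart through the field tag and a two-digit occurrence number. A minimal sketch of that pairing, assuming the fields have already been read into simple (tag, subfields) tuples; the function names and this in-memory representation are hypothetical, not XC's.

```python
def parse_linkage(subfield_6: str):
    """Split a ‡6 value such as '880-08' or '700-08/(N' into (tag, occurrence)."""
    tag, _, rest = subfield_6.partition("-")
    occurrence = rest.split("/")[0]  # ignore any script code after a slash
    return tag, occurrence


def pair_alternate_scripts(fields):
    """Pair each regular field with the 880 field that points back to it.

    `fields` is a list of (tag, subfields) tuples, where subfields is a dict.
    Returns (tag, romanized form, alternate-script form) tuples.
    """
    alternates = {}
    for tag, subfields in fields:
        if tag == "880" and "6" in subfields:
            linked_tag, occurrence = parse_linkage(subfields["6"])
            alternates[(linked_tag, occurrence)] = subfields
    pairs = []
    for tag, subfields in fields:
        if tag != "880" and "6" in subfields:
            _, occurrence = parse_linkage(subfields["6"])
            alternate = alternates.get((tag, occurrence))
            if alternate is not None:
                pairs.append((tag, subfields["a"], alternate["a"]))
    return pairs


fields = [
    ("700", {"6": "880-08", "a": "Vasil’ev, Maksim."}),
    ("880", {"6": "700-08", "a": "Васильев, Максим."}),
    ("600", {"6": "880-06", "a": "Putin, Vladimir Vladimirovich,", "d": "1952-"}),
    ("880", {"6": "600-06", "a": "Путин, Владимир Владимирович,", "d": "1952-"}),
]
pairs = pair_alternate_scripts(fields)
for tag, romanized, original_script in pairs:
    print(tag, romanized, original_script)
```

The pairing itself is simple. The difficulty raised on the slide is where the paired forms should live, since both belong to a contributor or subject entity rather than to the bibliographic record.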
  • 68.
    Issue 2: Beyond FRBR Group 1 Entities. If we were to parse this 880 data correctly, we would need to create and link to two additional records, for Contributor and Subject, that include the alternate scripts. [Diagram: the MARCXML bibliographic record parsed into XC Work, XC Expression, and XC Manifestation, plus a Contributor record (alternate forms from 880: contributor in Cyrillic characters, contributor in Roman characters) and a Subject record (alternate forms from 880: subject in Cyrillic characters, subject in Roman characters).]
  • 69.
    Issue 3: Related Group 1 Entities. Language attribute for a related expression:
      041 1  ‡a eng ‡h ita
      100 0  ‡a Dante Alighieri, ‡d 1265-1321.
      240 10 ‡a Divina commedia. ‡l English
      245 14 ‡a The divine comedy / ‡c Dante ; a new verse translation by C.H. Sisson.
      500    ‡a Translation of: Divina commedia.
  • 70.
    Managing Relationships. If we were to parse the original language from 041 ‡h, we would need to create and link to another “based on” expression record (if we even have enough information to create it). [Diagram: the MARCXML bibliographic record parsed into XC Work, XC Expression, and XC Manifestation, plus the Contributor and Subject records with alternate forms from 880 (Cyrillic and Roman characters), and an additional “Based on (Expression)” record derived from 041 ‡h.]
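To make the 041 ‡h problem concrete, here is a rough sketch of how a “based on” expression stub might be derived. In MARC 21, 041 ‡a carries the language of the text and ‡h the language of the original. The function names and the stub's dictionary layout are hypothetical, and the `complete` flag is just one way to record that the language may be all that is known about the original expression.

```python
def languages_from_041(subfields):
    """Return (languages of this expression, languages of the original).

    `subfields` is a list of (code, value) pairs from one 041 field.
    """
    languages = [value for code, value in subfields if code == "a"]
    originals = [value for code, value in subfields if code == "h"]
    return languages, originals


def based_on_expressions(subfields, work_id):
    """Create stub records for the expression(s) a translation is based on."""
    _, originals = languages_from_041(subfields)
    return [
        {
            "entity": "expression",
            "language": language,
            "work": work_id,          # a translation shares the original's work
            "relationship": "based on",
            "complete": False,        # only the language is known
        }
        for language in originals
    ]


field_041 = [("a", "eng"), ("h", "ita")]
stubs = based_on_expressions(field_041, work_id="work-1")
print(stubs)
```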
  • 71.
    What XC has taught us about FRBR… The GOOD news: MARC data is very rich, and contains data about MANY relationships described in FRBR and related data models. There are hundreds of RDA relationships between FRBR entities!
  • 72.
    What XC has taught us about FRBR. Maintaining links between separate FRBR entity records in a production environment is likely not scalable if we continue to manipulate records. [Diagram: XC Work, XC Expression, and XC Manifestation records affected by new records, changed records, deleted records, and changed relationships.]
  • 73.
    What XC has taught us about FRBR… The GOOD news: MARC data is very rich, and contains data about MANY relationships described in FRBR and related data models. The BAD news: managing all of these relationships in a record-based system is probably not feasible.
  • 74.
  • 75.
    XC AND LINKED DATA: OUR “AHA!” MOMENTS
  • 76.
    Our first “Aha! Moment”. It would be much easier to “FRBRize” MARC data using Linked Data than by creating and maintaining links between separate metadata records that have FRBR-related relationships to each other!
  • 77.
    A Second “Aha” Moment! Creating Linked Data triples that refer to FRBR entities would be more meaningful than creating triples that refer to MARC records. XC handles the interim step of converting MARC data to FRBR entities.
  • 78.
  • 79.
    With and without FRBR.
    Without FRBR:
      <MARCBibRecord-number> has_author “J K Rowling”
    With FRBR:
      <Work-id> has_creator “J K Rowling”
      <Expression-id> has_language “English”
      <Expression-id> has_parent_work <Work-id>
      <Manifestation-id> has_isbn <ISBN-number>
      <Manifestation-id> has_parent_expression <Expression-id>
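The FRBR-based triples above can be held as plain tuples and traversed to answer a question that spans entities, such as “who created the work behind this manifestation?”. This is only a sketch: the identifiers (`work:1` and so on) are hypothetical, the ISBN is left as the slide's placeholder, and a real system would use an RDF store rather than a Python list.

```python
TRIPLES = [
    ("work:1", "has_creator", "J K Rowling"),
    ("expression:1", "has_language", "English"),
    ("expression:1", "has_parent_work", "work:1"),
    ("manifestation:1", "has_isbn", "<ISBN-number>"),  # placeholder, as on the slide
    ("manifestation:1", "has_parent_expression", "expression:1"),
]


def objects(subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def creator_of(manifestation_id):
    """Follow manifestation -> expression -> work, then read the creator."""
    creators = []
    for expression_id in objects(manifestation_id, "has_parent_expression"):
        for work_id in objects(expression_id, "has_parent_work"):
            creators.extend(objects(work_id, "has_creator"))
    return creators


creators = creator_of("manifestation:1")
print(creators)
```

Because the creator is stated once, on the work, every expression and manifestation linked to that work picks it up through the traversal instead of repeating it.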
  • 80.
    Why use FRBR for Linked Data? User research shows that users want to see the relationships between resources, etc. With XC, we can explore when/how FRBR might be useful for linked data. Other data models may be more appropriate in some contexts, and those can be explored as well.
  • 81.
    Another not-quite “AHA! Moment”… XC can serve as an interim step to create Linked Data because XC’s underlying schema uses elements from registered element sets (i.e. data elements already have URIs).
  • 82.
    RDF Triple: Registered Data Elements.
      Subject: oai:mst.rochester.edu:MST/MARCToXCTransformation/10081 (this resource)
      Predicate: http://www.extensiblecatalog.info/Elements/subject (has subject)
      Object: http://id.loc.gov/authorities/sh85103735#concept (Poets, American)
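Written out as a single N-Triples statement, the triple on this slide might look like the output of the sketch below. It assumes the three identifiers read as shown once the spaces introduced by slide extraction are removed, and it handles only the case where the subject, predicate, and object are all URIs (no literals or blank nodes). The `to_ntriple` helper is hypothetical.

```python
def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize a triple of three URIs as one N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Identifiers as read from the slide, with extraction spaces removed (assumed).
SUBJECT = "oai:mst.rochester.edu:MST/MARCToXCTransformation/10081"
PREDICATE = "http://www.extensiblecatalog.info/Elements/subject"
OBJECT = "http://id.loc.gov/authorities/sh85103735#concept"

line = to_ntriple(SUBJECT, PREDICATE, OBJECT)
print(line)
```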
  • 83.
    XC Schema Properties. Dublin Core (DC) terms (all); RDA: subset of elements and role designators; XC elements (newly defined), when necessary. All properties are from registered element sets and thus already have URIs.
  • 84.
    XC and Linked Data: What’s Next? XC facilitates associating metadata with FRBR Group 1 entities using data elements (mostly from RDA and Dublin Core). Implementing FRBR may help us create more meaningful Linked Data in some situations. How can we make XC actually output Linked Data?
  • 85.
  • 86.
  • 87.
    eXtensible Catalog Architecture. [Diagram: the XC architecture with a callout at the user interface (search, browse) layer: “New ESTC User Interface: interface to be built on Collex software”. The Drupal Toolkit, MST, OAI Toolkit, and NCIP Toolkit are shown with metadata services (cleanup, format conversion) and ILS connectivity (synchronize data with XC; circulation status, account info), connected through ILS “drivers” to the ILS and the digital repository.]
  • 88.
    ESTC Linked Data Benefits. Make data available for computational use; transform data back to MARC for reuse in library systems; more granularity of data (e.g. date ranges); collect new types of information, some not supported by MARC; incorporate information from other projects (VIAF); make ESTC data more amenable to reuse by other projects, including discrete bits of data. http://estc21.wordpress.com/data/
  • 89.
    LINKED DATA CHALLENGES (WHY WE SHOULDN’T CREATE LINKED DATA?)
  • 90.
    “We won’t be able to control our data!”
  • 91.
    Linked OPEN Data? How much data to make available? Concerns about jeopardizing future business models. Can we predict now how much data will be needed to fulfill future use cases? Metadata licensing issues. Rights management.
  • 92.
    How will we assess quality? Provenance: where did this data come from? Should “triples” become “quadruples” so we can tell “who said this”? Is the data accurate?
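The “quadruples” idea can be illustrated by adding a fourth element to each statement that names its source, much as named graphs and the N-Quads format do for RDF. The sketch below uses plain tuples; the statements, source names, and helper functions are all hypothetical.

```python
# Each statement is (subject, predicate, object, source).
QUADS = [
    ("work:1", "has_creator", "J K Rowling", "source:library-a"),
    ("work:1", "has_creator", "J K Rowling", "source:aggregator-b"),
    ("work:1", "has_subject", "Wizards", "source:aggregator-b"),
]


def who_said(subject, predicate, obj):
    """Return every source that asserts the given statement."""
    return [g for s, p, o, g in QUADS if (s, p, o) == (subject, predicate, obj)]


def statements_from(source):
    """Return the plain triples contributed by one source."""
    return [(s, p, o) for s, p, o, g in QUADS if g == source]


print(who_said("work:1", "has_creator", "J K Rowling"))
print(statements_from("source:library-a"))
```

With the source kept alongside each statement, data from one contributor can be reviewed, corrected, or withdrawn without touching statements made by others.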
  • 93.
    How can we maintain/improve quality? How to manage data coming from multiple sources? What are best practices for improving it? Can we take advantage of information in application profiles? How/when should we aggregate metadata? What tools will we need?
  • 94.
    Next Steps: Continue the discussion!
  • 95.
    Thank you! Additional photo credits: University of Rochester Photographic Services; www.publicdomainpictures.net/view-image.php?image=54374&picture=runningbulls-12; www.dreamstime.com/stock-photos-group-kids-children-running-image5855523; www.publicdomainpictures.net/view-image.php?image=49200&picture=herd-ofhorses; www.publicdomainpictures.net/view-image.php?image=42311&picture=oceanthrough-window-frame; www.publicdomainpictures.net/view-image.php?image=10217&picture=goldenstar; www.publicdomainpictures.net/view-image.php?image=27317&picture=hand-tools; www.publicdomainpictures.net/view-image.php?image=27274&picture=stair-steps. JENNIFER BOWEN, JBOWEN@LIBRARY.ROCHESTER.EDU