Roy Tennant, Senior Program Officer, OCLC Research
As library collections shift from print materials to digital formats, and as the web enables ubiquitous and instantaneous discovery of information, library users expect to find and access materials online. It’s not enough to have pages “on the web”; library data must be “woven into the web” and integrated into the sites and services that library users frequent daily – Google, Wikipedia, social networks. When information about a library’s collection is locked up behind a specific web site (such as an OPAC), it is often exceedingly difficult for services, such as search engines, to consume that data. Information seekers need to be connected back to their local library resources from wherever they are on the web. The imperative is to make library data available in new data formats that are native to the web, exposing it to the wider web community, making it easily discoverable by other sites, services, and ultimately consumers. Roy Tennant will shed light on what linked data is and how to re-envision, expose and share library data as entities that are part of the web.
Linked Data: What’s the Story?
1. The world’s libraries. Connected.
Roy Tennant
Senior Program Officer, Research
Slides contributed by Bruce Washburn, Richard Wallis, Karen Smith-Yoshimura, Dan Benson, and Mike Teets
2. Photo by Li Chen, https://www.flickr.com/photos/yiie/, CC BY 2.0 Generic License
7. Classic Bib Record
• A collection of statements…
• Taken from the piece itself…
• Sometimes “enhanced” with inferred parentheticals (e.g., [1975])…
• Or additional statements not on the piece (e.g., subject headings)
• Mostly uncontrolled and only loosely connected to anything else
9. Actually, a Number of Problems
• Identification Problems:
  • “The Hamlet Problem” (titles aren’t enough)
  • “The Wang/Li/Zhang Problem” (names aren’t enough)
• Linkage Problems:
  • “The Web Problem” (records aren’t enough, you need links)
  • “The Language Problem” (surfacing the right translation for a given user is still a problem)
• Quality Problems:
  • Strings are not controlled terms; often, they cannot be turned into them
14. entity
/ˈɛntɪti/
noun
a thing with distinct and independent existence.
relationship
/rɪˈleɪʃ(ə)nʃɪp/
noun
the way in which two or more people or things are connected
15. Record
Title: "War and Peace"
Author: "Leo Tolstoy 1828-1910"
ISBN: 0307266931
Entity (http://worldcat.org/entity/work/id/115206288)
Type: Work
Name: "War and Peace"
Author: http://worldcat.org/entity/person/id/1234
Entity (http://worldcat.org/entity/person/id/1234)
Type: Person
Name: "Leo Tolstoy"
Born: 1828
Died: 1910
Birthplace: http://worldcat.org/entity/place/id/8976
Entity (http://worldcat.org/entity/place/id/8976)
Type: Place
Name: "Yasnaya Polyana"
SameAs: http://geonames.org/468686
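The contrast on this slide can be sketched in a few lines of Python. This is a minimal illustration, not OCLC's implementation: it assumes the slide's example URIs and uses a local dictionary in place of dereferencing those URIs over HTTP. The `follow` helper (a hypothetical name) walks from the work to its author and on to the author's birthplace, a path the flat record's single author string cannot offer.

```python
# Illustrative entities from the slide, keyed by URI. A real client would
# dereference each URI over HTTP; here a dict stands in for the web.
ENTITIES = {
    "http://worldcat.org/entity/work/id/115206288": {
        "type": "Work",
        "name": "War and Peace",
        "author": "http://worldcat.org/entity/person/id/1234",
    },
    "http://worldcat.org/entity/person/id/1234": {
        "type": "Person",
        "name": "Leo Tolstoy",
        "born": 1828,
        "died": 1910,
        "birthplace": "http://worldcat.org/entity/place/id/8976",
    },
    "http://worldcat.org/entity/place/id/8976": {
        "type": "Place",
        "name": "Yasnaya Polyana",
        "sameAs": "http://geonames.org/468686",
    },
}


def follow(uri, *properties):
    """Start at one entity and follow a chain of properties.

    Every property except the last must hold the URI of another entity;
    the value of the last property is returned as-is.
    """
    value = uri
    for prop in properties:
        value = ENTITIES[value][prop]
    return value


work = "http://worldcat.org/entity/work/id/115206288"
print(follow(work, "author", "name"))                  # Leo Tolstoy
print(follow(work, "author", "birthplace", "name"))    # Yasnaya Polyana
print(follow(work, "author", "birthplace", "sameAs"))  # http://geonames.org/468686
```

The last hop leaves the library domain entirely and lands on GeoNames, the kind of outward link the record version has no place for.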
17. Entities of Core Concepts
person, place, object, concept, organization, work
18. Establishing Relationships Between Entities
Entities: person, place, object, concept, organization, work
Relationships: author, subject, item, availability
22. Linked Data Entities
OCLC Production Services
External OCLC Research Systems
Internal OCLC Research Resources
enhanced WorldCat, WORKS, Kindred Works, Classify, Identities, FictionFinder, Cookbook Finder, LCSH, FAST, VIAF, GMGPC, GSAFD, GTT, DDC, LCTGM, MeSH
27. Benefits for all library workflows
Cataloging
Integration with the web
Cascading updates
More options
Intuitive searching
28. Improving Discovery
The Name of the Rose
Summary: The year is 1327. Franciscans in a wealthy Italian abbey are suspected of heresy, and Brother William of Baskerville arrives to investigate. His delicate mission is suddenly overshadowed by seven bizarre deaths that take place in seven days and nights of apocalyptic terror.
Subjects: Monastic libraries -- Italy -- Fiction | Semiotics -- Fiction
Borrowing Options: eBooks | Printed Books | Audio Books
Other Languages
37. Improving Cataloging
• Improve data quality
• Cascading updates
• A new approach to cataloging:
  • Point and click cataloging
  • Managing entities instead of managing records
• Consistent with RDA
Photo by https://www.flickr.com/photos/97741188@N04/, CC BY 2.0
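“Cascading updates” and “managing entities instead of managing records” can be illustrated with a small Python sketch. The record layout, the `Person` class, and the revised heading are all hypothetical; the point is only that records pointing to a shared entity see a correction made once, while records holding their own copies of a string do not.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """A shared person entity; records refer to it instead of copying it."""
    name: str


# String-based records: each one holds its own copy of the heading.
string_records = [
    {"title": "War and Peace", "author": "Tolstoy, Leo, 1828-1910"},
    {"title": "Anna Karenina", "author": "Tolstoy, Leo, 1828-1910"},
]

# Entity-based records: each one holds a reference to the same Person object.
tolstoy = Person(name="Tolstoy, Leo, 1828-1910")
entity_records = [
    {"title": "War and Peace", "author": tolstoy},
    {"title": "Anna Karenina", "author": tolstoy},
]

NEW_NAME = "Tolstoy, Lev Nikolaevich, 1828-1910"  # hypothetical revised heading

# Entity approach: one update, visible from every record that links to it.
tolstoy.name = NEW_NAME
print([r["author"].name for r in entity_records])

# String approach: the old heading survives until each record is edited.
print([r["author"] for r in string_records])
```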
38.
Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf: 西遊記 (Chinese, 1592)
Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation: Journey to the West (English, 1977)
39.
Title: Journey to the West
Language: English
Translator: Anthony C. Yu
Date: 1977
IsTranslationOf: 西遊記 (Chinese, 1592)
Title: Journey to the West
Language: English
Translator: W. J. F. Jenner
Date: 1982-1984
IsTranslationOf: 西遊記 (Chinese, 1592)
Title: 西遊記
Language: Chinese
Author: 吳承恩
Created: 1592
HasTranslation: Journey to the West (English, 1977); Journey to the West (English, 1982-1984); Tây du ký bình khảo (Vietnamese, 1980); 西遊記 (Japanese, 1986); Pilgerfahrt (German, 1983)
Title: Tây du ký bình khảo
Language: Vietnamese
Translator: Phan Quân
Date: 1980
IsTranslationOf: 西遊記 (Chinese, 1592)
Title: 西遊記
Language: Japanese
Translator: 中野美代子
Date: 1986
IsTranslationOf: 西遊記 (Chinese, 1592)
Title: Pilgerfahrt
Language: German
Translator: Georgette Boner
Date: 1983
IsTranslationOf: 西遊記 (Chinese, 1592)
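These translation links also address the “Language Problem” from slide 9. Below is a minimal Python sketch under stated assumptions: the edition IDs are hypothetical, only IsTranslationOf is stored, and HasTranslation is derived as its inverse. Given a reader's preferred languages, `best_edition` (a hypothetical helper) moves from any edition to its original and then picks the first edition in a language the reader wants.

```python
from collections import defaultdict

# Editions from the slides, keyed by hypothetical IDs (language code + date).
EDITIONS = {
    "zh-1592": "Chinese",
    "en-1977": "English",   # Anthony C. Yu
    "en-1982": "English",   # W. J. F. Jenner
    "vi-1980": "Vietnamese",
    "ja-1986": "Japanese",
    "de-1983": "German",
}

# Only IsTranslationOf is asserted: every translation points at the original.
IS_TRANSLATION_OF = {eid: "zh-1592" for eid in EDITIONS if eid != "zh-1592"}

# HasTranslation is derived as the inverse, so the two cannot disagree.
HAS_TRANSLATION: dict[str, list[str]] = defaultdict(list)
for translation, original in IS_TRANSLATION_OF.items():
    HAS_TRANSLATION[original].append(translation)


def best_edition(edition_id: str, preferred_languages: list[str]) -> str:
    """Return the edition of the same work in the first preferred language available."""
    original = IS_TRANSLATION_OF.get(edition_id, edition_id)
    candidates = [original] + HAS_TRANSLATION.get(original, [])
    for language in preferred_languages:
        for candidate in candidates:
            if EDITIONS[candidate] == language:
                return candidate
    return original  # no preferred language available: fall back to the original


print(best_edition("ja-1986", ["German", "English"]))  # de-1983
print(best_edition("en-1982", ["French"]))             # zh-1592
```

Deriving the inverse from a single set of assertions keeps IsTranslationOf and HasTranslation from drifting apart as translations are added.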
44. It’s Not the Record, It’s the Linkage
• Requires a new kind of thinking: “sets of assertions about something” NOT a “record”
• Assertions about an item can be dynamic
• Authority control is built-in
• Requires much more than simply translating a record from MARC to a new format
46. Thank You!
Roy Tennant
roytennant.com
@rtennant
facebook.com/roytennant/
tennantr@oclc.org
Editor's Notes
Two common words… John Rock. Two very different people. The web now understands!
We at OCLC are attacking the problem from two separate directions. These directions may converge in the middle, or may stay separate.
Identify methods that will operate on all text under management
Improve the coverage and quality of formally modeled data
But there is a technology that allows both to coexist: triplestores.
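To suggest what “both can coexist” might look like, here is a toy in-memory triple store in Python. It is an illustration only (real triplestores add indexing and a query language such as SPARQL), and the predicate names are hypothetical. A plain string carried over from a record and a link to a formally modeled entity sit side by side as statements about the same subject and are queried the same way.

```python
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """A toy triple store: a set of (subject, predicate, object) statements."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
work = "http://worldcat.org/entity/work/id/115206288"
# A formally modeled link to another entity...
store.add(work, "author", "http://worldcat.org/entity/person/id/1234")
# ...alongside a plain string kept from the original record (hypothetical predicate).
store.add(work, "statementOfResponsibility", "Leo Tolstoy")
print(store.match(s=work))
```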
In a record view, you get bits of data [evidence] about an item.
Related entities provide descriptions of real things, enabling you to follow the relationships between them, both inside and beyond the closed world of libraries.
We have internal processes that take MARC records serialized as XML and convert them into RDF data using an XSLT stylesheet. This graphic visualizes how we take a MARC record with a 100 field (Personal Name: Robert Pirsig), a 245 field (Title: Zen and the art of motorcycle maintenance), and a 650 field (Subject: Fathers and sons) and translate the string values into RDF entities and values. Besides shredding the MARC record into entities, the other crucial part of this process is clustering WorldCat manifestation-level records into Work clusters. The resulting image shows how we are exposing the OCLC Works data right now. With a known Work ID, you can look up OCLC Works as well as content-negotiate for various RDF serializations (N-Triples, JSON-LD, RDF/XML, and Turtle).
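Content negotiation for a Work can be sketched with Python's standard library. This builds, but does not send, one request per serialization named above, using the example Work URI from slide 15. The media types are the registered ones for these formats; whether that endpoint still answers is an assumption this sketch does not test, and `negotiation_request` is a hypothetical helper name.

```python
import urllib.request

# Registered media types for the four RDF serializations mentioned above.
SERIALIZATIONS = {
    "N-Triples": "application/n-triples",
    "JSON-LD": "application/ld+json",
    "RDF/XML": "application/rdf+xml",
    "Turtle": "text/turtle",
}


def negotiation_request(uri: str, serialization: str) -> urllib.request.Request:
    """Build a GET request asking the server for one RDF serialization of `uri`."""
    return urllib.request.Request(
        uri,
        headers={"Accept": SERIALIZATIONS[serialization]},
        method="GET",
    )


work_uri = "http://worldcat.org/entity/work/id/115206288"  # example ID from slide 15
for name in SERIALIZATIONS:
    req = negotiation_request(work_uri, name)
    print(name, "->", req.get_header("Accept"))

# Sending is omitted; against a live endpoint you would pass `req` to
# urllib.request.urlopen() and check the Content-Type of the response.
```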
At OCLC, we bring our large-scale data ingest and processing techniques (Big Data technology) to bear on:
Matching incoming data with what we have
Identifying the entities and associating their role attributes
Works: so far not very visible in libraries, but important on the web
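Clustering manifestation records into Works can be approximated, very naively, by grouping records on a normalized author/title key. This Python sketch is a stand-in resting on stated assumptions, not OCLC's Works algorithm: the `work_key` normalization rules and the sample records are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def work_key(author: str, title: str) -> tuple[str, str]:
    """Normalize author and title into a crude matching key.

    Strips accents, punctuation, and case, drops a leading English article
    from the title, and drops any dates at the end of the author heading.
    """
    def norm(text: str) -> str:
        text = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in text if not unicodedata.combining(ch))
        text = re.sub(r"[^\w\s]", " ", text.lower())
        return " ".join(text.split())

    author_norm = re.sub(r"\d.*$", "", norm(author)).strip()
    title_norm = re.sub(r"^(the|a|an) ", "", norm(title))
    return author_norm, title_norm


# Hypothetical manifestation-level records.
records = [
    {"id": "rec-1", "author": "Tolstoy, Leo, 1828-1910", "title": "War and Peace"},
    {"id": "rec-2", "author": "Tolstoy, Leo", "title": "War and peace."},
    {"id": "rec-3", "author": "Tolstoy, Leo, 1828-1910", "title": "Anna Karenina"},
    {"id": "rec-4", "author": "Eco, Umberto", "title": "The Name of the Rose"},
    {"id": "rec-5", "author": "Eco, Umberto.", "title": "Name of the rose"},
]

clusters: dict[tuple[str, str], list[str]] = defaultdict(list)
for rec in records:
    clusters[work_key(rec["author"], rec["title"])].append(rec["id"])

for key, ids in clusters.items():
    print(key, ids)
```

A key like this is exactly what “The Hamlet Problem” warns about: it would merge distinct works that share an author and title, and it cannot tell that a translation belongs to the same work, which is why explicit links such as IsTranslationOf matter.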
So how do we get from documents to entities?
One solution is an entity-relationship model of the domain of resources managed by libraries. We get there by shredding documents into a set of statements in a simplified form of ‘English’ (or another human language) that a computer can understand. This diagram shows that a bibliographic record about Hamlet can be represented as a set of statements such as ‘William Shakespeare is a person’, ‘Hamlet is a work’, ‘William Shakespeare is the author of Hamlet’, and so on.
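Shredding a record into simple statements can be sketched in Python as follows. The field names and the `FIELD_RULES` mapping are hypothetical simplifications; a production pipeline (such as the MARC-to-RDF process described above) would mint an identifier for each entity rather than reusing name strings.

```python
# Hypothetical rules: for each record field, the type of entity its value
# names and how that entity relates to the work.
FIELD_RULES = {
    "author": ("person", "is the author of"),
    "subject": ("concept", "is a subject of"),
}


def shred(record: dict[str, str]) -> list[tuple[str, str, str]]:
    """Shred a flat record into (subject, predicate, object) statements."""
    work = record["title"]
    statements = [(work, "is a", "work")]
    for field, value in record.items():
        if field in FIELD_RULES:
            entity_type, relation = FIELD_RULES[field]
            statements.append((value, "is a", entity_type))
            statements.append((value, relation, work))
    return statements


record = {"title": "Hamlet", "author": "William Shakespeare"}
for s, p, o in shred(record):
    print(f"{s} {p} {o}.")
# Hamlet is a work.
# William Shakespeare is a person.
# William Shakespeare is the author of Hamlet.
```

Because the subjects here are still plain strings, two different people with the same name would collapse into one node; replacing the strings with identifiers is what addresses “The Wang/Li/Zhang Problem”.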
Entity-relationship models like this one have been created for decades; students learn how to build them in CS 101. Using one is an admission that natural-language processing isn't going to get us where we want to be (not fast enough, anyway).
The linked data paradigm just adds some new requirements:
--Use published vocabularies; identify as many links as possible; define entities and relationships that are real, that is, things in the world that regular people care about, not software objects, database records, or program states that are understood only by me or my project team.
--So this is a big deal. LD is about shared semantics. It rests on a theory of reference that claims meaning is not just something inside a person’s head, but something much more public. Many philosophers have written about that.
Statistics as of August 2014
Karen Smith-Yoshimura’s (KSY) survey results, September 2014 (72 respondents)
OCLC’s RDF datasets are among the oldest, largest, and most referenced LD stores in the ‘library’ sector of the linked data cloud.
There is international interest in LD
There are many emerging datasets, including WorldCat Works
But:
The cloud is still organizing itself.
Many small datasets, relatively little cross-linking from:
The broader web to the library community
The library community to the broader web
The library community to itself
The survey will be repeated this summer
Data to underpin innovation! - A person knowledge card in a prototype WorldCat Discovery interface
This is an example of where it grows up and just becomes useful.
VIAF helps!
Visitor numbers grow as the graph is exposed.
This is a fast-moving story and a difficult subject to nail down in a book-length work.
But it’s a living document.