1. MARC and Beyond
Our Three Linked Data Choices
Richard Wallis
Evangelist and Founder
Data Liberate
richard.wallis@dataliberate.com
@rjw
IFLA WLIC 2018
Kuala Lumpar
August 26th 2018
2. Independent Consultant, Evangelist & Founder
Worked With:
•Google – Schema.org vocabulary, site, extensions, documentation and community
•OCLC – Global library cooperative
•FIBO – Financial Industry Business Ontology Group
•Various Clients – Implementing/understanding Schema.org
British Library — Stanford University — Europeana — NLB Singapore
W3C Community Groups:
•Schema Bib Extend (Chair) - Bibliographic data
•Schema Architypes (Chair) - Archives
•Financial Industry Business Ontology – fibo.schema.org
•Tourism Structured Web Data (Co-Chair)
•Schema Course Extension
•Schema IoT Community
•Educational & Occupational Credentials in Schema.org
richard.wallis@dataliberate.com — @rjw
40+ Years – Computing
28+ Years – Cultural Heritage technology
13+ Years – Semantic Web & Linked Data
3. • 1989 The Web Conceived
• 1994 1st
Web-based library discovery interface
• 2001 Semantic Web Introduced
• 2006 Linked Data
• 2011 British Library Data Model
• 2011 Schema.org Launched
• 2012 OCLC add Schema.org to WorldCat.org
• 2012 Library of Congress launch BIBFRAME Project
A bit of Linked Data HistoryA bit of Linked Data HistoryLinked Data
Tim Berners-Lee
4. • 1989 The Web Conceived
• 1994 1st
Web-based library discovery interface
• 2001 Semantic Web Introduced
• 2006 Linked Data
• 2011 British Library Data Model
• 2011 Schema.org Launched
• 2012 OCLC add Schema.org to WorldCat.org
• 2012 Library of Congress launch BIBFRAME Project
A bit of Linked Data HistoryA bit of Linked Data History
5. • Lots of great individual linked data initiatives
(interoperability between them a question)
• Bibframe 2.0 – ready for wider standardized adoption
• Schema.org – the de facto structured data standard for the web
(30% domains using it)
• Potential for adding web links to MARC
(Linky MARC)
Library Linked DataLibrary Linked Data
…… where are we now?where are we now?
8. Prototyping a Linked Data Platform
for Production Cataloging Workflows
A project vision statement
Work with our members through a foundational shift
in the collaborative work of libraries, communities of
practice, and end-users—dramatically improving
efficiency, embracing the inclusive, diverse, and earnest
OCLC membership, and empowering a new and
trusted knowledge network enabled by the web.
Andrew K. Pace
Executive Director, Technical Research
9. #ALAAC18
Phase I Partners (Dec ’17 - Apr ‘18)
• Cornell University
• University of California, Davis
Who Phase II Partners (!!!!) (May ‘18 – Sep ‘18)
– American University
– Brigham Young University
– Cleveland Public Library
– Harvard University
– Michigan State University
– National Library of Medicine
– North Carolina State University
– Northwestern University
– Princeton University
– Smithsonian Library
– Temple University
– University of Minnesota
– University of New Hampshire
– Yale University
OCLC
Global
Technolo-
gies
OCLC
Research
OCLC Global
Product
Management
10. #ALAAC18
What
• Develop an Entity Ecosystem that facilitates:
o Creation and editing of new entities
o Connecting entities to the Web
• Build a community of users who can:
o Create/Curate data in the ecosystem
o Imagine/propose workflow uses
o Communicate easily with each other and with OCLC to iteratively improve the
prototype
• Provide services to:
o Reconcile data
o Explore the data
11. RECONCILER
INDEX
RECONCILIATION
API
BATCH
Local Bibliographic and
Authority Data
RANKING BY
EDITOR
UI
DUPLICATE
DETECTION
WORLDCAT
CREATIVE WORK
ASSOCIATION
ENTITY
ECOSYSTEM
UI
MINTING / EDITING
API
AUTHENTICATION &
AUTHORIZATION
ENTITY to ENTITY
RELATOR
External
Client Applications
External
Client Applications
12. Entity Ecosystem
• Entity Creation / Edit / Reconciliation
• Research Prototype – but working with OCLC Production and OCLC Members
• Building on Open Source tools
• Learning and applying lessons
• Vocabulary independent
• Potential to become a useful tool/service plus an authoritative knowledge graph
13. What are my Library Linked Data
Options?
BIBFRAME 2.0 Schema.org
Linky MARC Do nothing
Who Wants to be a Linked Data Library
14. BIBFRAME 2.0
• Matured from 1.0 (and variations)
• Not perfect but ‘good enough’ – continuing to develop
• MARC Conversion specifications
• Open source MARC Conversion software
• Good basis for standardized Linked Data (RDF) interchange
• Backed by Library of Congress - Supported by others
• Introduces potential for entity-based cataloguing
15. BIBFRAME 2.0
• A Library vocabulary/ontology/standard
• Not recognized outside of library and
associated organisations
• No real use for increasing visibility &
discoverability on the general web
16. • De facto structured web data standard vocabulary
• Has bibliographic extension
• Shared by embedding in normal page HTML
(No special data endpoint required)
• Already found in 30% of web
• Requested by Google and others
• Used to populate Search Engine Knowledge Graphs
• Acknowledged by Google to influence indexing
• Driving Semantic Search, Voice Search, Local Search, etc.
• Not detailed/specific enough for library cataloguing, etc.
Schema.org
17. • Adding http URIs to $0 & $1 MARC subfields
• Recommendation of the PCC Task Group on URIs in MARC
• Consistent approach to include links to entities such as
people, organisations, etc. (authorities)
• Not Linked Data – but a way to preserve identified entity URIs
for the future and possibly use them in user interfaces etc.
Linky MARC
18. Do nothing
• Wait for system suppliers to catch up
• Need to keep aware of developments
19. What are my Library Linked Data
Options?
BIBFRAME 2.0 Schema.org
Linky MARC Do nothing
Who Wants to be a Linked Data Library
BIBFRAME 2.0 Schema.org
20. MARC and Beyond
Our Three Linked Data Choices
Richard Wallis
Evangelist and Founder
Data Liberate
richard.wallis@dataliberate.com
@rjw
IFLA WLIC 2018
Kuala Lumpar
August 26th 2018
Editor's Notes
My topic for today – Linked Data
Let’s start with a brief bit of history
1989 The Web Conceived
Tim Berners Lee introduces “Information Management: A Proposal” to his boss – eventual response was “Vague but interesting”
1994 1st Web-based library discovery interface
Cooperation between Loughborough University & BLCMP (UK Library Cooperative) – I had pleasure to be part of
2001 Semantic Web Introduced
By Tim Berners Lee & others in now infamous Scientific American Article
2006 Linked Data
Sir Tim introduces a pragmatic application of Semantic Web standards
2011 British Library Data Model
A seminal project that delivered a native Linked Data Model for library meta data
2011 Schema.org Launched
Google, Bing, Yahoo introduce a standard vocabulary for web data markup
2012 OCLC add Schema.org to WorldCat.org
Schema.org structured data embedded in all 300 million+ WorldCat pages – still live today
2012 Library of Congress launch BIBFRAME Project
A Linked Data replacement/future for MARC
As you can see it has been a long and lively history for linked data.
I have been promoting the benefits for well over a decade now.
So for libraries…
Library Linked Data – where are we now?
Over the years there have been many great Linked Data projects – from British, French, German, Spanish and other national libraries, open source systems, large public libraries, Europeana etc. – all great projects but one has to question how data interchange will scale between them.
Bibframe, after a bit of an uneven start, has evolved into Bibframe 2.0 – ready for wider adoption
Schema.org – grown in capability – has bibliographic extension (which I helped introduce) – suitable for sharing library data with the world - widely adopted across many domains (on approx. 30%) - starting to be adopted by the SEO community,
Linky MARC - my term for proposals for adding URLs to MARC records describing major entities such as people, places, organisations, Works, subjects, etc.
So the obvious question is – why do we want linked data
Here are a few reasons I hear
Picking out three of these highlights what I like to call the The Challenge of Linked Data Entity Reconciliation
When working with entities it becomes obvious to users when we match or duplicate entity descriptions wrongly.
A problem with naming that we have been grappling with for years, exacerbated by entity properties and a wide range of matching authorities.
I have identified a project from OCLC Research that serves to address the workflows around such issues.
They are working with OCLC Members to prototype workflows that will empower a new and trusted knowledge network enabled by the web.
To help me describe this, my former colleague Andrew Pace has kindly lent me some slides from his recent presentation at at ALA in New Orleans.
The initial phase was in cooperation with two OCLC members and, critically, two front line areas of OCLC – this is a project heading toward productized delivery.
The second phase has added 14 other significant US members
They are building an Entity Ecosystem to enable the creation, relating, management and reconciliation of entities of all relevant types – people, organisations, places, concepts, events, etc.
It is being built using open source technologies such as MediaWiki, WikiBase, and OpenRefine.
It addresses issues such as relating entities, as against traditional practices of copy cataloguing; and the linking of entities to external authorities such as Wikidata and Geonames etc.
I’m not going to go into detail, but it is fairly clear from this diagram that there are two main functional areas focused on editing and reconciliation are built around a shared entity ecosystem – each with their own focused API and user interfaces.
In summary OCLC are working on:
Entity Creation / Edit / Reconciliation
Research Prototype – but working with OCLC Production and OCLC Members
Building on Open Source tools
Learning and applying lessons
Vocabulary independent
Potential to become a useful tool/service plus an authoritative knowledge graph
Such a service can be vocabulary independent, but individual libraries and system suppliers need to decide what linked data vocabularies they are going to support in their systems….
For libraries with Linked Data ambitions the $64,000 question is which option do I take?
BIBFRAME 2.0
Schema.org
Linky MARC
Do nothing
Let’s review the options
Bibframe 2.0:
Matured from 1.0 (and variations)
Not perfect but ‘good enough’ – continuing to develop
MARC Conversion specifications
Open source MARC Conversion software
Good basis for standardized Linked Data (RDF) interchange
Backed by Library of Congress - Supported by others
Introduces potential for entity-based cataloguing
However….
It is a library specific standard - not recognized by organisations outside of the library world, such as the search engines.
If you are looking to increase the visibility & discoverability of you resources across the web – as my mother would have put it Bibframe is about as much use as a chocolate teapot!
De facto structured web data standard vocabulary
Has bibliographic extension
Shared by embedding in normal page HTML(No special data endpoint required)
Already found in 30% of web
Requested by Google and others
Used to populate Search Engine Knowledge Graphs
Acknowledged by Google to influence indexing
Driving Semantic Search, Voice Search, Local Search, etc.
Not detailed/specific enough for library cataloguing, etc.
So, you would not use Schema.org to build a comprehensive library system around.
As a library, you could wait for the system suppliers to catch up and then implement what they deliver.
I would however suggest that you do not ignore developments in this area, engaging staff in understanding the basic principles and watching what the community and system suppliers are doing.
So what options should we take?
Schema.org – for discovery on and from the web
BIBFRAME 2.0 – for entity-based cataloguing – library linked data interchange – enhanced local user experience
You really need to do both, but as libraries are primarily there to assist their users, most of which do not start discovery in a library interface, Schema.org should provide the most initial benefit.
Linky MARC – if you systems do not yet support these Linked Data options, look into adding this to your cataloguing practices to stop throwing away those links you increasingly discover.
Doing nothing is not an ideal option for libraries – if you are a system supplier it is definitely not an option.