Presentation given by Gill Hamilton, National Library of Scotland, at the OCLC seminar "Is there a library shaped black hole in the web?" on 16 October 2015 at the Royal College of Surgeons, Edinburgh.
Gill's presentation explains the experiments undertaken at the Library into linked open data. She suggests several practical tips to help libraries prepare for linked open data, including: recording URIs, not dumbing down your metadata, concentrating on your unique collections, openly licensing your metadata, using open vocabularies, and demanding better systems to manage linked data components and requirements.
1. National Library of Scotland
Leabharlann Nàiseanta na h-Alba
deus ex machina
is there a library shaped hole in the web?
16 October 2015
is linked data the answer?
Gill Hamilton
Digital Access Manager
2. National Library of Scotland
• about the Library and me
• our modest experiments
• Top Tips for preparing for LOD
• is linked data the answer?
I’ll be looking at ….
3. National Library of Scotland
National Library of Scotland by Ross G Strachan
4. National Library of Scotland
All images CC-BY National Library of Scotland
5. National Library of Scotland
All images CC-BY National Library of Scotland
The Way Forward
PDF
6. National Library of Scotland
Image CC-BY Gill Hamilton
7. National Library of Scotland
• if I learned one thing it is …
• dabbling in the DOD
• The 3Rs
our experiments
8. National Library of Scotland
LOD?
Oh yeah I geddit!
Errr… no I don’t geddit
Ah that’s right …
Errr…
Images CC-BY Gill Hamilton
9. National Library of Scotland
What the hell is?
SKOS RDF
triple Store URI
SPARQL OWL
TURTLE XSLT
Images CC-BY Gill Hamilton and God divides heaven … Public Domain
10. National Library of Scotland
oh no!
Oh No!
OH NO!
HELP!
Images CC-BY Gill Hamilton and God divides heaven … Public Domain
11. National Library of Scotland
Och!
Dinnae fash
yersel!
Images CC-BY Gill Hamilton and God divides heaven … Public Domain
12. National Library of Scotland
you and I know it already …
we call it interoperability
ye olden days
BM rules
AACR
MARC
collaboration
RDA
LOD
CLOSED
OPEN
Open vocs
13. National Library of Scotland
dabbling in the DOD
http://www.math.uh.edu/~tomforde/images/UniverseAndMan.jpg
All images CC-BY National Library of Scotland
14. National Library of Scotland
stop ... start ... stop ... start ... stop ...
• went to school to learn RDF
• tried mapping our data to DC Terms
• tried to discover URIs
• created RDF for a single resource
• published the DOD element set
15. National Library of Scotland
All images CC-BY National Library of Scotland
16. National Library of Scotland
RDA (and FRBR)
RIMMF
RDF
The 3Rs
17. National Library of Scotland
Images CC-BY Gill Hamilton
18. National Library of Scotland
1. strings and things
2. be smart, not dumb
3. uniqueness
….. and some more
19. National Library of Scotland
Top Tip 1
we have strings
“Hamilton, Gill”
but we need things too
we need URIs
THE BIG
ISSUE
20. National Library of Scotland
machines are really stupid
Main catalogue: Hamilton, Gill W.
Archive catalogue: Hamilton, Gillian W.
Moving image catalogue: Hamilton, G. W.
you are really really smart
22. National Library of Scotland
record just 1 more thing
the URI
c’est tout!
stop digging
23. National Library of Scotland
and …
you might
be lucky
Fingers Crossed by ~dgies CC-BY-NC-ND
24. National Library of Scotland
Get others to help
We think this cathedral’s in Cambrai
Or do you think Cambrai is here?
Do you think Cambrai is here?
25. National Library of Scotland
Top Tip 2
27. National Library of Scotland
DOD data structure (local & closed) -> XSLT -> LOD in DC (global & open)
DOD.title -> DC:title
DOD.keyword -> DC:subject
DOD.who -> DC:creator
28. National Library of Scotland
Image CC-BY Gill Hamilton
29. National Library of Scotland
DOD data structure (local & closed) -> DOD in LOD (global & open)
DOD.title
DOD.keyword
DOD.who
MAPPING -> DC, RDA, SCHEMA
30. National Library of Scotland
National Library of Scotland’s DOD RDF element set at Open Metadata Registry
31. National Library of Scotland
Top Tip 3
32. National Library of Scotland
concentrate on the unique
Order to Capt. Campbell by Maj. Duncanson
You are hereby ordered to fall upon the rebells, the McDonalds of Glencoe, and put all to the sword under seventy.
33. National Library of Scotland
uniqueness is about
making the BEST use of limited resources
making the BEST contribution
making BEST metadata for LOD
34. National Library of Scotland
1. record URIs
2. don’t dumb the data
3. work on unique stuff
….. and some more
35. National Library of Scotland
the other tips
36. National Library of Scotland
CC ZERO
openly licence your metadata
37. National Library of Scotland
LCNAF
LCSH
TGMI
AAT
TGN
DDC … soon?
http://id.loc.gov/
http://www.getty.edu/research/tools/vocabularies/
use open vocabularies
Library of Congress by The Agency CC-BY-SA
OCLC Building by Matkatamiba CC-BY-SA
Getty Institute by Pbjamesphoto CC-BY-SA
38. National Library of Scotland
DEMAND better systems
Uncle Sam by James Montgomery Flagg - Public domain
39. National Library of Scotland
So is linked data
the answer?
40. National Library of Scotland
Well maybe …
it’s risky!
not convincing
but it’s worth the risk
41. National Library of Scotland
Gill Hamilton
Digital Access Manager
g.hamilton@nls.uk
NationalLibraryofScotland
@natlibscot
thank you
Editor's Notes
National Library of Scotland is Scotland’s Legal Deposit Library. Its main building on George IV Bridge, Edinburgh houses our General and Special collections reading room, exhibitions and café. Our Causewayside Building in Edinburgh houses the map reading room. There are 2 other administration buildings in Edinburgh. The Library’s Moving Image Archive is in Hillington. In late 2016 the Library will open its first public premises in Glasgow at Kelvin Hall. The centre will make available the moving image and digital collections.
National Library of Scotland’s collections are vast and varied. 2 million maps, 7 million manuscripts, more than 4 million books, more than 30,000 films and videos. Our fastest growing collections are our digital collections, which already run to about 9 million items. Most of this is legal deposit, though a growing amount comes from digitisation. We have 12,000 digitised books or other resources available, with several thousand more in processing.
Dr John Scally, National Librarian, launched the Library’s ambitious new strategy in September 2015. It has 6 priorities, including guardianship of the collection, promoting research and developing the physical library, but most important and exciting for me is the priority that commits the Library to describing all its collections in the next 10 years and giving digital access to a third of its collection. It’s a great challenge for us and will deliver a radically different kind of National Library of Scotland.
And me… I’m the digital access manager. I oversee the access to the digital collections, lead on resource discovery and library management systems. As part of the new strategy I just moved to a new team and will be working hard with my colleagues to develop the systems and processes that will allow us to deliver the strategy. It’s exciting times!
I am by no means an expert in linked data. I guess I’m an expert in metadata, not from the cataloguing point of view but more from the processing point of view.
Oh and in my spare time I shout at the TV when Scotland play rugby and do things like cycle across America.
Our experiments with linked data – just a little bit of background about what we’ve been doing.
I should stress these are only experiments; we have nothing in production.
So when I started looking into LOD I would understand it, get a sense of it and then lose it.
Then understand it again and lose it again.
It seems this getting it and not getting it is quite common.
And then the jargon. The technologies. It all looks so scary and freaked me out.
OH NO!
But what I learned is that it’s actually very familiar and you don’t need to overly fret about LOD. Just think about it another way ….
If I learned one thing from experimenting, it is that LOD isn’t new and is actually very familiar to us as librarians. When you strip away the bamboozling chitter chatter, the technology and the threats of the end of the Library, you see that it’s a continuum of interoperability, but this time we move from the local library domain to a true global domain. LOD lets you move out of the traditional library sphere and reach from local to global.
In the beginning libraries were completely closed. Monks looked after them, books were chained to the wall.
Then we see developments like the BM Cataloguing rules and libraries start to describe things more consistently.
Then AACR and the English-speaking world describing things in the same way.
Then MARC, which allows libraries to exchange data efficiently with each other for the first time.
MARC leads to an era of collaboration between libraries, allowing national libraries like NLS to participate in national shared cataloguing programmes and the development of global authority files like LCSH and LCNAF.
Then we see vocabularies being published openly with RDF representations (an RDF representation means it’s LOD-ready). Vocabularies such as the Getty Thesaurus of Geographic Names and the Library of Congress Name Authority File.
And now we have the new content description standard RDA (Resource Description and Access) which has an RDF representation
And we have linked open data, which allows us to continue to interoperate but in a new and wider way. It lets us reach out to the global graph. It takes us beyond the library and bibliographic domain.
The continuum of interoperability has moved us from a closed environment to an open environment, enabling us to focus on the local and participate and link to the global
DOD = digital object database. It’s a database developed by the National Library of Scotland that manages our digitised content. It has more than 15 million records and represents digitised books, maps, broadsides, photos, posters and manuscripts.
It’s a nice place to start to learn LOD coz it’s home grown and very familiar, has a very well considered structure, and the resources are consistently well described using AACR and a range of standard and open vocabularies such as TGN, AAT, TGMI, LCSH and LCNAF. It is also very easy to export nicely structured XML from the database coz it’s SQL Server.
We exported as XML a very small section of data, the photos of the construction of the Forth Bridge, and worked at transforming the XML into RDF.
So we took a W3C schools course to learn RDF
We mapped our data to the RDF representation of Dublin Core
We tried to discover URIs for the vocabularies that we used using a tool called Google Refine
We made RDF for a single resource
And we openly published the DOD element set in an open registry.
And here’s an RDF graph of one of the Forth Bridge construction records.
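To give a feel for what "making RDF for a single resource" involves, here is a minimal sketch in Python. It is not the Library's code or data: the record, its identifier and the example.org URIs are made up for illustration, and only the Dublin Core Terms namespace is real. It writes each field of a record as one N-Triples line, using a URI where we have a "thing" and a literal where we only have a "string".

```python
# Minimal sketch: turn one made-up record into N-Triples using Dublin Core Terms.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def record_to_ntriples(subject_uri, record):
    """Return one N-Triples line per field of the record.

    Values that start with http:// or https:// are written as URIs (things);
    everything else is written as a plain literal (strings).
    """
    lines = []
    for field, value in record.items():
        predicate = f"<{DCTERMS}{field}>"
        if value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            obj = f'"{escape_literal(value)}"'
        lines.append(f"<{subject_uri}> {predicate} {obj} .")
    return lines


# Hypothetical record: the subject URI and both values are placeholders.
record = {
    "title": "Forth Bridge under construction",
    "subject": "http://example.org/vocab/bridges",
}
triples = record_to_ntriples("http://example.org/dod/12345", record)
print("\n".join(triples))
```

The point of the sketch is only that a triple is subject, predicate, object, and that the object can be either a string or a URI.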
Over the summer we’ve done some experimenting and research into RDA and RDF.
We were interested in learning more about RDA, not the faux RDA we use in MARC which just seems to be a couple of new fields about content and carrier but real pure RDA. Cataloguing from scratch in RDA, learning all about the benefits of FRBR.
To do this we used RIMMF which is a training tool that lets you work in pure RDA and interact with common vocabularies like LCNAF and then output RDF (and if you must MARC). From RIMMF you can output RDA WEMI in RDF. I’ll tell you a secret, you can also output it as MARC!
RDA has an RDF representation, which basically means it’s linked data. The RDA elements are registered as URIs in the Open Metadata Registry, which means anyone and anything (machines) can view and use them and see the semantics.
Some screenshots of the work we did with RDA and RIMMF
So from our experiments I have 3 tips for you to consider if you want to prepare yourself for linked data
From all our experiments the biggest issue is this
The data we have in the Library, and that you all probably have, is recorded as “strings”. Human-readable strings. Pieces of text.
Linked data is about linking things together using URIs so machines can process them. So we need to gather the URIs.
Now you might think that a machine could do the matching of strings for you but machines are really really stoopid
So consider this – this is a real example from the library.
We have several databases and people are sometimes described differently in each database.
Let’s say we want to find the URI in LCNAF for me. (well actually I’m not in LCNAF)
Do you think this is the same person?
Probably! And if not you’d get on the phone and ask. Then you’d find out that I’m mortified by my middle name Wendy
But machines can’t do that. Machines don’t know that these are the same person. You’d have to program a lot of regular expression work and checking other data and still you couldn’t be confident that a machine would get it right.
So machines can’t be trusted to go off and look for URIs. You’ll get multiple hits and false drops.
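A small sketch of why this is hard, using the three headings from the slide. The two matching rules and the fourth heading ("Hamilton, George W.") are made up for illustration; they are not anything we run at the Library. A strict rule treats the three forms of my name as three people, and a loose rule merges them but also pulls in a stranger.

```python
import re


def normalise(heading):
    """Lower-case a heading and drop everything except letters and spaces."""
    return re.sub(r"[^a-z ]", "", heading.lower()).strip()


def surname_and_initial(heading):
    """Reduce 'Surname, Forenames' to (surname, first initial)."""
    surname, _, forenames = heading.partition(",")
    return (surname.strip().lower(), forenames.strip()[:1].lower())


headings = ["Hamilton, Gill W.", "Hamilton, Gillian W.", "Hamilton, G. W."]
stranger = "Hamilton, George W."  # made-up heading for a different person

strict_keys = {normalise(h) for h in headings}
loose_keys = {surname_and_initial(h) for h in headings}

print(len(strict_keys))  # strict matching sees three different people
print(len(loose_keys))   # loose matching merges them into one
print(surname_and_initial(stranger) in loose_keys)  # and matches the stranger too
```

Either way a human still has to check the result, which is the argument for recording the URI at the point of cataloguing.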
Here’s another one ….. Horses.
You’ve described horses in your database and you want a machine to try and find URIs for that?
Not a chance!
It’s hard for machines
It’s about language
It’s about ambiguity
It’s simple for humans; machines have no intelligence
So my tip is …...
In your database stop recording only the string
When you record Gill Hamilton record the URI too.
When you write the word “horse” into your database record its URI too from LCSH.
And look in your authority file and see “pseudo URIs” like the LCSH and LCNAF id numbers – these are, in fact, usually the URI.
You can’t always be sure of this tho, especially with things like subject headings due to their complexity, but it’s probably OK for names.
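As a sketch of the "pseudo URI" idea: the id.loc.gov service publishes name and subject authorities under URIs that end in the record number, so a stored number can usually be turned into a URI by removing the spaces and adding a prefix. The function name and the record numbers below are placeholders, not real authority records, and you would still want to check that each URI resolves to the heading you expect.

```python
# Base URIs used by the Library of Congress linked data service.
LOC_BASES = {
    "names": "http://id.loc.gov/authorities/names/",
    "subjects": "http://id.loc.gov/authorities/subjects/",
}


def loc_uri(identifier, scheme):
    """Build an id.loc.gov URI from an authority record number.

    Record numbers are often stored with padding spaces (for example
    'n  12345678 '); the URI form has no spaces and is lower case.
    """
    compact = "".join(identifier.split()).lower()
    return LOC_BASES[scheme] + compact


# Placeholder numbers, not real records.
print(loc_uri("n  12345678 ", "names"))
print(loc_uri("sh 12345678", "subjects"))
```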
Get others to help you find URIs
You can crowdsource this – send the machine off to find Cambrai and it tells you there are 2 Cambrais. Well, ask people what they think is the correct Cambrai. The crowd will tell you. It’s like “Ask the audience”.
Also, we had colleagues at work do this for us. Students found URIs for LCSH and DDC for us for a subset.
Or if you have colleagues who work on reception desks, where their work is stop start, they can look up URIs in between dealing with visitors and forwarding phone calls.
We learned this for DOD.
Very well considered database structure.
We clearly understand the semantics of our database.
When we say “keyword” we know what that means.
When we say “Who” we know what that means.
We exported XML from the DOD and mapped it using an XSLT to an RDF representation of Dublin Core
We were so very pleased to make some RDF.
But we really really hated that we had to squeeze our data into Dublin Core, and that doing this caused us to lose data and semantics.
We didn’t like that we were losing semantics
We spend a lot of money on smart people and we don’t want to lose the meaning as we transform to other formats
In the DOD, DOD:who is much much richer than DC:creator. In DOD we can indicate roles such as Author, depicted, is collector, is subject of. This is lost as we transform to DC
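Here is a rough sketch of that loss. We did the real mapping with XSLT; Python stands in for it here, and the XML element names, the role attribute and the people are all invented for illustration rather than taken from the DOD export. Once every "who" becomes a DC creator, the roles have nowhere to go.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: element names, the role attribute and values are made up.
DOD_XML = """
<record>
  <title>Forth Bridge under construction</title>
  <who role="depicted">Person A</who>
  <who role="is collector">Person B</who>
  <keyword>Bridges</keyword>
</record>
"""

# Flat element-to-element mapping, as in the slide.
DOD_TO_DC = {"title": "dc:title", "keyword": "dc:subject", "who": "dc:creator"}


def to_dublin_core(xml_text):
    """Map each known child element to a (DC element, text) pair."""
    root = ET.fromstring(xml_text.strip())
    return [(DOD_TO_DC[el.tag], el.text) for el in root if el.tag in DOD_TO_DC]


dc = to_dublin_core(DOD_XML)
roles = [el.get("role") for el in ET.fromstring(DOD_XML.strip()) if el.tag == "who"]

print(dc)     # both people are now just dc:creator
print(roles)  # the roles exist in the source but not in the DC output
```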
So we explored a different approach, actually a more open approach.
We published the structure of the DOD as an RDF representation.
Converted it into LOD
It means everyone can see how you structure your data. It is open.
You don’t need to compromise your data. It’s in its original format.
And then you can write mappings in to other formats
You can have your cake and eat it!
To do this you need some kind of registry to record and publish your element set. It’s actually quite straightforward.
In terms of linked data you don’t need to worry about the bibliographic universe. Someone else will sort that, the publishers or the national libraries. What you should concentrate on are your unique collections. Describe them and describe them well. You probably already do that, or are thinking of doing it.
When you have limited resources only focus on your unique collections, invest your time there. For example the published output of the UK available as LOD – that’s a problem for national libraries. BL have started doing this by publishing LOD for BNB. Invest your effort on what is unique. Perhaps that photo collection. Perhaps those local history pamphlets. Perhaps those manuscripts
Best contribution. You will then make the best contribution. You won’t be replicating anything else that is being done and you can be satisfied that it is a valuable contribution. You add to the linked data universe, you don’t duplicate it.
BEST metadata – what we’re thinking about in the Library is that, coz of our strategy, we will most likely digitise a lot of unique material such as our manuscripts and archives. To do that we need to touch them and describe them. As we describe them we can record URIs from open vocabularies. So we improve access to the collections, both through traditional and linked metadata and by presenting a digital version of the original. WIN WIN WIN
So to recap
We were interested in learning more about RDA, not the faux RDA we use in MARC but real pure RDA.
RDA has an RDF representation, which basically means it’s linked data. The RDA elements are registered as URIs in the Open Metadata Registry, which means anyone and anything (machines) can view and use them and see the semantics.
RIMMF is a training tool that lets you work in pure RDA and interact with common vocabularies like LCNAF and then output RDF (and if you must MARC)
You could publish that as a document, but then you would have to issue updates every time something changed. A registry has much better functionality if the RDF is likely to change, which RDA does.
It’s the O in open.
To link you need to publish openly coz others will use and re-use your metadata for the purposes of linking.
If you’re nervous about this, remember:
Metadata is an advert for the resource, it isn’t the resource. Your digital object can be licensed another way
You don’t need to publish all of your metadata as CC-0. Perhaps you have curated info that you want to retain the intellectual property over. Just don’t include it with the metadata that is CC-0
So you might not be able to do anything to make your metadata linked, but others will do it for you. For example, give your metadata for digital resources to Europeana and they will turn it into linked open data to power Europeana. The data you’re sending to OCLC is being turned into linked data.
Use open vocabularies. The big library ones are DDC, LCNAF, LCSH and TGMI. The others are TGN, AAT, and perhaps very specific vocabularies for your collection focus.
If you have a specialised local voc then consider publishing it and mapping to other open vocabularies
Demand better systems that can use modern content standards such as RDA
That can help you manage URIs (creation, deprecation)
Help you publish LOD and make RDF representations
Services that LOD relies on are all in beta. DDC is down, LC was down for maintenance, we don’t have systems to manage URIs. It’s all in beta.
It’s difficult convincing management to make even a modest investment coz the benefit is difficult to demonstrate.
We’re going that way anyway …. It’s a continuum.