These five ‘stars’ were developed by Tim Berners-Lee to describe his vision for a web of data.
A fuller explanation of five star linked data (more than can be fit on a coffee cup
XML has its uses, and will continue to be useful in a number of ways. But it does not support linked data very well, and we can’t use it as the sole basis for moving forward.
RDF is the language of the Web—it requires very different ways of thinking for us to manage.
This is the human-friendly view, but the text-basis is not very helpful for machines.
This is the machine-friendly view, all URIs except for the date.
This is the scariest and least well understood part of linked data. But it’s essential to understand and accommodate this view. It’s why the notion of a master record and de-duping is often beside the point.
The Library of Congress Linked data service, LC Name Authority File. This view is optimized for human readers, using text. At the end of the page you can see the list of other formats available by machine.
This is the MADS/RDF machine-friendlier version of the previous human-friendly page. Note that the full name information and the date range correlate to the MARC authority heading, although the name and date information is separated (much as MARC already does with $d). Interestingly, the additional info with the red highlights is defined as FOAF Person information, which records the actually birthdate and death date as YYYYMMDD, just the way computers like it. Note that her 239th birthday just passed.
This is the OCLC Work record for Pride and Prejudice. Sorry it’s impossible to see well on a slide. But I wanted to show how much data there is, in order to look closely later to see who’s data is being used or re-used. Note especially that there seems to be very little BibFrame or RDA data being used. Also note that the information on the ‘schema:creator’ includes fairly complete information on Jane Austen, as well as a link to an authority record.
A portion of the data on the previous page. Still not easy to see, but the JA info under her schema creator ‘box’ does not specifically identify her as an author—instead she is a ‘creator’ in schema.org terms. Keep in mind that the description is about the work, not about the author. This is quite different from traditional library practice, as well as most LOD practices.
This is a sample showing the Microdata serialization for a schema.org person description. Note that for the most part, schema.org info is expected to be embedded in the resource to aid the search engines as they seek to discover resources on the web. This has important implications for maintenance of the data.
This view is of a map of the area in England that includes Bath, with the selected metadata about Bath in the white box.
This is a larger view of that white box, called a map key by Geonames. Links and icons point out to other data or representations.
Here is some of the coded detail from Geonames, showing alternate names in the data and the language source for each alternate. A local ‘gn’ namespace is used for the data.
Note that the ‘gn’ local set, including features, types of features, countryCodes (probably based on ISO codes). But the lat/long info uses a different namespace specifically for that data (there are two formats available). The red arrows point out links to wikipedia and dbpedia.
This is a screenshot of an Expression record in RIMMF (the “E” on the top tells you this, as does the color of the frame). The yellow oval shows the links to the work and the manifestation, and below that are the primary actors. The red arrows show navigation—on the left a button to invoke the work; on the right a button to view the R-tree for the entire stack.
This is a screen shot from RIMMF (RDA in Many Metadata Formats) for the movie ‘Clueless’. This view is called an ‘R-Tree’ and is intended to show the structure of a specific resource. Each of the numbers in the middle column is an identifier for a RIMMF description. The parts of the description are based on FRBR packages: Work, Expression, Manifestation (RIMMF doesn’t handle Items at this time). One issue: all of these identifiers are local to a particular RIMMF user.
This is the work record for the resource. Note the ‘W’ at the top, and the top level ‘Work’ label on the record template. Jane Austen comes in because ‘Clueless’ was adapted from Jane Austen’s ‘Emma’. The RDA Toolkit reference links are in the third column—subscribers can go directly to the applicable instruction.
This is the manifestation record for Clueless. The RIMMF system creates the “Composite Key” as an assist to the cataloger—it’s an analog to what catalogers have done with authorized access points, but with a different function.
This is the RIMMF person description, including birth and death dates and information on where she lived and died. Also are gathered other people associated with her via her works.
This is a screenshot of the RDA Registry showing information about the ‘is author of’ property, showing its relationships—to ‘is creator of’ and the same property in the unconstrained set.
This is the site where RIMMF info can be examined and downloaded. Topics are expanding, and this is a good place to get a glimpse of RDA info.
Here are the rdf statements in the triplestore deveoped for the Jane-athon data. Note the categorization of statements to those where the selected statement id is either subject or object. The triplestore is definitely a work in progress, and again, it’s usefulness is limited by the fact that the ids are local to RIMMF.
Not intended for a user display (unless the user is a tech savvy cataloger!)
Lavender group in upper right is Dublin Core Peach groups are RDA, some constrained (e, m) some unconstrained (u) Darker peach on bottom level: a variety of schemas: BIBO, UniMARC, MARC 21
The dotted lines indicate mapping relationships that could be made but are not yet formally available. The solid lines indicate existing mapping relationships.
Library administrators are reading (?) but too often denying the many studies that indicate that (especially younger) users are NOT USING CATALOGS, in any form (including mobile-enabled ones), and this is, in effect, the writing on the wall. The very real costs of retraining and trying to enable the newer ways of providing data outside the silo can’t be compared with doing things the same old way—that path is not viable.
Choosing one ‘solution’ is not more expensive—the standards themselves are generally available for free (except the RDA Toolkit, at present).
Eric Hellman is the founder and president of Gluejar, a company dedicated to helping authors and publishers sustainably release free ebooks under Creative Commons licenses.
Moving to an open world
What is Linked Open Data?
”… a term used to describe a recommended best practice for exposing,
sharing, and connecting pieces of data, information, and knowledge on the
Semantic Web using URIs and RDF."
NETSL, Worcester (Apr. 2015) 2
Five Linked Data
0 Make your stuff available on the Web (whatever
format) under an open license
0 Make it available as structured data (e.g. spreadsheet
instead of an image scan of a table)
0 Use non-proprietary formats (e.g. CSV instead of
0 Use URIs to denote things, so that people can point at
0 Link your data to other data to provide context
NETSL, Worcester (Apr. 2015) 3
Why Linked Open Data?
0 The current Web is primarily a Web of DOCUMENTS,
where URLs embedded in documents link to other
documents. The Semantic Web is a Web of DATA that is
built and maintained outside of documents, and focuses on
meaning or semantics
0 RDF is a general-purpose framework that provides
structured, machine-understandable metadata for the Web
0 RDF Schemas (RDFS) and/or Web Ontology Language
(OWL) describe the meaning of each class or property
0 Metadata vocabularies can be developed and/or extended
for LOD use without central coordination
NETSL, Worcester (Apr. 2015) 4
Model of ‘the World’ /XML
0XML assumes a 'closed' world (domain), usually
defined by a schema:
0"We know all of the data describing this
resource. The single description must be a valid
document according to our schema. The data
must be valid.”
0XML's document model provides a neat
equivalence to a metadata 'record’
NETSL, Worcester (Apr. 2015) 5
Model of ‘the World’ /RDF
0 RDF assumes an 'open' world:
0 "There's an infinite amount of unknown data
describing this resource yet to be discovered. It will
come from an infinite number of providers. There
will be an infinite number of descriptions. Those
descriptions must be consistent."
0RDF's statement-oriented data model has
no notion of 'record’ (rather, statements
can be aggregated for a fuller description
of a resource)
NETSL, Worcester (Apr. 2015) 6
Subject Predicate Object
is author of
has place of residence, etc.
has date of publication
NETSL, Worcester (Apr. 2015) 7
Where do the identifiers in that
picture come from?
0 LC NAF (Jane Austen)
0 RDA Registry (isAuthorOf)
0 Worldcat (Work identifier)
0 Bath, UK (Geonames)
0 Date is a ‘literal’, with no identifier (but could be
explicitly ‘typed’ as a date)
NETSL, Worcester (Apr. 2015) 9
Warning: Linked Data is
0 Requires creating and aggregating data in a broader
0 There is no one ‘correct’ record to be made from this data, no
objective ‘truth’, just multiple points of view
0 This approach is different from the cataloging tradition
0 BUT, the focus on vocabularies is familiar
0 Linked data relies on the RDF model (although XML can be
used to express RDF, it’s not always a happy marriage)
0 The bottom-up chaos and uncertainty of the linked data
world is possibly the hardest thing for catalogers to get
their heads around
NETSL, Worcester (Apr. 2015) 10
Let’s Look at Data
Keeping in mind the limitations of the catalog card above!
NETSL, Worcester (Apr. 2015) 11
While it’s great to see how others are
building and sharing open data (and
how much of it could be reused), how
can libraries and communities of
Disclaimer: I’m a member of the RDA
Development Team (sponsor of the
recent Jane-athon), so I’ll show you
how that data was built
NETSL, Worcester (Apr. 2015) 21
-There’s not just one way to
align bibliographic data: we
don’t have to agree on one
aimed at use by particular
applications, seen as
primarily a network activity
-Crosswalks only recognize
one relationship: sameAs—a
very blunt instrument!
NETSL, Worcester (Apr. 2015) 30
What We Mean by ‘Mapping’
NETSL, Worcester (Apr. 2015) 31
Will This Shift Cost Too Much?
0 Our current investments have reached the end of their usefulness
0 All the possible efficiencies for traditional cataloging have already
been accomplished (and the goalposts have shifted considerably)
0 We need to support efforts to invest in more distributed
innovation and focused collaboration—sharing the costs
0 It’s the human effort that costs us most
0 Cost of traditional cataloging is far too high, for increasingly
0 Quality is not tied to any particular creation strategy
0 Human created metadata can be extremely variable
0 Machine-created metadata is far more consistent, but that
consistency may not be correct
NETSL, Worcester (Apr. 2015) 32
0 Our big investment is (and has always been) in our data,
not our systems
0 Looking at new descriptive standards, we should recognize
that libraries still have multiple needs
0 Beware of experts who promote a ‘one standard’ solution
0 For data creation—likely to need richer data for your
institution or community and another for external
0 Distribution can include multiple streams with choices made
by downstream user
0 There are multiple paths to Linked Open Data; putting all your
eggs in one basket will make your institution more vulnerable,
NETSL, Worcester (Apr. 2015)
Notes on Planning
0 Keep in mind that ‘crosswalks’ are outdated (and
provide only one relationship: sameAs)
0 Semantic maps are flexible, shareable, and use a variety
0 Support for building and sharing semantic maps is being
0 As you plan: talk to systems people about data flow;
your ILS may last for a while longer if you position it
as part of your data flow, not your only data flow
0 Investing in staff education is money well spent—
before decisions are made, not after
NETSL, Worcester (Apr. 2015) 34
NETSL, Worcester (Apr. 2015) 35
“In the big picture, very little will change: libraries
will need to be in the data business to help people
find things. In the close-up view, everything is
changing-- the materials and players are different, the
machines are different, and the technologies can do
things that were hard to imagine even 20 years ago.”
0 Rick Anderson, The Crisis in Research Librarianship, Journal of
Academic Librarianship, July 2011
0 Introducing Linked Data and The Semantic Web
0 Free Your Metadata http://freeyourmetadata.org/
0 Linked Open Data Laundromat http://lodlaundromat.org
0 Linked data for libraries, archives and museums : how to clean,
link and publish your metadata. Chicago : Neal-Schuman, 2014
0 Linked data: what cataloguers need to know
NETSL, Worcester (Apr. 2015) 36
NETSL, Worcester (Apr. 2015) 37
The First MetadataMobile
(now driving MetadataMobile 2.0)