What do libraries need to do in order to begin joining the web of linked data? Are semantic tools sufficiently mature? The first step is for libraries to expose their data as RDF: convert existing data, and create new data using RDF publishing tools.
One huge advantage that libraries have is that we already own a lot of heavily structured data. A number of crosswalks and converters already exist to port structured library data to RDF. Many are freely available online.
Two tools can convert existing MARC records to RDF documents; both require a bit of fiddling to make them run properly:
Ross Singer's MARC2RDF Modeler (written in Ruby): http://github.com/rsinger/marc2rdf-modeler – I've tried this and it worked.
The SIMILE MARC/MODS RDFizer (written in Java): http://simile.mit.edu/wiki/MARC/MODS_RDFizer
The SIMILE OAI-PMH RDFizer: http://simile.mit.edu/wiki/OAI-PMH_RDFizer – I've also had success using this one.
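As a rough sketch of the kind of output these converters produce, here is a minimal, purely illustrative Python snippet. The record fields, the example.org URI, and the helper function are all invented for this example; real converters handle the full MARC tag set. It serializes a few fields as N-Triples using Dublin Core properties:

```python
# Hypothetical sketch: emitting Dublin Core N-Triples from an
# already-parsed, MARC-like record (fields and URI invented).
record = {
    "title": "Linked Data: Evolving the Web into a Global Data Space",
    "creator": "Heath, Tom",
    "date": "2011",
}

DC = "http://purl.org/dc/elements/1.1/"
subject_uri = "http://example.org/catalogue/record/12345"  # invented URI

def to_ntriples(subject, fields):
    """Serialize simple literal fields as N-Triples lines using DC Elements."""
    lines = []
    for prop, value in fields.items():
        # Escape backslashes and quotes per N-Triples literal syntax.
        escaped = value.replace('\\', '\\\\').replace('"', '\\"')
        lines.append(f'<{subject}> <{DC}{prop}> "{escaped}" .')
    return "\n".join(lines)

print(to_ntriples(subject_uri, record))
```

Each output line is one triple: the record URI as subject, a Dublin Core property as predicate, and the field value as a literal object.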
Another big advantage is that Dublin Core, a commonly used library schema, is one of the most heavily used vocabularies in RDF documents (http://dublincore.org/documents/dc-rdf/):
DC Elements: xmlns:dc="http://purl.org/dc/elements/1.1/"
DC Terms: xmlns:dcterms="http://purl.org/dc/terms/"
DC Abstract Model: xmlns:dcam="http://purl.org/dc/dcam/"
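As a sketch of how those namespaces appear in practice (the record URI and element values are invented), a minimal RDF/XML description mixing DC Elements and DC Terms might look like:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/catalogue/record/12345">
    <dc:title>A Sample Record</dc:title>
    <dc:creator>Smith, Jane</dc:creator>
    <dcterms:issued>2010</dcterms:issued>
  </rdf:Description>
</rdf:RDF>
```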
RDFizers for formats of all kinds can be found at MIT's SIMILE project website: http://simile.mit.edu/wiki/RDFizers
Old formats will need to be converted, but ideally new information will be published directly as RDF and enhanced with semantic data as it is created. Many popular web publishing tools now have RDF and semantic capabilities.
Many libraries host blogging platforms, and a number of tools exist to make blogging more semantic. Zemanta is one example. As you blog, Zemanta performs on-the-fly term extraction, disambiguates terms by examining the surrounding context, and suggests appropriate enrichment material.
As you select linked and related resources, Zemanta inserts them into your blog code. Using a protocol called Common Tag, Zemanta can insert RDFa snippets into your code, linking the entities in your blog to RDF and OWL vocabularies, and to Zemanta's own RDF vocabulary. Currently Zemanta can insert Common Tags on only a limited number of blog platforms: Movable Type, TypePad, and Drupal (from the Zemanta FAQ).
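A rough sketch of the kind of RDFa snippet such a tool might insert follows. The DBpedia URI is real, but the markup is a simplified, hypothetical illustration of the Common Tag pattern (a ctag:Tag with a ctag:means resource and a ctag:label), not Zemanta's actual output:

```html
<div xmlns:ctag="http://commontag.org/ns#" about="" rel="ctag:tagged">
  <span typeof="ctag:Tag" rel="ctag:means"
        resource="http://dbpedia.org/resource/Semantic_Web"
        property="ctag:label" content="Semantic Web">
    Semantic Web
  </span>
</div>
```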
Wikis are another type of publishing tool that we frequently find in academic libraries. You've probably heard of MediaWiki, the software that famously powers Wikipedia; Semantic MediaWiki, an extension capable of exposing wiki data as RDF, is also being developed. It uses forms to collect information, so that data is structured as it is entered.
Looks like a normal wiki post. http://smwdemo.ontoprise.com/index.php/User:Lisagoddard
But it produces RDF-structured data. The RDF feed link exposes my personal profile using the FOAF and vCard vocabularies.
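A FOAF profile of that kind boils down to a few triples; this hand-written Turtle sketch (the URIs and mailbox are invented) shows the general shape:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Invented URIs for illustration only:
<http://example.org/people/lgoddard#me>
    a foaf:Person ;
    foaf:name "Lisa Goddard" ;
    foaf:mbox <mailto:lgoddard@example.org> ;
    foaf:knows <http://example.org/people/gbyrne#me> .
```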
It can also produce structured event information in iCal format.
Many libraries use the Drupal content management system as the backend of the library website. The current stable version, Drupal 6, does not include core support for RDF, but several contributed modules help to produce RDF output, including evoc, which allows you to import external vocabularies into Drupal and expose those classes and properties to other Drupal modules for reuse. http://drupal.org/project/rdf
In Drupal 7, the ability to map the data structure to RDF and expose it as RDFa will be ported to core. This means that site developers will have many options for linking data if they wish to import various schemas; but even if the site manager has no knowledge of RDF, Drupal 7 sites will expose common elements like title, author, and date as RDFa. Here we can see a project blog created with Drupal 7 Beta by DERI, a semantic web research organization.
http://srvgal65.deri.ie/projectblogs/
http://openspring.net/blog/2010/01/12/rdfa-in-drupal-7-last-call-for-feedback-before-alpha-release
Using the VisiNav RDF browser we can query the blog data in new ways. The VisiNav data browser is able to read the relationships in the RDF markup, allowing us to execute queries like: "show me everyone whom Manfred Hauswirth knows, then show me everything that has been published by those people."
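In SPARQL terms that kind of question might look something like the sketch below. The FOAF and Dublin Core property names are standard, but the query shape and variable names are mine, not VisiNav's internals:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>

# Everyone Manfred Hauswirth knows, and everything those people created.
SELECT ?friend ?publication
WHERE {
  ?person foaf:name "Manfred Hauswirth" ;
          foaf:knows ?friend .
  ?publication dc:creator ?friend .
}
```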
Blogs, wikis, and Drupal are generic tools for creating web content, but what about tools that are specific to libraries?
RDF support for DSpace is in development. Here we see an MIT example of a DSpace implementation that can generate an RDF representation of the record for objects stored in the repository.
http://dspace-test.mit.edu/handle/1721.1/39126
http://dspace-test.mit.edu/metadata/handle/1721.1/39126/rdf.xml
Project SIMILE (Semantic Interoperability of Metadata and Information in unLike Environments) aims to leverage and extend DSpace.
http://simile.mit.edu/
http://www.fedora-commons.org/confluence/display/DSPACE/Basic+RDF+implementation
Fedora is natively semantic: it has an integrated RDF triple store called Mulgara.
http://www.fedora-commons.org/download/2.1.1/userdocs/digitalobjects/introRelsExt.html explains how relationships are encoded in Fedora.
A record from Oxford University's Fedora-based repository:
http://ora.ouls.ox.ac.uk/objects/uuid%3A96efaceb-fcc8-46db-94c9-c2e0bd99d34d
RDF output options at the record level - from Oxford’s Fedora-based repositoryhttp://ora.ouls.ox.ac.uk/objects/uuid%3A96efaceb-fcc8-46db-94c9-c2e0bd99d34d
EPrints is a very commonly deployed institutional repository platform.
http://eprints.ecs.soton.ac.uk/7970/
http://eprints.ecs.soton.ac.uk/cgi/export_redirect?eprintid=7970&format=RDFXML
We can see in this example from the University of Southampton that EPrints offers export as RDF/XML, N3, N-Triples, and OAI-ORE (RDF).
http://eprints.ecs.soton.ac.uk/7970/
http://eprints.ecs.soton.ac.uk/cgi/export_redirect?eprintid=7970&format=RDFXML
After looking around, I was surprised by the number of RDF publishing tools that are already available. Librarians need to become aware of the existing tools, experiment with and help develop them, and begin to demand more semantically capable tools from our vendors.
The first step towards the semantic web vision as articulated by the W3C is for organizations to publish their data as RDF, but linked data is more than just RDF output.
Linking hubs are distributed nodes on the internet that host and expose many ontologies and large amounts of RDF data drawn from different organizations. Use existing vocabularies like FOAF and DC rather than trying to model all of the concepts yourself; it is common practice to mix terms from different vocabularies.
Before you begin to mint URIs for resources and concepts, check whether any already exist. It is better to use existing URIs like those exposed by DBpedia or Geonames, as these are already interlinked with many other sources, so you are linking your data to an existing rich data set. The most valuable RDF links are those that connect a resource to external data published by other data sources, because they link up different islands of data into a web. Technically, such an external RDF link is an RDF triple with a subject URI from one data source and an object URI from another. So what kinds of sites might become large providers of URIs for library-related resources and concepts?
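For example, a local authority record could be linked out to existing hubs like this. The local example.org URI is invented; the DBpedia and VIAF URIs follow those sites' real patterns (the VIAF identifier is the Freud record discussed below):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Invented local URI, linked to well-interlinked external URIs:
<http://example.org/authorities/freud-sigmund>
    owl:sameAs <http://dbpedia.org/resource/Sigmund_Freud> ,
               <http://viaf.org/viaf/34456780> .
```

The subject URI comes from one data source and the object URIs from others, which is exactly what makes these the most valuable kind of RDF link.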
One very well known example of a library linking hub is id.loc.gov, which provides linked URIs for the entire LCSH vocabulary.
Led and hosted by OCLC, the Virtual International Authority File (VIAF) combines personal name authority files from 15 international organizations and exposes the available data as RDF.
http://viaf.org/
http://bibwild.wordpress.com/2010/03/29/scope-of-id-loc-gov-and-viaf/
http://viaf.org/viaf/34456780#Freud,%20Sigmund,,%201856-1939
http://viaf.org/viaf/34456780.rdf
The RDF Book Mashup, developed at Freie Universität Berlin, uses a search to determine the ISBN and then queries the Amazon and Google Books APIs. The resulting XML responses are turned into an RDF model, which is serialized as RDF/XML.
Data Incubator has harvested journal metadata provided by CrossRef, HighWire and the NLM, and has converted that data to RDF, providing URIs and linked data for major journal titles and publishers. It can output citations as RDF/XML, JSON, or Turtle.
http://periodicals.dataincubator.org/.html
The RKBExplorer project uses RDF data from 18 partner institutions, and has also harvested and converted a great deal of people and publication data from major metadata resources. They now have some 50 million triples from CiteSeer, the ACM, DBLP, the NSF, and selected IEEE conferences.
http://eprints.ecs.soton.ac.uk/15152/
http://www.rkbexplorer.com/data/
http://www.rkbexplorer.com/explorer/#display=person-%7Bhttp%253A%252F%252Fsouthampton.rkbexplorer.com%252Fid%252Fperson-da9c463f8b783083d7d7e9003db8224f-57e2ec2d7aee429c73fef344805033e2%7D
The project is especially interesting because it allows you to navigate the relationships between publications, people, organizations, and projects.
RKBExplorer is an interesting example of how we might use linked data to explore relationships that would not otherwise be easily exposed.
There are lots of existing organizations that already function as hubs and aggregators; these could also become library linking hubs: Library of Congress, OCLC, LIBRIS, VIAF, LibraryThing, Google Books, EBSCO, Elsevier, OpenLibrary, Data Incubator.
Lisa covered the how; now I'm going to talk a little bit about the why. Linked data, especially in North America, isn't really on libraries' radar. Let's face it: it's not particularly sexy, nor is it easy either to conceptualize or to implement.
To the human eye, these records are obviously describing the same book, but look how different the labels and the types of information are. Beyond the ISBNs (which are formatted differently) and the title, a computer wouldn't have an easy time identifying the two as the same book.
We've coped with our silos by finding unique identifiers to link across them. Unfortunately, these are generally weak connectors – look at ISBNs and ISSNs: multiple ISSNs for journals, e-ISSNs, 13-digit ISBNs… not precisely unique. CLICK This is where the potential of schemas and ontologies to link information comes in. Imagine the potential of marking up LC (already done) and APA's psychological index terms. Then you have the ability, referring to the previous record, to easily indicate that <antisocial personality disorders> in LCSH is <same as> <antisocial behaviour>.
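Sketched in SKOS, such a mapping comes down to a single triple. Both concept URIs below are invented placeholders (real LCSH URIs live under id.loc.gov/authorities/):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical URIs for illustration only:
<http://example.org/lcsh/antisocial-personality-disorders>
    skos:closeMatch <http://example.org/apa-terms/antisocial-behaviour> .
```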
A second major advantage of linked data for searching is the increased ability to disambiguate terms. Google and other relevancy-based search engines have several methods for attempting to disambiguate, but when it comes down to it, they're all pretty ineffective at determining what the query "Madonna's Rafael" might mean. The power of RDF and linked data is that they expose relationships between terms in a machine-understandable way. So:
Rafael – painted – Madonna and Child
Madonna – visited – Rafael's nightclub
The predicate element is obviously of immense value when we think of related items. Our cataloguing standards haven't done us any favours when it comes to pulling together different versions of the same work, and are really quite useless when it comes to related items. CLICK The fact that, with all our metadata, we can't easily present our users with a list of all works based on P&P is a serious indictment of our organization of information.
Lisa went over linking hubs in some detail; I won't repeat that here except to say: imagine the power and sustainability of enriching your content. A perfect example is LIBRIS, the Swedish National Union Catalogue, which has added links to biographical information for many authors using DBpedia. Of particular value is the ability to link geographically disparate special collections together. Not only will linked data streamline the process by which users discover like items in a single library collection, it will also allow machine-understandable links between collections. As Dan Chudnov put it, "We all know that there's a large collection of rare recordings from that musician we were talking about over at that other library, right? And for so many years, the musician's band mates deposited their papers at that other special collection up north."
There is also power in linked data beyond the bibliographic. Envision revising our patron database as linked data. This is all information we collect about users and their relationships to library materials, but the database format we use now doesn't make much use of it. To be clear, I'm not talking about publicly exposing this data, but rather using it as we use patron data now, just far more effectively, much like the Netflix recommender example used in the keynote.
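Purely as a hypothetical sketch (the vocabulary term and all URIs are invented), patron circulation data expressed as triples might look like this, with a loan linked to the subject of the borrowed item so that recommendations can be computed internally:

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .

# Invented example: a patron's loan, and the borrowed item's subject.
<http://example.org/patrons/0042>
    <http://example.org/vocab#hasBorrowed> <http://example.org/catalogue/record/12345> .

<http://example.org/catalogue/record/12345>
    dcterms:subject <http://example.org/subjects/linked-data> .
```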
I admit, I just always feel the need to add a little Monty Python to my presentations. But one of the major ongoing failures of libraries is the lack of effective federated search. Re-envisioning our metadata, rather than creating discovery layers, might actually get us the ultimate library holy grail: effective federated search for our users.
So why, if the holy grail is the end result, aren't we rushing off to get this done? Why the slow uptake (or, in North America, little to no uptake)?
Trust is an interesting one. As I talked about earlier, the power of enriching data will more than likely rely on externally created sources like DBpedia. Will librarians be able to cede control of data? Then again, many of the sources we use even now rely on metadata created outside the library world; librarians, I think, don't have as much trouble with externally created sources that they've deemed credible (the APA, for example).
Even if we wanted to go full steam ahead, here's a fundamental problem: we don't own most of our data. Even if we rework all of the data we do own, the most important data in our databases, for our users, can't be included unless the publishers work with us.
Oh yeah, also… it's a lot of hard work. There's a reason we keep adding discovery layers and every other trick in the book: to hide the fact that we're working with pretty poor data. But the light at the end of the tunnel might be the oncoming train of RDA. I'm no cataloguer, but if you look at the aims of the DCMI/RDA Task Group, there is much promise:
1. Define RDA modeling entities as an RDF vocabulary (properties and classes).
2. Identify in-line value vocabularies as candidates for publication in RDFS or SKOS.
3. Develop a DC Application Profile for RDA based on FRBR and FRAD.
Many libraries simply do not have the resources to transform all of their data stores into linked data, but most have small standalone collections of structured data that can be used to develop linked data expertise and technologies within the library. Taking advantage of platforms that support linked data, such as DSpace and Fedora, along with the many freely available tools to convert data to RDF, can make what seems like an insurmountable task operational. This is where you can REALLY add value: local collections.
Exposing data, while a great first step, is not enough to realize the full potential of linked data, and there is a particularly significant role here for libraries to fill. Librarians have long made controlled vocabularies a core part of their services, and converting these tools and vocabularies to linked data standards offers enormous potential. Supporting and contributing to efforts already underway, such as those of the DCMI, as well as 'web'ifying locally maintained controlled vocabularies, is a natural fit for the profession. This is an example of a simple locally maintained thesaurus that I converted using SKOS as an experiment.
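A converted thesaurus entry might look roughly like the sketch below; the URIs and terms are invented for illustration, but the properties (preferred and alternate labels, a broader-term link, and a scheme membership) are the standard SKOS ones:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Invented local thesaurus entry converted to SKOS:
<http://example.org/thesaurus/ice-fishing>
    a skos:Concept ;
    skos:prefLabel "Ice fishing"@en ;
    skos:altLabel  "Smelt fishing"@en ;
    skos:broader   <http://example.org/thesaurus/fishing> ;
    skos:inScheme  <http://example.org/thesaurus> .
```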
I admit, I need this cartoon. In all seriousness, this is another easy fit for libraries: we love to share. Lisa and I came here not as experts, but as advocates of the importance of linked data for the future. We hope equally to take home everything we've learned here.
There is a large advocacy role for librarians in linked data. Librarians must demand that vendors develop their own data semantically. Institutions can work on data in catalogues and digital repositories, but if the datasets in article databases and electronic resources remain mired in web 1.0, the true benefits of linked data cannot be realised. There are additional advocacy efforts to make around open data. It might be simplistic to proclaim "data wants to be free", but being able to access and reuse publicly funded data will enrich all the resources libraries offer users.
The Strongest Link: Libraries and the Web of Linked Data
Gillian Byrne & Lisa Goddard
Memorial University of Newfoundland
Emerging Technologies in Academic Libraries 2010
Trondheim, Norway - April 2010
What do Libraries need to do?
1. Use the RDF data model to publish structured data on the web.
2. Use RDF links to interlink data from different data sources.
What are the obstacles?
• Data ownership
"The largest hurdle to library adoption of Linked Data, though, may not be educational or technological… The sticking point for librarians may be an issue of trust."
- Ross Singer, "Linked Data Now!"
Rocking the foundations
• DCMI/RDA Task Group
What can we do…now?
• Eric Miller noted in a 2004 talk that libraries have four major roles in the semantic web:
1. exposing collections – use Semantic Web technologies to make content available;
2. web'ifying thesaurus/mappings/services;
3. sharing lessons learned;
4. persistence.