On the Internet, the web is made up of documents that we voluntarily share and link to other documents. So, for example, if you are on my WordPress site you can follow the links I have created to other documents that relate to my topic. From these documents you can click on their links to other documents, and so forth. It is a fairly elegant idea that has allowed us to share almost anything we want with anyone willing to look. Since about 2006 there has been a push, begun by Tim Berners-Lee (inventor of the web), to share and link our data together in the same way. This is especially relevant to our cultural heritage institutions (our libraries, archives and museums, or LAMs) because of the massive amount of data they store and create. At a recent conference at the Smithsonian, Jon Voss, a leader in information technology, stated that in 2010 alone data produced by libraries, archives and museums increased by an astounding 1000%. So the issues we will be discussing are how we create a structure for data, how that structure relates to LAMs, and what can come from sharing data.
The key concept of Linked Data is the RDF triple. Triples are essentially nodes and links. Any piece of information can be expressed as a triple, which contains three parts: the subject (any piece of data), the predicate (a term from a vocabulary that defines the relationship) and the object (any other piece of data). This makes data machine readable and scalable. In other words, it takes what makes the web so great and applies it to data. The vocabularies used in triples should come from commonly used authorities. I say "should" for two reasons: 1) this model for creating linked open data is just a set of standards and best practices intended to become universal practice, and 2) the method is so new that there may not be an ontology out there for you to use. In that case it will be necessary to create one and, hopefully, your ontology becomes an authoritative source. Some examples of widely used ontologies include FOAF, used to describe people and their relationships; GeoNames, an ontology for names of places; and SKOS, which is used to define taxonomies.
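As a rough illustration of the subject, predicate and object shape described above, a triple can be sketched in Python as a three-field tuple. This is a minimal sketch, not a real RDF library: the `Triple` type and the `to_ntriples` helper are names invented here for illustration, and the futurama.com URIs are illustrative identifiers that are not assumed to resolve. Only the FOAF `knows` property is a real vocabulary term.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement (hypothetical helper type, not a library class)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Write a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# Illustrative URIs; only the FOAF predicate is a real vocabulary term.
ROBOT_DEVIL = "http://futurama.com/reaccuringcharacters/robot-devil"
FRY = "http://futurama.com/maincharacters/phillip-j-fry"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

statement = Triple(ROBOT_DEVIL, FOAF_KNOWS, FRY)
print(to_ntriples(statement))
```

The printed line reads as "the Robot Devil knows Fry", with every part of the statement named by a URI so that a machine can follow it.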
Before we look at the technical aspects of LOD, we need to look at the principles of Linked Data (LD) first devised by Tim Berners-Lee in 2006 and the addendum he added in 2010, which specifically addresses Linked Open Data (LOD). For now we won’t get too caught up in the difference between LD and LOD; suffice it to say that LOD is linked data released under an open license, while LD need not be. At the level I’ll be describing them they can be considered the same thing. The first principle is straightforward: when you name anything in linked data, use a URI. The W3C says that a URI is essentially a superset of URLs and URNs: URLs describe locations of things and URNs describe names of things. Everybody follows this rule. The second principle flows naturally from the first: use HTTP URIs so people can look up those names. HTTP is the standard for web communication and will be used in the Web of Data. The third principle establishes the need for standards for linked data. While not in place in 2006, as of now the W3C has called for RDF and SPARQL as standards. RDF is a data model and SPARQL is a query language. I’ll go into detail on these later. Just know that they are considered to be the standards for linking data. The fourth principle advocates RDF hyperlinks to aid in the discovery of other data. These operate in essentially the same way as hyperlinks between web documents, with one exception: Tom Heath and Christian Bizer explain in their book, Linked Data: Evolving the Web into a Global Data Space, that RDF links are typed, which means they are able to describe relationships between things. For example, a link of the type ‘performed at’ may be set between a musician and a place.
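To make the idea of typed links a little more concrete, here is a small Python sketch of the ‘performed at’ example. Everything in it is an assumption made for illustration: the example.org URIs, the `performedAt` and `locatedIn` link types, and the `links_by_type` helper are all hypothetical and do not come from any published vocabulary.

```python
from collections import defaultdict

# All URIs below are hypothetical placeholders under example.org.
MUSICIAN = "http://example.org/people/some-musician"
VENUE = "http://example.org/places/some-venue"
CITY = "http://example.org/places/some-city"
PERFORMED_AT = "http://example.org/vocab/performedAt"
LOCATED_IN = "http://example.org/vocab/locatedIn"

# Each link is a (subject, link type, object) triple.
triples = [
    (MUSICIAN, PERFORMED_AT, VENUE),
    (VENUE, LOCATED_IN, CITY),
]


def links_by_type(triples, subject):
    """Group the outgoing links of `subject` by their predicate (the link type)."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            grouped[p].append(o)
    return dict(grouped)


print(links_by_type(triples, MUSICIAN))
print(links_by_type(triples, VENUE))
```

Unlike a plain hyperlink, each link here carries its own type, so a program can tell "performed at" apart from "located in" before deciding which link to follow.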
RDF triples are elegant because they are so scalable. You can easily attach millions of triples to a single piece of data, and you can have millions of pieces of data. Remember that statistic about how data for LAMs grew 1000% in 2010. What is needed is a way to access these large data sets. That is what RDF query languages do, and the W3C advocates SPARQL as the standard. SPARQL stands for SPARQL Protocol and RDF Query Language.
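The core of what a query language does over triples is pattern matching: fix some parts of a triple and leave the rest as variables. The Python sketch below imitates that idea in plain code. It is not SPARQL and not a SPARQL engine; the `match` helper is a hypothetical name, and the second character URI is made up for the example. A comment shows roughly how the same question might be written in SPARQL.

```python
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
ROBOT_DEVIL = "http://futurama.com/reaccuringcharacters/robot-devil"
FRY = "http://futurama.com/maincharacters/phillip-j-fry"
# Hypothetical extra URI, added only so the query has two answers.
LEELA = "http://futurama.com/maincharacters/turanga-leela"

triples = [
    (ROBOT_DEVIL, FOAF_KNOWS, FRY),
    (LEELA, FOAF_KNOWS, FRY),
    (FRY, FOAF_KNOWS, LEELA),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly the same question in SPARQL:
#   SELECT ?who WHERE {
#     ?who <http://xmlns.com/foaf/0.1/knows>
#          <http://futurama.com/maincharacters/phillip-j-fry> .
#   }
who_knows_fry = [s for s, _, _ in match(triples, p=FOAF_KNOWS, o=FRY)]
print(who_knows_fry)
```

A real SPARQL endpoint applies the same kind of pattern to millions of triples, which is why a standard query language matters once data sets grow large.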
Linked Data Structure: RDF
How to best provide access to data so it can most easily be reused?
How to enable the discovery of relevant data within the multitude of available data sets?
How to enable applications to integrate data from large numbers of formerly unknown data sources?
Data for Libraries, Archives and Museums grew over 1000% in 2010
Ontologies
FOAF (Friend of a Friend) – defines personal relationships
GeoNames – defines names of places
SKOS – defines various taxonomies
RDF Triples
Subject, Predicate, Object
Subject: http://futurama.com/reaccuringcharacters/robot-devil
Predicate (knows): http://xmlns.com/foaf/0.1/knows
Object: http://futurama.com/maincharacters/phillip-j-fry
RDF Characteristics
RDF links things, not just documents
RDF links are typed
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information using standards (RDF, SPARQL)
4. Include links to other URIs so that others can discover more things