Dealing with the “new” data in the “Cloud” – Linked Data
London - New York - Dubai - Mumbai 2011
Table of Contents
• Definitions
• History
• The Modigliani Test
• Linked Data
• Raw Data
• Resource Description Framework
• Linked Data Principles
• Publishing Linked Data
• Faceted Browsers
• On-the-fly Mashups
• SPARQL
• What is a Linked Data Application
• Characteristics of a Linked Data Application
• Contact Us
Definitions
RDF: The RDF data model is similar to classic conceptual modelling approaches such as Entity-Relationship or Class diagrams, as it is based upon the idea of making statements about resources (in particular Web resources) in the form of subject-predicate-object expressions. These expressions are known as triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the colour blue" in RDF is as the triple: a subject denoting "the sky", a predicate denoting "has the colour", and an object denoting "blue". RDF is an abstract model with several serialization formats (i.e., file formats), and so the particular way in which a resource or triple is encoded varies from format to format.
Definitions
SPARQL: SPARQL Protocol and RDF Query Language (pronounced "sparkle") is an RDF query language.
Linked Data: Linked Data describes a method of publishing structured data, so that it can be interlinked and become more useful. It builds upon standard Web technologies, such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.
History
• Linked Data Design Issues by Tim Berners-Lee: July 2006
• Linked Open Data Project: WWW2007
• First LOD Cloud: May 2007
• BBC publishes Linked Data: 2008
• NY Times announcement: SemTech2009 - ISWC09
• Data.gov.uk publishes Linked Data: 2010
Using the Current Web (= internet + links + docs) is terribly inefficient
So what is the problem?
We aren’t always interested in documents
• We are interested in THINGS
• These THINGS might be in documents
We can read an HTML document rendered in a browser and find what we are searching for
• This is hard for computers. It’s typically based on guesswork from some primitive NLP engine, or simple keyword search
What do we need to do?
Make it easy for computers/software to find THINGS
How can we do that?
• Besides publishing documents on the web, which computers can’t understand easily
• Let’s publish something that computers can understand
RAW DATA!
But don’t we already publish raw data in RDBMS, XML, CSV, etc?
Yes!
But it’s not in a consistent format, and very difficult to integrate (or “link”).
For example, how do I know that the Wael Elrifai in Facebook is the same as Wael Elrifai in Twitter?
Don’t we already have a standard way of publishing on the web?
We have a standardized way of publishing documents on the web, right? HTML
Then why can’t we have a standard way of publishing data on the Web?
Resource Description Framework (RDF)
A data model
• A way to model data
• i.e. relational databases use the relational data model
RDF is a triple data model
• Labeled Graph
• Subject, Predicate, Object
<Wael> <was born in> <Beirut>
<Beirut> <is part of> <the Lebanon>
<Wael> <likes> <the Semantic Web>
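A minimal sketch of the triple data model in Python, assuming plain strings stand in for resources (real RDF would identify them with URIs). The helper name `objects` is made up for this illustration.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# The names are copied from the slide; real RDF would use URIs instead.
triples = [
    ("Wael", "was born in", "Beirut"),
    ("Beirut", "is part of", "the Lebanon"),
    ("Wael", "likes", "the Semantic Web"),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


# Follow two edges of the labeled graph:
# where was Wael born, and what is that place part of?
birthplace = objects(triples, "Wael", "was born in")[0]
region = objects(triples, birthplace, "is part of")[0]
print(birthplace, "->", region)
```

Because every statement has the same three-part shape, following a chain of relationships is just repeated lookup on the same list.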
RDF can be serialized in different ways
• RDF/XML
• RDFa (RDF in HTML)
• N3
• Turtle
• JSON
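To make "serialization" concrete, here is a rough sketch that writes triples as N-Triples, a line-based subset of Turtle. The namespace `http://example.org/` and the camel-cased property names are assumptions for illustration; the slides do not give real URIs. The sketch only handles triples whose three terms are all resources, not literals.

```python
# Hypothetical namespace: the slides do not give real URIs for these names.
BASE = "http://example.org/"

triples = [
    ("Wael", "wasBornIn", "Beirut"),
    ("Beirut", "isPartOf", "Lebanon"),
]


def to_ntriples(graph, base=BASE):
    """Serialize resource-only triples, one '<s> <p> <o> .' statement per line."""
    lines = []
    for s, p, o in graph:
        lines.append(f"<{base}{s}> <{base}{p}> <{base}{o}> .")
    return "\n".join(lines)


serialized = to_ntriples(triples)
print(serialized)
```

The same two statements could equally be written as RDF/XML or RDFa; only the file format changes, not the underlying triples.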
So does that mean that I have to publish my data in RDF now?
Databases back up documents
THINGS have PROPERTIES: a Book has a Title, an Author, …

Isbn | Title | Author | PublisherID | ReleasedDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009
… | … | … | … | …

PublisherID | PublisherName
1 | O’Reilly Media
… | …

This is a THING: a book titled “Programming the Semantic Web” by Toby Segaran, …
Let’s represent the data in RDF

Isbn | Title | Author | PublisherID | ReleasedDate
978-0-596-15381-6 | Programming the Semantic Web | Toby Segaran | 1 | July 2009

PublisherID | PublisherName
1 | O’Reilly Media

As a graph:
book -- title --> “Programming the Semantic Web”
book -- author --> “Toby Segaran”
book -- isbn --> “978-0-596-15381-6”
book -- publisher --> Publisher
Publisher -- name --> “O’Reilly”
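The step from table rows to a graph can be sketched as follows. This is only an illustration: the node identifiers (`book/...`, `publisher/...`) and the function name `rows_to_triples` are invented here, and the release date column is left out because the graph on the slide does not show it.

```python
# Rows copied from the two tables on the slide.
books = [{
    "Isbn": "978-0-596-15381-6",
    "Title": "Programming the Semantic Web",
    "Author": "Toby Segaran",
    "PublisherID": 1,
}]
publishers = [{"PublisherID": 1, "PublisherName": "O'Reilly Media"}]


def rows_to_triples(books, publishers):
    """Turn each row into one node and each column value into one edge."""
    triples = []
    for row in books:
        book = f"book/{row['Isbn']}"                    # hypothetical identifier
        publisher = f"publisher/{row['PublisherID']}"   # hypothetical identifier
        triples.append((book, "title", row["Title"]))
        triples.append((book, "author", row["Author"]))
        triples.append((book, "isbn", row["Isbn"]))
        # The foreign key becomes an edge to another node, not a plain value.
        triples.append((book, "publisher", publisher))
    for row in publishers:
        node = f"publisher/{row['PublisherID']}"
        triples.append((node, "name", row["PublisherName"]))
    return triples


book_graph = rows_to_triples(books, publishers)
for triple in book_graph:
    print(triple)
```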
Remember that we are on the web
Everything on the web is identified by a URL
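And now let’s link the data to other data
http://…/isbn978 -- title --> “Programming the Semantic Web”
http://…/isbn978 -- author --> “Toby Segaran”
http://…/isbn978 -- isbn --> “978-0-596-15381-6”
http://…/isbn978 -- publisher --> http://…/publisher1
http://…/publisher1 -- name --> “O’Reilly”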
And now consider the data from Revyu.com
http://…/isbn978 -- hasReview --> http://…/review1
http://…/review1 -- description --> “Awesome Book”
http://…/review1 -- reviewer --> http://…/reviewer
http://…/reviewer -- name --> “Wael Elrifai”
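Let’s start to link data
http://…/isbn978 (Revyu) -- hasReview --> http://…/review1
http://…/review1 -- description --> “Awesome Book”
http://…/review1 -- hasReviewer --> http://…/reviewer
http://…/reviewer -- name --> “Wael Elrifai”
http://…/isbn978 (Revyu) -- sameAs --> http://…/isbn978 (book data)
http://…/isbn978 (book data) -- title --> “Programming the Semantic Web”
http://…/isbn978 (book data) -- author --> “Toby Segaran”
http://…/isbn978 (book data) -- isbn --> “978-0-596-15381-6”
http://…/isbn978 (book data) -- publisher --> http://…/publisher1
http://…/publisher1 -- name --> “O’Reilly”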
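One way to picture what the sameAs link buys us is the sketch below. The slides elide the real URIs, so `books.example` and `reviews.example` are placeholder hosts invented for this example, and `merge` is a hypothetical helper that only resolves a single sameAs hop.

```python
# Placeholder URIs: the slides elide the real ones.
BOOK = "http://books.example/isbn978"
REVIEW_SITE_BOOK = "http://reviews.example/isbn978"
REVIEW = "http://reviews.example/review1"

book_data = [
    (BOOK, "title", "Programming the Semantic Web"),
    (BOOK, "author", "Toby Segaran"),
]
review_data = [
    (REVIEW_SITE_BOOK, "hasReview", REVIEW),
    (REVIEW, "description", "Awesome Book"),
]
links = [(REVIEW_SITE_BOOK, "sameAs", BOOK)]


def merge(graphs, links):
    """Union the graphs, rewriting each linked URI to one canonical URI."""
    canonical = {s: o for s, p, o in links if p == "sameAs"}
    merged = set()
    for graph in graphs:
        for s, p, o in graph:
            merged.add((canonical.get(s, s), p, canonical.get(o, o)))
    return merged


merged = merge([book_data, review_data], links)
# Everything now known about the book, drawn from both sources.
about_book = {p: o for s, p, o in merged if s == BOOK}
print(about_book)
```

Neither source on its own knows both the author and the review; the single sameAs statement is what lets the two descriptions be read as one.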
Data on the Web that is in RDF and is linked to other RDF data is LINKED DATA
Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up (dereference) those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.
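Looking up (dereferencing) a name under principle 2 is an ordinary HTTP GET. The sketch below only prepares such a request and does not send it. Asking for Turtle through an `Accept` header is a common convention rather than something the slides state, and the DBpedia resource URI is used purely as an example; `build_lookup` is a made-up helper name.

```python
import urllib.request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET asking for RDF about a URI."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


request = build_lookup("http://dbpedia.org/resource/London")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To actually dereference the URI (needs network access, and assumes the
# server honours the Accept header):
# body = urllib.request.urlopen(request).read()
```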
Linked Data makes the web appear as a single global database!
The same can be done inside your company!
What if you wanted to know your company’s EBITDA for Catalonia in 2010? You could have an EDW pre-aggregate and distribute the data, an analyst calculate it on the spot, or…
Linked Data in your internal semantic web could relate all transactions to linked financial formulae!
You ask the question, tell your system where to look (as part of the question; this can be prebuilt) and voilà!
I can query a database with SQL. Is there a way to query Linked Data with a query language?
Yes! There is actually a standardized language for that.
FIND all the reviews on the book “Programming the Semantic Web” by people who live in London
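http://…/isbn978 (Revyu) -- hasReview --> http://…/review1
http://…/review1 -- description --> “Awesome Book”
http://…/review1 -- hasReviewer --> http://…/reviewer
http://…/reviewer -- name --> “Wael Elrifai”
http://…/reviewer -- sameAs --> http://waelworldwide.com
http://waelworldwide.com -- name --> “Wael Elrifai”
http://waelworldwide.com -- livesIn --> http://dbpedia.org/London
http://…/isbn978 (Revyu) -- sameAs --> http://…/isbn978 (book data)
http://…/isbn978 (book data) -- title --> “Programming the Semantic Web”
http://…/isbn978 (book data) -- author --> “Toby Segaran”
http://…/isbn978 (book data) -- isbn --> “978-0-596-15381-6”
http://…/isbn978 (book data) -- publisher --> http://…/publisher1
http://…/publisher1 -- name --> “O’Reilly”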
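The question "find all reviews of this book by people who live in London" is a graph pattern: a set of triples with some positions left as variables. The sketch below is a toy matcher, not a SPARQL engine. The `rev:` and `book:` prefixes stand in for the URIs the slides elide, and `match` and `query` are names invented for this illustration.

```python
def match(pattern, triple, binding):
    """Try to extend a variable binding so that the pattern equals the triple."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(graph, patterns):
    """Return every binding that satisfies all patterns at once."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


# Placeholder identifiers ("rev:", "book:") replace the elided URIs.
graph = [
    ("rev:isbn978", "hasReview", "rev:review1"),
    ("rev:review1", "description", "Awesome Book"),
    ("rev:review1", "hasReviewer", "rev:reviewer"),
    ("rev:reviewer", "sameAs", "http://waelworldwide.com"),
    ("http://waelworldwide.com", "livesIn", "http://dbpedia.org/London"),
    ("rev:isbn978", "sameAs", "book:isbn978"),
    ("book:isbn978", "title", "Programming the Semantic Web"),
]

patterns = [
    ("?book", "title", "Programming the Semantic Web"),
    ("?reviewed", "sameAs", "?book"),
    ("?reviewed", "hasReview", "?review"),
    ("?review", "hasReviewer", "?reviewer"),
    ("?reviewer", "sameAs", "?person"),
    ("?person", "livesIn", "http://dbpedia.org/London"),
    ("?review", "description", "?text"),
]

results = query(graph, patterns)
for row in results:
    print(row["?review"], row["?text"])
```

Note that the answer depends on both sameAs links: without them the review site's data never connects to the book title or to where the reviewer lives.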
This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?
What was your incentive to publish an HTML (Intranet) page in 1990?
1) Share data in documents
2) Because your neighbor was doing it
Publishing Linked Data
• Legacy Data in Relational Databases
  • D2R Server
  • Virtuoso
  • Triplify
  • Ultrawrap
• CMS
  • Drupal 7
• Native RDF Stores
  • Databases for RDF (Triple Stores)
  • AllegroGraph, Jena, Sesame, Virtuoso
  • Talis Platform (Linked Data in the Cloud)
• In HTML with RDFa
Time to create new and innovative ways to interact with Linked Data
This may be one of the Killer Apps that we have all been waiting for
http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg
Where can I find SPARQL Endpoints?
DBpedia: http://dbpedia.org/sparql
Musicbrainz: http://dbtune.org/musicbrainz/sparql
U.S. Census: http://www.rdfabout.com/sparql
Semantic Crunchbase: http://cb.semsol.org/sparql
http://esw.w3.org/topic/SparqlEndpoints
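An endpoint is queried over plain HTTP: the SPARQL protocol passes the query text in a `query` parameter. The sketch below only builds the GET URL for the DBpedia endpoint listed above and sends nothing. The query itself and the helper name `sparql_get_url` are illustrative assumptions, and whether the endpoint is still available is not checked here.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: list a few statements about one resource.
QUERY = """
SELECT ?p ?o WHERE {
  <http://dbpedia.org/resource/London> ?p ?o
} LIMIT 5
"""


def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL query: endpoint?query=<encoded text>."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```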
• Querying a single dataset is quite boring compared to:
• Issuing SPARQL queries over multiple datasets
• How can you do this?
  1. Issue follow-up queries to different endpoints
  2. Query a central collection of datasets
  3. Build a store with copies of relevant datasets
  4. Use a query federation system
Follow-up Queries
• Idea: issue follow-up queries over other datasets based on results from previous queries
• Substituting placeholders in query templates
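Placeholder substitution can be sketched with Python's `string.Template`. The result row, the `person` variable name, and the choice of the FOAF `name` property are assumptions made for this illustration, and the generated queries are only printed, not sent to any endpoint.

```python
from string import Template

# Hypothetical follow-up query template with one placeholder, $person.
FOLLOW_UP = Template(
    "SELECT ?name WHERE { <$person> <http://xmlns.com/foaf/0.1/name> ?name }"
)


def follow_up_queries(previous_results, variable="person"):
    """Create one follow-up query per row returned by an earlier query."""
    return [FOLLOW_UP.substitute(person=row[variable]) for row in previous_results]


# Pretend these rows came back from a first query against another dataset.
first_results = [{"person": "http://waelworldwide.com"}]

queries = follow_up_queries(first_results)
for q in queries:
    print(q)
```

Values are pasted into the query text as-is, so a real application would want to validate that each value is a well-formed URI before substituting it.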
What is a Linked Data application?
A software system that makes use of data on the web from multiple datasets AND that benefits from links between the datasets
Characteristics of Linked Data Applications
• Consume data that is published on the web following the Linked Data principles
• Discover further information by following the links between different data sources
• Combine the consumed Linked Data with data from other sources (not necessarily Linked Data)
• Expose the combined data back to the web following the Linked Data principles
• Offer value to end-users
Hot Research Topics
• Interlinking Algorithms
• Provenance and Trust
• Dataset Dynamics
• UI
• Distributed Query
Contact
PEAK Consulting

Headquarters
90 Long Acre, Covent Garden
London WC2E 9RZ
United Kingdom
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478

United States
11 Penn Plaza, 5th floor
New York, NY 1000
United States
Tel: +1 (212) 946 4824
Fax: +1 (212) 946 2801

United Arab Emirates
Unit P12 Rimal, The Walk
PO Box 487 177 Dubai
United Arab Emirates
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478

http://www.peakconsulting.eu
firstname.lastname@example.org