Peak cloud based data - linked data
1. Dealing with the “new” data in the
“Cloud” – Linked Data
London - New York - Dubai - Mumbai 2011
2. Table of Contents
Definitions 3
History 5
The Modigliani Test 11
Linked Data 13
Raw Data 23
Resource Description Framework 30
Linked Data Principles 42
Publishing Linked Data 57
Faceted Browsers 65
On-the-fly Mashups 67
SPARQL 73
What is a Linked Data Application 77
Characteristics of a Linked Data Application 78
Contact Us 81
3. Definitions
RDF: The RDF data model is similar to classic conceptual
modelling approaches such as Entity-Relationship or Class
diagrams, as it is based upon the idea of making statements about
resources (in particular Web resources) in the form of subject-
predicate-object expressions. These expressions are known as
triples in RDF terminology. The subject denotes the resource, and
the predicate denotes traits or aspects of the resource and
expresses a relationship between the subject and the object. For
example, one way to represent the notion "The sky has the colour
blue" in RDF is as the triple: a subject denoting "the sky", a
predicate denoting "has the colour", and an object denoting "blue".
RDF is an abstract model with several serialization formats (i.e.,
file formats), and so the particular way in which a resource or
triple is encoded varies from format to format.
4. Definitions
SPARQL: SPARQL Protocol and RDF Query Language (pronounced
"sparkle") is a query language for RDF data.
Linked Data: Linked Data describes a method of publishing
structured data, so that it can be interlinked and become more
useful. It builds upon standard Web technologies, such as HTTP
and URIs - but rather than using them to serve web pages for
human readers, it extends them to share information in a way that
can be read automatically by computers. This enables data from
different sources to be connected and queried.
5. History
Linked Data Design Issues by Tim Berners-Lee July 2006
Linked Open Data Project WWW2007
First LOD Cloud May 2007
BBC publishes Linked Data 2008
NY Times announcement SemTech2009 - ISWC09
Data.gov.uk publishes Linked Data 2010
11. The Modigliani Test
Show me all the locations of all the original paintings
of Modigliani
Daniel Koller (@dakoller) showed that you can find
this with a SPARQL query on DBpedia
19. Using the current Web (internet + links + docs)
is terribly inefficient
20. So what is the problem?
We aren’t always interested in documents
• We are interested in THINGS
• These THINGS might be in documents
We can read an HTML document rendered in a browser and find
what we are searching for
• This is hard for computers. It’s typically based on
guesswork from some primitive NLP engine, or simple
keyword search
21. What do we need to do?
Make it easy for computers/software to find THINGS
22. How can we do that?
• Besides publishing documents on the web
- which computers can’t understand easily
• Let’s publish something that computers can
understand
30. Resource Description Framework (RDF)
A data model
• A way to model data
• e.g. relational databases use the relational data model
RDF is a triple data model
Labeled Graph
Subject, Predicate, Object
<Wael> <was born in> <Beirut>
<Beirut> <is part of> <the Lebanon>
<Wael> <likes> <the Semantic Web>
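The three statements above can be held in memory as plain (subject, predicate, object) tuples. The following is a minimal sketch, not a real RDF library: the `Triple` class and the `match` helper are names invented for this illustration, and the labels are kept as the plain strings from the slide rather than URIs.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The three statements from the slide, stored as a set of triples.
GRAPH = {
    Triple("Wael", "was born in", "Beirut"),
    Triple("Beirut", "is part of", "the Lebanon"),
    Triple("Wael", "likes", "the Semantic Web"),
}


def match(
    graph: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the pattern; None acts as a wildcard."""
    for t in graph:
        if s is not None and t.subject != s:
            continue
        if p is not None and t.predicate != p:
            continue
        if o is not None and t.obj != o:
            continue
        yield t


# Everything we know about Wael, listed by predicate.
about_wael = sorted(t.predicate for t in match(GRAPH, s="Wael"))
print(about_wael)  # ['likes', 'was born in']
```

Because "Beirut" appears as an object in one triple and as a subject in another, the set of tuples already forms a labeled graph: following the shared name moves from one statement to the next.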
31. RDF can be serialized in different ways
RDF/XML
RDFa (RDF in HTML)
N3
Turtle
JSON
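To illustrate that the same triples can be written down in more than one way, here is a small sketch that emits two of the listed styles: Turtle-style lines with full URIs, and a JSON structure loosely modelled on the RDF/JSON layout (subject, then predicate, then a list of typed values). The `http://example.org/` URIs are hypothetical stand-ins, and `json.dumps` is used only as a quoting shortcut that is adequate for simple strings.

```python
import json

# Hypothetical example.org URIs standing in for the slide's informal labels.
EX = "http://example.org/"
triples = [
    (EX + "Wael", EX + "bornIn", EX + "Beirut"),
    (EX + "Wael", EX + "name", "Wael"),  # the object here is a literal, not a URI
]


def is_uri(value: str) -> bool:
    """Crude check used by this sketch to tell URIs from literals."""
    return value.startswith(("http://", "https://"))


def to_turtle(rows) -> str:
    """One '<s> <p> o .' statement per line, with literals in double quotes."""
    lines = []
    for s, p, o in rows:
        obj = f"<{o}>" if is_uri(o) else json.dumps(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_json(rows) -> str:
    """Nested subject -> predicate -> [values] structure."""
    doc = {}
    for s, p, o in rows:
        kind = "uri" if is_uri(o) else "literal"
        doc.setdefault(s, {}).setdefault(p, []).append({"type": kind, "value": o})
    return json.dumps(doc, indent=2)


print(to_turtle(triples))
print(to_json(triples))
```

The data model is identical in both outputs; only the file format differs, which is the point of the slide.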
32. So does that mean that I have to
publish my data in RDF now?
35. Databases back up documents
THINGS have PROPERTIES:
A Book has a Title, an Author, …

Isbn               Title                         Author        PublisherID  ReleaseDate
978-0-596-15381-6  Programming the Semantic Web  Toby Segaran  1            July 2009
…                  …                             …             …            …

PublisherID  PublisherName
1            O’Reilly Media
…            …

This is a THING:
A book titled “Programming the Semantic Web” by Toby Segaran, …
36. Let’s represent the data in RDF

Isbn               Title                         Author        PublisherID  ReleaseDate
978-0-596-15381-6  Programming the Semantic Web  Toby Segaran  1            July 2009

PublisherID  PublisherName
1            O’Reilly Media

<book> <title> "Programming the Semantic Web"
<book> <author> "Toby Segaran"
<book> <isbn> "978-0-596-15381-6"
<book> <publisher> <Publisher>
<Publisher> <name> "O’Reilly"
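The step from table rows to triples can be sketched in a few lines. This is an illustrative mapping only: the `book:` and `publisher:` identifiers and the function name `book_to_triples` are assumptions made for the example, not the naming scheme of any particular tool.

```python
# One row of the book table and the publisher lookup table from the slide.
book_row = {
    "Isbn": "978-0-596-15381-6",
    "Title": "Programming the Semantic Web",
    "Author": "Toby Segaran",
    "PublisherID": 1,
}
publisher_names = {1: "O'Reilly Media"}


def book_to_triples(row, publisher_names):
    """Turn one book row into triples: each column becomes a predicate,
    and the foreign key becomes a link to a separate publisher node."""
    book = "book:" + row["Isbn"]  # hypothetical identifier for the book
    publisher = "publisher:" + str(row["PublisherID"])  # hypothetical identifier
    return [
        (book, "title", row["Title"]),
        (book, "author", row["Author"]),
        (book, "isbn", row["Isbn"]),
        (book, "publisher", publisher),
        (publisher, "name", publisher_names[row["PublisherID"]]),
    ]


triples = book_to_triples(book_row, publisher_names)
for triple in triples:
    print(triple)
```

Note how the join between the two tables turns into an ordinary edge: the publisher is a node of its own, so other data can later link to it as well.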
37. Remember that we are on the web
Everything on the web is identified by a URL
38. And now let’s link the data to other data
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"
39. And now consider the data from Revyu.com
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"
40. Let’s start to link data
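Review data (Revyu):
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"

The link between the two datasets:
<http://…/isbn978> (Revyu) <sameAs> <http://…/isbn978> (book data)

Book data:
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"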
41. Data on the Web that is in RDF and
is linked to other RDF data is
LINKED DATA
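What a sameAs link buys you can be sketched with two small in-memory datasets. This is a simplified illustration: the `books.example` and `reviews.example` URIs are hypothetical placeholders (the slides elide the real ones), the direction of the sameAs link is an assumption, and `merge` is a helper written for this example rather than part of any RDF toolkit.

```python
# Two independently published datasets that describe the same book.
book_data = [
    ("http://books.example/isbn978", "title", "Programming the Semantic Web"),
    ("http://books.example/isbn978", "author", "Toby Segaran"),
]
review_data = [
    ("http://reviews.example/isbn978", "hasReview", "http://reviews.example/review1"),
    ("http://reviews.example/review1", "description", "Awesome Book"),
    # The link: the review site states that its book is the same thing.
    ("http://reviews.example/isbn978", "sameAs", "http://books.example/isbn978"),
]


def merge(*datasets):
    """Combine datasets, rewriting every name to the target of its sameAs link."""
    triples = [t for dataset in datasets for t in dataset]
    alias = {s: o for s, p, o in triples if p == "sameAs"}

    def canonical(node):
        seen = set()
        while node in alias and node not in seen:  # follow chains, guard against loops
            seen.add(node)
            node = alias[node]
        return node

    return {(canonical(s), p, canonical(o)) for s, p, o in triples if p != "sameAs"}


merged = merge(book_data, review_data)
book = "http://books.example/isbn978"
print(sorted(p for s, p, o in merged if s == book))  # ['author', 'hasReview', 'title']
```

After the merge, a single name carries the title, the author and the review, even though no single publisher held all three facts.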
42. Linked Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up
(dereference) those names.
3. When someone looks up a URI, provide
useful information.
4. Include links to other URIs so that they can
discover more things.
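Principles 2 and 3 amount to an ordinary HTTP request in which the client asks for data rather than a web page. The sketch below shows one common way to do that with content negotiation, using only the Python standard library. The function names are invented for this example, the choice of media types in the Accept header is an assumption, and whether a given server honours the header is up to the publisher.

```python
from urllib.request import Request, urlopen

# Ask for RDF serializations instead of HTML; Turtle is preferred here.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup(uri: str) -> Request:
    """Prepare a request that dereferences a thing's URI and asks for RDF."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: only HTTP URIs can be looked up this way.
        raise ValueError("Linked Data names should be HTTP URIs: " + uri)
    return Request(uri, headers={"Accept": RDF_ACCEPT})


def look_up(uri: str, timeout: float = 10.0) -> str:
    """Perform the lookup and return whatever description the server sends back."""
    with urlopen(build_lookup(uri), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `look_up("http://dbpedia.org/resource/London")` would, if the server is reachable and supports negotiation, return a description of London whose links to other URIs (principle 4) can be looked up in the same way.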
43. Linked Data makes the web appear as
a single global database!
The same can be done inside your company!
44. What if you wanted to know your company’s
EBITDA for Catalonia in 2010?
You could have an EDW pre-aggregate and
distribute the data, have an analyst calculate it on
the spot, or…
45. Linked Data in your internal semantic
web could relate all transactions to
linked financial formulae!
You ask the question, tell your system
where to look (as part of the question,
this can be prebuilt) and voilà!
46. I can query a database with SQL. Is
there a way to query Linked Data with a
query language?
47. Yes! There is actually a standardized
language for that: SPARQL
48. FIND all the reviews on the book
“Programming the Semantic Web”
by people who live in London
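49. The linked graph, as triples:

Review data (Revyu):
<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> "Awesome Book"
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> "Wael Elrifai"

Links between the datasets:
<http://…/isbn978> (Revyu) <sameAs> <http://…/isbn978> (book data)
<http://…/reviewer> <sameAs> <http://waelworldwide.com>

Book data:
<http://…/isbn978> <title> "Programming the Semantic Web"
<http://…/isbn978> <author> "Toby Segaran"
<http://…/isbn978> <isbn> "978-0-596-15381-6"
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> "O’Reilly"

Personal data:
<http://waelworldwide.com> <livesIn> <http://dbpedia.org/London>
<http://waelworldwide.com> <name> "Wael Elrifai"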
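The question from slide 48 can be answered by matching a list of triple patterns against this graph, which is roughly what the WHERE clause of a SPARQL query does. The following is a toy matcher written for illustration, not a SPARQL engine. The `books.example` and `reviews.example` URIs are hypothetical stand-ins for the elided ones, the direction of the sameAs links is an assumption, and terms starting with `?` are treated as variables.

```python
LONDON = "http://dbpedia.org/London"
PERSON = "http://waelworldwide.com"
BOOK = "http://books.example/isbn978"  # hypothetical
REVYU_BOOK = "http://reviews.example/isbn978"  # hypothetical
REVIEW = "http://reviews.example/review1"  # hypothetical
REVIEWER = "http://reviews.example/reviewer"  # hypothetical

GRAPH = [
    (BOOK, "title", "Programming the Semantic Web"),
    (REVYU_BOOK, "sameAs", BOOK),
    (REVYU_BOOK, "hasReview", REVIEW),
    (REVIEW, "description", "Awesome Book"),
    (REVIEW, "hasReviewer", REVIEWER),
    (REVIEWER, "sameAs", PERSON),
    (PERSON, "livesIn", LONDON),
    (PERSON, "name", "Wael Elrifai"),
]

# "Find all the reviews on the book ... by people who live in London".
QUERY = [
    ("?book", "title", "Programming the Semantic Web"),
    ("?entry", "sameAs", "?book"),
    ("?entry", "hasReview", "?review"),
    ("?review", "hasReviewer", "?reviewer"),
    ("?reviewer", "sameAs", "?person"),
    ("?person", "livesIn", LONDON),
]


def solve(graph, patterns, binding=None):
    """Yield every variable binding that satisfies all patterns at once."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier value.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(graph, rest, candidate)


reviews = [b["?review"] for b in solve(GRAPH, QUERY)]
print(reviews)  # ['http://reviews.example/review1']
```

The answer depends on facts from three sources (book data, review data, personal data); remove either sameAs triple from `GRAPH` and the query returns nothing.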
50. This looks cool, but let’s be realistic.
What is the incentive to publish
Linked Data?
51. What was your incentive to publish
an HTML (Intranet) page in 1990?
52. 1) Share data in documents
2) Because your neighbor was doing it
58. Publishing Linked Data
• Legacy Data in Relational Databases
• D2R Server
• Virtuoso
• Triplify
• Ultrawrap
• CMS
• Drupal 7
• Native RDF Stores
• Databases for RDF (Triple Stores)
• AllegroGraph, Jena, Sesame, Virtuoso
• Talis Platform (Linked Data in the Cloud)
• In HTML with RDFa
62. Google and Yahoo are starting to crawl
RDFa!
The Semantic Web is a reality!
63. The Reality
• Yahoo is crawling data that is in RDFa and
Microformats under specific vocabularies
• FOAF
• GoodRelations
• Google is crawling RDFa and Microformats that
use the Google vocabulary
71. Time to create new and innovative
ways to interact with Linked Data
72. This may be one of the Killer Apps that we have all been
waiting for
http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg
73. Where can I find SPARQL Endpoints?
DBpedia: http://dbpedia.org/sparql
MusicBrainz: http://dbtune.org/musicbrainz/sparql
U.S. Census: http://www.rdfabout.com/sparql
Semantic Crunchbase: http://cb.semsol.org/sparql
More endpoints: http://esw.w3.org/topic/SparqlEndpoints
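A SPARQL endpoint is queried over plain HTTP: in the SPARQL Protocol, a GET request carries the query text in a `query` parameter. The sketch below only builds such a request with the standard library and does not send it. The helper name, the sample query and the requested JSON results format are choices made for this example, and it does not check whether any endpoint listed above is still online.

```python
from urllib.parse import urlencode
from urllib.request import Request

# A deliberately generic query: any five triples from the dataset.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON-formatted results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_get_request("http://dbpedia.org/sparql", SAMPLE_QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response body could then be parsed with `json.loads` if the endpoint returns the requested format.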
74. • Querying a single dataset is quite boring
compared to:
• Issuing SPARQL queries over multiple datasets
• How can you do this?
1. Issue follow-up queries to different endpoints
2. Query a central collection of datasets
3. Build a store with copies of relevant datasets
4. Use a query federation system
75. Follow-up Queries
• Idea: issue follow-up queries over other
datasets based on results from previous
queries
• Substituting placeholders in query templates
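The template idea can be sketched as follows: results from a first query supply URIs, and each URI is substituted into a query template that is then sent to another endpoint. This is an assumption-laden illustration: the template text, the `${uri}` placeholder name, the sample result row and the character check are all invented for the example.

```python
from string import Template

# Follow-up query template; ${uri} is the placeholder to be filled in.
FOLLOW_UP = Template("SELECT ?p ?o WHERE { <${uri}> ?p ?o } LIMIT 10")

# Characters that must not appear inside <...> in a query.
FORBIDDEN = set('<>"{}|^` \n\t')


def follow_up_queries(previous_results, variable):
    """Create one follow-up query per row returned by an earlier query."""
    queries = []
    for row in previous_results:
        uri = row[variable]
        if FORBIDDEN & set(uri):
            # Refuse values that could break out of the template.
            raise ValueError("not a safe URI: " + repr(uri))
        queries.append(FOLLOW_UP.substitute(uri=uri))
    return queries


# Hypothetical result rows of a first query that bound ?city.
first_results = [{"city": "http://dbpedia.org/resource/London"}]
queries = follow_up_queries(first_results, "city")
print(queries[0])
# SELECT ?p ?o WHERE { <http://dbpedia.org/resource/London> ?p ?o } LIMIT 10
```

Checking the substituted value matters because the URIs come from someone else's data; an unchecked value could change the meaning of the follow-up query.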
76. Getting Started
• Finding URIs
• Finding Additional Data
• Finding SPARQL Endpoints
77. What is a Linked Data application
Software system that makes use of data on the
web from multiple datasets AND that benefits
from links between the datasets
78. Characteristics of Linked Data Applications
• Consume data that is published on the web following
the Linked Data principles
• Discover further information by following the links
between different data sources
• Combine the consumed Linked Data with data from
other sources (not necessarily Linked Data)
• Expose the combined data back to the web
following the Linked Data principles
• Offer value to end-users
80. Hot Research Topics
• Interlinking Algorithms
• Provenance and Trust
• Dataset Dynamics
• UI
• Distributed Query
81. Contact
PEAK Consulting

Headquarters
90 Long Acre, Covent Garden
London WC2E 9RZ
United Kingdom
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478

United States
11 Penn Plaza, 5th floor
New York, NY 1000
United States
Tel: +1 (212) 946 4824
Fax: +1 (212) 946 2801

United Arab Emirates
Unit P12 Rimal, The Walk
PO Box 487 177 Dubai
United Arab Emirates
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478

http://www.peakconsulting.eu
info@peakconsulting.eu