Dealing with the “new” data in the
                  “Cloud” – Linked Data




London   -   New York   - Dubai - Mumbai          2011
Table of Contents

     Definitions
     History
     The Modigliani Test
     Linked Data
     Raw Data
     Resource Description Framework
     Linked Data Principles
     Publishing Linked Data
     Faceted Browsers
     On-the-fly Mashups
     SPARQL
     What is a Linked Data Application
     Characteristics of a Linked Data Application
     Contact Us
Definitions
RDF: The RDF data model is similar to classic conceptual
modelling approaches such as Entity-Relationship or Class
diagrams, as it is based upon the idea of making statements about
resources (in particular Web resources) in the form of subject-
predicate-object expressions. These expressions are known as
triples in RDF terminology. The subject denotes the resource, and
the predicate denotes traits or aspects of the resource and
expresses a relationship between the subject and the object. For
example, one way to represent the notion "The sky has the colour
blue" in RDF is as the triple: a subject denoting "the sky", a
predicate denoting "has the colour", and an object denoting "blue".
RDF is an abstract model with several serialization formats (i.e.,
file formats), and so the particular way in which a resource or
triple is encoded varies from format to format.
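
As a rough illustration of the definition above, the "sky has the colour blue" statement can be held as one (subject, predicate, object) tuple and written out in two different encodings. This is only a sketch: the `http://example.org/` namespace and the function names are assumptions made for the example, and the JSON layout is an ad hoc one, not a standard RDF serialization.

```python
import json

# Hypothetical namespace, used only for illustration (not from the slides).
EX = "http://example.org/"

# One statement as a (subject, predicate, object) tuple.
triple = (EX + "sky", EX + "hasColour", "blue")


def to_ntriples(s, p, o):
    """Render one triple as an N-Triples-style line: URIs in <>, the literal in quotes."""
    return f'<{s}> <{p}> "{o}" .'


def to_json(s, p, o):
    """Render the same triple as a small JSON object (ad hoc layout, not a standard)."""
    return json.dumps({"subject": s, "predicate": p, "object": o}, sort_keys=True)


# The abstract statement is the same; only the encoding differs.
print(to_ntriples(*triple))
print(to_json(*triple))
```

The point of the two functions is that the triple itself does not change when the file format does.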
Definitions
SPARQL: SPARQL (SPARQL Protocol and RDF Query Language,
pronounced "sparkle") is an RDF query language.

Linked Data: Linked Data describes a method of publishing
structured data, so that it can be interlinked and become more
useful. It builds upon standard Web technologies, such as HTTP
and URIs - but rather than using them to serve web pages for
human readers, it extends them to share information in a way that
can be read automatically by computers. This enables data from
different sources to be connected and queried.
History

 • Linked Data Design Issues by Tim Berners-Lee, July 2006
 • Linked Open Data Project, WWW2007
 • First LOD Cloud, May 2007
 • BBC publishes Linked Data, 2008
 • NY Times announcement, SemTech2009 - ISWC09
 • Data.gov.uk publishes Linked Data, 2010
[Figures: LOD cloud snapshots, May 2007, Mar 2008, Sept 2008, Mar 2009, July 2009]
The Modigliani Test

 • Show me all the locations of all the original paintings
   of Modigliani
 • Daniel Koller (@dakoller) showed that you can find
   this with a SPARQL query on DBpedia
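
The slides do not show the query Daniel Koller used, so the following is only a guess at what such a request could look like. It builds, but does not send, a GET request URL for the DBpedia endpoint listed later in the deck. The property names `dbo:author` and `dbo:museum` and the resource name `dbr:Amedeo_Modigliani` are assumptions about DBpedia's vocabulary and may need adjusting.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:author links a painting to its painter and
# dbo:museum links a painting to where it is held. Check against DBpedia.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?painting ?location WHERE {
  ?painting dbo:author dbr:Amedeo_Modigliani ;
            dbo:museum ?location .
}
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query into the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Fetching `url` with any HTTP client would run the query; the sketch stops short of that so it works offline.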
So what is Linked Data?
Do you SEARCH or do you FIND?
Search for

 • Football players who went to the University of
   Texas at Austin and played for the Dallas Cowboys
   as cornerback
Why can’t we just FIND it…
Using the current Web (= internet + links + docs)
is terribly inefficient
So what is the problem?
 • We aren’t always interested in documents
     • We are interested in THINGS
     • These THINGS might be in documents
 • We can read an HTML document rendered in a browser and find
   what we are searching for
     • This is hard for computers. It’s typically based on
       guesswork from some primitive NLP engine, or simple
       keyword search
What do we need to do?

Make it easy for computers/software to find THINGS
How can we do that?


  • Besides publishing documents on the web
    - which computers can’t understand easily
  • Let’s publish something that computers can
  understand
RAW DATA!
But don’t we already publish raw data in
      RDBMS, XML, CSV, etc.?
Yes!

But it’s not in a consistent format, and very
      difficult to integrate (or “link”).
For example, how do I know that the
Wael Elrifai in Facebook is the same
      as Wael Elrifai in Twitter?
Don’t we already have a standard
 way of publishing on the web?
We have a standardized way of
publishing documents on the web, right?
                 HTML
Then why can’t we have a standard way
    of publishing data on the Web?
In fact, we do have one.
Resource Description Framework (RDF)
  • A data model
      • A way to model data
      • e.g. relational databases use the relational data model
  • RDF is a triple data model
  • Labeled graph
  • Subject, Predicate, Object
      <Wael> <was born in> <Beirut>
      <Beirut> <is part of> <the Lebanon>
      <Wael> <likes> <the Semantic Web>
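
To make the "labeled graph" idea concrete, here is a minimal sketch that loads the three statements above into an adjacency structure and follows the edges. The names `graph` and `reachable` are made up for this example; a real application would normally use an RDF library instead.

```python
from collections import defaultdict

# The three statements from the slide, as (subject, predicate, object) tuples.
triples = [
    ("Wael", "was born in", "Beirut"),
    ("Beirut", "is part of", "the Lebanon"),
    ("Wael", "likes", "the Semantic Web"),
]

# Labeled directed graph: node -> list of (edge label, target node).
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))


def reachable(start):
    """Return every node that can be reached from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _label, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


# Because "Beirut" is both an object and a subject, the statements chain together.
print(sorted(reachable("Wael")))
```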
RDF can be serialized in different ways

       • RDF/XML
       • RDFa (RDF in HTML)
       • N3
       • Turtle
       • JSON
So does that mean that I have to
 publish my data in RDF now?
You don’t have to… but it sure
       would be nice.
Document on the Web
Databases back up documents

THINGS have PROPERTIES:
A Book has a Title, an Author, …

This is a THING:
A book titled “Programming the Semantic Web” by Toby Segaran, …

Isbn               Title                         Author        PublisherID  ReleasedDate
978-0-596-15381-6  Programming the Semantic Web  Toby Segaran  1            July 2009
…                  …                             …             …            …

PublisherID  PublisherName
1            O’Reilly Media
…            …
Let’s represent the data in RDF

Isbn               Title                         Author        PublisherID  ReleasedDate
978-0-596-15381-6  Programming the Semantic Web  Toby Segaran  1            July 2009

PublisherID  PublisherName
1            O’Reilly Media

<book> <title> <Programming the Semantic Web>
<book> <author> <Toby Segaran>
<book> <isbn> <978-0-596-15381-6>
<book> <publisher> <Publisher>
<Publisher> <name> <O’Reilly>
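
The table-to-graph step can be sketched in a few lines. The `http://example.org/` base and the `isbn/` and `publisher/` URI patterns are hypothetical choices made for this example; the slides do not say how the identifiers are minted.

```python
# Hypothetical base URI, for illustration only.
BASE = "http://example.org/"

books = [
    {"Isbn": "978-0-596-15381-6", "Title": "Programming the Semantic Web",
     "Author": "Toby Segaran", "PublisherID": 1},
]
publishers = [
    {"PublisherID": 1, "PublisherName": "O’Reilly Media"},
]


def book_uri(isbn):
    return BASE + "isbn/" + isbn


def publisher_uri(publisher_id):
    return BASE + "publisher/" + str(publisher_id)


def rows_to_triples(books, publishers):
    """Turn each row into one node and each column into one labeled edge."""
    triples = []
    for row in books:
        s = book_uri(row["Isbn"])
        triples.append((s, "title", row["Title"]))
        triples.append((s, "author", row["Author"]))
        triples.append((s, "isbn", row["Isbn"]))
        # The foreign key becomes a link to another node, not a plain value.
        triples.append((s, "publisher", publisher_uri(row["PublisherID"])))
    for row in publishers:
        triples.append((publisher_uri(row["PublisherID"]), "name", row["PublisherName"]))
    return triples


for t in rows_to_triples(books, publishers):
    print(t)
```

The design choice worth noting is the foreign key: in the graph it is an edge between two identified things, which is what later makes linking across datasets possible.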
Remember that we are on the web

Everything on the web is identified by a URL
And now let’s link the data to other data

<http://…/isbn978> <title> <Programming the Semantic Web>
<http://…/isbn978> <author> <Toby Segaran>
<http://…/isbn978> <isbn> <978-0-596-15381-6>
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> <O’Reilly>
And now consider the data from Revyu.com

<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> <Awesome Book>
<http://…/review1> <reviewer> <http://…/reviewer>
<http://…/reviewer> <name> <Wael Elrifai>
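Let’s start to link data

<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> <Awesome Book>
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> <Wael Elrifai>

<http://…/isbn978> <sameAs> <http://…/isbn978>
    (Revyu’s URI for the book and the book’s URI in the other dataset)

<http://…/isbn978> <title> <Programming the Semantic Web>
<http://…/isbn978> <author> <Toby Segaran>
<http://…/isbn978> <isbn> <978-0-596-15381-6>
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> <O’Reilly>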
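
One way to picture what the sameAs link buys you: two datasets that name the same book differently can be merged into one graph by rewriting both names to a single representative. The sketch below assumes made-up URIs under the reserved `.example` domain, since the slides elide the real ones.

```python
# Hypothetical URIs; the slides elide the real ones.
REVYU_BOOK = "http://revyu.example/isbn978"
STORE_BOOK = "http://books.example/isbn978"
REVIEW = "http://revyu.example/review1"

revyu_data = [
    (REVYU_BOOK, "hasReview", REVIEW),
    (REVIEW, "description", "Awesome Book"),
]
book_data = [
    (STORE_BOOK, "title", "Programming the Semantic Web"),
    (STORE_BOOK, "author", "Toby Segaran"),
]
links = [(REVYU_BOOK, "sameAs", STORE_BOOK)]


def canonical_map(triples):
    """Map every URI that appears in a sameAs statement to one representative URI."""
    parent = {}

    def find(x):
        while x in parent:
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "sameAs":
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                keep, drop = sorted((root_s, root_o))
                parent[drop] = keep
    return {uri: find(uri) for uri in parent}


def merge(*graphs):
    """Union the graphs, rewriting sameAs-equivalent URIs to their representative."""
    all_triples = [t for g in graphs for t in g]
    canon = canonical_map(all_triples)
    return {
        (canon.get(s, s), p, canon.get(o, o))
        for s, p, o in all_triples
        if p != "sameAs"
    }


merged = merge(revyu_data, book_data, links)
for t in sorted(merged):
    print(t)
```

After the merge, the review and the title hang off the same node, so one lookup answers questions that span both sources.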
Data on the Web that is in RDF and
  is linked to other RDF data is
         LINKED DATA
Linked Data Principles

   1.   Use URIs as names for things
   2.   Use HTTP URIs so that people can look up
        (dereference) those names.
   3.   When someone looks up a URI, provide
        useful information.
   4.   Include links to other URIs so that they can
        discover more things.
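
Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for data rather than a web page. The sketch below only prepares such a request and does not send it. Whether a given server answers `text/turtle` is an assumption, and the DBpedia URI is used purely as an example of an HTTP URI that names a thing.

```python
from urllib.request import Request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare (but do not send) a GET request asking for RDF instead of HTML."""
    return Request(uri, headers={"Accept": media_type})


req = build_lookup("http://dbpedia.org/resource/London")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; the links found in the returned data are what principle 4 refers to.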
Linked Data makes the web appear as
     a single global database!
The same can be done inside your company!
What if you wanted to know your company’s
      EBITDA for Catalonia in 2010?

 • You could have an EDW pre-aggregate and
   distribute the data, have an analyst calculate it on
   the spot, or…
Linked data in your internal semantic
web could relate all transactions to
linked financial formulae!

You ask the question, tell your system
where to look (as part of the question,
this can be prebuilt) and voilà!
I can query a database with SQL. Is
there a way to query Linked Data with a
            query language?
Yes! There is actually a standardized
        language for that
FIND all the reviews on the book
“Programming the Semantic Web”
  by people who live in London
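<http://…/isbn978> <hasReview> <http://…/review1>
<http://…/review1> <description> <Awesome Book>
<http://…/review1> <hasReviewer> <http://…/reviewer>
<http://…/reviewer> <name> <Wael Elrifai>

<http://…/isbn978> <sameAs> <http://…/isbn978>
    (Revyu’s URI for the book and the book’s URI in the other dataset)

<http://…/isbn978> <title> <Programming the Semantic Web>
<http://…/isbn978> <author> <Toby Segaran>
<http://…/isbn978> <isbn> <978-0-596-15381-6>
<http://…/isbn978> <publisher> <http://…/publisher1>
<http://…/publisher1> <name> <O’Reilly>

<http://…/reviewer> <sameAs> <http://waelworldwide.com>
<http://waelworldwide.com> <livesIn> <http://dbpedia.org/London>
<http://waelworldwide.com> <name> <Wael Elrifai>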
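
The "FIND" question can be read as a list of triple patterns with variables, which is roughly how a SPARQL engine treats it. Below is a toy pattern matcher over an in-memory copy of the graph, not a SPARQL implementation. The `.example` URIs, the second review and its reviewer are hypothetical, added only so that the London filter has something to exclude.

```python
BOOK = "http://books.example/isbn978"          # hypothetical URI
REVIEW1 = "http://revyu.example/review1"       # hypothetical URI
REVIEWER1 = "http://revyu.example/reviewer"    # hypothetical URI
PERSON1 = "http://waelworldwide.com"
LONDON = "http://dbpedia.org/London"

triples = [
    (BOOK, "title", "Programming the Semantic Web"),
    (BOOK, "hasReview", REVIEW1),
    (REVIEW1, "description", "Awesome Book"),
    (REVIEW1, "hasReviewer", REVIEWER1),
    (REVIEWER1, "sameAs", PERSON1),
    (PERSON1, "livesIn", LONDON),
    (PERSON1, "name", "Wael Elrifai"),
    # Hypothetical second review by someone outside London.
    (BOOK, "hasReview", "http://revyu.example/review2"),
    ("http://revyu.example/review2", "description", "Hypothetical second review"),
    ("http://revyu.example/review2", "hasReviewer", "http://revyu.example/reviewer2"),
    ("http://revyu.example/reviewer2", "sameAs", "http://person2.example/"),
    ("http://person2.example/", "livesIn", "http://dbpedia.org/Paris"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(patterns, triples):
    """Return every variable binding that satisfies all patterns together."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


patterns = [
    ("?book", "title", "Programming the Semantic Web"),
    ("?book", "hasReview", "?review"),
    ("?review", "description", "?text"),
    ("?review", "hasReviewer", "?reviewer"),
    ("?reviewer", "sameAs", "?person"),
    ("?person", "livesIn", LONDON),
]

results = query(patterns, triples)
for row in results:
    print(row["?review"], row["?text"])
```

Note that the answer depends on the sameAs edge: the review site knows the reviewer, but only the linked dataset knows where that person lives.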
This looks cool, but let’s be realistic.
  What is the incentive to publish
           Linked Data?
What was your incentive to publish
an HTML (Intranet) page in 1990?
1) Share data in documents
2) Because your neighbor was doing it
So why should we publish
  Linked Data in 2011?
1) Share data as data
2) Because your neighbor is doing it
You’ll be among good company…
Linked Data Publishers
  • UK Government
  • US Government
  • BBC
  • Open Calais – Thomson Reuters
  • Freebase
  • NY Times
  • Best Buy
  • CNET
  • DBpedia
How can I publish Linked Data?
Publishing Linked Data
  •   Legacy Data in Relational Databases
          • D2R Server
          • Virtuoso
          • Triplify
          • Ultrawrap
  •   CMS
          • Drupal 7
  •   Native RDF Stores
          • Databases for RDF (Triple Stores)
             • AllegroGraph, Jena, Sesame, Virtuoso
          • Talis Platform (Linked Data in the Cloud)
  •   In HTML with RDFa
Consuming Linked Data by Humans
HTML Browsers
 • RDF can be serialized in RDFa
 • Have you heard of
    • Yahoo’s SearchMonkey?
    • Google Rich Snippets?
 • They are consuming RDFa
 • But WHY?
Because there is life beyond ten
          blue links
Google and Yahoo are starting to crawl
              RDFa!

       The Semantic Web is a reality!
The Reality

 • Yahoo is crawling data that is in RDFa and
   Microformats under specific vocabularies
     • FOAF
     • GoodRelations

 • Google is crawling RDFa and Microformats that
   use the Google vocabulary
Linked Data Browsers

Tabulator
   • http://www.w3.org/2005/ajar/tab
OpenLink
   • http://ode.openlinksw.com/
Zitgist DataViewer
   • http://dataviewer.zitgist.com/
Marbles
   • http://www5.wiwiss.fu-berlin.de/marbles/
Explorator
   • http://www.tecweb.inf.puc-rio.br/explorator
Faceted Browsers
http://dbpedia.neofonie.de
http://dev.semsol.com/2010/semtech/
On-the-fly Mashups
http://sig.ma
What’s next?
Time to create new and innovative
ways to interact with Linked Data
This may be one of the Killer Apps that we have all been
waiting for




 http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg
Where can I find SPARQL Endpoints?

DBpedia: http://dbpedia.org/sparql
Musicbrainz: http://dbtune.org/musicbrainz/sparql
U.S. Census: http://www.rdfabout.com/sparql
Semantic Crunchbase: http://cb.semsol.org/sparql
More endpoints: http://esw.w3.org/topic/SparqlEndpoints
•   Querying a single dataset is quite boring
    compared to issuing SPARQL queries over
    multiple datasets

•   How can you do this?
     1. Issue follow-up queries to different endpoints
     2. Query a central collection of datasets
     3. Build a store with copies of relevant datasets
     4. Use a query federation system
Follow-up Queries


• Idea: issue follow-up queries over other
datasets based on results from previous
queries
• Substituting placeholders in query templates
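
Placeholder substitution can be sketched with a plain string template: each URI returned by a first query is dropped into a second query aimed at another dataset. The template, the `person` placeholder and the sample result row are assumptions for illustration; `http://xmlns.com/foaf/0.1/name` is the FOAF name property.

```python
from string import Template

# Query template with one placeholder, $person, to be filled per result row.
FOLLOW_UP = Template("""
SELECT ?name WHERE {
  <$person> <http://xmlns.com/foaf/0.1/name> ?name .
}
""")


def follow_up_queries(first_results, variable):
    """Create one follow-up query for each URI bound to `variable` in earlier results."""
    return [
        FOLLOW_UP.substitute(person=row[variable]).strip()
        for row in first_results
    ]


# Stand-in for the rows a first endpoint might have returned.
first_results = [{"person": "http://waelworldwide.com"}]

queries = follow_up_queries(first_results, "person")
for q in queries:
    print(q)
```

In practice the substituted values come from another server, so they should be checked to be well-formed URIs before being placed into a query.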
Getting Started

•   Finding URIs
•   Finding Additional Data
•   Finding SPARQL Endpoints
What is a Linked Data application?

Software system that makes use of data on the
web from multiple datasets AND that benefits
from links between the datasets
Characteristics of Linked Data Applications

•   Consume data that is published on the web following
    the Linked Data principles
•   Discover further information by following the links
    between different data sources
•   Combine the consumed linked data with data from
    other sources (not necessarily Linked Data)
•   Expose the combined data back to the web
    following the Linked Data principles
•   Offer value to end-users
Examples

 •   http://data-gov.tw.rpi.edu/wiki
 •   http://dbrec.net/
 •   http://fanhu.bz/
 •   http://data.nytimes.com/schools/schools.html
 •   http://sig.ma
 •   http://visinav.deri.org/semtech2010/
Hot Research Topics

   •   Interlinking Algorithms
   •   Provenance and Trust
   •   Dataset Dynamics
   •   UI
   •   Distributed Query
Contact

PEAK Consulting Headquarters
90 Long Acre, Covent Garden
London WC2E 9RZ
United Kingdom
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478

United States
11 Penn Plaza, 5th floor
New York, NY 1000
United States
Tel: +1 (212) 946 4824
Fax: +1 (212) 946 2801

United Arab Emirates
Unit P12 Rimal, The Walk
PO Box 487 177 Dubai
United Arab Emirates
Tel: +44 (0)207 849 3422
Fax: +44 (0)207 990 9478

http://www.peakconsulting.eu
info@peakconsulting.eu
