Consuming Linked Data SemTech2010

    Consuming Linked Data SemTech2010 - Presentation Transcript

    1. Consuming Linked Data
      Juan F. Sequeda
      Department of Computer Science
      University of Texas at Austin
      SemTech 2010
    2. How many people are familiar with
      RDF
      SPARQL
      Linked Data
      Web Architecture (HTTP, etc)
    3. History
      Linked Data Design Issues by TimBL July 2006
      Linked Open Data Project WWW2007
      First LOD Cloud May 2007
      1st Linked Data on the Web Workshop WWW2008
      1st Triplification Challenge 2008
      How to Publish Linked Data Tutorial ISWC2008
      BBC publishes Linked Data 2008
      2nd Linked Data on the Web Workshop WWW2009
      NY Times announcement SemTech2009 - ISWC09
      1st Linked Data-a-thon ISWC2009
      1st How to Consume Linked Data Tutorial ISWC2009
      Data.gov.uk publishes Linked Data 2010
      2nd How to Consume Linked Data Tutorial WWW2010
      1st International Workshop on Consuming Linked Data COLD2010

    4. May 2007
    5. Oct 2007
    6. Nov 2007 (1)
    7. Nov 2007 (2)
    8. Feb 2008
    9. Mar 2008
    10. Sept 2008
    11. Mar 2009 (1)
    12. Mar 2009 (2)
    13. July 2009
    14. June 2010
      YOU GET THE PICTURE
      IT’S BIG and getting BIGGER and
      BIGGER
    15. Now what can we do with this data?
    16. Let’s consume it!
    17. The Modigliani Test
      Show me all the locations of all the original paintings of Modigliani
      Daniel Koller (@dakoller) showed that you can find this with a SPARQL query on DBpedia
      Thanks Richard MacManus - ReadWriteWeb
    18. Results of the Modigliani Test
      Atanas Kiryakov from Ontotext
      Used LDSR – Linked Data Semantic Repository
      DBpedia
      Freebase
      Geonames
      UMBEL
      Wordnet
      Published April 26, 2010:
      http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php
    19. SPARQL Query
      PREFIX fb: <http://rdf.freebase.com/ns/>
      PREFIX dbpedia: <http://dbpedia.org/resource/>
      PREFIX dbp-prop: <http://dbpedia.org/property/>
      PREFIX dbp-ont: <http://dbpedia.org/ontology/>
      PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ot: <http://www.ontotext.com/>
      SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
      WHERE {
        ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
           fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
           ot:preferredLabel ?painting_l .
        ?ow ot:preferredLabel ?owner_l .
        OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
        OPTIONAL { ?ow dbp-prop:location ?loc .
                   ?loc rdf:type umbel-sc:City ;
                        ot:preferredLabel ?city_db_loc }
        OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
      }
    20. Let’s start by making sure that we understand what Linked Data is…
    21. Do you SEARCH or do you FIND?
    22. Search for
      Football Players who went to the University of Texas at Austin, played for the Dallas Cowboys as Cornerback
    23. Why can’t we just FIND it…
    24. Guess how I FOUND out?
    25. I’ll tell you how I did NOT find it
    26. Current Web = internet + links + docs
    27. So what is the problem?
      We aren’t always interested in documents
      We are interested in THINGS
      These THINGS might be in documents
      We can read a HTML document rendered in a browser and find what we are searching for
      This is hard for computers.
      Computers have to guess (even though they are pretty good at it)
    28. What do we need to do?
      Make it easy for computers/software to find THINGS
    29. How can we do that?
      Besides publishing documents on the web
      which computers can’t understand easily
      Let’s publish something that computers can understand
    30. RAW DATA!
    31. But wait… don’t we do that already?
    32. Current Data on the Web
      Relational Databases
      APIs
      XML
      CSV
      XLS

      Can’t computers and applications already consume that data on the web?
    33. True! But it is all in different formats and data models!
    34. This makes it hard to integrate data
    35. The data in different data sources aren’t linked
    36. For example, how do I know that the Juan Sequeda in Facebook is the same as Juan Sequeda in Twitter
    37. Or if I create a mashup from different services, I have to learn different APIs and I get different formats of data back
    38. Wouldn’t it be great if we had a standard way of publishing data on the Web?
    39. We have a standardized way of publishing documents on the web, right?
      HTML
    40. Then why can’t we have a standard way of publishing data on the Web?
    41. Good question! And the answer is YES. There is!
    42. Resource Description Framework (RDF)
      A data model
      A way to model data
      e.g. relational databases use the relational data model
      RDF is a triple data model
      Labeled Graph
      Subject, Predicate, Object
      <Juan> <was born in> <California>
      <California> <is part of> <the USA>
      <Juan> <likes> <the Semantic Web>
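The triple model on this slide can be sketched in a few lines of Python. This is only an illustration: the plain strings stand in for URIs, and the `match` helper is a hypothetical name, not part of any RDF library.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# Plain strings stand in for URIs here (illustrative only).
triples = [
    ("Juan", "was born in", "California"),
    ("California", "is part of", "the USA"),
    ("Juan", "likes", "the Semantic Web"),
]


def match(graph, s=None, p=None, o=None):
    """Return all triples that agree with the given terms; None is a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


about_juan = match(triples, s="Juan")        # every statement about Juan
born_in = match(triples, p="was born in")    # every "was born in" statement
```

Because every statement has the same three-part shape, one generic lookup function is enough to ask any question of the graph.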
    43. RDF can be serialized in different ways
      RDF/XML
      RDFa (RDF in HTML)
      N3
      Turtle
      JSON
    44. So does that mean that I have to publish my data in RDF now?
    45. You don’t have to… but we would like you to 
    46. An example
    47. Document on the Web
    48. Databases back up documents
      THINGS have PROPERTIES:
      A Book has a Title, an author, …
      This is a THING:
      A book titled “Programming the Semantic Web” by Toby Segaran, …
    49. Let’s represent the data in RDF
      Programming the Semantic Web
      title
      author
      book
      Toby Segaran
      isbn
      978-0-596-15381-6
      publisher
      name
      Publisher
      O’Reilly
    50. Remember that we are on the web
      Everything on the web is identified by a URI
    51. And now let’s link the data to other data
      Programming the Semantic Web
      title
      author
      http://…/isbn978
      Toby Segaran
      isbn
      978-0-596-15381-6
      publisher
      name
      http://…/publisher1
      O’Reilly
    52. And now consider the data from Revyu.com
      hasReview
      http://…/review1
      http://…/isbn978
      description
      reviewer
      Awesome Book
      http://…/reviewer
      name
      Juan Sequeda
    53. Let’s start to link data
      hasReview
      http://…/review1
      http://…/isbn978
      Programming the Semantic Web
      title
      description
      sameAs
      hasReviewer
      Awesome Book
      author
      http://…/isbn978
      Toby Segaran
      http://…/reviewer
      name
      isbn
      978-0-596-15381-6
      Juan Sequeda
      publisher
      name
      http://…/publisher1
      O’Reilly
    54. Juan Sequeda publishes data too
      http://juansequeda.com/id
      http://dbpedia.org/Austin
      livesIn
      name
      Juan Sequeda
    55. Let’s link more data
      hasReview
      http://…/review1
      http://…/isbn978
      description
      hasReviewer
      Awesome Book
      http://…/reviewer
      name
      Juan Sequeda
      sameAs
      http://juansequeda.com/id
      http://dbpedia.org/Austin
      livesIn
      name
      Juan Sequeda
    56. And more
      hasReview
      http://…/review1
      http://…/isbn978
      Programming the Semantic Web
      title
      description
      sameAs
      hasReviewer
      Awesome Book
      author
      http://…/isbn978
      Toby Segaran
      http://…/reviewer
      name
      isbn
      978-0-596-15381-6
      Juan Sequeda
      publisher
      sameAs
      http://…/publisher1
      name
      O’Reilly
      http://juansequeda.com/id
      http://dbpedia.org/Austin
      livesIn
      name
      Juan Sequeda
    57. Data on the Web that is in RDF and is linked to other RDF data is LINKED DATA
    58. Linked Data Principles
      Use URIs as names for things
      Use HTTP URIs so that people can look up (dereference) those names.
      When someone looks up a URI, provide useful information.
      Include links to other URIs so that they can discover more things.
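The second and third principles (look up a name, get useful information back) amount to an ordinary HTTP GET. A minimal sketch, assuming the server honours an `Accept` header asking for RDF/XML; the `build_lookup` helper is a hypothetical name, and the request is only prepared here, not sent.

```python
from urllib.request import Request


def build_lookup(uri, accept="application/rdf+xml"):
    """Prepare (but do not send) an HTTP GET that asks for RDF instead of HTML."""
    return Request(uri, headers={"Accept": accept}, method="GET")


req = build_lookup("http://dbpedia.org/resource/Berlin")
# urllib.request.urlopen(req) would perform the actual lookup over the network.
```

The same URI can serve an HTML page to a browser and RDF to a program; the `Accept` header is how the client states which one it wants.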
    59. Linked Data makes the web appear as ONEGIANTHUGEGLOBALDATABASE!
    60. I can query a database with SQL. Is there a way to query Linked Data with a query language?
    61. Yes! There is actually a standardized language for that
      SPARQL
    62. FIND all the reviews on the book “Programming the Semantic Web” by people who live in Austin
    63. hasReview
      http://…/review1
      http://…/isbn978
      Programming the Semantic Web
      title
      description
      sameAs
      hasReviewer
      Awesome Book
      author
      http://…/isbn978
      Toby Segaran
      http://…/reviewer
      name
      isbn
      978-0-596-15381-6
      Juan Sequeda
      publisher
      sameAs
      name
      http://…/publisher1
      O’Reilly
      http://juansequeda.com
      http://dbpedia.org/Austin
      livesIn
      name
      Juan Sequeda
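To make the FIND request concrete, here is a rough Python sketch that answers it over a simplified copy of the graph. The short names (`book`, `review1`, `reviewer`, `juan`) are hypothetical stand-ins for the elided URIs on the slide, and the helper functions are illustrative, not a SPARQL engine.

```python
# Hypothetical short names stand in for the elided URIs on the slide.
graph = {
    ("book", "title", "Programming the Semantic Web"),
    ("book", "hasReview", "review1"),
    ("review1", "description", "Awesome Book"),
    ("review1", "hasReviewer", "reviewer"),
    ("reviewer", "sameAs", "juan"),
    ("juan", "livesIn", "Austin"),
}


def objects(s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return {o for (ts, tp, o) in graph if ts == s and tp == p}


def subjects(p, o):
    """All subjects s such that (s, p, o) is in the graph."""
    return {s for (s, tp, to) in graph if tp == p and to == o}


def same_as(node):
    """The node plus everything directly linked to it by sameAs, in either direction."""
    return {node} | objects(node, "sameAs") | subjects("sameAs", node)


def reviews_by_residents(title, city):
    """Descriptions of reviews of the titled book written by people living in the city."""
    found = []
    for book in subjects("title", title):
        for review in objects(book, "hasReview"):
            for reviewer in objects(review, "hasReviewer"):
                # The sameAs link is what connects the review site's reviewer
                # to the personal data that says where that person lives.
                if any(city in objects(alias, "livesIn") for alias in same_as(reviewer)):
                    found.extend(objects(review, "description"))
    return sorted(found)


result = reviews_by_residents("Programming the Semantic Web", "Austin")
```

Without the `sameAs` triple the query finds nothing, which is the point of the slides: the answer only exists because the datasets are linked.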
    64. This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?
    65. What was your incentive to publish an HTML page in 1990?
    66. 1) Share data in documents 2) Because your neighbor was doing it
    67. So why should we publish Linked Data in 2010?
    68. 1) Share data as data 2) Because your neighbor is doing it
    69. And guess who is starting to publish Linked Data now?
    70. Linked Data Publishers
      UK Government
      US Government
      BBC
      Open Calais – Thomson Reuters
      Freebase
      NY Times
      Best Buy
      CNET
      DBpedia
      Are you?
    71. How can I publish Linked Data?
    72. Publishing Linked Data
      Legacy Data in Relational Databases
      D2R Server
      Virtuoso
      Triplify
      Ultrawrap
      CMS
      Drupal 7
      Native RDF Stores
      Databases for RDF (Triple Stores)
      AllegroGraph, Jena, Sesame, Virtuoso
      Talis Platform (Linked Data in the Cloud)
      In HTML with RDFa
    73. Consuming Linked Data by Humans
    74. HTML Browsers
    75. Links to other URIs
    76. <span rel="foaf:interest">
      <a href="http://dbpedia.org/resource/Database" property="dcterms:title">Database</a>,
      <a href="http://dbpedia.org/resource/Data_integration" property="dcterms:title">Data Integration</a>,
      <a href="http://dbpedia.org/resource/Semantic_Web" property="dcterms:title">Semantic Web</a>,
      <a href="http://dbpedia.org/resource/Linked_Data" property="dcterms:title">Linked Data</a>,
      etc.</span>
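A program can pull the linked interests out of markup like this with an ordinary HTML parser. The sketch below is not a conforming RDFa processor; it handles only this one shape (a `rel` on an enclosing element, `href` on nested links), and `InterestExtractor` is a hypothetical class name. The snippet is an abridged copy of the slide.

```python
from html.parser import HTMLParser

SNIPPET = """<span rel="foaf:interest">
<a href="http://dbpedia.org/resource/Database" property="dcterms:title">Database</a>,
<a href="http://dbpedia.org/resource/Semantic_Web" property="dcterms:title">Semantic Web</a>
</span>"""


class InterestExtractor(HTMLParser):
    """Collect (relation, target URI, title) for <a> tags nested in a rel element."""

    def __init__(self):
        super().__init__()
        self.rel = None    # relation announced by the enclosing element
        self.href = None   # link whose text we are waiting for
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "rel" in attrs:
            self.rel = attrs["rel"]
        if tag == "a" and "href" in attrs:
            self.href = attrs["href"]

    def handle_data(self, data):
        # Text directly inside a link is taken as that link's title.
        if self.rel and self.href:
            self.found.append((self.rel, self.href, data.strip()))
            self.href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self.href = None
        elif tag == "span":
            self.rel = None


parser = InterestExtractor()
parser.feed(SNIPPET)
interests = parser.found
```

Each extracted tuple reads as a triple about the page's subject: it has a `foaf:interest` in the thing named by the DBpedia URI.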
    77. HTML Browsers
      RDF can be serialized in RDFa
      Have you heard of
      Yahoo’s Search Monkey
      Google Rich Snippets?
      They are consuming RDFa
      But WHY?
    78. Because there is life beyond ten blue links
    79. Google and Yahoo are starting to crawl RDFa!
      The Semantic Web is a reality!
    80. The Reality
      Yahoo is crawling data that is in RDFa and Microformats under specific vocabularies
      FOAF
      GoodRelations

      Google is crawling RDFa and Microformats that use the Google vocabulary
    81. Linked Data Browsers
    82. Linked Data Browsers
      Not actually separate browsers. Run inside of HTML browsers
      View the data that is returned after looking up a URI in tabular form
      (IMO) UI lacks usability
    83. Linked Data Browsers
      Tabulator
      http://www.w3.org/2005/ajar/tab
      OpenLink
      http://ode.openlinksw.com/
      Zitgist DataViewer
      http://dataviewer.zitgist.com/
      Marbles
      http://www5.wiwiss.fu-berlin.de/marbles/
      Explorator
      http://www.tecweb.inf.puc-rio.br/explorator
    84. Faceted Browsers
    85. http://dbpedia.neofonie.de
    86. http://dev.semsol.com/2010/semtech/
    87. On-the-fly Mashups
    88. http://sig.ma
    89. What’s next?
    90. Time to create new and innovative ways to interact with Linked Data
    91. This may be one of the Killer Apps that we have all been waiting for
      http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg
    92. It’s time to partner with HCI community
      Semantic Web UIs don’t have to be ugly
    93. Consume Linked Data with SPARQL
    94. SPARQL Endpoints
      Linked Data sources usually provide a SPARQL endpoint for their dataset(s)
      SPARQL endpoint: SPARQL query processing service that supports the SPARQL protocol*
      Send your SPARQL query, receive the result
      * http://www.w3.org/TR/rdf-sparql-protocol/
    95. Where can I find SPARQL Endpoints?
      DBpedia: http://dbpedia.org/sparql
      Musicbrainz: http://dbtune.org/musicbrainz/sparql
      U.S. Census: http://www.rdfabout.com/sparql
      Semantic Crunchbase: http://cb.semsol.org/sparql
      http://esw.w3.org/topic/SparqlEndpoints
    96. Accessing a SPARQL Endpoint
      SPARQL endpoints: RESTful Web services
      Issuing SPARQL queries to a remote SPARQL endpoint is basically an HTTP GET request to the SPARQL endpoint with parameter query
      GET /sparql?query=PREFIX+rd... HTTP/1.1
      Host: dbpedia.org
      User-agent: my-sparql-client/0.1
      URL-encoded string with the SPARQL query
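Building that request URL by hand is a one-liner with Python's standard library. A minimal sketch; `sparql_get_url` is a hypothetical helper name and the query is a trivial placeholder.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_get_url(endpoint, query):
    """Return the URL for a GET request carrying the URL-encoded query parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"
url = sparql_get_url("http://dbpedia.org/sparql", query)

# Decoding the query string again recovers the original SPARQL text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Passing `url` to any HTTP client is then enough to issue the query, which is why the slides call endpoints RESTful.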
    97. Query Results Formats
      SPARQL endpoints usually support different result formats:
      XML, JSON, plain text (for ASK and SELECT queries)
      RDF/XML, NTriples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)
    98. Query Results Formats
      PREFIX dbp: <http://dbpedia.org/ontology/>
      PREFIX dbpprop: <http://dbpedia.org/property/>
      SELECT ?name ?bday WHERE {
        ?p dbp:birthplace <http://dbpedia.org/resource/Berlin> .
        ?p dbpprop:dateOfBirth ?bday .
        ?p dbpprop:name ?name .
      }
    99. Query Result Formats
      Use the ACCEPT header to request the preferred result format:
      GET /sparql?query=PREFIX+rd... HTTP/1.1
      Host: dbpedia.org
      User-agent: my-sparql-client/0.1
      Accept: application/sparql-results+json
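When the endpoint answers with `application/sparql-results+json`, the result is a small JSON document with a `head` listing the variables and `results.bindings` holding one object per solution. A sketch of turning that into plain rows; the response body below is made up for illustration and is not real DBpedia output, and `rows` is a hypothetical helper name.

```python
import json

# A made-up response body in the SPARQL JSON results layout (not real DBpedia output).
body = """{
  "head": {"vars": ["name", "bday"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Example Person"},
     "bday": {"type": "literal", "value": "1900-01-01"}}
  ]}
}"""


def rows(json_text):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(json_text)
    names = doc["head"]["vars"]
    return [
        # A variable may be unbound in a solution, so skip the missing ones.
        {v: binding[v]["value"] for v in names if v in binding}
        for binding in doc["results"]["bindings"]
    ]


table = rows(body)
```

Each binding also carries a `type` (and, for some literals, a datatype or language tag), which this sketch ignores.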
    100. Query Result Formats
      As an alternative some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter out
      GET /sparql?out=json&query=... HTTP/1.1
      Host: dbpedia.org
      User-agent: my-sparql-client/0.1
    101. Accessing a SPARQL Endpoint
      More convenient: use a library
      SPARQL JavaScript Library
      http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
      ARC for PHP
      http://arc.semsol.org/
      RAP – RDF API for PHP
      http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
    102. Accessing a SPARQL Endpoint
      Jena / ARQ (Java)
      http://jena.sourceforge.net/
      Sesame (Java)
      http://www.openrdf.org/
      SPARQL Wrapper (Python)
      http://sparql-wrapper.sourceforge.net/
      PySPARQL (Python)
      http://code.google.com/p/pysparql/
    103. Accessing a SPARQL Endpoint
      Example with Jena/ARQ
      import com.hp.hpl.jena.query.*;
      String service = "..."; // address of the SPARQL endpoint
      String query = "SELECT ..."; // your SPARQL query
      QueryExecution e = QueryExecutionFactory.sparqlService(service, query);
      ResultSet results = e.execSelect();
      while ( results.hasNext() ) {
        QuerySolution s = results.nextSolution();
        // ...
      }
      e.close(); // release the resources held by the query execution
    104. Querying a single dataset is quite boring
      compared to:
      Issuing SPARQL queries over multiple datasets
      How can you do this?
      Issue follow-up queries to different endpoints
      Querying a central collection of datasets
      Build store with copies of relevant datasets
      Use query federation system
    105. Follow-up Queries
      Idea: issue follow-up queries over other datasets based on results from previous queries
      Substituting placeholders in query templates
    106. String s1 = "http://cb.semsol.org/sparql";
      String s2 = "http://dbpedia.org/sparql";
      // note: the template does not declare the rdfs: prefix,
      // so it relies on the endpoint predefining it
      String qTmpl = "SELECT ?c WHERE{ <%s> rdfs:comment ?c }";
      // q1: find a list of companies, filtered by some criteria,
      // and return the DBpedia URIs for them
      String q1 = "SELECT ?s WHERE { ...";
      QueryExecution e1 = QueryExecutionFactory.sparqlService(s1,q1);
      ResultSet results1 = e1.execSelect();
      while ( results1.hasNext() ) {
        QuerySolution sol = results1.nextSolution();
        // substitute the URI from the first result into the template
        String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
        QueryExecution e2 = QueryExecutionFactory.sparqlService(s2,q2);
        ResultSet results2 = e2.execSelect();
        while ( results2.hasNext() ) {
          // ...
        }
        e2.close();
      }
      e1.close();
    107. Follow-up Queries
      Advantage
      Queried data is up-to-date
      Drawbacks
      Requires the existence of a SPARQL endpoint for each dataset
      Requires program logic
      Very inefficient
    108. Querying a Collection of Datasets
      Idea: Use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets
      Example:
      SPARQL endpoint over a majority of datasets from the LOD cloud at:
      http://uberblic.org
      http://lod.openlinksw.com/sparql
    109. Querying a Collection of Datasets
      Advantage:
      No need for specific program logic
      Drawbacks:
      Queried data might be out of date
      Not all relevant datasets in the collection
    110. Own Store of Dataset Copies
      Idea: Build your own store with copies of relevant datasets and query it
      Possible stores:
      Jena TDB http://jena.hpl.hp.com/wiki/TDB
      Sesame http://www.openrdf.org/
      OpenLink Virtuoso http://virtuoso.openlinksw.com/
      4store http://4store.org/
      AllegroGraph http://www.franz.com/agraph/
      etc.
    111. Populating Your Store
      Get RDF dumps provided for the datasets
      (Focused) Crawling
      ldspider http://code.google.com/p/ldspider/
      Multithreaded API for focused crawling
      Crawling strategies (breadth-first, load-balancing)
      Flexible configuration with callbacks and hooks
    112. Own Store of Dataset Copies
      Advantages:
      No need for specific program logic
      Can include all datasets
      Independent of the existence, availability, and efficiency of SPARQL endpoints
      Drawbacks:
      Requires effort to set up and to operate the store
      Ideally, data sources provide RDF dumps; if not?
      How to keep the copies in sync with the originals?
      Queried data might be out of date
    113. Federated Query Processing
      Idea: Querying a mediator which distributes sub-queries to relevant sources and integrates the results
    114. Federated Query Processing
      Instance-based federation
      Each thing described by only one data source
      Untypical for the Web of Data
      Triple-based federation
      No restrictions
      Requires more distributed joins
      Statistics about datasets required (both cases)
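The triple-based variant can be sketched with two in-memory lists standing in for remote endpoints. Everything here is an assumption made for illustration: the data, the source names, and the helper functions. The mediator naively sends every pattern to every source, which is exactly the cost that dataset statistics are meant to reduce.

```python
# Two in-memory "sources" stand in for remote SPARQL endpoints (hypothetical data).
SOURCES = {
    "reviews": [("review1", "hasReviewer", "juan"), ("review1", "about", "book")],
    "people": [("juan", "livesIn", "Austin")],
}


def ask_source(triples, pattern, binding):
    """Evaluate one triple pattern at one source, extending the given binding."""
    out = []
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must keep the value it was bound to earlier.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            out.append(new)
    return out


def federated_query(patterns):
    """Send every pattern to every source and join the partial results in the mediator."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triples in SOURCES.values()
            for extended in ask_source(triples, pattern, binding)
        ]
    return solutions


answers = federated_query([
    ("?r", "hasReviewer", "?who"),   # answered by the "reviews" source
    ("?who", "livesIn", "Austin"),   # answered by the "people" source
])
```

The join on `?who` happens in the mediator, not in either source, which is what the slide means by "more distributed joins".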
    115. Federated Query Processing
      DARQ (Distributed ARQ)
      http://darq.sourceforge.net/
      Query engine for federated SPARQL queries
      Extension of ARQ (query engine for Jena)
      Last update: June 28, 2006
      Semantic Web Integrator and Query Engine (SemWIQ)
      http://semwiq.sourceforge.net/
      Actively maintained
    116. Federated Query Processing
      Advantages:
      No need for specific program logic
      Queried data is up to date
      Drawbacks:
      Requires the existence of a SPARQL endpoint for each dataset
      Requires effort to set up and configure the mediator
    117. In any case:
      You have to know the relevant data sources
      When developing the app using follow-up queries
      When selecting an existing SPARQL endpoint over a collection of dataset copies
      When setting up your own store with a collection of dataset copies
      When configuring your query federation system
      You restrict yourself to the selected sources
    118. In any case:
      You have to know the relevant data sources
      When developing the app using follow-up queries
      When selecting an existing SPARQL endpoint over a collection of dataset copies
      When setting up your own store with a collection of dataset copies
      When configuring your query federation system
      You restrict yourself to the selected sources
      There is an alternative:
      Remember, URIs link to data
    119. Automated Link Traversal
      Idea: Discover further data by looking up relevant URIs in your application
      Can be combined with the previous approaches
    120. Link Traversal Based Query Execution
      Applies the idea of automated link traversal to the execution of SPARQL queries
      Idea:
      Intertwine query evaluation with traversal of RDF links
      Discover data that might contribute to query results during query execution
      Alternately:
      Evaluate parts of the query
      Look up URIs in intermediate solutions
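The alternation described above (evaluate part of the query, then look up the URIs that turned up) can be sketched over a simulated Web. This is a toy under stated assumptions: `WEB` is a dictionary standing in for HTTP lookups, the `ex:` names are hypothetical, and none of this reflects how SQUIN or SWClLib are implemented.

```python
# A simulated Web: looking up a URI returns the triples published there (hypothetical data).
WEB = {
    "ex:review1": [("ex:review1", "hasReviewer", "ex:juan")],
    "ex:juan": [("ex:juan", "livesIn", "ex:Austin")],
    "ex:Austin": [("ex:Austin", "name", "Austin")],
}


def lookup(uri, local, seen):
    """Dereference a URI once and add whatever it returns to the local dataset."""
    if uri not in seen:
        seen.add(uri)
        local.update(WEB.get(uri, []))


def match(pattern, triple, binding):
    """Extend the binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def traverse_query(patterns, seeds):
    local, seen = set(), set()
    for uri in seeds:
        lookup(uri, local, seen)
    solutions = [{}]
    for pattern in patterns:
        # Evaluate this part of the query against the data discovered so far ...
        extended = []
        for binding in solutions:
            for triple in list(local):
                m = match(pattern, triple, binding)
                if m is not None:
                    extended.append(m)
        solutions = extended
        # ... then look up the URIs bound in the intermediate solutions.
        for binding in solutions:
            for value in binding.values():
                lookup(value, local, seen)
    return solutions, seen


solutions, visited = traverse_query(
    [("ex:review1", "hasReviewer", "?who"), ("?who", "livesIn", "?city")],
    seeds=["ex:review1"],
)
```

Only the seed URI is known in advance; the data about `ex:juan` is fetched because the first pattern produced that URI, which is how the approach avoids needing a list of sources up front.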
    121. Link Traversal Based Query Execution
    122. Link Traversal Based Query Execution
    123. Link Traversal Based Query Execution
    124. Link Traversal Based Query Execution
    125. Link Traversal Based Query Execution
    126. Link Traversal Based Query Execution
    127. Link Traversal Based Query Execution
    128. Link Traversal Based Query Execution
    129. Link Traversal Based Query Execution
    130. Link Traversal Based Query Execution
    131. Link Traversal Based Query Execution
      Advantages:
      No need to know all data sources in advance
      No need for specific programming logic
      Queried data is up to date
      Does not depend on the existence of SPARQL endpoints provided by the data sources
      Drawbacks:
      Not as fast as a centralized collection of copies
      Unsuitable for some queries
      Results might be incomplete (do we care?)
    132. Implementations
      Semantic Web Client library (SWClLib) for Java
      http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
      SWIC for Prolog
      http://moustaki.org/swic/
    133. Implementations
      SQUIN http://squin.org
      Provides SWClLib functionality as a Web service
      Accessible like a SPARQL endpoint
      Install package: unzip and start
      Less than 5 mins!
      Convenient access with SQUIN PHP tools:
      $s = 'http:// ...'; // address of the SQUIN service
      $q = new SparqlQuerySock( $s, '... SELECT ...' );
      $res = $q->getJsonResult(); // or getXmlResult()
    134. Real World Example
    135. Getting Started
      Finding URIs
      Finding Additional Data
      Finding SPARQL Endpoints
    136. What is a Linked Data application
      Software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets
    137. Characteristics of Linked Data Applications
      • Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data
      • Discover further information by following the links between different data sources: the fourth principle enables this.
      • Combine the consumed linked data with data from other sources (not necessarily Linked Data)
      • Expose the combined data back to the web following the Linked Data principles
      • Offer value to end-users
    138. Examples
      http://data-gov.tw.rpi.edu/wiki
      http://dbrec.net/
      http://fanhu.bz/
      http://data.nytimes.com/schools/schools.html
      http://sig.ma
      http://visinav.deri.org/semtech2010/
    139. Hot Research Topics
      Interlinking Algorithms
      Provenance and Trust
      Dataset Dynamics
      UI
      Distributed Query
      Evaluation
      “You want a good thesis? IR is based on precision and recall. The minute you add semantics, it is a meaningless feature. Logic is based on soundness and completeness. We don’t want soundness and completeness. We want a few good answers quickly.” – Jim Hendler at WWW2009 during the LOD gathering
      Thanks Michael Hausenblas
    140. THANKS
      Juan Sequeda
      www.juansequeda.com
      @juansequeda
      #cold
      www.consuminglinkeddata.org
      Acknowledgements: Olaf Hartig, Patrick Sinclair, Jamie Taylor
      Slides for Consuming Linked Data with SPARQL by Olaf Hartig
