Consuming Linked Data SemTech2010

This is a one-hour talk introducing Linked Data and how it can be consumed by humans and by machines through SPARQL.


Transcript

  • 1. Consuming Linked Data
    Juan F. Sequeda
    Department of Computer Science
    University of Texas at Austin
    SemTech 2010
  • 2. How many people are familiar with
    RDF
    SPARQL
    Linked Data
    Web Architecture (HTTP, etc)
  • 3. History
    Linked Data Design Issues by TimBL July 2006
    Linked Open Data Project WWW2007
    First LOD Cloud May 2007
    1st Linked Data on the Web Workshop WWW2008
    1st Triplification Challenge 2008
    How to Publish Linked Data Tutorial ISWC2008
    BBC publishes Linked Data 2008
    2nd Linked Data on the Web Workshop WWW2009
    NY Times announcement SemTech2009 - ISWC09
    1st Linked Data-a-thon ISWC2009
    1st How to Consume Linked Data Tutorial ISWC2009
    Data.gov.uk publishes Linked Data 2010
    2nd How to Consume Linked Data Tutorial WWW2010
    1st International Workshop on Consuming Linked Data COLD2010

  • 4. May 2007
  • 5. Oct 2007
  • 6. Nov 2007 (1)
  • 7. Nov 2007 (2)
  • 8. Feb 2008
  • 9. Mar 2008
  • 10. Sept 2008
  • 11. Mar 2009 (1)
  • 12. Mar 2009 (2)
  • 13. July 2009
  • 14. June 2010
    YOU GET THE PICTURE
    IT'S BIG and getting BIGGER and BIGGER
  • 15. Now what can we do with this data?
  • 16. Let’s consume it!
  • 17. The Modigliani Test
    Show me all the locations of all the original paintings of Modigliani
    Daniel Koller (@dakoller) showed that you can find this with a SPARQL query on DBpedia
    Thanks Richard MacManus - ReadWriteWeb
  • 18.
  • 19. Results of the Modigliani Test
    Atanas Kiryakov from Ontotext
    Used LDSR – Linked Data Semantic Repository
    DBpedia
    Freebase
    GeoNames
    UMBEL
    WordNet
    Published April 26, 2010:
    http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php
  • 20. SPARQL Query
    PREFIX fb: <http://rdf.freebase.com/ns/>
    PREFIX dbpedia: <http://dbpedia.org/resource/>
    PREFIX dbp-prop: <http://dbpedia.org/property/>
    PREFIX dbp-ont: <http://dbpedia.org/ontology/>
    PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ot: <http://www.ontotext.com/>
    SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
    WHERE {
    ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
       fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
       ot:preferredLabel ?painting_l .
    ?ow ot:preferredLabel ?owner_l .
    OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
    OPTIONAL { ?ow dbp-prop:location ?loc .
               ?loc rdf:type umbel-sc:City ;
                    ot:preferredLabel ?city_db_loc }
    OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
    }
  • 21.
  • 22. Let’s start by making sure that we understand what Linked Data is…
  • 23. Do you SEARCH or do you FIND?
  • 24. Search for
    Football Players who went to the University of Texas at Austin, played for the Dallas Cowboys as Cornerback
  • 25.
  • 26.
  • 27.
  • 28. Why can’t we just FIND it…
  • 29.
  • 30.
  • 31. Guess how I FOUND out?
  • 32. I’ll tell you how I did NOT find it
  • 33. Current Web = internet + links + docs
  • 34. So what is the problem?
    We aren’t always interested in documents
    We are interested in THINGS
    These THINGS might be in documents
    We can read an HTML document rendered in a browser and find what we are searching for
    This is hard for computers.
    Computers have to guess (even though they are pretty good at it)
  • 35. What do we need to do?
    Make it easy for computers/software to find THINGS
  • 36. How can we do that?
    Besides publishing documents on the web
    which computers can’t understand easily
    Let’s publish something that computers can understand
  • 37. RAW DATA!
  • 38. But wait… don’t we do that already?
  • 39. Current Data on the Web
    Relational Databases
    APIs
    XML
    CSV
    XLS

    Can’t computers and applications already consume that data on the web?
  • 40. True! But it is all in different formats and data models!
  • 41. This makes it hard to integrate data
  • 42. The data in different data sources aren’t linked
  • 43. For example, how do I know that the Juan Sequeda on Facebook is the same as the Juan Sequeda on Twitter?
  • 44. Or if I create a mashup from different services, I have to learn different APIs and I get different formats of data back
  • 45. Wouldn’t it be great if we had a standard way of publishing data on the Web?
  • 46. We have a standardized way of publishing documents on the web, right?
    HTML
  • 47. Then why can’t we have a standard way of publishing data on the Web?
  • 48. Good question! And the answer is YES. There is!
  • 49. Resource Description Framework (RDF)
    A data model
    A way to model data
    e.g., relational databases use the relational data model
    RDF is a triple data model
    Labeled Graph
    Subject, Predicate, Object
    <Juan> <was born in> <California>
    <California> <is part of> <the USA>
    <Juan> <likes> <the Semantic Web>
  • 50. RDF can be serialized in different ways
    RDF/XML
    RDFa (RDF in HTML)
    N3
    Turtle
    JSON
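    For example, the three statements from the previous slide could be written in Turtle roughly like this (the URIs are made up for illustration and are not from the talk):
      @prefix ex: <http://example.org/> .

      ex:Juan        ex:wasBornIn  ex:California .
      ex:California  ex:isPartOf   ex:USA .
      ex:Juan        ex:likes      ex:SemanticWeb .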
  • 51. So does that mean that I have to publish my data in RDF now?
  • 52. You don’t have to… but we would like you to
  • 53. An example
  • 54. Document on the Web
  • 55. Databases back up documents
    THINGS have PROPERTIES:
    A Book has a Title, an author, …
    This is a THING:
    A book titled “Programming the Semantic Web” by Toby Segaran, …
  • 56. Let’s represent the data in RDF
    (diagram shown here as triples)
    book --title--> "Programming the Semantic Web"
    book --author--> "Toby Segaran"
    book --isbn--> "978-0-596-15381-6"
    book --publisher--> Publisher
    Publisher --name--> "O'Reilly"
  • 57. Remember that we are on the web
    Everything on the web is identified by a URI
  • 58. And now let’s link the data to other data
    (diagram shown as triples)
    http://…/isbn978 --title--> "Programming the Semantic Web"
    http://…/isbn978 --author--> "Toby Segaran"
    http://…/isbn978 --isbn--> "978-0-596-15381-6"
    http://…/isbn978 --publisher--> http://…/publisher1
    http://…/publisher1 --name--> "O'Reilly"
  • 59. And now consider the data from Revyu.com
    (diagram shown as triples)
    http://…/isbn978 --hasReview--> http://…/review1
    http://…/review1 --description--> "Awesome Book"
    http://…/review1 --reviewer--> http://…/reviewer
    http://…/reviewer --name--> "Juan Sequeda"
  • 60. Let’s start to link data
    (the two graphs merged, shown as triples; the two …/isbn978 URIs are distinct, one from the book data and one from Revyu.com, linked by sameAs)
    http://…/isbn978 --sameAs--> http://…/isbn978
    http://…/isbn978 --hasReview--> http://…/review1
    http://…/review1 --description--> "Awesome Book"
    http://…/review1 --hasReviewer--> http://…/reviewer
    http://…/reviewer --name--> "Juan Sequeda"
    http://…/isbn978 --title--> "Programming the Semantic Web"
    http://…/isbn978 --author--> "Toby Segaran"
    http://…/isbn978 --isbn--> "978-0-596-15381-6"
    http://…/isbn978 --publisher--> http://…/publisher1
    http://…/publisher1 --name--> "O'Reilly"
  • 61. Juan Sequeda publishes data too
    (diagram shown as triples)
    http://juansequeda.com/id --name--> "Juan Sequeda"
    http://juansequeda.com/id --livesIn--> http://dbpedia.org/Austin
  • 62. Let’s link more data
    (diagram shown as triples; the sameAs edge links the Revyu.com reviewer URI to Juan’s own URI)
    http://…/isbn978 --hasReview--> http://…/review1
    http://…/review1 --description--> "Awesome Book"
    http://…/review1 --hasReviewer--> http://…/reviewer
    http://…/reviewer --name--> "Juan Sequeda"
    http://…/reviewer --sameAs--> http://juansequeda.com/id
    http://juansequeda.com/id --name--> "Juan Sequeda"
    http://juansequeda.com/id --livesIn--> http://dbpedia.org/Austin
  • 63. And more
    (everything merged, shown as triples; the two …/isbn978 URIs are distinct and linked by sameAs)
    http://…/isbn978 --title--> "Programming the Semantic Web"
    http://…/isbn978 --author--> "Toby Segaran"
    http://…/isbn978 --isbn--> "978-0-596-15381-6"
    http://…/isbn978 --publisher--> http://…/publisher1
    http://…/publisher1 --name--> "O'Reilly"
    http://…/isbn978 --sameAs--> http://…/isbn978
    http://…/isbn978 --hasReview--> http://…/review1
    http://…/review1 --description--> "Awesome Book"
    http://…/review1 --hasReviewer--> http://…/reviewer
    http://…/reviewer --name--> "Juan Sequeda"
    http://…/reviewer --sameAs--> http://juansequeda.com/id
    http://juansequeda.com/id --name--> "Juan Sequeda"
    http://juansequeda.com/id --livesIn--> http://dbpedia.org/Austin
  • 64. Data on the Web that is in RDF and is linked to other RDF data is LINKED DATA
  • 65. Linked Data Principles
    Use URIs as names for things
    Use HTTP URIs so that people can look up (dereference) those names.
    When someone looks up a URI, provide useful information.
    Include links to other URIs so that they can discover more things.
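    As a rough illustration of principles 2 and 3 (a sketch; exact server behavior varies), looking up a DBpedia URI and asking for RDF via content negotiation:
      GET /resource/Berlin HTTP/1.1
      Host: dbpedia.org
      Accept: text/turtle
    The server then responds, typically via a 303 redirect to a separate data document, with RDF statements describing Berlin, including links to further URIs that a client can look up in turn.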
  • 66. Linked Data makes the web appear as ONE GIANT HUGE GLOBAL DATABASE!
  • 67. I can query a database with SQL. Is there a way to query Linked Data with a query language?
  • 68. Yes! There is actually a standardized language for that
    SPARQL
  • 69. FIND all the reviews on the book “Programming the Semantic Web” by people who live in Austin
  • 70. (the full example graph again, shown as triples; the two …/isbn978 URIs are distinct and linked by sameAs)
    http://…/isbn978 --title--> "Programming the Semantic Web"
    http://…/isbn978 --author--> "Toby Segaran"
    http://…/isbn978 --isbn--> "978-0-596-15381-6"
    http://…/isbn978 --publisher--> http://…/publisher1
    http://…/publisher1 --name--> "O'Reilly"
    http://…/isbn978 --sameAs--> http://…/isbn978
    http://…/isbn978 --hasReview--> http://…/review1
    http://…/review1 --description--> "Awesome Book"
    http://…/review1 --hasReviewer--> http://…/reviewer
    http://…/reviewer --name--> "Juan Sequeda"
    http://…/reviewer --sameAs--> http://juansequeda.com
    http://juansequeda.com --name--> "Juan Sequeda"
    http://juansequeda.com --livesIn--> http://dbpedia.org/Austin
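    A sketch of what such a query could look like in SPARQL; the ex: properties are placeholders standing in for the edge labels of the example graph (real data would use actual vocabularies), and depending on the direction of the sameAs links in the data, those patterns may need to be reversed:
      PREFIX ex:  <http://example.org/vocab/>
      PREFIX owl: <http://www.w3.org/2002/07/owl#>
      SELECT ?review ?description
      WHERE {
        ?book      ex:title        "Programming the Semantic Web" .
        ?book      owl:sameAs      ?sameBook .
        ?sameBook  ex:hasReview    ?review .
        ?review    ex:description  ?description ;
                   ex:hasReviewer  ?reviewer .
        ?reviewer  owl:sameAs      ?person .
        ?person    ex:livesIn      <http://dbpedia.org/Austin> .
      }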
  • 71. This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?
  • 72. What was your incentive to publish an HTML page in 1990?
  • 73. 1) Share data in documents 2) Because your neighbor was doing it
  • 74. So why should we publish Linked Data in 2010?
  • 75. 1) Share data as data 2) Because your neighbor is doing it
  • 76. And guess who is starting to publish Linked Data now?
  • 77. Linked Data Publishers
    UK Government
    US Government
    BBC
    Open Calais – Thomson Reuters
    Freebase
    NY Times
    Best Buy
    CNET
    DBpedia
    Are you?
  • 78. How can I publish Linked Data?
  • 79. Publishing Linked Data
    Legacy Data in Relational Databases
    D2R Server
    Virtuoso
    Triplify
    Ultrawrap
    CMS
    Drupal 7
    Native RDF Stores
    Databases for RDF (Triple Stores)
    AllegroGraph, Jena, Sesame, Virtuoso
    Talis Platform (Linked Data in the Cloud)
    In HTML with RDFa
  • 80. Consuming Linked Data by Humans
  • 81. HTML Browsers
  • 82. Links to other URIs
  • 83. <span rel="foaf:interest">
    <a href="http://dbpedia.org/resource/Database" property="dcterms:title">Database</a>,
    <a href="http://dbpedia.org/resource/Data_integration" property="dcterms:title">Data Integration</a>,
    <a href="http://dbpedia.org/resource/Semantic_Web" property="dcterms:title">Semantic Web</a>,
    <a href="http://dbpedia.org/resource/Linked_Data" property="dcterms:title">Linked Data</a>,
    etc.</span>
  • 84. HTML Browsers
    RDF can be serialized in RDFa
    Have you heard of
    Yahoo’s Search Monkey
    Google Rich Snippets?
    They are consuming RDFa
    But WHY?
  • 85. Because there is life beyond ten blue links
  • 86.
  • 87. Google and Yahoo are starting to crawl RDFa!
    The Semantic Web is a reality!
  • 88. The Reality
    Yahoo is crawling data that is in RDFa and Microformats using specific vocabularies
    FOAF
    GoodRelations

    Google is crawling RDFa and Microformats that use the Google vocabulary
  • 89. Linked Data Browsers
  • 90. Linked Data Browsers
    Not actually separate browsers; they run inside HTML browsers
    View the data that is returned after looking up a URI in tabular form
    (IMO) UI lacks usability
  • 91.
  • 92. Linked Data Browsers
    Tabulator
    http://www.w3.org/2005/ajar/tab
    OpenLink
    http://ode.openlinksw.com/
    Zitgist DataViewer
    http://dataviewer.zitgist.com/
    Marbles
    http://www5.wiwiss.fu-berlin.de/marbles/
    Explorator
    http://www.tecweb.inf.puc-rio.br/explorator
  • 93. Faceted Browsers
  • 94. http://dbpedia.neofonie.de
  • 95. http://dev.semsol.com/2010/semtech/
  • 96. On-the-fly Mashups
  • 97. http://sig.ma
  • 98. What’s next?
  • 99. Time to create new and innovative ways to interact with Linked Data
  • 100. This may be one of the Killer Apps that we have all been waiting for
    http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg
  • 101. It’s time to partner with the HCI community
    Semantic Web UIs don’t have to be ugly
  • 102. Consume Linked Data with SPARQL
  • 103. SPARQL Endpoints
    Linked Data sources usually provide a SPARQL endpoint for their dataset(s)
    SPARQL endpoint: SPARQL query processing service that supports the SPARQL protocol*
    Send your SPARQL query, receive the result
    * http://www.w3.org/TR/rdf-sparql-protocol/
  • 104. Where can I find SPARQL Endpoints?
    DBpedia: http://dbpedia.org/sparql
    MusicBrainz: http://dbtune.org/musicbrainz/sparql
    U.S. Census: http://www.rdfabout.com/sparql
    Semantic Crunchbase: http://cb.semsol.org/sparql
    http://esw.w3.org/topic/SparqlEndpoints
  • 105. Accessing a SPARQL Endpoint
    SPARQL endpoints: RESTful Web services
    Issuing a SPARQL query to a remote SPARQL endpoint is basically an HTTP GET request to the endpoint with the parameter query holding the URL-encoded SPARQL query:
    GET /sparql?query=PREFIX+rd... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
  • 106. Query Results Formats
    SPARQL endpoints usually support different result formats:
    XML, JSON, plain text (for ASK and SELECT queries)
    RDF/XML, NTriples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)
  • 107. Query Results Formats
    PREFIX dbp: <http://dbpedia.org/ontology/>
    PREFIX dbpprop: <http://dbpedia.org/property/>
    SELECT ?name ?bday
    WHERE {
    ?p dbp:birthplace <http://dbpedia.org/resource/Berlin> .
    ?p dbpprop:dateOfBirth ?bday .
    ?p dbpprop:name ?name .
    }
  • 108.
  • 109.
  • 110. Query Result Formats
    Use the ACCEPT header to request the preferred result format:
    GET /sparql?query=PREFIX+rd... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
    Accept: application/sparql-results+json
  • 111. Query Result Formats
    As an alternative, some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter, out:
    GET /sparql?out=json&query=... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
  • 112. Accessing a SPARQL Endpoint
    More convenient: use a library
    SPARQL JavaScript Library
    http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
    ARC for PHP
    http://arc.semsol.org/
    RAP – RDF API for PHP
    http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
  • 113. Accessing a SPARQL Endpoint
    Jena / ARQ (Java)
    http://jena.sourceforge.net/
    Sesame (Java)
    http://www.openrdf.org/
    SPARQL Wrapper (Python)
    http://sparql-wrapper.sourceforge.net/
    PySPARQL (Python)
    http://code.google.com/p/pysparql/
  • 114. Accessing a SPARQL Endpoint
    Example with Jena/ARQ
    import com.hp.hpl.jena.query.*;
    String service = "..."; // address of the SPARQL endpoint
    String query = "SELECT ..."; // your SPARQL query
    QueryExecution e = QueryExecutionFactory.sparqlService(service, query);
    ResultSet results = e.execSelect();
    while ( results.hasNext() ) {
    QuerySolution s = results.nextSolution();
    // ...
    }
    e.close();
  • 115. Querying a single dataset is quite boring
    compared to:
    Issuing SPARQL queries over multiple datasets
    How can you do this?
    Issue follow-up queries to different endpoints
    Query a central collection of datasets
    Build a store with copies of relevant datasets
    Use query federation system
  • 116. Follow-up Queries
    Idea: issue follow-up queries over other datasets based on results from previous queries
    Substituting placeholders in query templates
  • 117. String s1 = "http://cb.semsol.org/sparql";
    String s2 = "http://dbpedia.org/sparql";
    String qTmpl = "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
    // q1: find a list of companies filtered by some criteria and return their DBpedia URIs
    String q1 = "SELECT ?s WHERE { ...";
    QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
    ResultSet results1 = e1.execSelect();
    while ( results1.hasNext() ) {
    QuerySolution sol = results1.nextSolution();
    String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
    QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
    ResultSet results2 = e2.execSelect();
    while ( results2.hasNext() ) {
    // ...
    }
    e2.close();
    }
    e1.close();
  • 118. Follow-up Queries
    Advantage
    Queried data is up-to-date
    Drawbacks
    Requires the existence of a SPARQL endpoint for each dataset
    Requires program logic
    Very inefficient
  • 119. Querying a Collection of Datasets
    Idea: Use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets
    Example:
    SPARQL endpoint over a majority of datasets from the LOD cloud at:
    http://uberblic.org
    http://lod.openlinksw.com/sparql
  • 120. Querying a Collection of Datasets
    Advantage:
    No need for specific program logic
    Drawbacks:
    Queried data might be out of date
    Not all relevant datasets in the collection
  • 121. Own Store of Dataset Copies
    Idea: Build your own store with copies of relevant datasets and query it
    Possible stores:
    Jena TDB http://jena.hpl.hp.com/wiki/TDB
    Sesame http://www.openrdf.org/
    OpenLink Virtuoso http://virtuoso.openlinksw.com/
    4store http://4store.org/
    AllegroGraph http://www.franz.com/agraph/
    etc.
  • 122. Populating Your Store
    Get the RDF dumps provided for the datasets (a loading sketch follows this slide)
    (Focused) Crawling
    LDSpider http://code.google.com/p/ldspider/
    Multithreaded API for focused crawling
    Crawling strategies (breadth-first, load-balancing)
    Flexible configuration with callbacks and hooks
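    For the RDF-dump option above, a minimal sketch of loading a dump into a local Jena TDB store (the store directory and file name are placeholders, and package names differ across Jena versions):
      import com.hp.hpl.jena.query.Dataset;
      import com.hp.hpl.jena.rdf.model.Model;
      import com.hp.hpl.jena.tdb.TDBFactory;

      public class LoadDump {
        public static void main(String[] args) {
          // open (or create) a persistent TDB store in a local directory
          Dataset dataset = TDBFactory.createDataset("/path/to/tdb-store");
          Model model = dataset.getDefaultModel();
          // load an N-Triples dump into the store
          model.read("file:dbpedia-dump.nt", "N-TRIPLE");
          dataset.close();
        }
      }
    For very large dumps, TDB's command-line bulk loader is usually the better choice.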
  • 123. Own Store of Dataset Copies
    Advantages:
    No need for specific program logic
    Can include all datasets
    Independent of the existence, availability, and efficiency of SPARQL endpoints
    Drawbacks:
    Requires effort to set up and to operate the store
    Ideally, data sources provide RDF dumps; if not?
    How to keep the copies in sync with the originals?
    Queried data might be out of date
  • 124. Federated Query Processing
    Idea: Querying a mediator which distributes sub-queries to relevant sources and integrates the results
  • 125. Federated Query Processing
    Instance-based federation
    Each thing described by only one data source
    Untypical for the Web of Data
    Triple-based federation
    No restrictions
    Requires more distributed joins
    Statistics about datasets required (both cases)
  • 126. Federated Query Processing
    DARQ (Distributed ARQ)
    http://darq.sourceforge.net/
    Query engine for federated SPARQL queries
    Extension of ARQ (query engine for Jena)
    Last update: June 28, 2006
    Semantic Web Integrator and Query Engine (SemWIQ)
    http://semwiq.sourceforge.net/
    Actively maintained
  • 127. Federated Query Processing
    Advantages:
    No need for specific program logic
    Queried data is up to date
    Drawbacks:
    Requires the existence of a SPARQL endpoint for each dataset
    Requires effort to set up and configure the mediator
  • 128. In any case:
    You have to know the relevant data sources
    When developing the app using follow-up queries
    When selecting an existing SPARQL endpoint over a collection of dataset copies
    When setting up your own store with a collection of dataset copies
    When configuring your query federation system
    You restrict yourself to the selected sources
  • 129. In any case:
    You have to know the relevant data sources
    When developing the app using follow-up queries
    When selecting an existing SPARQL endpoint over a collection of dataset copies
    When setting up your own store with a collection of dataset copies
    When configuring your query federation system
    You restrict yourself to the selected sources
    There is an alternative:
    Remember, URIs link to data
  • 130. Automated Link Traversal
    Idea: Discover further data by looking up relevant URIs in your application
    Can be combined with the previous approaches
  • 131. Link Traversal Based Query Execution
    Applies the idea of automated link traversal to the execution of SPARQL queries
    Idea:
    Intertwine query evaluation with traversal of RDF links
    Discover data that might contribute to query results during query execution
    Alternately:
    Evaluate parts of the query
    Look up URIs in intermediate solutions
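    A deliberately simplified sketch of this alternation using Jena, for intuition only (this is not the actual algorithm of SWClLib or SQUIN; the query and seed URI are placeholders, the number of rounds is fixed, and a real engine interleaves lookups with evaluation and handles errors and termination much more carefully):
      import com.hp.hpl.jena.query.*;
      import com.hp.hpl.jena.rdf.model.*;
      import java.util.*;

      public class LinkTraversalSketch {
        public static void main(String[] args) {
          String queryStr = "SELECT ?s ?o WHERE { ?s ?p ?o }";   // placeholder query
          Model data = ModelFactory.createDefaultModel();        // everything retrieved so far
          Set<String> seen = new HashSet<String>();
          Deque<String> toLookUp = new ArrayDeque<String>();
          toLookUp.add("http://dbpedia.org/resource/Austin");    // seed URI taken from the query

          for (int round = 0; round < 3; round++) {
            // 1) look up URIs discovered so far and add the retrieved RDF to the local model
            while (!toLookUp.isEmpty()) {
              String uri = toLookUp.poll();
              if (seen.add(uri)) {
                try { data.read(uri); } catch (Exception ex) { /* skip unreachable URIs */ }
              }
            }
            // 2) evaluate the query over the data retrieved so far
            QueryExecution qe = QueryExecutionFactory.create(queryStr, data);
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
              QuerySolution sol = results.nextSolution();
              // 3) collect URIs from intermediate solutions for the next traversal round
              for (Iterator<String> it = sol.varNames(); it.hasNext(); ) {
                RDFNode n = sol.get(it.next());
                if (n != null && n.isURIResource()) toLookUp.add(n.asResource().getURI());
              }
            }
            qe.close();
          }
        }
      }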
  • 132.–141. Link Traversal Based Query Execution (animated walkthrough of the traversal and evaluation steps; diagrams only)
  • 142. Link Traversal Based Query Execution
    Advantages:
    No need to know all data sources in advance
    No need for specific programming logic
    Queried data is up to date
    Does not depend on the existence of SPARQL endpoints provided by the data sources
    Drawbacks:
    Not as fast as a centralized collection of copies
    Unsuitable for some queries
    Results might be incomplete (do we care?)
  • 143. Implementations
    Semantic Web Client library (SWClLib) for Java
    http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
    SWIC for Prolog
    http://moustaki.org/swic/
  • 144. Implementations
    SQUIN http://squin.org
    Provides SWClLib functionality as a Web service
    Accessible like a SPARQL endpoint
    Install package: unzip and start
    Less than 5 mins!
    Convenient access with SQUIN PHP tools:
    $s = 'http:// ...'; // address of the SQUIN service
    $q = new SparqlQuerySock( $s, '... SELECT ...' );
    $res = $q->getJsonResult();// or getXmlResult()
  • 145. Real World Example
  • 146. Getting Started
    Finding URIs
    Finding Additional Data
    Finding SPARQL Endpoints
  • 147. What is a Linked Data application?
    Software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets
  • 148. Characteristics of Linked Data Applications
    • Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data
    • 149. Discover further information by following the links between different data sources: the fourth principle enables this.
    • 150. Combine the consumed Linked Data with data from other sources (not necessarily Linked Data)
    • 151. Expose the combined data back to the web following the Linked Data principles
    • 152. Offer value to end-users
  • Examples
    http://data-gov.tw.rpi.edu/wiki
    http://dbrec.net/
    http://fanhu.bz/
    http://data.nytimes.com/schools/schools.html
    http://sig.ma
    http://visinav.deri.org/semtech2010/
  • 153. Hot Research Topics
    Interlinking Algorithms
    Provenance and Trust
    Dataset Dynamics
    UI
    Distributed Query Evaluation
    “You want a good thesis? IR is based on precision and recall. The minute you add semantics, it is a meaningless feature. Logic is based on soundness and completeness. We don’t want soundness and completeness. We want a few good answers quickly.” – Jim Hendler at WWW2009 during the LOD gathering
    Thanks Michael Hausenblas
  • 154. THANKS
    Juan Sequeda
    www.juansequeda.com
    @juansequeda
    #cold
    www.consuminglinkeddata.org
    Acknowledgements: Olaf Hartig, Patrick Sinclair, Jamie Taylor
    Slides for Consuming Linked Data with SPARQL by Olaf Hartig