Consuming Linked Data 4/5 Semtech2011
Transcript

  • 1. Consuming Linked Data
    Juan F. Sequeda
    Semantic Technology Conference
    June 2011
  • 2. Now what can we do with this data?
  • 3. Linked Data Applications
    Software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets
  • 4. Characteristics of Linked Data Applications
    • Consume data that is published on the web following the Linked Data principles: an application should be able to request, retrieve and process the accessed data
    • 5. Discover further information by following the links between different data sources: the fourth principle enables this.
    • 6. Combine the consumed Linked Data with data from other sources (not necessarily Linked Data)
    • 7. Expose the combined data back to the web following the Linked Data principles
    • 8. Offer value to end-users
  • Generic Applications
  • 9. Linked Data Browsers
  • 10. Linked Data Browsers
    Not actually separate browsers; they run inside regular HTML browsers
    View the data returned after looking up a URI, in tabular form
    The user can navigate between data sources by following RDF links
    (IMO) Poor usability
  • 11.
  • 12. Linked Data Browsers
    http://browse.semanticweb.org/
    Tabulator
    OpenLinkDataexplorer
    Zitgist
    Marbles
    Explorator
    Disco
    LinkSailor
  • 13. Linked Data (Semantic Web) Search Engines
  • 14. Linked Data (Semantic Web) Search Engines
    Just like conventional search engines (Google, Bing, Yahoo), they crawl RDF documents and follow RDF links.
    Current search engines don’t crawl data, unless it’s RDFa
    Human-focused search
    Falcons - Keyword
    SWSE – Keyword
    VisiNav – Complex Queries
    Machine-focused search
    Sindice – data instances
    Swoogle - ontologies
    Watson - ontologies
    Uberblic – curated integrated data instances
  • 15. (Semantic) SEO ++
    Mark up your HTML with RDFa
    Use standard vocabularies (ontologies)
    Google Vocabulary
    GoodRelations
    Dublin Core
    Google and Yahoo will crawl this data and use it for better rendering
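For illustration, a minimal RDFa fragment using the Dublin Core vocabulary (the URI and the surrounding page are placeholders, not from the slides):

```html
<!-- Hypothetical page snippet: RDFa attributes embed RDF triples in HTML -->
<div xmlns:dc="http://purl.org/dc/terms/" about="http://example.org/talk">
  <span property="dc:title">Consuming Linked Data</span>
  by <span property="dc:creator">Juan F. Sequeda</span>
</div>
```

A crawler that understands RDFa can extract two triples about `http://example.org/talk` from this markup: its `dc:title` and its `dc:creator`.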
  • 16.
  • 17. On-the-fly Mashups
  • 18. http://sig.ma
  • 19. Domain Specific Applications
  • 20. Domain Specific Applications
    Government
    Data.gov
    Data.gov.uk
    http://data-gov.tw.rpi.edu/wiki/Demos
    Music
    Seevl.net
    DBpedia Mobile
    Life Science
    LinkedLifeData
    Sports
    BBC World Cup
  • 21. Faceted Browsers
  • 22. http://dbpedia.neofonie.de/browse/
  • 23. http://dev.semsol.com/2010/semtech/
  • 24. Query your data
  • 25. Find all the locations of all the original paintings of Modigliani
  • 26. Select all proteins that are linked to a curated interaction from the literature and to inflammatory response
    http://linkedlifedata.com/
  • 27. SPARQL Endpoints
    Linked Data sources usually provide a SPARQL endpoint for their dataset(s)
    SPARQL endpoint: SPARQL query processing service that supports the SPARQL protocol*
    Send your SPARQL query, receive the result
    * http://www.w3.org/TR/rdf-sparql-protocol/
  • 28. Where can I find SPARQL Endpoints?
    DBpedia: http://dbpedia.org/sparql
    Musicbrainz: http://dbtune.org/musicbrainz/sparql
    U.S. Census: http://www.rdfabout.com/sparql
    http://esw.w3.org/topic/SparqlEndpoints
  • 29. Accessing a SPARQL Endpoint
    SPARQL endpoints: RESTful Web services
    Issuing a SPARQL query to a remote SPARQL endpoint is basically an HTTP GET request to the endpoint, with the query passed in the query parameter:
    GET /sparql?query=PREFIX+rd... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
    The value of the query parameter is the URL-encoded SPARQL query
  • 30. Query Results Formats
    SPARQL endpoints usually support different result formats:
    XML, JSON, plain text (for ASK and SELECT queries)
    RDF/XML, NTriples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)
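For example, a SELECT result in the SPARQL JSON results format looks like this (variable names and values are illustrative):

```json
{
  "head": { "vars": [ "name", "bday" ] },
  "results": {
    "bindings": [
      {
        "name": { "type": "literal", "value": "Marlene Dietrich" },
        "bday": { "type": "literal", "value": "1901-12-27" }
      }
    ]
  }
}
```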
  • 31. Query Results Formats
    PREFIX dbp: <http://dbpedia.org/ontology/>
    PREFIX dbpprop: <http://dbpedia.org/property/>
    SELECT ?name ?bday WHERE {
      ?p dbp:birthplace <http://dbpedia.org/resource/Berlin> .
      ?p dbpprop:dateOfBirth ?bday .
      ?p dbpprop:name ?name .
    }
  • 32.
  • 33.
  • 34. Query Result Formats
    Use the ACCEPT header to request the preferred result format:
    GET /sparql?query=PREFIX+rd... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
    Accept: application/sparql-results+json
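As a sketch, the same request can be assembled from Java with the standard java.net.http client (Java 11+); the endpoint and header values are from the slides, and the short query is illustrative:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class SparqlGetRequest {
    public static void main(String[] args) {
        String endpoint = "http://dbpedia.org/sparql";
        String query = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"; // illustrative query

        // URL-encode the query and pass it as the 'query' parameter
        String url = endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/sparql-results+json") // preferred result format
                .header("User-Agent", "my-sparql-client/0.1")
                .GET()
                .build();

        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
        // would execute it; here we only show the assembled request URI
        System.out.println(request.uri());
    }
}
```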
  • 35. Query Result Formats
    As an alternative, some SPARQL endpoint implementations (e.g., Joseki) provide an additional out parameter
    GET /sparql?out=json&query=... HTTP/1.1
    Host: dbpedia.org
    User-agent: my-sparql-client/0.1
  • 36. Accessing a SPARQL Endpoint
    More convenient: use a library
    SPARQL JavaScript Library
    http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
    ARC for PHP
    http://arc.semsol.org/
    RAP – RDF API for PHP
    http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
  • 37. Accessing a SPARQL Endpoint
    Jena / ARQ (Java)
    http://jena.sourceforge.net/
    Sesame (Java)
    http://www.openrdf.org/
    SPARQL Wrapper (Python)
    http://sparql-wrapper.sourceforge.net/
    PySPARQL (Python)
    http://code.google.com/p/pysparql/
  • 38. Accessing a SPARQL Endpoint
    Example with Jena/ARQ
    import com.hp.hpl.jena.query.*;
    String service = "..."; // address of the SPARQL endpoint
    String query = "SELECT ..."; // your SPARQL query
    QueryExecution e = QueryExecutionFactory.sparqlService(service, query);
    ResultSet results = e.execSelect();
    while ( results.hasNext() ) {
    QuerySolution s = results.nextSolution();
    // ...
    }
    e.close();
  • 39. Querying a single dataset is quite boring
    compared to
    Issuing queries over multiple datasets
  • 40. Creating a Linked Data Application
  • 41. Linked Data Architectures
    Follow-up queries
    Querying Local Cache
    Crawling
    Federated Query Processing
    On-the-fly Dereferencing
  • 42. Follow-up Queries
    Idea: issue follow-up queries over other datasets based on results from previous queries
    Substituting placeholders in query templates
  • 43. String s1 = "http://cb.semsol.org/sparql";
    String s2 = "http://dbpedia.org/sparql";
    String qTmpl = "SELECT ?c WHERE{ <%s> rdfs:comment ?c }";
    String q1 = "SELECT ?s WHERE { ...";
    QueryExecution e1 = QueryExecutionFactory.sparqlService(s1,q1);
    ResultSet results1 = e1.execSelect();
    while ( results1.hasNext() ) {
    QuerySolution sol = results1.nextSolution();
    String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
    QueryExecution e2= QueryExecutionFactory.sparqlService(s2,q2);
    ResultSet results2 = e2.execSelect();
    while ( results2.hasNext() ) {
    // ...
    }
    e2.close();
    }
    e1.close();
    The first query finds a list of companies filtered by some criteria and returns DBpedia URIs for them
  • 44. Follow-up Queries
    Advantage
    Queried data is up-to-date
    Drawbacks
    Requires the existence of a SPARQL endpoint for each dataset
    Requires program logic
    Very inefficient
  • 45. Querying Local Cache
    Idea: Use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets
    Use RDF dumps of each dataset
    SPARQL endpoint over a majority of datasets from the LOD cloud at:
    http://uberblic.org
    http://lod.openlinksw.com/sparql
  • 46. Querying a Collection of Datasets
    Advantage:
    No need for specific program logic
    Includes the datasets that you want
    Complex queries and high performance
    Even reasoning
    Drawbacks:
    Depends on existence of RDF dump
    Requires effort to set up and to operate the store
    How to keep the copies in sync with the originals?
    Queried data might be out of date
  • 47. Crawling
    Crawl RDF in advance by following RDF links
    Integrate, clean and store in your own triplestore
    Same way we crawl HTML today
    LDSpider
  • 48. Crawling
    Advantages:
    No need for specific program logic
    Independent of the existence, availability, and efficiency of SPARQL endpoints
    Complex queries with high performance
    Can even reason about the data
    Drawbacks:
    Requires effort to set up and to operate the store
    How to keep the copies in sync with the originals?
    Queried data might be out of date
  • 49. Federated Query Processing
    Idea: query a mediator, which distributes sub-queries to the relevant sources and integrates the results
  • 50. Federated Query Processing
    Instance-based federation
    Each thing described by only one data source
    Untypical for the Web of Data
    Triple-based federation
    No restrictions
    Requires more distributed joins
    Statistics about datasets required (both cases)
  • 51. Federated Query Processing
    DARQ (Distributed ARQ)
    http://darq.sourceforge.net/
    Query engine for federated SPARQL queries
    Extension of ARQ (query engine for Jena)
    Last update: June 2006
    Semantic Web Integrator and Query Engine (SemWIQ)
    http://semwiq.sourceforge.net/
    Last update: March 2010
    Commercial

  • 52. Federated Query Processing
    Advantages:
    No need for specific program logic
    Queried data is up to date
    Drawbacks:
    Requires the existence of a SPARQL endpoint for each dataset
    Requires effort to set up and configure the mediator
  • 53. In any case:
    You have to know the relevant data sources
    When developing the app using follow-up queries
    When selecting an existing SPARQL endpoint over a collection of dataset copies
    When setting up your own store with a collection of dataset copies
    When configuring your query federation system
    You restrict yourself to the selected sources
  • 54. In any case:
    You have to know the relevant data sources
    When developing the app using follow-up queries
    When selecting an existing SPARQL endpoint over a collection of dataset copies
    When setting up your own store with a collection of dataset copies
    When configuring your query federation system
    You restrict yourself to the selected sources
    There is an alternative:
    Remember, URIs link to data
  • 55. On-the-fly Dereferencing
    Idea: Discover further data by looking up relevant URIs in your application on the fly
    Can be combined with the previous approaches
    Linked Data Browsers
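A minimal sketch of such an on-the-fly lookup with Jena (the same library used in the earlier examples): Model.read fetches the URI over HTTP and parses the returned RDF into a local model. The example URI is DBpedia's Berlin resource.

```java
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class Dereference {
    public static void main(String[] args) {
        String uri = "http://dbpedia.org/resource/Berlin";

        Model model = ModelFactory.createDefaultModel();
        // Look up the URI on the fly: an HTTP GET on the URI,
        // parsing the retrieved triples into the local model
        model.read(uri);

        System.out.println("Retrieved " + model.size() + " triples about " + uri);
    }
}
```

The retrieved model can then be queried locally, and any further URIs it mentions can be dereferenced in the same way.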
  • 56. Link Traversal Based Query Execution
    Applies the idea of automated link traversal to the execution of SPARQL queries
    Idea:
    Intertwine query evaluation with traversal of RDF links
    Discover data that might contribute to query results during query execution
    Alternately:
    Evaluate parts of the query
    Look up URIs in intermediate solutions
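The alternation above can be sketched in pseudocode (D denotes the query-local dataset; the steps are a sketch, not a concrete API):

```
D := RDF retrieved by looking up the URIs mentioned in the query   // seed data
repeat
    evaluate the query pattern over D, producing intermediate solutions
    for each URI bound in an intermediate solution and not yet fetched:
        look up the URI; add the retrieved triples to D
until no new URIs are discovered
report the solutions found over the final D
```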
  • 67. Link Traversal Based Query Execution
    Advantages:
    No need to know all data sources in advance
    No need for specific programming logic
    Queried data is up to date
    Does not depend on the existence of SPARQL endpoints provided by the data sources
    Drawbacks:
    Not as fast as a centralized collection of copies
    Unsuitable for some queries
    Results might be incomplete (do we care?)
  • 68. Implementations
    Semantic Web Client library (SWClLib) for Java
    http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
    SWIC for Prolog
    http://moustaki.org/swic/
  • 69. Implementations
    SQUIN http://squin.org
    Provides SWClLib functionality as a Web service
    Accessible like a SPARQL endpoint
    Install package: unzip and start
    Less than 5 mins!
    Convenient access with SQUIN PHP tools:
    $s = 'http:// ...'; // address of the SQUIN service
    $q = new SparqlQuerySock( $s, '... SELECT ...' );
    $res = $q->getJsonResult(); // or getXmlResult()
  • 70. Real World Example
  • 71. What else?
    Vocabulary Mapping
    foaf:name vs. foo:name
    Identity Resolution
    ex:Juan owl:sameAs foo:Juan
    Provenance
    Data Quality
    License
  • 72. Getting Started
    Finding URIs
    Use search engines
    Finding SPARQL Endpoints