Linked Data and Services


The Linked Data and Services presentation was presented by Andreas Harth (KIT) and Barry Norton (KIT) at the PlanetData project Kick-off Meeting on October 11, 2010 in Palma de Mallorca, Spain.

Published in: Technology, Education

  1. Linked Data and Services
     Andreas Harth and Barry Norton, Institute AIFB
     KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
     www.kit.edu
  2. Outline
     • Motivation
     • Linked Data Principles
     • Query Processing over Linked Data
     • Linked Data Services (LIDS) and Linked Open Services (LOS)
     • Conclusion
  3. Motivation
     • Semantic Web/Linked Data technologies are well-suited for data integration
     • Diagram: Common Data Format/Access Protocol, Data Integration, Interactive Data Exploration
  4. Linked Data Principles*
     1. Use URIs to name things; not only documents, but also people, locations, concepts, etc.
     2. To enable agents (human users and machine agents alike) to look up those names, use HTTP URIs
     3. When someone looks up a URI we provide useful information; with useful in the strict sense we usually mean structured data in RDF.
     4. Include links to other URIs allowing agents (machines and humans) to discover more things
     (*) http://www.w3.org/DesignIssues/LinkedData.html
  5. Correspondence between thing-URI and source-URI
     • Thing-URI: http://www.polleres.net/foaf.rdf#me
     • The user agent sends an HTTP GET to the web server and receives RDF
     • Source-URI: http://www.polleres.net/foaf.rdf
  6. Correspondence between thing-URI and source-URI
     • Thing-URI: http://dbpedia.org/resource/Gordon_Brown
     • The user agent sends an HTTP GET, receives a 303 redirect, and a second HTTP GET returns RDF
     • Source-URI: http://dbpedia.org/data/Gordon_Brown
     • HTML page: http://dbpedia.org/page/Gordon_Brown
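The two correspondences on slides 5 and 6 can be sketched in a few lines of Python. This is an illustration only: the function names and the `REDIRECTS` table are hypothetical, and the table stands in for a real HTTP round trip that would return the 303 response.

```python
from urllib.parse import urldefrag


def source_uri_for_hash_uri(thing_uri):
    """Hash URIs: the source document is the thing-URI without its fragment."""
    document, _fragment = urldefrag(thing_uri)
    return document


def source_uri_for_slash_uri(thing_uri, redirects):
    """Slash URIs: the server answers with a 303 redirect to the source.

    `redirects` is a plain dict standing in for the HTTP exchange (assumption).
    """
    return redirects.get(thing_uri, thing_uri)


# Hypothetical redirect table mirroring the DBpedia example on slide 6.
REDIRECTS = {
    "http://dbpedia.org/resource/Gordon_Brown": "http://dbpedia.org/data/Gordon_Brown",
}
```

For the hash URI on slide 5, `source_uri_for_hash_uri("http://www.polleres.net/foaf.rdf#me")` yields `http://www.polleres.net/foaf.rdf`.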
  8. Queries over Linked Data
     SELECT ?f ?n WHERE {
       an:f#ah foaf:knows ?f.
       ?f foaf:name ?n.
     }
     SELECT ?x1 ?x2 WHERE {
       dblppub:HoganHP08 dc:creator ?a1.
       ?x1 owl:sameAs ?a1.
       ?x2 foaf:knows ?x1.
     }
     Result columns: ?f ?n
  9. Querying Data Across Sources
     • Data warehousing or materialisation-based approaches (MAT): CRAWL, INDEX, SERVE
     • Distributed query processing approaches (DQP): SELECT * FROM… over sources R and S
  10. DQP on Linked Data
      • Relational DQP: SELECT * FROM… over sources R and S, accessed via ODBC
      • Linked Data DQP: SELECT ?s WHERE… over triple patterns (TP), accessed via HTTP GET
  11. Query Processing Overview
      SELECT ?f ?n WHERE {
        an:f#ah foaf:knows ?f.
        ?f foaf:name ?n.
      }
      • Triple patterns (TP): (an:f#ah foaf:knows ?f) and (?f foaf:name ?n)
      • For each TP: select source(s), then retrieve RDF via HTTP GET
      • Result: ?f = http://danbri.org/foaf.rdf#danbri, ?n = Dan Brickley
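Once the RDF for each triple pattern has been retrieved, the patterns still have to be joined on their shared variables. A minimal sketch of that step, assuming the triples are already in memory as plain tuples; `match`, `evaluate`, `TRIPLES` and `QUERY` are hypothetical names, and the nested-loop join is chosen for brevity rather than taken from the talk.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns, triples):
    """Join the triple patterns left to right with a nested-loop join."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Sample data modelled on the slide; the first triple is an assumption
# about what the retrieved source contains.
TRIPLES = [
    ("an:f#ah", "foaf:knows", "http://danbri.org/foaf.rdf#danbri"),
    ("http://danbri.org/foaf.rdf#danbri", "foaf:name", "Dan Brickley"),
]
QUERY = [("an:f#ah", "foaf:knows", "?f"), ("?f", "foaf:name", "?n")]
```

With this data, `evaluate(QUERY, TRIPLES)` returns a single binding with `?f` set to `http://danbri.org/foaf.rdf#danbri` and `?n` set to `Dan Brickley`.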
  12. Barry
  13. Problem: Source Selection for Triple Patterns
      • (?s ?p ?o)
      • (#s ?p ?o)
      • (?s #p ?o)
      • (?s ?p #o)
      • (#s #p ?o)
      • (#s ?p #o)
      • (?s #p #o)
      • (#s #p #o)
      • Given a triple pattern, which source can contribute bindings for the triple pattern?
  14. Schema-Level Indices [Stuckenschmidt et al. 2004]
      • Keep index of properties and/or classes contained in sources
      • (?s #p ?o), (?s rdf:type #o)
      • Covers only queries containing schema-level elements
      • Commonly used properties select potentially too many sources
      SELECT ?x1 ?x2 WHERE {
        dblppub:HoganHP08 dc:creator ?a1.
        ?x1 owl:sameAs ?a1.
        ?x2 foaf:knows ?x1.
      }
      SELECT ?f ?n WHERE {
        an:f#ah foaf:knows ?f.
        ?f foaf:name ?n.
      }
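A schema-level index of this kind can be sketched as two dictionaries, one keyed by property and one keyed by class. The code below is an illustration under assumptions, not the cited system: the source URLs, the sample triples and all function names are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_schema_index(sources):
    """Map each property and each class to the sources that mention it."""
    props, classes = defaultdict(set), defaultdict(set)
    for source, triples in sources.items():
        for _s, p, o in triples:
            props[p].add(source)
            if p == RDF_TYPE:
                classes[o].add(source)
    return props, classes


def select_sources(pattern, props, classes):
    """Return candidate sources, or None if the index cannot help."""
    _s, p, o = pattern
    if p.startswith("?"):
        return None  # no schema-level element in the pattern
    if p == RDF_TYPE and not o.startswith("?"):
        return set(classes.get(o, set()))
    return set(props.get(p, set()))


# Hypothetical sources and contents.
SOURCES = {
    "http://example.org/a.rdf": [("ex:a", "foaf:knows", "ex:b"),
                                 ("ex:a", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [("ex:b", "foaf:name", "B")],
}
PROPS, CLASSES = build_schema_index(SOURCES)
```

A pattern such as `("ex:a", "?p", "?o")` returns `None`, which illustrates the coverage limit noted on the slide: without a property or class in the pattern, the index has nothing to look up.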
  15. Direct Lookup (DL) [Hartig et al. 2009]
      • Exploits correspondence between thing-URI and source-URI
      • Linked Data sources (aka RDF files) typically return triples with a subject corresponding to the source
      • Sometimes the sources return triples with an object corresponding to the source
      • (#s ?p ?o), (#s #p ?o), (#s #p #o)
      • (?s ?p #o), (?s #p #o)
      • Incomplete with respect to patterns, but also with respect to URI reuse across sources
      • Limited parallelism, unclear how to schedule lookups
      SELECT ?x1 ?x2 WHERE {
        dblppub:HoganHP08 dc:creator ?a1.
        ?x1 owl:sameAs ?a1.
        ?x2 foaf:knows ?x1.
      }
      SELECT ?f ?n WHERE {
        an:f#ah foaf:knows ?f.
        ?f foaf:name ?n.
      }
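Direct lookup needs no index at all: the candidate sources are read off the constants in the pattern. A minimal sketch, assuming hash URIs only (303 redirects would need an HTTP request and are left out); `direct_lookup_sources` is a hypothetical name.

```python
from urllib.parse import urldefrag


def direct_lookup_sources(pattern):
    """Derive candidate source URIs from the subject and object constants."""
    subject, _predicate, obj = pattern
    sources = []
    for term in (subject, obj):
        if term.startswith(("http://", "https://")):
            document = urldefrag(term)[0]  # drop the #fragment
            if document not in sources:
                sources.append(document)
    return sources
```

A pattern with variables in both the subject and the object position, such as `("?s", "foaf:knows", "?o")`, yields an empty list, which is the incompleteness with respect to patterns that the slide mentions.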
  16. Approximate Data Summaries
      • Combined description of schema-level and instance-level
      • Use approximation to reduce index size (incurs false positives)
      • Possible to use entire query for source selection
      • Parallel lookups since sources can be determined for the entire query
      • (?s ?p ?o), (#s ?p ?o), (?s #p ?o), (?s ?p #o), (#s #p ?o), (#s ?p #o), (?s #p #o), (#s #p #o)
      • and combinations of triple patterns
      SELECT ?x1 ?x2 WHERE {
        dblppub:HoganHP08 dc:creator ?a1.
        ?x1 owl:sameAs ?a1.
        ?x2 foaf:knows ?x1.
      }
      SELECT ?f ?n WHERE {
        an:f#ah foaf:knows ?f.
        ?f foaf:name ?n.
      }
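The trade-off between index size and false positives can be illustrated with a deliberately simple hash-bucket summary. This is a stand-in chosen for brevity, not the data structure used by the authors; `BUCKETS`, the function names and the sample sources are all assumptions, and only single triple patterns are handled.

```python
import zlib

BUCKETS = 64  # assumed summary resolution; smaller means more false positives


def bucket(term):
    """Deterministically hash an RDF term into a small number of buckets."""
    return zlib.crc32(term.encode("utf-8")) % BUCKETS


def summarise(triples):
    """Summarise a source as the set of bucket coordinates of its triples."""
    return {tuple(bucket(term) for term in triple) for triple in triples}


def may_contribute(summary, pattern):
    """True if some summarised triple is compatible with the pattern."""
    wanted = [None if term.startswith("?") else bucket(term) for term in pattern]
    return any(
        all(w is None or w == b for w, b in zip(wanted, entry))
        for entry in summary
    )


def select_sources(summaries, pattern):
    """All sources whose summary cannot rule the pattern out."""
    return sorted(s for s, summ in summaries.items() if may_contribute(summ, pattern))


# Hypothetical sources and contents.
SUMMARIES = {
    "http://example.org/a.rdf": summarise([("ex:a", "foaf:knows", "ex:b")]),
    "http://example.org/b.rdf": summarise([("ex:b", "foaf:name", "B")]),
}
```

Because different terms can share a bucket, a source may be selected although it holds no matching triple, but a source that does hold a matching triple is never missed. All eight pattern shapes are supported, since variables simply match any bucket.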
  17. Implementation
      • Deploy wrappers "in the cloud"
      • Google App Engine: hosting of Java and Python webapps on Google's Cloud infrastructure
      • Limited amount of processing time (6 hrs/day)
      • Single-threaded applications
      • Suited for deploying wrappers
      • e.g. http://twitter2foaf.appspot.com/ converts Twitter user data to RDF
  18. Linking Open Data Cloud 2007
  19. Linking Open Data Cloud 2008
  20. Linking Open Data Cloud 2009
  21. Linking Open Data Cloud 2010
  22. Geonames Services
  23. Geonames Services
  24. Geonames Services
      {"weatherObservation":
        {"clouds":"broken clouds",
         "weatherCondition":"drizzle",
         "observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010",
         "windDirection":30,
         "ICAO":"LESO",
         ...
  25. Geonames Services
      {"weatherObservation":
        {"clouds":"broken clouds",
         "weatherCondition":"drizzle",
         "observation":"LESO 251300Z 03007KT 340V040 CAVOK 23/15 Q1010",
         "windDirection":30,
         "ICAO":"LESO",
         ...
  26. Linked Open Service Principles
      REST Principles
      1. Application state and functionality are divided into resources
      2. Every resource is uniquely addressable
      3. All resources share a uniform interface:
         a) A constrained set of well-defined operations
         b) A constrained set of content types
      Linked Data Principles
      1. Use URIs as names for things
      2. Use HTTP URIs so that people can look up those names
      3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
      4. Include links to other URIs, so that they can discover more things
      Linked Open Service Principles
      1. Describe services as LOD prosumers with input and output descriptions as SPARQL graph patterns
      2. Communicate RDF by RESTful content negotiation
      3. The output should make explicit its relation with the input
  27. LOS Weather Service
  28. LOS Geo Resources
  29. Resource-Based Linked Open Services
      • Linked Data: GET with Accept: text/html returns 303 REDIRECT to /page
      • Linked Data: GET with Accept: application/rdf+xml (or text/n3) returns 303 REDIRECT to /data
      • Linked Service: GET /weather with Accept: application/rdf+xml (or text/n3) returns 200 <rdf:Description>
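The dispatch behaviour on slide 29 can be sketched as a single function from request path and Accept header to status and target. The `/resource/` path prefix (borrowed from the DBpedia layout on slide 6), the 404 and 406 fallbacks, and the exact-match handling of the Accept header are all assumptions made for this illustration.

```python
RDF_TYPES = {"application/rdf+xml", "text/n3"}


def negotiate(path, accept):
    """Return (status, target) for a GET on `path` with one Accept value."""
    if path.endswith("/weather"):
        # Linked Service: answered directly with RDF, no redirect.
        if accept in RDF_TYPES:
            return 200, "<rdf:Description>"
        return 406, None  # assumed fallback, not on the slide
    if path.startswith("/resource/"):
        name = path[len("/resource/"):]
        # Linked Data: redirect to the RDF document or to the HTML page.
        if accept in RDF_TYPES:
            return 303, "/data/" + name
        return 303, "/page/" + name
    return 404, None  # assumed fallback, not on the slide
```

A real server would parse Accept headers with several media types and quality values; the exact-match comparison here only keeps the sketch short.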
  30. Interlinking Data with Data from Services?
  31. Data Services
      • Given input, provide output
      • Input and output are related in a service-specific way
      • Do not change the state of the world
      • Diagram: the service defines the relation between input and output
      • E.g. GeoNames findNearbyWikipedia service
         • Input: lat/lon
         • Output: places
         • Relation: output places that are nearby input place
  32. Linked Data Services
      • We'd like to integrate data services with Linked Data
         1. LIDS need to adhere to Linked Data principles
      • We'd like to use data services in software programs
         2. LIDS need machine-readable descriptions of input and output
      • Compared to naïve approach: assign URI to service output
         • Relationship between input and output is explicitly described
         • Dynamicity is supported
  33. 1. Data Services as Linked Data
      • Input is given as URI:
        http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152#point
         • Service endpoint: http://geowrap.openlids.org/findNearbyWikipedia
         • Parameters: lat=37.416&lng=-122.152
         • Input identifier: #point
      • Resolving the URI yields RDF that relates input and output:
        @prefix dbp: <http://dbpedia.org/resource/> .
        @prefix : <http://geo..Wiki?lat=37.416&lng=-122.152#> .
        :point foaf:based_near dbp:Palo_Alto%2C_California ;
               foaf:based_near dbp:Packard%27s_garage .
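The input URI on slide 33 is made of three parts: the service endpoint, the query parameters, and the fragment identifying the input entity. A short sketch of how such a URI could be assembled; `lids_input_uri` is a hypothetical helper, not part of any LIDS library.

```python
from urllib.parse import urlencode


def lids_input_uri(endpoint, params, local_id):
    """Build endpoint?parameters#identifier for one service input."""
    return f"{endpoint}?{urlencode(params)}#{local_id}"


URI = lids_input_uri(
    "http://geowrap.openlids.org/findNearbyWikipedia",
    {"lat": "37.416", "lng": "-122.152"},
    "point",
)
```

Here `URI` equals the example from the slide, `http://geowrap.openlids.org/findNearbyWikipedia?lat=37.416&lng=-122.152#point`. Because the fragment is not sent to the server, resolving the URI calls the service with the parameters, while `#point` names the input entity inside the returned RDF.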
  34. 2. LIDS Descriptions
      • LIDS characterised by
         • Endpoint URI ep, which is the base for all input entities
         • Local identifier i of input entity
         • List of parameters Xi
         • Basic graph pattern Ti describing conditions on parameters
         • Basic graph pattern To describing minimum output data
      • Example:
        ep = <http://geowrap.openlids.org/findNearbyWikipedia>
        i = point
        Xi = {?lat, ?lng}
        Ti = ?point a Point . ?point geo:lat ?lat . ?point geo:long ?lng
        To = ?point foaf:based_near ?feature
  35. Interlink LIDS and Linked Data
      • Generate service URIs with input bindings, from evaluating: select Xi where Ti
      • sameAs: binding for i
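The interlinking step can be sketched for the findNearbyWikipedia description on slide 34. Instead of a general SPARQL engine, the code hard-wires the pattern Ti (a Point with geo:lat and geo:long), which is a simplification; the function name and the sample entity are hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://geowrap.openlids.org/findNearbyWikipedia"
LOCAL_ID = "point"


def interlink(triples):
    """Evaluate 'select ?lat ?lng where Ti' and emit owl:sameAs links."""
    points = {s for s, p, o in triples if p == "rdf:type" and o == "Point"}
    lat = {s: o for s, p, o in triples if p == "geo:lat"}
    lng = {s: o for s, p, o in triples if p == "geo:long"}
    links = []
    for entity in sorted(points & lat.keys() & lng.keys()):
        query = urlencode({"lat": lat[entity], "lng": lng[entity]})
        # The binding for i (?point) is linked to the generated service URI.
        links.append((entity, "owl:sameAs", f"{ENDPOINT}?{query}#{LOCAL_ID}"))
    return links


# Hypothetical input data.
DATA = [
    ("http://example.org/place", "rdf:type", "Point"),
    ("http://example.org/place", "geo:lat", "37.416"),
    ("http://example.org/place", "geo:long", "-122.152"),
    ("http://example.org/other", "geo:lat", "1.0"),
]
```

Only `http://example.org/place` satisfies every condition in Ti, so `interlink(DATA)` produces one link; `http://example.org/other` lacks a longitude and a type and is skipped.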
  36. Scale-Up Experiment: Link BTC to GeoNames
      • 3 billion triples from the Billion Triple Challenge (BTC) 2010 data set
      • Annotate with LIDS wrapper of GeoNames findNearby service
      • Annotation time: < 12 hours on laptop!
         • ~ 12 hours for uncompressing the data set, cleaning results, and gathering statistics
      • Original BTC data: 74 different domains that linked to GeoNames URIs
      • Interlinking process added 891 new domains, now linked to LIDS geowrap
      • In total 2,448,160 new links were added
  37. Query Answering using LIDS and Linked Data
      • Query execution resolves URIs
         • => enlarges data set
      • LIDS are interlinked
      • Query is executed again on new data set
      • Repeat until no new links or no new data
      • Combine results
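The loop on slide 37 is a fixpoint computation, which can be sketched as follows. The three callables `run_query`, `dereference` and `interlink` are hypothetical parameters standing in for the query engine, the HTTP lookup and the LIDS interlinking step; as a simplification, every subject and object URI is resolved, not only those relevant to the query.

```python
def answer_with_lids(run_query, seed, dereference, interlink):
    """Alternate querying, interlinking and URI resolution until nothing changes."""
    data = set(seed)
    results = set()
    fetched = set()
    while True:
        results |= set(run_query(data))   # execute query, combine results
        added = set(interlink(data))      # sameAs links to LIDS URIs
        for s, _p, o in list(data | added):
            for uri in (s, o):
                if uri.startswith("http") and uri not in fetched:
                    fetched.add(uri)
                    added |= set(dereference(uri))  # enlarges the data set
        if added <= data:                 # no new links and no new data
            return results
        data |= added


# Hypothetical stand-ins for the web, the interlinker and the query.
WEB = {
    "http://example.org/p": {("http://example.org/p", "ex:label", "P")},
    "http://lids.example/q#point": {
        ("http://lids.example/q#point", "foaf:based_near", "http://example.org/near")
    },
}
SEED = {("http://example.org/u", "ex:locatedAt", "http://example.org/p")}


def fake_dereference(uri):
    return WEB.get(uri, set())


def fake_interlink(data):
    return {(s, "owl:sameAs", "http://lids.example/q#point")
            for s, p, _o in data if p == "ex:label"}


def fake_query(data):
    return {o for _s, p, o in data if p == "foaf:based_near"}
```

The loop terminates once a round adds no triple that is not already in the data set, which corresponds to the "no new links or no new data" condition on the slide.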
  38. Experiment: Query Answering
      • Input: List of 562 (potential) universities from Facebook Graph API
      • Output: Facebook fans and DBpedia student numbers for 104 universities
      • Query:
        PREFIX u: <http://openlids.org/universities.rdf#>
        SELECT ?n ?f ?s WHERE {
          u:list foaf:topic ?u .
          ?u foaf:name ?n .
          ?u og:fan_count ?f .
          ?u d:numberOfStudents ?s
        }
  39. Linked Services and PlanetData
      • Several areas seem likely to produce services:
         • Stream, inc. Sensor, resources (latest values)
         • Any others exposing dynamic resources
         • Dynamic computations, inc. on-the-fly quality assessments
      • Other areas seem likely to consider service technologies and move towards more service-like HTTP interactions
         • Access control (OpenID, OAuth, etc.)
      • Finally, remaining areas could serve to complement LIDS/LOS alignment
         • Provenance
