Why do they call it Linked Data when they want to say...?

The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, both when discussing the merits of Linked Data with outsiders and when reviewing papers for the COLD workshop series, I often find myself going back to the principles to check whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach agreement on what we understand by Linked Data. Furthermore, we will take the opportunity to describe yet another approach to facilitate Linked Data consumption, which we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid.

Published in: Technology

  1. 1. Why do they call it Linked Data when they want to say…? Keynote at The 6th International Workshop on Consuming Linked Data (COLD) 12/10/2015 Oscar Corcho ocorcho@fi.upm.es @ocorcho https://www.slideshare.com/ocorcho
  2. 2. License • This work is licensed under the license CC BY-NC-SA 4.0 International • http://purl.org/NET/rdflicense/cc-by-nc-sa4.0 • You are free: • to Share — to copy, distribute and transmit the work • to Remix — to adapt the work • Under the following conditions • Attribution — You must attribute the work by inserting • “[source Oscar Corcho]” at the footer of each reused slide • a credits slide stating: “These slides are partially based on “Why do they call it Linked Data when they want to say…?” by O. Corcho” • Non-commercial • Share-Alike
  3. 3. Motivation… I want to consume Linked Data. What do I use? • SQUIN • Linked Data Platform • Linked Data Fragments • JSON-LD • CSV on the Web • SPARQL endpoints • …
  4. 4. Outline of the talk • Where do we start from? • A few examples of applications that we have built by consuming RDF • …
  5. 5. Application 1. 3cixty http://www.3cixty.com/
  6. 6. 3cixty. Planning our visit to a city
  7. 7. 3cixty. Exploiting the wishlist while in the city Check it at the poster and demo session, for the Semantic Web Challenge
  8. 8. Application 2. Geomarketing
  9. 9. Application 3. Buyer profile at Zaragoza http://www.zaragoza.es/ciudad/gestionmunicipal/contratos/
  10. 10. Application 4. Smart Developer Hub http://www.smartdeveloperhub.org/
  11. 11. How are all these applications built? (Application: How is data stored & published? / How is data consumed?) • 3cixty: Centralised SPARQL endpoint, Linked Data (Virtuoso) / SPARQL queries (webapp), Ad-hoc API (mobile app), Linked Data (not used yet) • Geomarketing: Centralised SPARQL endpoint, Linked Data (ELDA) / Linked Data, Ad-hoc API for RDF Data Cube • Buyer profile at Zaragoza: Oracle DB, Linked Data (ad-hoc software), SOLR, Centralised SPARQL endpoint / Linked Data, SOLR, SPARQL for complex queries • ?? / ??
  12. 12. Outline of the talk • Where do we start from? • A few examples of applications that we have built by consuming RDF • Quiz time: what do we understand by Linked Data? • …
  13. 13. What do papers in COLD tell us about Linked Data? • KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources • Leveraging Linked Data to Infer Semantic Relations within Structured Sources • LOTUS: Linked Open Text UnleaShed • Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries • Pattern-Based Linked Data Publication: The Linked Chess Dataset Case • Policies Composition based on Data Usage Context • Towards Crawling the Web for Structured Data: Pitfalls of Common Crawl for E-Commerce • Uniqueness, Density, and Keyness: Exploring Class Hierarchies • Topics • Makes use of Linked Data principles, including dereferencing • Involves direct use of multiple, real-world Linked Datasets
  14. 14. Linked Data principles 1. Use URIs as names for things 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using standards (RDF, SPARQL) 4. Include links to other URIs, so that they can discover more things
  15. 15. Quiz time What is Linked Data for you?
  16. 16. Quiz 1. Is this Linked Data? • They call it API. Do they mean Linked Data? • http://www.zaragoza.es/docs-api/
  17. 17. Quiz 1. A few hints • Let’s try to run • curl -X GET --header "Accept: application/x-turtle" "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/callejero/via?rf=html&results_only=false" • Or a more specific one for one street • curl -X GET --header "Accept: application/ld+json" "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/callejero/via/20?rf=html" • Then, what do we think about it?
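The two curl calls on this slide are plain HTTP content negotiation: the same URI is requested with different `Accept` headers. A minimal Python sketch of the same idea follows. The function names `build_negotiated_request` and `fetch` are made up for illustration, and whether the Zaragoza endpoint still answers as it did during the talk is not something this sketch assumes.

```python
from urllib.request import Request, urlopen

# URL copied from the slide (single-street example).
STREET_URL = (
    "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/"
    "callejero/via/20?rf=html"
)


def build_negotiated_request(url, media_type):
    """Build (but do not send) a GET request asking for one serialisation."""
    return Request(url, headers={"Accept": media_type}, method="GET")


def fetch(url, media_type, timeout=10.0):
    """Send the request; return the Content-Type header and the raw body."""
    request = build_negotiated_request(url, media_type)
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()


if __name__ == "__main__":
    # Same URI, two serialisations, as in the two curl commands.
    for media_type in ("application/x-turtle", "application/ld+json"):
        request = build_negotiated_request(STREET_URL, media_type)
        print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `fetch(STREET_URL, "application/ld+json")` would perform the actual lookup; comparing the returned content type with the requested one is a quick way to see whether the server honours the negotiation.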
  18. 18. Quiz 2. And what about this? • http://datos.localidata.com/recurso/comercio/Provincia/Madrid/Municipio/madrid/Local/Distrito/Label/Tetuán
  19. 19. Quiz 2. A few more hints • However, this is giving me access to lots of URIs • http://datos.localidata.com/recurso/comercio/Provincia/Madrid/Municipio/madrid/Local/11029404L0-PlantaPB-Local214-ID36963 • Which I could then use in order to start applying a link traversal approach with bound subjects (e.g., as in SQUIN)
  20. 20. In summary… • Several approaches for Linked Data exposure that go beyond “pure Linked Data” • Combining REST APIs that provide you access to lots of URIs • … with pure Linked Data approaches
  21. 21. Outline of the talk • Where do we start from? • A few examples of applications that we have built by consuming RDF • Quiz time: what do we understand by Linked Data? • A summary of current Linked Data consumption approaches • …
  22. 22. A summary of Linked Data consumption approaches • Stealing some copyrighted material from the Linked Data Fragments folks… • They will surely be better than me explaining this ;-)
  23. 23. A summary of Linked Data consumption approaches ?
  24. 24. Outline of the talk • Where do we start from? • A few examples of applications that we have built by consuming RDF • Quiz time: what do we understand by Linked Data? • A summary of current Linked Data consumption approaches • Yet another approach: AGORA • Plus some demos (compulsory when talking about Linked Data)
  25. 25. Attention!! • Ongoing work • Sneak-preview • No technical paper yet • We have to sit down and write everything carefully • Highly driven by our initial use case • Now in the process of generalising it
  26. 26. Our research hypothesis • Can we go a bit beyond triple pattern fragments while… …maintaining the good behaviour server-side, …exploiting Linked Data about subjects, and …keeping to the Web paradigm? Basic graph pattern fragments? BGPs-lite, that is, BGPs with some restrictions … The Agora (/ˈæɡərə/; Ancient Greek: Ἀγορά Agorá) was a central spot in ancient Greek city-states. The literal meaning of the word is "gathering place" or "assembly". [Wikipedia]
  27. 27. Our assumptions on BGPs • BGPs composed of triple patterns with… • Subjects are always variables • Properties must be URIs • Objects can be variables, URIs or literals (will only work with equality) • Easy extensions (not done because of lack of time) • Allowing URIs as subjects • Extending properties to property paths • Adding more types of FILTERS • Difficult extensions (need to think a bit more about them) • Properties as variables PROCESSABLE: • {?x ci:codebase ?y} • {?s doap:name "jenkins" . ?s scm:hasBranch ?b} • {?a ci:hasBuild ?b . ?b ci:hasExecution ?c . ?c ci:hasResult ?d} NOT PROCESSABLE: • {?x ?p "jenkins"} • {?x ?p ?y}
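The restrictions on this slide can be read as a simple check over the triple patterns of a BGP. The sketch below is only an illustration of those stated rules, not AGORA's code: the tuple representation of triple patterns and the names `is_variable`, `is_literal` and `is_processable` are assumptions made for the example.

```python
from typing import List, Tuple

# A triple pattern as (subject, predicate, object) strings, e.g. ("?a", "ci:hasBuild", "?b").
TriplePattern = Tuple[str, str, str]


def is_variable(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def is_literal(term: str) -> bool:
    """Treat double-quoted terms as literals (a simplification for this sketch)."""
    return term.startswith('"')


def is_processable(bgp: List[TriplePattern]) -> bool:
    """Apply the slide's restrictions to every triple pattern of the BGP.

    - subjects are always variables
    - properties must be URIs (so neither variables nor literals)
    - objects may be variables, URIs or literals, so they are not checked
    """
    if not bgp:
        return False
    for subject, predicate, _obj in bgp:
        if not is_variable(subject):
            return False
        if is_variable(predicate) or is_literal(predicate):
            return False
    return True


if __name__ == "__main__":
    examples = [
        [("?x", "ci:codebase", "?y")],
        [("?s", "doap:name", '"jenkins"'), ("?s", "scm:hasBranch", "?b")],
        [("?x", "?p", '"jenkins"')],
        [("?x", "?p", "?y")],
    ]
    for bgp in examples:
        print(bgp, "->", "processable" if is_processable(bgp) else "not processable")
```

The "easy extension" of allowing URIs as subjects would amount to dropping the first check; allowing properties as variables removes the second one, which is the case the slide lists as difficult.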
  28. 28. A few more assumptions • RDF data has been created according to some vocabulary • Resources are typed (<uri> a <Concept>) • Vocabularies may be lightweight or heavyweight • However, we are not exploiting all types of domain and range restrictions, or inferences, yet
  29. 29. Step 1. Provide some vocabularies to use for planning • Tell AGORA (our fountain) which vocabularies it has to understand • Note: relevant for the production of query plans • POST the OWL file to http://localhost:9001/vocabs • Let’s check the results • http://localhost:9001/types • http://localhost:9001/properties
  30. 30. Step 2. Provide/get some seed URIs to start query plans • Tell AGORA’s seed collector which seeds it can use to start the link traversal approach • Note: those seed URIs need to be connected to all data • Stored in redis • POST every seed URI to http://localhost:9001/seeds • One may be enough if it provides access to other URIs • Let’s check the results • http://localhost:9001/seeds
  31. 31. Step 2. Provide/get some seed URIs to start query plans • Seeds may be obtained from a list of URIs, queries to SPARQL endpoints, ad-hoc wrappers, etc.
  32. 32. Step 3. Obtain a query/search plan • Request a query plan from AGORA’s planner, for a given graph pattern • Let’s check the results • http://localhost:9001/plan?gp={?a ci:hasBuild ?b}
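The plan request on this slide passes the graph pattern as the `gp` query parameter. Since the pattern contains braces, spaces and question marks, a client would normally percent-encode it before sending. The sketch below only builds such URLs; the helper name `agora_url` is hypothetical, the `plan` and `fragment` paths and the port are the ones shown on the slides, and it is an assumption that the server accepts a fully percent-encoded pattern (the DBpedia demo slide later uses `%20` for spaces).

```python
from urllib.parse import quote, unquote


def agora_url(base: str, endpoint: str, graph_pattern: str) -> str:
    """Build a URL such as <base>/plan?gp=<encoded pattern>.

    Every reserved character in the pattern is percent-encoded (safe="").
    """
    return f"{base.rstrip('/')}/{endpoint}?gp={quote(graph_pattern, safe='')}"


if __name__ == "__main__":
    base = "http://localhost:9001"  # host and port as shown on the slides
    pattern = "{?a ci:hasBuild ?b}"
    plan = agora_url(base, "plan", pattern)
    fragment = agora_url(base, "fragment", pattern)
    print(plan)
    print(fragment)
    # Decoding the parameter gives back the original pattern.
    print(unquote(plan.split("gp=", 1)[1]))
```

Keeping the pattern in one place and deriving both the plan and the fragment URL from it makes it easy to inspect a plan first and then evaluate exactly the same pattern.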
  33. 33. Step 3. Obtain a query/search plan [] a agora:SearchTree ; agora:fromType ci:CIHarvester ; agora:hasSeed <http://localhost:9001/ci/> ; agora:length 1 ; agora:next [ agora:byPattern _:tp_0 ; agora:expectedType ci:CIHarvester ] . [] a agora:SearchSpace ; agora:definedBy _:tp_0 . _:var_a a agora:Variable ; rdfs:label "?a"^^xsd:string . _:var_b a agora:Variable ; rdfs:label "?b"^^xsd:string . _:tp_0 a agora:TriplePattern ; agora:object _:var_b ; agora:predicate ci:hasBuild ; agora:subject _:var_a . Let’s check the results http://localhost:9001/plan?gp={?a ci:hasBuild ?b} Let’s check this URI
  34. 34. Looking up that URI
  35. 35. Step 3. Obtain a query/search plan [] a agora:SearchTree ; agora:fromType ci:CIHarvester ; agora:hasSeed <http://localhost:9001/ci/> ; agora:length 52 ; agora:next [ agora:byPattern _:tp_2 ; agora:expectedType ci:CIHarvester ; agora:next [ agora:byPattern _:tp_0 ; agora:expectedType ci:Build ; agora:next [ agora:byPattern _:tp_1 ; agora:expectedType oslc_auto:AutomationRequest ] ; agora:onProperty ci:hasExecution ] ; agora:onProperty ci:hasBuild ] . [] a agora:SearchSpace ; agora:definedBy _:tp_0, _:tp_1, _:tp_2 . _:var_a a agora:Variable ; rdfs:label "?a"^^xsd:string . _:var_d a agora:Variable ; rdfs:label "?d"^^xsd:string . _:tp_0 a agora:TriplePattern ; agora:object _:var_c ; agora:predicate ci:hasExecution ; agora:subject _:var_b . ….. Let’s check the results of a more complex query http://localhost:9001/plan?gp={?a ci:hasBuild ?b . ?b ci:hasExecution ?c . ?c ci:hasResult ?d}
  36. 36. What is a query/search plan for a BGP? • Composed of: • A set of seed URIs • A set of search paths • What is a seed URI? • The subject of one of the triples contained in the Agora • What is a search path? • A finite and executable queue of search steps • Its execution starts by dereferencing the seed URIs, which initializes the set of query-relevant triples <SEED_URI> <?> <?> <CAND_URI> property 1 ... property N
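To make the notion of a search path more concrete, here is a toy sketch of executing one path: a queue of properties is consumed step by step, starting from the seed URIs, and every matching triple found by dereferencing is added to the set of query-relevant triples. This is an illustration under strong assumptions, not the AGORA planning or evaluation algorithm (which the talk says is not yet written up). The `WEB` dictionary stands in for real HTTP dereferencing, and the `ex:` resources are invented; only the `ci:` property names come from the slides.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples obtained when dereferencing it.
WEB = {
    "ex:harvester": [
        ("ex:harvester", "ci:hasBuild", "ex:build1"),
        ("ex:harvester", "ci:hasBuild", "ex:build2"),
    ],
    "ex:build1": [("ex:build1", "ci:hasExecution", "ex:exec1")],
    "ex:build2": [],
    "ex:exec1": [("ex:exec1", "ci:hasResult", "ex:ok")],
}


def dereference(uri):
    """Return the triples published at a URI (empty if nothing is known)."""
    return WEB.get(uri, [])


def follow_search_path(seeds, properties):
    """Execute one search path: a finite queue of properties to follow."""
    steps = deque(properties)
    frontier = list(seeds)   # URIs to dereference in the current step
    relevant = []            # query-relevant triples collected so far
    while steps and frontier:
        prop = steps.popleft()
        next_frontier = []
        for uri in frontier:
            for s, p, o in dereference(uri):
                if p == prop:
                    relevant.append((s, p, o))
                    next_frontier.append(o)  # candidate URI for the next step
        frontier = next_frontier
    return relevant


if __name__ == "__main__":
    path = ["ci:hasBuild", "ci:hasExecution", "ci:hasResult"]
    for triple in follow_search_path(["ex:harvester"], path):
        print(triple)
```

The sketch ignores expected types, caching and literals as objects; it only shows why the seeds must be connected to the rest of the data, since anything unreachable from them is never dereferenced.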
  37. 37. Step 4. Evaluate the query plan by dereferencing • Let’s check the results • http://localhost:9001/fragment?gp={?a ci:hasBuild ?b}
  38. 38. Let’s now do a demo with DBpedia • Yeah, all this was working in a controlled environment. What about DBpedia? • Obviously, DBpedia understood from a pure Linked Data perspective. • We will open a brand new AGORA and will tell it to understand about movies
  39. 39. A few operations to be done • First of all, load the vocabulary in AGORA and provide a few seeds • Through a SPARQL query to DBpedia, but could be a list of URIs • Then, we can start inspecting • http://localhost:9000/graph/ • http://localhost:9000/types • http://localhost:9000/properties • Let’s start querying • First let’s see a plan: • http://localhost:9000/plan?gp={?f%20dbpedia-owl:starring%20?a} • http://localhost:9000/plan/view?gp={?f%20dbpedia-owl:starring%20?a} • And then execute the query
  40. 40. A few other queries • Get all relations between the films and the actors who star in them • http://localhost:9000/fragment?gp={?f dbpedia-owl:starring ?a} • Same as previous query, but also getting the name of these actors • http://localhost:9000/fragment?gp={?f dbpedia-owl:starring ?a. ?a dbp:birthName ?n} • Get all films, their distributors and known locations of each of them • http://localhost:9000/fragment?gp={?f dbpedia-owl:distributor ?d. ?d dbpedia-owl:location ?l}
  41. 41. Outline of the talk • Where do we start from? • A few examples of applications that we have built by consuming RDF • Quiz time: what do we understand by Linked Data? • A summary of current Linked Data consumption approaches • Yet another approach: AGORA • Plus some demos (compulsory when talking about Linked Data) • Where do we go next?
  42. 42. What’s next for AGORA? • An additional bit of engineering • Extending to other parts of SPARQL • Exploiting caching even more • Pagination • Building the vocabularies automatically for all those cases where there is no vocabulary (using LOUPE) • etc. • (basically, all those things already very well done by LDF) • SPARQL Updates • Some Linked Data Platform (ldp4j) technology behind the scenes • Sitting down to write everything carefully • The whole framework • The query planning algorithm • Evaluations and comparisons with other approaches • Is this approach really worth it?
  43. 43. What have we been talking about? WAIT FOR OUR PAPER TO BE PUBLISHED
  44. 44. And now the main conclusions • Consumption of Linked Data is normally associated with SPARQL querying over some dataset of the LOD cloud • My feeling after having read many papers that talk about Linked Data consumption • Nothing against that (look at the original examples that I gave earlier), but we have to understand, as a community, whether there are any challenges that pure Linked Data approaches allow us to address better • Why does everybody else talk about REST APIs while we don’t? • So, more work needed on… • Approaches that exploit the features of “pure Linked Data” (e.g., SQUIN and link traversal querying) • Approaches that exploit the Web dimension infrastructure (e.g. Linked Data Fragments)
  45. 45. Conclusions (II) • We should continue exploring this space • But probably these dimensions are not enough • And many open challenges still • Federated query processing techniques (adaptive) AGORA
  46. 46. And the last (bonus) slide…
  47. 47. And this is what you should remember from the talk Source: "Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
  48. 48. Why do they call it Linked Data when they want to say…? Acknowledgements to the SDH team at the Center for Open Middleware: Fernando Serena, Carlos Blanco, Alejandro Fernández, Alejandro Vera, Miguel Esteban, Andrés García, Javier Soriano, Asunción Gómez Oscar Corcho ocorcho@fi.upm.es @ocorcho https://www.slideshare.com/ocorcho
