The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, not only when discussing with outsiders about the goodness of Linked Data but also when reviewing papers for the COLD workshop series, I find myself, on many occasions, going back to the principles in order to see whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach an agreement on what we mean by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, in order to facilitate Linked Data consumption.
Why do they call it Linked Data when they want to say...?
1. Why do they call it Linked Data
when they want to say…?
Keynote at
The 6th International Workshop on
Consuming Linked Data (COLD)
12/10/2015
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho
2. License
• This work is licensed under the license
CC BY-NC-SA 4.0 International
• http://purl.org/NET/rdflicense/cc-by-nc-sa4.0
• You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
• Under the following conditions
• Attribution — You must attribute the work by inserting
• “[source Oscar Corcho]” at the footer of each reused slide
• a credits slide stating: “These slides are partially based on
“Why do they call it Linked Data when they want to say…?”
by O. Corcho”
• Non-commercial
• Share-Alike
3. Motivation…
I want to consume Linked Data. What do I use?
• SQUIN
• Linked Data Platform
• Linked Data Fragments
• JSON-LD
• CSV on the Web
• SPARQL endpoints
• …
4. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• …
11. How are all these applications built?
• 3cixty
• How is data stored & published? Centralised SPARQL endpoint; Linked Data (Virtuoso)
• How is data consumed? SPARQL queries (webapp); ad-hoc API (mobile app); Linked Data (not used yet)
• Geomarketing
• How is data stored & published? Centralised SPARQL endpoint; Linked Data (ELDA)
• How is data consumed? Linked Data; ad-hoc API for RDF Data Cube
• Buyer profile at Zaragoza
• How is data stored & published? Oracle DB; Linked Data (ad-hoc software); SOLR; centralised SPARQL endpoint
• How is data consumed? Linked Data; SOLR; SPARQL for complex queries
• ?? ??
12. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• Quiz time: what do we understand by Linked Data?
• …
13. What do papers in COLD tell us about Linked Data?
• KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources
• Leveraging Linked Data to Infer Semantic Relations within Structured Sources
• LOTUS: Linked Open Text UnleaShed
• Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries
• Pattern-Based Linked Data Publication: The Linked Chess Dataset Case
• Policies Composition based on Data Usage Context
• Towards Crawling the Web for Structured Data: Pitfalls of Common Crawl for E-Commerce
• Uniqueness, Density, and Keyness: Exploring Class Hierarchies
Topics
• Makes use of Linked Data principles, including dereferencing
• Involves direct use of multiple, real-world Linked Datasets
14. Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things
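Principles 2 and 3 boil down to HTTP content negotiation: looking up a name with an Accept header that asks for an RDF serialization. A minimal sketch of building such a lookup (the DBpedia URI is only an illustration; nothing here is specific to any tool in this talk):

```python
import urllib.request

def rdf_lookup_request(uri, accept="text/turtle"):
    """Build an HTTP GET for a resource URI, asking for RDF
    via content negotiation (principles 2 and 3)."""
    return urllib.request.Request(uri, headers={"Accept": accept})

req = rdf_lookup_request("http://dbpedia.org/resource/Madrid")
# urllib.request.urlopen(req) would return Turtle, if the server honours it
```

Whether the server actually answers with RDF (and includes links to other URIs, principle 4) is exactly what the quizzes below probe.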
16. Quiz 1. Is this Linked Data?
• They call it API. Do they mean Linked Data?
• http://www.zaragoza.es/docs-api/
17. Quiz 1. A few hints
• Let’s try to run
• curl -X GET --header "Accept: application/x-turtle" "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/callejero/via?rf=html&results_only=false"
• Or a more specific one for one street
• curl -X GET --header "Accept: application/ld+json" "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/callejero/via/20?rf=html"
• Then, what do we think about it?
18. Quiz 2. And what about this?
• http://datos.localidata.com/recurso/comercio/Provincia/Madrid/Municipio/madrid/Local/Distrito/Label/Tetuán
19. Quiz 2. A few more hints
• However, this is giving me access to lots of URIs
• http://datos.localidata.com/recurso/comercio/Provincia/Madrid/Municipio/madrid/Local/11029404L0-PlantaPB-Local214-ID36963
• Which I could then use in order to start applying a link traversal approach with bound subjects (e.g., as in SQUIN)
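The link-traversal idea with bound subjects can be sketched over a toy in-memory graph, where "dereferencing" a URI yields the triples it is the subject of (a stand-in for the HTTP lookups a SQUIN-style engine would actually perform; all URIs below are made up for illustration):

```python
# Toy stand-in for dereferencing: URI -> triples where it is the subject.
GRAPH = {
    "ex:district/Tetuan": [("ex:district/Tetuan", "ex:hasShop", "ex:shop/214")],
    "ex:shop/214": [("ex:shop/214", "ex:name", '"Local 214"')],
}

def traverse(seed_uris):
    """Start from seed URIs, follow object links, collect the triples seen."""
    triples, frontier, seen = [], list(seed_uris), set()
    while frontier:
        uri = frontier.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in GRAPH.get(uri, []):
            triples.append((s, p, o))
            if o in GRAPH:  # only follow objects that dereference to more data
                frontier.append(o)
    return triples

print(len(traverse(["ex:district/Tetuan"])))  # 2 triples reachable from the seed
```

The point of the quiz: an API that hands you lots of dereferenceable URIs gives such a traversal its seeds, even if the API itself is not "pure" Linked Data.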
20. In summary…
• Several approaches for Linked Data exposure that go
beyond “pure Linked Data”
• Combining REST APIs that provide you access to lots of
URIs
• … with pure Linked Data approaches
21. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• Quiz time: what do we understand by Linked Data?
• A summary of current Linked Data consumption
approaches
• …
22. A summary of Linked Data consumption approaches
• Stealing some copyrighted material from the Linked
Data Fragments folks…
• They will surely explain this better than I do ;-)
23. A summary of Linked Data consumption approaches
?
24. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• Quiz time: what do we understand by Linked Data?
• A summary of current Linked Data consumption
approaches
• Yet another approach: AGORA
• Plus some demos (compulsory when talking about Linked
Data)
25. Attention!!
• Ongoing work
• Sneak preview
• No technical paper yet
• We have to sit down and write everything carefully
• Highly driven by our initial use case
• Now in the process of generalising it
26. Our research hypothesis
• Can we go a bit beyond triple pattern fragments while…
…maintaining the good behaviour server-side,
…exploiting Linked Data about subjects, and
…keeping to the Web paradigm?
• Basic graph pattern fragments?
• BGPs-lite, that is, BGPs with some restrictions
…
The Agora (/ˈæɡərə/; Ancient Greek: Ἀγορά Agorá) was a central spot in ancient Greek
city-states. The literal meaning of the word is "gathering place" or "assembly". [Wikipedia]
27. Our assumptions on BGPs
• BGPs composed of triple patterns where…
• Subjects are always variables
• Properties must be URIs
• Objects can be variables, URIs or literals (will only work with equality)
• Easy extensions (not done because of lack of time)
• Allowing URIs as subjects
• Extending properties to property paths
• Adding more types of FILTERs
• Difficult extensions (need to think a bit more about them)
• Properties as variables
PROCESSABLE
• {?x ci:codebase ?y}
• {?s doap:name "jenkins" . ?s scm:hasBranch ?b}
• {?a ci:hasBuild ?b . ?b ci:hasExecution ?c . ?c ci:hasResult ?d}
NOT PROCESSABLE
• {?x ?p "jenkins"}
• {?x ?p ?y}
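Under these assumptions, whether a BGP is processable can be checked mechanically. A rough sketch, representing each triple pattern as a (subject, predicate, object) tuple where variables start with "?" (this representation is ours, for illustration, not AGORA's internal one):

```python
def is_variable(term):
    return term.startswith("?")

def is_processable(bgp):
    """AGORA's BGP restrictions: subjects are variables, predicates are
    URIs; objects may be variables, URIs or literals (matched by equality),
    so they need no check."""
    return all(
        is_variable(s) and not is_variable(p)
        for s, p, _ in bgp
    )

assert is_processable([("?x", "ci:codebase", "?y")])
assert is_processable([("?s", "doap:name", '"jenkins"'),
                       ("?s", "scm:hasBranch", "?b")])
assert not is_processable([("?x", "?p", '"jenkins"')])  # variable predicate
```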
28. A few more assumptions
• RDF data has been created according to some
vocabulary
• Resources are typed (<uri> a <Concept>)
• Vocabularies may be lightweight or heavyweight
• However, we are not exploiting all types of domain and
range restrictions, or inferences, yet
29. Step 1. Provide some vocabularies to use for planning
• Tell AGORA (our fountain) which vocabularies it has to understand
• Note: relevant for the production of query plans
• Post the OWL file to http://localhost:9001/vocabs
• Let’s check the results
• http://localhost:9001/types
• http://localhost:9001/properties
30. Step 2. Provide/get some seed URIs to start query plans
• Tell AGORA’s seed collector which seeds it can take to start the link traversal approach
• Note: those seed URIs need to be connected to all data
• Stored in Redis
• Post every seed URI to http://localhost:9001/seeds
• One may be enough if it provides access to other URIs
• Let’s check the results
• http://localhost:9001/seeds
31. Step 2. Provide/get some seed URIs to start query plans
• Seeds may be obtained from a list of URIs, queries to
SPARQL endpoints, ad-hoc wrappers, etc.
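Steps 1 and 2 can be scripted as plain HTTP POSTs. A hedged sketch, following the localhost URLs shown in the slides (we only build the requests here; sending them assumes a running AGORA instance):

```python
import urllib.request

AGORA = "http://localhost:9001"  # local AGORA instance, as in the slides

def post_vocabulary(owl_bytes):
    """Step 1: post an OWL file so the planner knows the vocabulary."""
    return urllib.request.Request(f"{AGORA}/vocabs", data=owl_bytes,
                                  method="POST")

def post_seed(seed_uri):
    """Step 2: register a seed URI for the link traversal to start from."""
    return urllib.request.Request(f"{AGORA}/seeds",
                                  data=seed_uri.encode(), method="POST")

req = post_seed("http://localhost:9001/ci/")
# urllib.request.urlopen(req) would send it to a running AGORA
```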
32. Step 3. Obtain a query/search plan
• Request a query plan from AGORA’s planner for a given graph pattern
• Let’s check the results
• http://localhost:9001/plan?gp={?a ci:hasBuild ?b}
33. Step 3. Obtain a query/search plan
[] a agora:SearchTree ;
agora:fromType ci:CIHarvester ;
agora:hasSeed <http://localhost:9001/ci/> ;
agora:length 1 ;
agora:next [ agora:byPattern _:tp_0 ;
agora:expectedType ci:CIHarvester ] .
[] a agora:SearchSpace ;
agora:definedBy _:tp_0 .
_:var_a a agora:Variable ;
rdfs:label "?a"^^xsd:string .
_:var_b a agora:Variable ;
rdfs:label "?b"^^xsd:string .
_:tp_0 a agora:TriplePattern ;
agora:object _:var_b ;
agora:predicate ci:hasBuild ;
agora:subject _:var_a .
Let’s check the results
http://localhost:9001/plan?gp={?a ci:hasBuild ?b}
Let’s check this URI
35. Step 3. Obtain a query/search plan
[] a agora:SearchTree ;
agora:fromType ci:CIHarvester ;
agora:hasSeed <http://localhost:9001/ci/> ;
agora:length 52 ;
agora:next [ agora:byPattern _:tp_2 ;
agora:expectedType ci:CIHarvester ;
agora:next [ agora:byPattern _:tp_0 ;
agora:expectedType ci:Build ;
agora:next [ agora:byPattern _:tp_1 ;
agora:expectedType
oslc_auto:AutomationRequest ] ;
agora:onProperty ci:hasExecution ] ;
agora:onProperty ci:hasBuild ] .
[] a agora:SearchSpace ;
agora:definedBy _:tp_0,
_:tp_1,
_:tp_2 .
_:var_a a agora:Variable ;
rdfs:label "?a"^^xsd:string .
_:var_d a agora:Variable ;
rdfs:label "?d"^^xsd:string .
_:tp_0 a agora:TriplePattern ;
agora:object _:var_c ;
agora:predicate ci:hasExecution ;
agora:subject _:var_b .
…..
Let’s check the results of a more complex query
http://localhost:9001/plan?gp={?a ci:hasBuild ?b . ?b ci:hasExecution
?c . ?c ci:hasResult ?d}
36. What is a query/search plan for a BGP?
• Composed of:
• A set of seed URIs
• A set of search paths
• What is a seed URI?
• The subject of one of the triples contained in the Agora
• What is a search path?
• A finite and executable queue of search steps
• Its execution starts by dereferencing the seed URIs, which initializes the set of query-relevant triples
[Diagram: a search path starts at <SEED_URI> and follows property 1 … property N through intermediate resources until it reaches <CAND_URI>]
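Executing one search path therefore amounts to dereferencing the seeds and following the plan's property chain, one hop per search step. A toy sketch over an in-memory graph (the URIs mimic the CI example but are illustrative, and the in-memory dict stands in for HTTP dereferencing):

```python
# Illustrative toy data: URI -> {property: [objects]}
DATA = {
    "ex:harvester": {"ci:hasBuild": ["ex:build1"]},
    "ex:build1": {"ci:hasExecution": ["ex:exec1"]},
    "ex:exec1": {"ci:hasResult": ["ex:passed"]},
}

def run_path(seeds, properties):
    """Follow a chain of properties from the seed URIs, collecting the
    triples seen along the way (the query-relevant triples)."""
    triples, frontier = [], list(seeds)
    for prop in properties:
        next_frontier = []
        for uri in frontier:
            for obj in DATA.get(uri, {}).get(prop, []):
                triples.append((uri, prop, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return triples

path = ["ci:hasBuild", "ci:hasExecution", "ci:hasResult"]
print(run_path(["ex:harvester"], path))  # one triple per hop of the path
```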
37. Step 4. Evaluate the query plan by dereferencing
• Let’s check the results
• http://localhost:9001/fragment?gp={?a ci:hasBuild ?b}
39. Let’s now do a demo with DBpedia
• Yeah, all this was working in a controlled environment. What about DBpedia?
• Obviously, DBpedia understood from a pure Linked Data perspective.
• We will open a brand new AGORA and tell it to understand the movies domain
40. A few operations to be done
• First of all, load the vocabulary in AGORA and provide a few seeds
• Through a SPARQL query to DBpedia, but it could be a list of URIs
• Then, we can start inspecting
• http://localhost:9000/graph/
• http://localhost:9000/types
• http://localhost:9000/properties
• Let’s start querying
• First let’s see a plan:
• http://localhost:9000/plan?gp={?f%20dbpedia-owl:starring%20?a}
• http://localhost:9000/plan/view?gp={?f%20dbpedia-owl:starring%20?a}
• And then execute the query
41. A few other queries
• Get all relations between the films and the actors who star in them
• http://localhost:9000/fragment?gp={?f dbpedia-owl:starring ?a}
• Same as the previous query, but also getting the names of these actors
• http://localhost:9000/fragment?gp={?f dbpedia-owl:starring ?a. ?a dbp:birthName ?n}
• Get all films, their distributors and the known locations of each of them
• http://localhost:9000/fragment?gp={?f dbpedia-owl:distributor ?d. ?d dbpedia-owl:location ?l}
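The graph patterns in these URLs are shown unencoded for readability; before sending them over HTTP they need percent-encoding. A small helper, assuming the /fragment endpoint from the demo:

```python
from urllib.parse import quote

def fragment_url(base, gp):
    """Percent-encode a graph pattern for AGORA's /fragment endpoint."""
    return f"{base}/fragment?gp={quote(gp)}"

print(fragment_url("http://localhost:9000", "{?f dbpedia-owl:starring ?a}"))
# http://localhost:9000/fragment?gp=%7B%3Ff%20dbpedia-owl%3Astarring%20%3Fa%7D
```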
42. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• Quiz time: what do we understand by Linked Data?
• A summary of current Linked Data consumption
approaches
• Yet another approach: AGORA
• Plus some demos (compulsory when talking about Linked
Data)
• Where do we go next?
43. What’s next for AGORA?
• An additional bit of engineering
• Extending to other parts of SPARQL
• Exploiting caching even more
• Pagination
• Building the vocabularies automatically for all those cases where
there is no vocabulary (using LOUPE)
• etc.
• (basically, all those things already very well done by LDF)
• SPARQL Updates
• Some Linked Data Platform (ldp4j) technology behind the scenes
• Sitting down to write everything carefully
• The whole framework
• The query planning algorithm
• Evaluations and comparisons with other approaches
• Is this approach really worth it?
44. What have we been talking about?
WAIT FOR OUR
PAPER TO BE
PUBLISHED
45. And now the main conclusions
• Consumption of Linked Data is normally associated with SPARQL querying over some dataset of the LOD cloud
• My feeling after having read many papers that talk about Linked Data consumption
• Nothing against that (look at the original examples that I gave earlier), but we have to understand, as a community, whether there are any challenges that pure Linked Data approaches handle better
• Why does everybody talk about REST APIs while we don’t?
• So, more work is needed on…
• Approaches that exploit the features of “pure Linked Data” (e.g., SQUIN and link traversal querying)
• Approaches that exploit the Web dimension infrastructure (e.g., Linked Data Fragments)
46. Conclusions (II)
• We should continue exploring this space
• But probably these dimensions are not enough
• And many open challenges still
• Federated query processing techniques (adaptive)
AGORA
48. And this is what you should remember from the talk
Source: "Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
49. Why do they call it Linked Data
when they want to say…?
Acknowledgements to the SDH team at the Center for
Open Middleware:
Fernando Serena, Carlos Blanco, Alejandro Fernández,
Alejandro Vera, Miguel Esteban, Andrés García, Javier
Soriano, Asunción Gómez
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho