The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, not only when discussing with outsiders about the goodness of Linked Data but also when reviewing papers for the COLD workshop series, I find myself, on many occasions, going back to the principles in order to see whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to reach an agreement on what we mean by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, in order to facilitate Linked Data consumption.
Why do they call it Linked Data when they want to say...?
1. Why do they call it Linked Data
when they want to say…?
Keynote at
The 6th International Workshop on
Consuming Linked Data (COLD)
12/10/2015
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho
2. License
• This work is licensed under the license
CC BY-NC-SA 4.0 International
• http://purl.org/NET/rdflicense/cc-by-nc-sa4.0
• You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
• Under the following conditions
• Attribution — You must attribute the work by inserting
• “[source Oscar Corcho]” at the footer of each reused slide
• a credits slide stating: “These slides are partially based on
“Why do they call it Linked Data when they want to say…?”
by O. Corcho”
• Non-commercial
• Share-Alike
3. Motivation…
I want to consume Linked Data. What do I use?
• SQUIN
• Linked Data Platform
• Linked Data Fragments
• JSON-LD
• CSV on the Web
• SPARQL endpoints
• …
4. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• …
11. How are all these applications built?
• 3cixty
• How is data stored & published? Centralised SPARQL endpoint; Linked Data (Virtuoso)
• How is data consumed? SPARQL queries (webapp); ad-hoc API (mobile app); Linked Data (not used yet)
• Geomarketing
• How is data stored & published? Centralised SPARQL endpoint; Linked Data (ELDA)
• How is data consumed? Linked Data; ad-hoc API for RDF Data Cube
• Buyer profile at Zaragoza
• How is data stored & published? Oracle DB; Linked Data (ad-hoc software); SOLR; centralised SPARQL endpoint
• How is data consumed? Linked Data; SOLR; SPARQL for complex queries
• ?? ??
12. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• Quiz time: what do we understand by Linked Data?
• …
13. What do papers in COLD tell us about Linked Data?
• KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources
• Leveraging Linked Data to Infer Semantic Relations within Structured Sources
• LOTUS: Linked Open Text UnleaShed
• Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries
• Pattern-Based Linked Data Publication: The Linked Chess Dataset Case
• Policies Composition based on Data Usage Context
• Towards Crawling the Web for Structured Data: Pitfalls of Common Crawl for E-Commerce
• Uniqueness, Density, and Keyness: Exploring Class Hierarchies
Topics
• Makes use of Linked Data principles, including dereferencing
• Involves direct use of multiple, real-world Linked Datasets
14. Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things
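Principles 2 and 3 boil down to HTTP content negotiation: looking up a name with an Accept header that asks for an RDF serialization. A minimal sketch of building such a lookup (the DBpedia URI is only an illustration; nothing here is specific to any tool in this talk):

```python
import urllib.request

def rdf_lookup_request(uri, accept="text/turtle"):
    """Build an HTTP GET for a resource URI, asking for RDF
    via content negotiation (principles 2 and 3)."""
    return urllib.request.Request(uri, headers={"Accept": accept})

req = rdf_lookup_request("http://dbpedia.org/resource/Madrid")
# urllib.request.urlopen(req) would return Turtle, if the server honours it
```

Whether the server actually answers with RDF (and includes links to other URIs, principle 4) is exactly what the quizzes below probe.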
16. Quiz 1. Is this Linked Data?
• They call it API. Do they mean Linked Data?
• http://www.zaragoza.es/docs-api/
17. Quiz 1. A few hints
• Let’s try to run
• curl -X GET --header "Accept: application/x-turtle" "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/callejero/via?rf=html&results_only=false"
• Or a more specific one for one street
• curl -X GET --header "Accept: application/ld+json" "http://www.zaragoza.es/api/recurso/urbanismo-infraestructuras/callejero/via/20?rf=html"
• Then, what do we think about it?
18. Quiz 2. And what about this?
• http://datos.localidata.com/recurso/comercio/Provincia/Madrid/Municipio/madrid/Local/Distrito/Label/Tetuán
19. Quiz 2. A few more hints
• However, this is giving me access to lots of URIs
• http://datos.localidata.com/recurso/comercio/Provincia/Madrid/Municipio/madrid/Local/11029404L0-PlantaPB-Local214-ID36963
• Which I could then use in order to start applying a link traversal approach with bound subjects (e.g., as in SQUIN)
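The link-traversal idea with bound subjects can be sketched over a toy in-memory graph, where "dereferencing" a URI yields the triples it is the subject of (a stand-in for the HTTP lookups a SQUIN-style engine would actually perform; all URIs below are made up for illustration):

```python
# Toy stand-in for dereferencing: URI -> triples where it is the subject.
GRAPH = {
    "ex:district/Tetuan": [("ex:district/Tetuan", "ex:hasShop", "ex:shop/214")],
    "ex:shop/214": [("ex:shop/214", "ex:name", '"Local 214"')],
}

def traverse(seed_uris):
    """Start from seed URIs, follow object links, collect the triples seen."""
    triples, frontier, seen = [], list(seed_uris), set()
    while frontier:
        uri = frontier.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in GRAPH.get(uri, []):
            triples.append((s, p, o))
            if o in GRAPH:  # only follow objects that dereference to more data
                frontier.append(o)
    return triples

print(len(traverse(["ex:district/Tetuan"])))  # 2 triples reachable from the seed
```

The point of the quiz: an API that hands you lots of dereferenceable URIs gives such a traversal its seeds, even if the API itself is not "pure" Linked Data.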
20. In summary…
• Several approaches for Linked Data exposure that go
beyond “pure Linked Data”
• Combining REST APIs that provide you access to lots of
URIs
• … with pure Linked Data approaches
21. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• Quiz time: what do we understand by Linked Data?
• A summary of current Linked Data consumption
approaches
• …
22. A summary of Linked Data consumption approaches
• Stealing some copyrighted material from the Linked
Data Fragments folks…
• They will surely explain this better than I do ;-)
23. A summary of Linked Data consumption approaches
?
24. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• Quiz time: what do we understand by Linked Data?
• A summary of current Linked Data consumption
approaches
• Yet another approach: AGORA
• Plus some demos (compulsory when talking about Linked
Data)
25. Attention!!
• Ongoing work
• Sneak preview
• No technical paper yet
• We have to sit down and write everything carefully
• Highly driven by our initial use case
• Now in the process of generalising it
26. Our research hypothesis
• Can we go a bit beyond triple pattern fragments while…
…maintaining the good behaviour server-side,
…exploiting Linked Data about subjects, and
…keeping to the Web paradigm?
• Basic graph pattern fragments?
• BGPs-lite, that is, BGPs with some restrictions
…
The Agora (/ˈæɡərə/; Ancient Greek: Ἀγορά Agorá) was a central spot in ancient Greek
city-states. The literal meaning of the word is "gathering place" or "assembly". [Wikipedia]
27. Our assumptions on BGPs
• BGPs composed of triple patterns where…
• Subjects are always variables
• Properties must be URIs
• Objects can be variables, URIs or literals (will only work with equality)
• Easy extensions (not done because of lack of time)
• Allowing URIs as subjects
• Extending properties to property paths
• Adding more types of FILTERs
• Difficult extensions (need to think a bit more about them)
• Properties as variables
PROCESSABLE
• {?x ci:codebase ?y}
• {?s doap:name "jenkins" . ?s scm:hasBranch ?b}
• {?a ci:hasBuild ?b . ?b ci:hasExecution ?c . ?c ci:hasResult ?d}
NOT PROCESSABLE
• {?x ?p "jenkins"}
• {?x ?p ?y}
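Under these assumptions, whether a BGP is processable can be checked mechanically. A rough sketch, representing each triple pattern as a (subject, predicate, object) tuple where variables start with "?" (this representation is ours, for illustration, not AGORA's internal one):

```python
def is_variable(term):
    return term.startswith("?")

def is_processable(bgp):
    """AGORA's BGP restrictions: subjects are variables, predicates are
    URIs; objects may be variables, URIs or literals (matched by equality),
    so they need no check."""
    return all(
        is_variable(s) and not is_variable(p)
        for s, p, _ in bgp
    )

assert is_processable([("?x", "ci:codebase", "?y")])
assert is_processable([("?s", "doap:name", '"jenkins"'),
                       ("?s", "scm:hasBranch", "?b")])
assert not is_processable([("?x", "?p", '"jenkins"')])  # variable predicate
```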
28. A few more assumptions
• RDF data has been created according to some
vocabulary
• Resources are typed (<uri> a <Concept>)
• Vocabularies may be lightweight or heavyweight
• However, we are not exploiting all types of domain and
range restrictions, or inferences, yet
29. Step 1. Provide some vocabularies to use for planning
• Tell AGORA (our fountain) which vocabularies it has to understand
• Note: relevant for the production of query plans
• Post the OWL file to http://localhost:9001/vocabs
• Let’s check the results
• http://localhost:9001/types
• http://localhost:9001/properties
30. Step 2. Provide/get some seed URIs to start query plans
• Tell AGORA’s seed collector which seeds it can take to start the link traversal approach
• Note: those seed URIs need to be connected to all data
• Stored in Redis
• Post every seed URI to http://localhost:9001/seeds
• One may be enough if it provides access to other URIs
• Let’s check the results
• http://localhost:9001/seeds
31. Step 2. Provide/get some seed URIs to start query plans
• Seeds may be obtained from a list of URIs, queries to
SPARQL endpoints, ad-hoc wrappers, etc.
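Steps 1 and 2 can be scripted as plain HTTP POSTs. A hedged sketch, following the localhost URLs shown in the slides (we only build the requests here; sending them assumes a running AGORA instance):

```python
import urllib.request

AGORA = "http://localhost:9001"  # local AGORA instance, as in the slides

def post_vocabulary(owl_bytes):
    """Step 1: post an OWL file so the planner knows the vocabulary."""
    return urllib.request.Request(f"{AGORA}/vocabs", data=owl_bytes,
                                  method="POST")

def post_seed(seed_uri):
    """Step 2: register a seed URI for the link traversal to start from."""
    return urllib.request.Request(f"{AGORA}/seeds",
                                  data=seed_uri.encode(), method="POST")

req = post_seed("http://localhost:9001/ci/")
# urllib.request.urlopen(req) would send it to a running AGORA
```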
32. Step 3. Obtain a query/search plan
• Request a query plan from AGORA’s planner for a given graph pattern
• Let’s check the results
• http://localhost:9001/plan?gp={?a ci:hasBuild ?b}
33. Step 3. Obtain a query/search plan
[] a agora:SearchTree ;
agora:fromType ci:CIHarvester ;
agora:hasSeed <http://localhost:9001/ci/> ;
agora:length 1 ;
agora:next [ agora:byPattern _:tp_0 ;
agora:expectedType ci:CIHarvester ] .
[] a agora:SearchSpace ;
agora:definedBy _:tp_0 .
_:var_a a agora:Variable ;
rdfs:label "?a"^^xsd:string .
_:var_b a agora:Variable ;
rdfs:label "?b"^^xsd:string .
_:tp_0 a agora:TriplePattern ;
agora:object _:var_b ;
agora:predicate ci:hasBuild ;
agora:subject _:var_a .
Let’s check the results
http://localhost:9001/plan?gp={?a ci:hasBuild ?b}
Let’s check this URI
35. Step 3. Obtain a query/search plan
[] a agora:SearchTree ;
agora:fromType ci:CIHarvester ;
agora:hasSeed <http://localhost:9001/ci/> ;
agora:length 52 ;
agora:next [ agora:byPattern _:tp_2 ;
agora:expectedType ci:CIHarvester ;
agora:next [ agora:byPattern _:tp_0 ;
agora:expectedType ci:Build ;
agora:next [ agora:byPattern _:tp_1 ;
agora:expectedType
oslc_auto:AutomationRequest ] ;
agora:onProperty ci:hasExecution ] ;
agora:onProperty ci:hasBuild ] .
[] a agora:SearchSpace ;
agora:definedBy _:tp_0,
_:tp_1,
_:tp_2 .
_:var_a a agora:Variable ;
rdfs:label "?a"^^xsd:string .
_:var_d a agora:Variable ;
rdfs:label "?d"^^xsd:string .
_:tp_0 a agora:TriplePattern ;
agora:object _:var_c ;
agora:predicate ci:hasExecution ;
agora:subject _:var_b .
…..
Let’s check the results of a more complex query
http://localhost:9001/plan?gp={?a ci:hasBuild ?b . ?b ci:hasExecution
?c . ?c ci:hasResult ?d}
36. What is a query/search plan for a BGP?
• Composed of:
• A set of seed URIs
• A set of search paths
• What is a seed URI?
• The subject of one of the triples contained in the Agora
• What is a search path?
• A finite and executable queue of search steps
• Its execution starts by dereferencing the seed URIs, which initializes the set of query-relevant triples
[Diagram: a search path starts at <SEED_URI> and follows property 1 … property N through intermediate resources until it reaches <CAND_URI>]
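Executing one search path therefore amounts to dereferencing the seeds and following the plan's property chain, one hop per search step. A toy sketch over an in-memory graph (the URIs mimic the CI example but are illustrative, and the in-memory dict stands in for HTTP dereferencing):

```python
# Illustrative toy data: URI -> {property: [objects]}
DATA = {
    "ex:harvester": {"ci:hasBuild": ["ex:build1"]},
    "ex:build1": {"ci:hasExecution": ["ex:exec1"]},
    "ex:exec1": {"ci:hasResult": ["ex:passed"]},
}

def run_path(seeds, properties):
    """Follow a chain of properties from the seed URIs, collecting the
    triples seen along the way (the query-relevant triples)."""
    triples, frontier = [], list(seeds)
    for prop in properties:
        next_frontier = []
        for uri in frontier:
            for obj in DATA.get(uri, {}).get(prop, []):
                triples.append((uri, prop, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return triples

path = ["ci:hasBuild", "ci:hasExecution", "ci:hasResult"]
print(run_path(["ex:harvester"], path))  # one triple per hop of the path
```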
37. Step 4. Evaluate the query plan by dereferencing
• Let’s check the results
• http://localhost:9001/fragment?gp={?a ci:hasBuild ?b}
39. Let’s now do a demo with DBpedia
• Yeah, all this was working in a controlled environment. What about DBpedia?
• Obviously, DBpedia understood from a pure Linked Data perspective.
• We will open a brand new AGORA and tell it to understand the movies domain
40. A few operations to be done
• First of all, load the vocabulary in AGORA and provide a few seeds
• Through a SPARQL query to DBpedia, but it could be a list of URIs
• Then, we can start inspecting
• http://localhost:9000/graph/
• http://localhost:9000/types
• http://localhost:9000/properties
• Let’s start querying
• First let’s see a plan:
• http://localhost:9000/plan?gp={?f%20dbpedia-owl:starring%20?a}
• http://localhost:9000/plan/view?gp={?f%20dbpedia-owl:starring%20?a}
• And then execute the query
41. A few other queries
• Get all relations between the films and the actors who star in them
• http://localhost:9000/fragment?gp={?f dbpedia-owl:starring ?a}
• Same as the previous query, but also getting the names of these actors
• http://localhost:9000/fragment?gp={?f dbpedia-owl:starring ?a. ?a dbp:birthName ?n}
• Get all films, their distributors and the known locations of each of them
• http://localhost:9000/fragment?gp={?f dbpedia-owl:distributor ?d. ?d dbpedia-owl:location ?l}
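The graph patterns in these URLs are shown unencoded for readability; before sending them over HTTP they need percent-encoding. A small helper, assuming the /fragment endpoint from the demo:

```python
from urllib.parse import quote

def fragment_url(base, gp):
    """Percent-encode a graph pattern for AGORA's /fragment endpoint."""
    return f"{base}/fragment?gp={quote(gp)}"

print(fragment_url("http://localhost:9000", "{?f dbpedia-owl:starring ?a}"))
# http://localhost:9000/fragment?gp=%7B%3Ff%20dbpedia-owl%3Astarring%20%3Fa%7D
```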
42. Outline of the talk
• Where do we start from?
• A few examples of applications that we have built by
consuming RDF
• Quiz time: what do we understand by Linked Data?
• A summary of current Linked Data consumption
approaches
• Yet another approach: AGORA
• Plus some demos (compulsory when talking about Linked
Data)
• Where do we go next?
43. What’s next for AGORA?
• An additional bit of engineering
• Extending to other parts of SPARQL
• Exploiting caching even more
• Pagination
• Building the vocabularies automatically for all those cases where
there is no vocabulary (using LOUPE)
• etc.
• (basically, all those things already very well done by LDF)
• SPARQL Updates
• Some Linked Data Platform (ldp4j) technology behind the scenes
• Sitting down to write everything carefully
• The whole framework
• The query planning algorithm
• Evaluations and comparisons with other approaches
• Is this approach really worth it?
44. What have we been talking about?
WAIT FOR OUR
PAPER TO BE
PUBLISHED
45. And now the main conclusions
• Consumption of Linked Data is normally associated with SPARQL querying over some dataset of the LOD cloud
• My feeling after having read many papers that talk about Linked Data consumption
• Nothing against that (look at the original examples that I gave earlier), but we have to understand, as a community, whether there are any challenges that pure Linked Data approaches handle better
• Why does everybody talk about REST APIs while we don’t?
• So, more work is needed on…
• Approaches that exploit the features of “pure Linked Data” (e.g., SQUIN and link traversal querying)
• Approaches that exploit the Web dimension infrastructure (e.g., Linked Data Fragments)
46. Conclusions (II)
• We should continue exploring this space
• But probably these dimensions are not enough
• And many open challenges still
• Federated query processing techniques (adaptive)
AGORA
48. And this is what you should remember from the talk
Source: "Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
49. Why do they call it Linked Data
when they want to say…?
Acknowledgements to the SDH team at the Center for
Open Middleware:
Fernando Serena, Carlos Blanco, Alejandro Fernández,
Alejandro Vera, Miguel Esteban, Andrés García, Javier
Soriano, Asunción Gómez
Oscar Corcho
ocorcho@fi.upm.es
@ocorcho
https://www.slideshare.com/ocorcho