Invited talk at USEWOD2014 (http://people.cs.kuleuven.be/~bettina.berendt/USEWOD2014/)
A tremendous amount of machine-interpretable information is available in the Linked Open Data Cloud. Unfortunately, much of this data remains underused, as machine clients struggle to use the Web. I believe this can be solved by giving machines interfaces similar to those we offer humans, instead of separate interfaces such as SPARQL endpoints. In this talk, I'll discuss the Linked Data Fragments vision on machine access to the Web of Data, and indicate how this impacts usage analysis of the LOD Cloud. We can all learn a lot from how humans access the Web, and those strategies can be applied to querying and analysis. In particular, we should first focus on solving the use cases that humans can handle easily, and only then consider tackling others.
12. Currently, there are three ways
to provide access to a Linked Data dataset.
SPARQL endpoint
data dump
Linked Data documents
13. Those three ways have one thing in common:
they offer fragments of a dataset.
14. Linked Data Fragments look
at all ways at the same time.
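[Figure: a spectrum of Linked Data Fragments. At one end, data dumps answer generic requests: high client effort, high availability. At the other end, SPARQL results answer specific queries: high server effort, low availability. Linked Data documents lie in between.]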
15. Each type of Linked Data Fragment
is defined by three characteristics.
selector: What data does it contain?
metadata: What do we know about it?
controls: What can we do next?
16. Each type of Linked Data Fragment
is defined by three characteristics.
Linked Data document
selector: a specific entity
metadata: creator, maintainer, …
controls: links to other LD documents
17. Each type of Linked Data Fragment
is defined by three characteristics.
SPARQL CONSTRUCT result
selector: a SPARQL query
metadata: (none)
controls: (none)
18. Each type of Linked Data Fragment
is defined by three characteristics.
data dump
selector: everything
metadata: number of triples, file size
controls: (none)
19. Any API that provides triples
publishes Linked Data Fragments.
20. Can we define APIs that efficiently allow
SPARQL querying with high availability?
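[Figure: the same spectrum, from data dumps (generic requests, high client effort, high availability) to SPARQL results (specific queries, high server effort, low availability), now with basic LDFs placed between Linked Data documents and SPARQL results.]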
21. A basic Linked Data Fragments API
offers triple-pattern-based access.
basic Linked Data Fragment
selector: a triple pattern
metadata: total number of matches
controls: access to all basic LDFs
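As a minimal sketch of these three characteristics, assuming an in-memory list of triples and a simple page-number scheme, a basic Linked Data Fragment can be computed as below. The names `basic_fragment` and `BasicFragmentPage`, the use of `None` for variables, and the paging parameters are hypothetical illustrations, not the API of any LDF server.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class BasicFragmentPage:
    triples: List[Triple]     # data: one page of triples matching the selector
    total_matches: int        # metadata: total number of matches
    next_page: Optional[int]  # control (hypothetical paging link): next page number


def matches(triple: Triple, pattern: Tuple[Optional[str], ...]) -> bool:
    """A pattern position set to None is a variable and matches any term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def basic_fragment(dataset, pattern, page=1, page_size=100):
    """Selector: a triple pattern. Any pattern may be requested, which plays
    the role of 'access to all basic LDFs'; paging is an added assumption."""
    matching = [t for t in dataset if matches(t, pattern)]
    start = (page - 1) * page_size
    has_next = start + page_size < len(matching)
    return BasicFragmentPage(
        triples=matching[start:start + page_size],
        total_matches=len(matching),
        next_page=page + 1 if has_next else None,
    )


dataset = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
]
first_page = basic_fragment(dataset, (None, "foaf:name", '"York"@en'), page_size=1)
print(first_page.total_matches, first_page.next_page)  # 2 2
```

A real server would advertise the "any triple pattern" control as a hypermedia form inside each response rather than as function parameters; the function signature only stands in for that form here.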
23. Triple-pattern-based access to Linked Data
doesn’t endanger a server’s availability.
Easy to generate (e.g., with the compressed triple format HDT)
Efficiently cacheable through HTTP
Low message entropy
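Cacheability follows from the selector being nothing more than a triple pattern: every client that needs the same pattern requests the same URL, so a shared HTTP cache can answer the repeats. A minimal sketch, assuming parameter names `subject`/`predicate`/`object` modeled on the `predicate=` requests in the server log later in this talk, with a dictionary standing in for a real HTTP cache (`fragment_url` and `SharedCache` are hypothetical names):

```python
from urllib.parse import urlencode


def fragment_url(base, subject=None, predicate=None, object_=None):
    """Build the URL of a basic LDF; unbound positions (None) are left out.

    Parameters are always emitted in the same order, so the same triple
    pattern always yields the same URL, whichever client asks for it.
    """
    params = [(name, value) for name, value in
              (("subject", subject), ("predicate", predicate), ("object", object_))
              if value is not None]
    return f"{base}?{urlencode(params)}" if params else base


class SharedCache:
    """Toy stand-in for an HTTP cache shared by many clients."""

    def __init__(self):
        self.responses = {}
        self.hits = 0
        self.misses = 0

    def get(self, url, fetch):
        if url in self.responses:
            self.hits += 1
        else:
            self.misses += 1
            self.responses[url] = fetch(url)
        return self.responses[url]


cache = SharedCache()
name_york = fragment_url("/dbpedia", predicate="http://xmlns.com/foaf/0.1/name",
                         object_='"York"@en')
# Two different client queries about York both need this same fragment.
for _ in range(2):
    cache.get(name_york, fetch=lambda url: f"<triples for {url}>")
print(cache.hits, cache.misses)  # 1 1
```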
24. The higher the message entropy,
the more interesting analysis becomes.
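[Figure: the same spectrum as an axis of message entropy, from high message entropy at the SPARQL-result end (interesting for USEWOD?) to low message entropy at the data-dump end (boring for USEWOD?), with Linked Data documents and basic LDFs in between.]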
28. SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
How can we answer this query
using basic Linked Data Fragments?
29. Split the query based on
the available fragment types.
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
30. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
Get the first page
of the corresponding fragments.
31. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
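Read the count metadata
of each fragment page:
±61,000 matches for ?person a dbpedia-owl:Artist
±470,000 matches for ?person dbpedia-owl:birthPlace ?city
12 matches for ?city foaf:name "York"@en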
32. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
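Take the smallest fragment,
start with its first match.
±61,000 matches for ?person a dbpedia-owl:Artist
±470,000 matches for ?person dbpedia-owl:birthPlace ?city
12 matches for ?city foaf:name "York"@en
Its first match binds ?city to dbpedia:York.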
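The steps on these slides (read each fragment's count, take the smallest, bind its matches, repeat on what remains) can be sketched as a short recursive procedure. This is a simplified illustration under stated assumptions, not the actual LDF client: `fragment` is an in-memory stand-in for an HTTP request, the full list length stands in for the count metadata of the first page, terms starting with `?` are variables, and `a` is written as `rdf:type`. The toy dataset reuses triples shown on the slides and is not a faithful DBpedia excerpt.

```python
def is_var(term):
    return term.startswith("?")


def fragment(dataset, pattern):
    """Stand-in for fetching a basic LDF: every triple matching the pattern."""
    return [t for t in dataset
            if all(is_var(p) or p == term for term, p in zip(t, pattern))]


def bind(pattern, triple):
    """Map the pattern's variables to the triple's terms (None on a conflict)."""
    binding = {}
    for p, term in zip(pattern, triple):
        if is_var(p) and binding.setdefault(p, term) != term:
            return None
    return binding


def substitute(pattern, binding):
    return tuple(binding.get(p, p) for p in pattern)


def evaluate(dataset, patterns, binding=None):
    """Yield solutions of a basic graph pattern using only triple-pattern fragments."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    # Request each remaining pattern and read how many matches it has.
    pages = [fragment(dataset, p) for p in patterns]
    # Take the smallest fragment ...
    i = min(range(len(patterns)), key=lambda k: len(pages[k]))
    rest = patterns[:i] + patterns[i + 1:]
    # ... and continue from each of its matches, starting with the first.
    for triple in pages[i]:
        new = bind(patterns[i], triple)
        if new is not None:
            yield from evaluate(dataset, [substitute(p, new) for p in rest],
                                {**binding, **new})


data = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Ganesh_Ghosh", "dbpedia-owl:birthPlace", "dbpedia:Bengal_Presidency"),
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Aamir_Zaki", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Ahmad_Morid", "rdf:type", "dbpedia-owl:Artist"),
]
query = [
    ("?person", "rdf:type", "dbpedia-owl:Artist"),
    ("?person", "dbpedia-owl:birthPlace", "?city"),
    ("?city", "foaf:name", '"York"@en'),
]
print(list(evaluate(data, query)))
# [{'?city': 'dbpedia:York', '?person': 'dbpedia:John_Flaxman'}]
```

Choosing the smallest fragment first is a greedy heuristic: it keeps the number of follow-up requests low, but it is not guaranteed to be optimal for every query.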
33. SELECT ?person WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:York foaf:name "York"@en.
}
How can we answer this query
using basic Linked Data Fragments?
35. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
Get the first page
of the corresponding fragments.
36. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
Read the count metadata
of each fragment page:
±61,000 matches for ?person a dbpedia-owl:Artist
75 matches for ?person dbpedia-owl:birthPlace dbpedia:York
37. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
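±61,000 matches for ?person a dbpedia-owl:Artist
75 matches for ?person dbpedia-owl:birthPlace dbpedia:York
Take the smallest fragment,
start with its first match.
Its first match binds ?person to dbpedia:John_Flaxman.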
38. ASK {
dbpedia:John_Flaxman a dbpedia-owl:Artist.
dbpedia:John_Flaxman dbpedia-owl:birthPlace dbpedia:York.
dbpedia:York foaf:name "York"@en.
}
How can we answer this query
using basic Linked Data Fragments?
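Once every variable is bound, each pattern is a fully specified triple, so its fragment's count metadata is either 0 or 1 and the ASK query only needs one count per pattern. A minimal sketch, assuming an in-memory set of triples in place of the server and hypothetical helper names `fragment_count` and `ask`:

```python
def fragment_count(graph, pattern):
    """Count metadata of a fully bound pattern: an RDF graph is a set,
    so the pattern matches either exactly one triple or none."""
    return 1 if pattern in graph else 0


def ask(graph, ground_patterns):
    # The ASK query holds when no fragment is empty; all() stops at the first 0.
    return all(fragment_count(graph, p) > 0 for p in ground_patterns)


graph = {
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:York", "foaf:name", '"York"@en'),
}
ask_query = [
    ("dbpedia:John_Flaxman", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:York", "foaf:name", '"York"@en'),
]
print(ask(graph, ask_query))  # True
print(ask(graph, [("dbpedia:Joseph_Hansom", "rdf:type", "dbpedia-owl:Artist")]))  # False
```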
45. How can we analyze queries
from intelligent clients?
Client queries are different
Look at the logs
Treat machine clients as humans
47. Despite being on the Web, we use
public SPARQL endpoints like databases.
Ask a complex question.
Wait.
Process the answer.
48. When was the last time
you used the Web like that?
Ask a complex question.
Wait.
Process the answer.
49. On the Web, there are no final answers.
We ask questions and iteratively improve.
Ask simple questions.
Process answers as they arrive.
Create new questions.
50. Show a sorted list of names of Greek artists,
linked to their DBpedia page.
…
καλλιτέχνες ("artists" in Greek)
endpoint approach
fragment approach
52. endpoint approach
Show a sorted list of names of Greek artists,
linked to their DBpedia page.
SELECT DISTINCT ?person (MIN(?name) AS ?minName)
WHERE {
?person a dbpedia-owl:Artist;
foaf:name ?name;
dbpedia-owl:birthPlace ?city.
?city dbpedia-owl:country dbpedia:Greece.
}
GROUP BY ?person
ORDER BY ?minName
54. endpoint approach
Show a sorted list of names of Greek artists,
linked to their DBpedia page.
Consequences:
DISTINCT: keep all results in memory
MIN: keep all results in memory, blocking
ORDER BY: keep all results in memory, blocking
Doesn’t matter; we’re waiting anyway.
55. fragment approach
Show a sorted list of names of Greek artists,
linked to their DBpedia page.
SELECT ?person ?name
WHERE {
?person a dbpedia-owl:Artist;
foaf:name ?name;
dbpedia-owl:birthPlace ?city.
?city dbpedia-owl:country dbpedia:Greece
}
No blocking operators; streaming is important.
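The difference between the two approaches shows up in when the first result becomes available. A minimal sketch, assuming made-up (person, name) solutions and hypothetical function names: the fragment approach passes each solution on as soon as it arrives, while the endpoint approach (DISTINCT, MIN per person, ORDER BY) must read and keep every solution before returning anything.

```python
def arriving(solutions, arrived):
    """Simulate solutions arriving one by one, recording each arrival."""
    for solution in solutions:
        arrived.append(solution)
        yield solution


def fragment_approach(solutions):
    """No blocking operators: hand each (person, name) pair on immediately."""
    for person, name in solutions:
        yield person, name


def endpoint_approach(solutions):
    """DISTINCT + MIN per person + ORDER BY: nothing can be returned
    before every solution has been read and kept in memory."""
    smallest_name = {}
    for person, name in solutions:
        if person not in smallest_name or name < smallest_name[person]:
            smallest_name[person] = name
    return sorted(smallest_name.items(), key=lambda pair: pair[1])


solutions = [
    ("dbpedia:Yannis_Markopoulos", "Yannis Markopoulos"),
    ("dbpedia:Yanni", "Yanni"),
    ("dbpedia:Yanni", "Yanni"),  # duplicate, e.g. matched via two birth places
]

arrived_stream = []
first = next(fragment_approach(arriving(solutions, arrived_stream)))
print(first, len(arrived_stream))  # first result after a single arrival

arrived_blocking = []
ordered = endpoint_approach(arriving(solutions, arrived_blocking))
print(ordered, len(arrived_blocking))  # only after all three arrivals
```

The streaming client can render names while the rest are still being fetched, and sort or deduplicate in the interface if it wants to; the endpoint approach makes the user wait for the slowest part of the query.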
57. Making the LOD cloud less lonesome
starts with embracing its open nature.
How meaningful is a sort anyway?
How meaningful is a single answer?
Build applications that react to data.
58. How can we analyze queries
from intelligent clients?
Client queries are different
Look at the logs
Treat machine clients as humans
59. Let’s closely inspect the server logs
of the “Artists from York” query.
SPARQL:
http://dbpedia.org/sparql?query=SELECT+%3Fp+%3Fc+WHERE+%7B%0D%0A++++%3Fp+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology
SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
60. Let’s closely inspect the server logs
of the “Artists from York” query.
basic Linked Data Fragments:
/dbpedia
/dbpedia?predicate=http%3A%2F%2Fxmlns.com%2Ffoa
/dbpedia?predicate=http%3A%2F%2Fwww.w3.org%2F1
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
61. Let’s closely inspect the server logs
of the “Artists from York” query.
basic Linked Data Fragments:
?c foaf:name "York"@en.
?p rdf:type dbpedia-owl:Artist.
?p dbpedia-owl:birthPlace ?c.
?p dbpedia-owl:birthPlace dbpedia:York_(explorer).
?p dbpedia-owl:birthPlace dbpedia:York_railway_station.
?p dbpedia-owl:birthPlace dbpedia:28220_York.
?p dbpedia-owl:birthPlace dbpedia:York_(provincial_elect
?p dbpedia-owl:birthPlace dbpedia:York,_New_York.
dbpedia:Cornelius_R._Parsons rdf:type dbpedia-owl:Artist
dbpedia:John_R._McPherson rdf:type dbpedia-owl:Artist.
62. Access logs resulting from basic LDF clients
are hard to interpret.
Parallel requests, unclear dependencies
Full query hard to reconstruct
Was it SPARQL in the first place?
63. What would we do
if the users were humans?
Create a user profile
Use cookies
Check the Referer header
64. The Referer header tells us
the path the client has followed.
An interesting, still underused idea:
Augmenting the Web of Data using Referers
by Hannes Mühleisen & Anja Jentzsch
It explains part of the “why”.
It allows us to reconstruct dependencies.
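Grouping requests by their Referer, rather than by arrival order, untangles parallel requests into the chains a client actually followed. A minimal sketch, assuming a log already parsed into (URL, Referer) pairs; the URLs below are hypothetical and abbreviated, whereas real Referer values are absolute URLs, and `dependency_tree` and `paths` are illustrative names:

```python
from collections import defaultdict


def dependency_tree(log):
    """Map each Referer to the requests it led to (None = no Referer)."""
    children = defaultdict(list)
    for url, referer in log:
        children[referer].append(url)
    return children


def paths(children, node=None, prefix=()):
    """Yield each chain of requests a client followed, from entry to leaf."""
    for url in children.get(node, []):
        if url in prefix:  # guard against Referer cycles
            continue
        path = prefix + (url,)
        if url in children:
            yield from paths(children, url, path)
        else:
            yield path


log = [
    ("/dbpedia", None),
    ("/dbpedia?o=York", "/dbpedia"),
    ("/dbpedia?p=birthPlace&o=dbpedia:York", "/dbpedia?o=York"),
    ("/dbpedia?p=birthPlace&o=dbpedia:York_(explorer)", "/dbpedia?o=York"),
]
for path in paths(dependency_tree(log)):
    print(" -> ".join(path))
```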
66. These dependencies can help us
cache and prefetch.
Example observation:
after retrieving ?s <p> <o> patterns,
clients often ask for <s> rdfs:label ?l.
Example action:
always add labels to concepts
in all responses.
cf. Caching and Prefetching Strategies for SPARQL Queries
by Johannes Lorey and Felix Naumann
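The example action can be sketched as a small post-processing step on a fragment page: collect the subjects on the page and append their label triples. This is a hypothetical illustration over an in-memory triple list, not a feature of any particular LDF server; `with_labels` is an invented name.

```python
def with_labels(page, dataset, label="rdfs:label"):
    """Hypothetical server action: append label triples for each subject on the page."""
    subjects = {s for s, _, _ in page}
    extra = [t for t in dataset
             if t[1] == label and t[0] in subjects and t not in page]
    return page + extra


dataset = [
    ("dbpedia:Yannis_Markopoulos", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Yannis_Markopoulos", "rdfs:label", '"Yannis Markopoulos"@en'),
    ("dbpedia:Yanni", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Yanni", "rdfs:label", '"Yanni"@en'),
]
# Page of the fragment "?person a dbpedia-owl:Artist".
page = [t for t in dataset if t[1:] == ("rdf:type", "dbpedia-owl:Artist")]
response = with_labels(page, dataset)
for triple in response:
    print(*triple)
```

Such an augmented response saves clients a round trip per subject, but it is exactly what raises the cardinality question on the next slide: the client cannot tell whether it received all labels.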
67. However, the open-world assumption
can cause cardinality trouble.
SELECT ?person ?label WHERE {
?person a dbpedia-owl:Artist;
rdfs:label ?label.
}
Fragment “?person a dbpedia-owl:Artist”, with labels added:
dbpedia:Yannis_Markopoulos a dbpedia-owl:Artist;
rdfs:label "Yannis Markopoulos"@en.
dbpedia:Yanni a dbpedia-owl:Artist;
rdfs:label "Yanni"@en.
…
Are these all labels?
Should I ask for more?
68. The intent of this query
is probably different from its semantics.
SELECT ?person ?label WHERE {
?person a dbpedia-owl:Artist;
rdfs:label ?label.
}
With SPARQL endpoints, this doesn’t matter.
Clients don’t have to work more.
To optimize client usage patterns,
this difference is really important.
69. Referers only show part of the story.
Can we know more?
GET /dbpedia?o=dbpedia%3AGreece HTTP/1.1
User-Agent: curl/7.35.0
Host: data.linkeddatafragments.org
Accept: text/turtle
Referer: http://data.linkeddatafragments.org/dbpedia
X-Executed-Query: SELECT ?person ?label WHERE { ?person a dbpe
Inform the server what you’re doing.
Then the server can help you better in the future.
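On the client side, sending this context only means adding headers to each fragment request. A minimal sketch with Python's standard `urllib.request.Request`, built but not sent; `X-Executed-Query` is the header proposed on this slide, not a standard HTTP header, and the query string below is a hypothetical, abbreviated placeholder:

```python
from urllib.parse import urlencode
from urllib.request import Request


def fragment_request(base, params, referer, executed_query):
    """Build (but do not send) a fragment request that tells the server
    which page led here and which query the client is executing."""
    return Request(f"{base}?{urlencode(params)}", headers={
        "Accept": "text/turtle",
        "Referer": referer,
        # Not a standard header: the talk proposes it as an idea.
        "X-Executed-Query": executed_query,
    })


query = "SELECT ?person ?label WHERE { ... }"  # hypothetical, abbreviated
req = fragment_request("http://data.linkeddatafragments.org/dbpedia",
                       {"o": "dbpedia:Greece"},
                       "http://data.linkeddatafragments.org/dbpedia",
                       query)
print(req.full_url)  # http://data.linkeddatafragments.org/dbpedia?o=dbpedia%3AGreece
```

Passing `req` to `urllib.request.urlopen` would send it; `urllib` fills in `Host` and its own `User-Agent`. Note that `Request` stores header names via `str.capitalize()`, so the header is looked up as `X-executed-query`, which is harmless because HTTP header names are case-insensitive.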
70. How can we analyze queries
from intelligent clients?
Client queries are different
Look at the logs
Treat machine clients as humans
71. My reflex when building machine clients
is to wonder: what would a human do?
I don’t expect any server
to solve my queries;
I collect small pieces of information
to solve queries myself.
72. If you as a human use a website
and it doesn’t work the way you want,
what would you do?
73. As a human, I would leave feedback.
I would comment, like or upvote/downvote.
“I tried to find artists from Greece.
Finding out Greek citizens was easy,
but the artist checks went quite slow.
The total query took me 4 minutes,
whereas I would prefer 1 minute.”
★★★☆☆
Feedback is key
to improving a service.
74. Why don’t we let machines
give feedback about their experience?
[ a f:ExperienceFeedback;
f:author _:agent;
f:subject _:query;
f:actualSituation [
f:duration "3m";
f:bandwidth "500KB"
];
f:desiredSituation [
f:duration "1m"
] ].
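A client could produce such feedback automatically once it has timed a query. A minimal sketch, assuming durations measured in seconds and rendered in whole minutes (rounded down); the `f:` vocabulary is the slide's own and its namespace is not given, so a real document would still need an `@prefix f:` declaration with an IRI of your choosing. `feedback_turtle` and `feedback_if_slow` are hypothetical names.

```python
def feedback_turtle(actual, desired):
    """Render feedback in the slide's f: vocabulary (its namespace is not given)."""
    def props(situation):
        return ";\n    ".join(f'f:{key} "{value}"' for key, value in situation.items())
    return ("[ a f:ExperienceFeedback;\n"
            "  f:author _:agent;\n"
            "  f:subject _:query;\n"
            "  f:actualSituation [\n"
            f"    {props(actual)}\n"
            "  ];\n"
            "  f:desiredSituation [\n"
            f"    {props(desired)}\n"
            "  ] ].")


def feedback_if_slow(actual_seconds, desired_seconds, bandwidth):
    """Only report back when the experience fell short of the target."""
    if actual_seconds <= desired_seconds:
        return None
    # Durations are rendered in whole minutes, rounded down.
    return feedback_turtle(
        {"duration": f"{actual_seconds // 60}m", "bandwidth": bandwidth},
        {"duration": f"{desired_seconds // 60}m"},
    )


print(feedback_if_slow(180, 60, "500KB"))
```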
75. If clients are more intelligent than servers,
we have to analyze usage differently.
Enable clients to act smart
Creatively reuse human techniques
Learn from optimizations and feedback
77. “Machine clients sending feedback?
What you say is total science fiction!
What’s next, machine clients
that poke you on Facebook?”
78. What I consider science fiction:
a public endpoint on the Web
that answers any question.
79. 99.9% of the time, a basic LDF client
solves this query in 3 seconds:
SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
Which public SPARQL endpoints
could guarantee you that?
80. “You cannot solve all queries
with basic Linked Data Fragments!”
SELECT ?x ?l WHERE {
?x rdfs:label ?l.
FILTER REGEX(?l, "^A")
}
81. The Semantic Web tried
to solve too much too fast.
The result is
a very lonesome LOD Cloud.
You can query anything,
but it never works.
82. Start with enabling tasks
humans could easily do.
SELECT ?x ?l WHERE {
?x rdfs:label ?l.
FILTER REGEX(?l, "^A")
}
83. Start with enabling tasks
humans could easily do.
SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
84. Start with enabling tasks
humans could easily do.
“I tried to find artists from Greece.
Finding out Greek citizens was easy,
but the artist checks went quite slow.
The total query took me 4 minutes,
whereas I would prefer 1 minute.”
★★★☆☆
85. Start with enabling tasks
humans could easily do.
After that,
we’ll talk about the rest.
86. The LOD usage community
can help create intelligent clients.
Put the intelligent servers aside,
enable clients to be intelligent.
Look at usage from the perspective
of intelligent clients.
87. The LOD Cloud is lonesome
because we gave
human and machine clients
different interfaces.
Let’s make the simple things work.
Let’s get the data used.