This document describes LODeX, a tool for exploring and querying Linked Open Data (LOD) sources. LODeX aims to make LOD discovery and consumption easier for both skilled and unskilled users. It has two main modules: an extraction and summarization module that analyzes LOD datasets and generates schema summaries; and a visualization and querying module that allows users to browse schema summaries and build visual queries without SPARQL knowledge. The visual queries are compiled into SPARQL and executed over LOD endpoints. LODeX indexes information from the LOD cloud and aims to provide a standardized way to understand LOD dataset structures and query LOD sources.
Visual Querying LOD sources with LODeX
1. DBGroup@UNIMO
Fabio Benedetti, Sonia Bergamaschi, Laura Po
Department of Engineering “Enzo Ferrari”
University of Modena & Reggio Emilia
K-Cap 2015 - The 8th International Conference on Knowledge Capture
October 7-10, 2015, Palisades, NY, USA
2. DBGroup@UNIMO
[Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in
Different Topical Domains." The Semantic Web–ISWC 2014. Springer International Publishing, 2014. 245-260]
3. DBGroup@UNIMO
Domain                    2009              2014*
                          Number  %         Number  %
Cross-domain              41      13.95%    41      4.04%
Geographic                31      10.54%    21      2.07%
Government                49      16.67%    183     18.05%
Life sciences             41      13.95%    83      8.19%
Media                     25      8.50%     22      2.17%
Publications              87      29.59%    96      9.47%
Social web                0       0.00%     520     51.28%
User-generated content    20      6.80%     48      4.73%
Total                     294               1014
*Only 570 datasets belong to the LOD cloud; the remaining datasets do not contain ingoing/outgoing links to the LOD Cloud.
[Figure: distribution of datasets by domain, 2009 and 2014]
4. DBGroup@UNIMO
The Open Access trend encourages the publication of Open Data in the form of Linked Data.
But discovering and consuming LOD sources is a complex task for both skilled and unskilled users.
5. DBGroup@UNIMO
• No standard exists for documenting a dataset.
• A great number of datasets are published without any real documentation that could help reveal their structure.
To understand whether a dataset really contains interesting information, a user has to explore it manually using SPARQL queries.
Unskilled user: a user with no SPARQL knowledge cannot become a consumer of Linked Data.
Skilled user: exploring a dataset can be time-consuming without any knowledge of its structure.
6. DBGroup@UNIMO
A tool that promotes the understanding, navigation and querying of LOD sources
Requirements
• portable to the LOD Cloud
• provide a synthetic representation of the structure of the dataset
• provide visual query building functionalities that hide the complexity of Semantic Web technologies
7. DBGroup@UNIMO
Two main modules
• Extraction & Summarization
– Index Extraction (IE)
– Post Processing (PP)
• Visualization & Querying
– Schema Summary Visualization
– Query Orchestrator
[Figure: LODeX architecture, showing the LOD Cloud, endpoint URLs, Index Extraction, SPARQL queries, Statistical Indexes, Post-processing, Schema Summary, NoSQL, Query Orchestrator, Schema Summary Visualization, basic query and results]
8. DBGroup@UNIMO
Index Extraction [1]
The IE process generates the SPARQL queries used to extract the different indexes.
• Pattern Strategy technique
– It produces a higher number of less complex SPARQL queries
Post Processing
An algorithm combines the information contained in the Statistical Indexes to produce and store the Schema Summary
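The trade-off behind issuing more but simpler queries can be illustrated with a minimal Python sketch. It is not the published Pattern Strategy: the query shapes, the helper name `per_class_queries` and the example class IRIs are assumptions chosen only to show how one heavy aggregate query might be split into a class listing followed by one small query per class.

```python
# Illustrative only: these query shapes are assumptions, not LODeX's actual queries.

# One heavy query: aggregates over every class and property in a single pass,
# which a public endpoint may reject or time out on.
HEAVY_QUERY = """
SELECT ?c ?p (COUNT(DISTINCT ?s) AS ?n)
WHERE { ?s a ?c ; ?p ?o . FILTER(isIRI(?o)) }
GROUP BY ?c ?p
"""

# Lighter plan, step 1: list the classes that have instances.
CLASS_QUERY = "SELECT DISTINCT ?c WHERE { ?s a ?c }"


def per_class_queries(class_iris):
    """Lighter plan, step 2: one small aggregate query per class (hypothetical helper)."""
    template = (
        "SELECT ?p (COUNT(DISTINCT ?s) AS ?n) "
        "WHERE {{ ?s a <{c}> ; ?p ?o . FILTER(isIRI(?o)) }} "
        "GROUP BY ?p"
    )
    return [template.format(c=iri) for iri in class_iris]


# Example class IRIs; in practice they would come from running CLASS_QUERY.
queries = per_class_queries([
    "http://xmlns.com/foaf/0.1/Organization",
    "http://xmlns.com/foaf/0.1/Person",
])
for q in queries:
    print(q)
```

Each generated query touches a single class, so a failure or timeout affects only that class rather than the whole extraction.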
9. DBGroup@UNIMO
The Schema Summary is a pseudograph composed of:
• C - Classes (nodes)
• P - Properties (edges)
And additional elements and functions:
• A - Attributes associated with each class
– Each attribute represents the existence of a datatype property from the instances of the class
• 𝒍 - labels
• l - labeling function
• count - count function
The Schema Summary is inferred from the distribution of the instances of a dataset
10. DBGroup@UNIMO
[Figure: example RDF graph. Intensional knowledge: the classes ex:Sector, foaf:Organization and foaf:Person (rdf:type owl:Class) and the property ex:sector (rdf:type rdf:Property and owl:ObjectProperty, rdfs:label “sector”, rdfs:domain). Extensional knowledge: the instances sector1 (dc:title “Energy”), organization1, organization2 and person1 (foaf:firstName “Paolo”, foaf:lastName “Rossi”), linked by ex:sector and ex:ceo, with the literals “Village electrification in the Pacific” (ex:activity) and “+41331231” (dbpedia:fax)]
The information contained in the intensional knowledge can be incomplete or absent
11. DBGroup@UNIMO
These indexes belong to the extensional group of the Statistical Indexes [2]:
• SC (Subject Class) contains the pairs (p,c) where p is an object property and c is its domain class.
• SCl (Subject Class to literal) contains the pairs (p,c) where p is a datatype property and c is its domain class.
• OC (Object Class) contains the pairs (p,c) where p is an object property and c is its range class.
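The three index definitions above can be sketched over an in-memory triple list. This is a minimal Python illustration, not the LODeX extraction code (which runs SPARQL queries against an endpoint). The triples are reconstructed from the example graph; which organization carries ex:activity, dbpedia:fax and ex:ceo is an assumption, as is counting distinct instances per (class, property) pair.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Triples reconstructed from the example graph. Attaching ex:activity,
# dbpedia:fax and ex:ceo to organization1 is an assumption.
TRIPLES = [
    ("sector1", RDF_TYPE, "ex:Sector"),
    ("organization1", RDF_TYPE, "foaf:Organization"),
    ("organization2", RDF_TYPE, "foaf:Organization"),
    ("person1", RDF_TYPE, "foaf:Person"),
    ("organization1", "ex:sector", "sector1"),
    ("organization2", "ex:sector", "sector1"),
    ("organization1", "ex:ceo", "person1"),
    ("sector1", "dc:title", '"Energy"'),
    ("organization1", "ex:activity", '"Village electrification in the Pacific"'),
    ("organization1", "dbpedia:fax", '"+41331231"'),
    ("person1", "foaf:firstName", '"Paolo"'),
    ("person1", "foaf:lastName", '"Rossi"'),
]


def is_literal(term):
    # Literals are written with surrounding double quotes in this toy encoding.
    return term.startswith('"')


def extensional_indexes(triples):
    """Return (SC, SCl, OC) as dicts mapping (class, property) to a count."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    sc, scl, oc = defaultdict(set), defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        if is_literal(o):
            for c in types[s]:
                scl[(c, p)].add(s)   # datatype property, keyed by subject class
        else:
            for c in types[s]:
                sc[(c, p)].add(s)    # object property, keyed by subject class
            for c in types[o]:
                oc[(c, p)].add(o)    # object property, keyed by object class

    def count(index):
        # Count distinct instances per (class, property) pair.
        return {key: len(instances) for key, instances in index.items()}

    return count(sc), count(scl), count(oc)


sc, scl, oc = extensional_indexes(TRIPLES)
print(sc)
print(scl)
print(oc)
```

Two organizations point to the same sector, so ex:sector is counted twice on the subject side (SC) and once on the object side (OC).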
12. DBGroup@UNIMO
13. DBGroup@UNIMO
14. DBGroup@UNIMO
15. DBGroup@UNIMO
We use an algorithm that combines these indexes to produce a Schema Summary
Name  Values
SC    (foaf:Organization,ex:ceo,1),
      (foaf:Organization,ex:sector,2)
SCl   (foaf:Person,foaf:firstName,1),
      (foaf:Person,foaf:lastName,1),
      (ex:Sector,dc:title,1),
      (foaf:Organization,ex:activity,1),
      (foaf:Organization,dbpedia:fax,1)
OC    (ex:Sector,ex:sector,1),
      (foaf:Person,ex:ceo,1)
16. DBGroup@UNIMO
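[Figure: Schema Summary produced from the indexes]
Classes: foaf:Organization (2), ex:Sector (1), foaf:Person (1)
Properties: foaf:Organization ex:sector ex:Sector (2); foaf:Organization ex:ceo foaf:Person (1)
Attributes: ex:Sector: dc:title (1); foaf:Person: foaf:firstName (1), foaf:lastName (1); foaf:Organization: ex:activity (1), dbpedia:fax (1)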
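One way the index values could be combined into a Schema Summary is sketched below in Python. It is a simplified assumption, not the LODeX post-processing algorithm: edges are formed by joining SC and OC on the property name, attributes come from SCl, and the per-class instance counts are passed in separately because the slides do not say which index supplies them. The `SchemaSummary` type and `build_summary` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaSummary:
    classes: dict = field(default_factory=dict)      # class -> instance count
    properties: list = field(default_factory=list)   # (source, property, target, count)
    attributes: dict = field(default_factory=dict)   # class -> {datatype property: count}


def build_summary(sc, scl, oc, class_counts):
    """Join SC and OC on the property to form edges; SCl becomes attributes."""
    summary = SchemaSummary(classes=dict(class_counts))
    for (source, prop), n in sc.items():
        for (target, other_prop) in oc:
            if prop == other_prop:
                # A list of edges permits self-loops and parallel edges,
                # as a pseudograph requires.
                summary.properties.append((source, prop, target, n))
    for (cls, prop), n in scl.items():
        summary.attributes.setdefault(cls, {})[prop] = n
    return summary


# Index values taken from the example table.
SC = {("foaf:Organization", "ex:ceo"): 1, ("foaf:Organization", "ex:sector"): 2}
SCL = {
    ("foaf:Person", "foaf:firstName"): 1,
    ("foaf:Person", "foaf:lastName"): 1,
    ("ex:Sector", "dc:title"): 1,
    ("foaf:Organization", "ex:activity"): 1,
    ("foaf:Organization", "dbpedia:fax"): 1,
}
OC = {("ex:Sector", "ex:sector"): 1, ("foaf:Person", "ex:ceo"): 1}
# Assumption: instance counts per class come from a separate source.
CLASS_COUNTS = {"foaf:Organization": 2, "ex:Sector": 1, "foaf:Person": 1}

summary = build_summary(SC, SCL, OC, CLASS_COUNTS)
for edge in sorted(summary.properties):
    print(edge)
print(summary.attributes)
```

Joining on the property name alone is enough for this example; a property used between several class pairs would need the instance-level pairing that this sketch leaves out.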
17. DBGroup@UNIMO
Schema Summary Visualization
The front end of the web application is composed of three panels:
• List of datasets indexed in LODeX
• Schema Summary and query building panel
• Refinement panel
Query Orchestrator
• It manages the interaction between the user and the GUI
• It contains a SPARQL compiler that compiles the visual query into a SPARQL query
18. DBGroup@UNIMO
19. DBGroup@UNIMO
20. DBGroup@UNIMO
[Figure: Schema Summary, Basic Query, SPARQL compiler, SPARQL query]
The Visual Query has a tree structure
A SPARQL compiler exploits a recursive algorithm to generate the corresponding SPARQL query
Operators supported by the compiler:
• AND
• OPTIONAL
• FILTER
• ORDER BY
• LIMIT
• OFFSET
The query is sent to the SPARQL endpoint and the results can be visualized in a tabular view
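A recursive compilation of a tree-shaped visual query into SPARQL can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the LODeX compiler: the `Node` and `Attribute` types, the function names and the namespace IRIs for `ex:` and `dbpedia:` are hypothetical. Sibling patterns are joined implicitly (AND); OPTIONAL, FILTER, ORDER BY, LIMIT and OFFSET are emitted on request.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attribute:
    prop: str                      # datatype property, e.g. "foaf:lastName"
    var: str                       # variable name without the leading "?"
    optional: bool = False         # wrap the pattern in OPTIONAL
    filter: Optional[str] = None   # raw SPARQL filter expression


@dataclass
class Node:
    var: str
    cls: str
    attributes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (property, child Node, optional)


def compile_node(node, indent="  "):
    """Recursively turn a node and its subtree into graph-pattern lines."""
    lines = [f"{indent}?{node.var} a {node.cls} ."]
    for a in node.attributes:
        pattern = f"?{node.var} {a.prop} ?{a.var} ."
        if a.filter:
            pattern += f" FILTER({a.filter})"
        lines.append(f"{indent}OPTIONAL {{ {pattern} }}" if a.optional
                     else f"{indent}{pattern}")
    for prop, child, optional in node.edges:
        inner = indent + "  " if optional else indent
        if optional:
            lines.append(f"{indent}OPTIONAL {{")
        lines.append(f"{inner}?{node.var} {prop} ?{child.var} .")
        lines.extend(compile_node(child, inner))   # recursion on the subtree
        if optional:
            lines.append(f"{indent}}}")
    return lines


def compile_query(root, select, prefixes, order_by=None, limit=None, offset=None):
    lines = [f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items()]
    lines.append("SELECT " + " ".join(f"?{v}" for v in select))
    lines.append("WHERE {")
    lines.extend(compile_node(root))
    lines.append("}")
    if order_by:
        lines.append(f"ORDER BY ?{order_by}")
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    if offset is not None:
        lines.append(f"OFFSET {offset}")
    return "\n".join(lines)


# Visual query: organizations with their CEO's last name and, if present, a fax.
person = Node("ceo", "foaf:Person", attributes=[Attribute("foaf:lastName", "name")])
org = Node(
    "org", "foaf:Organization",
    attributes=[Attribute("dbpedia:fax", "fax", optional=True)],
    edges=[("ex:ceo", person, False)],
)
query = compile_query(
    org, select=["org", "name", "fax"],
    prefixes={"foaf": "http://xmlns.com/foaf/0.1/",
              "ex": "http://example.org/",             # assumed namespace
              "dbpedia": "http://dbpedia.org/property/"},  # assumed namespace
    order_by="name", limit=10,
)
print(query)
```

Because every node only knows its own attributes and outgoing edges, the tree depth of the visual query does not change the compiler; each level is handled by the same function.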
21. DBGroup@UNIMO
We performed 3 different kinds of evaluation to inspect:
• Portability of LODeX to SPARQL endpoints
• SPARQL expressiveness
• Usability of LODeX
– to verify whether the graph visualization of the Schema Summary (SS) clearly represents the structure of a dataset
– to assess whether the visual query panel is a powerful and adequate way of generating SPARQL queries
22. DBGroup@UNIMO
We evaluated the complexity of the graph visualization with a group of 5 students.
• Task: find a node in graphs of increasing size
The test set is composed of 185 datasets taken from Datahub
Result of portability test                  Number of datasets  %
Huge Schema Summary (more than 80 nodes)    40                  21%
Offline endpoints                           7                   4%
Nonstandard response                        28                  15%
Passed the test                             110                 60%
23. DBGroup@UNIMO
24. DBGroup@UNIMO
25. DBGroup@UNIMO
We analyzed what kinds of SPARQL queries LODeX is able to generate
We used as reference the queries contained in the Berlin SPARQL Benchmark (BSBM) [3]
• LODeX is able to generate 6 of the 10 queries contained in BSBM
Not supported:
• UNION queries
• CONSTRUCT queries
• ASK queries
Supported:
• All acyclic JOIN queries
• All FILTER queries
• All ORDER queries
26. DBGroup@UNIMO
We performed an online survey in which 27 users were enrolled
The survey is divided into two parts with different goals:
• Evaluate the clarity of the Schema Summary
• Evaluate the functionality of visual query building
For each part, we designed some tasks and a SUS [4] questionnaire
27. DBGroup@UNIMO
Tasks:
• (T1) Indicate the topic of the dataset
• (T2) Find out the class with the largest number of instances
• (T3) Find out the classes connected to a given class chosen by us
• (T4) Find out the most used attribute of a class chosen by us
Datasets:
• Bio2RDF - INOH - pathway database of model organisms
• Linked Open Aalto Data Service - Open data published by Aalto University
Task   Number (n)  Correct  %
T1     54          48       89%
T2     54          48       89%
T3     27          23       89%
T4     27          27       100%
Total  162         148      91%
28. DBGroup@UNIMO
Tasks:
• (Q1) Return all the different categories of Nobel prizes
• (Q2) Return a table containing the list of Nobel prize winners ordered by the name of the winner; the table has to contain the date of birth of each winner.
• (Q3) Find the award files related to the award of Peter W. Higgs
• (Q4) Find the organizations that won a Nobel prize after 1999
Dataset:
• Nobel Prizes - Linked Open Data about every Nobel Prize
Task   Number (n)  Correct  %
Q1     27          27       100%
Q2     27          26       96%
Q3     27          22       81%
Q4     27          23       85%
Total  108         98       90%
29. DBGroup@UNIMO
We obtained a median SUS score of 85.5
• No remarkable differences between skilled and unskilled users
• This score classifies the usability of LODeX as “Excellent” [5]
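For readers unfamiliar with SUS, the standard scoring of a single questionnaire [4] can be sketched in Python: ten items answered on a 1 to 5 scale, odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name and the sample answers are illustrative only and are not data from the survey.

```python
def sus_score(responses):
    """Score one SUS questionnaire: ten answers, each an integer from 1 to 5."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs exactly ten answers between 1 and 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


# Made-up answers for illustration, not taken from the LODeX survey.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # best possible questionnaire
print(sus_score([3] * 10))                          # all-neutral questionnaire
```

The reported figure is the median of such per-participant scores.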
Feedback
• Unskilled users wrote a SPARQL query for the first time
• “LODeX is cognitively less demanding than writing a SPARQL query”
• Browser rendering differences
• Starting a query can be difficult, and keyword search techniques could be helpful
30. DBGroup@UNIMO
Conclusion
• LODeX is portable to 60% of the datasets tested
– for 19% of the datasets, the failure was induced by endpoint issues
• Both skilled and unskilled users appreciated LODeX
Future work
• Modify the interface of LODeX according to the results of the online survey
• Define clustering and new browsing techniques to reduce the complexity of the Summary for huge datasets
• Extend the group of operators supported by the SPARQL compiler
31. DBGroup@UNIMO
• [1] F. Benedetti, S. Bergamaschi, and L. Po. A visual summary for linked open data sources. International Semantic Web Conference (Posters & Demos), 2014.
• [2] F. Benedetti, S. Bergamaschi, and L. Po. Online index extraction from linked open data sources. Linked Data for Information Extraction (LD4IE) Workshop held at International Semantic Web Conference, 2014.
• [3] C. Bizer and A. Schultz. Benchmarking the performance of storage systems that expose SPARQL endpoints.
• [4] J. Brooke. SUS: a quick and dirty usability scale. Usability evaluation in industry, 189(194):4–7, 1996.
• [5] A. Bangor, P. Kortum, and J. Miller. Determining what individual SUS scores mean: Adding an adjective rating scale.
32. DBGroup@UNIMO
33. DBGroup@UNIMO
Thanks for your attention!
Try LODeX at: http://dbgroup.unimo.it/lodex2