Presented at: JIST2015, Yichang, China
Prototype: http://rc.lodac.nii.ac.jp/rdf4u/
Video: https://www.youtube.com/watch?v=z3roA9-Cp8g
Abstract: The Semantic Web and Linked Open Data (LOD) are powerful technologies for knowledge management, and explicit knowledge is expected to be represented in RDF (Resource Description Framework). Ordinary users, however, remain far from RDF because of the technical skills it requires. Since a concept map or node-link diagram can improve learning for users from beginner to advanced level, RDF graph visualization is a suitable tool for familiarizing users with Semantic Web technology. However, an RDF graph generated from a whole query result is hard to read, because it is highly connected, like a hairball, and poorly organized. To make a knowledge graph easier to read, this research introduces an approach that sparsifies a graph by combining three main functions: graph simplification, triple ranking, and property selection. These functions are mostly based on interpreting RDF data as knowledge units, together with statistical analysis, in order to deliver an easily readable graph to users. A prototype is implemented to demonstrate the suitability and feasibility of the approach. It shows that the simple and flexible graph visualization is easy to read and makes a good impression on users. In addition, the attractive tool helps users appreciate the advantages of linked data in knowledge management.
RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge
1. RDF GRAPH VISUALIZATION
BY INTERPRETING LINKED DATA AS KNOWLEDGE
Rathachai CHAWUTHAI & Prof. Hideaki TAKEDA
National Institute of Informatics, and SOKENDAI
RDF4U
JIST2015 Yichang, China 11-13 Nov 2015
4. THE ROLE OF SEMANTIC WEB IN KNOWLEDGE MANAGEMENT
Data tier
Service tier (SPARQL, JENA, etc.)
Application/Presentation/Visualisation tier
At the Visualisation Tier,
• RDF data are transformed into
charts, geographic maps, etc.,
and then served to users.
It’s cool, but
• Users are far from RDF data, so
they do not understand the power
of the Semantic Web and do not see
how to contribute RDF data.
For this reason,
• It would be good if users could read
RDF data directly using a node-link
diagram or concept-map diagram.
5. READING FROM A QUERY GRAPH
Querying the 2-hop neighbourhood (or more hops) of a given URI
gives wider information on the topic.
[Figure: example query graph around "Caffe Mocha". Nodes include Espresso, Chocolate, Sugar, Milk, Coffee, cocoa, caffeine (430 mg/L), sugarcane, cow, and values such as sweet, bitter, white, black. Links include type, taste, color, contains, a shot of, topped by, has layer of, made from, produces.]
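The k-hop idea above can be sketched in a few lines of Python. This is only an illustration over an in-memory list of (subject, predicate, object) tuples, not the prototype's actual query code; the function name `k_hop_triples`, the choice to follow links in both directions, and the toy coffee triples are all assumptions made for this example.

```python
from collections import defaultdict

def k_hop_triples(triples, start, hops=2):
    """Collect triples within `hops` links of `start`, ignoring link direction."""
    # Index every triple under both of its end nodes.
    by_node = defaultdict(list)
    for triple in triples:
        subject, _predicate, obj = triple
        by_node[subject].append(triple)
        by_node[obj].append(triple)

    seen_nodes = {start}
    seen_triples = set()
    frontier = {start}
    result = []
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for triple in by_node[node]:
                if triple not in seen_triples:
                    seen_triples.add(triple)
                    result.append(triple)
                for end in (triple[0], triple[2]):
                    if end not in seen_nodes:
                        seen_nodes.add(end)
                        next_frontier.add(end)
        frontier = next_frontier
    return result

# Hypothetical toy data loosely inspired by the coffee example on the slide.
TOY_GRAPH = [
    ("CaffeMocha", "type", "Coffee"),
    ("CaffeMocha", "contains", "Espresso"),
    ("CaffeMocha", "contains", "Chocolate"),
    ("Espresso", "taste", "bitter"),
    ("Chocolate", "contains", "cocoa"),
    ("cocoa", "grows_in", "tropics"),  # three hops away from CaffeMocha
]

two_hop = k_hop_triples(TOY_GRAPH, "CaffeMocha", hops=2)
```

With `hops=2` the sketch returns the first five toy triples and leaves out the one that is three hops away. Each extra hop can multiply the number of triples, which is where the readability problem on the next slide comes from.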
6. PROBLEMS
1) A Query Graph is TOO Complicated to Read.
Examples: http://lod.ac/species/Bubo and http://dbpedia.org/resource/Tokyo
7. PROBLEMS
2) Lack of Reading Flow in RDF Data
All triples are treated equally, so Background Content and Main Point
are NOT distinguished in an RDF graph.
8. GOAL
We prefer:
✦ A Simply Readable Graph
✦ A Graph with a Good Reading Flow
[Figure: graphs around a "Topic" node, with regions labelled "Common Information" and "Topic-Specific Information".]
12. GRAPH SIMPLIFICATION
• Some well-prepared RDF repositories have run reasoning over their
ontologies in order to support a SPARQL service.
• One impact is that the inferred triples create giant
components in a graph.
• A closer look at the data indicates that the following
situations are commonly found in any complex RDF graph.
• equivalent or same-as instances (owl:sameAs),
• transitive properties (e.g. skos:broaderTransitive), and
• hierarchical classification (rdf:type & rdfs:subClassOf)
• Thus, this method aims to remove some redundant triples
by using the mechanism of Semantic Web rules.
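As a rough illustration of removing redundant triples, the sketch below handles two of the situations listed above: shortcut links of a transitive property, and rdf:type links to a superclass when a more specific class is already present. It is not the prototype's rule set. The function names, the `ex:` identifiers, and the assumption that the hierarchies contain no cycles are all choices made for this example.

```python
def drop_transitive_shortcuts(triples, pred):
    """Remove (a, pred, c) when c is also reachable from a via a longer pred-path."""
    successors = {}
    for s, p, o in triples:
        if p == pred:
            successors.setdefault(s, set()).add(o)

    def reachable_indirectly(a, c):
        # Start from a's other successors so the direct link a -> c is not used.
        stack = [n for n in successors.get(a, ()) if n != c]
        seen = set(stack) | {a}
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt == c:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return [t for t in triples
            if not (t[1] == pred and reachable_indirectly(t[0], t[2]))]


def superclasses(cls, sub_class_of):
    """All strict ancestors of cls, given direct (child, parent) pairs."""
    parents = {}
    for child, parent in sub_class_of:
        parents.setdefault(child, set()).add(parent)
    found, stack = set(), [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def drop_redundant_types(triples, sub_class_of, type_pred="rdf:type"):
    """Keep only the most specific rdf:type links (assumes an acyclic hierarchy)."""
    types = {}
    for s, p, o in triples:
        if p == type_pred:
            types.setdefault(s, set()).add(o)
    kept = []
    for s, p, o in triples:
        if p == type_pred and any(
            o in superclasses(other, sub_class_of) for other in types[s] - {o}
        ):
            continue  # implied by a more specific type of the same subject
        kept.append((s, p, o))
    return kept


# Hypothetical data: the ex: names are invented for illustration.
DATA = [
    ("ex:a", "skos:broaderTransitive", "ex:b"),
    ("ex:b", "skos:broaderTransitive", "ex:c"),
    ("ex:a", "skos:broaderTransitive", "ex:c"),  # inferred shortcut
    ("ex:a", "rdf:type", "ex:Owl"),
    ("ex:a", "rdf:type", "ex:Bird"),             # implied by ex:Owl
]
SUBCLASSES = [("ex:Owl", "ex:Bird")]

simplified = drop_redundant_types(
    drop_transitive_shortcuts(DATA, "skos:broaderTransitive"), SUBCLASSES
)
```

On this toy input the shortcut `ex:a → ex:c` and the `ex:Bird` type link are dropped, leaving three triples. Handling owl:sameAs would additionally need the equivalent instances merged into one node, which this sketch does not attempt.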
15. TRIPLE RANKING
Since users have different background knowledge of a specific topic,
beginners may be interested in reading common information before getting
topic-specific information, while experts may prefer to read only
topic-specific information.
• Concept Level (resources || properties)
• General Concepts are commonly known terms such as
“name”, “address”, and “class”; they are found throughout a corpus.
• Key Concepts are important terms that appear often in the query
result but rarely in the whole dataset.
• Information Level (triples)
• Common Information explains background knowledge that helps
readers understand the main content. (a lot of general concepts)
• Topic-Specific Information contains specific terms that are highly
relevant to the article. (a lot of key concepts)
16. TRIPLE RANKING
1. Get an RDF graph
2. Identify
• General concepts
• Key concepts
[Figure legend: one marker denotes General Concepts, the other Key Concepts.]
17. TRIPLE RANKING
3. Common Information
Most of the nodes and links
are general concepts
4. Topic-Specific Information
Most of the nodes and links are
key concepts
[Figure legend: one marker denotes General Concepts, the other Key Concepts.]
18. TRIPLE RANKING
Concept Level: Weight of a URI

$$ w(\mathit{uri}) = \frac{f_Q(\mathit{uri})}{\log\left(f_D(\mathit{uri}) + 1\right)} $$

where $f_Q(\mathit{uri})$ is the number of occurrences of a URI in the Query result, and the denominator is a logarithmic scale of the number of occurrences of the URI in the whole Dataset.
high: key concept
low: general concept

Information Level: Visualization-Weight of a Triple

$$ vw(\langle s,p,o \rangle) = \frac{\alpha \cdot w(s) + \beta \cdot w(p) + \gamma \cdot w(o)}{\alpha + \beta + \gamma} $$

The coefficients are 1.0 by default (so the denominator is 3),
but they can be adjusted for specific purposes.
high: topic-specific
low: common info
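The two weights on this slide can be written out directly. The sketch below is a minimal reading of the formulas, assuming a natural logarithm (the slide does not state the base) and assuming the URI counts have already been collected into plain dictionaries. The function names and the sample counts are hypothetical.

```python
import math

def uri_weight(f_q, f_d):
    """w(uri) = fQ(uri) / log(fD(uri) + 1), with a natural log assumed."""
    if f_d < 1:
        # A URI seen in the query result should occur at least once in the dataset;
        # fD = 0 would also make the denominator zero.
        raise ValueError("dataset frequency must be at least 1")
    return f_q / math.log(f_d + 1)

def triple_weight(s, p, o, f_q, f_d, alpha=1.0, beta=1.0, gamma=1.0):
    """vw(<s,p,o>): weighted mean of the URI weights of subject, predicate, object."""
    def w(uri):
        return uri_weight(f_q[uri], f_d[uri])
    return (alpha * w(s) + beta * w(p) + gamma * w(o)) / (alpha + beta + gamma)

# Hypothetical counts: occurrences in the query result and in the whole dataset.
QUERY_COUNTS = {"ex:topic": 20, "rdf:type": 20, "owl:Thing": 5, "ex:RareClass": 5}
DATASET_COUNTS = {"ex:topic": 40, "rdf:type": 1_000_000, "owl:Thing": 500_000, "ex:RareClass": 12}

common = triple_weight("ex:topic", "rdf:type", "owl:Thing", QUERY_COUNTS, DATASET_COUNTS)
specific = triple_weight("ex:topic", "rdf:type", "ex:RareClass", QUERY_COUNTS, DATASET_COUNTS)
```

With these counts the triple pointing at the rare class gets the higher score, matching the slide's reading that a high weight marks topic-specific information and a low weight marks common information. Objects that are literals rather than URIs would need a separate convention, which is not covered here.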
20. TRIPLE RANKING
For example, http://dbpedia.org/resource/Hydrogen
(rows run from Common to Topic-Specific on the Information Level as vw increases)

Subject      Predicate    Object                   vw
dp:Hydrogen  rdf:type     owl:Thing                5.62
dp:Hydrogen  rdf:type     skos:Concept             6.01
dp:Hydrogen  dct:subject  dp:Chemical_elements     7.31
dp:Hydrogen  dct:subject  dp:Airship_technology    7.35
dp:Hydrogen  rdf:type     dp:Diatomic_nonmetals    7.48
21. TRIPLE RANKING
In case of sub-property (also sub-class)
ltk:higherTaxon rdfs:subPropertyOf skos:broader
ltk:mergedInto rdfs:subPropertyOf skos:broader
Raw Data: a ltk:higherTaxon x ; a ltk:mergedInto y
Inferred Data: a skos:broader x ; a skos:broader y
The raw triples are more specific than the inferred skos:broader triples.
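One way to act on this observation is to keep the more specific raw triple and drop the super-property triple that can be inferred from it. The sketch below assumes that reading of the slide. The function name is hypothetical, and only directly declared sub-property pairs are considered, with no transitive closure.

```python
def drop_inferred_superproperty_triples(triples, sub_property_of):
    """Remove (s, P, o) when some (s, p, o) exists with p declared a sub-property of P."""
    supers = {}
    for sub, sup in sub_property_of:
        supers.setdefault(sub, set()).add(sup)
    # Every super-property triple that a more specific triple already implies.
    implied = {
        (s, sup, o)
        for s, p, o in triples
        for sup in supers.get(p, ())
    }
    return [t for t in triples if t not in implied]

# Property names are taken from the slide; the nodes a, x, y are its placeholders.
SUB_PROPERTIES = [
    ("ltk:higherTaxon", "skos:broader"),
    ("ltk:mergedInto", "skos:broader"),
]
GRAPH = [
    ("a", "ltk:higherTaxon", "x"),
    ("a", "ltk:mergedInto", "y"),
    ("a", "skos:broader", "x"),  # inferred
    ("a", "skos:broader", "y"),  # inferred
    ("a", "skos:broader", "z"),  # no more specific counterpart, so it stays
]

specific_only = drop_inferred_superproperty_triples(GRAPH, SUB_PROPERTIES)
```

Note that the sketch cannot tell an inferred skos:broader triple from one that was asserted independently; both are dropped when a more specific triple links the same pair of nodes.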
23. PROTOTYPE
http://rc.lodac.nii.ac.jp/rdf4u/ (bit.ly/rdf4u)
Features
• To simplify a graph by removing some
inferred triples.
• To give ranking scores to triples based on
common and topic-specific information.
• To filter a graph by selecting preferred
properties.
• To control an interactive graph diagram.
Thanks to
Client: D3js, Bootstrap, jQuery
Server: SimpleRDF, SPARQL for PHP
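The ranking and property-selection features can be combined in a small filter. The sketch below reuses the Hydrogen scores shown on the earlier example slide; the function name `select_triples` and the threshold of 7.0 are hypothetical and are not taken from the prototype.

```python
# (subject, predicate, object, vw) rows from the Hydrogen example slide.
HYDROGEN = [
    ("dp:Hydrogen", "rdf:type", "owl:Thing", 5.62),
    ("dp:Hydrogen", "rdf:type", "skos:Concept", 6.01),
    ("dp:Hydrogen", "dct:subject", "dp:Chemical_elements", 7.31),
    ("dp:Hydrogen", "dct:subject", "dp:Airship_technology", 7.35),
    ("dp:Hydrogen", "rdf:type", "dp:Diatomic_nonmetals", 7.48),
]

def select_triples(scored, predicates=None, min_vw=float("-inf")):
    """Keep rows whose predicate is selected and whose vw is at least min_vw."""
    kept = [
        row for row in scored
        if (predicates is None or row[1] in predicates) and row[3] >= min_vw
    ]
    # Order from common (low vw) to topic-specific (high vw).
    return sorted(kept, key=lambda row: row[3])

# Property selection: show only dct:subject links.
subjects_only = select_triples(HYDROGEN, predicates={"dct:subject"})

# An expert view: hide common information below an assumed cut-off of 7.0.
expert_view = select_triples(HYDROGEN, min_vw=7.0)
```

A beginner view would simply lower or remove `min_vw` so that the common rows (owl:Thing, skos:Concept) are shown first.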
24. DISCUSSION
Usefulness
• A diagram is sparser and easier
for humans to read.
• Beginners can read common
information first.
• Experts can read topic-specific
information.
Uniqueness
Some graph visualisation works (Motif,
Gephi, RDF Gravity, Fenfire, and
IsaViz)
• do not use the power of the Semantic
Web to sparsify a graph, and
• do not provide different data
for different user levels.
Novelty
• TF-IDF is adapted for ordering
triples from the common to the
topic-specific level of information.
• The degree of commonness versus
specificity is calculated by
evaluating the nature of the
dataset with the algorithm.
Prospect
• The triple ranking can be extended
by applying various algorithms in
order to satisfy diverse
characteristics of the data in other
domains such as Biodiversity
Informatics.
• Mashup tools should consider this
idea.
25. FUTURE PLAN
• To do critical evaluation
• Survey
• Number of cutting edge
• To find the precise border between
common information and
topic-specific information
• To find a better way to count the
number of URIs
(currently always times out)
• To remove noisy triples
• To improve the triple ranking algorithm
for other domains