+ 
Walking Linked Data: 
a graph traversal approach to explain clusters 
Ilaria Tiddi, Mathieu d’Aquin, Enrico Motta 
+ 
Problem: explaining patterns 
Data: women/men literacy rate from UNESCO [1] 
In which countries are men more educated than women? 
The yellow countries? How do you know? 
[Map legend: Education: Men / Women / Equal]
+ 
Problem: explaining behaviors 
We explain thanks to our own (background) knowledge. 
Can we do the same with the knowledge from Linked Data?
+ 
Linked Data contain explanations, but where? 
[Diagram: the dataset entities :Somalia, :Ethiopia, :India, :UK and :US are each linked by sameAs to db:Somalia, db:Ethiopia, db:India, db:UK and db:US.]
+ 
Linked Data contain explanations, but where? 
[Diagram: :Somalia, :Ethiopia, :India, :UK and :US are linked by sameAs to db:Somalia, db:Ethiopia, db:India, db:UK and db:US; dc:subject edges lead from the db: entities to further nodes, shown as "…".]
+ 
Linked Data contain explanations, but where? 
[Diagram: :Somalia, :Ethiopia, :India, :UK and :US are linked by sameAs to db:Somalia, db:Ethiopia, db:India, db:UK and db:US; dc:subject edges lead to the categories db:Category:LeastDevelopedCountries and db:Category:LiberalCountries; three skos:relatedMatch edges and further nodes ("…") are also shown.]
+ 
Linked Data contain explanations, but where? 
[Diagram: :Somalia, :Ethiopia, :India, :UK and :US are linked by sameAs to db:Somalia, db:Ethiopia, db:India, db:UK and db:US; dbp:gdp edges lead from the db: entities to the values 1,200/pp, 600/pp, 3,800/pp, 36,000/pp and 49,000/pp.]
+ 
Linked Data contain explanations, but where? 
[Diagram: the same entities, sameAs links and dbp:gdp values (1,200/pp, 600/pp, 3,800/pp, 36,000/pp, 49,000/pp), now annotated with comparisons: three values are marked ≤ and two are marked ≥, with 3,800/pp and 36,000/pp shown a second time next to the comparisons.]
+ 
Looking for explanations in graph 
[Diagram: :Somalia, :Ethiopia and :India with sameAs, dc:subject, skos:related and dbp:gdp edges; the node cat:LeastDevelopedCountries; the value 4,000/pp with three ≤ comparisons; other branches elided ("…").] 
• Given a graph of Linked Data where 
• URIs are nodes 
• RDF properties are edges
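
A minimal sketch of this representation in Python, assuming toy triples (the prefixed names and the GDP figure below are illustrative, not data from the talk). Each URI becomes a node, and each RDF property becomes a labelled edge to another URI or to a literal value. The helper name `build_graph` is hypothetical.

```python
from collections import defaultdict

# Toy triples (subject, property, object); all values are illustrative.
TRIPLES = [
    (":Ethiopia", "owl:sameAs", "db:Ethiopia"),
    ("db:Ethiopia", "dc:subject", "cat:LeastDevelopedCountries"),
    (":India", "owl:sameAs", "db:India"),
    ("db:India", "dbp:gdp", 3800),
]


def build_graph(triples):
    """Index triples as node -> property -> list of target nodes or values."""
    graph = defaultdict(lambda: defaultdict(list))
    for subject, prop, obj in triples:
        graph[subject][prop].append(obj)
    return graph


graph = build_graph(TRIPLES)
print(graph["db:Ethiopia"]["dc:subject"])
```

Indexing by node first and property second keeps "follow property p from node n" a constant-time lookup, which is the operation a path-based search repeats most often.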
+ 
Looking for explanations in graph 
[Diagram: :Somalia, :Ethiopia and :India with sameAs, dc:subject, skos:related and dbp:gdp edges; the node cat:LeastDevelopedCountries; the value 4,000/pp with three ≤ comparisons; other branches elided ("…").] 
• Find 
• the ending value pointed to by the most entities in the cluster 
• the best path along which to further expand the graph
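
The first of these goals can be sketched as follows, assuming a toy in-memory graph. The data and the helper names `follow` and `most_pointed_value` are hypothetical: the code follows one path from every entity in the cluster and counts how many entities reach each end value.

```python
from collections import Counter

# Toy graph: node -> property -> list of targets. Hypothetical data.
GRAPH = {
    ":Somalia": {"owl:sameAs": ["db:Somalia"]},
    ":Ethiopia": {"owl:sameAs": ["db:Ethiopia"]},
    ":India": {"owl:sameAs": ["db:India"]},
    "db:Somalia": {"dc:subject": ["cat:LeastDevelopedCountries"]},
    "db:Ethiopia": {"dc:subject": ["cat:LeastDevelopedCountries"]},
    "db:India": {"dc:subject": ["cat:SouthAsianCountries"]},
}


def follow(graph, start, path):
    """Return every value reached from `start` by walking `path` edge by edge."""
    frontier = [start]
    for prop in path:
        frontier = [
            target
            for node in frontier
            for target in graph.get(node, {}).get(prop, [])
        ]
    return frontier


def most_pointed_value(graph, cluster, path):
    """Return (value, number of cluster entities reaching it), or None."""
    counts = Counter()
    for entity in cluster:
        # Use a set so that each entity counts at most once per value.
        counts.update(set(follow(graph, entity, path)))
    return counts.most_common(1)[0] if counts else None


CLUSTER = [":Somalia", ":Ethiopia", ":India"]
PATH = ("owl:sameAs", "dc:subject")
best = most_pointed_value(GRAPH, CLUSTER, PATH)
print(best)
```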
+ 
A* algorithm for Linked Data 
• Best-first search algorithm 
• Given an initial node and a final node, find the least expensive path between them 
• Path cost function: f(path) = g(path) + h(path), where g is the actual cost so far and h is the estimated future cost 
• Without knowledge of the graph 
• Search the graph for the best path and explanation 
• The graph is built iteratively by URI dereferencing 
• No need to know the Linked Data graph a priori
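
For reference, textbook A* can be sketched as below. This is the generic algorithm, not Dedalo's adaptation: it assumes a known goal node, and the `neighbours` callback is only invoked when a node is expanded, which mirrors the idea of discovering the graph on demand. The toy graph, its costs and the zero heuristic are assumptions chosen for illustration.

```python
import heapq
from itertools import count


def a_star(start, goal, neighbours, heuristic):
    """Return (path, cost) minimising f = g + h, or (None, inf) if unreachable.

    neighbours(node) yields (next_node, step_cost) pairs and is called lazily,
    so the graph never has to be known in full beforehand.
    """
    tie = count()  # tie-breaker so the heap never compares nodes or paths
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        for nxt, step_cost in neighbours(node):
            new_g = g + step_cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                f = new_g + heuristic(nxt)
                heapq.heappush(frontier, (f, next(tie), new_g, nxt, path + [nxt]))
    return None, float("inf")


# Toy graph with hypothetical costs.
EDGES = {"A": [("B", 1), ("C", 4)], "B": [("C", 1)], "C": [("D", 1)], "D": []}
path, cost = a_star("A", "D", lambda n: EDGES[n], lambda n: 0)
print(path, cost)
```

With a zero heuristic the search degenerates to uniform-cost search; any admissible estimate of the remaining cost can be passed instead.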
+ 
Dedalo: an A* process for Linked Data 
• Iteratively building a Linked Data graph and looking for an explanation of the pattern 
[Diagram: a cycle of three steps: building the graph (URI dereferencing), choosing the best path, finding the best explanation.]
+ 
Dedalo: an A* process for Linked Data 
Dereference URIs through HTTP GET 
• take an entity (db:Ethiopia) 
• read its properties and values (dc:subject db:Category:AfricanCountries; dbp:gdp 1,200) 
• add them to the graph 
[Diagram: five entity nodes (labelled :India in the extraction) with owl:sameAs edges; db:Ethiopia gains the edges dc:subject to db:Category:AfricanCountries and dbp:gdp to 1,200; other nodes elided ("…").]
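
The three steps above can be sketched as follows. To keep the example self-contained, the HTTP GET is replaced by a lookup in a stand-in dictionary (`FAKE_WEB`); a real implementation would request the URI and parse the RDF it returns. The names `dereference`, `expand` and `FAKE_WEB` are hypothetical.

```python
from collections import defaultdict

# Stand-in for the Web: URI -> list of (property, value) pairs.
# A real implementation would issue an HTTP GET and parse the returned RDF.
FAKE_WEB = {
    "db:Ethiopia": [
        ("dc:subject", "db:Category:AfricanCountries"),
        ("dbp:gdp", 1200),
    ],
}


def dereference(uri, fetch=FAKE_WEB.get):
    """Take an entity and read its properties and values."""
    return fetch(uri) or []


def expand(graph, uri, fetch=FAKE_WEB.get):
    """Add the entity's statements to the graph; return how many were new."""
    added = 0
    for prop, value in dereference(uri, fetch):
        if value not in graph[uri][prop]:
            graph[uri][prop].append(value)
            added += 1
    return added


graph = defaultdict(lambda: defaultdict(list))
new_statements = expand(graph, "db:Ethiopia")
print(new_statements, dict(graph["db:Ethiopia"]))
```

Passing the fetch function as a parameter keeps the graph-building logic independent of the network, so the same code can be exercised offline.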
+ 
Dedalo: an A* process for Linked Data 
Collect new paths (sequences of edges) 
• add the new property to the previous path: owl:sameAs → dc:subject, owl:sameAs → dbp:gdp 
• evaluate new paths with Entropy [1]: ent(owl:sameAs → dc:subject), ent(owl:sameAs → dbp:gdp) 
• add to the pile of paths (the first one is chosen): owl:sameAs → dc:subject, owl:sameAs → dbp:gdp, owl:sameAs 
[1] Tiddi et al., ESWC 2014 
[Diagram: five entity nodes (labelled :India in the extraction) and the nodes reached from them; details elided ("…").]
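
A rough sketch of this step, under loudly stated assumptions: the entropy below is plain Shannon entropy over the end values a path reaches, which is one plausible reading and not necessarily the measure defined in [1]; ordering the pile by ascending entropy is likewise an assumption. The helper names and the toy values are hypothetical.

```python
import math
from collections import Counter


def extend(path, new_property):
    """Add the new property to the previous path (paths are tuples of edges)."""
    return path + (new_property,)


def entropy(end_values):
    """Shannon entropy (bits) of the end values reached by a path.

    Assumption: this stands in for the Entropy measure cited on the slide.
    """
    counts = Counter(end_values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


base = ("owl:sameAs",)
# Hypothetical end values reached from four cluster entities along each path.
candidates = {
    extend(base, "dc:subject"): ["cat:A", "cat:A", "cat:A", "cat:B"],
    extend(base, "dbp:gdp"): [600, 1200, 3800, 36000],
}

# Assumption: lower entropy first, so the first path in the pile is chosen next.
pile = sorted(candidates, key=lambda path: entropy(candidates[path]))
print(pile[0])
```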
+ 
Dedalo: an A* process for Linked Data 
Build explanations (path + final nodes) 
• Each of the values the new path points to 
• e1 = owl:sameAs → dc:subject → db:Category:AfricanCountries 
• e2 = owl:sameAs → dc:subject → db:Category:SouthAsianCountries 
• Compare numerical values if the property is a datatype property 
• e3 = owl:sameAs → dbp:gdp ≥ 1,200 
• e4 = owl:sameAs → dbp:gdp ≤ 1,200 
• Rank explanations according to the F-Measure 
[Diagram: the set of initial URIs (countries) overlapping with the set of URIs pointing to 1,200.]
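
The ranking step can be sketched with the standard F-measure, assuming that precision is the share of covered entities that belong to the cluster and recall is the share of the cluster that is covered. The entity sets, GDP figures and explanation labels below are toy values, not results from the talk.

```python
def f_measure(cluster, covered):
    """Harmonic mean of precision and recall of an explanation.

    cluster: the entities to explain; covered: the entities the explanation applies to.
    """
    cluster, covered = set(cluster), set(covered)
    hits = len(cluster & covered)
    if hits == 0:
        return 0.0
    precision = hits / len(covered)
    recall = hits / len(cluster)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: three countries in the cluster, five known overall.
CLUSTER = {":Somalia", ":Ethiopia", ":India"}
GDP = {":Somalia": 600, ":Ethiopia": 1200, ":India": 3800, ":UK": 36000, ":US": 49000}

explanations = {
    # Object explanation: entities whose path ends in a given category.
    "dc:subject -> cat:A": {":Somalia", ":Ethiopia"},
    # Numerical explanations: compare the datatype value with a threshold.
    "dbp:gdp <= 1200": {e for e, v in GDP.items() if v <= 1200},
    "dbp:gdp >= 1200": {e for e, v in GDP.items() if v >= 1200},
}

ranking = sorted(
    explanations,
    key=lambda name: f_measure(CLUSTER, explanations[name]),
    reverse=True,
)
for name in ranking:
    print(name, round(f_measure(CLUSTER, explanations[name]), 3))
```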
+ 
Dedalo: experiments 
Explanation | Score | Time 
Countries where men are more educated than women 
skos:exactMatch → dbp:hdiRank ≥ 126 | 87.8% | 197" 
skos:exactMatch → dc:subject → db:Category:Least_Developed_Countries | 74.7% | 524" 
skos:exactMatch → dbp:gdpPPPPerCapitaRank ≥ 89 | 68.3% | 269" 
Countries where women are more educated than men 
skos:exactMatch → dbp:hdiRank ≤ 119 | 63.4% | 198" 
skos:exactMatch → dbp:gdpPPPPerCapitaRank ≤ 56 | 62.3% | 236" 
Countries where education is equal 
skos:exactMatch → dbp:gdpPPPRank ≥ 64 | 62.0% | 234" 
skos:exactMatch → dbp:gdpPPPPerCapitaRank ≥ 29 | 61.0% | 268"
+ 
Conclusions and future work 
• Dedalo, an A* process to search for explanations within Linked Data 
• From a pattern to explain 
• Finds the path to the best explanation 
• Using Entropy and F-Measure 
• Focusing on the bias introduced by incomplete data [2] 
• Combining atomic explanations [3] 
• Evaluating Dedalo on a large use case: Google Trends 
[2, 3] Tiddi et al., EKAW 2014
+ 
Thanks! Questions? 

Editor's Notes

  • #11 Explanations might lead to more than one value.
  • #13 If we treat the example as a graph search, the explanation is the node to reach and the entities are the starting points.
  • #14 If we treat the example as a graph search, the explanation is the node to reach and the entities are the starting points.
  • #15 Adapted features.
  • #16 Overview of the process, which is repeated.
  • #19 The next step is to know which path to follow: entropy is a measure that leads to a good explanation.
  • #20 The number of items in the cluster to which the explanation applies, as well as the number of items outside the cluster; the former gives precision and the latter recall.
  • #21 The green cluster shows that the developed countries have equal education.
  • #22 Explain Google Trends based on events that happened at a given time.