Information access over linked data requires determining the subgraph(s), in linked data's underlying graph, that correspond to the required information need. Usually, an information access framework can retrieve richer information by checking a large number of possible subgraphs. However, checking a large number of possible subgraphs increases information access complexity, which makes information access frameworks less effective. Many contemporary linked data information access frameworks reduce the complexity by introducing different heuristics, but they suffer in retrieving richer information; other frameworks do not address the complexity at all. A practically usable framework, however, should retrieve richer information with lower complexity. In linked data information access, we hypothesize that pre-processed data statistics of linked data can be used to efficiently check a large number of possible subgraphs. This helps retrieve comparatively richer information with lower data access complexity. Preliminary evaluation of our proposed hypothesis shows promising performance.
Navigating and Exploring RDF Data using Formal Concept Analysis - Mehwish Alam
In this study we propose a new approach based on Pattern Structures, an extension of Formal Concept Analysis, to provide exploration over Linked Data through concept lattices. It takes RDF triples and RDF Schema based on user requirements and provides one navigation space resulting from several RDF resources. This navigation space provides interactive exploration over RDF data and allows the user to visualize only the part of the data that is interesting to her.
The NPOESS program uses the Unified Modeling Language (UML) to describe the format of the HDF5 files produced. For each unique type of data product, the HDF5 storage organization and the means to retrieve the data are the same. This provides a consistent data retrieval interface for manual and automated users of the data, without which custom development and cumbersome maintenance would be required. The data formats are described using UML to provide a profile of the HDF5 files.
The session focused on Data Mining using the R language, where I analyzed a large volume of text files to find meaningful insights using concepts like DocumentTermMatrix and WordCloud.
Matching and merging anonymous terms from web sources - IJwest
This paper describes a workflow of simplifying and matching special language terms in RDF generated ...
ParlBench: a SPARQL-benchmark for electronic publishing applications - Tatiana Tarasova
Slides from the workshop on Benchmarking RDF Systems co-located with the Extended Semantic Web Conference 2013. The presentation is about ongoing work on building a benchmark for electronic publishing applications. The benchmark provides real-world data sets, the Dutch parliamentary proceedings, and a set of analytical SPARQL queries built on top of these data sets. The queries are grouped into micro-benchmarks according to their analytical aims, which allows one to better analyze RDF store behavior with respect to the particular SPARQL feature used in a micro-benchmark/query.
Preliminary results of running the benchmark on the Virtuoso native RDF store are presented, as well as references to the on-line material including the data sets, queries and the scripts that were used to obtain the results.
Automated building of taxonomies for search engines - Boris Galitsky
We build a taxonomy of entities which is intended to improve the relevance of a search engine in a vertical domain. The taxonomy construction process starts from the seed entities and mines the web for new entities associated with them. To form these new entities, machine learning of syntactic parse trees (their generalization) is applied to the search results for existing entities to form commonalities between them. These commonality expressions then form parameters of existing entities, and are turned into new entities at the next learning iteration.
Taxonomy and paragraph-level syntactic generalization are applied to relevance improvement in search and text similarity assessment. We conduct an evaluation of the search relevance improvement in vertical and horizontal domains and observe a significant contribution of the learned taxonomy in the former and a noticeable contribution of a hybrid system in the latter domain. We also perform an industrial evaluation of taxonomy and syntactic-generalization-based text relevance assessment and conclude that the proposed algorithm for automated taxonomy learning is suitable for integration into industrial systems. The proposed algorithm is implemented as part of the Apache OpenNLP.Similarity project.
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev - Databricks
Learning over images and understanding the quality of content play an important role at Pinterest. This talk will present a Spark based system responsible for detecting near (and far) duplicate images. The system is used to improve the accuracy of recommendations and search results across a number of production surfaces at Pinterest.
At the core of the pipeline is a Spark implementation of batch LSH (locality sensitive hashing) search capable of comparing billions of items on a daily basis. This implementation replaced an older (MR/Solr/OpenCV) system, increasing throughput by 13x and decreasing runtime by 8x. A generalized Spark Batch LSH is now used outside of the image similarity context by a number of consumers. Inverted index compression using variable byte encoding, dictionary encoding, and primitives packing are some examples of what allows this implementation to scale. The second part of this talk will detail training and integration of a Tensorflow neural net with Spark, used in the candidate selection step of the system. By directly leveraging vectorization in a Spark context we can reduce the latency of the predictions and increase the throughput.
Overall, this talk will cover a scalable Spark image processing and prediction pipeline.
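The inverted-index compression techniques named above are standard; as a rough, self-contained sketch of variable byte encoding over a sorted posting list (a generic illustration, not Pinterest's implementation):

```python
def vbyte_encode(numbers):
    """Variable-byte encode a sorted posting list as gap values.

    Each gap is split into 7-bit groups; the high bit marks the last
    byte of a value, so small gaps cost a single byte.
    """
    out = bytearray()
    prev = 0
    for n in numbers:
        gap, prev = n - prev, n
        chunks = []
        while True:
            chunks.append(gap & 0x7F)
            gap >>= 7
            if gap == 0:
                break
        out.extend(chunks[:-1])        # low-order groups, stop bit unset
        out.append(chunks[-1] | 0x80)  # final group carries the stop bit
    return bytes(out)


def vbyte_decode(data):
    """Invert vbyte_encode, rebuilding absolute ids from gaps."""
    numbers, n, shift, prev = [], 0, 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        shift += 7
        if b & 0x80:          # stop bit: the value is complete
            prev += n
            numbers.append(prev)
            n, shift = 0, 0
    return numbers


assert vbyte_decode(vbyte_encode([3, 7, 1000000])) == [3, 7, 1000000]
```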
Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we present a compressed format to represent RDF Data Streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
Topic-based Federator Query Engine - Presented at ICWI Budapest 2018 - Ciro Sorrentino
In a complex and distributed data world, a middleware for source selection that enables a federated SPARQL query engine to transparently execute service-less SPARQL queries in a distributed way.
Text analytics in Python and R with examples from Tobacco Control - Ben Healey
Ben has been doing data sciencey work since 1999 for organisations in the banking, retailing, health and education industries. He is currently on contracts with Pharmac and Aspire2025 (a Tobacco Control research collaboration) where, happily, he gets to use his data-wrangling powers for good.
This presentation focuses on analysing text, with Tobacco Control as the context. Examples include monitoring mentions of NZ's smokefree goal by politicians and examining media uptake of BATNZ's Agree/Disagree PR campaign. It covers common obstacles during data extraction, cleaning and analysis, along with the key Python and R packages you can use to help clear them.
Spatial databases have become more and more popular in recent years, and there is growing commercial and research interest in location-based search over spatial databases. Spatial keyword search has been well studied for years due to its importance to commercial search engines. Specifically, a spatial keyword query takes a user location and user-supplied keywords as arguments and returns objects that are spatially and textually relevant to these arguments. Geo-textual indices play an important role in spatial keyword querying, and a number of geo-textual indices have been proposed in recent years, mainly combining the R-tree and its variants with the inverted file. This paper proposes a new index structure that combines the k-d tree and the inverted file for spatial range keyword queries, which return the objects most spatially and textually relevant to the query point within a given range.
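As a minimal sketch of the combination described here, a k-d tree range lookup on the spatial side intersected with an inverted file on the textual side, under toy data and illustrative names (not the paper's actual index structure):

```python
from collections import defaultdict
from scipy.spatial import cKDTree

# toy objects: an (x, y) location plus a set of keywords
objects = [
    ((1.0, 2.0), {"coffee", "wifi"}),
    ((1.5, 2.2), {"pizza"}),
    ((9.0, 9.0), {"coffee"}),
]

# spatial side: a k-d tree over the object locations
tree = cKDTree([loc for loc, _ in objects])

# textual side: an inverted file mapping keyword -> object ids
inverted = defaultdict(set)
for oid, (_, kws) in enumerate(objects):
    for kw in kws:
        inverted[kw].add(oid)

def range_keyword_query(point, radius, keywords):
    """Ids of objects within `radius` of `point` that carry all keywords."""
    spatial_hits = set(tree.query_ball_point(point, radius))
    textual_hits = set.intersection(*(inverted[kw] for kw in keywords))
    return sorted(spatial_hits & textual_hits)

print(range_keyword_query((1.2, 2.0), 1.0, ["coffee"]))  # -> [0]
```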
Temporal features, such as date and time or the time of an event, carry concise semantics for any kind of information retrieval, and therefore for linked data information retrieval. However, we have found that most linked data information retrieval techniques pay little attention to the power of temporal feature inclusion. We propose a keyword-based linked data information retrieval framework, called TLDRet, that can incorporate temporal features and give more concise results. Preliminary evaluation of our system shows promising performance.
Temporal features, such as date and time or the time of an event, always expose some concise semantics in any kind of information retrieval, and so in linked data information retrieval. On the one hand, we see that contemporary research tries to adapt linked data information retrieval to easy and familiar keyword-based retrieval, to hide the complexity of the data's underlying technologies. On the other hand, we find that, from the linked data information retrieval perspective, most of these efforts overlook the power of temporal feature inclusion. Considering both, this study investigates the importance of temporal feature inclusion in linked data information retrieval. We propose a keyword-based linked data information retrieval framework which can incorporate temporal features and can give more concise results. Our investigation justifies the significance of temporal feature inclusion in linked data retrieval.
Keyword-based linked data information retrieval is an easy choice for general-purpose users, but implementing such an approach is a challenge because mere keywords do not hold semantics. Some studies have incorporated templates in an effort to bridge this gap, but most such approaches have proven ineffective because of inefficient template management. Because linked data can be presented in a structured format, we can assume that the data's internal statistics can be used to effectively influence template management. In this work, we explore the use of this influence for template creation, ranking, and scaling. Then, we demonstrate how our proposal for automatic linked data information retrieval can be used alongside familiar keyword-based information retrieval methods, and can also be incorporated alongside other techniques, such as ontology inclusion and sophisticated matching, to achieve increased levels of performance.
Template-based information access, in which templates are constructed for keywords, is a recent development of linked data information retrieval. However, most such approaches suffer from ineffective template management. Because linked data has a structured data representation, we assume the data's internal statistics can effectively influence template management. In this work, we use this influence for template creation, template ranking, and scaling. Our proposal can effectively be used for automatic linked data information retrieval and can be incorporated with other techniques such as ontology inclusion and sophisticated matching to further improve performance.
Content Words (CWs) are important segments of text. In text mining, we utilize them for various purposes such as topic identification, document summarization, and question answering. Usually, the identification of CWs requires various language-dependent tools. However, such tools are not available for many languages, and developing them for all languages is costly. On the other hand, because of the recent growth of text content in various languages, language-independent text mining carries great potential. To mine text automatically, language-tool-independent CW identification is a requirement. In this research, we devise a framework that identifies text segments as CWs in a language-independent way. We identify some structural features that relate text segments to CWs. We compute the features over a large text corpus and apply machine-learning-based classification that classifies the segments as CWs. The proposed framework uses only a large text corpus and some training examples; apart from these, it does not require any language-specific tool. We conducted experiments with our framework on three different languages, English, Vietnamese, and Indonesian, and found that it works with more than 83% accuracy.
Transient and persistent RDF views over relational databases in the context o... - Nikolaos Konstantinou
As far as digital repositories are concerned, numerous benefits emerge from exposing their contents as Linked Open Data (LOD), which leads more and more repositories in this direction. However, several factors need to be taken into account in doing so, among which is whether the transition needs to be materialized in real time or at asynchronous time intervals. In this paper we provide the problem framework in the context of digital repositories, we discuss the benefits and drawbacks of both approaches, and we draw our conclusions after evaluating a set of performance measurements. Overall, we argue that in contexts with infrequent data updates, as is the case with digital repositories, persistent RDF views are more efficient than real-time SPARQL-to-SQL rewriting systems in terms of query response times, especially when expensive SQL queries are involved.
The ultimate goal of a recommender system is to suggest interesting and not obvious items (e.g., products to buy, people to connect with, movies to watch, etc.) to users, based on their preferences.
The advent of the Linked Open Data (LOD) initiative in the Semantic Web gave birth to a variety of open knowledge bases freely accessible on the Web. They provide a valuable source of information that can improve conventional recommender systems, if properly exploited.
Here I present several approaches to recommender systems that leverage Linked Data knowledge bases such as DBpedia. In particular, content-based and hybrid recommendation algorithms will be discussed.
For full details about the presented approaches please refer to the full papers mentioned in this presentation.
Enabling Exploratory Analysis of Large Data with Apache Spark and R - Databricks
R has evolved to become an ideal environment for exploratory data analysis. The language is highly flexible - there is an R package for almost any algorithm and the environment comes with integrated help and visualization. SparkR brings distributed computing and the ability to handle very large data to this list. SparkR is an R package distributed within Apache Spark. It exposes Spark DataFrames, which was inspired by R data.frames, to R. With Spark DataFrames, and Spark’s in-memory computing engine, R users can interactively analyze and explore terabyte size data sets.
In this webinar, Hossein will introduce SparkR and how it integrates the two worlds of Spark and R. He will demonstrate one of the most important use cases of SparkR: the exploratory analysis of very large data. Specifically, he will show how Spark’s features and capabilities, such as caching distributed data and integrated SQL execution, complement R’s great tools such as visualization and diverse packages in a real world data analysis project with big data.
I summarize requirements for an "Open Analytics Environment" (aka "the Cauldron"), and some work being performed at the University of Chicago and Argonne National Laboratory towards its realization.
Over the last years, the Semantic Web has been growing steadily. Today, we count more than 10,000 datasets made available online following Semantic Web standards. Nevertheless, many applications, such as data integration, search, and interlinking, may not take full advantage of the data without a priori statistical information about its internal structure and coverage. In fact, there are already a number of tools which offer such statistics, providing basic information about RDF datasets and vocabularies. However, those usually show severe deficiencies in terms of performance once the dataset size grows beyond the capabilities of a single machine. In this paper, we introduce a software component for statistical calculations of large RDF datasets, which scales out to clusters of machines. More specifically, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark. The preliminary results show that our distributed approach improves upon a previous centralized approach we compare against and provides approximately linear horizontal scale-up. The criteria are extensible beyond the 32 defaults, and the component is integrated into the larger SANSA framework and employed in at least four major usage scenarios beyond the SANSA community.
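As a toy illustration of this kind of distributed criterion computation (a PySpark sketch over N-Triples lines, with a naive parser and a hypothetical input path; not the SANSA implementation):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdf-stats-sketch").getOrCreate()

# parse N-Triples lines into (s, p, o); a naive split, enough for a sketch
triples = (spark.sparkContext.textFile("hdfs:///data/dataset.nt")  # hypothetical path
           .map(lambda line: line.rstrip(" .\n").split(" ", 2))
           .filter(lambda t: len(t) == 3))

# two example criteria: number of distinct subjects, and property usage counts
distinct_subjects = triples.map(lambda t: t[0]).distinct().count()
property_usage = triples.map(lambda t: (t[1], 1)).reduceByKey(lambda a, b: a + b)

print(distinct_subjects)
print(property_usage.take(5))
```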
• What is Machine Learning?
• Overview to Machine Learning Algorithms
• Introduction to SparkR
• Installation of SparkR
• Getting Data with SparkR
• SQL queries in SparkR
Knowledge Discovery tools using Linked Data techniques - Presentation for the Linked Data 4 Knowledge Discovery Workshop at the ECML/PKDD 2015 conference - http://events.kmi.open.ac.uk/ld4kd2015/ -
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016 - MLconf
Comparing TensorFlow NLP Options: word2Vec, gloVe, RNN/LSTM, SyntaxNet, and Penn Treebank: Through code samples and demos, we’ll compare the architectures and algorithms of the various TensorFlow NLP options. We’ll explore both feed-forward and recurrent neural networks such as word2vec, gloVe, RNN/LSTM, SyntaxNet, and Penn Treebank using the latest TensorFlow libraries.
inteSearch: An Intelligent Linked Data Information Access Framework
1. inteSearch: An Intelligent Linked Data Information Access Framework
Md-Mizanur Rahoman, Ryutaro Ichise
November 11, 2014
2. Outline
Introduction
Background of Linked Data Information Access
Problem and Probable Solution
Proposed Retrieval Framework: inteSearch
Pre-processing of Linked Data
Framework Details
Experiment
Conclusion
3. Linked Data (LD)
are structured data
represent knowledge with tuples like <<Subject, Predicate, Object>>, which are called RDF triples
can be represented by a graph
can use SQL-like expressive queries
are stored, openly available, in 2,122 datasets with 61 billion RDF triples (as of Apr. 2014)
[Figure: an exemplary LD graph. A schema/ontology layer defines the classes :Country and :Person (labels "Country", "Person") and the properties :birthPlace, :supervisor, :spouse (labels "Birth Place", "Supervisor", "Spouse") with their domains and ranges. An instance layer contains the persons :amnd "Amanda", :barl "Berlusconi", :clra "Cleyra", :dnld "Donald" and the countries :grmn "Germany", :uk "United Kingdom", :grce "Greece", connected by :birthPlace, :supervisor, and :spouse edges.]
4. Sub-graph finding over the LD graph imposes substantial execution cost if the graph size gets bigger.
Know-how of (dataset-specific) vocabulary, schema, and LD query (i.e., linked data semantics) demands domain-level expertise; an automated tool is expected to understand linked data semantics.
[Figure: the same exemplary LD graph, now highlighting the answer sub-graph: :dnld (label "Donald") :birthPlace :grce (label "Greece"), together with the :spouse edge attached to :dnld.]
7. Contemporary LD Information Access Systems
Language-Tool-Based-Systems (PowerAqua'06, TBSL'12, FREyA'11, SemSek'12, CASIA'13, etc.)
use language tools (e.g., parser, POS tagger, etc.) to predict possible sub-graphs (over the LD graph)
convert the sub-graphs to find the SPARQL query
Pivot-Point-Based-Systems (Treo'11, NLP-Reduce'07, etc.)
pick a query word (i.e., pivot point), then try to pick other query words w.r.t. the pivot point and predict a possible sub-graph (over the LD graph)
convert the sub-graph to find the SPARQL query
10. Language-Tool-Based-Systems - Problem
generate many improper parsed trees - different parsers give different parsed trees, with different parsing tags
tag for improper semantics (e.g., mis-tagging of query words, such as whether the query word spouse should be tagged as Object or Predicate)
generate empty or improper results - by choosing an incorrect sub-graph
11. Pivot-Point-Based-Systems - Problem
depend heavily upon picking the correct pivot point - in most cases, systems pick NE (named entity) related pivot points first, then other pivot points
impose huge cost if the pivot point needs to change - one pivot point can have multiple LD resources
miss contextual information attachment - e.g., random choosing of pivot points could generate very different results
13. Problem Statement & Probable Solution
Problem Statement
For LD information access, how can we find the required sub-graph (over the LD graph) within minimum execution cost, such that it
will not generate an empty result
will not miss the contextual information of the query
Solution
To find the correct sub-graph - check the maximum possible sub-graph generation possibilities
To achieve minimum execution cost - prepare pre-processed LD statistics which give insight into sub-graph generation possibilities
To not lose the contextual information of the query - adapt a sub-graph joining technique called the Progressive Joining Approach (Rahoman & Ichise '14)
16. inteSearch - Overview
Pre-processed data statistics
store LD resources in a way such that they can be picked easily
store patterns of LD resources so that they can give insight about possible sub-graphs
Development of framework
generate a single-query-word-based graph (called a Basic Graph)
merge all Basic Graphs to predict all possible sub-graphs (called Keyword Graphs)
rank all possible Keyword Graphs using pre-processed data statistics
generate a SPARQL query for the best-ranked Keyword Graphs
17. Pre-processed Data Statistics
Label Extractor - extracts and stores the labels of LD resources:
lv(r) = { o | ∃ <r, p, o> ∈ RDF triples of dataset ∧ p ∈ rrp }, where rrp is the set of resource-representing predicates (e.g., label, title, etc.)
Pattern-wise Resource Frequency Generator - computes and stores LD resource pattern frequencies:
sf(r) = |{ <r, p, o> | <r, p, o> ∈ RDF triples of dataset }|
pf(r) = |{ <s, r, o> | <s, r, o> ∈ RDF triples of dataset }|
of(r) = |{ <s, p, r> | <s, p, r> ∈ RDF triples of dataset }|
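A minimal sketch of how these statistics could be pre-computed from a triple collection (assuming triples arrive as (s, p, o) tuples and an assumed set rrp of resource-representing predicates; an illustration, not the authors' implementation):

```python
from collections import Counter, defaultdict

# assumed set of resource-representing predicates (rrp), e.g. rdfs:label, dc:title
RRP = {"rdfs:label", "dc:title"}

def preprocess(triples):
    """Compute lv(r), sf(r), pf(r), of(r) for every resource r."""
    lv = defaultdict(set)                 # lv(r): labels of r
    sf, pf, of = Counter(), Counter(), Counter()
    for s, p, o in triples:
        if p in RRP:
            lv[s].add(o)                  # lv(r) = { o | <r,p,o>, p in rrp }
        sf[s] += 1                        # sf(r) = |{ <r,p,o> }|
        pf[p] += 1                        # pf(r) = |{ <s,r,o> }|
        of[o] += 1                        # of(r) = |{ <s,p,r> }|
    return lv, sf, pf, of

triples = [
    (":Country", "rdfs:label", "Country"),
    (":Country", "rdf:type", "rdfs:Class"),
    (":grmn", "rdf:type", ":Country"),
]
lv, sf, pf, of = preprocess(triples)
print(lv[":Country"], sf[":Country"])     # {'Country'} 2
```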
18. Example of Pre-processed Data Statistics
[Figure: the exemplary LD graph again, highlighting :Country with label "Country" and type Class.]
Pre-processed data statistics:

r        | lv(r)   | sf(r) | pf(r) | of(r)
:Country | Country | 2     | ...   | ...
...      | ...     | ...   | ...   | ...
19. Development of Framework
Basic Graph Generator - generate the Basic Graphs
Keyword Graph Generator - merge all Basic Graphs to predict the Keyword Graphs
Ranker - rank all possible Keyword Graphs using pre-processed data statistics
SPARQL Query Generator - generate a SPARQL query for the best-ranked Keyword Graphs
20. Development of Framework [framework diagram]
21. Basic Graph Generator
Choose one of the three Basic Graphs for each query word k:
<k, ?p, ?o>, or <?s, k, ?o>, or <?s, ?p, k>
decided by the (particular) similar LD resources (toward the query word) and their pattern frequencies
e.g., if, for the similar LD resources {R}, the Predicate Pattern-wise Resource Frequency of an LD resource (e.g., pf(ri)) is bigger than all Subject and Object Pattern-wise Resource Frequencies, then we select the Basic Graph <?s, k, ?o>
weight computed from the highest pattern frequencies of the LD resources {R}
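The selection rule can be sketched as follows, reusing the sf/pf/of counters from the pre-processing step (a hypothetical helper, not the authors' code):

```python
def choose_basic_graph(resources, sf, pf, of):
    """Pick a Basic Graph shape for the LD resources similar to a query word.

    The dominating pattern-wise frequency decides where the keyword k is
    placed: subject <k, ?p, ?o>, predicate <?s, k, ?o>, or object
    <?s, ?p, k>; the winning frequency becomes the graph's weight.
    """
    best_sf = max(sf[r] for r in resources)
    best_pf = max(pf[r] for r in resources)
    best_of = max(of[r] for r in resources)
    weight = max(best_sf, best_pf, best_of)
    if weight == best_pf:
        shape = ("?s", "k", "?o")    # keyword acts as predicate
    elif weight == best_sf:
        shape = ("k", "?p", "?o")    # keyword acts as subject
    else:
        shape = ("?s", "?p", "k")    # keyword acts as object
    return shape, weight
```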
22. Development of Framework [framework diagram]
23. Keyword Graph Generator
Merge all Basic Graphs in all their possible merging options by following the Progressive Joining Approach.
[Diagram: merging the 1st and 2nd Basic Graphs at all possible options.]
Progressive Joining Approach - if the query words have the order {k1, k2, k3, ..., km}, then
join the Basic Graph of k1 and the Basic Graph of k2 and find an Intermediate-version Keyword Graph, then
progressively join the Basic Graph of the next query word and update the Intermediate-version Keyword Graph, while query words remain
The Progressive Joining Approach maintains contextual information attachment.
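A simplified sketch of the Progressive Joining Approach, treating a graph as a set of triple patterns and joining by unifying one variable of the intermediate graph with one variable of the next Basic Graph (this illustrates the joining order only, not the full contextual-feasibility check):

```python
def join_options(intermediate, basic):
    """Enumerate merges of `basic` into `intermediate`.

    Graphs are sets of (s, p, o) patterns; variables start with '?'.
    Each option renames one variable of `basic` to one variable of
    `intermediate`, attaching the new Basic Graph to the graph built so far.
    """
    inter_vars = {t for pat in intermediate for t in pat if t.startswith("?")}
    basic_vars = {t for pat in basic for t in pat if t.startswith("?")}
    options = []
    for bv in basic_vars:
        for iv in inter_vars:
            renamed = {tuple(iv if t == bv else t for t in pat) for pat in basic}
            options.append(frozenset(intermediate | renamed))
    return options

def progressive_join(basic_graphs):
    """Join Basic Graphs in query-word order, keeping all intermediate options."""
    candidates = [frozenset(basic_graphs[0])]
    for bg in basic_graphs[1:]:
        candidates = [kg for c in candidates for kg in join_options(c, set(bg))]
    return candidates

# two query words: k1 placed as predicate, k2 placed as subject
bg1 = {("?s1", "k1", "?o1")}
bg2 = {("k2", "?p2", "?o2")}
for kg in progressive_join([bg1, bg2]):
    print(sorted(kg))
```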
25. Progressive Joining Approach - an Example
[Diagram: an Intermediate-version Keyword Graph built over k1 and k2 and the Basic Graph corresponding to the next query word k3 are joined at all positions. A table enumerates all contextually feasible Keyword Graphs, listing for each option the Intermediate-version KG, the next BG, the joining between the last-joined BG and the next BG, and the resulting increase of the KG.]
26. Development of Framework [framework diagram]
27. Ranker
Rank Keyword Graphs by:
Weight - the minimum weight of the constituent Basic Graphs
Depth level - how many edges a Keyword Graph holds
Keyword Graphs with a lower depth level are ranked higher than Keyword Graphs with a higher depth level.
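The ranking rule can be sketched as follows (the (patterns, basic_weights) representation of a Keyword Graph is hypothetical, not the authors' data structure):

```python
def rank_keyword_graphs(keyword_graphs):
    """Order Keyword Graphs: lower depth level first, then higher weight.

    Each Keyword Graph is assumed to be a (patterns, basic_weights) pair:
    its triple patterns and the weights of its constituent Basic Graphs.
    """
    def key(kg):
        patterns, basic_weights = kg
        depth = len(patterns)        # depth level = number of edges
        weight = min(basic_weights)  # graph weight = min constituent weight
        return (depth, -weight)      # fewer edges first, then higher weight
    return sorted(keyword_graphs, key=key)
```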
28. Development of Framework [framework diagram]
29. SPARQL Query Generator
Construct SPARQL queries
for the higher-ranked Keyword Graphs, until the first non-empty result is obtained
directly converted by
putting the variables in the SELECT clause
merging the keyword-corresponding resources in a UNION clause
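A sketch of this direct conversion for one Keyword Graph whose keywords may match several LD resources (a simplification of the UNION construction, with illustrative resource names):

```python
def to_sparql(patterns, keyword_resources):
    """Build a SELECT query from triple patterns.

    `patterns` holds (s, p, o) terms; variables start with '?'.
    `keyword_resources` maps a keyword placeholder (e.g. 'k1') to the
    candidate LD resources it matched; alternatives become a UNION.
    """
    variables = sorted({t for pat in patterns for t in pat if t.startswith("?")})
    blocks = []
    for s, p, o in patterns:
        keyword = next((t for t in (s, p, o) if t in keyword_resources), None)
        if keyword is None:
            blocks.append("  %s %s %s ." % (s, p, o))
        else:
            alts = ["{ %s %s %s . }" % tuple(r if t == keyword else t for t in (s, p, o))
                    for r in keyword_resources[keyword]]
            blocks.append("  " + " UNION ".join(alts))
    return "SELECT %s WHERE {\n%s\n}" % (" ".join(variables), "\n".join(blocks))

print(to_sparql([("?s", "k1", "?o")], {"k1": ["dbo:birthPlace", "dbp:birthPlace"]}))
```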
31. Experimental Setup
Question setup
Questions: Question Answering over Linked Data test question set 3 (QALD-3), consisting of natural language questions

Dataset | Total Qs | QALD-3
DBpedia | 99       | 99

Keywords: constructed manually w.r.t. the word order of the question words
Evaluation metrics
Recall, Precision, F1-Measure
Evaluated for
detailed performance analysis, execution complexity measurement, and comparison with other systems
32. Detailed Performance Analysis
Analyzed by the number of keywords each question holds
Group               | No of Qs | Recall (Avg) | Precision (Avg) | F1-Measure (Avg)
One Keyword Group   | 1        | 1.00         | 1.00            | 1.00
Two Keyword Group   | 45       | 0.90         | 0.96            | 0.92
Three Keyword Group | 13       | 0.77         | 0.77            | 0.77
Four Keyword Group  | 8        | 0.75         | 0.75            | 0.75
Five Keyword Group  | 3        | 1.00         | 1.00            | 1.00
Overall             |          | 0.87         | 0.90            | 0.88
Observation
according to the One/Two/Three Keyword Group questions, the selection of Basic Graphs works well
according to the more-than-one Keyword Group questions, the merging-based Keyword Graph construction and ranking work well
pre-processed data statistics help in efficient sub-graph finding over the linked data graph
34. Execution-Time-Wise Performance Analysis
Environment
Machine: Intel Core i7-4770K CPU, 3.50 GHz, with 16 GB memory
Triple Store: network-connected Virtuoso (version 06.01.3127)

One Keyword Group | Two Keyword Group | Three Keyword Group | Four Keyword Group | Five Keyword Group
710 (ms)          | 2441 (ms)         | 2774 (ms)           | 3585 (ms)          | 3720 (ms)
Observation
execution cost increases linearly with the number of keywords
pre-processed data statistics support faster execution
38. Conclusion - on finding the proper sub-graph over the LD graph:
We contributed by devising an LD IA framework that
does not generate an empty result
maintains contextual information attachment
retrieves rich information with low execution cost
The single-query-word-based Basic Graph can be extended to multiple query words, which can further increase efficiency.