inteSearch: An Intelligent Linked Data Information Access Framework

Information access over linked data requires determining the subgraph(s), in the linked data's underlying graph, that correspond to the required information need. Usually, an information access framework can retrieve richer information by checking a large number of possible subgraphs. However, checking a large number of possible subgraphs increases information access complexity, which makes information access frameworks less effective. Many contemporary linked data information access frameworks reduce the complexity by introducing different heuristics, but they suffer in retrieving richer information; other frameworks do not address the complexity at all. A practically usable framework, however, should retrieve richer information with lower complexity. For linked data information access, we hypothesize that pre-processed data statistics of linked data can be used to efficiently check a large number of possible subgraphs. This will help to retrieve comparatively richer information with lower data access complexity. Preliminary evaluation of our proposed hypothesis shows promising performance.

Published in: Engineering

  1. 1. inteSearch: An Intelligent Linked Data Information Access Framework. Md-Mizanur Rahoman, Ryutaro Ichise. November 11, 2014.
  2. 2. Outline. Introduction: Background of Linked Data Information Access; Problem and Probable Solution. Proposed Retrieval Framework: inteSearch: Pre-processing of Linked Data; Framework Details. Experiment. Conclusion.
  3. 3. Linked Data (LD): are structured data; represent knowledge with tuples of the form <Subject, Predicate, Object>, which are called RDF triples; can be represented as a graph; can be queried with an SQL-like expressive query language; store, as openly available data, 2122 datasets and 61 billion RDF triples (as of Apr. 2014). [Figure: example LD graph. Schema/Ontology layer: classes :Person (Person) and :Country (Country); properties :birthPlace (Birth Place), :supervisor (Supervisor), and :spouse (Spouse), with label, type, domain, and range edges. Instances layer: :amnd (Amanda), :barl (Berlusconi), :clra (Cleyra), :dnld (Donald), :grmn (Germany), :uk (United Kingdom), and :grce (Greece), connected by :spouse, :supervisor, and :birthPlace edges.]
  4. 4. Information Access over LD. It requires sub-graph finding over the LD graph, which imposes substantial execution cost as the graph gets bigger. It requires know-how of the (dataset-specific) vocabulary, schema, and LD query language (i.e., linked data semantics), which demands domain-level expertise, so an automated tool is expected to understand linked data semantics. [Figure: the example LD graph from the previous slide, together with an example sub-graph involving :spouse, :dnld (Donald), :birthPlace, and :grce (Greece).]
  7. 7. Contemporary LD Information Access Systems. Language-Tool-Based-Systems (PowerAqua'06, TBSL'12, FREyA'11, SemSek'12, CASIA'13, etc.): use language tools (e.g., parser, POS tagger) to predict possible sub-graphs (over the LD graph), then convert the sub-graphs to find the SPARQL query. Pivot-Point-Based-Systems (Treo'11, NLP-Reduce'07, etc.): pick a query word (i.e., the pivot point), then try to pick the other query words w.r.t. the pivot point and predict a possible sub-graph (over the LD graph), then convert the sub-graph to find the SPARQL query.
  10. 10. Language-Tool-Based-Systems Problem. They generate many improper parse trees: different parsers give different parse trees, with different parsing tags. They tag for improper semantics (e.g., mis-tagging of query words, such as whether the query word "spouse" should be tagged as Object or Predicate). They generate empty or improper results by choosing an incorrect sub-graph.
  11. 11. Pivot-Point-Based-Systems Problem. They depend heavily upon picking the correct pivot point: in most cases, systems pick named-entity (NE) related pivot points first, then other pivot points. They impose a huge cost if the pivot point needs to change, since one pivot point can have multiple LD resources. They miss contextual information attachment, e.g., random choice of pivot points could generate very different results.
  13. 13. Problem Statement and Probable Solution. Problem statement: for LD information access, how can we find the required sub-graph (over the LD graph) within minimum execution cost, such that it will not generate an empty result and will not miss the contextual information of the query? Solution: to find the correct sub-graph, check the maximum number of possible sub-graph generation possibilities; to achieve minimum execution cost, prepare pre-processed LD statistics that give insight into sub-graph generation possibilities; to not lose the contextual information of the query, adapt a sub-graph joining technique called the Progressive Joining Approach (Rahoman and Ichise'14).
  16. 16. inteSearch - Overview. Pre-processed data statistics: store LD resources in a way that lets them be picked easily; store patterns of LD resources so that they give insight into possible sub-graphs. Development of framework: generate a graph for each single query word (called a Basic Graph); merge all Basic Graphs to predict all possible sub-graphs (called Keyword Graphs); rank all possible Keyword Graphs using the pre-processed data statistics; generate SPARQL queries for the best-ranked Keyword Graphs.
  17. 17. Pre-processed data statistics. Label Extractor: extracts and stores the labels of an LD resource, $lv(r) = \{\, o \mid \exists \langle r, p, o \rangle \in \text{RDF triples of dataset} \wedge p \in rrp \,\}$, where $rrp$ is the set of resource-representing predicates, e.g., label, title, etc. Pattern-wise Resource Frequency Generator: computes and stores LD resource pattern frequencies, $sf(r) = |\{\, \langle r, p, o \rangle \mid \exists \langle r, p, o \rangle \in \text{RDF triples of dataset} \,\}|$, $pf(r) = |\{\, \langle s, r, o \rangle \mid \exists \langle s, r, o \rangle \in \text{RDF triples of dataset} \,\}|$, and $of(r) = |\{\, \langle s, p, r \rangle \mid \exists \langle s, p, r \rangle \in \text{RDF triples of dataset} \,\}|$.
  18. 18. Example of Pre-processed Data Statistics. [Figure: exemplary LD graph (the same Schema/Ontology and Instances graph as on slide 3), with the excerpt ":Country label Country" and ":Country type Class" shown separately.] Pre-processed data statistics:
      r        | lv(r)   | sf(r) | pf(r) | of(r)
      :Country | Country | 2     | ...   | ...
      ...      | ...     | ...   | ...   | ...
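The label values and pattern-wise frequencies defined on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `build_statistics`, the in-memory triple list, and the predicate set `RRP` are assumptions, and only the first two triples come from the slide's excerpt (the others are made up for the example). It also assumes the triple list contains no duplicates, so that counting list entries matches the set cardinalities in the definitions.

```python
from collections import Counter, defaultdict

# Toy dataset. The two :Country triples follow the slide's excerpt;
# the remaining triples are illustrative assumptions.
TRIPLES = [
    (":Country", "label", "Country"),
    (":Country", "type", "Class"),
    (":birthPlace", "range", ":Country"),  # assumed
    (":dnld", ":birthPlace", ":grce"),     # assumed
    (":dnld", "label", "Donald"),          # assumed
]

# Resource-representing predicates (rrp), e.g. label and title.
RRP = {"label", "title"}


def build_statistics(triples, rrp=RRP):
    """Return (lv, sf, pf, of) for a list of distinct (s, p, o) triples."""
    lv = defaultdict(set)                  # lv(r): label values of r
    sf, pf, of = Counter(), Counter(), Counter()
    for s, p, o in triples:
        sf[s] += 1                         # r occurs as Subject
        pf[p] += 1                         # r occurs as Predicate
        of[o] += 1                         # r occurs as Object
        if p in rrp:
            lv[s].add(o)
    return lv, sf, pf, of


lv, sf, pf, of = build_statistics(TRIPLES)
print(lv[":Country"], sf[":Country"], pf[":Country"], of[":Country"])
```

With this toy data, `sf(":Country")` is 2, which agrees with the table row on the slide; the other counts depend on the assumed triples.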
  19. 19. Development of Framework. Basic Graph Generator: generates the Basic Graphs. Keyword Graph Generator: merges all Basic Graphs to predict the Keyword Graphs. Ranker: ranks all possible Keyword Graphs using the pre-processed data statistics. SPARQL Query Generator: generates SPARQL queries for the best-ranked Keyword Graphs.
  20. 20. Development of Framework
  21. 21. Basic Graph Generator. Choose one of the three Basic Graphs for each query word k: <k, ?p, ?o>, <?s, k, ?o>, or <?s, ?p, k>. The choice is decided by the (particular) LD resources similar to the query word and their pattern frequencies; e.g., given the (particular) similar LD resources {R}, if the Predicate Pattern-wise Resource Frequency of an LD resource (e.g., pf(ri)) is bigger than all Subject and Object Pattern-wise Resource Frequencies, then we select the Basic Graph <?s, k, ?o>. The weight of the Basic Graph is computed from the highest pattern frequencies of the LD resources {R}.
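The selection rule on this slide can be sketched as follows. This is a hedged illustration under stated assumptions: the function name `choose_basic_graph`, the tuple layout of `candidates`, and the frequency numbers in the usage line are hypothetical, and using the single highest frequency as the weight is one reading of "weight computed by highest pattern frequencies". Ties are broken in the order subject, predicate, object, which is also an assumption.

```python
# The three Basic Graph templates for a query word k.
TEMPLATES = {
    "subject": ("k", "?p", "?o"),
    "predicate": ("?s", "k", "?o"),
    "object": ("?s", "?p", "k"),
}


def choose_basic_graph(keyword, candidates):
    """Pick a Basic Graph for `keyword`.

    candidates: list of (resource, sf, pf, of) tuples for the LD
    resources judged similar to the keyword (assumed to be given).
    Returns (triple_pattern, weight).
    """
    best_position, weight = None, -1
    for _resource, sf, pf, of in candidates:
        for position, freq in (("subject", sf), ("predicate", pf), ("object", of)):
            if freq > weight:              # strict: first maximum wins
                best_position, weight = position, freq
    if best_position is None:
        raise ValueError("no similar LD resources for keyword")
    pattern = tuple(keyword if term == "k" else term
                    for term in TEMPLATES[best_position])
    return pattern, weight


# Hypothetical frequencies: the predicate frequency dominates.
print(choose_basic_graph("spouse", [(":spouse", 2, 7, 0)]))
```

In the usage line the predicate frequency is the largest, so the keyword is placed in the predicate position, matching the slide's example.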
  22. 22. Development of Framework
  23. 23. Keyword Graph Generator. Merge all Basic Graphs in all of their possible merging options by following the Progressive Joining Approach, e.g., merging the 1st and 2nd Basic Graphs at all possible options. [Figure: the 1st Basic Graph (for k1), the 2nd Basic Graph (for k2), and the graphs obtained by merging them at each possible option.] Progressive Joining Approach: if the query words have the order {k1, k2, k3, ..., km}, join the Basic Graph of k1 and the Basic Graph of k2 to find an Intermediate-version Keyword Graph, then progressively join the next Basic Graph for each remaining query word and update the Intermediate-version Keyword Graph, until no query word remains. The Progressive Joining Approach maintains contextual information attachment.
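The control flow of the Progressive Joining Approach, as described above, amounts to a left-to-right fold over the Basic Graphs that keeps every merging option as a candidate. The sketch below illustrates only that flow. The function names `progressive_join` and `toy_join_options` are hypothetical, and the joining rule in `toy_join_options` (attach the next Basic Graph by replacing one of its variable end nodes with an end node of the last joined Basic Graph) is an assumption made for illustration; the slides do not spell out the actual merging options in text.

```python
def toy_join_options(last_bg, next_bg):
    """Hypothetical joining rule, not taken from the slides.

    Each Basic Graph is a (subject, predicate, object) tuple whose
    variables start with '?'. A variable end node of next_bg is
    replaced by an end node of last_bg.
    """
    subject, predicate, obj = next_bg
    options = []
    for node in (last_bg[0], last_bg[2]):
        if subject.startswith("?"):
            options.append((node, predicate, obj))
        if obj.startswith("?"):
            options.append((subject, predicate, node))
    return options


def progressive_join(basic_graphs, join_options):
    """Join Basic Graphs in query-word order, keeping all options."""
    if not basic_graphs:
        return []
    # Each candidate is (intermediate keyword graph, last joined BG).
    candidates = [([basic_graphs[0]], basic_graphs[0])]
    for next_bg in basic_graphs[1:]:
        expanded = []
        for graph, last_bg in candidates:
            for joined in join_options(last_bg, next_bg):
                expanded.append((graph + [joined], joined))
        candidates = expanded
    return [graph for graph, _ in candidates]


BG1 = ("?s1", "k1", "?o1")   # k1 in predicate position
BG2 = ("?s2", "?p2", "k2")   # k2 in object position
BG3 = ("k3", "?p3", "?o3")   # k3 in subject position
keyword_graphs = progressive_join([BG1, BG2, BG3], toy_join_options)
for graph in keyword_graphs:
    print(graph)
```

Because each new Basic Graph is joined only to the one joined immediately before it, the order of the query words is preserved in every candidate, which is the sense in which the approach keeps contextual information attached.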
  25. 25. Progressive Joining Approach - an Example. [Figure: an Intermediate-version Keyword Graph (built from k1 and k2) and the Basic Graph corresponding to the next query word (k3), followed by a table of all possible contextually feasible Keyword Graphs with the columns "Intermediate Version KG", "Next BG", "Joining between last joined BG and next BG", and "Increase of KG".]
  26. 26. Development of Framework
  27. 27. Ranker. Rank Keyword Graphs by: weight, i.e., the minimum weight of the constituent Basic Graphs; and depth level, i.e., how many edges a Keyword Graph holds. Keyword Graphs with a lower depth level are ranked higher than Keyword Graphs with a higher depth level.
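A ranking along these two criteria can be sketched as a sort with a composite key. The dictionary layout and the function name `rank_keyword_graphs` are hypothetical. The slide states that a lower depth level ranks higher; using a higher weight as the tie-breaker among graphs of equal depth is an assumption.

```python
def rank_keyword_graphs(keyword_graphs):
    """Sort Keyword Graphs best-first.

    Each graph is a dict with:
      'triples'              - list of edges (depth level = number of edges)
      'basic_graph_weights'  - weights of the constituent Basic Graphs
    """
    def sort_key(graph):
        depth = len(graph["triples"])
        weight = min(graph["basic_graph_weights"])
        # Lower depth first; among equal depth, higher weight first (assumed).
        return (depth, -weight)

    return sorted(keyword_graphs, key=sort_key)


# Hypothetical candidates.
A = {"name": "A", "triples": [("?s1", "k1", "?o1"), ("?o1", "?p2", "k2")],
     "basic_graph_weights": [5, 3]}
B = {"name": "B", "triples": [("k1", "?p1", "k2")],
     "basic_graph_weights": [2, 6]}
C = {"name": "C", "triples": [("?s1", "k1", "?o1"), ("?s1", "?p2", "k2")],
     "basic_graph_weights": [9, 4]}

ranked = rank_keyword_graphs([A, B, C])
print([graph["name"] for graph in ranked])
```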
  28. 28. Development of Framework
  29. 29. SPARQL Query Generator. Construct SPARQL queries for the higher-ranked Keyword Graphs, in rank order, until the first non-empty result is obtained. Each Keyword Graph is converted directly by putting its variables in the SELECT clause and merging the resources corresponding to each keyword in UNION clauses.
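The conversion described on this slide can be sketched as simple string construction. Everything named here is an assumption made for illustration: `to_sparql`, `first_non_empty`, the `resources` mapping from keyword to candidate IRIs, the `example.org` IRIs, and the `run_query` callable standing in for whatever client talks to the triple store. The exact query shape the authors generate is not given in the slides.

```python
from itertools import product


def to_sparql(triples, resources):
    """Build a SELECT query for one Keyword Graph.

    triples:   list of (s, p, o); terms starting with '?' are variables,
               all other terms are keywords.
    resources: dict mapping each keyword to a list of candidate IRIs.
    """
    variables, groups = [], []
    for triple in triples:
        choices = []
        for term in triple:
            if term.startswith("?"):
                if term not in variables:
                    variables.append(term)
                choices.append([term])
            else:
                choices.append([f"<{iri}>" for iri in resources[term]])
        # One alternative per combination of candidate resources.
        alternatives = ["{ " + " ".join(combo) + " . }"
                        for combo in product(*choices)]
        groups.append("{ " + " UNION ".join(alternatives) + " }")
    select = " ".join(variables) if variables else "*"
    return f"SELECT DISTINCT {select} WHERE {{ {' '.join(groups)} }}"


def first_non_empty(ranked_graphs, resources, run_query):
    """Try Keyword Graphs in rank order; stop at the first non-empty result."""
    for triples in ranked_graphs:
        rows = run_query(to_sparql(triples, resources))
        if rows:
            return rows
    return []


example_query = to_sparql(
    [("?s1", "spouse", "?o1")],
    {"spouse": ["http://example.org/spouse", "http://example.org/partner"]},
)
print(example_query)
```

Passing `run_query` in as a parameter keeps the sketch independent of any particular SPARQL client library.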
  31. 31. Experimental Setup. Question setup: the questions are taken from the Question Answering over Linked Data 3 (QALD-3) test question set, which consists of natural language questions (table as extracted, with headers "Dataset" and "Total Qs": QALD-3, DBpedia, 99, 99). Keywords: constructed manually w.r.t. the word order of the question words. Evaluation metrics: Recall, Precision, and F1-Measure. Evaluated for: detailed performance analysis, execution complexity measurement, and comparison with other systems.
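For reference, the three metrics can be computed per question from the gold-standard answer set and the returned answer set. The sketch below uses the standard set-based definitions; the function name `precision_recall_f1` and the example answers are illustrative, and the official QALD-3 evaluation may differ in details such as how unanswered questions are treated.

```python
def precision_recall_f1(gold, returned):
    """Set-based precision, recall, and F1 for a single question."""
    gold, returned = set(gold), set(returned)
    correct = len(gold & returned)
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative answers: 2 of 3 returned answers are correct,
# and 2 of 4 gold answers are found.
p, r, f = precision_recall_f1({"a", "b", "c", "d"}, {"a", "b", "x"})
print(round(p, 3), round(r, 3), round(f, 3))
```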
  32. 32. Detailed performance analysis. Analyzed by the number of keywords each question holds:
      Group               | No. of Qs | Recall (Avg) | Precision (Avg) | F1-Measure (Avg)
      One Keyword Group   | 1         | 1.00         | 1.00            | 1.00
      Two Keyword Group   | 45        | 0.90         | 0.96            | 0.92
      Three Keyword Group | 13        | 0.77         | 0.77            | 0.77
      Four Keyword Group  | 8         | 0.75         | 0.75            | 0.75
      Five Keyword Group  | 3         | 1.000        | 1.000           | 1.000
      All groups          |           | 0.87         | 0.90            | 0.88
      Observations: according to the One/Two/Three Keyword Group questions, the selection of Basic Graphs works well; according to the more-than-one Keyword Group questions, merging-based Keyword Graph construction and ranking work well; pre-processed data statistics help in efficient sub-graph finding over the linked data graph.
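The overall averages in the table above appear to be question-weighted means of the per-group averages. This reading is an inference rather than something the slides state, and the short check below only shows that it is consistent with the reported numbers to within 0.01 (the per-group values are themselves rounded). The names `GROUPS` and `weighted_average` are illustrative.

```python
# (number of questions, recall, precision, F1) per keyword group,
# copied from the table on the slide.
GROUPS = [
    (1, 1.00, 1.00, 1.00),
    (45, 0.90, 0.96, 0.92),
    (13, 0.77, 0.77, 0.77),
    (8, 0.75, 0.75, 0.75),
    (3, 1.00, 1.00, 1.00),
]


def weighted_average(groups, column):
    """Average of `column` weighted by the number of questions."""
    total = sum(group[0] for group in groups)
    return sum(group[0] * group[column] for group in groups) / total


total_questions = sum(group[0] for group in GROUPS)
recall = weighted_average(GROUPS, 1)
precision = weighted_average(GROUPS, 2)
f1 = weighted_average(GROUPS, 3)
print(total_questions, recall, precision, f1)
```

The group sizes sum to 70, which matches the number of questions inteSearch is reported to have processed in the comparison with other systems.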
  34. 34. Execution-time-wise performance analysis. Environment: machine with an Intel(R) Core(TM) i7-4770K CPU at 3.50 GHz and 16 GB memory; triple store: network-connected Virtuoso (version 06.01.3127).
      Keyword group       | Execution time (ms)
      One Keyword Group   | 710
      Two Keyword Group   | 2441
      Three Keyword Group | 2774
      Four Keyword Group  | 3585
      Five Keyword Group  | 3720
      Observations: execution cost increases linearly with the number of keywords; pre-processed data statistics support faster execution.
  35. 35. Performance Comparison. Compared with the QALD-3 challenge participant systems:
      System        | # of Questions | Processed | Right | Partially | Recall | Precision | F1-Measure
      squall2sparql | 99             | 99        | 80    | 13        | 0.88   | 0.93      | 0.90
      CASIA         | 99             | 52        | 29    | 8         | 0.36   | 0.35      | 0.36
      Scalewelis    | 99             | 70        | 32    | 1         | 0.33   | 0.33      | 0.33
      inteSearch    | 99             | 70        | 60    | 1         | 0.87   | 0.90      | 0.88
      Observation: pre-processed data statistics help in efficient sub-graph finding over the linked data graph.
  37. 37. Conclusion. IA over LD requires finding the proper sub-graph over the LD graph. We contributed by devising an LD IA framework that does not generate empty results, maintains contextual information attachment, and retrieves rich information with low execution cost. The single-query-word-based Basic Graph can be extended to multiple query words, which can further increase efficiency.
  39. 39. Questions? Md-Mizanur Rahoman, mizan@nii.ac.jp; Ryutaro Ichise, ichise@nii.ac.jp
