Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and Usage


Published in: Education, Technology

  1. Towards an Empirical Semantic Web Science: Knowledge Pattern Extraction and Usage
     Andrea Nuzzolese, Ph.D. Student, Università di Bologna; STLab, ISTC-CNR
  2. Outline
     • Empirical Semantic Web Science and Knowledge Patterns (KPs)
     • A possible methodology for making KPs emerge from the Web of Data
     • The work done so far in KP extraction
     • Evaluating the efficacy of KPs through exploratory search
  3. Does a Web science exist?
     • A science is usually applied to clear research objects
       ✦ The physical and biological sciences analyze the natural world and try to find microscopic laws that, extrapolated to the macroscopic realm, would generate the observed behavior
     • The Web is an engineered space created through formally specified languages and protocols
     • Web pages, with their content and links, are created by humans with a particular task in mind, governed by social conventions and laws
     • A Web science exists [Berners-Lee et al., 2006] and is oriented to:
       ✦ Growth of the engineered space
       ✦ Human-Web interaction patterns
  4. What about a Web of Data science?
     • Linked Data offers a huge amount of data for empirical research
  5. What are the research objects of the empirical SW science?
     • The Semantic Web and Linked Data give us the chance to study empirically which patterns are used in organizing and representing knowledge
     • The research objects of the Semantic Web as an empirical science are Knowledge Patterns (KPs)
  6. Knowledge Patterns
     • KPs are small, well-connected units of meaning, which are
       ✦ task based
       ✦ well grounded
       ✦ cognitively sound
     • KPs find their theoretical grounding in frames
       ✦ “… a frame is a data-structure for representing a stereotyped situation.” [Minsky 1975]
       ✦ “...the availability of global patterns of knowledge cuts down on non-determinacy enough to offset idiosyncratic bottom-up input that might otherwise be confusing.” [Beaugrande 1980]
  7. An example of a KP
  8. Empirical Semantic Web and KPs
     • KPs emerge from the knowledge soup deriving from the Web
     • A methodology for KP extraction from the Web
  9. KP extraction
     • The Web is populated by heterogeneous sources
     • We can classify sources in two categories
       ✦ Formal and semi-formal sources modeled by adopting a top-down approach
         ✴ e.g., foundational ontologies, frames, thesauri, etc.
       ✦ Non-formal sources modeled by adopting a bottom-up approach
         ✴ e.g., RDBs, Linked Data, Web pages, XML documents, etc.
     • Our KP extraction methodology is based on two complementary approaches
       ✦ A top-down approach
       ✦ A bottom-up approach
  10. KP boundary
  11. KP detection and discovery
      • The top-down approach aims to extract KPs that already exist in a formal or semi-formal structure
        ✦ Possible techniques: reengineering, refactoring based on association rules, key concept identification, ontology mapping, etc.
      • The bottom-up approach aims to discover or detect KPs from data
        ✦ Possible techniques: inductive techniques, machine learning, data mining, ontology mining, etc.
  12. KP validation
      • The top-down and the bottom-up approaches both contribute to the validation of KPs
      • KP extraction is a matter of understanding how the world or specific domains have been described from different perspectives
        ✦ The perspective of domain experts, ontologists, etc., who try to formalize either the world or specific domains
        ✦ The perspective of users, data entry operators, etc., who actually populate and manage data that report facts about the world
      • For example, it would be cognitively relevant if an occurrence of a KP emerged from both the top-down and the bottom-up approach
  13. KP extraction methodology
  14. KP reengineering from FrameNet’s frames
      • FrameNet is a cognitively sound lexical knowledge base, which is grounded in a large corpus
      • FrameNet consists of a set of frames, which have frame elements, lexical units (which pair words (lexemes) to frames), and relations to corpus elements
        ✦ Each frame can be interpreted as a class of situations
  15. An example of a frame
  16. Using Semion for reengineering and refactoring FrameNet’s frames
  17. FrameNet as LOD
  18. FrameNet as KPs
  19. KP discovery from Wikipedia links
      • Hypothesis
        ✦ the types of linked resources that occur most often for a certain type of resource constitute its KP
        ✦ since we expect that any cognitive invariance in explaining/describing things is reflected in the wikilink graph, discovered KPs are cognitively sound
      • Contribution
        ✦ an EKP (Encyclopedic Knowledge Pattern) discovery procedure
        ✦ 184 EKPs published in OWL2
  20. Collecting paths from wikilinks
      [Figure: the resources db:Mickey_Mouse, db:Minnie_Mouse and db:The_Walt_Disney_Company, connected by dbpo:wikiPageWikiLink; their rdf:type and rdfs:subClassOf chains (dbpo:FictionalCharacter, dbpo:Person, dbpo:Company, dbpo:Organisation, owl:Thing); and the type-level paths ("Path") derived from the wikilinks]
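The path collection on this slide could be sketched roughly as follows. This is a minimal illustration, not the author's implementation: the function name `collect_paths`, the direction of the toy wikilinks, and the use of a single most-specific type per resource are all assumptions made for the example.

```python
from collections import defaultdict

# Toy data loosely based on the slide's figure. The link directions and the
# one-type-per-resource simplification are assumptions for illustration.
WIKILINKS = [
    ("db:Mickey_Mouse", "db:Minnie_Mouse"),
    ("db:Mickey_Mouse", "db:The_Walt_Disney_Company"),
]
TYPES = {
    "db:Mickey_Mouse": "dbpo:FictionalCharacter",
    "db:Minnie_Mouse": "dbpo:FictionalCharacter",
    "db:The_Walt_Disney_Company": "dbpo:Company",
}


def collect_paths(wikilinks, types):
    """Map each type-level path (subject type, object type) to the set of
    subject resources that have at least one wikilink along that path."""
    paths = defaultdict(set)
    for subject, obj in wikilinks:
        # Skip links whose endpoints have no known type.
        if subject in types and obj in types:
            paths[(types[subject], types[obj])].add(subject)
    return dict(paths)


paths = collect_paths(WIKILINKS, TYPES)
for path, subjects in sorted(paths.items()):
    print(path, sorted(subjects))
```

Keeping the set of subject resources per path, rather than a plain count, makes it straightforward to count distinct subjects later.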
  21. Path popularity
      • pathPopularity(Pi,j, Si) = nSubjectRes(Pi,j) / nRes(Si)
      [Figure: example graph of linked resources: Jackson_5, Michael_Jackson, Jackie_Jackson, Madonna, Prince, Dave_Grohl, Nirvana, Foo Fighters, Charlie_Parker, Keith_Jarrett, Beatles, John_Lennon, Paul_McCartney]
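A small sketch of how the ratio nSubjectRes(Pi,j)/nRes(Si) might be computed. It assumes that nSubjectRes counts the distinct resources of type Si with at least one link along the path, and that nRes counts all resources of type Si; the slide does not spell this out. The resource names, type names and links below are hypothetical toy data.

```python
from collections import defaultdict

# Hypothetical toy data: four resources of type "Artist", two of type "Band".
TYPES = {"a1": "Artist", "a2": "Artist", "a3": "Artist", "a4": "Artist",
         "b1": "Band", "b2": "Band"}
WIKILINKS = [("a1", "b1"), ("a2", "b1"), ("a3", "b2"), ("a1", "a2"), ("a1", "b2")]


def path_popularities(wikilinks, types, subject_type):
    """Return {object type: nSubjectRes / nRes} for one subject type."""
    subjects = {r for r, t in types.items() if t == subject_type}
    if not subjects:
        return {}
    # For each object type, collect the distinct subjects that link to it.
    linking_subjects = defaultdict(set)
    for subject, obj in wikilinks:
        if subject in subjects and obj in types:
            linking_subjects[types[obj]].add(subject)
    n_res = len(subjects)
    return {obj_type: len(s) / n_res for obj_type, s in linking_subjects.items()}


popularity = path_popularities(WIKILINKS, TYPES, "Artist")
print(popularity)
```

In this toy graph three of the four artists link to a band, while only one links to another artist.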
  22. Boundaries of KPs
      • A KP(Si) is a set of paths such that Pi,j ∈ KP(Si) ⟺ pathPopularity(Pi,j, Si) ≥ t
      • t is a threshold below which a path is not included in a KP
      • How to get a good value for t?
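The membership condition above translates directly into a filter. The sketch below is illustrative only: the popularity values and the threshold `t = 0.2` are made-up numbers, and `knowledge_pattern` is a hypothetical helper name.

```python
def knowledge_pattern(popularity, t):
    """Return the paths whose popularity is at least the threshold t."""
    return {path for path, value in popularity.items() if value >= t}


# Hypothetical popularity values for one subject type.
popularity = {"Band": 0.75, "Artist": 0.25, "City": 0.05}
kp = knowledge_pattern(popularity, t=0.2)
print(sorted(kp))
```

The comparison is inclusive, matching the "≥ t" in the definition.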
  23. Boundary induction
      Step  Description
      1     For each path, calculate the path popularity
      2     For each subject type, get the 40 top-ranked path popularity values*
      3     Apply multiple correlation (Pearson ρ) between the paths of all subject types by rank, and check for homogeneity of ranks across subject types
      4     For each of the 40 path popularity ranks, calculate its mean across all subject types
      5     Apply k-means clustering on the 40 ranks
      6     Decide threshold(s) based on k-means as well as other indicators (e.g. FrameNet roles distribution)
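Steps 4 and 5 can be sketched in plain Python as below. This is a rough illustration under several assumptions: the k-means is a simple one-dimensional Lloyd's iteration with evenly spaced starting centroids (the slides do not say which variant or how many clusters were used), the input has 4 ranks instead of 40 to keep it short, and all values are invented.

```python
def mean_by_rank(ranked):
    """Step 4: average the popularity found at each rank across subject types.
    `ranked` maps a subject type to its popularity values sorted descending;
    all lists are assumed to have the same length."""
    columns = zip(*ranked.values())
    return [sum(column) / len(ranked) for column in columns]


def kmeans_1d(values, k, iterations=100):
    """Step 5: plain 1-D k-means with deterministic, evenly spaced start centroids."""
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centroids[c]))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(v) for v in values]


# Invented values: two subject types, four ranks each.
ranked = {"TypeA": [0.8, 0.6, 0.1, 0.05], "TypeB": [0.6, 0.4, 0.1, 0.05]}
means = mean_by_rank(ranked)
centroids, labels = kmeans_1d(means, k=2)
print(means, labels)
```

The gap between the high-popularity cluster and the low-popularity cluster suggests candidate thresholds; step 6 then combines this with other indicators.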
  24. Boundary induction
  25. How can KPs be evaluated and used?
      • KPs should be evaluated in terms of how cognitively sound they are in capturing and representing knowledge
      • A scenario that can be used for evaluating the efficacy of KPs is exploratory search combined with user studies
  26. Why exploratory search?
      • Exploratory search is characterized “by uncertainty about the space being searched and the nature of the problem that motivates the search” [White et al., 2005]
      • KPs can be used for supporting exploratory search
        ✦ They can be used to filter knowledge by drawing a meaningful boundary around the retrieved data
        ✦ They make it possible to suggest exploratory paths based on cognitive criteria of relevance
      • We can investigate how KPs help users in exploratory search tasks
  27. Aemoo: KP-based exploratory search
      • A Web application that supports exploratory search on the Web based on KPs extracted from Wikipedia links
      • It aggregates knowledge from Linked Data, Wikipedia, Twitter and Google News by applying KPs as knowledge lenses over data
      • It provides an effective summary of knowledge about an entity, including explanations
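The idea of a KP acting as a "knowledge lens" can be illustrated with a simple filter that keeps only the linked resources whose type falls inside the KP of the entity being explored. This is a sketch of the general idea, not a description of how Aemoo is implemented; the function name `apply_lens` and all data below are hypothetical.

```python
def apply_lens(linked_resources, types, kp_types):
    """Group an entity's linked resources by type, keeping only types in the KP."""
    grouped = {}
    for resource in linked_resources:
        resource_type = types.get(resource)
        if resource_type in kp_types:
            grouped.setdefault(resource_type, []).append(resource)
    return grouped


# Hypothetical links of one entity and a hypothetical KP for its type.
types = {"r1": "Band", "r2": "Album", "r3": "City", "r4": "Band"}
summary = apply_lens(["r1", "r2", "r3", "r4", "r5"], types, kp_types={"Band", "Album"})
print(summary)
```

Resources with an unknown type or a type outside the KP (here `r5` and `r3`) are left out of the summary.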
  28. Exploring knowledge with Aemoo (1)
  29. Exploring knowledge with Aemoo (2)
  30. Conclusions
      • We want to contribute to the realization of the Semantic Web as an empirical science by providing a methodology for KP extraction
      • Our methodology for extracting KPs is based on two approaches
        ✦ a top-down approach
        ✦ a bottom-up approach
      • We have presented our experience in KP extraction so far
        ✦ KPs from FrameNet’s frames
        ✦ KPs from Wikipedia links
      • The evaluation we have in mind should be performed by means of exploratory search tasks
        ✦ Aemoo
  31. Thanks