NetIKX Semantic Search Presentation

The slides discuss the research agenda for search of the semantic web and the search tools currently available. The slides were prepared for an audience of information

Published in: Education, Technology

  1. 1. Semantic Search: Ready to Use? Dr Victoria Uren
  2. 2. Motivation: “The classic keyword search box exerts a powerful gravitational pull. Academics and industry researchers need to achieve the intellectual ‘escape velocity’ necessary to revolutionize search. They must invest much more in bold strategies that can achieve natural-language searching and answering, rather than providing the electronic equivalent of the index at the back of a reference book.” (Oren Etzioni, Search needs a shake-up, Nature, 4 Aug. 2011, v. 476, pp. 25-26). “A little semantics goes a long way” (Jim Hendler).
  3. 3. Plan: Introduction (What is semantic search?); Research Background (How it works; Interface types; Research issues); What is usable? (For web search; For corporate data management)
  4. 4. Introduction
  5. 5. Search as we know it: Full text search (TF-IDF & other statistical approaches; PageRank, exploiting the hyperlink graph). Controlled term search (OPAC, MeSH etc.). Other metadata (date of publication, author etc.). Output is typically ranked pages, records or documents.
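
As a rough illustration of the statistical approach named on this slide, the sketch below ranks three toy documents against a query with a basic TF-IDF score. The documents, the query, and the function name `tf_idf_scores` are invented for the example, and the weighting (relative term frequency times the log of inverse document frequency) is only one of several common variants, not a formula taken from the slides.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document against the query with a simple TF-IDF sum."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term_freq[term]:
                idf = math.log(n_docs / doc_freq[term])
                score += (term_freq[term] / len(tokens)) * idf
        scores.append(score)
    return scores


# Toy collection (assumed data, for illustration only).
docs = [
    "semantic search over linked data",
    "keyword search of web pages",
    "cooking recipes and reviews",
]
scores = tf_idf_scores("semantic search", docs)

# Output in the classic style: document indices ordered by descending score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The rarer term ("semantic") contributes more to the score than the common one ("search"), which is why the first document outranks the second.
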
  6. 6. Semantic Search, classic IR perspective: Improve statistical/link-based search of documents and web pages by better understanding the user’s information need. Resolve ambiguity: clustering. Query expansion: past searches, WordNet etc. to suggest related terms.
  7. 7. Semantic Search, Web 3.0 perspective: Improve search over machine-understandable data which may, or may not, include annotated documents. Search for entities (people, products …). Search for facts (capital of Georgia?). Fuse knowledge from different sources. Exploit structure of formal knowledge (broader / narrower plus much more).
  8. 8. Web 3.0 Search is Metadata Search: So more like searching a relational database (e.g. an OPAC) or search of the deep web. BUT linked data is “heterogeneous”: multiple domains mixed together; microformats & RDFa are from multiple sources; quality & consistency variable.
  9. 9. Benefits of Semantic Search: Machine understandability, i.e. controlled by “ontologies” so you can reason over it. Supports entity search: ambiguity (seat/SEAT). Broader/narrower: exploiting hierarchical class relations. Complex queries over triples, e.g. joint between mild steel and stainless steel. Heterogeneity: mappings between ontologies (silo bridging).
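
The broader/narrower benefit can be sketched with a tiny class hierarchy: a search for a broad class also returns instances of every narrower class beneath it. The class names, instance names, and the helper functions `narrower_classes` and `instances_of` are all hypothetical, chosen to echo the seat/SEAT example; a real system would read these relations from an ontology rather than from Python dictionaries.

```python
# Toy ontology (assumed data): child class -> parent class.
SUBCLASS_OF = {
    "Car": "Vehicle",
    "Hatchback": "Car",
    "Chair": "Furniture",
}

# Toy instance data (assumed): entity -> its most specific class.
INSTANCE_OF = {
    "SEAT Ibiza": "Hatchback",
    "Office seat": "Chair",
}


def narrower_classes(cls):
    """Return cls together with every class below it in the hierarchy."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def instances_of(cls):
    """Entity search that exploits the hierarchical class relations."""
    classes = narrower_classes(cls)
    return sorted(e for e, c in INSTANCE_OF.items() if c in classes)


vehicles = instances_of("Vehicle")
furniture = instances_of("Furniture")
print(vehicles, furniture)
```

Because each entity is typed, the car and the piece of furniture never end up in the same result set, which is the disambiguation that a plain keyword match on "seat" cannot offer.
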
  10. 10. Research Systems
  11. 11. Formal queries over RDF: SQL-like languages (SPARQL, SeRQL). XPath-like languages (XQuery, RPath). Others: Metalog (controlled English), F-logic, RDF-QBE (query by example). Source: James Bailey et al., Web and Semantic Web Query Languages: A Survey. Reasoning Web 2005: 35-133
  12. 12. Sample SPARQL. Query 1: SELECT ?x WHERE { ?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Smith" } (in this triple pattern ?x is the subject, <http://www.w3.org/2001/vcard-rdf/3.0#FN> the predicate and "John Smith" the object). Query 2: PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?y ?givenName WHERE { ?y vcard:Family "Smith" . ?y vcard:Given ?givenName . } Examples from http://jena.sourceforge.net/ARQ/Tutorial/
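
To make the subject/predicate/object matching behind the second sample query concrete, here is a minimal hand-rolled triple-pattern matcher. It is not a SPARQL engine and uses no RDF library; the triples, the `ex:` identifiers, and the functions `match_pattern` and `solve` are assumptions made for this sketch.

```python
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Toy data (assumed): (subject, predicate, object) triples.
TRIPLES = [
    ("ex:person1", VCARD + "Family", "Smith"),
    ("ex:person1", VCARD + "Given", "John"),
    ("ex:person2", VCARD + "Family", "Smith"),
    ("ex:person2", VCARD + "Given", "Rebecca"),
    ("ex:person3", VCARD + "Family", "Jones"),
    ("ex:person3", VCARD + "Given", "Sarah"),
]


def match_pattern(pattern, triple, bindings):
    """Try to extend bindings so that pattern matches triple; None on failure."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(patterns, triples):
    """Join the triple patterns one by one, as in a SPARQL WHERE clause."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Mirrors: SELECT ?y ?givenName WHERE { ?y vcard:Family "Smith" . ?y vcard:Given ?givenName . }
results = solve(
    [("?y", VCARD + "Family", "Smith"), ("?y", VCARD + "Given", "?givenName")],
    TRIPLES,
)
given_names = sorted(b["?givenName"] for b in results)
print(given_names)
```

The shared variable `?y` is what joins the two patterns: only subjects whose family name is "Smith" go on to contribute a given name.
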
  13. 13. Interfaces for Query Generation: Keyword; Forms; Graph-based; Question answering; Tabular browsers
  14. 14. Keyword-based: Aims to be as close as possible to Google-like keyword search. Pluses: minimal learning curve for users; can handle heterogeneity. Minus: query complexity is limited to entity search & simple triples.
  15. 15. SemSearch: Y. Lei, V. Uren, and E. Motta, A Ranking-Driven Approach to Semantic Search, Poster in ASWC 2008
  16. 16. SemSearch, example for the query “News: Victoria”: 4 matches (2 classes & 2 individuals) and 6 matches (relations). Total queries generated = 4*6 = 24.
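
The count on this slide is a simple cross product: every entity match is paired with every relation match. The sketch below reproduces only that arithmetic. The match names are placeholders, since the slide does not list the actual classes, individuals or relations, and pairing via `itertools.product` is an assumption about how the combinations arise, not a description of SemSearch's code.

```python
from itertools import product

# Placeholder names (hypothetical): 2 classes and 2 individuals, 4 matches in all.
entity_matches = ["class-1", "class-2", "individual-1", "individual-2"]

# Placeholder names (hypothetical): 6 matching relations.
relation_matches = [f"relation-{i}" for i in range(1, 7)]

# One candidate formal query per (entity, relation) pair.
candidate_queries = list(product(entity_matches, relation_matches))
print(len(candidate_queries))
```

Because the number of candidates grows multiplicatively with the number of matches per keyword, ranking the generated queries matters, which is the theme of the cited poster's title.
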
  17. 17. Forms: Familiar interface metaphor (database search, product search). Plus: allows construction of more complex searches. Minus: can’t handle the heterogeneous open web, because forms need to be pre-defined.
  18. 18. Graph-based Search: Aim is to expose the structure of the ontology to the user to scaffold query formulation. Pluses: good for single ontology environments; helps the user comprehend the domain. Minuses: can become unwieldy with big and complex domains.
  19. 19. Question Answering: Natural language input, e.g. “What is the capital of Georgia?” A translation process transforms the natural language into a formal query. Pluses: relatively complex queries possible (intersection of 2 triples); can deal with heterogeneity; user doesn’t need to understand the ontology. Minuses: heavy computation.
  20. 20. AquaLog: question answering. [Figure: pipeline from Natural Language Query to Linguistic Triple to Logical Triples to Answer, with the labels GATE components, Relation Similarity Service and Semantic match. Example: “What are the projects of Vanessa?” gives the linguistic triple (which is, projects, vanessa), the logical triples (project, has-project-member / has-project-leader, vanessa) and the answer AKT, Dot.KoM.] Lopez, V., Uren, V., Motta, E. and Pasin, M. (2007) AquaLog: An ontology-driven question answering system for organizational semantic intranets, Journal of Web Semantics, 5, 2, pp. 72-105.
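
The stages of the AquaLog example can be mimicked with a toy lookup: a linguistic triple is mapped to candidate logical triples, which are then matched against a small knowledge base. This is only a sketch of the data flow. The dictionary `RELATION_LEXICON` stands in for the linguistic and similarity components, and the knowledge base contents (including which relation links each project to "vanessa") are invented for illustration; neither reflects AquaLog's actual implementation or data.

```python
# Hypothetical lexicon: question term -> candidate ontology relations.
RELATION_LEXICON = {
    "projects": ["has-project-member", "has-project-leader"],
}

# Toy knowledge base (assumed data): (project, relation, person).
KNOWLEDGE_BASE = [
    ("AKT", "has-project-member", "vanessa"),
    ("Dot.KoM", "has-project-member", "vanessa"),
    ("Example-Project", "has-project-member", "someone-else"),
]


def to_logical_triples(linguistic_triple):
    """Map a linguistic triple onto candidate ontology-level triples."""
    _, relation_term, entity = linguistic_triple
    relations = RELATION_LEXICON.get(relation_term, [])
    return [("project", relation, entity) for relation in relations]


def answer(linguistic_triple):
    """Collect every subject that satisfies one of the logical triples."""
    found = set()
    for _, relation, entity in to_logical_triples(linguistic_triple):
        for subject, predicate, obj in KNOWLEDGE_BASE:
            if predicate == relation and obj == entity:
                found.add(subject)
    return sorted(found)


# Linguistic triple for "What are the projects of Vanessa?"
logical = to_logical_triples(("which is", "projects", "vanessa"))
answers = answer(("which is", "projects", "vanessa"))
print(logical, answers)
```

One question term expanding into two candidate relations is what lets the user ask about "projects" without knowing how the ontology models project membership.
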
  21. 21. Tabular Browsing: Start with keyword search, then expand by browsing through links. Pluses: supports data exploration; output as sets of facts. Minuses: not suitable for heterogeneous datasets; can be slow.
  22. 22. Parallax (http://www.freebase.com/labs/parallax/)
  23. 23. Research Challenges: Usability / expressivity trade-off. Heterogeneity (ontologies, quality, provenance; mapping, filtering). Security & privacy (personal data, social web). Scalability.
  24. 24. Near Commercial Systems
  25. 25. Usable Web3.0 Tools: For Web search; For Corporate data management. NOTE: a personal selection, I’m not endorsing any of these!
  26. 26. Sig.ma (Semantic Information Mashup), http://sig.ma: Runs off the Sindice crawl of pages with embedded RDFa and other microformats. Uses a keyword search for entities. No attempt at fusion or disambiguation.
  27. 27. Web Search - Sig.ma
  28. 28. Google Rich Snippets: Entity data based on microformats, RDFa, microdata: Reviews; People; Products (GoodRelations); Businesses & Organizations; Recipes; Events; Video. Supports entity search, with keyword search & facetted browsing. Harvested from sites which supply the data in the required formats.
  29. 29. Wolfram|Alpha (http://www.wolframalpha.com/): Focus is on computational knowledge. Natural language question input. Uses its own proprietary knowledge base.
  30. 30. DBpedia (http://dbpedia.neofonie.de/browse/): Searches factual information extracted from Wikipedia as RDF. Facetted browse approach in the home page, BUT also used in many other research & Open Linked Data sites (e.g. Sig.ma).
  31. 31. Usable Web3.0 Tools: For Web Search; For Corporate Data Management. Opportunity for bridging data silos. Keyword search has never been as good for CMS and intranet as for the internet: experts are needed to configure free text search well, and the distribution of terms can be skewed, making it impossible to configure. Web3.0 is a network-native technology.
  32. 32. Drupal 7: One of the most popular CMS, e.g. Recovery.gov was originally on Drupal. Semantic Drupal research pioneered by DERI Galway. Open source; developers often prefer it to SharePoint. RDFa export as standard from CMS structure (no annotation needed). Publish structured data that Google, Sindice etc. can harvest. API methods built in. Search NOT built in.
  33. 33. Virtuoso (http://virtuoso.openlinksw.com/): Hybrid server (XML, SQL, RDF, free text). Supporting: merging of data silos in different formats; production of Web applications & services. Large scale. Open source version.
  34. 34. Ready to use? Beyond the TRL3-5 “valley of death”. TRL7? for facetted browse, server technology. Not yet a stable market: technologies like SearchMonkey may come & go.
  35. 35. Acknowledgements. People: Fabio Ciravegna, Aba-Sah Dadzie, Khadija Elbedweihy, Miriam Fernandez, Yuangui Lei, Vanessa Lopez, Enrico Motta. Projects: X-Media, OpenKnowledge, AKT, SmartProducts