NetIKX Semantic Search Presentation
 


The slides discuss the research agenda for search of the semantic web and current available search tools. The slides were prepared for an audience of information


    NetIKX Semantic Search Presentation: Presentation Transcript

    • Semantic Search: Ready to Use? Dr Victoria Uren
    • Motivation: “The classic keyword search box exerts a powerful gravitational pull. Academics and industry researchers need to achieve the intellectual ‘escape velocity’ necessary to revolutionize search. They must invest much more in bold strategies that can achieve natural-language searching and answering, rather than providing the electronic equivalent of the index at the back of a reference book.” Oren Etzioni, Search needs a shake up, Nature, 4 Aug. 2011, v.476, pp. 25-26. “A little semantics goes a long way”, Jim Hendler
    • Plan: Introduction (What is semantic search?); Research Background (How it works; Interface types; Research Issues); What is usable? (For web search; For corporate data management)
    • Introduction
    • Search as we know it: Full text search (TF-IDF & other statistical approaches; PageRank – exploiting the hyperlink graph); Controlled term search (OPAC, MeSH etc.); Other metadata (date of publication, author etc.). Output is typically ranked pages, records, documents
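
As a rough illustration of the statistical ranking mentioned on this slide, here is a minimal TF-IDF sketch. The function name `tf_idf_rank`, the whitespace tokenisation, and the three sample documents are assumptions made for the example; production search engines use considerably more refined weighting.

```python
import math
from collections import Counter

def tf_idf_rank(query, docs):
    """Rank document indices by the summed TF-IDF weight of the query terms."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for idx, tokens in enumerate(tokenised):
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores.append((score, idx))
    # Highest score first; ties keep the original document order.
    return [idx for _score, idx in sorted(scores, key=lambda p: (-p[0], p[1]))]

# Hypothetical mini-collection.
docs = [
    "semantic search over linked data",
    "keyword search box",
    "cooking recipes and reviews",
]
ranking = tf_idf_rank("semantic search", docs)
```

The rarer term ("semantic") contributes more to the score than the common one ("search"), which is the intuition behind the inverse document frequency.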
    • Semantic Search, Classic IR perspective: Improve statistical/link based search of documents / webpages by better understanding the user’s information need. Resolve ambiguity; Clustering; Query expansion (past searches, WordNet etc. to suggest related terms)
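
Query expansion can be sketched in a few lines. The thesaurus below is a hand-made, hypothetical stand-in for WordNet or a log of past searches, and `expand_query` is an illustrative name rather than part of any tool discussed in the slides.

```python
# Hypothetical thesaurus standing in for WordNet or past-search logs.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "seat": ["chair"],
}

def expand_query(query, related=RELATED_TERMS):
    """Return the query terms, each followed by its related terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + related.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

expanded = expand_query("car seat")
```

The expanded term list would then be passed to an ordinary keyword search, so that documents using a related word can still be retrieved.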
    • Semantic Search, Web 3.0 perspective: Improve search over machine understandable data which may, or may not, include annotated documents. Search for entities (people, products …); Search for facts (capital of Georgia?); Fuse knowledge from different sources; Exploit structure of formal knowledge (broader / narrower plus much more)
    • Web 3.0 Search is Metadata search. So more like: searching a relational database (e.g. an OPAC); search of the deep web. BUT linked data is “heterogeneous”: multiple domains mixed together; Microformats & RDFa are from multiple sources; quality & consistency variable
    • Benefits of Semantic Search: Machine understandability, i.e. controlled by “ontologies” so you can reason over it. Supports entity search; Ambiguity (Seat/SEAT); Broader/narrower (exploiting hierarchical class relations); Complex queries over triples (e.g. joint between mild steel and stainless steel); Heterogeneity (mappings between ontologies, i.e. silo bridging)
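
The broader/narrower benefit can be illustrated with a small sketch: a search for a class also returns items typed with any transitively narrower class. The class hierarchy and item names below are hypothetical, loosely inspired by the steel example on the slide.

```python
# Hypothetical class hierarchy: narrower class -> broader class.
BROADER = {
    "stainless steel": "steel",
    "mild steel": "steel",
    "steel": "metal",
    "aluminium": "metal",
}

# Hypothetical instance data: item -> class.
ITEMS = {
    "joint-17": "mild steel",
    "panel-3": "aluminium",
    "bolt-9": "stainless steel",
}

def narrower_closure(cls, broader=BROADER):
    """Return cls plus every class that is transitively narrower than it."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parent in broader.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result

def search_by_class(cls):
    """Find items whose class is cls or any narrower class."""
    wanted = narrower_closure(cls)
    return sorted(item for item, item_cls in ITEMS.items() if item_cls in wanted)

steel_items = search_by_class("steel")
```

A plain keyword search for "steel" could find these items only if the word happened to appear in their text; here the match follows from the class hierarchy.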
    • Research Systems
    • Formal queries over RDF: SQL-like languages (SPARQL, SeRQL); XPath-like languages (XQuery, Rpath); Others: Metalog (controlled English), F-logic, RDF-QBE (query by example). James Bailey et al., Web and Semantic Web Query Languages: A Survey. Reasoning Web 2005: 35-133
    • Sample SPARQL (each triple pattern reads subject, predicate, object). Query 1: SELECT ?x WHERE { ?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Smith" } Query 2: PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#> SELECT ?y ?givenName WHERE { ?y vcard:Family "Smith" . ?y vcard:Given ?givenName . } Examples from http://jena.sourceforge.net/ARQ/Tutorial/
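
To show how triple patterns like those in the sample queries are matched against data, here is a minimal pure-Python sketch. It is not a SPARQL engine: it handles only conjunctions of triple patterns, and the `ex:` identifiers and the data set are hypothetical.

```python
VCARD = "http://www.w3.org/2001/vcard-rdf/3.0#"

# Hypothetical data as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:JohnSmith", VCARD + "FN", "John Smith"),
    ("ex:JohnSmith", VCARD + "Family", "Smith"),
    ("ex:JohnSmith", VCARD + "Given", "John"),
    ("ex:BeckySmith", VCARD + "Family", "Smith"),
    ("ex:BeckySmith", VCARD + "Given", "Rebecca"),
    ("ex:SarahJones", VCARD + "Family", "Jones"),
    ("ex:SarahJones", VCARD + "Given", "Sarah"),
]

def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the same value across all patterns.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new

def select(patterns, triples=TRIPLES):
    """Evaluate a conjunction of triple patterns; return all variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

first = select([("?x", VCARD + "FN", "John Smith")])
second = select([
    ("?y", VCARD + "Family", "Smith"),
    ("?y", VCARD + "Given", "?givenName"),
])
```

In the second query the shared variable `?y` acts as a join: only subjects that have both a family name of "Smith" and a given name produce a result.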
    • Interfaces for Query Generation: Keyword; Forms; Graph based; Question answering; Tabular browsers
    • Keyword based: Aims to be as close as possible to Google-like keyword search. Pluses: minimal learning curve for users; can handle heterogeneity. Minus: query complexity is limited to entity search & simple triples
    • SemSearch: Y. Lei, V. Uren, and E. Motta, A Ranking-Driven Approach to Semantic Search, Poster in ASWC 2008
    • SemSearch example for the query “News: Victoria”: 4 matches (2 classes & 2 individuals) and 6 matches (relations). Total queries generated = 4*6 = 24
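
The query count on this slide is a simple product of the candidate matches. The sketch below reproduces only that arithmetic; the slide gives the counts (4 and 6) but not the matches themselves, so the names are placeholders, and this is not a description of how SemSearch is implemented.

```python
from itertools import product

# Placeholder candidates: the slide reports only how many there are.
entity_matches = ["class-1", "class-2", "individual-1", "individual-2"]
relation_matches = [f"relation-{i}" for i in range(1, 7)]

# Every pairing of one entity match with one relation match is a candidate query.
candidate_queries = list(product(entity_matches, relation_matches))
```

Because the number of candidate queries grows multiplicatively with the number of matches, ranking the candidates matters, which fits the "ranking-driven" framing of the cited poster.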
    • Forms: Familiar interface metaphor (database search, product search). Plus: allows construction of more complex searches. Minus: can’t handle the heterogeneous open web, as forms need to be pre-defined
    • Graph-based Search: Aim is to expose the structure of the ontology to the user to scaffold query formulation. Pluses: good for single ontology environments; helps the user comprehend the domain. Minuses: can become unwieldy with big and complex domains
    • Question Answering: Natural language input, e.g. “What is the capital of Georgia?” A translation process transforms the natural language into a formal query. Pluses: relatively complex queries possible (intersection of 2 triples); can deal with heterogeneity; user doesn’t need to understand the ontology. Minuses: heavy computation
    • AquaLog: question answering. [Figure: pipeline from Natural Language Query to Linguistic Triple to Logical Triples to Answer, using GATE components, a Relation Similarity Service and semantic match. Example: “What are the projects of Vanessa?” → (which is, projects, vanessa) → (project, has-project-member / has-project-leader, vanessa) → AKT, Dot.Kom] Lopez, V., Uren, V., Motta, E. and Pasin, M. (2007) AquaLog: An ontology-driven question answering system for organizational semantic intranets, Journal of Web Semantics, 5, 2, pp. 72-105.
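
The step from a linguistic triple to ontology-level triples in the AquaLog example can be sketched as a toy name-matching routine. This is not AquaLog's actual algorithm: the crude plural stripping stands in for real linguistic processing, and of the relation names below only the two `has-project-...` ones appear on the slide; the rest are hypothetical.

```python
# Relation names: the first two are from the slide, the others are hypothetical.
ONTOLOGY_RELATIONS = [
    "has-project-member",
    "has-project-leader",
    "has-publication",
    "works-in-unit",
]

def normalise(word):
    """Lower-case and strip a plural 's' (a crude stand-in for real NLP)."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def candidate_relations(term, relations=ONTOLOGY_RELATIONS):
    """Relations whose hyphen-separated name contains the normalised term."""
    key = normalise(term)
    return [r for r in relations
            if key in [normalise(part) for part in r.split("-")]]

def to_logical_triples(linguistic_triple):
    """Map (question word, term, entity) to ontology-level candidate triples."""
    _question_word, term, entity = linguistic_triple
    return [(normalise(term), relation, entity)
            for relation in candidate_relations(term)]

logical = to_logical_triples(("which is", "projects", "vanessa"))
```

Each resulting triple could then be run as a formal query against the knowledge base, with the answers merged.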
    • Tabular Browsing: Start with keyword search, expand by browsing through links. Pluses: supports data exploration; output as sets of facts. Minuses: not suitable for heterogeneous datasets; can be slow
    • Parallax (http://www.freebase.com/labs/parallax/)
    • Research Challenges: Usability / expressivity trade off; Heterogeneity (ontologies, quality, provenance; mapping, filtering); Security & Privacy (personal data, social web); Scalability
    • Near Commercial Systems
    • Usable Web3.0 Tools: For Web search; For Corporate data management. NOTE – a personal selection – I’m not endorsing any of these!
    • Sig.ma (Semantic Information Mashup), http://sig.ma: Runs off Sindice crawl of pages with embedded RDFa and other microformats. Uses a keyword search for entities. No attempt at fusion or disambiguation
    • Web Search - Sig.ma
    • Google Rich Snippets: Entity data based on microformats, RDFa, microdata: Reviews; People; Products (GoodRelations); Businesses & Organizations; Recipes; Events; Video. Supports entity search, with keyword search & facetted browsing. Harvested from sites which supply the data in the required formats
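
To give a feel for what harvesting embedded entity data involves, the sketch below pulls `property` values out of an RDFa-style HTML fragment using only the Python standard library. It is a deliberately crude illustration, not a conformant RDFa processor: it ignores `about`, `typeof`, `content` and vocabulary resolution, and it assumes every element has a closing tag. The markup and the `example.org` vocabulary are hypothetical.

```python
from html.parser import HTMLParser

class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # property name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        self._open.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        # Record text only when the innermost open element carries a property.
        if text and self._open and self._open[-1]:
            self.pairs.append((self._open[-1], text))

# Hypothetical page fragment with embedded product data.
PAGE = """
<div vocab="http://example.org/vocab#" typeof="Product">
  <span property="name">Widget</span>
  <span property="description">A small widget</span>
</div>
"""

collector = PropertyCollector()
collector.feed(PAGE)
```

After `feed`, `collector.pairs` holds the extracted property/value pairs, which is roughly the kind of structured record a harvester would index for entity search.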
    • Wolfram|Alpha, http://www.wolframalpha.com/: Focus is on computational knowledge. Natural language question input. Uses its own proprietary knowledge base
    • DBpedia, http://dbpedia.neofonie.de/browse/: Searches factual information extracted from Wikipedia as RDF. Facetted browse approach in the home page, BUT used in many, many other research & Open Linked Data sites (e.g. Sig.ma)
    • Usable Web3.0 Tools: For Web Search; For Corporate Data Management. Opportunity for bridging data silos. Keyword search has never been as good for CMS and Intranet as for internet: need experts to configure free text search well; distribution of terms can be skewed, impossible to configure. Web3.0 is a network native technology
    • Drupal 7: One of the most popular CMS (e.g. Recovery.gov was originally on Drupal). Semantic Drupal research pioneered by DERI Galway. Open Source; developers often prefer it to Sharepoint. RDFa export as standard from CMS structure (no annotation needed). Publish structured data that Google, Sindice etc. can harvest. API methods built in. Search NOT built in
    • Virtuoso (http://virtuoso.openlinksw.com/): Hybrid server (XML, SQL, RDF, Free Text). Supporting: merging of data silos in different formats; production of Web applications & services. Large Scale. Open Source version
    • Ready to use? Beyond the TRL3-5 “valley of Death”. TRL7? for facetted browse, server technology. Not yet a stable market: technologies like SearchMonkey may come & go
    • Acknowledgements: People: Fabio Ciravegna, Aba-Sah Dadzie, Khadija Elbedweihy, Miriam Fernandez, Yuangui Lei, Vanessa Lopez, Enrico Motta. Projects: X-Media, OpenKnowledge, AKT, SmartProducts