NetIKX Semantic Search Presentation

Semantic Search
Ready to Use?

Dr Victoria Uren

Motivation

“The classic keyword search box exerts a powerful gravitational pull.
Academics and industry researchers need to achieve the intellectual
‘escape velocity’ necessary to revolutionize search. They must invest
much more in bold strategies that can achieve natural-language
searching and answering, rather than providing the electronic
equivalent of the index at the back of a reference book.”

Oren Etzioni, “Search needs a shake-up”, Nature, vol. 476, 4 Aug. 2011,
pp. 25-26

“A little semantics goes a long way”
Jim Hendler

Plan

Introduction - What is semantic search?

Research Background
How it works
Interface types
Research Issues

What is usable?
For web search
For corporate data management

Search as we know it

Full text search
TF-IDF & other statistical approaches
PageRank – exploiting hyperlink graph
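As a rough illustration of the statistical approach, here is a minimal TF-IDF scorer. It assumes raw term counts for tf and idf = log(N / df); real engines use many variants (smoothing, length normalisation), so treat it as a sketch rather than any particular system's formula. The documents are made up.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document against a keyword query with plain TF-IDF.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term]:  # skip terms that occur in no document
                score += counts[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores

docs = [
    "semantic search over linked data",
    "keyword search box",
    "linked data and ontologies",
]
print(tf_idf_scores("semantic search", docs))
```

The rarer term “semantic” contributes more to the first document's score than the commoner term “search”, which is the point of the idf factor.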

Controlled term search
OPAC
MeSH etc.

Other metadata
Date of publication, author etc.

Output is typically a ranked list of pages, records or documents
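The link-based ranking listed above, PageRank, can be sketched as power iteration over a small hyperlink graph. The damping factor of 0.85 is the commonly quoted default; the three-page graph and the fixed iteration count are assumptions for illustration, and pages with no out-links are not handled in this sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one out-link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Then each page passes its rank, split evenly, along its out-links.
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page C ranks highest because both other pages link to it.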

Semantic Search
Classic IR perspective

Improve statistical/link-based search of documents and web pages
by better understanding the user’s information need

Resolve ambiguity
Clustering

Query expansion
Past searches, WordNet etc. to suggest related terms
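A minimal sketch of query expansion: each query term is extended with related terms from a lookup table. The RELATED table is a hypothetical, hand-made stand-in for what WordNet or a log of past searches would supply; no real WordNet API is used.

```python
# Hypothetical related-terms table standing in for WordNet or past-search logs.
RELATED = {
    "car": ["automobile", "vehicle"],
    "seat": ["chair"],
}

def expand_query(query, related=RELATED):
    """Return the original terms plus any related terms, without duplicates."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + related.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded

print(expand_query("car seat"))  # ['car', 'automobile', 'vehicle', 'seat', 'chair']
```

Expansion increases recall, but note that blindly adding “chair” for “seat” is exactly the kind of ambiguity a purely term-based approach cannot resolve.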

Semantic Search
Web 3.0 perspective

Improve search over machine-understandable data, which
may or may not include annotated documents

Search for entities (people, products …)

Search for facts (capital of Georgia?)

Fuse knowledge from different sources

Exploit the structure of formal knowledge
Broader/narrower relations and much more

Web 3.0 Search is
Metadata search

So it is more like
Searching a relational database
E.g. an OPAC
Searching the deep web

BUT linked data is “heterogeneous”
Multiple domains mixed together

Microformats & RDFa come from multiple sources
Quality & consistency are variable

Benefits of Semantic Search

Machine understandability
i.e. controlled by “ontologies” so you can reason over it
Supports entity search

Ambiguity
Seat (furniture) vs SEAT (car maker)

Broader/narrower
Exploiting hierarchical class relations

Complex queries over triples
E.g. a joint between mild steel and stainless steel

Heterogeneity
Mappings between ontologies (silo bridging)
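To illustrate exploiting hierarchical class relations: a query for a broad class can also return instances of its narrower classes by following subClassOf links transitively. The class names and the tiny in-memory triple list are invented for the example; in practice an RDFS reasoner or a SPARQL 1.1 property path such as rdf:type/rdfs:subClassOf* would do this work.

```python
# Invented example data: (subject, predicate, object) triples.
TRIPLES = [
    ("StainlessSteel", "subClassOf", "Steel"),
    ("MildSteel", "subClassOf", "Steel"),
    ("Steel", "subClassOf", "Metal"),
    ("sheet42", "type", "StainlessSteel"),
    ("bar7", "type", "MildSteel"),
    ("ingot3", "type", "Metal"),
]

def narrower_classes(cls, triples):
    """Return cls plus every class below it in the subClassOf hierarchy."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def instances_of(cls, triples):
    """Instances of cls, including instances of its narrower classes."""
    classes = narrower_classes(cls, triples)
    return sorted(s for s, p, o in triples if p == "type" and o in classes)

print(instances_of("Steel", TRIPLES))  # ['bar7', 'sheet42']
print(instances_of("Metal", TRIPLES))  # ['bar7', 'ingot3', 'sheet42']
```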

Formal queries over RDF

SQL-like languages
SPARQL, SeRQL

XPath-like languages
XQuery, Rpath

Others
Metalog (controlled English)
F-logic
RDF-QBE (query by example)
James Bailey et al., “Web and Semantic Web Query Languages: A Survey”,
Reasoning Web 2005, pp. 35-133

Sample SPARQL

Triple pattern structure: Subject, Predicate, Object

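# Query 1: find the resource whose full name (vCard FN) is "John Smith"
SELECT ?x
WHERE { ?x <http://www.w3.org/2001/vcard-rdf/3.0#FN> "John Smith" }

# Query 2: find every resource with family name "Smith" and return its given name
PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>

SELECT ?y ?givenName
WHERE { ?y vcard:Family "Smith" .
        ?y vcard:Given ?givenName . }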

Examples from http://jena.sourceforge.net/ARQ/Tutorial/
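To show what the second query asks a query engine to do, here is a toy matcher for a basic graph pattern (a set of triple patterns that share variables), written in plain Python rather than with an RDF library. The data, the use of `vcard:` names as plain strings, and the `match_bgp` helper are all hypothetical; a real store such as Jena ARQ uses indexes rather than this nested scan.

```python
# Toy data: triples as (subject, predicate, object) tuples of plain strings.
DATA = [
    ("person1", "vcard:Family", "Smith"),
    ("person1", "vcard:Given", "John"),
    ("person2", "vcard:Family", "Smith"),
    ("person2", "vcard:Given", "Rebecca"),
    ("person3", "vcard:Family", "Jones"),
    ("person3", "vcard:Given", "Sarah"),
]

def is_var(term):
    return term.startswith("?")

def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in new and new[p_term] != t_term:
                return None  # variable already bound to something else
            new[p_term] = t_term
        elif p_term != t_term:
            return None  # constant does not match
    return new

def match_bgp(patterns, data):
    """Return every variable binding that satisfies all triple patterns (a join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for triple in data:
                extended = match_pattern(pattern, triple, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Same shape as Query 2: ?y vcard:Family "Smith" . ?y vcard:Given ?givenName .
query = [("?y", "vcard:Family", "Smith"), ("?y", "vcard:Given", "?givenName")]
for b in match_bgp(query, DATA):
    print(b["?y"], b["?givenName"])
```

The shared variable `?y` is what joins the two patterns, which is also how “complex queries over triples” such as the steel joint example are expressed.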

Interfaces for Query Generation

Keyword

Forms

Graph based

Question answering

Tabular browsers

Keyword based

Aims to be as close as possible to Google-like keyword search

Pluses
Minimal learning curve for users
Can handle heterogeneity

Minus
Query complexity is limited to entity search & simple
triples

SemSearch

Y. Lei, V. Uren, and E. Motta, A Ranking-Driven
Approach to Semantic Search, Poster in ASWC 2008

SemSearch

Query: “News: Victoria”
4 matches (2 classes & 2 individuals) × 6 matches (relations)
Total queries generated = 4 × 6 = 24
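The count of 24 follows from generating one candidate formal query for every combination of matches, i.e. the Cartesian product of the 4 entity matches and the 6 relation matches. The labels below are hypothetical placeholders, since the slide gives only the counts, and the ranking SemSearch applies to the candidates is not shown.

```python
from itertools import product

# Placeholder labels: the slide gives only the counts, not the actual matches.
entity_matches = ["class_1", "class_2", "individual_1", "individual_2"]
relation_matches = [f"relation_{i}" for i in range(1, 7)]

# One candidate query per (entity match, relation match) pair.
candidate_queries = list(product(entity_matches, relation_matches))

print(len(candidate_queries))  # 24
print(candidate_queries[0])    # ('class_1', 'relation_1')
```

The number of candidates grows multiplicatively with each ambiguous keyword, which is why ranking the generated queries matters.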

Forms

Familiar interface metaphor
Database search
Product search

Plus
Allows construction of more complex searches

Minus
Can’t handle the heterogeneous open web: forms need to be
predefined
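One way to see why forms must be predefined: each form field is wired to a fixed property in a fixed query template, so the form only works for data described with those properties. The form, field names and `ex:` properties below are hypothetical, and the string escaping is minimal.

```python
# Hypothetical product-search form: each field is bound to a fixed property.
FORM_FIELDS = {
    "material": "ex:material",
    "max_price": "ex:price",
}

def form_to_sparql(values):
    """Build a SPARQL query string from filled-in form fields.

    Only fields known in FORM_FIELDS are used, which is why a form
    cannot cope with data described using other vocabularies.
    """
    lines = ["PREFIX ex: <http://example.org/>", "SELECT ?product WHERE {"]
    if "material" in values:
        material = values["material"].replace('"', '\\"')  # minimal escaping only
        lines.append(f'  ?product {FORM_FIELDS["material"]} "{material}" .')
    if "max_price" in values:
        lines.append(f'  ?product {FORM_FIELDS["max_price"]} ?price .')
        lines.append(f'  FILTER (?price <= {float(values["max_price"])})')
    lines.append("}")
    return "\n".join(lines)

print(form_to_sparql({"material": "stainless steel", "max_price": "20"}))
```

Combining several fields gives a more complex query than a keyword box would, but a field the template does not know about is silently ignored.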

Graph-based Search

Aim is to expose the structure of the ontology to the user to
scaffold query formulation

Pluses
Good for single ontology environments
Helps the user comprehend the domain

Minuses
Can become unwieldy with big and complex domains

Question Answering

Natural language input
“What is the capital of Georgia?”

A translation process transforms the natural language question into a formal
query
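A toy version of the translation step, assuming a single hand-written pattern for questions of the form “What is the P of X?”. Systems such as AquaLog rely on linguistic analysis and ontology matching rather than one regular expression; the `ex:` namespace and property naming here are hypothetical.

```python
import re

# One hand-written pattern; a real QA system needs far broader linguistic coverage.
QUESTION = re.compile(r"what is the (\w+) of ([\w ]+?)\??$", re.IGNORECASE)

def question_to_sparql(question):
    """Translate 'What is the <property> of <entity>?' into a SPARQL query string."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None  # outside the single pattern this sketch understands
    prop, entity = match.group(1).lower(), match.group(2).strip()
    return (
        "PREFIX ex: <http://example.org/>\n"
        f'SELECT ?answer WHERE {{ ?e ex:name "{entity}" . ?e ex:{prop} ?answer . }}'
    )

print(question_to_sparql("What is the capital of Georgia?"))
```

Note that the sketch does nothing about the ambiguity of “Georgia” (US state or country); resolving that against the ontology is part of the heavy computation listed below.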

Pluses
Relatively complex queries possible (intersection of 2 triples)
Can deal with heterogeneity
User doesn’t need to understand the ontology

Minuses
Heavy computation

AquaLog: question answering

Example question: “What are the projects of Vanessa?”
Natural language query → Linguistic triple: (which is, projects, vanessa) → Logical triples: (project, has-project-member / has-project-leader, vanessa) → Answer: AKT, Dot.KoM
Components: GATE components, Relation Similarity Service, semantic match
Lopez, V., Uren, V., Motta, E. and Pasin, M. (2007) AquaLog: An
ontology-driven question answering system for organizational
semantic intranets, Journal of Web Semantics, 5, 2, pp. 72-105.

Tabular Browsing

Start with a keyword search, then expand by browsing through links
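A minimal sketch of that two-step pattern: a keyword search finds starting entities, their facts are shown as table rows, and linked entities become the next things to browse. The triples and entity names are invented for illustration; real tools work over much larger stores.

```python
# Invented example triples: (subject, predicate, object).
TRIPLES = [
    ("Victoria", "type", "City"),
    ("Victoria", "capitalOf", "British Columbia"),
    ("British Columbia", "type", "Province"),
    ("British Columbia", "partOf", "Canada"),
]

def keyword_search(keyword, triples):
    """Step 1: entities whose name contains the keyword (case-insensitive)."""
    return sorted({s for s, _, _ in triples if keyword.lower() in s.lower()})

def facts_about(entity, triples):
    """The (predicate, object) rows shown in the table for one entity."""
    return [(p, o) for s, p, o in triples if s == entity]

def expand(entities, triples):
    """Step 2: follow links to objects that are themselves described in the data."""
    return sorted({o for e in entities for _, o in facts_about(e, triples)
                   if facts_about(o, triples)})

start = keyword_search("victoria", TRIPLES)
for entity in start:
    for predicate, obj in facts_about(entity, TRIPLES):
        print(entity, "|", predicate, "|", obj)
print("Browse next:", expand(start, TRIPLES))
```

Each browsing step is another lookup, which is one reason this style of interface can be slow.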

Pluses
Supports data exploration
Output as sets of facts

Minuses
Not suitable for heterogeneous datasets
Can be slow

Parallax
(http://www.freebase.com/labs/parallax/)

Research Challenges

Usability / expressivity trade-off

Heterogeneity
Ontologies, quality, provenance
Mapping, filtering

Security & Privacy
Personal data, social web

Scalability

Usable Web3.0 Tools

For Web search

For Corporate data management

NOTE – a personal selection – I’m not endorsing any of these!

Sig.ma (Semantic Information Mashup) http://sig.ma

Runs off the Sindice crawl of pages with embedded RDFa and
microformats

Uses a keyword search for entities

No attempt at fusion or disambiguation

Google Rich Snippets

Entity data based on microformats, RDFa, microdata
Reviews
People
Products (GoodRelations)
Businesses & Organizations
Recipes
Events
Video

Supports entity search, with keyword search & facetted browsing

Harvested from sites which supply the data in the required formats

Wolfram|Alpha
http://www.wolframalpha.com/

Focus is on computational knowledge

Natural language question input

Uses its own proprietary knowledge base

DBpedia
http://dbpedia.neofonie.de/browse/

Searches factual information extracted from Wikipedia as RDF

Facetted browse approach on the home page
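Facetted browsing can be sketched as counting property values over the current result set and letting the user filter on one of them. The records and property names below are invented for illustration and are not DBpedia's actual schema.

```python
from collections import Counter

# Invented entity records, loosely in the style of extracted infobox data.
RESULTS = [
    {"name": "Tbilisi", "type": "City", "country": "Georgia"},
    {"name": "Atlanta", "type": "City", "country": "United States"},
    {"name": "Batumi", "type": "City", "country": "Georgia"},
    {"name": "Paravani", "type": "Lake", "country": "Georgia"},
]

def facet_counts(records, prop):
    """How many records have each value of prop: the numbers shown beside a facet."""
    return Counter(r[prop] for r in records if prop in r)

def apply_facet(records, prop, value):
    """Narrow the result set to records whose prop equals value."""
    return [r for r in records if r.get(prop) == value]

print(facet_counts(RESULTS, "country"))
cities = apply_facet(RESULTS, "type", "City")
print(facet_counts(cities, "country"))
```

Each click on a facet value narrows the set and the counts are recomputed, so the user never needs to write a query.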

BUT DBpedia is used in many other research & Linked Open Data
sites (e.g. Sig.ma)

Usable Web3.0 Tools

For Web Search

For Corporate Data Management

Opportunity for bridging data silos
Keyword search has never worked as well for CMSs and
intranets as for the Internet
Need experts to configure free text search well
Distribution of terms can be skewed, making it impossible to
configure well
Web3.0 is a network-native technology
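Bridging silos in this way often comes down to mapping each source's vocabulary onto a shared one, after which statements about the same thing can be merged. The two silos, their property names and the mapping table are hypothetical, and the sketch assumes both silos already use the same identifier for the same person, which in practice is itself a hard problem.

```python
# Two hypothetical silos describing staff with different property names.
HR_SILO = [("emp:42", "hr:fullName", "Ann Lee"), ("emp:42", "hr:dept", "Sales")]
CRM_SILO = [("emp:42", "crm:contactName", "Ann Lee"), ("emp:42", "crm:phone", "555-0100")]

# Hypothetical mapping from each silo's properties to a shared vocabulary.
MAPPING = {
    "hr:fullName": "shared:name",
    "crm:contactName": "shared:name",
    "hr:dept": "shared:department",
    "crm:phone": "shared:phone",
}

def bridge(*silos, mapping=MAPPING):
    """Rewrite every triple into the shared vocabulary and merge, dropping duplicates."""
    merged = set()
    for silo in silos:
        for s, p, o in silo:
            merged.add((s, mapping.get(p, p), o))  # unmapped properties pass through
    return sorted(merged)

for triple in bridge(HR_SILO, CRM_SILO):
    print(triple)
```

The two differently named “name” statements collapse into one, so a single query over `shared:` properties now sees both silos.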

Drupal 7

One of the most popular CMSs
E.g. Recovery.gov was originally on Drupal
Semantic Drupal research pioneered by DERI Galway

Open Source
Developers often prefer it to SharePoint

RDFa export as standard from CMS structure (no annotation needed)
Publish structured data that Google, Sindice etc. can harvest

API methods built in

Search NOT built in

Virtuoso
(http://virtuoso.openlinksw.com/)

Hybrid server
XML
SQL
RDF
Free Text

Supporting
Merging of data silos in different formats
Production of Web applications & services
Large Scale
Open Source version

Ready to use?

Beyond the TRL 3-5 (Technology Readiness Level) “valley of
death”

TRL7? for facetted browse, server
technology

Not yet a stable market:
technologies like SearchMonkey
may come & go

Acknowledgements

People: Fabio Ciravegna, Aba-Sah Dadzie, Khadija
Elbedweihy, Miriam Fernandez, Yuangui Lei, Vanessa Lopez,
Enrico Motta

Projects: X-Media, OpenKnowledge, AKT, SmartProducts
