Semantic search: from document retrieval to virtual assistants
PRESENTED BY Peter Mika, Sr. Research Scientist, Yahoo Labs | June 19, 2014
Agenda
 Invite
 What is Semantic Search?
 Applications to Web search
› Enhanced results
› Entity retrieval and recommendations
 Beyond Web search
Yahoo Labs Barcelona
 Established January, 2006
› Part of a global network of Labs in
Sunnyvale, New York, Barcelona, Haifa,
Bangalore, Beijing, Santiago
 Led by Ricardo Baeza-Yates
 Research areas
› Distributed Systems
› Semantic Search
› Social Media
› Web Mining
› Web Retrieval
Semantic Search Research
Jordi Atserias
Sr. Research Engineer
Roi Blanco
Sr. Research Scientist
Hugues Bouchard
Sr. Research Engineer
Peter Mika
Sr. Research Scientist
Manager
Tim Potter
Research Engineer
Edgar Meij
Research Scientist
What is Semantic Search?
Search is really fast, without necessarily being intelligent
Why Semantic Search?
 Improvements in IR are harder and harder to come by
› Basic relevance models are well established
› Machine learning using hundreds of features
› Heavy investment in computational power, e.g. real-time indexing and instant search
 Remaining challenges are not computational, but in modeling user
cognition
› Could Watson explain why the answer is Toronto?
› Need a deeper understanding of the query, the content and the relationship of the two
 Semantic gap
› Ambiguity
• jaguar
• paris hilton
› Secondary meaning
• george bush (and I mean the beer brewer
in Arizona)
› Subjectivity
• reliable digital camera
• paris hilton sexy
› Imprecise or overly precise searches
• jim hendler
 Complex needs
› Missing information
• brad pitt zombie
• florida man with 115 guns
• 35 year old computer scientist living in
barcelona
› Category queries
• countries in africa
• barcelona nightlife
› Transactional or computational queries
• 120 dollars in euros
• digital camera under 300 dollars
• world temperature in 2020
Poorly solved information needs remain
Are there even true keyword queries?
Users may have stopped asking them
Real problem
What is it like to be a machine?
Roi Blanco
What is it like to be a machine?
↵⏏☐ģ
✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓
ţğ★✜
✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫
≠=⅚©§★✓♪ΒΓΕ℠
✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ
⏎⌥°¶§ΥΦΦΦ✗✕☐
 Def. Semantic Search is any
retrieval method where
› User intent and resources are
represented in a semantic model
• A set of concepts or topics that generalize
over tokens/phrases
• Additional structure such as a hierarchy
among concepts, relationships among
concepts etc.
› Semantic representations of the query
and the user intent are exploited in
some part of the retrieval process
 As a research field
› Workshops
• ESAIR (2008-2014) at CIKM, Semantic
Search (SemSearch) workshop series
(2008-2011) at ESWC/WWW, EOS
workshop (2010-2011) at SIGIR, JIWES
workshop (2012) at SIGIR, Semantic
Search Workshop (2011-2014) at VLDB
› Special Issues of journals
› Surveys
• Christos L. Koumenides, Nigel R.
Shadbolt: Ranking methods for entity-
oriented semantic web search.
JASIST 65(6): 1091-1106 (2014)
Semantic Search
Semantic models: implicit vs. explicit
 Implicit/internal semantics
› Models of text extracted from a corpus of queries, documents or interaction logs
• Query reformulation, term dependency models, translation models, topic models, latent space
models, learning to match (PLS)
› See
• Hang Li and Jun Xu: Semantic Matching in Search. Foundations and Trends in Information
Retrieval Vol 7 Issue 5, 2013, pp 343-469
 Explicit/external semantics
› Explicit linguistic or ontological structures extracted from text and linked to external
knowledge
› Obtained using IE techniques or acquired from Semantic Web markup
Semantic Search – a process view
› Query Construction
• Keywords
• Forms
• NL (natural language)
• Formal language
› Query Processing
• IR-style matching & ranking
• DB-style precise matching
• KB-style matching & inferences
› Result Presentation
• Query visualization
• Document and data presentation
• Summarization
› Query Refinement
• Implicit feedback
• Explicit feedback
• Incentives
› Representations: Document Representation, Knowledge Representation, Semantic Models
› Resources: Documents
What is it like to be a machine?
<roi>↵⏏☐ģ</roi>
✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓
ţğ★✜
✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫
≠=⅚©§★✓♪ΒΓΕ℠
✖Γ♫⅜±<roi>⏎↵⏏☐ģ</roi>ğğğμλκσςτ
⏎⌥°¶§ΥΦΦΦ✗✕☐
Information Extraction
 Documents
› Natural language
• Named Entity Recognition & Disambiguation (“entity linking”)
• Deep parsing (dependency parsing)
› Specific to the Web
• Extraction from web tables, wrapper induction etc.
• Open Information Extraction such as NELL, ReVerb etc.
 Queries
› Short text and no structure… nothing to do?
Information Extraction on queries
 Entities play an important role
› ~70% of queries contain a named entity (entity mention queries) and
~50% of queries have an entity focus (entity seeking queries)
• brad pitt attacked by fans
› ~10% of queries are looking for a class of entities
• brad pitt movies
› See
• Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW
2010: 771-780
• Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects:
actions for entity-centric search. WWW 2012: 589-598
Information Extraction on queries
 Common structure to entity mention queries:
query = <entity> + <intent>
› Intent is typically an additional word or phrase to
• Disambiguate, e.g. brad pitt actor
• Specify action or aspect e.g. brad pitt net worth, brad pitt download
 Useful also in off-line query log analysis
› Reduce the sparsity of query log data by mapping entities and intents to a
reference base of entities and intents
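The <entity> + <intent> structure suggests a simple segmentation step. The sketch below assumes a small, hypothetical dictionary of entity mentions and splits a query at the longest known mention; whatever is left over is treated as the intent. Real entity linkers score many candidate segmentations and disambiguate between entities, which this sketch does not attempt.

```python
from typing import NamedTuple, Optional

class Segmentation(NamedTuple):
    entity: Optional[str]  # matched entity mention, or None
    intent: str            # remaining words, treated as the intent

# Hypothetical entity dictionary (mention -> type); a real system would use a knowledge base.
ENTITIES = {
    "brad pitt": "Actor",
    "moneyball": "Movie",
    "oakland as": "Sports team",
}

def segment(query: str) -> Segmentation:
    """Split a query into <entity> + <intent> using the longest dictionary match."""
    tokens = query.lower().split()
    best = None  # (start, end) token span of the longest match so far
    for start in range(len(tokens)):
        for end in range(len(tokens), start, -1):
            if " ".join(tokens[start:end]) in ENTITIES:
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                break  # longest match starting at this position found
    if best is None:
        return Segmentation(None, " ".join(tokens))
    start, end = best
    intent = tokens[:start] + tokens[end:]
    return Segmentation(" ".join(tokens[start:end]), " ".join(intent))

for q in ["brad pitt net worth", "brad pitt actor", "moneyball trailer"]:
    print(segment(q))
```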
oakland as bradd pitt movie moneyball movies.yahoo.com oakland as wikipedia.org
captain america movies.yahoo.com moneyball trailer movies.yahoo.com
money moneyball movies.yahoo.com
moneyball movies.yahoo.com movies.yahoo.com en.wikipedia.org movies.yahoo.com peter brand
peter brand oakland nymag.com moneyball the movie www.imdb.com
moneyball trailer movies.yahoo.com moneyball trailer
brad pitt brad pitt moneyball brad pitt moneyball movie brad pitt moneyball brad pitt moneyball oscar
www.imdb.com
relay for life calvert ocunty www.relayforlife.org trailer for moneyball movies.yahoo.com
moneyball.movie-trailer.com
moneyball en.wikipedia.org movies.yahoo.com map of africa www.africaguide.com
money ball movie www.imdb.com money ball movie trailer moneyball.movie-trailer.com
brad pitt new www.zimbio.com www.usaweekend.com www.ivillage.com www.ivillage.com brad pitt
news news.search.yahoo.com moneyball trailer moneyball trailer www.imdb.com www.imdb.com
Patterns in logs are hard to see
 Sample of sessions from June, 2011 containing the term “moneyball”
› What are users trying to do?
Semantic annotations help to generalize…
› Session: oakland as bradd pitt movie moneyball trailer movies.yahoo.com oakland as wikipedia.org
› Entity types: oakland as (Sports team), bradd pitt (Actor), moneyball (Movie)
… and understand user needs
› moneyball trailer
• moneyball: object of the query (Movie)
• trailer: what the user wants to do with it
Information extraction on queries
 Entity linking
› Tutorial: Entity Linking and Retrieval by Edgar Meij, Krisztián Balog and Daan Odijk
› Dataset for evaluation of entity linking (2013)
• Yahoo Webscope dataset L24 - Yahoo Search Query Log To Entities, version 1.0
 Semantic annotation for query log analysis
› Frequent pattern mining on raw queries fails due to the large amount of noise
› Meaningful patterns start to emerge when mining the semantic annotations instead
› Laura Hollink, Peter Mika, Roi Blanco: Web usage mining with semantic analysis. WWW
2013: 561-570
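The difference between mining raw queries and mining their semantic annotations can be illustrated by replacing each recognized entity mention with its type before counting. The sketch assumes the annotations already exist (a hypothetical mention-to-type table stands in for an entity linker, applied by naive substring replacement) and uses plain frequency counting instead of a full frequent-pattern mining algorithm.

```python
from collections import Counter

# Hypothetical entity-linker output: mention -> entity type.
ANNOTATIONS = {
    "moneyball": "<Movie>",
    "captain america": "<Movie>",
    "brad pitt": "<Actor>",
    "bradd pitt": "<Actor>",  # misspellings map to the same type
    "oakland as": "<Sports team>",
}

def generalize(query: str) -> str:
    """Replace known entity mentions with their type, longest mention first."""
    q = query.lower()
    for mention in sorted(ANNOTATIONS, key=len, reverse=True):
        q = q.replace(mention, ANNOTATIONS[mention])
    return q

queries = ["moneyball trailer", "captain america trailer",
           "brad pitt moneyball", "bradd pitt movie", "brad pitt news"]

raw_counts = Counter(queries)                             # each raw query occurs once
pattern_counts = Counter(generalize(q) for q in queries)  # shared patterns emerge

print(raw_counts.most_common(3))
print(pattern_counts.most_common(3))
```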
Semantic Web
 Significant extension of the Web stack
› Languages for publishing raw data and document annotations
› Standards for querying, validating and reasoning with data
distributed across the Web
 Research community formed around 2001
› ISWC, ESWC, WWW Semantic Web Track, JWS
 Conflicted history with Information Retrieval
› Misplaced expectations as to what the Semantic Web will bring
› Building the chicken farm before any chickens or eggs
 Since 2007 more solid progress in adoption
› Metadata in HTML
› Public and private ‘Knowledge Graphs’
Metadata in HTML: schema.org
 Agreement on a shared set of schemas for common types of web
content
› Bing, Google, and Yahoo! as initial founders (June, 2011), joined by Yandex later
› Similar in intent to sitemaps.org
• Use a single format to communicate the same information to all three search engines
<div vocab="http://schema.org/" typeof="Movie">
<h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
<span property="description">Jack Sparrow and Barbossa embark on a quest to
find the elusive fountain of youth, only to discover that Blackbeard and
his daughter are after it too.</span>
Director: <div property="director" typeof="Person">
<span property="name">Rob Marshall</span>
</div>
</div>
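To see what a consumer of this markup receives, the sketch below walks the snippet with Python's standard html.parser and collects property/value pairs, opening a nested item whenever an element carries typeof. It covers only the subset of RDFa Lite used in the example (typeof and property with text content) and does not expand names using vocab; a production system would use a conforming RDFa parser.

```python
from html.parser import HTMLParser

class RDFaLiteParser(HTMLParser):
    """Collects items from the typeof/property subset of RDFa Lite (simplified)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the page
        self._scope = []  # stack of items currently open
        self._open = []   # per open element: (item it created, literal property)
        self._text = []   # text buffers for literal properties being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop, item = a.get("property"), None
        if "typeof" in a:
            item = {"@type": a["typeof"]}
            if prop and self._scope:
                self._scope[-1].setdefault(prop, []).append(item)  # nested item
            elif not self._scope:
                self.items.append(item)
            self._scope.append(item)
            prop = None  # the value is the nested item, not text
        elif prop:
            self._text.append([])
        self._open.append((item, prop))

    def handle_data(self, data):
        for buffer in self._text:
            buffer.append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        item, prop = self._open.pop()
        if item is not None:
            self._scope.pop()
        elif prop:
            text = " ".join("".join(self._text.pop()).split())  # normalize whitespace
            if self._scope:
                self._scope[-1].setdefault(prop, []).append(text)

SNIPPET = """<div vocab="http://schema.org/" typeof="Movie">
<h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
<span property="description">Jack Sparrow and Barbossa embark on a quest to
find the elusive fountain of youth, only to discover that Blackbeard and
his daughter are after it too.</span>
Director: <div property="director" typeof="Person">
<span property="name">Rob Marshall</span>
</div>
</div>"""

parser = RDFaLiteParser()
parser.feed(SNIPPET)
parser.close()
print(parser.items)
```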
Substantial adoption of schema.org markup
 Over 15% of all pages now have schema.org markup
 Over 5 million sites, over 25 billion entity references
 In other words: same order of magnitude as the web
› Source: R.V. Guha: Light at the end of the tunnel, ISWC 2013 keynote
 See also
› P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
• Based on Bing US corpus
• 31% of webpages, 5% of domains contain some metadata (including Facebook’s OGP)
› WebDataCommons
• Based on CommonCrawl Nov 2013
• 26% of webpages, 14% of domains contain some metadata (including Facebook’s OGP)
Knowledge Graphs
 Linked (Open) Data (linkeddata.org)
› Public movement for making open/public databases
• available in standard Semantic Web formats
• interlinking them
› DBpedia is a central hub in this network of datasets
• Software framework to extract structured data from Wikipedia
and consolidate it under a common ontology
• The resulting dataset, which contains links to Freebase and
others
– Freebase links to IMDB and so on
 Basis for private Knowledge Graphs
› Bing, Google, Yahoo
Yahoo’s Knowledge Graph
[Figure: example subgraph of Yahoo’s Knowledge Graph. Entities: Chicago Cubs, Chicago, Barack Obama, Carlos Zambrano, a "10% off tickets" offer, Brad Pitt, Angelina Jolie, Steven Soderbergh, George Clooney, Ocean’s Twelve, E/R, Fight Club, Dust Brothers. Relations: for, plays for, plays in, lives in, partner, directs, casts in, takes place in, music by]
Nicolas Torzec: Making knowledge reusable at Yahoo!:
a Look at the Yahoo! Knowledge Base (SemTech 2013)
Building Yahoo’s Knowledge Graph
 Ontology building and maintenance
› Editorially maintained OWL ontology with 300+ classes
› Covering the domains of interest of Yahoo
 Information extraction
› Public datasets and proprietary data
 Data fusion
› Manual mapping from the source schemas to the ontology
› Supervised entity reconciliation
• Kedar Bellare, Carlo Curino, Ashwin Machanavajjhala, Peter Mika, Mandar Rahurkar, Aamod Sane:
WOO: A Scalable and Multi-tenant Platform for Continuous Knowledge Base Synthesis. PVLDB 2013
• Michael J. Welch, Aamod Sane, Chris Drome: Fast and accurate incremental entity resolution relative to
an entity knowledge base. CIKM 2012
› Editorial curation and quality assessment
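Supervised entity reconciliation decides whether two records from different sources describe the same entity. The sketch below shows only the general shape of such a matcher: it computes a few pairwise features and combines them with a logistic function. The feature set, weights and bias are hypothetical placeholders for what a supervised learner would fit on labeled pairs; this is not the method of the WOO or incremental entity resolution papers cited above.

```python
import math
from difflib import SequenceMatcher

def features(a: dict, b: dict) -> list:
    """Pairwise features for two entity records (hypothetical feature set)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_type = 1.0 if a.get("type") == b.get("type") else 0.0
    same_year = 1.0 if a.get("year") is not None and a.get("year") == b.get("year") else 0.0
    return [name_sim, same_type, same_year]

# Hypothetical weights and bias standing in for a trained model.
WEIGHTS = [6.0, 2.0, 2.0]
BIAS = -7.0

def match_probability(a: dict, b: dict) -> float:
    """Logistic combination of the pairwise features."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(a, b)))
    return 1.0 / (1.0 + math.exp(-z))

kb_record = {"name": "Moneyball", "type": "Movie", "year": 2011}
feed_same = {"name": "Moneyball (film)", "type": "Movie", "year": 2011}
feed_other = {"name": "Moneyball", "type": "Book", "year": 2003}

print(round(match_probability(kb_record, feed_same), 3))
print(round(match_probability(kb_record, feed_other), 3))
```

The second pair has identical names but is rejected because type and year disagree, which is the kind of evidence a learned model can weigh against string similarity.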
Applications in Web Search
Semantic Search for…
 Improving ad-hoc document retrieval
› Query composition
› Result presentation
› Matching
› Ranking
 Providing new search functionality
› Entity retrieval
• Related entity recommendation
› Personalization
› Question-answering
› Task completion
Exploiting Semantic Web markup
(internal prototype, 2007)
› Personal and private homepage of the same person (clear from the snippet, but it could also be automatically de-duplicated)
› Conferences he plans to attend and his vacations from his homepage, plus bio events from LinkedIn
› Geolocation
Search snippets using Semantic Web markup
 Summarization of HTML is a hard task
• Template detection
• Selecting relevant snippets
• Composing readable text
› Efficiency constraints
 Yahoo SearchMonkey (2008)
› Enhanced results using structured data from the page
• Key/value pairs
• Deep links
• Image or Video
Effectiveness of enhanced results
 Explicit user feedback
› Side-by-side editorial evaluation (A/B testing)
• Editors are shown a traditional search result and enhanced result for the same page
• Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)
 Implicit user feedback
› Click-through rate analysis
• Long dwell time limit of 100s (Ciemiewicz et al. 2010)
• 15% increase in ‘good’ clicks
› User interaction model
• Enhanced results lead users to relevant documents
– even though they are less likely to be clicked than textual results
• Enhanced results effectively reduce bad clicks!
 See
› Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011:
725-734
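The click-based evaluation labels a click as "good" when the user stays on the clicked page long enough. A minimal sketch, assuming a log of (result type, dwell time in seconds) pairs and the 100-second limit mentioned above; whether the boundary is inclusive is an assumption, and the sample log is invented for illustration rather than taken from the study.

```python
from collections import defaultdict

GOOD_DWELL_SECONDS = 100  # long dwell time limit; inclusive boundary assumed

# Hypothetical click log: (result type, dwell time in seconds).
clicks = [
    ("enhanced", 240), ("enhanced", 15), ("enhanced", 130),
    ("textual", 8), ("textual", 120), ("textual", 20), ("textual", 45),
]

def good_click_rates(log):
    """Fraction of clicks per result type whose dwell time reaches the limit."""
    total = defaultdict(int)
    good = defaultdict(int)
    for result_type, dwell in log:
        total[result_type] += 1
        if dwell >= GOOD_DWELL_SECONDS:
            good[result_type] += 1
    return {t: good[t] / total[t] for t in total}

rates = good_click_rates(clicks)
print(rates)
```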
Enhanced results at other search providers
 Google announces Rich Snippets - June, 2009
› Faceted search for recipes - Feb, 2011
 Bing tiles – Feb, 2011
 Facebook’s Like button and the Open Graph Protocol (2010)
› Shows up in profiles and news feed
› Site owners can later reach users who have liked an object
Moving beyond entity markup
 We would like to help our users in task completion
› But we have trained our users to talk in nouns
• Retrieval performance decreases by adding verbs to queries
› Markup for actions/intents could potentially help
 Modeling actions
› Understand what actions can be taken on a page
› Help users in mapping their query to potential actions
› Applications in web search, email etc.
Schema.org v1.2, including the Actions vocabulary, published April 16, 2014
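Actions markup attaches potential actions to an entity. The sketch below builds such a description as JSON-LD with Python's json module; the movie and the watch URL are hypothetical, and the property and type names used (potentialAction, WatchAction, target) should be checked against the current schema.org documentation rather than taken from this example.

```python
import json

def movie_with_watch_action(name: str, watch_url: str) -> dict:
    """Describe a Movie together with an action a user could take on it."""
    return {
        "@context": "http://schema.org",
        "@type": "Movie",
        "name": name,
        "potentialAction": {
            "@type": "WatchAction",
            "target": watch_url,  # where the action can be carried out (hypothetical URL)
        },
    }

doc = movie_with_watch_action("Moneyball", "https://example.com/watch/moneyball")
print(json.dumps(doc, indent=2))
```

Embedded in a page, for example inside a script element of type application/ld+json, a description like this tells a consumer not only what the page is about but also what a user can do with it.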
Applications of Actions markup
Email (Gmail), SERP (Yandex)
 Entity retrieval
› Which entity does a keyword query
refer to, if any?
 Related entities for navigation
› Which entity would the user visit next?
Entity displays in web search
Entity Retrieval
 Keyword search over entity graphs
› see Pound et al. WWW 2010 for a definition
› No common benchmark until 2010
 SemSearch Challenge 2010/2011
• 50 entity-mention queries selected from the Search Query Tiny Sample v1.0 dataset (Yahoo!
Webscope)
• Billion Triples Challenge 2009 data set
• Evaluation using Mechanical Turk
› See report:
• Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson,
Thanh Tran: Repeatable and reliable semantic search evaluation. J. Web Sem. 21: 14-29 (2013)
Glimmer: open-source entity retrieval engine from Yahoo
 Extension of MG4J from University of Milano
 Indexing of RDF data
› MapReduce-based
› Horizontal indexing (subject/predicate/object fields)
› Vertical indexing (one field per predicate)
 Retrieval
› BM25F with machine-learned weights for properties and domains
› 52% improvement over the best system in SemSearch 2010
 See
› Roi Blanco, Peter Mika, Sebastiano Vigna: Effective and Efficient Entity Search in RDF Data.
International Semantic Web Conference (1) 2011: 83-97
› https://github.com/yahoo/Glimmer/
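Vertical indexing keeps one field per predicate, and BM25F scores an entity by first combining length-normalized term frequencies across fields with per-field weights, then applying a single saturation function. The sketch below shows that combination on a toy in-memory collection; the entities, field weights, k1 and b are hypothetical hand-set values (Glimmer learns its weights), and the code is a simplification rather than Glimmer's implementation.

```python
import math
from collections import Counter

# Toy entities in a "vertical" layout: one field per predicate (hypothetical data).
ENTITIES = {
    "e1": {"name": "Brad Pitt", "type": "actor", "abstract": "American actor and film producer"},
    "e2": {"name": "Moneyball", "type": "film", "abstract": "film starring Brad Pitt about baseball"},
    "e3": {"name": "Oakland Athletics", "type": "sports team", "abstract": "baseball team in Oakland"},
}
FIELD_WEIGHTS = {"name": 3.0, "type": 1.0, "abstract": 1.0}  # hypothetical
K1, B = 1.2, 0.75

def tokenize(text: str) -> list:
    return text.lower().split()

# Per-entity, per-field term counts and average field lengths.
FIELDS = {e: {f: Counter(tokenize(t)) for f, t in fs.items()} for e, fs in ENTITIES.items()}
AVG_LEN = {f: sum(sum(FIELDS[e][f].values()) for e in FIELDS) / len(FIELDS) for f in FIELD_WEIGHTS}
N = len(ENTITIES)

def idf(term: str) -> float:
    """Document frequency counts an entity once if any field contains the term."""
    df = sum(1 for e in FIELDS if any(term in FIELDS[e][f] for f in FIELDS[e]))
    return math.log(1 + (N - df + 0.5) / (df + 0.5))

def bm25f(entity: str, query: str) -> float:
    score = 0.0
    for term in tokenize(query):
        tf = 0.0  # weighted, length-normalized frequency summed over fields
        for f, weight in FIELD_WEIGHTS.items():
            counts = FIELDS[entity][f]
            norm = 1 - B + B * sum(counts.values()) / AVG_LEN[f]
            tf += weight * counts[term] / norm
        score += idf(term) * tf / (K1 + tf)  # one saturation over the combined tf
    return score

ranking = sorted(ENTITIES, key=lambda e: bm25f(e, "brad pitt"), reverse=True)
print(ranking)
```

A match in the heavily weighted name field ranks e1 above e2, whose only match is in the abstract.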
Other evaluations in Entity Retrieval
 TREC Entity Track
› 2009-2011
› Data
• ClueWeb 09 collection
› Queries
• Related Entity Finding
– Entities related to a given entity through a
particular relationship
– (Homepages of) airlines that fly Boeing 747
• Entity List Completion
– Given some elements of a list of entities,
complete the list
 Professional sports teams in Philadelphia such
as the Philadelphia Wings, …
› Relevance assessments provided by
TREC assessors
 Question Answering over Linked Data
› 2011-2014
› Data
• DBpedia and MusicBrainz in RDF
› Queries
• Full natural language questions of different
forms, written by the organizers
• Multi-lingual
• Give me all actors starring in Batman
Begins
› Results are defined by an equivalent
SPARQL query
• Systems are free to return a list of results or
a SPARQL query
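In the QALD setting, the expected answers to a question are the results of a SPARQL query. To show what "results defined by a query" means, the sketch below evaluates a basic graph pattern, the core of such a query, over a handful of triples in pure Python. The triples and prefixed names are a hand-written toy sample in the style of DBpedia, not an extract of it; a real system would send the query to a SPARQL endpoint.

```python
from typing import Optional

# Toy triples (hypothetical sample in the style of DBpedia).
TRIPLES = [
    ("dbr:Batman_Begins", "rdfs:label", "Batman Begins"),
    ("dbr:Batman_Begins", "dbo:starring", "dbr:Christian_Bale"),
    ("dbr:Batman_Begins", "dbo:starring", "dbr:Michael_Caine"),
    ("dbr:The_Prestige", "rdfs:label", "The Prestige"),
    ("dbr:The_Prestige", "dbo:starring", "dbr:Hugh_Jackman"),
]

def match(pattern: tuple, triple: tuple, binding: dict) -> Optional[dict]:
    """Extend a variable binding so the pattern matches the triple, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):  # variable
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:  # constant that does not match
            return None
    return extended

def evaluate(patterns: list) -> list:
    """Evaluate a basic graph pattern by joining one triple pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in TRIPLES:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# SELECT ?actor WHERE { ?film rdfs:label "Batman Begins" . ?film dbo:starring ?actor }
query = [("?film", "rdfs:label", "Batman Begins"), ("?film", "dbo:starring", "?actor")]
answers = sorted(b["?actor"] for b in evaluate(query))
print(answers)
```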
Related entity recommendations
› Related entities
› Example user sessions
Spark(le) system for related entity recommendations
1. Knowledge Graph
› Filtering and enrichment
2. Feature extraction
› Query logs, Flickr, Twitter
3. MLR (machine-learned ranking)
4. Online/offline evaluation
› Point-wise assessments
› Side-by-side testing
› Online evaluation
5. Runtime
› Unary
• Popularity features from text: probability,
entropy, Wiki entity popularity …
• Graph features: PageRank on the entity
graph, Wikipedia, Web graph
• Type features: entity type
› Binary
• Co-occurrence features from text:
conditional probability, joint probability …
• Graph features: common neighbors …
• Type features: relation type
Roi Blanco, B. Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. ISWC 2013
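Several of the unary and binary features listed above can be computed directly from co-occurrence data. The sketch below assumes each query-log session has already been reduced to the set of entities it mentions (the sessions and the small entity graph are invented) and computes a probability and an entropy per entity plus joint probability, conditional probability and common neighbors per pair. The exact definitions used in Spark may differ; in particular, taking entropy over the entities co-occurring with a given entity is an assumption of this sketch.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical sessions, each reduced to the set of entities it mentions.
SESSIONS = [
    {"Brad Pitt", "Angelina Jolie"},
    {"Brad Pitt", "Moneyball"},
    {"Brad Pitt", "Angelina Jolie", "Fight Club"},
    {"Moneyball", "Oakland Athletics"},
]
# Hypothetical knowledge-graph neighborhoods.
GRAPH = {
    "Brad Pitt": {"Angelina Jolie", "Fight Club", "Moneyball", "Ocean's Twelve"},
    "George Clooney": {"Ocean's Twelve", "E/R"},
}

N = len(SESSIONS)
single = Counter(e for s in SESSIONS for e in s)
pair = Counter(frozenset(p) for s in SESSIONS for p in combinations(sorted(s), 2))

def probability(e):            # unary: P(e)
    return single[e] / N

def entropy(e):                # unary: entropy of entities co-occurring with e
    co = Counter()
    for s in SESSIONS:
        if e in s:
            co.update(s - {e})
    total = sum(co.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in co.values())

def joint(e1, e2):             # binary: P(e1, e2)
    return pair[frozenset((e1, e2))] / N

def conditional(e2, e1):       # binary: P(e2 | e1)
    return pair[frozenset((e1, e2))] / single[e1]

def common_neighbors(e1, e2):  # binary graph feature
    return len(GRAPH.get(e1, set()) & GRAPH.get(e2, set()))

print(probability("Brad Pitt"), entropy("Brad Pitt"))
print(joint("Brad Pitt", "Angelina Jolie"), conditional("Angelina Jolie", "Brad Pitt"))
print(common_neighbors("Brad Pitt", "George Clooney"))
```

In a system like the one described, such values would become inputs to the machine-learned ranker rather than scores used on their own.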
Beyond Web Search
Mobile search on the rise
 Information access on-the-go requires hands-free operation
› Driving, walking, gym, etc.
• Americans spend 540 hours a year in their cars [1] vs. 348 hours browsing the Web [2]
 ~50% of queries are coming from mobile devices (and growing)
› Changing habits, e.g. iPad usage peaks before bedtime
› Limitations in input/output
[1] http://answers.google.com/answers/threadview?id=392456
[2] http://articles.latimes.com/2012/jun/22/business/la-fi-tn-top-us-brands-news-web-sites-20120622
Mobile search: challenges and opportunities
 Interaction
› Question-answering
› Support for interactive retrieval
› Spoken-language access
› Task completion
 Contextualization
› Personalization
› Geo
› Context (work/home/travel)
• Try getaviate.com
Interactive, conversational voice search
 Parlance EU project
› Complex dialogs within a domain
• Requires complete semantic understanding
 Complete system (mixed license)
› Automated Speech Recognition (ASR)
› Spoken Language Understanding (SLU)
› Interaction Management
› Knowledge Base
› Natural Language Generation (NLG)
› Text-to-Speech (TTS)
Example dialogue
Components of a Spoken Dialog System (SDS)
› User → Recognizer (ASR) → Semantic Decoder → Dialog Control → Message Generator → Synthesizer (TTS) → User
› Representations passed along the pipeline: waveforms, words, dialog acts
› Dialog Control draws on external knowledge (the Web)
› Example turn
• User: "I want to find a restaurant."
• Semantic Decoder: inform(task=find, entity=restaurant)
• Dialog Control: request(food)
• System: "What kind of food would you like?"
› Current systems
• Currently limited domain
• Hand-crafted using rule-based parsers, template generators and flowchart-based dialog control
• Expensive to build and fragile in operation
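Dialog acts such as inform(task=find, entity=restaurant) share a simple act-type(slot=value, …) shape that downstream components can consume as structured data. A sketch of a parser for that notation, assuming the informal syntax shown on the slides (no quoting or nesting) and an optional trailing {score} as used in the statistical system.

```python
import re
from typing import NamedTuple, Optional

class DialogAct(NamedTuple):
    act_type: str
    slots: dict
    score: Optional[float]

# act-type(slot=value, ...) optionally followed by a {score}
_ACT = re.compile(r"^\s*(\w+)\(([^)]*)\)\s*(?:\{([0-9.]+)\})?\s*$")

def parse_act(text: str) -> DialogAct:
    m = _ACT.match(text)
    if m is None:
        raise ValueError(f"not a dialog act: {text!r}")
    act_type, args, score = m.groups()
    slots = {}
    for arg in filter(None, (a.strip() for a in args.split(","))):
        slot, _, value = arg.partition("=")
        slots[slot.strip()] = value.strip() or None  # request(food) names a slot without a value
    return DialogAct(act_type, slots, float(score) if score else None)

for s in ["inform(task=find, entity=restaurant)", "request(food)",
          "inform(food=italian){0.6}", "null(){0.1}"]:
    print(parse_act(s))
```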
A Statistical Spoken Dialogue System
› Modeled as a Partially Observable Markov Decision Process (POMDP)
› ASR evidence: an N-best list for the user utterance "I want an Italian"
• Id like italian {0.4}
• I want an Italian {0.2}
• Id like Indian {0.2}
• In the east {0.1}
› Semantic Decoder (trained with supervised learning) outputs scored dialog acts
• inform(food=italian) {0.6}
• inform(food=indian) {0.2}
• inform(area=east) {0.1}
• null() {0.1}
› Bayesian Belief Network, built from the Ontology, updates the Belief State by belief propagation (e.g. distributions over Food: Ita, Ind, - and Area: N, E, S, W)
› Stochastic Policy, optimised with reinforcement learning against a Reward Function (rewards: success/fail), chooses the next action: confirm(food=italian) and request(area)
› Response Generator and TTS: "You are looking for an Italian restaurant? Whereabouts?"
Semantic Decoding
› Input: "I’m looking for a place to eat – perhaps french."
› Extract features, e.g. frequent N-grams: "I’m looking", "I’m looking for", "for a place", "place to eat", "french"
› Bank of binary classifiers: u-act = request, u-act = inform, entity=restaurant, entity=bar, entity=hotel, food=french, food=chinese, etc.
• Classifier scores shown on the slide: 0.1, 0.6, 0.5, 0.3, 0.0, 0.8, 0.1
› Output: ranked user acts
• inform(entity=restaurant, food=french) {0.5}
• inform(entity=bar, food=french) {0.3}
• …
• inform(entity=restaurant, food=chinese) {0.1}
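The semantic decoder extracts N-gram features, runs one binary classifier per act type or slot-value pair, and combines their outputs into scored user acts. The sketch below mirrors that structure, with hand-set keyword weights standing in for trained classifiers; scoring each candidate act as the product of its parts' probabilities, renormalized over the candidates, is also an assumption for illustration, so the numbers will not match the slide.

```python
import math
import re
from itertools import product

def ngrams(text: str, n_max: int = 3) -> set:
    """All word N-grams up to length n_max (the feature set)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)}

# Hypothetical "classifiers": bias plus weights on indicative N-grams,
# squashed with a logistic function. Trained models would replace these.
CLASSIFIERS = {
    ("entity", "restaurant"): (-1.0, {"place to eat": 2.5, "restaurant": 3.0}),
    ("entity", "bar"): (-1.0, {"drink": 3.0, "bar": 3.0}),
    ("food", "french"): (-2.0, {"french": 4.0}),
    ("food", "chinese"): (-2.0, {"chinese": 4.0}),
}

def classify(features: set) -> dict:
    scores = {}
    for label, (bias, weights) in CLASSIFIERS.items():
        z = bias + sum(w for f, w in weights.items() if f in features)
        scores[label] = 1 / (1 + math.exp(-z))
    return scores

def decode(text: str) -> list:
    """Combine per-slot classifier outputs into ranked inform(...) acts."""
    p = classify(ngrams(text))
    entities = [v for (slot, v) in p if slot == "entity"]
    foods = [v for (slot, v) in p if slot == "food"]
    raw = {f"inform(entity={e}, food={f})": p[("entity", e)] * p[("food", f)]
           for e, f in product(entities, foods)}
    total = sum(raw.values())
    return sorted(((act, s / total) for act, s in raw.items()), key=lambda x: -x[1])

ranked = decode("I'm looking for a place to eat - perhaps french.")
for act, score in ranked:
    print(f"{act} {{{score:.2f}}}")
```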
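Belief State
› The Ontology is compiled into a Bayesian Network:
• task -> find(entity,method,…)
• entity -> restaurant(food, ..)
• entity -> bar(food, ..)
• food = French, Italian, Indian, ..
› For each slot (e.g. entity, food), time slice t contains a Goal node (g_entity, g_food), a User Act node (u_entity, u_food) and an Observation node (o_entity, o_food)
• Goal → User Act models User Behaviour
• User Act → Observation models Recognition/Understanding Errors
› The system action a links time slice t to the next time slice t+1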
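Each turn, the belief state combines the previous belief with a new, uncertain observation. The sketch below reduces this to a single slot (food) and a Bayes-rule update in which the decoder's scores act as the observation likelihood, mixed with a small uniform term for recognition/understanding errors. This is a simplification assumed for illustration: the system on the slides propagates beliefs through a full network with goal, user-act and observation nodes and models how goals change over time.

```python
FOOD_VALUES = ["italian", "indian", "french"]

def update_belief(prior: dict, observation: dict, noise: float = 0.1) -> dict:
    """One Bayes-rule update of the belief over a single slot.

    prior: P(goal = v) for each value v.
    observation: decoder confidence for inform(food=v); unmentioned values get 0.
    noise: assumed mass for recognition/understanding errors, spread uniformly.
    """
    n = len(prior)
    posterior = {}
    for v, p in prior.items():
        likelihood = (1 - noise) * observation.get(v, 0.0) + noise / n
        posterior[v] = likelihood * p
    total = sum(posterior.values())
    return {v: p / total for v, p in posterior.items()}

b0 = {v: 1 / len(FOOD_VALUES) for v in FOOD_VALUES}  # no information yet
b1 = update_belief(b0, {"italian": 0.6, "indian": 0.2})  # turn 1, decoder scores from the slide
b2 = update_belief(b1, {"italian": 0.7})  # turn 2, hypothetical confirmation
print({v: round(p, 3) for v, p in b1.items()})
print({v: round(p, 3) for v, p in b2.items()})
```

Because the first observation is uncertain, "indian" keeps some probability after turn 1; a second consistent observation concentrates the belief on "italian".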
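Choosing the next action – the Policy
› The belief b over the goal nodes (g_entity, g_food; e.g. entity values H, B, R and food values Fr, It, In, -) is mapped by Feature Extraction into a summary belief space
› A Gaussian Process approximates the Q-function, the expected future reward of taking action a in belief state b:
$$Q(b,a) = \mathbb{E}\left[\sum_{\tau=t+1}^{T} r_\tau \,\middle|\, b, a\right]$$
› The next action is either the maximizer $\arg\max_a \{Q(b,a) : a \in A\}$ or is chosen by sampling from the distribution over $\{Q(b,a) : a \in A\}$
› Example: after observing inform(entity=bar) {0.4}, the policy selects select(entity=bar, entity=restaurant)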
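Given estimates of Q(b, a), the policy either takes the action with the highest expected reward or samples, which lets it keep exploring actions whose value is still uncertain. The sketch below assumes the Gaussian Process has already produced a mean and standard deviation of Q for each candidate action in the current summary belief state (the numbers are invented) and implements both selection rules; learning the Gaussian Process itself is out of scope.

```python
import random

# Hypothetical GP posterior over Q(b, a) for the current summary belief b:
# action -> (mean, standard deviation).
Q_POSTERIOR = {
    "confirm(food=italian)": (7.5, 0.5),
    "request(area)": (8.0, 2.0),
    "select(entity=bar, entity=restaurant)": (5.0, 1.0),
}

def greedy_action(q_posterior: dict) -> str:
    """argmax_a Q(b, a) using the posterior mean."""
    return max(q_posterior, key=lambda a: q_posterior[a][0])

def sampled_action(q_posterior: dict, rng: random.Random) -> str:
    """Draw one sample of Q(b, a) per action and act greedily on the samples."""
    samples = {a: rng.gauss(mu, sigma) for a, (mu, sigma) in q_posterior.items()}
    return max(samples, key=samples.get)

rng = random.Random(0)
print(greedy_action(Q_POSTERIOR))
print([sampled_action(Q_POSTERIOR, rng) for _ in range(5)])
```

With a wide standard deviation, an action can win a draw even when another action has a higher mean; that is how sampling trades off exploitation against exploration.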
Large Scale Evaluation – Task Success Rates
Condition | Word Error Rate | Conventional Success Rate | POMDP System Success Rate
Telephone | 21% | 84.6% | 86.9%
Telephone + noise | 30% | 75.2% | 81.2%
In Car | 29% | 67.8% | 75.8%
Success = finding the required information for a restaurant which matches the supplied criteria
Note that users’ perceived success rate was ~10% higher!
Real users, working system
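The POMDP system's advantage is smallest in the telephone condition and larger in the two harder ones. A short calculation over the success rates in the table makes this explicit, reporting the absolute gain in percentage points and the relative reduction in failed dialogues (failure rate = 100% minus success rate).

```python
# (condition, word error rate %, conventional success %, POMDP success %) from the table
RESULTS = [
    ("Telephone", 21, 84.6, 86.9),
    ("Telephone + noise", 30, 75.2, 81.2),
    ("In Car", 29, 67.8, 75.8),
]

summary = []
for condition, wer, conventional, pomdp in RESULTS:
    gain = round(pomdp - conventional, 1)                          # percentage points
    failure_reduction = (pomdp - conventional) / (100 - conventional)  # relative drop in failures
    summary.append((condition, gain, failure_reduction))
    print(f"{condition:18} WER {wer}%  gain {gain:.1f} pts  failures -{failure_reduction:.0%}")
```

This gives gains of 2.3, 6.0 and 8.0 points, that is, roughly 15%, 24% and 25% fewer failed dialogues.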
Scaling up to the Web
We can build a fully statistical spoken dialogue system for a specific narrow domain – but how do we scale up to much broader domains?
› CamInfo Restaurant System
• Domain Ontology
• Hand-crafted input, output, and model parameters
• Crowd-sourced annotators provide data for input/output mapping
• User simulator for policy optimisation
• Corpus data for model parameter estimation
› Personal Assistant
• Domain Ontology
• Wide coverage ontology
• Unsupervised learning
• Fast on-line reinforcement learning with real users
• Corpus data for model parameter estimation
Conclusions
 Semantic Search
› Explicit understanding for queries and documents
through links to external knowledge
• Using methods of Information Extraction or
explicit annotations (markup) in webpages
• Semantic Web as a source of external knowledge
 Increasing level of understanding
› Early focus on entities and their attributes
• Applications in web search: rich results,
entity displays, entity recommendation
› Moving toward modeling intents/actions
› Adding human-like interaction
Q&A
 Many thanks to members of the Semantic Search team
at Yahoo Labs Barcelona and to Yahoos around the world
› Slides on POMDP-based dialogue systems courtesy of Prof. Steve Young, University of Cambridge
 Contact
› pmika@yahoo-inc.com
› @pmika
› http://www.slideshare.net/pmika/
› Ask about our internships and other opportunities

 
The Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j OverviewThe Graph Database Universe: Neo4j Overview
The Graph Database Universe: Neo4j Overview
 
Big Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision MakingBig Data Ecosystem for Data-Driven Decision Making
Big Data Ecosystem for Data-Driven Decision Making
 
Entity Search: The Last Decade and the Next
Entity Search: The Last Decade and the NextEntity Search: The Last Decade and the Next
Entity Search: The Last Decade and the Next
 
YQL:: Select * from Internet
YQL:: Select * from InternetYQL:: Select * from Internet
YQL:: Select * from Internet
 
YQL: Select * from Internet
YQL: Select * from InternetYQL: Select * from Internet
YQL: Select * from Internet
 
Semantic Web and Schema.org
Semantic Web and Schema.orgSemantic Web and Schema.org
Semantic Web and Schema.org
 

More from Peter Mika

Hackathon s pb
Hackathon s pbHackathon s pb
Hackathon s pbPeter Mika
 
Investigating the Semantic Gap through Query Log Analysis
Investigating the Semantic Gap through Query Log AnalysisInvestigating the Semantic Gap through Query Log Analysis
Investigating the Semantic Gap through Query Log AnalysisPeter Mika
 
Making the Web searchable
Making the Web searchableMaking the Web searchable
Making the Web searchablePeter Mika
 
SemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorialSemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorialPeter Mika
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic WebPeter Mika
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011Peter Mika
 
Semantic Search Summer School2009
Semantic Search Summer School2009Semantic Search Summer School2009
Semantic Search Summer School2009Peter Mika
 
Year of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyYear of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyPeter Mika
 
Semantic Web Austin Yahoo
Semantic Web Austin YahooSemantic Web Austin Yahoo
Semantic Web Austin YahooPeter Mika
 

More from Peter Mika (9)

Hackathon s pb
Hackathon s pbHackathon s pb
Hackathon s pb
 
Investigating the Semantic Gap through Query Log Analysis
Investigating the Semantic Gap through Query Log AnalysisInvestigating the Semantic Gap through Query Log Analysis
Investigating the Semantic Gap through Query Log Analysis
 
Making the Web searchable
Making the Web searchableMaking the Web searchable
Making the Web searchable
 
SemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorialSemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorial
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic Web
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011
 
Semantic Search Summer School2009
Semantic Search Summer School2009Semantic Search Summer School2009
Semantic Search Summer School2009
 
Year of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyYear of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkey
 
Semantic Web Austin Yahoo
Semantic Web Austin YahooSemantic Web Austin Yahoo
Semantic Web Austin Yahoo
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Semantic search: from document retrieval to virtual assistants

  • 1. Semantic search: from document retrieval to virtual assistants. Presented by Peter Mika, Sr. Research Scientist, Yahoo Labs | June 19, 2014
  • 2. Agenda  Invite  What is Semantic Search?  Applications to Web search › Enhanced results › Entity retrieval and recommendations  Beyond Web search
  • 3. Yahoo Labs Barcelona  Established January, 2006 › Part of a global network of Labs in Sunnyvale, New York, Barcelona, Haifa, Bangalore, Beijing, Santiago  Led by Ricardo Baeza-Yates  Research areas › Distributed Systems › Semantic Search › Social Media › Web Mining › Web Retrieval
  • 4. Semantic Search Research Jordi Atserias Sr. Research Engineer Roi Blanco Sr. Research Scientist Hugues Bouchard Sr. Research Engineer Peter Mika Sr. Research Scientist Manager Tim Potter Research Engineer Edgar Meij Research Scientist
  • 5. What is Semantic Search?
  • 6. Search is really fast, without necessarily being intelligent
  • 7. Why Semantic Search?  Improvements in IR are harder and harder to come by › Basic relevance models are well established › Machine learning using hundreds of features › Heavy investment in computational power, e.g. real-time indexing and instant search  Remaining challenges are not computational, but in modeling user cognition › Could Watson explain why the answer is Toronto? › Need a deeper understanding of the query, the content and the relationship of the two
  • 8.  Semantic gap › Ambiguity • jaguar • paris hilton › Secondary meaning • george bush (and I mean the beer brewer in Arizona) › Subjectivity • reliable digital camera • paris hilton sexy › Imprecise or overly precise searches • jim hendler  Complex needs › Missing information • brad pitt zombie • florida man with 115 guns • 35 year old computer scientist living in barcelona › Category queries • countries in africa • barcelona nightlife › Transactional or computational queries • 120 dollars in euros • digital camera under 300 dollars • world temperature in 2020  Poorly solved information needs remain. Are there even true keyword queries? Users may have stopped asking them.
  • 10. What it’s like to be a machine? Roi Blanco
  • 11. What it’s like to be a machine? ↵⏏☐ģ ✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓ ţğ★✜ ✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ ≠=⅚©§★✓♪ΒΓΕ℠ ✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ ⏎⌥°¶§ΥΦΦΦ✗✕☐
  • 12.  Def. Semantic Search is any retrieval method where › User intent and resources are represented in a semantic model • A set of concepts or topics that generalize over tokens/phrases • Additional structure such as a hierarchy among concepts, relationships among concepts etc. › Semantic representations of the query and the user intent are exploited in some part of the retrieval process  As a research field › Workshops • ESAIR (2008-2014) at CIKM, Semantic Search (SemSearch) workshop series (2008-2011) at ESWC/WWW, EOS workshop (2010-2011) at SIGIR, JIWES workshop (2012) at SIGIR, Semantic Search Workshop (2011-2014) at VLDB › Special issues of journals › Surveys • Christos L. Koumenides, Nigel R. Shadbolt: Ranking methods for entity-oriented semantic web search. JASIST 65(6): 1091-1106 (2014)
  • 13. Semantic models: implicit vs. explicit  Implicit/internal semantics › Models of text extracted from a corpus of queries, documents or interaction logs • Query reformulation, term dependency models, translation models, topic models, latent space models, learning to match (PLS) › See • Hang Li and Jun Xu: Semantic Matching in Search. Foundations and Trends in Information Retrieval Vol 7 Issue 5, 2013, pp 343-469  Explicit/external semantics › Explicit linguistic or ontological structures extracted from text and linked to external knowledge › Obtained using IE techniques or acquired from Semantic Web markup
  • 14. Semantic Search – a process view  Query Construction • Keywords • Forms • NL • Formal language  Query Processing • IR-style matching & ranking • DB-style precise matching • KB-style matching & inferences  Result Presentation • Query visualization • Document and data presentation • Summarization  Query Refinement • Implicit feedback • Explicit feedback • Incentives  Underlying layers: Document Representation, Knowledge Representation, Semantic Models, Resources, Documents
  • 15. What it’s like to be a machine? ↵⏏☐ģ ✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓ ţğ★✜ ✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ ≠=⅚©§★✓♪ΒΓΕ℠ ✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ ⏎⌥°¶§ΥΦΦΦ✗✕☐
  • 16. What it’s like to be a machine? <roi>↵⏏☐ģ</roi> ✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓ ţğ★✜ ✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ ≠=⅚©§★✓♪ΒΓΕ℠ ✖Γ♫⅜±<roi>⏎↵⏏☐ģ</roi>ğğğμλκσςτ ⏎⌥°¶§ΥΦΦΦ✗✕☐ <roi>
  • 17. Information Extraction  Documents › Natural language • Named Entity Recognition & Disambiguation (“entity linking”) • Deep parsing (dependency parsing) › Specific to the Web • Extraction from web tables, wrapper induction etc. • Open Information Extraction such as NELL, ReVerb etc.  Queries › Short text and no structure… nothing to do?
  • 18. Information Extraction on queries  Entities play an important role › ~70% of queries contain a named entity (entity mention queries) and ~50% of queries have an entity focus (entity seeking queries) • brad pitt attacked by fans › ~10% of queries are looking for a class of entities • brad pitt movies › See • Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW 2010: 771-780 • Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects: actions for entity-centric search. WWW 2012: 589-598
  • 19. Information Extraction on queries  Common structure to entity mention queries: query = <entity> + <intent> › Intent is typically an additional word or phrase to • Disambiguate, e.g. brad pitt actor • Specify action or aspect e.g. brad pitt net worth, brad pitt download  Useful also in off-line query log analysis › Reduce the sparsity of query log data by mapping entities and intents to a reference base of entities and intents
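The <entity> + <intent> decomposition from slide 19 can be illustrated with a minimal sketch that assumes a tiny hand-written entity dictionary (`KNOWN_ENTITIES` and `segment_query` are hypothetical names, not part of any Yahoo system). It uses longest dictionary match, a much simpler stand-in for real entity linking, and treats the words left over after removing the entity mention as the intent.

```python
# Hypothetical toy dictionary standing in for a real entity linker / knowledge base.
KNOWN_ENTITIES = {
    "brad pitt": "Actor",
    "moneyball": "Movie",
    "oakland as": "Sports team",
}


def segment_query(query, entities=KNOWN_ENTITIES):
    """Split a query into (entity, entity_type, intent) by longest dictionary match.

    Returns None when no known entity is mentioned.
    """
    tokens = query.lower().split()
    best = None  # (start, end) token offsets of the longest match so far
    for i in range(len(tokens)):
        # Scan from the longest span starting at i down to a single token.
        for j in range(len(tokens), i, -1):
            if " ".join(tokens[i:j]) in entities:
                if best is None or j - i > best[1] - best[0]:
                    best = (i, j)
                break
    if best is None:
        return None
    i, j = best
    entity = " ".join(tokens[i:j])
    intent = " ".join(tokens[:i] + tokens[j:])  # the remaining words form the intent
    return entity, entities[entity], intent


print(segment_query("brad pitt net worth"))  # ('brad pitt', 'Actor', 'net worth')
print(segment_query("countries in africa"))  # None
```

Grouping queries by (entity type, intent) instead of by raw string is one way to get the sparsity reduction the slide describes for off-line log analysis.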
  • 20. Patterns in logs are hard to see  Sample of sessions from June, 2011 containing the term "moneyball" › What are users trying to do? [Figure: raw session log mixing queries such as "oakland as", "bradd pitt movie", "moneyball trailer", "money ball movie", "brad pitt moneyball oscar" and "map of africa" with clicked URLs such as movies.yahoo.com, en.wikipedia.org, www.imdb.com and moneyball.movie-trailer.com]
  • 21. Semantic annotations help to generalize… [Figure: the queries "oakland as", "bradd pitt movie" and "moneyball trailer" annotated with the types Sports team, Actor and Movie]
  • 22. … and understand user needs [Figure: in "moneyball trailer", "moneyball" is annotated as Movie, the object of the query, and "trailer" as what the user wants to do with it]
  • 23. Information extraction on queries  Entity linking › Tutorial: Entity Linking and Retrieval by Edgar Meij, Krisztián Balog and Daan Odijk › Dataset for evaluation of entity linking (2013) • Yahoo Webscope dataset L24 - Yahoo Search Query Log To Entities, version 1.0  Semantic annotation for query log analysis › Frequent pattern mining on raw queries fails due to the large amount of noise › Meaningful patterns start to emerge when mining the semantic annotations instead › Laura Hollink, Peter Mika, Roi Blanco: Web usage mining with semantic analysis. WWW 2013: 561-570
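Why annotations help can be seen even with plain counting. The sketch below is an assumption-laden simplification of the idea on slide 23, not the frequent pattern mining used by Hollink et al.: it replaces known entity mentions (a hypothetical `ANNOTATIONS` table, including misspellings seen in the log sample) with their type and counts the resulting patterns. Every raw query in the invented toy log is unique, but after annotation "<Movie> trailer" appears twice.

```python
from collections import Counter

# Hypothetical annotations: surface string -> entity type (stand-in for an entity linker).
ANNOTATIONS = {
    "moneyball": "Movie",
    "money ball": "Movie",
    "captain america": "Movie",
    "brad pitt": "Actor",
    "bradd pitt": "Actor",
}


def annotate(query):
    """Replace known entity mentions with their type, e.g. 'moneyball trailer' -> '<Movie> trailer'."""
    q = query.lower()
    # Replace longer mentions first so a longer surface form wins over a shorter one.
    for mention in sorted(ANNOTATIONS, key=len, reverse=True):
        q = q.replace(mention, f"<{ANNOTATIONS[mention]}>")
    return q


# Invented toy log, for illustration only.
queries = [
    "moneyball trailer", "money ball movie trailer", "captain america trailer",
    "brad pitt news", "bradd pitt movie", "moneyball",
]

raw_counts = Counter(queries)                            # every raw query occurs once
pattern_counts = Counter(annotate(q) for q in queries)   # type-level patterns
print(pattern_counts.most_common(1))  # [('<Movie> trailer', 2)]
```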
  • 24. Semantic Web  Significant extension of the Web stack › Languages for publishing raw data and document annotations › Standards for querying, validating and reasoning with data distributed across the Web  Research community formed around 2001 › ISWC, ESWC, WWW Semantic Web Track, JWS  Conflicted history with Information Retrieval › Misplaced expectations as to what the Semantic Web will bring › Building the chicken farm before any chickens or eggs  Since 2007 more solid progress in adoption › Metadata in HTML › Public and private ‘Knowledge Graphs’
  • 25. Metadata in HTML: schema.org  Agreement on a shared set of schemas for common types of web content › Bing, Google, and Yahoo! as initial founders (June, 2011), joined by Yandex later › Similar in intent to sitemaps.org • Use a single format to communicate the same information to all three search engines

    <div vocab="http://schema.org/" typeof="Movie">
      <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
      <span property="description">Jack Sparrow and Barbossa embark on a quest to find the elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
      Director:
      <div property="director" typeof="Person">
        <span property="name">Rob Marshall</span>
      </div>
    </div>
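To show what a consumer of this markup recovers, here is a minimal sketch using only Python's standard `html.parser`. It handles just the `typeof`/`property` nesting in the example above; it is not a conforming RDFa processor (it ignores `vocab` switching and the `prefix`, `resource` and `content` attributes, and assumes every element is explicitly closed). `RDFaLiteExtractor` and `extract_rdfa_lite` are illustrative names.

```python
from html.parser import HTMLParser


class RDFaLiteExtractor(HTMLParser):
    """Minimal reader for the typeof/property subset of RDFa Lite (not a full processor)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one frame per open element
        self.items = []  # typed items that are not the value of some property

    def _current_item(self):
        # The nearest enclosing element that declared a typeof owns new properties.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._current_item()
        item = {"@type": attrs["typeof"]} if "typeof" in attrs else None
        prop = attrs.get("property")
        if item is not None and prop is None:
            self.items.append(item)
        self.stack.append({"item": item, "prop": prop, "parent": parent, "text": []})

    def handle_data(self, data):
        # Collect text only for literal-valued properties that are still open.
        for frame in self.stack:
            if frame["prop"] and frame["item"] is None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        frame = self.stack.pop()
        if frame["prop"] and frame["parent"] is not None:
            if frame["item"] is not None:
                value = frame["item"]  # nested typed item, e.g. director -> Person
            else:
                value = " ".join("".join(frame["text"]).split())  # whitespace-normalised text
            frame["parent"][frame["prop"]] = value


def extract_rdfa_lite(html):
    parser = RDFaLiteExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to find the
  elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  Director:
  <div property="director" typeof="Person">
    <span property="name">Rob Marshall</span>
  </div>
</div>
"""

movie = extract_rdfa_lite(SAMPLE)[0]
print(movie["director"])  # {'@type': 'Person', 'name': 'Rob Marshall'}
```

Because the output is already keyed by schema.org property names, any search engine reading the page gets the same structured record, which is the point of agreeing on a single vocabulary.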
  • 26. Substantial adoption of schema.org markup  Over 15% of all pages now have schema.org markup  Over 5 million sites, over 25 billion entity references  In other words: same order of magnitude as the web › Source: R.V. Guha: Light at the end of the tunnel, ISWC 2013 keynote  See also › P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012 • Based on Bing US corpus • 31% of webpages, 5% of domains contain some metadata (including Facebook’s OGP) › WebDataCommons • Based on CommonCrawl Nov 2013 • 26% of webpages, 14% of domains contain some metadata (including Facebook’s OGP)
  • 27. Knowledge Graphs  Linked (Open) Data (linkeddata.org) › Public movement for making open/public databases • available in standard Semantic Web formats • interlinking them › DBpedia is a central hub in this network of datasets • Software framework to extract structured data from Wikipedia and consolidate it under a common ontology • The resulting dataset, which contains links to Freebase and others (Freebase links to IMDB and so on)  Basis for private Knowledge Graphs › Bing, Google, Yahoo
  • 28. Yahoo’s Knowledge Graph [Figure: example entity graph connecting Chicago Cubs, Chicago, Barack Obama, Carlos Zambrano, Brad Pitt, Angelina Jolie, Steven Soderbergh, George Clooney, Ocean’s Twelve, Fight Club and Dust Brothers through relations such as plays for, plays in, lives in, partner, directs, casts in, takes place in and music by, plus an offer "10% off tickets for"] Nicolas Torzec: Making knowledge reusable at Yahoo!: a Look at the Yahoo! Knowledge Base (SemTech 2013)
  • 29. Building Yahoo’s Knowledge Graph  Ontology building and maintenance › Editorially maintained OWL ontology with 300+ classes › Covering the domains of interest of Yahoo  Information extraction › Public datasets and proprietary data  Data fusion › Manual mapping from the source schemas to the ontology › Supervised entity reconciliation • Kedar Bellare, Carlo Curino, Ashwin Machanavajjhala, Peter Mika, Mandar Rahurkar, Aamod Sane: WOO: A Scalable and Multi-tenant Platform for Continuous Knowledge Base Synthesis. PVLDB 2013 • Michael J. Welch, Aamod Sane, Chris Drome: Fast and accurate incremental entity resolution relative to an entity knowledge base. CIKM 2012 › Editorial curation and quality assessment
  • 33. Applications in Web Search
  • 34. Semantic Search for…  Improving ad-hoc document retrieval › Query composition › Result presentation › Matching › Ranking  Providing new search functionality › Entity retrieval • Related entity recommendation › Personalization › Question-answering › Task completion
  • 35. Exploiting Semantic Web markup (internal prototype, 2007) [Figure annotations: personal and private homepage of the same person (clear from the snippet, but it could also be automatically de-duplicated); conferences he plans to attend and his vacations from his homepage, plus bio events from LinkedIn; geolocation]
  • 36. Search snippets using Semantic Web markup  Summarization of HTML is a hard task • Template detection • Selecting relevant snippets • Composing readable text › Efficiency constraints  Yahoo SearchMonkey (2008) › Enhanced results using structured data from the page • Key/value pairs • Deep links • Image or Video
  • 37. Effectiveness of enhanced results  Explicit user feedback › Side-by-side editorial evaluation (A/B testing) • Editors are shown a traditional search result and an enhanced result for the same page • Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)  Implicit user feedback › Click-through rate analysis • Long dwell time limit of 100s (Ciemiewicz et al. 2010) • 15% increase in ‘good’ clicks › User interaction model • Enhanced results lead users to relevant documents, even though they are less likely to be clicked than textual results • Enhanced results effectively reduce bad clicks!  See › Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011: 725-734
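The implicit-feedback analysis on slide 37 rests on labelling clicks by dwell time. A minimal sketch, assuming a click counts as "good" when dwell time is at least the 100 s limit quoted on the slide and as "bad" otherwise (the exact rule in the cited work may differ), shows how a result type can have a lower click-through rate yet fewer bad clicks. The log below is invented toy data, not the study's results.

```python
from dataclasses import dataclass

GOOD_DWELL_SECONDS = 100  # dwell-time limit quoted on the slide; ">=" is an assumption


@dataclass
class Impression:
    result_type: str            # hypothetical labels: "enhanced" or "textual"
    clicked: bool
    dwell_seconds: float = 0.0  # time on the landing page; ignored if not clicked


def click_stats(impressions, result_type):
    """CTR, 'good' CTR (long dwell) and the share of clicks that were short ('bad')."""
    shown = [i for i in impressions if i.result_type == result_type]
    clicks = [i for i in shown if i.clicked]
    good = [i for i in clicks if i.dwell_seconds >= GOOD_DWELL_SECONDS]
    return {
        "ctr": len(clicks) / len(shown) if shown else 0.0,
        "good_ctr": len(good) / len(shown) if shown else 0.0,
        "bad_click_share": (len(clicks) - len(good)) / len(clicks) if clicks else 0.0,
    }


# Invented toy log, for illustration only.
log = [
    Impression("enhanced", True, 150.0),
    Impression("enhanced", False),
    Impression("textual", True, 20.0),
    Impression("textual", True, 120.0),
]
print(click_stats(log, "enhanced"))  # {'ctr': 0.5, 'good_ctr': 0.5, 'bad_click_share': 0.0}
print(click_stats(log, "textual"))   # {'ctr': 1.0, 'good_ctr': 0.5, 'bad_click_share': 0.5}
```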
  • 38. Enhanced results at other search providers  Google announces Rich Snippets - June, 2009 › Faceted search for recipes - Feb, 2011  Bing tiles – Feb, 2011  Facebook’s Like button and the Open Graph Protocol (2010) › Shows up in profiles and news feed › Site owners can later reach users who have liked an object
  • 39. Moving beyond entity markup  We would like to help our users in task completion › But we have trained our users to talk in nouns • Retrieval performance decreases by adding verbs to queries › Markup for actions/intents could potentially help  Modeling actions › Understand what actions can be taken on a page › Help users in mapping their query to potential actions › Applications in web search, email etc.  Schema.org v1.2 including Actions vocabulary published April 16, 2014
  • 40. Applications of Actions markup: email (Gmail), SERP (Yandex)
  • 41. Entity displays in web search  Entity retrieval › Which entity does a keyword query refer to, if any?  Related entities for navigation › Which entity would the user visit next?
  • 42. Entity Retrieval  Keyword search over entity graphs › see Pound et al. WWW 2010 for a definition › No common benchmark until 2010  SemSearch Challenge 2010/2011 • 50 entity-mention queries selected from the Search Query Tiny Sample v1.0 dataset (Yahoo! Webscope) • Billion Triples Challenge 2009 data set • Evaluation using Mechanical Turk › See report: • Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Thanh Tran: Repeatable and reliable semantic search evaluation. J. Web Sem. 21: 14-29 (2013)
  • 43. Glimmer: open-source entity retrieval engine from Yahoo  Extension of MG4J from University of Milano  Indexing of RDF data › MapReduce-based › Horizontal indexing (subject/predicate/object fields) › Vertical indexing (one field per predicate)  Retrieval › BM25F with machine-learned weights for properties and domains › 52% improvement over the best system in SemSearch 2010  See › Roi Blanco, Peter Mika, Sebastiano Vigna: Effective and Efficient Entity Search in RDF Data. International Semantic Web Conference (1) 2011: 83-97 › https://github.com/yahoo/Glimmer/
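Glimmer's ranking function is BM25F over per-property fields. The sketch below implements one common BM25F formulation: per-field term frequencies are length-normalised with a field-specific `b`, weighted, summed into a single pseudo-frequency and saturated once with `k1`. Assumptions to note: the field weights are hand-set here rather than machine-learned as on the slide, the `+ 1` inside the IDF logarithm is one of several IDF variants, and the entities and property names are invented for illustration. It is not Glimmer's code.

```python
import math


def bm25f_scores(query, docs, weights, b, k1=1.2):
    """Score each entity (a dict: field name -> text) against a keyword query with BM25F."""
    n = len(docs)
    tokenized = [{f: text.lower().split() for f, text in d.items()} for d in docs]
    # Average length of each weighted field across the collection (guard against 0).
    avg_len = {f: (sum(len(d.get(f, [])) for d in tokenized) / n) or 1.0 for f in weights}
    terms = query.lower().split()
    # Document frequency: number of entities containing the term in any field.
    df = {t: sum(any(t in toks for toks in d.values()) for d in tokenized) for t in set(terms)}
    scores = []
    for d in tokenized:
        score = 0.0
        for t in terms:
            pseudo_tf = 0.0
            for f, w in weights.items():
                toks = d.get(f, [])
                tf = toks.count(t)
                if tf:
                    norm = 1 - b[f] + b[f] * len(toks) / avg_len[f]
                    pseudo_tf += w * tf / norm
            if pseudo_tf:
                idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
                score += idf * pseudo_tf / (k1 + pseudo_tf)  # saturate once per term
        scores.append(score)
    return scores


# Invented entities with one field per property ("vertical" layout).
entities = [
    {"name": "Brad Pitt", "abstract": "American actor and film producer"},
    {"name": "Moneyball", "abstract": "2011 film starring Brad Pitt"},
    {"name": "Oakland Athletics", "abstract": "baseball team in Oakland"},
]
weights = {"name": 3.0, "abstract": 1.0}  # hypothetical hand-set property weights
b = {"name": 0.5, "abstract": 0.75}
print(bm25f_scores("brad pitt", entities, weights, b))
```

With these weights the entity whose name matches the query outranks the one that only mentions it in the abstract, which is the kind of behaviour learned property weights are meant to capture.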
  • 44. Other evaluations in Entity Retrieval  TREC Entity Track › 2009-2011 › Data • ClueWeb 09 collection › Queries • Related Entity Finding – Entities related to a given entity through a particular relationship – (Homepages of) airlines that fly Boeing 747 • Entity List Completion – Given some elements of a list of entities, complete the list  Professional sports teams in Philadelphia such as the Philadelphia Wings, … › Relevance assessments provided by TREC assessors  Question Answering over Linked Data › 2011-2014 › Data • DBpedia and MusicBrainz in RDF › Queries • Full natural language questions of different forms, written by the organizers • Multi-lingual • Give me all actors starring in Batman Begins › Results are defined by an equivalent SPARQL query • Systems are free to return a list of results or a SPARQL query
  • 45. Related entity recommendations Related entities
  • 47. Spark(le) system for related entity recommendations
    1. Knowledge Graph
       › Filtering and enrichment
    2. Feature extraction
       › Query logs, Flickr, Twitter
    3. MLR (machine-learned ranking)
    4. Online/offline evaluation
       › Point-wise assessments
       › Side-by-side testing
       › Online evaluation
    5. Runtime
    ▪ Unary features
      • Popularity features from text: probability, entropy, Wiki entity popularity, …
      • Graph features: PageRank on the entity graph, Wikipedia, Web graph
      • Type features: entity type
    ▪ Binary features
      • Co-occurrence features from text: conditional probability, joint probability, …
      • Graph features: common neighbors, …
      • Type features: relation type
    Roi Blanco, B. Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. ISWC 2013
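A few of the unary and binary features listed above can be computed on toy data. This sketch assumes each "session" is the set of entities mentioned together in one query-log session, tweet or Flickr photo, and uses textbook definitions of joint probability, conditional probability and common neighbours. The sessions, the graph and the feature names are illustrative and only approximate the features Spark actually feeds to its ranker.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sessions):
    """Count entity occurrences and unordered entity-pair co-occurrences."""
    single, pair = Counter(), Counter()
    for session in sessions:
        entities = set(session)
        single.update(entities)
        pair.update(frozenset(p) for p in combinations(sorted(entities), 2))
    return single, pair


def pair_features(e1, e2, sessions, graph):
    """A few unary/binary features for ranking e2 as a recommendation for e1."""
    single, pair = cooccurrence_counts(sessions)
    n = len(sessions)
    p_e1 = single[e1] / n
    joint = pair[frozenset((e1, e2))] / n
    return {
        "prob_e2": single[e2] / n,                        # unary: popularity of e2
        "joint_prob": joint,                              # binary: P(e1, e2)
        "cond_prob": joint / p_e1 if p_e1 else 0.0,       # binary: P(e2 | e1)
        "common_neighbors": len(graph.get(e1, set()) & graph.get(e2, set())),
    }


# Made-up sessions; graph neighbours are a tiny slice of a knowledge graph.
sessions = [
    {"Brad Pitt", "Angelina Jolie"},
    {"Brad Pitt", "World War Z"},
    {"Brad Pitt", "Angelina Jolie", "Jennifer Aniston"},
    {"Angelina Jolie"},
]
graph = {
    "Brad Pitt": {"Angelina Jolie", "World War Z", "Mr. & Mrs. Smith"},
    "Angelina Jolie": {"Brad Pitt", "Mr. & Mrs. Smith", "Maleficent"},
}
print(pair_features("Brad Pitt", "Angelina Jolie", sessions, graph))
```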
  • 49. Mobile search on the rise
    ▪ Information access on-the-go requires hands-free operation
      › Driving, walking, gym, etc.
        • Americans spend 540 hours a year in their cars [1] vs. 348 hours browsing the Web [2]
    ▪ ~50% of queries are coming from mobile devices (and growing)
      › Changing habits, e.g. iPad usage peaks before bedtime
      › Limitations in input/output
    [1] http://answers.google.com/answers/threadview?id=392456
    [2] http://articles.latimes.com/2012/jun/22/business/la-fi-tn-top-us-brands-news-web-sites-20120622
  • 50. Mobile search: challenges and opportunities
    ▪ Interaction
      › Question answering
      › Support for interactive retrieval
      › Spoken-language access
      › Task completion
    ▪ Contextualization
      › Personalization
      › Geo
      › Context (work/home/travel)
        • Try getaviate.com
  • 51. Interactive, conversational voice search
    ▪ Parlance EU project
      › Complex dialogs within a domain
        • Requires complete semantic understanding
    ▪ Complete system (mixed license)
      › Automatic Speech Recognition (ASR)
      › Spoken Language Understanding (SLU)
      › Interaction Management
      › Knowledge Base
      › Natural Language Generation (NLG)
      › Text-to-Speech (TTS)
  • 53. Components of a Spoken Dialog System (SDS)
    [Diagram: User → waveforms → Recognizer (ASR) → words → Semantic Decoder → dialog acts → Dialog Control (connected to the Web) → dialog acts → Message Generator → words → Synthesizer (TTS) → waveforms → User. Example turn: "I want to find a restaurant." → inform(task=find, entity=restaurant); system reply request(food) → "What kind of food would you like?"]
    • Currently limited domain
    • Hand-crafted using rule-based parsers, template generators and flowchart-based dialog control
    • Expensive to build and fragile in operation
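To make the conventional architecture concrete, here is a minimal text-only sketch of the hand-crafted approach: a rule-based semantic decoder, a flowchart-style dialog controller and a template message generator, passing dialog acts between them. ASR and TTS are left out (the sketch reads and writes text), and the rules, the `REQUIRED` slot order, the `search` act and the templates are all hypothetical, built only around the slide's example `inform(task=find, entity=restaurant)` → `request(food)`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DialogAct:
    act: str                                   # e.g. "inform", "request"
    slots: dict = field(default_factory=dict)  # slot -> value, or None for a bare slot name

    def __str__(self):
        args = [k if v is None else f"{k}={v}" for k, v in self.slots.items()]
        return f"{self.act}({', '.join(args)})"


# Hypothetical hand-written parser rules: pattern -> (slot, value); None = use the matched word.
RULES = [
    (r"\b(?:find|looking for)\b", ("task", "find")),
    (r"\b(?:restaurant|place to eat)\b", ("entity", "restaurant")),
    (r"\bbar\b", ("entity", "bar")),
    (r"\b(french|italian|indian|chinese)\b", ("food", None)),
    (r"\b(north|south|east|west|centre)\b", ("area", None)),
]

# Flowchart-style control: ask for missing slots in a fixed order.
REQUIRED = ["entity", "food", "area"]

# Template generator: one canned sentence per requested slot.
TEMPLATES = {
    "entity": "What are you looking for?",
    "food": "What kind of food would you like?",
    "area": "Which part of town?",
}


def decode(utterance):
    """Rule-based semantic decoder: words -> a single inform() act."""
    text = utterance.lower()
    slots = {}
    for pattern, (slot, value) in RULES:
        match = re.search(pattern, text)
        if match:
            slots[slot] = value if value is not None else match.group(1)
    return DialogAct("inform", slots)


def control(state, user_act):
    """Update the dialog state and pick the next system act deterministically."""
    state.update(user_act.slots)
    for slot in REQUIRED:
        if slot not in state:
            return DialogAct("request", {slot: None})
    return DialogAct("search", {slot: state[slot] for slot in REQUIRED})


def generate(system_act):
    """Template-based message generator."""
    if system_act.act == "request":
        return TEMPLATES[next(iter(system_act.slots))]
    s = system_act.slots
    return f"Searching for {s['food']} {s['entity']}s in the {s['area']}."


state = {}
for user_turn in ["I want to find a restaurant.", "Italian, in the east."]:
    user_act = decode(user_turn)
    system_act = control(state, user_act)
    print(user_act, "->", system_act, "->", generate(system_act))
```

Every behaviour here lives in hand-edited tables and branches, which is why the slide calls such systems expensive to build and fragile: an unanticipated phrasing simply falls through the rules.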
  • 54. A Statistical Spoken Dialogue System
    [Diagram: a Partially Observable Markov Decision Process (POMDP).
     User: "I want an Italian"
     → ASR evidence (N-best): I'd like italian {0.4}, I want an Italian {0.2}, I'd like Indian {0.2}, In the east {0.1}
     → Semantic Decoder (supervised learning): inform(food=italian) {0.6}, inform(food=indian) {0.2}, inform(area=east) {0.1}, null() {0.1}
     → Belief State: Bayesian Belief Network built from the Ontology, updated by belief propagation (distributions over Food: Ita/Ind/- and Area: N/E/S/W)
     → Stochastic Policy, optimised by reinforcement learning from a Reward Function (rewards: success/fail)
     → Action: confirm(food=italian) request(area)
     → Response Generator → TTS: "You are looking for an Italian restaurant? Whereabouts?"]
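The belief state keeps a probability distribution over dialogue states instead of a single best guess. For reference, the standard textbook POMDP belief update after system action $a$ and observation $o$ is shown below, with $\eta$ a normalising constant; the system on the slide computes it approximately by belief propagation in a Bayesian network rather than by enumerating every state $s$.

```latex
b'(s') = \eta \, P(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s),
\qquad
\eta = \Bigl( \sum_{s'} P(o \mid s', a) \sum_{s} P(s' \mid s, a)\, b(s) \Bigr)^{-1}
```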
  • 55. Semantic Decoding
    [Diagram: utterance "I’m looking for a place to eat – perhaps french."
     → extract features, e.g. frequent N-grams: "I’m looking", "I’m looking for", "for a place", "place to eat", "french"
     → bank of binary classifiers, one per concept: u-act=request 0.1, u-act=inform 0.6, entity=restaurant 0.5, entity=bar 0.3, entity=hotel 0.0, food=french 0.8, food=chinese 0.1, etc.
     → user acts: inform(entity=restaurant, food=french) {0.5}, inform(entity=bar, food=french) {0.3}, …, inform(entity=restaurant, food=chinese) {0.1}]
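The decoding pipeline (n-gram features, a bank of binary classifiers, then ranked dialog-act hypotheses) can be sketched as follows. The classifier weights in `CLASSIFIERS` are hypothetical hand-set numbers standing in for trained logistic-regression models. The way classifier outputs are combined (renormalise within each slot, then multiply across slots) is also an assumption, so the scores will not reproduce the slide's {0.5}, {0.3}, {0.1}.

```python
import math
import re
from itertools import product


def ngram_features(utterance, max_n=3):
    """All word n-grams of length 1..max_n in a lowercased utterance."""
    text = utterance.lower().replace("\u2019", "'")   # normalise curly apostrophes
    tokens = re.findall(r"[a-z']+", text)
    return {" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical bank of binary logistic classifiers: (slot, value) -> (bias, n-gram weights).
# A real decoder would learn these weights from annotated utterances.
CLASSIFIERS = {
    ("u-act", "inform"): (-1.0, {"looking for": 2.0, "perhaps": 1.0}),
    ("u-act", "request"): (-2.0, {"what": 2.0, "phone": 2.0}),
    ("entity", "restaurant"): (-1.5, {"place to eat": 3.0, "restaurant": 3.0}),
    ("entity", "bar"): (-1.5, {"drink": 3.0, "bar": 3.0}),
    ("food", "french"): (-2.0, {"french": 4.0}),
    ("food", "chinese"): (-2.0, {"chinese": 4.0}),
}


def decode(utterance, top_k=3):
    features = ngram_features(utterance)
    # 1. Run every binary classifier on the same n-gram features.
    scores = {}
    for (slot, value), (bias, weights) in CLASSIFIERS.items():
        z = bias + sum(w for ngram, w in weights.items() if ngram in features)
        scores.setdefault(slot, {})[value] = sigmoid(z)
    # 2. Assumption: values of one slot are exclusive, so renormalise per slot.
    dists = {slot: {v: p / sum(vals.values()) for v, p in vals.items()}
             for slot, vals in scores.items()}
    # 3. Score each full hypothesis as the product of its per-slot probabilities.
    hyps = []
    for act, entity, food in product(dists["u-act"], dists["entity"], dists["food"]):
        p = dists["u-act"][act] * dists["entity"][entity] * dists["food"][food]
        hyps.append((f"{act}(entity={entity}, food={food})", p))
    hyps.sort(key=lambda h: h[1], reverse=True)
    return hyps[:top_k]


for act, p in decode("I’m looking for a place to eat – perhaps french."):
    print(f"{act} {{{p:.2f}}}")
```

The useful property is that the decoder outputs a ranked, scored list rather than one parse, which is exactly what the belief state downstream consumes.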
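  • 56. Belief State
    [Diagram: Bayesian network compiled from the ontology. For each concept (entity, food, …) at time t: a goal node (g_entity, g_food), a user-act node (u_entity, u_food) linked to the goal by a model of user behaviour, and an observation node (o_entity, o_food) linked to the user act by a model of recognition/understanding errors; the goals and the system action a carry over to the next time slice t+1.
     Ontology: task -> find(entity, method, …); entity -> restaurant(food, ..); entity -> bar(food, ..); food = French, Italian, Indian, ..]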
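For a single slot, the goal → user act → observation chain can be updated by direct enumeration. This minimal sketch assumes a "goal rarely changes" transition (`change_prob`), a user who either states their goal or says nothing about the slot (`p_inform`), and decoder confidences used directly as observation likelihoods. All of these are illustrative values, not the real system's learned parameters, and a full system also runs belief propagation across slots (e.g. food only applies to some entity types). The observation reuses the previous slide's decoder output, projected onto the food slot.

```python
def user_act_prob(u, g, p_inform=0.8):
    """P(u | g) for one slot. u is a value, or None if the user says nothing about the slot;
    g is the user's goal value, or None if they have no constraint on this slot yet."""
    if g is None:
        return 1.0 if u is None else 0.0
    if u is None:
        return 1.0 - p_inform
    return p_inform if u == g else 0.0


def update_slot_belief(belief, observation, change_prob=0.05, p_inform=0.8):
    """One time step for a single slot: predict the goal, then condition on the observation.
    belief: goal -> probability; observation: user act -> decoder confidence."""
    goals = list(belief)
    k = len(goals)
    stay = 1.0 - change_prob if k > 1 else 1.0
    move = change_prob / (k - 1) if k > 1 else 0.0
    # Prediction: the goal persists with probability `stay`, otherwise moves uniformly.
    predicted = {g_new: sum(belief[g] * (stay if g == g_new else move) for g in goals)
                 for g_new in goals}
    # Correction: sum over user acts, weighting each by its observation likelihood.
    unnorm = {g: predicted[g] * sum(conf * user_act_prob(u, g, p_inform)
                                    for u, conf in observation.items())
              for g in goals}
    total = sum(unnorm.values())
    return {g: p / total for g, p in unnorm.items()}


# Food slot: inform(area=east) 0.1 and null() 0.1 say nothing about food -> None 0.2.
prior = {None: 0.4, "italian": 0.2, "indian": 0.2, "french": 0.2}
observation = {"italian": 0.6, "indian": 0.2, None: 0.2}
posterior = update_slot_belief(prior, observation)
print({g: round(p, 3) for g, p in posterior.items()})
```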
  • 57. Choosing the next action – the Policy
    [Diagram: belief b over g_entity (H, B, R: hotel, bar, restaurant) and g_food (Fr, It, In, -), e.g. after observing inform(entity=bar) {0.4}
     → feature extraction into a summary belief space
     → Gaussian Process Q-function approximation, where $Q(b,a) = \mathbb{E}\left[\sum_{\tau=t+1}^{T} r_\tau \mid b, a\right]$
     → sample $\{Q(b,a) : a \in A\}$ and take $\arg\max_a$, e.g. select(entity=bar, entity=restaurant)]
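The idea of a Gaussian-process Q-function with sampled action selection can be sketched in plain Python. This is a simplification, not the algorithm used in the actual system: a plain GP regression on observed returns stands in for the temporal-difference GP learning a real system would use. The two-number summary belief (top entity probability, top food probability), the action set, the kernel (squared-exponential on belief features times a delta kernel on actions), the hyperparameters and the training returns are all assumptions chosen for illustration. Passing a random generator draws one sample of $Q(b,a)$ per action from the GP posterior and takes the argmax, which explores where the model is uncertain; omitting it acts greedily on the posterior mean.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPQFunction:
    """GP regression of observed returns on (summary belief, action) pairs."""

    def __init__(self, data, length_scale=0.3, noise=0.1):
        self.data = data                  # list of (belief_features, action, observed_return)
        self.length_scale = length_scale
        n = len(data)
        K = [[self.kernel(data[i][0], data[i][1], data[j][0], data[j][1]) for j in range(n)]
             for i in range(n)]
        for i in range(n):
            K[i][i] += noise ** 2         # observation noise on the returns
        self.L = cholesky(K)
        self.alpha = cho_solve(self.L, [r for _, _, r in data])

    def kernel(self, b1, a1, b2, a2):
        """Squared-exponential kernel on belief features times a delta kernel on actions."""
        if a1 != a2:
            return 0.0
        d2 = sum((u - v) ** 2 for u, v in zip(b1, b2))
        return math.exp(-d2 / (2 * self.length_scale ** 2))

    def predict(self, b, a):
        """Posterior mean and variance of Q(b, a)."""
        k_star = [self.kernel(b, a, x, act) for x, act, _ in self.data]
        mean = sum(k * al for k, al in zip(k_star, self.alpha))
        v = cho_solve(self.L, k_star)
        var = self.kernel(b, a, b, a) - sum(k * vi for k, vi in zip(k_star, v))
        return mean, max(var, 0.0)


def choose_action(q, b, actions, rng=None):
    """Greedy argmax of the posterior mean, or, given rng, argmax of one sample per action."""
    def value(a):
        mean, var = q.predict(b, a)
        return rng.gauss(mean, math.sqrt(var)) if rng is not None else mean
    return max(actions, key=value)


# Summary belief = (top entity probability, top food probability); returns are made up.
data = [
    ((0.9, 0.1), "request(food)", 8.0),
    ((0.9, 0.2), "request(food)", 7.5),
    ((0.9, 0.9), "request(food)", -2.0),
    ((0.9, 0.9), "offer", 10.0),
    ((0.9, 0.2), "offer", -5.0),
    ((0.9, 0.5), "confirm(food)", 6.0),
]
q = GPQFunction(data)
actions = ["request(food)", "confirm(food)", "offer"]
print(choose_action(q, (0.9, 0.15), actions))                    # greedy
print(choose_action(q, (0.9, 0.15), actions, random.Random(0)))  # sampled
```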
  • 58. Large-Scale Evaluation – Task Success Rates
    | Condition         | Word error rate | Conventional success rate | POMDP system success rate |
    |-------------------|-----------------|---------------------------|---------------------------|
    | Telephone         | 21%             | 84.6%                     | 86.9%                     |
    | Telephone + noise | 30%             | 75.2%                     | 81.2%                     |
    | In car            | 29%             | 67.8%                     | 75.8%                     |
    Success = finding the required information for a restaurant that matches the supplied criteria.
    Note that users’ perceived success rate was ~10% higher!
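The table can also be read in terms of failed dialogues avoided rather than raw success rate. This small sketch computes both the absolute gain and the relative reduction in failures from the table's numbers; the "relative failure reduction" measure is an added framing, not one used on the slide.

```python
# (condition, word error rate %, conventional success %, POMDP success %), copied from the table
RESULTS = [
    ("Telephone", 21, 84.6, 86.9),
    ("Telephone + noise", 30, 75.2, 81.2),
    ("In car", 29, 67.8, 75.8),
]


def compare(rows):
    """Absolute gain (percentage points) and relative reduction in failed dialogues (%)."""
    out = {}
    for condition, _wer, conventional, pomdp in rows:
        gain = pomdp - conventional
        failure_reduction = gain / (100 - conventional)
        out[condition] = (round(gain, 1), round(100 * failure_reduction, 1))
    return out


for condition, (gain, reduction) in compare(RESULTS).items():
    print(f"{condition}: +{gain} points, {reduction}% fewer failures")
```

On these numbers the POMDP system's advantage is small on clean telephone speech and largest in the two noisier conditions.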
  • 59. Scaling up to the Web
    We can build a fully statistical spoken dialogue system for a specific narrow domain – but how do we scale up to much broader domains?
    [Diagram, two columns:
     CamInfo Restaurant System (working system, real users): domain ontology; hand-crafted input, output and model parameters; crowd-sourced annotators; data for input/output mapping; user simulator for policy optimisation; corpus data for model parameter estimation.
     Personal Assistant (real users): domain ontology; wide-coverage ontology; corpus data for model parameter estimation; unsupervised learning; fast on-line reinforcement learning.]
  • 60. Conclusions
    ▪ Semantic Search
      › Explicit understanding of queries and documents through links to external knowledge
        • Using methods of Information Extraction or explicit annotations (markup) in webpages
        • Semantic Web as a source of external knowledge
    ▪ Increasing level of understanding
      › Early focus on entities and their attributes
        • Applications in web search: rich results, entity displays, entity recommendation
      › Moving toward modeling intents/actions
      › Adding human-like interaction
  • 61. Q&A
    ▪ Many thanks to members of the Semantic Search team at Yahoo Labs Barcelona and to Yahoos around the world
      › Slides on POMDP-based dialogue systems courtesy of Prof. Steve Young, UCAM
    ▪ Contact
      › pmika@yahoo-inc.com
      › @pmika
      › http://www.slideshare.net/pmika/
      › Ask about our internships and other opportunities