Semantic Search: from document retrieval to Virtual Assistants
Presented by Peter Mika, Director of Research, Yahoo Labs ⎪ March 20, 2015
The Semantic Web (2001-)
3/21/2015
 Part of Tim Berners-Lee’s
original proposal for the Web
 Beginning of a research community
› Formal ontology
› Logical reasoning
› Agents, web services
 Rough start in deployment
› Misplaced expectations
› Lack of adoption
 The Semantic Web, May 2001
 “At the doctor's office, Lucy instructed her
Semantic Web agent through her handheld Web
browser. The agent promptly retrieved
information about Mom's prescribed treatment
from the doctor's agent, looked up several lists
of providers, and checked for the ones in-plan
for Mom's insurance within a 20-mile radius of
her home and with a rating of excellent or very
good on trusted rating services. It then began
trying to find a match between available
appointment times (supplied by the agents of
individual providers through their Web sites) and
Pete's and Lucy's busy schedules.”
 (The emphasized keywords indicate terms
whose semantics, or meaning, were defined for
the agent through the Semantic Web.)
Misplaced expectations?
Lack of adoption
 Standardization ahead of adoption
› URI, RDF, RDF/XML, RDFa, JSON-LD,
OWL, RIF, SPARQL, OWL-S, POWDER …
 Chicken and egg problem
› No users/use cases, hence no data
› No data, because no users/use cases
 By 2007, some modest progress
› Metadata in HTML: microformats
› Linked Data: simplifying the stack
Web search by 2007
 Large classes of queries are solved to perfection
 Improvements in web search are harder and harder to come by
› Relevance models, hyperlink structure and interaction data
› Combination of features using machine learning
› Heavy investment in computational power
• real-time indexing, instant search, datacenters and edge services
 Language issues
› Multiple interpretations
• jaguar
• paris hilton
› Secondary meaning
• george bush (and I mean the beer brewer
in Arizona)
› Subjectivity
• reliable digital camera
• paris hilton sexy
› Imprecise or overly precise searches
• jim hendler
 Complex needs
› Missing information
• brad pitt zombie
• florida man with 115 guns
• 35 year old computer scientist living in
barcelona
› Category queries
• countries in africa
• barcelona nightlife
› Transactional or computational queries
• 120 dollars in euros
• digital camera under 300 dollars
• world temperature in 2020
Poorly solved information needs remain
Many of these queries would not be asked by users, who have
learned over time what search technology can and cannot do.
Web search by 2007
 Are there even any true keyword queries?
› Lyrics, quotes and bugs… anything else?
 Remaining challenges are not computational, but in modeling user
cognition
› Need a deeper understanding of the query, the content and/or the world at large
Microsearch internal prototype (2007)
[Screenshot of the prototype, with the following callouts]
› Personal and private homepage of the same person (clear from the snippet, but it could also be automatically de-duplicated)
› Conferences he plans to attend and his vacations, from the homepage, plus bio events from LinkedIn
› Geolocation
Enhanced Results
 Computing abstracts is hard
› Summarization of HTML
• Template detection
• Selecting relevant snippets
• Composing readable text
› Efficiency constraints
 Structured data to replace or complement text summary
› Key/value pairs
› Deep links
› Image or Video
Yahoo SearchMonkey (2008)
1. Extract structured data
› Semantic Web markup
• Example:
<span property="vcard:city">Santa Clara</span>
<span property="vcard:region">CA</span>
› Information Extraction
2. Presentation
› Fixed presentation templates
• One template per object type
› Applications
• Third-party modules to display data (SearchMonkey)
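The extraction step can be illustrated with a minimal sketch. It assumes the markup uses a `property` attribute as in the example on this slide and that every element is properly closed; the class and function names are hypothetical and this is not SearchMonkey's actual extractor.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements carrying a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []   # extracted (property, value) tuples
        self._stack = []  # one entry per open element: its property name or None

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        # Assumes well-formed input: every start tag has a matching end tag.
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost open element, if it carries a property.
        prop = self._stack[-1] if self._stack else None
        if prop and data.strip():
            self.pairs.append((prop, data.strip()))


def extract_properties(html):
    """Return the (property, text) pairs found in an HTML fragment."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


snippet = (
    '<div><span property="vcard:city">Santa Clara</span>, '
    '<span property="vcard:region">CA</span></div>'
)
pairs = extract_properties(snippet)
print(pairs)
```

The resulting key/value pairs are the kind of input a fixed presentation template could render next to the text summary.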
Effectiveness of enhanced results
 Explicit user feedback
› Side-by-side editorial evaluation (A/B testing)
• Editors are shown a traditional search result and enhanced result for the same page
• Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)
 Implicit user feedback
› Click-through rate analysis
• Long dwell time limit of 100s (Ciemiewicz et al. 2010)
• 15% increase in ‘good’ clicks
› User interaction model
• Enhanced results lead users to relevant documents (IV) even though they are less likely to be clicked than
textual results (III)
• Enhanced results effectively reduce bad clicks!
 See
› Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR
2011: 725-734
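The click-through analysis can be sketched as follows. The sketch assumes that a 'good' click is one whose dwell time reaches the 100s limit quoted above; the log entries are invented for illustration and do not reproduce the study's data.

```python
from collections import defaultdict

LONG_DWELL_SECONDS = 100  # long dwell time limit quoted on the slide


def good_click_rate(clicks, threshold=LONG_DWELL_SECONDS):
    """Return {result_type: share of clicks with dwell time >= threshold}."""
    total = defaultdict(int)
    good = defaultdict(int)
    for result_type, dwell_seconds in clicks:
        total[result_type] += 1
        if dwell_seconds >= threshold:
            good[result_type] += 1
    return {t: good[t] / total[t] for t in total}


# Invented log entries: (result type, dwell time in seconds).
clicks = [
    ("enhanced", 140), ("enhanced", 12), ("enhanced", 230), ("enhanced", 101),
    ("textual", 8), ("textual", 150), ("textual", 30), ("textual", 45),
]
rates = good_click_rate(clicks)
print(rates)
```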
Adoption among consumers of web content
 Google announces Rich Snippets - June, 2009
› Faceted search for recipes - Feb, 2011
 Bing tiles – Feb, 2011
 Facebook’s Like button and the Open Graph Protocol (2010)
› Shows up in profiles and news feed
› Site owners can later reach users who have liked an object
schema.org
 Agreement on a shared set of schemas for common types of web
content
› Bing, Google, and Yahoo! as initial founders (June, 2011)
• Yandex joins schema.org in Nov, 2011
› Similar in intent to sitemaps.org
• Use a single format to communicate the same information to all three search engines
 schema.org covers areas of interest to all search engines
› Business listings (local), creative works (video), recipes, reviews and more
› Microdata, RDFa, JSON-LD syntax
 Collaborative effort
› Growing number of 3rd party contributions
› schema.org discussions at public-vocabs@w3.org
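As an illustration of the JSON-LD syntax, here is a minimal sketch that serializes a business listing with schema.org terms. The business name is hypothetical and the helper function is an assumption made for this example; `LocalBusiness`, `PostalAddress`, `addressLocality` and `addressRegion` are schema.org terms.

```python
import json

listing = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Cafe",  # hypothetical business
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Santa Clara",
        "addressRegion": "CA",
    },
}


def to_jsonld_script(data):
    """Wrap a dict as a JSON-LD <script> element for embedding in an HTML page."""
    body = json.dumps(data, indent=2, sort_keys=True)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


jsonld_markup = to_jsonld_script(listing)
print(jsonld_markup)
```

The same description could be written in Microdata or RDFa. JSON-LD keeps the data in a separate block instead of interleaving it with the visible HTML.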
Adoption among publishers of content
 R.V. Guha: Light at the end of the tunnel (ISWC 2013 keynote)
› Over 15% of all pages now have schema.org markup
› Over 5 million sites, over 25 billion entity references
› In other words
• Same order of magnitude as the web
 See also
› P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
• Based on Bing US corpus
• 31% of webpages, 5% of domains contain some metadata
› WebDataCommons
• Based on CommonCrawl Nov 2013
• 26% of webpages, 14% of domains contain some metadata
Semantic Search at Yahoo
Yahoo’s Knowledge Graph
[Figure: example fragment of the knowledge graph. Nodes: Chicago Cubs, Chicago, Barack Obama, Carlos Zambrano, "10% off tickets", Brad Pitt, Angelina Jolie, Steven Soderbergh, George Clooney, Ocean's Twelve, E/R, Fight Club, Dust Brothers. Edge labels: for, plays for, plays in, lives in, partner, directs, casts in, takes place in, music by.]
Nicolas Torzec: Making knowledge reusable at Yahoo!:
a Look at the Yahoo! Knowledge Base (SemTech 2013)
Information extraction and reconciliation
 Information extraction
› Automated information extraction
• e.g. wrapper induction
› Metadata from HTML pages
• Focused crawler
› Public datasets (e.g. DBpedia)
› Proprietary data
 Data fusion
› Manual mapping from the source schemas to the
ontology
› Supervised entity reconciliation
• Kedar Bellare, Carlo Curino, Ashwin
Machanavajjhala, Peter Mika, Mandar Rahurkar,
Aamod Sane:
WOO: A Scalable and Multi-tenant Platform for
Continuous Knowledge Base Synthesis. PVLDB 2013
• Michael J. Welch, Aamod Sane, Chris Drome: Fast
and accurate incremental entity resolution relative to
an entity knowledge base. CIKM 2012
 Ontology management
› Editorially maintained OWL ontology with 300+
classes
› Covering the domains of interest of Yahoo
 Curation and quality assessment
› Editors and user feedback still play a large role
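The data fusion step can be sketched under strong simplifying assumptions. The source names, field names and mappings below are hypothetical, and exact matching on a normalized name stands in for the supervised entity reconciliation described in the cited papers; this is not how WOO works internally.

```python
# Hypothetical manual mappings: source field name -> ontology property.
MAPPINGS = {
    "source_a": {"title": "name", "born": "birthDate"},
    "source_b": {"label": "name", "job": "occupation"},
}


def to_ontology(source, record):
    """Rename a source record's fields to ontology properties, dropping unmapped ones."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def fuse(records):
    """Merge mapped records that share a normalized name.

    Exact name matching is a crude stand-in for supervised reconciliation.
    On conflicting values, the record seen first wins.
    """
    entities = {}
    for source, record in records:
        mapped = to_ontology(source, record)
        key = " ".join(mapped["name"].lower().split())
        merged = entities.setdefault(key, {})
        for prop, value in mapped.items():
            merged.setdefault(prop, value)
    return entities


# Illustrative records from two hypothetical sources.
records = [
    ("source_a", {"title": "Brad Pitt", "born": "1963-12-18", "page_id": 42}),
    ("source_b", {"label": "brad  pitt", "job": "Actor"}),
]
entities = fuse(records)
print(entities)
```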
Semantic Search
 Active research field at the intersection of IR, NLP, DB and SemWeb
› ESAIR at SIGIR, SemSearch at ESWC/WWW, EOS and JIWES at SIGIR, Semantic Search
at VLDB
 Exploiting semantic understanding in the retrieval process
› User intent and resources are represented using semantic models
• Not just symbolic representations
› Semantic models are exploited in the matching and ranking of resources
 Tasks
› information extraction
› information reconciliation/tracking
› query understanding
› retrieving/ranking entities/attributes/relations
› result presentation
Semantic Search – a process view
Query Construction
• Keywords
• Forms
• NL
• Formal language
Query Processing
• IR-style matching & ranking
• DB-style precise matching
• KB-style matching & inferences
Result Presentation
• Query visualization
• Document and data presentation
• Summarization
Query Refinement
• Implicit feedback
• Explicit feedback
• Incentives
Document Representation
Knowledge Representation
Semantic Models
Resources
Documents
Semantic understanding
 Documents
› Text in general
• Exploiting natural language structure and semantic coherence
› Specific to the Web
• Exploiting structure of web pages, e.g. annotation of web tables
 Queries
› Short text and no structure… nothing to do?
Semantic understanding of queries
 Entities play an important role
› [Pound et al, WWW 2010], [Lin et al WWW 2012]
› ~70% of queries contain a named entity (entity mention queries)
• brad pitt height
› ~50% of queries have an entity focus (entity seeking queries)
• brad pitt attacked by fans
› ~10% of queries are looking for a class of entities
• brad pitt movies
 Entity mention query = <entity> {+ <intent>}
› Intent is typically an additional word or phrase to
• Disambiguate, most often by type e.g. brad pitt actor
• Specify action or aspect e.g. brad pitt net worth, toy story trailer
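The pattern `<entity> {+ <intent>}` can be illustrated with a minimal dictionary-based sketch. The entity dictionary and the function name are hypothetical, and a production system would rely on learned entity linking, not exact lookup.

```python
# Hypothetical entity dictionary: surface form -> entity type.
ENTITIES = {
    "brad pitt": "Actor",
    "toy story": "Movie",
    "moneyball": "Movie",
}


def split_query(query, entities=ENTITIES):
    """Return (entity, entity_type, intent) using the longest matching token span."""
    tokens = query.lower().split()
    # Try longer spans first so a multi-word entity beats any shorter match.
    for length in range(len(tokens), 0, -1):
        for start in range(len(tokens) - length + 1):
            span = " ".join(tokens[start:start + length])
            if span in entities:
                rest = tokens[:start] + tokens[start + length:]
                return span, entities[span], " ".join(rest) or None
    # No known entity: treat the whole query as unresolved intent text.
    return None, None, " ".join(tokens) or None


for q in ["brad pitt net worth", "toy story trailer", "moneyball"]:
    print(q, "->", split_query(q))
```

The intent part is optional, which is why a bare entity name such as `moneyball` yields `None` for the intent.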
Entities and Intents
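[Figure: the query "moneyball trailer" annotated with its parts]
› moneyball: object of the query (entity), of type Movie
› trailer: what the user wants to do with it (intent)
Annotation over sessions
[Figure: a session annotated with the entity types Sports team, Movie and Actor]
› oakland as bradd pitt movie moneyball trailer movies.yahoo.com oakland as wikipedia.org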
Common tasks in Semantic Search
› entity retrieval
• entity search (SemSearch 2010/11)
• list search (SemSearch 2011)
• related entity finding (TREC REF-LOD task)
• list completion (TREC ELC task)
› question-answering (QALD 2012/13/14)
› document retrieval (e.g. Dalton et al SIGIR 2014)
Application: entity displays in web search
 Entity-seeking queries make up 40-50% of the query volume
› Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW 2010: 771-780
› Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects: actions for entity-centric search. WWW 2012: 589-598
 Show a summary of the most likely information needs
› Including related entities for navigation
› Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. ISWC 2013
Application: personalization in online news
 Entity linking
 Entity ranking according to relevance to the document
New applications
Mobile search on the rise
 Information access on-the-go requires hands-free operation
› Driving, walking, gym, etc.
• Americans spend 540 hours a year in their cars [1] vs. 348 hours browsing the Web [2]
 ~50% of queries are coming from mobile devices (and growing)
› Changing habits, e.g. iPad usage peaks before bedtime
› Limitations in input/output
[1] http://answers.google.com/answers/threadview?id=392456
[2] http://articles.latimes.com/2012/jun/22/business/la-fi-tn-top-us-brands-news-web-sites-20120622
Mobile search challenges and opportunities
 Interaction
› Question-answering
› Support for interactive retrieval
› Spoken-language access
› Task completion
 Contextualization
› Personalization
› Geo
› Context (work/home/travel)
• Try getaviate.com
Interactive, conversational voice search
 Parlance EU project
› Complex dialogs within a domain
• Requires complete semantic understanding
 Complete system (mixed license)
› Automated Speech Recognition (ASR)
› Spoken Language Understanding (SLU)
› Interaction Management
› Knowledge Base
› Natural Language Generation (NLG)
› Text-to-Speech (TTS)
 Video
Task completion
 We would like to help our users in task completion
› But we have trained our users to talk in nouns
• Retrieval performance decreases by adding verbs to queries
› We need to understand what the available actions are
 Modeling actions
› Understand what actions can be taken on a page
› Help users in mapping their query to potential actions
› Applications in web search, email etc.
[Figure: schema.org type hierarchy rooted at Thing]
Schema.org v1.2, including Actions, published April 16, 2014
Applications
› Email (Gmail)
› SERP (Yandex)
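As a sketch of how an action might be attached to an email, the snippet below builds JSON-LD using the schema.org terms `EmailMessage`, `potentialAction`, `ViewAction` and `target`. The helper function, button label and URL are hypothetical, and the exact markup a given consumer accepts should be checked against its own documentation.

```python
import json


def email_view_action(name, target_url, description):
    """Build JSON-LD describing an email that offers a single 'view' action."""
    return {
        "@context": "http://schema.org",
        "@type": "EmailMessage",
        "description": description,
        "potentialAction": {
            "@type": "ViewAction",
            "name": name,
            "target": target_url,
        },
    }


action_markup = email_view_action(
    name="View order",                          # hypothetical button label
    target_url="https://example.com/orders/1",  # hypothetical URL
    description="Your order has shipped",
)
print(json.dumps(action_markup, indent=2))
```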
Q&A
 Many thanks to members of the Semantic Search team
at Yahoo Labs Barcelona and to Yahoos around the world
 Contact me
› pmika@yahoo-inc.com
› @pmika
› http://www.slideshare.net/pmika/

 

Semantic Search keynote at CORIA 2015

  • 1. Semantic Search: from document retrieval to Virtual Assistants. Presented by Peter Mika, Director of Research, Yahoo Labs ⎪ March 20, 2015
  • 2. The Semantic Web (2001-)  Part of Tim Berners-Lee’s original proposal for the Web  Beginning of a research community › Formal ontology › Logical reasoning › Agents, web services  Rough start in deployment › Misplaced expectations › Lack of adoption
  • 3. Misplaced expectations?  The Semantic Web, May 2001  “At the doctor's office, Lucy instructed her Semantic Web agent through her handheld Web browser. The agent promptly retrieved information about Mom's prescribed treatment from the doctor's agent, looked up several lists of providers, and checked for the ones in-plan for Mom's insurance within a 20-mile radius of her home and with a rating of excellent or very good on trusted rating services. It then began trying to find a match between available appointment times (supplied by the agents of individual providers through their Web sites) and Pete's and Lucy's busy schedules.”  (The emphasized keywords indicate terms whose semantics, or meaning, were defined for the agent through the Semantic Web.)
  • 4. Lack of adoption  Standardization ahead of adoption › URI, RDF, RDF/XML, RDFa, JSON-LD, OWL, RIF, SPARQL, OWL-S, POWDER …  Chicken and egg problem › No users/use cases, hence no data › No data, because no users/use cases  By 2007, some modest progress › Metadata in HTML: microformats › Linked Data: simplifying the stack
  • 5. Web search by 2007  Large classes of queries are solved to perfection  Improvements in web search are harder and harder to come by › Relevance models, hyperlink structure and interaction data › Combination of features using machine learning › Heavy investment in computational power • real-time indexing, instant search, datacenters and edge services
  • 6. Poorly solved information needs remain  Language issues › Multiple interpretations • jaguar • paris hilton › Secondary meaning • george bush (and I mean the beer brewer in Arizona) › Subjectivity • reliable digital camera • paris hilton sexy › Imprecise or overly precise searches • jim hendler  Complex needs › Missing information • brad pitt zombie • florida man with 115 guns • 35 year old computer scientist living in barcelona › Category queries • countries in africa • barcelona nightlife › Transactional or computational queries • 120 dollars in euros • digital camera under 300 dollars • world temperature in 2020  Many of these queries would not be asked by users, who learned over time what search technology can and cannot do.
  • 7. Web search by 2007  Are there even any true keyword queries? › Lyrics, quotes and bugs… anything else?  Remaining challenges are not computational, but in modeling user cognition › Need a deeper understanding of the query, the content and/or the world at large
  • 8. Microsearch internal prototype (2007) › Personal and private homepage of the same person (clear from the snippet, but it could also be automatically de-duplicated) › Conferences he plans to attend and his vacations from homepage, plus bio events from LinkedIn › Geolocation
  • 9. Enhanced Results  Computing abstracts is hard › Summarization of HTML • Template detection • Selecting relevant snippets • Composing readable text › Efficiency constraints  Structured data to replace or complement text summary › Key/value pairs › Deep links › Image or Video
  • 10. Yahoo SearchMonkey (2008) 1. Extract structured data › Semantic Web markup • Example: <span property="vcard:city">Santa Clara</span> <span property="vcard:region">CA</span> › Information Extraction 2. Presentation › Fixed presentation templates • One template per object type › Applications • Third-party modules to display data (SearchMonkey)
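The extraction step on this slide can be sketched as follows. This is a minimal illustration using Python's standard `html.parser`, not SearchMonkey's actual extractor: the class name `PropertyExtractor` and the helper `extract_properties` are hypothetical, and the sketch only handles flat, non-nested elements that carry a `property` attribute, as in the slide's example.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute.

    Hypothetical helper for illustration; nested annotated elements are not handled.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []       # extracted (property, value) tuples
        self._current = None  # property name of the element being read, if any
        self._buffer = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self._buffer = []

    def handle_data(self, data):
        # Text outside annotated elements is ignored.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._buffer).strip()))
            self._current = None


def extract_properties(html):
    """Return key/value pairs found in the markup, in document order."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# The markup from the slide, with straight quotes.
markup = (
    '<span property="vcard:city">Santa Clara</span> '
    '<span property="vcard:region">CA</span>'
)
print(extract_properties(markup))
```

The resulting key/value pairs are the kind of structured data that a fixed presentation template (one per object type) could then render in an enhanced result.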
  • 11. Effectiveness of enhanced results  Explicit user feedback › Side-by-side editorial evaluation (A/B testing) • Editors are shown a traditional search result and enhanced result for the same page • Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)  Implicit user feedback › Click-through rate analysis • Long dwell time limit of 100s (Ciemiewicz et al. 2010) • 15% increase in ‘good’ clicks › User interaction model • Enhanced results lead users to relevant documents (IV) even though they are less likely to be clicked than textual results (III) • Enhanced results effectively reduce bad clicks!  See › Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011: 725-734
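The dwell-time criterion on this slide can be illustrated with a small sketch. It assumes that a click counts as 'good' when its dwell time is at least 100 seconds; the slide does not spell out the exact rule, the function name `good_click_rate` is hypothetical, and the dwell times below are invented purely for illustration.

```python
LONG_DWELL_SECONDS = 100  # long dwell time limit quoted on the slide


def good_click_rate(dwell_times):
    """Fraction of clicks whose dwell time reaches the long-dwell limit.

    Assumption: 'good' means dwell time >= LONG_DWELL_SECONDS.
    """
    if not dwell_times:
        return 0.0
    good = sum(1 for seconds in dwell_times if seconds >= LONG_DWELL_SECONDS)
    return good / len(dwell_times)


# Made-up dwell times in seconds, not data from the study.
traditional = [5, 12, 130, 40, 250, 8, 101, 30]
enhanced = [150, 9, 300, 120, 45, 200, 110, 20]

print(good_click_rate(traditional))
print(good_click_rate(enhanced))
```

Comparing the two rates across result types is the kind of implicit-feedback comparison the slide summarizes.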
  • 12. Adoption among consumers of web content  Google announces Rich Snippets - June, 2009 › Faceted search for recipes - Feb, 2011  Bing tiles – Feb, 2011  Facebook’s Like button and the Open Graph Protocol (2010) › Shows up in profiles and news feed › Site owners can later reach users who have liked an object
  • 13. schema.org  Agreement on a shared set of schemas for common types of web content › Bing, Google, and Yahoo! as initial founders (June, 2011) • Yandex joins schema.org in Nov, 2011 › Similar in intent to sitemaps.org • Use a single format to communicate the same information to all three search engines  schema.org covers areas of interest to all search engines › Business listings (local), creative works (video), recipes, reviews and more › Microdata, RDFa, JSON-LD syntax  Collaborative effort › Growing number of 3rd party contributions › schema.org discussions at public-vocabs@w3.org
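As an illustration of the JSON-LD syntax mentioned on this slide, the sketch below builds a small schema.org `Recipe` description in Python and wraps it in the script element used to embed JSON-LD in a page. The recipe values and the helper name `to_jsonld_script` are made up; the types and properties used (`Recipe`, `recipeIngredient`, `aggregateRating`, `AggregateRating`) are schema.org terms.

```python
import json

# Example values are invented; only the vocabulary terms come from schema.org.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Simple tomato soup",
    "recipeIngredient": ["tomatoes", "onion", "vegetable stock"],
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": 4.5,
        "reviewCount": 12,
    },
}


def to_jsonld_script(item):
    """Serialize a dict as a JSON-LD script element for embedding in HTML."""
    payload = json.dumps(item, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_jsonld_script(recipe))
```

The same description could be expressed in Microdata or RDFa; the point of a shared vocabulary is that one set of terms communicates the same information to every consumer.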
  • 14. Adoption among publishers of content  R.V. Guha: Light at the end of the tunnel (ISWC 2013 keynote) › Over 15% of all pages now have schema.org markup › Over 5 million sites, over 25 billion entity references › In other words • Same order of magnitude as the web  See also › P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012 • Based on Bing US corpus • 31% of webpages, 5% of domains contain some metadata › WebDataCommons • Based on CommonCrawl Nov 2013 • 26% of webpages, 14% of domains contain some metadata
  • 15. Semantic Search at Yahoo
  • 16. Yahoo’s Knowledge Graph [Figure: example graph linking the nodes Chicago Cubs, Chicago, Barack Obama, Carlos Zambrano, 10% off tickets, Brad Pitt, Angelina Jolie, Steven Soderbergh, George Clooney, Ocean’s Twelve, E/R, Fight Club and Dust Brothers through the relations for, plays for, plays in, lives in, partner, directs, casts in, takes place in and music by] Nicolas Torzec: Making knowledge reusable at Yahoo!: a Look at the Yahoo! Knowledge Base (SemTech 2013)
  • 17. Information extraction and reconciliation  Information extraction › Automated information extraction • e.g. wrapper induction › Metadata from HTML pages • Focused crawler › Public datasets (e.g. DBpedia) › Proprietary data  Data fusion › Manual mapping from the source schemas to the ontology › Supervised entity reconciliation • Kedar Bellare, Carlo Curino, Ashwin Machanavajjhala, Peter Mika, Mandar Rahurkar, Aamod Sane: WOO: A Scalable and Multi-tenant Platform for Continuous Knowledge Base Synthesis. PVLDB 2013 • Michael J. Welch, Aamod Sane, Chris Drome: Fast and accurate incremental entity resolution relative to an entity knowledge base. CIKM 2012  Ontology management › Editorially maintained OWL ontology with 300+ classes › Covering the domains of interest of Yahoo  Curation and quality assessment › Editors and user feedback still play a large role
  • 18. Semantic Search  Active research field at the intersection of IR, NLP, DB and SemWeb › ESAIR at SIGIR, SemSearch at ESWC/WWW, EOS and JIWES at SIGIR, Semantic Search at VLDB  Exploiting semantic understanding in the retrieval process › User intent and resources are represented using semantic models • Not just symbolic representations › Semantic models are exploited in the matching and ranking of resources  Tasks › information extraction › information reconciliation/tracking › query understanding › retrieving/ranking entities/attributes/relations › result presentation
  • 19. Semantic Search – a process view  Query Construction • Keywords • Forms • NL • Formal language  Query Processing • IR-style matching & ranking • DB-style precise matching • KB-style matching & inferences  Result Presentation • Query visualization • Document and data presentation • Summarization  Query Refinement • Implicit feedback • Explicit feedback • Incentives  [Figure labels: Document Representation, Knowledge Representation, Semantic Models, Resources, Documents]
  • 20. Semantic understanding  Documents › Text in general • Exploiting natural language structure and semantic coherence › Specific to the Web • Exploiting structure of web pages, e.g. annotation of web tables  Queries › Short text and no structure… nothing to do?
  • 21. Semantic understanding of queries  Entities play an important role › [Pound et al, WWW 2010], [Lin et al WWW 2012] › ~70% of queries contain a named entity (entity mention queries) • brad pitt height › ~50% of queries have an entity focus (entity seeking queries) • brad pitt attacked by fans › ~10% of queries are looking for a class of entities • brad pitt movies  Entity mention query = <entity> {+ <intent>} › Intent is typically an additional word or phrase to • Disambiguate, most often by type e.g. brad pitt actor • Specify action or aspect e.g. brad pitt net worth, toy story trailer
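The pattern "entity mention query = <entity> {+ <intent>}" can be sketched with a toy splitter. This is only an illustration under strong assumptions: the entity is assumed to appear at the start of the query, and `KNOWN_ENTITIES` is a tiny hypothetical dictionary standing in for a knowledge base. It is not the method used in the cited papers, where entity linking is considerably more involved.

```python
# Hypothetical mini knowledge base: surface form -> entity type.
KNOWN_ENTITIES = {
    "brad pitt": "Person",
    "toy story": "Movie",
    "moneyball": "Movie",
}


def split_query(query):
    """Split a query into (entity, entity_type, intent).

    Tries the longest leading token sequence that matches a known entity;
    whatever follows is treated as the intent (None if nothing follows).
    """
    tokens = query.lower().split()
    for end in range(len(tokens), 0, -1):
        candidate = " ".join(tokens[:end])
        if candidate in KNOWN_ENTITIES:
            intent = " ".join(tokens[end:]) or None
            return candidate, KNOWN_ENTITIES[candidate], intent
    # No known entity at the start of the query.
    return None, None, " ".join(tokens)


for q in ["brad pitt net worth", "toy story trailer", "brad pitt"]:
    print(q, "->", split_query(q))
```

Here "net worth" and "trailer" play the role of the intent that specifies an action or aspect, while a bare "brad pitt" is an entity mention with no intent.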
  • 22. Entities and Intents [Figure: the query "moneyball trailer" annotated with the labels "Movie", "Object of the query (entity)" and "what the user wants to do with it (intent)"]
  • 23. Annotation over sessions [Figure: a search session reading "oakland as bradd pitt movie moneyball trailer movies.yahoo.com oakland as wikipedia.org", annotated with the types Sports team, Movie and Actor]
  • 24. Common tasks in Semantic Search › entity retrieval • entity search (SemSearch 2010/11) • list search (SemSearch 2011) • list completion (TREC ELC task) • related entity finding (TREC REF-LOD task) › question-answering (QALD 2012/13/14) › document retrieval (e.g. Dalton et al SIGIR 2014)
  • 25. Application: entity displays in web search  Entity-seeking queries make up 40-50% of the query volume › Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW 2010: 771-780 › Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects: actions for entity-centric search. WWW 2012: 589-598  Show a summary of the most likely information needs › Including related entities for navigation › Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. ISWC 2013
  • 26. Application: personalization in online news  Entity linking  Entity ranking according to relevance to the document
  • 28. Mobile search on the rise  Information access on-the-go requires hands-free operation › Driving, walking, gym, etc. • Americans spend 540 hours a year in their cars [1] vs. 348 hours browsing the Web [2]  ~50% of queries are coming from mobile devices (and growing) › Changing habits, e.g. iPad usage peaks before bedtime › Limitations in input/output [1] http://answers.google.com/answers/threadview?id=392456 [2] http://articles.latimes.com/2012/jun/22/business/la-fi-tn-top-us-brands-news-web-sites-20120622
  • 29. Mobile search challenges and opportunities  Interaction › Question-answering › Support for interactive retrieval › Spoken-language access › Task completion  Contextualization › Personalization › Geo › Context (work/home/travel) • Try getaviate.com
  • 30. Interactive, conversational voice search  Parlance EU project › Complex dialogs within a domain • Requires complete semantic understanding  Complete system (mixed license) › Automated Speech Recognition (ASR) › Spoken Language Understanding (SLU) › Interaction Management › Knowledge Base › Natural Language Generation (NLG) › Text-to-Speech (TTS)  Video
  • 31. Task completion  We would like to help our users in task completion › But we have trained our users to talk in nouns • Retrieval performance decreases by adding verbs to queries › We need to understand what the available actions are  Modeling actions › Understand what actions can be taken on a page › Help users in mapping their query to potential actions › Applications in web search, email etc.  Schema.org v1.2 including Actions published April 16, 2014
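To make "what actions can be taken on a page" concrete, the sketch below describes a movie with a schema.org `potentialAction` and lists the actions a consumer could read from it. The movie URL template (on example.com) and the helper name `available_actions` are hypothetical; `potentialAction`, `WatchAction`, `EntryPoint` and `urlTemplate` are schema.org terms. The sketch assumes each action's `target` is given as an `EntryPoint` object rather than a plain URL string.

```python
# Example item; the URL is a placeholder, not a real endpoint.
movie = {
    "@context": "https://schema.org",
    "@type": "Movie",
    "name": "Moneyball",
    "potentialAction": {
        "@type": "WatchAction",
        "target": {
            "@type": "EntryPoint",
            "urlTemplate": "https://example.com/watch/moneyball",
        },
    },
}


def available_actions(item):
    """Return (action type, target URL template) pairs declared on an item.

    Assumes `target` is an EntryPoint-style dict when present.
    """
    actions = item.get("potentialAction", [])
    if isinstance(actions, dict):
        actions = [actions]  # a single action may be given without a list
    return [
        (action.get("@type"), action.get("target", {}).get("urlTemplate"))
        for action in actions
    ]


print(available_actions(movie))
```

A search or email application could use such declarations to map a query like "moneyball trailer" onto an action the page actually supports.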
  • 33. Q&A  Many thanks to members of the Semantic Search team at Yahoo Labs Barcelona and to Yahoos around the world  Contact me › pmika@yahoo-inc.com › @pmika › http://www.slideshare.net/pmika/