The Duet Model: Learning to Match Using Local and Distributed Representations of Text for Web Search

Nick Craswell (Microsoft, Bellevue, USA) nickcr@microsoft.com
Fernando Diaz (Spotify*, New York, USA) diazf@acm.org
Bhaskar Mitra (Microsoft, UCL, Cambridge, UK) bmitra@microsoft.com
*work done while at Microsoft
The document ranking task
Given a query, rank documents according to relevance
The query text has few terms
The document representation can be long (e.g., body text) or short (e.g., title)
(Diagram: a query is issued to a search engine with an index of retrievable items, which returns ranked results)
This paper is focused on ranking documents
based on their long body text
Many DNN models for short text ranking
(Huang et al., 2013)
(Severyn and Moschitti, 2015)
(Shen et al., 2014)
(Palangi et al., 2015)
(Hu et al., 2014)
(Tai et al., 2015)
But few for long document ranking…
(Guo et al., 2016)
(Salakhutdinov and Hinton, 2009)
Challenges in short vs. long text retrieval
Short text
Vocabulary mismatch is a more serious problem
Long text
Documents contain a mixture of many topics
Matches in different parts of the document are non-uniformly important
Term proximity is important
The “black swans” of Information Retrieval
The term black swan originally referred to impossible events. In 1697, Dutch explorers encountered black swans for the first time in Western Australia. Since then, the term has been used to refer to surprisingly rare events.
In IR, many query terms and intents are
never observed in the training data
Exact matching is effective in making the
IR model robust to rare events
Desiderata of document ranking
Exact matching
Important if query term is rare / fresh
Frequency and positions of matches are good indicators of relevance
Term proximity is important
Inexact matching
Synonymy relationships
united states president ↔ Obama
Evidence for document aboutness
Documents about Australia are likely to contain related terms like Sydney and koala
Proximity and position are important
Different text representations for matching
Local representation
Terms are considered distinct entities
Term representation is local (one-hot vectors)
Matching is exact (term-level)
Distributed representation
Represent text as dense vectors (embeddings)
Inexact matching in the embedding space
(Diagram: local (one-hot) representation vs. distributed representation)
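A toy illustration of the difference, with a hypothetical three-term vocabulary (not from the paper):

```python
import numpy as np

vocab = {"united": 0, "states": 1, "president": 2}

# Local representation: one-hot vector, a single 1 at the term's index.
one_hot = np.eye(len(vocab))[vocab["president"]]  # array([0., 0., 1.])

# Distributed representation: a dense vector. In practice it is learned;
# random values here only illustrate the shape.
embedding = np.random.randn(300)
```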
A tale of two queries
“pekarovic land company”
Hard to learn a good representation for the rare term pekarovic
But easy to estimate relevance based
on patterns of exact matches
Proposal: Learn a neural model to
estimate relevance from patterns of
exact matches
“what channel are the seahawks on today”
Target document likely contains ESPN
or sky sports instead of channel
An embedding model can associate ESPN in the document with channel in the query
Proposal: Learn embeddings of text
and match query with document in
the embedding space
The Duet Architecture
Use a neural network to model both functions and learn their parameters jointly
The Duet architecture
Linear combination of two models trained jointly on labelled query-document pairs
Local model operates on lexical
interaction matrix
Distributed model projects n-graph
vectors of text into an embedding
space and then estimates match
(Architecture diagram: the local model generates query and document term vectors, builds an interaction matrix between them, and applies fully connected layers for matching; the distributed model generates query and document embeddings, combines them with a Hadamard product, and applies fully connected layers for matching; the two scores are summed)
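A minimal sketch of this combination, assuming the LocalModel and DistributedModel sketched later in this deck (the actual implementation uses CNTK; PyTorch is used here only for illustration):

```python
import torch.nn as nn

class Duet(nn.Module):
    """Linear combination (a sum) of the two submodel scores, trained jointly."""
    def __init__(self, local_model, distributed_model):
        super().__init__()
        self.local = local_model              # scores the term-interaction matrix
        self.distributed = distributed_model  # scores n-graph representations

    def forward(self, interaction_matrix, query_ngraphs, doc_ngraphs):
        return (self.local(interaction_matrix)
                + self.distributed(query_ngraphs, doc_ngraphs))
```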
Local model
Local model: term interaction matrix
$X_{i,j} = \begin{cases} 1, & \text{if } q_i = d_j \\ 0, & \text{otherwise} \end{cases}$
In relevant documents,
→Many matches, typically clustered
→Matches localized early in document
→Matches for all query terms
→In-order (phrasal) matches
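A minimal sketch of this matrix for whitespace-tokenized, lowercased text (a hypothetical helper, not the paper's code):

```python
import numpy as np

def interaction_matrix(query, doc):
    """X[i, j] = 1 iff query term i equals document term j, else 0."""
    q_terms, d_terms = query.lower().split(), doc.lower().split()
    return np.array([[1.0 if q == d else 0.0 for d in d_terms]
                     for q in q_terms])

X = interaction_matrix("united states president",
                       "the president of the united states")
print(X.shape)  # (3, 6): one row per query term, one column per document term
```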
Local model: estimating relevance
← document words →
Convolve using a window of size $n_d \times 1$, where $n_d$ is the document length
Each window instance compares one query term with the whole document
Fully connected layers aggregate evidence across query terms and can model phrasal matches
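A minimal PyTorch sketch of this local model (truncation lengths and hidden size are assumed, and the paper's CNTK implementation may differ in details):

```python
import torch
import torch.nn as nn

MAX_QUERY_TERMS, MAX_DOC_TERMS, HIDDEN = 10, 1000, 300  # assumed sizes

class LocalModel(nn.Module):
    def __init__(self):
        super().__init__()
        # With the document axis as input channels and kernel_size=1, each
        # filter application sees one query term's matches against the whole
        # document, i.e., a window of size n_d x 1.
        self.conv = nn.Conv1d(MAX_DOC_TERMS, HIDDEN, kernel_size=1)
        self.ff = nn.Sequential(
            nn.Tanh(),
            nn.Linear(HIDDEN * MAX_QUERY_TERMS, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, 1))

    def forward(self, x):
        # x: [batch, MAX_QUERY_TERMS, MAX_DOC_TERMS] binary interaction matrix
        h = self.conv(x.transpose(1, 2))  # -> [batch, HIDDEN, MAX_QUERY_TERMS]
        return self.ff(h.flatten(1))      # -> [batch, 1] relevance score
```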
Distributed model
Distributed model: input representation
dogs → [ d , o , g , s , #d , do , og , gs , s# , #do , dog , ogs , gs#, #dog, dogs, ogs#, #dogs, dogs# ]
(only the 2K most popular n-graphs are used for encoding)
(Diagram: each word of "dogs have owners cats have staff" is n-graph encoded and the counts concatenated into a [words x channels] matrix, with 2K channels)
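A sketch of the n-graph extraction; this hypothetical helper reproduces the dogs example above:

```python
def ngraphs(word, max_n=5):
    """All character n-grams of a word, n = 1..max_n, with '#' boundary markers."""
    padded = "#" + word + "#"
    grams = []
    for n in range(1, max_n + 1):
        grams += [padded[i:i + n] for i in range(len(padded) - n + 1)]
    # drop the bare boundary markers produced at n = 1
    return [g for g in grams if g != "#"]

print(ngraphs("dogs"))
# ['d', 'o', 'g', 's', '#d', 'do', 'og', 'gs', 's#', '#do', 'dog', 'ogs',
#  'gs#', '#dog', 'dogs', 'ogs#', '#dogs', 'dogs#']
```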
(Diagram: convolution and pooling produce a query embedding and moving-window document embeddings, which are matched via Hadamard products followed by fully connected layers)
Distributed model: estimating relevance
Convolve over query and
document terms
Match query with moving
windows over document
Learn text embeddings
specifically for the task
Matching happens in
embedding space
* Network architecture slightly simplified for visualization; refer to the paper for exact details
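A simplified PyTorch sketch of the distributed model under assumed sizes and kernel widths (the slide's own diagram is simplified, so treat this as illustrative rather than the exact architecture):

```python
import torch
import torch.nn as nn

NUM_NGRAPHS, EMBED = 2000, 300  # assumed sizes

class DistributedModel(nn.Module):
    def __init__(self, doc_windows=899):  # 899 follows from the sizes below
        super().__init__()
        # Query side: convolve over n-graph count vectors, pool to one embedding.
        self.q_net = nn.Sequential(
            nn.Conv1d(NUM_NGRAPHS, EMBED, kernel_size=3), nn.Tanh(),
            nn.AdaptiveMaxPool1d(1))
        # Document side: convolve, then max-pool with stride 1 so that each
        # output position embeds one moving window over the document.
        self.d_net = nn.Sequential(
            nn.Conv1d(NUM_NGRAPHS, EMBED, kernel_size=3), nn.Tanh(),
            nn.MaxPool1d(kernel_size=100, stride=1),
            nn.Conv1d(EMBED, EMBED, kernel_size=1), nn.Tanh())
        self.ff = nn.Sequential(
            nn.Linear(EMBED * doc_windows, EMBED), nn.Tanh(),
            nn.Linear(EMBED, 1))

    def forward(self, q, d):
        # q: [batch, NUM_NGRAPHS, query_len], d: [batch, NUM_NGRAPHS, 1000]
        qe = self.q_net(q)  # -> [batch, EMBED, 1]
        de = self.d_net(d)  # -> [batch, EMBED, doc_windows]
        match = qe * de     # Hadamard product, broadcast across doc windows
        return self.ff(match.flatten(1))  # -> [batch, 1] relevance score
```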
Putting the two models together…
The Duet model
Training sample: $(Q, D^+, D_1^-, D_2^-, D_3^-, D_4^-)$
$D^+$ = document rated Excellent or Good
$D^-$ = document rated 2 ratings worse than $D^+$
Optimize cross-entropy loss
Implemented using CNTK (see the GitHub link on the final slide)
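A minimal sketch of the objective (the batch layout is assumed): softmax over the five candidates' scores with the relevant document as the target class.

```python
import torch
import torch.nn.functional as F

def duet_loss(scores):
    # scores: [batch, 5]; column 0 is the D+ score, columns 1-4 the four D-
    targets = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return F.cross_entropy(scores, targets)
```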
Data
Need large-scale training data
(labels or clicks)
We use Bing human-labelled data for both training and testing
Results
Key finding: Duet performs significantly better than local and distributed
models trained individually
Random negatives vs. judged negatives
Key finding: training with judged bad documents as negatives is significantly better than training with random negatives
Local vs. distributed model
Key finding: the local and distributed models perform better on different segments, but the combination is always better
Effect of training data volume
Key finding: a large quantity of training data is necessary for learning good representations, but is less impactful for training the local model
Term importance: Local model
Only query terms have an impact
Earlier occurrences have a bigger impact
Query: united states president
Visualizing impact of dropping terms on model score
Term importance: Distributed model
Non-query terms (e.g., Obama and federal) have a positive impact on the score
Common words like 'the' and 'of' are probably good indicators of well-formed content
Query: united states president
Visualizing impact of dropping terms on model score
Types of models
If we classify models by query-level performance, there is a clear clustering of lexical (local) and semantic (distributed) models
Duet on other IR tasks
Promising early results on TREC
2017 Complex Answer Retrieval
(TREC-CAR)
Duet performs significantly
better when trained on large
data (~32 million samples)
(PAPER UNDER REVIEW)
Summary
Both exact and inexact matching are important for IR
Deep neural networks can be used to model both types of matching
Local model more effective for queries containing rare terms
Distributed model benefits from training on large datasets
Combining the local and distributed models achieves state-of-the-art performance
Get the model:
https://github.com/bmitra-msft/NDRM/blob/master/notebooks/Duet.ipynb