Enriching Solr with Deep-Learning for Question-Answering Systems
Sanket Shahane
Data Scientist, Lucidworks
https://www.linkedin.com/in/shahanesanket
Savva Kolbachev
Data Scientist, Lucidworks
#Activate18 #ActivateSearch
Agenda
• Overview of QA
• Use Cases
• Demo
• Architecture
• Deep Dive and Research Findings in:
• Question-to-question similarity (FAQ matching)
• Answer Selection
• Questions / Discussion
Overview of QA
• Going Beyond Document Retrieval
• Isn’t it great to have someone answer your
question directly?
• In this fast-moving world, who wants to read 10 paragraphs when you are looking for just 1 sentence or word?
• Good News! We can do it...
Overview of QA
• Recent advances in GPUs and Deep Learning make higher accuracy possible.
• E.g. high accuracy on NER, POS tagging, and image recognition; learning deep representations of text
• 3 types of QA systems
Overview of QA
• We will focus on Information Retrieval Based QA Systems
• Architecture
• Research Findings
• Deep dive into the Use cases
• Solr returns documents, and advanced ML models do the reranking.
Use Cases
• Business Goals:
• Site Stickiness
• Efficient and streamlined customer support
• Implement on the help/contact us page
• Achieved through:
• Question to Question similarity
• FAQ matching
• Answer Selection
• paragraph/sentence level for a given question
• Training data is available from historical support tickets / document FAQs. We can find relevant answers from documents.
Demo!
Before we dive deep let’s see some examples
ML Catch up
• Basic Terminology and concepts
• Supervised Learning
• Needs to have labelled data.
• Unsupervised Learning
• Used when labelled data is unavailable
• Deep Learning
• All approaches where we use deep neural network algorithms
• Traditional ML
• All other algorithms like Random Forest / XGBoost / Logistic Regression
Architecture
• Time to discuss technical stuff!
- Why not just use Solr?
- Why not just Machine Learning?
- Why both?
- How to use both?
• DL models look at the top N (typically 100 to 1000) candidates returned by Solr.
• N depends on how much time you can spare for the DL models.
• In the two-step architecture we are using, it is important that Solr returns the best possible results.
• High Recall from Solr
• High Precision from Deep Learning
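The two-step flow described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenters' implementation: `retrieve` stands in for a Solr query and `score` for a trained DL model, and both (along with the toy documents) are hypothetical.

```python
from typing import Callable, List, Tuple


def rerank_top_n(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    n: int = 100,
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Two-step search: a cheap high-recall stage, then a costly high-precision stage."""
    # Step 1: the retriever (Solr in the talk) returns up to n candidates.
    candidates = retrieve(query, n)
    # Step 2: the expensive model scores only those candidates.
    scored = [(doc, score(query, doc)) for doc in candidates]
    # Highest model score first; keep the best k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins so the sketch runs without Solr or a trained model.
DOCS = ["reset your password", "change billing address", "password rules"]


def toy_retrieve(query: str, n: int) -> List[str]:
    """Return documents sharing at least one token with the query."""
    words = set(query.lower().split())
    return [d for d in DOCS if words & set(d.split())][:n]


def toy_score(query: str, doc: str) -> float:
    """Jaccard overlap, used here only as a placeholder for a model score."""
    q, d = set(query.lower().split()), set(doc.split())
    return len(q & d) / len(q | d)


results = rerank_top_n("reset password", toy_retrieve, toy_score, n=100, k=2)
```

The value of `n` is the latency knob: a larger `n` gives the second stage more chances to find the right answer but costs more model calls per query.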
Architecture - Index
We need to compute multiple features for ML models at run time.
Hence, it’s best to precompute and index some of the expensive ones.
Use synonyms and an answer type classifier, which can enrich your query.
We developed an unsupervised method based on mutual information that captures the association between keywords in questions and answers.
E.g.:
Q words: indexing, typeahead, autocomplete
A words: solr, lucene
Currently we use it as a feature in XGBoost.
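The slides do not give the formula, so the sketch below uses pointwise mutual information (PMI) over question/answer pairs as one plausible reading of "based on mutual information". The function name, the whitespace tokenization, and the toy pairs are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def pmi_table(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """PMI between a question word q and an answer word a, counted per QA pair."""
    n = len(pairs)
    q_count: Counter = Counter()
    a_count: Counter = Counter()
    joint: Counter = Counter()
    for question, answer in pairs:
        q_words = set(question.lower().split())
        a_words = set(answer.lower().split())
        q_count.update(q_words)
        a_count.update(a_words)
        joint.update(product(q_words, a_words))
    # PMI(q, a) = log( P(q, a) / (P(q) * P(a)) ), probabilities over QA pairs.
    return {
        (q, a): math.log((c / n) / ((q_count[q] / n) * (a_count[a] / n)))
        for (q, a), c in joint.items()
    }


# Hypothetical QA pairs echoing the Q words / A words example on the slide.
pairs = [
    ("autocomplete indexing", "solr lucene"),
    ("typeahead indexing", "solr"),
    ("billing refund", "invoice"),
]
table = pmi_table(pairs)
```

A positive PMI for a pair such as ("indexing", "solr") means the two words co-occur across QA pairs more often than chance would predict, which is the kind of association that could be fed to XGBoost as a feature.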
Architecture - Query
• Extract entities (persons, organizations, etc.) and POS tags (nouns, verbs, etc.) from the query
• Get the w2v vector for the query
• Retrieve documents from Solr and generate features using the LTR API, e.g.
• cosine between nouns
• number of overlapping tokens
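As a rough illustration of the two example features, the sketch below computes them in plain Python rather than through Solr's LTR API. The toy word vectors are hypothetical stand-ins for a trained w2v model, and noun extraction is omitted because it would need a POS tagger, so the cosine here is taken over all known tokens.

```python
import math
from typing import Dict, List, Sequence

# Toy 3-dimensional word vectors; real ones would come from a trained w2v model.
TOY_VECTORS: Dict[str, List[float]] = {
    "solr": [1.0, 0.0, 0.0],
    "index": [0.8, 0.6, 0.0],
    "refund": [0.0, 0.0, 1.0],
}


def overlap_count(query: str, doc: str) -> int:
    """Number of distinct lowercase tokens shared by query and document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def mean_vector(tokens: Sequence[str]) -> List[float]:
    """Average the vectors of known tokens; zero vector if none are known."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


query, doc = "solr index", "index refund policy"
features = {
    "overlap": overlap_count(query, doc),
    "w2v_cosine": cosine(mean_vector(query.split()), mean_vector(doc.split())),
}
```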
Architecture - Query
• Generate features that LTR could not generate
• e.g. number of URLs, number of `?` characters, etc.
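Surface features like these are cheap to compute outside Solr. A minimal sketch, assuming a simple `http(s)://` pattern is good enough for counting URLs (it is not a full URL grammar) and using a hypothetical `surface_features` helper:

```python
import re
from typing import Dict

# Simple pattern for http(s) links; an assumption, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://\S+")


def surface_features(text: str) -> Dict[str, int]:
    """Count-based features computed directly on the raw text."""
    return {
        "num_urls": len(URL_PATTERN.findall(text)),
        "num_question_marks": text.count("?"),
        "num_tokens": len(text.split()),
    }


feats = surface_features("See https://example.com/docs and http://example.org ok?")
```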
Architecture - Query
• Use ML model to rerank top N from Solr
Deep Dive (question to question similarity)
• Use case: Given a set of FAQs and historic questions, find similar or duplicate questions.
• Research: Experiments with different types of models: supervised, unsupervised, and ensembles, combined with Solr to better capture contextual information. 38 different models compared in total.
• Data: Quora duplicate question detection.
Two questions are labeled as duplicates if they ask the same thing.
Deep Dive (question to question similarity)
• Methods:
Supervised: XGBoost
1. TF-IDF cosine score
2. TF-IDF cosine score on nouns
3. Number of overlapping question keywords
4. Number of overlapping POS tags
5. Number of overlapping NER tags
Unsupervised:
• BM25 (Solr)
• TF-IDF with sublinear TF, on n-grams instead of just tokens
• Doc2Vec
• FastText
• GloVe
• Google Sentence Encoder (Universal Sentence Encoder)
• Ensemble: Solr + GSE
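To make the "TF-IDF with sublinear TF on n-grams" variant concrete, here is a small pure-Python sketch. The exact weighting used in the talk is not stated; this version assumes a term weight of (1 + ln tf) times a smoothed IDF, over word unigrams and bigrams, and the sample questions are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(text: str, max_n: int = 2) -> List[str]:
    """Word unigrams up to max_n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i : i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_vectors(texts: List[str], max_n: int = 2) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors with sublinear TF: weight = (1 + ln tf) * idf."""
    grams = [Counter(ngrams(t, max_n)) for t in texts]
    # Document frequency: each text contributes at most 1 per term.
    df = Counter(term for g in grams for term in g)
    n_docs = len(texts)
    # Smoothed IDF so that terms present in every text keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: (1.0 + math.log(c)) * idf[t] for t, c in g.items()} for g in grams]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


questions = [
    "how do I reset my password",
    "how can I reset my password",
    "what is the refund policy",
]
vecs = tfidf_vectors(questions)
sim_dup = cosine(vecs[0], vecs[1])
sim_other = cosine(vecs[0], vecs[2])
```

Bigrams such as "reset my" reward shared word order, which plain token overlap cannot see.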
Deep Dive (question to question similarity)
• Conclusion:
- An unsupervised ensemble of TF-IDF with GSE provides the best results with LESS work than supervised methods.
- Among single unsupervised models, generalized TF-IDF on n-grams and the new Google sentence encoder provide the best results.
- XGBoost supervised methods have no advantage over unsupervised methods on this task.
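The slides do not say how the TF-IDF and GSE scores were combined. One simple possibility, sketched here purely as an assumption, is to min-max normalize each score list and take a weighted average. The scores below are hypothetical.

```python
from typing import List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; constant lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble(lexical: List[float], semantic: List[float], weight: float = 0.5) -> List[float]:
    """Weighted average of two normalized score lists for the same candidates."""
    if len(lexical) != len(semantic):
        raise ValueError("score lists must cover the same candidates")
    lex, sem = min_max(lexical), min_max(semantic)
    return [weight * a + (1.0 - weight) * b for a, b in zip(lex, sem)]


# Hypothetical scores for three candidate questions.
tfidf_scores = [12.0, 8.0, 4.0]    # lexical scores on an arbitrary scale
encoder_scores = [0.2, 0.9, 0.1]   # sentence-encoder cosine similarities
combined = ensemble(tfidf_scores, encoder_scores)
best = max(range(len(combined)), key=combined.__getitem__)
```

Normalizing first matters because lexical scores and embedding cosines live on different scales, so a raw sum would be dominated by whichever is larger.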
Deep Dive (Answer Selection)
• Use case: Given historic QA transactions (e.g. support team archives, Slack, StackOverflow, email threads), pick the best answer when a question is issued.
• Research: Extensive feature engineering for supervised traditional models, deep learning models, and unsupervised models. 21 models compared in total.
• Data: Insurance QA.
The data is a list of questions and their corresponding answers.
• Methods:
• Supervised: XGBoost, Deep Learning Model
• Unsupervised: FastText, w2v, d2v
• Methods:
Supervised: XGBoost
1. Cosine score between TF-IDF vectors of question and answer (stop words trimmed).
2. Cosine score between TF-IDF vectors of question and answer after extracting nouns.
3. One-hot encoding of question keywords (what, who, when, how long, how fast…)
4. One-hot encoding of POS tags of question and answers.
5. One-hot encoding of NER tags of question and answers.
6. w2v cosine distance of question and answer.
7. Doc2Vec cosine distance of question and answer (supervised).
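Feature 3 can be sketched as below. The keyword list contains only the examples named on the slide (the full list is not given), and the whole-word matching rule is an assumption. Strictly speaking the result is a multi-hot vector, since a question may contain more than one keyword.

```python
from typing import List

# Keyword list taken from the examples on the slide; a real list would be longer.
QUESTION_KEYWORDS = ["what", "who", "when", "how long", "how fast"]


def one_hot_question_keywords(question: str) -> List[int]:
    """1 where the keyword occurs in the question as whole words, else 0."""
    # Normalize whitespace and pad with spaces so matches respect word boundaries.
    padded = " " + " ".join(question.lower().replace("?", " ").split()) + " "
    return [1 if f" {kw} " in padded else 0 for kw in QUESTION_KEYWORDS]


vector = one_hot_question_keywords("How long does a claim take?")
```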
Deep Dive (Answer Selection)
• Methods:
Supervised: Deep Learning
1. Encoder:
• Embeddings layer
• Dropout
• LSTM layer
• Attention Weighted Average layer
2. Simple triplet-based model that uses a shared Encoder for question, positive answer and negative answer
3. 15 negative answers are generated at random for each epoch.
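The slides name the triplet setup but not the loss. A common choice for this kind of model, assumed here, is a hinge loss on cosine similarities with a margin. The sketch also shows drawing 15 random negatives. The margin value, the function names, and the toy vectors (standing in for encoder outputs) are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triplet_hinge_loss(
    q: Sequence[float], pos: Sequence[float], neg: Sequence[float], margin: float = 0.2
) -> float:
    """Zero once the positive answer beats the negative one by at least `margin`."""
    return max(0.0, margin - cosine(q, pos) + cosine(q, neg))


def sample_negatives(
    answer_ids: List[int], positive_id: int, k: int, rng: random.Random
) -> List[int]:
    """Draw k distinct wrong answers uniformly at random."""
    pool = [a for a in answer_ids if a != positive_id]
    return rng.sample(pool, k)


rng = random.Random(0)
negatives = sample_negatives(list(range(100)), positive_id=7, k=15, rng=rng)

# Hypothetical encoder outputs for a question and two answers.
q_vec, good, bad = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]
loss_easy = triplet_hinge_loss(q_vec, good, bad)   # correct answer already wins
loss_hard = triplet_hinge_loss(q_vec, bad, good)   # roles swapped: large loss
```

Resampling negatives every epoch, as the slide describes, exposes the model to different wrong answers over time instead of memorizing one fixed set.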
Deep Dive (Answer Selection)
• Using at runtime:
Deep Dive (Answer Selection)
• Model Stability: Based on the amount of training data.
d2v (unsupervised) / xgboost (supervised) / xgboost_dl (supervised) / DL (supervised)
• Conclusion:
- DL models are significantly better than traditional models, except when the training data shrinks to 10% (around 3K QA pairs), where their performance is very similar to that of traditional models.
- Supervised methods have a significant advantage over unsupervised ones.
- Suggestion: If the training data size is < 5K, use an XGBoost model with pre-trained fastText embeddings.
If the training data size is big enough, use DL models without feature extraction and w2v/d2v.
Deep Dive (Answer Selection)
• Deep Learning is often regarded as an unexplainable black box: we do not know how or why it works.
• But is it really unexplainable?
• Well, not all of it!
- We can explain why a particular text is tagged as positive or negative.
- Our DL model explains itself!
- Using the attention mechanism, we extract what the model considers important and highlight it for exploration.
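A minimal sketch of how attention weights can drive highlighting, assuming the model exposes one raw attention score per token. The scores, tokens, and helper names below are hypothetical. The same weights also produce the attention weighted average used by the encoder.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_average(vectors: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted sum of per-token vectors; an average when the weights sum to 1."""
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*vectors)]


def top_tokens(tokens: List[str], weights: List[float], k: int = 2) -> List[Tuple[str, float]]:
    """The k tokens with the largest attention weight, for highlighting."""
    return sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)[:k]


tokens = ["the", "service", "was", "terrible"]
raw_scores = [0.1, 1.5, 0.2, 3.0]  # hypothetical per-token attention scores
weights = softmax(raw_scores)
highlights = top_tokens(tokens, weights)
```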
Bonus!
Sentiment analysis in Fusion
Thank you!
Sanket Shahane
Data Scientist, Lucidworks
linkedin.com/in/shahanesanket
Savva Kolbachev
Data Scientist, Lucidworks
linkedin.com/in/savva-kolbachev-98669199/
#Activate18 #ActivateSearch

Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 


Enriching Solr with Deep Learning for a Question Answering System - Sanket Shahane, Lucidworks

  • 1. Enriching Solr with Deep Learning for Question Answering Systems • Sanket Shahane, Data Scientist, Lucidworks, https://www.linkedin.com/in/shahanesanket • Savva Kolbachev, Data Scientist, Lucidworks • #Activate18 #ActivateSearch
  • 2. Agenda • Overview of QA • Use Cases • Demo • Architecture • Deep Dive and Research Findings in: • Question-to-question similarity / FAQ matching • Answer Selection • Questions / Discussion
  • 3. Overview of QA • Going beyond document retrieval • Isn’t it great to have someone answer your question directly? • In this fast-moving world, who wants to read 10 paragraphs when you are looking for just 1 sentence or word? • Good news! We can do it...
  • 4. Overview of QA • Recent advances in GPUs and deep learning make higher accuracy possible. • E.g. high accuracy on NER, POS tagging, and image recognition; learning deep representations of text • 3 types of QA systems
  • 5. Overview of QA • We will focus on Information Retrieval Based QA Systems • Architecture • Research Findings • Deep dive into the use cases • Solr returns documents, and advanced ML models rerank them.
  • 7. Use Cases • Business Goals: • Site stickiness • Efficient and streamlined customer support • Implement on the help / contact us page • Achieved through: • Question-to-question similarity • FAQ matching • Answer Selection • paragraph/sentence level for a given question • Training data is available from historical support tickets / document FAQs • We can find relevant answers from documents
  • 9. Demo! Before we dive deep, let’s see some examples
  • 14. ML Catch-up • Basic terminology and concepts • Supervised learning: needs labelled data. • Unsupervised learning: used when labelled data is unavailable. • Deep learning: all approaches that use deep neural network algorithms. • Traditional ML: all other algorithms, such as Random Forest, XGBoost, and Logistic Regression.
  • 16. Architecture • Time to discuss technical stuff! • Why not just use Solr? • Why not just Machine Learning? • Why both? • How to use both?
  • 17. Architecture • DL models will look at the top N (typically 100~1000) candidates returned by Solr. • N depends on how much time you can spare for the DL models. • In the two-step architecture we are using, it is important that Solr returns the best possible results. • High recall from Solr • High precision from Deep Learning
  • 18. Architecture - Index We need to compute multiple features for ML models at run time. Hence, it’s best to precompute and index some expensive ones.
  • 19. Architecture - Query • Use synonyms and an answer type classifier to enrich your query. • We developed an unsupervised method based on mutual information that captures the association between keywords in questions and answers. E.g. Q words: indexing, typeahead, autocomplete; A words: solr, lucene. • Currently we use it as a feature in XGBoost.
  • 20. Architecture - Query • Extract entities (persons, organizations, etc.) and POS tags (nouns, verbs, etc.) from the query.
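The slide does not give the formula behind the mutual-information method. As a rough illustration only, the sketch below uses pointwise mutual information (PMI) over question/answer keyword co-occurrence, which is one common way to score such associations. The function name `keyword_pmi` and the toy pairs are hypothetical, not taken from the talk.

```python
import math
from collections import Counter
from itertools import product


def keyword_pmi(qa_pairs):
    """Return PMI for every (question word, answer word) pair seen together.

    qa_pairs: list of (question_tokens, answer_tokens). Each token is counted
    at most once per pair. This is an assumed stand-in for the method on the
    slide, not the authors' implementation.
    """
    n = len(qa_pairs)
    q_count, a_count, joint = Counter(), Counter(), Counter()
    for q_tokens, a_tokens in qa_pairs:
        q_set, a_set = set(q_tokens), set(a_tokens)
        q_count.update(q_set)
        a_count.update(a_set)
        joint.update(product(q_set, a_set))
    # PMI(q, a) = log( P(q, a) / (P(q) * P(a)) )
    return {
        (q, a): math.log((c / n) / ((q_count[q] / n) * (a_count[a] / n)))
        for (q, a), c in joint.items()
    }


# Hypothetical toy data: tokenized question keywords and answer keywords.
pairs = [
    (["autocomplete", "slow"], ["solr", "suggester"]),
    (["typeahead", "setup"], ["solr", "lucene"]),
    (["billing", "invoice"], ["account", "page"]),
    (["indexing", "slow"], ["lucene", "merge"]),
]
pmi = keyword_pmi(pairs)
```

A high PMI for a pair such as ("typeahead", "lucene") could then be fed to a downstream model as one numeric feature, in the spirit of the XGBoost feature the slide mentions.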
  • 21. Architecture - Query • Get w2v vector for the query
  • 22. Architecture - Query • Retrieve documents from Solr and generate features using the LTR API, e.g. • cosine between nouns • number of overlapping tokens
  • 23. Architecture - Query • Generate features that LTR could not generate, e.g. number of URLs, number of `?`, etc.
  • 24. Architecture - Query • Use the ML model to rerank the top N from Solr
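A minimal sketch of the kind of post-retrieval features this slide names (URL count and `?` count), computed outside Solr. The function name `extra_features`, the URL regex, and the sample text are assumptions for illustration.

```python
import re

# Simplified URL pattern (an assumption); real text may need a stricter one.
URL_RE = re.compile(r"https?://\S+")


def extra_features(candidate_text):
    """Compute simple surface features for one candidate document."""
    return {
        "num_urls": len(URL_RE.findall(candidate_text)),
        "num_question_marks": candidate_text.count("?"),
    }


feats = extra_features(
    "Is autocomplete enabled? See https://example.com/docs for the Solr suggester."
)
```

Features like these would be appended to the LTR-generated ones before the candidates are passed to the reranking model.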
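The two-step flow (high recall from Solr, then high precision from a model over the top N) can be sketched as below. This is a simplified illustration: `rerank` and `toy_score` are hypothetical names, the candidate list stands in for a Solr response, and the token-overlap scorer stands in for the trained model.

```python
def rerank(query, candidates, score_fn, top_n=100):
    """Re-order the first top_n retrieved hits by a model score.

    candidates: list of dicts in retrieval (e.g. Solr) rank order.
    score_fn: callable (query, candidate) -> float, higher is better.
    Hits beyond top_n keep their original order below the reranked block.
    """
    head, tail = candidates[:top_n], candidates[top_n:]
    reranked = sorted(head, key=lambda c: score_fn(query, c), reverse=True)
    return reranked + tail


def toy_score(query, candidate):
    """Placeholder scorer: number of query tokens found in the candidate."""
    q_tokens = set(query.lower().split())
    return len(q_tokens & set(candidate["text"].lower().split()))


# Hypothetical hits, already in retrieval order.
hits = [
    {"id": "a", "text": "reset your router"},
    {"id": "b", "text": "how to reset a password"},
    {"id": "c", "text": "password reset email not received"},
    {"id": "d", "text": "how do i reset my password"},
]
order = [h["id"] for h in rerank("how do i reset my password", hits, toy_score, top_n=3)]
```

Note that hit "d" stays last even though it matches the query best: it fell outside the top N, which is why the retrieval step needs high recall.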
  • 29. Deep Dive (question to question similarity) • Use case: Given a set of FAQs and historic questions, find similar / duplicate questions. • Research: Experiments with different types of models (supervised, unsupervised, and ensembles), combined with Solr to better capture contextual information. 38 different models compared in total. • Data: Quora duplicate question detection. Two questions are labeled as duplicates if they are asking the same thing.
  • 30. Deep Dive (question to question similarity) • Methods: • Supervised: XGBoost 1. TFIDF cosine score 2. TFIDF cosine score on nouns 3. Number of overlapping question keywords 4. Number of overlapping POS tags 5. Number of overlapping NER tags • Unsupervised: • BM25-Solr • TFIDF with sublinear TF and n-grams instead of just tokens • Doc2Vec • FastText • GloVe • Google Sentence Encoder (Universal Sentence Encoder) • Ensemble Solr + GSE
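To make "TFIDF with sublinear TF and n-grams" concrete, here is a small standard-library sketch. The smoothed IDF formula, the n-gram range (1 to 2), and the weighted-average `ensemble` function are assumptions; the slides do not state how the scores were computed or combined, and the embedding score passed to `ensemble` is a placeholder for a sentence-encoder similarity.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Word n-grams for n = 1..n_max."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


def tfidf_vectors(docs, n_max=2):
    """Sparse TF-IDF vectors with sublinear TF (1 + log tf) and smoothed IDF."""
    grams = [Counter(ngrams(d, n_max)) for d in docs]
    df = Counter(g for c in grams for g in c)
    n_docs = len(docs)
    return [{g: (1 + math.log(tf)) * (math.log((1 + n_docs) / (1 + df[g])) + 1)
             for g, tf in c.items()} for c in grams]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def ensemble(tfidf_score, embedding_score, weight=0.5):
    """Assumed combination rule: a weighted average of the two similarities."""
    return weight * tfidf_score + (1 - weight) * embedding_score


questions = [
    "how do i learn python",
    "how can i learn python quickly",
    "what is the capital of france",
]
vecs = tfidf_vectors(questions)
sim_close = cosine(vecs[0], vecs[1])
sim_far = cosine(vecs[0], vecs[2])
```

Because bigrams such as "learn python" are counted alongside single tokens, the two paraphrased questions score higher than they would on unigrams alone.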
  • 31. Deep Dive (question to question similarity)
  • 32. Deep Dive (question to question similarity)
  • 33. Deep Dive (question to question similarity) • Conclusion: • An unsupervised ensemble of TFIDF with GSE provides the best results with less work than supervised methods. • Among single unsupervised models, generalized TFIDF on n-grams and the new Google sentence encoder provide the best results. • XGBoost supervised methods have no advantage over unsupervised methods on this task.
  • 35. Deep Dive (Answer Selection) • Use case: Given historic QA transactions (e.g. support team archives, Slack, StackOverflow, email threads), when a question is issued, pick the best answer. • Research: Extensive feature engineering for supervised traditional models, deep learning models, and unsupervised models. 21 models compared in total. • Data: Insurance QA. The data has a list of questions and corresponding answers. • Methods: • Supervised: XGBoost, Deep Learning Model • Unsupervised: FastText, w2v, d2v
  • 36. Deep Dive (Answer Selection) • Methods: Supervised: XGBoost 1. Cosine score between TFIDF vectors of question and answer (stop words trimmed). 2. Cosine score between TFIDF vectors of question and answer after extracting nouns. 3. One-hot encoding of question keywords (what, who, when, how long, how fast…). 4. One-hot encoding of POS tags of question and answers. 5. One-hot encoding of NER tags of question and answers. 6. w2v cosine distance of question and answer. 7. Doc2Vec cosine distance of question and answer (supervised).
  • 37. Deep Dive (Answer Selection)
  • 38. Deep Dive (Answer Selection) • Methods: Supervised: Deep Learning 1. Encoder: • Embeddings layer • Dropout • LSTM layer • Attention Weighted Average layer 2. Simple triplet-based model that uses a shared Encoder for question, positive answer, and negative answer 3. 15 negative answers are generated at random for each epoch.
  • 39. Deep Dive (Answer Selection) • Methods: Supervised: Deep Learning 1. Simple Encoder: • Embeddings layer • LSTM layer • Attention Weighted Average layer 2. Triplet-based model using a shared Encoder for question, positive answer, and negative answer 3. 15 negative answers are generated at random for each epoch.
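The triplet setup can be illustrated without a neural network by treating the encoder outputs as plain vectors. The sketch below shows a margin-based triplet loss on cosine similarity and random negative sampling. The hinge form, the margin value of 0.2, and the helper names are assumptions; the slides state only that the model is triplet-based and that 15 random negatives are drawn per epoch.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two dense vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triplet_loss(q, pos, neg, margin=0.2):
    """Hinge loss: zero once the positive beats the negative by at least `margin`."""
    return max(0.0, margin - cosine(q, pos) + cosine(q, neg))


def sample_negatives(answer_ids, positive_id, k=15, rng=random):
    """Draw up to k distinct answers other than the correct one."""
    pool = [a for a in answer_ids if a != positive_id]
    return rng.sample(pool, min(k, len(pool)))


# Encoded vectors are hypothetical; a shared encoder would produce them.
loss_easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
loss_hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
negs = sample_negatives(list(range(100)), positive_id=7, k=15, rng=random.Random(0))
```

Resampling the negatives every epoch means the model sees different wrong answers over time, rather than memorizing a fixed set of contrasts.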
  • 40. Deep Dive (Answer Selection) • Using at runtime:
  • 41. Deep Dive (Answer Selection) • Model Stability: Based on the amount of training data. d2v (unsupervised) / xgboost (supervised) / xgboost_dl (supervised) / DL (supervised)
  • 42. Deep Dive (Answer Selection) • Conclusion: • DL models are significantly better than traditional models, except when the training data is reduced to 10% (around 3K QA pairs), where performance is very similar to the traditional models. • There is a significant advantage in using supervised methods over unsupervised ones. • Suggestion: If the training data size is < 5K, use the XGBoost model with pre-trained FastText embeddings. If the training data size is big enough, use DL models without feature extraction and w2v/d2v.
  • 43. Bonus! • Deep learning is often regarded as an unexplainable black box: we do not know how or why it works. • But is it really unexplainable? • Well, not all of it! • We can explain why a particular text is tagged as positive or negative. • Our DL model explains itself! • Using the attention mechanism, we extract what the model considers important and highlight it for exploration.
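As a rough illustration of how attention weights can double as an explanation, the sketch below applies a softmax to per-token scores, uses the result for an attention-weighted average, and picks the highest-weighted tokens to highlight. The scores are made-up numbers standing in for what a trained attention layer would produce, and all function names are hypothetical.

```python
import math


def attention_weights(scores):
    """Softmax over per-token attention scores (numerically stabilized)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(vectors, weights):
    """Attention-weighted average of token vectors, giving one sentence vector."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def top_tokens(tokens, weights, k=2):
    """Tokens with the largest attention weights, for highlighting."""
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return [t for t, _ in ranked[:k]]


tokens = ["does", "my", "policy", "cover", "flood", "damage"]
scores = [0.1, 0.0, 1.5, 1.0, 2.5, 2.0]  # hypothetical attention scores
weights = attention_weights(scores)
highlighted = top_tokens(tokens, weights, k=2)
pooled = weighted_average([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75])
```

The same weights serve two purposes: they pool token vectors into a sentence representation, and they indicate which words the model relied on most.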
  • 45. Thank you! • Sanket Shahane, Data Scientist, Lucidworks, linkedin.com/in/shahanesanket • Savva Kolbachev, Data Scientist, Lucidworks, linkedin.com/in/savva-kolbachev-98669199/ • #Activate18 #ActivateSearch