Enriching Solr with Deep Learning for a Question Answering System - Sanket Shahane, Lucidworks
1. Enriching Solr with Deep Learning for Question-Answering Systems
Sanket Shahane
Data Scientist, Lucidworks
https://www.linkedin.com/in/shahanesanket
Savva Kolbachev
Data Scientist, Lucidworks
#Activate18 #ActivateSearch
2. Agenda
• Overview of QA
• Use Cases
• Demo
• Architecture
• Deep Dive and Research Findings in:
• Question-to-Question similarity / FAQ matching
• Answer Selection
• Questions / Discussion
3. Overview of QA
• Going Beyond Document Retrieval
• Isn’t it great to have someone answer your
question directly?
• In this fast moving world who wants to read
10 paragraphs when you are looking for just
1 sentence or word?
• Good News! We can do it...
4. Overview of QA
• Recent advances in GPUs and Deep Learning make higher accuracy possible.
• E.g. high accuracy on NER, POS tagging, image recognition, and learning deep
representations of text
• 3 types of QA systems
5. Overview of QA
• We will focus on Information Retrieval Based QA Systems
• Architecture
• Research Findings
• Deep dive into the Use cases
• Solr returns candidate documents and advanced ML models do the reranking.
7. Use Cases
• Business Goals:
• Site Stickiness
• Efficient and streamlined customer support
• Implement on the help/contact us page
• Achieved through:
• Question to Question similarity
• FAQ matching
• Answer Selection
• at the paragraph/sentence level for a given question
• Training data is available from historical support tickets / document FAQs, so we
can find relevant answers from documents
14. • Basic Terminology and concepts
• Supervised Learning
• Needs to have labelled data.
• Unsupervised Learning
• Used when labelled data is unavailable
• Deep Learning
• All approaches where we use deep neural network algorithms
• Traditional ML
• All other algorithms, e.g. Random Forest / XGBoost / Logistic Regression
ML Catch up
16. Architecture
• Time to Discuss technical stuff!
- Why not just use Solr?
- Why not just Machine Learning?
- Why both?
- How to use both?
17. • DL models will look at the top N (typically 100–1000)
candidates returned by Solr.
• N depends on how much time you can spare for the DL models.
• In the two-step architecture we are using, it is important that
Solr returns the best possible results.
• High Recall from Solr
• High Precision from Deep Learning
Architecture
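The two-step flow described above can be sketched as a minimal Python function; `solr_search` and `dl_score` are hypothetical stand-ins for the Solr query and the deep-learning reranker:

```python
def rerank(query, solr_search, dl_score, top_n=100):
    """Two-step QA retrieval: high-recall Solr candidates,
    high-precision DL reranking of the top N."""
    candidates = solr_search(query)[:top_n]        # step 1: recall from Solr
    # step 2: precision - rescore each candidate with the DL model
    return sorted(candidates, key=lambda doc: dl_score(query, doc), reverse=True)

# Toy stand-ins for demonstration only:
fake_solr = lambda q: ["doc_b", "doc_a", "doc_c"]                      # BM25 order
fake_dl = lambda q, d: {"doc_a": 0.9, "doc_b": 0.4, "doc_c": 0.7}[d]   # DL scores
print(rerank("how to enable typeahead?", fake_solr, fake_dl))
# ['doc_a', 'doc_c', 'doc_b']
```

The recall/precision split is the key design choice: Solr stays fast over the full index, while the expensive model only sees N documents.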
18. Architecture - Index
We need to compute multiple features
for ML models at run time.
Hence, it’s best to precompute and
index some expensive ones.
19. Use synonyms and an answer-type classifier to
enrich your query.
We developed an unsupervised method based on
mutual information that captures associations between
the keywords in questions and answers
E.g.:
Q words: indexing, typeahead, autocomplete
A words: solr, lucene
Currently we use it as a feature in XGBoost
Architecture - Query
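The mutual-information idea above can be sketched in plain Python; the PMI estimate below is a minimal illustration on toy data, not the production feature:

```python
import math
from collections import Counter

def pmi_table(qa_pairs):
    """Pointwise mutual information between question and answer keywords,
    estimated from co-occurrence across historical QA pairs."""
    q_counts, a_counts, pair_counts = Counter(), Counter(), Counter()
    for q_words, a_words in qa_pairs:
        for qw in set(q_words):
            q_counts[qw] += 1
        for aw in set(a_words):
            a_counts[aw] += 1
        for qw in set(q_words):
            for aw in set(a_words):
                pair_counts[(qw, aw)] += 1
    n = len(qa_pairs)
    # PMI(q, a) = log( p(q, a) / (p(q) * p(a)) )
    return {(qw, aw): math.log((c / n) / ((q_counts[qw] / n) * (a_counts[aw] / n)))
            for (qw, aw), c in pair_counts.items()}

pairs = [(["indexing"], ["solr", "lucene"]),
         (["typeahead"], ["solr"]),
         (["billing"], ["invoice"])]
pmi = pmi_table(pairs)
# "indexing" only ever co-occurs with "solr"/"lucene", so its PMI with them is positive
```

A positive PMI between a query keyword and an answer keyword is then usable as a query-enrichment signal or, as above, as a feature fed to XGBoost.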
20. • Extract entities like persons, organizations, etc. and
POS like nouns, verbs, etc. from the query
Architecture - Query
29. Deep Dive (question to question similarity)
• Use case: Given a set of FAQs and historic questions, find
similar questions / duplicate questions.
• Research: Experiments with different types of models:
supervised, unsupervised, and ensembles, combined
with Solr to better capture contextual information.
In total, 38 different models were compared.
• Data: Quora duplicate question detection.
Two questions are labeled as duplicate if they are
asking the same thing.
30. Deep Dive (question to question similarity)
• Methods:
Supervised: XGBoost
1. TFIDF cosine score
2. TFIDF cosine score on nouns
3. Number of overlapping question keywords
4. Number of overlapping POS tags
5. Number of overlapping NER tags
Unsupervised:
• BM25 (Solr)
• TFIDF w/ sublinear TF, n-grams
instead of just tokens
• Doc2Vec
• fastText
• GloVe
• Google Sentence Encoder
(Universal Sentence Encoder)
• Ensemble: Solr + GSE
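The first supervised feature (TFIDF cosine score between two questions) can be sketched in plain Python; a library such as scikit-learn would normally compute this, so treat the helper below as a minimal illustration of the feature, not the exact implementation:

```python
import math
from collections import Counter

def tfidf_cosine(q1_tokens, q2_tokens, corpus):
    """Cosine similarity between TFIDF vectors of two questions --
    one feature fed to the supervised XGBoost model."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))   # document frequencies
    idf = lambda t: math.log(n / (1 + df[t])) + 1.0        # smoothed IDF
    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf(t) for t in tf}
    v1, v2 = vec(q1_tokens), vec(q2_tokens)
    if not v1 or not v2:
        return 0.0
    dot = sum(v1[t] * v2.get(t, 0.0) for t in v1)
    norm = lambda v: math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm(v1) * norm(v2))

corpus = [["how", "reset", "password"],
          ["reset", "my", "password"],
          ["configure", "solr", "schema"]]
dup = tfidf_cosine(["reset", "password"], ["how", "do", "i", "reset", "my", "password"], corpus)
diff = tfidf_cosine(["reset", "password"], ["configure", "solr", "schema"], corpus)
# dup > diff: near-duplicate questions score higher than unrelated ones
```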
33. • Conclusion:
- An unsupervised ensemble of TFIDF with GSE provides the best results with LESS
work than supervised methods.
- Among single unsupervised models, generalized TFIDF on n-grams and the new
Google Sentence Encoder provide the best results.
- XGBoost supervised methods have no advantage over unsupervised
methods on this task.
Deep Dive (question to question similarity)
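The winning Solr + GSE ensemble can be sketched as simple per-query score fusion; the min-max normalization and the `alpha` weight below are assumptions for illustration, not the exact combination used in the talk:

```python
def ensemble(solr_scores, gse_scores, alpha=0.5):
    """Unsupervised ensemble: min-max normalize each model's scores for a
    query, then blend. `alpha` weights Solr (BM25/TFIDF) against the
    sentence-encoder cosine similarity."""
    def norm(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}
    s, g = norm(solr_scores), norm(gse_scores)
    return {k: alpha * s[k] + (1 - alpha) * g.get(k, 0.0) for k in s}

solr = {"q1": 12.0, "q2": 7.5, "q3": 3.1}    # raw BM25 scores per candidate
gse  = {"q1": 0.41, "q2": 0.88, "q3": 0.35}  # encoder cosine scores
blended = ensemble(solr, gse)
best = max(blended, key=blended.get)
# "q2" wins: middling BM25 but by far the strongest semantic match
```

Normalizing before blending matters because BM25 scores and cosine similarities live on incompatible scales.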
35. Deep Dive (Answer Selection)
• Use case: Given historic QA transactions (e.g. support team archives,
slack, StackOverflow, email thread), when a question is issued, pick the
best answer.
• Research: Extensive feature engineering for supervised traditional
models, deep learning models, and unsupervised models.
In total, 21 models were compared.
• Data: Insurance QA.
The data has a list of questions and their corresponding answers
• Methods:
• Supervised: XGBOOST, Deep Learning Model
• Unsupervised: FastText, w2v, d2v
36. • Methods:
Supervised: XGBoost
1. Cosine score between TFIDF vectors of question
and answer (stop words trimmed).
2. Cosine score between TFIDF vectors of question
and answer after extracting nouns.
3. One-hot encoding of question keywords (what,
who, when, how long, how fast…)
4. One-hot encoding of POS tags of questions and
answers.
5. One-hot encoding of NER tags of questions and
answers.
6. W2v cosine distance of question and answer.
7. Doc2Vec cosine distance of question and
answer (supervised).
Deep Dive (Answer Selection)
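Feature 3 (one-hot encoding of question keywords) can be sketched as follows; the `QUESTION_WORDS` vocabulary is an assumed example, not the exact list used in the experiments:

```python
# Hypothetical interrogative-word vocabulary for the one-hot feature.
QUESTION_WORDS = ["what", "who", "when", "where", "why", "how"]

def question_word_onehot(question_tokens):
    """Indicator vector: one slot per interrogative word
    found anywhere in the question."""
    tokens = {t.lower() for t in question_tokens}
    return [1 if w in tokens else 0 for w in QUESTION_WORDS]

vec = question_word_onehot("How long is the claim process".split())
# 'how' is present -> [0, 0, 0, 0, 0, 1]
```

The resulting vector is concatenated with the other features (TFIDF cosines, tag overlaps, embedding distances) before being fed to XGBoost.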
38. • Methods:
Supervised: Deep Learning
1. Encoder:
• Embeddings layer
• Dropout
• LSTM layer
• Attention Weighted Average layer
2. Simple triplet-based model that uses a
shared Encoder for the question, positive
answer, and negative answer
3. 15 negative answers are generated at
random for each epoch.
Deep Dive (Answer Selection)
39. Deep Dive (Answer Selection)
• Methods:
Supervised: Deep Learning
1. Simple Encoder:
Embeddings layer
LSTM layer
Attention Weighted Average
layer
2. Triplet-based model using a
shared Encoder for the question,
positive answer, and negative
answer
3. 15 negative answers are
generated at random for each
epoch.
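The triplet objective behind the shared encoder can be sketched in plain Python on already-encoded vectors; the `margin` value is an assumption for illustration, and the real model computes this inside a deep-learning training loop (e.g. Keras/TensorFlow) rather than in plain Python:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def triplet_loss(q, pos, neg, margin=0.2):
    """Hinge triplet loss: push the question vector closer to the positive
    answer than to the negative answer by at least `margin`."""
    return max(0.0, margin - cosine(q, pos) + cosine(q, neg))

q   = [1.0, 0.0]   # encoder output for the question
pos = [0.9, 0.1]   # encoder output for the correct answer
neg = [0.0, 1.0]   # encoder output for a randomly sampled negative answer
loss = triplet_loss(q, pos, neg)
# pos is already much closer to q than neg is, so the loss is 0.0
```

Sampling 15 random negatives per question each epoch, as the slide describes, gives the encoder many such triplets to minimize.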
41. Deep Dive (Answer Selection)
• Model stability as a function of the amount of training data.
d2v (unsupervised) / xgboost (supervised) / xgboost_dl (supervised) / DL (supervised)
42. • Conclusion:
- DL models are significantly better than traditional models, except when the
training data size drops to 10% (around 3K QA pairs), where performance is
very similar to the traditional models.
- There is a significant advantage to using supervised methods over
unsupervised ones.
- Suggestion: if the training data size is < 5K, use an XGBoost model with
pre-trained fastText embeddings.
If the training data size is big enough, use DL models without feature
extraction and w2v/d2v.
Deep Dive (Answer Selection)
43. • Deep Learning is often regarded as an unexplainable
black box: we do not know how it works or why it works.
• But is it really unexplainable?
• Well, not all of it!
- We can explain why a particular text is tagged as positive or
negative.
- Our DL model explains itself!
- Using the attention mechanism, we extract what the model considers
important and highlight it for exploration
Bonus!
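The attention-based highlighting can be sketched as follows: softmax the raw attention scores the model assigned to each token, then keep the most heavily weighted tokens. The scores below are hypothetical logits for illustration, not real model output:

```python
import math

def attention_highlight(tokens, scores, top_k=2):
    """Softmax the per-token attention scores and return the tokens
    the model weighted most heavily."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]          # softmax over the sentence
    ranked = sorted(zip(tokens, weights), key=lambda tw: tw[1], reverse=True)
    return [t for t, _ in ranked[:top_k]]

tokens = ["the", "deductible", "is", "waived", "for", "glass"]
raw = [0.1, 2.3, 0.1, 1.9, 0.1, 1.2]   # hypothetical attention logits
print(attention_highlight(tokens, raw))
# ['deductible', 'waived']
```

Because the Attention Weighted Average layer already computes these weights during inference, this kind of highlighting comes almost for free from the answer-selection model itself.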