3. Lucidworks AI Lab
Chao Han
VP, Head of Data Science
Sava Kalbachou
AI Research Engineer
Andy Liu
Senior Data Engineer
4. Agenda
• Overview of current QA solutions
• Why we chose a Neural-based approach
• Challenges in Neural Search implementation
• Fusion FAQ workflow
• Chatbot integration
5. Current QA solutions
STRENGTH: Comprehensive workflow-building tools with a UI.
WEAKNESS: Tedious template-building process, low coverage;
general ontologies do not apply to specific domains.
OUR QA SYSTEM: Information retrieval based. Finds answers in indexed
documents.
6. FAQ solution
QA SYSTEM that directly recommends answers from an FAQ pool or
finds similar questions asked before. Paired with a cold start
solution when no existing FAQ is available.
BUSINESS USE CASES:
• Call center or IT support ticket records
• Questions about products for E-commerce
• Email and Slack conversations
• Sharepoint FAQs
• Semantic search for long queries using neural information retrieval
12. Implementation details
Average agreement with the top-10 results from the full dense-vector search
Comparison                             Time    Agreement
Solr (top10)                           65ms    2.27
FAQ Opt. 1 (Solr top500 + Reranking)   195ms   5.88
FAQ Opt. 2 (1 Cluster + Reranking)     154ms   7.80
FAQ Opt. 2 (2 Clusters + Reranking)    186ms   8.65
13. FAQ Solution Workflow
FAQ input → DL training module (runs on-prem or in the cloud) → model zip file → Fusion optimized pipelines
Takes days rather than months from model training to implementation
14. Training Module in Docker
AUTOPILOT MODE
• Automatic parameter tuning to find the best possible model
• Automatically adjusts default parameters based on data size, resources, etc.
• Suitable for non-Data Scientists
ADVANCED USER MODE
• Exposes parameters for further tuning
• Suitable for Data Scientists with DL knowledge
COMPREHENSIVE EVALUATION
• Helps to measure improvements
• A variety of metrics such as MAP, MRR, Precision, Recall, ROC-AUC (see the sketch below)
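For illustration only, here is a minimal sketch of how two of these metrics, MRR and Precision@k, can be computed over ranked results; the function names are ours, not the training module's actual API:

```python
import numpy as np

def mean_reciprocal_rank(runs):
    """MRR over queries; each run holds the 0/1 relevance flags of one
    query's ranked results."""
    rr = []
    for flags in runs:
        hits = np.nonzero(flags)[0]
        rr.append(1.0 / (hits[0] + 1) if len(hits) else 0.0)
    return float(np.mean(rr))

def precision_at_k(runs, k=10):
    """Average fraction of relevant results within the top k."""
    return float(np.mean([np.mean(flags[:k]) for flags in runs]))

# Two toy queries: first relevant result at ranks 2 and 1 respectively.
runs = [np.array([0, 1, 0, 1]), np.array([1, 0, 0, 0])]
print(mean_reciprocal_rank(runs))   # (1/2 + 1/1) / 2 = 0.75
print(precision_at_k(runs, k=2))    # (1/2 + 1/2) / 2 = 0.5
```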
15. Fusion meets TensorFlow
OPTIMIZED INDEX PIPELINE
• Pre-indexing answer vectors: no need to re-compute vectors for each query
• Vector compression: faster retrieval from Solr
• Clusterization as part of the TensorFlow computational graph
• Encoding multiple fields for further results ensembling
OPTIMIZED QUERY PIPELINE
• On-the-fly query encoding and clusterization via the TensorFlow model
• Faceting by clusters
• Efficient vector similarity computation; supports a variety of distances
• Ensemble of vector similarity and Solr scores (sketched below)
• Suitable for any object-to-object dense-vector search
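As a rough sketch of the last two query-pipeline points (not Fusion's actual implementation), similarity over pre-indexed answer vectors and a score ensemble could look like this; the distance options and blending weight are illustrative assumptions:

```python
import numpy as np

def similarity(query_vec, answer_vecs, metric="cosine"):
    """Score one query vector against a matrix of pre-indexed answer vectors."""
    if metric == "cosine":
        q = query_vec / np.linalg.norm(query_vec)
        a = answer_vecs / np.linalg.norm(answer_vecs, axis=1, keepdims=True)
        return a @ q
    if metric == "dot":
        return answer_vecs @ query_vec
    if metric == "euclidean":
        return -np.linalg.norm(answer_vecs - query_vec, axis=1)  # higher = better
    raise ValueError(f"unknown metric: {metric}")

def ensemble(vec_scores, solr_scores, weight=0.7):
    """Blend neural similarity with (max-normalized) Solr scores."""
    solr = np.asarray(solr_scores, dtype=float)
    solr = solr / solr.max() if solr.max() > 0 else solr
    return weight * vec_scores + (1 - weight) * solr
```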
22. Chatbot integration
The Fusion FAQ solution can be easily integrated into an existing chatbot workflow:
Rasa Chatbot → Fusion FAQ API → TensorFlow DL
23. Fusion meets Rasa
COMMUNICATE WITH THE FUSION API
• Get answers from the existing knowledge base
FOLLOW-UP QUESTIONS
• Infer whether additional information is needed and ask for it
• Easily extendable metadata collection
• No source code changes needed
NO NEED FOR TONS OF INTENTS
• Just one intent to make the Fusion FAQ call
FALLBACK SCENARIO
• Trigger a fallback action when there is no good answer in the FAQ
SENTIMENT PREDICTION
• Adjust the workflow based on user satisfaction
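A minimal sketch of such an integration point, written as a Rasa custom action; the Fusion endpoint URL, request parameters, response fields, and the 0.5 threshold are all assumptions for illustration:

```python
import requests
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

FUSION_FAQ_URL = "https://fusion.example.com/api/apps/faq/query/faq"  # hypothetical

class ActionFaqAnswer(Action):
    def name(self) -> str:
        return "action_faq_answer"

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker, domain: dict):
        question = tracker.latest_message.get("text", "")
        resp = requests.get(FUSION_FAQ_URL, params={"q": question}).json()
        docs = resp.get("response", {}).get("docs", [])  # assumed response shape
        if docs and docs[0].get("score", 0.0) > 0.5:     # similarity as confidence
            dispatcher.utter_message(text=docs[0]["answer"])
        else:
            dispatcher.utter_message(text="I couldn't find a good answer; "
                                          "let me search the site instead.")
        return []
```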
26. Sava Kalbachou
sava.kalbachou@lucidworks.com
AI Research Engineer at Lucidworks
https://www.linkedin.com/in/sava-kalbachou/
https://github.com/thinline72
https://www.kaggle.com/thinline72
https://twitter.com/thinline72s
https://t.me/thinline72
Editor's Notes
Hi all and thank you everyone for attending this talk! I’m really excited to be here and present things we have been working on recently at Lucidworks.
And today we are going to talk about Question Answering and Virtual Assistants with Deep Learning.
Let me start by introducing our team.
My name is Sava Kalbachou; I'm an AI Research Engineer on the Lucidworks AI Lab team, which is led by Chao Han, our VP of Data Science.
Together with Andy Liu, our Senior Data Engineer, we have been working hard on the research and development of the QnA solution I'm going to talk about today.
Here is our agenda:
First, we'll start with an overview of current QA solutions, including ours. Then we'll discuss why we chose a Neural Search-based approach for the Question Answering task.
After that, we'll talk about the challenges we faced in implementing Neural Search and how we were able to tackle them.
We'll also discuss our solution workflow.
Finally, I'll show you how our QA solution can be integrated into chatbot applications.
Most QA solutions are chatbots, which are basically comprehensive workflow-building tools with a UI. Usually, users have to manually provide examples, specify intents, build ontologies, and use rule-based approaches. That's why the most popular demos for chatbot solutions are for booking hotels or restaurants: these domains have a limited number of possible questions, so it's possible to cover them with manual, rule-based methods.
However, for more complicated enterprise use cases, coverage is too low using these rule-based methods, leaving far too many questions unanswered.
In contrast, the QA system we designed is information retrieval based, which allows us to find accurate answers in indexed documents and fully utilize a company's existing knowledge bases, without the need for extensive configuration or lots of manual work.
Our FAQ solution can directly recommend answers from an existing FAQ pool, or it can find similar questions previously asked. There is also the possibility of leveraging a cold start solution when no existing FAQ is available.
Typical business use cases: Call centers and support teams: if you have a call center or support tickets, the FAQ solution can power search or a virtual assistant on a help/contact-us page. It helps users find answers by themselves and reduces the load on your call center. The same system can drastically improve the efficiency of your customer support team, since agents can easily find solutions to already-solved problems. In the E-Commerce domain, it can be applied to answer questions about a particular product. And QA pairs can be extracted from Slack and email conversations for fast knowledge sharing.
If you don't have an FAQ and just want to improve search for long queries, the cold start solution is a good fit. It utilizes word embeddings to capture semantic and contextual information for long queries or natural-language questions, and it can be combined with Solr scoring to provide even better results, as sketched below.
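To make the idea concrete, here is a minimal sketch of that cold start scoring, assuming a dict-like store of pre-trained word vectors (e.g. word2vec or GloVe); the blending weight is an illustrative knob, not a value from the talk:

```python
import numpy as np

def embed(text, word_vectors, dim=300):
    """Average pre-trained word vectors for a text; zeros if no word is known."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cold_start_score(query, doc, word_vectors, solr_score, alpha=0.7):
    """Blend semantic similarity with Solr's lexical score (alpha is a knob)."""
    q, d = embed(query, word_vectors), embed(doc, word_vectors)
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    cos = float(q @ d / denom) if denom else 0.0
    return alpha * cos + (1 - alpha) * solr_score
```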
But why did we choose a Neural-based approach for our solution?
During our research phase, we conducted a comprehensive study comparing different methods, starting from unsupervised models like Doc2Vec and classical machine learning models like boosted trees, and moving on to more advanced, cutting-edge Deep Learning approaches.
On the screen you can see the results of one of our experiments, which also shows model stability depending on the size of the training data. Although the Deep Learning model's results drop when it's trained on 50% or 10% of the training data, it is still better than XGBoost trained on the whole training dataset. This is due to recent advances in Transfer Learning, which let us leverage knowledge from already pre-trained models and embeddings.
Moreover, Deep Learning models don't require any heavy feature engineering. We tried incorporating features like Part-of-Speech and Named Entity Recognition tags into the Deep Learning model, and it didn't give any meaningful boost in results. We believe this is because Deep Learning models can extract and learn such information by themselves.
But how does it work under the hood, and what is the difference between classical and neural search engines?
Well, classical search engines use TF-IDF-like formulas, such as BM25 in Solr, to compute similarity scores between queries and documents. But these approaches are purely based on word matching; they cannot easily incorporate synonyms or semantic knowledge.
In contrast, about 5-6 years ago a new approach called word2vec appeared. It maps conceptually similar words into the same vector space in such a way that they have close vectors.
In our case we go even further: we encode whole sentences, questions, and answers into deep vector representations. That allows us to automatically handle not only synonyms but even semantically similar phrases.
The model we have been using is a Siamese deep neural network trained in a supervised way, so it learns to map questions and relevant answers as close as possible to each other, and questions and irrelevant answers as far apart as possible, in the same vector space.
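Here is a minimal sketch of that Siamese setup with a triplet loss in TensorFlow/Keras; the talk doesn't specify the architecture or loss, so the encoder, dimensions, and margin below are illustrative assumptions:

```python
import tensorflow as tf

# Illustrative shared encoder: maps token-id sequences to L2-normalized vectors.
encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(30000, 256),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(256),
    tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1)),
])

def triplet_loss(q, pos, neg, margin=0.2):
    """Pull questions toward relevant answers, push irrelevant ones away."""
    d_pos = tf.reduce_sum(tf.square(q - pos), axis=1)
    d_neg = tf.reduce_sum(tf.square(q - neg), axis=1)
    return tf.reduce_mean(tf.maximum(d_pos - d_neg + margin, 0.0))

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(q_ids, pos_ids, neg_ids):
    # One step over a batch of (question, relevant, irrelevant) token-id triplets.
    with tf.GradientTape() as tape:
        loss = triplet_loss(encoder(q_ids), encoder(pos_ids), encoder(neg_ids))
    grads = tape.gradient(loss, encoder.trainable_variables)
    optimizer.apply_gradients(zip(grads, encoder.trainable_variables))
    return loss
```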
But you may ask how this works in the real world and how scalable it is.
Classical search engines like Solr use an inverted index to store and find relevant documents. But in our case we need to search in an abstract, high-dimensional vector space, where semantically similar texts are located near each other and even form groups. So we need to work with dense vectors.
And here comes the challenge: Solr does not support Dense Vector Search. So we had to find an efficient and scalable way to use trained Deep Learning encoders in the wild.
We addressed that challenge by implementing optimized query pipelines in Fusion, which enable fast runtime vector similarity search. The table shows query performance versus agreement with a full dense-vector search.
Although Solr is fast, it doesn't yield similar results for natural-language queries. And we already saw that Deep Learning models drastically outperform classical search in terms of result accuracy.
So, to support dense vector search at runtime, we implemented two pipeline options:
- The first option uses Solr to retrieve the top 500 candidates and the Deep Learning model to rerank them.
- The second option yields faster and often more accurate results. We integrated an embedded clustering layer into the deep learning model, so at query time we can find the closest clusters to the current query and then search and rerank answers only within those clusters.
By implementing these two options, we were able to find a good trade-off between query-time performance and result accuracy.
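A condensed sketch of both options, assuming an encoder like the one above and hypothetical helpers for the Solr calls and cluster lookup:

```python
import numpy as np

def rerank(query_vec, candidates):
    """candidates: list of (doc_id, answer_vec) pairs; return ids by cosine sim."""
    ids, vecs = zip(*candidates)
    v = np.stack(vecs)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    return [ids[i] for i in np.argsort(v @ q)[::-1]]

# Option 1: lexical recall from Solr, neural precision on top.
#   cands = solr_search(question, rows=500)          # hypothetical helper
#   top10 = rerank(encode(question), cands)[:10]

# Option 2: restrict the search to the clusters nearest the query.
#   q_vec = encode(question)
#   near = closest_clusters(q_vec, centroids, n=2)   # hypothetical helper
#   cands = solr_search(question, fq="cluster_id:(%s)" % " ".join(near))
#   top10 = rerank(q_vec, cands)[:10]
```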
Now let's take a look at our solution workflow. It mainly consists of two parts: model training, and query-time model inference in Fusion.
Model training is performed in a Docker container, which contains the deep learning and word-vector training modules plus a configuration UI, so it can easily be run on-prem or in the cloud.
After training is finished, a zip file is generated that includes the model and its associated files.
The inference part is performed in Fusion. The model is uploaded to the Fusion BlobStore, and optimized index and query pipelines are used to run neural search at query time.
It usually takes only days, rather than months, to go from training to implementation.
Here is what our training module in Docker looks like in detail:
- It has an autopilot mode for non-data scientists. This mode provides automatic parameter tuning and model selection, so non-data scientists can run the module and get good, stable results with minimal work.
- We also have an advanced user mode, which exposes parameters for tuning by data scientists.
- Comprehensive evaluation metrics are computed at each modeling step for easy model comparison.
It also supports both GPU and CPU. If you have a GPU, great: you'll be able to run training experiments very fast. If you only have a CPU, that's also fine; training on a typical dataset takes several hours, so it can run during lunch or overnight.
After training is done, the model is frozen and converted to a low-level TensorFlow computational graph, which is uploaded to Fusion and can be run even from Java. The same model is used in both the index and query pipelines.
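In current TensorFlow this freeze-and-load step would roughly correspond to exporting a SavedModel, which the JVM side can execute through TensorFlow's Java bindings; this is a sketch of the general technique, not the talk's exact export path, and the stand-in encoder below replaces the real one from the training module:

```python
import tensorflow as tf

# Stand-in for the trained encoder produced by the training module.
encoder = tf.keras.Sequential([
    tf.keras.layers.Embedding(30000, 256),
    tf.keras.layers.GlobalAveragePooling1D(),
])

@tf.function(input_signature=[tf.TensorSpec([None, None], tf.int32, name="token_ids")])
def encode(token_ids):
    return {"vector": encoder(token_ids)}

# A SavedModel is a self-contained graph; the JVM (e.g. Fusion) can load it
# with TensorFlow's Java API: SavedModelBundle.load(path, "serve").
tf.saved_model.save(encoder, "export/faq_encoder", signatures={"encode": encode})
```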
During indexing, the model encodes texts into vectors, then clusterizes and compresses them. Compressed vectors make retrieval from Solr much faster.
It's also possible to encode as many fields as you want to use in the final ensembling.
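The talk doesn't specify the compression scheme; one common approach, shown purely as an assumption, is to quantize float vector components down to single bytes before storing them in a Solr field:

```python
import numpy as np

def compress(vec):
    """Quantize a unit-norm float32 vector to uint8 (4x smaller in Solr)."""
    scaled = np.clip((vec + 1.0) * 127.5, 0, 255)   # map [-1, 1] -> [0, 255]
    return scaled.astype(np.uint8)

def decompress(q):
    """Approximate inverse; small quantization error is acceptable for ranking."""
    return q.astype(np.float32) / 127.5 - 1.0

v = np.random.rand(256).astype(np.float32)
v /= np.linalg.norm(v)
err = np.abs(decompress(compress(v)) - v).max()  # ~0.004 worst case
```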
The query pipeline is quite similar: queries are encoded and clusterized on the fly.
Query clusters can be used to search only within those clusters, which accelerates query time.
After answer candidates are retrieved from Solr, they are decompressed and vector similarity is computed.
Finally, different similarity scores, like question-answer or question-question similarity, can be ensembled with the Solr score to get the best possible results.
Generally, this approach and these pipeline stages are quite abstract and can be used for any kind of object-to-object dense-vector search: text, audio, images, or structured objects, as long as we can find a way to encode them into the same vector space.
Here is how it looks in the Fusion Index Workbench.
The model is basically the encoder that encodes text into a dense vector representation.
On the right side of the slide you can see simulated results containing the document vector, its compressed version, and its clusters.
There are several additional stages in the query pipeline, such as those for computing vector distances and sorting results. But I'd like to highlight the Compute Mathematical Expression stage, which lets you ensemble different scores using a variety of mathematical expressions. For example, we can combine the Solr score with vector distances, which gives us more control and leads to better results, especially in cold start mode.
Now let's look at several examples and compare results from vanilla Solr and from our FAQ solution.
These are results for a Finance FAQ about investment, mortgages, and credit. Here you can see the difference for the question "How to compute gold value".
Solr fails to surface the best answers at the top because it essentially counts exact token matches to rank results, whereas the FAQ solution can summarize text content in a deep vector representation, so it finds semantically similar answers and previously asked questions, like "How do you measure the value of gold?"
Here is another example, this time in the E-Commerce domain. The question "Any screen protection?" is asked against a phone-case product. Since Solr searches for exact tokens by default, it cannot match two similar words like "protection" and "protector" without a synonyms list or stemming, which might affect results for other queries.
Also, the previously asked question contains a spelling mistake in the word "screeen" (an extra "e"). Despite the spelling mistake, the FAQ solution is able to understand the context and place the appropriate QA pair at the top of the results.
Now let's move to the insurance domain and see what kind of UI might be built on top of such an FAQ solution. Once a question is asked, users see similar previously asked questions in the left column and can open the corresponding answer by clicking on one of them, or jump straight to the right column with the returned answers. For a question like "Can my wife drive on my insurance?", the model can infer the context from the phrase "drive on" and suggest results related specifically to car insurance, not to any other kind of insurance policy. Moreover, it understands that terms like wife, husband, spouse, fiancé, and girlfriend are synonyms without any additional information provided.
Here is another example, with the question "Is it required to have home insurance?". The Deep Learning model can not only understand that home insurance and homeowners insurance are the same thing, but it can also infer that constructions like "Is it required?", "Is it legal?", "Is it mandatory?", and even "Can I own a home without?" are similar.
So, as you can see, Deep Learning-powered search can provide much better results than token-based search alone. It can really understand meaning and find good, semantically similar answers for incoming queries.
Another good example is how such an FAQ solution can be integrated into chatbot workflows.
For instance, into Rasa, a quite popular open-source solution that can run on-prem or in the cloud.
Rasa can easily communicate with the Fusion API to get appropriate answers.
Since we have an information retrieval based QA system, we don't need to create tons of custom intents, which usually require a lot of manual work and provided examples.
Instead, we have just one intent in the workflow, which predicts that a question should be answered by the FAQ system.
Since the similarity score for each returned answer is in the range between 0 and 1, it can serve as a confidence score to control the workflow. Using this feature, we developed a simple yet very effective mechanism that lets the QA system ask follow-up questions to get more information from users when needed.
This is done using a metadata collection, which can easily be modified and extended without changing any source code!
There are also situations when there is no right answer in the existing FAQ pool. In that case, the QA system can fall back to a regular site search or ask the user to make a call.
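Putting the confidence score, follow-up, and fallback ideas together, a minimal sketch of the routing logic could look like this; the thresholds and action names are illustrative, not the actual implementation:

```python
def route(faq_results, high=0.8, low=0.4):
    """Pick the next bot action from the top FAQ similarity score (0..1).
    Thresholds are illustrative and would be tuned per dataset."""
    if not faq_results:
        return "fallback_site_search"
    score = faq_results[0]["score"]
    if score >= high:
        return "answer"            # confident: answer directly
    if score >= low:
        return "ask_follow_up"     # ambiguous: ask for more metadata
    return "fallback_site_search"  # no good answer in the FAQ pool
```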
The workflow and next actions can also be adjusted based on user satisfaction, by integrating a sentiment analysis model. So, for example, if the user isn't happy with the provided info, we can suggest giving them a call.
OK, let me show you a short video recording of such a QA chatbot that we built using the FAQ from the United Airlines website.
The user starts interacting with the bot by asking about flight status. This is handled by the regular chatbot workflow.
Then the user asks about a variety of things, like check-in, online payments, and bag restrictions.
Here the QA system sees that there are several similar answers for different ticket types, so it asks the user to provide a ticket type. Once the user says it's Basic, the FAQ returns the most appropriate answer about Basic Economy baggage allowance.
When the user asks about holiday destinations, the system can't find an appropriate answer in the existing FAQ, so it falls back to the United website search.
Once the user gives feedback about the ticket-change policy, the system predicts negative sentiment and suggests contacting customer service.
And that's all! Thank you everyone for attending this talk. Please feel free to ask any questions.