Question Answering and Virtual
Assistants with Deep Learning
Lucidworks AI Lab
Chao Han
VP, Head of Data Science
Sava Kalbachou
AI Research Engineer
Andy Liu
Senior Data Engineer
Agenda
• Overview of current QA solutions
• Why we chose a neural approach
• Challenges in Neural Search implementation
• Fusion FAQ workflow
• Chatbot integration
Current QA solutions
STRENGTH: Comprehensive workflow-building tools with a UI.
WEAKNESS: Tedious template-building process, low coverage, and
general ontologies that do not apply to specific domains.
OUR QA SYSTEM: Information retrieval based. Finds answers in indexed
documents.
FAQ solution
A QA system that directly recommends answers from the FAQ pool or
finds similar questions asked before. Paired with a cold-start
solution when no existing FAQ is available.
BUSINESS USE CASES:
• Call center or IT support ticket records
• Product questions for e-commerce
• Email and Slack conversations
• SharePoint FAQs
• Semantic search for long queries using neural information retrieval
[Chart: F1 (binary) and model stability with varying training data size (100%, 50%, 10%) for Solr, Doc2Vec, XGBoost + w2v, and DL on triplets]
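The chart labels the deep learning model "DL on triplets" but the slides do not spell out the loss. A common way to train on triplets is a margin-based hinge loss over (question, matching answer, non-matching answer) embeddings. The sketch below assumes that formulation, cosine distance, and a margin of 0.2; all three are assumptions, and the vectors are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the positive is closer than the negative by `margin`."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, margin + gap)


# Toy 2-d embeddings (hypothetical values).
question = [1.0, 0.0]
good_answer = [0.9, 0.1]
bad_answer = [0.0, 1.0]

easy = triplet_loss(question, good_answer, bad_answer)  # already well separated
hard = triplet_loss(question, bad_answer, good_answer)  # roles swapped on purpose
```

A well-separated triplet contributes no loss, so training effort concentrates on triplets where a wrong answer still sits too close to the question.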
TF-IDF vs Word2Vec Encoding
Neural Encoding
Inverted Index vs Vector Representation
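To make the contrast in these slide titles concrete, here is a minimal sketch comparing term-based matching with dense vector matching. It uses raw term counts as a stand-in for TF-IDF weights, and a tiny hand-made embedding table in place of a trained Word2Vec model; both are assumptions for illustration only.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bag_of_words(text, vocabulary):
    """Sparse-style encoding: one term count per vocabulary entry."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def average_embedding(text, embeddings):
    """Dense encoding: mean of the embeddings of the known tokens."""
    dim = len(next(iter(embeddings.values())))
    vectors = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 2-d embeddings in which synonyms sit close together.
TOY_EMBEDDINGS = {
    "reset": [0.9, 0.1], "recover": [0.85, 0.2],
    "password": [0.1, 0.9], "passcode": [0.15, 0.85],
}

query, faq = "reset password", "recover passcode"
vocab = sorted(set(query.split()) | set(faq.split()))

sparse_sim = cosine(bag_of_words(query, vocab), bag_of_words(faq, vocab))
dense_sim = cosine(average_embedding(query, TOY_EMBEDDINGS),
                   average_embedding(faq, TOY_EMBEDDINGS))
```

The two texts share no terms, so the term-based similarity is zero, while the dense similarity is close to one because the toy embeddings place the synonyms near each other.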
Challenge
SOLR DOES NOT SUPPORT DENSE VECTOR SEARCH
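For reference, an exhaustive dense vector search can be sketched as scoring every document vector against the query vector and keeping the best k. This is a generic illustration, not the implementation from the talk; the function name and the toy vectors are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dense_top_k(query_vec, doc_vecs, k=10):
    """Score every document and return the k best (doc_id, score) pairs."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy document vectors (hypothetical values).
docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
top = dense_top_k([1.0, 0.1], docs, k=2)
top_ids = [doc_id for doc_id, _ in top]
```

The cost grows linearly with the number of documents, which is why a candidate-selection stage before vector scoring becomes attractive at scale.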
Implementation details
Agreement: average agreement with the top-10 results from the full dense vector search.

Comparison   Configuration              Time     Agreement
Solr         Solr top 10                65 ms    2.27
FAQ Opt. 1   Solr top 500 + reranking   195 ms   5.88
FAQ Opt. 2   1 cluster + reranking      154 ms   7.80
FAQ Opt. 2   2 clusters + reranking     186 ms   8.65
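The "candidates + reranking" idea and the agreement measure can be sketched as follows. The sketch assumes agreement is the number of results shared with the exhaustive top-k list (the reported values between 2.27 and 8.65 suggest a count out of 10, but that reading is an assumption). Function names and vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank(query_vec, candidate_ids, doc_vecs, k=10):
    """Re-score only the candidate documents by vector similarity."""
    scored = [(doc_id, cosine(query_vec, doc_vecs[doc_id])) for doc_id in candidate_ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


def agreement(approx_ids, exact_ids):
    """Number of results shared by the approximate and exact top-k lists."""
    return len(set(approx_ids) & set(exact_ids))


# Toy document vectors (hypothetical values).
doc_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.3], "c": [0.2, 1.0], "d": [0.8, 0.5]}
query = [1.0, 0.2]

exact_top2 = rerank(query, list(doc_vecs), doc_vecs, k=2)    # exhaustive search
approx_top2 = rerank(query, ["a", "c", "d"], doc_vecs, k=2)  # first stage missed "b"
shared = agreement(approx_top2, exact_top2)
```

Reranking can only reorder what the first stage returns, so a document the first stage misses lowers agreement no matter how good the vector model is. That is consistent with the table, where larger or better-targeted candidate sets score higher.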
FAQ Solution Workflow
FAQ input -> DL training module (run on-prem or in the cloud) -> model zip file -> Fusion optimized pipelines
Takes days rather than months to go from model training to implementation.
Training Module in Docker
AUTOPILOT MODE
• Automatic parameter tuning to find the best possible model
• Automatically adjusts default parameters based on data size, resources, etc.
• Suitable for non-data scientists
ADVANCED USER MODE
• Exposes parameters for further tuning
• Suitable for data scientists with DL knowledge
COMPREHENSIVE EVALUATION
• Helps measure improvements
• A variety of metrics such as MAP, MRR, precision, recall, and ROC-AUC
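Two of the ranking metrics named here, MRR and MAP, can be computed as in the sketch below. It uses the standard definitions; the helper names and the toy rankings are hypothetical and not taken from the training module.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@rank taken at each rank where a relevant result appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def mean(values):
    """Arithmetic mean of an iterable, 0.0 when empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Each run pairs a ranked result list with the set of relevant ids.
runs = [
    (["a", "b", "c"], {"a"}),        # relevant answer ranked first
    (["x", "y", "z"], {"y", "z"}),   # relevant answers at ranks 2 and 3
]
mrr = mean(reciprocal_rank(ranked, rel) for ranked, rel in runs)
map_score = mean(average_precision(ranked, rel) for ranked, rel in runs)
```

MRR only rewards the position of the first correct answer, which suits FAQ lookups with a single right answer; MAP also accounts for the positions of the remaining relevant results.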
Fusion meets TensorFlow
OPTIMIZED INDEX PIPELINE
• Pre-indexed answer vectors: no need to recompute vectors for each query
• Vector compression: faster retrieval from Solr
• Clustering as part of the TensorFlow computational graph
• Encoding of multiple fields for further ensembling of results
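The slides do not say how vectors are compressed. One simple possibility, shown here purely as an assumption, is scalar quantization to one byte per dimension followed by base64 encoding, so that a vector can be stored in a single compact string field. The value range of [-1, 1] and the function names are hypothetical.

```python
import base64


def compress(vector, lo=-1.0, hi=1.0):
    """Quantize floats in [lo, hi] to one byte each and return a base64 string."""
    scale = 255.0 / (hi - lo)
    clipped = (min(max(x, lo), hi) for x in vector)
    quantized = bytes(round((x - lo) * scale) for x in clipped)
    return base64.b64encode(quantized).decode("ascii")


def decompress(encoded, lo=-1.0, hi=1.0):
    """Invert `compress`, up to the quantization error."""
    step = (hi - lo) / 255.0
    return [lo + byte * step for byte in base64.b64decode(encoded)]


vector = [0.12, -0.5, 0.99, 0.0]
encoded = compress(vector)
restored = decompress(encoded)
max_error = max(abs(a - b) for a, b in zip(vector, restored))
```

With this scheme each dimension costs one byte instead of four or eight, and the reconstruction error per dimension is bounded by half a quantization step, which is usually small relative to the similarity differences that matter for ranking.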
OPTIMIZED QUERY PIPELINE
• On-the-fly query encoding and clustering via the TensorFlow model
• Faceting by clusters
• Efficient vector similarity computation that supports a variety of distances
• Ensemble of vector similarity and Solr scores
• Suitable for any object-to-object dense vector search
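An ensemble of vector similarity and Solr scores can be sketched as a weighted sum after putting both score types on a common scale. The min-max normalization and the weight of 0.7 are assumptions for illustration; the slides do not state how the scores are combined.

```python
def min_max(scores):
    """Rescale a non-empty {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def ensemble(solr_scores, vector_scores, vector_weight=0.7):
    """Blend normalized Solr and vector scores and return documents best-first."""
    solr_n, vec_n = min_max(solr_scores), min_max(vector_scores)
    combined = {
        doc_id: (1.0 - vector_weight) * solr_n[doc_id] + vector_weight * vec_n[doc_id]
        for doc_id in solr_scores
        if doc_id in vector_scores
    }
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


# Toy scores (hypothetical): Solr prefers "a", the vector model prefers "b".
solr = {"a": 12.0, "b": 8.0, "c": 4.0}
vec = {"a": 0.60, "b": 0.95, "c": 0.20}
ranking = ensemble(solr, vec)
ranked_ids = [doc_id for doc_id, _ in ranking]
```

Normalizing first matters because Solr relevance scores are unbounded while cosine similarities are not, so a raw sum would let one signal dominate.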
[Screenshots: Solr results compared with FAQ results]
Chatbot integration
The Fusion FAQ solution can be easily integrated into an existing chatbot workflow.
[Diagram: Rasa Chatbot, Fusion FAQ API, TensorFlow DL]
Fusion meets Rasa
COMMUNICATE WITH FUSION API
• Get answers from the existing knowledge base
FOLLOW-UP QUESTIONS
• Infers whether additional information is needed and asks for it
• Easily extendable metadata collection
• No source code changes needed
NO NEED FOR TONS OF INTENTS
• Just one intent to make the Fusion FAQ call
FALLBACK SCENARIO
• Takes a fallback action when there is no good answer in the FAQ
SENTIMENT PREDICTION
• Adjusts the workflow based on user satisfaction
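The fallback scenario can be sketched as a simple score threshold on the FAQ results. This is plain Python and does not use the Rasa or Fusion APIs; the result shape (`answer`, `score`), the threshold of 0.6, and the fallback wording are all hypothetical.

```python
DEFAULT_FALLBACK = "Sorry, I could not find an answer. Let me connect you with an agent."


def choose_reply(faq_results, min_score=0.6, fallback_text=DEFAULT_FALLBACK):
    """Return the best FAQ answer, or a fallback reply when nothing scores high enough."""
    if not faq_results:
        return {"action": "fallback", "text": fallback_text}
    best = max(faq_results, key=lambda result: result["score"])
    if best["score"] < min_score:
        return {"action": "fallback", "text": fallback_text}
    return {"action": "answer", "text": best["answer"]}


# Hypothetical FAQ responses with similarity scores.
confident = choose_reply([{"answer": "Use the reset link.", "score": 0.82}])
unsure = choose_reply([{"answer": "Use the reset link.", "score": 0.31}])
empty = choose_reply([])
```

In a chatbot, the returned action would decide whether to send the answer or trigger the fallback behavior, such as asking a follow-up question or handing off to a person.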
THANK YOU
Sava Kalbachou
sava.kalbachou@lucidworks.com
AI Research Engineer at Lucidworks
https://www.linkedin.com/in/sava-kalbachou/
https://github.com/thinline72
https://www.kaggle.com/thinline72
https://twitter.com/thinline72s
https://t.me/thinline72

 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Question Answering and Virtual Assistants with Deep Learning

  • 1.
  • 2. Question Answering and Virtual Assistants with Deep Learning
  • 3. Lucidworks AI Lab Chao Han VP, Head of Data Science Sava Kalbachou AI Research Engineer Andy Liu Senior Data Engineer
  • 4. Agenda • Overview of current QA solutions • Why we choose Neural based approach • Challenges in Neural Search implementation • Fusion FAQ workflow • Chatbot integration
  • 5. Current QA solutions STRENGTH: Comprehensive workflow building tools with UI. WEAKNESS: Tedious template building process, low coverage, general ontologies do not apply to specific domains. OUR QA SYSTEM: information retrieval based. Find answers in indexed documents.
  • 6. FAQ solution QA SYSTEM to directly recommend answers from FAQ pool or find similar questions asked before. Paired with a cold start solution when no existing FAQ available. BUSINESS USE CASES: • Call center or IT support ticket records • Questions about products for E-commerce • Email and Slack conversations • Sharepoint FAQs • Semantic search for long queries using neural information retrieval
  • 7. [Bar chart] Model stability (varying size of training data). Metric: F1 (binary). Models: Solr, Doc2Vec, XGBoost + w2v, DL on triplets. Series: 100%, 50%, 10% of training data. Data labels in extraction order: 0.551 0.634 0.731 0.853; 0.532 0.589 0.726 0.831; 0.513 0.535 0.721 0.735.
  • 10. Inverted Index vs Vectors representation
  • 11. Challenge SOLR DOES NOT SUPPORT DENSE VECTORS SEARCH
  • 12. Implementation details. Agreement on average with the top10 results from the full dense vectors search:
        Comparison                             Time    Agreement
        Solr (Solr top10)                      65ms    2.27
        FAQ Opt. 1 (Solr top500 + Reranking)   195ms   5.88
        FAQ Opt. 2 (1 Cluster + Reranking)     154ms   7.80
        FAQ Opt. 2 (2 Clusters + Reranking)    186ms   8.65
  • 13. FAQ Solution Workflow FAQ Input Run on-prem or cloud DL training module Model zip file Fusion optimized pipelines Takes days rather than months from model training to implementation
  • 14. Training Module in Docker AUTOPILOT MODE • Auto parameters tuning to find the best possible model • Auto adjust default parameters based on data size, resources etc. • Suitable for non Data Scientists ADVANCED USER MODE • Expose parameters for further tuning • Suitable for Data Scientists with DL knowledge COMPREHENSIVE EVALUATION • Helps to measure improvements • Variety of metrics like MAP, MRR, Precision, Recall, ROC-AUC
  • 15. Fusion meets TensorFlow OPTIMIZED INDEX PIPELINE • Pre-indexing answer vectors: no need to re-compute vectors for each query • Vectors compression: faster retrieval from Solr • Clusterization as part of TensorFlow computational graph • Encoding multiple fields for further results ensembling OPTIMIZED QUERY PIPELINE • On-the-fly query encoding and clusterization via TensorFlow model • Faceting by clusters • Efficient vectors similarity computation, supports variety of distances • Ensemble of vectors similarity and Solr scores • Suitable for any object-to-object dense vectors search
  • 16.
  • 17.
  • 20.
  • 21.
  • 22. Chatbot integration Fusion FAQ solution can be easily integrated into the existing chatbot workflow Rasa Chatbot Fusion FAQ API TensorFlow DL
  • 23. Fusion meets Rasa COMMUNICATE WITH FUSION API • To get answer from the existing knowledge base FOLLOW UP QUESTIONS • Infer if additional information is needed and ask for it • Easily extendable metadata collection • No source code change needed NO NEED FOR TONS OF INTENTS • Just one intent to make Fusion FAQ call FALLBACK SCENARIO • Make fallback action in case there is no good answer in FAQ SENTIMENT PREDICTION • Adjust workflow based on users satisfaction
  • 24.
  • 26. Sava Kalbachou sava.kalbachou@lucidworks.com AI Research Engineer at Lucidworks https://www.linkedin.com/in/sava-kalbachou/ https://github.com/thinline72 https://www.kaggle.com/thinline72 https://twitter.com/thinline72s https://t.me/thinline72

Editor's Notes

  1. Hi all and thank you everyone for attending this talk! I’m really excited to be here and present things we have been working on recently at Lucidworks. And today we are going to talk about Question Answering and Virtual Assistants with Deep Learning.
  2. Let me start by introducing our team. My name is Sava Kalbachou, and I’m an AI Research Engineer on the Lucidworks AI Lab team, which is led by Chao Han, our VP of Data Science. Together with Andy Liu, our Senior Data Engineer, we have been working hard on the research and development of the QA solution I’m going to talk about today.
  3. Here is our agenda. First, we’ll start with an overview of current QA solutions, including ours. Then we’ll discuss why we chose a neural search based approach for the question answering task. After that we’ll talk about the challenges we faced in the neural search implementation and how we were able to tackle them. We’ll also discuss our solution workflow. And finally, I’ll show you how our QA solution can be integrated into chatbot applications.
  4. Most QA solutions are chatbots, which are basically comprehensive workflow building tools with a UI. Usually, users have to manually provide examples, specify intents, build ontologies and use rule-based approaches. That’s why the most popular demos for chatbot solutions are for booking hotels or restaurants. Generally, these domains have a limited number of possible questions, so it’s possible to cover them using manual, rule-based methods. However, for more complicated enterprise use cases, coverage is too low using these rule-based methods, leaving far too many questions unanswered. In contrast, the QA system we designed is information retrieval based. This allows us to find accurate answers in indexed documents and fully utilize a company’s existing knowledge bases, without the need for extensive configuration or lots of manual work.
  5. Our FAQ solution can directly recommend answers from an existing FAQ pool, or it can find similar questions that were asked previously. It is also possible to use a cold start solution when no existing FAQ is available. Typical business use cases are as follows. Call centers and support teams: if you have a call center or support tickets, the FAQ solution can power search or a virtual assistant on a help or contact-us page. It will help users find answers by themselves and reduce the load on your call center. The same system can be used to drastically improve the efficiency of your customer support team, since they will be able to easily find solutions to already solved problems. In the e-commerce domain, it can be applied to answer questions about a particular product. QA pairs can be extracted from Slack and email conversations to achieve fast knowledge sharing. If you don’t have an FAQ and just want to improve search for long queries, the cold start solution is a good fit. It uses word embeddings to capture semantic and contextual information for long queries or natural language questions. It can also be combined with Solr scoring to provide even better results.
  6. But why did we choose a neural approach for our solution? During our research phase we conducted a comprehensive study comparing different methods. Starting from unsupervised models like Doc2Vec and classical machine learning models like boosted trees, we moved to more advanced, cutting-edge deep learning approaches. On the screen you can see the results of one of our experiments, which also shows model stability depending on the size of the training data. Although the results of the deep learning model drop when it’s trained on 50% or 10% of the training data, it is still better than XGBoost trained on the whole training dataset. This is due to recent achievements in transfer learning, which allow us to leverage knowledge from already pre-trained models or embeddings. Moreover, deep learning models don’t require any heavy feature engineering. We tried to incorporate features like part-of-speech and named entity recognition tags into the deep learning model, and it didn’t give any meaningful boost in results. We believe this is because deep learning models can extract and learn such information by themselves.
  7. But how does it work under the hood, and what is the difference between classical and neural search engines? Well, classical search engines use TF-IDF-like formulas to compute similarity scores between queries and documents, like BM25 in Solr. But these approaches are purely based on word matching. They cannot easily incorporate synonyms or semantic knowledge. In contrast, about 5-6 years ago a new approach appeared called word2vec. It maps words into a vector space in such a way that conceptually similar words have nearby vectors.
  8. In our case we are going even further. We are encoding whole sentences, questions and answers into a dense vector representation. That allows us to automatically handle not only synonyms, but even semantically similar phrases. The model that we have been using is a Siamese deep neural network trained in a supervised way, so it can learn to map questions and relevant answers as close as possible to each other, and questions and irrelevant answers as far as possible from each other, in the same vector space.
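The training objective described in the note above can be illustrated with a triplet margin loss. This is only a minimal sketch: the slides mention "DL on triplets" but do not state the exact loss, so the cosine distance, the `margin` value, and the toy vectors here are assumptions for illustration, not the Lucidworks implementation.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two 1-D vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(question, relevant, irrelevant, margin=0.2):
    """Hinge-style triplet loss (assumed form, not taken from the talk).

    The loss is zero once the relevant answer is closer to the question
    than the irrelevant answer by at least `margin`.
    """
    d_pos = cosine_distance(question, relevant)
    d_neg = cosine_distance(question, irrelevant)
    return max(0.0, margin + d_pos - d_neg)

# Toy 2-D "encodings" (hypothetical values).
q = np.array([1.0, 0.0])
good = np.array([0.9, 0.1])
bad = np.array([0.0, 1.0])

print(triplet_loss(q, good, bad))  # relevant answer already much closer
print(triplet_loss(q, bad, good))  # roles swapped: the constraint is violated
```

During training, the encoder weights would be updated to push this loss toward zero over many (question, relevant answer, irrelevant answer) triplets.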
  9. But you may ask how it works in the real world and how scalable it is. Classical search engines like Solr use an inverted index to store and find relevant documents. But in our case we need to search in an abstract high-dimensional vector space, where semantically similar texts are located near each other and even form groups. So we need to work with dense vectors.
  10. And here comes the challenge: Solr does not support dense vector search. So we had to find an efficient and scalable way to use trained deep learning encoders in the wild.
  11. We have addressed that challenge by implementing optimized query pipelines in Fusion, which enable fast runtime vector similarity search. This table shows query performance versus agreement with a full dense vector search. Although Solr is fast, it doesn’t yield similar results for natural language queries. And we already saw that deep learning models drastically outperform classical search in terms of the accuracy of the results. So to support dense vector search at runtime, we implemented two pipeline options. The first option is to use Solr to retrieve the top 500 candidates and use the deep learning model to rerank them. The second option yields faster and quite often more accurate results. We integrated an embedded clustering layer into the deep learning model, so at query time we can get the clusters closest to the current query and then search and rerank answers only within those clusters. By implementing these two options we were able to find a good trade-off between query time performance and accuracy of the results.
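The two pipeline options in the note above can be sketched with NumPy. This is a simplified illustration under stated assumptions: `rerank` and `cluster_candidates` are hypothetical helper names, cosine similarity is assumed as the ranking measure, and the candidate list for option 1 stands in for whatever Solr would return. It does not reproduce the Fusion pipeline stages.

```python
import numpy as np

def normalize(m):
    """Scale vectors (rows, or a single 1-D vector) to unit length."""
    return m / np.linalg.norm(m, axis=-1, keepdims=True)

def rerank(query_vec, answer_vecs, candidate_ids, top_k=10):
    """Order candidate ids by cosine similarity to the query, best first."""
    candidate_ids = np.asarray(candidate_ids)
    sims = normalize(answer_vecs[candidate_ids]) @ normalize(query_vec)
    order = np.argsort(-sims)[:top_k]
    return candidate_ids[order].tolist()

def cluster_candidates(query_vec, centroids, assignments, n_clusters=1):
    """Ids of answers whose cluster is among the n closest to the query."""
    sims = normalize(centroids) @ normalize(query_vec)
    nearest = np.argsort(-sims)[:n_clusters]
    return np.flatnonzero(np.isin(assignments, nearest))

# Toy index: four pre-computed answer vectors in two clusters (hypothetical).
answer_vecs = np.array([[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]])
centroids = np.array([[1.0, 0.1], [0.05, 1.0]])
assignments = np.array([0, 0, 1, 1])
query = np.array([1.0, 0.05])

# Option 1: a keyword search supplies candidates, the encoder reranks them.
keyword_hits = [2, 1, 3]  # stand-in for the ids Solr would return
option1 = rerank(query, answer_vecs, keyword_hits)

# Option 2: restrict the search to the closest cluster, then rerank.
option2 = rerank(query, answer_vecs,
                 cluster_candidates(query, centroids, assignments))

print(option1, option2)
```

Note that option 1 can only surface answers the keyword search already found, whereas option 2 can surface any answer in the selected clusters, which is one plausible reading of why it agrees more often with the full dense search in the table.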
  12. Now let’s take a look at our solution workflow. It mainly consists of two parts: model training, and query time model inference in Fusion. Model training is performed in a Docker container, which has deep learning and word vector training modules and a configuration UI. So it can easily be run on-prem or in the cloud. After training is finished, a zip file which includes the model and associated files is generated. The inference part is performed in Fusion. The model is uploaded to the Fusion BlobStore. Optimized index and query pipelines are used to conduct run-time neural search. It usually takes only days rather than months to go from training to implementation.
  13. Here is what our training module in Docker looks like in detail. It has an autopilot mode for non data scientists. This mode provides automatic parameter tuning and model selection, so that non data scientists can run the module and get good, stable results with minimum work. We also have an advanced user mode, which exposes parameters for tuning by data scientists. Comprehensive evaluation metrics are computed at each modelling step for easy model comparison. It also supports both GPU and CPU. If you have a GPU, that’s great, you’ll be able to run training experiments very fast. If you only have a CPU, that’s also fine. Training on a typical dataset requires several hours, so it can be run during lunch or overnight.
  14. After training is done, the model is frozen and converted to a low-level TensorFlow computational graph, which is uploaded to Fusion and can be run even from Java. The same model is used in both the index and query pipelines. During indexing, the model encodes texts into vectors, then clusters and compresses them. Compressed vectors help to achieve much faster retrieval from Solr. It’s also possible to encode as many fields as you want to use in the final ensembling. The query pipeline is quite similar. Queries are encoded and clustered on the fly. Query clusters can be used to search only within those clusters, which reduces query time. After answer candidates are retrieved from Solr, they are decompressed and vector similarity is computed. Finally, different similarity scores, like question-answer or question-question similarity, and the Solr score can be ensembled to get the best possible results. Generally, this approach and these pipeline stages are quite abstract and can be used for any kind of object-to-object dense vector search. So it might be text, audio, images or structured objects, as long as we can find a way to encode them in the same vector space.
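The note above mentions compressing vectors at index time and decompressing them at query time, but the talk does not say which compression scheme is used. As one possible illustration only, here is a sketch of simple 8-bit scalar quantization; the scheme and the `compress` and `decompress` names are assumptions, not the method used in Fusion.

```python
import numpy as np

def compress(vec):
    """Quantize a float vector to int8 codes plus one float scale.

    Assumed scheme for illustration: symmetric scalar quantization.
    """
    scale = float(np.max(np.abs(vec))) / 127.0 or 1.0  # avoid a zero scale
    codes = np.round(vec / scale).astype(np.int8)
    return codes, scale

def decompress(codes, scale):
    """Recover an approximate float32 vector from int8 codes."""
    return codes.astype(np.float32) * scale

vec = np.array([0.3, -1.0, 0.7, 0.0], dtype=np.float32)
codes, scale = compress(vec)
restored = decompress(codes, scale)

print(codes.nbytes, "bytes instead of", vec.nbytes)
print(float(np.max(np.abs(restored - vec))))  # small reconstruction error
```

The design trade-off is the usual one: a stored vector becomes four times smaller than float32, at the cost of a bounded rounding error in each component, which typically has little effect on similarity ranking.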
  15. Here is how it looks in the Fusion Index Workbench. The model is basically the encoder that encodes text into the dense vector representation. So on the right side of the slide you can see simulated results that contain the document vector, its compressed version and its clusters.
  16. There are several additional stages in the query pipeline, like those for computing vector distances and sorting results. But I’d like to emphasize the Compute mathematical expression stage, which allows us to ensemble different scores using a variety of mathematical expressions. So, for example, we can combine the Solr score with vector distances, which gives us more control and leads to better results, especially in the cold start mode.
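As an illustration of the kind of expression such a stage might evaluate, here is a minimal sketch that blends a keyword score with a vector similarity. The min-max normalization, the `weight` parameter, and the example scores are assumptions chosen for the example; the talk does not specify the actual expressions used.

```python
def ensemble(solr_scores, vector_sims, weight=0.7):
    """Weighted blend of vector similarity and min-max normalized Solr score.

    Solr scores are unbounded, so they are rescaled to [0, 1] within the
    result list before mixing with similarities that are already in [0, 1].
    """
    lo, hi = min(solr_scores), max(solr_scores)
    span = (hi - lo) or 1.0  # all-equal scores: avoid division by zero
    return [
        weight * sim + (1.0 - weight) * (score - lo) / span
        for score, sim in zip(solr_scores, vector_sims)
    ]

# Hypothetical scores for three candidate answers.
solr_scores = [12.0, 8.0, 4.0]
vector_sims = [0.2, 0.9, 0.6]

combined = ensemble(solr_scores, vector_sims)
best = max(range(len(combined)), key=combined.__getitem__)
print(combined, best)
```

Here the second candidate wins even though Solr ranked it second, because its vector similarity is much higher. Shifting `weight` toward 0 moves the ranking back toward plain keyword search.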
  17. Now let’s take a look at several examples and compare results from vanilla Solr and from our FAQ solution. These are results for a finance FAQ, which is about investment, mortgages and credit. Here you can see the difference for the question “How to compute gold value”. Solr fails to put the best answers at the top because it basically counts exact token matches to rank results. The FAQ solution, on the other hand, is capable of summarising text content in the dense vector representation, so it can find semantically similar answers and previously asked questions, like the question “How do you measure the value of gold?”
  18. Here is another example, now in the e-commerce domain. The question “Any screen protection ?” is asked about one of the phone case products. As Solr searches for exact tokens by default, it cannot match two similar words like “protection” and “protector” without a synonyms list or stemming, which might affect results for other queries. Also, the previously asked question contains a spelling mistake in the word “screeen” (an additional e). Despite the spelling mistake, the FAQ solution is able to understand the context and puts the appropriate QA pair at the top of the results.
  19. Now let’s move to the insurance domain and check what kind of UI might be built on top of such an FAQ solution. Once a question is asked, users can see similar previously asked questions in the left column and open the corresponding answer by clicking on one of them, or jump straight to the right column with the returned answers. In this case, with a question like “Can my wife drive on my insurance?”, the model can infer the context of the question from the phrase “drive on” and suggest results related specifically to car insurance, not to any other kind of insurance policy. Moreover, it can understand that terms like wife, husband, spouse, fiance or girlfriend are synonyms without any additional information provided.
  20. Here is another example, with the question “Is it required to have home insurance?” The deep learning model can not only understand that home insurance and homeowners insurance are the same thing, but it can also infer that constructions like “Is it required?”, “Is it legal?”, “Is it mandatory?” and even “Can I own a home without?” are similar. So, as you can see, deep learning powered search is capable of providing much better results than purely token based search. It can capture the meaning and find good, semantically similar answers for incoming queries.
  21. Another good example is to show how such an FAQ solution might be integrated into chatbot workflows, for instance into Rasa, which is a quite popular open-source solution that can be run on-prem or in the cloud.
  22. Rasa can easily communicate with the Fusion API to get appropriate answers. As we have an information retrieval based QA system, we don’t need to create tons of custom intents, which usually requires a lot of manual work and many provided examples. Instead, we just have one intent in the workflow, which predicts that a question should be answered by the FAQ system. As the similarity score for each returned answer is in the range between 0 and 1, it can serve as a confidence score to control the workflow. Using this feature, we developed a simple yet very effective mechanism that allows the QA system to ask follow-up questions to get more information from users when it’s needed. It’s done using a metadata collection, which can be easily modified and extended without any need to change source code! There are also situations when there are no right answers in the existing FAQ pool. Then the QA system can make a fallback query to a regular site search or ask the user to make a call. The workflow and next actions can also be adjusted based on user satisfaction. That’s done by integrating a sentiment analysis model. So, for example, if the user isn’t happy with the provided info, we can offer to give the user a call.
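The confidence-driven control flow described in the note above can be sketched as a small routing function. This is a simplified illustration with assumed names and thresholds (`next_action`, `min_confidence`, `ambiguity_gap`). It does not use the Rasa or Fusion APIs, and it replaces the metadata-collection mechanism from the talk with a plain score comparison.

```python
def next_action(results, min_confidence=0.5, ambiguity_gap=0.05):
    """Pick the chatbot's next step from FAQ results.

    `results` is a list of (answer, score) pairs sorted best first,
    with scores in [0, 1] acting as confidence values.
    """
    # No answer is confident enough: fall back (e.g. to site search).
    if not results or results[0][1] < min_confidence:
        return ("fallback", None)

    best_answer, best_score = results[0]
    # Several answers score almost the same: ask the user to disambiguate.
    close = [a for a, s in results if best_score - s <= ambiguity_gap]
    if len(close) > 1:
        return ("ask_follow_up", close)

    return ("answer", best_answer)

# Hypothetical FAQ results.
print(next_action([("Basic Economy baggage", 0.90), ("Economy baggage", 0.88)]))
print(next_action([("Check-in opens 24 hours before departure", 0.90),
                   ("Online payments", 0.60)]))
print(next_action([("Unrelated answer", 0.30)]))
```

In the demo described later in the talk, the follow-up branch corresponds to the bot asking for a ticket type, and the fallback branch corresponds to the call to the regular website search.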
  23. OK, let me show you a short video recording of a QA chatbot that we built using the FAQ from the United Airlines website. The user starts interacting with the bot, asking about flight status. That part is controlled by the regular chatbot workflow. Then the user asks about a variety of things like check-in, online payments and baggage restrictions. And here the QA system sees that there are several similar answers for different ticket types, so it asks the user to provide a ticket type. Once the user says that it’s Basic, the FAQ returns the most appropriate answer about the Basic Economy baggage allowance. When the user asks about holiday destinations, the system isn’t able to find an appropriate answer in the existing FAQ, so it makes a fallback call to the United website search. Once the user gives feedback regarding the ticket change policy, the system is able to predict negative sentiment and asks the user to contact customer service.
  24. And that’s all! Thank you everyone for attending this talk. Please, feel free to ask any questions.