Implementing Chatbots
using Deep Learning.
By : Rohan Chikorde
Introduction
What is a CHATBOT?
 A chatbot (chat robot) is a computer program that simulates human
conversation, or chat, through artificial intelligence.
 It is a service, powered by rules and artificial intelligence,
that you interact with via a chat interface.
 The service could be any number of things, ranging from
functional to fun, and it could live in any major chat product
(Facebook Messenger, Slack, Telegram, Text Messages, etc).
List of best AI Chatbots:
 Mitsuku (Loebner Prize winner, 2013; the Loebner Prize is an AI prize for chatbots)
 Jabberwacky
 PersonalityForge
 Botser
 Cleverbot
* http://www.techstext.com/list-of-best-chatbots-to-converse/
Types of Chatbot
 RETRIEVAL-BASED MODELS -
o Uses a repository of predefined responses and some kind of heuristic to
pick an appropriate response based on the input and context.
o The heuristic could be as simple as a rule-based expression match, or as
complex as an ensemble of Machine Learning classifiers.
 GENERATIVE MODELS -
o Generates new responses from scratch instead of picking from a repository
of predefined responses, so you don’t have to be ridiculously specific when
you are talking to it. It models language, not just commands.
o Can get better as it is trained on more conversations with people.
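The simplest retrieval-based heuristic mentioned above, a rule-based expression match, can be sketched as follows. The rules, the responses, and the name `retrieve_response` are hypothetical and only illustrate the idea of picking from a fixed repository.

```python
import re

# Hypothetical repository of predefined responses, each keyed by a regex rule.
RESPONSES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE), "Hello! How can I help you?"),
    (re.compile(r"\bcheck[- ]?out\b", re.IGNORECASE), "Check-out is at 11 am."),
    (re.compile(r"\b(wifi|wi-fi|internet)\b", re.IGNORECASE), "Wi-Fi is free in all rooms."),
]
FALLBACK = "Sorry, I did not understand that."


def retrieve_response(user_input: str) -> str:
    """Return the first predefined response whose rule matches the input."""
    for pattern, response in RESPONSES:
        if pattern.search(user_input):
            return response
    return FALLBACK
```

A bot like this never produces a sentence that is not already in its repository, which is what separates it from a generative model.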
Open Domain vs. Closed Domain
 In an open domain setting, the user can take the
conversation anywhere. There isn’t necessarily a well-defined
goal or intention.
Ex: Conversation about refinancing one’s mortgage
 In a closed domain setting, the space of possible inputs and
outputs is somewhat limited because the system is trying to
achieve a very specific goal.
Ex : Hotel’s Customer Support or Shopping Assistants
Long vs Short Conversations
 The longer the conversation, the more difficult it is to automate, because the system needs to
keep track of what has been said.
Ex: Customer support conversations.
 Short-text conversations are easier: the goal is to create a single response to a single input.
Ex: What is your name?
Implementing a Retrieval-Based Model in TensorFlow
Architecture of AI Chatbot
Retrieval Based Model
 The vast majority of production systems today are retrieval-based, or a combination of
retrieval-based and generative models.
 Generative models are an active area of research, but we’re not quite there yet.
 For building a hotel’s customer support bot, the best bet right now is most likely a
retrieval-based model.
The Ubuntu Dialog Corpus
 The Ubuntu Dialog Corpus (UDC) is one of the largest public dialog datasets available.
 It’s based on chat logs from the Ubuntu channels on a public IRC network.
 The training data consists of 1,000,000 examples, 50% positive (label 1) and 50% negative
(label 0).
 Each example consists of a context, the conversation up to this point, and an utterance, a
response to the context.
Data Pre-processing
 The dataset originally comes in CSV format. We could work directly with CSVs, but it’s better
to convert our data into TensorFlow’s own Example format, stored in TFRecord files.
 The main benefit of this format is that it allows us to load tensors directly from the input files
and let TensorFlow handle all the shuffling, batching and queuing of inputs. As part of the
preprocessing, we also create a vocabulary.
 This means we map each word to an integer number, e.g. “cat” may become 2631. The
TFRecord files we generate store these integer numbers instead of the word strings. It’s
better to save the vocabulary so that we can map back from integers to words later on.
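The vocabulary step of the pre-processing can be sketched in plain Python. This is a minimal illustration, not the pipeline used for the Ubuntu corpus: the whitespace tokenizer, the `<unk>` entry, and the helper names `build_vocabulary`, `encode` and `decode` are assumptions.

```python
import json
from collections import Counter


def build_vocabulary(texts, min_frequency=1):
    """Map each word to an integer id; id 0 is reserved for unknown words."""
    counts = Counter(word for text in texts for word in text.lower().split())
    vocab = {"<unk>": 0}
    # Sort by descending count, then alphabetically, so ids are deterministic.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_frequency:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Replace each word by its id, falling back to the <unk> id."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


def decode(ids, vocab):
    """Map ids back to words using the inverted vocabulary."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


texts = ["how do i install ubuntu", "install the package with apt"]
vocab = build_vocabulary(texts)
ids = encode("install ubuntu with apt", vocab)

# Serializing the vocabulary lets us map integers back to words later on.
vocab_json = json.dumps(vocab)
restored = json.loads(vocab_json)
```

The integer lists produced by `encode` are what would be written into the record files in place of the word strings.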
Deep Learning Model
 One Deep Learning model for building a chatbot is the Dual Encoder LSTM network.
 There are many Deep Learning architectures; it’s an active research area.
 A seq2seq model, often used in Machine Translation, would probably also do well on this task.
Implementation…
 tf-idf predictor
o tf-idf stands for “term frequency - inverse document frequency”, and it measures how important a
word in a document is relative to the whole corpus.
o Documents that have similar content will have similar tf-idf vectors.
o Intuitively, if a context and a response have similar words, they are more likely to be a correct pair.
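A tf-idf predictor of this kind can be sketched without any library. The smoothed idf formula below is one common variant, and the function names and whitespace tokenizer are assumptions; the slides do not specify which weighting was used.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {word: tf-idf weight} dict per whitespace-tokenized document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed idf, so a word found in every document keeps a small weight.
            word: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_response(context, candidates):
    """Score every candidate against the context and return the best one."""
    vectors = tfidf_vectors([context] + candidates)
    scores = [cosine(vectors[0], vec) for vec in vectors[1:]]
    return candidates[scores.index(max(scores))], scores
```

For example, `best_response("how do i install a package with apt", ["use apt to install the package", "the weather is nice today"])` prefers the first candidate, because it shares words with the context and the second does not.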
Dual Encoder LSTM Model
Working of Dual Encoder LSTM
 Both the context and the response text are split by words, and each word is embedded into a
vector. The word embeddings are initialized with Stanford’s GloVe vectors and are fine-tuned during
training.
 Both the embedded context and response are fed into the same Recurrent Neural Network
word-by-word. The RNN generates a vector representation that, loosely speaking, captures the “meaning”
of the context and response (c and r).
 It then multiplies c with a matrix M to “predict” a response r’. The matrix M is learned during
training.
 It measures the similarity of the predicted response r’ and the actual response r by taking the dot
product of these two vectors. A large dot product means the vectors are similar and that the
response should receive a high score.
 Then it applies a sigmoid function to convert that score into a probability.
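The scoring step can be written out as a small sketch. It assumes the RNN has already produced the encodings c and r, so only the prediction r’ = cᵀM, the dot product with r, and the sigmoid are shown. The function name `dual_encoder_probability` and the plain-list representation are illustrative choices.

```python
import math


def sigmoid(x):
    """Squash a real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dual_encoder_probability(c, M, r):
    """Compute sigmoid(c^T M r) for an encoded context c and response r.

    c is a list of length d_c, r a list of length d_r, and M a d_c x d_r
    matrix given as a list of rows. In the real model M is learned.
    """
    # Predicted response r' = c^T M.
    predicted = [sum(c[i] * M[i][j] for i in range(len(c))) for j in range(len(r))]
    # Similarity of the predicted and the actual response.
    score = sum(p * actual for p, actual in zip(predicted, r))
    return sigmoid(score)
```

With M set to the identity matrix the score reduces to the plain dot product of c and r, so aligned vectors give a probability above 0.5 and opposed vectors one below 0.5.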
Creating an Input Function
 In order to use TensorFlow’s built-in support for training and evaluation, we need to create an
input function: a function that returns batches of our input data.
 In fact, because our training and test data have different formats, we need different input
functions for them. The input function should return a batch of features and labels.
Steps:
 On a high level, the function does the following:
o Create a feature definition that describes the fields in our Example file
o Read records from the input_files with tf.TFRecordReader
o Parse the records according to the feature definition
o Extract the training labels
o Batch multiple examples and training labels
o Return the batched examples and training labels
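The steps above can be mirrored in a framework-free sketch. It does not use TensorFlow or `tf.TFRecordReader`: in-memory dictionaries stand in for the records in the Example file, and `FEATURE_SPEC`, `parse_record` and `input_fn` are hypothetical names.

```python
import random

# Hypothetical feature definition: field name -> expected Python type.
FEATURE_SPEC = {"context": list, "utterance": list, "label": int}


def parse_record(record, feature_spec=FEATURE_SPEC):
    """Check a raw record against the feature definition and return its fields."""
    parsed = {}
    for name, expected_type in feature_spec.items():
        value = record[name]
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
        parsed[name] = value
    return parsed


def input_fn(records, batch_size, shuffle=False, seed=0):
    """Yield (features, labels) batches from an iterable of raw records."""
    parsed = [parse_record(r) for r in records]
    if shuffle:
        random.Random(seed).shuffle(parsed)
    for start in range(0, len(parsed), batch_size):
        batch = parsed[start:start + batch_size]
        # Separate the training labels from the input features.
        features = {
            "context": [ex["context"] for ex in batch],
            "utterance": [ex["utterance"] for ex in batch],
        }
        labels = [ex["label"] for ex in batch]
        yield features, labels
```

The last batch may be smaller than `batch_size` when the number of records is not a multiple of it.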
Creating the Model
 As we have different formats of training and evaluation data, we have to create a function
wrapper that takes care of bringing the data into the right format.
 It takes a model argument, which is a function that actually makes predictions.
 In our case it’s the Dual Encoder LSTM, but we could easily swap it out for some other neural
network.
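The wrapper idea can be sketched as a higher-order function. This sketch assumes that a training example holds one context and utterance pair, and that an evaluation example holds the true utterance plus a list of distractors. The names `create_model_fn`, `model_impl` and `word_overlap_model` are illustrative, and the overlap model is only a stand-in for the Dual Encoder LSTM.

```python
def create_model_fn(model_impl):
    """Wrap model_impl so training and evaluation data share one entry point."""

    def model_fn(features, mode):
        context = features["context"]
        if mode == "train":
            # Assumed training format: one (context, utterance) pair.
            return model_impl(context, features["utterance"])
        if mode == "eval":
            # Assumed evaluation format: true utterance plus distractors.
            candidates = [features["utterance"]] + list(features["distractors"])
            return [model_impl(context, candidate) for candidate in candidates]
        raise ValueError(f"unknown mode: {mode}")

    return model_fn


def word_overlap_model(context, utterance):
    """Stand-in model: fraction of utterance words that occur in the context."""
    context_words = set(context.split())
    words = utterance.split()
    return sum(w in context_words for w in words) / len(words) if words else 0.0
```

Because `model_fn` only calls `model_impl(context, utterance)`, any other scoring function with the same signature can be swapped in without touching the wrapper.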
Evaluating the model & making Predictions
 After training the model we can evaluate it on the test set.
 This will run the evaluation metrics on the test set instead of the validation set.
 We will get probability scores for unseen data.
 We could imagine feeding in 100 potential responses to a context and then picking the one
with the highest score.
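Picking the highest-scoring response, and measuring how often the true response ends up near the top, can be sketched as below. The slides do not name the evaluation metric. Recall@k is used here as an assumption, following the Ubuntu Dialogue Corpus paper listed in the references, and index 0 of each score list is assumed to be the true response.

```python
def pick_best(candidates, scores):
    """Return the candidate with the highest score."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return candidates[best_index]


def recall_at_k(score_lists, k):
    """Fraction of examples whose true response ranks in the top k.

    Each inner list holds the model scores for one context; index 0 is
    assumed to be the true response and the rest are distractors.
    """
    hits = 0
    for scores in score_lists:
        # Candidate indices ordered from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if 0 in ranked[:k]:
            hits += 1
    return hits / len(score_lists)
```

With k equal to the number of candidates the metric is always 1.0, so only small values of k say anything about the model.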
References
 The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn
Dialogue Systems
o https://arxiv.org/abs/1506.08909
 Artificial intelligence markup language (aiml).
o http://alice.sunlitsurf.com/alice/aiml.html.
 Intelligent Chat Bot for Banking System
o http://www.ijettcs.org/Volume4Issue5(2)/IJETTCS-2015-10-09-16.pdf
 WILDML, Deep Learning for Chatbot
o http://www.wildml.com/2016/07/deep-learning-for-chatbots-2-retrieval-based-model-tensorflow/
Thank You
