How to train a transactional chatbot using reinforcement
learning?
leewayhertz.com/train-transactional-chatbot-using-reinforcement-learning
In an age where artificial intelligence is reshaping our world, chatbots have emerged as a
valuable tool for businesses. With a staggering 80% of businesses projected to integrate
chatbots in their operations by 2024, the focus is now shifting towards transactional chatbots,
also known as Goal-oriented (GO) chatbots. Unlike typical chatbots, transactional chatbots
are laser-focused on solving specific user problems. Need to book a ticket? There is a
chatbot for that. Looking to make a reservation? A transactional chatbot is on it. These
transactional chatbots are not just sophisticated, they are becoming smarter and more
efficient by the day.
But how are these transactional chatbots trained to be so proficient? The answer lies in two
major learning techniques: supervised learning and reinforcement learning. Supervised
learning uses an encoder-decoder approach to map user dialogue to responses directly. In
contrast, reinforcement learning takes a more hands-on approach, training chatbots using
trial-and-error conversations with a rule-based user simulator or real users.
Among these, transactional chatbots using reinforcement learning have recently surfaced as
an exciting field teeming with potential applications. One stellar example of this rapidly
growing field is the TC-Bot developed by MiuLab. The TC-Bot showcases how a user can be
simulated using basic rules, significantly expediting the training process compared to using
real people.
With more advanced chatbot training methods being developed, it’s safe to say we are on
the cusp of a new era where transactional chatbots will become ubiquitous, changing the
way we interact with technology. In this article, we will dive deep into the world of
transactional chatbots, explore the process of their training, their use cases and other vital
aspects.
What is a transactional chatbot?
Transactional chatbots vs. Traditional chatbots
Key components of transactional chatbot
Benefits of transactional chatbots
How does a transactional chatbot operate?
Understanding the dialogue system
The role of the user simulator and error model controller
An overview of Deep-Q-Network
Training a transactional chatbot using Deep-Q-Network
The scenario
Prerequisites
Understanding the data (movie tickets) for the chatbot
Understanding the anatomy of an action
Preparing the state
Dialogue configuration for the agent
Building neural network model
Implementing policy
Training an agent
Use cases of transactional chatbots
What is a transactional chatbot?
A transactional chatbot, also known as a task-oriented or goal-oriented chatbot, is a
specialized form of artificial intelligence software designed with a clear purpose – to help
users achieve a specific goal or complete a specific task. This could range from booking a
flight, scheduling a doctor’s appointment, or placing an order for a pizza.
Unlike their counterparts (general conversation or social chatbots), which focus on simulating
human-like interaction and carrying out broad, non-specific conversations, transactional
chatbots have a clear focus. Their role is not to engage in small talk or provide entertainment
but to aid users in accomplishing a particular task as quickly and efficiently as possible.
Transactional chatbots operate by recognizing and understanding the user’s intent and then
taking appropriate actions to fulfill the user’s request. To do this, they employ sophisticated
Natural Language Understanding (NLU) capabilities and machine learning algorithms to
interpret the user’s inputs, map them to the correct action, and generate a suitable response.
The importance of these goal-oriented chatbots in today’s digital ecosystem cannot be
overstated. In a world that is increasingly driven by speed, efficiency, and convenience,
transactional chatbots serve as a pivotal touchpoint between businesses and customers.
They provide instant, 24/7 support, helping to improve customer service and engagement,
streamline business processes, and reduce operational costs. Moreover, they provide a
personalized user experience, understand and remember customer preferences, and deliver
tailor-made solutions, enhancing customer satisfaction and loyalty.
Furthermore, in times of social distancing and remote operations, transactional chatbots
have become invaluable tools for businesses to maintain constant, uninterrupted customer
support. By handling routine tasks and queries, they allow human staff to focus on more
complex and critical issues, thus enhancing the overall efficiency of the business.
In sum, transactional chatbots are more than just fancy technology; they are powerful tools
that are reshaping the way businesses operate and interact with their customers, making
them indispensable in the modern digital landscape.
Transactional chatbots vs. traditional chatbots
Comparison criteria: Purpose
Transactional chatbot: Primarily designed to handle transactions and support complex tasks. They can assist in making reservations, completing purchases, and providing personalized recommendations.
Traditional chatbot: Typically designed for simple tasks such as answering basic FAQs or guiding users to the appropriate resources.

Comparison criteria: Complexity of interaction
Transactional chatbot: Capable of understanding and responding to more complex customer queries. These chatbots can process multiple layers of communication and follow the flow of conversation.
Traditional chatbot: Generally capable of managing simple, linear conversations and might struggle with complex interactions.

Comparison criteria: Use of AI
Transactional chatbot: Uses advanced AI and machine learning to provide personalized responses, understand user intent, and remember previous interactions.
Traditional chatbot: Primarily uses rule-based responses and may or may not leverage AI. Its capabilities are often limited to predefined responses.

Comparison criteria: Data analysis
Transactional chatbot: Continually learns from user interactions, enabling it to make more accurate predictions and provide personalized services.
Traditional chatbot: Data analysis is typically minimal or non-existent, with less emphasis on learning from user interactions.

Comparison criteria: User experience
Transactional chatbot: Enhances user experience by offering personalized responses and handling complex requests.
Traditional chatbot: Provides a satisfactory user experience for straightforward inquiries but may not handle complex requests as effectively.

Comparison criteria: Integration with other systems
Transactional chatbot: Often integrated with other systems (CRM, ERP) to access customer data, process transactions, etc.
Traditional chatbot: Usually standalone, with minimal integration with other systems.

Comparison criteria: Cost and implementation time
Transactional chatbot: Might require a higher initial investment and longer implementation time due to their complex nature.
Traditional chatbot: Generally cheaper and quicker to implement as they’re less complex.

Comparison criteria: Scalability
Transactional chatbot: High scalability due to its ability to learn and adapt from interactions. Can handle an increasing number of complex queries effectively.
Traditional chatbot: Limited scalability. As queries become more complex, these chatbots might struggle to maintain efficiency.
Key components of transactional chatbots
Goal-oriented chatbots or transactional chatbots, also known as task-oriented chatbots, have
several key components that enable them to interact with users effectively and accomplish
specific tasks. Here are some of the main elements:
Natural Language Understanding (NLU) unit: This is the component of the chatbot
that interprets and understands the user’s input. It transforms human language into a
machine-readable format. NLU employs tokenization, stemming, part-of-speech
tagging, and entity extraction to understand the user’s message’s context, intent, and
entities.
Dialogue Manager (DM): The DM is the central control unit of the chatbot. It maintains
the context and state of the conversation, decides the next action based on the current
state and user’s input, and generates the appropriate system response.
State Tracker (ST): Sometimes considered a part of the Dialogue Manager, the state
tracker keeps track of the current state of the conversation, including the user’s goals,
requests, and the information that the chatbot has provided.
Policy learner: This component uses reinforcement learning algorithms to determine
the best responses based on the state of the conversation. It “learns” from its past
actions and their outcomes to optimize the chatbot’s responses.
Natural Language Generator (NLG) unit: The NLG takes the system response
generated by the dialogue manager and translates it into natural, human-like language.
This can either be a simple template-based system or a more complex machine
learning model.
User simulator: In training a transactional chatbot, a user simulator is used. It’s a model
that generates simulated user behavior, which can be used for training the chatbot in a
controlled environment.
Database (DB): Chatbots that provide information or perform transactions often need
to interact with a database. This could be checking ticket availability, booking
appointments, providing product details, etc. The DB is an integral part of these chatbot
systems.
Error model controller: This component is often used during training to add some
noise to the user simulator’s responses, making the training environment more similar
to real-world conditions where user inputs can be unpredictable and varied.
These components work together in a cycle to enable transactional chatbots to handle
complex, multi-turn dialogues, manage user goals, and offer an engaging, human-like
conversation experience.
Benefits of transactional chatbots
Transactional chatbots, a form of virtual assistant, are seeing increased adoption across
various industries, all thanks to the multitude of benefits they bring to the table. Here are
some benefits of using them:
Enhanced efficiency: Transactional chatbots are designed for multitasking, handling
several customer interactions simultaneously without any hitches. They provide round-
the-clock service, responding to customer queries in real time, regardless of
geographical boundaries or time differences. Automated responses are also consistent,
improving the overall efficiency of your team and services.
Budget-friendly solution: Incorporating transactional chatbots into your customer
service protocol allows you to minimize the need for human intervention, leading to
considerable cost savings. With their capacity to operate 24/7, chatbots also contribute
to improved cost-effectiveness. By optimizing operations and reducing personnel
expenses, chatbots offer substantial cost advantages.
Tailored interactions: Chatbots can comprehend each customer’s preferences,
paving the way for more personalized interactions and tailored recommendations.
Customers are more likely to interact with businesses offering a personal touch,
enhancing their overall experience.
Augmented sales: Transactional chatbots can significantly boost sales by providing
personalized suggestions based on customer preferences and buying history. They
also contribute to lead generation by simultaneously managing multiple queries,
potentially enhancing your business’s revenue and sales figures.
Superior customer experience: With their round-the-clock service and efficient
customer management, transactional chatbots significantly improve the customer
experience. By offering seamless service without human involvement, these chatbots
can contribute to the growth and reputation of your organization.
How does a transactional chatbot operate?
Here is the sequence of steps that describe how a transactional chatbot works.
User initiation: The process begins when a user sends a message or a request to the
chatbot. This could be a query, a request for information, or an action such as booking
a ticket or making a reservation.
Input interpretation: The chatbot uses its Natural Language Understanding (NLU) unit
to interpret the user’s message. It converts the natural language input into a machine-
readable format. The NLU unit employs tokenization, stemming, part-of-speech
tagging, and entity extraction to understand the context, intent, and entities in the
user’s message.
Dialogue management: The Dialogue Manager (DM) processes this interpreted input.
It uses the state tracker to keep track of the conversation’s context, including the user’s
goals, requests, and the information the chatbot has provided.
Policy learning: Based on the current state of the conversation, the policy learner
uses reinforcement learning algorithms to decide on the best possible action or
response.
System response generation: Once the action is determined, the system generates
an appropriate response. This could involve querying a database for required
information, initiating a transaction, or formulating a reply to the user’s query.
Response delivery: The generated system response is then translated into natural,
human-like language using the Natural Language Generator (NLG) unit. This response
is then delivered to the user.
User feedback and learning: The chatbot observes and learns from user feedback.
For instance, if a user corrects information or rephrases a request, the chatbot uses
this feedback to update its understanding and improve future responses.
Conversation continuation or termination: Depending on the user’s response or the
chatbot’s settings, the conversation may continue with further exchanges or be
concluded if the chatbot has successfully addressed the user’s request.
This is a generalized flow of how a transactional chatbot operates. Please note that the exact
workings can vary based on the chatbot’s specific design, functionalities, and the complexity
of tasks it is programmed to perform.
Understanding the dialogue system
A transactional chatbot employs a dialogue system designed to facilitate meaningful,
purpose-driven conversations with users. This system revolves around three key
components: the Dialogue Manager (DM), the Natural Language Understanding (NLU) unit,
and the Natural Language Generator (NLG) unit, each playing a unique role in the
conversational process.
The NLU unit acts as the ears of the chatbot, listening to and interpreting user inputs. When
a user utters something, it is the job of the NLU to translate this into a semantic frame. This
frame is a structured representation of the user’s utterance, stripped of natural language
complexities and brought down to a format the chatbot can understand and process.
Now enter the DM, the chatbot’s brain. Composed of a Dialogue State Tracker (DST) and a
policy, often represented by a neural network, the DM controls the flow of the conversation.
The DST takes the semantic frame from the NLU, combines it with the history of the
conversation, and creates a state representation. This state is the distilled essence of the
dialogue so far, allowing the bot to maintain the context and continuity of the conversation.
Next, the state representation is ingested by the policy component of the DM, determining
the chatbot’s next action. Here, reinforcement learning can play a vital role, enabling the
chatbot to learn the best responses over time from repeated interactions.
In some cases, an external database can be consulted to supplement the chatbot’s
responses with useful information, like specifics about a restaurant reservation or movie
ticket availability.
Once the chatbot’s response is decided, it is still in a semantic frame, which isn’t user-
friendly. Here is where the NLG unit, the chatbot’s mouth, steps in. The NLG takes this
semantic frame and transforms it back into natural, human-like language. This allows the
chatbot to deliver responses that are easily understandable by the user.
The user’s goal, be it making a reservation, booking a ticket, or gathering information, forms
the driving force behind this dialogue loop. Through iterative cycles of understanding,
managing dialogue, and generating natural language, the transactional chatbot works
towards achieving this user goal, creating a dynamic, interactive, and purposeful
conversational experience.
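The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not TC-Bot's code: the keyword-based nlu, the update_state tracker, the hand-written policy, the template-based nlg, and the hard-coded answer are all hypothetical stand-ins for the trained components.

```python
# Toy dialogue loop: NLU -> state tracker -> policy -> NLG.
# Every function name and rule here is an illustrative assumption.

def nlu(utterance):
    """Map raw text to a semantic frame (intent plus slots) with naive keyword rules."""
    frame = {'intent': 'inform', 'inform_slots': {}, 'request_slots': {}}
    text = utterance.lower()
    if 'tonight' in text:
        frame['inform_slots']['date'] = 'tonight'
    if 'what time' in text:
        frame['intent'] = 'request'
        frame['request_slots']['starttime'] = 'UNK'
    return frame

def update_state(state, frame):
    """State tracker: merge the new frame into the running dialogue state."""
    state['history'].append(frame)
    state['inform_slots'].update(frame['inform_slots'])
    state['round'] += 1
    return state

def policy(state):
    """Hand-written policy: answer a pending request, otherwise ask for the movie."""
    last = state['history'][-1]
    if last['request_slots']:
        slot = next(iter(last['request_slots']))
        # Placeholder answer; a real system would look the value up in a database.
        return {'intent': 'inform', 'inform_slots': {slot: '8 pm'}, 'request_slots': {}}
    return {'intent': 'request', 'inform_slots': {}, 'request_slots': {'moviename': 'UNK'}}

def nlg(action):
    """Template-based generator: turn a semantic frame back into text."""
    if action['intent'] == 'request':
        slot = next(iter(action['request_slots']))
        return 'Which {} would you like?'.format(slot)
    slot, value = next(iter(action['inform_slots'].items()))
    return 'The {} is {}.'.format(slot, value)

state = {'history': [], 'inform_slots': {}, 'round': 0}
for utterance in ['I want to see a movie tonight', 'What time does it start?']:
    state = update_state(state, nlu(utterance))
    print(nlg(policy(state)))
```

In a reinforcement learning setup, only the policy function would be replaced by a learned model; the surrounding loop keeps the same shape.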
The role of the user simulator and error model controller
In transactional chatbots, two significant components contribute to refining the model’s
training and performance: the user simulator and the Error Model Controller (EMC). Both are
crucial in enabling the chatbot to handle more realistic, diverse, and error-prone
conversations.
User simulator
The user simulator is akin to a virtual training partner for the chatbot. It emulates the
behavior of a real user, offering a more efficient way to train the bot compared to hours of
user interactions. This simulator operates based on an agenda, meaning it has a predefined
goal for each interaction episode, and its actions align with this goal. The internal state of the
simulator allows it to follow the dialogue progression and take informed actions accordingly.
Responses to agent actions are crafted using a combination of deterministic rules with a
touch of stochastic rules to introduce variety.
User goals are essential elements for the simulator, representing what the user wants to
achieve from a conversation. These goals can be sourced from actual dialogue corpus or be
manually created, comprising ‘inform slots’ and ‘request slots.’ The inform slots represent
constraints the user has in mind, while request slots simulate the user’s quest for specific
information. However, unlike real users who may change their minds during a conversation,
the simulator’s goals remain static throughout an episode. A “default slot” is added to every
goal’s request slots, and the agent must provide a value for this slot for successful goal
fulfillment.
The user simulator’s internal state records the goal slots and the conversation’s history. It
aids in formulating user actions at each step, containing dictionaries of slots and an intent:
rest slots, history slots, request slots, inform slots, and the intent of the current action.
The actions that a user simulator can perform are varied and can sometimes be complex,
incorporating multiple requests or inform slots. These actions can even contain a mix of both
types of slots.
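As a rough sketch of the internal state just described, the snippet below builds the state from a goal and applies one deterministic rule: when the agent requests a slot, the simulator informs its value. The goal contents, the function names, the 'ticket' default slot, and the 'anything' fallback value are assumptions made for illustration, not the reference implementation.

```python
import copy

# Hypothetical user goal: constraints the user holds (inform slots)
# and facts the user wants to obtain (request slots).
goal = {
    'inform_slots': {'moviename': 'zootopia', 'date': 'tomorrow'},
    'request_slots': {'starttime': 'UNK', 'ticket': 'UNK'},  # 'ticket' stands in for the default slot
}

def reset_state(goal):
    """Build the simulator's internal state at the start of an episode."""
    rest = copy.deepcopy(goal['inform_slots'])
    rest.update(goal['request_slots'])
    return {
        'rest_slots': rest,     # goal slots not yet exchanged with the agent
        'history_slots': {},    # slots already informed during the dialogue
        'request_slots': {},    # slots requested in the current user action
        'inform_slots': {},     # slots informed in the current user action
        'intent': '',           # intent of the current user action
    }

def respond_to_request(state, goal, requested_key):
    """Deterministic rule: inform the goal value for the slot the agent asked about."""
    state['request_slots'].clear()
    state['inform_slots'].clear()
    state['intent'] = 'inform'
    # Assumed fallback: the user has no preference for slots outside the goal.
    value = goal['inform_slots'].get(requested_key, 'anything')
    state['inform_slots'][requested_key] = value
    state['history_slots'][requested_key] = value
    state['rest_slots'].pop(requested_key, None)
    return {'intent': state['intent'],
            'inform_slots': dict(state['inform_slots']),
            'request_slots': dict(state['request_slots'])}

state = reset_state(goal)
action = respond_to_request(state, goal, 'date')
print(action)
```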
Error Model Controller (EMC)
The Error Model Controller (EMC) comes into play once a user action is received from the
simulator. It is responsible for introducing errors into these actions, mimicking the
imperfections of real-world interactions and helping the bot cope with potential
misunderstandings or mistakes in user responses. The EMC can add errors to the user
action’s inform slots and intent, training the bot to handle unexpected scenarios better and
ensuring it’s equipped to deal with more realistic, less-than-perfect human interactions.
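A minimal sketch of this idea follows, assuming two independent error probabilities and a dictionary of valid values per slot. The function name add_errors, its parameters, and the sample values are hypothetical; the reference code may apply different kinds of noise.

```python
import random

def add_errors(action, slot_values, intents,
               slot_error_prob=0.05, intent_error_prob=0.0, rng=random):
    """Return a noisy copy of a user action; the original action is left untouched."""
    noisy = {'intent': action['intent'],
             'inform_slots': dict(action['inform_slots']),
             'request_slots': dict(action['request_slots'])}
    for key in list(noisy['inform_slots']):
        if rng.random() < slot_error_prob:
            # Swap the value for a different valid value of the same slot.
            candidates = [v for v in slot_values.get(key, [])
                          if v != noisy['inform_slots'][key]]
            if candidates:
                noisy['inform_slots'][key] = rng.choice(candidates)
    if rng.random() < intent_error_prob:
        # Replace the intent with a randomly drawn user intent.
        noisy['intent'] = rng.choice(intents)
    return noisy

# Illustrative inputs (assumed values).
slot_values = {'date': ['today', 'tomorrow', 'friday'], 'city': ['seattle', 'portland']}
intents = ['inform', 'request', 'thanks', 'reject', 'done']
user_action = {'intent': 'inform',
               'inform_slots': {'date': 'tomorrow', 'city': 'seattle'},
               'request_slots': {}}

noisy = add_errors(user_action, slot_values, intents,
                   slot_error_prob=1.0, intent_error_prob=0.0, rng=random.Random(0))
print(noisy)
```

Setting both probabilities to zero returns an exact copy of the user action, which makes it easy to switch the noise off when evaluating a trained agent.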
An overview of Deep-Q-Network
Deep Q-Network (DQN) is a reinforcement learning technique that combines Q-Learning with
deep neural networks. DQN was proposed by researchers at Google DeepMind, and it
had a significant impact on the field of reinforcement learning, particularly in environments
where input data has high-dimensional raw spaces, such as video games.
In traditional Q-Learning, a table called the Q-table stores the value of every possible state-
action pair. However, this approach doesn’t scale well to problems with large state spaces or
problems where states are not easily expressible in table form, such as image inputs.
DQN addresses these challenges using a deep neural network to approximate the Q-
function, which maps state-action pairs to expected future rewards. This way, a neural
network can be trained to predict the Q-values for a given state instead of maintaining a table
for each possible state-action pair.
A key innovation in DQN is using experience replay and target networks to stabilize training.
Experience replay stores past experiences in a replay buffer and samples mini-batches from
this buffer to train the network, which breaks the correlation between sequential experiences.
The target network is a separate network used to compute the target Q-values during
learning, which is periodically updated from the main network. This helps to avoid harmful
feedback loops during learning.
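To make the two ideas concrete, the sketch below stores a few transitions in a replay buffer, samples a mini-batch, and computes the learning target for each sampled transition: the reward alone if the episode ended, otherwise the reward plus the discounted maximum Q-value that the target network assigns to the next state. The discount factor, the transitions, and the lookup table standing in for the target network are all made-up values for illustration.

```python
import random
from collections import deque

GAMMA = 0.9  # discount factor (illustrative value)

def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Target for Q(s, a): r if the episode ended, else r + gamma * max Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

# Replay buffer: a bounded queue of (state, action, reward, next_state, done) tuples.
memory = deque(maxlen=1000)
memory.append(('s0', 1, -1.0, 's1', False))
memory.append(('s1', 0, -1.0, 's2', False))
memory.append(('s2', 2, 10.0, 's3', True))

# Stand-in for the frozen target network: a table of Q-values per state.
target_q = {'s1': [0.0, 2.0, 1.0], 's2': [5.0, 0.5, 0.0], 's3': [0.0, 0.0, 0.0]}

# Sampling at random breaks the correlation between consecutive experiences.
batch = random.sample(list(memory), 2)
targets = [td_target(reward, target_q[next_state], done)
           for (_, _, reward, next_state, done) in batch]
print(targets)
```

The main network is then trained to move its prediction for each sampled state-action pair toward the corresponding target, while the target network's weights are only refreshed from the main network every so often.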
Since the inception of DQN, many extensions have been proposed to improve its
performance and stability, such as Double DQN, Dueling DQN, and Prioritized Experience
Replay.
Training a transactional chatbot using Deep-Q-Network
Building a transactional chatbot using reinforcement learning involves several steps that
should be executed sequentially. Here’s the sequence:
1. Preparing the state: The initial step in developing a chatbot is preparing the state,
which represents the current situation that the chatbot is in. This typically involves
processing the raw input data (like text conversation history) into a format the model
can understand. The state also includes the chatbot’s internal information about the
conversation, like the identified intents or entities in the user’s utterances.
2. Dialogue configuration for the agent: The next step is to set up the dialogue
configuration for the agent. This includes defining the possible actions that the agent
can take (like answering a question, asking for more information, or ending the
conversation) and defining the reward structure that the agent will use to learn. This
configuration guides the agent about the context of the conversation, its possible
actions, and their consequences.
3. Neural network model: Once the state and dialogue configuration have been set up,
the next step is to build the neural network model that will be used to learn the dialogue
policy. This model takes the current state as input and outputs the Q-values for each
possible action. The Q-values represent the expected future reward for taking each
action, which is used to decide the best action to take. This model could be a Deep Q-
Network (DQN) or other types of network, depending on the complexity of the task and
the available data.
4. Policy: With the neural network model in place, a policy that dictates how the agent
chooses its actions can be defined. A common policy is an epsilon-greedy policy,
where the agent mostly chooses the action with the highest Q-value (as predicted by
the model) but occasionally chooses a random action to explore the environment.
5. Agent training: Finally, with the state, dialogue configuration, neural network model,
and policy setup, the agent can be trained. During training, the agent interacts with the
environment (in this case, the chatbot conversing with users or a user simulator), takes
actions according to its policy, observes the results, and receives rewards. The agent
then uses these experiences to update its neural network model, intending to maximize
its total reward over time. The agent continually goes through this interaction and
learning process until it reaches a satisfactory performance level.
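The epsilon-greedy policy from step 4 is small enough to sketch directly. The function name and the sample Q-values below are illustrative assumptions; in the chatbot, the Q-values would come from the neural network in step 3.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

q_values = [0.1, 0.7, 0.3]  # made-up Q-values for three actions
greedy_action = epsilon_greedy(q_values, epsilon=0.0)
random_action = epsilon_greedy(q_values, epsilon=1.0, rng=random.Random(0))
print(greedy_action, random_action)
```

A common design choice is to start with a high epsilon and lower it as training progresses, so the agent explores widely at first and relies more on its learned Q-values later.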
The scenario
The main objective of our transactional chatbot is to engage in proficient interactions with
real users, successfully accomplishing specific tasks such as locating suitable reservations
or movie tickets within the users’ specified constraints. The chatbot, referred to as the agent,
has a crucial role in processing an ongoing conversation’s state and generating an
appropriate, near-optimal response. In essence, the agent takes a snapshot of the current
dialogue history from the Dialogue State Tracker (ST) and uses it to decide on the most
fitting dialogue response to offer next.
The supporting code for our system draws inspiration from a dialogue system developed by
MiuLab, known as TC-Bot. The notable achievement of their research is the demonstration
of a user simulator built from simple rules. This approach enables the chatbot agent to be trained
via reinforcement learning considerably faster than training with real people. While other studies
have attempted similar methods, this research stands out for pairing a training method that works
with accessible and comprehensive code.
The complete code is available here – https://github.com/maxbrenner-ai/GO-Bot-DRL
Prerequisites
To fully comprehend the code, there are a few prerequisites that won’t be explicitly covered
but are vital for a comprehensive understanding. Here they are:
Proficiency in Python programming – A solid grasp of Python programming language is
a must.
Mastery of Python dictionaries – We will extensively utilize dictionaries in Python, so
understanding their operation is crucial.
Understanding of the DQN (Deep Q-Network) – Familiarity with developing a simple
DQN is necessary.
Experience with Keras for building neural networks – You should know how to construct
a straightforward neural network model using Keras.
Please ensure you are familiar with these areas before proceeding.
You need to have the following dependencies ready before executing the code:
Python >= 3.5
Keras >= 2.2.4 (Earlier versions probably work)
numpy
Understanding the data (movie tickets) for the chatbot
Data sources: Our dataset comprises movie tickets with varied attributes or slots. It is
structured as a dictionary where the keys are the unique identifiers of the tickets
(represented as long integers) and the values are sub-dictionaries encapsulating the
detailed information that each ticket holds. It’s important to note that not every ticket will
have the same attributes and certainly not the same values! Data source –
https://gist.github.com/maxbrenner-ai/f665bb570e1ac55568001c7991faebcd#file-movie_dict-txt
Database index: There is another file that houses a dictionary. The keys in this
dictionary represent different slots that a ticket might hold, while the values are lists of
potential values that each slot can take. Data dictionary link –
https://gist.github.com/maxbrenner-ai/f665bb570e1ac55568001c7991faebcd#file-movie_dict-txt
User goal collection: Lastly, we have a list that stores user goals. Each goal is
represented as a dictionary comprising request and inform slots. We will delve deeper
into what these slots signify later on. User goal list –
https://gist.github.com/maxbrenner-ai/79c1ace99eafcc376f37090c7e5287aa#file-movie_user_goals-txt
The core objective here is to enable the chatbot agent to locate a ticket that aligns with the
user’s specific requirements, which are defined by the goal for each episode. This is quite a
challenging task considering each ticket’s uniqueness and variance in slots!
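The three structures above, and the matching task itself, can be pictured with a miniature example. The tickets, the helper names, and the rule that a ticket missing a constrained slot does not match are assumptions made for this sketch; the real files are far larger and the reference code may treat missing slots differently.

```python
# Hypothetical miniature ticket database: id -> slots. Tickets need not share the same slots.
database = {
    0: {'moviename': 'zootopia', 'city': 'seattle', 'starttime': '7:00 pm', 'theater': 'regal 16'},
    1: {'moviename': 'zootopia', 'city': 'portland', 'starttime': '9:30 pm'},
    2: {'moviename': 'deadpool', 'city': 'seattle', 'date': 'tomorrow'},
}

def build_slot_index(db):
    """Database index: map each slot to the list of values it takes anywhere in the database."""
    index = {}
    for ticket in db.values():
        for slot, value in ticket.items():
            values = index.setdefault(slot, [])
            if value not in values:
                values.append(value)
    return index

def find_matches(db, constraints):
    """Return the ids of tickets that agree with every constraint (inform slot)."""
    return [ticket_id for ticket_id, ticket in db.items()
            if all(ticket.get(slot) == value for slot, value in constraints.items())]

# A user goal in the same spirit as the goal list: constraints plus requested slots.
user_goal = {'inform_slots': {'moviename': 'zootopia', 'city': 'seattle'},
             'request_slots': {'starttime': 'UNK'}}

print(build_slot_index(database)['city'])
print(find_matches(database, user_goal['inform_slots']))
```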
Understanding the anatomy of an action
Understanding the structure of an action is crucial in this dialogue system. Ignoring the
natural language aspect for a moment, we can see that both the user simulator and the
agent work with actions represented as semantic frames. An action consists of an intent,
inform slots, and request slots. Here, a ‘slot’ signifies a key-value pair, typically referring to a
singular inform or request. For instance, in the dictionary {'starttime': 'tonight', 'theater': 'regal 16'},
both 'starttime': 'tonight' and 'theater': 'regal 16' are considered slots. More example actions are
available here:
https://gist.github.com/maxbrenner-ai/dcf1185a0f2dffc9f88b4054b908cf13#file-action_examples-txt
The intent indicates the kind of action it is. The remainder of the action is divided into inform
slots, which contain constraints, and request slots, which carry information that needs
completion. The potential keys are specified in the dialogue_config.py, and their values are
provided in the aforementioned database dictionary.
An inform slot shares information that the sender wants the receiver to acknowledge. It
comprises a key from the list of keys and a value from that key’s associated list of values.
Conversely, a request slot contains a key for which the sender wishes to retrieve a value
from the receiver. In essence, it is a key from the list of keys and ‘UNK’ (indicating
“unknown”) as the value, as the sender doesn’t yet know the appropriate value for this slot.
The intents include:
Inform: Provides constraints in the form of inform slots.
Request: Asks for the completion of request slots with values.
Thanks: Used exclusively by the user, it signals to the agent that it has done
something satisfactory, or that the user is prepared to conclude the conversation.
Match found: Used solely by the agent, it informs the user that a match fulfilling the
user’s goal has been identified.
Reject: Utilized only by the user in response to the agent’s ‘match found’ intent,
indicating that the suggested match doesn’t fit their constraints.
Done: The agent uses this to wrap up the conversation and verify if the current goal
has been accomplished. The user action automatically adopts this intent if the
conversation drags on too long.
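The action structure and the intent rules above can be expressed as plain Python dictionaries. The helper names make_inform, make_request, and is_valid are hypothetical, and the two intent sets simply restate the list above (thanks and reject for the user only, match found for the agent only).

```python
def make_inform(key, value):
    """Action that shares a constraint with the receiver."""
    return {'intent': 'inform', 'inform_slots': {key: value}, 'request_slots': {}}

def make_request(key):
    """Action that asks the receiver to fill in a value; 'UNK' marks it as unknown."""
    return {'intent': 'request', 'inform_slots': {}, 'request_slots': {key: 'UNK'}}

# Which side may use which intent, following the list above.
USER_INTENTS = {'inform', 'request', 'thanks', 'reject', 'done'}
AGENT_INTENTS = {'inform', 'request', 'match_found', 'done'}

def is_valid(action, sender):
    """Check that the sender may use the intent and that request slots hold 'UNK'."""
    allowed = USER_INTENTS if sender == 'user' else AGENT_INTENTS
    requests_ok = all(value == 'UNK' for value in action['request_slots'].values())
    return action['intent'] in allowed and requests_ok

user_action = make_inform('starttime', 'tonight')
agent_action = make_request('theater')
print(user_action)
print(agent_action)
```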
Preparing the state
The Dialogue State Tracker (ST) is essential in a transactional chatbot. Its primary function is
to create a ‘state’ for the chatbot to work from. A ‘state’ is like a snapshot of the current
situation in the chat, which the chatbot uses to decide its next action.
To do this, the ST maintains a record of the dialogue, capturing both the user’s and chatbot’s
actions as they happen. It also keeps track of any information (known as ‘inform slots’)
shared in the chat. For instance, if the user mentions they prefer Italian food, this information
is saved in an ‘inform slot.’
The state prepared by the ST is essentially an array of data representing current dialogue
history and all the information slots mentioned so far. It’s like a conversation summary to
date, which helps the chatbot make informed decisions.
Also, whenever the chatbot needs to provide information to the user, the ST can fetch it
from a database, using the current inform slots as query constraints. For example, if the user asks for
Italian restaurants, the ST can pull a list from the database matching this criterion.
One crucial aspect of the ST’s job is to compile a useful state that gives the chatbot an
accurate view of the ongoing conversation. This state includes recent actions from both the
user and the chatbot, letting the chatbot know where the dialogue is at. It also includes a
count of the number of rounds or interactions that have occurred. This helps the chatbot
gauge how much time it has left, especially in scenarios where the chat has a maximum
number of rounds allowed.
Lastly, the state also includes details about the current inform slots and how many database
entries match this information. This helps the chatbot know how much information it has to
work with and how relevant it is to the user’s requirements.
The Dialogue State Tracker is like the chatbot’s memory and awareness, helping it
understand the current conversation and make the best possible response.
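One way to picture the state is as a flat numeric vector assembled from the pieces listed above: the latest user and agent actions, the inform slots gathered so far, the round count, and the number of matching database entries. The encoding below is a simplified assumption, with shortened vocabularies, a made-up round limit, and hypothetical helper names; the reference implementation uses a longer vector.

```python
# Illustrative vocabularies; the real lists live in the dialogue configuration.
INTENTS = ['inform', 'request', 'thanks', 'match_found', 'reject', 'done']
SLOTS = ['moviename', 'theater', 'starttime', 'date', 'city']
MAX_ROUNDS = 20  # assumed limit on the number of rounds per episode

def one_hot(item, vocabulary):
    """1.0 at the position of item, 0.0 elsewhere."""
    return [1.0 if entry == item else 0.0 for entry in vocabulary]

def multi_hot(keys, vocabulary):
    """1.0 for every vocabulary entry present in keys."""
    return [1.0 if entry in keys else 0.0 for entry in vocabulary]

def build_state(user_action, agent_action, current_informs, round_num, match_count, db_size):
    """Concatenate the dialogue features into one flat vector for the policy network."""
    state = []
    state += one_hot(user_action['intent'], INTENTS)          # last user intent
    state += multi_hot(user_action['inform_slots'], SLOTS)    # last user inform slots
    state += multi_hot(user_action['request_slots'], SLOTS)   # last user request slots
    state += one_hot(agent_action['intent'], INTENTS)         # last agent intent
    state += multi_hot(current_informs, SLOTS)                # all inform slots so far
    state.append(round_num / MAX_ROUNDS)                      # how far into the episode
    state.append(match_count / db_size)                       # share of matching entries
    return state

user_action = {'intent': 'request', 'inform_slots': {'date': 'tonight'},
               'request_slots': {'starttime': 'UNK'}}
agent_action = {'intent': 'request', 'inform_slots': {}, 'request_slots': {'date': 'UNK'}}
state = build_state(user_action, agent_action, {'moviename': 'zootopia', 'date': 'tonight'},
                    round_num=2, match_count=3, db_size=30)
print(len(state))
```

The length of this vector is what the network's input layer must accept, so any change to the vocabularies changes the model's input size as well.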
Dialogue configuration for the agent
Dialogue configuration for the agent is a critical step in building a transactional chatbot. This
process involves defining how the chatbot will interact with users, specifying the flow of
conversation, and the range of responses it can deliver. Essentially, it is setting up the rules
of engagement for the chatbot, ensuring that it can understand user inputs and provide
relevant and meaningful responses. This configuration becomes the foundation upon which
further layers of learning and adaptation are built, making it a vital part of any successful
chatbot development.
Here are the dialogue config constants used by the agent:
# Possible inform and request slots for the agent
agent_inform_slots = ['moviename', 'theater', 'starttime', 'date', 'genre', 'state', 'city', 'zip',
                      'critic_rating', 'mpaa_rating', 'distanceconstraints', 'video_format',
                      'theater_chain', 'price', 'actor', 'description', 'other', 'numberofkids']
agent_request_slots = ['moviename', 'theater', 'starttime', 'date', 'numberofpeople', 'genre',
                       'state', 'city', 'zip', 'critic_rating', 'mpaa_rating',
                       'distanceconstraints', 'video_format', 'theater_chain', 'price',
                       'actor', 'description', 'other', 'numberofkids']
# Possible actions for agent
agent_actions = [
    {'intent': 'done', 'inform_slots': {}, 'request_slots': {}},  # Triggers closing of conversation
    {'intent': 'match_found', 'inform_slots': {}, 'request_slots': {}}
]
for slot in agent_inform_slots:
    agent_actions.append({'intent': 'inform', 'inform_slots': {slot: 'PLACEHOLDER'},
                          'request_slots': {}})
for slot in agent_request_slots:
    agent_actions.append({'intent': 'request', 'inform_slots': {}, 'request_slots': {slot: 'UNK'}})
# Rule-based policy request list
rule_requests = ['moviename', 'starttime', 'city', 'date', 'theater', 'numberofpeople']
# These are possible inform slot keys that cannot be used to query
# (usersim_default_key is defined elsewhere in the dialogue config and is not shown here)
no_query_keys = ['numberofpeople', usersim_default_key]
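To show how an action list of this kind is used, here is a minimal sketch that builds one from a shortened slot set and looks an action up by its index, since the policy works with action indices. The reduced slot lists and the helper map_index_to_action are assumptions for illustration; the agent's own index-mapping method is not shown in this article.

```python
import copy

# Shortened slot lists, for illustration only
inform_slots = ['moviename', 'theater', 'starttime']
request_slots = ['moviename', 'numberofpeople']

actions = [
    {'intent': 'done', 'inform_slots': {}, 'request_slots': {}},
    {'intent': 'match_found', 'inform_slots': {}, 'request_slots': {}},
]
for slot in inform_slots:
    actions.append({'intent': 'inform', 'inform_slots': {slot: 'PLACEHOLDER'}, 'request_slots': {}})
for slot in request_slots:
    actions.append({'intent': 'request', 'inform_slots': {}, 'request_slots': {slot: 'UNK'}})


def map_index_to_action(index):
    """Return a copy so callers can fill in slot values without altering the template."""
    return copy.deepcopy(actions[index])


num_actions = len(actions)  # 2 fixed actions + 3 informs + 2 requests
print(num_actions)
print(map_index_to_action(5))
```

With the full slot lists from the configuration, the same construction yields 39 actions: the 2 fixed ones, 18 informs, and 19 requests.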
Building a neural network model
In the development of a transactional chatbot, constructing the neural network model is a
pivotal step. The agent's model is built with Keras, a popular deep learning framework. It is
a neural network with a single hidden layer which, despite its simplicity, works well for the
task at hand. The network takes the current state as input and outputs one value per
possible action, which the agent uses to choose its response to the user's input. Here is
the code snippet:
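from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

def _build_model(self):
    # Single hidden layer mapping a state vector to one value per action
    model = Sequential()
    model.add(Dense(self.hidden_size, input_dim=self.state_size, activation='relu'))
    model.add(Dense(self.num_actions, activation='linear'))
    # Note: newer Keras releases expect learning_rate instead of lr
    model.compile(loss='mse', optimizer=Adam(lr=self.lr))
    return model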
The instance variables are assigned in the constants.json file, located here:
https://github.com/maxbrenner-ai/GO-Bot-DRL/blob/master/constants.json
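To make the role of the two layers concrete, the following dependency-free sketch computes what such a network outputs: one value per action, from which the greedy action is the one with the highest value. The tiny weights, the sizes, and the function names are made up for illustration; the real model learns its weights through Keras.

```python
def relu(values):
    """Apply the ReLU activation element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: weights[j] holds the input weights of output unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def q_values(state, hidden_w, hidden_b, out_w, out_b):
    hidden = relu(dense(state, hidden_w, hidden_b))  # hidden layer with ReLU
    return dense(hidden, out_w, out_b)               # linear output, one value per action


# Toy network: 2 state features, 2 hidden units, 3 actions
hidden_w = [[1.0, -1.0], [0.5, 0.5]]
hidden_b = [0.0, 0.0]
out_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out_b = [0.0, 0.0, -2.0]

q = q_values([2.0, 1.0], hidden_w, hidden_b, out_w, out_b)
best_action = max(range(len(q)), key=lambda i: q[i])
print(q, best_action)
```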
Implementing policy
The policy guides the agent in selecting a suitable action for the current state. How it does
so depends on whether the dialogue is in the warm-up or the training stage. The warm-up
stage, which precedes training, fills the agent's memory, typically by following a random
policy. For our GO chatbot, however, a basic rule-based policy is used during the warm-up
phase.
def get_action(self, state, use_rule=False):
    # self.eps is initialized to the starting epsilon and does NOT get annealed
    if self.eps > random.random():
        index = random.randint(0, self.num_actions - 1)
        # self._map_index_to_action(index) takes an index and maps the action from all possible agent actions
        action = self._map_index_to_action(index)
        return index, action
    else:
        if use_rule:
            return self._rule_action()
        else:
            return self._dqn_action(state)
Once the agent moves into the training stage, the behavior model is used for action
selection. Here, the use_rule argument signals the warm-up stage: when it is set, the
rule-based policy is used instead of the model. In either stage, a random action is taken
with probability self.eps. The method returns both the index of the action and the action
itself.
The rule-based policy employed during the warm-up stage is a straightforward one. A
noteworthy component of this rule-based policy is the reset method of the agent. This
primarily serves to reset a couple of variables associated with the rule-based policy. Although
simple, this policy is crucial for initiating the agent’s activity in a somewhat meaningful way,
thus improving results over taking random actions.
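One plausible shape for such a rule-based policy, sketched under assumptions: request each slot in the rule_requests list in turn, then announce a match, then close the conversation. The class name RuleAgent, its attributes, and the exact ordering are illustrative guesses rather than the original agent code; only rule_requests and the action format come from the configuration shown earlier.

```python
rule_requests = ['moviename', 'starttime', 'city', 'date', 'theater', 'numberofpeople']


class RuleAgent:
    """Hypothetical warm-up policy: ask for every slot, report a match, then finish."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Called at the start of each episode to restart the scripted sequence
        self.slot_index = 0
        self.phase = 'not done'

    def rule_action(self):
        if self.slot_index < len(rule_requests):
            slot = rule_requests[self.slot_index]
            self.slot_index += 1
            return {'intent': 'request', 'inform_slots': {}, 'request_slots': {slot: 'UNK'}}
        if self.phase == 'not done':
            self.phase = 'done'
            return {'intent': 'match_found', 'inform_slots': {}, 'request_slots': {}}
        return {'intent': 'done', 'inform_slots': {}, 'request_slots': {}}


agent = RuleAgent()
intents = [agent.rule_action()['intent'] for _ in range(8)]
print(intents)
```

The reset method in this sketch restores the two variables that track progress through the script, which is why it has to be called before every new episode.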
Training an agent
In a transactional chatbot, the agent’s role is much like a skilled conversation partner, adept
at helping users achieve a specific target, such as booking a reservation or buying a movie
ticket, while considering the user’s specific needs and limitations. This agent’s primary task is
navigating through a conversation and making the best possible decision at each step.
The agent relies on a Dialogue State Tracker (ST) to do this. This tracker is like the memory
of the conversation, keeping track of the discussion’s history. Using this information, the
agent selects an appropriate response that moves the conversation forward, aiming to fulfill
the user’s goal.
The agent chooses an action based on the current state. During the warm-up phase,
this policy can be as simple as a list of requests. During training, however, the policy
becomes more complex: a behavior model with a single hidden layer.
The training method is fairly straightforward, with only a few variations from other methods
that use Deep Q-Network (DQN) training. It is often worthwhile to experiment with the
model's structure, incorporate prioritized experience replay (a technique that selectively
replays more important experiences), or develop a more sophisticated rule-based policy.
This kind of tweaking can make the agent more efficient and effective at accomplishing its
goals.
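For reference, the core of standard DQN training is the target that each stored experience is regressed toward: the reward plus the discounted best value of the next state, or just the reward when the episode has ended. The sketch below computes that target for a small batch. GAMMA, the function name, and the sample numbers are assumptions, and the next-state values would normally come from the target network.

```python
GAMMA = 0.9  # discount factor (assumed value)


def dqn_targets(batch, next_q_values):
    """Compute reward + GAMMA * max next-state value per experience, or reward alone if done.

    batch: list of (reward, done) pairs
    next_q_values: list of per-action value lists for the corresponding next states
    """
    targets = []
    for (reward, done), next_q in zip(batch, next_q_values):
        bootstrap = 0.0 if done else GAMMA * max(next_q)
        targets.append(reward + bootstrap)
    return targets


batch = [(-1.0, False), (40.0, True)]
next_q = [[2.0, 5.0, 1.0], [3.0, 3.0, 3.0]]
targets = dqn_targets(batch, next_q)
print(targets)
```

The network is then fitted with a mean squared error loss so that its value for the action actually taken moves toward this target.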
Here’s a simpler explanation of the flow of an agent’s action in a transactional chatbot, as
shown in the above diagram:
A single round or loop in training involves four main components:
The agent (dqn_agent)
The dialogue state tracker (state_tracker)
The user (or user simulator)
The Error Model Controller (EMC)
The following steps outline the sequence of events:
1. The round begins by acquiring the current state: either the initial state at the start of a
conversation (episode) or the next state produced by the previous round. This state is
then fed into the agent's action selection method.
2. The agent decides on an action based on the current state and passes it to the state
tracker. The state tracker updates its record of the conversation and enriches the
agent's action with additional information retrieved from a database.
3. The enriched agent action is then given to the user simulator, which generates a
rule-based response and also returns the reward and a success flag (though these
aren't shown in the diagram).
4. The user's response then goes through the error model controller, which introduces
potential errors mimicking real-world scenarios.
5. The possibly erroneous user response is then fed into the state tracker, which updates
its conversation record. Unlike in step 2, however, it does not substantially modify
the user response.
6. Lastly, the state tracker produces the next state of the conversation, completing the
current experience tuple (state, action, reward, next state). This tuple is then added to
the agent's memory, and the cycle continues with the next round.
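The six steps above can be sketched as a single function. Everything in this example except the overall order of operations is an assumption: the stub classes, their method names, and the fixed return values are placeholders that keep the example self-contained, and the stub tracker does not query a database as the real one does.

```python
import random


class StubAgent:
    def __init__(self):
        self.memory = []

    def get_action(self, state):
        return 0, {'intent': 'request', 'inform_slots': {}, 'request_slots': {'date': 'UNK'}}

    def add_experience(self, state, action_index, reward, next_state, done):
        self.memory.append((state, action_index, reward, next_state, done))


class StubTracker:
    def __init__(self):
        self.history = []

    def get_state(self):
        return len(self.history)  # stand-in for a real state vector

    def update_state_agent(self, agent_action):
        self.history.append(('agent', agent_action))

    def update_state_user(self, user_action):
        self.history.append(('user', user_action))


class StubUser:
    def step(self, agent_action):
        user_action = {'intent': 'inform', 'inform_slots': {'date': 'tomorrow'}, 'request_slots': {}}
        return user_action, -1, False, False  # action, reward, done, success


class StubEMC:
    def infuse_error(self, user_action, error_prob=0.05):
        # Occasionally corrupt a slot value to imitate a misunderstanding
        for slot in user_action['inform_slots']:
            if random.random() < error_prob:
                user_action['inform_slots'][slot] = 'UNK'


def run_round(state, agent, tracker, user, emc):
    action_index, agent_action = agent.get_action(state)          # steps 1 and 2
    tracker.update_state_agent(agent_action)                      # step 2
    user_action, reward, done, success = user.step(agent_action)  # step 3
    if not done:
        emc.infuse_error(user_action)                             # step 4
    tracker.update_state_user(user_action)                        # step 5
    next_state = tracker.get_state()                              # step 6
    agent.add_experience(state, action_index, reward, next_state, done)
    return next_state, reward, done, success


agent, tracker = StubAgent(), StubTracker()
state = tracker.get_state()
next_state, reward, done, success = run_round(state, agent, tracker, StubUser(), StubEMC())
print(agent.memory[0])
```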
Before the actual learning and decision-making begin for a Deep Q-Network (DQN) agent,
like our chatbot, it undergoes a ‘warm-up’ phase. This phase is necessary to fill the agent’s
memory buffer with initial information. But, unlike DQN applications in games where the
agent may perform random actions, our chatbot uses a basic rule-based algorithm during
this warm-up stage. The specifics of this algorithm will be covered in detail in part II of the
series.
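A minimal sketch of the warm-up idea, assuming a fixed-size memory that is filled before any learning takes place. The ReplayMemory class, the warm_up helper, the capacity, and the placeholder experience are all hypothetical; in the chatbot, each experience would come from one dialogue round played with the rule-based policy.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer; the oldest experience is dropped once capacity is reached."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)


def warm_up(memory, play_round):
    """Call play_round(), which returns one experience tuple, until memory is full."""
    steps = 0
    while not memory.is_full():
        memory.add(*play_round())
        steps += 1
    return steps


memory = ReplayMemory(capacity=50)
# Placeholder for one rule-based dialogue round
steps = warm_up(memory, lambda: (0, 1, -1, 0, False))
batch = memory.sample(8)
print(steps, len(batch))
```

Once the buffer holds enough experiences, training can draw random batches from it instead of learning only from the most recent round.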
It’s also important to note that we are not using any Natural Language (NL) components in
this training process. This means that all the actions of the chatbot will be in the form of
‘semantic frames’ – structured data representing meanings. The focus here is on training the
Dialogue Manager (DM), which doesn’t require Natural Language Understanding (NLU) or
Natural Language Generation (NLG). These NL components are usually pre-trained
separately from the agent and are not crucial to understanding the reinforcement learning
process.
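A semantic frame here is simply a structured record of an intent plus its slots. The example below shows what one exchange might look like in that form, using the action format from the dialogue configuration. The slot values and the helper describe are invented for illustration; in a full system, an NLG component would turn such frames into sentences.

```python
# The agent asks for a start time; the user answers and asks for the theater.
agent_frame = {'intent': 'request', 'inform_slots': {}, 'request_slots': {'starttime': 'UNK'}}
user_frame = {
    'intent': 'inform',
    'inform_slots': {'starttime': '9:30 pm'},
    'request_slots': {'theater': 'UNK'},
}


def describe(frame):
    """Render a frame as a compact string, e.g. for logging."""
    informs = ', '.join('{}={}'.format(k, v) for k, v in frame['inform_slots'].items())
    requests = ', '.join(frame['request_slots'])
    return '{}(inform: [{}]; request: [{}])'.format(frame['intent'], informs, requests)


print(describe(agent_frame))
print(describe(user_frame))
```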
Here is the code snippet to train the agent:
print('Training Started...')
episode = 0
period_reward_total = 0
period_success_total = 0
success_rate_best = 0.0
while episode < NUM_EP_TRAIN:
    episode_reset()
    episode += 1
    done = False
    state = state_tracker.get_state()
    while not done:
        next_state, reward, done, success = run_round(state)
        period_reward_total += reward
        state = next_state
    period_success_total += success
    # Train
    if episode % TRAIN_FREQ == 0:
        # Check success rate
        success_rate = period_success_total / TRAIN_FREQ
        avg_reward = period_reward_total / TRAIN_FREQ
        # Flush
        if success_rate >= success_rate_best and success_rate >= SUCCESS_RATE_THRESHOLD:
            dqn_agent.empty_memory()
        # Update current best success rate
        if success_rate > success_rate_best:
            print('Episode: {} NEW BEST SUCCESS RATE: {} Avg Reward: {}'.format(episode, success_rate, avg_reward))
            success_rate_best = success_rate
            dqn_agent.save_weights()
        period_success_total = 0
        period_reward_total = 0
        # Copy
        dqn_agent.copy()
        # Train
        dqn_agent.train()
print('...Training Ended')
The complete code is available here:
https://github.com/maxbrenner-ai/GO-Bot-DRL/blob/master/train.py
Use cases of transactional chatbots
Transactional chatbots hold great potential across a multitude of sectors, including but not
limited to banking, insurance, e-commerce, healthcare, and hospitality. Here is how they can
be leveraged in various contexts:
Banking: Transactional chatbots can enhance banking services by automating tasks
traditionally handled by bank operators. For instance, they can authenticate user
identities, block stolen credit cards, provide operational hours of nearby branches, or
confirm outgoing transfers. Moreover, they can offer immediate assistance in case of
account queries, balance checks, or recent transactions, providing users with real-time
convenience.
Insurance: In the insurance sector, these chatbots can offer quotes to potential
customers or distribute insurance certificates to existing ones. More advanced bots can
even streamline the conversion process, allowing prospects to sign up directly if the
quote matches their budget and needs. The bot gathers the necessary details and
forwards the contract and supporting documents, reducing manual intervention and
accelerating policy issuance.
E-commerce: For e-commerce platforms, transactional chatbots can assist users in
product discovery based on their preferences. Additionally, they can facilitate the
buying process and handle requests for order modifications or cancellations. These
bots can also provide real-time order tracking, enhancing the shopping experience.
Healthcare: Transactional chatbots in the healthcare industry can help patients book
appointments, send medication reminders, or offer guidance on common health issues.
They can also gather patient data for health records, making the patient intake process
more efficient.
Hospitality: In the hospitality sector, these bots can automate room bookings, provide
information about facilities, offer personalized recommendations, and address common
queries about the stay, check-in/check-out process, etc.
Energy companies or mobile service providers: Similar to insurance, these
businesses can use transactional chatbots to provide quotes, facilitate service sign-
ups, offer upgrades, or handle cancellation requests.
These few instances illustrate the versatility and utility of transactional chatbots. However,
their use is not confined to these areas, and they can be tailored to address the unique
needs of various other industries.
Endnote
Transactional chatbots have indeed ushered in a new era of interaction between businesses
and their customers. It has become vital for companies to incorporate this transformative
technology into their communication strategies, ensuring they remain adaptable and
responsive to the shifting needs of their clientele. The promise that transactional chatbots
hold for the future is substantial, and with careful planning and tactical execution, they can
contribute to substantial growth for any business. Therefore, if a company wishes to stay
competitive and not fall behind in the rapidly advancing digital world, integrating a
transactional chatbot into its strategic planning becomes an astute decision.
As consumer expectations continue to evolve, the prospects for transactional chatbots are
looking brighter than ever. Future developments may involve more advanced levels of
personalization, with chatbots becoming increasingly intelligent. This would offer a more
enriched user experience, potentially featuring responses or suggestions specifically tailored
to an individual user’s preferences or past interactions.
Security is another area poised for significant improvement, particularly given the sensitive
transactional information these chatbots handle. Expect to see advancements in encryption,
fraud detection, and even biometric authentication as a means to protect and secure user
data.
Another promising direction for chatbots is their increasing integration with other
sophisticated technologies. Currently, chatbots are deployed across a wide array of business
sectors. Still, in the future, we could see them amalgamating with other cutting-edge
technologies, such as voice assistants or augmented reality, to offer even more engaging
customer experiences.
In sum, transactional chatbots are fast becoming necessary for businesses wishing to thrive
and grow in the digital age. Their potential future developments point to a world of more
personalized, secure, and immersive customer experiences.
Looking to boost your business operations with AI-driven transactional chatbots? Achieve
this with LeewayHertz’s AI chatbot development expertise!

Automating Google Workspace (GWS) & more with Apps Script
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

How to train a transactional chatbot using reinforcement learning.pdf

  • 1. 1/21 How to train a transactional chatbot using reinforcement learning? leewayhertz.com/train-transactional-chatbot-using-reinforcement-learning In an age where artificial intelligence is reshaping our world, chatbots have emerged as a valuable tool for businesses. With a staggering 80% of businesses projected to integrate chatbots in their operations by 2024, the focus is now shifting towards transactional chatbots, also known as Goal-oriented (GO) chatbots. Unlike typical chatbots, transactional chatbots are laser-focused on solving specific user problems. Need to book a ticket? There is a chatbot for that. Looking to make a reservation? A transactional chatbot is on it. These transactional chatbots are not just sophisticated, they are becoming smarter and more efficient by the day. But how are these transactional chatbots trained to be so proficient? The answer lies in two major learning techniques: supervised learning and reinforcement learning. Supervised learning uses an encoder-decoder approach to map user dialogue to responses directly. In contrast, reinforcement learning takes a more hands-on approach, training chatbots using trial-and-error conversations with rule-based user simulator or real users. Among these, transactional chatbots using reinforcement learning have recently surfaced as an exciting field teeming with potential applications. One stellar example of this rapidly growing field is the TC-Bot developed by MiuLab. The TC-Bot showcases how a user can be
  • 2. 2/21 simulated using basic rules, significantly expediting the training process compared to using real people. With more advanced chatbot training methods being developed, it’s safe to say we are on the cusp of a new era where transactional chatbots will become ubiquitous, changing the way we interact with technology. In this article, we will dive deep into the world of transactional chatbots, explore the process of their training, their use cases and other vital aspects. What is transactional chatbot? Transactional chatbots vs. Traditional chatbots Key components of transactional chatbot Benefits of transactional chatbots How does a transactional chatbot operate? Understanding the dialogue system The role of the user simulator and error model controller An overview of Deep-Q-Network Training a transactional chatbot using Deep-Q-Network The scenario Prerequisites Understanding the data (movie tickets) for the chatbot Understanding the anatomy of an action Preparing the state Dialogue configuration for the agent Building neural network model Implementing policy Training an agent Use cases of transactional chatbots What is transactional chatbot? A transactional chatbot, also known as a task-oriented or goal-oriented chatbot, is a specialized form of artificial intelligence software designed with a clear purpose – to help users achieve a specific goal or complete a specific task. This could range from booking a flight, scheduling a doctor’s appointment, or placing an order for a pizza. Unlike their counterparts (general conversation or social chatbots), which focus on simulating human-like interaction and carrying out broad, non-specific conversations, transactional chatbots have a clear focus. Their role is not to engage in small talk or provide entertainment but to aid users in accomplishing a particular task as quickly and efficiently as possible.
  • 3. 3/21 Transactional chatbots operate by recognizing and understanding the user’s intent and then taking appropriate actions to fulfill the user’s request. To do this, they employ sophisticated Natural Language Understanding (NLU) capabilities and machine learning algorithms to interpret the user’s inputs, map them to the correct action, and generate a suitable response. The importance of these goal-oriented chatbots in today’s digital ecosystem cannot be overstated. In a world that is increasingly driven by speed, efficiency, and convenience, transactional chatbots serve as a pivotal touchpoint between businesses and customers. They provide instant, 24/7 support, helping to improve customer service and engagement, streamline business processes, and reduce operational costs. Moreover, they provide a personalized user experience, understand and remember customer preferences, and deliver tailor-made solutions, enhancing customer satisfaction and loyalty. Furthermore, in times of social distancing and remote operations, transactional chatbots have become invaluable tools for businesses to maintain constant, uninterrupted customer support. By handling routine tasks and queries, they allow human staff to focus on more complex and critical issues, thus enhancing the overall efficiency of the business. In sum, transactional chatbots are more than just fancy technology; they are powerful tools that are reshaping the way businesses operate and interact with their customers, making them indispensable in the modern digital landscape.
Transactional chatbots vs. traditional chatbots
Comparison criteria | Transactional chatbot | Traditional chatbot
Purpose | Primarily designed to handle transactions and support complex tasks. They can assist in making reservations, completing purchases, and providing personalized recommendations. | Typically designed for simple tasks such as answering basic FAQs or guiding users to the appropriate resources.
Complexity of interaction | Capable of understanding and responding to more complex customer queries. These chatbots can process multiple layers of communication and follow the flow of conversation. | Generally capable of managing simple, linear conversations and might struggle with complex interactions.
Use of AI | Uses advanced AI and machine learning to provide personalized responses, understand user intent, and remember previous interactions. | Primarily uses rule-based responses and may or may not leverage AI. Its capabilities are often limited to predefined responses.
  • 4. 4/21
Data analysis | Continually learns from user interactions, enabling it to make more accurate predictions and provide personalized services. | Data analysis is typically minimal or non-existent, with less emphasis on learning from user interactions.
User experience | Enhances user experience by offering personalized responses and handling complex requests. | Provides a satisfactory user experience for straightforward inquiries but may not handle complex requests as effectively.
Integration with other systems | Often integrated with other systems (CRM, ERP) to access customer data, process transactions, etc. | Usually standalone, with minimal integration with other systems.
Cost and implementation time | Might require a higher initial investment and longer implementation time due to their complex nature. | Generally cheaper and quicker to implement as they’re less complex.
Scalability | High scalability due to its ability to learn and adapt from interactions. Can handle an increasing number of complex queries effectively. | Limited scalability. As queries become more complex, these chatbots might struggle to maintain efficiency.
Key components of transactional chatbots
Transactional chatbots, also known as goal-oriented or task-oriented chatbots, have several key components that enable them to interact with users effectively and accomplish specific tasks. Here are some of the main elements:
Natural Language Understanding (NLU) unit: This is the component of the chatbot that interprets and understands the user’s input. It transforms human language into a machine-readable format. NLU employs tokenization, stemming, part-of-speech tagging, and entity extraction to understand the context, intent, and entities of the user’s message.
Dialogue Manager (DM): The DM is the central control unit of the chatbot. It maintains the context and state of the conversation, decides the next action based on the current state and user’s input, and generates the appropriate system response.
State Tracker (ST): Sometimes considered a part of the Dialogue Manager, the state tracker keeps track of the current state of the conversation, including the user’s goals, requests, and the information that the chatbot has provided. Policy learner: This component uses reinforcement learning algorithms to determine the best responses based on the state of the conversation. It “learns” from its past actions and their outcomes to optimize the chatbot’s responses.
  • 5. 5/21 Natural Language Generator (NLG) unit: The NLG takes the system response generated by the dialogue manager and translates it into natural, human-like language. This can either be a simple template-based system or a more complex machine learning model. User simulator: In training a transactional chatbot, a user simulator is used. It’s a model that generates simulated user behavior, which can be used for training the chatbot in a controlled environment. Database (DB): Chatbots that provide information or perform transactions often need to interact with a database. This could be checking ticket availability, booking appointments, providing product details, etc. The DB is an integral part of these chatbot systems. Error model controller: This component is often used during training to add some noise to the user simulator’s responses, making the training environment more similar to real-world conditions where user inputs can be unpredictable and varied. These components work together in a cycle to enable transactional chatbots to handle complex, multi-turn dialogues, manage user goals, and offer an engaging, human-like conversation experience. Benefits of transactional chatbots Transactional chatbots, a form of virtual assistant, are seeing increased adoption across various industries, all thanks to the multitude of benefits they bring to the table. Here are some benefits of using them: Enhanced efficiency: Transactional chatbots are designed for multitasking, handling several customer interactions simultaneously without any hitches. They provide round- the-clock service, responding to customer queries in real time, regardless of geographical boundaries or time differences. Automated responses also guarantee accuracy, improving the overall efficiency of your team and services. 
Budget-friendly solution: Incorporating transactional chatbots into your customer service protocol allows you to minimize the need for human intervention, leading to considerable cost savings. With their capacity to operate 24/7, chatbots also contribute to improved cost-effectiveness. By optimizing operations and reducing personnel expenses, chatbots offer substantial cost advantages. Tailored interactions: Chatbots can comprehend each customer’s preferences, paving the way for more personalized interactions and tailored recommendations. Customers are more likely to interact with businesses offering a personal touch, enhancing their overall experience.
  • 6. 6/21 Augmented sales: Transactional chatbots can significantly boost sales by providing personalized suggestions based on customer preferences and buying history. They also contribute to lead generation by simultaneously managing multiple queries, potentially enhancing your business’s revenue and sales figures. Superior customer experience: With their round-the-clock service and efficient customer management, transactional chatbots significantly improve the customer experience. By offering seamless service without human involvement, these chatbots can contribute to the growth and reputation of your organization. How does a transactional chatbot operate? Here is the sequence of steps that describe how a transactional chatbot works. User initiation: The process begins when a user sends a message or a request to the chatbot. This could be a query, a request for information, or an action such as booking a ticket or making a reservation. Input interpretation: The chatbot uses its Natural Language Understanding (NLU) unit to interpret the user’s message. It converts the natural language input into a machine- readable format. The NLU unit employs tokenization, stemming, part-of-speech tagging, and entity extraction to understand the context, intent, and entities in the user’s message. Dialogue management: The Dialogue Manager (DM) processes this interpreted input. It uses the state tracker to keep track of the conversation’s context, including the user’s goals, requests, and the information the chatbot has provided. Policy learning: Based on the current state of the conversation, the policy learner uses reinforcement learning algorithms to decide on the best possible action or response. System response generation: Once the action is determined, the system generates an appropriate response. This could involve querying a database for required information, initiating a transaction, or formulating a reply to the user’s query. 
Response delivery: The generated system response is then translated into natural, human-like language using the Natural Language Generator (NLG) unit. This response is then delivered to the user. User feedback and learning: The chatbot observes and learns from user feedback. For instance, if a user corrects information or rephrases a request, the chatbot uses this feedback to update its understanding and improve future responses. Conversation continuation or termination: Depending on the user’s response or the chatbot’s settings, the conversation may continue with further exchanges or be concluded if the chatbot has successfully addressed the user’s request.
  • 7. 7/21 This is a generalized flow of how a transactional chatbot operates. Please note that the exact workings can vary based on the chatbot’s specific design, functionalities, and the complexity of tasks it is programmed to perform. Understanding the dialogue system A transactional chatbot employs a dialogue system designed to facilitate meaningful, purpose-driven conversations with users. This system revolves around three key components: the Dialogue Manager (DM), the Natural Language Understanding (NLU) unit, and the Natural Language Generator (NLG) unit, each playing a unique role in the conversational process. The NLU unit acts as the ears of the chatbot, listening to and interpreting user inputs. When a user utters something, it is the job of the NLU to translate this into a semantic frame. This frame is a structured representation of the user’s utterance, stripped of natural language complexities and brought down to a format the chatbot can understand and process. Now enter the DM, the chatbot’s brain. Composed of a Dialogue State Tracker (DST) and a policy, often represented by a neural network, the DM controls the flow of the conversation. The DST takes the semantic frame from the NLU, combines it with the history of the conversation, and creates a state representation. This state is the distilled essence of the dialogue so far, allowing the bot to maintain the context and continuity of the conversation.
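The state tracker's job described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the encoding used by any particular codebase: the slot list, the intent list, the `max_rounds` value, and the layout of the state vector are all hypothetical.

```python
# Hypothetical slot vocabulary and intents, chosen only for illustration.
SLOTS = ["moviename", "starttime", "theater", "numberofpeople"]
INTENTS = ["inform", "request", "thanks"]


class DialogueStateTracker:
    """Keeps the conversation history and encodes it as a fixed-size vector."""

    def __init__(self, max_rounds=20):
        self.max_rounds = max_rounds  # assumed cap on dialogue length
        self.history = []             # every semantic frame seen so far
        self.known_slots = {}         # slot values informed so far

    def update(self, frame):
        """Record one semantic frame: a dict with an intent, inform slots and request slots."""
        self.history.append(frame)
        self.known_slots.update(frame.get("inform_slots", {}))

    def get_state(self):
        """Combine the latest frame with the history into a state representation."""
        last = self.history[-1]
        intent_vec = [1.0 if i == last["intent"] else 0.0 for i in INTENTS]
        known_vec = [1.0 if s in self.known_slots else 0.0 for s in SLOTS]
        request_vec = [1.0 if s in last.get("request_slots", {}) else 0.0 for s in SLOTS]
        round_vec = [len(self.history) / self.max_rounds]
        return intent_vec + known_vec + request_vec + round_vec


tracker = DialogueStateTracker()
tracker.update({"intent": "inform", "inform_slots": {"moviename": "zootopia"}, "request_slots": {}})
tracker.update({"intent": "request", "inform_slots": {}, "request_slots": {"starttime": "UNK"}})
state = tracker.get_state()
```

The resulting list is the kind of fixed-length input a policy network could consume; a real tracker would typically encode more of the history than this sketch does.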
  • 8. 8/21 Next, the state representation is ingested by the policy component of the DM, determining the chatbot’s next action. Here, reinforcement learning can play a vital role, enabling the chatbot to learn the best responses over time from repeated interactions. In some cases, an external database can be consulted to supplement the chatbot’s responses with useful information, like specifics about a restaurant reservation or movie ticket availability. Once the chatbot’s response is decided, it is still in a semantic frame, which isn’t user- friendly. Here is where the NLG unit, the chatbot’s mouth, steps in. The NLG takes this semantic frame and transforms it back into natural, human-like language. This allows the chatbot to deliver responses that are easily understandable by the user. The user’s goal, be it making a reservation, booking a ticket, or gathering information, forms the driving force behind this dialogue loop. Through iterative cycles of understanding, managing dialogue, and generating natural language, the transactional chatbot works towards achieving this user goal, creating a dynamic, interactive, and purposeful conversational experience. The role of the user simulator and error model controller In transactional chatbots, two significant components contribute to refining the model’s training and performance: the user simulator and the Error Model Controller (EMC). Both are crucial in enabling the chatbot to handle more realistic, diverse, and error-prone conversations. User simulator The user simulator is akin to a virtual training partner for the chatbot. It emulates the behavior of a real user, offering a more efficient way to train the bot compared to hours of user interactions. This simulator operates based on an agenda, meaning it has a predefined goal for each interaction episode, and its actions align with this goal. The internal state of the simulator allows it to follow the dialogue progression and take informed actions accordingly. 
Responses to agent actions are crafted using a combination of deterministic rules with a touch of stochastic rules to introduce variety. User goals are essential elements for the simulator, representing what the user wants to achieve from a conversation. These goals can be sourced from actual dialogue corpus or be manually created, comprising ‘inform slots’ and ‘request slots.’ The inform slots represent constraints the user has in mind, while request slots simulate the user’s quest for specific information. However, unlike real users who may change their minds during a conversation, the simulator’s goals remain static throughout an episode. A “default slot” is added to every goal’s request slots, and the agent must provide a value for this slot for successful goal fulfillment.
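As a rough illustration of the goal structure just described, the sketch below builds one user goal and adds the default slot to its request slots. The slot names, the default slot name `ticket`, and the `goal_fulfilled` success check are assumptions made for this example, not details confirmed by the text.

```python
# Hypothetical user goal; slot names and values are illustrative only.
user_goal = {
    "inform_slots": {"moviename": "zootopia", "numberofpeople": "2", "date": "tomorrow"},
    "request_slots": {"theater": "UNK", "starttime": "UNK"},
}

DEFAULT_SLOT = "ticket"  # assumed name for the "default slot"


def prepare_goal(goal, default_slot=DEFAULT_SLOT):
    """Return a copy of the goal with the default slot added to its request slots."""
    prepared = {
        "inform_slots": dict(goal["inform_slots"]),
        "request_slots": dict(goal["request_slots"]),
    }
    prepared["request_slots"][default_slot] = "UNK"
    return prepared


def goal_fulfilled(goal, agent_informs):
    """Hypothetical success check: the agent supplied a value for every request
    slot (including the default slot) and contradicted none of the constraints."""
    all_filled = all(slot in agent_informs for slot in goal["request_slots"])
    consistent = all(agent_informs.get(k, v) == v for k, v in goal["inform_slots"].items())
    return all_filled and consistent


goal = prepare_goal(user_goal)
partial = {"theater": "regal 16", "starttime": "tonight"}
complete = dict(partial, ticket="confirmation-123")
```

Because the goal is copied once per episode and never modified afterwards, it stays static for the whole conversation, matching the behavior described above.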
  • 9. 9/21 The user simulator’s internal state records the goal slots and the conversation’s history. It aids in formulating user actions at each step, containing dictionaries of slots and an intent: rest slots, history slots, request slots, inform slots, and the intent of the current action. The actions that a user simulator can perform are varied and can sometimes be complex, incorporating multiple request or inform slots. These actions can even contain a mix of both types of slots.
Error Model Controller (EMC)
The Error Model Controller (EMC) comes into play once a user action is received from the simulator. It is responsible for introducing errors into these actions, mimicking the imperfections of real-world interactions and helping the bot cope with potential misunderstandings or mistakes in user responses. The EMC can add errors to the user action’s inform slots and intent, training the bot to handle unexpected scenarios better and ensuring it’s equipped to deal with more realistic, less-than-perfect human interactions.
An overview of Deep-Q-Network
Deep Q-Network (DQN) is a reinforcement learning technique that combines Q-Learning with deep neural networks. DQN was proposed by researchers at Google DeepMind, and it had a significant impact on the field of reinforcement learning, particularly in environments with high-dimensional raw inputs, such as video games. In traditional Q-Learning, a table called the Q-table stores the value of every possible state-action pair. However, this approach doesn’t scale well to problems with large state spaces or problems where states are not easily expressible in table form, such as image inputs. DQN addresses these challenges using a deep neural network to approximate the Q-function, which maps state-action pairs to expected future rewards. This way, a neural network can be trained to predict the Q-values for a given state instead of maintaining a table for each possible state-action pair.
A key innovation in DQN is using experience replay and target networks to stabilize training. Experience replay stores past experiences in a replay buffer and samples mini-batches from this buffer to train the network, which breaks the correlation between sequential experiences. The target network is a separate network used to compute the target Q-values during learning, which is periodically updated from the main network. This helps to avoid harmful feedback loops during learning. Since the inception of DQN, many extensions have been proposed to improve its performance and stability, such as Double DQN, Dueling DQN, and Prioritized Experience Replay.
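The two stabilizing ideas above, experience replay and a target network, can be sketched without any deep learning library. To keep the example self-contained, a tiny linear model stands in for the deep network, so this is an illustration of the mechanics rather than a real DQN; all class names, function names, and hyperparameters here are assumptions.

```python
import copy
import random
from collections import deque


class LinearQ:
    """Tiny linear stand-in for the Q-network: Q(s, a) = w[a] . s"""

    def __init__(self, state_size, num_actions):
        self.w = [[0.0] * state_size for _ in range(num_actions)]

    def q_values(self, state):
        return [sum(wi * si for wi, si in zip(row, state)) for row in self.w]


class ReplayBuffer:
    """Stores past transitions and returns random mini-batches of them."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.memory), min(batch_size, len(self.memory)))


def train_step(online, target, batch, gamma=0.9, lr=0.1):
    for state, action, reward, next_state, done in batch:
        # Target Q-value is computed from the separate target network.
        y = reward if done else reward + gamma * max(target.q_values(next_state))
        error = y - online.q_values(state)[action]
        # Gradient step on the squared error, for the action that was taken.
        online.w[action] = [w + lr * error * s for w, s in zip(online.w[action], state)]


def sync_target(online, target):
    """Periodically copy the online weights into the target network."""
    target.w = copy.deepcopy(online.w)


random.seed(0)
online, target = LinearQ(2, 2), LinearQ(2, 2)
buffer = ReplayBuffer(capacity=100)
buffer.add([1.0, 0.0], 1, 1.0, [0.0, 1.0], True)   # terminal transition, reward 1
buffer.add([0.0, 1.0], 0, 0.0, [1.0, 0.0], False)  # leads into the rewarding state
for step in range(200):
    train_step(online, target, buffer.sample(2))
    if step % 10 == 0:
        sync_target(online, target)
```

In this toy setup the value of the rewarding action converges to the reward itself, and the value of the action leading into that state converges to the discounted reward, which is the behavior the target computation is meant to produce.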
  • 10. 10/21 Training a transactional chatbot using Deep-Q-Network
Building a transactional chatbot using reinforcement learning involves several steps that should be executed sequentially. Here’s the sequence:
1. Preparing the state: The initial step in developing a chatbot is preparing the state, which represents the current situation that the chatbot is in. This typically involves processing the raw input data (like text conversation history) into a format the model can understand. The state also includes the chatbot’s internal information about the conversation, like the identified intents or entities in the user’s utterances.
2. Dialogue configuration for the agent: The next step is to set up the dialogue configuration for the agent. This includes defining the possible actions that the agent can take (like answering a question, asking for more information, or ending the conversation) and defining the reward structure that the agent will use to learn. This configuration guides the agent about the context of the conversation, its possible actions, and their consequences.
3. Neural network model: Once the state and dialogue configuration have been set up, the next step is to build the neural network model that will be used to learn the dialogue policy. This model takes the current state as input and outputs the Q-values for each possible action. The Q-values represent the expected future reward for taking each action, which is used to decide the best action to take. This model could be a Deep Q-Network (DQN) or other types of network, depending on the complexity of the task and the available data.
4. Policy: With the neural network model in place, a policy that dictates how the agent chooses its actions can be defined. A common policy is an epsilon-greedy policy, where the agent mostly chooses the action with the highest Q-value (as predicted by the model) but occasionally chooses a random action to explore the environment.
5. Agent training: Finally, with the state, dialogue configuration, neural network model, and policy set up, the agent can be trained. During training, the agent interacts with the environment (in this case, the chatbot conversing with users or a user simulator), takes actions according to its policy, observes the results, and receives rewards. The agent then uses these experiences to update its neural network model, aiming to maximize its total reward over time. The agent continually goes through this interaction and learning process until it reaches a satisfactory performance level.
The scenario
The main objective of our transactional chatbot is to engage in proficient interactions with real users, successfully accomplishing specific tasks such as locating suitable reservations or movie tickets within the users’ specified constraints. The chatbot, referred to as the agent, has a crucial role in processing an ongoing conversation’s state and generating an
  • 11. 11/21 appropriate, near-optimal response. In essence, the agent takes a snapshot of the current dialogue history from the Dialogue State Tracker (ST) and uses it to decide on the most fitting dialogue response to offer next.
The supporting code for our system draws inspiration from a dialogue system developed by MiuLab, known as TC-Bot. The notable achievement of their research is the demonstration of a user simulation with fundamental rules. This approach enables the swift training of the chatbot agent via reinforcement learning, which is considerably faster than training with real people. While other studies have attempted similar methods, the unique aspect of this research lies in its effective training model, which is successful and accompanied by accessible and comprehensive code.
The complete code is available here: https://github.com/maxbrenner-ai/GO-Bot-DRL
Prerequisites
To fully comprehend the code, there are a few prerequisites that won’t be explicitly covered but are vital for a comprehensive understanding. Here they are:
Proficiency in Python programming: A solid grasp of the Python programming language is a must.
Mastery of Python dictionaries: We will extensively utilize dictionaries in Python, so understanding their operation is crucial.
Understanding of the DQN (Deep Q-Network): Familiarity with developing a simple DQN is necessary.
Experience with Keras for building neural networks: You should know how to construct a straightforward neural network model using Keras.
Please ensure you are familiar with these areas before proceeding. You need to have the following dependencies ready before executing the code:
Python >= 3.5
Keras >= 2.24 (Earlier versions probably work)
numpy
Understanding the data (movie tickets) for the chatbot
Data sources: Our dataset comprises movie tickets with varied attributes or slots. It is structured as a dictionary where the keys are the unique identifiers of the tickets (represented as long integers) and the values are sub-dictionaries encapsulating the detailed information that each ticket holds. It’s important to note that not every ticket will have the same attributes and certainly not the same values!
Data source: https://gist.github.com/maxbrenner-ai/f665bb570e1ac55568001c7991faebcd#file-movie_dict-txt

Database index: There is another file that houses a dictionary. The keys in this dictionary represent different slots that a ticket might hold, while the values are lists of potential values that each slot can take.
Data dictionary link: https://gist.github.com/maxbrenner-ai/f665bb570e1ac55568001c7991faebcd#file-movie_dict-txt

User goal collection: Lastly, we have a list that stores user goals. Each goal is represented as a dictionary comprising request and inform slots. We will delve deeper into what these slots signify later on.
User goal list: https://gist.github.com/maxbrenner-ai/79c1ace99eafcc376f37090c7e5287aa#file-movie_user_goals-txt

The core objective here is to enable the chatbot agent to locate a ticket that aligns with the user’s specific requirements, which are defined by the goal for each episode. This is quite a challenging task considering each ticket’s uniqueness and variance in slots!

Understanding the anatomy of an action

Understanding the structure of an action is crucial in this dialogue system. Ignoring the natural language aspect for a moment, we can see that both the user simulator and the agent work with actions represented as semantic frames. An action consists of an intent, inform slots, and request slots. Here, a ‘slot’ signifies a key-value pair, typically referring to a singular inform or request.
For instance, in the dictionary {'starttime': 'tonight', 'theater': 'regal 16'}, both 'starttime': 'tonight' and 'theater': 'regal 16' are considered slots. More example actions are available here: https://gist.github.com/maxbrenner-ai/dcf1185a0f2dffc9f88b4054b908cf13#file-action_examples-txt

The intent indicates the kind of action it is. The remainder of the action is divided into inform slots, which contain constraints, and request slots, which carry information that needs completion. The potential keys are specified in dialogue_config.py, and their values are provided in the aforementioned database dictionary.

An inform slot shares information that the sender wants the receiver to acknowledge. It comprises a key from the list of keys and a value from that key’s associated list of values. Conversely, a request slot contains a key for which the sender wishes to retrieve a value from the receiver. In essence, it is a key from the list of keys with ‘UNK’ (indicating “unknown”) as the value, as the sender doesn’t yet know the appropriate value for this slot.
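To make the frame structure concrete, here is a minimal sketch of how such actions could be assembled as Python dictionaries. The three field names (intent, inform_slots, request_slots) and the 'UNK' marker follow the description above; the helper make_action and the example values are assumptions made purely for illustration and are not taken from the referenced repository.

```python
# Illustrative only: make_action is a hypothetical helper, not part of the referenced repository.
def make_action(intent, inform_slots=None, request_keys=None):
    """Build a semantic-frame action with the three fields described above."""
    return {
        'intent': intent,
        # Inform slots carry known key-value constraints.
        'inform_slots': dict(inform_slots or {}),
        # Request slots carry keys whose values are still unknown to the sender.
        'request_slots': {key: 'UNK' for key in (request_keys or [])},
    }


# A user sharing two constraints with the agent.
user_inform = make_action('inform', {'starttime': 'tonight', 'theater': 'regal 16'})

# A user stating a date while asking the agent for the start time.
user_request = make_action('request', {'date': 'tomorrow'}, ['starttime'])

print(user_inform)
print(user_request)
```

Keeping every action in the same three-field shape means the user simulator, the state tracker, and the agent can all read an action without special-casing who sent it.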
The intents include:

Inform: Provides constraints in the form of inform slots.
Request: Asks for the completion of request slots with values.
Thanks: Used exclusively by the user, it signals to the agent that it has done something satisfactory, or that the user is prepared to conclude the conversation.
Match found: Used solely by the agent, it informs the user that a match fulfilling the user’s goal has been identified.
Reject: Used only by the user in response to the agent’s ‘match found’ intent, indicating that the suggested match doesn’t fit their constraints.
Done: The agent uses this to wrap up the conversation and verify whether the current goal has been accomplished. The user action automatically adopts this intent if the conversation drags on too long.

Preparing the state

The Dialogue State Tracker (ST) is essential in a transactional chatbot. Its primary function is to create a ‘state’ for the chatbot to work from. A ‘state’ is like a snapshot of the current situation in the chat, which the chatbot uses to decide its next action.

To do this, the ST maintains a record of the dialogue, capturing both the user’s and chatbot’s actions as they happen. It also keeps track of any information (known as ‘inform slots’) shared in the chat. For instance, if the user mentions they prefer Italian food, this information is saved in an ‘inform slot.’

The state prepared by the ST is essentially an array of data representing the current dialogue history and all the inform slots mentioned so far. It’s like a summary of the conversation to date, which helps the chatbot make informed decisions.

Also, whenever the chatbot needs to provide information to the user, the ST can fetch this from a database using the current inform slots as the query. For example, if the user asks for Italian restaurants, the ST can pull a list from the database matching this criterion.
One crucial aspect of the ST’s job is to compile a useful state that gives the chatbot an accurate view of the ongoing conversation. This state includes recent actions from both the user and the chatbot, letting the chatbot know where the dialogue is at. It also includes a count of the number of rounds or interactions that have occurred. This helps the chatbot gauge how much time it has left, especially in scenarios where the chat has a maximum number of rounds allowed. Lastly, the state also includes details about the current inform slots and how many database entries match this information. This helps the chatbot know how much information it has to work with and how relevant it is to the user’s requirements.
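The state contents listed above can be sketched as a flat numeric vector. This is a simplified illustration under stated assumptions, not the repository's state tracker: the function build_state, the shortened INTENTS and SLOTS lists, the MAX_ROUNDS value, and the scaling choices are all hypothetical.

```python
# Assumed, shortened vocabularies for illustration only.
INTENTS = ['inform', 'request', 'thanks', 'match_found', 'reject', 'done']
SLOTS = ['moviename', 'theater', 'starttime', 'date', 'city']
MAX_ROUNDS = 20  # assumed maximum number of rounds per conversation


def one_hot(item, vocabulary):
    """Return a list with 1.0 at the position of item and 0.0 elsewhere."""
    return [1.0 if item == entry else 0.0 for entry in vocabulary]


def build_state(last_user_action, last_agent_action, current_informs,
                round_num, db_match_count, db_size):
    """Concatenate the pieces of dialogue context into one flat vector."""
    user_intent = one_hot(last_user_action['intent'], INTENTS)
    # At the start of an episode there is no previous agent action yet.
    if last_agent_action is None:
        agent_intent = [0.0] * len(INTENTS)
    else:
        agent_intent = one_hot(last_agent_action['intent'], INTENTS)
    # Which slots have been informed so far in the conversation.
    filled = [1.0 if slot in current_informs else 0.0 for slot in SLOTS]
    # How far the conversation has progressed toward the round limit.
    round_fraction = [round_num / MAX_ROUNDS]
    # How much of the database still matches the current inform slots.
    match_fraction = [db_match_count / db_size]
    return user_intent + agent_intent + filled + round_fraction + match_fraction


state = build_state(
    last_user_action={'intent': 'inform', 'inform_slots': {'date': 'tonight'}, 'request_slots': {}},
    last_agent_action=None,
    current_informs={'date': 'tonight'},
    round_num=1,
    db_match_count=50,
    db_size=200,
)
print(len(state))
```

A fixed-length vector like this is what lets a feed-forward network consume the dialogue context; the length of the vector corresponds to the network's input size.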
The Dialogue State Tracker is like the chatbot’s memory and awareness, helping it understand the current conversation and make the best possible response.

Dialogue configuration for the agent

Dialogue configuration for the agent is a critical step in building a transactional chatbot. This process involves defining how the chatbot will interact with users, specifying the flow of conversation, and the range of responses it can deliver. Essentially, it is setting up the rules of engagement for the chatbot, ensuring that it can understand user inputs and provide relevant and meaningful responses. This configuration becomes the foundation upon which further layers of learning and adaptation are built, making it a vital part of any successful chatbot development.

Here are the dialogue config constants used by the agent:

    # Possible inform and request slots for the agent
    agent_inform_slots = ['moviename', 'theater', 'starttime', 'date', 'genre', 'state', 'city', 'zip',
                          'critic_rating', 'mpaa_rating', 'distanceconstraints', 'video_format',
                          'theater_chain', 'price', 'actor', 'description', 'other', 'numberofkids']
    agent_request_slots = ['moviename', 'theater', 'starttime', 'date', 'numberofpeople', 'genre', 'state',
                           'city', 'zip', 'critic_rating', 'mpaa_rating', 'distanceconstraints',
                           'video_format', 'theater_chain', 'price', 'actor', 'description', 'other',
                           'numberofkids']

    # Possible actions for agent
    agent_actions = [
        {'intent': 'done', 'inform_slots': {}, 'request_slots': {}},  # Triggers closing of conversation
        {'intent': 'match_found', 'inform_slots': {}, 'request_slots': {}}
    ]
    # One inform action per inform slot; the value is filled in later
    for slot in agent_inform_slots:
        agent_actions.append({'intent': 'inform', 'inform_slots': {slot: 'PLACEHOLDER'}, 'request_slots': {}})
    # One request action per request slot; 'UNK' marks the value as unknown
    for slot in agent_request_slots:
        agent_actions.append({'intent': 'request', 'inform_slots': {}, 'request_slots': {slot: 'UNK'}})

    # Rule-based policy request list
    rule_requests = ['moviename', 'starttime', 'city', 'date', 'theater', 'numberofpeople']

    # These are possible inform slot keys that cannot be used to query
    # (usersim_default_key is defined elsewhere in the configuration and is not shown in this excerpt)
    no_query_keys = ['numberofpeople', usersim_default_key]

Building a neural network model

In the development of a transactional chatbot, constructing the neural network model is a pivotal step. The model for the chatbot agent is built with Keras, a popular deep learning framework. It is a neural network with a single hidden layer, which, despite its simplicity, proves to be highly effective for the task at hand. The design of this model plays a crucial role in enabling the chatbot to comprehend and respond appropriately to the user’s input.

Here is the code snippet:

    from keras.models import Sequential
    from keras.layers import Dense
    from keras.optimizers import Adam

    def _build_model(self):
        model = Sequential()
        # Hidden layer takes the state vector as input
        model.add(Dense(self.hidden_size, input_dim=self.state_size, activation='relu'))
        # Output layer has one linear unit per possible agent action
        model.add(Dense(self.num_actions, activation='linear'))
        # Newer Keras releases name the 'lr' argument 'learning_rate'
        model.compile(loss='mse', optimizer=Adam(lr=self.lr))
        return model

The instance variables are assigned in the constants.json file located here: https://github.com/maxbrenner-ai/GO-Bot-DRL/blob/master/constants.json

Implementing policy

The policy in a transactional chatbot guides the agent in selecting a suitable action based on the current state. The policy differs depending on whether the dialogue is in the warm-up or training stage. The warm-up stage, which precedes training, fills the agent’s memory and generally uses a random policy. For our GO chatbot, however, a basic rule-based policy is used during the warm-up phase.

    import random

    def get_action(self, state, use_rule=False):
        # self.eps is initialized to the starting epsilon and does NOT get annealed
        if self.eps > random.random():
            # Explore: pick a random action index
            index = random.randint(0, self.num_actions - 1)
            # self._map_index_to_action(index) takes an index and maps the action from all possible agent actions
            action = self._map_index_to_action(index)
            return index, action
        else:
            if use_rule:
                # Warm-up stage: follow the rule-based policy
                return self._rule_action()
            else:
                # Training stage: follow the behavior model
                return self._dqn_action(state)

Upon transitioning into the training stage, the behavior model comes into play for action selection. Here, the ‘use_rule’ argument signifies the warm-up stage. This method returns both the index of the action and the action itself.

The rule-based policy employed during the warm-up stage is a straightforward one. A noteworthy component of this rule-based policy is the reset method of the agent, which resets a couple of variables associated with the rule-based policy. Although simple, this policy is crucial for initiating the agent’s activity in a somewhat meaningful way, thus improving results over taking random actions.

Training an agent
In a transactional chatbot, the agent’s role is much like a skilled conversation partner, adept at helping users achieve a specific target, such as booking a reservation or buying a movie ticket, while considering the user’s specific needs and limitations. This agent’s primary task is navigating through a conversation and making the best possible decision at each step.

The agent relies on a Dialogue State Tracker (ST) to do this. This tracker is like the memory of the conversation, keeping track of the discussion’s history. Using this information, the agent selects an appropriate response that moves the conversation forward, aiming to fulfill the user’s goal.

The agent’s policy chooses an action based on the current state. During the warm-up phase, this policy can be as simple as a list of requests. During training, the policy becomes more complex and is given by a behavior model with a single hidden layer.

The training method is straightforward, with only a few variations from other methods that use Deep Q-Network (DQN) training. It is worth experimenting with the model’s structure, incorporating prioritized experience replay (a technique that selectively replays more important experiences), and developing a more sophisticated rule-based policy. This continual tweaking and enhancement can make the agent even more efficient and effective at accomplishing its goals.

Here’s a simpler explanation of the flow of an agent’s action in a transactional chatbot, as shown in the above diagram:

A single round or loop in training involves four main components:
The agent (dqn_agent)
The dialogue state tracker (state_tracker)
The user (or user simulator)
The Error Model Controller (EMC)

The following steps outline the sequence of events:

1. The round begins by acquiring the current state, which is either an initial state at the start of the conversation (episode) or the next state produced by the previous round. This state is then fed into the agent’s action determination method.
2. The agent decides on an action based on the current state and passes it to the state tracker. The state tracker updates its record of the conversation and enriches the agent’s action with additional information retrieved from a database.
3. The enriched agent’s action is then given to the user simulator. Here, the user simulator generates a rule-based response and also provides details about the reward and success (though these aren’t shown in the diagram).
4. The user’s response then goes through the error model controller, which introduces potential errors mimicking real-world scenarios.
5. The possibly erroneous user response is then fed into the state tracker, which updates its conversation record. However, unlike before, it doesn’t add any substantial updates to the user response.
6. Lastly, the state tracker produces the next state of the conversation, completing the current experience tuple (state, action, reward, next state). This tuple is then added to the agent’s memory, and the cycle continues with the next round.

Before the actual learning and decision-making begin for a Deep Q-Network (DQN) agent, like our chatbot, it undergoes a ‘warm-up’ phase. This phase is necessary to fill the agent’s memory buffer with initial experiences. But, unlike DQN applications in games where the agent may perform random actions, our chatbot uses a basic rule-based algorithm during this warm-up stage. The specifics of this algorithm will be covered in detail in part II of the series.
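The six steps above can be sketched as a single function. The version below is a hedged illustration rather than the repository's code: the stub classes and every method name on them (get_action aside, which appears earlier in the article) are assumptions chosen to mirror the prose, and the stubs return fixed values only so that the example runs on its own.

```python
class StubAgent:
    """Stand-in agent that always ends the conversation (assumption for the demo)."""
    def __init__(self):
        self.memory = []

    def get_action(self, state, use_rule=False):
        return 0, {'intent': 'done', 'inform_slots': {}, 'request_slots': {}}

    def add_experience(self, state, action_index, reward, next_state, done):
        self.memory.append((state, action_index, reward, next_state, done))


class StubTracker:
    """Stand-in state tracker that only records the dialogue history."""
    def __init__(self):
        self.history = []

    def update_state_agent(self, agent_action):
        self.history.append(('agent', agent_action))

    def update_state_user(self, user_action):
        self.history.append(('user', user_action))

    def get_state(self, done=False):
        return [len(self.history), int(done)]


class StubUser:
    """Stand-in user simulator with a trivial reward rule (not the real one)."""
    def step(self, agent_action):
        done = agent_action['intent'] == 'done'
        reward = 1 if done else -1
        intent = 'done' if done else 'inform'
        return {'intent': intent, 'inform_slots': {}, 'request_slots': {}}, reward, done, done


class StubEMC:
    """Stand-in error model controller that leaves the user action untouched."""
    def infuse_error(self, user_action):
        pass


def run_round(state, agent, tracker, user, emc, warmup=False):
    # Steps 1-2: the agent picks an action and the tracker records it
    action_index, agent_action = agent.get_action(state, use_rule=warmup)
    tracker.update_state_agent(agent_action)
    # Step 3: the user simulator responds with an action, reward, and outcome flags
    user_action, reward, done, success = user.step(agent_action)
    # Step 4: the EMC may corrupt the user action unless the episode is over
    if not done:
        emc.infuse_error(user_action)
    # Step 5: the tracker records the (possibly corrupted) user action
    tracker.update_state_user(user_action)
    # Step 6: build the next state and store the experience in the agent's memory
    next_state = tracker.get_state(done)
    agent.add_experience(state, action_index, reward, next_state, done)
    return next_state, reward, done, success


agent, tracker = StubAgent(), StubTracker()
initial_state = tracker.get_state()
result = run_round(initial_state, agent, tracker, StubUser(), StubEMC())
print(result)
```

Passing the four components in as arguments keeps the round logic independent of how each component is implemented, so the same loop body can serve both the warm-up stage (warmup=True) and training.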
It’s also important to note that we are not using any Natural Language (NL) components in this training process. This means that all the actions of the chatbot will be in the form of ‘semantic frames’, structured data representing meanings. The focus here is on training the Dialogue Manager (DM), which doesn’t require Natural Language Understanding (NLU) or Natural Language Generation (NLG). These NL components are usually pre-trained separately from the agent and are not crucial to understanding the reinforcement learning process.

Here is the code snippet to train the agent. The constants, the helper functions episode_reset and run_round, and the dqn_agent and state_tracker objects are defined elsewhere in train.py:

    print('Training Started...')
    episode = 0
    period_reward_total = 0
    period_success_total = 0
    success_rate_best = 0.0
    while episode < NUM_EP_TRAIN:
        episode_reset()
        episode += 1
        done = False
        state = state_tracker.get_state()
        # Run rounds until the conversation ends
        while not done:
            next_state, reward, done, success = run_round(state)
            period_reward_total += reward
            state = next_state

        period_success_total += success

        # Train
        if episode % TRAIN_FREQ == 0:
            # Check success rate
            success_rate = period_success_total / TRAIN_FREQ
            avg_reward = period_reward_total / TRAIN_FREQ
            # Flush
            if success_rate >= success_rate_best and success_rate >= SUCCESS_RATE_THRESHOLD:
                dqn_agent.empty_memory()
            # Update current best success rate
            if success_rate > success_rate_best:
                print('Episode: {} NEW BEST SUCCESS RATE: {} Avg Reward: {}'
                      .format(episode, success_rate, avg_reward))
                success_rate_best = success_rate
                dqn_agent.save_weights()
            period_success_total = 0
            period_reward_total = 0
            # Copy
            dqn_agent.copy()
            # Train
            dqn_agent.train()
    print('...Training Ended')

The complete code is available here: https://github.com/maxbrenner-ai/GO-Bot-DRL/blob/master/train.py

Use cases of transactional chatbots
Transactional chatbots hold great potential across a multitude of sectors, including but not limited to banking, insurance, e-commerce, healthcare, and hospitality. Here is how they can be leveraged in various contexts:

Banking: Transactional chatbots can enhance banking services by automating tasks traditionally handled by bank operators. For instance, they can authenticate user identities, block stolen credit cards, provide operational hours of nearby branches, or confirm outgoing transfers. Moreover, they can offer immediate assistance with account queries, balance checks, or recent transactions, providing users with real-time convenience.
Insurance: In the insurance sector, these chatbots can offer quotes to potential customers or distribute insurance certificates to existing ones. More advanced bots can even streamline the conversion process, allowing prospects to sign up directly if the quote matches their budget and needs. The bot gathers necessary details and forwards the contract and supporting documents, reducing manual intervention and accelerating policy issuance.
E-commerce: For e-commerce platforms, transactional chatbots can assist users in product discovery based on their preferences. Additionally, they can facilitate the buying process and handle requests for order modifications or cancellations. These bots can also provide real-time order tracking, enhancing the shopping experience.
Healthcare: Transactional chatbots in the healthcare industry can help patients book appointments, send medication reminders, or provide guidance on common health issues. They can also gather patient data for health records, making the patient intake process more efficient.
Hospitality: In the hospitality sector, these bots can automate room bookings, provide information about facilities, offer personalized recommendations, and address common queries about the stay, check-in/check-out process, etc.
Energy companies or mobile service providers: Similar to insurance, these businesses can use transactional chatbots to provide quotes, facilitate service sign-ups, offer upgrades, or handle cancellation requests.

These few instances illustrate the versatility and utility of transactional chatbots. However, their use is not confined to these areas, and they can be tailored to address the unique needs of various other industries.

Endnote

Transactional chatbots have indeed ushered in a new era of interaction between businesses and their customers. It has become vital for companies to incorporate this transformative technology into their communication strategies, ensuring they remain adaptable and responsive to the shifting needs of their clientele. The promise that transactional chatbots hold for the future is substantial, and with careful planning and tactical execution, they can
contribute to substantial growth for any business. Therefore, if a company wishes to stay competitive and not fall behind in the rapidly advancing digital world, integrating a transactional chatbot into its strategic planning becomes an astute decision.

As consumer expectations continue to evolve, the prospects for transactional chatbots are looking brighter than ever. Future developments may involve more advanced levels of personalization, with chatbots becoming increasingly intelligent. This would offer a more enriched user experience, potentially featuring responses or suggestions specifically tailored to an individual user’s preferences or past interactions.

Security is another area poised for significant improvement, particularly given the sensitive transactional information these chatbots handle. Expect to see advancements in encryption, fraud detection, and even biometric authentication as a means to protect and secure user data.

Another promising direction for chatbots is their increasing integration with other sophisticated technologies. Chatbots are already deployed across a wide array of business sectors, but in the future we could see them combined with other cutting-edge technologies, such as voice assistants or augmented reality, to offer even more engaging customer experiences.

In sum, transactional chatbots are fast becoming necessary for businesses wishing to thrive and grow in the digital age. Their potential future developments point to a world of more personalized, secure, and immersive customer experiences.

Looking to boost your business operations with AI-driven transactional chatbots? Achieve this with LeewayHertz’s AI chatbot development expertise!