Chatbot Summit Singapore 2018 Post Conference Workshop

www.vanillanbanana.com
1
Post-Conference Day Workshop
Trimming and Training Data for Smart Conversations
Work in Practice
Delivered by: Ines Lin, PMP
Project Manager of Chatbot at ASUS

2
WELCOME!
After this workshop, you will be able to
1. Identify the chatbot for your company
2. Select the right data for your chatbot’s NLU* machine learning model
3. Know how to deploy your chatbots and data flow infrastructure in the backend
* Natural Language Understanding

3
WHO I AM
Ines Lin, PMP
Project Manager
Chatbot at ASUS
9 years: Technical & Copy Writer
6 years: System & Cloud Project Manager
Applied Linguistics Major
PMP Certification
DL Specialization

4
AGENDA
UNDERSTANDING THE BIG WORDS BETTER
WHAT ALGORITHMS, MACHINE LEARNING MODELS, NATURAL LANGUAGE UNDERSTANDING, AND ARTIFICIAL INTELLIGENCE EXACTLY ARE.
IDENTIFYING THE RIGHT DATA, CHANNELS, AND SOURCES
WITH ALL THE DATA COLLECTED BY THE COMPANY, WHICH TYPES TO USE, AND HOW TO USE THEM?
HOW MUCH DATA IS ENOUGH TO TRAIN A CHATBOT
THE MORE, THE BETTER? THE UNDERESTIMATED DATA PRE-PROCESSING
CONTINUOUS IMPROVEMENT AND DATA OPTIMIZATION
READY SOLUTION, OR MAKE YOUR OWN? AND OTHER TOOLS TO HELP YOU MONITOR AND IMPROVE YOUR CHATBOT

5
TAKEAWAYS
1. BIG WORDS
• Machine learning algorithms: what they do
• Natural language understanding: approaches and how to do it
• Natural language processing: techniques
2. DATA
• Identify the right data for the conversational model (QA pairing model)
• The NLP pipeline
• The underestimated data pre-processing
3. QUANTITY
• How much data is enough to train a chatbot
• Recipe for machine learning
• Structuring your project
4. JOURNEY
• The journey towards continuous improvement and data optimization
• Ecosystem among your model, ready solution, and corporate data
• Utilize bot analytics for constant monitoring

6
ML* ALGORITHMS
* Machine Learning

7
ML* ALGORITHMS MAP

8
TOP 10 ML* ALGORITHMS
SUPERVISED LEARNING TECHNIQUES
Linear regression
Logistic regression
Classification and Regression Tree (CART)
Naïve Bayes
K-Nearest Neighbor (KNN)
Support Vector Machine (SVM)
UNSUPERVISED LEARNING TECHNIQUES
K-means
Principal Component Analysis (PCA)
ENSEMBLE LEARNING TECHNIQUES
Ensemble (sequential & parallel)
ANN (CNN, RNN)

9
SUPERVISED
1. LINEAR REGRESSION 2. LOGISTIC REGRESSION
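The difference between the two techniques can be sketched in a few lines: linear regression predicts a number, while logistic regression turns a weighted sum into a probability and then a yes/no label. This is a minimal illustration in plain Python, not the workshop's own code; the data, `weight`, `bias`, and the 0.5 threshold are made-up values.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_is_positive(x, weight, bias, threshold=0.5):
    """Logistic regression decision: probability first, then a yes/no label."""
    probability = sigmoid(weight * x + bias)
    return probability, probability >= threshold

# Linear regression predicts a number; this made-up data lies on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Logistic regression predicts a class; weight and bias are illustrative, not trained.
probability, label = predict_is_positive(3.0, weight=1.5, bias=-3.0)
```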

10
IS IT A CAT?

11
SUPERVISED
3. CLASSIFICATION & REGRESSION TREE (BINARY DECISION TREE)
4. NAIVE BAYES (BINARY)
Assumptions:
• All variables are independent
• Data attributes do not interact
• Data fits a normal distribution
→ Yet it works very well in practice

12
RETAIL STORE CAMPAIGN CASE STUDY

13
IS THE PASSENGER GOING TO SURVIVE?
• Real-time prediction: it’s fast
• Multi-class prediction: calculates the probability of each class
• Text classification, spam filtering, sentiment analysis
• Recommendation systems
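Assuming the bullets above list typical uses of Naive Bayes, the text classification and spam filtering case can be sketched as a small word-count classifier. This is a minimal illustration in plain Python; the training messages and the labels `spam` and `support` are made up, and add-one smoothing is one common choice rather than the only one.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (text, label). Returns the counts the classifier needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary

def classify(text, model):
    """Pick the label with the highest log-probability, treating words as independent."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)  # prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training messages, for illustration only.
model = train_naive_bayes([
    ("win a free prize now", "spam"),
    ("free offer click now", "spam"),
    ("my bluetooth stopped working", "support"),
    ("how do I reboot my phone", "support"),
])
```

Calling `classify("free prize", model)` picks the label whose word counts best explain the message; because every word contributes independently, scoring is a single pass over the text, which is why the method is fast.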

14
SUPERVISED
5. K-NEAREST NEIGHBOR (KNN) 6. SUPPORT VECTOR MACHINE (SVM)

15
REAL LIFE APPLICATION OF KNN
Beer Classification
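A KNN classifier for a case like this can be sketched as follows. The slide does not say which features were used, so the two scores per beer (bitterness, colour), the style labels, and all numbers are hypothetical; the sketch needs Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

def knn_predict(query, labeled_points, k=3):
    """Label a query point by majority vote among its k nearest labeled points."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical beers described by (bitterness, colour) scores; the numbers are made up.
beers = [
    ((20, 4), "lager"), ((25, 5), "lager"), ((18, 3), "lager"),
    ((60, 10), "ipa"), ((65, 12), "ipa"), ((40, 35), "stout"),
    ((45, 40), "stout"), ((38, 38), "stout"),
]
```

There is no training step: the labeled examples are the model, and a new beer takes the label most common among its nearest neighbours.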

16
REAL LIFE APPLICATION OF SVM
Object Classifier

17
REAL LIFE APPLICATION OF SVM
Camera Face Recognition

18
UNSUPERVISED
7. K-MEANS 8. PRINCIPAL COMPONENT ANALYSIS (PCA)

19
REAL LIFE APPLICATION OF K-MEANS
Pizza Delivery Area
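One way to read the pizza example is as grouping customer locations into delivery areas, each served from the centre of its group. The sketch below assumes that reading; the coordinates are made up, and starting from the first k points is a simplification (real implementations usually pick starting centres more carefully).

```python
import math

def k_means(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centre, then move the centres."""
    centres = list(points[:k])  # simplistic start: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centres[i]))
            clusters[nearest].append(point)
        for i, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centres, clusters

# Made-up customer locations (x, y) forming two obvious neighbourhoods.
customers = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centres, clusters = k_means(customers, k=2)
```

No labels are involved, which is what makes this unsupervised: the algorithm only sees locations and discovers the groups itself.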

20
ENSEMBLE
9. SEQUENTIAL 10. PARALLEL
• Base learners are generated sequentially (e.g. AdaBoost = Adaptive Boosting)
• Motivation: to exploit the DEPENDENCE between the base learners
• Overall performance can be boosted by giving previously mislabeled examples a higher weight
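The reweighting idea can be sketched with one round of the standard AdaBoost update. This is a minimal illustration, not a full boosting implementation; it assumes the learner's weighted error is strictly between 0 and 0.5, and the four equal starting weights are made up.

```python
import math

def reweight(weights, is_correct):
    """One AdaBoost round: raise the weight of examples the learner got wrong."""
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    # The learner's say in the final vote; assumes 0 < error < 0.5.
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, is_correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha

# Four training examples with equal starting weights; the learner misses the last one.
weights, alpha = reweight([0.25, 0.25, 0.25, 0.25], [True, True, True, False])
```

After the update the missed example carries half of the total weight, so the next base learner is pushed to get it right. That is the dependence between learners that sequential ensembles exploit.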

21
COMBINING ML TECHNIQUES
1. Decrease variance (bagging = bootstrap aggregating)
2. Decrease bias (boosting)
3. Improve predictions (stacking)

22
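ENSEMBLE LEARNERS
[Diagram: the data is split into Training and Test sets. Four base learners (KNN, LinReg, Decision Tree, SVM) are trained on the training set, giving model1 to model4. Each model receives the same input X, and the MEAN of their outputs is the ensemble prediction Y.]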
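The averaging scheme in this diagram (several models fed the same X, with the MEAN of their outputs as Y) can be sketched in plain Python. The three base learners below are deliberately tiny stand-ins, not the KNN, LinReg, Decision Tree, and SVM models named on the slide, and the training data is made up.

```python
def fit_nearest_neighbour(xs, ys):
    """1-NN regressor: answer with the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(xs, ys):
    """Least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_stump(xs, ys):
    """Decision stump: split at the median x and predict each side's mean y."""
    split = sorted(xs)[len(xs) // 2]
    left = [y for x, y in zip(xs, ys) if x < split]
    right = [y for x, y in zip(xs, ys) if x >= split]
    right_mean = sum(right) / len(right)
    left_mean = sum(left) / len(left) if left else right_mean
    return lambda x: left_mean if x < split else right_mean

def fit_ensemble(xs, ys):
    """Train every base learner on the same data and average their predictions."""
    models = [fit(xs, ys) for fit in (fit_nearest_neighbour, fit_line, fit_stump)]
    return lambda x: sum(model(x) for model in models) / len(models)

# Made-up training data lying on y = 2x.
train_x, train_y = [1, 2, 3, 4], [2, 4, 6, 8]
predict = fit_ensemble(train_x, train_y)
```

Each learner makes different mistakes (the stump is coarse, the nearest neighbour cannot extrapolate), and averaging lets those errors partly cancel.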

23
ENSEMBLE MODEL: ANN TO DNN

24
APPROACHES TO NLU
DISTRIBUTIONAL
Statistical tactics of machine learning and deep learning
FRAME-BASED
Framework for representing specific knowledge
MODEL-THEORETICAL
Composition: breakdown and recombination
INTERACTIVE LEARNING
Building NLU through consistent interactive language and reaction (the environment), instead of looking for better models

25
DISTRIBUTIONAL
Distributional methods turn content into word vectors for mathematical analysis and perform quite well at tasks such as part-of-speech tagging, dependency parsing, and semantic relatedness.
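The idea of turning words into vectors can be sketched with simple co-occurrence counts: a word is represented by the words that appear near it, and two words are compared by the angle between their vectors. This is a toy illustration under that assumption, not Word2Vec; the three sentences and the window size of 2 are made up.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words that appear near it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (1.0 = same direction)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Tiny made-up corpus; real systems learn from far more text.
vectors = cooccurrence_vectors([
    "my phone will not reboot",
    "my tablet will not reboot",
    "the store sells pizza",
])
```

Here `phone` and `tablet` end up with matching vectors because they occur in the same contexts, while `phone` and `pizza` share none.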

26
FRAME-BASED
“A frame is a data-structure for representing a stereotyped situation.”
“Think of frames as a canonical representation for which specifics can be interchanged.”
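A frame can be sketched as a record with named slots. The class name `RepairRequestFrame` and its three slots are hypothetical, chosen to match the customer-service examples used later in this deck.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RepairRequestFrame:
    """Hypothetical frame for the stereotyped situation 'customer reports a fault'."""
    device_model: Optional[str] = None
    component: Optional[str] = None
    symptom: Optional[str] = None

    def missing_slots(self):
        """Slots the chatbot still has to ask about."""
        return [name for name, value in vars(self).items() if value is None]

# The same frame fits different customers; only the slot values change.
frame = RepairRequestFrame(component="Bluetooth", symptom="stopped working")
```

A chatbot built this way keeps asking questions until `missing_slots()` is empty, which is the sense in which the specifics can be interchanged while the structure stays fixed.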

27
MODEL-THEORETICAL
Linguistic Approach
Model theory refers to the idea that sentences refer to the world, as is the case with grounded language (e.g. “the block is blue”).
In compositionality, the meanings of the parts of a sentence can be combined to deduce the meaning of the whole.

28
INTERACTIVE LEARNING
Paul Grice, a British philosopher of language, described language as a cooperative game between speaker and listener. Percy Liang, whose categorization of NLU approaches this section follows, is inclined to agree.

29
IMPORTANT CONCEPTS IN NLP
For Chatbot
RECURRENT NEURAL NETWORK (RNN)
A class of neural networks where there are loops in the network graph, and the output of one unit may go back to one of the already visited units.
LSTM
LSTMs are a special type of RNN in which the units are connected in a specific way to avoid some problems that arise in regular RNNs (vanishing and exploding gradients).
SEQUENCE TO SEQUENCE
Problem setting: input = sequence, output = sequence

30
RECURRENT NEURAL NETWORK
RNN
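The loop that defines an RNN can be sketched with a single recurrent unit: at every step the new hidden state depends on the current input and on the previous hidden state. This is a minimal scalar illustration; the weights are arbitrary values, not trained ones.

```python
import math

def rnn_forward(inputs, w_input, w_hidden, bias):
    """Run a one-unit RNN over a sequence and return the hidden state at each step."""
    hidden = 0.0
    states = []
    for x in inputs:
        # The loop in the graph: the previous hidden state feeds back into the unit.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states

# Weights are arbitrary illustrative values, not trained ones.
states = rnn_forward([1.0, 0.0, 0.0], w_input=0.8, w_hidden=0.5, bias=0.0)
```

Only the first input is non-zero, yet the later states are non-zero too: the network remembers earlier inputs through the feedback loop, with the memory fading at each step.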

31
LSTM

32
SEQUENCE TO SEQUENCE
Learning for Language
PROBLEM SETTING
Input is a sequence and output is also a sequence, e.g. machine translation, question answering, generating natural language descriptions of videos, automatic summarization, etc.

33
SEQUENCE TO SEQUENCE
Model with RNN Modules Inside
A sequence-to-sequence model uses LSTMs/RNNs as internal modules to solve a sequence-to-sequence task, e.g. a chatbot’s Q-A capability.
Encoders are neural networks whose parameters are learned during training.
34
CLASSICAL VS. DL NLP

35
The Break!

36
TAKEAWAYS

37
IDENTIFYING THE
RIGHT DATA,
CHANNELS, AND
SOURCES

38
COMPANY COLLECTED DATA
ICR RECORDS
Style: fixed question descriptions
Language: descriptive and instructive
Characteristics: one final question to answer, set in the same knowledge domain
CHAT LOGS
Style: short Q & A
Language: conversational
Characteristics: most similar to human conversations; not only Q & A but also other interruptions
FAQ-KNOWLEDGE BANK
Style: paragraphs
Characteristics: many-to-many in the same knowledge domain
EMAIL
Style: paragraphs
Characteristics: many-to-many in the same knowledge domain

39
WHAT WE EXPECT THEM TO BE LIKE
Customer: Something went wrong with my phone.
Agent: Would you please let me know the model of your phone? What’s the exact problem you are having?
Customer: It’s ZH008. I only used it for 3 months. The Bluetooth just stopped working.
Agent: Have you tried to reboot your phone?
Customer: No, I haven’t. Will that help?
Agent: Usually it solves 90% of these cases. Can you try to reboot and see how it works?
Customer: Hold on a second… Oh, the BT icon works again. Okay, thanks.
Agent: No problem. Have a nice day.

40
WHAT THEY ACTUALLY ARE
Customer: Hi?
Agent: Yes, this is Michael. How can I help you?
Customer: Oh, ok… Umm, I think something is wrong with my Bluetooth. I cannot
Agent: Can you describe more on the Bluetooth problem?
Customer: turn it on or off anymore. It just doesn’t respond at all. It’s till in the warranty period, so I am thinking to send to your service center. Can
Agent: Have you tried to reboot the phone?
Customer: Reboot? What do you mean?
Customer: How many service centers do you have in Taipei?

41
RAW DATA PROBLEM 1
THE IDEAL DATA
• One-on-one Q & A relationship
• Q has only one intent
• Q doesn’t have typos or irrelevant information
OBSTACLES IN REALITY
• Q and A don’t always carry on in order
• There are typos
• More than one intent in one Q
• There is irrelevant information in Q

42
NLP PIPELINE WITH ML
1. Raw data preprocessing
2. Tokenization for language units
3. Mathematical representation of language units
4. Building the ML model
   • Decide training/test data
   • Train the model using the training data
   • Test the model with the test data
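The middle steps of this pipeline (tokenization, mathematical representation, and deciding training/test data) can be sketched in plain Python. Bag-of-words counts are used here as the simplest possible representation, which is an assumption; the utterances, the 20% test share, and the fixed seed are made up.

```python
import random
import re

def tokenize(text):
    """Split text into lowercase word tokens (the language units)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(token_lists):
    """Map every distinct token to a column index."""
    words = sorted({token for tokens in token_lists for token in tokens})
    return {word: index for index, word in enumerate(words)}

def to_vector(tokens, vocabulary):
    """Bag-of-words count vector: one number per vocabulary word."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        if token in vocabulary:
            vector[vocabulary[token]] += 1
    return vector

def split_train_test(items, test_ratio=0.2, seed=0):
    """Shuffle reproducibly, then hold out a share of the data for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, round(len(shuffled) * test_ratio))
    return shuffled[:-test_size], shuffled[-test_size:]

# Made-up utterances standing in for cleaned chat data.
utterances = ["My Bluetooth stopped working", "Have you tried to reboot?",
              "Reboot fixed my Bluetooth", "How many service centers?",
              "Bluetooth icon works again"]
token_lists = [tokenize(u) for u in utterances]
vocabulary = build_vocabulary(token_lists)
vectors = [to_vector(tokens, vocabulary) for tokens in token_lists]
train, test = split_train_test(vectors)
```

The model is then trained only on `train` and scored only on `test`, so the score reflects how it handles utterances it has not seen.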

43
DATA PRE-PROCESSING
Raw data → Clean data
Data Cleaning
• Remove test data
• Remove sessions with one person
• Remove agents’ talk
• Remove system announcements
• Concatenate sessions (within x mins)
• Remove duplicates
• Remove nulls
• Remove non-text info (video, image, audio)
Data Summarization
Feature extraction, such as:
• Dialogue avg. session
• FAQ
• Keywords
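Several of these cleaning rules can be sketched as one filtering function. The message layout (the `session`, `speaker`, `type`, and `text` fields) is hypothetical, and the sketch covers only nulls, system announcements, non-text info, one-person sessions, and duplicates; concatenating sessions and removing agents' talk are left out.

```python
def clean_chat_log(messages):
    """Apply a few of the cleaning rules to a list of message dicts.

    Each message is assumed to look like
    {"session": str, "speaker": "customer" | "agent" | "system", "type": str, "text": str}.
    """
    kept = []
    for message in messages:
        text = (message.get("text") or "").strip()
        if not text:                              # remove nulls / empty messages
            continue
        if message.get("speaker") == "system":    # remove system announcements
            continue
        if message.get("type") != "text":         # remove video, image, audio
            continue
        kept.append({**message, "text": text})

    # Remove sessions in which only one person ever spoke.
    speakers = {}
    for message in kept:
        speakers.setdefault(message["session"], set()).add(message["speaker"])
    kept = [m for m in kept if len(speakers[m["session"]]) > 1]

    # Remove exact duplicates while keeping the original order.
    seen, unique = set(), []
    for message in kept:
        key = (message["session"], message["speaker"], message["text"])
        if key not in seen:
            seen.add(key)
            unique.append(message)
    return unique

# Made-up log for illustration.
log = [
    {"session": "a", "speaker": "system", "type": "text", "text": "Agent joined"},
    {"session": "a", "speaker": "customer", "type": "text", "text": "Hi?"},
    {"session": "a", "speaker": "customer", "type": "text", "text": "Hi?"},
    {"session": "a", "speaker": "agent", "type": "text", "text": "How can I help you?"},
    {"session": "a", "speaker": "customer", "type": "image", "text": "photo.jpg"},
    {"session": "b", "speaker": "customer", "type": "text", "text": "Anyone there?"},
    {"session": "c", "speaker": "customer", "type": "text", "text": None},
]
cleaned = clean_chat_log(log)
```

The order of the rules matters: one-person sessions have to be detected before any step that drops a whole speaker, otherwise every session would look one-sided.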

44
DATA PRE-PROCESSING to ML Model
Flow: Raw Data → Data Cleaning → Data Mining → QA Pairing → Data for Training
Techniques: Segmentation, Vectorization (W2V), Similarity Matching, S2S LSTM (RNN)
45
TAKEAWAYS

46
HOW MUCH DATA IS
ENOUGH
TO TRAIN A CHATBOT

47
RECIPE FOR ML DATA TRAINING

48
TAKEAWAYS

49
THE JOURNEY TOWARDS
CONTINUOUS
IMPROVEMENT AND DATA
OPTIMIZATION

50
MARKET SOLUTION
Infrastructure
• Chatbots on various channels/platforms
• Chatbot-building solutions, usually with NLU capability
• Internal corporate data management system

51
THE GAP
Infrastructure
• Chatbots on various channels/platforms
• Chatbot-building solutions, usually with NLU capability
• Internal corporate data management system
• Own QA ML model

52
CHATBOT ML ECOSYSTEM
Dataflow
[Diagram. Components shown: chatbot on various channels, live chat with human agents, online QA ML model, offline QA ML model, raw data, and the suggested answers from the QA ML model.]

53
Daily Performance Monitoring Tool
Bot Analytics Platform

54
Daily Performance Monitoring Tool

55
Daily Performance Monitoring Tool

56
WRAP UP
AND Q&A

57
RECAP
• DEMYSTIFY ML, NLP, ALGORITHMS AND CHATBOT
• KNOW WHICH KIND OF DATA TO USE
• KNOW HOW TO CONNECT/UTILIZE CORPORATE DATA WITH CURRENT SOLUTION
• KNOW HOW TO DESIGN THE DATA FLOW FOR ML ECOSYSTEM

58
GET IN TOUCH
Ines Lin, PMP
Senior Project Manager, Chatbot at ASUS
EMAIL: Ines_Lin@asus.com or ineslin@outlook.com
PORTFOLIO: www.vanillanbanana.com
