Language Empowering Intelligent Assistants (CHT)

Yun-Nung (Vivian) Chen
Yun-Nung (Vivian) ChenAssistant Professor at National Taiwan University
Y U N - N U N G ( V I V I A N ) C H E N 陳 縕 儂
H T T P : / / V I V I A N C H E N . I D V . T W
Outline
Introduction
Intelligent Assistants 智慧助理
Mobile Service 行動客服
Dialogue System/Bot 對話系統/機器人
Framework
Language Understanding 語意理解
Dialogue Management 對話管理
Output Generation 輸出生成
Recent Trend
Industrial Trend and Challenge
Deep Learning Basics
Deep Learning for Dialogues
Conclusion 2
Outline
Introduction
Intelligent Assistants 智慧助理
Mobile Service 行動客服
Dialogue System/Bot 對話系統/機器人
Framework
Language Understanding 語意理解
Dialogue Management 對話管理
Output Generation 輸出生成
Recent Trend
Industrial Trend and Challenge
Deep Learning Basics
Deep Learning for Dialogues
Conclusion 3
Apple Siri
(2011)
Google Now
(2012)
Facebook M & Bot
(2015)
Intelligent Assistants 智慧助理
4
Google Home
(2016)
Microsoft Cortana
(2014)
Amazon Alexa/Echo
(2014)
Why do we need them?
– Get things done
• E.g. set up alarm/reminder, take note
– Easy access to structured data, services and apps
• E.g. find docs/photos/notes
– Assist your routine schedule
• E.g. check the account balance
– Be more productive in managing your work and personal life
5
Mobile Service 行動客服
6
• allows customers to conduct a range of financial transactions remotely
using a mobile devices, usually called an app (e.g. Richart)
reducing the need for visiting a branch  cost reduction
7
Mobile Service 行動客服
• The users can finish specific tasks that are predefined by the app
• Limitation
– App usage design may not be user-friendly
– Good designs may differ across people
– Learning how to use app takes time
Why Natural Language?
• Global Digital Statistics (2015 January)
Global Population
7.21B
Active Internet Users
3.01B
Active Social
Media Accounts
2.08B
Active Unique
Mobile Users
3.65B
The more natural and convenient input of devices evolves towards speech.
8
Intelligent Assistant Architecture
Reactive
Assistance
反應式協助
Proactive
Assistance
主動式協助
Data
Data Bases and
Client Signals
Device/Service End-points
(Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience
“restaurant suggestions”“call taxi”
9
• Dialogue systems are intelligent agents that are able to help users
finish tasks more efficiently via spoken interactions.
• Dialogue systems are being incorporated into various devices (smart-
phones, smart TVs, in-car navigating system, etc).
Good dialogue systems assist users to access information conveniently
and finish tasks efficiently.
Dialogue System 對話系統
JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion
10
App  Bot
11
• A bot is responsible for a “single” domain, similar to an app
Seamless and automatic information transferring across domains
 reduce duplicate information and interaction
Outline
Introduction
Intelligent Assistants 智慧助理
Mobile Service 行動客服
Dialogue System/Bot 對話系統/機器人
Framework
Language Understanding 語意理解
Dialogue Management 對話管理
Output Generation 輸出生成
Recent Trend
Industrial Trend and Challenge
Deep Learning Basics
Deep Learning for Dialogues
Conclusion 12
System Framework
13
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking
• System Action/Policy
Decision
Output
Generation
Recognized Text
我要申辦下下周期間的國外上網方案
Semantic Frame
apply_international_dataplan
period=下下周
System Action/Policy
request_country
Text response
你要去哪一國
Screen
Display
country?
Text Input
我要申辦下下周期間的國外上網方案
Speech Signal
Interaction Example
User
Intelligent
Agent Q: How does a dialogue system process this request?
你上個月的電話費帳單金額為800元,
請問你要用你預設的帳號繳款嗎?
我要繳交上個月的手機費帳單
14
System Framework
15
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking
• System Action/Policy
Decision
Output
Generation
Recognized Text
我要申辦下下周期間的國外上網方案
Semantic Frame
apply_international_dataplan
period=下下周
System Action/Policy
request_country
Text response
你要去哪一國
Screen
Display
country?
Text Input
我要申辦下下周期間的國外上網方案
Speech Signal
1. Domain Identification
Requires Predefined Domain Ontology
User
Organized Domain Knowledge (Database)Intelligent
Agent
16
市話 DB 個人資料 DB
Machine Learning for Classification
手機 DB
我要繳交上個月的手機費帳單
我要繳交上個月的手機費帳單
2. Intent Detection
Requires Predefined Schema
User
Intelligent
Agent
17
FEE_PAYMENT
CHECK_REMAINING_DATA
:
Machine Learning for Classification
手機 DB
我要繳交上個月的手機費帳單
3. Slot Filling
Requires Predefined Schema
User
Intelligent
Agent
18
手機 DB
Number Period Amount
0933xxx 12月 800
0928xxx 11月 560
: : :
FEE_PAYMENT
period=“上個月”
FEE_PAYMENT
period=“12月”
amount=“800”
Semantic Frame
Machine Learning for Information Extraction
System Framework
19
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking
• System Action/Policy
Decision
Output
Generation
Recognized Text
我要申辦下下周期間的國外上網方案
Semantic Frame
apply_international_dataplan
period=下下周
System Action/Policy
request_country
Text response
你要去哪一國
Screen
Display
country?
Text Input
我要申辦下下周期間的國外上網方案
Speech Signal
State Tracking
Requires Hand-Crafted States
User
Intelligent
Agent
20
amount period number
amount,
period
period,
number
amount,
card
all
要0933那個號碼
NULL
我要繳交上個月的手機費帳單
State Tracking
Requires Hand-Crafted States
User
Intelligent
Agent
21
period
period,
number
要0933那個號碼
NULL
我要繳交上個月的手機費帳單
State Tracking
Requires Hand-Crafted States
User
Intelligent
Agent
22
amount period number
amount,
period
period,
number
amount,
number
all
NULL
我要繳交x個月的手機費帳單
FEE_PAYMENT
period=“這個月”
FEE_PAYMENT
period=“上個月”
FEE_PAYMENT
?
?
Policy for Agent Action
• Inform
– “你的帳單金額為800元”
• Request
– “請問是要繳交哪一支號碼的呢?”
• Confirm
– “你要繳交12月的帳單嗎?”
• Database Search
• Task Completion / Information Display
– Payment / Data Checking
23
0933xxx
0928xxx
:
System Framework
24
Speech
Recognition
Language Understanding (LU)
• Domain Identification
• User Intent Detection
• Slot Filling
Dialogue Management (DM)
• Dialogue State Tracking
• System Action/Policy
Decision
Output
Generation
Recognized Text
我要申辦下下周期間的國外上網方案
Semantic Frame
apply_international_dataplan
period=下下周
System Action/Policy
request_country
Text response
你要去哪一國
Screen
Display
country?
Text Input
我要申辦下下周期間的國外上網方案
Speech Signal
Output / NL Generation
• Inform
– “你的帳單為800” v.s.
• Request
– “你要繳交哪一支號碼的帳單?” v.s.
• Confirm
– “你要繳交12月的帳單嗎?”
25
$800
Outline
Introduction
Intelligent Assistants 智慧助理
Mobile Service 行動客服
Dialogue System/Bot 對話系統/機器人
Framework
Language Understanding 語意理解
Dialogue Management 對話管理
Output Generation 輸出生成
Recent Trend
Industrial Trend and Challenge
Deep Learning Basics
Deep Learning for Dialogues
Conclusion 26
AI Startups
27
ChatBot Startups
28
FinTech ChatBot
3分でわかるFintech – Chat botが作り出す新しい世界
29
Challenge
• Predefined semantic schema
Chen et al., “Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding,” in ACL-IJCNLP, 2015.
• Data without annotations
Chen et al., “Zero-Shot Learning of Intent Embeddings for Expansion by Convolutional Deep Structured Semantic Models,” in ICASSP, 2016.
• Semantic concept interpretation
Chen et al., “Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding,” in SLT, 2014.
• Predefined dialogue states
Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.
• Error propagation
Hakkani-Tur et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.
• Cross-domain intention/bot hierarchy
Sun et al., “An Intelligent Assistant for High-Level Task Understanding,” in IUI, 2016.
Sun et al., “AppDialogue: Multi-App Dialogues for Intelligent Assistants,” in LREC, 2016.
Chen et al., “Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding,” in ICMI, 2016.
• Cross-domain information transferring
Kim et al., “New Transfer Learning Techniques For Disparate Label Sets,” in ACL-IJCNLP, 2015.
FIND_RESTAURANT
rating=“good” rating=5? 4?
HotelRest Flight
Travel
Trip
Planning
30
Deep Learning
Basics 31
Learning ≈
Looking for a Function
• Speech Recognition
• Handwritten Recognition
• Weather forecast
• Play video games
 f
 f
 f
 f
“2”
“你好”
“ Saturday”
“move left”
Thursday
32
Machine Learning Framework
Training is to pick the best function given the observed data
Testing is to predict the label using the learned function
Training Data
Model: Hypothesis Function Set
21, ff
Training: Pick the best function f *
Testing:   yxf 
y
*
f“Best” Function
    ,ˆ,,ˆ, 2211 yxyx
Testing Data
  ,?,x
“It claims too much.”
- (negative)
:x
:ˆy
33
function input
function output
Target Function
• Classification Task
– x: input object to be classified  a N-dim vector
– y: class/label  a M-dim vector
  yxf  MN
RRf :
Assume both x and y can be represented as fixed-size vectors
34
Vector Representation Ex
• Handwriting Digit Classification
“2”“1”













0
0
1
10 dimensions for digit
recognition
“1”
“2”
“3”













0
1
0 “1”
“2”
“3”
1: for ink
0: otherwise
Each pixel
corresponds to an
element in the vector











1
0
16 x 16
16 x 16 = 256
dimensions
x: image y: class/label
“1” or not
“2” or not
“3” or not
MN
RRf :
35
Vector Representation Ex
• Sentiment Analysis
“-”“+”













0
0
1
3 dimensions
(positive, negative, neutral)
“+”
“-”
“?”













0
1
0 “+”
“-”
“?”
1: indicates the word
0: otherwise
Each element in the vector
corresponds to a word in the
vocabulary











1
0
dimensions =
size of vocab
x: word y: class/label
“+” or not
“-” or not
“?” or not
MN
RRf :
“love”
36
A Single Neuron
z
1w
2w
Nw
…
1x
2x
Nx

b
 z
 z
zbias
y
  z
e
z 


1
1

Sigmoid function
Activation
function
Each neuron is a very simple function
37
A Single Neuron
z
1w
2w
Nw
…
1x
2x
Nx

b
 z
bias
y
  z
e
z 


1
1

38
1
w, b are the parameters of this neuron
A Single Neuron
z
1w
2w
Nw
…1x
2x
Nx

b
bias
y
39
1
MN
RRf :





5.0"2"
5.0"2"
ynot
yis
A single neuron can only handle binary classification
A Layer of Neurons
• Handwriting digit classification
40
MN
RRf :
A layer of neurons can handle multiple possible output,
and the result depends on the max one
…
1x
2x
Nx

1
 1y

…
…
“1” or not
“2” or not
“3” or not
2y
3y
10 neurons/10 classes
Which
one is
the
max?
Neural Networks – Multi-
Layer Perceptron (MLP)
1a 1z
 2z
1x
2x 
z
2a
Hidden Units
1 1
y
41
• Continuous function w/ 2 layers
• Combine two opposite-facing
threshold functions to make a
ridge
• Continuous function w/ 3 layers
• Combine two perpendicular
ridges to make a bump
 Add bumps of various sizes
and locations to fit any surface
Expression of MLP
http://aima.eecs.berkeley.edu/slides-pdf/chapter20b.pdf
Multiple layers enrich the model expression, so that
the model can approximate more complex functions
42
Deep Neural Networks (DNN)
• Fully connected feedforward network
1x
2x
……
Layer 1
……
1y
2y
……
Layer 2
……
Layer L
……
……
……
Input Output
MyNx
vector
x
vector
y
MN
RRf :
Deep NN: multiple hidden layers
43
Deep Learning
for Dialogues 44
RNN for SLU
• IOB Sequence Labeling for Slot Filling
• Intent Classification
45
𝑤0 𝑤1 𝑤2 𝑤 𝑛
ℎ0
𝑓
ℎ1
𝑓
ℎ2
𝑓
ℎ 𝑛
𝑓
ℎ0
𝑏
ℎ1
𝑏
ℎ2
𝑏 ℎ 𝑛
𝑏
𝑦0 𝑦1 𝑦2 𝑦 𝑛
(a) LSTM (b) LSTM-LA (c) bLSTM-LA
(d) Intent LSTM
intent
𝑤0 𝑤1 𝑤2 𝑤 𝑛
ℎ0 ℎ1 ℎ2 ℎ 𝑛
𝑦0 𝑦1 𝑦2 𝑦 𝑛
𝑤0 𝑤1 𝑤2 𝑤 𝑛
ℎ0 ℎ1 ℎ2 ℎ 𝑛
𝑦0 𝑦1 𝑦2 𝑦 𝑛
𝑤0 𝑤1 𝑤2 𝑤 𝑛
ℎ0 ℎ1 ℎ2 ℎ 𝑛
RNN for SLU
• Joint Multi-Domain Intent Prediction and Slot Filling
– Information can mutually enhanced
46
semantic frame sequence
ht-1 ht+1ht
W W W W
taiwanese
B-type
U
food
U
please
U
V
O
V
O
V
hT+1
EOS
U
FIND_REST
V
Slot Tagging Intent
Prediction
Hakkani-Tur, et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.
47
just sent email to bob about fishing this weekend
O O O O
B-contact_name
O
B-subject I-subject I-subject
U
S
I send_email
D communication
 send_email(contact_name=“bob”, subject=“fishing this weekend”)
are we going to fish this weekend
U1
S2
 send_email(message=“are we going to fish this weekend”)
send email to bob
U2
 send_email(contact_name=“bob”)
B-message
I-message
I-message I-message I-message
I-message I-message
B-contact_nameS1
Domain Identification  Intent Prediction  Slot Filling
Contextual SLU (Chen et al., 2016)
48
u
Knowledge Attention Distributionpi
mi
Memory Representation
Weighted
Sum
h
∑ Wkg
o
Knowledge Encoding
Representation
history utterances {xi}
current utterance
c
Inner
Product
Sentence
Encoder
RNNin
x1 x2 xi…
Contextual
Sentence Encoder
x1 x2 xi…
RNNmem
slot tagging sequence y
ht-1 ht
V V
W W W
wt-1 wt
yt-1 yt
U U
RNN
Tagger
M M
Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.
1. Sentence Encoding 2. Knowledge Attention 3. Knowledge Encoding
Contextual SLU (Chen et al., 2016)
Idea: additionally incorporating contextual knowledge during slot tagging
 track dialogue states in a latent way
End-to-End Supervised
Dialogue System
49Wen, et al., “A Network-based End-to-End Trainable Task-Oriented Dialogue System,” arXiv.:1604.04562v2.
InfoBot:
End-to-End Dialogue System with
Supervised & Reinforcement Learning
50
Movie=?; Actor=Bill Murray; Release Year=1993
Find me the Bill Murray’s movie.
I think it came out in 1993.
When was it released?
Groundhog Day is a Bill Murray
movie which came out in 1993.
KB-InfoBotUser
(Groundhog Day, actor, Bill Murray)
(Groundhog Day, release year, 1993)
(Australia, actor, Nicole Kidman)
(Mad Max: Fury Road, release year, 2015)
Knowledge Base (head, relation, tail)
Dhingra, et al., “End-to-End Reinforcement Learning of Dialogue Agents for Information Access,” arXiv.:1609.00777v2.
E2E RL Task-Completion Bot
51
wi
B-
typ
e
wi
+1
wi+2
O O
E
OS
<intent
>
wi
B-
typ
e
wi
+1
wi+2
O O
E
OS
<intent
> Semantic Frame
request_movie
genre=action,
date=this weekend
System Action/Policy
request_location
User Dialogue Action
Inform(location=Bellevue)
Time t-1
wi
<slot>
wi+1 wi+2
O O
EOS
<intent>
Language Understanding (LU)
Time t-2
Time t
Dialogue
Management
(DM)
w0 w1
w2
Natural Language Generation (NLG)
EOS
User
Goal
User Agenda Modeling
User Simulator (Li et al., 2016)
End-to-End Neural Dialogue System
Text Input
Are there any action movies to
see this weekend?
E2E RL Task-Completion Bot
• Simulated User
– Generate interactions based on a predefined fake goal
– Automatically learn strategy by training on the simulated data
52
User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.
RULE BASED SYSTEM
User: Can I get 2 tickets for the witch?
Agent: Which theater do you want?
User: Regal meridian 16
Agent: What time would you like to see it?
User: 9:30 pm
Agent: What date would you like to see it?
User: Tomorrow!
Agent: How many people are you?
User: Just two adults.
Agent: What is your location?
User: Seattle!
Agent: What kind of video format do you like?
User: Oh, any kind is okay.
Agent: Do you have any price constraint?
User: No, I don’t care.
Agent: Great - I was able to purchase 2 tickets for you to
see the witch tomorrow at regal meridian 16 theater in
Seattle at 9:30 pm.
User: Thank you.
Agent: Thank you!
REINFORCEMENT LEARNING SYSTEM
User: Can I get 2 tickets for the witch?
Agent: What date would you like to see it?
User: Tomorrow
Agent: Which theater do you prefer?
User: Regal meridian 16
Agent: What is your location?
User: Seattle
Agent: What time would you like to see it?
User: 9:30 pm.
Agent: Great - I was able to purchase 2 tickets for you
to see the witch tomorrow at regal meridian 16
theater in Seattle at 9:30 pm.
User: Thanks.
Agent: Thanks!
Outline
Introduction
Intelligent Assistants 智慧助理
Mobile Service 行動客服
Dialogue System/Bot 對話系統/機器人
Framework
Language Understanding 語意理解
Dialogue Management 對話管理
Output Generation 輸出生成
Recent Trend
Industrial Trend and Challenge
Deep Learning Basics
Deep Learning for Dialogues
Conclusion 53
Conclusion
• The conversational bots can help users manage information
access and finish tasks via spoken interactions
– More natural
– More convenient
– More efficient
– User-centered
• Future Vision
– Not only single-turn requests but also multi-turn conversations
– Not only simple transactions but also complicated ones
• Dialogues can span on multiple domains (e.g. check remaining
data and then apply for more data)
• NN-Based Dialogue System
– Pipeline outputs are represented as vectors  distributional
– The execution is constrained by backend services  symbolic
54
Q & A
T H A N K S F O R YO U R AT T E N T I O N !
55
1 of 55

More Related Content

What's hot(20)

 лексичне значення слова. лексичне значення слова.
лексичне значення слова.
Танюшка Пересунько3.8K views
Tutorial on Ontology editor: Protege Tutorial on Ontology editor: Protege
Tutorial on Ontology editor: Protege
Biswanath Dutta290 views
Natural Language Processing Natural Language Processing
Natural Language Processing
Adarsh Saxena1.4K views
Мова. 8 класМова. 8 клас
Мова. 8 клас
Yulia Novichenko977 views
тема  лексикологія. фразеологіятема  лексикологія. фразеологія
тема лексикологія. фразеологія
Валентина Кодола6.2K views
 Cultural Terms in Translation: Techniques and Gaps Cultural Terms in Translation: Techniques and Gaps
Cultural Terms in Translation: Techniques and Gaps
English Literature and Language Review ELLR126 views
Sentiwordnet [IIT-Bombay]Sentiwordnet [IIT-Bombay]
Sentiwordnet [IIT-Bombay]
Sagar Ahire5.8K views
Corpus LinguisticsCorpus Linguistics
Corpus Linguistics
Prof.Ravindra Borse343 views
Critical Discourse AnalysisCritical Discourse Analysis
Critical Discourse Analysis
KamranAhmed173675 views
дитячий фольклордитячий фольклор
дитячий фольклор
Olena Pyzaenko1.4K views
OCR with MXNet GluonOCR with MXNet Gluon
OCR with MXNet Gluon
Apache MXNet2.9K views
SYSTEMIC FUNCTIONAL LINGUISTICS: REGISTER & GENRESYSTEMIC FUNCTIONAL LINGUISTICS: REGISTER & GENRE
SYSTEMIC FUNCTIONAL LINGUISTICS: REGISTER & GENRE
A. Tenry Lawangen Aspat Colle1.1K views

Viewers also liked(20)

Deep Learning for Dialogue Modeling - NTHUDeep Learning for Dialogue Modeling - NTHU
Deep Learning for Dialogue Modeling - NTHU
Yun-Nung (Vivian) Chen874 views
給軟體工程師的不廢話 R 語言精要班給軟體工程師的不廢話 R 語言精要班
給軟體工程師的不廢話 R 語言精要班
台灣資料科學年會30.2K views
Prashobh_Resume_2_001Prashobh_Resume_2_001
Prashobh_Resume_2_001
Prashobh Paul Maliyekal Nambadan243 views

Similar to Language Empowering Intelligent Assistants (CHT)

Logic tree mobile_gvLogic tree mobile_gv
Logic tree mobile_gvLogictreeit
560 views16 slides
Vivek_MKVivek_MK
Vivek_MKVivek MK
196 views4 slides
CV_Sayani_UpdatedCV_Sayani_Updated
CV_Sayani_UpdatedSAYANI ROY
221 views7 slides

Similar to Language Empowering Intelligent Assistants (CHT)(20)

Enterprise Mobility @ NeevEnterprise Mobility @ Neev
Enterprise Mobility @ Neev
Neev Technologies1.5K views
Logic tree mobile_gvLogic tree mobile_gv
Logic tree mobile_gv
Logictreeit560 views
Vivek_MKVivek_MK
Vivek_MK
Vivek MK196 views
CV_Sayani_UpdatedCV_Sayani_Updated
CV_Sayani_Updated
SAYANI ROY221 views
Neev mobile offeringsNeev mobile offerings
Neev mobile offerings
Neev Technologies10.3K views
ResumeResume
Resume
amit kumar133 views
ResumeResume
Resume
amit kumar194 views
Dot Net ProfileDot Net Profile
Dot Net Profile
amit kumar284 views
2016 IBM Watson IoT Forum2016 IBM Watson IoT Forum
2016 IBM Watson IoT Forum
Deirdre Curran663 views
YOGESH_SINGH_RESUME_JAVAYOGESH_SINGH_RESUME_JAVA
YOGESH_SINGH_RESUME_JAVA
Yogesh Singh356 views
Machine Learning AND Deep Learning for OpenPOWERMachine Learning AND Deep Learning for OpenPOWER
Machine Learning AND Deep Learning for OpenPOWER
Ganesan Narayanasamy432 views
Amit BhandariAmit Bhandari
Amit Bhandari
Amit Bhandari274 views

More from Yun-Nung (Vivian) Chen(8)

Recently uploaded(20)

[2023] Putting the R! in R&D.pdf[2023] Putting the R! in R&D.pdf
[2023] Putting the R! in R&D.pdf
Eleanor McHugh34 views
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdf
gdsczhcet48 views
ChatGPT and AI for Web DevelopersChatGPT and AI for Web Developers
ChatGPT and AI for Web Developers
Maximiliano Firtman152 views

Language Empowering Intelligent Assistants (CHT)

  • 1. Y U N - N U N G ( V I V I A N ) C H E N 陳 縕 儂 H T T P : / / V I V I A N C H E N . I D V . T W
  • 2. Outline Introduction Intelligent Assistants 智慧助理 Mobile Service 行動客服 Dialogue System/Bot 對話系統/機器人 Framework Language Understanding 語意理解 Dialogue Management 對話管理 Output Generation 輸出生成 Recent Trend Industrial Trend and Challenge Deep Learning Basics Deep Learning for Dialogues Conclusion 2
  • 3. Outline Introduction Intelligent Assistants 智慧助理 Mobile Service 行動客服 Dialogue System/Bot 對話系統/機器人 Framework Language Understanding 語意理解 Dialogue Management 對話管理 Output Generation 輸出生成 Recent Trend Industrial Trend and Challenge Deep Learning Basics Deep Learning for Dialogues Conclusion 3
  • 4. Apple Siri (2011) Google Now (2012) Facebook M & Bot (2015) Intelligent Assistants 智慧助理 4 Google Home (2016) Microsoft Cortana (2014) Amazon Alexa/Echo (2014)
  • 5. Why do we need them? – Get things done • E.g. set up alarm/reminder, take note – Easy access to structured data, services and apps • E.g. find docs/photos/notes – Assist your routine schedule • E.g. check the account balance – Be more productive in managing your work and personal life 5
  • 6. Mobile Service 行動客服 6 • allows customers to conduct a range of financial transactions remotely using a mobile devices, usually called an app (e.g. Richart) reducing the need for visiting a branch  cost reduction
  • 7. 7 Mobile Service 行動客服 • The users can finish specific tasks that are predefined by the app • Limitation – App usage design may not be user-friendly – Good designs may differ across people – Learning how to use app takes time
  • 8. Why Natural Language? • Global Digital Statistics (2015 January) Global Population 7.21B Active Internet Users 3.01B Active Social Media Accounts 2.08B Active Unique Mobile Users 3.65B The more natural and convenient input of devices evolves towards speech. 8
  • 9. Intelligent Assistant Architecture Reactive Assistance 反應式協助 Proactive Assistance 主動式協助 Data Data Bases and Client Signals Device/Service End-points (Phone, PC, Xbox, Web Browser, Messaging Apps) User Experience “restaurant suggestions”“call taxi” 9
  • 10. • Dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions. • Dialogue systems are being incorporated into various devices (smart- phones, smart TVs, in-car navigating system, etc). Good dialogue systems assist users to access information conveniently and finish tasks efficiently. Dialogue System 對話系統 JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion 10
  • 11. App  Bot 11 • A bot is responsible for a “single” domain, similar to an app Seamless and automatic information transferring across domains  reduce duplicate information and interaction
  • 12. Outline Introduction Intelligent Assistants 智慧助理 Mobile Service 行動客服 Dialogue System/Bot 對話系統/機器人 Framework Language Understanding 語意理解 Dialogue Management 對話管理 Output Generation 輸出生成 Recent Trend Industrial Trend and Challenge Deep Learning Basics Deep Learning for Dialogues Conclusion 12
  • 13. System Framework 13 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking • System Action/Policy Decision Output Generation Recognized Text 我要申辦下下周期間的國外上網方案 Semantic Frame apply_international_dataplan period=下下周 System Action/Policy request_country Text response 你要去哪一國 Screen Display country? Text Input 我要申辦下下周期間的國外上網方案 Speech Signal
  • 14. Interaction Example User Intelligent Agent Q: How does a dialogue system process this request? 你上個月的電話費帳單金額為800元, 請問你要用你預設的帳號繳款嗎? 我要繳交上個月的手機費帳單 14
  • 15. System Framework 15 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking • System Action/Policy Decision Output Generation Recognized Text 我要申辦下下周期間的國外上網方案 Semantic Frame apply_international_dataplan period=下下周 System Action/Policy request_country Text response 你要去哪一國 Screen Display country? Text Input 我要申辦下下周期間的國外上網方案 Speech Signal
  • 16. 1. Domain Identification Requires Predefined Domain Ontology User Organized Domain Knowledge (Database)Intelligent Agent 16 市話 DB 個人資料 DB Machine Learning for Classification 手機 DB 我要繳交上個月的手機費帳單
  • 17. 我要繳交上個月的手機費帳單 2. Intent Detection Requires Predefined Schema User Intelligent Agent 17 FEE_PAYMENT CHECK_REMAINING_DATA : Machine Learning for Classification 手機 DB
  • 18. 我要繳交上個月的手機費帳單 3. Slot Filling Requires Predefined Schema User Intelligent Agent 18 手機 DB Number Period Amount 0933xxx 12月 800 0928xxx 11月 560 : : : FEE_PAYMENT period=“上個月” FEE_PAYMENT period=“12月” amount=“800” Semantic Frame Machine Learning for Information Extraction
  • 19. System Framework 19 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking • System Action/Policy Decision Output Generation Recognized Text 我要申辦下下周期間的國外上網方案 Semantic Frame apply_international_dataplan period=下下周 System Action/Policy request_country Text response 你要去哪一國 Screen Display country? Text Input 我要申辦下下周期間的國外上網方案 Speech Signal
  • 20. State Tracking Requires Hand-Crafted States User Intelligent Agent 20 amount period number amount, period period, number amount, card all 要0933那個號碼 NULL 我要繳交上個月的手機費帳單
  • 21. State Tracking Requires Hand-Crafted States User Intelligent Agent 21 period period, number 要0933那個號碼 NULL 我要繳交上個月的手機費帳單
  • 22. State Tracking Requires Hand-Crafted States User Intelligent Agent 22 amount period number amount, period period, number amount, number all NULL 我要繳交x個月的手機費帳單 FEE_PAYMENT period=“這個月” FEE_PAYMENT period=“上個月” FEE_PAYMENT ? ?
  • 23. Policy for Agent Action • Inform – “你的帳單金額為800元” • Request – “請問是要繳交哪一支號碼的呢?” • Confirm – “你要繳交12月的帳單嗎?” • Database Search • Task Completion / Information Display – Payment / Data Checking 23 0933xxx 0928xxx :
  • 24. System Framework 24 Speech Recognition Language Understanding (LU) • Domain Identification • User Intent Detection • Slot Filling Dialogue Management (DM) • Dialogue State Tracking • System Action/Policy Decision Output Generation Recognized Text 我要申辦下下周期間的國外上網方案 Semantic Frame apply_international_dataplan period=下下周 System Action/Policy request_country Text response 你要去哪一國 Screen Display country? Text Input 我要申辦下下周期間的國外上網方案 Speech Signal
  • 25. Output / NL Generation • Inform – “你的帳單為800” v.s. • Request – “你要繳交哪一支號碼的帳單?” v.s. • Confirm – “你要繳交12月的帳單嗎?” 25 $800
  • 26. Outline Introduction Intelligent Assistants 智慧助理 Mobile Service 行動客服 Dialogue System/Bot 對話系統/機器人 Framework Language Understanding 語意理解 Dialogue Management 對話管理 Output Generation 輸出生成 Recent Trend Industrial Trend and Challenge Deep Learning Basics Deep Learning for Dialogues Conclusion 26
  • 29. FinTech ChatBot 3分でわかるFintech – Chat botが作り出す新しい世界 29
  • 30. Challenge • Predefined semantic schema Chen et al., “Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding,” in ACL-IJCNLP, 2015. • Data without annotations Chen et al., “Zero-Shot Learning of Intent Embeddings for Expansion by Convolutional Deep Structured Semantic Models,” in ICASSP, 2016. • Semantic concept interpretation Chen et al., “Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding,” in SLT, 2014. • Predefined dialogue states Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016. • Error propagation Hakkani-Tur et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016. • Cross-domain intention/bot hierarchy Sun et al., “An Intelligent Assistant for High-Level Task Understanding,” in IUI, 2016. Sun et al., “AppDialogue: Multi-App Dialogues for Intelligent Assistants,” in LREC, 2016. Chen et al., “Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding,” in ICMI, 2016. • Cross-domain information transferring Kim et al., “New Transfer Learning Techniques For Disparate Label Sets,” in ACL-IJCNLP, 2015. FIND_RESTAURANT rating=“good” rating=5? 4? HotelRest Flight Travel Trip Planning 30
  • 32. Learning ≈ Looking for a Function • Speech Recognition • Handwritten Recognition • Weather forecast • Play video games  f  f  f  f “2” “你好” “ Saturday” “move left” Thursday 32
  • 33. Machine Learning Framework Training is to pick the best function given the observed data Testing is to predict the label using the learned function Training Data Model: Hypothesis Function Set 21, ff Training: Pick the best function f * Testing:   yxf  y * f“Best” Function     ,ˆ,,ˆ, 2211 yxyx Testing Data   ,?,x “It claims too much.” - (negative) :x :ˆy 33 function input function output
  • 34. Target Function • Classification Task – x: input object to be classified  a N-dim vector – y: class/label  a M-dim vector   yxf  MN RRf : Assume both x and y can be represented as fixed-size vectors 34
  • 35. Vector Representation Ex • Handwriting Digit Classification “2”“1”              0 0 1 10 dimensions for digit recognition “1” “2” “3”              0 1 0 “1” “2” “3” 1: for ink 0: otherwise Each pixel corresponds to an element in the vector            1 0 16 x 16 16 x 16 = 256 dimensions x: image y: class/label “1” or not “2” or not “3” or not MN RRf : 35
  • 36. Vector Representation Ex • Sentiment Analysis “-”“+”              0 0 1 3 dimensions (positive, negative, neutral) “+” “-” “?”              0 1 0 “+” “-” “?” 1: indicates the word 0: otherwise Each element in the vector corresponds to a word in the vocabulary            1 0 dimensions = size of vocab x: word y: class/label “+” or not “-” or not “?” or not MN RRf : “love” 36
  • 37. A Single Neuron z 1w 2w Nw … 1x 2x Nx  b  z  z zbias y   z e z    1 1  Sigmoid function Activation function Each neuron is a very simple function 37
  • 38. A Single Neuron z 1w 2w Nw … 1x 2x Nx  b  z bias y   z e z    1 1  38 1 w, b are the parameters of this neuron
  • 39. A Single Neuron z 1w 2w Nw …1x 2x Nx  b bias y 39 1 MN RRf :      5.0"2" 5.0"2" ynot yis A single neuron can only handle binary classification
  • 40. A Layer of Neurons • Handwriting digit classification 40 MN RRf : A layer of neurons can handle multiple possible output, and the result depends on the max one … 1x 2x Nx  1  1y  … … “1” or not “2” or not “3” or not 2y 3y 10 neurons/10 classes Which one is the max?
  • 41. Neural Networks – Multi- Layer Perceptron (MLP) 1a 1z  2z 1x 2x  z 2a Hidden Units 1 1 y 41
  • 42. • Continuous function w/ 2 layers • Combine two opposite-facing threshold functions to make a ridge • Continuous function w/ 3 layers • Combine two perpendicular ridges to make a bump  Add bumps of various sizes and locations to fit any surface Expression of MLP http://aima.eecs.berkeley.edu/slides-pdf/chapter20b.pdf Multiple layers enrich the model expression, so that the model can approximate more complex functions 42
  • 43. Deep Neural Networks (DNN) • Fully connected feedforward network 1x 2x …… Layer 1 …… 1y 2y …… Layer 2 …… Layer L …… …… …… Input Output MyNx vector x vector y MN RRf : Deep NN: multiple hidden layers 43
  • 45. RNN for SLU • IOB Sequence Labeling for Slot Filling • Intent Classification 45 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 𝑓 ℎ1 𝑓 ℎ2 𝑓 ℎ 𝑛 𝑓 ℎ0 𝑏 ℎ1 𝑏 ℎ2 𝑏 ℎ 𝑛 𝑏 𝑦0 𝑦1 𝑦2 𝑦 𝑛 (a) LSTM (b) LSTM-LA (c) bLSTM-LA (d) Intent LSTM intent 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 𝑦0 𝑦1 𝑦2 𝑦 𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛 𝑦0 𝑦1 𝑦2 𝑦 𝑛 𝑤0 𝑤1 𝑤2 𝑤 𝑛 ℎ0 ℎ1 ℎ2 ℎ 𝑛
  • 46. RNN for SLU • Joint Multi-Domain Intent Prediction and Slot Filling – Information can mutually enhanced 46 semantic frame sequence ht-1 ht+1ht W W W W taiwanese B-type U food U please U V O V O V hT+1 EOS U FIND_REST V Slot Tagging Intent Prediction Hakkani-Tur, et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.
  • 47. 47 just sent email to bob about fishing this weekend O O O O B-contact_name O B-subject I-subject I-subject U S I send_email D communication  send_email(contact_name=“bob”, subject=“fishing this weekend”) are we going to fish this weekend U1 S2  send_email(message=“are we going to fish this weekend”) send email to bob U2  send_email(contact_name=“bob”) B-message I-message I-message I-message I-message I-message I-message B-contact_nameS1 Domain Identification  Intent Prediction  Slot Filling Contextual SLU (Chen et al., 2016)
  • 48. 48 u Knowledge Attention Distributionpi mi Memory Representation Weighted Sum h ∑ Wkg o Knowledge Encoding Representation history utterances {xi} current utterance c Inner Product Sentence Encoder RNNin x1 x2 xi… Contextual Sentence Encoder x1 x2 xi… RNNmem slot tagging sequence y ht-1 ht V V W W W wt-1 wt yt-1 yt U U RNN Tagger M M Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016. 1. Sentence Encoding 2. Knowledge Attention 3. Knowledge Encoding Contextual SLU (Chen et al., 2016) Idea: additionally incorporating contextual knowledge during slot tagging  track dialogue states in a latent way
  • 49. End-to-End Supervised Dialogue System 49Wen, et al., “A Network-based End-to-End Trainable Task-Oriented Dialogue System,” arXiv.:1604.04562v2.
  • 50. InfoBot: End-to-End Dialogue System with Supervised & Reinforcement Learning 50 Movie=?; Actor=Bill Murray; Release Year=1993 Find me the Bill Murray’s movie. I think it came out in 1993. When was it released? Groundhog Day is a Bill Murray movie which came out in 1993. KB-InfoBotUser (Groundhog Day, actor, Bill Murray) (Groundhog Day, release year, 1993) (Australia, actor, Nicole Kidman) (Mad Max: Fury Road, release year, 2015) Knowledge Base (head, relation, tail) Dhingra, et al., “End-to-End Reinforcement Learning of Dialogue Agents for Information Access,” arXiv.:1609.00777v2.
  • 51. E2E RL Task-Completion Bot 51 wi B- typ e wi +1 wi+2 O O E OS <intent > wi B- typ e wi +1 wi+2 O O E OS <intent > Semantic Frame request_movie genre=action, date=this weekend System Action/Policy request_location User Dialogue Action Inform(location=Bellevue) Time t-1 wi <slot> wi+1 wi+2 O O EOS <intent> Language Understanding (LU) Time t-2 Time t Dialogue Management (DM) w0 w1 w2 Natural Language Generation (NLG) EOS User Goal User Agenda Modeling User Simulator (Li et al., 2016) End-to-End Neural Dialogue System Text Input Are there any action movies to see this weekend?
  • 52. E2E RL Task-Completion Bot • Simulated User – Generate interactions based on a predefined fake goal – Automatically learn strategy by training on the simulated data 52 User goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle. RULE BASED SYSTEM User: Can I get 2 tickets for the witch? Agent: Which theater do you want? User: Regal meridian 16 Agent: What time would you like to see it? User: 9:30 pm Agent: What date would you like to see it? User: Tomorrow! Agent: How many people are you? User: Just two adults. Agent: What is your location? User: Seattle! Agent: What kind of video format do you like? User: Oh, any kind is okay. Agent: Do you have any price constraint? User: No, I don’t care. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thank you. Agent: Thank you! REINFORCEMENT LEARNING SYSTEM User: Can I get 2 tickets for the witch? Agent: What date would you like to see it? User: Tomorrow Agent: Which theater do you prefer? User: Regal meridian 16 Agent: What is your location? User: Seattle Agent: What time would you like to see it? User: 9:30 pm. Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm. User: Thanks. Agent: Thanks!
  • 53. Outline Introduction Intelligent Assistants 智慧助理 Mobile Service 行動客服 Dialogue System/Bot 對話系統/機器人 Framework Language Understanding 語意理解 Dialogue Management 對話管理 Output Generation 輸出生成 Recent Trend Industrial Trend and Challenge Deep Learning Basics Deep Learning for Dialogues Conclusion 53
  • 54. Conclusion • The conversational bots can help users manage information access and finish tasks via spoken interactions – More natural – More convenient – More efficient – User-centered • Future Vision – Not only single-turn requests but also multi-turn conversations – Not only simple transactions but also complicated ones • Dialogues can span on multiple domains (e.g. check remaining data and then apply for more data) • NN-Based Dialogue System – Pipeline outputs are represented as vectors  distributional – The execution is constrained by backend services  symbolic 54
  • 55. Q & A T H A N K S F O R YO U R AT T E N T I O N ! 55

Editor's Notes

  1. How can find a restaurant
  2. Domain knowledge representation (graph)
  3. Domain knowledge representation (graph)
  4. Domain knowledge representation (graph)
  5. Domain knowledge representation (graph)
  6. Domain knowledge representation (graph)
  7. Domain knowledge representation (graph)
  8. Function can be so complex