Part II-1: Natural Language Processing

Generative Adversarial Network and its Applications to Speech Processing and Natural Language Processing
Sequence Generation
• A generator here is a typical seq2seq model: ASR (speech to "How are you?"), translation ("Machine Learning" to its translation), and a chatbot ("How are you?" to "I am fine.") are all sequence generators.
• With GAN, you can train a seq2seq model in another way.
Review: Sequence-to-sequence
• Chat-bot as example: an encoder reads the input sentence c, and a decoder generates the output sentence x.
• Training data: dialogue pairs, e.g., A: "How are you?" / B: "I'm good."
• Training criterion: maximize the likelihood of the reference response.
• At test time the generator may output, e.g., "Not bad" or "I'm John."; a human can judge which response is better, a preference that maximum likelihood training does not directly optimize.
Improving Sequence-to-sequence
• RL (human feedback)
• GAN (discriminator feedback)
Introduction
• Machine obtains feedback from user.
• Chat-bot learns to maximize the expected reward.
• Example: replying "Bye bye :(" to "How are you?" gets reward -10; replying "Hi :)" to "Hello" gets reward 3.
(Avatar images: https://image.freepik.com/free-vector/variety-of-human-avatars_23-2147506285.jpg, http://www.freepik.com/free-vector/variety-of-human-avatars_766615.htm)
Maximizing Expected Reward [Li, et al., EMNLP, 2016]
• The chatbot (an encoder-decoder) receives an input sentence c and produces a response sentence x; a human returns a reward $R(c, x)$.
• Learn to maximize the expected reward, using policy gradient.
Maximizing Expected Reward
• Objective: find $\theta^* = \arg\max_\theta \bar{R}_\theta$, where $\bar{R}_\theta = \sum_c P(c) \sum_x R(c,x) P_\theta(x|c)$.
• $P(c)$ is the probability that the input/history is c; $P_\theta(x|c)$ is the randomness in the generator (the chatbot with parameters $\theta$, which is what we update).
Maximizing Expected Reward
• Equivalently, $\bar{R}_\theta = E_{c \sim P(c)}\, E_{x \sim P_\theta(x|c)}[R(c,x)]$.
• Sample $(c^1, x^1), (c^2, x^2), \cdots, (c^N, x^N)$, then $\bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)$.
• But where is $\theta$ in this approximation? It has disappeared, so we cannot differentiate it directly; this motivates policy gradient.
Policy Gradient
$\nabla \bar{R}_\theta = \sum_c P(c) \sum_x R(c,x)\, \nabla P_\theta(x|c)$
$= \sum_c P(c) \sum_x R(c,x)\, P_\theta(x|c) \frac{\nabla P_\theta(x|c)}{P_\theta(x|c)}$   (using $\frac{d \log f(x)}{dx} = \frac{1}{f(x)}\frac{d f(x)}{dx}$)
$= \sum_c P(c) \sum_x R(c,x)\, P_\theta(x|c)\, \nabla \log P_\theta(x|c)$
$= E_{c \sim P(c),\, x \sim P_\theta(x|c)}[R(c,x)\, \nabla \log P_\theta(x|c)]$
$\approx \frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)\, \nabla \log P_\theta(x^i|c^i)$   (sampling)
• Compare: the sampled estimate of $\bar{R}_\theta$ itself, $\frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)$, contains no $\theta$, but its gradient estimate above does.
Policy Gradient
• Gradient ascent: $\theta^{new} \leftarrow \theta^{old} + \eta \nabla \bar{R}_{\theta^{old}}$, with $\nabla \bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)\, \nabla \log P_\theta(x^i|c^i)$.
• If $R(c^i, x^i)$ is positive: after updating $\theta$, $P_\theta(x^i|c^i)$ will increase.
• If $R(c^i, x^i)$ is negative: after updating $\theta$, $P_\theta(x^i|c^i)$ will decrease.
Policy Gradient - Implementation
• At $\theta^t$, sample $(c^1, x^1), (c^2, x^2), \ldots, (c^N, x^N)$ and obtain rewards $R(c^1, x^1), \ldots, R(c^N, x^N)$.
• Update: $\theta^{t+1} \leftarrow \theta^t + \eta \nabla \bar{R}_{\theta^t}$, with $\nabla \bar{R}_{\theta^t} \approx \frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)\, \nabla \log P_{\theta^t}(x^i|c^i)$.
• Positive $R(c^i, x^i)$: updating $\theta$ increases $P_\theta(x^i|c^i)$; negative: updating $\theta$ decreases it.
A code sketch follows.
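As a hedged illustration of this implementation, here is a minimal REINFORCE-style update in PyTorch; the `model.sample` interface and `reward_fn` are assumed placeholders for this sketch, not part of the original slides:

```python
import torch

def policy_gradient_step(model, optimizer, contexts, reward_fn):
    """One update theta^{t+1} <- theta^t + eta * grad R_bar(theta^t).

    Assumptions: model.sample(c) returns (tokens x, log P_theta(x|c) as a
    scalar tensor); reward_fn(c, x) returns the scalar R(c, x) from a human
    or a scoring function; the optimizer's learning rate plays the role of eta.
    """
    log_probs, rewards = [], []
    for c in contexts:                       # sample (c^i, x^i) pairs
        x, log_p = model.sample(c)           # log P_theta(x^i | c^i)
        log_probs.append(log_p)
        rewards.append(reward_fn(c, x))      # R(c^i, x^i)
    rewards = torch.tensor(rewards)
    # maximize (1/N) sum_i R(c^i, x^i) log P_theta(x^i | c^i),
    # so minimize its negative
    loss = -(rewards * torch.stack(log_probs)).mean()
    optimizer.zero_grad()
    loss.backward()                          # yields the REINFORCE gradient
    optimizer.step()                         # gradient ascent on R_bar(theta)
```

Positive $R(c^i, x^i)$ raises $P_\theta(x^i|c^i)$ and negative $R(c^i, x^i)$ lowers it, exactly as stated above.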
Comparison
• Maximum likelihood:
  Training data: $(c^1, \hat{x}^1), \ldots, (c^N, \hat{x}^N)$
  Objective function: $\frac{1}{N}\sum_{i=1}^{N} \log P_\theta(\hat{x}^i|c^i)$
  Gradient: $\frac{1}{N}\sum_{i=1}^{N} \nabla \log P_\theta(\hat{x}^i|c^i)$
• Reinforcement learning:
  Training data: $(c^1, x^1), \ldots, (c^N, x^N)$, obtained from interaction
  Objective function: $\frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i) \log P_\theta(x^i|c^i)$
  Gradient: $\frac{1}{N}\sum_{i=1}^{N} R(c^i, x^i)\, \nabla \log P_\theta(x^i|c^i)$
• RL is maximum likelihood weighted by $R(c^i, x^i)$; maximum likelihood corresponds to setting every $R(c^i, \hat{x}^i) = 1$.
Alpha GO style training!
• Let two agents talk to each other. Some dialogues degenerate, e.g., "How old are you?" / "See you." / "See you." / "See you.", or collapse into replies like "I am busy."; a good dialogue continues: "How old are you?" / "I am 16." / "I thought you were 12." / "What makes you think so?"
• Use a pre-defined evaluation function to compute R(c, x).
Improving Sequence-to-sequence
• RL (human feedback): covered above
• GAN (discriminator feedback): next
Conditional GAN [Li, et al., EMNLP, 2017]
• The chatbot (En-De) still maps an input sentence c to a response sentence x.
• A discriminator, trained on human dialogues, takes the pair (input sentence c, response sentence x) and outputs real or fake; its scalar output serves as the "reward".
(Image: http://www.nipic.com/show/3/83/3936650kd7476069.html)
Algorithm
• Initialize generator G (chatbot) and discriminator D
• Training data: pairs of conditional input c and response x
• In each iteration:
  • Sample input c and response x from the training set
  • Sample input c′ from the training set, and generate response x̃ by G(c′)
  • Update D to increase D(c, x) and decrease D(c′, x̃)
  • Update generator G (chatbot) such that the discriminator's scalar output on (c′, G(c′)) increases
A sketch of one iteration follows this list.
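A hedged sketch of one iteration of this algorithm in PyTorch; `G`, `D`, and the batch helpers are assumed interfaces rather than the paper's code, and D is assumed to output a probability in (0, 1):

```python
import torch

def gan_dialogue_iteration(G, D, opt_G, opt_D, batch):
    """One iteration: update D on real/fake pairs, then update G (chatbot)."""
    c, x = batch.real_pairs()            # input c and human response x
    c_prime = batch.inputs()             # another sampled input c'
    x_tilde = G.respond(c_prime)         # generated response x~ = G(c')

    # D step: increase D(c, x) and decrease D(c', x~)
    d_loss = -(torch.log(D(c, x))
               + torch.log(1 - D(c_prime, x_tilde.detach()))).mean()
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # G step: make D's scalar output on (c', G(c')) larger.
    # With discrete tokens this step is NOT directly differentiable; it
    # needs Gumbel-softmax, a continuous relaxation, or the RL formulation
    # discussed on the following slides.
    g_loss = -torch.log(D(c_prime, G.respond(c_prime))).mean()
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```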
Can we use gradient ascent to update the generator's parameters? NO!
• The decoder emits one discrete token at a time (in the figure, A or B from a toy vocabulary), starting from <BOS>, and feeds each sampled token back as the next input.
• Due to this sampling process, "discriminator + generator" is not differentiable, so the discriminator's scalar cannot simply be backpropagated into the chatbot.
Three Categories of Solutions
• Gumbel-softmax [Matt J. Kusner, et al., arXiv, 2016]
• Continuous input for discriminator [Sai Rajeswar, et al., arXiv, 2017][Ofir Press, et al., ICML workshop, 2017][Zhen Xu, et al., EMNLP, 2017][Alex Lamb, et al., NIPS, 2016][Yizhe Zhang, et al., ICML, 2017]
• "Reinforcement learning" [Yu, et al., AAAI, 2017][Li, et al., EMNLP, 2017][Tong Che, et al., arXiv, 2017][Jiaxian Guo, et al., AAAI, 2018][Kevin Lin, et al., NIPS, 2017][William Fedus, et al., ICLR, 2018]
Gumbel-softmax
• Tutorials:
  http://blog.evjang.com/2016/11/tutorial-categorical-variational.html
  https://casmls.github.io/general/2017/02/01/GumbelSoftmax.html
  https://gabrielhuang.gitbooks.io/machine-learning/reparametrization-trick.html
A minimal usage sketch follows.
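A minimal sketch of the trick itself using PyTorch's built-in `torch.nn.functional.gumbel_softmax`; the shapes here are illustrative, not from the cited tutorials:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(5, 10000)  # decoder logits for 5 steps, 10k-word vocab

# Soft, differentiable sample over the vocabulary
soft = F.gumbel_softmax(logits, tau=0.5, hard=False)

# "Straight-through" variant: one-hot in the forward pass,
# but gradients flow through the underlying soft sample
hard = F.gumbel_softmax(logits, tau=0.5, hard=True)

print(soft.sum(dim=-1))          # each row sums to 1
print(hard.max(dim=-1).values)   # each row is exactly one-hot
```

As the temperature `tau` approaches 0 the soft samples approach one-hot vectors, while gradients still flow through the relaxation.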
Three Categories of Solutions (2): Continuous Input for Discriminator
Continuous Input for Discriminator
• Instead of sampling a token at each step, use the decoder's output distribution over the vocabulary as the input of the discriminator.
• This avoids the sampling process, so we can do backpropagation now: the discriminator's scalar flows back through the chatbot to update its parameters.
What is the problem?
• A real sentence is a sequence of one-hot vectors: in each step, exactly one vocabulary entry is 1 and the rest are 0.
• A generated "sentence" is a sequence of distributions, e.g., (0.9, 0.1, 0, 0, 0) or (0.1, 0.1, 0.7, 0.1, 0), which can never be exactly 1-of-N.
• The discriminator can immediately find the difference, no matter how good the generated content is. WGAN is helpful here (sketch below).
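Why WGAN helps: with the standard JS-style loss, a discriminator that perfectly separates the disjoint one-hot and soft-distribution supports gives the generator essentially no gradient, while a Wasserstein critic still provides a useful signal. A sketch of the critic update with weight clipping, where the `critic` module is a hypothetical placeholder:

```python
import torch

def wgan_critic_step(critic, opt, real_onehot, fake_dist, clip=0.01):
    """real_onehot: (batch, T, V) one-hot word vectors from real sentences.
    fake_dist:     (batch, T, V) word distributions from the generator."""
    # maximize E[critic(real)] - E[critic(fake)], so minimize its negative
    loss = -(critic(real_onehot).mean() - critic(fake_dist).mean())
    opt.zero_grad(); loss.backward(); opt.step()
    # enforce the Lipschitz constraint by weight clipping (original WGAN)
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)
```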
Three Categories of Solutions (3): "Reinforcement Learning"
Reinforcement Learning?
• Consider the output of the discriminator as the reward.
• Update the generator to increase the discriminator's scalar output, i.e., to obtain maximum reward.
• Using the policy-gradient formulation, replace the reward $R(c, x)$ with the discriminator output $D(c, x)$.
• Different from typical RL: the discriminator itself would update during training.
• g-step: with $\theta^t$, sample $(c^1, x^1), (c^2, x^2), \ldots, (c^N, x^N)$; score each pair with the discriminator, using $D(c^i, x^i)$ in place of $R(c^i, x^i)$; update
  $\theta^{t+1} \leftarrow \theta^t + \eta \nabla \bar{R}_{\theta^t}$, with $\nabla \bar{R}_{\theta^t} \approx \frac{1}{N}\sum_{i=1}^{N} D(c^i, x^i)\, \nabla \log P_{\theta^t}(x^i|c^i)$.
  If $D(c^i, x^i)$ is positive, updating $\theta$ increases $P_\theta(x^i|c^i)$; if negative, updating $\theta$ decreases it.
• d-step: train the discriminator D to distinguish real pairs (c, x) from fake generated pairs.
Reward for Every Generation Step
$\nabla \bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} D(c^i, x^i)\, \nabla \log P_\theta(x^i|c^i)$, where
$\log P_\theta(x^i|c^i) = \log P(x_1^i|c^i) + \log P(x_2^i|c^i, x_1^i) + \log P(x_3^i|c^i, x_{1:2}^i)$
• Example 1: $c^i$ = "What is your name?", $x^i$ = "I don't know". $D(c^i, x^i)$ is negative, so $\theta$ is updated to decrease $\log P_\theta(x^i|c^i)$, including the term $P(\text{"I"}|c^i)$.
• Example 2: $c^i$ = "What is your name?", $x^i$ = "I am John". $D(c^i, x^i)$ is positive, so $\theta$ is updated to increase $\log P_\theta(x^i|c^i)$, including the same term $P(\text{"I"}|c^i)$.
• The sentence-level reward thus pushes the probability of the first word "I" in both directions, even though "I" is not what makes either response good or bad; this motivates giving a reward for every generation step.
Reward for Every Generation Step
$\log P_\theta(x^i|c^i) = \log P(x_1^i|c^i) + \log P(x_2^i|c^i, x_1^i) + \log P(x_3^i|c^i, x_{1:2}^i)$
e.g., $c^i$ = "What is your name?", $x^i$ = "I don't know": the terms are $P(\text{"I"}|c^i)$, $P(\text{"don't"}|c^i, \text{"I"})$, $P(\text{"know"}|c^i, \text{"I don't"})$.
• Method 1. Monte Carlo (MC) search [Yu, et al., AAAI, 2017]: instead of weighting whole sentences by $D(c^i, x^i)$ as in
  $\nabla \bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} D(c^i, x^i)\, \nabla \log P_\theta(x^i|c^i)$,
  weight each step by a step-level value:
  $\nabla \bar{R}_\theta \approx \frac{1}{N}\sum_{i=1}^{N} \sum_{t=1}^{T} \left(Q(c^i, x_{1:t}^i) - b\right) \nabla \log P_\theta(x_t^i|c^i, x_{1:t-1}^i)$
• Method 2. Discriminator for partially decoded sequences [Li, et al., EMNLP, 2017]
A sketch of Method 1 follows.
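A sketch of Method 1, estimating the step-level value $Q(c, x_{1:t})$ by Monte Carlo roll-outs in the spirit of SeqGAN; the `G.rollout` and `D` interfaces are assumptions for illustration, with `D` returning a scalar tensor:

```python
import torch

def mc_q_value(G, D, c, prefix, n_rollouts=16):
    """Estimate Q(c, x_{1:t}): the average discriminator score of complete
    sentences whose first t words are `prefix`, sampled from G."""
    scores = []
    for _ in range(n_rollouts):
        x_full = G.rollout(c, prefix)   # complete the sentence from x_{1:t}
        scores.append(D(c, x_full))     # D(c, x) on the completed sequence
    return torch.stack(scores).mean()

def step_level_weights(G, D, c, x, baseline=0.0):
    """Collect the (Q(c, x_{1:t}) - b) weight for every generation step t."""
    return [mc_q_value(G, D, c, x[:t]) - baseline
            for t in range(1, len(x) + 1)]
```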
Tips: RankGAN
Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun, "Adversarial Ranking for Language Generation", NIPS 2017
• Also applied to image caption generation.
Experimental Results
Input: We've got to look for another route.   MLE: I'm sorry.   GAN: You're not going to be here for a while.
Input: You can save him by talking.   MLE: I don't know.   GAN: You know what's going on in there, you know what I mean?
• MLE frequently generates generic responses such as "I'm sorry" and "I don't know" (corresponding to fuzzy images?).
• GAN generates more complex responses (however, no strong evidence shows that they are more coherent to the input).
• Find more comparisons in the survey papers [Lu, et al., arXiv, 2018][Zhu, et al., arXiv, 2018].
More Applications
• Supervised machine translation [Wu, et al., arXiv 2017][Yang, et al., arXiv 2017]
• Supervised abstractive summarization [Liu, et al., AAAI 2018]
• Image/video caption generation [Rakshith Shetty, et al., ICCV 2017][Liang, et al., arXiv 2017]
If you are using seq2seq models, consider improving them with GAN.
Unsupervised Sequence-to-Sequence?
• Transform between Domain X and Domain Y (the slide illustrates each domain as a set of sentences, e.g., positive sentences) without paired data [Alexis Conneau, et al., ICLR, 2018][Guillaume Lample, et al., ICLR, 2018].
• Unsupervised learning with 10M sentences can match supervised learning with 100K sentence pairs.
• More in Part III.
Part II-2: Speech Processing

Generative Adversarial Network and its Applications to Speech Processing and Natural Language Processing
Outline of Part II-2
• Speech Enhancement
• Speech Synthesis
• Voice Conversion
• Speech Recognition
Speech Enhancement
• Typical deep learning approach: an NN maps noisy speech to an output that should match the clean target, trained with an L1 or L2 loss [Y. Xu, TASLP'15]. We can use CNN, LSTM, or other fancy structures.
• Hard to collect paired noisy and clean data? No: collect clean speech, and then add noise yourself.
Speech Enhancement
• Conditional GAN: G maps noisy speech to an enhanced output; D receives pairs, (noisy, clean) from the training data versus (noisy, output), and outputs a scalar judging whether the pair is fake or not.
[Pascual et al., Interspeech 2017][Michelsanti et al., Interspeech 2017][Donahue et al., ICASSP 2018][Higuchi et al., ASRU 2017]
A loss sketch follows.
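As a hedged sketch of this setup: the discriminator scores (noisy, enhanced) pairs against (noisy, clean) pairs, and pairing the condition with the output is what makes it a conditional GAN. The modules and the L1 term below are assumptions loosely in the spirit of these papers, not their exact code:

```python
import torch

def enhancement_cgan_step(G, D, opt_G, opt_D, noisy, clean, l1_weight=100.0):
    """One conditional-GAN update; D scores (condition, signal) pairs in (0, 1)."""
    enhanced = G(noisy)

    # D step: real pair (noisy, clean) vs. fake pair (noisy, enhanced)
    d_loss = -(torch.log(D(noisy, clean))
               + torch.log(1 - D(noisy, enhanced.detach()))).mean()
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # G step: fool D, plus an L1 term toward the paired clean target
    g_loss = (-torch.log(D(noisy, enhanced)).mean()
              + l1_weight * (enhanced - clean).abs().mean())
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```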
Speech Enhancement without Paired Data
• CycleGAN-style training with X: noisy speech and Y: clean speech.
• $G_{X \to Y}$ maps noisy to clean; $D_Y$ outputs a scalar: belongs to domain Y or not. $G_{Y \to X}$ maps clean to noisy; $D_X$ outputs a scalar: belongs to domain X or not.
• Cycle consistency: $G_{Y \to X}(G_{X \to Y}(x))$ should be as close as possible to x, and $G_{X \to Y}(G_{Y \to X}(y))$ as close as possible to y (sketch below).
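A sketch of the cycle-consistency part of this objective; generator and discriminator internals are omitted, and the names and the cycle weight are placeholders rather than values from any of the cited papers:

```python
import torch

def cycle_losses(G_xy, G_yx, D_x, D_y, x, y):
    """x: noisy speech batch, y: clean speech batch (unpaired)."""
    fake_y, fake_x = G_xy(x), G_yx(y)

    # adversarial terms: fool D_Y / D_X ("belongs to the domain or not")
    adv = -torch.log(D_y(fake_y)).mean() - torch.log(D_x(fake_x)).mean()

    # cycle consistency: X -> Y -> X and Y -> X -> Y reconstructions
    # should be "as close as possible" to the originals
    cyc = ((G_yx(fake_y) - x).abs().mean()
           + (G_xy(fake_x) - y).abs().mean())

    return adv + 10.0 * cyc   # the cycle weight 10.0 is an assumed choice
```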
How to use the generators?
• Method 1: convert noisy speech to clean speech with $G_{X \to Y}$ (noisy-to-clean), then feed it to an ASR system trained with clean speech.
• Method 2: convert clean speech to noisy speech with $G_{Y \to X}$ (clean-to-noisy) to create training data, then train the ASR system with noisy speech.
Experimental Results [M. Mimura et al., ASRU 2017]
(Figure: recognition results comparing the baseline, Method 1, and Method 2.)
Outline of Part II-2: Speech Synthesis
Postfilter
• GAN postfilter [Kaneko et al., ICASSP 2017]
   The traditional MMSE criterion results in statistical averaging.
   GAN is used as a new objective function to estimate the parameters of G.
   The proposed work intends to further improve the naturalness of synthesized speech or parameters from a synthesizer.
• Pipeline: synthesized Mel-cepstral coefficients pass through the postfilter G to give generated Mel-cepstral coefficients; D judges natural vs. generated against natural Mel-cepstral coefficients.

Postfilter (GAN-based Postfilter)
• Spectrogram analysis: the GAN postfilter reconstructs spectral texture similar to the natural one.
(Fig. 4: Spectrograms of (a) NAT (natural); (b) SYN (synthesized); (c) VS (variance scaling); (d) MS (modulation spectrum); (e) MSE; (f) GAN postfilters.)
Speech Synthesis
• Speech synthesis with GAN [Yang et al., ASRU 2017]: the generator $G_{SS}$ maps linguistic features $\boldsymbol{y}$ (and noise $\boldsymbol{z}$) to generated speech parameters $\hat{\boldsymbol{x}}$; D compares them with natural speech parameters $\boldsymbol{x}$, conditioned on $\boldsymbol{y}$; an MSE term keeps $\hat{\boldsymbol{x}}$ close to $\boldsymbol{x}$.
$V_{GAN}(G, D) = E_{\boldsymbol{x} \sim p_{data}(x)}[\log D(\boldsymbol{x}|\boldsymbol{y})] + E_{\boldsymbol{z} \sim p_z(z)}[\log(1 - D(G(\boldsymbol{z}|\boldsymbol{y})|\boldsymbol{y}))]$
$V_{L2}(G) = E_{\boldsymbol{z} \sim p_z(z)}[(G(\boldsymbol{z}|\boldsymbol{y}) - \boldsymbol{x})^2]$
Speech Synthesis (SS-GAN-MTL)
• Speech synthesis with GAN & multi-task learning (SS-GAN-MTL) [Yang et al., ASRU 2017]: the discriminator is replaced by $D_{CE}$, which is additionally trained with a cross-entropy (CE) term against the true label, so it both discriminates natural vs. generated and predicts the label:
$V_{GAN}(G, D) = E_{\boldsymbol{x} \sim p_{data}(x)}[\log D_{CE}(\boldsymbol{x}|\boldsymbol{y}, label)] + E_{\boldsymbol{z} \sim p_z(z)}[\log(1 - D_{CE}(G(\boldsymbol{z}|\boldsymbol{y})|\boldsymbol{y}, label))]$
$V_{L2}(G) = E_{\boldsymbol{z} \sim p_z(z)}[(G(\boldsymbol{z}|\boldsymbol{y}) - \boldsymbol{x})^2]$
A combined generator-loss sketch follows.
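Combining the two objectives on the generator side gives a multi-task loss. A sketch under the assumption that $V_{L2}$ and the generator's adversarial term are simply summed with a trade-off weight; the weighting and module interfaces are illustrative, not the paper's values:

```python
import torch

def ss_gan_generator_loss(G, D, z, y, x, adv_weight=1.0):
    """y: linguistic features, z: noise, x: natural speech parameters.
    G(z, y) returns generated speech parameters; D(x_hat, y) returns a
    probability in (0, 1) that x_hat is natural given y."""
    x_hat = G(z, y)                            # generated speech parameters
    l2 = ((x_hat - x) ** 2).mean()             # V_L2(G): MSE toward natural x
    adv = torch.log(1 - D(x_hat, y)).mean()    # generator's part of V_GAN(G, D)
    return l2 + adv_weight * adv
```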
Outline of Part II-2: Voice Conversion
Voice Conversion
In the past: parallel data
• Speaker A and Speaker B record the same sentences ("How are you?", "Good morning").
• An NN maps A's "Hello" to B's "Hello", trained so the output is as close as possible to the target.
• Earlier models: Gaussian mixture model (GMM) [Toda et al., TASLP 2007], non-negative matrix factorization (NMF) [Wu et al., TASLP 2014; Fu et al., TBME 2017], locally linear embedding (LLE) [Wu et al., Interspeech 2016], restricted Boltzmann machine (RBM) [Chen et al., TASLP 2014], feed-forward NN [Desai et al., TASLP 2010], recurrent NN (RNN) [Nakashika et al., Interspeech 2014].
In the past vs. with GAN
• In the past: Speaker A and Speaker B read the same sentences ("How are you?", "Good morning").
• With GAN: Speaker A says 天氣真好 ("The weather is really nice") and 再見囉 ("See you"), while Speaker B says "How are you?" and "Good morning"; Speakers A and B are talking about completely different things.
Voice Conversion without Paired Data
• The same cycle-consistent setup as in speech enhancement, with X: Speaker A and Y: Speaker B: $G_{X \to Y}$ and $G_{Y \to X}$ convert between the speakers, $D_X$ and $D_Y$ judge whether speech belongs to each speaker's domain, and both round trips should be as close as possible to the originals.
[Takuhiro Kaneko, et al., arXiv, 2017][Fuming Fang, et al., ICASSP, 2018][Yang Gao, et al., ICASSP, 2018]
Voice Conversion without Paired Data
• Subjective evaluations: MOS for naturalness, and similarity to the source and target speakers (S: source; T: target; P: proposed; B: baseline).
1. The proposed method uses non-parallel data.
2. For naturalness, the proposed method outperforms the baseline.
3. For similarity, the proposed method is comparable to the baseline.
Outline of Part II-2: Speech Recognition
Review: Domain Adversarial Training
• A feature extractor (generator) maps the input (e.g., an image) to features; a label predictor answers "which digit?", and a discriminator (domain classifier) answers "which domain?".
• The feature extractor must not only cheat the domain classifier, but also satisfy the label predictor at the same time.
• In speech recognition, different domains can be different accents, different noise conditions, etc.
A gradient-reversal sketch follows.
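The standard way to implement "cheat the domain classifier while satisfying the label predictor" is a gradient reversal layer between the feature extractor and the domain classifier. A minimal PyTorch sketch; the three modules and the inputs are assumed placeholders:

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in backward,
    so the feature extractor is updated to *fool* the domain classifier."""
    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def domain_adversarial_losses(feat, label_head, domain_head, x, y, d):
    """x: inputs, y: label targets (e.g., digits/phonemes), d: domain ids."""
    h = feat(x)
    label_loss = F.cross_entropy(label_head(h), y)      # satisfy the predictor
    domain_loss = F.cross_entropy(                      # adversarial via GRL
        domain_head(GradReverse.apply(h)), d)
    return label_loss + domain_loss
```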
Different Types of Noises as Different Domains [Shinohara, INTERSPEECH, 2016]
• Domain adversarial training with noise types as the domains.
Different Accents as Different Domains [Sun et al., ICASSP 2018]
• STD: standard speech; FJ, JS, JX, SC, GD, HN: accents.
• Domain adversarial training is effective in learning features invariant to domain differences without labeled transcriptions.
Dual Learning: ASR & TTS
• ASR maps speech to text; TTS maps text to speech (the TTS figure is upside down on purpose).
• ASR & TTS form a cycle: X domain: speech, Y domain: text.
Dual Learning: TTS vs. ASR
• Given a pretrained TTS and ASR system:
   Speech (without text): ASR produces text, TTS reconstructs speech, trained to be as close as possible to the original speech.
   Text (without speech): TTS produces speech, ASR reconstructs text, trained to be as close as possible to the original text.
• A discriminator is not used, and it cannot be completely unsupervised: pretraining requires some paired data.
A sketch of the two cycles follows.
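A sketch of the two reconstruction cycles, assuming pretrained `asr` and `tts` callables; the loss names mirror the slide, and everything else is an assumed interface:

```python
import torch
import torch.nn.functional as F

def dual_learning_losses(tts, asr, text_only, speech_only):
    """text_only: token ids (batch, T); speech_only: spectrogram frames.
    Assumed interfaces: asr(speech) -> per-token logits (batch, T, V);
    tts(tokens) -> spectrogram. Both are pretrained on paired data."""
    # Cycle 1: text -> TTS -> speech -> ASR -> text, "as close as possible"
    logits = asr(tts(text_only))
    text_loss = F.cross_entropy(logits.transpose(1, 2), text_only)

    # Cycle 2: speech -> ASR -> text -> TTS -> speech (MSE, as in the slide);
    # the argmax is non-differentiable, so this cycle mainly trains TTS
    speech_hat = tts(asr(speech_only).argmax(dim=-1))
    speech_loss = ((speech_hat - speech_only) ** 2).mean()

    return text_loss + speech_loss
```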
Dual Learning: TTS vs. ASR - Experiments [Andros Tjandra et al., ASRU 2017]
• Setup: 1600 paired utterance-sentence pairs for the supervised loss, plus 7200 unpaired utterances and sentences for the unsupervised loss (MSE on speech; prediction of the "end-of-utterance").
• Mel: mel-scale spectrum; Raw: raw waveform.