Some interesting generative
models
Yi-fan Liou
@AIA 20190927
This is how the misunderstanding happened
https://arxiv.org/pdf/1807.03039.pdf
Two guarantees for today's talk
• The speaker is guaranteed to have no strong mathematical background
• Please trust a biologist…
• The content is guaranteed not to be entirely correct
• Discussion is welcome
Outline
• Autoencoder
• Autoencoder and variational autoencoder
• VQVAE (VQVAE2)
• Autoregressive model
• PixelCNN
• GAN
• CycleGAN, RecycleGAN
• CycleGAN with guess discriminator
• Flow-based Model
• GLOW
Autoencoder
Latent
variables
The latent variables:
the variables that can express the
output of the phenomenon
1. Denoise
2. Dimension reduction
3. Feature extraction
4. Segmentation
5. Energy measuring
…
Autoencoder with RELU?
https://groups.google.com/forum/#!msg/pylearn-dev/iWqctW9nkAg/JJ5GwA5OYlUJ
Sometimes,
everything is built over dead bodies….
Not everything is as straightforward as it seems
Variational Autoencoder
mean
variance
The “mean” and
“Variance” could
create a
distribution.
sampling
If we control the latent variables so that they follow the normal
distribution, maybe we can create new latent
variables by sampling from the normal distribution.
The mean and variance should be as close
to 0 and 1 as possible.
Strategy
1. The latent variables should follow a
normal distribution.
2. Give scattered points a random
perturbation, hoping to fill in the
space that has no samples. (This can
actually be seen as sampling from the
normal distribution; see the sketch below.)
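A minimal numpy sketch of this strategy (the encoder outputs `mu` and `log_var` here are hypothetical placeholders): sampling z = mu + sigma * eps perturbs each point, and the KL term pushes the mean toward 0 and the variance toward 1.

```python
import numpy as np

def sample_latent(mu, log_var, rng=np.random.default_rng()):
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over the latent dimensions
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Hypothetical encoder outputs for a batch of 4 samples, latent dim 8
mu = np.zeros((4, 8))
log_var = np.zeros((4, 8))
z = sample_latent(mu, log_var)           # new latent variables by sampling
kl = kl_to_standard_normal(mu, log_var)  # pushes mean -> 0, variance -> 1
```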
Vector Quantize Autoencoder
encoder decoder
Latent variables
autoencoder Variational
autoencoder
Vector quantized
Variational
autoencoder
Advantage: easy to train
Problem: if a test sample falls
between the points of the
training samples, it cannot be
reconstructed properly
Advantage: has some robustness
to samples that fall between
the points.
Problem: the robustness is still
not large enough
Advantage: no problem with samples
falling between points.
Problem: quite hard to implement,
and hard to train
https://arxiv.org/abs/1711.00937
encoder decoder
autoencoder VQ variational autoencoder
…
Vector dictionary
Find the vector in the dictionary
with the highest similarity
decoder
…
inferencing
Sampling from any distribution works,
because the nearest vector is always
used as the representative (see the lookup sketch below)
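A small numpy sketch of the lookup step (codebook size and dimensions are made up): each encoder output is replaced by its most similar vector in the dictionary, so a sample drawn from any distribution is still mapped to a valid code.

```python
import numpy as np

def quantize(ze, codebook):
    """Replace each encoder output ze[i] with its nearest codebook vector.

    ze:       (N, D) encoder outputs
    codebook: (K, D) the vector dictionary
    """
    # Squared Euclidean distance between every encoder output and every code
    dists = np.sum((ze[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)  # (N, K)
    idx = np.argmin(dists, axis=-1)        # index of the most similar vector
    return codebook[idx], idx

codebook = np.random.randn(512, 64)        # K=512 codes of dimension 64 (made up)
ze = np.random.randn(10, 64)               # 10 encoder outputs
zq, idx = quantize(ze, codebook)           # zq is what the decoder receives
```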
If we want to use 2D CNN in VQVAE
How to pass the gradient in VQVAE
encoder decoder
…
Loss1: decoder loss
If we compute the loss directly,
the gradient stops here
Loss2: encoder loss
Here we assume the encoder outputs exactly
the representative taken from the dictionary, so
the dictionary's result and its gradient are passed back directly
Loss3: VQ loss
The encoder should encode values as close as possible
to the vector selected from the dictionary
zq = ze + tf.stop_gradient(zq - ze)

Forward:
zq = ze + (zq - ze) = zq

Backward:
zq = ze + tf.stop_gradient(zq - ze) = ze,
because stop_gradient makes this term be ignored during backpropagation
3 losses in VQVAE
https://github.com/markliou/ML-practice/blob/master/VQVAE/vqvae.py
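A hedged TensorFlow sketch of the three losses (function and variable names are mine, not necessarily those of the linked repo), following the VQ-VAE paper: a reconstruction (decoder) loss, a codebook loss that moves the dictionary vectors toward the frozen encoder outputs, and a commitment loss that moves the encoder toward the selected vectors.

```python
import tensorflow as tf

def vqvae_losses(x, x_recon, ze, zq, beta=0.25):
    """The three VQ-VAE losses; beta is the commitment weight (a hyperparameter)."""
    # 1. Decoder (reconstruction) loss; gradients reach the encoder through the
    #    straight-through trick zq = ze + tf.stop_gradient(zq - ze)
    recon_loss = tf.reduce_mean(tf.square(x - x_recon))
    # 2. Codebook loss: move the selected dictionary vectors toward the
    #    (frozen) encoder outputs
    codebook_loss = tf.reduce_mean(tf.square(tf.stop_gradient(ze) - zq))
    # 3. Commitment loss: make the encoder commit to the selected vectors
    commit_loss = tf.reduce_mean(tf.square(ze - tf.stop_gradient(zq)))
    return recon_loss + codebook_loss + beta * commit_loss
```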
MNIST with VQVAE
https://github.com/markliou/ML-practice/blob/master/VQVAE/vqvae.py
The improvement of VQ
https://arxiv.org/pdf/1803.03382.pdf
The issues with VQ:
1. Only the code selected by the encoder will be updated
2. Not all the codes in the codebook will be used
The exponential moving average (EMA) method
Codes
embedding
1. Calculating the distances of the codes that are selected
=> This helps to maintain the codebook
Selected code
Codebook lookup result
Number of samples
Code update, lambda = 0.999
2. Decomposing the vector into small pieces helps to use the codes more
efficiently.
=> Vector slicing helps to use the codes more efficiently
Find all the embeddings
assigned to this code,
compute the centroid of
these embeddings, and
move the code toward
this centroid (see the EMA sketch below).
centroid
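A numpy sketch of the EMA update described above (lambda = 0.999 as on the slide; the bookkeeping variables are my own naming): for each selected code we track how many embeddings chose it and their running sum, then move the code toward their centroid.

```python
import numpy as np

def ema_codebook_update(codebook, counts, sums, ze, idx, lam=0.999):
    """Exponential-moving-average update of the codebook.

    codebook: (K, D) current codes
    counts:   (K,)   EMA of how many samples picked each code
    sums:     (K, D) EMA of the summed embeddings assigned to each code
    ze:       (N, D) encoder outputs of the current batch
    idx:      (N,)   index of the code selected for each encoder output
    """
    K = codebook.shape[0]
    onehot = np.eye(K)[idx]                 # (N, K) assignment matrix
    batch_counts = onehot.sum(axis=0)       # samples assigned to each code
    batch_sums = onehot.T @ ze              # summed embeddings per code
    counts = lam * counts + (1 - lam) * batch_counts
    sums = lam * sums + (1 - lam) * batch_sums
    # Move every used code toward the centroid of its assigned embeddings
    used = counts > 1e-8
    codebook[used] = sums[used] / counts[used, None]
    return codebook, counts, sums
```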
VQVAE2
https://arxiv.org/pdf/1906.00446.pdf
Autoregressive models
Focus on Pixel CNN
Can we use CNN to handle the autoregressive
model?
• The problem autoregressive models face is …
• “the model cannot get information from the future, but it can use any
information from the past”.
So …
Maybe we can just make a toy example in which
the model only uses the previous information.
(but that would not be the real case in the world for image generation)
PixelCNN
1 2 3
4 5 6
7 8 9
kernel
1 2 3
4 5 6
7 8 9
Masked kernel
mask
The PixelCNN, which uses masked
kernels, was proposed together with
PixelRNN (which uses LSTM).
However, PixelRNN performs better
than PixelCNN.
Some people thought this was
because the LSTM has “gates”,
which let the RNN handle
complex problems.
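A numpy sketch of the masked kernel (the `include_center` switch mirrors the usual mask "A" / mask "B" distinction; treat the exact naming as an assumption): weights on future positions are simply zeroed before the convolution.

```python
import numpy as np

def causal_mask(kernel_size=3, include_center=False):
    """Mask for a PixelCNN kernel: 1 for positions already generated
    (rows above, and pixels to the left in the same row), 0 for future positions."""
    mask = np.zeros((kernel_size, kernel_size))
    center = kernel_size // 2
    mask[:center, :] = 1.0            # all rows above the center
    mask[center, :center] = 1.0       # pixels left of the center
    if include_center:                # mask type "B" also keeps the center pixel
        mask[center, center] = 1.0
    return mask

kernel = np.random.randn(3, 3)
masked_kernel = kernel * causal_mask(3)   # the "future" weights are zeroed
```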
Blind spots of pixel CNN
• https://towardsdatascience.com/blind-spot-problem-in-pixelcnn-8c71592a14a
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
The blind spot means: the
generated point is
produced without
considering some of the earlier spots.
Feature map
1 2 3
4 5 6
7 8 9
kernel
1 2 3
4 5 6
7 8 9
Masked kernel
mask
When running with the masked kernels
(kernel positions 1 2 3 / 4 lie over the padding when computing the first output a)
If the feature map is handled with zero
padding, then a would only be influenced
by the bias terms of the NN (or maybe we can
consider non-zero padding).
next, the kernel slides right and computes b
(kernel positions 1 2 3 / 4 now cover a)
a b c d e
f g h i j
k l m n o
p q
Blind spot:
q is generated
without considering j, n,
o.
The blind spots explanation
The absence of this
direction creates the
blind spots. The number
of blind spots increases
with the number of
layers.
How about making new
filters which still cover this
direction?
Supposing:
a 3*3 kernel and
2 or 3 layers
Horizontal and vertical stack
The feature map can be separated into 2 parts according to the status of the rows:
1. The rows containing the predicted spot
2. The rows without the predicted spot
https://www.slideshare.net/suga93/conditional-image-generation-with-pixelcnn-decoders
masked
It’s equivalent to
using a 2*3 kernel
*
*
This mask is needed because we
will use the horizontal
stack information.
Construct the horizontal and vertical stacks at the
same time
Vertical
stack
Horizontal
stack
*
*
+
New feature map
Horizontal stack provides the
final predicted output
Padding after
convolution
crop
Without the padding and
crop, this feature map
would see the “future
spots”.
What happens when the vertical and horizontal
stacks merge
P P P P P P P P P
1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18
1 2 3
13
22
1 2 3
13
21 22
= +
If we want to take
position 22 …
The blind spot disappears
(but we still use the same kernel shape)
The concept of conditional gated CNN
The input should be
weighted and normalized
using a certain method.
The opening of the
gate can also be
decided by the input
The x is masked using the
same mask. This affects
Wf and Wg
• The description can also be given as a one-hot
encoding, h.
• x is thought of as generated given h
h should be properly transformed, because its shape should be the same as W*x.
h can also be embedded using a neural network, m, such that s = m(h).
V*s can be directly calculated without masking. The kernel size used is 1*1 in the original
paper.
Vertical
stack
Horizontal
stack
Gate mechanism
The feature map passes through two functions:
the tanh normalizes the feature maps
while the sigmoid acts as the gate (see the sketch below).
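A numpy sketch of this gated activation (shapes are made up): the tanh branch and the sigmoid gate are computed from the masked-convolution outputs, with optional conditioning terms from h.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_activation(wf_x, wg_x, vf_s=0.0, vg_s=0.0):
    """Gated unit of the conditional PixelCNN:
    y = tanh(Wf*x + Vf*s) * sigmoid(Wg*x + Vg*s).
    wf_x, wg_x are the masked-convolution outputs; vf_s, vg_s are the
    (optional) conditioning terms computed from h (or from s = m(h))."""
    return np.tanh(wf_x + vf_s) * sigmoid(wg_x + vg_s)

# Toy feature maps (shapes are made up)
wf_x = np.random.randn(8, 8, 16)
wg_x = np.random.randn(8, 8, 16)
out = gated_activation(wf_x, wg_x)
```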
The whole postulated architecture v’ h’
v h
Vertical stack
Horizontal stack
output
GTU and GLU
GTU : gated tanh unit
GLU : gated linear unit
GAN
Generative adversarial network
1. Cycle GAN
2. Recycle GAN
3. The loss functions and which is most useful
Generative adversarial network
• The problem of generative models
• “structured things” seem to have a certain “distribution”
• Human sentence : n + v + adj + n
• The generative model needs to learn the “distribution”
• The types of generative model learning
• Select the most related object from a database
• Just learn the similarity => like Deep Blue
• Learn the distribution
input
Transform to vector
database output
input
Transform to vector
output
distribution
Generative adversarial network
• How to learn a distribution is a problem
Generative
model
Generated
samples
generate
These samples have a certain pattern
Real
samples
Discriminator
Measure the
differences
Tell the generator how to modify the distribution
Give a random
vector
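A minimal numpy sketch of this measuring/feedback loop (the discriminator scores here are just made-up numbers): the discriminator is trained to separate real samples from generated ones, and the generator is trained so that its samples get scored as real.

```python
import numpy as np

def bce(p, label):
    # Binary cross-entropy for discriminator outputs p in (0, 1)
    eps = 1e-8
    return -np.mean(label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps))

# Hypothetical discriminator scores for one training step
d_real = np.random.uniform(0.5, 1.0, size=32)   # D(real samples)
d_fake = np.random.uniform(0.0, 0.5, size=32)   # D(G(random vectors))

# Discriminator: tell real (label 1) from generated (label 0)
d_loss = bce(d_real, 1.0) + bce(d_fake, 0.0)
# Generator: fool the discriminator (wants D(G(z)) to be labeled as 1)
g_loss = bce(d_fake, 1.0)
```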
CycleGAN
• The naïve GAN generates samples depending on the random vector
• The user cannot decide what will be generated
• we can use paired data to constrain the output
• Not every dataset has paired data
• CycleGAN solves the unpaired data problem
G
D
Image sourced from H.Y. Lee, NTU
Paired data
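The constraint CycleGAN uses instead of paired data is cycle consistency; in the notation of the later slides (Gy: X → Y, Gx: Y → X), the cycle-consistency loss from the CycleGAN paper can be written as

$$\mathcal{L}_{cyc} = \mathbb{E}_{x}\big[\lVert G_x(G_y(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G_y(G_x(y)) - y \rVert_1\big]$$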
Spatial cycle consistency is not enough
• Mode collapse is a
serious problem in
CycleGAN
• In (a), the generated
Obama has only some
pixels changed, but
the model can still
generate varied Trumps.
• In (b), a similar
situation occurs.
• Self-adversarial attack
similar
different
3 Different domain style transfer methods
The Recycle GAN
Xt Y’t
Gy
X’t
Gx
Cycle consistency
Dy Dx
True or False   True or False   Cycle GAN
Video X Video Y
Px Py
P can use t frames to predict
the (t+1)-th frame
Gx
Xt+1 Y’t+1 X’t+1
Cycle consistency
(Recycle loss)
Py
Px Maybe not needed
Recycle GAN
Self-adversarial attack
• During CycleGAN training, the intermediate product barely changes, yet it can
still fool D
https://arxiv.org/pdf/1908.01517.pdf
Defense methods
1. Adversarial training with noise
2. Guess discriminator
The guess discriminator
Xt Y’t
Gy
X’t
Gx
Cycle consistency
Dy Dx
True or False   True or False   Cycle GAN
https://arxiv.org/pdf/1908.01517.pdf
Dguess
Which is the
reconstructed
sample?
The guess discriminator
receives the input and
the reconstructed
sample.
input
reconstructed
Dguess
Which one
is the input?
At this step the
input and the
reconstructed
sample are shuffled
into an arbitrary order,
and we check whether D
can tell which of the two
is the input.
In practice, though,
simply adding noise
already works quite well.
How to choose the types of GAN
• “Are GANs Created Equal? A Large-Scale Study” (2017)
• https://arxiv.org/abs/1711.10337 (Google Brain)
Conclusion:
after burning a pile of money on experiments,
the original version is still the best.
Flow Based model – Glow as
example
Yi-Fan Liou
Today we will talk about a network whose initialization
decides whether it wins or loses.
But first, some background that may help with understanding Glow later.
Optical flow
D. Putcha et al, “Functional correlates of optic flow motion processing in Parkinson’s disease,” 2014
Optic flow stimuli illustration. Optic flow motion stimuli (A) simulate forward and backward
motion using dot fields that are expanding or contracting while rotating about a central focus.
Random motion (B) simulates non-coherent motion using dots moving at the same speeds used
in optic flow, but with random directions of movement. In the illustrations, the length of arrows
corresponds with dot speed, indicating that dot speed increases with distance away from the
center.
https://mlatgt.blog/2018/09/06/learning-rigidity-and-scene-flow-estimation/
http://hammerbchen.blogspot.com/2011/10/art-of-optical-flow.html
Lucas-Kanade
Ref:
http://www.inf.fu-berlin.de/inst/ag-ki/rojas_home/documents/tutorials/Lucas-Kanade2.pdf
http://www.cs.umd.edu/~djacobs/CMSC426/OpticalFlow.pdf
If the brightness of a small patch of an image slowly increases
toward the lower right, we can guess that this patch is currently moving toward the upper left.
We can then estimate how the point (x, y) moves within time t:
rearranging
slightly
Multiply both sides by S^T; then
S^T S must be invertible for a
solution to exist (see the system written out below).
If S^T S is invertible, it should
be a diagonal matrix.
Lucas-Kanade tracks how two consecutive frames change over a
short time.
From this method we can roughly see a simple property:
the matrix that handles the optical flow is roughly diagonal.
A diagonal matrix has the property of being “tractable”.
LU decomposition?
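Written out, the step described above is the usual Lucas-Kanade least-squares system (a reconstruction from the surrounding description, so take the exact symbols as an assumption): S stacks the spatial gradients of the patch, t stacks the temporal changes, and v = (u, v) is the motion of the patch.

$$S\,\mathbf{v} = \mathbf{t} \;\;\Rightarrow\;\; S^{T}S\,\mathbf{v} = S^{T}\mathbf{t} \;\;\Rightarrow\;\; \mathbf{v} = (S^{T}S)^{-1}S^{T}\mathbf{t}$$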
Normalizing flow
http://proceedings.mlr.press/v37/rezende15.pdf
f1 f2
Target
distribution
Expected
distribution
f1^-1
f2^-1
Starting from a distribution we know (usually a normal
distribution), a series of invertible mappings gives us another
distribution that can actually be observed in the real domain.
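The standard change-of-variables formula behind this picture: with h0 = x, hi = fi(h(i-1)) and z the final output of the chain of invertible mappings,

$$\log p_X(x) = \log p_Z(z) + \sum_{i} \log\left|\det \frac{\partial h_i}{\partial h_{i-1}}\right|$$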
Normalizing flow based methods
• Projecting from one distribution space onto another: if every small element has
a corresponding counterpart, finding this correspondence lets us derive one space from the other
f(x)
g(x)
f(x) = -g(x)
If the spaces already have a built-in “correspondence”, then we do not need the model to
“automatically generate every plausible shape” (because that is too difficult). Perhaps by modifying
a small probability distribution we can obtain the final correspondence.
https://arxiv.org/pdf/1605.08803.pdf
Affine transformation
http://silverwind1982.pixnet.net/blog/post/160691705-affine-transformation
https://math.stackexchange.com/questions/884666/what-
are-differences-between-affine-space-and-vector-space
Affine space
If a point can be mapped to an affine space, this is
an “affine transformation”.
This transformation is supposed to be linear.
Affine space 2
Affine space 1
Affine space 2
Affine space 1
affine transformation
$$\begin{pmatrix} X_1 \\ Y_1 \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} X_2 \\ Y_2 \end{pmatrix} + \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$$

(the A block does scaling, the B block does translation)

$$\begin{pmatrix} X_1 \\ Y_1 \\ 1 \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} & B_1 \\ A_{21} & A_{22} & B_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X_2 \\ Y_2 \\ 1 \end{pmatrix} \quad \text{(homogeneous coordinates)}$$

If we put trigonometric functions into the matrix, we can also achieve “rotation”:

$$\begin{pmatrix} X_1 \\ Y_1 \\ 1 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta & B_1 \\ \sin\theta & \cos\theta & B_2 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X_2 \\ Y_2 \\ 1 \end{pmatrix}$$
This is also an invertible function.
Objective:
Can a neural network learn the properties of these affine
transformations?
If Z > 1, the vector (X, Y, Z) will pass through
another plane (which is not a sub-
vector space of the other) and give
coordinates (x, y)
(but the information of z would be lost)
non-volume preserving
You can find more detail from Dr. Hung-yi Lee’s Youtube channel
https://www.youtube.com/watch?v=uXY18nzdSsM
Volume preserving
The “determinant” of the matrix
indicates the “volume”.
The characteristics:
1. The diagonal of the Jacobian matrix
between the functions is 1.
2. The relation between the functions is
tractable.
What will the “non-volume preserving”
(NVP) do?
In such a task, based on the concepts of optical-flow
methods, the volume would not be so important.
But…
The characteristic of being “tractable” is essential.
RealNVP: modifying the functions while keeping the “invertible” characteristic
Identity
matrix
I don’t care
TNVP – temporal non-volume preserving
https://arxiv.org/pdf/1703.08617.pdf
Curiosity in reinforcement learning
Rewards
Intrinsic – inside the agent
Extrinsic – from the environment
https://arxiv.org/pdf/1705.05363.pdf
Intrinsic  Extrinsic
If the next state can be derived directly from the current
state and the current action, it means this behavior is already
very mature and no further exploration is needed.
(So if we can find an action whose future cannot be
predicted, a higher reward should be given.)
St+1 is obtained from at and St operating together. So maybe
we can build a model that infers at back from St+1 and St.
This model can be split into two parts: the first part is the
regressor, i.e. the inverse model; the other
part is the feature extractor. The hope here is
to have a good feature extractor, one that
can pull out the truly informative
features rather than arbitrary ones.
loss
LI: inverse model loss
LF: forward model loss
b: adjusts the weight of the intrinsic reward.
l: adjusts the weight between intrinsic and extrinsic rewards
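Putting the pieces together, the combined objective of the ICM paper (reconstructed here from memory of the paper, so treat the exact form as an assumption) weights the policy return against the two model losses:

$$\min_{\theta_P,\,\theta_I,\,\theta_F}\Big[-\lambda\,\mathbb{E}_{\pi}\Big[\textstyle\sum_t r_t\Big] + (1-\beta)\,L_I + \beta\,L_F\Big],\qquad r_t = r_t^{e} + r_t^{i}$$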
Curious-driven exploration
https://arxiv.org/pdf/1905.10071.pdf
Rewards
Intrinsic – inside the agent
Extrinsic – from the environment
Conventional ICM
Flow-based ICM
The action (at) has to be used
as a proxy objective for
evaluation. This part tunes
the encoder of the “state” by
“accurately recovering the action”.
Various kinds of flow model
https://arxiv.org/pdf/1907.07945.pdf
With the background above covered, let's move on to
the main topic: Glow
The process of Glow
When building a flow model, keep three principles in mind:
1. Tractable
2. Find a way to shuffle things around
3. If there are channels, there must be
dependency between the channels (so that the affine transformation can be applied)
RealNVP (2018)
Factor out
Scale block
…
Scale block
Z
Actnorm
1*1 Invertible conv
Affine coupling
Scale block
…
Y
What the loss contains:
1. In a flow, every point has a meaning, so
every point has to be computed.
2. “Normalizing flow” means we want to “push the
output toward some distribution”, and the one we usually
use is the normal distribution. So z and y are matched to
the normal distribution by maximum likelihood (MLE).
3. Each transformation should not cause too large a
“volume change” (log determinant)
z
Any distribution you like,
but most people pick the
normal distribution
MLE
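A minimal numpy sketch of the loss just described (assuming each invertible step reports its log-determinant contribution): the objective is the negative log-likelihood of the flow output under a standard normal prior, plus the accumulated log-det terms.

```python
import numpy as np

def flow_nll(z, sum_log_det):
    """Negative log-likelihood of a normalizing flow.

    z:            (N, D) outputs of the flow (including the factored-out parts)
    sum_log_det:  (N,)   accumulated log|det| of all invertible steps
    """
    d = z.shape[1]
    # log-density of z under a standard normal prior (the usual choice)
    log_prior = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    return -np.mean(log_prior + sum_log_det)

z = np.random.randn(16, 48)
sum_log_det = np.zeros(16)   # would be produced by actnorm / 1x1 conv / coupling
loss = flow_nll(z, sum_log_det)
```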
Actnorm layer
This layer tries to scale the input into “acceptable”
information for the next layer (a small sketch follows below).
• “acceptable” means the information has mean 0 and
variance 1.
• This concept is similar to “batch normalization” (BN).
Here, the mean and variance are obtained directly from the
input data.
• This is a special case of “data dependent initialization”
• This is probably because BN is less tractable.
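A small numpy sketch of actnorm's data-dependent initialization (a simplified version; the real Glow layer also contributes a log-det term to the loss): the scale and bias are set from the first batch so that each channel comes out with mean 0 and variance 1.

```python
import numpy as np

class ActNorm:
    """Per-channel affine layer with data-dependent initialization."""
    def __init__(self):
        self.scale = None
        self.bias = None

    def init_from_batch(self, x):
        # x: (N, H, W, C). Use the first batch so that the output has
        # mean 0 and variance 1 per channel.
        mean = x.mean(axis=(0, 1, 2))
        std = x.std(axis=(0, 1, 2)) + 1e-6
        self.scale = 1.0 / std
        self.bias = -mean / std

    def forward(self, x):
        if self.scale is None:
            self.init_from_batch(x)
        return x * self.scale + self.bias

x = np.random.randn(8, 4, 4, 12) * 3.0 + 1.0
y = ActNorm().forward(x)   # per-channel mean ~0 and variance ~1 after init
```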
Data dependent initialization
https://arxiv.org/pdf/1511.06856.pdf (ICLR2016)
Layer 1
Layer 2
Layer k
….
Layer k-1
input input input
Input of
Layer K
Layer k
output of
Layer K
Control the weights of this
layer so that the output
stays within the range
where the activation
function can operate
(it is enough to control
the mean and variance of
the sampled distribution)
Between-layer normalization
Originally the earlier layers are all assumed to be linear
(supposed affine layers), but in practice this is
not the case. So further adjustment is still needed:
feed in several different batches,
estimate how much the different samples vary, and then
further adjust the mean and variance (rk in
Algorithm 2).
Within-layer initialization
input input input   input input input
Data dependent initialization
https://arxiv.org/pdf/1511.06856.pdf (ICLR2016)
Invertible 1x1 convolution Actnorm
1*1 Invertible conv
Affine coupling
Scale block
Purpose:
Shuffling the channel information; and the
1*1 convolution is tractable
$$\begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 3 & 2 \end{pmatrix}$$
reversible
To keep invertibility we only need to construct
an orthogonal matrix; it does not have to
consist of only 1s and 0s.
Implementation (see the sketch below):
1. Create a random matrix whose size equals the number of kernels used by the
1*1 convolution.
2. Apply QR decomposition to the matrix; its Q matrix is orthogonal.
3. Use the Q matrix as the initial weights.
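A numpy sketch of the three implementation steps above (the helper name is mine): a random matrix is QR-decomposed and the orthogonal Q is used as the initial weight of the invertible 1*1 convolution.

```python
import numpy as np

def init_invertible_1x1_kernel(num_channels, rng=np.random.default_rng()):
    """Steps 1-3 from the slide: random matrix -> QR -> use Q as the
    initial weight of the invertible 1*1 convolution."""
    w = rng.standard_normal((num_channels, num_channels))  # 1. random matrix
    q, _ = np.linalg.qr(w)                                 # 2. Q is orthogonal
    return q                                               # 3. initial weights

W = init_invertible_1x1_kernel(12)
x = np.random.randn(4, 4, 12)             # (H, W, C) feature map
y = x @ W                                  # the 1*1 convolution mixes channels
x_back = y @ np.linalg.inv(W)              # invertible: we can go back
print(np.allclose(x, x_back))              # True (up to numerical error)
```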
Affine Coupling layers
RealNVP (2018)
Actnorm
1*1 Invertible conv
Affine coupling
Scale block
Resnet
Tips:
1. For the channel count, each layer's
input = output
2. The last layer uses zero-
initialization, and its channel
count is set to 2x
3. Split in half: one half is the
shift (+), the other half is the scale (*)
Found by a complex
neural network
center
scale
+
*
The concatenation looks orderly, but behind the scenes it will
be shuffled by the 1*1 convolution
Zero
initialization
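A numpy sketch of the affine coupling idea (the toy zero-initialized network stands in for the "complex neural network"): half of the channels pass through unchanged and produce the scale and shift applied to the other half, and the inverse is exact.

```python
import numpy as np

def affine_coupling_forward(x, nn):
    """Split the channels in half; one half passes through unchanged and
    drives the scale (*) and shift (+) applied to the other half."""
    xa, xb = np.split(x, 2, axis=-1)
    log_s, t = nn(xa)                      # found by a (complex) neural network
    yb = xb * np.exp(log_s) + t
    log_det = np.sum(log_s)                # contribution to the flow loss
    return np.concatenate([xa, yb], axis=-1), log_det

def affine_coupling_inverse(y, nn):
    ya, yb = np.split(y, 2, axis=-1)
    log_s, t = nn(ya)
    xb = (yb - t) * np.exp(-log_s)
    return np.concatenate([ya, xb], axis=-1)

# A toy "network": zero-initialized, so log_s = 0 and t = 0 at the start,
# which makes the coupling layer an identity (the zero-initialization tip).
toy_nn = lambda xa: (np.zeros_like(xa), np.zeros_like(xa))
x = np.random.randn(4, 4, 8)
y, log_det = affine_coupling_forward(x, toy_nn)
x_back = affine_coupling_inverse(y, toy_nn)
print(np.allclose(x, x_back))   # True
```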
Factor out layer and Squeeze layer
RealNVP
https://arxiv.org/pdf/1605.08803.pdf
Actnorm
1*1 Invertible conv
Affine coupling
Scale block
h
z
continues through the flow
stays fixed (factored out)
Squeezing
cuts the image into smaller pieces, with a fixed cutting
pattern that does not lose the dependency.
More channels let the shuffle work more
efficiently
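A numpy sketch of the squeeze operation (channel-last layout assumed): each fixed 2x2 block becomes four channels, giving the 1*1 convolution more channels to shuffle.

```python
import numpy as np

def squeeze(x):
    """Cut the image into fixed 2x2 blocks and stack them as channels:
    (H, W, C) -> (H/2, W/2, 4C)."""
    h, w, c = x.shape
    x = x.reshape(h // 2, 2, w // 2, 2, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(h // 2, w // 2, 4 * c)

x = np.random.randn(8, 8, 3)
print(squeeze(x).shape)   # (4, 4, 12): more channels for the 1*1 conv to shuffle
```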
That’s all~~