Invertible Residual Networks
박수철

모두의연구소 

풀잎스쿨 Deep Generative Models
Jens Behrmann, Will Grathwohl, Ricky T. Q. Chen

David Duvenaud, Jörn-Henrik Jacobsen
ResNets
Fθ = I + gθ
Kaiming He et al. Deep Residual Learning for Image Recognition. 2015
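A residual layer adds a learned perturbation g_θ to the identity map. A minimal sketch of such a block (the module layout is illustrative, not from the paper):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """F(x) = x + g(x): the identity map plus a learned residual g."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        # g is a small non-linear network; this architecture is a hypothetical choice.
        self.g = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.ELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return x + self.g(x)  # F = I + g
```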
Conditions required by flow models

1. Each layer must be invertible.
Keep the ResNet structure and impose a Lipschitz constraint on the non-linear function.

2. The inverse must be computable.
Approximate it by iteration, based on the Banach fixed-point theorem.

3. The log-determinant of each layer must be easy to compute.
Express the log-determinant as the trace of a matrix logarithm and approximate that trace.
1. The layer must be invertible.
Sufficient condition for invertible ResNets
Jens Behrmann et al. Invertible Residual Networks. 2019
[Figure: a map f from domain to image, with points x, x′ mapped to f(x), f(x′); comparing ‖x − x′‖ with ‖f(x) − f(x′)‖ illustrates the Lipschitz constant (Lipschitz norm).]
Sufficient condition for invertible ResNets
Takeru Miyato et al. Spectral Normalization for Generative Adversarial Networks. 2018
Jens Behrmann et al. Invertible Residual Networks. 2019
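Spelled out, the paper's sufficient condition is that the residual branch be contractive; a short statement of the definition and the condition:

```latex
\mathrm{Lip}(g) = \sup_{x \neq x'}
  \frac{\lVert g(x) - g(x') \rVert_2}{\lVert x - x' \rVert_2},
\qquad
F_\theta = I + g_\theta \ \text{is invertible if}\ \mathrm{Lip}(g_\theta) < 1 .
```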
Satisfying the Lipschitz Constraint
1. Non-linear activations such as ReLU, ELU, and tanh already satisfy
the Lipschitz constraint.

2. Dense and convolution layers, which are expressed as matrix
multiplications, can be made to satisfy the Lipschitz constraint by
normalizing the weight matrix: divide it by its largest singular value.
https://en.wikipedia.org/wiki/Singular_value_decomposition
Satisfying the Lipschitz Constraint
Singular Value Decomposition: M = U Σ Vᵀ, where
M: an arbitrary m × n matrix
U: an m × m unitary (orthogonal) matrix
Σ: an m × n diagonal matrix with non-negative real numbers on the diagonal
V: an n × n unitary (orthogonal) matrix
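A quick numpy check that the largest singular value is exactly the spectral norm ‖M‖₂ (a minimal sketch; the matrix shape is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))   # an arbitrary m x n matrix

U, s, Vt = np.linalg.svd(M)       # M = U @ diag(s) @ Vt
sigma_max = s[0]                  # singular values come back sorted, largest first

# The largest singular value equals the spectral norm max ||Mx|| / ||x||.
assert np.isclose(sigma_max, np.linalg.norm(M, 2))
```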
https://en.wikipedia.org/wiki/Singular_value_decomposition
Satisfying the Lipschitz Constraint
Singular Value Decomposition
Satisfying the Lipschitz Constraint
Weight Normalization
Jens Behrmann et al. Invertible Residual Networks. 2019
Finding the largest singular value
Jens Behrmann et al. Invertible Residual Networks. 2019
Computing a full Singular Value Decomposition costs O(D³), but the largest singular value can be approximated with the following power-iteration algorithm.
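A minimal power-iteration sketch in the spirit of spectral normalization (Miyato et al., 2018); the iteration count and the coefficient c are assumptions, not values from the paper:

```python
import numpy as np

def largest_singular_value(W, n_iter=20, eps=1e-12):
    """Approximate sigma_max(W) by power iteration: each step costs two
    matrix-vector products instead of a full O(D^3) SVD."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v) + eps
        u = W @ v
        u /= np.linalg.norm(u) + eps
    return u @ W @ v  # sigma_max ≈ u^T W v

def normalize_weight(W, c=0.9):
    """Rescale W so its spectral norm is at most c < 1; following the paper,
    rescale only when the norm actually exceeds c."""
    sigma = largest_singular_value(W)
    return W * (c / sigma) if sigma > c else W
```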
2. The inverse must be computable.
Inverse of i-ResNet Layer
Banach fixed-point theorem
Jens Behrmann et al. Invertible Residual Networks. 2019
Inverse of i-ResNet Layer
Jens Behrmann et al. Invertible Residual Networks. 2019
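Since Lip(g) < 1 makes x ↦ y − g(x) a contraction, the Banach fixed-point theorem gives a unique fixed point, reachable by simple iteration. A minimal sketch of the paper's inversion procedure (the iteration count is an assumption):

```python
import torch

def invert_layer(g, y, n_iter=100):
    """Invert F(x) = x + g(x) via the fixed-point iteration
    x_{n+1} = y - g(x_n), starting from x_0 = y.
    Converges when Lip(g) < 1."""
    x = y
    for _ in range(n_iter):
        x = y - g(x)
    return x
```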
3. The log-determinant of the layer must be easy to compute.
Log-determinant of Jacobian
Jens Behrmann et al. Invertible Residual Networks. 2019
ln p_x(x) = ln p_z(z) + ln det J_F(x)              (change of variables, for z = F(x) = (I + g)(x))

ln det J_F(x) = tr(ln J_F(x))                      (Withers & Nadarajah (2010); valid thanks to the Lipschitz constraint)
             = tr(ln(I + J_g(x)))                  (by definition of F)
             = Σ_{k=1}^∞ (−1)^{k+1} tr(J_g^k) / k  (power series of the matrix logarithm)
Log-determinant of Jacobian
Hall, B. C. Lie Groups, Lie Algebras, and Representations: An Elementary Introduction. Graduate Texts in Mathematics 222 (2nd ed.), Springer, 2015.
Complex Logarithm
Matrix Logarithm
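For reference, the matrix logarithm is defined by the same power series as the scalar one (Hall, 2015); it converges when ‖A‖ < 1, which the Lipschitz constraint guarantees for A = J_g:

```latex
\ln(I + A) = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} A^{k},
\quad \lVert A \rVert < 1,
\qquad
\ln \det (I + A) = \operatorname{tr}\!\big(\ln (I + A)\big).
```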
Log-determinant of Jacobian
ln det J_F(x) = Σ_{k=1}^∞ (−1)^{k+1} tr(J_g^k) / k

Problems
1. Computing tr(J_g^k) takes O(d²) operations.
2. The Jacobian matrix J_g^k itself is hard to obtain.
3. The sum is an infinite series.

Solutions
1, 2. Use the automatic differentiation provided by deep learning frameworks to obtain the vector-Jacobian product v^T J_g, and use it to approximate the matrix trace stochastically.
3. Truncate the series at an arbitrary index n. The estimator becomes biased, but the paper proves the resulting error is bounded.

Jens Behrmann et al. Invertible Residual Networks. 2019
Log-determinant of Jacobian
Jens Behrmann et al. Invertible Residual Networks. 2019
tr(A) = 𝔼_{p(v)}[v^T A v], where A ∈ ℝ^{d×d}, 𝔼[v] = 0, Cov(v) = I

Hutchinson's trace estimator
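A minimal numpy sketch of the estimator; Rademacher (±1) samples are one common choice satisfying 𝔼[v] = 0, Cov(v) = I:

```python
import numpy as np

def hutchinson_trace(A, n_samples=1000, rng=None):
    """Estimate tr(A) as the sample mean of v^T A v."""
    rng = rng or np.random.default_rng(0)
    v = rng.choice([-1.0, 1.0], size=(n_samples, A.shape[0]))
    return np.einsum('ni,ij,nj->n', v, A, v).mean()

A = np.diag([1.0, 2.0, 3.0])
print(hutchinson_trace(A))  # ≈ tr(A) = 6.0
```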
Log-determinant of Jacobian
Implementation of backpropagation
https://github.com/eriklindernoren/ML-From-Scratch/blob/master/mlfromscratch/deep_learning/layers.py

For y = Wx + b, the backward pass receives ∂L/∂y and propagates (∂L/∂y)(∂y/∂x), where ∂y/∂x = W.
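In the style of the linked ML-From-Scratch code, the backward pass of a dense layer just multiplies the incoming gradient by W; a stripped-down sketch (class and method names are illustrative):

```python
import numpy as np

class Dense:
    """y = x W^T + b, so dy/dx = W and backward returns (dL/dy) @ W."""
    def __init__(self, W, b):
        self.W, self.b = W, b

    def forward(self, x):
        return x @ self.W.T + self.b

    def backward(self, accum_grad):
        # Whatever vector enters the backward pass is multiplied
        # by the Jacobian dy/dx = W.
        return accum_grad @ self.W
```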
Log-determinant of Jacobian
Sample a vector v from p(v) and feed it in as the backward-pass input of the desired layer; the output is the vector-Jacobian product v^T J_g.

[Figure: forward pass x → Layer g → y = g(x); backward pass v ∼ p(v) → Layer g → v^T J_g.]
Log-determinant of Jacobian
The running vector is initialized to v^T and is then updated to w^T J_g at each step;
after k steps it equals v^T J_g^k, so v^T J_g^k v approximates tr(J_g^k).
Jens Behrmann et al. Invertible Residual Networks. 2019
Log-determinant of Jacobian
Jens Behrmann et al. Invertible Residual Networks. 2019
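Putting the pieces together, a minimal sketch of the truncated power-series estimator (a single Hutchinson sample; the truncation index is an assumption), where repeated vector-Jacobian products mean J_g is never materialized:

```python
import torch

def log_det_estimator(g, x, n_terms=10):
    """Estimate ln det(I + J_g(x)) = sum_{k>=1} (-1)^{k+1} tr(J_g^k) / k,
    with tr(J_g^k) ≈ v^T J_g^k v built up as w^T <- w^T J_g."""
    x = x.detach().requires_grad_(True)
    y = g(x)
    v = torch.randn_like(x)  # E[v] = 0, Cov(v) = I
    w = v                    # the running vector starts as v^T
    log_det = 0.0
    for k in range(1, n_terms + 1):
        # One more factor of J_g: w^T <- w^T J_g via a vector-Jacobian product.
        (w,) = torch.autograd.grad(y, x, grad_outputs=w, retain_graph=True)
        trace_k = (w * v).sum()              # v^T J_g^k v ≈ tr(J_g^k)
        log_det = log_det + (-1) ** (k + 1) * trace_k / k
    return log_det
```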
Comparison
Jens Behrmann et al. Invertible Residual Networks. 2019
Results
Jens Behrmann et al. Invertible Residual Networks. 2019
Results
Jens Behrmann et al. Invertible Residual Networks. 2019
Results
Jens Behrmann et al. Invertible Residual Networks. 2019
The End
