47. Intro
Autoencoder: a neural network that is trained to attempt to copy its input to its output.
It has a hidden layer h that describes a code used to represent the input:
an encoder function h = f(x),
and a decoder that produces a reconstruction r = g(h).
[Figure 14.1: the general structure of an autoencoder, mapping an input x through an internal code h to an output (reconstruction) r]
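A minimal sketch of this structure in PyTorch (the 784/32 sizes and the one-layer encoder/decoder are illustrative assumptions, not from the slides):

```python
import torch.nn as nn

# Basic autoencoder structure: the encoder computes the code h = f(x),
# the decoder computes the reconstruction r = g(h).
class Autoencoder(nn.Module):
    def __init__(self, n_input=784, n_code=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_input, n_code), nn.ReLU())  # f
        self.decoder = nn.Linear(n_code, n_input)                            # g

    def forward(self, x):
        h = self.encoder(x)   # code h = f(x)
        r = self.decoder(h)   # reconstruction r = g(h)
        return r
```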
48. Intro
Good? or Bad?
If an autoencoder succeeds in simply learning to set g(f(x)) = x everywhere,
- Then it is not especially useful.
- Autoencoders are designed to be unable to learn to copy perfectly.
- They are restricted in ways that allow them to copy only approximately, and to copy only
input that resembles the training data.
- Because the model is forced to prioritize which aspects of the input should be copied,
it often learns useful properties of the data.
49. Intro
Autoencoders? For what?
Traditionally, autoencoders were used for dimensionality reduction or feature learning.
Recently, theoretical connections between autoencoders and latent variable models
have brought autoencoders to the forefront of generative modeling,
as we will see in chapter 20.
50. 14.1 Undercomplete Autoencoders
We hope that training the autoencoder to perform the input copying task
will result in h taking on useful properties.
One way to obtain useful features from the autoencoder is
to constrain h to have a smaller dimension than x.
An autoencoder whose code dimension is less than the input dimension…
>> Undercomplete Autoencoder
51. 14.1 Undercomplete Autoencoders
The learning process is described simply as
minimizing a loss function L(x, g(f(x))),
where L is a loss function penalizing g(f(x)) for being dissimilar from x,
such as the mean squared error.
Quiz. If the model is linear & the loss function is MSE…?
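A hedged sketch of this training criterion: a linear, undercomplete autoencoder trained by minimizing the mean squared error L(x, g(f(x))). The data, sizes, and optimizer settings are assumptions; the closing comment states the quiz answer given in the text (with a linear decoder and MSE, an undercomplete autoencoder learns to span the same subspace as PCA):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 50)                  # toy data (illustrative)

# Undercomplete: code dimension (10) is smaller than the input dimension (50).
encoder = nn.Linear(50, 10, bias=False)    # f
decoder = nn.Linear(10, 50, bias=False)    # g
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

for step in range(2000):
    r = decoder(encoder(X))                # g(f(x))
    loss = ((r - X) ** 2).mean()           # L(x, g(f(x))) = mean squared error
    opt.zero_grad(); loss.backward(); opt.step()

# Quiz hint: with a linear model and MSE, the learned code spans the same
# subspace as the leading principal components of X (PCA), although the
# learned basis need not be orthonormal.
```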
52. 14.2 Regularized Autoencoders
If the encoder and decoder are given too much capacity,
undercomplete autoencoders can fail to learn anything useful.
A similar problem occurs if the hidden code is allowed to have dimension equal to the input,
and in the overcomplete case in which the hidden code has dimension greater than the input.
But…
53. 14.2 Regularized Autoencoders
Rather than limiting the model capacity by keeping the encoder
and decoder shallow and the code size small,
regularized autoencoders use a loss function that encourages the model
to have other properties besides the ability to copy its input to its output.
A regularized autoencoder can be nonlinear and overcomplete
but still learn something useful about the data distribution,
even if the model capacity is great enough to learn a trivial identity function.
54. 14.2 Regularized Autoencoders
1. Sparse Autoencoders
A sparse autoencoder is simply an autoencoder whose training criterion involves
a sparsity penalty Ω(h) on the code layer h.
Sparse autoencoders are typically used to learn features for another task, such as classification.
An autoencoder that has been regularized to be sparse must respond to unique statistical
features of the dataset it has been trained on, rather than simply acting as an identity function.
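A hedged sketch of the sparse training criterion: reconstruction error plus a sparsity penalty Ω(h) on the code. Here Ω is an L1 penalty on h, one common choice, and the weight lam is an assumed hyperparameter:

```python
def sparse_ae_loss(x, encoder, decoder, lam=1e-3):
    """Reconstruction error plus a sparsity penalty Ω(h) on the code layer (PyTorch tensors)."""
    h = encoder(x)                       # code h = f(x)
    r = decoder(h)                       # reconstruction g(h)
    recon = ((r - x) ** 2).mean()        # L(x, g(f(x)))
    omega = h.abs().sum(dim=1).mean()    # Ω(h): an L1 penalty, one common choice
    return recon + lam * omega
```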
56. 14.2 Regularized Autoencoders
2. Denoising Autoencoders
Rather than adding a penalty Ω to the cost function, we can obtain an autoencoder
that learns something useful by changing the reconstruction error term of the cost function.
A denoising autoencoder instead minimizes L(x, g(f(x̃))),
where x̃ is a copy of x that has been corrupted by some form of noise.
Denoising autoencoders must therefore undo this corruption
rather than simply copying their input. To be continued in 14.5.
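A hedged sketch of the changed reconstruction term: the loss compares the reconstruction of a corrupted copy x̃ against the clean x. Gaussian corruption and its scale are illustrative assumptions:

```python
import torch

def denoising_ae_loss(x, encoder, decoder, noise_std=0.3):
    """Minimize L(x, g(f(x̃))), where x̃ is a corrupted copy of x."""
    x_tilde = x + noise_std * torch.randn_like(x)   # corrupt the input (assumed Gaussian noise)
    r = decoder(encoder(x_tilde))                   # reconstruct from the corrupted copy
    return ((r - x) ** 2).mean()                    # error is measured against the clean x
```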
57. 14.2 Regularized Autoencoders
3. Regularizing by Penalizing Derivatives
Another strategy for regularizing an autoencoder is to use a penalty Ω, as in sparse autoencoders,
but with a different form of Ω: Ω(h, x) = λ Σ_i ||∇_x h_i||².
This forces the model to learn a function that does not change much when x changes slightly.
An autoencoder regularized in this way is called a contractive autoencoder, or CAE.
The CAE is described in more detail in section 14.7.
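A hedged sketch of this derivative penalty, Ω(h, x) = λ Σ_i ||∇_x h_i||², computed with autograd; λ and the per-unit loop are illustrative choices, not an efficient implementation:

```python
import torch

def contractive_penalty(x, encoder, lam=1e-2):
    """Ω(h, x) = λ Σ_i ||∇_x h_i||² (squared norm of the encoder's Jacobian)."""
    x = x.requires_grad_(True)
    h = encoder(x)
    penalty = 0.0
    for i in range(h.shape[1]):
        # Gradient of code unit h_i with respect to the input x, per example.
        g, = torch.autograd.grad(h[:, i].sum(), x, create_graph=True)
        penalty = penalty + (g ** 2).sum(dim=1).mean()
    return lam * penalty
```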
58. 14.3 Representational Power, Layer Size and Depth
Autoencoders are often trained with only a single layer encoder and a single layer decoder.
However, this is not a requirement.
In fact, using deep encoders and decoders offers many advantages.
Recall from section 6.4.1 that there are many advantages to depth in a feedforward network.
Because autoencoders are feedforward networks, these advantages also apply to autoencoders.
Depth can exponentially reduce the computational cost of representing some functions.
Depth can exponentially decrease the amount of training data needed to learn some functions.
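A hedged sketch of a deeper encoder/decoder pair; the depth and layer widths are arbitrary assumptions, only the structure matters:

```python
import torch.nn as nn

# A deeper encoder/decoder pair; the same depth advantages of feedforward networks apply.
deep_encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 64),  nn.ReLU(),
    nn.Linear(64, 32),                 # code h
)
deep_decoder = nn.Sequential(
    nn.Linear(32, 64),   nn.ReLU(),
    nn.Linear(64, 256),  nn.ReLU(),
    nn.Linear(256, 784),               # reconstruction r
)
```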
59. Figure 6.5: An intuitive, geometric explanation of the exponential advantage of deeper rectifier networks.
(Left) An absolute value rectification unit has the same output for every pair of mirror points in its input.
The mirror axis of symmetry is given by the hyperplane defined by the weights and bias of the unit.
A function computed on top of that unit (the green decision surface) will be a mirror image of a simpler pattern across that axis of symmetry.
(Center) The function can be obtained by folding the space around the axis of symmetry.
(Right) Another repeating pattern can be folded on top of the first (by another downstream unit) to obtain another symmetry (which is now repeated four times, with two hidden layers).
60. 14.4 Stochastic Encoders and Decoders
Autoencoders are just feedforward networks.
The same loss functions and output unit types that can be used for
traditional feedforward networks are also used for autoencoders.
As described in section 6.2.2.4, a general strategy for designing the output units
and the loss function of a feedforward network is to define an output distribution
p(y|x) and minimize the negative log-likelihood -log p(y|x).
In that setting, y is a vector of targets, such as class labels.
In an autoencoder, x is now the target as well as the input.
61. 14.4 Stochastic Encoders and Decoders
Given a hidden code h, we may think of the decoder as providing
a conditional distribution p_decoder(x | h).
We may then train the autoencoder by minimizing -log p_decoder(x | h).
As with traditional feedforward networks, we usually use linear output units to parametrize
the mean of a Gaussian distribution if x is real valued.
In that case, the negative log-likelihood yields a mean squared error criterion.
Similarly, binary x values correspond to a Bernoulli distribution whose parameters are
given by a sigmoid output unit, discrete x values correspond to a softmax distribution,
and so on.
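A hedged sketch of how the choice of output distribution determines the reconstruction loss (the function name and the logits convention are illustrative assumptions):

```python
import torch.nn.functional as F

def reconstruction_nll(x, decoder_out, output_dist="gaussian"):
    """-log p_decoder(x | h) for two common output distributions (up to constants)."""
    if output_dist == "gaussian":
        # Linear output units parametrize the mean of a Gaussian -> MSE criterion.
        return F.mse_loss(decoder_out, x)
    if output_dist == "bernoulli":
        # Sigmoid output units parametrize Bernoulli probabilities -> cross-entropy;
        # decoder_out is taken to be pre-sigmoid logits here.
        return F.binary_cross_entropy_with_logits(decoder_out, x)
    raise ValueError(output_dist)
```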
62. 14.4 Stochastic Encoders and Decoders
We can also generalize the notion of an encoding function f(x)
to an encoding distribution p_encoder(h | x).
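As one possible illustration (an assumption, not the only choice): an encoder that outputs the mean and log-variance of a diagonal Gaussian p_encoder(h | x) and samples the code from it:

```python
import torch
import torch.nn as nn

class StochasticEncoder(nn.Module):
    """Defines an encoding distribution p_encoder(h | x) as a diagonal Gaussian over the code."""
    def __init__(self, n_input=784, n_code=32):
        super().__init__()
        self.mean = nn.Linear(n_input, n_code)
        self.log_var = nn.Linear(n_input, n_code)

    def forward(self, x):
        mu, log_var = self.mean(x), self.log_var(x)
        std = torch.exp(0.5 * log_var)
        return mu + std * torch.randn_like(mu)   # sample h ~ p_encoder(h | x)
```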