Deep Feedforward Networks (Chapter 6)
하인준, 이화용
The goal of a feedforward network is to approximate some function f*, that is, to learn f(x) ≈ f*(x).
These models are called feedforward because information flows in one direction only, from the input through the network to the final output y.
Extending them with feedback connections gives recurrent neural networks (Chapter 10).
The network is represented as a chain of functions composed together.
A chain of three functions, f^(3)(f^(2)(f^(1)(x))) ≈ f*(x): f^(1) and f^(2) are the first and second layers.
The length of the chain is the depth of the model.
The last layer (f^(3)) is called the output layer.
Training the network means fitting these functions f to the objective.
The training data never show the desired output of the intermediate layers, so they are called hidden layers.
How can we overcome the limits of the perceptron, a linear model?
A linear model's capacity is limited to linear functions, so it cannot capture interactions between any two input variables.
One way to extend a linear model to nonlinear functions of x is to apply the linear model not to x itself but to a transformed input φ(x).
This is similar to the idea of a kernel! (A small sketch follows below.)
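As a concrete sketch of this φ(x) idea (my own illustration, not from the slides): a hand-chosen feature map with an interaction term makes XOR exactly representable by a linear model.

```python
import numpy as np

# Hand-chosen feature map: phi(x) = (x1, x2, x1*x2).
# In this phi-space, XOR is exactly linear: y = x1 + x2 - 2*x1*x2.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
phi = np.column_stack([X, X[:, 0] * X[:, 1]])  # append the interaction term
w = np.array([1, 1, -2])                       # purely linear weights in phi-space
print(phi @ w)  # [0 1 1 0] -- the XOR outputs
```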
Example: XOR case
[Diagram: a single sigmoid unit S with inputs x1, x2 and output ŷ]
ŷ = σ(X · weight + bias)
Example: XOR case
[Diagram: three single-sigmoid-unit networks, each mapping (x1, x2) through S to ŷ]
Example: XOR case

x1 x2 | y1 y2 | ŷ
 0  0 |  0  1 | 0
 0  1 |  0  0 | 1
 1  0 |  0  0 | 1
 1  1 |  1  0 | 0

[Diagram: x1 and x2 feed two sigmoid hidden units y1 and y2; their outputs feed a final sigmoid unit that produces ŷ]

Hidden unit y1: W = [5, 5], b = −8
Hidden unit y2: W = [−7, −7], b = 3
Output unit ŷ: W = [−11, −11], b = 6

With these weights, ŷ reproduces the XOR truth table; a quick numerical check follows below.
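A minimal NumPy check of the slide's hand-picked weights (the sigmoid helper and the rounding at the end are my additions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden unit y1: W = [5, 5], b = -8  (fires only for x = (1, 1), i.e. AND)
y1 = sigmoid(X @ np.array([5, 5]) - 8)
# Hidden unit y2: W = [-7, -7], b = 3 (fires only for x = (0, 0), i.e. NOR)
y2 = sigmoid(X @ np.array([-7, -7]) + 3)
# Output unit: W = [-11, -11], b = 6, applied to (y1, y2)
y_hat = sigmoid(np.column_stack([y1, y2]) @ np.array([-11, -11]) + 6)

print(np.round(y_hat).astype(int))  # [0 1 1 0] -- the XOR truth table
```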
Example: XOR case
[Plot: in the hidden (y1, y2) space, the four XOR inputs collapse to three points, one positive and two negative, which are now linearly separable]
Extending the linear model to nonlinear functions of x means applying it to the transformed input φ(x). This is similar to the idea of a kernel!
Thanks, 연준!
Example: XOR case
[Diagram: the full two-layer network of sigmoid units S computing ŷ from x1 and x2, labeled with the three weight sets]
W = [5, 5], b = −8;  W = [−7, −7], b = 3;  W = [−11, −11], b = 6
Could we find other W and b that work as well?
Example: XOR case
In matrix form, stacking the two hidden units into one layer:

W1 = [[5, −7],
      [5, −7]],  B1 = [−8, 3]
W2 = [−11, −11]ᵀ,  b2 = 6

k(X) = σ(X W1 + B1)
Ŷ = H(X) = σ(k(X) W2 + b2)

This is exactly the chain f^(3)(f^(2)(f^(1)(x))) ≈ f*(x) from the start of the chapter; a vectorized check follows below.
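The same check as before, but in the slide's matrix form (the batch of four inputs and the array shapes are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # all four XOR inputs
W1 = np.array([[5, -7],
               [5, -7]])
B1 = np.array([-8, 3])
W2 = np.array([[-11],
               [-11]])
b2 = 6

K = sigmoid(X @ W1 + B1)       # k(X): both hidden units at once
Y_hat = sigmoid(K @ W2 + b2)   # H(X) = sigma(k(X) W2 + b2)
print(np.round(Y_hat).astype(int).ravel())  # [0 1 1 0]
```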
Gradient-Based Learning
What counts as the optimal model? How do we improve it, that is, learn?
Gradient-Based Learning
Finding the global optimum of the cost function is hard!
Gradient-Based Learning
As layers are stacked in a deep network, the number of parameters to estimate grows enormously, so solving for them exactly is practically impossible.
Instead, we aim for a good local optimum, reached by iterative gradient steps (sketched below).
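A minimal gradient-descent sketch of that idea (the toy cost, learning rate, and step count are my own choices, not from the slides):

```python
import numpy as np

def gradient_descent(grad_J, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient of a cost J(w)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad_J(w)   # move downhill along -grad J
    return w

# Toy example: J(w) = (w - 3)^2 has gradient 2(w - 3) and its minimum at w = 3.
print(gradient_descent(lambda w: 2 * (w - 3), w0=[0.0]))  # ~[3.]
```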
Gradient-Based Learning
Cost Function
The cost function of a deep neural network is not so different from a linear model's!
Gradient-Based Learning
Output Units

Binary output: logistic algorithm, logistic cost function
  C(H(x), y) = −y log H(x) − (1 − y) log(1 − H(x))
Multinomial output: softmax algorithm, cross-entropy cost function
  C(H(X), Y) = −Σᵢ Yᵢ log H(Xᵢ)

A minimal code sketch of both follows below.
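A minimal sketch of the two cost functions in the table (the clipping to avoid log(0), the batch averaging, and the example values are my additions):

```python
import numpy as np

def logistic_cost(h, y):
    # binary cross-entropy: -y log h - (1 - y) log(1 - h)
    h = np.clip(h, 1e-12, 1 - 1e-12)            # avoid log(0)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def cross_entropy(H, Y):
    # multinomial: -sum_i Y_i log H_i, with Y one-hot and H softmax outputs
    H = np.clip(H, 1e-12, 1.0)
    return np.mean(-np.sum(Y * np.log(H), axis=1))

print(logistic_cost(np.array([0.9, 0.1]), np.array([1, 0])))           # ~0.105
print(cross_entropy(np.array([[0.7, 0.2, 0.1]]), np.array([[1, 0, 0]])))  # ~0.357
```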
Hidden Units
Vanishing Gradient: the problem that motivates the choice of hidden activation function.
Hidden Units
[Diagram: the network X → Ŷ with a sigmoid (S) activation at every unit]
Sigmoid!
Hidden Units
[Diagram: the same network with ReLU (R) activations at the hidden units and a sigmoid at the output]
g(x) = max(0, x)
ReLU!
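Why ReLU mitigates the vanishing gradient, in a rough numerical sketch (the depth and the input z are arbitrary choices of mine): the sigmoid's derivative σ'(z) = σ(z)(1 − σ(z)) never exceeds 0.25, so chaining many sigmoid layers multiplies many small factors, while ReLU's derivative is exactly 1 on active units.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 2.0
sig_grad = sigmoid(z) * (1 - sigmoid(z))   # ~0.105, always <= 0.25
relu_grad = 1.0 if z > 0 else 0.0          # exactly 1 on any active unit

depth = 10
print(sig_grad ** depth)    # ~1.6e-10: gradient through 10 sigmoid layers
print(relu_grad ** depth)   # 1.0: gradient survives through active ReLUs
```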
https://en.wikipedia.org/wiki/Activation_function