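Presenter: Yunjey Choi (최윤제, master's student at Korea University)
Yunjey Choi majored in computer engineering at Korea University and is currently a master's student studying machine learning. Choi enjoys coding and sharing what has been understood with other people. After a year of studying deep learning with TensorFlow, Choi now studies Generative Adversarial Networks with PyTorch, and has implemented several papers in TensorFlow and published a PyTorch tutorial on GitHub.
Overview:
The Generative Adversarial Network (GAN), first proposed by Ian Goodfellow in 2014, is a generative model that estimates the distribution of real data through adversarial training. GANs have recently become one of the most popular research areas, and many related papers appear every day.
Is it hard to keep up with the flood of GAN papers? That is fine. Once you thoroughly understand the basic GAN, newly published papers become easy to follow.
In this talk I will try to share everything I know about GANs. It should be useful for those who know nothing about GANs, those curious about the theory behind them, and those wondering how GANs can be applied.
Presentation video: https://youtu.be/odpjk7_tGY0
Presenter: 이활석 (NAVER)
Presentation date: 2017.11.
Deep learning research has recently been shifting its center of gravity rapidly from supervised to unsupervised learning. This course covers everything about the autoencoder, the most representative method of unsupervised learning. We study the Autoencoder (AE), most widely used for dimensionality reduction, and its variants Denoising AE and Contractive AE, as well as the Variational AE (VAE), which has recently drawn attention for data generation, and its variants Conditional VAE and Adversarial AE. We also look at various applications of autoencoders to find points of contact with industry practice.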
1. Revisit Deep Neural Networks
2. Manifold Learning
3. Autoencoders
4. Variational Autoencoders
5. Applications
Revised presentation slide for NLP-DL, 2016/6/22.
Recent Progress (from 2014) in Recurrent Neural Networks and Natural Language Processing.
Profile http://www.cl.ecei.tohoku.ac.jp/~sosuke.k/
Japanese ver. https://www.slideshare.net/hytae/rnn-63761483
Social networks are not new, even though websites like Facebook and Twitter might make you want to believe they are; and trust me- I’m not talking about Myspace! Social networks are extremely interesting models for human behavior, whose study dates back to the early twentieth century. However, because of those websites, data scientists have access to much more data than the anthropologists who studied the networks of tribes!
Because networks take a relationship-centered view of the world, the data structures that we will analyze model real world behaviors and community. Through a suite of algorithms derived from mathematical Graph theory we are able to compute and predict behavior of individuals and communities through these types of analyses. Clearly this has a number of practical applications from recommendation to law enforcement to election prediction, and more.
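Building a Cookie Run AI That Plays Better Than Me with Deep Learning and Reinforcement Learning (DEVIEW 2016)
Taehoon Kim
Presentation video : https://goo.gl/jrKrvf
Demo video : https://youtu.be/exXD6wJLJ6s
Introduces techniques such as Deep Q-Network, Double Q-learning, and Dueling Network, and shares how engineering work on hyperparameters, debugging, and ensembles pushed the performance higher.
Hello.
I am 이동민, and I recently gave a talk titled "Safety First: Reinforcement Learning" ("안.전.제.일. 강화학습") at the '1st 함께하는 딥러닝 컨퍼런스' (1st Deep Learning Conference All Together).
The conference link is as follows.
https://tykimos.github.io/2018/06/28/ISS_1st_Deep_Learning_Conference_All_Together/
A rough outline is as follows.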
1. What is Artificial Intelligence?
2. What is Reinforcement Learning?
3. What is Artificial General Intelligence?
4. Planning and Learning
5. Safe Reinforcement Learning
These slides also explain the paper "Imagination-Augmented Agents for Deep Reinforcement Learning" in detail.
I hope many of you read them and find them helpful!
Numerical solution of a system of linear equations by
1) LU FACTORIZATION METHOD.
2) GAUSS ELIMINATION METHOD.
3) MATRIX INVERSION BY GAUSS ELIMINATION METHOD.
Mathematics (from Greek μάθημα máthēma, “knowledge, study, learning”) is the study of topics such as quantity (numbers), structure, space, and change. There is a range of views among mathematicians and philosophers as to the exact scope and definition of mathematics
Conformable Chebyshev differential equation of first kind
IJECEIAES
In this paper, the Chebyshev-I conformable differential equation is considered. A proper power series is examined; there are two solutions, the even solution and the odd solution. The Rodrigues’ type formula is also allocated for the conformable Chebyshev-I polynomials.
Optimal multi-configuration approximation of an N-fermion wave function
jiang-min zhang
We propose a simple iterative algorithm to construct the optimal multi-configuration approximation of an N-fermion wave function. That is, $M \ge N$ single-particle orbitals are sought iteratively so that the projection of the given wave function in the $C_M^N$-dimensional configuration subspace is maximized. The algorithm has a monotonic convergence property and can be easily parallelized. The significance of the algorithm on the study of entanglement in a multi-fermion system and its implication on the multi-configuration time-dependent Hartree-Fock (MCTDHF) are discussed. The ground state and real-time dynamics of spinless fermions with nearest-neighbor interactions are studied using this algorithm, discussing several subtleties.
Image sciences, image processing, image restoration, photo manipulation. Image and videos representation. Digital versus analog imagery. Quantization and sampling. Sources and models of noises in digital CCD imagery: photon, thermal and readout noises. Sources and models of blurs. Convolutions and point spread functions. Overview of other standard models, problems and tasks: salt-and-pepper and impulse noises, half toning, inpainting, super-resolution, compressed sensing, high dynamic range imagery, demosaicing. Short introduction to other types of imagery: SAR, Sonar, ultrasound, CT and MRI. Linear and ill-posed restoration problems.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Opendatabay - Open Data Marketplace.pptx
Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
9. The vector-valued case
$F : \mathbb{R} \to \mathbb{R}^d$, i.e.
$F(x) = (f_1(x), \dots, f_d(x))$
$\nabla F(x) = (f_1'(x), \dots, f_d'(x))$
Note: $\nabla F$ is a $(1, d)$ row vector.
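10. VIP for vector-valued function
Linearity
$\nabla(F + G) = \nabla F + \nabla G$
Product rule: $F^T G \in \mathbb{R}$
$\nabla(F^T G) = (\nabla F) G + (\nabla G) F$
Chain rule ($f : \mathbb{R}^d \to \mathbb{R}$)
$\nabla(f(G)(x)) = (\nabla G(x)) (\nabla f(G(x))) = \sum_{i=1}^{d} g_i'(x) \frac{\partial f}{\partial y_i}(G(x))$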
11. $\nabla(f(G)) = \frac{dz}{dx} = \sum_{i=1}^{d} \frac{dy_i}{dx} \frac{\partial z}{\partial y_i} = \sum_{i=1}^{d} g_i' \frac{\partial f}{\partial y_i}(G)$
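12. The vector-valued multivariable case
$F : \mathbb{R}^n \to \mathbb{R}^m$, i.e.
$F(x) = (f_1(x), \dots, f_m(x))$
$$\nabla F = (\nabla f_1, \dots, \nabla f_m) = \begin{bmatrix} \partial_{x_1} f_1 & \partial_{x_1} f_2 & \cdots & \partial_{x_1} f_m \\ \vdots & \vdots & \ddots & \vdots \\ \partial_{x_n} f_1 & \partial_{x_n} f_2 & \cdots & \partial_{x_n} f_m \end{bmatrix}$$
Note: $\nabla F$ is an $(n, m)$ matrix!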
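13. Linearity
$\nabla(F + G) = \nabla F + \nabla G$
Product rule: $F^T G \in \mathbb{R}$
$\nabla(F^T G) = (\nabla F) G + (\nabla G) F$
Chain rule: $G : \mathbb{R}^n \to \mathbb{R}^k$, $F : \mathbb{R}^k \to \mathbb{R}^m$
$\nabla(F(G)(x)) = (\nabla G(x)) (\nabla F(G(x)))$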
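The chain rule in this (input, output) layout can be checked numerically. The sketch below is only an illustration: the functions `G` and `F`, the matrix `A`, the point `x0`, and the helper `num_grad` are all made-up assumptions, not taken from the slides. It compares a finite-difference gradient of the composition with the product $(\nabla G)(\nabla F)$.

```python
import numpy as np

def num_grad(F, x, eps=1e-6):
    """Central-difference gradient in the slides' layout: entry (i, j) is d f_j / d x_i."""
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        rows.append((F(x + e) - F(x - e)) / (2 * eps))
    return np.stack(rows, axis=0)  # shape (n, m)

# Hypothetical example maps: G : R^2 -> R^3 and F : R^3 -> R^2.
A = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
G = lambda x: np.tanh(A @ x)
F = lambda y: np.array([y[0] * y[1], y[1] + y[2] ** 2])

x0 = np.array([0.3, -0.2])
lhs = num_grad(lambda x: F(G(x)), x0)       # gradient of the composition, (n, m) = (2, 2)
rhs = num_grad(G, x0) @ num_grad(F, G(x0))  # (n, k) @ (k, m)
```

Note the order of the factors: because gradients are stored as (input, output) matrices, $\nabla G$ comes first, which is the transpose of the usual Jacobian convention.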
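23. To find the minimum, we have to differentiate once more!
$$\nabla_\beta \|X\beta - Y\|_2^2 = \nabla_\beta \left( (X\beta)^T X\beta - (X^T Y)^T \beta - \beta^T X^T Y + Y^T Y \right)$$
$$= [\nabla_\beta X\beta] X\beta + [\nabla_\beta X\beta] X\beta - (X^T Y) - X^T Y$$
$$= 2 X^T X \beta - 2 X^T Y = 0$$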
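24. $\therefore X^T X \beta = X^T Y$
If $k \le n$ and there is no multicollinearity, then $\operatorname{rank}(X^T X) = k$, so $X^T X$ is invertible:
$\therefore \beta^* = (X^T X)^{-1} X^T Y$
For reference, earlier we had...
$\beta^* = (x^T x)^{-1} x^T y$
... linear algebra is just a matter of matching dimensions!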
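The normal equations can be tried out on random data. This is a minimal sketch with assumed sizes and synthetic `X` and `Y` (none of these values come from the slides); it solves $X^T X \beta = X^T Y$ and confirms that the gradient $2X^T X\beta - 2X^T Y$ vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                     # n samples, k features (k <= n), illustrative sizes
X = rng.normal(size=(n, k))      # random design matrix; full column rank with probability 1
Y = rng.normal(size=(n, 1))

# Solve X^T X beta = X^T Y directly rather than forming the inverse explicitly.
beta_star = np.linalg.solve(X.T @ X, X.T @ Y)

# The gradient 2 X^T X beta - 2 X^T Y should be (numerically) zero at beta_star.
grad = 2 * X.T @ X @ beta_star - 2 * X.T @ Y

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

Using `np.linalg.solve` instead of `np.linalg.inv` is a numerical-stability choice made here; the formula being evaluated is the same.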
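27. Variables
Input: $x = z^{(0)}$
Output in Hidden Layer $\ell$: $z^{(\ell)} = f^{(\ell)}(W^{(\ell)} z^{(\ell-1)})$
$z_i^{(\ell)} = f^{(\ell)}\left( \sum_{j=1}^{d_{\ell-1}} W_{ij}^{(\ell)} z_j^{(\ell-1)} \right), \quad i \in \{1, \dots, d_\ell\}$
Output: $\hat{y} = f^{(L)}(W^{(L)} z^{(L-1)})$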
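The forward pass defined by these variables can be sketched in a few lines. The layer sizes, the random weights, and the use of `tanh` as the activation for every layer are assumptions made for illustration; the slides allow a different $f^{(\ell)}$ per layer.

```python
import numpy as np

def forward(x, weights, f=np.tanh):
    """Return [z^(0), ..., z^(L)] with z^(0) = x and z^(l) = f(W^(l) z^(l-1))."""
    zs = [x]
    for W in weights:
        zs.append(f(W @ zs[-1]))
    return zs

rng = np.random.default_rng(0)
dims = [4, 5, 3, 2]  # d_0, d_1, d_2, d_L (illustrative sizes)
# W^(l) has shape (d_l, d_{l-1}) so that W^(l) z^(l-1) is a (d_l, 1) column.
weights = [rng.normal(size=(dims[l + 1], dims[l])) for l in range(len(dims) - 1)]
x = rng.normal(size=(dims[0], 1))

zs = forward(x, weights)
y_hat = zs[-1]  # network output, shape (d_L, 1)
```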
31-36.
$$\begin{aligned}
\frac{\partial \mathcal{L}}{\partial W_\ell} &= \mathcal{L}'(z_L) \frac{\partial z_L}{\partial W_\ell} \\
&= \mathcal{L}'(z_L) f'(W_L z_{L-1}) W_L \frac{\partial z_{L-1}}{\partial W_\ell} \\
&= \delta_L W_L \frac{\partial z_{L-1}}{\partial W_\ell} \\
&= \delta_L W_L f'(W_{L-1} z_{L-2}) W_{L-1} \frac{\partial z_{L-2}}{\partial W_\ell} \\
&= \delta_{L-1} W_{L-1} \frac{\partial z_{L-2}}{\partial W_\ell} \\
&= \cdots \\
&= \delta_{\ell+1} W_{\ell+1} \frac{\partial z_\ell}{\partial W_\ell} \\
&= \delta_{\ell+1} W_{\ell+1} f'(W_\ell z_{\ell-1}) z_{\ell-1} = \delta_\ell z_{\ell-1}
\end{aligned}$$
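48. Kronecker product
$A$: $(n, m)$ matrix, $B$: $(p, q)$ matrix
$$A \otimes B = \begin{bmatrix} a_{11} B & a_{12} B & \cdots & a_{1m} B \\ a_{21} B & a_{22} B & \cdots & a_{2m} B \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} B & a_{n2} B & \cdots & a_{nm} B \end{bmatrix}$$
$A \otimes B$ is an $(np, mq)$ matrix.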
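A small concrete case may help fix the block structure. The matrices below are arbitrary examples chosen for this sketch; `np.kron` is NumPy's Kronecker product.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])  # (n, m) = (2, 3)
B = np.array([[0, 1],
              [1, 0]])     # (p, q) = (2, 2)

K = np.kron(A, B)          # (np, mq) = (4, 6)

# Block (i, j) of K has size (p, q) and equals a_ij * B.
p, q = B.shape
block_12 = K[0 * p:1 * p, 1 * q:2 * q]  # the block belonging to a_12 = 2
```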
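49. $\frac{\partial (Xb)}{\partial X} = b \otimes I_n$
import numpy as np
n, m = 3, 2  # example sizes, chosen only for illustration
I = np.eye(n)  # (n, n) identity matrix
b = np.arange(1.0, m + 1).reshape(m, 1)  # (m, 1) column vector with example entries b1..bm
np.kron(b, I)  # (mn, n) matrix: the Kronecker product b ⊗ I_n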
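The identity $\partial(Xb)/\partial X = b \otimes I_n$ can be verified by finite differences, provided $X$ is vectorized column by column. That column-major convention is an assumption made in this sketch (it is the one under which the identity holds entry by entry), and the sizes and random values are illustrative only.

```python
import numpy as np

n, m = 3, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(n, m))
b = rng.normal(size=(m, 1))

# Finite-difference derivative in the (input, output) layout:
# row r holds d(Xb)/d vec(X)_r, where vec stacks the columns of X.
eps = 1e-6
J = np.zeros((n * m, n))
for r in range(n * m):
    step = np.zeros(n * m)
    step[r] = eps
    dX = step.reshape(n, m, order="F")  # column-major: undo vec
    J[r] = (((X + dX) @ b - (X - dX) @ b) / (2 * eps)).ravel()

K = np.kron(b, np.eye(n))  # (mn, n)
```

Since $Xb$ is linear in $X$, the finite-difference result agrees with `K` up to rounding error.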
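50. (Second attempt) Derivation of the Back-propagation Algorithm
$$\frac{\partial \mathcal{L}(\hat{y})}{\partial W_\ell} := \frac{\partial \mathcal{L}(\hat{y})}{\partial(\operatorname{vec}(W_\ell))} = \frac{\partial \hat{y}}{\partial(\operatorname{vec}(W_\ell))} \nabla_{\hat{y}} \mathcal{L}$$
$(d_\ell d_{\ell-1}, 1) = (d_\ell d_{\ell-1}, d_L) \times (d_L, 1)$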
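53. Here $\nabla f_L$ is the following $(d_L, d_L)$ diagonal matrix:
$$\nabla f_L = \begin{bmatrix} f'\left( \sum_{k=1}^{d_{L-1}} W_{1k}^{(L)} z_k^{(L-1)} \right) & & \\ & \ddots & \\ & & f'\left( \sum_{k=1}^{d_{L-1}} W_{d_L k}^{(L)} z_k^{(L-1)} \right) \end{bmatrix}$$
$$\frac{\partial \hat{y}}{\partial W_\ell} = \frac{\partial f_L(W_L z^{(L-1)})}{\partial(\operatorname{vec}(W_\ell))} = \frac{\partial (W_L z^{(L-1)})}{\partial(\operatorname{vec}(W_\ell))} \nabla f_L$$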
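54. Multiplying $\nabla f_L$ and $\nabla_y \mathcal{L}$ gives the following $(d_L, 1)$ matrix:
$$\begin{bmatrix} f'\left( \sum_{k=1}^{d_{L-1}} W_{1k}^{(L)} z_k^{(L-1)} \right) \frac{\partial \mathcal{L}}{\partial y_1} \\ \vdots \\ f'\left( \sum_{k=1}^{d_{L-1}} W_{d_L k}^{(L)} z_k^{(L-1)} \right) \frac{\partial \mathcal{L}}{\partial y_{d_L}} \end{bmatrix} = \operatorname{diag}(\nabla f_L) \odot \nabla_y \mathcal{L}$$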
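56. Therefore... (when $\ell = L$)
$$\frac{\partial \mathcal{L}(\hat{y})}{\partial W_L} = \frac{\partial (W_L z^{(L-1)})}{\partial(\operatorname{vec}(W_L))} \left[ \operatorname{diag}(\nabla f_L) \odot \nabla_y \mathcal{L} \right] = \left( z^{(L-1)} \otimes I_{d_L} \right) \delta_L$$
Dimension check:
$(d_L d_{L-1}, 1) = (d_L d_{L-1}, d_L) \times (d_L, 1)$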
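The vectorized gradient $(z^{(L-1)} \otimes I_{d_L})\,\delta_L$ is the outer product $\delta_L (z^{(L-1)})^T$ written as one long column. The sketch below checks this with random vectors; the sizes and values are illustrative assumptions, and the reshape assumes the column-major vec convention.

```python
import numpy as np

rng = np.random.default_rng(0)
d_L, d_prev = 3, 4                     # d_L and d_{L-1}, illustrative sizes
z = rng.normal(size=(d_prev, 1))       # stands in for z^(L-1)
delta = rng.normal(size=(d_L, 1))      # stands in for delta_L

grad_vec = np.kron(z, np.eye(d_L)) @ delta            # (d_L * d_{L-1}, 1)
grad_mat = grad_vec.reshape(d_L, d_prev, order="F")   # undo the column-major vec
outer = delta @ z.T                                   # (d_L, d_{L-1})
```

In practice one would compute `delta @ z.T` directly; building the Kronecker product is only useful for checking the derivation.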
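57. Therefore... (when $\ell \ne L$)
$$\frac{\partial \mathcal{L}(\hat{y})}{\partial W_\ell} = \frac{\partial (W_L z^{(L-1)})}{\partial(\operatorname{vec}(W_\ell))} \left[ \operatorname{diag}(\nabla f_L) \odot \nabla_y \mathcal{L} \right] = \frac{\partial z^{(L-1)}}{\partial(\operatorname{vec}(W_\ell))} W_L^T \delta_L$$
Dimension check:
$(d_\ell d_{\ell-1}, d_{L-1}) \times (d_{L-1}, d_L) \times (d_L, 1)$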
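59. Well, let's give it a try...
$$\begin{aligned}
\frac{\partial \mathcal{L}(\hat{y})}{\partial W_\ell} &= \frac{\partial z^{(L-1)}}{\partial(\operatorname{vec}(W_\ell))} W_L^T \delta_L \\
&= \frac{\partial f_{L-1}(W_{L-1} z^{(L-2)})}{\partial(\operatorname{vec}(W_\ell))} W_L^T \delta_L \\
&= \frac{\partial (W_{L-1} z^{(L-2)})}{\partial(\operatorname{vec}(W_\ell))} \left[ \operatorname{diag}(\nabla f_{L-1}) \odot W_L^T \delta_L \right] \\
&= \frac{\partial z^{(L-2)}}{\partial(\operatorname{vec}(W_\ell))} W_{L-1}^T \delta_{L-1}
\end{aligned}$$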
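60. Keep going
$$\begin{aligned}
\frac{\partial \mathcal{L}(\hat{y})}{\partial W_\ell} &= \frac{\partial z^{(L-2)}}{\partial(\operatorname{vec}(W_\ell))} W_{L-1}^T \delta_{L-1} \\
&= \frac{\partial f_{L-2}(W_{L-2} z^{(L-3)})}{\partial(\operatorname{vec}(W_\ell))} W_{L-1}^T \delta_{L-1} \\
&= \frac{\partial (W_{L-2} z^{(L-3)})}{\partial(\operatorname{vec}(W_\ell))} \left[ \operatorname{diag}(\nabla f_{L-2}) \odot W_{L-1}^T \delta_{L-1} \right] \\
&= \frac{\partial z^{(L-3)}}{\partial(\operatorname{vec}(W_\ell))} W_{L-2}^T \delta_{L-2}
\end{aligned}$$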
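66. Back-propagation via Matrix Calculus
$$\frac{\partial \mathcal{L}(\hat{y})}{\partial W_\ell} = \nabla f_\ell \left( \prod_{j=\ell+1}^{L} W_j^T \nabla f_j \right) \nabla_y \mathcal{L}(\hat{y}) \, (z^{(\ell-1)})^T$$
Dimension check:
$$(d_\ell, d_{\ell-1}) = (d_\ell, d_\ell) \times \left( \prod_{j=\ell+1}^{L} (d_{j-1}, d_j) \times (d_j, d_j) \right) \times (d_L, 1) \times (1, d_{\ell-1})$$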
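67. Back-propagation Algorithm
Using the above, we can update $W_1, \dots, W_L$:
$$W_\ell \leftarrow W_\ell - \alpha \frac{\partial \mathcal{L}}{\partial W_\ell} = W_\ell - \alpha \, \delta_\ell (z^{(\ell-1)})^T$$
$$\begin{aligned}
\delta_L &= \operatorname{diag}(\nabla f_L(W_L z^{(L-1)})) \odot \nabla_y \mathcal{L}(\hat{y}) \\
&\;\;\vdots \\
\delta_\ell &= \operatorname{diag}(\nabla f_\ell(W_\ell z^{(\ell-1)})) \odot \left[ W_{\ell+1}^T \delta_{\ell+1} \right] \\
&\;\;\vdots \\
\delta_1 &= \operatorname{diag}(\nabla f_1(W_1 x)) \odot \left[ W_2^T \delta_2 \right]
\end{aligned}$$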
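The whole algorithm fits in a short NumPy sketch. It uses the recursion in the form obtained step by step on slides 59 and 60, $\delta_\ell = \operatorname{diag}(\nabla f_\ell) \odot (W_{\ell+1}^T \delta_{\ell+1})$, and the update $W_\ell \leftarrow W_\ell - \alpha\,\delta_\ell (z^{(\ell-1)})^T$. The `tanh` activation, the squared-error loss, the layer sizes, and the learning rate are all assumptions chosen for illustration; one gradient entry is compared against a finite difference.

```python
import numpy as np

def f(a):
    return np.tanh(a)

def df(a):
    return 1.0 - np.tanh(a) ** 2

def forward(x, Ws):
    """Return [z^(0), ..., z^(L)]; Ws[l] maps z^(l) to z^(l+1) (0-based list index)."""
    zs = [x]
    for W in Ws:
        zs.append(f(W @ zs[-1]))
    return zs

def loss(y_hat, y):
    return 0.5 * float(np.sum((y_hat - y) ** 2))  # assumed loss; its gradient is y_hat - y

def backprop(x, y, Ws):
    zs = forward(x, Ws)
    grads = [None] * len(Ws)
    # Last layer: delta_L = f'(W_L z^(L-1)) * grad_y loss (elementwise product).
    delta = df(Ws[-1] @ zs[-2]) * (zs[-1] - y)
    for l in range(len(Ws) - 1, -1, -1):
        grads[l] = delta @ zs[l].T  # delta times the transposed input of that layer
        if l > 0:
            # Step back one layer: f'(pre-activation) * (W^T delta).
            delta = df(Ws[l - 1] @ zs[l - 1]) * (Ws[l].T @ delta)
    return grads

rng = np.random.default_rng(0)
dims = [3, 4, 2]
Ws = [rng.normal(size=(dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
x = rng.normal(size=(dims[0], 1))
y = rng.normal(size=(dims[-1], 1))
grads = backprop(x, y, Ws)

# Finite-difference check of a single entry of the first weight matrix.
eps = 1e-6
Wp = [W.copy() for W in Ws]
Wm = [W.copy() for W in Ws]
Wp[0][1, 2] += eps
Wm[0][1, 2] -= eps
numeric = (loss(forward(x, Wp)[-1], y) - loss(forward(x, Wm)[-1], y)) / (2 * eps)

alpha = 0.1  # illustrative learning rate
Ws_new = [W - alpha * g for W, g in zip(Ws, grads)]  # one gradient-descent update
```

The gradient of each weight matrix has the same shape as the matrix itself, so the update is a plain elementwise subtraction.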