SCPARK
From Diffusion Models to DALL·E 2
STE @


- GSEP: Music Source Separation


- GTS: Music & Lyrics Synchronization


- ? : Sound Generative Models
박수철 @


- Text-To-Speech


- Voice Cloning


- Voice Conversion
박수철 @


- Diffusion/Score-based models


- Speech recognition and speech synthesis


- All about Tacotron


- Deep generative models
GSEP, GTS demo
JTBC election vote-count broadcast
From Diffusion Models to DALL·E 2
- Diffusion Model
Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015.
- DDPM
Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851.
- CLIP
Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International Conference on Machine Learning. PMLR, 2021.
- GLIDE
Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021).
- DALL·E 2
Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022).
- Guided Diffusion Sampling
Dhariwal, Prafulla, and Alexander Nichol. "Diffusion models beat gans on image synthesis." Advances in Neural Information Processing Systems 34 (2021): 8780-8794.
- Classifier-free diffusion guidance
Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance." NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. 2021.
Generative Model
- A generative model learns the probability distribution of a dataset and samples from it.
Auto-Regressive Model
$$p_\theta(x) = \prod_{i=1}^{n^2} p_\theta(x_i \mid x_1, \ldots, x_{i-1})$$
Van den Oord, Aaron, et al. "Conditional image generation with pixelcnn decoders." Advances in neural information processing systems 29 (2016).
Variational Auto-Encoder
$$p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz$$
https://en.wikipedia.org/wiki/Variational_autoencoder
Generative Model
Flow-based Model
$$p_\theta(x) = p_\theta(z) \left| \det\left( \frac{dz}{dx} \right) \right|$$
Lil'Log, Flow-based Deep Generative Models
https://lilianweng.github.io/posts/2018-10-13-flow-models/
Generative Adversarial Networks
$$\mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems 27 (2014).
Diffusion Model


DDPM
Diffusion Model
- Proposed by Jascha Sohl-Dickstein et al. in the paper "Deep unsupervised learning using nonequilibrium thermodynamics"
Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015.
Diffusion
https://en.wikipedia.org/wiki/Diffusion
(Flipped)
Diffusion Model
Forward Process
$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}, x_0) := \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\right)$$
Posterior
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I\right)$$
$$\text{where } \tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\, (1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t \quad \text{and} \quad \tilde{\beta}_t := \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\, \beta_t$$
Backward Process (Neural Networks)
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$
Loss Function
$$D_{KL}\left(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right)$$
Diffusion Model - Forward Process
Forward Process
$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}, x_0) := \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\right)$$
[Figure: one forward step from $x_{t-1}$ to $x_t$]
Distribution of $x_t$ at an arbitrary timestep $t$ in closed form
$$q(x_t \mid x_0) = \mathcal{N}\left(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t) I\right), \quad \text{where } \alpha_t := 1-\beta_t \text{ and } \bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$$
For the derivation, see Lil'Log
https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
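As a brief sketch of why the closed form holds, the forward step can be unrolled with the reparameterization trick. The symbols $\epsilon_{t-1}$, $\bar{\epsilon}_{t-2}$ and $\epsilon$ are standard normal variables introduced here for illustration; they do not appear on the slide. Merging two independent Gaussian noise terms gives a single Gaussian whose variance is the sum of the two variances, $\alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = 1-\alpha_t\alpha_{t-1}$.

```latex
\begin{aligned}
x_t &= \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1-\alpha_t}\, \epsilon_{t-1} \\
    &= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_t \alpha_{t-1}}\, \bar{\epsilon}_{t-2} \\
    &= \cdots \\
    &= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon
\end{aligned}
```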
Diffusion Model - Forward Process
Single MNIST sample, β = 0.2, T = 10
Swiss roll dataset, β = 0.05, T = 10
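The forward process with a constant β can be simulated in a few lines. The sketch below is a minimal illustration in plain Python, not the code behind the slide's figures: it uses a hypothetical scalar data point `x0 = 2.0` instead of MNIST or Swiss roll data, applies the step-by-step process with β = 0.05 and T = 10, and compares the result with a direct draw from the closed form $q(x_T \mid x_0)$. All function names are assumptions made for this example.

```python
import math
import random

def make_alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for t = 1..T."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_step(x_prev, beta, rng):
    """One draw from q(x_t | x_{t-1}) = N(sqrt(1 - beta) * x_{t-1}, beta)."""
    return math.sqrt(1.0 - beta) * x_prev + math.sqrt(beta) * rng.gauss(0.0, 1.0)

def sample_closed_form(x0, alpha_bar_t, rng):
    """Direct draw from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, 1 - alpha_bar_t)."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * rng.gauss(0.0, 1.0)

T, beta = 10, 0.05                 # constant schedule, as in the Swiss roll setting
betas = [beta] * T
alpha_bars = make_alpha_bars(betas)

rng = random.Random(0)
x0 = 2.0                           # hypothetical scalar data point
n = 20000

# Run the chain x_0 -> x_1 -> ... -> x_T many times.
iterative = []
for _ in range(n):
    x = x0
    for b in betas:
        x = forward_step(x, b, rng)
    iterative.append(x)

# Jump straight to timestep T with the closed form.
direct = [sample_closed_form(x0, alpha_bars[-1], rng) for _ in range(n)]

mean_iter = sum(iterative) / n
mean_direct = sum(direct) / n
var_iter = sum((v - mean_iter) ** 2 for v in iterative) / n
```

Both sample sets should agree in mean and variance up to Monte Carlo error, which is what makes training at a randomly chosen timestep cheap: there is no need to run the chain step by step.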
Diffusion Model - Posterior
Posterior
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I\right)$$
$$\text{where } \tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\, (1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t \quad \text{and} \quad \tilde{\beta}_t := \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\, \beta_t$$
By Bayes' rule:
$$q(x_{t-1} \mid x_t, x_0) = \frac{q(x_{t-1} \mid x_0)\, q(x_t \mid x_{t-1}, x_0)}{q(x_t \mid x_0)}$$
Forward Process
$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}, x_0) := \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\right)$$
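To make the posterior formulas concrete, the sketch below evaluates $\tilde{\mu}_t$ and $\tilde{\beta}_t$ for scalar inputs and cross-checks them against a direct product of the two Gaussians in Bayes' rule. It is an illustrative check under assumed names (`posterior_params`, `posterior_by_bayes`) and an assumed constant schedule, written in plain Python.

```python
import math

def posterior_params(x_t, x0, t, betas):
    """Mean and variance of q(x_{t-1} | x_t, x_0); t is 1-indexed and t >= 2."""
    alphas = [1.0 - b for b in betas]
    alpha_bar_t = math.prod(alphas[:t])
    alpha_bar_prev = math.prod(alphas[:t - 1])
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    coef_x0 = math.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)
    coef_xt = math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    mean = coef_x0 * x0 + coef_xt * x_t
    var = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t
    return mean, var

def posterior_by_bayes(x_t, x0, t, betas):
    """Same quantities from multiplying q(x_{t-1} | x_0) and q(x_t | x_{t-1})."""
    alphas = [1.0 - b for b in betas]
    alpha_bar_prev = math.prod(alphas[:t - 1])
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    prior_var = 1.0 - alpha_bar_prev              # variance of q(x_{t-1} | x_0)
    precision = 1.0 / prior_var + alpha_t / beta_t
    var = 1.0 / precision
    mean = var * (math.sqrt(alpha_bar_prev) * x0 / prior_var
                  + math.sqrt(alpha_t) * x_t / beta_t)
    return mean, var

betas = [0.05] * 10                                # hypothetical constant schedule
mean_formula, var_formula = posterior_params(0.3, 2.0, 5, betas)
mean_bayes, var_bayes = posterior_by_bayes(0.3, 2.0, 5, betas)
```

The two routes agree because the product of two Gaussians in $x_{t-1}$ is again Gaussian, with precisions adding. The posterior variance $\tilde{\beta}_t$ is always smaller than $\beta_t$, since knowing $x_0$ narrows down where $x_{t-1}$ can be.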
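Diffusion Model - Backward Process
Forward Process
$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}, x_0) := \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\right)$$
Posterior
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I\right)$$
$$\text{where } \tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\, (1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t \quad \text{and} \quad \tilde{\beta}_t := \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\, \beta_t$$
Backward Process (Neural Networks)
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$
Loss Function
$$D_{KL}\left(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\right)$$
[Figure: a U-net takes $x_t$ and $t$ as input and outputs $\mu_\theta(x_t, t)$ and $\Sigma_\theta(x_t, t)$]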
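Diffusion Model - Loss Function
Forward Process
$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}, x_0) := \mathcal{N}\left(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\right)$$
Posterior
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I\right)$$
$$\text{where } \tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1-\bar{\alpha}_t}\, x_0 + \frac{\sqrt{\alpha_t}\, (1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}\, x_t \quad \text{and} \quad \tilde{\beta}_t := \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\, \beta_t$$
Backward Process
$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)$$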
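Because both the posterior $q(x_{t-1} \mid x_t, x_0)$ and the model $p_\theta(x_{t-1} \mid x_t)$ are Gaussian, the KL term in the loss has a closed form. The sketch below shows it for scalar Gaussians; the function name `gaussian_kl` and the numeric inputs are assumptions for illustration only.

```python
import math

def gaussian_kl(mean_q, var_q, mean_p, var_p):
    """KL( N(mean_q, var_q) || N(mean_p, var_p) ) for scalar Gaussians."""
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mean_q - mean_p) ** 2) / var_p
                  - 1.0)

# Identical distributions: the divergence vanishes.
kl_same = gaussian_kl(0.3, 0.2, 0.3, 0.2)

# Equal variances: the KL reduces to (mean_q - mean_p)^2 / (2 * var).
kl_shifted = gaussian_kl(1.0, 0.5, 0.0, 0.5)
```

When the model variance is fixed to match the posterior variance, only the squared distance between the two means is left, so minimizing the KL amounts to regressing $\mu_\theta(x_t, t)$ onto $\tilde{\mu}_t(x_t, x_0)$.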
Diffusion Model - Output Samples
Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015.
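DDPM (Denoising Diffusion Probabilistic Models)
Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851.
- Proposed by Jonathan Ho et al. in the paper "Denoising diffusion probabilistic models"
Distribution of $x_t$ at an arbitrary timestep $t$ in closed form
$$q(x_t \mid x_0) = \mathcal{N}\left(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1-\bar{\alpha}_t) I\right), \quad \text{where } \alpha_t := 1-\beta_t \text{ and } \bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$$
$x_t$ is a linear combination of $x_0$ and $\epsilon$
$$x_t(x_0, \epsilon) = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$
Posterior
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I\right)$$
Predict $\epsilon$ (or $x_0$) at each step
Loss Function
$$L_{\text{simple}}(\theta) := \mathbb{E}_{t, x_0, \epsilon}\left[ \left\| \epsilon - \epsilon_\theta\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, t \right) \right\|^2 \right]$$
https://github.com/rosinality/denoising-diffusion-pytorch/blob/master/diffusion.py
[Code screenshot annotations: generate $\epsilon$, sample $x_t$, predict $\epsilon$]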
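The three annotated steps (generate $\epsilon$, sample $x_t$, predict $\epsilon$) can be sketched as a single Monte Carlo estimate of $L_{\text{simple}}$. This is a minimal plain-Python illustration, not the code from the linked repository: the schedule, the data point, and the two stand-in "networks" (`oracle_model`, `zero_model`) are hypothetical and exist only to show the mechanics.

```python
import math
import random

def q_sample(x0, alpha_bar_t, eps):
    """x_t(x_0, eps) = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, elementwise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]

def simple_loss(eps_model, x0, alpha_bars, rng):
    """One Monte Carlo sample of L_simple for a single data point x0."""
    t = rng.randrange(1, len(alpha_bars) + 1)      # uniform timestep in 1..T
    eps = [rng.gauss(0.0, 1.0) for _ in x0]        # generate eps
    x_t = q_sample(x0, alpha_bars[t - 1], eps)     # sample x_t
    eps_pred = eps_model(x_t, t)                   # predict eps
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred))

# Hypothetical constant schedule and its cumulative products.
betas = [0.05] * 10
alpha_bars, running = [], 1.0
for b in betas:
    running *= 1.0 - b
    alpha_bars.append(running)

x0 = [0.5, -1.0, 2.0]                              # hypothetical data point

def oracle_model(x_t, t):
    """Stand-in for a perfect network: inverts q_sample using the known x0."""
    a, b = math.sqrt(alpha_bars[t - 1]), math.sqrt(1.0 - alpha_bars[t - 1])
    return [(x - a * x0_i) / b for x, x0_i in zip(x_t, x0)]

def zero_model(x_t, t):
    """Untrained baseline that always predicts zero noise."""
    return [0.0 for _ in x_t]

loss_oracle = simple_loss(oracle_model, x0, alpha_bars, random.Random(0))
loss_zero = simple_loss(zero_model, x0, alpha_bars, random.Random(0))
```

In an actual training loop the model would be a neural network (a U-net on the slides), the loss would be averaged over a batch, and its gradient would update $\theta$. Here the oracle reaches a loss of essentially zero because it recovers the exact noise that was added.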
DDPM - Output Samples
https://github.com/yang-song/score_sde_pytorch/
Song, Yang, et al. "Score-based generative modeling through stochastic differential equations." arXiv preprint arXiv:2011.13456 (2020).
Diffusion vs. ...
"We emphasize that our objective Eq. (6) requires no adversarial training, no surrogate losses, and no sampling from the score network during training (e.g., unlike contrastive divergence). Also, it does not require $s_\theta(x, \sigma)$ to have special architectures in order to be tractable."
Song, Yang, and Stefano Ermon. "Generative modeling by estimating gradients of the data distribution." Advances in Neural Information Processing Systems 32 (2019).
"We present a novel way to define probabilistic models that allows:
1. extreme flexibility in model structure,
2. exact sampling,
3. easy multiplication with other distributions, e.g. in order to compute a posterior, and
4. the model log likelihood, and the probability of individual states, to be cheaply evaluated."
Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015.
Diffusion vs. ...
| | Auto-Regressive | VAE | Flow | GAN | Diffusion |
|---|---|---|---|---|---|
| Tractability | Good | Good | Good | Not Good (likelihood can't be evaluated) | Good |
| Flexibility | Not Good (causal structure, fixed distribution) | Not Good (dimension reduction, fixed distribution) | Not Good (invertible structure, fixed distribution) | Good | Good |
GLIDE


CLIP


DALL·E 2
GLIDE - Output Samples
Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021).
- Proposed by OpenAI in the paper "Glide: Towards photorealistic image generation and editing with text-guided diffusion models"
GLIDE - Overall Architecture
[Figure: GLIDE architecture. Diagram labels: prompt "A hedgehog using a calculator"; Text Encoder; Encoding Sequence (Batch, Channel, Length); U-net with Down Layers, Mid, and Up Layers; each Layer made of ResBlock, Attention, and Down or Up; conditioning labelled "AdaIn or Add" and "Attention"; input $x_t$, output $\epsilon$; sampling chain $x_T, \cdots, x_{t+1}, x_t, x_{t-1}, \cdots, x_0$]
GLIDE source: https://github.com/openai/glide-text2im
CLIP (Contrastive Language-Image Pre-training)
- Trains text and image encoders through self-supervised learning on (image, text) pairs
Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International Conference on Machine Learning. PMLR, 2021.
https://openai.com/blog/clip/
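The contrastive objective behind this training can be sketched as a symmetric cross-entropy over the image-text similarity matrix of a batch, where the matching pair sits on the diagonal. The plain-Python version below is an illustrative sketch: the helper names, the toy 2-D embeddings, and the fixed `temperature=0.07` are assumptions (in CLIP the temperature is a learned parameter, and the embeddings come from the two encoders).

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair k of images and texts is the positive."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarities scaled by the temperature: rows = images, columns = texts.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    loss_images = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    loss_texts = sum(cross_entropy([logits[r][k] for r in range(n)], k)
                     for k in range(n)) / n
    return 0.5 * (loss_images + loss_texts)

# Toy embeddings: two pairs that are perfectly aligned vs. deliberately swapped.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = clip_loss(embeddings, embeddings)
loss_swapped = clip_loss(embeddings, embeddings[::-1])
```

Aligned pairs give a loss near zero and swapped pairs a large one, which is the signal that pulls matching image and text embeddings together and pushes mismatched ones apart.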
DALL·E 2 - Output Samples
- Proposed by OpenAI in the paper "Hierarchical text-conditional image generation with clip latents"
Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022).
DALL·E 2 - Overall Architecture
Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022).
DALL·E 2 - Prior
- Generates a CLIP image embedding from the GLIDE text embedding and the CLIP text embedding
[Figure: Diffusion Backbone (Transformer Decoder). Input sequence: text encoding sequence (Batch, Channel, Length), CLIP text embedding, diffusion timestep, noised CLIP image embedding, final embedding. Output: unnoised CLIP image embedding]
Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022).
Guided Diffusion Sampling
Classifier-free Diffusion Guidance


CLIP Guidance
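Guided Diffusion Sampling
- Proposed in the paper "Diffusion Models Beat GANs on Image Synthesis"
- Trains an image classifier in addition to the diffusion model; during sampling, the gradient from the classifier steers the sampling process
Dhariwal, Prafulla, and Alexander Nichol. "Diffusion models beat gans on image synthesis." Advances in Neural Information Processing Systems 34 (2021): 8780-8794.
$$\hat{\mu}_\theta(x_t \mid y) = \mu_\theta(x_t \mid y) + s \cdot \Sigma_\theta(x_t \mid y)\, \nabla_{x_t} \log p_\phi(y \mid x_t)$$
Quoted via Nichol, Alex, et al.
Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021).
- $\hat{\mu}_\theta(x_t \mid y)$: modified mean
- $\mu_\theta(x_t \mid y)$: original mean
- $s$: guidance scale
- $\Sigma_\theta(x_t \mid y)$: original covariance
- $\nabla_{x_t} \log p_\phi(y \mid x_t)$: gradient passed from the classifier
Posterior
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I\right)$$
[Figure labels: $x_t$, $\mu_\theta(x_t \mid y)$, $\nabla_{x_t} \log p_\phi(y \mid x_t)$]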
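The mean shift can be sketched in a few lines. The example below is a hedged toy in plain Python: it assumes a diagonal covariance (stored as a list of variances) and swaps in a hypothetical logistic classifier $p_\phi(y \mid x_t) = \sigma(w \cdot x_t)$ so that the gradient has a closed form. The names `log_prob_grad` and `guided_mean` and all numbers are illustrative and do not come from the papers' code.

```python
import math

def log_prob_grad(x_t, w):
    """Gradient of log sigmoid(w . x_t) with respect to x_t (toy logistic classifier)."""
    z = sum(wi * xi for wi, xi in zip(w, x_t))
    sigmoid = 1.0 / (1.0 + math.exp(-z))
    return [(1.0 - sigmoid) * wi for wi in w]

def guided_mean(mean, variance, classifier_grad, scale):
    """mu_hat = mu + s * Sigma * grad log p_phi(y | x_t), with a diagonal Sigma."""
    return [m + scale * v * g for m, v, g in zip(mean, variance, classifier_grad)]

x_t = [0.0, 0.0]
w = [1.0, -1.0]                       # hypothetical classifier weights
grad = log_prob_grad(x_t, w)          # sigmoid(0) = 0.5, so grad = [0.5, -0.5]
shifted = guided_mean([0.0, 0.0], [0.1, 0.1], grad, scale=2.0)
```

The mean moves in the direction that increases the classifier's log-probability of the target class. A larger guidance scale $s$ pushes harder toward the class; with $s = 0$ the original mean is returned unchanged.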
Guided Diffusion Sampling
Dhariwal, Prafulla, and Alexander Nichol. "Diffusion models beat gans on image synthesis." Advances in Neural Information Processing Systems 34 (2021): 8780-8794.
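Classifier-free guidance
- Proposed by Jonathan Ho and Tim Salimans in the paper "Classifier-free diffusion guidance"
- Enables guided sampling with the diffusion model alone, without training an additional classifier
Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance." NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. 2021.
$$\hat{\epsilon}_\theta(x_t \mid y) = \epsilon_\theta(x_t \mid y) + s \cdot \left( \epsilon_\theta(x_t \mid y) - \epsilon_\theta(x_t \mid \emptyset) \right)$$
Quoted via Nichol, Alex, et al.
Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021).
- $\hat{\epsilon}_\theta(x_t \mid y)$: modified score
- $\epsilon_\theta(x_t \mid y)$: conditional predicted score
- $\epsilon_\theta(x_t \mid \emptyset)$: unconditional predicted score
- $s$: guidance scale
$$\epsilon_\theta(x_t \mid y) - \epsilon_\theta(x_t \mid \emptyset) \approx -\sigma_t \nabla_{x_t} \log p^i(y \mid x_t)$$
[Figure: Diffusion Backbone (U-net) takes $x_t$, $t$, and $y$ or $\emptyset$ as input and outputs $\epsilon_\theta$]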
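The combination of the conditional and unconditional predictions can be sketched as follows. This plain-Python example is only an illustration: `toy_eps_model` is a hypothetical stand-in for the shared U-net, and passing `None` as the label plays the role of the empty label $\emptyset$.

```python
def classifier_free_eps(eps_model, x_t, t, y, scale):
    """eps_hat = eps(x_t | y) + s * (eps(x_t | y) - eps(x_t | empty)), elementwise."""
    eps_cond = eps_model(x_t, t, y)
    eps_uncond = eps_model(x_t, t, None)   # None stands for the empty label
    return [c + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

def toy_eps_model(x_t, t, y):
    """Hypothetical stand-in network: shifts its prediction when a label is given."""
    shift = 0.0 if y is None else float(y)
    return [x + shift for x in x_t]

# Conditional prediction [2, 3], unconditional prediction [1, 2], scale 2.
eps_hat = classifier_free_eps(toy_eps_model, [1.0, 2.0], t=3, y=1, scale=2.0)
```

A single network serves both roles: during training the label is randomly replaced by the empty label so that the same weights learn the conditional and the unconditional prediction, and at sampling time the two outputs are extrapolated away from the unconditional one.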
Classifier-free guidance
Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance." NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. 2021.
CLIP Guidance
- Similar to classifier guidance, uses a CLIP model to steer the sampling steps
Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021).
$$\hat{\mu}_\theta(x_t \mid c) = \mu_\theta(x_t \mid c) + s \cdot \Sigma_\theta(x_t \mid c)\, \nabla_{x_t} \left( f(x_t) \cdot g(c) \right)$$
- $f(x_t)$: CLIP image encoding
- $g(c)$: CLIP text encoding
[Figure labels: $x_t$, $\mu_\theta(x_t \mid c)$, $\nabla_{x_t}(f(x_t) \cdot g(c))$]
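To illustrate the CLIP-guided update, the sketch below replaces the classifier log-probability with the dot product $f(x_t) \cdot g(c)$. It is a toy under several assumptions: the image encoder is a hypothetical linear map, the text embedding is a fixed vector, the covariance is diagonal, and the gradient is taken by finite differences in plain Python, whereas practical implementations would use automatic differentiation through the real CLIP encoders.

```python
def numerical_grad(fn, x, h=1e-5):
    """Central finite-difference gradient of a scalar function fn at the point x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((fn(up) - fn(down)) / (2.0 * h))
    return grad

def clip_guided_mean(mean, variance, x_t, image_encoder, text_embedding, scale):
    """mu_hat = mu + s * Sigma * grad_x (f(x_t) . g(c)), with a diagonal Sigma."""
    def similarity(x):
        return sum(a * b for a, b in zip(image_encoder(x), text_embedding))
    grad = numerical_grad(similarity, x_t)
    return [m + scale * v * g for m, v, g in zip(mean, variance, grad)]

def toy_image_encoder(x):
    """Hypothetical linear stand-in for the CLIP image encoder f."""
    return [x[0] + x[1], x[0] - x[1]]

text_embedding = [1.0, 0.0]            # hypothetical stand-in for g(c)
mu_hat = clip_guided_mean([0.0, 0.0], [1.0, 1.0], [0.3, 0.7],
                          toy_image_encoder, text_embedding, scale=2.0)
```

With this toy encoder the similarity is $x_1 + x_2$, so its gradient is $(1, 1)$ and the mean moves by $s \cdot \Sigma \cdot (1, 1)$, toward images whose encoding aligns better with the text encoding.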
Classifier-free Guidance vs. CLIP Guidance in GLIDE
Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021).
Thank you :)

More Related Content

What's hot

GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
ArangoDB Database
 
CVPR 2022 Tutorial에 대한 쉽고 상세한 Diffusion Probabilistic Model
CVPR 2022 Tutorial에 대한 쉽고 상세한 Diffusion Probabilistic ModelCVPR 2022 Tutorial에 대한 쉽고 상세한 Diffusion Probabilistic Model
CVPR 2022 Tutorial에 대한 쉽고 상세한 Diffusion Probabilistic Model
jaypi Ko
 
End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0
taeseon ryu
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
홍배 김
 
Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers
leopauly
 
PR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic Models
Hyeongmin Lee
 
007 20151214 Deep Unsupervised Learning using Nonequlibrium Thermodynamics
007 20151214 Deep Unsupervised Learning using Nonequlibrium Thermodynamics007 20151214 Deep Unsupervised Learning using Nonequlibrium Thermodynamics
007 20151214 Deep Unsupervised Learning using Nonequlibrium Thermodynamics
Ha Phuong
 
Cs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN ArchitectureCs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN Architecture
Yanbin Kong
 
상상을 현실로 만드는, 이미지 생성 모델을 위한 엔지니어링
상상을 현실로 만드는, 이미지 생성 모델을 위한 엔지니어링상상을 현실로 만드는, 이미지 생성 모델을 위한 엔지니어링
상상을 현실로 만드는, 이미지 생성 모델을 위한 엔지니어링
Taehoon Kim
 
論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models
Seiya Tokui
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
NAVER Engineering
 
Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)
홍배 김
 
알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder
홍배 김
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
Dongmin Choi
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
taeseon ryu
 
[DL輪読会]Pyramid Stereo Matching Network
[DL輪読会]Pyramid Stereo Matching Network[DL輪読会]Pyramid Stereo Matching Network
[DL輪読会]Pyramid Stereo Matching Network
Deep Learning JP
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
Yongho Ha
 
Brief intro : Invariance and Equivariance
Brief intro : Invariance and EquivarianceBrief intro : Invariance and Equivariance
Brief intro : Invariance and Equivariance
홍배 김
 
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
Seongyun Byeon
 
Towards Light-weight and Real-time Line Segment Detection
Towards Light-weight and Real-time Line Segment DetectionTowards Light-weight and Real-time Line Segment Detection
Towards Light-weight and Real-time Line Segment Detection
Byung Soo Ko
 

What's hot (20)

GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 
CVPR 2022 Tutorial에 대한 쉽고 상세한 Diffusion Probabilistic Model
CVPR 2022 Tutorial에 대한 쉽고 상세한 Diffusion Probabilistic ModelCVPR 2022 Tutorial에 대한 쉽고 상세한 Diffusion Probabilistic Model
CVPR 2022 Tutorial에 대한 쉽고 상세한 Diffusion Probabilistic Model
 
End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0End to-end semi-supervised object detection with soft teacher ver.1.0
End to-end semi-supervised object detection with soft teacher ver.1.0
 
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
InfoGAN: Interpretable Representation Learning by Information Maximizing Gene...
 
Introduction to Visual transformers
Introduction to Visual transformers Introduction to Visual transformers
Introduction to Visual transformers
 
PR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic Models
 
007 20151214 Deep Unsupervised Learning using Nonequlibrium Thermodynamics
007 20151214 Deep Unsupervised Learning using Nonequlibrium Thermodynamics007 20151214 Deep Unsupervised Learning using Nonequlibrium Thermodynamics
007 20151214 Deep Unsupervised Learning using Nonequlibrium Thermodynamics
 
Cs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN ArchitectureCs231n 2017 lecture9 CNN Architecture
Cs231n 2017 lecture9 CNN Architecture
 
상상을 현실로 만드는, 이미지 생성 모델을 위한 엔지니어링
상상을 현실로 만드는, 이미지 생성 모델을 위한 엔지니어링상상을 현실로 만드는, 이미지 생성 모델을 위한 엔지니어링
상상을 현실로 만드는, 이미지 생성 모델을 위한 엔지니어링
 
論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)Focal loss의 응용(Detection & Classification)
Focal loss의 응용(Detection & Classification)
 
알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder알기쉬운 Variational autoencoder
알기쉬운 Variational autoencoder
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
[DL輪読会]Pyramid Stereo Matching Network
[DL輪読会]Pyramid Stereo Matching Network[DL輪読会]Pyramid Stereo Matching Network
[DL輪読会]Pyramid Stereo Matching Network
 
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
자습해도 모르겠던 딥러닝, 머리속에 인스톨 시켜드립니다.
 
Brief intro : Invariance and Equivariance
Brief intro : Invariance and EquivarianceBrief intro : Invariance and Equivariance
Brief intro : Invariance and Equivariance
 
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
[MLOps KR 행사] MLOps 춘추 전국 시대 정리(210605)
 
Towards Light-weight and Real-time Line Segment Detection
Towards Light-weight and Real-time Line Segment DetectionTowards Light-weight and Real-time Line Segment Detection
Towards Light-weight and Real-time Line Segment Detection
 

Similar to diffusion 모델부터 DALLE2까지.pdf

Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Umberto Picchini
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
Junghyun Lee
 
Neural Networks with Anticipation: Problems and Prospects
Neural Networks with Anticipation: Problems and ProspectsNeural Networks with Anticipation: Problems and Prospects
Neural Networks with Anticipation: Problems and Prospects
SSA KPI
 
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural NetworksQuantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Valentin De Bortoli
 
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Paris Women in Machine Learning and Data Science
 
Metaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical AnalysisMetaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical Analysis
Xin-She Yang
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering Networks
Tomaso Aste
 
Lausanne 2019 #2
Lausanne 2019 #2Lausanne 2019 #2
Lausanne 2019 #2
Arthur Charpentier
 
Cyberinfrastructure for Einstein's Equations and Beyond
Cyberinfrastructure for Einstein's Equations and BeyondCyberinfrastructure for Einstein's Equations and Beyond
Cyberinfrastructure for Einstein's Equations and Beyond
University of Illinois at Urbana-Champaign
 
Lecture17 xing fei-fei
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-fei
Tianlu Wang
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
SEMINARGROOT
 
2022 03 22_蔡煒俊_u-net_convolutional_networks_for_biomedical_image_segmentation
2022 03 22_蔡煒俊_u-net_convolutional_networks_for_biomedical_image_segmentation2022 03 22_蔡煒俊_u-net_convolutional_networks_for_biomedical_image_segmentation
2022 03 22_蔡煒俊_u-net_convolutional_networks_for_biomedical_image_segmentation
KevinTsai67
 
Artificial Neural Networks for NIU
Artificial Neural Networks for NIUArtificial Neural Networks for NIU
Artificial Neural Networks for NIU
Prof. Neeta Awasthy
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms
Xin-She Yang
 
On Continuum Limits of Markov Chains and Network Modeling
On Continuum Limits of Markov Chains and  Network ModelingOn Continuum Limits of Markov Chains and  Network Modeling
On Continuum Limits of Markov Chains and Network Modeling
Yang Zhang
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Fabian Pedregosa
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
Frank Nielsen
 
Csc446: Pattern Recognition
Csc446: Pattern Recognition Csc446: Pattern Recognition
Csc446: Pattern Recognition
Mostafa G. M. Mostafa
 

Similar to diffusion 모델부터 DALLE2까지.pdf (20)

Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
 
block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
 
Neural Networks with Anticipation: Problems and Prospects
Neural Networks with Anticipation: Problems and ProspectsNeural Networks with Anticipation: Problems and Prospects
Neural Networks with Anticipation: Problems and Prospects
 
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural NetworksQuantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
 
SASA 2016
SASA 2016SASA 2016
SASA 2016
 
Final Report-1-(1)
Final Report-1-(1)Final Report-1-(1)
Final Report-1-(1)
 
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
 
Metaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical AnalysisMetaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical Analysis
 
Probabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering NetworksProbabilistic Modelling with Information Filtering Networks
Probabilistic Modelling with Information Filtering Networks
 
Lausanne 2019 #2
Lausanne 2019 #2Lausanne 2019 #2
Lausanne 2019 #2
 
Cyberinfrastructure for Einstein's Equations and Beyond
Cyberinfrastructure for Einstein's Equations and BeyondCyberinfrastructure for Einstein's Equations and Beyond
Cyberinfrastructure for Einstein's Equations and Beyond
 
Lecture17 xing fei-fei
Lecture17 xing fei-feiLecture17 xing fei-fei
Lecture17 xing fei-fei
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
 
2022 03 22_蔡煒俊_u-net_convolutional_networks_for_biomedical_image_segmentation
2022 03 22_蔡煒俊_u-net_convolutional_networks_for_biomedical_image_segmentation2022 03 22_蔡煒俊_u-net_convolutional_networks_for_biomedical_image_segmentation
2022 03 22_蔡煒俊_u-net_convolutional_networks_for_biomedical_image_segmentation
 
Artificial Neural Networks for NIU
Artificial Neural Networks for NIUArtificial Neural Networks for NIU
Artificial Neural Networks for NIU
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms
 
On Continuum Limits of Markov Chains and Network Modeling
On Continuum Limits of Markov Chains and  Network ModelingOn Continuum Limits of Markov Chains and  Network Modeling
On Continuum Limits of Markov Chains and Network Modeling
 
Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
 
Csc446: Pattern Recognition
Csc446: Pattern Recognition Csc446: Pattern Recognition
Csc446: Pattern Recognition
 

More from 수철 박

Flow based generative models
Flow based generative modelsFlow based generative models
Flow based generative models
수철 박
 
VQ-VAE
VQ-VAEVQ-VAE
VQ-VAE
수철 박
 
Gmm to vgmm
Gmm to vgmmGmm to vgmm
Gmm to vgmm
수철 박
 
A universal music translation network
A universal music translation networkA universal music translation network
A universal music translation network
수철 박
 
Kernel Method
Kernel MethodKernel Method
Kernel Method
수철 박
 
R.T.Bach
R.T.BachR.T.Bach
R.T.Bach
수철 박
 

More from 수철 박 (6)

Flow based generative models
Flow based generative modelsFlow based generative models
Flow based generative models
 
VQ-VAE
VQ-VAEVQ-VAE
VQ-VAE
 
Gmm to vgmm
Gmm to vgmmGmm to vgmm
Gmm to vgmm
 
A universal music translation network
A universal music translation networkA universal music translation network
A universal music translation network
 
Kernel Method
Kernel MethodKernel Method
Kernel Method
 
R.T.Bach
R.T.BachR.T.Bach
R.T.Bach
 

diffusion 모델부터 DALLE2까지.pdf

  • 2. STE @ 
 - GSEP: Music Source Separation 
 - GTS: Music & Lyrics Synchronization 
 - ? : Sound Generative Models 
 박수철 @ 
 - Text-To-Speech 
 - Voice Cloning 
 - Voice Conversion 
 박수철 @ 
 - Diffusion/Score-based models 
 - Speech recognition and speech synthesis 
 - Everything about Tacotron 
 - Deep generative models 
 GSEP, GTS demo 
 JTBC election results broadcast
  • 3. From Diffusion Models to DALL·E 2 
 - Diffusion Model: Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015. 
 - DDPM: Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851. 
 - CLIP: Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International Conference on Machine Learning. PMLR, 2021. 
 - GLIDE: Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021). 
 - DALL·E 2: Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022). 
 - Guided Diffusion Sampling: Dhariwal, Prafulla, and Alexander Nichol. "Diffusion models beat gans on image synthesis." Advances in Neural Information Processing Systems 34 (2021): 8780-8794. 
 - Classifier-free diffusion guidance: Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance." NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. 2021.
  • 5. Generative Model - A generative model learns the probability distribution of a dataset and samples from it. 
 Auto-Regressive Model: $p_\theta(x) = \prod_{i=1}^{n^2} p_\theta(x_i \mid x_1, \dots, x_{i-1})$ 
 Van den Oord, Aaron, et al. "Conditional image generation with pixelcnn decoders." Advances in neural information processing systems 29 (2016). 
 Variational Auto-Encoder: $p_\theta(x) = \int p_\theta(x \mid z)\, p_\theta(z)\, dz$ 
 https://en.wikipedia.org/wiki/Variational_autoencoder
  • 6. Generative Model - A generative model learns the probability distribution of a dataset and samples from it. 
 Flow-based Model: $p_\theta(x) = p_\theta(z)\, \left|\det(dz/dx)\right|$ 
 Lil'Log, Flow-based Deep Generative Models https://lilianweng.github.io/posts/2018-10-13-flow-models/ 
 Generative Adversarial Networks: $\mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ 
 Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems 27 (2014).
  • 8. Diffusion Model - Proposed in the paper Deep unsupervised learning using nonequilibrium thermodynamics by Jascha Sohl-Dickstein et al. Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015. Diffusion
 https://en.wikipedia.org/wiki/Diffusion (Flipped)
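  • 9. Diffusion Model Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015. 
 Forward Process: $q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1})$, $q(x_t \mid x_{t-1}, x_0) := \mathcal{N}(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t I)$ 
 Posterior: $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I)$, where $\tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t$ and $\tilde{\beta}_t := \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t$ 
 Backward Process (Neural Networks): $p_\theta(x_{t-1} \mid x_t) := \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$ 
 Loss Function: $D_{KL}(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t))$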
  • 10. Diffusion Model - Forward Process Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015. 
 Forward Process ($x_{t-1} \to x_t$): $q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1})$, $q(x_t \mid x_{t-1}, x_0) := \mathcal{N}(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t I)$ 
 Distribution of $x_t$ at an arbitrary timestep $t$ in closed form: $q(x_t \mid x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t) I)$, where $\alpha_t := 1 - \beta_t$ and $\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$ 
 See Lil'Log for the derivation: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
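The closed form above means $x_t$ can be drawn from $x_0$ in one step instead of applying the forward kernel $t$ times. A minimal sketch in plain Python, assuming a constant schedule ($\beta = 0.05$, $T = 10$, the Swiss roll setting from the next slide) and data stored as lists of floats; the function names are illustrative and not taken from the slides:

```python
import math
import random

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s), for t = 1..T."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) in one step using the closed form (t is 1-indexed)."""
    a = abar[t - 1]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]

def q_sample_iterative(x0, t, betas, rng):
    """Draw x_t by applying the one-step kernel q(x_s | x_{s-1}) for s = 1..t."""
    x = list(x0)
    for beta in betas[:t]:
        x = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0) for v in x]
    return x

betas = [0.05] * 10              # assumed constant schedule
abar = alpha_bars(betas)
rng = random.Random(0)
x_t = q_sample([1.0, -2.0], t=10, abar=abar, rng=rng)
```

Both samplers target the same distribution; the one-step version is what makes training at a randomly chosen timestep cheap.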
  • 11. Diffusion Model - Forward Process Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015. 
 MNIST single data, β = 0.2, T = 10 
 Swiss roll dataset, β = 0.05, T = 10
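  • 12. Diffusion Model - Posterior Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015. 
 Posterior: $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I)$, where $\tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t$ and $\tilde{\beta}_t := \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t$ 
 $q(x_{t-1} \mid x_t, x_0) = \frac{q(x_{t-1} \mid x_0)\, q(x_t \mid x_{t-1}, x_0)}{q(x_t \mid x_0)}$ by Bayes' Rule 
 Forward Process: $q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1})$, $q(x_t \mid x_{t-1}, x_0) := \mathcal{N}(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t I)$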
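The posterior mean and variance are simple functions of the schedule. A rough sketch for scalar $x_0$ and $x_t$, assuming a list `betas` holding $\beta_1, \dots, \beta_T$; the function name `posterior_params` is hypothetical:

```python
import math

def posterior_params(x0, xt, t, betas):
    """Mean and variance of q(x_{t-1} | x_t, x_0) for scalar inputs (t is 1-indexed)."""
    alphas = [1.0 - b for b in betas]
    abar_t = math.prod(alphas[:t])          # alpha_bar_t
    abar_prev = math.prod(alphas[:t - 1])   # alpha_bar_{t-1}; equals 1 when t == 1
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    coef_x0 = math.sqrt(abar_prev) * beta_t / (1.0 - abar_t)
    coef_xt = math.sqrt(alpha_t) * (1.0 - abar_prev) / (1.0 - abar_t)
    mean = coef_x0 * x0 + coef_xt * xt
    var = (1.0 - abar_prev) / (1.0 - abar_t) * beta_t
    return mean, var

# Example with the MNIST setting from the earlier slide (beta = 0.2, T = 10).
mean, var = posterior_params(x0=1.0, xt=0.3, t=5, betas=[0.2] * 10)
```

The same values follow from multiplying the two Gaussians in the Bayes' rule expression: the posterior precision is $\alpha_t / \beta_t + 1 / (1 - \bar{\alpha}_{t-1})$.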
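  • 13. Diffusion Model - Backward Process Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015. 
 Forward Process: $q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1})$, $q(x_t \mid x_{t-1}, x_0) := \mathcal{N}(x_t; \sqrt{1 - \beta_t}\, x_{t-1}, \beta_t I)$ 
 Posterior: $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I)$, where $\tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t$ and $\tilde{\beta}_t := \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t$ 
 Backward Process (Neural Networks): $p_\theta(x_{t-1} \mid x_t) := \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$ 
 U-net: inputs $x_t$ and $t$; outputs $\mu_\theta(x_t, t)$ and $\Sigma_\theta(x_t, t)$ 
 Loss Function: $D_{KL}(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t))$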
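Because both the posterior and the backward process are Gaussian, the KL term in the loss has a closed form. A small sketch for the one-dimensional case (for a diagonal covariance the per-dimension values would be summed); the function name is illustrative:

```python
import math

def kl_gaussian(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for univariate Gaussians."""
    return 0.5 * (math.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

# With equal variances the KL reduces to a scaled squared distance between the means.
same_var = kl_gaussian(1.0, 2.0, 3.0, 2.0)   # (1 - 3)^2 / (2 * 2)
```

When the two variances are fixed to the same value, minimizing this KL is equivalent to regressing the network mean onto the posterior mean.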
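  • 14. Diffusion Model - Loss Function Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015. 
 Forward Process Posterior: $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I)$, where $\tilde{\mu}_t(x_t, x_0) := \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t$ and $\tilde{\beta}_t := \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t$ 
 Backward Process: $p_\theta(x_{t-1} \mid x_t) := \mathcal{N}(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$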
  • 15. Diffusion Model - Output Samples Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015.
  • 16. DDPM (Denoising Diffusion Probabilistic Models) Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851. 
 - Proposed in the paper Denoising diffusion probabilistic models by Jonathan Ho et al. 
 Distribution of $x_t$ at an arbitrary timestep $t$ in closed form: $q(x_t \mid x_0) = \mathcal{N}(x_t; \sqrt{\bar{\alpha}_t}\, x_0, (1 - \bar{\alpha}_t) I)$, where $\alpha_t := 1 - \beta_t$ and $\bar{\alpha}_t := \prod_{s=1}^{t} \alpha_s$ 
 $x_t$ is a linear combination of $x_0$ and $\epsilon$: $x_t(x_0, \epsilon) = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$ 
 Posterior: $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I)$ 
 Loss Function: $L_{\mathrm{simple}}(\theta) := \mathbb{E}_{t, x_0, \epsilon}\left[ \left\| \epsilon - \epsilon_\theta\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t \right) \right\|^2 \right]$ (generate $\epsilon$, sample $x_t$, predict $\epsilon$) 
 Predict $\epsilon$ (or $x_0$) at each step 
 https://github.com/rosinality/denoising-diffusion-pytorch/blob/master/diffusion.py
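The three steps annotated on the loss (generate $\epsilon$, sample $x_t$, predict $\epsilon$) can be sketched as a Monte Carlo estimate of $L_{\mathrm{simple}}$. This is a toy illustration in plain Python, not the linked PyTorch implementation: `eps_model` stands in for the U-net, and the schedule, the data, and both toy predictors are assumptions made for the example.

```python
import math
import random

T = 10
abar = [0.95 ** t for t in range(1, T + 1)]   # alpha_bar_t for an assumed constant beta = 0.05
data = [[1.0, -1.0]]                          # toy dataset with a single point

def l_simple_estimate(eps_model, data, abar, n_samples, rng):
    """Monte Carlo estimate of L_simple for a callable eps_model(x_t, t)."""
    total = 0.0
    for _ in range(n_samples):
        x0 = rng.choice(data)                        # x_0 from the data
        t = rng.randint(1, len(abar))                # uniform timestep in 1..T
        a = abar[t - 1]
        eps = [rng.gauss(0.0, 1.0) for _ in x0]      # generate epsilon
        xt = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, eps)]  # sample x_t
        pred = eps_model(xt, t)                      # predict epsilon
        total += sum((e - p) ** 2 for e, p in zip(eps, pred))
    return total / n_samples

def zero_model(xt, t):
    """Baseline that always predicts zero noise."""
    return [0.0] * len(xt)

def oracle(xt, t):
    """Exact noise for the single-point toy dataset: solves x_t for epsilon."""
    a = abar[t - 1]
    return [(v - math.sqrt(a) * x) / math.sqrt(1.0 - a) for v, x in zip(xt, data[0])]

rng = random.Random(0)
loss_zero = l_simple_estimate(zero_model, data, abar, 5000, rng)
loss_oracle = l_simple_estimate(oracle, data, abar, 200, rng)
```

The zero predictor scores roughly the data dimension (here 2), since $\mathbb{E}\|\epsilon\|^2 = d$, while the oracle scores essentially zero; a trained network falls between the two.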
  • 18. DDPM - Output Samples Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in Neural Information Processing Systems 33 (2020): 6840-6851.
  • 19. DDPM - Output Samples https://github.com/yang-song/score_sde_pytorch/ 
 Song, Yang, et al. "Score-based generative modeling through stochastic differential equations." arXiv preprint arXiv:2011.13456 (2020).
  • 20. Diffusion vs. ... "We emphasize that our objective Eq. (6) requires no adversarial training, no surrogate losses, and no sampling from the score network during training (e.g., unlike contrastive divergence). Also, it does not require $s_\theta(x, \sigma)$ to have special architectures in order to be tractable." 
 Song, Yang, and Stefano Ermon. "Generative modeling by estimating gradients of the data distribution." Advances in Neural Information Processing Systems 32 (2019). 
 "We present a novel way to define probabilistic models that allows: 1. extreme flexibility in model structure, 2. exact sampling, 3. easy multiplication with other distributions, e.g. in order to compute a posterior, and 4. the model log likelihood, and the probability of individual states, to be cheaply evaluated." 
 Sohl-Dickstein, Jascha, et al. "Deep unsupervised learning using nonequilibrium thermodynamics." International Conference on Machine Learning. PMLR, 2015.
  • 21. Diffusion vs. ... 
 Model: Tractability / Flexibility 
 Auto-Regressive: Good / Not Good (causal structure, fixed distribution) 
 VAE: Good / Not Good (dimension reduction, fixed distribution) 
 Flow: Good / Not Good (invertible structure, fixed distribution) 
 GAN: Not Good (likelihood can't be evaluated) / Good 
 Diffusion: Good / Good
  • 23. GLIDE - Output Samples Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021). - Proposed by OpenAI in the paper Glide: Towards photorealistic image generation and editing with text-guided diffusion models
  • 24. GLIDE - Overall Architecture Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021). 
 [Figure: GLIDE architecture. Labels: prompt "A hedgehog using a calculator"; Text Encoder; Encoding Sequence (Batch, Channel, Length); U-net with Down Layers, Mid, and Up Layers; each U-net Layer contains ResBlock, Attention, and Down or Up, with conditioning applied by AdaIn or Add and by Attention; the U-net takes $x_t$ and outputs $\epsilon$; sampling chain $x_T, \dots, x_{t+1}, x_t, x_{t-1}, \dots, x_0$.] 
 GLIDE source: https://github.com/openai/glide-text2im
  • 25. CLIP (Contrastive Language-Image Pre-training) - Trains the text and image encoders by self-supervised learning on (image, text) pair data. Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International Conference on Machine Learning. PMLR, 2021.
  • 27. CLIP (Contrastive Language-Image Pre-training) https://openai.com/blog/clip/
  • 28. DALL·E 2 - Output Samples - Proposed by OpenAI in the paper Hierarchical text-conditional image generation with clip latents. Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022).
  • 29. DALL·E 2 - Overall Architecture Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022).
  • 30. DALL·E 2 - Prior - Generates a CLIP image embedding from the GLIDE text embedding and the CLIP text embedding. 
 [Figure: Diffusion Backbone (Transformer Decoder). Input sequence: Text encoding sequence (Batch, Channel, Length), CLIP text embedding, Diffusion timestep, Noised CLIP image embedding, Final embedding. Output: Unnoised CLIP image embedding.] 
 Ramesh, Aditya, et al. "Hierarchical text-conditional image generation with clip latents." arXiv preprint arXiv:2204.06125 (2022).
  • 31. Guided Diffusion Sampling Classifier-free Diffusion Guidance 
 CLIP Guidance
  • 32. Guided Diffusion Sampling 
 - Proposed in the paper Diffusion Models Beat GANs on Image Synthesis 
 - An additional image classifier is trained alongside the diffusion model, and during sampling the gradient from the classifier is used to guide the sampling. 
 Dhariwal, Prafulla, and Alexander Nichol. "Diffusion models beat gans on image synthesis." Advances in Neural Information Processing Systems 34 (2021): 8780-8794. 
 $\hat{\mu}_\theta(x_t \mid y) = \mu_\theta(x_t \mid y) + s \cdot \Sigma_\theta(x_t \mid y)\, \nabla_{x_t} \log p_\phi(y \mid x_t)$ (requoted from Nichol, Alex, et al.) 
 where $\hat{\mu}_\theta(x_t \mid y)$ is the modified mean, $s$ the guidance scale, $\mu_\theta(x_t \mid y)$ the original mean, $\Sigma_\theta(x_t \mid y)$ the original covariance, and $\nabla_{x_t} \log p_\phi(y \mid x_t)$ the gradient passed from the classifier. 
 Posterior: $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t I)$ 
 Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021).
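The mean shift can be illustrated with a toy stand-in for the classifier. The sketch below assumes a diagonal covariance and a hypothetical logistic classifier $p(y = 1 \mid x) = \mathrm{sigmoid}(w \cdot x)$, whose log-probability gradient is $(1 - \mathrm{sigmoid}(w \cdot x))\, w$; a real setup would use a trained noise-aware image classifier and automatic differentiation instead.

```python
import math

def grad_log_sigmoid_classifier(x, w):
    """Gradient w.r.t. x of log p(y=1 | x) = log sigmoid(w . x) (toy classifier)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    sig = 1.0 / (1.0 + math.exp(-z))
    return [(1.0 - sig) * wi for wi in w]

def classifier_guided_mean(mu, sigma_diag, grad_log_p, s):
    """mu_hat = mu + s * Sigma * grad, with Sigma given by its diagonal."""
    return [m + s * v * g for m, v, g in zip(mu, sigma_diag, grad_log_p)]

x_t = [0.0, 0.0]
grad = grad_log_sigmoid_classifier(x_t, w=[2.0, -1.0])
mu_hat = classifier_guided_mean(mu=[0.0, 0.0], sigma_diag=[0.1, 0.1], grad_log_p=grad, s=3.0)
```

Setting `s = 0` leaves the mean unchanged; larger values push each sampling step further toward inputs the classifier assigns to class $y$.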
  • 33. Guided Diffusion Sampling Dhariwal, Prafulla, and Alexander Nichol. "Diffusion models beat gans on image synthesis." Advances in Neural Information Processing Systems 34 (2021): 8780-8794.
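  • 34. Classifier-free guidance 
 - Proposed in the paper Classifier-free diffusion guidance by Jonathan Ho and Tim Salimans 
 - Enables guided sampling with the diffusion model alone, without training an additional classifier. 
 Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance." NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. 2021. 
 Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021). 
 $\hat{\epsilon}_\theta(x_t \mid y) = \epsilon_\theta(x_t \mid y) + s \cdot (\epsilon_\theta(x_t \mid y) - \epsilon_\theta(x_t \mid \emptyset))$ (requoted from Nichol, Alex, et al.) 
 where $\hat{\epsilon}_\theta(x_t \mid y)$ is the modified score, $s$ the guidance scale, $\epsilon_\theta(x_t \mid y)$ the conditional predicted score, and $\epsilon_\theta(x_t \mid \emptyset)$ the unconditional predicted score. 
 $\epsilon_\theta(x_t \mid y) - \epsilon_\theta(x_t \mid \emptyset) \approx -\sigma_t \nabla_{x_t} \log p^i(y \mid x_t)$ 
 Diffusion Backbone (U-net): inputs $x_t$, $t$, and $y$ or $\emptyset$; outputs $\epsilon_\theta(x_t \mid y)$ or $\epsilon_\theta(x_t \mid \emptyset)$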
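At sampling time the rule only needs two forward passes of the same network, one with the condition and one with the null condition. A minimal sketch of the combination step exactly as written on the slide, with the two predictions passed in as plain lists (the function name is illustrative):

```python
def cfg_epsilon(eps_cond, eps_uncond, s):
    """eps_hat = eps_cond + s * (eps_cond - eps_uncond), applied elementwise."""
    return [c + s * (c - u) for c, u in zip(eps_cond, eps_uncond)]

eps_cond = [1.0, 2.0]      # eps_theta(x_t | y), assumed given by the network
eps_uncond = [0.5, 3.0]    # eps_theta(x_t | null condition), from the same network
eps_hat = cfg_epsilon(eps_cond, eps_uncond, s=2.0)   # [2.0, 0.0]
```

With `s = 0` this reduces to the plain conditional prediction; increasing `s` extrapolates further away from the unconditional prediction. The expression is algebraically the same as `(1 + s) * eps_cond - s * eps_uncond`.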
  • 35. Classifier-free guidance Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance." NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. 2021.
  • 36. CLIP Guidance - Similar to classifier-based guidance, the CLIP model is used to guide the sampling step. 
 Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021). 
 $\hat{\mu}_\theta(x_t \mid c) = \mu_\theta(x_t \mid c) + s \cdot \Sigma_\theta(x_t \mid c)\, \nabla_{x_t}(f(x_t) \cdot g(c))$ 
 where $f(x_t)$ is the CLIP image encoding, $g(c)$ the CLIP text encoding, and $\mu_\theta(x_t \mid c)$ the mean that is shifted along $\nabla_{x_t}(f(x_t) \cdot g(c))$.
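To make the gradient term concrete, the sketch below replaces the CLIP image encoder with a hypothetical linear map $f(x) = W x$, for which $\nabla_x (f(x) \cdot g(c)) = W^\top g(c)$. This is only a toy assumption: the real CLIP encoders are deep networks, so the gradient would come from automatic differentiation.

```python
def dot_score(x, W, text_emb):
    """f(x) . g(c) with the toy linear image encoder f(x) = W x."""
    image_emb = [sum(wkj * xj for wkj, xj in zip(row, x)) for row in W]
    return sum(a * b for a, b in zip(image_emb, text_emb))

def grad_dot_score(W, text_emb):
    """Gradient of dot_score w.r.t. x; equals W^T g(c) for the linear encoder."""
    n_in = len(W[0])
    return [sum(W[k][j] * text_emb[k] for k in range(len(W))) for j in range(n_in)]

def clip_guided_mean(mu, sigma_diag, grad, s):
    """mu_hat = mu + s * Sigma * grad, with Sigma given by its diagonal."""
    return [m + s * v * g for m, v, g in zip(mu, sigma_diag, grad)]

W = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.0]]   # hypothetical 3x2 encoder weights
g_c = [1.0, 1.0, 1.0]                       # hypothetical text embedding g(c)
grad = grad_dot_score(W, g_c)               # [4.0, 1.0]
mu_hat = clip_guided_mean([0.0, 0.0], [0.5, 0.5], grad, s=2.0)   # [4.0, 1.0]
```

The update has the same shape as classifier guidance; only the guiding signal changes from a class log-probability to the image-text similarity.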
  • 37. Classifier-free Guidance vs. CLIP Guidance in GLIDE Nichol, Alex, et al. "Glide: Towards photorealistic image generation and editing with text-guided diffusion models." arXiv preprint arXiv:2112.10741 (2021).