A presentation about the development of the ideas from the autoencoder to the Stable Diffusion text-to-image model.
Models covered: autoencoder, VAE, VQ-VAE, VQ-GAN, latent diffusion, and stable diffusion.
11. Recap: diffusion models
Sohl-Dickstein et al. Deep Unsupervised Learning using Nonequilibrium Thermodynamics, 2015;
Song & Ermon, Generative Modeling by Estimating Gradients of the Data Distribution, 2019; Ho et al. Denoising Diffusion Probabilistic Models (DDPM), 2020; …
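A minimal sketch of the DDPM recap above, assuming the paper's standard linear noise schedule and notation (T, betas, alpha_bar); shapes and the toy "image" are illustrative only:

```python
import numpy as np

# Forward (noising) process of DDPM (Ho et al., 2020), sketched with NumPy.
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear beta schedule from the paper
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)          # \bar{alpha}_t, monotonically decreasing

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.random.randn(4, 4)              # toy "image"
eps = np.random.randn(4, 4)
x_T = q_sample(x0, T - 1, eps)          # at t = T-1 the signal is almost gone
```

By t = T the surviving signal coefficient sqrt(alpha_bar[T-1]) is tiny, which is why sampling can start from pure Gaussian noise.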
13. Latent diffusions
Training phases:
● Autoencoder
○ Loss: Patch-based GAN loss + Perceptual loss
○ Regularization: KL loss (as in a VAE) OR a vector-quantization layer in the decoder (as in VQ-GAN)
● Various generative tasks
○ Loss: classical diffusion L2 restoration loss
○ All trainings were done on a single A100 GPU
Rombach, Blattmann et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021
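The second training phase above can be sketched as follows; this is an assumption-laden toy, where `encode` stands in for the frozen autoencoder's encoder and `eps_model` is an untrained placeholder for the denoising U-Net:

```python
import numpy as np

# Sketch of the latent-diffusion training objective: diffuse the *latent*
# z = E(x) and regress the added noise with a plain L2 loss.
rng = np.random.default_rng(0)
T = 1000
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))

def encode(x):            # placeholder for the frozen autoencoder encoder E(x)
    return 0.1 * x        # the real E maps e.g. 256x256x3 images to small latents

def eps_model(z_t, t):    # placeholder for the denoising U-Net eps_theta
    return np.zeros_like(z_t)

def training_loss(x):
    z = encode(x)
    t = int(rng.integers(0, T))
    eps = rng.standard_normal(z.shape)
    z_t = np.sqrt(alpha_bar[t]) * z + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_model(z_t, t)) ** 2)   # classical diffusion L2 loss

loss = training_loss(rng.standard_normal((8, 8)))
```

Only `eps_model` is trained in this phase; the autoencoder stays frozen.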
14. Latent diffusions
Downsampling by a factor of 4–16×: speeds up generative training with no loss in sampling quality
Rombach, Blattmann et al. High-Resolution Image Synthesis with Latent Diffusion Models, 2021
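Quick arithmetic behind the speedup claim above; the 4-channel latent is the configuration Stable Diffusion ended up using, other setups may differ:

```python
# Why a 4-16x spatial downsampling factor f makes diffusion training cheaper:
# the denoiser runs on the latent tensor, so per-step cost shrinks roughly
# with the number of elements it must process.
H, W, C = 512, 512, 3
pixels = H * W * C
for f in (4, 8, 16):
    latents = (H // f) * (W // f) * 4        # latent tensor with 4 channels
    print(f"f={f}: {pixels // latents}x fewer elements than pixel space")
```

For f=8 (Stable Diffusion's choice) the denoiser sees 48× fewer elements than raw 512×512 RGB pixels.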
23. Stable diffusion
● No dedicated paper (the core team comes from the latent diffusion authors)
● Open source: https://github.com/CompVis/stable-diffusion
● A substantially improved and well-trained latent diffusion model
24. Stable diffusion
Key components:
● High quality decoder from VQ-GAN
● Diffusion in latent space
● Frozen text encoder (CLIP ViT-L/14 embeddings)
● Classifier-free guidance
● A lot of data (LAION-5B and its subsets)
● A lot of compute power
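Classifier-free guidance from the list above can be sketched as below; `eps_model` is a dummy stand-in for the conditioned U-Net (its outputs are made up), and w=7.5 reflects a commonly used guidance scale:

```python
import numpy as np

# Classifier-free guidance at sampling time (Ho & Salimans): predict noise
# twice per step, with the text condition and with an empty ("null") prompt,
# then extrapolate the pair by a guidance scale w.
def eps_model(z_t, cond):
    # Dummy stand-in for the real U-Net's conditional / unconditional outputs.
    return z_t * (0.5 if cond is not None else 0.9)

def guided_eps(z_t, cond, w=7.5):
    eps_uncond = eps_model(z_t, None)     # empty-prompt prediction
    eps_cond = eps_model(z_t, cond)       # text-conditioned prediction
    return eps_uncond + w * (eps_cond - eps_uncond)

out = guided_eps(np.ones((2, 2)), cond="a photo of a cat")
```

Setting w=1 recovers the plain conditional prediction; larger w pushes samples closer to the prompt at the cost of diversity.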