SlideShare a Scribd company logo
Imagen: Photorealistic Text-to-Image Diffusion
Models with Deep Language Understanding
https://arxiv.org/abs/2205.11487
https://imagen.research.google/
Vitaly Bondar
johngull @ gmail
Architecture
Key contributions of the paper
1. We discover that large frozen language models trained only on text data are surprisingly very
effective text encoders for text-to-image generation, and that scaling the size of frozen text encoder
improves sample quality significantly more than scaling the size of image diffusion model.
2. We introduce dynamic thresholding, a new diffusion sampling technique to leverage high guidance
weights and generating more photorealistic and detailed images than previously possible.
3. We highlight several important diffusion architecture design choices and propose Efficient
U-Net, a new architecture variant which is simpler, converges faster and is more memory efficient.
4. We achieve a new state-of-the-art COCO FID of 7.27. Human raters find Imagen to be on-par with
the reference images in terms of image-text alignment.
5. We introduce DrawBench, a new comprehensive and challenging evaluation benchmark for the
text-to-image task. On DrawBench human evaluation, we find Imagen to outperform all other work,
including the concurrent work of DALL-E 2
Classifier-free guidance
Increasing of w improve fidelity, but decrease realism. Surprisingly w>1 works well
also. Even extremely high values like 7-10.
Value used in Imagen - 0.9 (10% zeroing of text embeddings)
With high w train/test mismatch happens and reduce quality on the new texts. Also
leads to moving out of [-1;1] range.
Dynamic thresholding
Dynamic thresholding
Noise conditioning
Imagen uses noise conditioning augmentation for both the super-resolution models. We find this to be a critical for generating high
fidelity images.
Given a conditioning low-resolution image and augmentation level (a.k.a aug_level) (e.g., strength of Gaussian noise or blur), we
corrupt the low-resolution image with the augmentation (corresponding to aug_level), and condition the diffusion model on aug_level.
During training, aug_level is chosen randomly, while during inference if fixed (best 0.1-0.2).
64x64 model
SuperResolution models (Efficient U-Net)
Efficient U-Net makes several key modifications to the typical U-Net:
● We shift the model parameters from the high resolution blocks to the low resolution blocks,
via adding more residual blocks for the lower resolutions. Since lower resolution blocks typically
have many more channels, this allows us to increase the model capacity through more model
parameters, without egregious memory and computation costs.
● When using large number of residual blocks at lower-resolution (e.g. we use 8 residual blocks at
lower-resolutions compared to typical 2-3 residual blocks used in standard U-Net architectures [16,
59]) we find that scaling the skip connections by 1/√2 similar to [66, 59] significantly improves
convergence speed.
● In a typical U-Net’s downsampling block, the downsampling operation happens after the
convolutions, and in an upsampling block, the upsampling operation happens prior the convolution.
We reverse this order for both downsampling and upsampling blocks in order to significantly
improve the speed of the forward pass of the U-Net, and find no performance degradation.
SuperResolution models (Efficient U-Net)
SuperResolution models (Efficient U-Net)
SuperResolution models (Efficient U-Net)
Impact of sizes
Training
We use a batch size of 2048 and 2.5M training steps for all models.
256 TPU-v4 chips for our base 64 × 64 model, and 128 TPU-v4 chips for both super-resolution
For classifier-free guidance, we joint-train unconditionally via zeroing out the text embeddings with 10% probability for all three models
Dataset: We train on a combination of internal datasets, with ≈ 460M image-text pairs, and the publicly available Laion dataset, with ≈ 400M
image-text pairs
Our 256×256 → 1024×1024 super-resolution model trains on 64×64 → 256×256 crops of the 1024 × 1024 image. To facilitate this, we remove the
self-attention layers, however we keep the text cross-attention layers which we found to be critical.
Results
Results
Results
Results
Results
Results
Results
Results

More Related Content

What's hot

Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIGenerative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
WithTheBest
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
Steve Omohundro
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
Emanuele Ghelfi
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
Chia-Wen Cheng
 
Score based generative model
Score based generative modelScore based generative model
Score based generative model
sangyun lee
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdf
Qualcomm Research
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
KIRAN R
 
Introduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningIntroduction to Interpretable Machine Learning
Introduction to Interpretable Machine Learning
Nguyen Giang
 
Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)
Vitaly Bondar
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
leopauly
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
Lukas Masuch
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
Appsilon Data Science
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
Hoang Nguyen
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
Loic Merckel
 
Exploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion ModelsExploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion Models
KonfHubTechConferenc
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
Ryo Takahashi
 
Image to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GANImage to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GAN
S.Shayan Daneshvar
 
Self supervised learning
Self supervised learningSelf supervised learning
Self supervised learning
哲东 郑
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
Databricks
 
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Edureka!
 

What's hot (20)

Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIGenerative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 
Deep Generative Models
Deep Generative Models Deep Generative Models
Deep Generative Models
 
Score based generative model
Score based generative modelScore based generative model
Score based generative model
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdf
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
 
Introduction to Interpretable Machine Learning
Introduction to Interpretable Machine LearningIntroduction to Interpretable Machine Learning
Introduction to Interpretable Machine Learning
 
Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)Generative models (Geek hub 2021 lecture)
Generative models (Geek hub 2021 lecture)
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
 
Intro to LLMs
Intro to LLMsIntro to LLMs
Intro to LLMs
 
Exploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion ModelsExploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion Models
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
 
Image to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GANImage to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GAN
 
Self supervised learning
Self supervised learningSelf supervised learning
Self supervised learning
 
Introduction to Neural Networks
Introduction to Neural NetworksIntroduction to Neural Networks
Introduction to Neural Networks
 
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
 

Similar to Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSitakanta Mishra
 
Neural Style Transfer in practice
Neural Style Transfer in practiceNeural Style Transfer in practice
Neural Style Transfer in practice
MohamedAmineHACHICHA1
 
Neural Style Transfer in Practice
Neural Style Transfer in PracticeNeural Style Transfer in Practice
Neural Style Transfer in Practice
KhalilBergaoui
 
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
DR.P.S.JAGADEESH KUMAR
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution
Mohammed Ashour
 
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
IRJET Journal
 
Automated Speech Recognition
Automated Speech Recognition Automated Speech Recognition
Automated Speech Recognition
Pruthvij Thakar
 
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
Edge AI and Vision Alliance
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language Classification
IRJET Journal
 
Implementing Neural Style Transfer
Implementing Neural Style Transfer Implementing Neural Style Transfer
Implementing Neural Style Transfer
Tahsin Mayeesha
 
App development with stable diffusion model Unlocking the power of generative...
App development with stable diffusion model Unlocking the power of generative...App development with stable diffusion model Unlocking the power of generative...
App development with stable diffusion model Unlocking the power of generative...
StephenAmell4
 
06 13sept 8313 9997-2-ed an adaptive (edit lafi)
06 13sept 8313 9997-2-ed an adaptive (edit lafi)06 13sept 8313 9997-2-ed an adaptive (edit lafi)
06 13sept 8313 9997-2-ed an adaptive (edit lafi)
IAESIJEECS
 
An improved image compression algorithm based on daubechies wavelets with ar...
An improved image compression algorithm based on daubechies  wavelets with ar...An improved image compression algorithm based on daubechies  wavelets with ar...
An improved image compression algorithm based on daubechies wavelets with ar...
Alexander Decker
 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
RaghunandanJairam
 
Mnist report
Mnist reportMnist report
Mnist report
RaghunandanJairam
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing Impaired
IRJET Journal
 
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxxIMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
AtharvaTanawade
 
Lossless Huffman coding image compression implementation in spatial domain by...
Lossless Huffman coding image compression implementation in spatial domain by...Lossless Huffman coding image compression implementation in spatial domain by...
Lossless Huffman coding image compression implementation in spatial domain by...
IRJET Journal
 
Hybrid compression based stationary wavelet transforms
Hybrid compression based stationary wavelet transformsHybrid compression based stationary wavelet transforms
Hybrid compression based stationary wavelet transforms
Omar Ghazi
 

Similar to Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (20)

Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
 
Neural Style Transfer in practice
Neural Style Transfer in practiceNeural Style Transfer in practice
Neural Style Transfer in practice
 
Neural Style Transfer in Practice
Neural Style Transfer in PracticeNeural Style Transfer in Practice
Neural Style Transfer in Practice
 
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution
 
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
 
Automated Speech Recognition
Automated Speech Recognition Automated Speech Recognition
Automated Speech Recognition
 
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language Classification
 
Implementing Neural Style Transfer
Implementing Neural Style Transfer Implementing Neural Style Transfer
Implementing Neural Style Transfer
 
App development with stable diffusion model Unlocking the power of generative...
App development with stable diffusion model Unlocking the power of generative...App development with stable diffusion model Unlocking the power of generative...
App development with stable diffusion model Unlocking the power of generative...
 
06 13sept 8313 9997-2-ed an adaptive (edit lafi)
06 13sept 8313 9997-2-ed an adaptive (edit lafi)06 13sept 8313 9997-2-ed an adaptive (edit lafi)
06 13sept 8313 9997-2-ed an adaptive (edit lafi)
 
An improved image compression algorithm based on daubechies wavelets with ar...
An improved image compression algorithm based on daubechies  wavelets with ar...An improved image compression algorithm based on daubechies  wavelets with ar...
An improved image compression algorithm based on daubechies wavelets with ar...
 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
 
Mnist report
Mnist reportMnist report
Mnist report
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing Impaired
 
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxxIMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
 
Lossless Huffman coding image compression implementation in spatial domain by...
Lossless Huffman coding image compression implementation in spatial domain by...Lossless Huffman coding image compression implementation in spatial domain by...
Lossless Huffman coding image compression implementation in spatial domain by...
 
Hybrid compression based stationary wavelet transforms
Hybrid compression based stationary wavelet transformsHybrid compression based stationary wavelet transforms
Hybrid compression based stationary wavelet transforms
 
A0540106
A0540106A0540106
A0540106
 

Recently uploaded

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 

Recently uploaded (20)

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 

Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

  • 1. Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding https://arxiv.org/abs/2205.11487 https://imagen.research.google/ Vitaly Bondar johngull @ gmail
  • 3. Key contributions of the paper 1. We discover that large frozen language models trained only on text data are surprisingly very effective text encoders for text-to-image generation, and that scaling the size of frozen text encoder improves sample quality significantly more than scaling the size of image diffusion model. 2. We introduce dynamic thresholding, a new diffusion sampling technique to leverage high guidance weights and generating more photorealistic and detailed images than previously possible. 3. We highlight several important diffusion architecture design choices and propose Efficient U-Net, a new architecture variant which is simpler, converges faster and is more memory efficient. 4. We achieve a new state-of-the-art COCO FID of 7.27. Human raters find Imagen to be on-par with the reference images in terms of image-text alignment. 5. We introduce DrawBench, a new comprehensive and challenging evaluation benchmark for the text-to-image task. On DrawBench human evaluation, we find Imagen to outperform all other work, including the concurrent work of DALL-E 2
  • 4. Classifier-free guidance Increasing of w improve fidelity, but decrease realism. Surprisingly w>1 works well also. Even extremely high values like 7-10. Value used in Imagen - 0.9 (10% zeroing of text embeddings) With high w train/test mismatch happens and reduce quality on the new texts. Also leads to moving out of [-1;1] range.
  • 7. Noise conditioning Imagen uses noise conditioning augmentation for both the super-resolution models. We find this to be a critical for generating high fidelity images. Given a conditioning low-resolution image and augmentation level (a.k.a aug_level) (e.g., strength of Gaussian noise or blur), we corrupt the low-resolution image with the augmentation (corresponding to aug_level), and condition the diffusion model on aug_level. During training, aug_level is chosen randomly, while during inference if fixed (best 0.1-0.2).
  • 9. SuperResolution models (Efficient U-Net) Efficient U-Net makes several key modifications to the typical U-Net: ● We shift the model parameters from the high resolution blocks to the low resolution blocks, via adding more residual blocks for the lower resolutions. Since lower resolution blocks typically have many more channels, this allows us to increase the model capacity through more model parameters, without egregious memory and computation costs. ● When using large number of residual blocks at lower-resolution (e.g. we use 8 residual blocks at lower-resolutions compared to typical 2-3 residual blocks used in standard U-Net architectures [16, 59]) we find that scaling the skip connections by 1/√2 similar to [66, 59] significantly improves convergence speed. ● In a typical U-Net’s downsampling block, the downsampling operation happens after the convolutions, and in an upsampling block, the upsampling operation happens prior the convolution. We reverse this order for both downsampling and upsampling blocks in order to significantly improve the speed of the forward pass of the U-Net, and find no performance degradation.
  • 14. Training We use a batch size of 2048 and 2.5M training steps for all models. 256 TPU-v4 chips for our base 64 × 64 model, and 128 TPU-v4 chips for both super-resolution For classifier-free guidance, we joint-train unconditionally via zeroing out the text embeddings with 10% probability for all three models Dataset: We train on a combination of internal datasets, with ≈ 460M image-text pairs, and the publicly available Laion dataset, with ≈ 400M image-text pairs Our 256×256 → 1024×1024 super-resolution model trains on 64×64 → 256×256 crops of the 1024 × 1024 image. To facilitate this, we remove the self-attention layers, however we keep the text cross-attention layers which we found to be critical.
  • 19.