SlideShare a Scribd company logo
1 of 23
Download to read offline
Imagen: Photorealistic Text-to-Image Diffusion
Models with Deep Language Understanding
https://arxiv.org/abs/2205.11487
https://imagen.research.google/
Vitaly Bondar
johngull @ gmail
Architecture
Key contributions of the paper
1. We discover that large frozen language models trained only on text data are surprisingly very
effective text encoders for text-to-image generation, and that scaling the size of frozen text encoder
improves sample quality significantly more than scaling the size of image diffusion model.
2. We introduce dynamic thresholding, a new diffusion sampling technique to leverage high guidance
weights and generating more photorealistic and detailed images than previously possible.
3. We highlight several important diffusion architecture design choices and propose Efficient
U-Net, a new architecture variant which is simpler, converges faster and is more memory efficient.
4. We achieve a new state-of-the-art COCO FID of 7.27. Human raters find Imagen to be on-par with
the reference images in terms of image-text alignment.
5. We introduce DrawBench, a new comprehensive and challenging evaluation benchmark for the
text-to-image task. On DrawBench human evaluation, we find Imagen to outperform all other work,
including the concurrent work of DALL-E 2
Classifier-free guidance
Increasing of w improve fidelity, but decrease realism. Surprisingly w>1 works well
also. Even extremely high values like 7-10.
Value used in Imagen - 0.9 (10% zeroing of text embeddings)
With high w train/test mismatch happens and reduce quality on the new texts. Also
leads to moving out of [-1;1] range.
Dynamic thresholding
Dynamic thresholding
Noise conditioning
Imagen uses noise conditioning augmentation for both the super-resolution models. We find this to be a critical for generating high
fidelity images.
Given a conditioning low-resolution image and augmentation level (a.k.a aug_level) (e.g., strength of Gaussian noise or blur), we
corrupt the low-resolution image with the augmentation (corresponding to aug_level), and condition the diffusion model on aug_level.
During training, aug_level is chosen randomly, while during inference if fixed (best 0.1-0.2).
64x64 model
SuperResolution models (Efficient U-Net)
Efficient U-Net makes several key modifications to the typical U-Net:
● We shift the model parameters from the high resolution blocks to the low resolution blocks,
via adding more residual blocks for the lower resolutions. Since lower resolution blocks typically
have many more channels, this allows us to increase the model capacity through more model
parameters, without egregious memory and computation costs.
● When using large number of residual blocks at lower-resolution (e.g. we use 8 residual blocks at
lower-resolutions compared to typical 2-3 residual blocks used in standard U-Net architectures [16,
59]) we find that scaling the skip connections by 1/√2 similar to [66, 59] significantly improves
convergence speed.
● In a typical U-Net’s downsampling block, the downsampling operation happens after the
convolutions, and in an upsampling block, the upsampling operation happens prior the convolution.
We reverse this order for both downsampling and upsampling blocks in order to significantly
improve the speed of the forward pass of the U-Net, and find no performance degradation.
SuperResolution models (Efficient U-Net)
SuperResolution models (Efficient U-Net)
SuperResolution models (Efficient U-Net)
Impact of sizes
Training
We use a batch size of 2048 and 2.5M training steps for all models.
256 TPU-v4 chips for our base 64 × 64 model, and 128 TPU-v4 chips for both super-resolution
For classifier-free guidance, we joint-train unconditionally via zeroing out the text embeddings with 10% probability for all three models
Dataset: We train on a combination of internal datasets, with ≈ 460M image-text pairs, and the publicly available Laion dataset, with ≈ 400M
image-text pairs
Our 256×256 → 1024×1024 super-resolution model trains on 64×64 → 256×256 crops of the 1024 × 1024 image. To facilitate this, we remove the
self-attention layers, however we keep the text cross-attention layers which we found to be critical.
Results
Results
Results
Results
Results
Results
Results
Results

More Related Content

What's hot

Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)Amol Patil
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative ModelsMLReview
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksYunjey Choi
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networksDing Li
 
Generative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsGenerative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsArtifacia
 
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIGenerative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIWithTheBest
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Prakhar Rastogi
 
PR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsHyeongmin Lee
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and ApplicationsHoang Nguyen
 
Introduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksIntroduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksBennoG1
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Manohar Mukku
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesFellowship at Vodafone FutureLab
 
Exploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion ModelsExploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion ModelsKonfHubTechConferenc
 

What's hot (20)

Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Generative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their ApplicationsGenerative Adversarial Networks and Their Applications
Generative Adversarial Networks and Their Applications
 
Generative models
Generative modelsGenerative models
Generative models
 
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAIGenerative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
Generative Adversarial Networks (GANs) - Ian Goodfellow, OpenAI
 
Generative adversarial text to image synthesis
Generative adversarial text to image synthesisGenerative adversarial text to image synthesis
Generative adversarial text to image synthesis
 
Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)Generative Adversarial Network (GAN)
Generative Adversarial Network (GAN)
 
U-Net (1).pptx
U-Net (1).pptxU-Net (1).pptx
U-Net (1).pptx
 
PR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic ModelsPR-409: Denoising Diffusion Probabilistic Models
PR-409: Denoising Diffusion Probabilistic Models
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
 
Introduction to Generative Adversarial Networks
Introduction to Generative Adversarial NetworksIntroduction to Generative Adversarial Networks
Introduction to Generative Adversarial Networks
 
Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)Generative Adversarial Networks (GAN)
Generative Adversarial Networks (GAN)
 
Zero shot learning
Zero shot learning Zero shot learning
Zero shot learning
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network Approaches
 
Exploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion ModelsExploring Generating AI with Diffusion Models
Exploring Generating AI with Diffusion Models
 

Similar to Photorealistic Text-to-Image Models with Deep Language Understanding

Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSitakanta Mishra
 
Neural Style Transfer in Practice
Neural Style Transfer in PracticeNeural Style Transfer in Practice
Neural Style Transfer in PracticeKhalilBergaoui
 
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...DR.P.S.JAGADEESH KUMAR
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution Mohammed Ashour
 
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...IRJET Journal
 
Automated Speech Recognition
Automated Speech Recognition Automated Speech Recognition
Automated Speech Recognition Pruthvij Thakar
 
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...Edge AI and Vision Alliance
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET Journal
 
Implementing Neural Style Transfer
Implementing Neural Style Transfer Implementing Neural Style Transfer
Implementing Neural Style Transfer Tahsin Mayeesha
 
App development with stable diffusion model Unlocking the power of generative...
App development with stable diffusion model Unlocking the power of generative...App development with stable diffusion model Unlocking the power of generative...
App development with stable diffusion model Unlocking the power of generative...StephenAmell4
 
06 13sept 8313 9997-2-ed an adaptive (edit lafi)
06 13sept 8313 9997-2-ed an adaptive (edit lafi)06 13sept 8313 9997-2-ed an adaptive (edit lafi)
06 13sept 8313 9997-2-ed an adaptive (edit lafi)IAESIJEECS
 
An improved image compression algorithm based on daubechies wavelets with ar...
An improved image compression algorithm based on daubechies  wavelets with ar...An improved image compression algorithm based on daubechies  wavelets with ar...
An improved image compression algorithm based on daubechies wavelets with ar...Alexander Decker
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedIRJET Journal
 
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxxIMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxxAtharvaTanawade
 
Lossless Huffman coding image compression implementation in spatial domain by...
Lossless Huffman coding image compression implementation in spatial domain by...Lossless Huffman coding image compression implementation in spatial domain by...
Lossless Huffman coding image compression implementation in spatial domain by...IRJET Journal
 
Hybrid compression based stationary wavelet transforms
Hybrid compression based stationary wavelet transformsHybrid compression based stationary wavelet transforms
Hybrid compression based stationary wavelet transformsOmar Ghazi
 

Similar to Photorealistic Text-to-Image Models with Deep Language Understanding (20)

Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_ReportSaptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
 
Neural Style Transfer in Practice
Neural Style Transfer in PracticeNeural Style Transfer in Practice
Neural Style Transfer in Practice
 
Neural Style Transfer in practice
Neural Style Transfer in practiceNeural Style Transfer in practice
Neural Style Transfer in practice
 
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
Performance Evaluation of H.264 AVC Using CABAC Entropy Coding For Compound I...
 
A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution A Fully Progressive approach to Single image super-resolution
A Fully Progressive approach to Single image super-resolution
 
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
A Novel Blind SR Method to Improve the Spatial Resolution of Real Life Video ...
 
Automated Speech Recognition
Automated Speech Recognition Automated Speech Recognition
Automated Speech Recognition
 
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
 
IRJET- American Sign Language Classification
IRJET- American Sign Language ClassificationIRJET- American Sign Language Classification
IRJET- American Sign Language Classification
 
Implementing Neural Style Transfer
Implementing Neural Style Transfer Implementing Neural Style Transfer
Implementing Neural Style Transfer
 
App development with stable diffusion model Unlocking the power of generative...
App development with stable diffusion model Unlocking the power of generative...App development with stable diffusion model Unlocking the power of generative...
App development with stable diffusion model Unlocking the power of generative...
 
06 13sept 8313 9997-2-ed an adaptive (edit lafi)
06 13sept 8313 9997-2-ed an adaptive (edit lafi)06 13sept 8313 9997-2-ed an adaptive (edit lafi)
06 13sept 8313 9997-2-ed an adaptive (edit lafi)
 
An improved image compression algorithm based on daubechies wavelets with ar...
An improved image compression algorithm based on daubechies  wavelets with ar...An improved image compression algorithm based on daubechies  wavelets with ar...
An improved image compression algorithm based on daubechies wavelets with ar...
 
Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
 
Mnist report
Mnist reportMnist report
Mnist report
 
Sign Detection from Hearing Impaired
Sign Detection from Hearing ImpairedSign Detection from Hearing Impaired
Sign Detection from Hearing Impaired
 
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxxIMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
IMAGE CAPTION GENERATOR.pptx1.pptxxxxxxxxxx
 
Lossless Huffman coding image compression implementation in spatial domain by...
Lossless Huffman coding image compression implementation in spatial domain by...Lossless Huffman coding image compression implementation in spatial domain by...
Lossless Huffman coding image compression implementation in spatial domain by...
 
Hybrid compression based stationary wavelet transforms
Hybrid compression based stationary wavelet transformsHybrid compression based stationary wavelet transforms
Hybrid compression based stationary wavelet transforms
 
A0540106
A0540106A0540106
A0540106
 

Recently uploaded

Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Recently uploaded (20)

VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Photorealistic Text-to-Image Models with Deep Language Understanding

  • 1. Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding https://arxiv.org/abs/2205.11487 https://imagen.research.google/ Vitaly Bondar johngull @ gmail
  • 3. Key contributions of the paper 1. We discover that large frozen language models trained only on text data are surprisingly very effective text encoders for text-to-image generation, and that scaling the size of frozen text encoder improves sample quality significantly more than scaling the size of image diffusion model. 2. We introduce dynamic thresholding, a new diffusion sampling technique to leverage high guidance weights and generating more photorealistic and detailed images than previously possible. 3. We highlight several important diffusion architecture design choices and propose Efficient U-Net, a new architecture variant which is simpler, converges faster and is more memory efficient. 4. We achieve a new state-of-the-art COCO FID of 7.27. Human raters find Imagen to be on-par with the reference images in terms of image-text alignment. 5. We introduce DrawBench, a new comprehensive and challenging evaluation benchmark for the text-to-image task. On DrawBench human evaluation, we find Imagen to outperform all other work, including the concurrent work of DALL-E 2
  • 4. Classifier-free guidance Increasing of w improve fidelity, but decrease realism. Surprisingly w>1 works well also. Even extremely high values like 7-10. Value used in Imagen - 0.9 (10% zeroing of text embeddings) With high w train/test mismatch happens and reduce quality on the new texts. Also leads to moving out of [-1;1] range.
  • 7. Noise conditioning Imagen uses noise conditioning augmentation for both the super-resolution models. We find this to be a critical for generating high fidelity images. Given a conditioning low-resolution image and augmentation level (a.k.a aug_level) (e.g., strength of Gaussian noise or blur), we corrupt the low-resolution image with the augmentation (corresponding to aug_level), and condition the diffusion model on aug_level. During training, aug_level is chosen randomly, while during inference if fixed (best 0.1-0.2).
  • 9. SuperResolution models (Efficient U-Net) Efficient U-Net makes several key modifications to the typical U-Net: ● We shift the model parameters from the high resolution blocks to the low resolution blocks, via adding more residual blocks for the lower resolutions. Since lower resolution blocks typically have many more channels, this allows us to increase the model capacity through more model parameters, without egregious memory and computation costs. ● When using large number of residual blocks at lower-resolution (e.g. we use 8 residual blocks at lower-resolutions compared to typical 2-3 residual blocks used in standard U-Net architectures [16, 59]) we find that scaling the skip connections by 1/√2 similar to [66, 59] significantly improves convergence speed. ● In a typical U-Net’s downsampling block, the downsampling operation happens after the convolutions, and in an upsampling block, the upsampling operation happens prior the convolution. We reverse this order for both downsampling and upsampling blocks in order to significantly improve the speed of the forward pass of the U-Net, and find no performance degradation.
  • 14. Training We use a batch size of 2048 and 2.5M training steps for all models. 256 TPU-v4 chips for our base 64 × 64 model, and 128 TPU-v4 chips for both super-resolution For classifier-free guidance, we joint-train unconditionally via zeroing out the text embeddings with 10% probability for all three models Dataset: We train on a combination of internal datasets, with ≈ 460M image-text pairs, and the publicly available Laion dataset, with ≈ 400M image-text pairs Our 256×256 → 1024×1024 super-resolution model trains on 64×64 → 256×256 crops of the 1024 × 1024 image. To facilitate this, we remove the self-attention layers, however we keep the text cross-attention layers which we found to be critical.
  • 19.