1. This document describes Imagen, a new state-of-the-art photorealistic text-to-image diffusion model with deep language understanding.
2. Key contributions include using large frozen language models as effective text encoders, a new dynamic thresholding sampling technique that yields more photorealistic images, and an efficient U-Net architecture.
3. On benchmarks including COCO (measured by FID) and the new DrawBench, Imagen outperforms other models, including DALL-E 2, and human evaluators found that its images align better with the text prompts.
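
The summary names dynamic thresholding without saying how it works. The sketch below shows the commonly described form of the idea: at each sampling step, take a high percentile `s` of the absolute pixel values in the predicted clean image, clip the prediction to `[-s, s]` when `s` exceeds 1, and divide by `s` so values return to `[-1, 1]`. The function name, the flat-list image representation, and the default percentile are assumptions made for illustration, not details taken from this document.

```python
def dynamic_threshold(x0_pred, percentile=99.5):
    """Clip a predicted clean image to [-s, s], then rescale to [-1, 1].

    x0_pred: flat list of floats, the pixel values of one predicted image
             (hypothetical representation chosen for this sketch).
    percentile: assumed default; treat the exact value as a tunable setting.
    """
    if not x0_pred:
        return []
    magnitudes = sorted(abs(v) for v in x0_pred)
    # Percentile of |x| using linear interpolation between neighbouring ranks.
    rank = (percentile / 100.0) * (len(magnitudes) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(magnitudes) - 1)
    s = magnitudes[lo] + (rank - lo) * (magnitudes[hi] - magnitudes[lo])
    # Keep s at 1 or above so in-range predictions pass through unchanged.
    s = max(s, 1.0)
    # Clip to [-s, s], then divide by s to land back in [-1, 1].
    return [max(-s, min(s, v)) / s for v in x0_pred]
```

For instance, `dynamic_threshold([2.0, -4.0, 1.0], percentile=100)` uses `s = 4.0` and returns `[0.5, -1.0, 0.25]`, while a prediction already inside `[-1, 1]` is returned unchanged. Compared with clipping every value to `[-1, 1]` directly, this rescaling is intended to keep pixels from saturating when predictions drift out of range.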