Build with AI
on Google Cloud
Session #2 GenAI Deep Dive
2/5/2025
Seattle | Surrey | Vancouver | Burnaby
GDG Seattle
Margaret Maynard-Reid
Yenchi Lin
Clive Boulton
Vishal Pallerla
I/O Extended 2019
2024 Build with AI
DevFest Seattle 2018
DevFest Seattle 2022
WTM Lightning Talks 2018
Cloud Study Jam 2018
DevFest Seattle 2017
DevFest Seattle 2016
DevFest Seattle 2015
DevFest Seattle 2024
Follow GDG Seattle on LinkedIn
GDG Surrey
Follow GDG Surrey on LinkedIn
GDG Vancouver
Follow GDG Vancouver on LinkedIn
Join our GDG Vancouver Community
Volunteer Interest Form
GDG Burnaby
GDG Burnaby Bevy | LinkedIn
Agenda
● Study series overview
● Talk 1: Imagen 3
● Talk 2: AI Foundations
● Q & A
Study series overview
Topics for this session
● Online study series overview
● Intro to GenAI on Google Cloud
● GenAI beginner path
● …
Link to Story on Medium
Study series overview
Follow 5 generative AI paths on Google Cloud Skills Boost:
1. 1/22/25 - Beginner: Intro to GenAI (link)
2. 2/5/25 - Generate Smarter GenAI Outputs (link)
3. 2/19/25 - Build & Modernize Apps with GenAI (link)
4. 3/5/25 - Integrate GenAI into Your DataFlow (link)
5. 3/19/25 - Deploy & Manage GenAI Models (link)
Topics are not limited to the above.
Each session: 2 short talks (by Googlers or experts) + Q&A section.
What is a learning path?
A learning path has multiple courses.
Each course has videos, recommended reading, quizzes & hands-on labs.
You will have at least two weeks to work through the materials.
It’s OK if you don’t finish, and feel free to study ahead.
Access to Cloud Skills Boost
● Sign up here: https://www.cloudskillsboost.google/
● By RSVPing, you get free access for a few months
● Videos are accessible by default, while each lab requires a credit
● You can work on each GenAI path before or after each session
Note: Make sure to sign up on Google Cloud Skills Boost with the same email that you used for the event RSVP.
Imagen 3: Beyond Image Generation
Margaret Maynard-Reid, AI/ML GDE
AI/ML GDE (Google Developer Expert)
3D artist
Fashion Designer
Instructor of MSIS, UW Foster
Ex MS Design Studio, MSR, MS Bing
About me
margaretmz.art
What is Generative AI?
A type of AI that creates new content with generative models:
Text, image, video, audio
Vision Generative Models
● 2014 Generative Adversarial Networks (GANs)
● 2016 Autoregressive Models
● 2019 Variational Autoencoders (VAEs)
● Flow-based models
● 2020 Diffusion models
● 2022 Diffusion Transformer
Source: Lilian Weng blog (link)
Diffusion Models
1. Gradually add Gaussian noise to training data.
2. Learn how to reverse the process to generate images from noise.
Figure: forward image diffusion and generative reverse denoising
Source: NVIDIA Developer Blog (link)
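Step 1 can be sketched numerically. This is a minimal sketch of the forward (noising) process only, in plain Python on a 4-pixel toy image, assuming the linear noise schedule from the DDPM paper (β from 1e-4 to 0.02 over 1000 steps); it is not Imagen's actual schedule. Step 2 needs a trained neural network that predicts the added noise, which is out of scope here.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, increasing linearly (assumed DDPM-style schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): how much original signal survives at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, noise=None, rng=random):
    """Jump straight to step t of the forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, 1)."""
    if noise is None:
        noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]

# A flattened 2x2 toy "image" with pixel values scaled to [-1, 1].
x0 = [1.0, -1.0, 0.5, 0.0]
alpha_bars = cumulative_alphas(linear_beta_schedule())
rng = random.Random(0)
for t in (0, 250, 999):
    noisy = q_sample(x0, t, alpha_bars, rng=rng)
    print(t, round(alpha_bars[t], 5), [round(v, 3) for v in noisy])
```

Because `alpha_bar_t` shrinks toward zero, the last step is almost pure Gaussian noise; a generator learns to walk that path backwards.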
CLIP: Contrastive Language-Image Pre-training
CLIP is a bridge between NLP and computer vision, connecting text and images.
It has a text encoder and an image encoder, trained with 400 million image-text pairs.
● DALL·E, DALL·E 2
● Stable Diffusion
● Imagen, Imagen 2, Imagen 3
Paper: Learning Transferable Visual Models From Natural Language Supervision
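The "contrastive" part of CLIP can be sketched as a loss over a batch of matching image-text pairs: each image should score highest against its own caption, and each caption against its own image. A minimal sketch in plain Python, assuming toy 3-D embeddings in place of the real encoders and a fixed temperature of 0.07 (CLIP learns this value during training).

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over N matching (image_i, text_i) pairs."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled by 1/temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Image -> text: row i should pick text i. Text -> image: column j should pick image j.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2

# Toy 3-D embeddings: matched pairs point in similar directions.
images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
aligned_texts = [[0.9, 0.2, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.9]]
shuffled_texts = [aligned_texts[1], aligned_texts[2], aligned_texts[0]]
print(clip_loss(images, aligned_texts))   # small: pairs already match
print(clip_loss(images, shuffled_texts))  # large: pairs are mismatched
```

Minimizing this loss pulls matching image and text embeddings together in one shared space, which is what lets text prompts steer image models.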
Diffusion Transformer
Paper: Scalable Diffusion Models with Transformers
SoTA models using diffusion transformer:
● PixArt-α
● Sora
● Stable Diffusion 3
Timeline: generative AI in vision
Figure labels: Imagen 3, Veo/VideoFX
Source: Sora paper
What is Imagen 3?
Google’s state-of-the-art text-to-image model
● Generate images
● Edit images: inpainting, outpainting, background editing
● Customize with reference images
Imagen 3
Imagen 3 claims the top spot among text-to-image models on the LMSYS Arena
How to access Imagen 3?
● Google Labs ImageFX: https://labs.google/fx/tools/image-fx
● Google Gemini App: https://gemini.google.com/app/
● Google Cloud Vertex AI: https://console.cloud.google.com/vertex-ai/studio/vision?
● Google Colab
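From code (for example in Colab), Vertex AI exposes Imagen through a REST `:predict` endpoint. A minimal sketch that only builds the request, assuming the endpoint layout, the `instances[].prompt`, `parameters.sampleCount`, and `parameters.aspectRatio` fields, and the model ID `imagen-3.0-generate-001` as described in the Vertex AI Imagen API reference; verify them against the current docs, since model IDs change. The helper name and the project ID are hypothetical.

```python
import json

def imagen_predict_request(project_id, prompt, location="us-central1",
                           model="imagen-3.0-generate-001", sample_count=1,
                           aspect_ratio="1:1"):
    """Hypothetical helper: build (url, JSON body) for a Vertex AI Imagen :predict call.
    Nothing is sent; field names are assumptions to check against the API reference."""
    url = (f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
           f"/locations/{location}/publishers/google/models/{model}:predict")
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count, "aspectRatio": aspect_ratio},
    }
    return url, json.dumps(body)

url, body = imagen_predict_request(
    "my-project",  # hypothetical project ID
    "A floral design in watercolor style",
)
print(url)
print(body)
```

To actually send it, POST the body to the URL with an `Authorization: Bearer` header, for example using a token from `gcloud auth print-access-token`.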
Imagen 3 - image generation
Generate a floral design in watercolor style with Imagen 3
Integrate the print into my 3D fashion design in Clo3D
Editing with masks
Imagen 3 - use a mask to change images
Original | Removed necklace | Changed earring
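A mask tells the editor which pixels may change. Imagen's inpainting model generates new content conditioned on the unmasked context, which this does not reproduce; the sketch below shows only the mask semantics as a simple per-pixel composite in plain Python, assuming tiny grayscale images as nested lists and a made-up "edited" image standing in for model output.

```python
def composite(original, edited, mask):
    """Blend two same-size grayscale images pixel by pixel:
    mask 1.0 -> take the edited pixel, 0.0 -> keep the original, in between -> mix."""
    return [
        [m * e + (1.0 - m) * o for o, e, m in zip(orow, erow, mrow)]
        for orow, erow, mrow in zip(original, edited, mask)
    ]

def invert(mask):
    """Swap edited and preserved regions, e.g. a subject mask becomes a background mask."""
    return [[1.0 - m for m in row] for row in mask]

original = [[10, 10, 10],
            [10, 99, 10],   # 99 = the "necklace" pixel we want gone
            [10, 10, 10]]
edited = [[20, 20, 20],
          [20, 20, 20],
          [20, 20, 20]]     # stand-in for what a model generated
mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(composite(original, edited, mask))          # only the masked pixel changes
print(composite(original, edited, invert(mask)))  # everything except it changes
```

Inverting a subject mask is the same idea behind background changes: the subject is protected and only the surroundings are regenerated.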
Imagen 3 - change background
Original image
“Change the background to a sandy beach by the ocean with blue sky”
“Change the background to a botanical garden”
“Change the background to fashion runway”
Imagen 3 - reference image (product)
Max number of reference images: 4
Prompt = “woman drinking out of a teacup[1][2] wearing a green sweater”
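The `[1][2]` in the prompt appears to tie "teacup" to reference images 1 and 2. A minimal client-side check in plain Python, assuming that `[n]` tags name reference-image IDs and using the 4-image limit stated on this slide; `check_reference_prompt` is a hypothetical helper, not part of any Google SDK.

```python
import re

MAX_REFERENCE_IMAGES = 4  # limit stated on the slide

def check_reference_prompt(prompt, reference_ids):
    """Hypothetical helper: verify that every [n] tag in the prompt points at a supplied
    reference image and that no more than MAX_REFERENCE_IMAGES are supplied."""
    if len(reference_ids) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    tags = {int(n) for n in re.findall(r"\[(\d+)\]", prompt)}
    missing = tags - set(reference_ids)
    if missing:
        raise ValueError(f"prompt refers to unknown reference id(s): {sorted(missing)}")
    return sorted(tags)

prompt = "woman drinking out of a teacup[1][2] wearing a green sweater"
print(check_reference_prompt(prompt, reference_ids=[1, 2]))  # [1, 2]
```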
Imagen 3 - reference image (person)
Original image
“A purple floral dress” | “A green dress” | “A blue dress”
Veo 2 / VideoFX
Text to Video:
● Veo 2
Text to Image to Video:
● Imagen 3 and Veo 2
Google blog post: State-of-the-art video and image generation with Veo 2 and Imagen 3
Thank you!
Connect with me to learn more about AI, art & design!
@margaretmz
@margaretmz
@margaretmz
@margaretmz
AI Foundations: From Embeddings to RAG
Annie Wang, Google
About Me
Software Engineer & Career Coach
@Google
You turn on your car and…
You’re greeted with a bunch of scary lights on the dashboard!
“What does this warning light mean on my car?” → model (LLM to generate response) → Answer: “actually… I'm not sure…”
“What does this warning light mean on my car?” → model → query the vector DB → Answer
With the latest external knowledge, fewer hallucinations 🤓
What is an embedding?
Demo
Input text → Text embedding model → Embeddings: [0.1, 0.002, 0.56, 0.98…]
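The key property of an embedding model is that any text, short or long, comes out as a fixed-length list of numbers. A minimal stand-in in plain Python, assuming feature hashing instead of a learned model: each word is hashed into one of `dim` buckets and the vector is normalized. `toy_embed` is hypothetical and, unlike a real embedding model, it only reflects shared words, not meaning.

```python
import hashlib
import math

def toy_embed(text, dim=8):
    """Hypothetical feature-hashing stand-in for an embedding model:
    each lowercase word adds 1 to one of `dim` buckets, then the vector is L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()  # stable across runs
        vec[int.from_bytes(digest[:4], "little") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero for ""
    return [x / norm for x in vec]

print(toy_embed("What does this warning light mean on my car?"))
```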
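Imagine it!
The same word, “bug”, lands in different places depending on context:
● “Bug in the garden” sits near Insect, Beetle, and Caterpillar
● “Bug in the code” sits near an error such as “JSONDecodeError: Expecting , delimiter: line 1 column 8 (char 12)”
Distance between points reflects how closely related they are.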
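"Distance" here is usually measured with cosine similarity: the cosine of the angle between two embedding vectors. A minimal sketch in plain Python, assuming made-up 3-D vectors (not outputs of a real model) whose dimensions loosely mean "nature", "software", and "error".

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = a·b / (|a| |b|): 1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up 3-D vectors; dimensions loosely: "nature", "software", "error".
vectors = {
    "Bug in the garden": [0.9, 0.1, 0.0],
    "Beetle": [0.95, 0.0, 0.05],
    "Bug in the code": [0.1, 0.8, 0.6],
    "JSONDecodeError": [0.0, 0.7, 0.7],
}
for name in ("Beetle", "JSONDecodeError"):
    print(name, round(cosine_similarity(vectors["Bug in the garden"], vectors[name]), 3))
```

With these vectors, "Bug in the garden" scores close to 1 against "Beetle" and close to 0 against the error, while "Bug in the code" shows the opposite pattern.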
Input → Multimodal embedding model → Embeddings:
[0.1, 0.002, 0.56, 0.98…]
[0.93, 0.133, 0.142, 0.03…]
[0.22, 0.092, 0.391, 0.78…]
Vector Search
Retrieval & Similarity Search
Given a query, search a corpus of items for the most relevant candidate item(s).
Retrieved candidate_items should be more similar to the query_item than any other items in the embedding_space.
Figure: query_item → query → search → rank; candidate_items ranked 1, 2, …, k in the embedding_space
Behind the scenes, embeddings are used
Figure: the query_item and candidate_items are vectors in the embedding_space (query → search → rank 1, 2, …, k), e.g.:
[0.1, 0.002, 0.56, 0.98...]
[0.97, 0.003, 0.532, 0.91...]
[0.94, 0.004, 0.553, 0.89...]
[0.1, 0.003, 0.52, 0.89...]
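The query → search → rank flow can be sketched as exact (brute-force) top-k search: score every candidate against the query and keep the k best. A minimal sketch in plain Python using the truncated 4-D vectors from the slide; treating the first vector as the query_item and labeling the others "a", "b", "c" are assumptions for illustration.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query, corpus, k):
    """Exact nearest neighbors: score every item, keep the k most similar.
    Returns (item_id, similarity) pairs, best first."""
    scored = ((cosine(query, vec), item_id) for item_id, vec in corpus.items())
    return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]

# Truncated vectors from the slide; which one is the query is an assumption.
query_item = [0.1, 0.002, 0.56, 0.98]
candidate_items = {
    "a": [0.97, 0.003, 0.532, 0.91],
    "b": [0.94, 0.004, 0.553, 0.89],
    "c": [0.1, 0.003, 0.52, 0.89],
}
for rank, (item_id, score) in enumerate(top_k(query_item, candidate_items, k=2), start=1):
    print(rank, item_id, round(score, 4))
```

Scoring every item is fine for a handful of vectors; at millions of items, vector databases switch to approximate nearest neighbor (ANN) indexes that trade a little accuracy for much faster search.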
Embedding space
Position of objects within a vector space captures meaning
This extends to multimodal data
Joint Embedding Vector Space
Figure: an image (“gray tabby cat lying in front of a Christmas tree”) and related text share one space; dimension labels: size, color, living
RAG
01 Chunks retrieved from vector search are fed into LLM
02 This augments the existing LLM’s knowledge with information it wasn’t trained on
03 The LLM generates a response that weaves together retrieved chunks + pretrained knowledge
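Steps 01 to 03 mostly come down to prompt assembly: the retrieved chunks are placed next to the question so the LLM can combine them with what it already knows. A minimal sketch in plain Python, assuming a simple numbered-context template; the template wording and the car-manual excerpts are hypothetical, and the LLM call itself is left out.

```python
def build_augmented_prompt(question, retrieved_chunks):
    """Put retrieved chunks (steps 01-02) into the prompt so the LLM can weave them
    together with its pretrained knowledge (step 03). Template wording is an assumption."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1))
    return (
        "Use the context below together with what you already know to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [  # hypothetical excerpts retrieved from a car manual
    "A red oil-can light means oil pressure is low; stop the engine safely.",
    "An amber engine-shaped light indicates an engine management fault.",
]
print(build_augmented_prompt("What does this warning light mean on my car?", chunks))
```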
Standalone LLM
1. An individual asks the LLM a question
2. The LLM generates a response based on pretrained knowledge
3. The answer is returned to the user
Example: “What does this warning light mean on my car?” → LLM to generate response → Answer
RAG
Question / Input: “What does this warning light mean on my car?”
1. Inputs are turned into embeddings (query embedding)
2. Vector search → multimodal outputs (documents, images): approximate nearest neighbors (ANN) search over a vector database of document chunks and images retrieves the top-k relevant items, then the actual text is fetched based on doc IDs
3. Outputs are sent to the LLM to generate a response
4. The answer is returned to the user
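The retrieval half of this pipeline (embed the question, search for top-k doc IDs, fetch the text by ID) can be sketched end to end. A minimal in-memory version in plain Python, assuming a hypothetical keyword-based `keyword_embed` in place of a real embedding model and a hypothetical `ToyVectorDB` class in place of a vector database; it scans every vector exactly, where production systems would use an ANN index.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def keyword_embed(text):
    """Hypothetical stand-in for an embedding model: one dimension per keyword,
    plus a constant so no vector is all zeros."""
    words = text.lower()
    return [float(k in words) for k in ("warning", "light", "tire", "oil")] + [0.1]

class ToyVectorDB:
    """Hypothetical in-memory vector database: embeddings for search,
    plus a separate doc_id -> chunk text store for the fetch step."""

    def __init__(self):
        self.vectors = {}  # doc_id -> embedding
        self.texts = {}    # doc_id -> chunk text

    def add(self, doc_id, embedding, text):
        self.vectors[doc_id] = embedding
        self.texts[doc_id] = text

    def search(self, query_embedding, k):
        """Exact search over every vector; real systems use an ANN index here."""
        ranked = sorted(self.vectors,
                        key=lambda d: cosine(query_embedding, self.vectors[d]),
                        reverse=True)
        return ranked[:k]

    def fetch(self, doc_ids):
        return [self.texts[d] for d in doc_ids]

def retrieve(question, embed, db, k=2):
    """Question -> query embedding -> vector search (top-k doc IDs) -> fetch text by doc ID."""
    doc_ids = db.search(embed(question), k)
    return doc_ids, db.fetch(doc_ids)

db = ToyVectorDB()
for doc_id, text in [
    ("manual-p12", "The oil warning light means oil pressure is low."),
    ("manual-p13", "A flashing warning light indicates an engine misfire."),
    ("manual-p40", "Check tire pressure monthly."),
]:
    db.add(doc_id, keyword_embed(text), text)

doc_ids, retrieved = retrieve("What does this warning light mean on my car?", keyword_embed, db)
print(doc_ids)
# The retrieved chunks would now be sent to the LLM together with the question.
```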
Thank you :)
anniewang.tech
Q&A
Cloud Skills Boost walkthrough
Sign In - Google Cloud Skills Boost
Explore Paths
More questions?
Post them on the GDG Surrey Discord server in #gen_ai_gcp
Have fun studying!
Action items:
● Join Discord - post your questions there
● Get access to Cloud Skills Boost credits
● Complete the 2nd GenAI path on CSB
● Get started on the 3rd GenAI path on CSB
Next session:
● Feb 19, 2025 - Session #3 Gemini (RSVP)
