Few-shot Image Generation using Scene Graphs
WiMLDS Meetup, 08.02.2022
A. Farshad*, S. Musatian*, H. Dhamo, N. Navab
Overview
• Images generated from scene graphs are typically of low quality
• Limited amount of paired scene graph and image data
[Figure: example scene graph with objects car, car, truck, car and relationships left of, left of, below]
J. Johnson, A. Gupta and L. Fei-Fei, "Image Generation from Scene Graphs" CVPR 2018.
Solution:
• Meta Image Generation from Scene Graphs (MIGS)
• Higher image generation quality using meta-learning
• Episodic training
• Episodes based on shared characteristics
Semantic scene graphs
Graph that describes a scene
● Nodes: objects
● Edges: relationships between objects
○ Action (holding, eating, riding, ...)
○ Proximity (near, left of, front of, above, ...)
○ Support (on, hanging on, ...)
○ Comparison (same as, smaller than, ...)
● Attributes: object properties
○ color, shape, material, …
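The definition above maps naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `SceneGraph` and its methods are hypothetical, and the particular objects, attributes, and edges are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Objects are nodes; relationships are directed, labelled edges."""
    objects: List[str] = field(default_factory=list)
    # attributes[i] holds the properties of objects[i], e.g. ["red"]
    attributes: Dict[int, List[str]] = field(default_factory=dict)
    # each edge is (subject index, predicate, object index)
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, name: str, *attrs: str) -> int:
        """Add a node and return its index."""
        self.objects.append(name)
        idx = len(self.objects) - 1
        if attrs:
            self.attributes[idx] = list(attrs)
        return idx

    def relate(self, subj: int, predicate: str, obj: int) -> None:
        """Add a directed edge subj --predicate--> obj."""
        self.edges.append((subj, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        """Return edges as readable (subject, predicate, object) triples."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.edges]


# Illustrative graph; which attribute belongs to which object is an assumption.
graph = SceneGraph()
man = graph.add_object("man")
bike = graph.add_object("bike", "old")
shirt = graph.add_object("T-shirt", "red")
road = graph.add_object("road")
graph.relate(man, "riding", bike)      # action
graph.relate(man, "wearing", shirt)    # action
graph.relate(bike, "on", road)         # support
print(graph.triples())
```

Storing edges as index triples keeps two objects of the same class (for example two cars) distinct, which a name-keyed dictionary would not.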
[Figure: example scene graph with objects man, bike, T-shirt, sky, road; attributes red, old; relationships above, wearing, riding, on, on]
● From noise to scene graph
○ [Garg ICCV’21]
● From image to scene graph
○ [Xu CVPR’17]
● From scene graph to image
○ [Johnson CVPR’18] Purely semantic nodes (object class)
○ [Ashual ICCV’19] Also use visual features for objects
● From image to scene graph to image
○ [Dhamo CVPR’20]
[Figure: source image → graph prediction → interactive graph modification → image generation → modified image; example graph with objects girl, horse, tree, grass and relationships riding, beside, behind, under, on]
S. Garg, H. Dhamo, A. Farshad, S. Musatian, N. Navab, and F. Tombari. Unconditional scene graph generation. ICCV 2021.
H. Dhamo, A. Farshad, I. Laina, N. Navab, G. D. Hager, F. Tombari, and C. Rupprecht. Semantic image manipulation using scene graphs. CVPR 2020.
O. Ashual and L. Wolf. Specifying object attributes and relations in interactive scene generation. ICCV 2019.
J. Johnson, A. Gupta, and L. Fei-Fei. Image generation from scene graphs. CVPR 2018.
D. Xu, Y. Zhu, C. Choi, and L. Fei-Fei. Scene graph generation by iterative message passing. CVPR 2017.
Image Generation from Scene Graphs
Scene Graph → GCN → Bounding boxes, Masks → Layout → Decoder → Image
Source: J. Johnson, A. Gupta and L. Fei-Fei, "Image Generation from Scene Graphs" CVPR 2018.
Meta-learning
Source: Ye, H.-J., Sheng, X.-R., & Zhan, D.-C. (2020). Few-shot learning with adaptively initialized task optimizer: a practical meta-learning approach. Machine Learning, 109. doi:10.1007/s10994-019-05838-7.
Source: https://openai.com/blog/reptile/
Reptile
[Figure: Reptile illustration with the initialization Φ and the task-specific optimal weights W1*, W2*]
Source: Nichol, A., & Schulman, J. (2018). Reptile: a scalable meta-learning algorithm. arXiv preprint arXiv:1803.02999.
Method
Task Construction
Dividing the dataset based on the scene attributes
● Berkeley Deep Drive
○ Attributes: Time of day, Place
○ e.g.: Day, Highway
● Action Genome
○ Video frames
● Visual Genome
○ Clustering using SCAN [1]
Meta-training
Meta-optimization on randomly sampled tasks using Reptile [2]
Fine-tuning
Fine-tuning the meta-trained model on test tasks
Meta-training + Fine-tuning: Few-shot image generation
[1] Van Gansbeke, W., Vandenhende, S., Georgoulis, S., Proesmans, M., & Van Gool, L. (2020). SCAN: Learning to classify images without labels. In ECCV.
[2] Nichol, A., & Schulman, J. (2018). Reptile: a scalable meta-learning algorithm. arXiv preprint arXiv:1803.02999.
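The attribute-based task construction described for Berkeley Deep Drive can be sketched as a simple grouping step. This is an illustrative sketch only: the record layout, the image ids, and the function name `build_tasks` are assumptions, not part of the original pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical records: (image id, time of day, place)
images = [
    ("img_001", "daytime", "highway"),
    ("img_002", "night", "city street"),
    ("img_003", "daytime", "highway"),
    ("img_004", "dawn/dusk", "city street"),
    ("img_005", "night", "city street"),
]


def build_tasks(records: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group image ids into tasks keyed by (time of day, place)."""
    tasks = defaultdict(list)
    for image_id, time_of_day, place in records:
        tasks[(time_of_day, place)].append(image_id)
    return dict(tasks)


tasks = build_tasks(images)
for key, members in tasks.items():
    print(key, members)
```

For datasets without such attributes, the same dictionary could be keyed by a video id (Action Genome) or by a cluster label (Visual Genome); how those labels are obtained is outside this sketch.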
SG2Im
● Input: Scene graph
○ Objects: person, sofa, blanket
○ Relationships: sitting on, covered by, on the side of
● GCN
○ Embeddings
● Layout prediction
○ Box: x1, y1, x2, y2 (one per object)
○ Mask
● Scene layout
● Cascaded Refinement Network
○ downsampled layout, Noise → Conv → Conv → Upsample
● Output: Image
Source: J. Johnson, A. Gupta and L. Fei-Fei, "Image Generation from Scene Graphs" CVPR 2018.
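To make the step from per-object boxes to a scene layout concrete, here is a simplified sketch in plain Python. It is not the SG2Im implementation: masks are ignored (every box is treated as fully filled), and the function name `compose_layout`, the 2-D embeddings, and the 4x4 grid are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized to [0, 1]


def compose_layout(embeddings: Sequence[Sequence[float]],
                   boxes: Sequence[Box],
                   height: int, width: int) -> List[List[List[float]]]:
    """Sum each object's embedding over the grid cells its box covers."""
    dim = len(embeddings[0])
    layout = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for emb, (x1, y1, x2, y2) in zip(embeddings, boxes):
        for row in range(height):
            for col in range(width):
                # cell centre in normalized image coordinates
                cx, cy = (col + 0.5) / width, (row + 0.5) / height
                if x1 <= cx < x2 and y1 <= cy < y2:
                    for d in range(dim):
                        layout[row][col][d] += emb[d]
    return layout


# Hypothetical embeddings for two objects, e.g. "sofa" and "person".
embeddings = [[1.0, 0.0], [0.0, 1.0]]
boxes = [(0.0, 0.5, 1.0, 1.0), (0.25, 0.25, 0.75, 0.75)]
layout = compose_layout(embeddings, boxes, height=4, width=4)
print(layout[2][1])  # a cell covered by both boxes
```

Cells where boxes overlap accumulate the embeddings of all objects that cover them, so the decoder receives a single feature map regardless of the number of objects.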
Method
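Training tasks: Task 1, ..., Task k, ..., Task N
Train
● Pick random task k
● Train on task k for t steps (GCN → Scene layout → G) to obtain model parameters θk
● Update meta model parameters: θ ← θ + ε(θk − θ)
Test
● Train for each test task (Test task A, ..., Test task Y), starting from the model parameters θ
● Fine-tuning from θ with gradients ∇θA, ..., ∇θY gives the task-specific parameters θA, ..., θY
[Figure: input scene graph with objects person, sofa, blanket and relationships sitting on, covered by, on the side of]
[Figure labels: SG2Im objective, GAN objective, Meta objective, Meta Model Parameter Update]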
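The train and test loops above can be sketched on a toy problem. This is a hedged illustration of the Reptile update θ ← θ + ε(θk − θ), not the MIGS training code: the model is a single scalar parameter, each task is a quadratic loss with a hypothetical target, and all function names and hyperparameters are assumptions.

```python
import random
from typing import Callable, List

Params = List[float]
GradFn = Callable[[Params], Params]


def sgd_steps(theta: Params, grad_fn: GradFn, steps: int, lr: float) -> Params:
    """Inner loop: train a copy of theta on one task for `steps` steps."""
    theta_k = list(theta)
    for _ in range(steps):
        grads = grad_fn(theta_k)
        theta_k = [w - lr * g for w, g in zip(theta_k, grads)]
    return theta_k


def reptile(theta: Params, task_grads: List[GradFn], meta_iters: int,
            inner_steps: int, inner_lr: float, epsilon: float,
            rng: random.Random) -> Params:
    """Outer loop: repeatedly move theta toward task-adapted parameters."""
    for _ in range(meta_iters):
        grad_fn = rng.choice(task_grads)  # pick random task k
        theta_k = sgd_steps(theta, grad_fn, inner_steps, inner_lr)
        # meta update: theta <- theta + epsilon * (theta_k - theta)
        theta = [w + epsilon * (wk - w) for w, wk in zip(theta, theta_k)]
    return theta


def make_task(target: float) -> GradFn:
    """Toy task: loss = 0.5 * sum((w - target)^2), so gradient = w - target."""
    return lambda params: [w - target for w in params]


train_tasks = [make_task(-1.0), make_task(1.0), make_task(3.0)]
theta = reptile([10.0], train_tasks, meta_iters=500, inner_steps=5,
                inner_lr=0.1, epsilon=0.1, rng=random.Random(0))

# Test phase: fine-tune the meta-trained parameters on an unseen task.
theta_a = sgd_steps(theta, make_task(2.0), steps=50, lr=0.1)
print(theta, theta_a)
```

Because the meta update only needs the difference between adapted and current parameters, it requires no second-order gradients, which is one reason Reptile is convenient for large generators.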
Datasets
Visual Genome
● Diverse dataset with objects and their semantic relationships
Berkeley Deep Drive (BDD)
● Split into classes based on image attributes - 23 classes
● Pre-processing to remove the background and close-up shots of the objects
Action Genome (AG)
● Annotates the objects that a person interacts with (objects without interaction are not annotated)
● Split based on the videos
Krishna, R., Zhu, Y., ... & Fei-Fei, L. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. IJCV 2017.
Ji, J., Krishna, R., Fei-Fei, L., & Niebles, J. C. Action Genome: Actions as compositions of spatio-temporal scene graphs. CVPR 2020.
https://bdd-data.berkeley.edu/
Source: https://www.bdd100k.com/
Evaluation Metrics
• Fréchet Inception Distance (FID)
  • Estimates the distance between the Inception feature vectors of real and generated images
  • Models the embedding-layer activations as a continuous multivariate Gaussian and calculates the Wasserstein-2 distance between the two Gaussians
  • The distance quantifies the quality of generated samples
  • Lower scores -> higher-quality images
• Kernel Inception Distance (KID)
  • Does not assume any specific form for the estimated distributions
  • Uses the squared maximum mean discrepancy (MMD) between Inception representations as a measure of dissimilarity between two probability distributions
  • Its estimator is unbiased, so it is a better estimate than FID, especially when evaluation data is scarce
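Both metrics can be sketched in a few lines of plain Python. For two Gaussians, FID is ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)); the sketch below assumes diagonal covariances, in which case the trace term reduces to a sum of squared differences of standard deviations. The KID function is the unbiased squared-MMD estimator with the commonly used cubic polynomial kernel. The feature vectors are toy values rather than Inception features, and practical implementations usually average KID over several subsets.

```python
from typing import List, Sequence

Vector = Sequence[float]


def fid_diagonal(mu_r: Vector, var_r: Vector, mu_g: Vector, var_g: Vector) -> float:
    """Squared Wasserstein-2 distance between Gaussians with diagonal covariances."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum((vr ** 0.5 - vg ** 0.5) ** 2 for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term


def poly_kernel(x: Vector, y: Vector) -> float:
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3."""
    d = len(x)
    return (sum(a * b for a, b in zip(x, y)) / d + 1.0) ** 3


def kid(real: List[Vector], fake: List[Vector]) -> float:
    """Unbiased estimate of the squared MMD between two feature sets."""
    m, n = len(real), len(fake)
    k_rr = sum(poly_kernel(real[i], real[j])
               for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake[i], fake[j])
               for i in range(n) for j in range(n) if i != j)
    k_rf = sum(poly_kernel(x, y) for x in real for y in fake)
    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


# Toy 2-D "features"; real evaluations use Inception activations.
real = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
fake = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))
print(kid(real, fake))
```

Excluding the i = j terms from the within-set sums is what makes the KID estimate unbiased; it also means the estimate can be slightly negative when the two sets come from the same distribution.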
Few-shot Image Generation from Scene Graphs on BDD
[Figure: BDD example images by place (Highway, City street) and time of day (Daytime, Dawn/dusk, Night)]
                      160-shot            10-shot             5-shot
Method       Decoder  FID ↓  KID*10^3 ↓   FID ↓  KID*10^3 ↓   FID ↓  KID*10^3 ↓
SG2Im        CRN      194    210          176    186.5        196.8  224.2
MIGS (Ours)  CRN      158.5  156.4        157    158.4        183.5  187.6
SG2Im        SPADE    66.1   42.2         70.6   48.3         95.2   73.1
MIGS (Ours)  SPADE    49.5   26.7         46.1   24           53.5   30.7
[Figure: qualitative results on BDD for the scene graph with objects car, car, truck, car and relationships left of, left of, below: Ground Truth compared with SG2Im + CRN, SG2Im + SPADE, MIGS (Ours) + CRN, and MIGS (Ours) + SPADE at 160-shot, 10-shot, and 5-shot]
Few-shot Image Generation from Scene Graphs on AG
● Action Genome
○ Video frames
● Qualitative & Quantitative Results
[Figure: qualitative results on AG: SG2Im + CRN, SG2Im + SPADE, MIGS (Ours) + CRN, MIGS (Ours) + SPADE, Ground Truth]
Method       Decoder  FID ↓  KID*10^3 ↓
SG2Im        CRN      198    163.4
MIGS (Ours)  CRN      174.5  137.8
SG2Im        SPADE    141.3  76.3
MIGS (Ours)  SPADE    98.1   47.4
Few-shot Image Generation from Scene Graphs on VG
● Visual Genome
○ Clustering
● Qualitative & Quantitative Results
                      160-shot            10-shot             5-shot
Method       Decoder  FID ↓  KID*10^3 ↓   FID ↓  KID*10^3 ↓   FID ↓  KID*10^3 ↓
SG2Im        SPADE    55.20  35.54        81.42  59.39        91.79  68.52
MIGS (Ours)  SPADE    54.24  29.00        75.96  50.69        83.54  55.28
Conclusion
• Meta-learning enables training models with as few as 5 samples
• It can provide a better initialization for non-few-shot training
• The first work on few-shot image generation of scenes in the wild
• Achieved state-of-the-art results on the scene-graph-to-image generation task
• Image generation from scene graphs
  • Still a long way to go for high-quality image generation
Questions?
Project Page
Visit our project page for the source code, pre-trained models, etc.
migs2021.github.io

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 

“Few-shot Image Generation using Scene Graphs” by Azade Farshad

  • 1. Few-shot Image Generation using Scene Graphs
       WiMLDS Meetup, 08.02.2022
       A. Farshad*, S. Musatian*, H. Dhamo, N. Navab
       [Figure: sample images labeled Highway, City street; Daytime, Dawn/dusk, Night]
  • 2. Overview
       • Images generated from scene graphs normally have low quality
       • Limited amount of paired scene graph and image data
       [Figure: scene graph with car and truck nodes linked by "left of" and "below" edges. Source: J. Johnson, A. Gupta and L. Fei-Fei, "Image Generation from Scene Graphs" CVPR 2018.]
  • 3. Overview
       Solution:
       • Meta Image Generation from Scene Graphs (MIGS)
       • Higher image generation quality using meta-learning
       • Episodic training
       • Episodes based on shared characteristics
       [Figure: scene graph with car and truck nodes linked by "left of" and "below" edges. Source: J. Johnson, A. Gupta and L. Fei-Fei, "Image Generation from Scene Graphs" CVPR 2018.]
  • 4. Semantic scene graphs
       Graph that describes a scene
       ● Nodes: objects
       ● Edges: relationships between objects
         ○ Action (holding, eating, riding, ...)
         ○ Proximity (near, left of, front of, above, ...)
         ○ Support (on, hanging on, ...)
         ○ Comparison (same as, smaller than, ...)
       ● Attributes: object properties
         ○ color, shape, material, …
       [Figure: example scene graph with nodes man, sky, bike, T-shirt, road; attributes red, old; edges above, wearing, riding, on]
  • 5. Semantic scene graphs
       ● From noise to scene graph [Garg ICCV’21]
       ● From image to scene graph [Xu CVPR’17]
       ● From scene graph to image
         ○ [Johnson CVPR’18] Purely semantic nodes (object class)
         ○ [Ashual ICCV’19] Also use visual features for objects
       ● From image to scene graph to image [Dhamo CVPR’20]
         ○ source image → graph prediction → interactive graph modification → image generation → modified image
       [Figure: example scene graph with nodes girl, horse, tree, grass and edges riding, behind, under, on, beside]
       References:
       S. Garg, H. Dhamo, A. Farshad, S. Musatian, N. Navab, and F. Tombari. Unconditional scene graph generation. ICCV 2021.
       H. Dhamo, A. Farshad, I. Laina, N. Navab, G. D. Hager, F. Tombari, and C. Rupprecht. Semantic image manipulation using scene graphs. CVPR 2020.
       O. Ashual and L. Wolf. Specifying object attributes and relations in interactive scene generation. ICCV 2019.
       J. Johnson, A. Gupta, and Li Fei-Fei. Image generation from scene graphs. CVPR 2018.
       D. Xu, Y. Zhu, C. Choi, and Li Fei-Fei. Scene graph generation by iterative message passing. CVPR 2017.
  • 6. Image Generation from Scene Graphs
       Scene Graph → GCN → Bounding boxes, Masks → Layout → Decoder → Image
       Source: J. Johnson, A. Gupta and L. Fei-Fei, "Image Generation from Scene Graphs" CVPR 2018.
  • 7. Meta-learning
       Source: Ye, Han-Jia & Sheng, Xiang-Rong & Zhan, De-Chuan. (2020). Few-shot learning with adaptively initialized task optimizer: a practical meta-learning approach. Machine Learning, 109. doi:10.1007/s10994-019-05838-7
       Source: https://openai.com/blog/reptile/
  • 8. Reptile
       [Figure: meta-parameters Φ and the task-specific optima W1*, W2*]
       Source: Nichol, A., & Schulman, J. (2018). Reptile: a scalable meta-learning algorithm. arXiv preprint arXiv:1803.02999.
  • 9. Method
       Task Construction: dividing the dataset based on the scene attributes
       ● Berkeley Deep Drive
         ○ Attributes: Time of day, Place
         ○ e.g.: Day, Highway
       ● Action Genome
         ○ Video frames
       ● Visual Genome
         ○ Clustering using SCAN [1]
       Meta-training: meta-optimization on randomly sampled tasks using Reptile [2]
       Fine-tuning: fine-tuning the meta-trained model on test tasks + few-shot image generation
       [1] Van Gansbeke, W., Vandenhende, S., Georgoulis, S., Proesmans, M., & Van Gool, L. (2020, August). SCAN: Learning to classify images without labels. In ECCV.
       [2] Nichol, A., & Schulman, J. (2018). Reptile: a scalable meta-learning algorithm. arXiv preprint arXiv:1803.02999.
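
The task construction step for Berkeley Deep Drive can be pictured as grouping samples by their scene attributes. The following is only a minimal sketch under assumptions: the record fields (`image`, `time_of_day`, `place`) and the helper `build_tasks` are hypothetical names chosen for illustration and are not taken from the authors' code.

```python
from collections import defaultdict


def build_tasks(samples, keys=("time_of_day", "place")):
    """Group samples into tasks that share the same attribute values.

    `samples` is a list of dicts; `keys` names the attributes that define
    a task (hypothetical field names, assumed for this sketch).
    """
    tasks = defaultdict(list)
    for sample in samples:
        task_id = tuple(sample[k] for k in keys)
        tasks[task_id].append(sample)
    return dict(tasks)


# Toy records standing in for annotated images.
samples = [
    {"image": "a.jpg", "time_of_day": "daytime", "place": "highway"},
    {"image": "b.jpg", "time_of_day": "night", "place": "city street"},
    {"image": "c.jpg", "time_of_day": "daytime", "place": "highway"},
]
tasks = build_tasks(samples)
for task_id, members in tasks.items():
    print(task_id, [s["image"] for s in members])
```

Each resulting group, for example (daytime, highway), would then act as one episode source during meta-training.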
  • 10. SG2Im
       Input: Scene graph (example: nodes person, sofa, blanket; edges "sitting on", "covered by", "on the side of")
       → GCN → Embeddings → Box (x1, y1, x2, y2) and Mask per object → Layout prediction → Scene layout
       → Cascaded Refinement Network (downsampled layout, Noise, Conv, Conv, Upsample) → Output: Image
       Source: J. Johnson, A. Gupta and L. Fei-Fei, "Image Generation from Scene Graphs" CVPR 2018.
  • 11. Method
       Train (training tasks: Task 1, ..., Task k, ..., Task N)
       ● Pick random task k
       ● Train on task k for t steps to obtain model parameters θk
       ● Update meta model parameters: θ ← θ + ε(θk - θ)
       Test (test tasks: Test task A, ..., Test task Y)
       ● Train for each test task, starting from model parameters θ, to obtain θA, ..., θY (figure labels: ∇θA, ∇θY)
       Model: scene graph (example: person, sofa, blanket; "sitting on", "covered by", "on the side of") → GCN → Scene layout → G
       Legend: SG2Im objective, GAN objective, Meta objective, Meta Model Parameter Update
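
The training loop on this slide (pick a random task, train for t steps, then move the meta parameters toward the task parameters) can be sketched on a toy problem. This is an illustration under stated assumptions, not the MIGS implementation: each "task" is a scalar quadratic loss 0.5 * (θ - target)^2 instead of an image generator, and the names `inner_train` and `reptile` as well as all hyperparameter values are hypothetical.

```python
import random


def inner_train(theta, target, steps, lr):
    """Run `steps` SGD steps on the toy task loss 0.5 * (theta - target)**2."""
    theta_k = theta
    for _ in range(steps):
        grad = theta_k - target  # derivative of the toy loss
        theta_k -= lr * grad
    return theta_k


def reptile(task_targets, theta=0.0, meta_iters=2000, steps=5,
            lr=0.1, eps=0.1, seed=0):
    """Reptile-style meta-training over a list of toy tasks."""
    rng = random.Random(seed)
    for _ in range(meta_iters):
        target = rng.choice(task_targets)                # pick random task k
        theta_k = inner_train(theta, target, steps, lr)  # train on task k for t steps
        theta += eps * (theta_k - theta)                 # theta <- theta + eps * (theta_k - theta)
    return theta


# Two toy tasks whose optima are 1.0 and 3.0.
theta = reptile([1.0, 3.0])
print(theta)
```

In this toy setting the meta parameter settles between the two task optima, so a few inner steps from it reach either task quickly, which is the intuition behind using the meta-trained model as an initialization for few-shot fine-tuning.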
  • 12. Datasets
       Visual Genome
       ● Diverse dataset with objects and their semantic relationships
       Berkeley Deep Drive (BDD) and Action Genome (AG)
       ● Split into classes based on image attributes - 23 classes
       ● Pre-processing to get rid of the background and get close shots of the objects
       ● Annotates the objects that a person interacts with (objects without interaction are not annotated)
       ● Split based on the videos
       Krishna, R., Zhu, Y., ... & Fei-Fei, L. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. IJCV 2017.
       Ji, J., Krishna, R., Fei-Fei, L., & Niebles, J. C. Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs. CVPR 2020.
       https://bdd-data.berkeley.edu/
       Source: https://www.bdd100k.com/
  • 13. Evaluation Metrics
       • Fréchet Inception Distance (FID)
         • Estimates the distance between the Inception feature vectors for real and generated images
         • Models the embedding-layer features of each image set as a continuous multivariate Gaussian and calculates the Wasserstein-2 distance between the two Gaussians
         • Distance is used to quantify the quality of generated samples
         • Lower scores -> higher-quality images
       • Kernel Inception Distance (KID)
         • Does not assume any specific form of the estimated distributions
         • Uses squared maximum mean discrepancy (MMD) between Inception representations as a measure of dissimilarity between two probability distributions
         • Because its estimator is unbiased, serves as a better estimate than FID, especially when evaluation data is scarce
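
To make the two metrics concrete, here is a small pure-Python sketch that works on plain feature vectors rather than real Inception activations. It rests on two assumptions that are not stated on the slide: `fid_diagonal` treats both Gaussians as having diagonal covariance (the full metric uses complete covariance matrices and a matrix square root), and `kid_unbiased` uses the cubic polynomial kernel (x·y/d + 1)^3 that is the common choice for KID. All function names are hypothetical.

```python
import math


def fid_diagonal(mu_r, var_r, mu_g, var_g):
    """Squared Wasserstein-2 distance between two Gaussians with
    diagonal covariances (a simplification of the full FID formula)."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2
                   for a, b in zip(var_r, var_g))
    return mean_term + cov_term


def poly_kernel(x, y):
    """Cubic polynomial kernel (x.y / d + 1)^3, with d the feature dimension."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid_unbiased(real, fake):
    """Unbiased estimate of squared MMD between two sets of feature vectors."""
    m, n = len(real), len(fake)
    # Within-set sums skip the i == j terms, which makes the estimate unbiased.
    k_rr = sum(poly_kernel(real[i], real[j])
               for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake[i], fake[j])
               for i in range(n) for j in range(n) if i != j)
    k_rf = sum(poly_kernel(x, y) for x in real for y in fake)
    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


# Toy features standing in for Inception embeddings.
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))
print(kid_unbiased([[1.0], [1.0]], [[0.0], [0.0]]))
```

Because the unbiased estimator can be slightly negative for very similar sets, KID values are typically reported as averages over subsets; the result tables in this deck report KID multiplied by 10^3.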
  • 14. Few-shot Image Generation from Scene Graphs on BDD
       [Figure: sample images labeled Highway, City street; Daytime, Dawn/dusk, Night]

                             160-shot             10-shot              5-shot
       Method       Decoder  FID ↓   KID*10^3 ↓   FID ↓   KID*10^3 ↓   FID ↓   KID*10^3 ↓
       SG2Im        CRN      194     210          176     186.5        196.8   224.2
       MIGS (Ours)  CRN      158.5   156.4        157     158.4        183.5   187.6
       SG2Im        SPADE    66.1    42.2         70.6    48.3         95.2    73.1
       MIGS (Ours)  SPADE    49.5    26.7         46.1    24           53.5    30.7
  • 15. Few-shot Image Generation from Scene Graphs on BDD
       [Figure: qualitative comparison for a scene graph with car and truck nodes linked by "left of" and "below" edges. Panels: Ground Truth, SG2Im + CRN, SG2Im + SPADE, MIGS (Ours) + CRN, MIGS (Ours) + SPADE. Settings: 160-shot, 10-shot, 5-shot]
  • 16. Few-shot Image Generation from Scene Graphs on AG
       ● Action Genome
         ○ Video frames
       ● Qualitative & Quantitative Results
       [Figure: qualitative comparison. Panels: Ground Truth, SG2Im + CRN, SG2Im + SPADE, MIGS (Ours) + CRN, MIGS (Ours) + SPADE]

       Method       Decoder  FID ↓   KID*10^3 ↓
       SG2Im        CRN      198     163.4
       MIGS (Ours)  CRN      174.5   137.8
       SG2Im        SPADE    141.3   76.3
       MIGS (Ours)  SPADE    98.1    47.4
  • 17. Few-shot Image Generation from Scene Graphs on VG
       ● Visual Genome
         ○ Clustering
       ● Qualitative & Quantitative Results

                             160-shot             10-shot              5-shot
       Method       Decoder  FID ↓   KID*10^3 ↓   FID ↓   KID*10^3 ↓   FID ↓   KID*10^3 ↓
       SG2Im        SPADE    55.20   35.54        81.42   59.39        91.79   68.52
       MIGS (Ours)  SPADE    54.24   29.00        75.96   50.69        83.54   55.28
  • 18. Conclusion
       • Meta-learning enables training models with as few as 5 samples
       • It can provide a better initialization for non-few-shot training
       • The first work on few-shot image generation of scenes in the wild
       • Achieved state-of-the-art results on the scene-graph-to-image generation task
       • Image generation from scene graphs
         • Still a long way to go for high-quality image generation
  • 20. Project Page
       Visit our project page for the source code, pre-trained models, etc.: migs2021.github.io