The document proposes a method to improve image-to-image translation tasks by enforcing consistency of patch-wise semantic relations between the input and output images. It introduces a consistency loss to preserve the semantic relation distribution of patches during translation. It also uses a contrastive loss with hard negative mining, where negatives are weighted based on their semantic closeness to query patches. Experimental results on single-modal, multi-modal, and GAN compression tasks show the method enhances spatial correspondence and improves output quality by better retaining patch-wise semantic relations between input and output images.
1. Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
Chanyong Jung*, Gihyun Kwon*, Jong Chul Ye (*: equal contribution)
Bio Imaging, Signal Processing & Learning Lab, KAIST
CVPR 2022
2. Introduction
Heterogeneous semantic relation

We claim:
1. Patch-wise semantic relations should be preserved, to enhance spatial correspondence.
2. Negative samples for the contrastive loss should be treated differently, since they have heterogeneous semantics.

Proposed method:
1. Consistency of the semantic relation
2. Contrastive loss using hard negative mining by the semantic relation
[Figure: a shared encoder maps patches from the input and output images to embeddings {z_k}_{k=1}^{K} and {w_k}_{k=1}^{K} in a shared embedding space. Patches from the horse are semantically related to one another, and semantically unrelated to patches from the background.]

We propose: consistency of the contrastive semantic relation, with hard negative mining.
3. Method
1. Consistency of the semantic relation distribution

[Figure: similarity distribution P_k over input patches (z_1 … z_K) and Q_k over output patches (w_1 … w_K), made consistent between input and output.]

The semantic relation of the i-th patch to the k-th patch is defined as the similarity distribution:

P_k(i) = exp(z_k⊤z_i) / Σ_{j=1}^{K} exp(z_k⊤z_j)   (input)
Q_k(i) = exp(w_k⊤w_i) / Σ_{j=1}^{K} exp(w_k⊤w_j)   (output)

The Jensen-Shannon divergence (JSD) between P_k and Q_k is minimized for semantic relation consistency (SRC):

L_SRC = Σ_{k=1}^{K} JSD(P_k || Q_k)
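As a sanity check, the SRC loss can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' implementation: in the paper the embeddings come from the generator's encoder, while here `z` and `w` are arbitrary (K, D) arrays.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def src_loss(z, w):
    """Semantic relation consistency (SRC) loss sketch.

    z, w: (K, D) arrays of K patch embeddings from the input and
    output images. Returns the mean Jensen-Shannon divergence
    between the per-patch similarity distributions P_k and Q_k.
    """
    P = softmax(z @ z.T)   # P_k(i) = exp(z_k.z_i) / sum_j exp(z_k.z_j)
    Q = softmax(w @ w.T)   # Q_k(i), same construction on the output
    M = 0.5 * (P + Q)
    eps = 1e-12            # avoid log(0)
    kl = lambda a, b: (a * (np.log(a + eps) - np.log(b + eps))).sum(axis=-1)
    jsd = 0.5 * kl(P, M) + 0.5 * kl(Q, M)   # JSD(P_k || Q_k) per patch
    return jsd.mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
print(src_loss(z, z))                         # identical relations -> 0.0
print(src_loss(z, rng.normal(size=(8, 16))))  # positive when relations differ
```

Minimizing this quantity pushes the output's patch-wise similarity structure toward the input's, which is the consistency the slide describes.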
4. Method
2. Contrastive loss with hard negative mining

[Figure: embedding space of the input image 𝒳, showing the query point z, negative samples, and hard negatives lying close to the query.]

Sampling negatives z⁻ for a query z is modeled as the von Mises-Fisher distribution:

z⁻ ∼ q(z⁻; z, γ) = (1/N_q) exp(γ z⊤z⁻) p_Z(z⁻)

For a positive pair (w, z) and negative pairs (w, z⁻), we use the contrastive loss by decoupled InfoNCE (DCE) with hard negatives (hDCE):

L_hDCE(γ, τ) = −log[ exp(w⊤z/τ) / E_q[exp(w⊤z⁻/τ)] ]
             = −log[ exp(w⊤z/τ) / (E_p[exp(γ z⊤z⁻) exp(w⊤z⁻/τ)] / N_q) ]

τ: temperature parameter
γ: hardness of the negatives

Negatives are weighted by their semantic closeness, exp(γ z⊤z⁻).
The hardness of the negatives is explicitly controlled by γ: we train the networks by curriculum learning with varying γ.
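A minimal numpy sketch of this hard-negative-weighted contrastive loss. Here the vMF expectation is approximated by self-normalized importance weights exp(γ z⊤z⁻) over a finite set of sampled negatives; this finite-sample weighting scheme is an illustration, not the authors' code.

```python
import numpy as np

def hdce_loss(w, z, z_neg, gamma=1.0, tau=0.07):
    """Sketch of decoupled InfoNCE with hard negatives (hDCE).

    w:     (D,)   output-patch embedding (positive, paired with the query)
    z:     (D,)   query input-patch embedding
    z_neg: (N, D) sampled negative patch embeddings
    Negatives are importance-weighted by exp(gamma * z.z_neg), so
    semantically close (hard) negatives contribute more; gamma
    controls the hardness, tau is the temperature.
    """
    pos = np.exp(w @ z / tau)
    hard_w = np.exp(gamma * (z_neg @ z))      # hardness weights from the query
    hard_w = hard_w / hard_w.sum()            # self-normalized E_q[.]
    neg = (hard_w * np.exp(z_neg @ w / tau)).sum()
    return -np.log(pos / neg)
```

Setting gamma=0 recovers uniform weighting over negatives (plain DCE); the curriculum learning mentioned above would correspond to increasing gamma over the course of training.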
7. Results
2. Multi-modal translation

[Figure: latent-guided and reference-guided translation (Source vs. Ours); diverse outputs by random style codes of each class (Input → Spring, Summer, Autumn, Winter).]

Improved output by retaining the patch-wise semantic relation.
8. Results
3. GAN Compression

[Figure: Input, Teacher, Ours, and Baseline outputs on Horse-to-Zebra, Map-to-Satellite, and Cityscapes.]

Our student inherits the patch-wise semantic relation from the teacher, and its output shows improved correspondence with the teacher.
9. Results
- Similarity Map

[Figure: similarity maps between a query point on the input and the output, for InfoNCE, DCE, DCE+SRC, and DCE+Hneg+SRC.]

Semantic relation consistency (SRC) enhances the input-output correspondence.
Hard negative mining (Hneg) sharpens the semantic relations.
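A similarity map of this kind is just the inner product between one query patch embedding and the embeddings at every spatial location. A minimal numpy sketch, where the (H, W, D) feature layout and function name are illustrative choices, not from the paper:

```python
import numpy as np

def similarity_map(feat, query_yx):
    """Cosine-similarity map between one query location and all others.

    feat:     (H, W, D) patch/feature embeddings of an image.
    query_yx: (row, col) of the query point.
    Returns an (H, W) map; a sharper, more selective map is the
    effect hard negative mining is meant to produce.
    """
    f = feat / np.linalg.norm(feat, axis=-1, keepdims=True)  # unit-normalize
    q = f[query_yx]                                          # (D,) query vector
    return f @ q                                             # (H, W) cosine map

feat = np.random.default_rng(0).normal(size=(4, 5, 8))
sm = similarity_map(feat, (1, 2))
# sm[1, 2] is 1.0: a point is maximally similar to itself
```

Comparing such maps computed on the input and on the output visualizes exactly the consistency that the SRC loss enforces.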
10. Thank You
Jong Chul Ye, E-mail: jong.ye@kaist.ac.kr
Gihyun Kwon, E-mail: cyclomon@kaist.ac.kr
Chanyong Jung, E-mail: jcy@kaist.ac.kr
Editor's Notes
Hi, I’m Chanyong Jung. I would like to introduce our work, which investigates patch-wise semantic relations for image translation tasks.
The motivation of the work is the heterogeneous semantic relations between the patches of a single image. We claim that the semantic relations should be preserved during the image translation procedure, and that the negative samples for the patch-wise contrastive loss should be treated differently.
Following this claim, our method has two parts. In the first part, we enhance the consistency of the semantic relation between the input and the output. Next, we introduce a contrastive loss using hard negatives sampled by the semantic relation.
We first impose the consistency to enhance the spatial correspondence between the input and the output.
The figure shows the semantic relation between the k-th patch and the other patches.
z and w indicate the embedding vectors from the input and the output.
The semantic relational distribution is defined as the similarity distribution.
We denote the distribution P_k for the input, and Q_k for the output.
We minimize the Jensen-Shannon divergence between the distributions, to enhance the consistency.
Next, we introduce the contrastive loss with hard negative mining, considering the semantic relation.
We sample the hard negatives by the von Mises Fisher distribution, as shown in the figure.
Then, the contrastive loss is defined by the decoupled infoNCE with hard negatives.
In the loss function, the hardness of the negative mining can be controlled by gamma. Using gamma, we apply curriculum learning by progressively increasing the hardness during training.
We verified our method on three tasks: single-modal translation, multi-modal translation, and GAN compression. For GAN compression, we distill the patch-wise relational knowledge to enhance the spatial correspondence between the teacher and the student. For single-modal translation, we verified our method on the horse-to-zebra and Cityscapes datasets.
Our method improved the output in both qualitative and quantitative evaluation. Specifically, the consistency of the semantic relation enhances the spatial correspondence between the images, and yields output images with better visual quality.
For the multi-modal translation, the visual quality and the evaluation metrics also verified the improvement by our method.
Similarly to the single-modal translation, the correspondence between the input and the output is enhanced, which yields satisfactory visual quality in the output images.
We also demonstrate the diverse outputs by the random style codes for each class.
In the case of GAN compression, we applied our method to enhance the correspondence between the teacher and the student. In our method, the student model additionally receives the patch-wise relational knowledge, which leads to better performance. The visual assessment also verifies the enhanced correspondence between the teacher and the student, and the quantitative scores demonstrate the improvement.
Lastly, we demonstrate the consistency of the semantic relation, by showing the similarity maps.
As expected, the SRC loss enhanced the consistency of the semantic relation.
Also, the proposed hard negative mining sharpens the semantic relation, reducing the redundant similarity.