Region Graph Embedding
Network for Zero-Shot Learning
Tien-Bach-Thanh Do
Network Science Lab
Dept. of Artificial Intelligence
The Catholic University of Korea
E-mail: osfa19730@catholic.ac.kr
2024/04/08
Guo-Sen Xie et al.
ECCV 2020
Introduction
● What is Zero-Shot Learning?
○ ZSL is a model's ability to recognize classes never seen during training
○ The condition is that these classes are not available during supervised learning
○ Earlier work in ZSL uses attributes in a two-step approach to infer unknown classes
○ In the CV context, more recent advances learn mappings from the image feature space to the semantic space
○ Other approaches learn non-linear multimodal embeddings
○ In the modern NLP context, language models can be evaluated on downstream tasks without fine-tuning
● What is Generalized ZSL (GZSL)?
○ The set of classes is split into seen and unseen classes; training relies on the semantic features of both seen and unseen classes but the visual representations of only the seen classes, while testing uses the visual representations of both seen and unseen classes
Previous Approaches
● Early ZSL relies on learning attribute classifiers, from which the class posterior of a test image is deduced; however, the associations among these attributes are not well exploited
● Embedding-based methods
○ Accompanied by a compatibility loss, which can effectively address the association issue [50]
○ Leverage a compatibility hinge loss for learning the association between images and attributes [2]
○ [39,66,41,4,14] are also competitive embedding-based models; however, these methods usually achieve relatively inferior results, since they adopt global features and/or shallow models
○ End-to-end CNN models [33,26,42,28] obtain the best performance => they extend the compatibility loss with the seen-class attributes and advocate learning more discriminative features; however, they struggle to focus on the discriminative parts that intrinsically account for better semantic transfer
● Part-based ZSL
○ [11,1,64] utilize part annotations to discover discriminative part features for tackling fine-grained ZSL; however, part annotations are costly and labor-intensive
○ Pursuing automatic part discovery [53], attention mechanisms [57,56,55,25] have been applied to ZSL and GZSL [52,80,78,30] for capturing multiple semantic regions, which can facilitate desirable knowledge transfer; these methods achieve remarkable improvements on ZSL, but the performance gains on GZSL are not satisfactory => they fail to solve the domain bias issue
How is the region graph created?
● Each node in the graph represents an attended region in the image
● Edges between these region nodes are their pairwise appearance similarities
Framework
Method
Task Definitions
● There are N_s training samples from C_s seen classes, defined as S = {(x^s_i, y^s_i)}^{N_s}_{i=1}
● X^s = {x^s_i}^{N_s}_{i=1} and Y^s are the training image set and its label set
● The seen-class label of the i-th sample x^s_i is y^s_i ∈ Y^s
● A^s = {a^s_i}^{C_s}_{i=1} represents the semantic vector set of the seen classes
● Given an unseen testing set U = {(x^u_i, y^u_i)}^{N_u}_{i=1} with N_u samples => predict the label y^u_i ∈ Y^u for each x^u_i
● Knowledge about U is provided only by the semantic vector set A^u = {a^u_i}^{C_u}_{i=1} for the C_u unseen classes
● The label sets of seen and unseen classes are disjoint
● For GZSL, the searched label space is expanded to Y = Y^s ∪ Y^u by taking samples from both seen and unseen classes as the testing data
● Each semantic vector a^s_i / a^u_i ∈ R^Q
Method
Overview
● RGEN consists of two sub-branches:
○ Constrained Part Attention (CPA) branch
■ Automatically discovers discriminative regions => generates attended object regions, and differs from [52]
● Unlike [52], which places no regularization on the attention masks, compactness and diversity constraints are introduced for learning desirable parts
● The transfer and balance losses are leveraged, compared to [52], which uses an attribute-incorporated cross-entropy loss
○ Parts Relation Reasoning (PRR) branch
■ Aims at capturing appearance relationships among the discovered parts via GCN-based graph reasoning
■ The outputs of the GCN are updated node features, which are further used to learn an embedding into the semantic space
● Both branches are jointly trained with the proposed transfer and balance losses
Method
Constrained Part Attention Branch
● Attention Parts Generation
○ Leverage soft spatial attention to map an image x into a set of K part features
○ Suppose the last convolutional feature map w.r.t. x is Z(x) ∈ R^{H×W×C}, with H, W, C being its height, width, and channel number
○ K attention masks {M_i(x)}^K_{i=1} are obtained by a 1×1 convolution G on Z(x) followed by Sigmoid thresholding, where M_i(x) ∈ R^{H×W} is the i-th attention mask of input x. Based on these masks, K corresponding attentive feature maps {T_i(x)}^K_{i=1} are obtained as T_i(x) = R(M_i(x)) ⊙ Z(x), where R broadcasts the input to the same shape as Z(x) and ⊙ is element-wise multiplication. Applying global max-pooling to each T_i(x) yields K part features {f_i(x)}^K_{i=1}, f_i(x) ∈ R^C
● {f_i(x)}^K_{i=1} serve two functions
○ They are concatenated into a vector f ∈ R^{KC}, which is connected to the bottleneck layer and then to the semantic space; the semantic-layer output is supervised by the transfer and balance losses
○ They are taken as nodes to construct the region graph, which is fed to the GCN in the PRR branch for parts relation reasoning
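The attention-part generation above can be sketched in numpy. This is a minimal illustration, not the paper's implementation: the shapes (H = W = 7, C = 32, K = 4) and the 1×1-conv weight G are made up, and a 1×1 convolution is written as a per-pixel linear map over channels.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_parts(Z, G):
    """Soft spatial attention sketch. Z: H x W x C feature map,
    G: C x K weight of a 1x1 convolution (per-pixel linear map)."""
    H, W, C = Z.shape
    K = G.shape[1]
    # K attention masks M_i in [0,1]^{H x W}: 1x1 conv + Sigmoid
    M = sigmoid(Z.reshape(-1, C) @ G).reshape(H, W, K)
    # attentive maps T_i = M_i (broadcast over channels) * Z
    T = M[:, :, :, None] * Z[:, :, None, :]      # H x W x K x C
    # global max-pooling over spatial locations -> K part features in R^C
    f = T.reshape(H * W, K, C).max(axis=0)       # K x C
    return M, f

rng = np.random.default_rng(0)
Z = rng.standard_normal((7, 7, 32))
G = rng.standard_normal((32, 4))
M, f = attention_parts(Z, G)
```

The K rows of `f` are the part features that are both concatenated for the CPA embedding and used as graph nodes in the PRR branch.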
Method
Constrained Part Attention Branch
● Constrained Attention Masks
○ To discover more compact and divergent parts, the attention masks are constrained both from channel clustering and from spatial attention
○ A compact loss encourages each mask M_i to approach an ideal peaked attention map M̂_i for the i-th part
○ A divergent loss suppresses, at each coordinate (h, w), the overlap between M_i(h, w) and the maximum activation of the other masks at (h, w), so that different masks attend to different regions
○ Both losses are computed over the K masks of the n_b samples in a batch
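The two mask regularizers can be sketched as follows. This is my reading of the slide, not the paper's exact formulas: the ideal peaked map M̂_i is taken to keep only the maximum activation of M_i, and the divergence term multiplies each mask with the element-wise maximum of the other masks.

```python
import numpy as np

def compact_div_losses(M):
    """Sketch of the compact and divergent mask losses.
    M: K x H x W attention masks with values in [0, 1]."""
    K, H, W = M.shape
    # compact loss: each mask should match an ideal peaked map M_hat
    # that keeps only its maximum activation (zero elsewhere)
    M_hat = np.zeros_like(M)
    for i in range(K):
        h, w = np.unravel_index(M[i].argmax(), (H, W))
        M_hat[i, h, w] = M[i, h, w]
    l_cpt = ((M - M_hat) ** 2).sum() / K
    # divergent loss: penalize, at each location, the overlap between
    # mask i and the maximum activation of the *other* masks
    l_div = 0.0
    for i in range(K):
        others = np.max(np.delete(M, i, axis=0), axis=0)
        l_div += (M[i] * others).sum()
    l_div /= K
    return l_cpt, l_div

masks = np.zeros((2, 2, 2))
masks[0, 0, 0] = 1.0      # two perfectly peaked, non-overlapping masks
masks[1, 1, 1] = 1.0
l_cpt, l_div = compact_div_losses(masks)
```

For two perfectly peaked, non-overlapping masks both losses vanish, which is the behavior the constraints reward.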
Method
Parts Relation Reasoning Branch
● Each of the K part features {f_i(x)}^K_{i=1} represents one attended region
● Employ a GCN to perform region-based relation modeling => this leads to the PRR branch
● Build a region graph Γ ∈ R^{K×K} with the K part features as its K nodes
● In Γ, similar regions are connected by high-confidence edges and dissimilar regions by low-confidence edges
● Conduct l2-normalization on each f_i(x); the dot product is then leveraged to calculate pairwise similarity
● This dot-product calculation equals the cosine similarity metric, and the graph has self-connections as well
● Calculate the degree matrix D of Γ, with D_ii = Σ_j Γ_ij
● Leverage the GCN to perform reasoning on the region graph => a 2-layer GCN propagation F^{(l+1)} = σ(D^{-1/2} Γ D^{-1/2} F^{(l)} W^{(l)}), where F^{(0)} ∈ R^{K×C} stacks the K part features with C being their dimension, W^{(l)} (l = 0, 1) are learnable parameters, and σ is the ReLU activation function
● The updated node features undergo a concatenation, a bottleneck layer, and an embedding into the semantic space
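The graph construction plus the 2-layer propagation can be sketched in numpy. All shapes (K = 4, C = 8) and the weight matrices are illustrative; the test features are made non-negative so every cosine similarity, and hence every node degree, stays positive for the normalization.

```python
import numpy as np

def gcn_region_reasoning(F0, W0, W1):
    """Region-graph reasoning sketch: build Gamma from l2-normalized
    part features, normalize it symmetrically, apply two GCN layers.
    F0: K x C part features (nodes); W0, W1: learnable weights."""
    F0 = F0 / np.linalg.norm(F0, axis=1, keepdims=True)
    gamma = F0 @ F0.T                # cosine similarities; diagonal = self-loops
    d = gamma.sum(axis=1)            # node degrees
    A = gamma / np.sqrt(np.outer(d, d))   # D^{-1/2} Gamma D^{-1/2}
    relu = lambda x: np.maximum(x, 0.0)
    F1 = relu(A @ F0 @ W0)           # layer 1
    F2 = relu(A @ F1 @ W1)           # layer 2: updated node features
    return F2

rng = np.random.default_rng(1)
F0 = np.abs(rng.standard_normal((4, 8)))   # non-negative keeps degrees positive
W0 = rng.standard_normal((8, 8))
W1 = rng.standard_normal((8, 8))
F2 = gcn_region_reasoning(F0, W0, W1)
```

The rows of `F2` are the updated region features that are concatenated and embedded into the semantic space.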
The Transfer and Balance Losses
The Transfer Loss
● To make ZSL and GZSL feasible, the extracted features must be further embedded into the semantic space
● Given the i-th seen image x^s_i with ground-truth semantic vector a^s_* ∈ A^s, denote its embedded feature as ε(x^s_i), which equals the concatenated rows of F^(2) (PRR branch) or the concatenated K part features f (CPA branch)
● Revisiting the ACE loss: to associate image x^s_i with its true attribute information, the compatibility score Γ*_i is formulated as the compatibility between ε(x^s_i) and a^s_*, where W are the embedding weights (a two-layer MLP in the implementation) that are learned jointly
● Treating Γ*_i as the true-class score in a cross-entropy loss over the seen data of a batch, the Attribute-incorporated CE (ACE) loss becomes a softmax cross-entropy over the scores on all C_s seen semantic vectors
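The ACE loss can be sketched as a softmax cross-entropy over compatibility scores. This is a simplified illustration: the embedding W is reduced to a single linear map (the paper uses a two-layer MLP), and all shapes are made up.

```python
import numpy as np

def ace_loss(emb, A_seen, W, y):
    """Attribute-incorporated cross-entropy sketch.
    emb: n x D embedded features; W: D x Q embedding map (stand-in for
    the paper's MLP); A_seen: Cs x Q seen-class attributes; y: labels."""
    scores = emb @ W @ A_seen.T                       # n x Cs compatibility
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    logp = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(y)), y].mean())  # CE on true class

emb = np.eye(2)              # two samples, D = 2
W = np.eye(2)
A_seen = np.eye(2)           # Cs = 2 classes, Q = 2 attribute dims
loss = ace_loss(emb, A_seen, W, np.array([0, 1]))
```

With identity features and attributes each sample scores 1 on its own class and 0 on the other, giving a loss of log(1 + e⁻¹).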
The Transfer and Balance Losses
The Transfer Loss
● The ACE loss has two drawbacks
○ The learned models are still biased towards the seen classes
○ The performance of these deep models is inferior on GZSL
● To alleviate these problems, the unseen attributes A^u are incorporated into RGEN
● Leverage least-squares regression to obtain the reconstruction coefficients V ∈ R^{C_u×C_s} of each seen-class attribute w.r.t. all unseen-class attributes: V = (B^T B + βI)^{-1} B^T A, where the columns of B stack the unseen-class attributes and the columns of A the seen-class attributes; V is obtained by solving the regularized least-squares objective
● The i-th column of V represents the contrasting class similarity of a^s_i w.r.t. B
● The transfer loss then aligns the softmax-normalized scores of x^s_i on the C_u unseen semantic vectors with the column of V at position y_i, the column location in V w.r.t. the ground-truth semantic vector of x^s_i
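The closed-form reconstruction coefficients can be computed directly. This is a minimal numpy sketch; the attribute dimension Q = 5 and class counts (C_s = 4, C_u = 3) are made-up test shapes.

```python
import numpy as np

def reconstruction_coeffs(A_seen, B_unseen, beta=1.0):
    """V = (B^T B + beta I)^{-1} B^T A: express each seen-class attribute
    (column of A_seen, Q x Cs) as a ridge-regularized combination of the
    unseen-class attributes (columns of B_unseen, Q x Cu)."""
    Cu = B_unseen.shape[1]
    V = np.linalg.solve(B_unseen.T @ B_unseen + beta * np.eye(Cu),
                        B_unseen.T @ A_seen)          # Cu x Cs
    return V

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))   # Q = 5 attribute dims, Cs = 4 seen classes
B = rng.standard_normal((5, 3))   # Cu = 3 unseen classes
V = reconstruction_coeffs(A, B, beta=0.5)
```

Each column of `V` is the soft "signature" of one seen class over the unseen classes, which the transfer loss uses as the target distribution.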
The Transfer and Balance Losses
The Balance Loss
● To tackle the challenge of extreme domain bias in GZSL, a balance loss is proposed by pursuing maximum-response consistency between the seen and unseen outputs
● Given an input seen sample x^s_i, obtain its prediction scores on the seen-class and the unseen-class attributes
● To balance these scores from the two sides, the balance loss penalizes, over the batch, the gap between the maximum seen-class score and the maximum unseen-class score, where max P outputs the maximum value of the input score vector P
● The balance loss is only utilized for GZSL, not ZSL, since balancing is not required when only unseen test images are available
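The max-response-consistency idea can be sketched in a few lines. This is my reading of the slide, not the paper's exact formula: the squared gap between the top seen score and the top unseen score, averaged over the batch.

```python
import numpy as np

def balance_loss(P_seen, P_unseen):
    """Balance-loss sketch: for each seen training sample, pull the top
    seen-class score and the top unseen-class score toward each other.
    P_seen: n x Cs score matrix; P_unseen: n x Cu score matrix."""
    gap = P_seen.max(axis=1) - P_unseen.max(axis=1)
    return float((gap ** 2).mean())

P_s = np.array([[1.0, 0.0], [2.0, 0.0]])
P_u = np.array([[1.0, 0.5], [0.0, 1.0]])
loss = balance_loss(P_s, P_u)   # gaps are 0 and 1 -> mean square 0.5
```

A model biased toward seen classes produces large gaps, so minimizing this term counteracts the GZSL domain bias.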
Training Objective
● The two branches are guided by the proposed transfer and balance losses during end-to-end training
● Only one stream of data is used as the network input; the backbone is shared
● The final RGEN loss combines the CPA-branch loss L_CPA and the PRR-branch loss L_PRR, where λ1 and λ2 take the same values for the two branches
● The difference between L_CPA and L_PRR lies in their concatenated embedding features (f for CPA vs. the GCN-updated features for PRR)
Zero-Shot Prediction
● In the RGEN framework, an unseen test image x^u is predicted in a fused manner
● After obtaining the embedding features of x^u in the semantic space from the CPA and PRR branches, their fused result is calculated with the same combination coefficients as in the training phase; the label is then predicted by taking the class in Y^u (ZSL) or Y (GZSL) whose semantic vector gives the maximum compatibility score
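The fused prediction step can be sketched as follows. The combination weight `alpha` and all shapes are illustrative, not the paper's values; compatibility is taken as a dot product with each class's semantic vector.

```python
import numpy as np

def predict(e_cpa, e_prr, attrs, labels, alpha=0.7):
    """Fused zero-shot prediction sketch. e_cpa, e_prr: embedded features
    of one test image in semantic space (R^Q); attrs: rows are class
    semantic vectors; labels: the candidate label set (unseen classes for
    ZSL, seen + unseen for GZSL); alpha: made-up combination weight."""
    e = alpha * e_cpa + (1 - alpha) * e_prr   # fuse the two branches
    scores = attrs @ e                        # compatibility per class
    return labels[int(np.argmax(scores))]

attrs = np.eye(3)                             # 3 classes, Q = 3 attributes
label = predict(np.array([0.0, 1.0, 0.0]),
                np.array([0.0, 1.0, 0.0]),
                attrs, ["cat", "zebra", "horse"])
```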
Datasets
● Four datasets: SUN [36], CUB [44], AWA2 [50], aPY [12]
● Use the Proposed Split [50] for evaluation => stricter, and it contains no class overlap with ImageNet classes
Results
Conclusion
● RGEN is proposed for tackling the ZSL and GZSL tasks
● RGEN contains the constrained part attention and the parts relation reasoning branches
● To guide RGEN training, the transfer and balance losses are integrated into the framework
○ The balance loss is especially valuable for alleviating the extreme bias of deep GZSL models towards seen classes, providing intrinsic insight for solving GZSL

Spring gala 2024 photo slideshow - Celebrating School-Community Partnershipsexpandedwebsite
 
An Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge AppAn Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge AppCeline George
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFVivekanand Anglo Vedic Academy
 
ANTI PARKISON DRUGS.pptx
ANTI         PARKISON          DRUGS.pptxANTI         PARKISON          DRUGS.pptx
ANTI PARKISON DRUGS.pptxPoojaSen20
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjMohammed Sikander
 
male presentation...pdf.................
male presentation...pdf.................male presentation...pdf.................
male presentation...pdf.................MirzaAbrarBaig5
 
PSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxPSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxMarlene Maheu
 
Graduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxGraduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxneillewis46
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...EADTU
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsSandeep D Chaudhary
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptNishitharanjan Rout
 

Recently uploaded (20)

DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUMDEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
DEMONSTRATION LESSON IN ENGLISH 4 MATATAG CURRICULUM
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
Basic Civil Engineering notes on Transportation Engineering & Modes of Transport
Basic Civil Engineering notes on Transportation Engineering & Modes of TransportBasic Civil Engineering notes on Transportation Engineering & Modes of Transport
Basic Civil Engineering notes on Transportation Engineering & Modes of Transport
 
SPLICE Working Group: Reusable Code Examples
SPLICE Working Group:Reusable Code ExamplesSPLICE Working Group:Reusable Code Examples
SPLICE Working Group: Reusable Code Examples
 
Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"Mattingly "AI & Prompt Design: Named Entity Recognition"
Mattingly "AI & Prompt Design: Named Entity Recognition"
 
Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"Mattingly "AI and Prompt Design: LLMs with NER"
Mattingly "AI and Prompt Design: LLMs with NER"
 
UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024
 
Major project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategiesMajor project report on Tata Motors and its marketing strategies
Major project report on Tata Motors and its marketing strategies
 
Including Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdfIncluding Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdf
 
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community PartnershipsSpring gala 2024 photo slideshow - Celebrating School-Community Partnerships
Spring gala 2024 photo slideshow - Celebrating School-Community Partnerships
 
An Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge AppAn Overview of the Odoo 17 Knowledge App
An Overview of the Odoo 17 Knowledge App
 
The Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDFThe Story of Village Palampur Class 9 Free Study Material PDF
The Story of Village Palampur Class 9 Free Study Material PDF
 
ANTI PARKISON DRUGS.pptx
ANTI         PARKISON          DRUGS.pptxANTI         PARKISON          DRUGS.pptx
ANTI PARKISON DRUGS.pptx
 
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjjStl Algorithms in C++ jjjjjjjjjjjjjjjjjj
Stl Algorithms in C++ jjjjjjjjjjjjjjjjjj
 
male presentation...pdf.................
male presentation...pdf.................male presentation...pdf.................
male presentation...pdf.................
 
PSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptxPSYPACT- Practicing Over State Lines May 2024.pptx
PSYPACT- Practicing Over State Lines May 2024.pptx
 
Graduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxGraduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptx
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
AIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.pptAIM of Education-Teachers Training-2024.ppt
AIM of Education-Teachers Training-2024.ppt
 

240408_Thanh_LabSeminar[Region Graph Embedding Network for Zero-Shot Learning].pptx

better semantic transfer
● Part-based ZSL
○ [11,1,64] utilized part annotations to discover discriminative part features for tackling fine-grained ZSL; however, part annotations are costly and labor-dependent
4
Previous Approaches
● Part-based ZSL
○ [11,1,64] utilized part annotations to discover discriminative part features for tackling fine-grained ZSL; however, part annotations are costly and labor-dependent
○ Pursuing automatic part discovery [53], attention mechanisms [57,56,55,25] have been applied to ZSL and GZSL [52,80,78,30] for capturing multiple semantic regions, which can facilitate desirable knowledge transfer; these methods achieve remarkable improvements on ZSL, but the performance gains on GZSL are not satisfactory => they fail to solve the domain bias issue
5
How is the region graph created?
● Each node in the graph represents an attended region in the image
● Edges between these region nodes are their pairwise appearance similarities
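The two bullets above can be sketched directly: treat the K attended part features as nodes and use pairwise cosine similarity as edge weights. A minimal NumPy sketch (function and shapes are illustrative, not the paper's code):

```python
import numpy as np

def build_region_graph(parts):
    """Build the K x K region graph from K attended part features.

    parts: (K, C) array, one row per attended region.
    Nodes are the part features; edge weights are pairwise cosine
    similarities (l2-normalize, then dot product), so similar regions
    get high-confidence edges and the graph keeps self-connections.
    """
    normed = parts / np.linalg.norm(parts, axis=1, keepdims=True)
    graph = normed @ normed.T          # (K, K) cosine similarities
    return graph

# tiny demo with random part features
K, C = 4, 8
rng = np.random.default_rng(0)
adj = build_region_graph(rng.standard_normal((K, C)))
```

Because each node is l2-normalized, the diagonal (self-connections) is exactly 1 and the matrix is symmetric.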
7
Method
Task Definitions
● Have Ns training samples from Cs seen classes, defined as S = {(x_i^s, y_i^s)}_{i=1}^{Ns}
● X^S = {x_i^s}_{i=1}^{Ns} and Y^S are the training dataset and its label set
● The seen-class label of the ith sample x_i^s is y_i^s ∈ Y^S
● A^s = {a_i^s}_{i=1}^{Cs} represents the semantic vector set of the seen classes
● Given an unseen testing set U = {(x_i^u, y_i^u)}_{i=1}^{Nu} with Nu samples => predict the label y_i^u ∈ Y^U for each
● More knowledge about U is provided by the semantic vector set A^u = {a_i^u}_{i=1}^{Cu} for the Cu unseen classes
● The label sets of seen and unseen classes are disjoint
● For GZSL, the searched label space is expanded to Y = Y^S ∪ Y^U by taking samples from both seen and unseen classes as the testing data
● Denote a_i^s / a_i^u ∈ R^Q
8
Method
Overview
● RGEN consists of two sub-branches:
○ Constrained Part Attention (CPA) branch
■ Capable of automatically discovering more discriminative regions => used to generate attended object regions; differs from [52]
● Unlike [52], which puts no regularization on the attention masks, compactness and diversity constraints are introduced for learning desirable parts
● The transfer and balance losses are leveraged, compared to [52], which uses an attribute-incorporated cross-entropy loss
○ Parts Relation Reasoning (PRR) branch
■ Aims at capturing appearance relationships among the discovered parts via GCN-based graph reasoning
■ The outputs of these GCNs are updated node features, which are further used to learn an embedding into the semantic space
● Both branches are jointly trained with the proposed transfer and balance losses
9
Method
Constrained Part Attention Branch
● Attention Parts Generation
○ Leverage soft spatial attention to map image x into a set of K part features
○ Suppose the last convolutional feature map w.r.t. x is Z(x) ∈ R^{H×W×C}, with H, W, C being its height, width, and channel number
○ K attention masks {M_i(x)}_{i=1}^{K} are obtained by a 1×1 convolution G on Z(x) followed by sigmoid thresholding, where M_i(x) ∈ R^{H×W} is the ith attention mask of input x. Based on these masks, obtain K corresponding attentive feature maps {T_i(x)}_{i=1}^{K} w.r.t. Z(x), where R reshapes the mask to the same shape as Z(x) and ⊙ is element-wise multiplication. Applying global max-pooling to each T_i(x) gives K part features {f_i(x)}_{i=1}^{K}, f_i(x) ∈ R^C
● {f_i(x)}_{i=1}^{K} serve two functions
○ Concatenated as a vector f ∈ R^{KC}, which is connected to the bottleneck layer and then the semantic space; the semantic-layer output is supervised by the transfer and balance losses
○ Taken as nodes to construct the region graph, which is fed to GCNs in the PRR branch for parts relation reasoning
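The mask -> multiply -> max-pool pipeline above is easy to trace with plain arrays. A minimal NumPy sketch, assuming the 1×1 convolution is a per-pixel matrix multiply with weights W ∈ R^{C×K} (shapes are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_parts(Z, W):
    """Soft spatial attention over a conv feature map (sketch).

    Z: (H, W, C) last convolutional feature map.
    W: (C, K) weights of the assumed 1x1 convolution G.
    Returns K masks and K part features f_i in R^C.
    """
    masks = sigmoid(Z @ W)                         # (H, W, K) attention masks
    # broadcast each mask over the C channels of Z, then global max-pool
    T = masks[..., None, :] * Z[..., :, None]      # (H, W, C, K) attentive maps
    parts = T.max(axis=(0, 1)).T                   # (K, C) part features
    return masks, parts

# tiny demo: H=W=5, C=6 channels, K=3 parts
rng = np.random.default_rng(0)
masks, parts = attention_parts(rng.standard_normal((5, 5, 6)),
                               rng.standard_normal((6, 3)))
```

The sigmoid keeps every mask value in (0, 1), and each f_i(x) is the channel-wise maximum of its attended feature map, matching the global max-pooling step.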
10
Method
Constrained Part Attention Branch
● Constrained Attention Masks
○ To discover more compact and divergent parts, constrain the attention masks from channel clustering
○ Constrain the masks from spatial attention
○ The compact loss and divergent loss for K masks over nb batch samples are defined such that M̂_i is an ideal peaked attention map for the ith part, and the divergent term uses the maximum activation of the other masks at each coordinate (h, w)
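The original loss equations are image-only on the slide, so here is a simplified NumPy sketch of the idea only: the compact term pulls each mask toward a peaked ideal map that keeps just its own maximum, and the divergent term penalizes a mask for firing where the maximum of the other masks is high. The exact penalty forms are assumptions, not the paper's formulas:

```python
import numpy as np

def compact_divergent_losses(masks):
    """Sketch of the compactness / divergence constraints on K masks.

    masks: (K, H, W) attention masks for one image.
    - compact: squared distance to a peaked ideal map M_hat_i that keeps
      only the mask's own maximum activation (assumed simplified form);
    - divergent: overlap with the element-wise max of the *other* masks.
    """
    K, H, W = masks.shape
    peaked = np.zeros_like(masks)
    for i in range(K):
        h, w = np.unravel_index(masks[i].argmax(), (H, W))
        peaked[i, h, w] = masks[i, h, w]
    l_cpt = ((masks - peaked) ** 2).mean()
    l_div = 0.0
    for i in range(K):
        others = np.max(np.delete(masks, i, axis=0), axis=0)
        l_div += (masks[i] * others).mean()
    return l_cpt, l_div / K

# two perfectly peaked, non-overlapping masks incur zero penalty
demo = np.zeros((2, 4, 4))
demo[0, 0, 0] = 1.0
demo[1, 3, 3] = 1.0
l_cpt, l_div = compact_divergent_losses(demo)
```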
11
Method
Parts Relation Reasoning Branch
● Each of the K part features {f_i(x)}_{i=1}^{K} represents one attended region
● Employ a GCN to perform region-based relation modeling => this leads to the PRR branch
● Region graph Γ ∈ R^{K×K}, with the K part features as its K nodes
● In Γ, high-confidence edges connect similar regions and low-confidence edges connect dissimilar regions
● Conduct l2-normalization on each f_i(x) => the dot product is leveraged to calculate pairwise similarity
● The dot-product calculation is then equal to the cosine similarity metric, and the graph has self-connections as well
● Calculate the degree matrix D of Γ with D_ii = Σ_j Γ_ij
● Leverage the GCN to perform reasoning on the region graph => use two-layer GCN propagation F^{(l+1)} = σ(D^{-1/2} Γ D^{-1/2} F^{(l)} W^{(l)}), where F^{(0)} ∈ R^{K×C} stacks the K part features, C is the feature dimension, {W^{(l)}}_{l=0}^{1} are learnable parameters, and σ is the ReLU activation function
● The updated features undergo a concatenation, a bottleneck layer, and an embedding into the semantic space
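The two-layer propagation above can be sketched in NumPy with the standard symmetrically normalized update; the random weight matrices stand in for the learnable W^{(0)}, W^{(1)}:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_reason(F0, graph, W0, W1):
    """Two-layer GCN propagation over the region graph (a sketch).

    F0:    (K, C) stacked part features (the graph nodes).
    graph: (K, K) region graph with self-connections, positive degrees.
    W0,W1: learnable layer weights (random stand-ins here).
    Applies F^{l+1} = ReLU(D^{-1/2} Gamma D^{-1/2} F^{l} W^{l}) twice.
    """
    d_inv_sqrt = np.diag(1.0 / np.sqrt(graph.sum(axis=1)))
    a_hat = d_inv_sqrt @ graph @ d_inv_sqrt
    F1 = relu(a_hat @ F0 @ W0)
    F2 = relu(a_hat @ F1 @ W1)
    return F2

# demo: K=4 nodes, C=6 features, a positive graph with self-loops
rng = np.random.default_rng(0)
F2 = gcn_reason(rng.standard_normal((4, 6)),
                np.full((4, 4), 0.2) + np.eye(4) * 0.8,
                rng.standard_normal((6, 6)),
                rng.standard_normal((6, 6)))
```

The updated node features F2 keep the (K, C) layout, ready for the concatenation and bottleneck steps mentioned above.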
12
The Transfer and Balance Losses
The Transfer Loss
● To make ZSL and GZSL feasible, the obtained features must be further embedded into a certain subspace
● Given the ith seen image and its ground-truth semantic vector in A^S, suppose its embedded feature is collectively denoted as ε(x_i^s), which equals the concatenated rows of F^{(2)} or the concatenated K part features (Φ, f)
● Revisiting the ACE loss, to associate image x_i^s with its true attribute information, a compatibility score Γ_i^* is formulated with embedding weights W that are learned jointly (a two-layer MLP in the implementation)
● Treating Γ_i^* as a classification score in the cross-entropy loss, for seen data from a batch, the attribute-incorporated CE (ACE) loss becomes a softmax cross-entropy over the scores on the Cs seen semantic vectors
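Assuming the embedded feature has already been mapped into the semantic space, the ACE step reduces to compatibility scores against the Cs seen semantic vectors followed by softmax cross-entropy. A minimal NumPy sketch (names and shapes are illustrative):

```python
import numpy as np

def ace_loss(emb, A_seen, y):
    """Attribute-incorporated cross-entropy (ACE), sketched.

    emb:    (N, Q) embedded features in the semantic space.
    A_seen: (Cs, Q) seen-class semantic vectors.
    y:      (N,) ground-truth seen-class indices.
    Compatibility scores emb @ A_seen.T feed a softmax cross-entropy.
    """
    scores = emb @ A_seen.T                       # (N, Cs) compatibilities
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_p = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(y)), y].mean()

# demo: N=8 samples, Q=5 semantic dims, Cs=3 seen classes
rng = np.random.default_rng(0)
loss = ace_loss(rng.standard_normal((8, 5)),
                rng.standard_normal((3, 5)),
                np.array([0, 1, 2, 0, 1, 2, 0, 1]))
```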
13
The Transfer and Balance Losses
The Transfer Loss
● There are two drawbacks
○ The learned models are still biased towards seen classes
○ The performance of these deep models is inferior on GZSL
● To alleviate these problems, incorporate the unseen attributes A^u into RGEN
● Leverage least-squares regression to obtain the reconstruction coefficients V ∈ R^{Cu×Cs} of each seen-class attribute w.r.t. all unseen-class attributes: V = (BᵀB + βI)^{-1}BᵀA; the ith column of V represents the contrastive class similarity of a_i^s w.r.t. B
● The transfer loss then supervises the scores w.r.t. the Cu unseen semantic vectors, where the target is the softmax-layer normalization of S_ij and y_i is the column location in V w.r.t. the ground-truth semantic vector of x_i^s
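The closed form V = (BᵀB + βI)⁻¹BᵀA is ordinary ridge regression and can be checked numerically. A small NumPy sketch, assuming columns of A and B hold the per-class attribute vectors:

```python
import numpy as np

def reconstruction_coefficients(A_seen, B_unseen, beta=1e-3):
    """Ridge-regression coefficients of seen attributes w.r.t. unseen ones.

    Solves V = (B^T B + beta*I)^{-1} B^T A, so column i of V holds the
    similarity of the ith seen-class attribute against all unseen ones.
    A_seen:   (Q, Cs) seen-class attribute matrix A (one class per column).
    B_unseen: (Q, Cu) unseen-class attribute matrix B.
    """
    Cu = B_unseen.shape[1]
    V = np.linalg.solve(B_unseen.T @ B_unseen + beta * np.eye(Cu),
                        B_unseen.T @ A_seen)      # (Cu, Cs)
    return V

# demo: Q=10 attribute dims, Cs=4 seen classes, Cu=3 unseen classes
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 4))
B = rng.standard_normal((10, 3))
V = reconstruction_coefficients(A, B)
```

`np.linalg.solve` is preferred over explicitly inverting BᵀB + βI, since it is both faster and numerically better conditioned.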
14
The Transfer and Balance Losses
The Balance Loss
● To tackle the challenge of extreme domain bias in GZSL, a balance loss is proposed by pursuing maximum-response consistency among seen and unseen outputs
● Given an input seen sample x_i^s, get its prediction scores on the seen-class and unseen-class attributes
● To balance these scores from the two sides, the balance loss is imposed on batch data, where max P outputs the maximum value of the input vector P
● The balance loss is only utilized for GZSL, not ZSL, since balancing is not required when only unseen test images are available
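Maximum-response consistency can be sketched as penalizing, per sample, the gap between the top seen-class score and the top unseen-class score. The squared-gap penalty below is an assumption for illustration; the paper's exact form may differ:

```python
import numpy as np

def balance_loss(seen_scores, unseen_scores):
    """Balance loss sketch: maximum-response consistency across two sides.

    seen_scores:   (N, Cs) scores on seen-class attributes.
    unseen_scores: (N, Cu) scores on unseen-class attributes.
    Penalizes the per-sample gap between max seen and max unseen scores
    (assumed squared gap).
    """
    gap = seen_scores.max(axis=1) - unseen_scores.max(axis=1)
    return (gap ** 2).mean()

# demo: sample 0 has gap 1.0, sample 1 has gap 0.0
seen = np.array([[2.0, 0.5], [1.0, 0.0]])
unseen = np.array([[1.0, 0.0], [1.0, 0.5]])
loss = balance_loss(seen, unseen)   # mean of [1.0, 0.0] -> 0.5
```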
15
Training Objective
● The two branches are guided by the proposed transfer and balance losses during end-to-end training
● Only one stream of data serves as the network input; the backbone is shared
● The final loss for RGEN combines LCPA and LPRR, where λ1 and λ2 take the same values for the two branches
● The difference between LCPA and LPRR lies in the concatenated embedding features f and θ
16
Zero-Shot Prediction
● In the RGEN framework, an unseen test image x^u is predicted in a fused manner
● After obtaining the embedding features of x^u in the semantic space w.r.t. the CPA and PRR branches, their fused result is calculated with the same combination coefficients as in the training phase; the label is then predicted by the highest compatibility over Y^u / Y, where Y^u corresponds to ZSL and Y to GZSL
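The fused prediction step can be sketched as a weighted combination of the two branch embeddings followed by an argmax over candidate semantic vectors. The scalar `alpha` below is a hypothetical stand-in for the paper's combination coefficients:

```python
import numpy as np

def predict(emb_cpa, emb_prr, A_candidates, alpha=0.5):
    """Fused zero-shot prediction (sketch).

    emb_cpa, emb_prr: (Q,) semantic-space embeddings from the two branches.
    A_candidates:     (C, Q) candidate semantic vectors, covering Y^u for
                      ZSL or the full Y = Y^S U Y^U for GZSL.
    alpha:            assumed fusion coefficient (same as in training).
    """
    fused = alpha * emb_cpa + (1.0 - alpha) * emb_prr     # (Q,)
    return int(np.argmax(A_candidates @ fused))           # best class index

# demo: fused = [0.85, 0.05], closest candidate is class 0
A_cand = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
pred = predict(np.array([0.9, 0.1]), np.array([0.8, 0.0]), A_cand)
```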
17
Datasets
● Four datasets: SUN [36], CUB [44], AWA2 [50], APY [12]
● Use the Proposed Split [50] for evaluation => it is stricter and does not contain any class overlap with ImageNet classes
19
Conclusion
● RGEN is proposed for tackling the ZSL and GZSL tasks
● RGEN contains the constrained part attention and the parts relation reasoning branches
● To guide RGEN training, the transfer and balance losses are integrated into the framework
○ The balance loss is especially valuable for alleviating the extreme bias in deep GZSL models, providing intrinsic insights for solving GZSL