Paper Review: Adversarial Reinforced Learning for Unsupervised Domain Adaptation
2021/10/18
Fundamentals Team
김동희, 김창연, 김지연, 이재윤, 송헌, 이근배(P)
1. INTRODUCTION
2. RELATED WORK
3. BACKGROUND
4. METHODS
5. EXPERIMENTS
1. INTRODUCTION
• High demand for automatic classification of data
• Large labeled training data → annotation cost ↑
• Necessary to transfer knowledge from an existing labeled domain (SOURCE)
to an unlabeled new domain (TARGET)
• Domain shift phenomenon
• ML models don’t generalize well from SOURCE to TARGET
Domain Generalization with MixStyle, Zhou et al., ICLR 2021
• Domain adaptation (DA) has become an effective method to mitigate
the domain shift problem
• Traditional methods
• Use low- to deep-level instance representations
(AlexNet, ResNet50, ...)
• Heavily affected by the extracted features
• DL-based methods
• Design distance metrics to measure
the discrepancy between the two domains, or
• Learn domain-invariant features by adversarial learning
• Distance-based methods aim to minimize the discrepancy
between source & target
• Adversarial methods play a two-player game:
• Classifier: trained to distinguish source & target
• Generator: trained to fool the classifier
• This minimax game → the distance between the two
domains decreases
• Domain Adversarial Neural Network (DANN)
• A minimax loss that integrates a gradient reversal layer to promote the
discrimination of source & target (a minimal sketch of this layer follows below)
• Adversarial Discriminative Domain Adaptation (ADDA)
• Uses an inverted-label GAN loss to split the source & target optimization
• Features can be learned separately
• Domain Symmetric Network (SymNet)
• A symmetrically designed source/target classifier
• Its proposed category-level loss improves the domain-level loss by learning
invariant features between the two domains
(Figure: https://lh3.googleusercontent.com/-zsZDA4RqWSs/X9_4Ga_C9TI/AAAAAAAAQRQ/GjX0NhPXa70bjc2SL6XcdzbEOkVlPneKwCLcBGAsYHQ/w608-h241/image.png)
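To make the gradient reversal idea behind DANN concrete, here is a minimal PyTorch sketch; this is not the paper's code, and the class and function names are ours:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient by
    -lambda in the backward pass, so the feature extractor is pushed to
    fool the domain classifier (the minimax game described above)."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient w.r.t. x is reversed; lambd itself gets no gradient.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: domain_logits = domain_classifier(grad_reverse(features))
```

Placing this layer between the feature extractor and the domain classifier lets a single backward pass train the classifier to discriminate domains while training the features to confuse it.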
• Proposes a novel framework: adversarial RL for unsupervised domain adaptation (ARL)
• RL acts as a selector that identifies the closest feature pair between the source and target domains
• Develops a new reward across source & target
• The proposed deep correlation reward on the target guides the agent to learn the best policy
• Selects the closest feature pair for both domains
• Proposes adversarial learning and domain distribution alignment together
• Mitigates the discrepancy between the source/target domains
2. RELATED WORK
• In prior work, we explored the effect of model selection on domain adaptation methods
• 16 deep neural networks on 12 domains
• The distance between source/target features from different feature extractors can be shortened
• e.g., ShuffleNet and NasnetMobile features are close to each other in the projected 2D space
• In the original space, features from different extractors can also be close → the two feature sets have a similar distance
• Hence it is important to identify close features between the two domains
Q & A
3. BACKGROUND
• The goal is to learn a classifier f on top of a feature extractor F
• that ensures a lower generalization error in the target domain
• We propose a new framework for unsupervised domain adaptation
• which selects the best feature pair between the two domains from different pre-trained NNs using RL
$$D_s = \{(x_{s_i}, y_{s_i})\}_{i=1}^{n_s}, \qquad D_t = \{x_{t_j}\}_{j=1}^{n_t}$$
The episode is modeled as an MDP $(S, A, T, R, \gamma)$:
• S: a set of states
• A: a set of actions
• T: transition function $T(s, s', a) = P(s' \mid s, a)$, the probability of the next state $s'$ given action $a$ in state $s$
• R: the reward function $R(s, s', a)$, the reward received for moving from state $s$ to $s'$
• $\gamma$: discount factor with $0 \le \gamma \le 1$
• T also denotes the timestep at which each episode ends
• The goal of RL is to learn a policy $\pi(a \mid s)$ that maximizes the discounted expected reward:
$$\mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t} R(s_t, a_t)\right]$$
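A tiny Python sketch of this objective; the episode reward sequences below are toy numbers for illustration, not values from the paper:

```python
def discounted_return(rewards, gamma=0.99):
    """sum_{t=0}^{T} gamma^t * r_t for one episode's reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# A Monte Carlo estimate of the RL objective averages this return over
# episodes sampled from the policy pi(a|s).
episodes = [[1.0, 0.0, 1.0], [0.0, 1.0]]  # toy reward sequences
objective_estimate = sum(discounted_return(ep) for ep in episodes) / len(episodes)
```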
For the task in the labeled source domain, it minimizes the following cross-entropy loss:
$$L_S\big(f(F(x_S)), y_S\big) = -\frac{1}{n_s}\sum_{i=1}^{n_s}\sum_{c=1}^{C} y_{s_i c}\,\log f_c\big(F(x_{s_i})\big)$$

The adversarial loss that aligns the two domains is:

$$L_A(\chi_S, \chi_T) = -\frac{1}{n_s}\sum_{i=1}^{n_s}\log\Big(1 - D\big(F(x_{s_i})\big)\Big) - \frac{1}{n_t}\sum_{j=1}^{n_t}\log D\big(F(x_{t_j})\big)$$
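A hedged PyTorch sketch of these two losses, assuming the discriminator ends in a sigmoid; `classifier` and `discriminator` are hypothetical modules, not the paper's API:

```python
import torch
import torch.nn.functional as F  # torch functional, not the paper's extractor F

def source_loss(classifier, feats_s, y_s):
    """L_S: standard cross-entropy on the labeled source features."""
    return F.cross_entropy(classifier(feats_s), y_s)

def adversarial_loss(discriminator, feats_s, feats_t, eps=1e-8):
    """L_A: the discriminator D (sigmoid output) should score target
    features near 1 and source features near 0."""
    d_s = discriminator(feats_s)  # shape (n_s, 1)
    d_t = discriminator(feats_t)  # shape (n_t, 1)
    return -(torch.log(1.0 - d_s + eps).mean()
             + torch.log(d_t + eps).mean())
```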
4. METHODS
H is the universal RKHS, and G : X → H. For a feature extractor k, the distance between the source and target domains is:

$$\mathrm{dist}_k(\chi_S, \chi_T) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} G_k(\chi_{s_i}) - \frac{1}{n_t}\sum_{j=1}^{n_t} G_k(\chi_{t_j}) \right\|_{H}$$

Averaged over all $K \times K$ pairs of source/target extractors:

$$\mathrm{dist}(\chi_S, \chi_T) = \frac{1}{K^2}\sum_{k_s=1}^{K}\sum_{k_t=1}^{K}\left\| \frac{1}{n_s}\sum_{i=1}^{n_s} G_{k_s}(\chi_{s_i}) - \frac{1}{n_t}\sum_{j=1}^{n_t} G_{k_t}(\chi_{t_j}) \right\|_{H}$$
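A minimal sketch of this distance under a linear kernel, where $G_k(x)$ is approximated by the extracted feature itself; the function names are illustrative:

```python
import torch

def dist_k(g_s, g_t):
    """Norm of the difference of empirical feature means: a linear-kernel
    stand-in for the RKHS norm in dist_k above."""
    return torch.norm(g_s.mean(dim=0) - g_t.mean(dim=0), p=2)

def dist_all(source_sets, target_sets):
    """(1/K^2) * sum over all K x K extractor pairs, as in dist above."""
    K = len(source_sets)
    assert K == len(target_sets)
    return sum(dist_k(s, t) for s in source_sets
               for t in target_sets) / (K * K)
```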
A target sample takes the pseudo-label of its most similar feature:

$$y_i = y_j \quad \text{if}\ \ \mathrm{sim}\big(G(\chi_i), G(\chi_j)\big) > \mathrm{sim}\big(G(\chi_i), G(\chi_{j'})\big)\ \ \forall\, j' \neq j$$

The deep correlation reward on the target domain measures how often predictions agree with these correlated labels:

$$R\big(f(F(G(\chi_T))), y_{T_{corr}}\big) = \frac{1}{n_t}\sum_{i=1}^{n_t} \mathbb{1}\big[y_{pred_i} = y_{corr_i}\big]$$

Per sample, the reward on the source/target domain is binary:

$$R\big(f(F(G(\chi_{S_i/T_i}))), y_{s_i/T_{corr_i}}\big) = \begin{cases} 0 & \text{if}\ f\big(F(G(\chi_{S_i/T_i}))\big) \neq y_{s_i/T_{corr_i}} \\ 1 & \text{if}\ f\big(F(G(\chi_{S_i/T_i}))\big) = y_{s_i/T_{corr_i}} \end{cases}$$

The total reward combines both domains:

$$R_{total} = \frac{1}{n_s}\sum_{i=1}^{n_s} R\big(f(F(G(\chi_{S_i}))), y_{S_i}\big) + \frac{1}{n_t}\sum_{j=1}^{n_t} R\big(f(F(G(\chi_{T_j}))), y_{T_{corr_j}}\big)$$

The corresponding reward loss is:

$$L_R(R_S, R_T) = L\big(f(F(G(\chi_S))), y_S\big) + L\big(f(F(G(\chi_T))), y_{T_{corr}}\big)$$
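A small sketch of the correlated pseudo-labels and the 0/1 reward, assuming cosine similarity for sim(·,·); the names are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def correlated_pseudo_labels(target_feats, source_feats, source_labels):
    """Assign each target sample the label of its most similar source
    sample: a cosine-similarity reading of the y_i = y_j rule above."""
    t = F.normalize(target_feats, dim=1)
    s = F.normalize(source_feats, dim=1)
    nearest = (t @ s.T).argmax(dim=1)  # index of the closest source sample
    return source_labels[nearest]

def reward(pred_labels, ref_labels):
    """Mean of the per-sample 0/1 rewards, i.e. an accuracy-style reward."""
    return (pred_labels == ref_labels).float().mean()

# R_total ~ reward(source_preds, y_s) + reward(target_preds, y_t_corr)
```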
$$L_{DA}(D_S, D_T) = \arg\min\Big(L_G\big(f(F(G(\chi_S))), y_S\big) + \eta \cdot \lVert f \rVert^{2} + \lambda\, D_f(D_S, D_T) + \rho\, R_f(D_S, D_T)\Big)$$
The overall objective combines all four terms:

$$\mathcal{L}(\chi_S, y_S, \chi_T, y_{T_{corr}}) = \arg\min\Big(L_R(R_S, R_T) + L_S\big(f(F(G(\chi_S))), y_S\big) + L_A\big(G(\chi_S), G(\chi_T)\big) + L_{DA}(D_S, D_T)\Big)$$
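A hypothetical training step minimizing the four terms jointly; the `model` methods stand in for the losses sketched above and are not the paper's API (the trade-off weights η, λ, ρ are folded into L_DA here):

```python
def training_step(model, optimizer, x_s, y_s, x_t, y_t_corr):
    """One joint update of L = L_R + L_S + L_A + L_DA."""
    loss = (model.reward_loss(x_s, y_s, x_t, y_t_corr)   # L_R
            + model.source_loss(x_s, y_s)                # L_S
            + model.adversarial_loss(x_s, x_t)           # L_A
            + model.alignment_loss(x_s, x_t))            # L_DA
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```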
5. EXPERIMENTS
• We evaluate our ARL model using three
benchmark datasets, which are widely used in
UDA.
• We follow the protocol of prior work, which
extracted features from 16 pre-trained NNs
(a minimal extraction sketch follows after the dataset list)
• SqueezeNet, AlexNet, GoogLeNet,
ShuffleNet, ResNet18, VGG16, VGG19,
MobileNetV2, NASNet-Mobile, ResNet50,
ResNet101, DenseNet201, InceptionV3,
Xception, Inception-ResNetV2, NASNet-Large
• All extracted features are from the last fully
connected layer, so each image has a feature
vector of size 1,000.
• Office + Caltech-10: a standard benchmark for domain adaptation that combines the
Office-10 and Caltech-10 datasets
• 2,533 images in 4 domains (A, W, D, C)
• Office-31: 4,110 images in 31 classes from 3 domains (A, W, D)
• Office-Home: 15,588 images in 65 categories from 4 domains (Ar, Cl, Pr, Rw)
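A minimal sketch of the feature-extraction protocol using torchvision's ResNet-50 (one of the 16 backbones listed above); the exact preprocessing used in the prior work may differ:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# ResNet-50's final fully connected layer outputs 1,000 logits (ImageNet
# classes), matching the 1,000-dim feature size per image.
net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_feature(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return net(x).squeeze(0)  # 1,000-dim feature vector
```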
Q & A
