Paper Review: Adversarial Reinforced Learning for Unsupervised Domain Adaptation
2021/10/18
Fundamentals Team
김동희, 김창연, 김지연, 이재윤, 송헌, 이근배(P)
1. INTRODUCTION
2. RELATED WORK
3. BACKGROUND
4. METHODS
5. EXPERIMENTS
1. INTRODUCTION
• High demand for automatic classification of data
• Large labeled training data → annotation cost ↑
• Necessary to transfer knowledge from an existing labeled domain (SOURCE)
to an unlabeled new domain (TARGET)
• Domain shift phenomenon
• ML models don’t generalize well from SOURCE to TARGET
Domain Generalization with MixStyle, Zhou et al., ICLR 2021
• Domain adaptation (DA) has become an effective method to mitigate
the domain shift problem
• Traditional methods
• Use low- to deep-level instance representations
(AlexNet, ResNet50, ...)
• Heavily affected by the extracted features
• DL-based methods
• Design distance metrics to measure
the discrepancy between the two domains, or
• Learn domain-invariant features by adversarial learning
• Distance-based methods aim to minimize the discrepancy
between source & target
• Adversarial methods play a two-player game:
• Classifier: trained to distinguish source & target
• Generator: trained to fool the classifier
• This minimax game → the distance between the two
domains decreases
• Domain Adversarial Neural Network (DANN)
• A minimax loss that integrates a gradient reversal layer to promote the
discrimination of source & target (a minimal sketch of this layer follows below)
• Adversarial Discriminative Domain Adaptation (ADDA)
• Uses an inverted-label GAN loss to split the source & target optimization
• Features can be learned separately
• Domain Symmetric Network (SymNet)
• A symmetrically designed source/target classifier
• Its proposed category-level loss improves the domain-level loss by learning
invariant features between the two domains
(Figure: https://lh3.googleusercontent.com/-zsZDA4RqWSs/X9_4Ga_C9TI/AAAAAAAAQRQ/GjX0NhPXa70bjc2SL6XcdzbEOkVlPneKwCLcBGAsYHQ/w608-h241/image.png)
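To make the gradient reversal idea behind DANN concrete, here is a minimal PyTorch sketch; this is not the paper's code, and the class and function names are ours:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient by
    -lambda in the backward pass, so the feature extractor is pushed to
    fool the domain classifier (the minimax game described above)."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient w.r.t. x is reversed; lambd itself gets no gradient.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Usage: domain_logits = domain_classifier(grad_reverse(features))
```

Placing this layer between the feature extractor and the domain classifier lets a single backward pass train the classifier to discriminate domains while training the features to confuse it.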
• Proposes a novel framework: adversarial RL for unsupervised domain adaptation (ARL)
• RL acts as a selector that identifies the closest feature pair between the source and target domains
• Develops a new reward across source & target
• The proposed deep correlation reward on the target guides the agent to learn the best policy
• Selects the closest feature pair for both domains
• Proposes adversarial learning and domain distribution alignment together
• Mitigates the discrepancy between the source/target domains
2. RELATED WORK
• In prior work, we explored the effect of model selection on domain adaptation methods
• 16 deep neural networks on 12 domains
• The distance between source/target features from different feature extractors can be shortened
• e.g., ShuffleNet and NasnetMobile features are close to each other in the projected 2D space
• In the original space, features from different extractors can also be close → the two feature sets have a similar distance
• Hence it is important to identify close features between the two domains
Q & A
3. BACKGROUND
• The goal is to learn a classifier f on top of a feature extractor F
• that ensures a lower generalization error in the target domain
• We propose a new framework for unsupervised domain adaptation
• which selects the best feature pair between the two domains from different pre-trained NNs using RL
$$D_s = \{(x_{s_i}, y_{s_i})\}_{i=1}^{n_s}, \qquad D_t = \{x_{t_j}\}_{j=1}^{n_t}$$
The episode is modeled as an MDP $(S, A, T, R, \gamma)$:
• S: a set of states
• A: a set of actions
• T: transition function $T(s, s', a) = P(s' \mid s, a)$, the probability of the next state $s'$ given action $a$ in state $s$
• R: the reward function $R(s, s', a)$, the reward received for moving from state $s$ to $s'$
• $\gamma$: discount factor with $0 \le \gamma \le 1$
• T also denotes the timestep at which each episode ends
• The goal of RL is to learn a policy $\pi(a \mid s)$ that maximizes the discounted expected reward:
$$\mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t} R(s_t, a_t)\right]$$
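A tiny Python sketch of this objective; the episode reward sequences below are toy numbers for illustration, not values from the paper:

```python
def discounted_return(rewards, gamma=0.99):
    """sum_{t=0}^{T} gamma^t * r_t for one episode's reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# A Monte Carlo estimate of the RL objective averages this return over
# episodes sampled from the policy pi(a|s).
episodes = [[1.0, 0.0, 1.0], [0.0, 1.0]]  # toy reward sequences
objective_estimate = sum(discounted_return(ep) for ep in episodes) / len(episodes)
```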
For the task in the labeled source domain, it minimizes the following cross-entropy loss:
$$L_S\big(f(F(x_S)), y_S\big) = -\frac{1}{n_s}\sum_{i=1}^{n_s}\sum_{c=1}^{C} y_{s_i c}\,\log f_c\big(F(x_{s_i})\big)$$

The adversarial loss that aligns the two domains is:

$$L_A(\chi_S, \chi_T) = -\frac{1}{n_s}\sum_{i=1}^{n_s}\log\Big(1 - D\big(F(x_{s_i})\big)\Big) - \frac{1}{n_t}\sum_{j=1}^{n_t}\log D\big(F(x_{t_j})\big)$$
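A hedged PyTorch sketch of these two losses, assuming the discriminator ends in a sigmoid; `classifier` and `discriminator` are hypothetical modules, not the paper's API:

```python
import torch
import torch.nn.functional as F  # torch functional, not the paper's extractor F

def source_loss(classifier, feats_s, y_s):
    """L_S: standard cross-entropy on the labeled source features."""
    return F.cross_entropy(classifier(feats_s), y_s)

def adversarial_loss(discriminator, feats_s, feats_t, eps=1e-8):
    """L_A: the discriminator D (sigmoid output) should score target
    features near 1 and source features near 0."""
    d_s = discriminator(feats_s)  # shape (n_s, 1)
    d_t = discriminator(feats_t)  # shape (n_t, 1)
    return -(torch.log(1.0 - d_s + eps).mean()
             + torch.log(d_t + eps).mean())
```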
4. METHODS
H is the universal RKHS, and G : X → H. For a feature extractor k, the distance between the source and target domains is:

$$\mathrm{dist}_k(\chi_S, \chi_T) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} G_k(\chi_{s_i}) - \frac{1}{n_t}\sum_{j=1}^{n_t} G_k(\chi_{t_j}) \right\|_{H}$$

Averaged over all $K \times K$ pairs of source/target extractors:

$$\mathrm{dist}(\chi_S, \chi_T) = \frac{1}{K^2}\sum_{k_s=1}^{K}\sum_{k_t=1}^{K}\left\| \frac{1}{n_s}\sum_{i=1}^{n_s} G_{k_s}(\chi_{s_i}) - \frac{1}{n_t}\sum_{j=1}^{n_t} G_{k_t}(\chi_{t_j}) \right\|_{H}$$
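A minimal sketch of this distance under a linear kernel, where $G_k(x)$ is approximated by the extracted feature itself; the function names are illustrative:

```python
import torch

def dist_k(g_s, g_t):
    """Norm of the difference of empirical feature means: a linear-kernel
    stand-in for the RKHS norm in dist_k above."""
    return torch.norm(g_s.mean(dim=0) - g_t.mean(dim=0), p=2)

def dist_all(source_sets, target_sets):
    """(1/K^2) * sum over all K x K extractor pairs, as in dist above."""
    K = len(source_sets)
    assert K == len(target_sets)
    return sum(dist_k(s, t) for s in source_sets
               for t in target_sets) / (K * K)
```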
A target sample takes the pseudo-label of its most similar feature:

$$y_i = y_j \quad \text{if}\ \ \mathrm{sim}\big(G(\chi_i), G(\chi_j)\big) > \mathrm{sim}\big(G(\chi_i), G(\chi_{j'})\big)\ \ \forall\, j' \neq j$$

The deep correlation reward on the target domain measures how often predictions agree with these correlated labels:

$$R\big(f(F(G(\chi_T))), y_{T_{corr}}\big) = \frac{1}{n_t}\sum_{i=1}^{n_t} \mathbb{1}\big[y_{pred_i} = y_{corr_i}\big]$$

Per sample, the reward on the source/target domain is binary:

$$R\big(f(F(G(\chi_{S_i/T_i}))), y_{s_i/T_{corr_i}}\big) = \begin{cases} 0 & \text{if}\ f\big(F(G(\chi_{S_i/T_i}))\big) \neq y_{s_i/T_{corr_i}} \\ 1 & \text{if}\ f\big(F(G(\chi_{S_i/T_i}))\big) = y_{s_i/T_{corr_i}} \end{cases}$$

The total reward combines both domains:

$$R_{total} = \frac{1}{n_s}\sum_{i=1}^{n_s} R\big(f(F(G(\chi_{S_i}))), y_{S_i}\big) + \frac{1}{n_t}\sum_{j=1}^{n_t} R\big(f(F(G(\chi_{T_j}))), y_{T_{corr_j}}\big)$$

The corresponding reward loss is:

$$L_R(R_S, R_T) = L\big(f(F(G(\chi_S))), y_S\big) + L\big(f(F(G(\chi_T))), y_{T_{corr}}\big)$$
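A small sketch of the correlated pseudo-labels and the 0/1 reward, assuming cosine similarity for sim(·,·); the names are illustrative, not the paper's code:

```python
import torch
import torch.nn.functional as F

def correlated_pseudo_labels(target_feats, source_feats, source_labels):
    """Assign each target sample the label of its most similar source
    sample: a cosine-similarity reading of the y_i = y_j rule above."""
    t = F.normalize(target_feats, dim=1)
    s = F.normalize(source_feats, dim=1)
    nearest = (t @ s.T).argmax(dim=1)  # index of the closest source sample
    return source_labels[nearest]

def reward(pred_labels, ref_labels):
    """Mean of the per-sample 0/1 rewards, i.e. an accuracy-style reward."""
    return (pred_labels == ref_labels).float().mean()

# R_total ~ reward(source_preds, y_s) + reward(target_preds, y_t_corr)
```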
$$L_{DA}(D_S, D_T) = \arg\min\Big(L_G\big(f(F(G(\chi_S))), y_S\big) + \eta \cdot \lVert f \rVert^{2} + \lambda\, D_f(D_S, D_T) + \rho\, R_f(D_S, D_T)\Big)$$
The overall objective combines all four terms:

$$\mathcal{L}(\chi_S, y_S, \chi_T, y_{T_{corr}}) = \arg\min\Big(L_R(R_S, R_T) + L_S\big(f(F(G(\chi_S))), y_S\big) + L_A\big(G(\chi_S), G(\chi_T)\big) + L_{DA}(D_S, D_T)\Big)$$
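A hypothetical training step minimizing the four terms jointly; the `model` methods stand in for the losses sketched above and are not the paper's API (the trade-off weights η, λ, ρ are folded into L_DA here):

```python
def training_step(model, optimizer, x_s, y_s, x_t, y_t_corr):
    """One joint update of L = L_R + L_S + L_A + L_DA."""
    loss = (model.reward_loss(x_s, y_s, x_t, y_t_corr)   # L_R
            + model.source_loss(x_s, y_s)                # L_S
            + model.adversarial_loss(x_s, x_t)           # L_A
            + model.alignment_loss(x_s, x_t))            # L_DA
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```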
5. EXPERIMENTS
• We evaluate our ARL model using three
benchmark datasets, which are widely used in
UDA.
• We follow the protocol of prior work, which
extracted features from 16 pre-trained NNs
(a minimal extraction sketch follows after the dataset list)
• SqueezeNet, AlexNet, GoogLeNet,
ShuffleNet, ResNet18, VGG16, VGG19,
MobileNetV2, NASNet-Mobile, ResNet50,
ResNet101, DenseNet201, InceptionV3,
Xception, Inception-ResNetV2, NASNet-Large
• All extracted features are from the last fully
connected layer, so each image has a feature
vector of size 1,000.
• Office + Caltech-10: a standard benchmark for domain adaptation that combines the
Office-10 and Caltech-10 datasets
• 2,533 images in 4 domains (A, W, D, C)
• Office-31: 4,110 images in 31 classes from 3 domains (A, W, D)
• Office-Home: 15,588 images in 65 categories from 4 domains (Ar, Cl, Pr, Rw)
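A minimal sketch of the feature-extraction protocol using torchvision's ResNet-50 (one of the 16 backbones listed above); the exact preprocessing used in the prior work may differ:

```python
import torch
from PIL import Image
from torchvision import models, transforms

# ResNet-50's final fully connected layer outputs 1,000 logits (ImageNet
# classes), matching the 1,000-dim feature size per image.
net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_feature(path):
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return net(x).squeeze(0)  # 1,000-dim feature vector
```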
Q & A
