Paper review: Adversarial Reinforced Learning for Unsupervised Domain Adaptation
2021/10/18
Fundamental Team
김동희, 김창연, 김지연, 이재윤, 송헌, 이근배(P)
1. INTRODUCTION
2. RELATED WORK
3. BACKGROUND
4. METHODS
5. EXPERIMENTS
1. INTRODUCTION
• High demand for automatic classification of data
• Large labeled training data → annotation cost ↑
• Necessary to transfer knowledge from an existing labeled domain (SOURCE)
to an unlabeled new domain (TARGET)
• Domain shift phenomenon
• ML models don’t generalize well from SOURCE to TARGET
Domain Generalization with MixStyle, Zhou et al., ICLR 2021
• Domain adaptation (DA) has become an effective method to mitigate
the domain shift problem
• Traditional method
• Low → deep level instance representation
(AlexNet, ResNet50..)
• Heavily affected by the extracted features
• DL based method
• Design distance metrics to measure
the discrepancy between the 2 domains, or
• Learn domain invariant features by adversarial learning
• Distance-based methods aim to minimize the discrepancy
between source & target
• Domain discriminator: to distinguish the source & target
• Feature extractor: to fool the discriminator
• Minimax game → the distance between the 2
domains ↓
• Domain Adversarial Neural Network (DANN)
• A minimax loss to integrate a gradient reversal layer to promote the
discrimination of source & target
• Adversarial Discriminative Domain Adaptation (ADDA)
• Uses an inverted label GAN loss to split the source & target
• Features can be learned separately
• Domain Symmetric Net (SymNet)
• A symmetrically designed source/target classifier
• proposed category label loss can improve the domain loss by learning the
invariant feat. between 2 domains
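The gradient reversal idea used by DANN can be sketched without any framework: the layer is the identity on the forward pass and flips (and scales) the gradient on the backward pass. The class name `GradientReversal` and the weight `lam` are illustrative assumptions, not names from the slides or from a specific library.

```python
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical trade-off weight

    def forward(self, features):
        # Features reach the domain discriminator unchanged.
        return list(features)

    def backward(self, grad_from_discriminator):
        # The feature extractor receives the negated gradient, so its update
        # increases the discriminator's loss instead of decreasing it.
        return [-self.lam * g for g in grad_from_discriminator]


grl = GradientReversal(lam=0.5)
out = grl.forward([0.2, -1.0, 3.0])
grad = grl.backward([1.0, -2.0, 0.0])
```

In an autodiff framework the same behavior would be written as a custom backward rule; the hand-written `backward` here only shows the sign flip.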
https://lh3.googleusercontent.com/-zsZDA4RqWSs/X9_4Ga_C9TI/AAAAAAAAQRQ/GjX0NhPXa70bjc2SL6XcdzbEOkVlPneKwCLcBGAsYHQ/w608-h241/image.png
• Propose a novel framework called adversarial RL for unsupervised domain adaptation (ARL)
• RL acts as a selector that identifies the closest feature pair between the source and target domains
• Develop a new reward across source & target
• The proposed deep correlation reward on the target guides the agent to learn the best policy
• and to select the closest feature pair for both domains
• Propose adversarial learning and domain distribution alignment together
• to mitigate the discrepancy between the source/target domains
2. RELATED WORK
• In prior work, we explored the effect of model selection on domain adaptation methods
• 16 deep neural networks on 12 domains
• Distance between the source/target from different feature extractors can be shortened
• ShuffleNet and NasnetMobile are closer to each other in projected 2D space
• In the original space, features can also be closer to each other → the two feature sets have a similar distance.
• Important to identify close features between two domains
Q & A
3. BACKGROUND
• Goal is to learn a classifier f under a feature extractor F
• ensures lower generalization error in the target domain
• We propose a new framework for unsupervised domain adaptation
• which selects the best feature pair between two domains from different pre-trained NNs using RL
• Labeled source domain and unlabeled target domain:
$$D_S = \{(x_{S_i}, y_{S_i})\}_{i=1}^{n_s}, \qquad D_T = \{x_{T_j}\}_{j=1}^{n_t}$$
• RL is modeled as a Markov decision process $(S, A, T, R, \gamma)$
• S: a set of states
• A: a set of actions
• T: transition function T(s, s′, a) = P(s′|s, a) → models the probability of the next state s′ given action a in state s
• R: the reward function R(s, s′, a), which gives the reward for moving from state s → s′
• γ: discount factor, with 0 ≤ γ ≤ 1
• T is also used for the timestep at which each episode ends.
• The goal of RL is to learn a policy π(a|s) that maximizes the expected discounted reward:
$$E\left[\sum_{t=0}^{T} \gamma^{t} R(s_t, a_t)\right]$$
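As a small worked example of the discounted reward, the sketch below computes the sum of γ^t · r_t for one episode. The function name `discounted_return` and the reward values are made up for illustration; the expectation in the objective would be estimated by averaging this quantity over many sampled episodes.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode, t = 0..T."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Walk backwards so each step is one multiply-add: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


episode_rewards = [1.0, 0.0, 2.0]  # hypothetical rewards r_0, r_1, r_2
g = discounted_return(episode_rewards, gamma=0.5)  # 1 + 0.5*0 + 0.25*2
```

With γ = 1 the result is the plain sum of rewards; with γ = 0 only the first reward counts.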
For the task in the labeled source domain, it minimizes the following cross-entropy loss:
$$L_S(f(F(X_S)), y_S) = -\frac{1}{n_s}\sum_{i=1}^{n_s}\sum_{c=1}^{C} y_{S_i}^{c}\,\log f_c(F(x_{S_i}))$$
$$L_A(X_S, X_T) = -\frac{1}{n_s}\sum_{i=1}^{n_s}\log\bigl(1 - D(F(x_{S_i}))\bigr) - \frac{1}{n_t}\sum_{j=1}^{n_t}\log D(F(x_{T_j}))$$
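The two losses above can be evaluated on toy numbers as follows. This is a minimal sketch: it assumes one-hot labels given as class indices (so the inner sum over classes reduces to the log-probability of the true class) and reads D(·) as the predicted probability that a feature comes from the target domain. The function names and inputs are illustrative.

```python
import math


def source_cross_entropy(probs, labels):
    """Mean cross-entropy; probs[i][c] is the predicted probability of class c."""
    n = len(probs)
    # With one-hot labels, sum_c y_c * log f_c reduces to log f_y.
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / n


def adversarial_loss(d_source, d_target):
    """L_A, with each entry the discriminator output D(F(x)) in (0, 1)."""
    src = -sum(math.log(1.0 - d) for d in d_source) / len(d_source)
    tgt = -sum(math.log(d) for d in d_target) / len(d_target)
    return src + tgt


ce = source_cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
adv = adversarial_loss([0.5, 0.5], [0.5, 0.5])
```

A discriminator that outputs 0.5 everywhere cannot tell the domains apart, and `adv` then equals 2·ln 2.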
4. METHODS
H is the universal RKHS, and G : X → H.
$$\mathrm{dist}_k(X_S, X_T) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} G_k(x_{S_i}) - \frac{1}{n_t}\sum_{j=1}^{n_t} G_k(x_{T_j}) \right\|_{H}$$
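$$\mathrm{dist}(X_S, X_T) = \frac{1}{K^2}\sum_{k_s=1}^{K}\sum_{k_t=1}^{K} \left\| \frac{1}{n_s}\sum_{i=1}^{n_s} G_{k_s}(x_{S_i}) - \frac{1}{n_t}\sum_{j=1}^{n_t} G_{k_t}(x_{T_j}) \right\|_{H}$$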
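A minimal sketch of these distances, under a simplifying assumption that is not in the slides: each G_k is taken as the identity on already-extracted features, so the RKHS norm becomes the Euclidean distance between feature means. The closest pair is found here by exhaustive search rather than by the RL agent, and all names and data are hypothetical.

```python
import math


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_distance(source_feats, target_feats):
    """|| mean(source) - mean(target) ||_2 (assumes G_k is the identity)."""
    ms, mt = mean_vector(source_feats), mean_vector(target_feats)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ms, mt)))


def average_pairwise_distance(source_by_extractor, target_by_extractor):
    """(1 / K^2) * sum of mean distances over all (k_s, k_t) pairs."""
    k = len(source_by_extractor)
    assert k == len(target_by_extractor)
    total = sum(mean_distance(s, t)
                for s in source_by_extractor for t in target_by_extractor)
    return total / (k * k)


def closest_pair(source_by_extractor, target_by_extractor):
    """Indices (k_s, k_t) of the pair with the smallest mean distance."""
    pairs = ((mean_distance(s, t), (ks, kt))
             for ks, s in enumerate(source_by_extractor)
             for kt, t in enumerate(target_by_extractor))
    return min(pairs)[1]


# Two hypothetical extractors, two samples each, 2-D features.
src = [[[0.0, 0.0], [2.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
tgt = [[[1.0, 0.0], [1.0, 0.0]], [[3.0, 4.0], [3.0, 4.0]]]
avg = average_pairwise_distance(src, tgt)
best = closest_pair(src, tgt)
```

Comparing a source extractor with a different target extractor only makes sense when both produce features of the same size, which holds in the slides' setup where every network yields a 1,000-dimensional feature.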
$$y_i = y_j \quad \text{if} \quad \mathrm{sim}(G(x_i), G(x_j)) > \mathrm{sim}(G(x_i), G(x_{j'})) \;\; \text{for all } j' \neq j$$
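The labeling rule above can be sketched as a nearest-neighbor lookup. Two assumptions are made here that the slides do not state: i indexes a target sample and j a source sample, and sim is cosine similarity on non-zero feature vectors. The function names and toy data are illustrative.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)  # assumes both vectors are non-zero


def correlation_labels(target_feats, source_feats, source_labels):
    """Give each target sample the label of its most similar source sample."""
    labels = []
    for x in target_feats:
        best = max(range(len(source_feats)),
                   key=lambda j: cosine(x, source_feats[j]))
        labels.append(source_labels[best])
    return labels


source_feats = [[1.0, 0.0], [0.0, 1.0]]
source_labels = ["bike", "mug"]
target_feats = [[0.9, 0.1], [0.2, 0.8]]
y_corr = correlation_labels(target_feats, source_feats, source_labels)
```

The resulting `y_corr` plays the role of the correlation labels for the unlabeled target domain.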
$$R(f(F(G(X_T))), y_{T_{corr}}) = \frac{1}{n_t}\sum_{i=1}^{n_t} \mathbb{1}\left[\, y_{pred_i} = y_{corr_i} \,\right]$$
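$$R(f(F(G(x_{S_i/T_i}))), y_{S_i/T_{corr_i}}) = \begin{cases} 0 & \text{if } f(F(G(x_{S_i/T_i}))) \neq y_{S_i/T_{corr_i}} \\ 1 & \text{if } f(F(G(x_{S_i/T_i}))) = y_{S_i/T_{corr_i}} \end{cases}$$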
$$R_{total} = \frac{1}{n_s}\sum_{i=1}^{n_s} R(f(F(G(x_{S_i}))), y_{S_i}) + \frac{1}{n_t}\sum_{j=1}^{n_t} R(f(F(G(x_{T_j}))), y_{T_{corr_j}})$$
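The total reward can be computed directly from predictions and labels. The sketch below assumes the predictions f(F(G(x))) have already been produced; the function names and the toy labels are hypothetical.

```python
def reward(pred, label):
    """1 if the prediction matches the (true or correlation) label, else 0."""
    return 1 if pred == label else 0


def total_reward(source_preds, source_labels, target_preds, target_corr_labels):
    """Mean source reward plus mean target reward."""
    r_s = sum(reward(p, y) for p, y in zip(source_preds, source_labels)) / len(source_labels)
    r_t = sum(reward(p, y) for p, y in zip(target_preds, target_corr_labels)) / len(target_corr_labels)
    return r_s + r_t


# 3 of 4 source predictions and 1 of 2 target predictions are correct.
r_total = total_reward([0, 1, 1, 0], [0, 1, 0, 0], [1, 1], [1, 0])
```

Each term is an accuracy in [0, 1], so the total lies in [0, 2]; here it is 0.75 + 0.5.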
$$L_R(R_S, R_T) = L(f(F(G(X_S))), y_S) + L(f(F(G(X_T))), y_{T_{corr}})$$
$$L_{DA}(D_S, D_T) = \operatorname{argmin}\Bigl( L_G(f(F(G(X_S))), y_S) + \eta\,\|f\|^{2} + \lambda D_f(D_S, D_T) + \rho R_f(D_S, D_T) \Bigr)$$
$$\mathcal{L}(X_S, y_S, X_T, y_{T_{corr}}) = \operatorname{argmin}\Bigl( L_R(R_S, R_T) + L_S(f(F(G(X_S))), y_S) + L_A(G(X_S), G(X_T)) + L_{DA}(D_S, D_T) \Bigr)$$
5. EXPERIMENTS
• We evaluate our ARL model using two
benchmark datasets, which are widely used in
UDA.
• We follow the protocol of prior work, which
extracted features from 16 pre-trained NNs
• Squeezenet, Alexnet, Googlenet,
Shufflenet, Resnet18, Vgg16, Vgg19,
Mobilenetv2, Nasnetmobile, Resnet50,
Resnet101, Densenet201, Inceptionv3,
Xception, Inceptionresnetv2, Nasnetlarge
• All extracted features are from the last fully
connected layer and each image has feature
size of 1,000.
• Office + Caltech-10: a standard benchmark for domain adaptation, which contains the
Office 10 and Caltech 10 datasets.
• 2,533 images in 4 domains (A, W, D, C)
• Office-31 consists of 4,110 images in 31 classes from 3 domains (A, W, D)
• Office-Home contains 15,588 images in 65 categories from 4 domains (Ar, Cl, Pr, Rw)
Q & A

 
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Sample Devops SRE Product Companies .pdf
Sample Devops SRE  Product Companies .pdfSample Devops SRE  Product Companies .pdf
Sample Devops SRE Product Companies .pdf
 
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
一比一原版(lbs毕业证书)伦敦商学院毕业证如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 
Digital Marketing Performance Marketing Sample .pdf
Digital Marketing Performance Marketing  Sample .pdfDigital Marketing Performance Marketing  Sample .pdf
Digital Marketing Performance Marketing Sample .pdf
 
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
一比一原版英国赫特福德大学毕业证(hertfordshire毕业证书)如何办理
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 

Adversarial Reinforced Learning for Unsupervised Domain Adaptation

  • 1. Paper review 2021/10/18 펀더멘탈팀 김동희, 김창연, 김지연, 이재윤, 송헌, 이근배(P)
  • 2. 1. INTRODUCTION 2. RELATEDWORK 3. BACKGROUND 4. METHODS 5. EXPERIMENTS
  • 4.
      • High demand for automatic classification of data
      • Large labeled training data → annotation cost ↑
      • Necessary to transfer knowledge from an existing labeled domain (SOURCE) to an unlabeled new domain (TARGET)
      • Domain shift phenomenon
          • ML models don’t generalize well from SOURCE to TARGET
          • Figure source: Domain Generalization with MixStyle, Zhou et al., ICLR 2021
      • Domain adaptation (DA) has become an effective method to mitigate the domain shift problem
      • Traditional method
          • Low → deep level instance representation (AlexNet, ResNet50..)
          • Heavily affected by the extracted features
      • DL based method
          • Design distance metrics to measure the discrepancy between 2 domains, or
          • Learn domain-invariant features by adversarial learning
  • 5.
      • Distance based methods aim to minimize the discrepancy between source & target
          • Classifier: to distinguish the source & target
          • Discriminator: to fool the classifier
          • Minimax game → the distance between the 2 domains ↓
      • Domain Adversarial Neural Network (DANN)
          • A minimax loss that integrates a gradient reversal layer to promote the discrimination of source & target
      • Adversarial Discriminative Domain Adaptation (ADDA)
          • Uses an inverted label GAN loss to split the source & target
          • Features can be learned separately
      • Domain Symmetric Net (SymNet)
          • A symmetrically designed source/target classifier
          • The proposed category label loss can improve the domain loss by learning the invariant features between the 2 domains
      • https://lh3.googleusercontent.com/-zsZDA4RqWSs/X9_4Ga_C9TI/AAAAAAAAQRQ/GjX0NhPXa70bjc2SL6XcdzbEOkVlPneKwCLcBGAsYHQ/w608-h241/image.png
  • 6.
      • Propose a novel framework called adversarial RL for unsupervised domain adaptation (ARL)
          • RL is a selector to identify the closest feature pair between source and target domain
      • Develop a new reward across source & target
          • The proposed deep correlation reward on the target can guide the agent to learn the best policy
          • Select the closest feature pair for both domains
      • Propose adversarial learning and domain distribution alignment together
          • Mitigate the discrepancy between source/target domains
  • 8.
      • In prior work, we explored the effect of model selection on domain adaptation methods
          • 16 deep neural networks on 12 domains
      • The distance between the source/target from different feature extractors can be shortened
          • ShuffleNet and NasnetMobile are closer to each other in the projected 2D space
          • In the original space, features can be closer to each other → two feature sets have a similar distance.
      • Important to identify close features between two domains
  • 11.
      • Goal is to learn a classifier f under a feature extractor F
          • ensures lower generalization error in the target domain
      • We propose a new framework for unsupervised domain adaptation
          • which selects the best feature pair between two domains from different pre-trained NNs using RL
      $$\mathcal{D}_S = \{(x_S^i, y_S^i)\}_{i=1}^{n_s}, \qquad \mathcal{D}_T = \{x_T^j\}_{j=1}^{n_t}$$
  • 12.
      $$(S, A, T, R, \gamma)$$
      • S: a set of states
      • A: a set of actions
      • T: transition function T(s, s′, a) = P(s′ | s, a) → models the probability of the next state s′ given action a in state s
      • R: the reward function R(s, s′, a), which gives the reward for moving from state s → s′
      • γ: discount factor, with 0 ≤ γ ≤ 1
      • T is also used for the timestep at which each episode ends.
      • The goal of RL is to learn a policy π(a|s) that maximizes the discounted expected reward:
      $$\mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t} R(s_t, a_t)\right]$$
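The discounted sum inside the expectation on slide 12 can be illustrated for a single episode. This is only a minimal sketch: the function name `discounted_return` and the example rewards are hypothetical, and the expectation over episodes (which the policy would be trained to maximize) is not modeled here.

```python
def discounted_return(rewards, gamma):
    """Return sum over t of gamma**t * rewards[t] for one finished episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last step backwards so no explicit powers are needed.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical per-step rewards R(s_t, a_t) for t = 0, 1, 2.
episode_rewards = [1.0, 0.0, 2.0]
print(discounted_return(episode_rewards, 0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
```

With γ = 1 the function reduces to the plain sum of rewards, and with γ = 0 only the first reward counts.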
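  • 13. For the task in the labeled source domain, it minimizes the following cross-entropy loss:
      $$\mathcal{L}_S\big(f(F(\mathcal{X}_S)), \mathcal{Y}_S\big) = -\frac{1}{n_s} \sum_{i=1}^{n_s} \sum_{c=1}^{C} \mathcal{Y}_{S_i}^{c} \log f_c\big(F(\mathcal{X}_{S_i})\big)$$
      $$\mathcal{L}_A(\mathcal{X}_S, \mathcal{X}_T) = -\frac{1}{n_s} \sum_{i=1}^{n_s} \log\big(1 - D(F(\mathcal{X}_{S_i}))\big) - \frac{1}{n_t} \sum_{j=1}^{n_t} \log D\big(F(\mathcal{X}_{T_j})\big)$$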
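The two losses on slide 13 can be evaluated by hand on a few numbers. This is a minimal sketch under stated assumptions: class probabilities and D(F(x)) scores are passed in as plain Python lists (no network is trained), D is taken to be a domain discriminator whose outputs lie strictly between 0 and 1, labels are one-hot so the inner sum over classes keeps only the true class, and the function names are hypothetical.

```python
import math


def source_cross_entropy(class_probs, labels):
    """L_S: mean of -log f_c(F(x)) at the true class c of each source sample."""
    picked = [probs[label] for probs, label in zip(class_probs, labels)]
    return -sum(math.log(p) for p in picked) / len(labels)


def adversarial_loss(d_source, d_target):
    """L_A: -(mean of log(1 - D) over source) - (mean of log D over target)."""
    source_term = sum(math.log(1.0 - d) for d in d_source) / len(d_source)
    target_term = sum(math.log(d) for d in d_target) / len(d_target)
    return -source_term - target_term


# Hypothetical classifier outputs for two source samples with C = 2 classes.
probs = [[0.5, 0.5], [0.25, 0.75]]
print(source_cross_entropy(probs, labels=[0, 1]))

# Hypothetical D(F(x)) scores for source and target samples.
print(adversarial_loss(d_source=[0.5], d_target=[0.5]))
```

A score of 0.5 for every sample means D cannot tell the domains apart, which gives L_A = 2 ln 2.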
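  • 15. H is the universal RKHS, and G : X → H.
      $$\mathrm{dist}_k(\mathcal{X}_S, \mathcal{X}_T) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} G_k(\mathcal{X}_{S_i}) - \frac{1}{n_t} \sum_{j=1}^{n_t} G_k(\mathcal{X}_{T_j}) \right\|_{H}$$
      $$\mathrm{dist}(\mathcal{X}_S, \mathcal{X}_T) = \frac{1}{K^2} \sum_{k_s=1}^{K} \sum_{k_t=1}^{K} \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} G_{k_s}(\mathcal{X}_{S_i}) - \frac{1}{n_t} \sum_{j=1}^{n_t} G_{k_t}(\mathcal{X}_{T_j}) \right\|_{H}$$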
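The distance on slide 15 compares the mean embedding of the source features with that of the target features. The sketch below makes two simplifying assumptions that are not in the slides: each G_k is represented by an already-computed list of finite-dimensional feature vectors, and the RKHS norm is replaced by the ordinary Euclidean norm. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def mean_embedding_distance(source_feats, target_feats):
    """Euclidean norm of (mean source feature - mean target feature)."""
    mu_s = mean_vector(source_feats)
    mu_t = mean_vector(target_feats)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_s, mu_t)))


def average_pairwise_distance(source_sets, target_sets):
    """Average the distance over every (k_s, k_t) pair of feature sets."""
    total = sum(
        mean_embedding_distance(s, t) for s in source_sets for t in target_sets
    )
    # Equals 1/K^2 times the double sum when both lists hold K feature sets.
    return total / (len(source_sets) * len(target_sets))


# Hypothetical 2-D features from one extractor per domain.
source = [[0.0, 0.0], [2.0, 0.0]]   # mean (1, 0)
target = [[1.0, 3.0], [1.0, 5.0]]   # mean (1, 4)
print(mean_embedding_distance(source, target))  # 4.0
print(average_pairwise_distance([source], [target]))  # 4.0
```

Comparing features across different extractors only makes sense here when all vectors have the same length.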
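  • 16.
      $$y_i = y_j \quad \text{if } \mathrm{sim}\big(G(\mathcal{X}_i), G(\mathcal{X}_j)\big) > \mathrm{sim}\big(G(\mathcal{X}_i), G(\mathcal{X}_{j'})\big) \ \text{for } j' \neq j$$
      $$R\big(f(F(G(\mathcal{X}_T))), \mathcal{Y}_{T_{corr}}\big) = \frac{1}{n_t} \sum_{i=1}^{n_t} \mathbb{1}\big[y_{pred_i} = y_{corr_i}\big]$$
      $$R\big(f(F(G(\mathcal{X}_{S_i/T_i}))), \mathcal{Y}_{S_i/T_{corr_i}}\big) = \begin{cases} 0 & \text{if } f(F(G(\mathcal{X}_{S_i/T_i}))) \neq \mathcal{Y}_{S_i/T_{corr_i}} \\ 1 & \text{if } f(F(G(\mathcal{X}_{S_i/T_i}))) = \mathcal{Y}_{S_i/T_{corr_i}} \end{cases}$$
      $$R_{total} = \frac{1}{n_s} \sum_{i=1}^{n_s} R\big(f(F(G(\mathcal{X}_{S_i}))), \mathcal{Y}_{S_i}\big) + \frac{1}{n_t} \sum_{j=1}^{n_t} R\big(f(F(G(\mathcal{X}_{T_j}))), \mathcal{Y}_{T_{corr_j}}\big)$$
      $$\mathcal{L}_R(R_S, R_T) = \mathcal{L}\big(f(F(G(\mathcal{X}_S))), \mathcal{Y}_S\big) + \mathcal{L}\big(f(F(G(\mathcal{X}_T))), \mathcal{Y}_{T_{corr}}\big)$$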
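The 0/1 reward and the total reward on slide 16 amount to two matching rates added together. This is a minimal sketch that assumes the predictions, the source labels, and the target correlation labels are already available as lists. How the correlation labels are derived from the similarity rule is not reproduced here, and the function names are hypothetical.

```python
def match_reward(predictions, references):
    """Average of the per-sample reward: 1 if prediction equals reference, else 0."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    hits = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return hits / len(predictions)


def total_reward(source_preds, source_labels, target_preds, target_corr_labels):
    """R_total: source matching rate plus target matching rate."""
    return match_reward(source_preds, source_labels) + match_reward(
        target_preds, target_corr_labels
    )


# Hypothetical predictions: 3 of 4 source samples and 1 of 2 target samples match.
print(total_reward([0, 1, 1, 2], [0, 1, 2, 2], [1, 1], [1, 0]))  # 0.75 + 0.5 = 1.25
```

Under this reading the total reward ranges from 0 to 2, reaching 2 only when every source prediction matches its label and every target prediction matches its correlation label.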
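  • 17.
      $$\mathcal{L}_{DA}(\mathcal{D}_S, \mathcal{D}_T) = \arg\min \Big( \mathcal{L}_G\big(f(F(G(\mathcal{X}_S))), \mathcal{Y}_S\big) + \eta \, \|f\|^2 + \lambda D_f(\mathcal{D}_S, \mathcal{D}_T) + \rho R_f(\mathcal{D}_S, \mathcal{D}_T) \Big)$$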
  • 18.
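  • 19.
      $$\mathcal{L}(\mathcal{X}_S, \mathcal{Y}_S, \mathcal{X}_T, \mathcal{Y}_{T_{corr}}) = \arg\min \Big( \mathcal{L}_R(R_S, R_T) + \mathcal{L}_S\big(f(F(G(\mathcal{X}_S))), \mathcal{Y}_S\big) + \mathcal{L}_A\big(G(\mathcal{X}_S), G(\mathcal{X}_T)\big) + \mathcal{L}_{DA}(\mathcal{D}_S, \mathcal{D}_T) \Big)$$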
  • 21.
      • We evaluate our ARL model using two benchmark datasets, which are widely used in UDA.
      • We follow the protocol of prior work, which extracted features from 16 pre-trained NNs
          • Squeezenet, Alexnet, Googlenet, Shufflenet, Resnet18, Vgg16, Vgg19, Mobilenetv2, Nasnetmobile, Resnet50, Resnet101, Densenet201, Inceptionv3, Xception, Inceptionresnetv2, Nasnetlarge
          • All extracted features are from the last fully connected layer, and each image has a feature size of 1,000.
      • Office + Caltech-10 is a standard benchmark for domain adaptation, which contains the Office 10 and Caltech 10 datasets.
          • 2,533 images in 4 domains (A, W, D, C)
      • Office-31 consists of 4,110 images in 31 classes from 3 domains (A, W, D)
      • Office-Home contains 15,588 images in 65 categories from 4 domains (Ar, Cl, Pr, Rw)
  • 22.
  • 23.
  • 24. Q & A