Domain Adversarial Training of
Neural Network
Explained together with PR12
* A review based on Domain-Adversarial Training of Neural Networks, Y. Ganin et al., 2016
Jaejun Yoo
Ph.D. Candidate @KAIST
PR12
4TH MAY, 2017
Usually we try to…
Training (source)
Test (target)
For simplicity, let’s consider the
binary classification problem
The usual supervised learning setting: the training and test
domains are assumed to be the same.
TAXONOMY OF TRANSFER LEARNING
Source: electronics customer reviews (X) / positive or negative labels (Y)
Target: video game customer reviews (X)
From the function space H represented by neural networks, we learn a classifier h.
The target labels are unknown, but we want an h that predicts labels well
on both domains: source (X, Y) and target (X).
DANN
Input: customer review comments
Ordinary classification (POSITIVE / NEGATIVE):
TRY TO CLASSIFY WELL WITH THE EXTRACTED FEATURE!
Domain classification (electronics / video games):
TRY TO EXTRACT DOMAIN-INDEPENDENT FEATURE!
e.g. f : compact, sharp, blurry
→ easy to discriminate the domain
⇓
f : good, excited, nice, never buy, …
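The two goals above pull the feature extractor in opposite directions: help the label classifier, but confuse the domain classifier. In Ganin et al. 2016 this is done with a gradient reversal layer placed between the features and the domain classifier. The sketch below is a minimal, framework-free illustration of that idea only; the class name `GradientReversal`, the parameter `lam`, and the toy numbers are assumptions made here, not taken from the slides.

```python
class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # trade-off weight (hypothetical name)

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_wrt_features):
        # The domain classifier's gradient is sign-flipped before it reaches
        # the feature extractor, so the extractor is pushed to increase
        # the domain classification loss.
        return [-self.lam * g for g in grad_wrt_features]


# Toy values, made up for illustration.
grl = GradientReversal(lam=0.5)
passed = grl.forward([0.2, -1.0, 3.0])
reversed_grad = grl.backward([1.0, -2.0, 4.0])
```

The domain classifier itself still receives the ordinary gradient; only the signal flowing back into the shared features is reversed.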
• Combining DA and feature learning within one training process
• Principled way to learn a good representation based on the
generalization guarantee
: minimize the H divergence directly (no heuristic)
“When or when not the DA algorithm works.”
“Why it works.”
DANN
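As a hedged reconstruction of what "one training process" refers to, the objective in Ganin et al. 2016 can be written as below. The notation is assumed from that paper rather than from the slide text: L_y is the label loss, L_d the domain loss, θ_f, θ_y, θ_d are the parameters of the feature extractor, label predictor, and domain classifier, the first n examples are source samples, and the remaining n' = N − n are target samples.

```latex
E(\theta_f, \theta_y, \theta_d)
  = \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_y^{i}(\theta_f, \theta_y)
  - \lambda \left(
      \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_d^{i}(\theta_f, \theta_d)
    + \frac{1}{n'}\sum_{i=n+1}^{N} \mathcal{L}_d^{i}(\theta_f, \theta_d)
    \right)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d),
\qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

The saddle point is what makes the procedure adversarial: the domain classifier minimizes its own loss while the feature extractor maximizes it.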
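Conventional strategy: find the model with the lowest training
error using as few parameters as possible.
Now the training domain (source) and the testing
domain (target) are different.
An additional strategy is needed beyond the conventional one.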
PREREQUISITE
Different distances
Slide courtesy of Sungbin Lim, DeepBio, 2017
A Bound on the Adaptation Error
1. Difference across all measurable subsets cannot be estimated from
finite samples
2. We’re only interested in differences related to classification error
Idea: Measure subsets where hypotheses in H disagree
Subsets A are error sets of one hypothesis with respect to another
1. Never larger than the L1 distance
2. computable from finite unlabeled samples. (Kifer et al. 2004)
3. train classifier to discriminate between source and target data
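Point 3 can be turned into a number: train a domain classifier, measure its error ε, and report the proxy A-distance 2(1 − 2ε) used in Ben-David et al. and Ganin et al. The sketch below assumes toy 1-D samples and a simple threshold classifier; the function names and data are made up for illustration, and in practice the error would be measured on held-out data.

```python
def domain_error(source, target, threshold):
    """Error of the rule 'x > threshold means target' on 1-D samples."""
    mistakes = sum(1 for x in source if x > threshold)    # source predicted as target
    mistakes += sum(1 for x in target if x <= threshold)  # target predicted as source
    return mistakes / (len(source) + len(target))


def best_domain_error(source, target):
    """Lowest error over all thresholds, allowing the flipped rule as well."""
    errors = [domain_error(source, target, t) for t in sorted(source + target)]
    return min(min(e, 1.0 - e) for e in errors)


def proxy_a_distance(error):
    """Proxy A-distance 2 * (1 - 2 * error): near 0 if domains look alike, 2 if separable."""
    return 2.0 * (1.0 - 2.0 * error)


# Toy samples (made up for illustration).
far_source, far_target = [0.1, 0.2, 0.3, 0.4], [1.1, 1.2, 1.3, 1.4]
near_source, near_target = [0.1, 0.6, 1.1, 1.6], [0.2, 0.7, 1.2, 1.7]

pad_far = proxy_a_distance(best_domain_error(far_source, far_target))
pad_near = proxy_a_distance(best_domain_error(near_source, near_target))
```

Well-separated domains give a large value and interleaved domains a small one, which is the quantity DANN tries to drive down in feature space.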
A Computable Adaptation Bound
Divergence estimation complexity: dependent on the number of unlabeled samples
The optimal joint hypothesis
h* is the hypothesis with minimal combined error
λ is that error
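The formulas on these slides were images and are missing from the text. For reference, the bound from Ben-David et al. 2010 (listed under PAPERS in the references) that the annotations describe can be written as follows. The notation is assumed from that paper: ε_S and ε_T are the source and target errors, d is the VC dimension of H, m' is the number of unlabeled samples drawn from each domain, and the bound holds with probability at least 1 − δ.

```latex
\epsilon_T(h) \;\le\; \epsilon_S(h)
  + \tfrac{1}{2}\,\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{U}_S, \mathcal{U}_T)
  + 4\sqrt{\frac{2d\log(2m') + \log\frac{2}{\delta}}{m'}}
  + \lambda

h^* = \arg\min_{h \in \mathcal{H}} \bigl(\epsilon_S(h) + \epsilon_T(h)\bigr),
\qquad
\lambda = \epsilon_S(h^*) + \epsilon_T(h^*)
```

The square-root term is the divergence estimation complexity; it shrinks as more unlabeled samples become available. If λ is large, no single hypothesis works on both domains and adaptation cannot be expected to succeed.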
THANKS TO GENERALIZATION GUARANTEE
THEORETICAL RESULTS
𝒉 ∈ 𝑯 ⟺ 𝟏 − 𝒉 ∈ 𝑯
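This symmetry condition matters because, for a symmetric hypothesis class, the empirical H-divergence reduces to the best achievable domain classification error. The form below follows Ganin et al. 2016 and is a reconstruction under that paper's notation, not text from the slide: S holds n source samples, T holds n' = N − n target samples, η is a domain classifier in H, and I[·] is the indicator function.

```latex
\hat{d}_{\mathcal{H}}(S, T) = 2\left(1 - \min_{\eta \in \mathcal{H}}
  \left[
    \frac{1}{n}\sum_{i=1}^{n} I[\eta(x_i) = 0]
  + \frac{1}{n'}\sum_{i=n+1}^{N} I[\eta(x_i) = 1]
  \right]\right)
```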
DANN
SHALLOW DANN
t-SNE RESULTS
REFERENCE
PAPERS
1. A survey on transfer learning, SJ Pan 2009
2. A theory of learning from different domains, S Ben-David et al. 2010
3. Domain-Adversarial Training of Neural Networks, Y Ganin 2016
BLOG
1. http://jaejunyoo.blogspot.com/2017/01/domain-adversarial-training-of-neural.html
2. https://github.com/jaejun-yoo/tf-dann-py35
3. https://github.com/jaejun-yoo/shallow-DANN-two-moon-dataset
SLIDES
1. http://www.di.ens.fr/~germain/talks/nips2014_dann_slides.pdf
2. http://john.blitzer.com/talks/icmltutorial_2010.pdf (DA theory part)
3. https://epat2014.sciencesconf.org/conference/epat2014/pages/slides_DA_epat_17.pdf (DA theory part)
4. https://www.slideshare.net/butest/ppt-3860159 (DA theory part)
VIDEO
1. https://www.youtube.com/watch?v=h8tXDbywcdQ (Terry Um Deep Learning Talk)
2. https://www.youtube.com/watch?v=F2OJ0fAK46Q (DA theory part)
3. https://www.youtube.com/watch?v=uc6K6tRHMAA&index=13&list=WL&t=2570s (DA theory part)
