Matching Network
1.
2. Learning without big-data

                           Model-based     Metric-based                              Optimization-based
Key Idea                   RNN, Memory     Metric Learning                           Gradient Descent
How is p_θ(y|x) modeled?   f_θ(x, S)       Σ_{(x_i,y_i)∈S} k_θ(x, x_i) y_i  (*)      P_{g_φ(θ, S^L)}(y|x)
3.–6. Learning without big-data
(The same table, annotated column by column:)
Model-based: design a dedicated memory cell inside or outside the model and coordinate whether the model should learn faster or slower.
Metric-based: train the model based on a distance (metric), as in (*) above.
Optimization-based: coordinate how the gradients are adjusted efficiently so that the model learns quickly.
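As a concrete reading of the metric-based formula (*) in the table, here is a minimal sketch; the softmax-over-cosine kernel is an assumption standing in for the learned kernel k_θ:

```python
import numpy as np

def metric_based_predict(x_emb, support_embs, support_onehot):
    """y_hat = sum over (x_i, y_i) in S of k(x, x_i) * y_i,
    with k = softmax over cosine similarities (a stand-in for k_theta)."""
    norms = np.linalg.norm(support_embs, axis=1) * np.linalg.norm(x_emb)
    sims = support_embs @ x_emb / (norms + 1e-8)   # cosine similarity to each x_i
    k = np.exp(sims) / np.exp(sims).sum()          # kernel weights, sum to 1
    return k @ support_onehot                      # weighted vote over labels y_i

# toy usage: 3 support embeddings (4-dim), 2 classes, 1 query
support = np.random.randn(3, 4)
onehot = np.array([[1., 0.], [1., 0.], [0., 1.]])
print(metric_based_predict(np.random.randn(4), support, onehot))
```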
7.–11. Learning without big-data

• HISTORY
- Metric-based Approach
Siamese Neural Network for One-Shot Image Recognition (2015)
: the first meta-learning method to be proposed
Matching networks for one shot learning (2016)
: every paper below is an evolved form built on top of Matching Network
Prototypical networks for few-shot learning (2017)
Learning to Compare: Relation Network for Few-Shot Learning (2018)
- Model-based Approach
Meta-Learning with Memory Augmented Neural Networks (2016)
Meta Networks (2017)
- Optimization-based Approach
Optimization as a Model for Few-Shot Learning (2017)
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2017)
- Graph Neural Network-based Approach
Few-shot learning with graph neural networks (2017) [ICLR]
(: these days there are approaches like this too)
12.–16. Learning without big-data
PAPER : A CLOSER LOOK AT FEW-SHOT CLASSIFICATION
Chen, Wei-Yu, et al. "A Closer Look at Few-Shot Classification." arXiv preprint arXiv:1904.04232 (2019). ICLR 2019.

The paper compares:
Transfer Learning with Data augmentation
Transfer Learning without Data augmentation
: data augmentation = increasing the size of the dataset by distorting or transforming the data
Meta Learning: the representative meta-learning methods
17.–22. Learning without big-data
Findings from the same paper:
: Contrary to the authors' claims, there is in fact no large performance difference among the meta-learning methods...
: The compact structure of Matching Net even performs better than MAML.
: ConvN → a model designed from scratch
: ResnetN → trained on top of a pretrained ResNet
: ConvN → meta-learning is ahead (the compared methods rank higher)
: ResnetN → transfer learning is ahead (the baseline ranks higher)
[the pretrained parameters may well be a factor here...]
Since we usually reach for a pre-trained model first when designing a model (because it guarantees higher accuracy), note that meta-learning methods are quite possibly not yet the best approach.
24. Contribution about Matching Network
1. Proposes a model architecture based on a distance (metric) approach.
2. Presents episode training, a training scheme specialized for meta-learning.
3. Provides the mini-ImageNet database for few-shot learning.
25. Training Approach
• Training Strategy
Batch Training [Simple strategy]
- With limited data, ordinary supervised learning does not work well.
Episode Training [Meta strategy]
- During training, compose episodes that resemble testing (prevents overfitting).
- The training set is organized into a Support set (S) and a Batch set (B).
N-way K-shot
Training in the ordinary way easily overfits and sufficient performance is not guaranteed. (A sampling sketch follows below.)
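A minimal sketch of sampling one N-way K-shot episode (a support set S plus a batch set B of queries); the dataset layout (a dict from label to examples) and the split sizes are illustrative assumptions:

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=5):
    """dataset: dict mapping label -> list of examples.
    Returns support set S (N classes x K shots) and batch set B (queries)."""
    classes = random.sample(list(dataset), n_way)       # L: pick N labels
    support, batch = [], []
    for idx, label in enumerate(classes):               # idx: episode-local class id
        picks = random.sample(dataset[label], k_shot + n_query)
        support += [(x, idx) for x in picks[:k_shot]]   # K shots per class
        batch += [(x, idx) for x in picks[k_shot:]]     # query examples
    random.shuffle(batch)
    return support, batch
```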
33.–38. Episode Training : Meta strategy
• Update Rule

θ = argmax_θ E_{L∼T} [ E_{S∼L, B∼L} [ Σ_{(x,y)∈B} log P_θ(y | x, S) ] ]

T : task distribution, L : labels, S : Support Set, B : Batch Set
: From the full task distribution T, decide which labels L to sample.
: From the sampled L, split the data into a Support set and a Batch set.
: Feed the support set together with each (x = input, y = answer) in the batch to produce the probability P for the batch set.
: Then update the parameters so as to maximize the log-likelihood of that probability.

Hyperparameter settings recommended in the paper:
Labels (L) : 5–25 labels
Samples per label : 1–5
39. Episode Training : Meta strategy
[Diagram] Support Set → Build Classifier; Query Set → Predict P(y=c)
Evaluate Loss: (1/T) Σ_{t=1}^{T} L(ŷ_t, y_t)
Update meta learner [by gradient]
Meta-Learning = Learning to learn: train the same way as testing.
Each query is turned into a probability over the support-set classes.
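A sketch of the episodic update rule as a training loop; `model(x, support)` returning the class distribution P_θ(y | x, S) is an assumed interface (and it reuses `sample_episode` from the earlier sketch), not the paper's actual code:

```python
import torch

def meta_train(model, dataset, optimizer, n_episodes=1000):
    for _ in range(n_episodes):
        support, batch = sample_episode(dataset)   # S ~ L, B ~ L with L ~ T
        loss = torch.tensor(0.0)
        for x, y in batch:                         # (x, y) in B
            probs = model(x, support)              # P_theta(y | x, S), assumed API
            loss = loss - torch.log(probs[y])      # maximize the log-likelihood
        optimizer.zero_grad()
        loss.backward()                            # update the meta learner by gradient
        optimizer.step()
```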
59.–62. PAPER : Matching Network
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in Neural Information Processing Systems 29 (2016): 3630–3638. NIPS.

Applies metric-based learning on top of deep neural features.
Novelty on the modeling side: the use of attention and of memory that enables rapid learning.
Novelty on the training side: proposes episode-based training (Support set, Batch set, ...).
63.–65. PAPER : Matching Network
Proposes the concept of the Support Set.
The work is broadly inspired by:
Seq2Seq
Attention Mechanism
Memory Network
Pointer Network
The key is Attention.
66.–68. PAPER : Matching Network
Describes the meta training strategy (Support Set, etc.) and the fact that the probability distribution P is parametrized by a neural network.
...
Next, the equations for the model as a whole!
A brief summary of the attention algorithm and its similarity to 'k-b Nearest Neighbours'.
69.–70. PAPER : Matching Network
If, in the attention mechanism, b of the labels are attention-mapped to 0, the algorithm can be understood as a 'k-b'-nearest-neighbours scheme. Really?

a(x̂, x) = exp(c(f(x̂), g(x))) / Σ_{i=1}^{K} exp(c(f(x̂), g(x_i)))
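The attention kernel above written out directly, as a short PyTorch sketch where c is cosine similarity between the query embedding f(x̂) and each support embedding g(x_i):

```python
import torch
import torch.nn.functional as F

def attention(f_xhat, g_xs):
    """a(x_hat, x_i) = exp(c(f(x_hat), g(x_i))) / sum_j exp(c(f(x_hat), g(x_j))).
    f_xhat: (d,) query embedding; g_xs: (k, d) support embeddings."""
    c = F.cosine_similarity(f_xhat.unsqueeze(0), g_xs, dim=1)  # (k,), each in [-1, 1]
    return F.softmax(c, dim=0)                                 # attention weights over S

# toy usage: 5 support embeddings of dimension 16
a = attention(torch.randn(16), torch.randn(5, 16))
print(a, a.sum())  # weights are strictly positive and sum to 1
```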
71. PAPER : Matching Network
Matching network computes the value with a cosine function on top of a metric-based structure, so the similarity c is bounded: −1 ≤ cos ≤ 1.
72.–73. PAPER : Matching Network
With −1 ≤ cos ≤ 1:
most similar case: exp(1) = 2.72
most distant case: exp(−1) = 0.37
So even in the best case, with 5 labels:
exp(1) / (exp(1) + 4 · exp(−1)) = 0.65 ... ?
74. PAPER : Matching Network
[Chart: Value of Attention with Cosine; x = number of labels]
y(x=10) = 0.421
y(x=100) = 0.068
y(x=500) = 0.014
y(x=1000) = 0.007
y(x=2000) = 0.003
y(x=3000) = 0.002
A structure in which the weight can never reach 0.
75. PAPER : Matching Network
Most similar case exp(1) = 2.72, most distant case exp(−1) = 0.37:
a structure in which the weight can never be 0.
A wider range on the x-axis is needed... (not possible with cosine).
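The ceiling is easy to check numerically; a small script for the best-case attention weight (one support point at cos = 1, the remaining k−1 at cos = −1), which reproduces the 0.65 above for k = 5 and decays toward 0 without ever reaching it (the chart's exact values may come from a slightly different setup):

```python
import math

def best_case_attention(k):
    # one support embedding with cos = 1, the other k-1 with cos = -1
    return math.exp(1) / (math.exp(1) + (k - 1) * math.exp(-1))

for k in [5, 10, 100, 1000, 3000]:
    print(k, round(best_case_attention(k), 3))
# 5 -> 0.649, the 0.65 on the slide; larger k shrinks but never zeroes the weight
```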
76.–78. PAPER : Matching Network
Notes on the Attention Kernel ...
Proposes a more advanced model for solving complex tasks:
Full Context Embeddings...
79.–82. Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in Neural Information Processing Systems 29 (2016): 3630–3638. NIPS.
[Diagram: embedding functions g_θ over the Support set (S) and f_θ over the Batch set (B); the heavyweight embedding network]

Each support example (x_i, y_i) is first embedded independently as g'(x_i).
Because the features are extracted simply through a CNN, no dependence between the support-set labels is introduced.
83.–84. Full Context Embedding
[Diagram: each g'(x_i) is fed through a bidirectional LSTM and combined into g(x_i, S)]
Additional embedding with a Bi-LSTM:
new features are extracted that reflect the dependence between the support examples.
85.–88. Full Context Embedding

→h_i, →c_i = LSTM(g'(x_i), →h_{i−1}, →c_{i−1})      (forward)
←h_i, ←c_i = LSTM(g'(x_i), ←h_{i+1}, ←c_{i+1})      (backward)
g(x_i, S) = →h_i + ←h_i + g'(x_i)

→h, ←h : hidden states (from the LSTM)
g'(x_i) : the feature previously extracted by the conv net
: The residual connection yields a newly extracted feature.
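A compact PyTorch sketch of the support-side FCE above, assuming the conv embeddings g'(x_i) are already computed and share the LSTM hidden size:

```python
import torch
import torch.nn as nn

class FCEg(nn.Module):
    """g(x_i, S) = h_fwd_i + h_bwd_i + g'(x_i): a bidirectional LSTM over the
    support embeddings plus a residual connection (a sketch, not the paper's code)."""
    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)

    def forward(self, g_prime):                    # g_prime: (n_support, dim)
        out, _ = self.lstm(g_prime.unsqueeze(0))   # (1, n_support, 2*dim)
        fwd, bwd = out[0].chunk(2, dim=-1)         # forward / backward hidden states
        return fwd + bwd + g_prime                 # residual with the conv feature

g = FCEg(64)(torch.randn(5, 64))                   # 5 support embeddings -> (5, 64)
```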
89.–93. Full Context Embedding
[Diagram: the query side. f'(x̂) from the Batch set (B) is fed into an LSTM whose state h_{k−1} is combined with a read vector r_{k−1} computed by attention a(h_{k−1}, g(x_i)) over the embedded support set; after K steps, f(x̂, S) = h_K]

The dependently embedded support-set information is read in, and the batch set gets an additional embedding through an attentional LSTM (attLSTM).
Features are first extracted with the conv net, then fed into the LSTM.
The sequence runs for K steps [K is a hyperparameter].
94.–98. Full Context Embedding

ĥ_k, c_k = LSTM(f'(x̂), [h_{k−1}, r_{k−1}], c_{k−1})
h_k = ĥ_k + f'(x̂)
r_{k−1} = Σ_{i=1}^{|S|} a(h_{k−1}, g(x_i)) · g(x_i)
a(h_{k−1}, g(x_i)) = softmax(h_{k−1}ᵀ g(x_i))
f(x̂, S) = h_K

h = hidden state, c = cell state
: A new feature is obtained through the residual connection between the conv feature f'(x̂) and the value produced by the LSTM.
: From that LSTM feature and the support-set features, the attention mechanism extracts a new feature r (an accumulation of support-set information).
: The extracted feature is passed on to the next step, where it is reinforced, so support-set information keeps accumulating across the K steps.
: Finally, the hidden state from the K-th LSTM step, added to its input, gives the final feature.
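A PyTorch sketch of the query-side attLSTM above; following common re-implementations, the concatenation [h_{k−1}, r_{k−1}] is approximated here by adding the read vector into the hidden state, so a plain nn.LSTMCell(d, d) suffices (an assumption, not the paper's exact parametrization):

```python
import torch
import torch.nn.functional as F
from torch import nn

def fce_f(f_prime_x, g_S, cell, K=5):
    """f(x_hat, S) = h_K after K attLSTM steps.
    f_prime_x: (d,) conv feature of the query; g_S: (|S|, d) FCE'd support set;
    cell: nn.LSTMCell(d, d); K: number of processing steps (hyperparameter)."""
    d = f_prime_x.shape[0]
    h = torch.zeros(d); c = torch.zeros(d); r = torch.zeros(d)
    for _ in range(K):
        hh, c = cell(f_prime_x.unsqueeze(0), ((h + r).unsqueeze(0), c.unsqueeze(0)))
        hh, c = hh[0], c[0]
        h = hh + f_prime_x                   # residual with the conv feature
        a = F.softmax(g_S @ h, dim=0)        # a(h_k, g(x_i)) = softmax(h^T g(x_i))
        r = a @ g_S                          # read vector: accumulated support info
    return h

out = fce_f(torch.randn(64), torch.randn(5, 64), nn.LSTMCell(64, 64))
```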
100. Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in Neural Information Processing Systems 29 (2016): 3630–3638. NIPS.

P(ŷ | x̂, S) = Σ_{i=1}^{k} a(x̂, x_i) y_i
a(x̂, x) = exp(c(f(x̂), g(x))) / Σ_{i=1}^{K} exp(c(f(x̂), g(x_i)))
101.–103. Full Context Embedding
In this final prediction:
g : dependent support-set features extracted through the Bi-LSTM
f : features strengthened by their association with the support set, through the attentional LSTM
a : the probability produced by metric-based attention
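Putting the pieces together, a two-line sketch of the final prediction, reusing the `attention` helper from the earlier sketch (the one-hot label matrix is an illustrative input):

```python
import torch

def matching_predict(f_xhat, g_S, y_onehot):
    """P(y_hat | x_hat, S) = sum_i a(x_hat, x_i) y_i over the support set."""
    a = attention(f_xhat, g_S)   # attention weights from the earlier sketch
    return a @ y_onehot          # (n_classes,) probability vector

probs = matching_predict(torch.randn(64), torch.randn(5, 64), torch.eye(5))
```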
104. Matching Network Experiments
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in Neural Information Processing Systems 29 (2016): 3630–3638. NIPS.
Dataset : Omniglot [Lake+, 2011]
50 alphabets with 1623 characters
- Training 1200, Testing 423 (characters)
- Testing uses alphabets never seen during training
20 per character : each character hand-drawn by 20 different people
- Augmented with random rotations during training
Input : 28x28 image
4 stacked modules [3x3 conv(64) + BN + ReLU + 2x2 max pooling]
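The embedding network described above, in PyTorch; padding=1 (keeping the 3x3 convs size-preserving) is an assumption consistent with the 28x28 input reducing to a 64-dim embedding:

```python
import torch
from torch import nn

def conv_module(in_ch, out_ch=64):
    # one stacked module: 3x3 conv(64) + BN + ReLU + 2x2 max pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

embed = nn.Sequential(                      # 1x28x28 -> 64x1x1 -> 64-dim vector
    conv_module(1), conv_module(64), conv_module(64), conv_module(64),
    nn.Flatten(),
)
print(embed(torch.zeros(1, 1, 28, 28)).shape)   # torch.Size([1, 64])
```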
105. Matching Network Experiments
Baselines in the results:
: matching on raw pixels
: VGG features
: model-based meta-learning
110. Appendix : Seq2Seq
Solving the problem with a neural network: seq2seq
Encoder : takes the sequence of points as input and learns information about the points
Decoder : from the learned information, outputs the sequence of indices of the outermost points
111. Appendix : Seq2Seq
- seq2seq is not a good fit for problems whose output depends on the input
- Even with attention, long-range relations can be captured, but the problem of the output being fixed remains unsolved
Solving the problem with a neural network: seq2seq with attention