Matching Network
Learning without big-data

|                                      | Model-based      | Metric-based                                          | Optimization-based                    |
| Key Idea                             | RNN, Memory      | Metric Learning                                       | Gradient Descent                      |
| How is $p_\theta(y \mid x)$ modeled? | $f_\theta(x, S)$ | $\sum_{(x_i, y_i) \in S} k_\theta(x, x_i)\, y_i$ (*)  | $P_{g_\phi(\theta, S^L)}(y \mid x)$   |
Model-based: design a dedicated memory cell inside or outside the model and tune whether the model learns faster or slower.
Metric-based: train the model based on distance.
Optimization-based: tune how to adjust the gradients efficiently so that the model learns quickly.
HISTORY
- Metric-based Approach
Siamese Neural Network for One-Shot Image Recognition (2015)
Matching networks for one shot learning (2016)
Prototypical networks for few-shot learning (2017)
Learning to Compare: Relation Network for Few-Shot Learning (2018)
- Model-based Approach
Meta-Learning with Memory Augmented Neural Networks (2016)
Meta Networks (2017)
- Optimization-based Approach
Optimization as a Model for Few-Shot Learning (2017)
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (2017)
: the earliest proposed meta-learning method (2015)
: every paper below the Matching Network is an evolution built on top of it
- Graph Neural Network-based Approach
Few-shot learning with graph neural networks (2017) [ICLR]
(: these days there are approaches like this, too)
Learning without big-data

PAPER : A CLOSER LOOK AT FEW-SHOT CLASSIFICATION
Chen, Wei-Yu, et al. "A closer look at few-shot classification." arXiv preprint arXiv:1904.04232 (2019). ICLR 2019

- Compares Transfer Learning and Meta Learning.
- Transfer Learning with / without Data augmentation
(Data augmentation: distorting or transforming the data to grow the size of the dataset.)
- The comparison covers the representative meta-learning methods.
- Contrary to what the methods' authors claimed, in practice there is no large performance gap among the meta-learning methods...
- The Matching Net's simple structure even performs somewhat better than MAML.
- ConvN → a model designed and trained from scratch
- ResnetN → training on top of a pretrained ResNet
- ConvN → meta learning wins (the compared methods rank higher)
- ResnetN → transfer learning wins (the baselines rank higher) [could this be the influence of the pre-existing parameters...?]
- Note: because we usually reach for a pre-trained model first when designing a model (it guarantees higher accuracy), there is a good chance that meta-learning is not yet the best approach.
What is Matching Network?

$$P(\hat{y}_k = 1 \mid \hat{x}, S) = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i$$

$$a(\hat{x}, x) = \frac{\exp(c(f(\hat{x}), g(x)))}{\sum_{i=1}^{K} \exp(c(f(\hat{x}), g(x_i)))}$$
Contribution about Matching Network
1. Proposes a model architecture built on a distance-based (metric) approach.
2. Presents episode training, a training scheme specialized for meta-learning.
3. Provides the mini-ImageNet database for few-shot learning.
Training Approach

Training Strategy
Batch Training [Simple strategy]
- With limited data, ordinary supervised learning does not work well: trained the usual way, the model easily overfits and sufficient performance is not guaranteed.
Episode Training [Meta strategy]
- During training, build episodes that resemble testing (prevents overfitting).
- A training set is organized into a Support set (S) and a Batch set (B).
- Episodes are N-way K-shot.
N-way K-shot

Train Dataset #1 : Monkey / YODA → Episode 1 (about Monkey & YODA)
Train Dataset #2 : Robot / Water bears → Episode 2 (about Robot & Water bears)
Test Data : Lion Fish / Moth

Within each episode the data are split into a Support Set (S) and a Batch Set (B), so training runs the same way as testing.

Q. What N-way k-shot is the example above?
→ 2-way 4-shot
N : number of classes in one episode
k : number of samples per class

Q. How many data points K occur in one episode?
→ K = kN; here 2 * 4 = 8 for a 2(N)-way 4(k)-shot episode.
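As a minimal sketch of how such episodes could be sampled (the dataset here is just a dict from class name to a list of examples; all names and sizes are illustrative assumptions, not from the paper):

```python
import random

def sample_episode(dataset, n_way=2, k_shot=4, batch_per_class=1):
    """Sample one N-way k-shot episode: a support set S and a batch (query) set B."""
    classes = random.sample(list(dataset.keys()), n_way)   # pick the N classes for this episode
    support, batch = [], []
    for label in classes:
        examples = random.sample(dataset[label], k_shot + batch_per_class)
        support += [(x, label) for x in examples[:k_shot]]   # k examples per class -> K = kN total
        batch   += [(x, label) for x in examples[k_shot:]]   # held-out queries for this episode
    return support, batch

# toy usage: 4 classes available, each episode uses 2 of them
toy = {"monkey": list(range(10)), "yoda": list(range(10)),
       "robot": list(range(10)), "water_bear": list(range(10))}
S, B = sample_episode(toy)
print(len(S), len(B))  # 8 (= kN) support examples, 2 queries
```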
Episode Training : Meta strategy

Update Rule

$$\theta = \arg\max_{\theta}\; E_{L \sim T}\Big[ E_{S \sim L,\, B \sim L}\Big[ \sum_{(x, y) \in B} \log P_\theta(y \mid x, S) \Big] \Big]$$

T : Task, L : labels, S : Support Set, B : Batch Set

: From the full task T, decide which labels L to sample.
: From the sampled L, split the data into a Support set and a Batch set.
: Feed the support set together with each data point in the batch to produce a probability for the batch set, i.e. the probability P obtained by plugging in the x (input) and y (answer) inside B.
: Training is then the process of updating the parameters to maximize the log-likelihood of that probability.

Hyperparameter settings recommended in the paper
Labels (L) : 5 ~ 25 labels
Samples per label : 1 ~ 5
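A minimal sketch of this episodic update loop in PyTorch, assuming a `model(x, support)` that returns log-probabilities over the episode's classes, plus the `sample_episode` helper sketched earlier (both are illustrative assumptions, not the paper's code):

```python
import torch

def train_episodes(model, dataset, n_episodes=1000, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_episodes):
        # L ~ T, then S ~ L and B ~ L: one episode per update
        support, batch = sample_episode(dataset, n_way=5, k_shot=1)
        classes = sorted({y for _, y in support})      # episode-local label order
        loss = torch.zeros(())
        for x, y in batch:
            log_p = model(x, support)                  # log P_theta(. | x, S)
            loss = loss - log_p[classes.index(y)]      # maximize the sum over B of log-likelihoods
        opt.zero_grad()
        loss.backward()
        opt.step()
```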
Episode Training : Meta strategy

Support Set → build the classifier; Query Set → predict P(y = c).

Evaluate the loss $\frac{1}{T} \sum_{t=1}^{T} L(\hat{y}_t, y_t)$ and update the meta-learner [by gradient].

Meta-Learning = Learning to learn: train the same way as testing.
The query set is turned into a probability of which support-set class it belongs to.
Matching Network

[Architecture diagram: the support set (S) is embedded by $g_\theta$ and the batch set (B) by $f_\theta$ ('Heavy Weight' networks); an attention method combines them through a weighted sum $\sum$. Shown as a 4-way 1-shot example.]
Matching Network

$$P(\hat{y}_k = 1 \mid \hat{x}, S) = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i \qquad a(\hat{x}, x) = \frac{\exp(c(f(\hat{x}), g(x)))}{\sum_{i=1}^{K} \exp(c(f(\hat{x}), g(x_i)))}$$

c : cosine function, and the whole thing is an end-to-end model.

- S is the support-set group; $\hat{x}$ is one example from the batch.
- $x_i$ is one support-set data point; the output expresses which label $\hat{x}$ belongs to as a probability.
- $a(\cdot,\cdot)$ works like a kernel density estimator: it expresses in distance terms how close the two points are, maps them together, and assigns the probability.
- $f_\theta$ performs feature extraction on the batch example; $g_\theta$ performs feature extraction on the support set.
- Paper description of $f_\theta$ and $g_\theta$: they may be identical or set up separately (similar to how word2vec is trained), and either trained from scratch (3~4 conv layers) or built on a pretrained backbone (VGG, Inception, ...).
- Similarity is computed with cosine similarity, and the softmax over those similarities gives the KDE.
- The final prediction is the attention-weighted sum of the label predictions.
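A minimal sketch of this attention readout, assuming `f` and `g` have already mapped the inputs to embedding vectors (names and shapes are my own illustration, not the authors' code):

```python
import torch
import torch.nn.functional as F

def matching_predict(x_hat_emb, support_embs, support_onehot):
    """x_hat_emb: (d,) query embedding f(x_hat)
    support_embs: (K, d) support embeddings g(x_i)
    support_onehot: (K, C) one-hot labels y_i
    returns: (C,) P(y | x_hat, S) = sum_i a(x_hat, x_i) y_i"""
    c = F.cosine_similarity(x_hat_emb.unsqueeze(0), support_embs, dim=1)  # cosine c(f(x̂), g(x_i))
    a = torch.softmax(c, dim=0)                                           # attention kernel a(x̂, x_i)
    return a @ support_onehot                                             # attention-weighted label sum

# toy usage: K = 4 support points in d = 8 dims, C = 4 classes (a 4-way 1-shot episode)
embs = torch.randn(4, 8)
labels = torch.eye(4)
print(matching_predict(torch.randn(8), embs, labels))  # a distribution that sums to 1
```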
PAPER : Matching Network
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

- Applies metric-based learning using deep neural features.
- Notable from the modeling side: the use of attention, and of memory that makes rapid learning possible.
- Notable from the training side: proposes episode-level training (Support set, Batch set, ...).
- Proposes the concept of the Support Set.
- The research that broadly inspired it: Seq2Seq, the Attention Mechanism, Memory Networks, and Pointer Networks. The core is attention.
- Describes the meta-training strategy (such as the support set) and the fact that the probability distribution P is parameterized by a neural network... followed by the equations for the whole model!
PAPER : Matching Network
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

A short summary of the attention algorithm and its resemblance to 'k-b Nearest Neighbours': if, among the labels in the attention mechanism, b labels receive an attention mapping of 0, the scheme can be understood as a 'k-b' nearest-neighbour rule.

Really?

$$a(\hat{x}, x) = \frac{\exp(c(f(\hat{x}), g(x)))}{\sum_{i=1}^{K} \exp(c(f(\hat{x}), g(x_i)))}$$
PAPER : Matching Network
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

The matching network computes its values with a cosine function on top of a metric-based structure, so the logits are confined to $-1 \le \cos \le 1$:
- most similar case: exp(1) = 2.72
- most distant case: exp(-1) = 0.37

So even in the extreme case of one maximally similar item among five, the peak attention weight is only
$$\frac{\exp(1)}{\exp(1) + 4\exp(-1)} = 0.65 \;\ldots?$$

[Plot: 'Value of Attention with Cosine', attention weight versus x = number of labels]
$y(x{=}10) = 0.421$, $y(x{=}100) = 0.068$, $y(x{=}500) = 0.014$, $y(x{=}1000) = 0.007$, $y(x{=}2000) = 0.003$, $y(x{=}3000) = 0.002$

By construction the weight can never become 0. A wider range on the x-axis would be needed... (not possible with cosine).
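A quick numeric sketch of this saturation argument (my own illustration: one support item maximally similar, the other K-1 maximally distant):

```python
import math

def attention_floor(K):
    """Attention on a most-distant support item when one item has cos = 1
    and the remaining K-1 items have cos = -1."""
    return math.exp(-1) / (math.exp(1) + (K - 1) * math.exp(-1))

def attention_peak(K):
    """Attention on the single maximally similar item in the same setting."""
    return math.exp(1) / (math.exp(1) + (K - 1) * math.exp(-1))

for K in (5, 10, 100, 500, 1000, 3000):
    print(K, round(attention_peak(K), 3), round(attention_floor(K), 5))
# With cosine logits clipped to [-1, 1], even a maximally distant item keeps a
# nonzero weight, and the peak itself decays as K grows (e.g. K=5 -> 0.649).
```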
PAPER : Matching Network
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

Various notes on the attention kernel follow...

The paper then proposes a more sophisticated model for solving complex tasks: Full Context Embeddings...
Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

[Architecture diagram: support set (S) through $g_\theta$, batch set (B) through $f_\theta$, attention-weighted sum $\sum$; the same matching architecture, now with context-aware embeddings.]
Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

$g'(x_i)$: because feature extraction is done simply with a CNN, no dependency is induced among the support-set examples $(x_i, y_i)$.
Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

[Diagram: each $g'(x_i)$ feeds a forward and a backward LSTM, whose outputs combine (+) into $g(x_i, S)$.]

Additional embedding with a Bi-LSTM: it extracts new features into which the dependencies among the support-set examples are injected.
Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

$$\vec{h}_i, \vec{c}_i = \mathrm{LSTM}(g'(x_i), \vec{h}_{i-1}, \vec{c}_{i-1})$$
$$\overleftarrow{h}_i, \overleftarrow{c}_i = \mathrm{LSTM}(g'(x_i), \overleftarrow{h}_{i+1}, \overleftarrow{c}_{i+1})$$
$$g(x_i, S) = \vec{h}_i + \overleftarrow{h}_i + g'(x_i)$$

- $\vec{h}_i, \overleftarrow{h}_i$ : hidden states (from the forward and backward LSTM)
- $g'(x_i)$ : the feature originally extracted by the conv net
- The residual connection extracts the new feature $g(x_i, S)$.
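A minimal sketch of this support-set embedding with PyTorch's bidirectional LSTM (the layer size and residual wiring are plausible assumptions, not the released code):

```python
import torch
import torch.nn as nn

class FCESupportEmbedding(nn.Module):
    """g(x_i, S) = h_fwd_i + h_bwd_i + g'(x_i), computed over the whole support set."""
    def __init__(self, dim=64):
        super().__init__()
        # one bidirectional layer; hidden size = dim so the residual sum type-checks
        self.lstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)

    def forward(self, g_prime):                          # g_prime: (K, dim) conv features of S
        out, _ = self.lstm(g_prime.unsqueeze(0))         # (1, K, 2*dim)
        fwd = out[0, :, :g_prime.size(1)]                # forward hidden states
        bwd = out[0, :, g_prime.size(1):]                # backward hidden states
        return fwd + bwd + g_prime                       # residual connection -> (K, dim)

print(FCESupportEmbedding()(torch.randn(5, 64)).shape)   # torch.Size([5, 64])
```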
Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

[Diagram: $f'(\hat{x})$ from the batch set (B) is fed to an LSTM; the read vector $r_{k-1}$ attends over the embedded support set through $a(h_{k-1}, g(x_i))$; residual connections (+) produce $\hat{h}_k$ and $h_k$, and the final embedding is $f(\hat{x}, S) = h_K$.]

- The dependently embedded support-set information is read in, and the batch set receives an additional embedding through the attLSTM.
- The batch example first goes through the conv net for feature extraction ($f'(\hat{x})$) and is then fed into the LSTM.
- The sequence has K steps [K is a hyperparameter].
Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

$$\hat{h}_k, c_k = \mathrm{LSTM}(f'(\hat{x}), [h_{k-1}, r_{k-1}], c_{k-1})$$
$$h_k = \hat{h}_k + f'(\hat{x})$$
$$r_{k-1} = \sum_{i=1}^{|S|} a(h_{k-1}, g(x_i))\, g(x_i), \qquad a(h_{k-1}, g(x_i)) = \mathrm{softmax}(h_{k-1}^{\top} g(x_i))$$
$$f(\hat{x}, S) = \mathrm{attLSTM}(f'(\hat{x}), g(S), K) = h_K$$

h = hidden state, c = cell state

- A new feature is extracted through the residual connection between the conv feature $f'(\hat{x})$ and the value produced by the LSTM.
- The feature obtained from the LSTM and the features coming from the support set are combined through the attention mechanism to extract the new read vector $r_{k-1}$.
- The extracted feature is passed on to the next step of the sequence; because that information is reinforced at the next step, support-set information keeps accumulating.
- Finally, the hidden state coming out of the K-th LSTM step is added to the input value to extract the final feature $f(\hat{x}, S) = h_K$.
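A minimal sketch of this attLSTM read loop with a standard `nn.LSTMCell` (the paper feeds $[h_{k-1}, r_{k-1}]$ as the recurrent state; here $r_{k-1}$ is concatenated to the input instead, a common way to fit a standard cell, and all sizes are illustrative):

```python
import torch
import torch.nn as nn

def att_lstm(f_prime, g_support, K=10, dim=64, cell=None):
    """f(x_hat, S) = h_K after K attention-read steps.
    f_prime: (1, dim) conv feature of the query; g_support: (|S|, dim) support embeddings."""
    cell = cell or nn.LSTMCell(dim * 2, dim)           # input = [f'(x̂), r_{k-1}]
    h = torch.zeros(1, dim)
    c = torch.zeros(1, dim)
    r = torch.zeros(1, dim)
    for _ in range(K):
        h_hat, c = cell(torch.cat([f_prime, r], dim=1), (h, c))
        h = h_hat + f_prime                            # residual: h_k = ĥ_k + f'(x̂)
        a = torch.softmax(h @ g_support.T, dim=1)      # a(h_k, g(x_i)) = softmax(h^T g(x_i))
        r = a @ g_support                              # read vector over the support set
    return h                                           # f(x̂, S) = h_K

print(att_lstm(torch.randn(1, 64), torch.randn(5, 64)).shape)  # torch.Size([1, 64])
```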
Full Context Embedding
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

$$P(\hat{y}_k = 1 \mid \hat{x}, S) = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i \qquad a(\hat{x}, x) = \frac{\exp(c(f(\hat{x}), g(x)))}{\sum_{i=1}^{K} \exp(c(f(\hat{x}), g(x_i)))}$$

- g : dependent support-set features, extracted through the Bi-LSTM
- f : features extracted through the attentional LSTM, carrying information reinforced by its association with the support set
- P : the probability produced by the metric-based attention
Matching Network Experiments
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

Dataset : Omniglot [Lake+, 2011]
50 alphabets with 1623 characters
- Training: 1200 characters, Testing: 423 characters
- Testing uses alphabets never seen during training
20 / character : each character hand-drawn by 20 different people
- Augmented during training with random rotations
Input : 28x28 image
4 stacked modules [3x3 conv (64) + BN + ReLU + 2x2 max pooling]
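The embedding network above, as a small PyTorch sketch (a plausible reading of the four-module stack, not the authors' released code):

```python
import torch
import torch.nn as nn

def conv_module(in_ch, out_ch=64):
    # one stack: 3x3 conv (64) + BatchNorm + ReLU + 2x2 max pooling
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

embed = nn.Sequential(
    conv_module(1), conv_module(64), conv_module(64), conv_module(64),
    nn.Flatten(),                       # 28 -> 14 -> 7 -> 3 -> 1, so a 64-d embedding
)

print(embed(torch.randn(2, 1, 28, 28)).shape)  # torch.Size([2, 64])
```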
Matching Network Experiments
Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS

[Results table annotations:]
: matching on raw pixels
: VGG features
: model-based meta-learning
Q & A
Appendix : Seq2Seq

Convex hull problem : finding the outermost points
- Given a set of points, find the ones on the outer boundary.

Solving it with a neural network : seq2seq
- Encoder : takes the sequence of points as input and learns the information of the points
- Decoder : uses the learned information to output the sequence of indices of the outermost points

Seq2seq with attention
- Plain seq2seq is not well suited to problems whose output depends on the input.
- Even with attention, long-range relations can be captured, but the problem that the output vocabulary is fixed cannot be solved.
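This fixed-output limitation is what the pointer mechanism (from the Pointer Network cited earlier as an inspiration) addresses: the output "vocabulary" becomes the input positions themselves. A minimal sketch of one pointer scoring step, with illustrative weights and sizes:

```python
import torch

def pointer_step(enc, dec, W1, W2, v):
    """One decoding step of a pointer network: a distribution over the n input
    points, so the output 'vocabulary' grows with the input.
    enc: (n, d) encoder states; dec: (d,) decoder state; W1, W2: (d, d); v: (d,)."""
    scores = torch.tanh(enc @ W1.T + dec @ W2.T) @ v   # u_j = v^T tanh(W1 e_j + W2 d)
    return torch.softmax(scores, dim=0)                # pointer over the input positions

d = 16
enc, dec = torch.randn(7, d), torch.randn(d)
probs = pointer_step(enc, dec, torch.randn(d, d), torch.randn(d, d), torch.randn(d))
print(probs.shape, float(probs.sum()))  # torch.Size([7]) 1.0
```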
  • 45. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Support Set Group One Batch
  • 46. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 어떤 label일지 확률 값으로 표현
  • 47. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 𝑥𝑖 서포트셋 데이터 하나 ො 𝑥 배치 데이터 하나
  • 48. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 거리기반으로 얼마나 가까운지 표현
  • 49. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 함께 Mapping 확률 선정
  • 50. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator
  • 51. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 𝑓𝜃를 통한 feature extraction
  • 52. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 𝑓𝜃를 통한 feature extraction 𝑔𝜃를 통한 feature extraction
  • 53. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 𝑔𝜃를 통한 feature extraction 𝑓𝜃를 통한 feature extraction 𝑓𝜃 𝑔𝜃 에 대한 paper description : 동일해도 가능 / 별도로 설정해도 가능
  • 54. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 𝑔𝜃를 통한 feature extraction 𝑓𝜃를 통한 feature extraction 𝑓𝜃 𝑔𝜃 에 대한 paper description : 동일해도 가능 / 별도로 설정해도 가능 Word to Vec 학습법과 유사
  • 55. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 𝑔𝜃를 통한 feature extraction 𝑓𝜃를 통한 feature extraction 𝑓𝜃 𝑔𝜃 에 대한 paper description : Scratch 학습 (Conv 3~4개) Pretrained 이용 (VGG, Inception…)
  • 56. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 𝑓𝜃를 통한 feature extraction 𝑔𝜃를 통한 feature extraction Cosine similarity를 통한 유사도 계산
  • 57. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 𝑓𝜃를 통한 feature extraction 𝑔𝜃를 통한 feature extraction Softmax를 통해 KDE를 구한다
  • 58. Matching Network 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎(ො 𝑥, 𝑥𝑖)𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 𝑐 ∶ 𝑐𝑜𝑠𝑖𝑛𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝐸𝑛𝑑 𝑡𝑜 𝐸𝑛𝑑 𝑚𝑜𝑑𝑒𝑙 Kernel Density Estimator 𝑓𝜃를 통한 feature extraction 𝑔𝜃를 통한 feature extraction 레이블의 예측치들의 Attention 합
  • 59. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS
  • 60. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Deep Neural Freature를 활용한 메트릭 기반 러닝을 적용
  • 61. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 모델링 관점에서의 특이점 : Attention와 rapid learning을 가능하게 하는 메모리의 활용 트레이닝 관점에서의 특이점 : Episode단위 학습 제안 (Support set, Batch set…)
  • 62. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS
  • 63. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Support Set 개념의 제안
  • 64. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 전반적으로 영감을 받은 연구는 Seq2Seq Attention Mechanism Memory Network Pointer Network
  • 65. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 전반적으로 영감을 받은 연구는 Seq2Seq Attention Mechanism Memory Network Pointer Network 핵심은 Attention
  • 66. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Support Set과 같은 메타 학습전략에 대한 설명 및 확률분포 P가 신경망을 통해 parametrized 된다는 이야기
  • 67. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Support Set과 같은 메타 학습전략에 대한 설명 및 확률분포 P가 신경망을 통해 parametrized 된다는 이야기 … 이어서 모델 전반에 대한 수식!
  • 68. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Attention 알고리즘에 대한 간단한 정리 및 알고리즘의 ‘k-b Nearest Neighbours’ 와의 유사성
  • 69. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 어텐션 메커니즘에서 여러 개의 레이블에서 b개의 레이블이 0 으로 attention mapping이 된다면 ‘k-b’개의 NN개념으로 이해될 수 있다. 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖)))
  • 70. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 어텐션 메커니즘에서 여러 개의 레이블에서 b개의 레이블이 0 으로 attention mapping이 된다면 ‘k-b’개의 NN개념으로 이해될 수 있다. 정말? 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖)))
  • 71. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) Matching network는 metric 기반에 구조에 cosine 함수로 값을 계산 0 ≤ 𝑐𝑜𝑠𝑁 ≤ 1
  • 72. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) Matching network는 metric 기반에 구조에 cosine 함수로 값을 계산 0 ≤ 𝑐𝑜𝑠𝑁 ≤ 1 가장 비슷한 케이스 exp(1) = 2.72 가장 거리가 먼 케이스 exp(-1) = 0.37
  • 73. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 0 ≤ 𝑐𝑜𝑠𝑁 ≤ 1 가장 비슷한 케이스 exp(1) = 2.72 가장 거리가 먼 케이스 exp(-1) = 0.37 exp 1 exp 1 + 4 ∗ exp(−1) = 0.65 … ?
  • 74. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑎 ො 𝑥, 𝑥 = exp(𝑐 𝑓 ො 𝑥 , 𝑔 𝑥 ) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) 0 ≤ 𝑐𝑜𝑠𝑁 ≤ 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 1 166 331 496 661 826 991 1156 1321 1486 1651 1816 1981 2146 2311 2476 2641 2806 2971 Value of Attention with Cosine 𝑥 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑙𝑎𝑏𝑒𝑙 𝑦(𝑥10) = 0.421 𝑦(𝑥100) = 0.068 𝑦(𝑥500) = 0.014 𝑦(𝑥1000) = 0.007 𝑦(𝑥2000) = 0.003 𝑦(𝑥3000) = 0.002 0이 될 수가 없는 구조
  • 75. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 0 ≤ 𝑐𝑜𝑠𝑁 ≤ 1 가장 비슷한 케이스 exp(1) = 2.72 가장 거리가 먼 케이스 exp(-1) = 0.37 0이 될 수가 없는 구조 더 넓은 𝑥축에 대한 Range가 필요… (cosine으로는 불가)
  • 76. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Attention Kernel에 대한 정리들 …
  • 77. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 복잡한 Task를 해결하기 위한 고도화 된 모델 제안 Full Context Embeddings…
  • 78. PAPER : Matching Network Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 복잡한 Task를 해결하기 위한 고도화 된 모델 제안 Full Context Embeddings…
  • 79. Full Context Embedding 𝑔𝜃 X 𝑓𝜃 ෍ Heavy Weight Support set (S) Batch set (B) Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS
  • 80. Full Context Embedding 𝑔𝜃 X 𝑓𝜃 ෍ Heavy Weight Support set (S) Batch set (B) Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS
  • 81. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔′ 𝑔′ 𝑔′ 단순히 CNN을 통해 Feature Extraction을 하기 때문에 Support set 레이블 간의 연관성 (dependent)가 부여되지 않음 𝑥𝑖 𝑦𝑖
  • 82. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔′ 𝑔′ 𝑔′ 𝑥𝑖 𝑦𝑖
  • 83. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔′ 𝑔′ 𝑔′ 𝐿𝑆𝑇𝑀 + 𝐿𝑆𝑇𝑀 𝐿𝑆𝑇𝑀 + 𝐿𝑆𝑇𝑀 𝐿𝑆𝑇𝑀 + 𝐿𝑆𝑇𝑀 𝑔(𝑥𝑖, 𝑆) 𝑥𝑖 𝑦𝑖
  • 84. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔′ 𝑔′ 𝑔′ 𝐿𝑆𝑇𝑀 + 𝐿𝑆𝑇𝑀 𝐿𝑆𝑇𝑀 + 𝐿𝑆𝑇𝑀 𝐿𝑆𝑇𝑀 + 𝐿𝑆𝑇𝑀 𝑔(𝑥𝑖, 𝑆) 𝑥𝑖 𝑦𝑖 Bi-LSTM으로 추가 embedding [서로 간의 dependen가 반영된 새로운 feature을 더 extraction하게 된다. ]
  • 85. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS ℎ𝑖, 𝑐𝑖 = 𝐿𝑆𝑇𝑀(𝑔′ 𝑥𝑖 , ℎ𝑖−1, റ 𝑐𝑖−1) ℎ𝑖, 𝑐𝑖 = 𝐿𝑆𝑇𝑀(𝑔′ 𝑥𝑖 , ℎ𝑖+1, ര 𝑐𝑖−1) 𝑔 𝑥𝑖, 𝑆 = ℎ𝑖 + ℎ𝑖 + 𝑔′(𝑥𝑖)
  • 86. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS ℎ𝑖, 𝑐𝑖 = 𝐿𝑆𝑇𝑀(𝑔′ 𝑥𝑖 , ℎ𝑖−1, റ 𝑐𝑖−1) ℎ𝑖, 𝑐𝑖 = 𝐿𝑆𝑇𝑀(𝑔′ 𝑥𝑖 , ℎ𝑖+1, ര 𝑐𝑖−1) 𝑔 𝑥𝑖, 𝑆 = ℎ𝑖 + ℎ𝑖 + 𝑔′(𝑥𝑖) Hidden State (from LSTM)
  • 87. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS ℎ𝑖, 𝑐𝑖 = 𝐿𝑆𝑇𝑀(𝑔′ 𝑥𝑖 , ℎ𝑖−1, റ 𝑐𝑖−1) ℎ𝑖, 𝑐𝑖 = 𝐿𝑆𝑇𝑀(𝑔′ 𝑥𝑖 , ℎ𝑖+1, ര 𝑐𝑖−1) 𝑔 𝑥𝑖, 𝑆 = ℎ𝑖 + ℎ𝑖 + 𝑔′(𝑥𝑖) 기존에 conv에서 뽑은 Feature
  • 88. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS ℎ𝑖, 𝑐𝑖 = 𝐿𝑆𝑇𝑀(𝑔′ 𝑥𝑖 , ℎ𝑖−1, റ 𝑐𝑖−1) ℎ𝑖, 𝑐𝑖 = 𝐿𝑆𝑇𝑀(𝑔′ 𝑥𝑖 , ℎ𝑖+1, ര 𝑐𝑖−1) 𝑔 𝑥𝑖, 𝑆 = ℎ𝑖 + ℎ𝑖 + 𝑔′(𝑥𝑖) Residual connection으로 새롭게 feature extraction
  • 89. Full Context Embedding 𝑔𝜃 X 𝑓𝜃 ෍ Heavy Weight Support set (S) Batch set (B) Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS
  • 90. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘
  • 91. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 Dependent Embedding 된 support set 정보 받기 Batch set에 attLSTM으로 추가 embedding
  • 92. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 Conv로 feature extraction 한 뒤 LSTM에 입력
  • 93. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 Sequence는 K개 [K is hyperparameter] Conv로 feature extraction 한 뒤 LSTM에 입력 …
  • 94. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 ෠ ℎ𝑘, 𝑐𝑘 = 𝐿𝑆𝑇𝑀(𝑓′ ො 𝑥 , ℎ𝑘−1, 𝑟𝑘−1 , 𝑐𝑘−1) ℎ𝑘 = ෠ ℎ𝑘 + 𝑓′(ො 𝑥) ℎ = 𝐻𝑖𝑑𝑑𝑒𝑛 𝑠𝑡𝑎𝑡𝑒 𝑐 = 𝑐𝑒𝑙𝑙 𝑠𝑡𝑎𝑡𝑒 Conv로 feature extraction 한 뒤 LSTM에 입력
  • 95. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 ෠ ℎ𝑘, 𝑐𝑘 = 𝐿𝑆𝑇𝑀(𝑓′ ො 𝑥 , ℎ𝑘−1, 𝑟𝑘−1 , 𝑐𝑘−1) ℎ𝑘 = ෠ ℎ𝑘 + 𝑓′(ො 𝑥) ℎ = 𝐻𝑖𝑑𝑑𝑒𝑛 𝑠𝑡𝑎𝑡𝑒 𝑐 = 𝑐𝑒𝑙𝑙 𝑠𝑡𝑎𝑡𝑒 기존에 conv로 뽑은 feature와 LSTM으로 생성된 값과의 residual connection을 통해 새로운 feature을 뽑게 된다.
  • 96. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 ෠ ℎ𝑘, 𝑐𝑘 = 𝐿𝑆𝑇𝑀(𝑓′ ො 𝑥 , ℎ𝑘−1, 𝑟𝑘−1 , 𝑐𝑘−1) ℎ𝑘 = ෠ ℎ𝑘 + 𝑓′(ො 𝑥) ℎ = 𝐻𝑖𝑑𝑑𝑒𝑛 𝑠𝑡𝑎𝑡𝑒 𝑐 = 𝑐𝑒𝑙𝑙 𝑠𝑡𝑎𝑡𝑒 𝑟𝑘−1 = ෍ 𝑖=1 |𝑠| 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 )𝑔(𝑥𝑖) 𝑎(ℎ𝑘−1, 𝑔(𝑥𝑖)) = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(ℎ𝑘−1 𝑇 𝑔(𝑥𝑖)) 위에서 LSTM을 통해 구한 feature와 support set에서 나온 feature들간 attention 매커니즘을 통해 새로운 feature를 뽑는다
  • 97. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 ෠ ℎ𝑘, 𝑐𝑘 = 𝐿𝑆𝑇𝑀(𝑓′ ො 𝑥 , ℎ𝑘−1, 𝑟𝑘−1 , 𝑐𝑘−1) ℎ𝑘 = ෠ ℎ𝑘 + 𝑓′(ො 𝑥) ℎ = 𝐻𝑖𝑑𝑑𝑒𝑛 𝑠𝑡𝑎𝑡𝑒 𝑐 = 𝑐𝑒𝑙𝑙 𝑠𝑡𝑎𝑡𝑒 𝑟𝑘−1 = ෍ 𝑖=1 |𝑠| 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 )𝑔(𝑥𝑖) 𝑎(ℎ𝑘−1, 𝑔(𝑥𝑖)) = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(ℎ𝑘−1 𝑇 𝑔(𝑥𝑖)) 뽑아낸 feature을 다음 sequence로 넘긴다. 다음 sequence에서는 해당 정보가 강화되기 때문에 지속적으로 누적하여 support set의 정보를 얻게 된다. 정보의 누적 (of support set)
  • 98. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 ෠ ℎ𝑘, 𝑐𝑘 = 𝐿𝑆𝑇𝑀(𝑓′ ො 𝑥 , ℎ𝑘−1, 𝑟𝑘−1 , 𝑐𝑘−1) ℎ𝑘 = ෠ ℎ𝑘 + 𝑓′(ො 𝑥) ℎ = 𝐻𝑖𝑑𝑑𝑒𝑛 𝑠𝑡𝑎𝑡𝑒 𝑐 = 𝑐𝑒𝑙𝑙 𝑠𝑡𝑎𝑡𝑒 𝑟𝑘−1 = ෍ 𝑖=1 |𝑠| 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 )𝑔(𝑥𝑖) 𝑎(ℎ𝑘−1, 𝑔(𝑥𝑖)) = 𝑠𝑜𝑓𝑡𝑚𝑎𝑥(ℎ𝑘−1 𝑇 𝑔(𝑥𝑖)) 최종 적으로 K 번 째 LSTM을 통해 나온 hidden state의 input값과 더하여 최종적인 feature을 뽑아내게 된다.
  • 99. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑔𝑥 𝑟𝑘−1 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 ) ෠ ℎ𝑘−1 ℎ𝑘−1 ෠ ℎ𝑘 𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 Batch set (B) + 𝐿𝑆𝑇𝑀 + 𝑓 ො 𝑥, 𝑆 = ℎ𝑘 𝑓 ො 𝑥, 𝑆 = 𝑎𝑡𝑡𝐿𝑆𝑇𝑀 𝑓′ ො 𝑥 , 𝑔 𝑠 , 𝐾 = ℎ𝐾 ෠ ℎ𝑘, 𝑐𝑘 = 𝐿𝑆𝑇𝑀(𝑓′ ො 𝑥 , ℎ𝑘−1, 𝑟𝑘−1 , 𝑐𝑘−1) ℎ𝑘 = ෠ ℎ𝑘 + 𝑓′(ො 𝑥) 𝑟𝑘−1 = ෍ 𝑖=1 |𝑠| 𝑎(ℎ𝑘−1, 𝑔 𝑥𝑖 )𝑔(𝑥𝑖)
  • 100. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎 ො 𝑥, 𝑥𝑖 𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥))) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖)))
  • 101. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Bi-LSTM을 통해 dependent한 support set feature 추출 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎 ො 𝑥, 𝑥𝑖 𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥))) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖)))
  • 102. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Attentional LSTM을 통해 support set과 연관되어 강화된 정보로 뽑힌 feature 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎 ො 𝑥, 𝑥𝑖 𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥))) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖)))
  • 103. Full Context Embedding Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS 𝑃 ො 𝑦𝑘 = 1 ො 𝑥, 𝑆 = ෍ 𝑖=1 𝑘 𝑎 ො 𝑥, 𝑥𝑖 𝑦𝑖 𝑎 ො 𝑥, 𝑥 = exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥))) σ𝑖=1 𝐾 exp(𝑐(𝑓 ො 𝑥 , 𝑔(𝑥𝑖))) Metric 기반 Attention으로 뽑아내는 확률
  • 104. Matching Network Experiments Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS Dataset : Ominiglot [Lake+, 2011] 50 Alphabets with 1623 character - Training 1200, Testing 423 (char) - Training에서 본적 없는 alphabet으로 Testing 20 / character : character당 20명의 사람들의 수작업 데이터 - Training할 때 random rotation을 통해 augmentation Input : 28*28 Image 4 stack modules [3x3 conv_64 + BN + Relu + 2x2 Max pooling]
  • 105. Matching Network Experiments Vinyals, Oriol, et al. "Matching networks for one shot learning." Advances in neural information processing systems 29 (2016): 3630-3638. NIPS : mating on raw pixels : VGG : Model base ML
  • 106. Q n A Q & A
  • 107.
  • 108. Appendix : Seq2Seq Convex hull problem ∶최외각 점들 탐색 문제 - 점들이 주어질 때 최외각 점들을 찾기
  • 109. Appendix : Seq2Seq Convex hull problem ∶최외각 점들 탐색 문제 - 점들이 주어질 때 최외각 점들을 찾기
  • 110. Appendix : Seq2Seq 신경망을 통한 해결 seq2seq 인코더 : 점들의 나열을 입력하여 점들의 정보를 학습 디코더 : 학습한 정보로 최외각 점들의 번호 시퀀스를 출력
  • 111. Appendix : Seq2Seq - 출력이 입력에 의존하는 문제를 해결 하기에는 seq2seq가 적합하지 않음 - Attention을 사용해도 장거리 관계 포착은 해결 할 수 있지만 출력이 고정된다는 문제는 해결 불가 신경망을 통한 해결 Seq2seq with attention