Beyond Synthetic Noise:
Deep Learning on Controlled Noisy Labels
Presenter
이재윤
1
Fundamental Team
고형권, 김동희, 김창연, 송헌, 이민경
Jiang, Lu, et al. "Beyond Synthetic Noise: Deep Learning on Controlled
Noisy Labels." arXiv preprint arXiv:1911.09781 (2019).
Contents
1. Noisy Label
2. Background
3. MentorMix
4. Experiments
1. Noisy Label
CLEAN: Dog, Dog / Cat, Cat — NOISY: Cat?, Dog?
The model's predictions are compared against the (possibly noisy) label Y to compute the loss.
Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
2. Background
“MentorMix (MentorNet + MixUp):
Minimize
the empirical vicinal risk
using curriculum learning”
MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels, ICML 2018
mixup: Beyond Empirical Risk Minimization, ICLR 2018
Curriculum Learning?
Curriculum Learning, ICML 2009
Curriculum learning orders training from EASY to HARD: the model learns faster and better.
Case 1: a fixed EASY → HARD curriculum, set in advance.
Case 2: but which examples are easy and which are hard? Let the model just solve everything and decide for itself which examples it thinks are easy: “Self-Paced Learning”.
Case 3 (MentorNet): a mentor network judges which examples are easy for the student, based on:
• Current score (the student's interpretation)
• Last score (the degree of progress)
MixUp? Prevents overfitting by training on convex combinations of pairs of examples and their labels.
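The MixUp idea can be sketched as follows; `mixup` here is a hypothetical helper for illustration, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.4):
    # Draw a mixing coefficient from Beta(alpha, alpha) and form a
    # convex combination of the two inputs and their one-hot labels.
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_mix, y_mix = mixup(np.ones(4), np.array([1.0, 0.0]),
                     np.zeros(4), np.array([0.0, 1.0]))
# y_mix still sums to 1; x_mix lies between the two inputs.
```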
3. MentorMix
Easiness in Noisy Label
Given examples labeled Dog / Cat (some mislabeled): when the model's confident prediction (“It's Dog!!”) agrees with the label, the example is easy; when it disagrees (“It's Cat!!” against a Dog label), the example is hard.
Training on easy examples → better generalization; the student learns fast.
Implementation
MentorMix Implementation
• n : number of training examples
• v ∈ [0,1]^n : latent weight variable
• w : weights of the student model
• F(v, w) : loss function
• g_s(x_i; w) : prediction of the DNN
• G : curriculum
• γ : loss threshold

Objective (the first term is the cross-entropy loss):

F(v, w) = (1/n) ∑_{i=1}^{n} v_i ℓ(g_s(x_i; w), y_i) + θ‖w‖₂² − G(v; γ)
In F(v, w), v_i = 1 includes example i in the loss and v_i = 0 excludes it; the per-example term ℓ(g_s(x_i; w), y_i) is the cross-entropy loss.
The term G(v; γ) is the curriculum, which controls how the latent weights v are chosen.
With the self-paced curriculum G(v; γ) = −γ v_i, the objective becomes

F(v, w) = (1/n) ∑_{i=1}^{n} v_i ℓ(g_s(x_i; w), y_i) + θ‖w‖₂² − γ v_i
        = (1/n) ∑_{i=1}^{n} v_i [ ℓ(g_s(x_i; w), y_i) − γ ] + θ‖w‖₂²

Minimizing over v gives the closed-form weights

v_i = 1 if ℓ(x_i, y_i) < γ
v_i = 0 if ℓ(x_i, y_i) > γ

i.e., v = 1 for examples whose loss is below the threshold γ, and v = 0 otherwise.
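The closed-form weights are cheap to compute; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def self_paced_weights(losses, gamma):
    # Closed-form minimizer of F over v: keep (v_i = 1) the examples
    # whose loss falls below the threshold gamma, drop (v_i = 0) the rest.
    return (np.asarray(losses) < gamma).astype(float)

v = self_paced_weights([0.2, 1.5, 0.7], gamma=1.0)
# → array([1., 0., 1.])
```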
MentorMix Implementation (adding MixUp)
• λ ~ Beta(α, α)
• x̃_ij : mixed input
• ỹ_ij : mixed output
• ṽ_ij ∈ [0,1]^n : latent weight variable for the mixed example

The same thresholding is applied to the mixed examples:

ṽ_ij = 1 if ℓ(x̃_ij, ỹ_ij) < γ
ṽ_ij = 0 if ℓ(x̃_ij, ỹ_ij) > γ
MentorMix Implementation
Too many examples!
MentorMix Implementation
When mixing a pair, the example judged clean (v = 1) is scaled by max(λ, 1 − λ), while the example judged noisy (v = 0) is scaled by min(λ, 1 − λ), so the cleaner example dominates the mixture.
MentorMix Implementation
F(v, w) as above, with:
• x_i : i-th input
• y_i : i-th label
• D_m : mini-batch
• α, γ_p : hyperparameters
• v ∈ [0,1]^n : latent weight variable
• P_v : sampling distribution
• λ ~ Beta(α, α)
• x̃_ij : mixed input
• ỹ_ij : mixed output
∗ γ_p ∈ {90%, 80%, 70%}
∗ EMA weight = 0.999
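Putting the pieces together, one MentorMix weighting-and-mixing pass over a mini-batch might look like the sketch below. The names (`mentormix_batch`, `loss_fn`) and the percentile-based choice of γ are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def mentormix_batch(x, y, loss_fn, gamma_p=80, alpha=1.0):
    losses = loss_fn(x, y)
    gamma = np.percentile(losses, gamma_p)    # loss threshold at the gamma_p percentile
    v = (losses <= gamma).astype(float)       # MentorNet-style binary weights
    p = v / v.sum()                           # sampling distribution P_v over the batch
    j = rng.choice(len(x), size=len(x), p=p)  # mixing partners drawn from P_v
    lam = rng.beta(alpha, alpha, size=len(x))
    # Clean examples (v = 1) take the larger coefficient, noisy the smaller.
    lam = np.where(v == 1.0, np.maximum(lam, 1 - lam), np.minimum(lam, 1 - lam))
    x_mix = lam[:, None] * x + (1 - lam)[:, None] * x[j]
    y_mix = lam[:, None] * y + (1 - lam)[:, None] * y[j]
    return x_mix, y_mix

x = rng.normal(size=(8, 4))
y = np.eye(2)[rng.integers(0, 2, size=8)]   # one-hot labels
x_mix, y_mix = mentormix_batch(x, y, lambda a, b: rng.random(len(a)))
```

A second thresholding pass on the mixed losses (the ṽ_ij above) and the EMA of γ would sit on top of this sketch.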
4. Experiments
• All methods trained on Blue/Red-noise Mini-ImageNet and Stanford Cars under 10 noise levels p (%): 0, 5, 10, 15, 20, 30, 40, 50, 60, 80
• Training from scratch and fine-tuning
• Inception-ResNet-v2 (Mini-ImageNet images upsampled from 84×84 to 299×299)
• Grid search over hyperparameters α ∈ {0.4, 1, 2}, γ_p ∈ {90%, 80%, 70%}
Experiments Setting
Understanding DNNs trained on noisy labels
Thank you

More Related Content

Similar to Mentor mix review

[GAN by Hung-yi Lee]Part 2: The application of GAN to speech and text processing
[GAN by Hung-yi Lee]Part 2: The application of GAN to speech and text processing[GAN by Hung-yi Lee]Part 2: The application of GAN to speech and text processing
[GAN by Hung-yi Lee]Part 2: The application of GAN to speech and text processingNAVER Engineering
 
Some approximation properties of modified baskakov stancu operators
Some approximation properties of modified baskakov stancu operatorsSome approximation properties of modified baskakov stancu operators
Some approximation properties of modified baskakov stancu operatorseSAT Journals
 
Fundamentals of Program Impact Evaluation
Fundamentals of Program Impact EvaluationFundamentals of Program Impact Evaluation
Fundamentals of Program Impact EvaluationMEASURE Evaluation
 
Homework 1 Solution.pptx
Homework 1 Solution.pptxHomework 1 Solution.pptx
Homework 1 Solution.pptxiksanbukhori
 
05.scd_cuantificacion_y_senales_de_prueba
05.scd_cuantificacion_y_senales_de_prueba05.scd_cuantificacion_y_senales_de_prueba
05.scd_cuantificacion_y_senales_de_pruebaHipólito Aguilar
 
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdfdiffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdfChung Hyung Jin
 
Complete Residue Systems.pptx
Complete Residue Systems.pptxComplete Residue Systems.pptx
Complete Residue Systems.pptxJasonMeregildo3
 
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”Kazuto Fukuchi
 
Θεωρία - Ορισμοί - Προτάσεις 2021 - Γ Λυκείου
Θεωρία - Ορισμοί - Προτάσεις 2021 - Γ Λυκείου Θεωρία - Ορισμοί - Προτάσεις 2021 - Γ Λυκείου
Θεωρία - Ορισμοί - Προτάσεις 2021 - Γ Λυκείου Μάκης Χατζόπουλος
 
07.scd_digitalizacion_de_senales_continuas
07.scd_digitalizacion_de_senales_continuas07.scd_digitalizacion_de_senales_continuas
07.scd_digitalizacion_de_senales_continuasHipólito Aguilar
 

Similar to Mentor mix review (20)

Lec05.pptx
Lec05.pptxLec05.pptx
Lec05.pptx
 
[GAN by Hung-yi Lee]Part 2: The application of GAN to speech and text processing
[GAN by Hung-yi Lee]Part 2: The application of GAN to speech and text processing[GAN by Hung-yi Lee]Part 2: The application of GAN to speech and text processing
[GAN by Hung-yi Lee]Part 2: The application of GAN to speech and text processing
 
08.sdcd_ransformada_z
08.sdcd_ransformada_z08.sdcd_ransformada_z
08.sdcd_ransformada_z
 
04.mdsd_laplace_fourier
04.mdsd_laplace_fourier04.mdsd_laplace_fourier
04.mdsd_laplace_fourier
 
Some approximation properties of modified baskakov stancu operators
Some approximation properties of modified baskakov stancu operatorsSome approximation properties of modified baskakov stancu operators
Some approximation properties of modified baskakov stancu operators
 
Fundamentals of Program Impact Evaluation
Fundamentals of Program Impact EvaluationFundamentals of Program Impact Evaluation
Fundamentals of Program Impact Evaluation
 
Periodic Solutions for Non-Linear Systems of Integral Equations
Periodic Solutions for Non-Linear Systems of Integral EquationsPeriodic Solutions for Non-Linear Systems of Integral Equations
Periodic Solutions for Non-Linear Systems of Integral Equations
 
Homework 1 Solution.pptx
Homework 1 Solution.pptxHomework 1 Solution.pptx
Homework 1 Solution.pptx
 
05.scd_cuantificacion_y_senales_de_prueba
05.scd_cuantificacion_y_senales_de_prueba05.scd_cuantificacion_y_senales_de_prueba
05.scd_cuantificacion_y_senales_de_prueba
 
Sets
SetsSets
Sets
 
Photosynthesis
PhotosynthesisPhotosynthesis
Photosynthesis
 
J256979
J256979J256979
J256979
 
Selection on Observables
Selection on ObservablesSelection on Observables
Selection on Observables
 
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdfdiffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
 
Complete Residue Systems.pptx
Complete Residue Systems.pptxComplete Residue Systems.pptx
Complete Residue Systems.pptx
 
Graphical method
Graphical methodGraphical method
Graphical method
 
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”Introduction of “Fairness in Learning: Classic and Contextual Bandits”
Introduction of “Fairness in Learning: Classic and Contextual Bandits”
 
Deep robotics
Deep roboticsDeep robotics
Deep robotics
 
Θεωρία - Ορισμοί - Προτάσεις 2021 - Γ Λυκείου
Θεωρία - Ορισμοί - Προτάσεις 2021 - Γ Λυκείου Θεωρία - Ορισμοί - Προτάσεις 2021 - Γ Λυκείου
Θεωρία - Ορισμοί - Προτάσεις 2021 - Γ Λυκείου
 
07.scd_digitalizacion_de_senales_continuas
07.scd_digitalizacion_de_senales_continuas07.scd_digitalizacion_de_senales_continuas
07.scd_digitalizacion_de_senales_continuas
 

More from taeseon ryu

OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...taeseon ryu
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splattingtaeseon ryu
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptxtaeseon ryu
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정taeseon ryu
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdftaeseon ryu
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories taeseon ryu
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extractiontaeseon ryu
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learningtaeseon ryu
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Modelstaeseon ryu
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuningtaeseon ryu
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdftaeseon ryu
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdftaeseon ryu
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithmtaeseon ryu
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networkstaeseon ryu
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarizationtaeseon ryu
 

More from taeseon ryu (20)

VoxelNet
VoxelNetVoxelNet
VoxelNet
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
mPLUG
mPLUGmPLUG
mPLUG
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
 

Recently uploaded

Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfrahulyadav957181
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 

Recently uploaded (20)

Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Rithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdfRithik Kumar Singh codealpha pythohn.pdf
Rithik Kumar Singh codealpha pythohn.pdf
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 

Mentor mix review

  • 1. Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels Presenter 이재윤 1 Fundamental Team 고형권, 김동희, 김창연, 송헌, 이민경 Jiang, Lu et al. "Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels." arXiv preprint arXiv:1911.09781 (2021).
  • 2. Contents 1. Noisy Label 2. Background 3. MentorMix 4. Experiments
  • 4.
  • 5. Dog Dog Cat? Cat Cat Dog?
  • 8. Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
  • 9. Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
  • 10. Understanding Deep Learning Requires Rethinking Generalization, ICLR 2017
  • 12. “MentorMix: Minimize the empirical vicinal risk using curriculum learning”
  • 13. “MentorNet + MixUp: Minimize the empirical vicinal risk using curriculum learning” MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels, ICML 2018 mixup: Beyond Empirical Risk Minimization, ICLR 2018
  • 16. EASY
  • 21. Case 2 Which one is easy/hard?
  • 22. Case 2 Let’s just solve everything!
  • 23. Case 2 I think they are easy
  • 24. Case 2 I think they are easy
  • 25. Case 2 I think they are easy “Self – Paced Learning”
  • 28. Case 3 (MentorNet) I think they are easy
  • 29. Case 3 (MentorNet) I think they are easy for student • Current Score • Student’s interpretation • Last Score • Degree of progress
  • 30. Case 3 (MentorNet) I think they are easy for student • Current Score • Student’s interpretation • Last Score • Degree of progress
  • 32.
  • 33.
  • 36. Easiness in Noisy Label Cat Dog Dog Dog Cat Cat
  • 38. Easiness in Noisy Label Cat Dog Dog Dog Cat Cat It’s Dog!! It is easy
  • 39. Easiness in Noisy Label Cat Dog Dog Dog Cat Cat It’s Cat!! It is hard
  • 42. Easiness in Noisy Label Better Generalization Student Learns Fast
  • 44. MentorMix Implementation. Notation: n : number of training examples; v ∈ [0, 1]^n : latent weight variables; w : weights of the student model; F(v, w) : loss function; g_s(x_i; w) : prediction of the DNN; G : curriculum; γ : loss threshold.
  • 45. Objective: F(v, w) = (1/n) Σ_{i=1}^{n} v_i ℓ(g_s(x_i; w), y_i) + θ‖w‖₂² + G(v; γ), where ℓ is the cross-entropy loss.
  • 46. Each example's loss is gated by its weight: v_i = 1 keeps the example in the objective, v_i = 0 drops it.
  • 48. The curriculum term G(v; γ) determines how the weights v are chosen.
  • 49. Self-paced curriculum: G(v; γ) = −(γ/n) Σ_{i=1}^{n} v_i.
  • 50. Substituting G into F gives F(v, w) = (1/n) Σ_{i=1}^{n} v_i [ℓ(g_s(x_i; w), y_i) − γ] + θ‖w‖₂².
  • 51. Minimizing over v yields the closed form v_i = 1 if ℓ(g_s(x_i; w), y_i) < γ, and v_i = 0 if ℓ(g_s(x_i; w), y_i) > γ.
  • 52. In other words, examples whose loss is below the threshold are treated as easy (v = 1) and kept; the rest are treated as hard, likely noisy (v = 0), and discarded.
  • 53. MentorMix adds the mixup variables to the same objective F(v, w): λ ~ Beta(α, α); x̃_ij : mixed input; ỹ_ij : mixed label.
  • 55. The mixed examples get their own latent weights ṽ_ij ∈ [0, 1].
  • 56. The same threshold rule applies to the mixed pairs: ṽ_ij = 1 if ℓ(x̃_ij, ỹ_ij) < γ, and ṽ_ij = 0 if ℓ(x̃_ij, ỹ_ij) > γ.
  • 61. Weighting of the mixed pair: the example with v = 1 is scaled by max(λ, 1 − λ) and the example with v = 0 by min(λ, 1 − λ), so the trusted (likely clean) example dominates the mixed sample.
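This weighting rule can be sketched directly (a simplification of the paper's full algorithm; function names are hypothetical):

```python
import numpy as np

def mentormix_lambda(lam, v_anchor):
    """Tilt mixup's λ toward the anchor example when it is trusted
    (v = 1) and away from it when it is not (v = 0)."""
    return max(lam, 1.0 - lam) if v_anchor == 1 else min(lam, 1.0 - lam)

def mix_pair(x_a, y_a, x_b, y_b, lam, v_anchor):
    """Mix an anchor example (a) with a sampled partner (b) using
    the weight-adjusted mixing coefficient."""
    lam = mentormix_lambda(lam, v_anchor)
    return lam * x_a + (1.0 - lam) * x_b, lam * y_a + (1.0 - lam) * y_b
```

The effect is that a likely-noisy anchor contributes at most half of the mixed example, bounding how much a corrupted label can influence the gradient.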
  • 66. Algorithm notation: x_i : i-th input; y_i : i-th label; D_m : mini-batch; α, γ_p : hyperparameters; v ∈ [0, 1]^n : latent weight variables; P_v : sampling distribution; λ ~ Beta(α, α); x̃_ij : mixed input; ỹ_ij : mixed label.
  • 67. γ_p ∈ {90%, 80%, 70%}: the percentile of per-example losses used to set the loss threshold γ.
  • 68. The threshold is smoothed across iterations with an exponential moving average (EMA weight = 0.999).
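One plausible way to realize the percentile threshold with an EMA is sketched below (the exact update rule is an assumption on my part; the class name is hypothetical):

```python
import numpy as np

class LossThreshold:
    """Track the loss threshold γ as an exponential moving average
    of the γ_p-th percentile of per-example losses per mini-batch."""
    def __init__(self, percentile=70.0, decay=0.999):
        self.percentile = percentile  # γ_p, e.g. 70/80/90
        self.decay = decay            # EMA weight, 0.999 in the slides
        self.gamma = None

    def update(self, batch_losses):
        p = float(np.percentile(batch_losses, self.percentile))
        if self.gamma is None:
            self.gamma = p            # initialize on the first batch
        else:
            self.gamma = self.decay * self.gamma + (1.0 - self.decay) * p
        return self.gamma
```

With a decay of 0.999 the threshold reacts slowly, so a single unusually hard or easy batch cannot flip which examples count as "easy".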
  • 76. Experiments setting: all methods are trained on the Blue (synthetic noise) and Red (web label noise) versions of Mini-ImageNet and Stanford Cars under 10 noise levels p ∈ {0, 5, 10, 15, 20, 30, 40, 50, 60, 80}%; both training from scratch and fine-tuning are evaluated; the backbone is Inception-ResNet-v2 (Mini-ImageNet images upsampled from 84×84 to 299×299); hyperparameters are grid-searched over α ∈ {0.4, 1, 2} and γ_p ∈ {90%, 80%, 70%}.
  • 79. Understanding DNNs trained on noisy labels