Deep Learning from Scratch, chapter 5: Learning-related skills

Jaey Jeong

deep learning from scratch

Agenda
■Intro
■Optimizer
■Initial Value of Weight
■Overcoming Overfitting
■Hyperparameter
Interaction Lab., Kumoh National Institute of Technology

Intro (1/4)
■Optimization
▪ Finding the parameters that reduce the value of the loss function
• Uses the gradient
▪ SGD
• $W \leftarrow W - \eta \frac{\partial L}{\partial W}$

Intro (2/4)
■SGD
▪ $f(x, y) = \frac{1}{20}x^2 + y^2$

Intro (3/4)
■SGD
▪ Start at $(x, y) = (-7, 2)$

Intro (4/4)
■SGD
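The SGD slides above can be tried out with a minimal sketch: it applies the update $W \leftarrow W - \eta \frac{\partial L}{\partial W}$ to $f(x, y) = \frac{1}{20}x^2 + y^2$ starting from $(-7, 2)$. The learning rate of 0.95, the 30 steps, and the helper names are assumptions made for illustration and are not taken from the slides.

```python
def f(x, y):
    """Test function from the slides: f(x, y) = x^2 / 20 + y^2."""
    return x ** 2 / 20.0 + y ** 2


def grad_f(x, y):
    """Analytic gradient of f: (df/dx, df/dy) = (x / 10, 2y)."""
    return x / 10.0, 2.0 * y


def sgd_path(start, lr, steps):
    """Apply W <- W - lr * dL/dW repeatedly and return every visited point."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x -= lr * gx
        y -= lr * gy
        path.append((x, y))
    return path


# lr=0.95 and steps=30 are assumed values, not taken from the slides.
path = sgd_path(start=(-7.0, 2.0), lr=0.95, steps=30)
for step in (0, 1, 2, 30):
    x, y = path[step]
    print(f"step {step:2d}: x={x:+.4f}, y={y:+.4f}, f={f(x, y):.5f}")
```

With this learning rate the y coordinate flips sign on every step while x creeps toward 0, which is the zigzag behaviour that motivates the optimizers on the following slides.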

Optimizer (1/3)
■Momentum
▪ $v \leftarrow \alpha v - \eta \frac{\partial L}{\partial W}$
▪ $W \leftarrow W + v$
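As a rough sketch of the two Momentum update rules above, the class below keeps a velocity per parameter. The class name, the plain-list parameter format, and the values `lr=0.1` and `alpha=0.9` are assumptions for illustration; the gradient is that of the test function from the SGD slides.

```python
class Momentum:
    """Momentum update: v <- alpha * v - lr * grad, then W <- W + v."""

    def __init__(self, lr=0.1, alpha=0.9):
        self.lr = lr          # learning rate (eta), assumed value
        self.alpha = alpha    # momentum coefficient, assumed value
        self.v = None         # velocity, created on the first update

    def update(self, params, grads):
        if self.v is None:
            self.v = [0.0] * len(params)
        for i, g in enumerate(grads):
            self.v[i] = self.alpha * self.v[i] - self.lr * g
            params[i] += self.v[i]


def grad_f(x, y):
    """Gradient of f(x, y) = x^2 / 20 + y^2."""
    return [x / 10.0, 2.0 * y]


params = [-7.0, 2.0]
optimizer = Momentum(lr=0.1, alpha=0.9)
for _ in range(30):
    optimizer.update(params, grad_f(*params))
print(params)
```

Because the velocity carries over between steps, consecutive gradients that point the same way (the x direction here) add up, while gradients that alternate in sign (the y direction) partly cancel.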

Optimizer (2/3)
■AdaGrad
▪ $h \leftarrow h + \frac{\partial L}{\partial W} \odot \frac{\partial L}{\partial W}$
▪ $W \leftarrow W - \eta \frac{1}{\sqrt{h}} \frac{\partial L}{\partial W}$
▪ Learning rate decay
▪ RMSProp
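The AdaGrad rules above could be sketched as follows. The small constant `eps` (added to avoid division by zero), the learning rate of 1.5, and the class layout are assumptions for illustration rather than part of the slides.

```python
import math


class AdaGrad:
    """AdaGrad: h accumulates squared gradients and shrinks each step size."""

    def __init__(self, lr=1.5, eps=1e-7):
        self.lr = lr      # assumed value
        self.eps = eps    # assumed small constant to avoid division by zero
        self.h = None     # running sum of squared gradients, per parameter

    def update(self, params, grads):
        if self.h is None:
            self.h = [0.0] * len(params)
        for i, g in enumerate(grads):
            self.h[i] += g * g
            params[i] -= self.lr * g / (math.sqrt(self.h[i]) + self.eps)


def grad_f(x, y):
    """Gradient of f(x, y) = x^2 / 20 + y^2."""
    return [x / 10.0, 2.0 * y]


params = [-7.0, 2.0]
optimizer = AdaGrad()
for _ in range(30):
    optimizer.update(params, grad_f(*params))
print(params, optimizer.h)
```

Since `h` only ever grows, the effective step size `lr / sqrt(h)` only ever shrinks. RMSProp replaces the running sum with an exponential moving average so that old gradients are gradually forgotten.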

Optimizer (3/3)
■Adam
▪ AdaGrad + Momentum
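A hedged sketch of Adam follows. It keeps a momentum-like average `m` of the gradients and an average `v` of the squared gradients for per-parameter scaling; note that `v` is an exponential moving average rather than AdaGrad's running sum. The bias correction and the values `beta1=0.9`, `beta2=0.999`, `eps=1e-8`, and `lr=0.3` are assumptions for illustration and do not appear on the slide.

```python
import math


class Adam:
    """Adam: momentum-style first moment plus per-parameter step scaling."""

    def __init__(self, lr=0.3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0        # number of updates so far
        self.m = None     # moving average of gradients
        self.v = None     # moving average of squared gradients

    def update(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            # Bias correction compensates for m and v starting at zero.
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def grad_f(x, y):
    """Gradient of f(x, y) = x^2 / 20 + y^2."""
    return [x / 10.0, 2.0 * y]


params = [-7.0, 2.0]
optimizer = Adam()
for _ in range(30):
    optimizer.update(params, grad_f(*params))
print(params)
```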

Initial value of weight (1/11)
■In case of 0
▪ Bad idea
• All weights are updated identically in backpropagation
• Learning does not work effectively
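The symmetry problem can be checked on a tiny network. The sketch below assumes a hypothetical two-layer net (2 inputs, 3 sigmoid hidden nodes, 1 linear output) with a squared-error loss; none of these choices come from the slides. With all-zero weights, and likewise with any constant value, every hidden node receives exactly the same gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradients(x, t, W1, W2):
    """Gradients of 0.5 * (y - t)^2 for y = W2 . sigmoid(W1 x)."""
    # W1[j] holds the incoming weights of hidden node j.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(w * hj for w, hj in zip(W2, h))
    dy = y - t
    grad_W2 = [dy * hj for hj in h]
    grad_W1 = [[dy * W2[j] * h[j] * (1.0 - h[j]) * xi for xi in x]
               for j in range(len(W1))]
    return grad_W1, grad_W2


x, t = [1.0, -2.0], 1.0  # one made-up training sample

for value in (0.0, 0.3):
    W1 = [[value, value] for _ in range(3)]
    W2 = [value] * 3
    grad_W1, grad_W2 = gradients(x, t, W1, W2)
    print(f"init={value}: grad_W1={grad_W1}, grad_W2={grad_W2}")
```

Every row of `grad_W1` and every entry of `grad_W2` comes out identical, so the hidden nodes stay copies of one another after each update.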

Initial value of weight (2/11)
■In case of standard deviation 1
▪ Using sigmoid
▪ Weights drawn from a normal distribution with standard deviation 1
▪ Vanishing gradient

Initial value of weight (3/11)
■In case of standard deviation 0.01
▪ Using sigmoid
▪ Weights drawn from a normal distribution with standard deviation 0.01
▪ Limited representational power
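The two cases above can be reproduced with a small experiment. This is only a sketch: the layer width (50), depth (5), sample count (30), random seed, and the 0.05 saturation margin are all assumptions chosen to keep a pure-Python run fast.

```python
import math
import random


def sigmoid(z):
    # Clamp to keep math.exp from overflowing on very large |z|.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def last_layer_activations(weight_std, n_nodes=50, n_layers=5, n_samples=30, seed=0):
    """Push random inputs through sigmoid layers; return the final activations."""
    rng = random.Random(seed)
    batch = [[rng.gauss(0.0, 1.0) for _ in range(n_nodes)] for _ in range(n_samples)]
    for _ in range(n_layers):
        # W[j] holds the incoming weights of node j, drawn from N(0, weight_std^2).
        W = [[rng.gauss(0.0, weight_std) for _ in range(n_nodes)] for _ in range(n_nodes)]
        batch = [[sigmoid(sum(w * a for w, a in zip(row, sample))) for row in W]
                 for sample in batch]
    return [a for sample in batch for a in sample]


def saturated_fraction(acts, margin=0.05):
    """Share of activations within `margin` of 0 or 1."""
    return sum(a < margin or a > 1.0 - margin for a in acts) / len(acts)


for std in (1.0, 0.01):
    acts = last_layer_activations(std)
    print(f"std={std}: saturated={saturated_fraction(acts):.2f}, "
          f"min={min(acts):.3f}, max={max(acts):.3f}")
```

With standard deviation 1 a large share of activations sits near 0 or 1, where the sigmoid gradient is close to zero. With standard deviation 0.01 all activations cluster around 0.5, so the nodes carry almost the same information.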

Initial value of weight (4/11)
■In case of Xavier initial value
▪ Using sigmoid
▪ The previous layer has $n$ nodes
▪ Normal distribution with standard deviation $\frac{1}{\sqrt{n}}$

Initial value of weight (5/11)
■In case of He initial value
▪ Using ReLU
▪ The previous layer has $n$ nodes
▪ Normal distribution with standard deviation $\sqrt{\frac{2}{n}}$
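The Xavier and He initial values differ only in the standard deviation, which the following sketch makes explicit. The function names and the example layer sizes (100 inputs, 50 outputs) are hypothetical.

```python
import math
import random


def xavier_std(n_in):
    """Xavier initial value: standard deviation 1 / sqrt(n), paired with sigmoid."""
    return math.sqrt(1.0 / n_in)


def he_std(n_in):
    """He initial value: standard deviation sqrt(2 / n), paired with ReLU."""
    return math.sqrt(2.0 / n_in)


def init_weights(n_in, n_out, std, rng):
    """Return an n_in x n_out weight matrix drawn from N(0, std^2)."""
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]


rng = random.Random(0)
W_sigmoid = init_weights(100, 50, xavier_std(100), rng)  # layer followed by sigmoid
W_relu = init_weights(100, 50, he_std(100), rng)         # layer followed by ReLU
print(xavier_std(100), he_std(100))
```

Here `n_in` is the number of nodes in the previous layer, so wider layers get proportionally smaller initial weights.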

Initial value of weight (6/11)
■Batch normalization
▪ Forces a suitable distribution of activation values
▪ Speeds up learning
▪ Depends much less on the initial weight values
▪ Suppresses overfitting

Initial value of weight (7/11)
■Batch normalization
▪ Insert a “Batch Norm” layer
• Adjusts the activation values so that they are properly distributed

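Initial value of weight (8/11)
■Batch normalization
▪ Input mini-batch $\{x_1, x_2, x_3, \ldots, x_n\}$, normalized output $\{\hat{x}_1, \hat{x}_2, \hat{x}_3, \ldots, \hat{x}_n\}$
▪ Mini-batch mean: $\mu_B = \frac{1}{n}\sum_{i=1}^{n} x_i$
▪ Mini-batch variance: $\sigma_B^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_B)^2$
▪ Normalize: $\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}$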
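The mean, variance, and normalize steps above can be sketched for a mini-batch stored as a list of rows. The scale `gamma`, shift `beta`, and the constant `eps` are the usual extras of a Batch Norm layer and are assumptions here, since they do not appear in the recovered slide text; the sample batch is made up.

```python
import math


def batch_norm_forward(batch, gamma=1.0, beta=0.0, eps=1e-7):
    """Normalize each feature (column) of a mini-batch to mean 0, variance 1."""
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n                           # mini-batch mean
        var = sum((x - mean) ** 2 for x in column) / n   # mini-batch variance
        for i, x in enumerate(column):
            x_hat = (x - mean) / math.sqrt(var + eps)    # normalize
            out[i][j] = gamma * x_hat + beta             # assumed scale and shift
    return out


# Two features on very different scales (made-up data).
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]]
normalized = batch_norm_forward(batch)
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization both columns have the same values even though the raw features differ by two orders of magnitude. The result has zero mean and unit variance per feature; it is not confined to the interval from 0 to 1.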

Initial value of weight (8/11)

Initial value of weight (9/11)

Initial value of weight (10/11)
■Batch normalization

Initial value of weight (11/11)
■Batch normalization

Overcoming Overfitting (1/3)
■Model with many parameters and high expressiveness
■Little training data

Overcoming Overfitting (2/3)
■Weight decay
▪ During learning, penalize large weights
▪ Loss + $\frac{1}{2}\lambda W^2$
▪ $\lambda$: hyperparameter
• The larger $\lambda$ is, the stronger the penalty on large weights
▪ Gradient of the penalty: $\frac{\partial}{\partial W}\left(\frac{1}{2}\lambda W^2\right) = \lambda W$
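A small sketch of the weight decay arithmetic: the penalty $\frac{1}{2}\lambda W^2$ is added to the loss and its derivative $\lambda W$ is added to the gradient. The weights, data loss, data gradients, and $\lambda = 0.1$ are made-up values for illustration.

```python
def l2_penalty(weights, lam):
    """Penalty term 0.5 * lam * sum(W^2) that is added to the loss."""
    return 0.5 * lam * sum(w * w for w in weights)


def decayed_gradient(grads, weights, lam):
    """Add the penalty's gradient lam * W to the data-loss gradient."""
    return [g + lam * w for g, w in zip(grads, weights)]


# Hypothetical values: three weights, a data loss, and its gradient.
weights = [3.0, -0.5, 0.0]
data_loss = 1.2
data_grads = [0.1, 0.1, 0.1]
lam = 0.1

total_loss = data_loss + l2_penalty(weights, lam)
total_grads = decayed_gradient(data_grads, weights, lam)
print(total_loss, total_grads)
```

The largest weight (3.0) receives the largest extra gradient, so each update pulls it toward zero harder than the small weights.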

Overcoming Overfitting (3/3)
■Dropout
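Dropout could be sketched as a layer that zeroes randomly chosen nodes during training. The class below is a minimal sketch under assumptions: the 0.5 ratio, the `seed` argument, and the test-time scaling by the keep probability (one common convention) are not stated on the slide.

```python
import random


class Dropout:
    """Randomly drop nodes while training; use every node at test time."""

    def __init__(self, dropout_ratio=0.5, seed=None):
        self.dropout_ratio = dropout_ratio   # assumed default
        self.rng = random.Random(seed)
        self.mask = None

    def forward(self, x, train=True):
        if train:
            # Keep each node with probability 1 - dropout_ratio.
            self.mask = [self.rng.random() > self.dropout_ratio for _ in x]
            return [xi if keep else 0.0 for xi, keep in zip(x, self.mask)]
        # Test time: all nodes are used, scaled by the keep probability.
        return [xi * (1.0 - self.dropout_ratio) for xi in x]

    def backward(self, dout):
        # Gradients flow only through nodes kept in the last training forward pass.
        return [d if keep else 0.0 for d, keep in zip(dout, self.mask)]


layer = Dropout(dropout_ratio=0.5, seed=0)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))               # some entries zeroed
print(layer.forward([1.0, 2.0, 3.0, 4.0], train=False))  # all entries, scaled
```

A new mask is drawn on every training forward pass, so each mini-batch effectively trains a different thinned network.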

Hyperparameter (1/3)
■Hyperparameters
▪ Number of neurons
▪ Batch size
▪ Learning rate
▪ Etc.

Hyperparameter (2/3)
■Training data
▪ Used only for training
■Test data
▪ Used only for testing
■Validation data
▪ Used to adjust hyperparameters
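One way to obtain validation data is to shuffle the training set and hold part of it back. The sketch below assumes a 20% validation share, a fixed seed, and a hypothetical helper name; the test data is left untouched.

```python
import random


def split_train_validation(samples, labels, validation_rate=0.2, seed=0):
    """Shuffle, then hold back a share of the training data for validation."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(samples) * validation_rate)
    val_idx, train_idx = indices[:n_val], indices[n_val:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(samples, train_idx), pick(labels, train_idx),
            pick(samples, val_idx), pick(labels, val_idx))


# Made-up data: 10 one-feature samples with alternating labels.
samples = [[float(i)] for i in range(10)]
labels = [i % 2 for i in range(10)]
x_train, t_train, x_val, t_val = split_train_validation(samples, labels)
print(len(x_train), len(x_val))
```

Shuffling samples and labels with the same index list keeps each sample paired with its label.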

Hyperparameter (3/3)
■Optimization
▪ Set a range for each hyperparameter value
▪ Sample values at random from that range
▪ Train with the sampled values, then evaluate
▪ Repeat and narrow down the range
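The four steps above can be sketched as a random search on a log scale. Everything named here is hypothetical: `validation_score` is a stand-in for "train, then evaluate on validation data" (it simply peaks at a learning rate of 0.01), and the number of rounds, trials, and kept candidates are assumed values.

```python
import math
import random


def validation_score(lr):
    """Hypothetical stand-in for training and validating with learning rate lr.

    Peaks at lr = 0.01; a real run would train a network here.
    """
    return -abs(math.log10(lr) - (-2.0))


def random_search(low_exp, high_exp, rounds=3, trials=20, keep=5, seed=0):
    """Sample 10**e for e in [low_exp, high_exp], then narrow the range."""
    rng = random.Random(seed)
    best = None
    for _ in range(rounds):
        # Randomization: sample exponents uniformly, i.e. values on a log scale.
        exps = [rng.uniform(low_exp, high_exp) for _ in range(trials)]
        # Evaluation: rank the sampled values by validation score.
        ranked = sorted(exps, key=lambda e: validation_score(10 ** e), reverse=True)
        top = ranked[:keep]
        if best is None or validation_score(10 ** top[0]) > validation_score(10 ** best):
            best = top[0]
        # Narrow down: the next range is the span of the best trials.
        low_exp, high_exp = min(top), max(top)
    return 10 ** best, (10 ** low_exp, 10 ** high_exp)


# Initial range 0.001 to 1000, i.e. exponents -3 to 3.
best_lr, final_range = random_search(-3.0, 3.0)
print(best_lr, final_range)
```

Sampling the exponent rather than the value itself gives every order of magnitude an equal chance of being tried.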

Q&A

Title slide: Deep Learning from Scratch, chapter 6. Learning-related skills. JaeYeop Jeong, Interaction Lab., Kumoh National Institute of Technology

Editor's Notes

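Momentum searches for the minimum the way a marble rolls around in a bowl. It keeps a sense of direction and searches a little further along the current direction.
AdaGrad searches using learning rate decay. A weight that has already been updated a lot is judged to be close to its optimum, so from then on it is searched in small steps. Because each step is scaled by the reciprocal of the accumulated gradient magnitude, the update eventually approaches 0 and learning stalls. RMSProp makes recent gradients count more than older ones.
Adam combines the strengths of both. As in AdaGrad, the size of the update is adjusted so that the search starts with large steps and gradually takes smaller ones. As in Momentum, the search keeps a sense of direction.
Among the learning-related techniques, choosing the initial weight values is important.
Weights are initialized from a normal distribution with standard deviation 1. A standard deviation of 1 is large, so the values are spread widely, that is, the variance is large. With the sigmoid function, most activations end up near 0 and 1.
Weights are initialized from a normal distribution with standard deviation 0.01. The activations concentrate in the middle. When most nodes output nearly the same value, representational power is limited.
Xavier initial value: when the previous layer has n nodes, weights are initialized from a normal distribution with standard deviation $\sqrt{1/n}$. The distribution becomes somewhat distorted as the network gets deeper, but it still works fairly well. It is a good choice together with sigmoid.
He initial value, used with ReLU: weights are initialized from a normal distribution with standard deviation $\sqrt{2/n}$. I think so many values pile up at 0 because ReLU maps every negative input to 0.
So far we looked at choosing initial weight values so that the activation values are well distributed. Batch normalization instead forces the activation values at each node into a suitable distribution. Learning is fast (because of the normalization, the learning rate can be tuned more freely). The initial weight values need much less care. It helps prevent overfitting (the inputs are normalized to zero mean and unit variance, so no single input scale dominates the weight updates).
A batch normalization layer is inserted between layers.
The mean and variance of the input mini-batch x are computed and used to normalize it.
For each feature, the mean and variance are computed across the data in the mini-batch, and the values are normalized with the formula on the slide; that is, they are rescaled to zero mean and unit variance.
As the figure shows, whatever distribution the data has, it can be normalized to the one on the right.
The difference between using and not using batch normalization.
Setting the initial weight values compared with using batch normalization.
A model with many parameters or high expressiveness; little training data.
Weight decay is a method that penalizes large weights, that is, weights that have a large influence on learning. The value $\frac{1}{2}\lambda W^2$ is added to the loss function. Here $\lambda$ is a hyperparameter chosen by the user: the larger it is, the larger the penalty, and the constant in front scales the overall penalty. Adding this value to the loss expresses that the weight should not be given too much importance. In backpropagation, the derivative $\lambda W$ is added to the gradient, so it also affects the updates.
Dropout is a method that trains while deleting randomly chosen nodes. Not all nodes are used, and the deleted nodes change every time, so a different model is trained each time. At test time all neurons are used.
Set the range of values (0.001 to 1000). Sample at random. Train with the sampled values and verify them on the validation data. Repeat and adjust.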
