Interaction Lab. Kumoh National Institute of Technology
Deep Learning from Scratch
chapter 6. Learning-related skills
JaeYeop Jeong
■Intro
■Optimizer
■Initial Value of Weight
■Overcoming Overfitting
■Hyper Parameter
Agenda
Interaction Lab., Kumoh National Institute of Technology 2
■Optimization
 Find the parameters that minimize the value of the loss function
• Use the gradient
 SGD
• 𝑊 ← 𝑊 − η ∂𝐿/∂𝑊
Intro(1/4)
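The SGD update rule above can be sketched as a small NumPy class; the dict-of-arrays interface for `params` and `grads` is an assumption for illustration:

```python
import numpy as np

class SGD:
    """Stochastic gradient descent: W <- W - lr * dL/dW."""
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        # params and grads are dicts of NumPy arrays keyed by parameter name
        for key in params:
            params[key] -= self.lr * grads[key]
```

Every optimizer on the following slides can share this `update(params, grads)` interface, differing only in how the step is computed.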
■SGD
 𝑓(𝑥, 𝑦) = (1/20)𝑥² + 𝑦²
Intro(2/4)
■SGD
 Start from (−7, 2)
Intro(3/4)
■SGD
Intro(4/4)
■Momentum
 𝑣 ← α𝑣 − η ∂𝐿/∂𝑊
 𝑊 ← 𝑊 + 𝑣
Optimizer(1/3)
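The two Momentum equations translate directly into code; `alpha` (the velocity coefficient, typically 0.9) and the dict interface are illustrative choices:

```python
import numpy as np

class Momentum:
    """v <- alpha*v - lr*dL/dW;  W <- W + v."""
    def __init__(self, lr=0.01, alpha=0.9):
        self.lr = lr
        self.alpha = alpha
        self.v = None  # velocity, allocated lazily on first update

    def update(self, params, grads):
        if self.v is None:
            self.v = {k: np.zeros_like(p) for k, p in params.items()}
        for key in params:
            # keep a fraction of the previous direction, like a rolling marble
            self.v[key] = self.alpha * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]
```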
■AdaGrad
 ℎ ← ℎ + ∂𝐿/∂𝑊 ⊙ ∂𝐿/∂𝑊
 𝑊 ← 𝑊 − η (1/√ℎ) ∂𝐿/∂𝑊
 Learning rate decay: frequently updated weights get smaller steps
 RMSProp: weights recent gradients more heavily than older ones
Optimizer(2/3)
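A sketch of the AdaGrad update; the small constant `1e-7` guarding against division by zero is a common safeguard, not part of the formula on the slide:

```python
import numpy as np

class AdaGrad:
    """h <- h + g*g;  W <- W - lr * g / sqrt(h)."""
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # running sum of squared gradients

    def update(self, params, grads):
        if self.h is None:
            self.h = {k: np.zeros_like(p) for k, p in params.items()}
        for key in params:
            self.h[key] += grads[key] * grads[key]
            # frequently updated weights accumulate large h -> smaller steps
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
```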
■Adam
 AdaGrad + Momentum
Optimizer(3/3)
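The slide summarizes Adam as "AdaGrad + Momentum". A minimal sketch of the standard Adam update (first moment like Momentum, second moment like AdaGrad, with bias correction); hyperparameter defaults and the dict interface are assumptions:

```python
import numpy as np

class Adam:
    """Momentum-style first moment + AdaGrad-style second moment."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.m, self.v, self.t = None, None, 0

    def update(self, params, grads):
        if self.m is None:
            self.m = {k: np.zeros_like(p) for k, p in params.items()}
            self.v = {k: np.zeros_like(p) for k, p in params.items()}
        self.t += 1
        for key in params:
            # exponential moving averages of gradient and squared gradient
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * grads[key]
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * grads[key] ** 2
            m_hat = self.m[key] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[key] / (1 - self.beta2 ** self.t)
            params[key] -= self.lr * m_hat / (np.sqrt(v_hat) + 1e-7)
```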
■In case of initial value 0
 A bad idea
• All weights are updated identically in backpropagation
• Learning does not proceed effectively
Initial value of weight(1/11)
■In case of standard deviation 1
 Using sigmoid
 Weights drawn from a normal distribution with standard deviation 1
 Activations saturate near 0 and 1 → gradient vanishing
Initial value of weight(2/11)
■In case of standard deviation 0.01
 Using sigmoid
 Weights drawn from a normal distribution with standard deviation 0.01
 Activations concentrate on the same values → limited representational power
Initial value of weight(3/11)
■In case of the Xavier initial value
 Using sigmoid
 When the previous layer has 𝑛 nodes
 Normal distribution with standard deviation √(1/𝑛)
Initial value of weight(4/11)
■In case of the He initial value
 Using ReLU
 When the previous layer has 𝑛 nodes
 Normal distribution with standard deviation √(2/𝑛)
Initial value of weight(5/11)
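Both initialization rules amount to scaling a standard normal draw; a sketch, where `n` is the previous layer's node count and the layer shape is an arbitrary example:

```python
import numpy as np

n = 100  # number of nodes in the previous layer
rng = np.random.default_rng(0)

# Xavier: std sqrt(1/n), suited to sigmoid/tanh activations
w_xavier = rng.standard_normal((n, n)) * np.sqrt(1.0 / n)

# He: std sqrt(2/n), suited to ReLU (which zeroes half the inputs)
w_he = rng.standard_normal((n, n)) * np.sqrt(2.0 / n)
```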
■Batch normalization
 Forces a suitable distribution of activation values
 Improves learning speed
 Less dependent on the initial weight values
 Suppresses overfitting
Initial value of weight(6/11)
■Batch normalization
 Insert a "Batch Norm" layer between layers
• Adjusts so that the activation values are properly distributed
Initial value of weight(7/11)
■Batch normalization
Initial value of weight(8/11)
{𝑥₁, 𝑥₂, 𝑥₃, …, 𝑥ₙ} → {𝑥̂₁, 𝑥̂₂, 𝑥̂₃, …, 𝑥̂ₙ}
Mini-batch mean: μ = (1/𝑛) Σᵢ 𝑥ᵢ
Mini-batch variance: σ² = (1/𝑛) Σᵢ (𝑥ᵢ − μ)²
Normalize: 𝑥̂ᵢ = (𝑥ᵢ − μ) / √(σ² + ε)
Initial value of weight(8/11)
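The mean / variance / normalize steps can be sketched as a forward pass over one mini-batch; `gamma` and `beta` are the learnable scale and shift of a full batch-norm layer, left at their identity defaults here:

```python
import numpy as np

def batch_norm_forward(x, gamma=1.0, beta=0.0, eps=1e-7):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)       # mini-batch mean, per feature
    var = x.var(axis=0)       # mini-batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

After this pass, each feature has roughly zero mean and unit variance regardless of the input distribution.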
Initial value of weight(9/11)
■Batch normalization
Initial value of weight(10/11)
■Batch normalization
Initial value of weight(11/11)
■A model with many parameters and high expressiveness
■Too little training data
Overcoming Overfitting (1/3)
■Weight decay
 During learning, penalize large weights
 Loss + ½ λ𝑊²
 λ: a hyper parameter
• The larger λ is, the stronger the penalty on large weights
 In backpropagation, the gradient of the penalty is added: ∂/∂𝑊 (½ λ𝑊²) = λ𝑊
Overcoming Overfitting (2/3)
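A sketch of how the ½λW² penalty and its λW gradient would be computed over all weight matrices; the variable names and example values are illustrative:

```python
import numpy as np

lam = 0.1  # weight-decay strength (lambda)
weights = [np.array([[1.0, -2.0]]), np.array([[0.5]])]

# penalty added to the loss: (1/2) * lambda * sum of squared weights
penalty = 0.5 * lam * sum((w ** 2).sum() for w in weights)

# corresponding term added to each weight's gradient in backprop: lambda * W
grad_penalty = [lam * w for w in weights]
```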
■Dropout
Overcoming Overfitting (3/3)
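A minimal sketch of dropout: zero out random neurons during training, and scale by the kept fraction at test time (one common convention; "inverted dropout" scales at training time instead):

```python
import numpy as np

class Dropout:
    """Randomly delete neurons while training; use all neurons at test time."""
    def __init__(self, ratio=0.5):
        self.ratio = ratio  # fraction of neurons to drop
        self.mask = None

    def forward(self, x, train=True):
        if train:
            # a fresh random mask each pass -> a different sub-model each time
            self.mask = np.random.rand(*x.shape) > self.ratio
            return x * self.mask
        # test time: keep everything, scaled by the fraction kept in training
        return x * (1.0 - self.ratio)
```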
■Hyper parameters
 Number of neurons
 Batch size
 Learning rate
 Etc.
Hyper parameter(1/3)
■Training data
 Used only for training
■Test data
 Used only for final evaluation
■Validation data
 Used to tune hyper parameters
Hyper parameter(2/3)
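A sketch of carving one dataset into the three roles above; the 60/20/20 split and the stand-in data are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 10))  # stand-in dataset

# shuffle first so the split is random, then cut: 60% train, 20% val, 20% test
idx = rng.permutation(len(x))
n_train, n_val = int(0.6 * len(x)), int(0.2 * len(x))
x_train = x[idx[:n_train]]
x_val   = x[idx[n_train:n_train + n_val]]
x_test  = x[idx[n_train + n_val:]]
```

Hyper parameters are tuned against `x_val`; `x_test` is touched only once, at the very end.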
■Optimization
 Set a range of values
 Sample randomly within the range
 Train with the sampled values and evaluate
 Repeat and narrow the range down
Hyper parameter(3/3)
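The loop above is usually run on a log scale for rate-like quantities (e.g. 0.001 to 1000). A sketch of the sampling step only; the trial ranges are illustrative, and the train-evaluate-narrow part is left out:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_log_uniform(low_exp, high_exp):
    """Sample 10**u with u uniform in [low_exp, high_exp): a log-scale draw."""
    return 10 ** rng.uniform(low_exp, high_exp)

# draw a batch of candidate settings; each would be trained briefly and
# scored on the validation data before narrowing the ranges
trials = [{"lr": sample_log_uniform(-6, -2),
           "weight_decay": sample_log_uniform(-8, -4)} for _ in range(10)]
```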
Q&A

Editor's Notes

  1. Values are searched as if a marble were rolling in a bowl: the search has directionality, moving a certain amount further in the current direction.
  2. Searches using learning-rate decay: weights that have been updated a lot are judged to be near their optimal value and are then searched in smaller steps. Because the accumulated squared gradients keep growing, the update eventually approaches 0 and the gradient is effectively lost. RMSProp instead weights recent gradient values more heavily than older ones.
  3. Combines the advantages of both: from AdaGrad, the update size is adjusted so steps start large and gradually shrink; from Momentum, the search keeps its sense of direction.
  4. Among learning-related techniques, choosing the initial weight values is important.
  5. Weights initialized from a normal distribution with standard deviation 1: the values are large and spread widely, i.e. the variance is large; with the sigmoid function most activations end up near 0 and 1.
  6. Weights initialized from a normal distribution with standard deviation 0.01: the values concentrate around the center, and when the nodes mostly take the same values, representational power is limited.
  7. The Xavier value: when the previous layer has n nodes, initialize weights from a normal distribution with standard deviation √(1/n). The shape distorts somewhat as the network deepens, but it is still reasonably good; works well with sigmoid.
  8. The He initial value, used with ReLU: initialize weights from a normal distribution with standard deviation √(2/n). Many values pile up at 0, presumably because ReLU maps all negative inputs to 0.
  9. Earlier we looked at choosing initial weight values so that the activation values are well distributed; batch normalization instead forces the activation distribution at each node. Learning is faster (the learning rate can be tuned more freely thanks to the normalization), the initial weight values need not be set carefully, and overfitting is suppressed (normalizing the inputs keeps them from having an outsized influence on the weight updates).
  10. Insert a batch-normalization layer between layers.
  11. Compute the mean and variance of the input mini-batch x and normalize.
  12. For each feature, compute the mean and variance across the data points and normalize with the formula, i.e. transform the values into the range between 0 and 1.
  13. As the figure shows, whatever distribution the data has, it can be normalized as shown on the right.
  14. The difference between using batch normalization and not using it.
  15. Choosing the initial weight values versus using batch normalization.
  16. A model with many parameters or high expressiveness; little training data.
  17. The weight-decay technique penalizes weights with large values, i.e. weights with a large influence on learning. Add ½λW² to the loss function; λ is a user-chosen hyper parameter, and the larger it is, the larger the penalty, while the constant in front scales the overall penalty. Adding this value to the loss expresses that large weights should matter less; in backpropagation the derivative (λW) is added, so it also affects the updates.
  18. A method that deletes random nodes during training: since the deleted nodes change every time, a different model is effectively trained each time. At test time all neurons are used.
  19. Set a range of values (e.g. 0.001 to 1000), sample randomly, train with the sampled values, validate on the validation data, and repeat while narrowing the range.