Interaction Lab. Kumoh National Institute of Technology
Deep Learning from Scratch
chapter 6. Learning-related skills
JaeYeop Jeong
■Intro
■Optimizer
■Initial Value of Weight
■Overcoming Overfitting
■Hyper Parameter
Agenda
Interaction Lab., Kumoh National Institute of Technology 2
■Optimization
 Find the parameters that minimize the value of the loss function
• Use the gradient
 SGD
• 𝑊 ← 𝑊 − η ∂𝐿/∂𝑊
Intro(1/4)
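The SGD update rule above can be sketched as a small NumPy class; the dict-of-arrays interface for `params` and `grads` is an assumption for illustration:

```python
import numpy as np

class SGD:
    """Stochastic gradient descent: W <- W - lr * dL/dW."""
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        # params and grads are dicts of NumPy arrays keyed by parameter name
        for key in params:
            params[key] -= self.lr * grads[key]
```

Every optimizer on the following slides can share this `update(params, grads)` interface, differing only in how the step is computed.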
■SGD
 𝑓(𝑥, 𝑦) = (1/20)𝑥² + 𝑦²
Intro(2/4)
■SGD
 Start from (−7, 2)
Intro(3/4)
■SGD
Intro(4/4)
■Momentum
 𝑣 ← α𝑣 − η ∂𝐿/∂𝑊
 𝑊 ← 𝑊 + 𝑣
Optimizer(1/3)
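The two Momentum equations translate directly into code; `alpha` (the velocity coefficient, typically 0.9) and the dict interface are illustrative choices:

```python
import numpy as np

class Momentum:
    """v <- alpha*v - lr*dL/dW;  W <- W + v."""
    def __init__(self, lr=0.01, alpha=0.9):
        self.lr = lr
        self.alpha = alpha
        self.v = None  # velocity, allocated lazily on first update

    def update(self, params, grads):
        if self.v is None:
            self.v = {k: np.zeros_like(p) for k, p in params.items()}
        for key in params:
            # keep a fraction of the previous direction, like a rolling marble
            self.v[key] = self.alpha * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]
```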
■AdaGrad
 ℎ ← ℎ + ∂𝐿/∂𝑊 ⊙ ∂𝐿/∂𝑊
 𝑊 ← 𝑊 − η (1/√ℎ) ∂𝐿/∂𝑊
 Learning rate decay: frequently updated weights get smaller steps
 RMSProp: weights recent gradients more heavily than older ones
Optimizer(2/3)
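A sketch of the AdaGrad update; the small constant `1e-7` guarding against division by zero is a common safeguard, not part of the formula on the slide:

```python
import numpy as np

class AdaGrad:
    """h <- h + g*g;  W <- W - lr * g / sqrt(h)."""
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # running sum of squared gradients

    def update(self, params, grads):
        if self.h is None:
            self.h = {k: np.zeros_like(p) for k, p in params.items()}
        for key in params:
            self.h[key] += grads[key] * grads[key]
            # frequently updated weights accumulate large h -> smaller steps
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
```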
■Adam
 AdaGrad + Momentum
Optimizer(3/3)
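The slide summarizes Adam as "AdaGrad + Momentum". A minimal sketch of the standard Adam update (first moment like Momentum, second moment like AdaGrad, with bias correction); hyperparameter defaults and the dict interface are assumptions:

```python
import numpy as np

class Adam:
    """Momentum-style first moment + AdaGrad-style second moment."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.m, self.v, self.t = None, None, 0

    def update(self, params, grads):
        if self.m is None:
            self.m = {k: np.zeros_like(p) for k, p in params.items()}
            self.v = {k: np.zeros_like(p) for k, p in params.items()}
        self.t += 1
        for key in params:
            # exponential moving averages of gradient and squared gradient
            self.m[key] = self.beta1 * self.m[key] + (1 - self.beta1) * grads[key]
            self.v[key] = self.beta2 * self.v[key] + (1 - self.beta2) * grads[key] ** 2
            m_hat = self.m[key] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[key] / (1 - self.beta2 ** self.t)
            params[key] -= self.lr * m_hat / (np.sqrt(v_hat) + 1e-7)
```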
■In case of initial value 0
 A bad idea
• All weights are updated identically in backpropagation
• Learning does not proceed effectively
Initial value of weight(1/11)
■In case of standard deviation 1
 Using sigmoid
 Weights drawn from a normal distribution with standard deviation 1
 Activations saturate near 0 and 1 → gradient vanishing
Initial value of weight(2/11)
■In case of standard deviation 0.01
 Using sigmoid
 Weights drawn from a normal distribution with standard deviation 0.01
 Activations concentrate on the same values → limited representational power
Initial value of weight(3/11)
■In case of the Xavier initial value
 Using sigmoid
 When the previous layer has 𝑛 nodes
 Normal distribution with standard deviation √(1/𝑛)
Initial value of weight(4/11)
■In case of the He initial value
 Using ReLU
 When the previous layer has 𝑛 nodes
 Normal distribution with standard deviation √(2/𝑛)
Initial value of weight(5/11)
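Both initialization rules amount to scaling a standard normal draw; a sketch, where `n` is the previous layer's node count and the layer shape is an arbitrary example:

```python
import numpy as np

n = 100  # number of nodes in the previous layer
rng = np.random.default_rng(0)

# Xavier: std sqrt(1/n), suited to sigmoid/tanh activations
w_xavier = rng.standard_normal((n, n)) * np.sqrt(1.0 / n)

# He: std sqrt(2/n), suited to ReLU (which zeroes half the inputs)
w_he = rng.standard_normal((n, n)) * np.sqrt(2.0 / n)
```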
■Batch normalization
 Forces a suitable distribution of activation values
 Improves learning speed
 Less dependent on the initial weight values
 Suppresses overfitting
Initial value of weight(6/11)
■Batch normalization
 Insert a "Batch Norm" layer between layers
• Adjusts so that the activation values are properly distributed
Initial value of weight(7/11)
■Batch normalization
Initial value of weight(8/11)
{𝑥₁, 𝑥₂, 𝑥₃, …, 𝑥ₙ} → {𝑥̂₁, 𝑥̂₂, 𝑥̂₃, …, 𝑥̂ₙ}
Mini-batch mean: μ = (1/𝑛) Σᵢ 𝑥ᵢ
Mini-batch variance: σ² = (1/𝑛) Σᵢ (𝑥ᵢ − μ)²
Normalize: 𝑥̂ᵢ = (𝑥ᵢ − μ) / √(σ² + ε)
Initial value of weight(8/11)
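The mean / variance / normalize steps can be sketched as a forward pass over one mini-batch; `gamma` and `beta` are the learnable scale and shift of a full batch-norm layer, left at their identity defaults here:

```python
import numpy as np

def batch_norm_forward(x, gamma=1.0, beta=0.0, eps=1e-7):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)       # mini-batch mean, per feature
    var = x.var(axis=0)       # mini-batch variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

After this pass, each feature has roughly zero mean and unit variance regardless of the input distribution.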
Initial value of weight(9/11)
■Batch normalization
Initial value of weight(10/11)
■Batch normalization
Initial value of weight(11/11)
■A model with many parameters and high expressiveness
■Too little training data
Overcoming Overfitting (1/3)
■Weight decay
 During learning, penalize large weights
 Loss + ½ λ𝑊²
 λ: a hyper parameter
• The larger λ is, the stronger the penalty on large weights
 In backpropagation, the gradient of the penalty is added: ∂/∂𝑊 (½ λ𝑊²) = λ𝑊
Overcoming Overfitting (2/3)
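A sketch of how the ½λW² penalty and its λW gradient would be computed over all weight matrices; the variable names and example values are illustrative:

```python
import numpy as np

lam = 0.1  # weight-decay strength (lambda)
weights = [np.array([[1.0, -2.0]]), np.array([[0.5]])]

# penalty added to the loss: (1/2) * lambda * sum of squared weights
penalty = 0.5 * lam * sum((w ** 2).sum() for w in weights)

# corresponding term added to each weight's gradient in backprop: lambda * W
grad_penalty = [lam * w for w in weights]
```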
■Dropout
Overcoming Overfitting (3/3)
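A minimal sketch of dropout: zero out random neurons during training, and scale by the kept fraction at test time (one common convention; "inverted dropout" scales at training time instead):

```python
import numpy as np

class Dropout:
    """Randomly delete neurons while training; use all neurons at test time."""
    def __init__(self, ratio=0.5):
        self.ratio = ratio  # fraction of neurons to drop
        self.mask = None

    def forward(self, x, train=True):
        if train:
            # a fresh random mask each pass -> a different sub-model each time
            self.mask = np.random.rand(*x.shape) > self.ratio
            return x * self.mask
        # test time: keep everything, scaled by the fraction kept in training
        return x * (1.0 - self.ratio)
```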
■Hyper parameters
 Number of neurons
 Batch size
 Learning rate
 Etc.
Hyper parameter(1/3)
■Training data
 Used only for training
■Test data
 Used only for final evaluation
■Validation data
 Used to tune hyper parameters
Hyper parameter(2/3)
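A sketch of carving one dataset into the three roles above; the 60/20/20 split and the stand-in data are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 10))  # stand-in dataset

# shuffle first so the split is random, then cut: 60% train, 20% val, 20% test
idx = rng.permutation(len(x))
n_train, n_val = int(0.6 * len(x)), int(0.2 * len(x))
x_train = x[idx[:n_train]]
x_val   = x[idx[n_train:n_train + n_val]]
x_test  = x[idx[n_train + n_val:]]
```

Hyper parameters are tuned against `x_val`; `x_test` is touched only once, at the very end.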
■Optimization
 Set a range of values
 Sample randomly within the range
 Train with the sampled values and evaluate
 Repeat and narrow the range down
Hyper parameter(3/3)
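The loop above is usually run on a log scale for rate-like quantities (e.g. 0.001 to 1000). A sketch of the sampling step only; the trial ranges are illustrative, and the train-evaluate-narrow part is left out:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_log_uniform(low_exp, high_exp):
    """Sample 10**u with u uniform in [low_exp, high_exp): a log-scale draw."""
    return 10 ** rng.uniform(low_exp, high_exp)

# draw a batch of candidate settings; each would be trained briefly and
# scored on the validation data before narrowing the ranges
trials = [{"lr": sample_log_uniform(-6, -2),
           "weight_decay": sample_log_uniform(-8, -4)} for _ in range(10)]
```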
Q&A

Editor's Notes

  1. Values are searched as if a marble were rolling in a bowl: the search has directionality, moving a certain amount further in the current direction.
  2. Searches using learning-rate decay: weights that have been updated a lot are judged to be near their optimal value and are then searched in smaller steps. Because the accumulated squared gradients keep growing, the update eventually approaches 0 and the gradient is effectively lost. RMSProp instead weights recent gradient values more heavily than older ones.
  3. Combines the advantages of both: from AdaGrad, the update size is adjusted so steps start large and gradually shrink; from Momentum, the search keeps its sense of direction.
  4. Among learning-related techniques, choosing the initial weight values is important.
  5. Weights initialized from a normal distribution with standard deviation 1: the values are large and spread widely, i.e. the variance is large; with the sigmoid function most activations end up near 0 and 1.
  6. Weights initialized from a normal distribution with standard deviation 0.01: the values concentrate around the center, and when the nodes mostly take the same values, representational power is limited.
  7. The Xavier value: when the previous layer has n nodes, initialize weights from a normal distribution with standard deviation √(1/n). The shape distorts somewhat as the network deepens, but it is still reasonably good; works well with sigmoid.
  8. The He initial value, used with ReLU: initialize weights from a normal distribution with standard deviation √(2/n). Many values pile up at 0, presumably because ReLU maps all negative inputs to 0.
  9. Earlier we looked at choosing initial weight values so that the activation values are well distributed; batch normalization instead forces the activation distribution at each node. Learning is faster (the learning rate can be tuned more freely thanks to the normalization), the initial weight values need not be set carefully, and overfitting is suppressed (normalizing the inputs keeps them from having an outsized influence on the weight updates).
  10. Insert a batch-normalization layer between layers.
  11. Compute the mean and variance of the input mini-batch x and normalize.
  12. For each feature, compute the mean and variance across the data points and normalize with the formula, i.e. transform the values into the range between 0 and 1.
  13. As the figure shows, whatever distribution the data has, it can be normalized as shown on the right.
  14. The difference between using batch normalization and not using it.
  15. Choosing the initial weight values versus using batch normalization.
  16. A model with many parameters or high expressiveness; little training data.
  17. The weight-decay technique penalizes weights with large values, i.e. weights with a large influence on learning. Add ½λW² to the loss function; λ is a user-chosen hyper parameter, and the larger it is, the larger the penalty, while the constant in front scales the overall penalty. Adding this value to the loss expresses that large weights should matter less; in backpropagation the derivative (λW) is added, so it also affects the updates.
  18. A method that deletes random nodes during training: since the deleted nodes change every time, a different model is effectively trained each time. At test time all neurons are used.
  19. Set a range of values (e.g. 0.001 to 1000), sample randomly, train with the sampled values, validate on the validation data, and repeat while narrowing the range.