1. Deep Learning from Scratch
Chapter 4. Neural Network Learning
JaeYeop Jeong
Interaction Lab., Kumoh National Institute of Technology
2. ■Neural network learning
■Loss function
■Differentiation
■Gradient method
■Learning algorithm implementation
Agenda
3. ■Data is important in machine learning.
■A neural network can solve every problem in the same end-to-end way, whether the input is a '5', a dog, or a human face.
Neural network learning(1/2)
[Figure: three ways of mapping input to output — a hand-designed algorithm; hand-designed features (SIFT, HOG, …) fed into machine learning (SVM, KNN, …); and a neural network (deep learning) that learns directly from the data.]
4. ■Divide the data into training data and test data.
First, use only the training data to find the optimal parameters.
Then evaluate the trained model on the test data, aiming for a universal (generalizable) model.
■Overfitting: fitting one particular dataset so closely that the model performs poorly on new data.
Neural network learning(2/2)
5. ■Use a loss function to find the optimal parameters.
Mean squared error, cross entropy error
■Mean squared error (MSE)
E = (1/2) Σ_k (y_k − t_k)², (y_k = neural network output, t_k = label, k = dimension of the data)
The better the prediction, the smaller the error.
Loss function
[Figure: comparing the softmax output y against a one-hot label — the MSE is 0.0975 when the correct class gets the highest probability, and 0.5975 when a wrong class does.]
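The MSE values on the slide (0.0975 and 0.5975) can be reproduced with a minimal NumPy sketch; the two output arrays are illustrative softmax outputs where the correct class is '2':

```python
import numpy as np

def mean_squared_error(y, t):
    """E = 0.5 * sum((y_k - t_k)^2) over the output dimensions."""
    return 0.5 * np.sum((y - t) ** 2)

# One-hot label: the correct class is '2'.
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

# Output that gives the highest probability to '2' (a good prediction).
y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
# Output that gives the highest probability to '7' instead (a bad prediction).
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

print(mean_squared_error(y_good, t))  # 0.0975
print(mean_squared_error(y_bad, t))   # 0.5975
```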
6. ■Cross entropy error (CEE)
E = −Σ_k t_k log y_k, (y_k = neural network output, t_k = label, k = dimension of the data; log is the natural logarithm)
• t_k is one-hot encoded
• So in practice only the output for the correct class enters the sum
Label is '2', y_2 = 0.6 → −log 0.6 ≈ 0.51
Label is '2', y_2 = 0.1 → −log 0.1 ≈ 2.30
That is, CEE is determined entirely by the output for the correct label: the closer that output is to 1, the smaller the error.
Loss function
[Figure: the curve y = log x; in implementation a small delta is added inside the log to avoid log(0), giving 0.5108 and 2.3025 for the two examples above.]
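A minimal sketch of the CEE above; the small `delta` inside the log is the guard against `log(0) = -inf` mentioned in the figure:

```python
import numpy as np

def cross_entropy_error(y, t):
    """E = -sum(t_k * log(y_k)); delta prevents log(0)."""
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # correct class is '2'

y1 = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
y2 = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

print(cross_entropy_error(y1, t))  # ≈ 0.5108 (= -log 0.6)
print(cross_entropy_error(y2, t))  # ≈ 2.3025 (= -log 0.1)
```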
7. ■Mini-batch learning
Machine learning models are trained on data:
• Compute the loss function over the training data and find the parameters that minimize it.
• That is, with 100 training examples, the sum of 100 loss values is used.
• With big data, summing over every example becomes impractical.
E = −(1/N) Σ_n Σ_k t_nk log y_nk, (y_nk = neural network output, t_nk = label, for the n-th example and k-th dimension)
• The average loss per example, comparable regardless of the number of data points
■Mini-batch: select, e.g., 100 out of 60,000 examples at random
• and learn using only those 100 examples
Loss function
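Mini-batch selection can be sketched with `np.random.choice`; the arrays below are random placeholders whose shapes stand in for the 60,000 MNIST training examples used in the book:

```python
import numpy as np

# Placeholder data standing in for MNIST (shapes only; values are random).
train_size = 60000
x_train = np.random.rand(train_size, 784)  # 28x28 images, flattened
t_train = np.random.rand(train_size, 10)   # one-hot labels

# Randomly pick the indices of a mini-batch of 100 examples.
batch_size = 100
batch_mask = np.random.choice(train_size, batch_size)
x_batch = x_train[batch_mask]
t_batch = t_train[batch_mask]

print(x_batch.shape)  # (100, 784)
print(t_batch.shape)  # (100, 10)
```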
8. ■Mini-batch implementation
CEE implementation must handle both label formats:
• t : one-hot encoded
• t : integer labels (not one-hot)
When t holds integer labels, NumPy fancy indexing picks out the output for each correct class:
batch_size = 5 → np.arange(batch_size) = [0, 1, 2, 3, 4]
t = [2, 7, 0, 9, 4]
y[np.arange(batch_size), t] = [y[0, 2], y[1, 7], y[2, 0], y[3, 9], y[4, 4]]
Loss function
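A sketch of a batch CEE that supports both formats, following the slide's indexing example (the example batch `y` is made up for illustration — every row gives probability 0.64 to its correct class):

```python
import numpy as np

def cross_entropy_error(y, t):
    """Batch-averaged CEE; t may be one-hot or integer labels."""
    if y.ndim == 1:                # single example -> treat as a batch of 1
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    if t.size == y.size:           # one-hot -> convert to integer labels
        t = t.argmax(axis=1)
    batch_size = y.shape[0]
    # Fancy indexing picks y[i, t[i]], the output for each correct class.
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

t = np.array([2, 7, 0, 9, 4])            # integer labels, as on the slide
y = np.full((5, 10), 0.04)               # 0.04 on the nine wrong classes
y[np.arange(5), t] = 0.64                # 0.64 on each correct class

print(cross_entropy_error(y, t))         # ≈ -log(0.64) ≈ 0.446
```

The one-hot version of the same `t` gives the identical loss, since it is converted back to integer labels inside the function.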
9. ■Why a loss function? Why not accuracy?
The goal is to find parameter values that give high 'accuracy',
but learning searches for parameter values that make the loss function small.
• Because the search relies on the derivative: accuracy changes in discrete jumps as the parameters vary, so its derivative is 0 almost everywhere, while the loss changes continuously.
■ Where the derivative is negative, move the parameter in the positive direction.
■ Where it is positive, move in the negative direction.
For the same reason the sigmoid function is preferred over the step function as an activation: the step function's derivative is 0 everywhere except at 0, whereas the sigmoid's derivative is nonzero everywhere.
Loss function
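The step-vs-sigmoid point can be checked numerically; this is a small illustrative sketch using a central-difference derivative:

```python
import numpy as np

def step(x):
    return np.where(x > 0, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def numerical_diff(f, x, h=1e-4):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-2.0, 0.5, 3.0]:
    print(x, numerical_diff(step, np.float64(x)),
          numerical_diff(sigmoid, np.float64(x)))
# Away from 0 the step function's derivative is exactly 0,
# so it gives no signal about which way to move a parameter;
# the sigmoid's derivative is small but never 0.
```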
11. ■Partial derivative
f(x0, x1) = x0² + x1²
• ∂f/∂x0 at (x0, x1) = (3.0, 4.0) → 6.0
• ∂f/∂x1 at (x0, x1) = (3.0, 4.0) → 8.0
Differentiation
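The partial derivatives above can be computed numerically for all components at once; a sketch using the central difference:

```python
import numpy as np

def f(x):
    """f(x0, x1) = x0^2 + x1^2"""
    return x[0] ** 2 + x[1] ** 2

def numerical_gradient(f, x, h=1e-4):
    """Central-difference partial derivative for each component of x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x[i]
        x[i] = orig + h
        fxh1 = f(x)          # f(..., x_i + h, ...)
        x[i] = orig - h
        fxh2 = f(x)          # f(..., x_i - h, ...)
        grad[i] = (fxh1 - fxh2) / (2 * h)
        x[i] = orig          # restore the original value
    return grad

print(numerical_gradient(f, np.array([3.0, 4.0])))  # [6. 8.]
```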
12. ■Gradient method
To find the minimum value of the loss function:
• use the gradient.
■ The gradient does not always point toward the minimum, but it is a useful hint.
Move a certain distance in the gradient direction, recompute the gradient, and repeat:
x ← x − η ∂f/∂x, where η is the learning rate (e.g., 0.01, 0.001, …)
Gradient method
Initial value: (−3.0, 4.0)
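Gradient descent on f(x0, x1) = x0² + x1² from the initial value (−3.0, 4.0) can be sketched as follows; the learning rate 0.1 and 100 steps are illustrative choices:

```python
import numpy as np

def f(x):
    return x[0] ** 2 + x[1] ** 2

def numerical_gradient(f, x, h=1e-4):
    """Central-difference gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x[i]
        x[i] = orig + h
        fxh1 = f(x)
        x[i] = orig - h
        fxh2 = f(x)
        grad[i] = (fxh1 - fxh2) / (2 * h)
        x[i] = orig
    return grad

def gradient_descent(f, init_x, lr=0.1, step_num=100):
    """Repeat x <- x - lr * grad f(x) for step_num steps."""
    x = init_x.copy()
    for _ in range(step_num):
        x -= lr * numerical_gradient(f, x)
    return x

x = gradient_descent(f, np.array([-3.0, 4.0]), lr=0.1, step_num=100)
print(x)  # very close to [0, 0], the true minimum
```

With a learning rate that is too large the iterates diverge, and with one that is too small they barely move, which is why values like 0.01 or 0.001 are tuned by hand.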