
DNN and RBM

Back-Propagation Algorithm for Deep Neural Networks and Contrastive Divergence Learning for Restricted Boltzmann Machine

Masayuki Tanaka
Aug. 17, 2015
  • 1. Masayuki Tanaka, Aug. 17, 2015. Back-Propagation Algorithm for Deep Neural Networks and Contrastive Divergence Learning for Restricted Boltzmann Machine.
  • 2. Outline: 1. Examples of Deep Learning; 2. RBM to Deep NN; 3. Deep Neural Network (Deep NN): Back-Propagation (Supervised Learning); 4. Restricted Boltzmann Machine (RBM): Mathematics, Probabilistic Model and Inference Model; Pre-training by Contrastive Divergence Learning (Unsupervised Learning); 5. Inference Model with Distribution. http://bit.ly/dnnicpr2014
  • 3. Deep learning: MNIST (handwritten digits benchmark). Top performance in character recognition.
  • 4. Deep learning: CIFAR-10 (image classification benchmark). Top performance in image classification.
  • 5. Deep learning: GoogLeNet, ILSVRC2014 (convolution, pooling, softmax, and other layers). ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Top performance in visual recognition.
  • 6. Deep learning: "cat neuron". Automatic learning from YouTube videos: one neuron responds to human faces, another to cats. 10,000,000 training samples; three days of learning with 1,000 computers. http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/unsupervised_icml2012.pdf
  • 8. Pros and cons of the deep NN. Until a few years ago: 1. it tended to overfit; 2. the learning information did not reach the lower layers. Remedies: pre-training with RBMs and big data. ImageNet: more than 1.5 M labeled images (http://www.image-net.org/). Labeled Faces in the Wild: more than 10,000 face images (http://vis-www.cs.umass.edu/lfw/). The result is a high-performance network.
  • 9. Outline: 1. Examples of Deep NNs; 2. RBM to Deep NN; 3. Deep Neural Network (Deep NN): Back-Propagation (Supervised Learning); 4. Restricted Boltzmann Machine (RBM): Mathematics, Probabilistic Model and Inference Model; Pre-training by Contrastive Divergence Learning (Unsupervised Learning); 5. Inference Model with Distribution. http://bit.ly/dnnicpr2014
  • 10. Single Layer Neural Network. Single node output: inputs $v_1, v_2, v_3$ with weights $w_1, w_2, w_3$ give the output $h = \sigma(\sum_i w_i v_i + b)$, where $\sigma(x) = 1/(1+e^{-x})$ is the sigmoid function. Multiple node output (single layer NN): $h_j = \sigma(\sum_i w_{ij} v_i + b_j)$, or in vector representation $h = \sigma(W^T v + b)$. This is equivalent to the inference model of the RBM.
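
A minimal NumPy sketch of this single-layer inference (the helper names `sigmoid` and `single_layer_nn` are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def single_layer_nn(v, W, b):
    # h = sigma(W^T v + b), the slide's vector representation
    return sigmoid(W.T @ v + b)

# Example: 3 input nodes, 2 output nodes
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))    # weights W (input x output)
b = np.zeros(2)                # bias b
v = np.array([1.0, 0.0, 1.0])  # input v
print(single_layer_nn(v, W, b))
```
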
  • 11. Weighted sum and activation functions. Separating the two stages, the weighted sum is $n = \sum_i w_i v_i + b$ and the output is $h = f(n)$. Sigmoid function: $f(x) = \sigma(x) = 1/(1+e^{-x})$. Rectified linear unit: $f(x) = \mathrm{ReLU}(x) = 0$ for $x < 0$ and $x$ for $x \ge 0$.
  • 12. Single layer NN to deep NN. The deep NN is built up by stacking single layer NNs (1st NN, 2nd NN, ..., k-th NN). The output of each single layer NN becomes the input of the next single layer NN, and the output data of the deep NN is inferred by iterating this process.
  • 13. Parameter estimation for the deep NN. The deep NN is built up by stacking single layer NNs. The parameters are estimated by a gradient descent algorithm which minimizes the difference between the output data and the teacher data.
  • 14. Parameter estimation for the deep NN. The parameters are estimated by a gradient descent algorithm which minimizes the difference between the output data and the teacher data. Back-propagation: the gradients can be calculated by propagating the information backward.
  • 15. Why is the pre-training necessary? The back-propagation calculates the gradient from the output layer toward the input layer, so the back-propagated information cannot reach the deep layers (1st layer, 2nd layer, ...). Those layers are better learned by unsupervised learning: pre-training with the RBMs.
  • 16. Pre-training with RBMs. The inference of the single layer NN is mathematically equivalent to the inference of the RBM, so each single layer NN (1st NN, 2nd NN, ..., k-th NN) can be pre-trained as an RBM. The RBM parameters are estimated by a maximum likelihood algorithm with the given training data.
  • 17. Pre-training and fine-tuning. Pre-training with RBMs: pre-train the 1st layer RBM on the training data, then use its output data as the training data for pre-training the 2nd layer RBM, and so on. Fine-tuning of the deep NN: copy the pre-trained weights into the deep NN, then run back-propagation with the input data and teacher data.
  • 18. Feature vector extraction. The same pipeline can be used without fine-tuning: pre-train the 1st layer RBM on the training data, use its output data to pre-train the 2nd layer RBM, copy the pre-trained weights, and use the output of the stacked network as a feature vector for the input data.
  • 19. Outline: 1. Examples of Deep NNs; 2. RBM to Deep NN; 3. Deep Neural Network (Deep NN): Back-Propagation (Supervised Learning); 4. Restricted Boltzmann Machine (RBM): Mathematics, Probabilistic Model and Inference Model; Pre-training by Contrastive Divergence Learning (Unsupervised Learning); 5. Inference Model with Distribution. http://bit.ly/dnnicpr2014
  • 20. Back-Propagation Algorithm. Vector representation of the single layer NN: $h = \sigma(W^T v + b)$. The goal of learning: estimate the weights $W$ and biases $b$ of each layer so that the differences between the output data and the teacher data are minimized. Objective function: $I = \frac{1}{2}\sum_k \left(h_k^{(L)} - t_k\right)^2$. Efficient calculation of the gradients $\partial I / \partial W^{(\ell)}$ is important, and the back-propagation algorithm is an efficient algorithm to calculate them.
  • 21. Back-Propagation: gradient of the sigmoid function. Sigmoid function: $\sigma(x) = 1/(1+e^{-x})$. Gradient of the sigmoid function: $\partial\sigma/\partial x = (1-\sigma(x))\,\sigma(x)$. Derivation: $\frac{\partial\sigma}{\partial x} = \frac{\partial}{\partial x}\frac{1}{1+e^{-x}} = -\frac{1}{(1+e^{-x})^2}\times(-e^{-x}) = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{e^{-x}}{1+e^{-x}}\times\frac{1}{1+e^{-x}} = \left(1-\frac{1}{1+e^{-x}}\right)\frac{1}{1+e^{-x}} = (1-\sigma(x))\,\sigma(x)$.
  • 22. Back-Propagation: simplification. Single layer NN: $h_j = \sigma(\sum_i w_{ij} v_i + b_j)$, i.e. $h = \sigma(W^T v + b)$. With the augmented weight $W' = \begin{pmatrix} W \\ b^T \end{pmatrix}$ and input $v' = \begin{pmatrix} v \\ 1 \end{pmatrix}$, we have $h = \sigma(W^T v + b) = \sigma(W'^T v')$. Here and after, we therefore consider only the weight $W$: $h = \sigma(W^T v)$.
  • 23. Two-layer NN. Separate the weighted sum and the activation function: $h = \sigma(n)$, $n = \sum_i w_i v_i$. First layer: $n_j^{(1)} = \sum_i w_{ij}^{(1)} v_i$, $h_j^{(1)} = \sigma(n_j^{(1)})$. Second layer: $n_k^{(2)} = \sum_j w_{jk}^{(2)} h_j^{(1)}$, $h_k^{(2)} = \sigma(n_k^{(2)})$, compared against the teacher data $t_k$.
  • 24. Back-propagation of the two-layer NN. Objective function: $I = \frac{1}{2}\sum_k \left(h_k^{(2)} - t_k\right)^2$. Second layer: $\frac{\partial I}{\partial w_{jk}^{(2)}} = \frac{\partial I}{\partial n_k^{(2)}}\frac{\partial n_k^{(2)}}{\partial w_{jk}^{(2)}} = \delta_k^{(2)} h_j^{(1)}$, where $\delta_k^{(2)} = \frac{\partial I}{\partial n_k^{(2)}} = \frac{\partial I}{\partial h_k^{(2)}}\frac{\partial h_k^{(2)}}{\partial n_k^{(2)}} = (h_k^{(2)} - t_k)\, h_k^{(2)}(1 - h_k^{(2)})$. First layer: $\frac{\partial I}{\partial w_{ij}^{(1)}} = \delta_j^{(1)} v_i$, where $\delta_j^{(1)} = \sum_k \frac{\partial I}{\partial n_k^{(2)}}\frac{\partial n_k^{(2)}}{\partial h_j^{(1)}}\frac{\partial h_j^{(1)}}{\partial n_j^{(1)}} = \left(\sum_k \delta_k^{(2)} w_{jk}^{(2)}\right) h_j^{(1)}(1 - h_j^{(1)})$. This backward recursion is the back-propagation.
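
A minimal NumPy sketch of this two-layer back-propagation (assuming the conventions above; `two_layer_backprop` is an illustrative name):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def two_layer_backprop(v, t, W1, W2):
    # Forward pass: n^(l) = W^(l)T h^(l-1), h^(l) = sigma(n^(l))
    h1 = sigmoid(W1.T @ v)   # h^(1)
    h2 = sigmoid(W2.T @ h1)  # h^(2)
    # Deltas, exactly as on the slide
    d2 = (h2 - t) * h2 * (1.0 - h2)   # delta^(2)
    d1 = (W2 @ d2) * h1 * (1.0 - h1)  # delta^(1)
    # Gradients: dI/dw_ij^(l) = delta_j^(l) times (input of layer l)_i
    return np.outer(v, d1), np.outer(h1, d2)
```
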
  • 25. Back-propagation of an arbitrary layer. Forward: $n^{(\ell)} = W^{(\ell)T} h^{(\ell-1)}$, $h^{(\ell)} = f^{(\ell)}(n^{(\ell)})$. Backward recursion: $\delta^{(\ell)} = \left(W^{(\ell+1)} \delta^{(\ell+1)}\right) \otimes \frac{\partial f^{(\ell)}}{\partial n^{(\ell)}}$ and $\frac{\partial I}{\partial W^{(\ell)}} = \delta^{(\ell)} h^{(\ell-1)T}$, where $\otimes$ is the elementwise product. The same relations hold one layer up and one layer down: $\delta^{(\ell+1)} = (W^{(\ell+2)} \delta^{(\ell+2)}) \otimes \partial f^{(\ell+1)}/\partial n^{(\ell+1)}$ with $\partial I/\partial W^{(\ell+1)} = \delta^{(\ell+1)} h^{(\ell)T}$, and $\delta^{(\ell-1)} = (W^{(\ell)} \delta^{(\ell)}) \otimes \partial f^{(\ell-1)}/\partial n^{(\ell-1)}$ with $\partial I/\partial W^{(\ell-1)} = \delta^{(\ell-1)} h^{(\ell-2)T}$.
  • 26. Tip for debugging the gradient calculation. Objective function: $I(\theta) = \frac{1}{2}\sum_k \left(h_k^{(L)}(v;\theta) - t_k\right)^2$. The gradient calculated by back-propagation, $\partial I/\partial\theta_i$, is computationally efficient but difficult to implement. Definition of the gradient: $\frac{\partial I}{\partial\theta_i} = \lim_{\varepsilon\to 0}\frac{I(\theta+\varepsilon\mathbf{1}_i) - I(\theta)}{\varepsilon}$, where $\mathbf{1}_i$ is the vector whose $i$-th element is 1 and the others are 0. The differential approximation $\Delta_i I = \frac{I(\theta+\varepsilon\mathbf{1}_i) - I(\theta)}{\varepsilon}$ is computationally inefficient but easy to implement. For small $\varepsilon$, check that $\Delta_i I \approx \partial I/\partial\theta_i$.
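
A sketch of this debugging tip, assuming the objective is available as a Python callable `I(theta)` over a NumPy parameter vector (hypothetical names):

```python
import numpy as np

def numerical_gradient(I, theta, eps=1e-5):
    # Differential approximation: Delta_i I = (I(theta + eps*1_i) - I(theta)) / eps
    grad = np.zeros_like(theta)
    base = I(theta)
    for i in range(theta.size):
        theta_eps = theta.copy()
        theta_eps[i] += eps            # theta + eps * 1_i
        grad[i] = (I(theta_eps) - base) / eps
    return grad

# Debug check: the back-propagation gradient should match for small eps, e.g.
# assert np.allclose(backprop_grad, numerical_gradient(I, theta), atol=1e-4)
```
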
  • 27. Stochastic gradient descent algorithm (mini-batch learning). The parameters $\theta$ are learned from the whole training data $\{(v,t)_1, (v,t)_2, \cdots, (v,t)_n, \cdots, (v,t)_N\}$, where $I_n(\theta)$ is the objective function associated with $(v,t)_n$. A mini-batch, e.g. $\{(v,t)_2, (v,t)_7, (v,t)_9, (v,t)_{11}\}$, is sampled and the parameters are updated as $\theta \leftarrow \theta - \eta\,\partial I/\partial\theta$; then the next mini-batch is sampled and the update is repeated. To avoid the overfitting, the parameters are updated with each mini-batch.
  • 28. Practical update of parameters (G. Hinton, A Practical Guide to Training Restricted Boltzmann Machines, 2010). Size of mini-batch: 10 to 100. Learning rate $\eta$: empirically determined. Weight decay rate $\lambda$: 0.01 to 0.00001. Momentum rate $\nu$: 0.9 (initially 0.5). Update rule: $\theta^{(t+1)} = \theta^{(t)} + \Delta\theta^{(t)}$, $\Delta\theta^{(t)} = -\eta\frac{\partial I}{\partial\theta} - \lambda\theta^{(t)} + \nu\Delta\theta^{(t-1)}$ (gradient, weight decay, and momentum terms). Weight decay avoids unnecessary divergence of the weights (especially for the sigmoid function). Momentum avoids unnecessary oscillation of the update amount; an effect similar to the conjugate gradient algorithm is expected.
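
A sketch of this practical update rule (a minimal version; the function name is illustrative and the defaults follow the slide's recommended ranges):

```python
def update_parameters(theta, grad, delta_prev, eta=0.01, lam=1e-4, nu=0.9):
    # Delta_theta^(t) = -eta * dI/dtheta - lambda * theta^(t) + nu * Delta_theta^(t-1)
    delta = -eta * grad - lam * theta + nu * delta_prev
    # theta^(t+1) = theta^(t) + Delta_theta^(t)
    return theta + delta, delta
```

The second return value is kept so that the momentum term $\nu\Delta\theta^{(t-1)}$ can be fed into the next call.
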
  • 29. Outline: 1. Examples of Deep NNs; 2. RBM to Deep NN; 3. Deep Neural Network (Deep NN): Back-Propagation (Supervised Learning); 4. Restricted Boltzmann Machine (RBM): Mathematics, Probabilistic Model and Inference Model; Pre-training by Contrastive Divergence Learning (Unsupervised Learning); 5. Inference Model with Distribution. http://bit.ly/dnnicpr2014
  • 30. Restricted Boltzmann Machines. A Boltzmann machine is a probabilistic model represented by an undirected graph (nodes and edges); here, the binary state $\{0,1\}$ is considered as the state of the nodes. Unrestricted vs. restricted: in an (unrestricted) Boltzmann machine, every node is connected to every other node; in a restricted Boltzmann machine (RBM), with a visible layer $v$ and a hidden layer $h$, there is no edge within a layer, which helps the analysis.
  • 31. RBM: probabilistic and energy models. Visible layer $v \in \{0,1\}$, hidden layer $h \in \{0,1\}$. Probabilistic model: $P(v,h;\theta) = \frac{1}{Z(\theta)}\exp(-E(v,h;\theta))$, with RBM parameters $\theta = (W, b, c)$: weight $W$, biases $b$ and $c$. Energy model: $E(v,h;\theta) = -\sum_{i,j} v_i w_{ij} h_j - \sum_j b_j h_j - \sum_i c_i v_i = -v^T W h - b^T h - c^T v$. Partition function: $Z(\theta) = \sum_{v,h \in \{0,1\}} \exp(-E(v,h;\theta))$.
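
A sketch of the energy and the brute-force partition function (the exhaustive sum over all binary states is only feasible for a tiny RBM and is shown purely for illustration):

```python
import numpy as np
from itertools import product

def energy(v, h, W, b, c):
    # E(v, h) = -v^T W h - b^T h - c^T v
    return -(v @ W @ h) - (b @ h) - (c @ v)

def partition(W, b, c):
    # Z = sum over all binary v and h of exp(-E(v, h))
    nv, nh = W.shape
    return sum(np.exp(-energy(np.array(v), np.array(h), W, b, c))
               for v in product([0, 1], repeat=nv)
               for h in product([0, 1], repeat=nh))
```
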
  • 32. RBM: conditional probability model (inference model). From $P(v,h;\theta) = \frac{1}{Z(\theta)}\exp(-E(v,h;\theta))$ with $E(v,h;\theta) = -v^T W h - b^T h - c^T v$: $P(h|v;\theta) = \frac{P(v,h;\theta)}{\sum_{h \in \{0,1\}} P(v,h;\theta)} = \frac{\exp(\sum_{i,j} v_i w_{ij} h_j + \sum_j b_j h_j + \sum_i c_i v_i)}{\sum_{h \in \{0,1\}} \exp(\sum_{i,j} v_i w_{ij} h_j + \sum_j b_j h_j + \sum_i c_i v_i)} = \frac{\prod_i \exp(c_i v_i) \prod_j \exp(\sum_i v_i w_{ij} h_j + b_j h_j)}{\prod_i \exp(c_i v_i) \prod_j \sum_{h_j \in \{0,1\}} \exp(\sum_i v_i w_{ij} h_j + b_j h_j)} = \prod_j \frac{\exp(\sum_i v_i w_{ij} h_j + b_j h_j)}{\sum_{h_j \in \{0,1\}} \exp(\sum_i v_i w_{ij} h_j + b_j h_j)} = \prod_j P(h_j|v;\theta)$. The conditional probabilities of the nodes are independent.
  • 33. RBM: conditional probability model (inference model), continued. Evaluating $P(h_j|v;\theta) = \frac{\exp(\sum_i v_i w_{ij} h_j + b_j h_j)}{\sum_{h_j \in \{0,1\}} \exp(\sum_i v_i w_{ij} h_j + b_j h_j)}$ at $h_j = 1$: $P(h_j = 1|v;\theta) = \frac{\exp(\sum_i v_i w_{ij} + b_j)}{\exp(0) + \exp(\sum_i v_i w_{ij} + b_j)} = \frac{1}{1 + \exp(-(\sum_i v_i w_{ij} + b_j))} = \sigma\left(\sum_i v_i w_{ij} + b_j\right)$, with $\sigma(x) = 1/(1+e^{-x})$. In vector form: $P(h = \mathbf{1}|v;\theta) = \sigma(W^T v + b)$ and, by symmetry, $P(v = \mathbf{1}|h;\theta) = \sigma(W h + c)$, which matches the vector representation of the single layer NN, $h = \sigma(W^T v + b)$.
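
A sketch of both conditional (inference) directions of the Bernoulli-Bernoulli RBM (illustrative helper names):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b):
    # P(h_j = 1 | v) = sigma(sum_i v_i w_ij + b_j)
    return sigmoid(W.T @ v + b)

def p_v_given_h(h, W, c):
    # P(v_i = 1 | h) = sigma(sum_j w_ij h_j + c_i)
    return sigmoid(W @ h + c)
```
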
  • 34. Gaussian-Bernoulli RBM. The Bernoulli-Bernoulli RBM has a binary visible layer $v \in \{0,1\}$ and a binary hidden layer $h \in \{0,1\}$; the Gaussian-Bernoulli RBM replaces the visible layer with a Gaussian distribution $N(v;\mu,s^2)$ while keeping the binary hidden layer. Probabilistic model: $P(v,h) = \frac{1}{Z(\theta)}\exp(-E(v,h))$. Energy model: $E(v,h) = \frac{1}{2s^2}\sum_i (v_i - c_i)^2 - \frac{1}{s}\sum_{i,j} v_i w_{ij} h_j - \sum_j b_j h_j$. Inference model (conditional probability): $P(h = \mathbf{1}|v) = \sigma\left(\frac{1}{s} W^T v + b\right)$, $P(v|h) = N(v;\, s W h + c,\, s^2 I)$.
  • 35. Outline: 1. Examples of Deep NNs; 2. RBM to Deep NN; 3. Deep Neural Network (Deep NN): Back-Propagation (Supervised Learning); 4. Restricted Boltzmann Machine (RBM): Mathematics, Probabilistic Model and Inference Model; Pre-training by Contrastive Divergence Learning (Unsupervised Learning); 5. Inference Model with Distribution. http://bit.ly/dnnicpr2014
  • 36. RBM: Contrastive Divergence learning. Iterative process of the CD learning: $h^{(0)} = \sigma(W^T v^{(0)} + b)$, $v^{(1)} = \sigma(W h^{(0)} + c)$, $h^{(1)} = \sigma(W^T v^{(1)} + b)$, then $W \leftarrow W + \varepsilon\,\Delta W$ with $\Delta W = \frac{1}{N}\sum_n \left(v_n^{(0)} h_n^{(0)T} - v_n^{(1)} h_n^{(1)T}\right)$. The CD learning can be considered an approximate maximum likelihood estimation with the given training data. Momentum and weight decay are also applied.
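
A minimal sketch of one CD-1 step over a mini-batch `V0` of visible rows (shape N x visible). The weight update follows the slide; the bias updates use the same data-minus-model pattern, which is an assumption since the slide only shows $\Delta W$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V0, W, b, c, eps=0.1):
    N = V0.shape[0]
    H0 = sigmoid(V0 @ W + b)    # h(0) = sigma(W^T v(0) + b), row-wise
    V1 = sigmoid(H0 @ W.T + c)  # v(1) = sigma(W h(0) + c)
    H1 = sigmoid(V1 @ W + b)    # h(1) = sigma(W^T v(1) + b)
    dW = (V0.T @ H0 - V1.T @ H1) / N      # (1/N) sum_n v(0)h(0)^T - v(1)h(1)^T
    W = W + eps * dW                      # likelihood ascent
    b = b + eps * (H0 - H1).mean(axis=0)  # assumed hidden-bias update
    c = c + eps * (V0 - V1).mean(axis=0)  # assumed visible-bias update
    return W, b, c
```

Using the probabilities directly (rather than sampled binary states) follows the recommendation on slide 47.
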
  • 37. RBM: outline of the CD learning. Parameters $\theta$: weight $W$, biases $b$ and $c$; $P(v,h;\theta) = \frac{1}{Z(\theta)}\exp(-E(v,h;\theta))$, $E(v,h;\theta) = -v^T W h - b^T h - c^T v$. Outline of the Contrastive Divergence learning: maximum likelihood estimation with the given training data $\{v_n\}$; an (approximated) EM algorithm is applied to handle the unobserved hidden data; Gibbs sampling is applied to evaluate the partition-function term; and the Gibbs sampling is approximated by a single sampling step.
  • 38. CD learning: maximum likelihood. The RBM is a probabilistic model, so maximum likelihood gives the parameters for the given training data: the visible data are given, the hidden data are not, so the hidden data are marginalized out. Log likelihood for training data $\{v_n\}$: $L(\theta) = \sum_n L_n(\theta) = \sum_n \log P(v_n;\theta) = \sum_n \log \sum_{h \in \{0,1\}} P(v_n, h;\theta)$. The optimization is performed by the EM algorithm.
  • 39. EM algorithm (reference: 金谷健一 (Kenichi Kanatani), これなら分かる最適化数学). Log likelihood for training data $\{v_n\}$: $L_n(\theta) = \log P(v_n;\theta)$. The EM algorithm monotonically increases the log likelihood: 1. initialize the parameter $\theta$ by $\theta^0$ and set $\tau = 0$; 2. evaluate $Q_\tau(\theta) = E_{h \sim P(h|v_n;\theta^\tau)}[\log P(v_n,h;\theta)]$ (E-step); 3. find the $\theta^{\tau+1}$ which maximizes $Q_\tau(\theta)$ (M-step); 4. set $\tau \leftarrow \tau + 1$ and return to step 2; iterate until convergence. In the CD learning, the M-step is approximated.
  • 40. Evaluation function and derivatives. $Q_\tau(\theta) = E_{h \sim P(h|v_n;\theta^\tau)}[\log P(v_n,h;\theta)] = E_{h \sim P(h|v_n;\theta^\tau)}\left[\log \frac{\exp(-E(v_n,h;\theta))}{Z(\theta)}\right] = E_{h \sim P(h|v_n;\theta^\tau)}[-E(v_n,h;\theta)] - \log Z(\theta)$. Its derivative splits into a data term and a model term: $\frac{\partial Q_\tau(\theta)}{\partial\theta} = E_{h \sim P(h|v_n;\theta^\tau)}\left[-\frac{\partial}{\partial\theta} E(v_n,h;\theta)\right] - \frac{\partial}{\partial\theta}\log Z(\theta) = E_{h \sim P(h|v_n;\theta^\tau)}\left[-\frac{\partial}{\partial\theta} E(v_n,h;\theta)\right] - E_{v,h \sim P(v,h;\theta)}\left[-\frac{\partial}{\partial\theta} E(v,h;\theta)\right]$.
  • 41. Derivative of the partition function (model term). For a probability function $P(x;\theta) = \frac{1}{Z(\theta)} f(x;\theta)$ with partition function $Z(\theta) = \int f(x;\theta)\,dx$ and arbitrary function $f(x;\theta)$: $\frac{\partial\log Z(\theta)}{\partial\theta} = \frac{1}{Z(\theta)}\frac{\partial Z(\theta)}{\partial\theta}$ (derivative of the log function) $= \frac{1}{Z(\theta)}\int \frac{\partial f(x;\theta)}{\partial\theta}\,dx$ (derivative operator moved into the integral) $= \frac{1}{Z(\theta)}\int f(x;\theta)\frac{\partial\log f(x;\theta)}{\partial\theta}\,dx$ (derivative of the log function again) $= \int P(x;\theta)\frac{\partial\log f(x;\theta)}{\partial\theta}\,dx = E_{x \sim P(x;\theta)}\left[\frac{\partial\log f(x;\theta)}{\partial\theta}\right]$ (definition of the expectation).
  • 42. Evaluation function and derivatives. With $E(v,h;\theta) = -v^T W h - b^T h - c^T v$, the derivative with respect to the weight is $\frac{\partial}{\partial W} E(v,h;\theta) = -v h^T$, so $\frac{\partial Q_\tau(\theta^\tau)}{\partial W} = E_{h \sim P(h|v_n;\theta^\tau)}[v_n h^T] - E_{v,h \sim P(v,h;\theta^\tau)}[v h^T]$. The data term is feasible; the model term is infeasible and must be approximated by Gibbs sampling.
  • 43. Derivatives of the evaluation function (data term). Using the inference of the RBM, $P(h = \mathbf{1}|v;\theta) = \sigma(W^T v + b)$, the data term can be evaluated in closed form: $E_{h \sim P(h|v_n;\theta^\tau)}[v_n h^T] = v_n\, E_{h \sim P(h|v_n;\theta^\tau)}[h]^T = v_n \left(\mathbf{0}\cdot P(h=\mathbf{0}|v_n;\theta) + \mathbf{1}\cdot P(h=\mathbf{1}|v_n;\theta)\right)^T = v_n\, \sigma(W^T v_n + b)^T$.
  • 44. Expectation approximation by the Monte Carlo method (model term). The model term is approximated by independent samples: $E_{v,h \sim P(v,h;\theta^\tau)}[v h^T] \approx \frac{1}{N}\sum_i v^{(i)} h^{(i)T}$ with $(v^{(i)}, h^{(i)}) \sim P(v,h;\theta^\tau)$. How can we get such samples? By Gibbs sampling.
  • 45. Gibbs sampling. Using the inference of the RBM, $P(h = \mathbf{1}|v;\theta) = \sigma(W^T v + b)$ and $P(v = \mathbf{1}|h;\theta) = \sigma(W h + c)$: 1. initialize $v$; 2. sample $h$ with $P(h|v)$; 3. sample $v$ with $P(v|h)$; 4. iterate steps 2 and 3. The resulting chain $v^{(0)}, h^{(0)}, v^{(1)}, h^{(1)}, \ldots, v^{(\infty)}, h^{(\infty)}$ provides the samples for $E_{v,h \sim P(v,h;\theta^\tau)}[v h^T] \approx \frac{1}{N}\sum_i v^{(i)} h^{(i)T}$.
  • 46. Approximated evaluation of the model term: stop after just one step. $h^{(0)} = \sigma(W^T v^{(0)} + b)$, $v^{(1)} = \sigma(W h^{(0)} + c)$, $h^{(1)} = \sigma(W^T v^{(1)} + b)$. Data term: $E_{h \sim P(h|v_n;\theta^\tau)}[v_n h^T] = v^{(0)} h^{(0)T}$; model term: $E_{v,h \sim P(v,h;\theta^\tau)}[v h^T] \approx v^{(1)} h^{(1)T}$.
  • 47. Probabilities or states? The inference of the RBM, $P(h = \mathbf{1}|v;\theta) = \sigma(W^T v + b)$ and $P(v = \mathbf{1}|h;\theta) = \sigma(W h + c)$, gives probabilities. In the Gibbs sampling, should we sample binary states with those probabilities, or simply use the probabilities themselves? Hinton recommends using the probabilities (G. Hinton, A Practical Guide to Training Restricted Boltzmann Machines, 2010).
  • 48. RBM: Contrastive Divergence learning (summary). Iterative process of the CD learning: $h^{(0)} = \sigma(W^T v^{(0)} + b)$, $v^{(1)} = \sigma(W h^{(0)} + c)$, $h^{(1)} = \sigma(W^T v^{(1)} + b)$, then $W \leftarrow W + \varepsilon\,\Delta W$ with $\Delta W = \frac{1}{N}\sum_n \left(v_n^{(0)} h_n^{(0)T} - v_n^{(1)} h_n^{(1)T}\right)$. The CD learning can be considered an approximate maximum likelihood estimation with the given training data. Momentum and weight decay are also applied.
  • 49. Pre-training for the stacked RBMs. Pre-train the 1st layer RBM on the training data, copy its weights, and use its output data as the training data for pre-training the 2nd layer RBM; repeat for the upper layers.
  • 50. Outline: 1. Examples of Deep NNs; 2. RBM to Deep NN; 3. Deep Neural Network (Deep NN): Back-Propagation (Supervised Learning); 4. Restricted Boltzmann Machine (RBM): Mathematics, Probabilistic Model and Inference Model; Pre-training by Contrastive Divergence Learning (Unsupervised Learning); 5. Inference Model with Distribution. http://bit.ly/dnnicpr2014
  • 51. Drop-out. The nodes are randomly dropped out for each mini-batch, and the output of a dropped node is zero. A 50% drop-out rate is recommended. The drop-out is expected to behave similarly to ensemble learning and is effective to avoid the overfitting. (G. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.)
  • 52. Ensemble learning and drop-out. Ensemble learning: the integration of multiple weak learners outperforms a single learner, $h(v) = \frac{1}{K}\left(h_1(v) + h_2(v) + h_3(v) + \cdots + h_K(v)\right)$. The drop-out is expected to have an effect similar to ensemble learning.
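
A sketch of this ensemble view: average the outputs of $K$ independently dropped-out forward passes of a single-layer NN (illustrative names; 50% drop-out on the input nodes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dropout_ensemble(v, W, b, K=100, rate=0.5, seed=0):
    rng = np.random.default_rng(seed)
    outs = []
    for _ in range(K):
        keep = rng.random(v.shape) >= rate    # each node dropped with prob. rate
        outs.append(sigmoid(W.T @ (keep * v) + b))
    return np.mean(outs, axis=0)              # h(v) = (1/K) sum_k h_k(v)
```
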
  • 53. Fast drop-out learning (S.I. Wang and C.D. Manning, Fast dropout training, ICML 2013). Inference of the standard NN: $h = \sigma(n)$, $n = \sum_i w_i v_i + b$. Inference of the drop-out NN: $h = E[\sigma(\chi)]$, $\chi = \sum_i \xi_i w_i v_i + b$, where $\chi$ and the $\xi_i$ are stochastic variables with $\xi_i \in \{0,1\}$ and $P(\xi_i = 0) = P(\xi_i = 1) = 0.5$.
  • 54. Fast drop-out learning (S.I. Wang and C.D. Manning, Fast dropout training, ICML 2013). Approximate the distribution $P(\chi)$ by a Gaussian $N(\chi;\mu,s^2)$ with mean $\mu$ and variance $s^2$: $h = E[\sigma(\chi)] \approx \int_{-\infty}^{\infty} N(\chi;\mu,s^2)\,\sigma(\chi)\,d\chi \approx \sigma\left(\mu/\sqrt{1+\pi s^2/8}\right)$. This closed-form approximation enables fast calculation without sampling.
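
A sketch of this closed-form approximation for a single node. Under 50% drop-out, $\xi_i$ is Bernoulli with $E[\xi_i] = 0.5$ and $\mathrm{Var}[\xi_i] = 0.25$, so the mean and variance of $\chi$ follow directly (this derivation of $\mu$ and $s^2$ is an assumption consistent with the slide, which only states the final formula):

```python
import numpy as np

def fast_dropout_output(v, w, b, p_keep=0.5):
    a = w * v                                      # elementwise w_i v_i
    mu = p_keep * a.sum() + b                      # mean of chi
    s2 = p_keep * (1.0 - p_keep) * (a ** 2).sum()  # variance of chi
    # h = E[sigma(chi)] ~= sigma(mu / sqrt(1 + pi * s2 / 8))
    return 1.0 / (1.0 + np.exp(-mu / np.sqrt(1.0 + np.pi * s2 / 8.0)))
```
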
  • 55. Inference based on the RBM model (M. Tanaka and M. Okutomi, A Novel Inference of a Restricted Boltzmann Machine, ICPR 2014). Inference of the standard NN: $h = \sigma(n)$, $n = \sum_i w_i v_i + b$. Inference based on the RBM model: the inputs $h_i$ are stochastic, $P(h = \mathbf{1}|v;\theta) = \sigma(W^T v + b)$, so the output is $H = E[\sigma(\chi)]$ with $\chi = \sum_i w_i h_i + b$, where $\chi$ and the $h_i$ are stochastic variables. Approximating $P(\chi)$ by a Gaussian $N(\chi;\mu,s^2)$ with mean $\mu$ and variance $s^2$ gives the same closed form, $H = E[\sigma(\chi)] \approx \int_{-\infty}^{\infty} N(\chi;\mu,s^2)\,\sigma(\chi)\,d\chi \approx \sigma\left(\mu/\sqrt{1+\pi s^2/8}\right)$.
  • 56. Inference based on the RBM model (M. Tanaka and M. Okutomi, A Novel Inference of a Restricted Boltzmann Machine, ICPR 2014): it improves the performance.
  • 57. Inference based on the RBM model.
  • 58. Outline: 1. Examples of Deep NNs; 2. RBM to Deep NN; 3. Deep Neural Network (Deep NN): Back-Propagation (Supervised Learning); 4. Restricted Boltzmann Machine (RBM): Mathematics, Probabilistic Model and Inference Model; Pre-training by Contrastive Divergence Learning (Unsupervised Learning); 5. Inference Model with Distribution. http://bit.ly/dnnicpr2014