Contrastive Divergence Learning
Geoffrey E. Hinton
A discussion led by Oliver Woodford
Contents
• Maximum Likelihood learning
• Gradient descent based approach
• Markov Chain Monte Carlo sampling
• Contrastive Divergence
• Further topics for discussion:
– Result biasing of Contrastive Divergence
– Product of Experts
– High-dimensional data considerations
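Maximum Likelihood learning
• Given:
– Probability model $p(x; \Theta) = \frac{1}{Z(\Theta)} f(x; \Theta)$
• $\Theta$ - model parameters
• $Z(\Theta)$ - the partition function, defined as $Z(\Theta) = \int f(x; \Theta)\,dx$
– Training data $X = \{x_k\}_{k=1}^{K}$
• Aim:
– Find $\Theta$ that maximizes the likelihood of the training data: $p(X; \Theta) = \prod_{k=1}^{K} \frac{1}{Z(\Theta)} f(x_k; \Theta)$
– Or, that minimizes the negative log of the likelihood, averaged over the $K$ data points: $E(X; \Theta) = \log Z(\Theta) - \frac{1}{K} \sum_{k=1}^{K} \log f(x_k; \Theta)$
• Toy example (known result):
– $f(x; \Theta) = \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
– $\Theta = \{\mu, \sigma\}$
– $Z(\Theta) = \sigma\sqrt{2\pi}$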
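As a rough illustration of the quantities defined on this slide, the sketch below evaluates the toy model in plain Python. Here E is taken as the negative log-likelihood divided by K; that scaling is an assumption of this sketch and does not move the minimum. The function names and the four-point data set are hypothetical, chosen only for the example.

```python
import math

def f(x, mu, sigma):
    """Unnormalized toy model f(x; Theta) = exp(-(x - mu)^2 / (2 sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def partition(sigma):
    """Known partition function of the toy model: Z(Theta) = sigma * sqrt(2 pi)."""
    return sigma * math.sqrt(2.0 * math.pi)

def energy(data, mu, sigma):
    """Average negative log-likelihood: log Z - (1/K) * sum_k log f(x_k)."""
    k = len(data)
    return math.log(partition(sigma)) - sum(math.log(f(x, mu, sigma)) for x in data) / k

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical training data
e_good = energy(data, mu=2.5, sigma=1.0)  # model centred on the data
e_bad = energy(data, mu=0.0, sigma=1.0)   # model centred away from the data
print(e_good, e_bad)
```

A model centred on the data should give the lower energy, which is what minimizing E is meant to exploit.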
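Maximum Likelihood learning
• Method:
– $\frac{\partial E(X;\Theta)}{\partial \Theta} = 0$ at the minimum
$$\frac{\partial E(X;\Theta)}{\partial \Theta} = \frac{\partial \log Z(\Theta)}{\partial \Theta} - \frac{1}{K} \sum_{i=1}^{K} \frac{\partial \log f(x_i;\Theta)}{\partial \Theta} = \frac{\partial \log Z(\Theta)}{\partial \Theta} - \left\langle \frac{\partial \log f(x;\Theta)}{\partial \Theta} \right\rangle_X$$
$\langle \cdot \rangle_X$ is the expectation of $\cdot$ given the data distribution $X$.
• For the toy example:
$$\frac{\partial E(X;\Theta)}{\partial \Theta} = \frac{\partial \log(\sigma\sqrt{2\pi})}{\partial \Theta} + \left\langle \frac{\partial}{\partial \Theta} \frac{(x-\mu)^2}{2\sigma^2} \right\rangle_X$$
$$\frac{\partial E(X;\Theta)}{\partial \mu} = -\left\langle \frac{x-\mu}{\sigma^2} \right\rangle_X = 0 \;\Rightarrow\; \mu = \langle x \rangle_X$$
$$\frac{\partial E(X;\Theta)}{\partial \sigma} = \frac{1}{\sigma} - \left\langle \frac{(x-\mu)^2}{\sigma^3} \right\rangle_X = 0 \;\Rightarrow\; \sigma = \sqrt{\langle (x-\mu)^2 \rangle_X}$$
– Let’s assume that there is no linear solution…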
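The closed-form result for the toy Gaussian can be checked numerically. The sketch below computes the sample mean and the root-mean-square deviation, then confirms that both partial derivatives of the averaged negative log-likelihood vanish there. The helper names and the data are hypothetical.

```python
import math

def ml_estimate(data):
    """Closed-form ML estimate: mu = <x>_X, sigma = sqrt(<(x - mu)^2>_X)."""
    mu = sum(data) / len(data)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
    return mu, sigma

def grad_energy(data, mu, sigma):
    """Gradient of the averaged negative log-likelihood of the toy Gaussian."""
    k = len(data)
    d_mu = -sum((x - mu) / sigma ** 2 for x in data) / k
    d_sigma = 1.0 / sigma - sum((x - mu) ** 2 / sigma ** 3 for x in data) / k
    return d_mu, d_sigma

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical training data
mu_hat, sigma_hat = ml_estimate(data)
g_mu, g_sigma = grad_energy(data, mu_hat, sigma_hat)
print(mu_hat, sigma_hat, g_mu, g_sigma)
```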
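Gradient descent-based approach
– Move a fixed step size, $\eta$, in the direction of steepest descent. (Not line search – see why later).
– This gives the following parameter update equation:
$$\Theta_{t+1} = \Theta_t - \eta \frac{\partial E(X;\Theta_t)}{\partial \Theta_t} = \Theta_t - \eta \left( \frac{\partial \log Z(\Theta_t)}{\partial \Theta_t} - \left\langle \frac{\partial \log f(x;\Theta_t)}{\partial \Theta_t} \right\rangle_X \right)$$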
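The update rule can be tried on the toy Gaussian, where $\partial \log Z / \partial \Theta$ is known in closed form ($0$ for $\mu$ and $1/\sigma$ for $\sigma$). The following is a minimal sketch, not the presenter's code; the step size, iteration count, starting point, and data are arbitrary assumptions.

```python
import math

def grad_energy(data, mu, sigma):
    """dE/dTheta = dlogZ/dTheta - <dlog f/dTheta>_X for the toy Gaussian."""
    k = len(data)
    d_mu = 0.0 - sum((x - mu) / sigma ** 2 for x in data) / k
    d_sigma = 1.0 / sigma - sum((x - mu) ** 2 / sigma ** 3 for x in data) / k
    return d_mu, d_sigma

def gradient_descent(data, mu, sigma, eta=0.1, steps=2000):
    """Fixed-step gradient descent: Theta_{t+1} = Theta_t - eta * dE/dTheta_t."""
    for _ in range(steps):
        d_mu, d_sigma = grad_energy(data, mu, sigma)
        mu -= eta * d_mu
        sigma -= eta * d_sigma
    return mu, sigma

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical training data
mu_gd, sigma_gd = gradient_descent(data, mu=0.0, sigma=3.0)
print(mu_gd, sigma_gd)
```

With these settings the iterates should settle near the closed-form estimate (sample mean and root-mean-square deviation of the data).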
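Gradient descent-based approach
– Recall $Z(\Theta) = \int f(x; \Theta)\,dx$. Sometimes this integral will be algebraically intractable.
– This means we can calculate neither $E(X; \Theta)$ nor $\frac{\partial \log Z(\Theta)}{\partial \Theta}$ (hence no line search).
– However, with some clever substitution…
$$\frac{\partial \log Z(\Theta)}{\partial \Theta} = \frac{1}{Z(\Theta)} \frac{\partial Z(\Theta)}{\partial \Theta} = \frac{1}{Z(\Theta)} \frac{\partial}{\partial \Theta} \int f(x;\Theta)\,dx = \frac{1}{Z(\Theta)} \int \frac{\partial f(x;\Theta)}{\partial \Theta}\,dx$$
$$= \frac{1}{Z(\Theta)} \int f(x;\Theta) \frac{\partial \log f(x;\Theta)}{\partial \Theta}\,dx = \int p(x;\Theta) \frac{\partial \log f(x;\Theta)}{\partial \Theta}\,dx = \left\langle \frac{\partial \log f(x;\Theta)}{\partial \Theta} \right\rangle_{p(x;\Theta)}$$
– so
$$\Theta_{t+1} = \Theta_t - \eta \left( \left\langle \frac{\partial \log f(x;\Theta_t)}{\partial \Theta_t} \right\rangle_{p(x;\Theta_t)} - \left\langle \frac{\partial \log f(x;\Theta_t)}{\partial \Theta_t} \right\rangle_X \right)$$
where $\left\langle \frac{\partial \log f(x;\Theta)}{\partial \Theta} \right\rangle_{p(x;\Theta)}$ can be estimated numerically.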
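The identity $\partial \log Z / \partial \Theta = \langle \partial \log f / \partial \Theta \rangle_{p(x;\Theta)}$ can be sanity-checked on the toy Gaussian, where the left-hand side is known exactly ($0$ for $\mu$, $1/\sigma$ for $\sigma$). The sketch below estimates the right-hand side by Monte Carlo. It samples from $p$ directly with `random.gauss`, which is only possible because the toy model is a Gaussian; the parameter values and sample count are assumptions.

```python
import random

def grad_log_f(x, mu, sigma):
    """Partial derivatives of log f(x; Theta) = -(x - mu)^2 / (2 sigma^2)."""
    return (x - mu) / sigma ** 2, (x - mu) ** 2 / sigma ** 3

def expected_grad_log_f(mu, sigma, n_samples, rng):
    """Monte Carlo estimate of <dlog f/dTheta> under p(x; Theta)."""
    sum_mu = 0.0
    sum_sigma = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, sigma)  # direct sampling, valid for the toy model only
        d_mu, d_sigma = grad_log_f(x, mu, sigma)
        sum_mu += d_mu
        sum_sigma += d_sigma
    return sum_mu / n_samples, sum_sigma / n_samples

mu, sigma = 1.0, 2.0
est_mu, est_sigma = expected_grad_log_f(mu, sigma, 100000, random.Random(0))
exact_mu, exact_sigma = 0.0, 1.0 / sigma  # derivatives of log(sigma * sqrt(2 pi))
print(est_mu, exact_mu, est_sigma, exact_sigma)
```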
Markov Chain Monte Carlo sampling
– To estimate $\left\langle \frac{\partial \log f(x;\Theta)}{\partial \Theta} \right\rangle_{p(x;\Theta)}$ we must draw samples from $p(x; \Theta)$.
– Since $Z(\Theta)$ is unknown, we cannot draw samples randomly from a cumulative distribution curve.
– Markov Chain Monte Carlo (MCMC) methods turn random samples into samples from a proposed distribution, without knowing $Z(\Theta)$.
– Metropolis algorithm:
• Perturb samples e.g. $x'_k = x_k + \mathrm{randn}(\mathrm{size}(x_k))$
• Reject $x'_k$ if $\frac{p(x'_k;\Theta)}{p(x_k;\Theta)} < \mathrm{rand}(1)$ (the ratio equals $f(x'_k;\Theta) / f(x_k;\Theta)$, so $Z(\Theta)$ cancels)
• Repeat cycle for all samples until stabilization of the distribution.
– Stabilization takes many cycles, and there is no accurate criterion for determining when it has occurred.
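The Metropolis cycle described above can be sketched in Python for a one-dimensional model. This version uses the toy Gaussian as the target and works with $\log f$ so that the acceptance ratio never divides by zero; the function names, starting samples, and number of cycles are assumptions made for illustration.

```python
import math
import random

def log_f(x, mu, sigma):
    """log f(x; Theta) for the toy model; Z(Theta) is never needed."""
    return -((x - mu) ** 2) / (2.0 * sigma ** 2)

def metropolis_cycle(samples, mu, sigma, rng):
    """One Metropolis cycle over all samples."""
    out = []
    for x in samples:
        x_new = x + rng.gauss(0.0, 1.0)  # perturb: x' = x + randn
        log_ratio = log_f(x_new, mu, sigma) - log_f(x, mu, sigma)
        ratio = math.exp(min(0.0, log_ratio))  # p(x')/p(x), capped at 1
        out.append(x if ratio < rng.random() else x_new)  # reject if ratio < rand
    return out

rng = random.Random(0)
samples = [0.0] * 2000  # arbitrary starting samples
for _ in range(200):
    samples = metropolis_cycle(samples, mu=1.0, sigma=2.0, rng=rng)

sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((x - sample_mean) ** 2 for x in samples) / len(samples))
print(sample_mean, sample_std)
```

Capping the ratio at 1 does not change the accept/reject decision, since a ratio above 1 is always accepted; it only avoids overflow in `math.exp`.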
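Markov Chain Monte Carlo sampling
– Let us use the training data, $X$, as the starting point for our MCMC sampling.
– Our parameter update equation becomes:
$$\Theta_{t+1} = \Theta_t - \eta \left( \left\langle \frac{\partial \log f(x;\Theta_t)}{\partial \Theta_t} \right\rangle_{X^\infty_{\Theta_t}} - \left\langle \frac{\partial \log f(x;\Theta_t)}{\partial \Theta_t} \right\rangle_{X^0_{\Theta_t}} \right)$$
Notation: $X^0_\Theta$ - training data, $X^n_\Theta$ - training data after $n$ cycles of MCMC, $X^\infty_\Theta$ - samples from proposed distribution with parameters $\Theta$.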
Contrastive divergence
– Let us make the number of MCMC cycles per iteration small, say even 1.
– Our parameter update equation is now:
$$\Theta_{t+1} = \Theta_t - \eta \left( \left\langle \frac{\partial \log f(x;\Theta_t)}{\partial \Theta_t} \right\rangle_{X^1_{\Theta_t}} - \left\langle \frac{\partial \log f(x;\Theta_t)}{\partial \Theta_t} \right\rangle_{X^0_{\Theta_t}} \right)$$
– Intuition: 1 MCMC cycle is enough to move the data from the target distribution towards the proposed distribution, and so suggest which direction the proposed distribution should move to better model the training data.
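Putting the pieces together, the following is a minimal sketch of the one-cycle update on the toy Gaussian, assuming a single Metropolis step per data point as the MCMC cycle. It is an illustration under stated assumptions, not the presenter's implementation: the learning rate, iteration count, synthetic data, starting point, and the lower bound on sigma are all choices made for this example.

```python
import math
import random

def log_f(x, mu, sigma):
    """log f(x; Theta) for the toy Gaussian model."""
    return -((x - mu) ** 2) / (2.0 * sigma ** 2)

def metropolis_step(x, mu, sigma, rng):
    """One Metropolis step for one sample (perturb, then accept or reject)."""
    x_new = x + rng.gauss(0.0, 1.0)
    log_ratio = log_f(x_new, mu, sigma) - log_f(x, mu, sigma)
    ratio = math.exp(min(0.0, log_ratio))
    return x if ratio < rng.random() else x_new

def mean_grad_log_f(samples, mu, sigma):
    """<dlog f/dmu> and <dlog f/dsigma> averaged over the given samples."""
    k = len(samples)
    d_mu = sum((x - mu) / sigma ** 2 for x in samples) / k
    d_sigma = sum((x - mu) ** 2 / sigma ** 3 for x in samples) / k
    return d_mu, d_sigma

def cd1_train(data, mu, sigma, eta, iterations, rng):
    """Theta <- Theta - eta * (<dlog f>_{X^1} - <dlog f>_{X^0})."""
    for _ in range(iterations):
        x1 = [metropolis_step(x, mu, sigma, rng) for x in data]  # X^1 from X^0
        g1 = mean_grad_log_f(x1, mu, sigma)
        g0 = mean_grad_log_f(data, mu, sigma)
        mu -= eta * (g1[0] - g0[0])
        sigma -= eta * (g1[1] - g0[1])
        sigma = max(sigma, 1e-3)  # safeguard added for this sketch only
    return mu, sigma

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]  # synthetic training data
mu_cd, sigma_cd = cd1_train(data, mu=0.5, sigma=2.0, eta=0.5, iterations=1500, rng=rng)

# Closed-form ML estimate of the same data, for comparison.
mu_ml = sum(data) / len(data)
sigma_ml = math.sqrt(sum((x - mu_ml) ** 2 for x in data) / len(data))
print(mu_cd, mu_ml, sigma_cd, sigma_ml)
```

Because each update uses a fresh random MCMC step, the final parameters fluctuate around their resting point rather than converging exactly; averaging the last few iterates would be one way to reduce that noise.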
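Contrastive divergence bias
– We assume:
$$\frac{\partial E(X;\Theta)}{\partial \Theta} \approx \left\langle \frac{\partial \log f(x;\Theta)}{\partial \Theta} \right\rangle_{X^1_\Theta} - \left\langle \frac{\partial \log f(x;\Theta)}{\partial \Theta} \right\rangle_{X^0_\Theta}$$
– ML learning is equivalent to minimizing $X^0_\Theta \,\|\, X^\infty_\Theta$, where $P \,\|\, Q = \int p(x) \log \frac{p(x)}{q(x)}\,dx$ (Kullback-Leibler divergence).
– CD attempts to minimize $X^0_\Theta \,\|\, X^\infty_\Theta - X^1_\Theta \,\|\, X^\infty_\Theta$:
$$\frac{\partial}{\partial \Theta} \left( X^0_\Theta \,\|\, X^\infty_\Theta - X^1_\Theta \,\|\, X^\infty_\Theta \right) = \left\langle \frac{\partial \log f(x;\Theta)}{\partial \Theta} \right\rangle_{X^1_\Theta} - \left\langle \frac{\partial \log f(x;\Theta)}{\partial \Theta} \right\rangle_{X^0_\Theta} - \frac{\partial X^1_\Theta}{\partial \Theta} \frac{\partial \left( X^1_\Theta \,\|\, X^\infty_\Theta \right)}{\partial X^1_\Theta}$$
– Usually $\frac{\partial X^1_\Theta}{\partial \Theta} \frac{\partial \left( X^1_\Theta \,\|\, X^\infty_\Theta \right)}{\partial X^1_\Theta} \approx 0$, but this term can sometimes bias results.
– See “On Contrastive Divergence Learning”, Carreira-Perpiñán & Hinton, AISTATS 2005, for more details.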
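To make the Kullback-Leibler definition concrete, the sketch below evaluates $P \,\|\, Q = \int p(x) \log \frac{p(x)}{q(x)}\,dx$ for two Gaussians by midpoint integration and compares it with the standard closed form for Gaussians. The integration range, grid size, and parameter values are assumptions, and the example only illustrates the divergence itself, not the bias term.

```python
import math

def log_gaussian_pdf(x, mu, sigma):
    """Log of the normalized Gaussian density."""
    return -((x - mu) ** 2) / (2.0 * sigma ** 2) - math.log(sigma * math.sqrt(2.0 * math.pi))

def kl_numeric(mu_p, sigma_p, mu_q, sigma_q, lo=-20.0, hi=20.0, n=20000):
    """Midpoint-rule estimate of P||Q = integral of p(x) * log(p(x)/q(x)) dx."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        log_p = log_gaussian_pdf(x, mu_p, sigma_p)
        log_q = log_gaussian_pdf(x, mu_q, sigma_q)
        total += math.exp(log_p) * (log_p - log_q) * dx
    return total

def kl_closed_form(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL divergence between two univariate Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

kl_num = kl_numeric(0.0, 1.0, 1.0, 2.0)
kl_exact = kl_closed_form(0.0, 1.0, 1.0, 2.0)
kl_self = kl_numeric(0.0, 1.0, 0.0, 1.0)  # divergence of a distribution from itself
print(kl_num, kl_exact, kl_self)
```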
Product of Experts
Dimensionality issues
