Maximum Likelihood Calibration Techniques
Chris Garling
Haverford College
Last updated March 7, 2016
Contents
1 Introduction
2 Construction of the Gaussian Probability Distribution Function
3 Derivation of the Cost Function
3.1 The Foundation
3.2 Introduction of Parameters
3.3 Preparation of Cost Function for Python Optimizer
1 Introduction
In this document, I will put forth the justification and derivation of the equations used to calibrate the Hercules data set. These observations were obtained in the SDSS g and r bands, and throughout this document I will name variables after the g band; adapt the variables as needed for your project. You may also find that you need to include spatial parameters in your calibration to account for pixel variation across the field of view; this was unnecessary for the DECam data used here.
2 Construction of the Gaussian Probability Distribution Function
To use this method, we require stars for which we have both raw instrumental magnitudes and properly measured, calibrated magnitudes. In this case, we have magnitudes from the Sloan Digital Sky Survey (SDSS) that we are using as our data points, and we construct a model magnitude out of our instrumental magnitude and other parameters. Conceptually, we are using a likelihood function, which by definition states that the likelihood L of a set of parameter values α given a data point x is equal to the probability of measuring x given α. In mathematical terms,
\[
\mathcal{L}(\alpha \mid x) = P(x \mid \alpha) \tag{1}
\]
We have the data point x: for each star, this is the SDSS measured magnitude. The set of parameters α, however, will contain terms that we do not know. Thus, we seek to find the values of the parameter set α that maximize the likelihood of obtaining the data point x. In this document I will prepare equations for a Python optimizer, which will do the work of determining the optimum values of the parameters that make up α.
3 Derivation of the Cost Function
3.1 The Foundation
Because we have an entire set of data, we will use a probability distribution
function to model the spread of our data set. For this calibration, we use a
Gaussian. So now we set up the basic form of our probability function:
\[
\mathcal{L} = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}
\]
where $\mathcal{L}$ is the likelihood of obtaining the data point x given the parameters α = (µ, σ), µ is the expected value of the data point, and σ is the standard deviation of our Gaussian probability distribution function. In its current form, this is the quantity we are looking to maximize. However, we have a good bit of work to do on this equation before we can write it into an optimization program. Let's get into it.
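As a quick sanity check on this expression, here is a minimal Python sketch that simply evaluates the Gaussian likelihood numerically; the function name and the example values of x, µ, and σ are made up for illustration.

\begin{verbatim}
import numpy as np

def gaussian_likelihood(x, mu, sigma):
    """L = exp(-0.5*((x - mu)/sigma)**2) / (sqrt(2*pi)*sigma)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

# Illustrative numbers only: an SDSS magnitude of 20.13 against a model
# (expected) magnitude of 20.10 with a 0.05 mag standard deviation.
print(gaussian_likelihood(x=20.13, mu=20.10, sigma=0.05))
\end{verbatim}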
3.2 Introduction of Parameters
The next step is to write x, µ, and σ in terms of the parameters and data sets that we wish to include in the calibration.
1. Our data point, x, we will rewrite as gSDSS,j. This might confuse you,
as the SDSS magnitude is not what we actually measured. However, the
SDSS measured magnitude is the data point because it was measured by
someone, and what we are looking to do is determine values for parameters
that maximize the likelihood of obtaining that data point.
2. µ we will rename to $g_{\mathrm{mod}}$, as this is our “model magnitude”:
\[
g_{\mathrm{mod},ijn} = g_{\mathrm{inst},ij} + g_{o,i} + c_g\,(g-r)_{\mathrm{SDSS},j} - k_n x_i
\]
• where the subscripts i, j, and n signify that the variable has discrete
values for each exposure, star, or night, respectively,
• ginst is the instrumental magnitude of a star,
• go is the zeropoint offset,
• cg is the color term,
• (g-r)SDSS,j is the color of the star as defined by SDSS magnitudes,
• kn is a positive extinction correction parameter,
• and xi is the airmass at the time exposure i was taken.
So as you can see, this model magnitude has three free parameters: one that corrects for a scalar discrepancy between the measured magnitude and the expected, one that accounts for the color of the star in question, and one that adjusts for the airmass at the time the exposure was taken. The construction of this model magnitude equation is critical to the quality of your calibration; take time to decide precisely which parameters you wish to include.
To elaborate on why we are using SDSS magnitudes as our data points: the parameters of our “expected value” are the ones explained above. Our measured, raw, instrumental magnitude is one of those parameters. $g_{\mathrm{mod}}$ is our expected value because the purpose of this calibration is to obtain the values of the three free parameters that create an “expected value” that maximizes the likelihood $\mathcal{L}$ of obtaining the data point x.
3. And we will go ahead and define
\[
\sigma^{2}_{ij} = \sigma^{2}_{\mathrm{SDSS},j} + \sigma^{2}_{\mathrm{inst},ij} + \left[c_g\,(\sigma_{g,j} - \sigma_{r,j})\right]^{2}
\]
because we will need it when we begin to simplify our equation, which is our next step. (A short NumPy sketch of $g_{\mathrm{mod},ijn}$ and $\sigma^{2}_{ij}$ follows this list.)
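Below is a minimal NumPy sketch of these two quantities. The argument names are illustrative rather than taken from any particular pipeline; each argument is either a per-measurement array (instrumental magnitudes, colors, uncertainties, airmasses) or a fit parameter, and broadcasting evaluates $g_{\mathrm{mod},ijn}$ and $\sigma^{2}_{ij}$ for every measurement at once.

\begin{verbatim}
import numpy as np

def model_magnitude(g_inst, g_o, c_g, gr_sdss, k_n, airmass):
    """g_mod = g_inst + g_o + c_g*(g - r)_SDSS - k_n * x."""
    return g_inst + g_o + c_g * gr_sdss - k_n * airmass

def total_variance(sig_sdss, sig_inst, c_g, sig_g, sig_r):
    """sigma^2 = sigma_SDSS^2 + sigma_inst^2 + [c_g*(sigma_g - sigma_r)]^2."""
    return sig_sdss ** 2 + sig_inst ** 2 + (c_g * (sig_g - sig_r)) ** 2
\end{verbatim}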
3.3 Preparation of Cost Function for Python Optimizer
Now, we will take the negative natural logarithm of our probability equation,
as it will make the equation easier to input to our Python optimizer. We will
call this new function the “cost function” (C).
\[
C = -\ln(\mathcal{L})
  = -\ln\!\left[\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\right]
  = -\ln\!\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right) - \ln\!\left[e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\right]
\]
\[
C = \frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2} - \ln\!\left(\frac{1}{\sqrt{2\pi}}\right) - \ln\!\left(\frac{1}{\sigma}\right)
\]
We can neglect constant terms, so:
\[
C = \frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2} - \ln\!\left(\frac{1}{\sigma}\right)
  = \frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2} + \ln(\sigma)
  = \frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2} + \frac{\ln(\sigma^{2})}{2}
\]
And so finally, before putting any of our parameters in, our reduced cost
function is:
\[
C = -\ln(\mathcal{L}) = \frac{1}{2}\left[\frac{(x-\mu)^{2}}{\sigma^{2}} + \ln(\sigma^{2})\right]
\]
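To confirm that dropping the constant changes nothing that matters for optimization, the sketch below compares the exact $-\ln(\mathcal{L})$ from scipy.stats.norm against the reduced cost; the numerical values are arbitrary, and the difference should be exactly the dropped constant $\ln\sqrt{2\pi}$.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

def reduced_cost(x, mu, sigma2):
    """C = 0.5*[(x - mu)^2/sigma^2 + ln(sigma^2)], constants dropped."""
    return 0.5 * ((x - mu) ** 2 / sigma2 + np.log(sigma2))

x, mu, sigma = 20.13, 20.10, 0.05               # arbitrary example values
exact = -norm.logpdf(x, loc=mu, scale=sigma)    # full -ln(L)
print(exact - reduced_cost(x, mu, sigma ** 2))  # constant offset ...
print(0.5 * np.log(2.0 * np.pi))                # ... equal to ln(sqrt(2*pi))
\end{verbatim}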
Next, we will plug $g_{\mathrm{mod}}$, $g_{\mathrm{SDSS}}$, and $\sigma^{2}_{ij}$ into this equation:
\[
C = \frac{1}{2}\left[\frac{\left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)^{2}}{\sigma^{2}_{ij}} + \ln\!\left(\sigma^{2}_{ij}\right)\right]
\]
And now, we will plug in all parameters:
\[
C = \frac{1}{2}\,\frac{\left[g_{\mathrm{SDSS},j} - \left(g_{\mathrm{inst},ij} + g_{o,i} + c_g\,(g-r)_{\mathrm{SDSS},j} - k_n x_i\right)\right]^{2}}{\sigma^{2}_{\mathrm{SDSS},j} + \sigma^{2}_{\mathrm{inst},ij} + \left[c_g\,(\sigma_{g,j} - \sigma_{r,j})\right]^{2}}
  + \frac{1}{2}\ln\!\left(\sigma^{2}_{\mathrm{SDSS},j} + \sigma^{2}_{\mathrm{inst},ij} + \left[c_g\,(\sigma_{g,j} - \sigma_{r,j})\right]^{2}\right)
\]
But we want to evaluate this cost function for all of our data, so we have to
change the equation to account for that.
\[
C = \frac{1}{2}\sum_{i}\sum_{j}\sum_{n}\left\{\frac{\left[g_{\mathrm{SDSS},j} - \left(g_{\mathrm{inst},ij} + g_{o,i} + c_g\,(g-r)_{\mathrm{SDSS},j} - k_n x_i\right)\right]^{2}}{\sigma^{2}_{\mathrm{SDSS},j} + \sigma^{2}_{\mathrm{inst},ij} + \left[c_g\,(\sigma_{g,j} - \sigma_{r,j})\right]^{2}}
  + \ln\!\left(\sigma^{2}_{\mathrm{SDSS},j} + \sigma^{2}_{\mathrm{inst},ij} + \left[c_g\,(\sigma_{g,j} - \sigma_{r,j})\right]^{2}\right)\right\}
\]
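A vectorized Python sketch of this summed cost function follows. The data layout is my own assumption, not something fixed by the method: one flat record per (star, exposure) measurement, with integer index arrays 'exp' and 'night' mapping each measurement to its exposure i and night n, and the parameter vector packed as [g_o (one per exposure), c_g, k (one per night)] so it can be handed directly to an optimizer.

\begin{verbatim}
import numpy as np

def cost(params, data, n_exp):
    """Summed cost C over all measurements (the equation above).

    params : [g_o,1 ... g_o,n_exp, c_g, k_1 ... k_n_night]
    data   : dict of per-measurement arrays -- 'g_sdss', 'g_inst', 'gr_sdss',
             'airmass', 'sig_sdss', 'sig_inst', 'sig_g', 'sig_r' -- plus
             integer index arrays 'exp' and 'night'.  Names are illustrative.
    """
    g_o = params[:n_exp]
    c_g = params[n_exp]
    k = params[n_exp + 1:]

    g_mod = (data['g_inst'] + g_o[data['exp']]
             + c_g * data['gr_sdss'] - k[data['night']] * data['airmass'])
    sigma2 = (data['sig_sdss'] ** 2 + data['sig_inst'] ** 2
              + (c_g * (data['sig_g'] - data['sig_r'])) ** 2)
    resid = data['g_sdss'] - g_mod
    return 0.5 * np.sum(resid ** 2 / sigma2 + np.log(sigma2))
\end{verbatim}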
We are now ready to take the partial derivatives of the cost function with respect to each of our free parameters, which the Python optimizer requires. We will only use this expanded version of the cost function when necessary; otherwise, we will write $\sigma^{2}_{ij}$ and $g_{\mathrm{mod}}$ to keep the equations neater.
\[
\frac{\partial C}{\partial g_{o,i}} = -\sum_{j}\sum_{n}\frac{g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}}{\sigma^{2}_{ij}}
\]
The derivative $\partial C / \partial c_g$ is more difficult, so I will take this step by step. We first use the quotient and chain rules to set up our partial derivative, then we will solve for all constituents of the function and substitute.
\[
\frac{\partial C}{\partial c_g} = \frac{1}{2}\sum_{i}\sum_{j}\sum_{n}\left[\frac{\sigma^{2}_{ij}\,\dfrac{\partial\left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)^{2}}{\partial c_g} - \left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)^{2}\dfrac{\partial\sigma^{2}_{ij}}{\partial c_g}}{\sigma^{4}_{ij}} + \frac{1}{\sigma^{2}_{ij}}\,\frac{\partial\sigma^{2}_{ij}}{\partial c_g}\right]
\]
\[
\frac{\partial\left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)^{2}}{\partial c_g} = -2\left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)(g-r)_{\mathrm{SDSS},j}
\]
\[
\frac{\partial\sigma^{2}_{ij}}{\partial c_g} = 2\,c_g\left(\sigma_{g,j} - \sigma_{r,j}\right)^{2}
\]
\[
\sigma^{2}_{ij} = \sigma^{2}_{\mathrm{SDSS},j} + \sigma^{2}_{\mathrm{inst},ij} + \left[c_g\,(\sigma_{g,j} - \sigma_{r,j})\right]^{2}
\]
and substituting,
\[
\frac{\partial C}{\partial c_g} = \frac{1}{2}\sum_{i}\sum_{j}\sum_{n}\left[\frac{-2\,\sigma^{2}_{ij}\left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)(g-r)_{\mathrm{SDSS},j} - 2\,c_g\left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)^{2}\left(\sigma_{g,j} - \sigma_{r,j}\right)^{2}}{\sigma^{4}_{ij}} + \frac{2\,c_g\left(\sigma_{g,j} - \sigma_{r,j}\right)^{2}}{\sigma^{2}_{ij}}\right]
\]
and finally, reducing,
\[
\frac{\partial C}{\partial c_g} = -\sum_{i}\sum_{j}\sum_{n}\left[\frac{\left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)(g-r)_{\mathrm{SDSS},j}}{\sigma^{2}_{ij}} + \frac{c_g\left(g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}\right)^{2}\left(\sigma_{g,j} - \sigma_{r,j}\right)^{2}}{\sigma^{4}_{ij}} - \frac{c_g\left(\sigma_{g,j} - \sigma_{r,j}\right)^{2}}{\sigma^{2}_{ij}}\right]
\]
Next, the partial derivative with respect to $k_n$:
\[
\frac{\partial C}{\partial k_n} = \sum_{i}\sum_{j}\frac{g_{\mathrm{SDSS},j} - g_{\mathrm{mod},ijn}}{\sigma^{2}_{ij}}\, x_i
\]
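For the optimizer we also want all three partial derivatives packed into a single gradient vector. The sketch below mirrors the assumed layout of the cost() sketch above (same parameter packing and data dict, with all names illustrative); np.bincount handles the per-exposure and per-night sums.

\begin{verbatim}
import numpy as np

def cost_grad(params, data, n_exp):
    """Gradient of the cost with respect to [g_o,i ..., c_g, k_n ...]."""
    g_o = params[:n_exp]
    c_g = params[n_exp]
    k = params[n_exp + 1:]

    g_mod = (data['g_inst'] + g_o[data['exp']]
             + c_g * data['gr_sdss'] - k[data['night']] * data['airmass'])
    dsig = data['sig_g'] - data['sig_r']
    sigma2 = data['sig_sdss'] ** 2 + data['sig_inst'] ** 2 + (c_g * dsig) ** 2
    resid = data['g_sdss'] - g_mod

    # dC/dg_o,i: minus the weighted residuals, summed within each exposure
    d_go = np.bincount(data['exp'], weights=-resid / sigma2, minlength=n_exp)

    # dC/dc_g: color term in g_mod plus the c_g dependence of sigma^2
    d_cg = np.sum(-resid * data['gr_sdss'] / sigma2
                  - c_g * resid ** 2 * dsig ** 2 / sigma2 ** 2
                  + c_g * dsig ** 2 / sigma2)

    # dC/dk_n: weighted residuals times airmass, summed within each night
    d_k = np.bincount(data['night'], weights=resid * data['airmass'] / sigma2,
                      minlength=len(k))

    return np.concatenate([d_go, [d_cg], d_k])
\end{verbatim}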
Now all that remains is to input the cost function and its partial derivatives into an optimizer to find the values of the unknown parameters that maximize the likelihood, or equivalently minimize the cost. In Python, we use scipy.optimize.fmin_l_bfgs_b.
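A sketch of the optimizer call itself is below, reusing the cost() and cost_grad() sketches above and the same assumed data dict; the sizes, starting guesses, and bounds are placeholders. Because $k_n$ is a positive parameter, it gets a lower bound of zero, which is exactly what the bounded variant of L-BFGS provides.

\begin{verbatim}
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

n_exp, n_night = 10, 3                         # placeholder sizes
x0 = np.concatenate([np.zeros(n_exp),          # g_o,i starting guesses
                     [0.0],                    # c_g starting guess
                     0.1 * np.ones(n_night)])  # k_n starting guesses
bounds = ([(None, None)] * n_exp               # g_o,i unbounded
          + [(None, None)]                     # c_g unbounded
          + [(0.0, None)] * n_night)           # k_n constrained to be positive

# 'data' is the per-measurement dict described alongside the cost() sketch.
best_params, min_cost, info = fmin_l_bfgs_b(cost, x0, fprime=cost_grad,
                                            args=(data, n_exp), bounds=bounds)
\end{verbatim}

The same parameter packing and arguments also work with scipy.optimize.minimize using method='L-BFGS-B' and jac=cost_grad, which is the newer interface to the same routine.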