Estimators, Bias and Variance
MLE
Bayesian Statistics
Statistics for Deep Learning
Sung-Yub Kim
Dept of IE, Seoul National University
January 14, 2017
Point Estimation
Point estimation is the attempt to provide the single "best" prediction of some quantity of interest. In general, the quantity of interest can be a single parameter or a vector of parameters in some parametric model.
To distinguish estimates of parameters from their true values, our convention is to denote a point estimate of a parameter θ by θ̂.
A good estimator is a function whose output is close to the true underlying θ that generated the training data.
Function Estimation
Sometimes we are interested in performing function approximation. Here we try to predict a variable y given an input x, so we may assume that
y = f(x) + ε (1)
where ε is the part of y that is not predictable from x.
Bias
The bias of an estimator is defined as
bias(θ̂_m) = E[θ̂_m] − θ (2)
Unbiased
An estimator θ̂_m is said to be unbiased if
bias(θ̂_m) = 0, (3)
which implies that E[θ̂_m] = θ.
Asymptotically Unbiased
An estimator θ̂_m is said to be asymptotically unbiased if
lim_{m→∞} bias(θ̂_m) = 0, (4)
which implies that lim_{m→∞} E[θ̂_m] = θ.
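As a concrete illustration (my own example, not from the slides): the sample variance that divides by m is biased, while dividing by m − 1 is unbiased. A minimal pure-Python simulation, assuming standard-normal data with true variance 1:

```python
import random

random.seed(0)

def sample_variance(xs, unbiased=True):
    """Sample variance: divide by m-1 (unbiased) or by m (biased)."""
    m = len(xs)
    mean = sum(xs) / m
    ss = sum((x - mean) ** 2 for x in xs)
    return ss / (m - 1) if unbiased else ss / m

# Average each estimator over many datasets of size m = 5 drawn from a
# standard normal (true variance 1) to approximate E[estimator].
m, trials = 5, 100_000
biased_avg = 0.0
unbiased_avg = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(m)]
    biased_avg += sample_variance(xs, unbiased=False) / trials
    unbiased_avg += sample_variance(xs, unbiased=True) / trials

print(round(biased_avg, 1), round(unbiased_avg, 1))  # ≈ 0.8 and ≈ 1.0
```

The divide-by-m estimator averages to about (m − 1)/m = 0.8, confirming its bias, while the divide-by-(m − 1) version averages to the true value 1.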
Variance
The variance of an estimator is simply the variance
Var(θ̂) (5)
where the random quantity is the training set. Alternatively, the square root of the variance is called the standard error, denoted SE(θ̂).
The variance of an estimator provides a measure of how we would expect the estimate we compute from data to vary as we independently resample the dataset from the underlying data-generating process.
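This resampling view can be checked directly (a hypothetical setup of my own, assuming standard-normal data): draw many datasets of size m, compute the mean estimator on each, and compare its spread with the textbook standard error σ/√m.

```python
import math
import random

random.seed(1)

# Standard error of the sample mean: resample many datasets of size m from
# the data-generating process (standard normal) and measure how the mean
# estimate varies across datasets; compare with SE = sigma / sqrt(m).
m, trials = 25, 40_000
means = [sum(random.gauss(0.0, 1.0) for _ in range(m)) / m
         for _ in range(trials)]

grand = sum(means) / trials
empirical_se = math.sqrt(sum((x - grand) ** 2 for x in means) / trials)
theoretical_se = 1.0 / math.sqrt(m)  # sigma / sqrt(m) = 0.2
print(round(empirical_se, 2), theoretical_se)  # ≈ 0.2 0.2
```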
Bias-Variance Tradeoff
The relation between bias and variance can be expressed through the mean squared error (MSE) of the estimates:
MSE = E[(θ̂_m − θ)²] = Bias(θ̂_m)² + Var(θ̂_m) (6)
Desirable estimators are those with small MSE; these are estimators that manage to keep both their bias and variance somewhat in check. Usually, generalization error is measured by the MSE.
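The decomposition in equation (6) can be verified empirically. Below is a sketch of my own (not from the slides) using a deliberately biased estimator, θ̂ = 0.8 · x̄, for the mean θ = 1 of a unit-variance Gaussian:

```python
import random

random.seed(2)

# Empirically verify MSE = Bias^2 + Var for a deliberately biased estimator:
# theta_hat = 0.8 * (sample mean), estimating the true mean theta = 1.0.
theta, m, trials = 1.0, 10, 100_000
estimates = []
for _ in range(trials):
    xbar = sum(random.gauss(theta, 1.0) for _ in range(m)) / m
    estimates.append(0.8 * xbar)

mean_est = sum(estimates) / trials
bias = mean_est - theta                                      # ≈ -0.2
var = sum((e - mean_est) ** 2 for e in estimates) / trials   # ≈ 0.64 / m
mse = sum((e - theta) ** 2 for e in estimates) / trials
print(round(bias, 2))  # the shrinkage induces bias ≈ -0.2
```

On the sampled estimates the identity mse = bias² + var holds exactly (it is an algebraic fact about empirical moments), while the population values it approximates are Bias = −0.2 and Var = 0.64/m.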
Weak Consistency
Usually, we are also concerned with the behavior of an estimator as the amount of training data grows. We wish that, as the number of data points m in our dataset increases, our point estimates converge to the true values of the corresponding parameters. More formally, an estimator of a parameter is weakly consistent if
plim_{m→∞} θ̂_m = θ (7)
The symbol plim indicates convergence in probability, meaning that for any ε > 0, P[|θ̂_m − θ| > ε] → 0 as m → ∞.
Strong Consistency
We can also define strong consistency as
P[lim_{m→∞} θ̂_m = θ] = 1 (8)
This is almost-sure convergence, i.e.
P[lim inf_{m→∞} {ω ∈ Ω : |θ̂_m(ω) − θ| < ε}] = 1 for all ε > 0.
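Weak consistency is easy to see numerically. A small simulation of my own (assuming standard-normal data, θ = 0) estimates the tail probability P[|θ̂_m − θ| > ε] for the sample mean at growing m:

```python
import random

random.seed(3)

# Weak consistency of the sample mean for N(0, 1) data: the probability
# P[|mean_m - theta| > eps] shrinks toward 0 as m grows (here eps = 0.1).
theta, eps, trials = 0.0, 0.1, 5_000

def tail_prob(m):
    hits = 0
    for _ in range(trials):
        mean_m = sum(random.gauss(theta, 1.0) for _ in range(m)) / m
        if abs(mean_m - theta) > eps:
            hits += 1
    return hits / trials

p10, p100, p400 = tail_prob(10), tail_prob(100), tail_prob(400)
print(p10, p100, p400)  # decreasing toward 0
```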
Consistency and Asymptotic Unbiasedness
Consistency ensures that the bias induced by the estimator diminishes as the number of data examples grows. But the reverse is not true: asymptotic unbiasedness does not imply consistency. For example, when estimating the mean θ of N(θ, σ²), the estimator θ̂_m = x^(1) (the first training sample) is unbiased for every m, hence asymptotically unbiased, but its distribution never concentrates around θ as m grows, so for a fixed ε the probability P[|θ̂_m − θ| > ε] stays bounded away from zero.
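As a concrete check that asymptotic unbiasedness does not imply consistency (my own demonstration, assuming N(θ, 1) data), consider the estimator θ̂_m = x^(1), which ignores all but the first sample:

```python
import random

random.seed(4)

# Estimator theta_hat_m = x^(1): use only the first training sample to
# estimate the mean theta of N(theta, 1). It is unbiased for every m
# (hence asymptotically unbiased), but its distribution never concentrates
# around theta as m grows, so it is not consistent.
theta, trials = 0.0, 50_000
ests = [random.gauss(theta, 1.0) for _ in range(trials)]  # x^(1), any m
avg = sum(ests) / trials                                  # ≈ theta: unbiased
spread = sum((e - theta) ** 2 for e in ests) / trials     # ≈ 1: no concentration
print(round(avg, 1), round(spread, 1))
```

The average over resampled datasets sits at θ (no bias), yet the mean squared deviation stays at σ² = 1 no matter how large m is.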
Maximum Likelihood Estimation
Let p_model(x; θ) be a parametric family of probability distributions over the same space, indexed by θ. The Maximum Likelihood Estimator (MLE) for θ is defined as
θ_MLE = argmax_θ p_model(X; θ) = argmax_θ ∏_{i=1}^m p_model(x^(i); θ) (9)
and this maximization problem is equivalent to
argmax_θ Σ_{i=1}^m log p_model(x^(i); θ) = argmax_θ E_{x∼p̂_data}[log p_model(x; θ)] (10)
where p̂_data is the empirical distribution. This can also be interpreted as minimizing the KL divergence
argmin_θ D_KL(p̂_data ‖ p_model) = argmin_θ E_{x∼p̂_data}[log p̂_data(x) − log p_model(x; θ)] (11)
since the term log p̂_data(x) does not depend on θ, so this minimization is equivalent to
argmin_θ −E_{x∼p̂_data}[log p_model(x; θ)] (12)
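For a Gaussian family the maximizer of (10) has a closed form: the sample mean and the biased (divide-by-m) sample variance. A minimal sketch of my own, assuming data drawn from N(2, 9):

```python
import math
import random

random.seed(5)

# MLE for N(mu, sigma^2): maximizing the average log-likelihood under the
# empirical distribution gives mu_hat = sample mean and sigma2_hat = the
# biased (divide-by-m) sample variance. Data drawn with mu=2, sigma=3.
data = [random.gauss(2.0, 3.0) for _ in range(50_000)]
m = len(data)

def avg_loglik(mu, sigma2):
    c = -0.5 * math.log(2 * math.pi * sigma2)
    return sum(c - (x - mu) ** 2 / (2 * sigma2) for x in data) / m

mu_hat = sum(data) / m
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / m

best = avg_loglik(mu_hat, sigma2_hat)
# The closed-form MLE beats nearby parameter settings:
assert best >= avg_loglik(mu_hat + 0.1, sigma2_hat)
assert best >= avg_loglik(mu_hat, 1.1 * sigma2_hat)
print(round(mu_hat, 1), round(sigma2_hat, 1))  # ≈ 2.0, ≈ 9.0
```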
Minimizing the Cross-Entropy Is All We Do
Any loss consisting of a negative log-likelihood is a cross-entropy between the empirical distribution defined by the training set and the probability distribution defined by the model. For example, minimizing MSE amounts to minimizing the cross-entropy between the empirical distribution and a Gaussian model.
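The MSE/Gaussian correspondence can be shown directly: under p(y|x) = N(y; wx, 1) the average negative log-likelihood is 0.5·MSE plus a constant, so both objectives pick the same slope. A small illustration of my own (true slope 1.5 is an arbitrary choice):

```python
import math
import random

random.seed(6)

# With a Gaussian model p(y|x) = N(y; w*x, 1), the average negative
# log-likelihood equals 0.5 * MSE plus a constant, so the slope w that
# minimizes MSE also minimizes the NLL. True slope here: 1.5.
true_w = 1.5
data = [(x, true_w * x + random.gauss(0.0, 1.0))
        for x in (random.uniform(-1.0, 1.0) for _ in range(2000))]

def mse(w):
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

def nll(w):
    c = 0.5 * math.log(2 * math.pi)
    return sum(0.5 * (y - w * x) ** 2 + c for x, y in data) / len(data)

ws = [i / 100 for i in range(100, 201)]  # candidate slopes 1.00 .. 2.00
w_mse = min(ws, key=mse)
w_nll = min(ws, key=nll)
assert abs(w_mse - w_nll) <= 0.01  # same minimizer (up to grid spacing)
print(w_mse)  # close to the true slope 1.5
```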
MLE is a consistent estimator.
This holds if we assume that the true distribution p_data lies within the model family p_model(·; θ), and that it corresponds to exactly one value of θ.
Statistical efficiency
One consistent estimator can be more statistically efficient than another: it may obtain lower generalization error for a fixed number of samples, or equivalently require fewer examples to reach a fixed level of generalization error.
Cramér-Rao Bound
The Cramér-Rao lower bound shows that no consistent estimator has a lower asymptotic MSE than the MLE.
Differences between Frequentists and Bayesians
In frequentist statistics, we make no assumptions about θ except that it is fixed. In Bayesian statistics, we instead treat θ as a random variable with a distribution, called the prior. In ML, we typically choose a high-entropy prior to reflect a high degree of uncertainty in the value of θ before observing any data.
Differences in estimation
Since estimation is a process of minimizing cross-entropy, MLE only needs to consider θ as a point in a finite-dimensional vector space. The Bayesian approach, because it treats θ as random, must instead work with a distribution over θ, i.e., an object in an infinite-dimensional vector space (a function space).
Since we can parameterize the prior and posterior distributions, we can still make a single-point estimate from the posterior. This is the so-called Maximum A Posteriori (MAP) estimate:
θ_MAP = argmax_θ p(θ|x) = argmax_θ [log p(x|θ) + log p(θ)] (13)
It can be interpreted as maximizing the sum of the standard log-likelihood and the log-prior. The log-prior term corresponds to a regularizer such as weight decay (a Gaussian prior on the parameters yields an L2 penalty).
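The weight-decay effect of the log-prior can be seen in the simplest possible case (my own sketch, assuming N(θ, 1) data and a standard-normal prior on θ): the MAP estimate is the sample mean shrunk toward zero.

```python
import random

random.seed(7)

# MAP for the mean of N(theta, 1) with prior theta ~ N(0, 1): setting the
# derivative of log p(x|theta) + log p(theta) to zero gives
# theta_MAP = m * xbar / (m + 1), a shrunken (weight-decayed) sample mean.
theta_true, m = 2.0, 20
xs = [random.gauss(theta_true, 1.0) for _ in range(m)]
xbar = sum(xs) / m

def log_post(theta):  # log-likelihood + log-prior, up to additive constants
    return -0.5 * sum((x - theta) ** 2 for x in xs) - 0.5 * theta ** 2

theta_map = m * xbar / (m + 1)
grid = [i / 1000 for i in range(-1000, 4001)]  # theta in [-1, 4]
best = max(grid, key=log_post)
assert abs(best - theta_map) < 2e-3  # grid argmax matches the closed form
print(round(theta_map, 2), round(xbar, 2))  # MAP shrinks the MLE toward 0
```

As the amount of data m grows, the shrinkage factor m/(m + 1) approaches 1 and the MAP estimate approaches the MLE, mirroring how regularization matters less with more data.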