Model Assessment and Selection
      Machine Learning Seminar Series '11




                Nikita Zhiltsov


 Kazan (Volga Region) Federal University, Russia




             18 November 2011




                                                   1 / 34
Outline
1   Bias, Variance and Model Complexity


2   Nature of Prediction Error


3   Error Estimation: Analytical methods
      AIC
      BIC
      SRM Approach


4   Error Estimation: Sample re-use
      Cross-validation
      Bootstrapping


5   Model Assessment in R



                                           2 / 34
Outline
1   Bias, Variance and Model Complexity


2   Nature of Prediction Error


3   Error Estimation: Analytical methods
      AIC
      BIC
      SRM Approach


4   Error Estimation: Sample re-use
      Cross-validation
      Bootstrapping


5   Model Assessment in R



                                           3 / 34
Notation
   x = (x1 , . . . , xD ) ∈ X: a vector of inputs
   t ∈ T: a target variable
   y(x): a prediction model

   L(t, y(x)): the loss function for measuring errors.
   Usual choices for regression:

        L(t, y(x)) = (y(x) − t)²     (squared error)
        L(t, y(x)) = |y(x) − t|      (absolute error)

   ... and classification:

        L(t, y(x)) = I(y(x) ≠ t)        (0-1 loss)
        L(t, y(x)) = −2 log p_t(x)      (log-likelihood loss)
                                                            4 / 34
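These losses are one-liners in R. A minimal sketch (the function names and toy vectors below are my own, not from the slides):

   squared.loss  <- function(t, y) (y - t)^2           # (y(x) - t)^2
   absolute.loss <- function(t, y) abs(y - t)           # |y(x) - t|
   zero.one.loss <- function(t, y) as.numeric(y != t)   # I(y(x) != t)

   t <- c(1.0, 2.0, 3.0)        # targets
   y <- c(1.1, 1.8, 3.5)        # predictions
   mean(squared.loss(t, y))     # mean squared error over the sample
   mean(absolute.loss(t, y))    # mean absolute error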
Notation (cont.)

      err = (1/N) Σ_{i=1}^N L(t_i, y(x_i))        training error


      Err_D = E[L(t, y(x)) | D]        test error (prediction error) for a given
      training set D


      Err = E_D[Err_D] = E[L(t, y(x))]        expected test error


NB
Most methods effectively estimate only Err.




                                                                             5 / 34
Typical behavior of test and training error
Example




     Training error is not a good estimate of the test error

     There is some intermediate model complexity that gives
     minimum expected test error

                                                               6 / 34
Defining our goals


Model Selection
Estimating the performance of different models in order to choose
the best one




Model Assessment
Having chosen a final model, estimating its generalization error on
new data




                                                                     7 / 34
Data-rich situation




   Training set is used to learn the models

   Validation set is used to estimate prediction error for model
   selection

   Test set is used for assessment of the generalization error of the
   chosen model




                                                                   8 / 34
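A common rule of thumb for the three-way split is 50% training, 25% validation, 25% test. A minimal R sketch (the proportions and the use of the housing data frame loaded later in these slides are assumptions):

   set.seed(42)                  # for reproducibility
   n   <- nrow(housing)
   idx <- sample(n)              # random permutation of the row indices
   n.train <- floor(0.50 * n)
   n.val   <- floor(0.25 * n)
   train      <- housing[idx[1:n.train], ]                     # learn the models
   validation <- housing[idx[(n.train+1):(n.train+n.val)], ]   # model selection
   test       <- housing[idx[(n.train+n.val+1):n], ]           # final assessment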
Outline
1   Bias, Variance and Model Complexity


2   Nature of Prediction Error


3   Error Estimation: Analytical methods
      AIC
      BIC
      SRM Approach


4   Error Estimation: Sample re-use
      Cross-validation
      Bootstrapping


5   Model Assessment in R



                                           9 / 34
Bias-Variance Decomposition
Let's consider the expected loss E[L] for the regression task:

      E[L] = ∫_X ∫_R L(t, y(x)) p(x, t) dt dx

Under squared error loss, h(x) = E[t|x] = ∫ t p(t|x) dt is the optimal
prediction.
Then, E[L] can be decomposed into the sum of three parts:

      E[L] = bias² + variance + noise

where

      bias²    = ∫ (E_D[y(x; D)] − h(x))² p(x) dx
      variance = ∫ E_D[(y(x; D) − E_D[y(x; D)])²] p(x) dx
      noise    = ∫∫ (h(x) − t)² p(x, t) dx dt

                                                                             10 / 34
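The decomposition can be checked empirically. Below is a minimal R simulation (the test function sin(2πx), the cubic model, and all names are my own choices): draw many training sets from the same distribution, refit the same model, and average.

   set.seed(1)
   h     <- function(x) sin(2 * pi * x)      # true regression function h(x)
   sigma <- 0.3                              # noise standard deviation
   xgrid <- seq(0, 1, length.out = 50)       # points where we evaluate the fit

   D <- 200; N <- 30                         # number of training sets, size of each
   preds <- matrix(NA, D, length(xgrid))
   for (d in 1:D) {
     x <- runif(N)
     t <- h(x) + rnorm(N, 0, sigma)
     fit <- lm(t ~ poly(x, 3))               # a fixed-complexity model
     preds[d, ] <- predict(fit, data.frame(x = xgrid))
   }
   ybar     <- colMeans(preds)               # estimate of E_D[y(x; D)]
   bias2    <- mean((ybar - h(xgrid))^2)     # averaged squared bias
   variance <- mean(apply(preds, 2, var))    # averaged variance
   c(bias2 = bias2, variance = variance, noise = sigma^2)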
Bias-Variance Decomposition
Examples



     For a linear model y(x, w) = Σ_{j=1}^p w_j x_j , ∀w_j ≠ 0,
     the in-sample error is:

          Err = (1/N) Σ_{i=1}^N (ȳ(x_i) − h(x_i))² + (p/N) σ² + σ²

     For a ridge regression model (Tikhonov regularization):

          Err = (1/N) Σ_{i=1}^N {(ŷ*(x_i) − h(x_i))² + (ȳ(x_i) − ŷ*(x_i))²} + Var + σ²

     where ŷ*(x_i) is the best-fitting linear approximation to h

                                                                            11 / 34
Behavior of bias and variance




                                12 / 34
Bias-variance tradeoff
Example




                     Regression with squared loss

                     Classification with 0-1 loss

                     In the 2nd case, prediction error is no
                     longer the sum of squared bias and
                     variance

                 ⇒   The best choices of tuning parameters
                     may differ substantially in the two
                     settings




                                                               13 / 34
Outline
1   Bias, Variance and Model Complexity


2   Nature of Prediction Error


3   Error Estimation: Analytical methods
      AIC
      BIC
      SRM Approach


4   Error Estimation: Sample re-use
      Cross-validation
      Bootstrapping


5   Model Assessment in R



                                           14 / 34
Analytical methods: AIC, BIC, SRM

   They give in-sample estimates of the general form:

                                Êrr = err + ŵ

   where ŵ is an estimate of the average optimism

   By using ŵ, the methods penalize overly complex models

   Unlike regularization, they do not impose a specific
   regularization parameter λ

   Each criterion defines its own notion of model complexity, which
   appears in the penalty term




                                                                15 / 34
Akaike Information Criterion (AIC)

   Applicable for linear models

   Either log-likelihood loss or squared error loss is used

   Given a set of models indexed by a tuning parameter α, denote
   by d(α) the number of parameters of each model. Then,

                  AIC(α) = err + 2 (d(α)/N) σ̂²

   where σ̂² is typically estimated by the mean squared error of a
   low-bias model

   Finally, we choose the model giving the smallest AIC




                                                                       16 / 34
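In R, stats::AIC() computes the log-likelihood form −2 log L + 2d for fitted models, so selection by AIC is direct. A sketch on the built-in mtcars data (the candidate models are my own illustration; smaller AIC wins):

   fit1 <- lm(mpg ~ wt, data = mtcars)
   fit2 <- lm(mpg ~ wt + hp, data = mtcars)
   fit3 <- lm(mpg ~ wt + hp + disp + drat + qsec, data = mtcars)
   AIC(fit1, fit2, fit3)    # one row per model: degrees of freedom and AIC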
Akaike Information Criterion (AIC)
Example




                     Phoneme recognition task (N = 1000)

                     Input vector is the log-periodogram of
                     the spoken vowel, quantized to 256
                     uniformly spaced frequencies

                     Linear logistic regression is used to
                     predict the phoneme class

                     Here d(α) is the number of basis
                     functions




                                                             17 / 34
Bayesian Information Criterion (BIC)
   BIC, like AIC, is applicable in settings where log-likelihood
   maximization is involved

                  BIC = (N/σ̂²) (err + (log N) (d/N) σ̂²)

   BIC is proportional to AIC, with the factor 2 replaced by log N

   For N ≥ 8 (so that log N > 2), BIC tends to penalize complex
   models more heavily than AIC

   BIC also provides the posterior probability of each model m:

                  e^(−BIC_m/2) / Σ_{l=1}^M e^(−BIC_l/2)

   BIC is asymptotically consistent as N → ∞
                                                                      18 / 34
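stats::BIC() works the same way, and the posterior model probabilities above take one more line. A sketch (again on mtcars, with an arbitrary candidate set of my own):

   fits <- list(lm(mpg ~ wt,             data = mtcars),
                lm(mpg ~ wt + hp,        data = mtcars),
                lm(mpg ~ wt + hp + disp, data = mtcars))
   bic  <- sapply(fits, BIC)
   # e^(-BIC_m/2) / sum_l e^(-BIC_l/2); subtracting min(bic) avoids underflow
   post <- exp(-0.5 * (bic - min(bic)))
   post / sum(post)         # posterior probability of each candidate model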
Structural Risk Minimization
   The Vapnik-Chervonenkis (VC) theory provides a general
   measure of model complexity and gives associated bounds
   on the optimism

   Such a complexity measure, the VC dimension, is defined as follows:

              The VC dimension of the class of functions {f (x, α)} is
              the largest number of points that can be shattered by
              members of {f (x, α)}

   E.g. a linear indicator function in p dimensions has VC
   dimension p + 1; sin(αx) has infinite VC dimension




                                                                 19 / 34
Structural Risk Minimization (cont.)
     If we fit N training points using {f (x, α)} having VC dimension
     h, then with probability at least 1 − η the following bound holds:

          Err ≤ err + √( (h/N)(ln(2N/h) + 1) − (ln η)/N )

     The SRM approach fits a nested sequence of models of increasing VC
     dimension h₁ ≤ h₂ ≤ . . . and then chooses the model with the
     smallest upper bound

     The SVM classifier efficiently carries out the SRM approach

Issues
     Calculating the VC dimension of a class of functions is difficult
     In practice, the upper bound is often very loose

                                                                         20 / 34
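The bound is easy to tabulate. A small R helper implementing the inequality above (my own sketch; err is the training error, N the sample size, h the VC dimension, 1 − η the confidence level):

   vc.bound <- function(err, N, h, eta = 0.05) {
     # err + sqrt( (h/N) * (ln(2N/h) + 1) - ln(eta)/N )
     err + sqrt(h / N * (log(2 * N / h) + 1) - log(eta) / N)
   }
   # The penalty grows with h: nested models h = 1, 2, 4, ... at fixed err
   sapply(c(1, 2, 4, 8, 16), function(h) vc.bound(err = 0.1, N = 1000, h = h))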
Outline
1   Bias, Variance and Model Complexity


2   Nature of Prediction Error


3   Error Estimation: Analytical methods
      AIC
      BIC
      SRM Approach


4   Error Estimation: Sample re-use
      Cross-validation
      Bootstrapping


5   Model Assessment in R



                                           21 / 34
Sample re-use: cross-validation, bootstrapping

   These methods directly (and quite accurately) estimate
   the average generalization error
   The extra-sample error is evaluated rather than the
   in-sample one (test input vectors do not need to
   coincide with training ones)
   They can be used with any loss function, and with
   nonlinear, adaptive fitting techniques
   However, they may underestimate the true error for such
   fitting methods as trees

                                                       22 / 34
Cross-validation
   Probably the simplest and most widely used method

   However, it is time-consuming

   The CV procedure looks as follows (see the sketch after this slide):
     1   Split the data into K roughly equal-sized parts
     2   For the k-th part, fit the model y⁻ᵏ(x) to the other K − 1 parts
     3   Then the cross-validation estimate of the prediction error is

               CV = (1/N) Σ_{i=1}^N L(t_i, y^(−k(i))(x_i))




   The case K = N (leave-one-out cross-validation) is roughly
   unbiased, but can have high variance
                                                                         23 / 34
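A from-scratch version of the K-fold procedure (a minimal sketch assuming a linear model and squared error loss; the crossval() helper from the bootstrap package, used later in these slides, packages the same idea):

   kfold.cv <- function(formula, data, K = 10) {
     N <- nrow(data)
     folds <- sample(rep(1:K, length.out = N))      # random fold assignment k(i)
     resp  <- model.response(model.frame(formula, data))
     sq.err <- numeric(N)
     for (k in 1:K) {
       test <- folds == k
       fit  <- lm(formula, data = data[!test, ])    # fit y^(-k) on the other K-1 parts
       sq.err[test] <- (resp[test] - predict(fit, data[test, ]))^2
     }
     mean(sq.err)                                   # the CV estimate
   }
   # e.g. kfold.cv(MEDV ~ ., housing) once the housing data is loaded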
Cross-validation (cont.)
     In practice, 5- or 10-fold cross-validation is recommended

     CV tends to overestimate the true prediction error on small
     datasets

     Often the one-standard-error rule is used with CV (see the example
     and the sketch after this slide):



                                          We choose the most
                                          parsimonious model
                                          whose error is no more
                                          than one standard error
                                          above the error of the
                                          best model

                                          A model with p = 9
                                          would be chosen


                                                                   24 / 34
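Given per-model CV errors and their standard errors, the one-standard-error rule takes a few lines of R (a sketch; cv.err and cv.se are assumed vectors indexed by model size):

   best      <- which.min(cv.err)                 # model with the smallest CV error
   threshold <- cv.err[best] + cv.se[best]        # one standard error above it
   chosen    <- min(which(cv.err <= threshold))   # most parsimonious model under the threshold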
Bootstrapping
   A general method for assessing statistical accuracy
   Given a training set, the bootstrapping procedure steps are (see the
   sketch after this slide):
     1   Randomly draw datasets with replacement from the training set;
         each sample is of the same size as the original one
     2   This is done B times, producing B bootstrap datasets
     3   Fit the model to each of the bootstrap datasets
     4   Examine the prediction error using the original training set as a
         test set:

              Êrr_boot = (1/N) Σ_{i=1}^N (1/|C⁻ⁱ|) Σ_{b∈C⁻ⁱ} L(t_i, y*ᵇ(x_i))

         where C⁻ⁱ is the set of indices of the bootstrap samples that
         do not contain observation i
   To alleviate the upward bias, the .632 estimator is used:

              Êrr^(.632) = 0.368 · err + 0.632 · Êrr_boot
                                                                                25 / 34
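A direct implementation of Êrr_boot and the .632 estimator (my own sketch for a linear model with squared error loss; the boot() example later in these slides bootstraps a different statistic):

   err632 <- function(formula, data, B = 200) {
     N    <- nrow(data)
     resp <- model.response(model.frame(formula, data))
     loss  <- matrix(NA,    B, N)   # loss[b, i]: error of model b on observation i
     inbag <- matrix(FALSE, B, N)   # inbag[b, i]: did sample b contain observation i?
     for (b in 1:B) {
       idx <- sample(N, replace = TRUE)           # bootstrap sample of size N
       inbag[b, unique(idx)] <- TRUE
       fit <- lm(formula, data = data[idx, ])
       loss[b, ] <- (resp - predict(fit, data))^2
     }
     # Average each point's loss over the models whose samples did NOT contain it
     err.boot  <- mean(sapply(1:N, function(i) mean(loss[!inbag[, i], i])))
     err.train <- mean((resp - predict(lm(formula, data), data))^2)
     0.368 * err.train + 0.632 * err.boot         # the .632 estimator
   }
   # e.g. err632(MEDV ~ ., housing)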
Outline
1   Bias, Variance and Model Complexity


2   Nature of Prediction Error


3   Error Estimation: Analytical methods
      AIC
      BIC
      SRM Approach


4   Error Estimation: Sample re-use
      Cross-validation
      Bootstrapping


5   Model Assessment in R



                                           26 / 34
http://r-project.org



     Free software environment for statistical
     computing and graphics
     R packages for machine learning and data
     mining: kernlab, rpart, randomForest,
     animation, gbm, tm etc.
     R packages for evaluation: bootstrap, boot
     RStudio IDE
                                                 27 / 34
Housing dataset at the UCI Machine Learning
Repository
http://archive.ics.uci.edu/ml/datasets/Housing

     Housing values in suburbs of Boston

     506 instances, 13 attributes + 1 numeric class attribute
     (MEDV)




                                                                 28 / 34
Loading data in R



> housing <- read.table("~/projects/r/housing.data",
+ header=T)
> attach(housing)




                                                       29 / 34
Cross-validation example in R
Helper function




Creating a function using crossval() from the bootstrap package


> eval <- function(fit,k=10){
+   require(bootstrap)
+   theta.fit <- function(x,y){lsfit(x,y)}
+   theta.predict <- function(fit,x){cbind(1,x)%*%fit$coef}
+   x <- fit$model[,2:ncol(fit$model)]
+   y <- fit$model[,1]
+   results <- crossval(x,y,theta.fit,theta.predict,
+   ngroup=k)
+   squared.error <- sum((y-results$cv.fit)^2)/length(y)
+   cat("Cross-validated squared error =",
+   squared.error, "\n")}

                                                              30 / 34
Cross-validation example in R
Model assessment




> fit <- lm(MEDV ~ ., data=housing)  # A linear model that uses all the attributes
> eval(fit)
Cross-validated squared error = 23.15827
> fit <- lm(MEDV ~ ZN+NOX+RM+DIS+RAD+TAX+PTRATIO+B+LSTAT+CRIM+CHAS,
+ data=housing)                      # Less complex model
> eval(fit)
Cross-validated squared error = 23.24319
> fit <- lm(MEDV ~ RM, data=housing) # Too simple model
> eval(fit)
Cross-validated squared error = 44.38424




                                                             31 / 34
Bootstrapping example in R
Helper function




Creating a function using the boot() function from the boot package


> sqer <- function(formula,data,indices){
+   d <- data[indices,]
+   fit <- lm(formula, data=d)
+   return (sum(fit$residuals^2)/length(fit$residuals))
+   }




                                                              32 / 34
Bootstrapping example in R
Model assessment

> results <- boot(data=housing, statistic=sqer, R=1000,
+ formula=MEDV ~ .) # 1000 bootstrapped datasets
> print(results)
Bootstrap Statistics :
    original   bias     std. error
t1* 21.89483 -0.76001     2.296025
> results <- boot(data=housing, statistic=sqer, R=1000,
+ formula=MEDV ~ ZN+NOX+RM+DIS+RAD+TAX+PTRATIO+B+LSTAT+CRIM+CHAS)
> print(results)
Bootstrap Statistics :
    original     bias     std. error
t1* 22.88726 -0.5400892     2.744437
> results <- boot(data=housing, statistic=sqer, R=1000,
+ formula=MEDV ~ RM)
> print(results)
Bootstrap Statistics :
    original     bias     std. error
t1* 43.60055 -0.3379168     5.407933
                                                             33 / 34
Resources


   T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical
   Learning, 2008
   Stanford Engineering Everywhere CS229: Machine Learning,
   Handouts 4 and 5,
   http://videolectures.net/stanfordcs229f07_machine_learning/




                                                                34 / 34
