Published on http://modelingsocialdata.org

- 1. Model complexity and generalization. APAM E4990 Modeling Social Data. Jake Hofman, Columbia University. March 3, 2017.
- 2. Overfitting (a la xkcd) [figure slide]
- 3. Overfitting (a la xkcd) [figure slide]
- 4. Complexity: Our models should be complex enough to explain the past, but simple enough to generalize to the future.
- 5. Bias-variance tradeoff [figure slide]
- 6. Bias-variance tradeoff [Figure 2.11 from Hastie et al., The Elements of Statistical Learning: test and training error as a function of model complexity, from high bias / low variance to low bias / high variance]. Simple models may be "wrong" (high bias), but fits don't vary a lot with different samples of training data (low variance).
- 7. Bias-variance tradeoff [same figure]. Flexible models can capture more complex relationships (low bias), but are also sensitive to noise in the training data (high variance).
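The tradeoff on these slides can be seen in a few lines of numpy. This is a minimal sketch, not from the lecture: it assumes a toy quadratic signal with Gaussian noise and uses `numpy.polyfit` as the family of models, where polynomial degree plays the role of model complexity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data-generating process (an assumption for illustration):
# quadratic signal plus Gaussian noise.
def sample(n):
    x = rng.uniform(-3, 3, n)
    y = x**2 + rng.normal(0, 2, n)
    return x, y

x_train, y_train = sample(30)
x_test, y_test = sample(1000)

def mse(x, y, coefs):
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

train_err, test_err = {}, {}
for degree in [1, 2, 10]:  # degree = model complexity
    coefs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(x_train, y_train, coefs)
    test_err[degree] = mse(x_test, y_test, coefs)

# Training error can only go down as the model family grows
# (each degree's polynomials contain the lower degrees)...
assert train_err[10] <= train_err[2] <= train_err[1]
# ...but test error need not, which is the tradeoff in the figure.
```

The degree-1 model is too rigid to capture the curvature (high bias); the degree-10 model chases the noise in the 30 training points (high variance), so its low training error does not carry over to the test set.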
- 8. Bigger models = Better models
- 9. Cross-validation [diagram from The Elements of Statistical Learning: a typical split might be 50% for training and 25% each for validation and testing; note that the validation error of the final chosen model will underestimate the true test error, sometimes substantially]
  • Randomly split our data into three sets
  • Fit models on the training set
  • Use the validation set to find the best model
  • Quote final performance of this model on the test set
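The three-way split on this slide can be sketched as follows. The function name and 50/25/25 fractions follow the "typical split" mentioned above; this is an illustrative helper, not code from the lecture.

```python
import numpy as np

def train_val_test_split(n, fracs=(0.5, 0.25, 0.25), seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)  # random split, as the slide prescribes
    n_train = int(fracs[0] * n)
    n_val = int(fracs[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = train_val_test_split(100)
```

Fit every candidate model on `train`, pick the best by its error on `val`, and report that model's error on `test` only once, at the end; reusing the test set for model selection would bias the reported error downward.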
- 10. K-fold cross-validation: Estimates of generalization error from one train / validation split can be noisy, so shuffle the data and average over K distinct validation partitions instead.
- 11. K-fold cross-validation: pseudocode
      (randomly) divide the data into K parts
      for each model:
          for each of the K folds:
              train on everything but one fold
              measure the error on the held-out fold
              store the training and validation error
          compute and store the average error across all folds
      pick the model with the lowest average validation error
      evaluate its performance on a final, held-out test set
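The pseudocode above translates almost line-for-line into runnable Python. This sketch again assumes toy quadratic data and polynomial degree as the "model" being chosen; the held-out test step from the last line is omitted for brevity.

```python
import numpy as np

def kfold_cv(x, y, degrees, k=5, seed=0):
    """Pick a polynomial degree by K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # (randomly) divide the data into K parts
    folds = np.array_split(rng.permutation(len(x)), k)
    avg_val_err = {}
    for degree in degrees:                          # for each model
        errs = []
        for i in range(k):                          # for each of the K folds
            held_out = folds[i]
            rest = np.concatenate([folds[j] for j in range(k) if j != i])
            # train on everything but one fold
            coefs = np.polyfit(x[rest], y[rest], degree)
            # measure and store the error on the held-out fold
            resid = np.polyval(coefs, x[held_out]) - y[held_out]
            errs.append(np.mean(resid ** 2))
        # compute and store the average error across all folds
        avg_val_err[degree] = float(np.mean(errs))
    # pick the model with the lowest average validation error
    return min(avg_val_err, key=avg_val_err.get), avg_val_err

# Toy data: quadratic signal plus noise (an assumption for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 200)
y = x**2 + rng.normal(0, 1, 200)
best, errs = kfold_cv(x, y, degrees=[1, 2, 8])
```

With the quadratic signal, the underfit degree-1 model has a clearly higher average validation error than degree 2, so cross-validation steers the choice away from it.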
