
Modeling Social Data, Lecture 7: Model complexity and generalization

Published on
http://modelingsocialdata.org



  1. Model complexity and generalization. APAM E4990: Modeling Social Data. Jake Hofman, Columbia University. March 3, 2017.
  2. Overfitting (a la xkcd)
  3. Overfitting (a la xkcd)
  4. Complexity. Our models should be complex enough to explain the past, but simple enough to generalize to the future.
  5. Bias-variance tradeoff
  6. Bias-variance tradeoff. [Figure 2.11, "Test and training error as a function of model complexity," from Chapter 2, Overview of Supervised Learning.] Simple models may be "wrong" (high bias), but their fits don't vary much across different samples of training data (low variance).
  7. Bias-variance tradeoff. [Same figure as the previous slide.] Flexible models can capture more complex relationships (low bias), but are also sensitive to noise in the training data (high variance).
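
A small simulation makes the tradeoff on these two slides concrete: fit polynomials of increasing degree to many independent noisy samples of the same underlying function, then compare how far the average fit is from the truth (squared bias) with how much individual fits scatter across samples (variance). This sketch is not from the deck; the underlying sine function, noise level, and polynomial degrees are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_training_data(n=30):
        # n noisy observations of a smooth underlying function on [0, 1]
        x = rng.uniform(0, 1, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
        return x, y

    x_grid = np.linspace(0, 1, 200)
    f_true = np.sin(2 * np.pi * x_grid)  # the function we are trying to recover

    for degree in (1, 3, 9):
        # refit a polynomial of the same degree to many independent training samples
        fits = np.array([
            np.polyval(np.polyfit(*sample_training_data(), degree), x_grid)
            for _ in range(200)
        ])
        bias_sq = np.mean((fits.mean(axis=0) - f_true) ** 2)  # distance of the average fit from the truth
        variance = np.mean(fits.var(axis=0))                  # spread of the fits across samples
        print(f"degree {degree}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")

Degree 1 should show high bias and low variance, degree 9 the reverse, matching the two bullet points above.
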
  8. Bigger models = Better models
  9. Cross-validation. [Figure and excerpt on three-way data splits: the test-set error of the final chosen model will otherwise underestimate the true test error, and while it is difficult to give a general rule, a typical split might be 50% for training and 25% each for validation and testing.]
  • Randomly split our data into three sets
  • Fit models on the training set
  • Use the validation set to find the best model
  • Quote final performance of this model on the test set
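
As a rough sketch of this three-way split (assuming numpy arrays X and y and the 50% / 25% / 25% proportions quoted above; the function name is purely illustrative):

    import numpy as np

    def train_validation_test_split(X, y, seed=0):
        # shuffle the rows once, then carve off 50% train, 25% validation, 25% test
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))
        n_train = int(0.50 * len(X))
        n_val = int(0.25 * len(X))
        train, val, test = np.split(idx, [n_train, n_train + n_val])
        return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

Candidate models are then fit on the training portion, compared on the validation portion, and only the single chosen model is scored once on the test portion.
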
  10. K-fold cross-validation. Estimates of generalization error from one train / validation split can be noisy, so shuffle the data and average over K distinct validation partitions instead.
  11. K-fold cross-validation: pseudocode

      (randomly) divide the data into K parts
      for each model:
          for each of the K folds:
              train on everything but one fold
              measure the error on the held-out fold
              store the training and validation error
          compute and store the average error across all folds
      pick the model with the lowest average validation error
      evaluate its performance on a final, held-out test set
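
A direct translation of this pseudocode into a minimal Python sketch; it assumes candidate models with scikit-learn-style fit() and predict() methods and uses mean squared error as the validation metric (both are assumptions, not part of the slides).

    import numpy as np

    def k_fold_cv(models, X, y, K=5, seed=0):
        # (randomly) divide the data into K parts
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), K)
        avg_errors = []
        for model in models:                           # for each model
            fold_errors = []
            for k in range(K):                         # for each of the K folds
                val_idx = folds[k]
                train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
                model.fit(X[train_idx], y[train_idx])  # train on everything but one fold
                pred = model.predict(X[val_idx])       # measure the error on the held-out fold
                fold_errors.append(np.mean((pred - y[val_idx]) ** 2))
            avg_errors.append(np.mean(fold_errors))    # average validation error across all folds
        return avg_errors

    # pick the model with the lowest average validation error:
    #   best = models[int(np.argmin(k_fold_cv(models, X_trainval, y_trainval)))]
    # then evaluate its performance once on a final, held-out test set
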
