Model complexity and generalization
APAM E4990
Modeling Social Data
Jake Hofman
Columbia University
March 3, 2017
Jake Hofman (Columbia University) Model complexity and generalization March 3, 2017 1 / 10
Overfitting (a la xkcd)
Complexity
Our models should be complex enough to explain the past, but
simple enough to generalize to the future
Bias-variance tradeoff
[Figure 2.11: Test and training error as a function of model complexity. Horizontal axis: model complexity (low to high). Vertical axis: prediction error. Curves: training sample, test sample. Low complexity: high bias, low variance. High complexity: low bias, high variance.]
be close to f(x0). As k grows, the neighbors are further away, and then anything can happen.
The variance term is simply the variance of an average here, and decreases as the inverse of k. So as k varies, there is a bias–variance tradeoff.
Simple models may be “wrong” (high bias), but fits don’t vary a
lot with different samples of training data (low variance)
Flexible models can capture more complex relationships (low bias),
but are also sensitive to noise in the training data (high variance)
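
The two statements above can be checked with a small simulation. This is a minimal sketch, not taken from the lecture: the sine target, the noise level, and the choice of degree-1 versus degree-9 polynomials are all assumptions made for illustration. It refits each model on many noisy training samples and measures how far the average fit is from the truth (squared bias) and how much individual fits scatter around that average (variance).

```python
import numpy as np
from numpy.polynomial import polynomial as P


def true_f(x):
    # Assumed ground-truth function, chosen only for illustration
    return np.sin(2 * np.pi * x)


def bias_and_variance(degree, n_trials=200, n_train=20, noise=0.3, seed=0):
    """Estimate (squared bias, variance) of polynomial fits of a given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, n_train)   # fixed inputs, fresh noise per sample
    x_eval = np.linspace(0.1, 0.9, 9)      # points where the fits are compared
    preds = np.empty((n_trials, x_eval.size))
    for t in range(n_trials):
        y_train = true_f(x_train) + rng.normal(0, noise, n_train)
        coefs = P.polyfit(x_train, y_train, degree)
        preds[t] = P.polyval(x_eval, coefs)
    # Squared bias: gap between the average fit and the truth
    bias_sq = float(np.mean((preds.mean(axis=0) - true_f(x_eval)) ** 2))
    # Variance: spread of the individual fits around their average
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


simple = bias_and_variance(degree=1)     # straight line
flexible = bias_and_variance(degree=9)   # degree-9 polynomial
print("degree 1: bias^2 = %.4f, variance = %.4f" % simple)
print("degree 9: bias^2 = %.4f, variance = %.4f" % flexible)
```

Under these assumptions the straight line should show the larger squared bias and the degree-9 polynomial the larger variance, matching the two ends of the complexity axis in the figure.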
Bigger models ≠ Better models
Cross-validation
set error of the final chosen model will underestimate the true test error,
sometimes substantially.
It is difficult to give a general rule on how to choose the number of
observations in each of the three parts, as this depends on the signal-to-
noise ratio in the data and the training sample size. A typical split might
be 50% for training, and 25% each for validation and testing:
[Figure: the data divided into three consecutive blocks: Train | Validation | Test]
The methods in this chapter are designed for situations where there is
insufficient data to split it into three parts. Again it is too difficult to give
a general rule on how much training data is enough; among other things,
this depends on the signal-to-noise ratio of the underlying function, and
the complexity of the models being fit to the data.
• Randomly split our data into three sets
• Fit models on the training set
• Use the validation set to find the best model
• Quote final performance of this model on the test set
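
The four steps above could be sketched as follows. This is an illustrative example under stated assumptions, not the course's own code: the synthetic sine data, the polynomial model family, and the helper names `three_way_split` and `mse` are hypothetical. The 50% / 25% / 25% proportions follow the typical split quoted in the excerpt.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def three_way_split(n, frac_train=0.5, frac_val=0.25, seed=0):
    """Randomly split indices 0..n-1 into train, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(frac_train * n)
    n_val = int(frac_val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


# Synthetic data (an assumption made for illustration only)
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

train, val, test = three_way_split(len(x))


def mse(coefs, idx):
    """Mean squared error of a fitted polynomial on the rows in idx."""
    return float(np.mean((P.polyval(x[idx], coefs) - y[idx]) ** 2))


# Fit every candidate model on the training set only
fits = {d: P.polyfit(x[train], y[train], d) for d in range(1, 9)}

# Use the validation set to pick the best model
best_degree = min(fits, key=lambda d: mse(fits[d], val))

# Quote final performance on the test set, which was never used above
test_error = mse(fits[best_degree], test)
print("best degree:", best_degree, "test MSE: %.3f" % test_error)
```

The test set is touched exactly once, after the model has been chosen, so its error is not biased by the selection step.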
K-fold cross-validation
Estimates of generalization error from one train / validation split
can be noisy, so shuffle data and average over K distinct validation
partitions instead
K-fold cross-validation: pseudocode
(randomly) divide the data into K parts
for each model
    for each of the K folds
        train on everything but one fold
        measure the error on the held out fold
        store the training and validation error
    compute and store the average error across all folds
pick the model with the lowest average validation error
evaluate its performance on a final, held out test set
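
One way the pseudocode could look in Python is shown here. This is a sketch under assumptions: the candidate "models" are polynomial degrees, the data are synthetic, and the function name `k_fold_cv` is hypothetical. Refitting the chosen model on all of the cross-validation data before the final test evaluation is a common choice that the pseudocode does not spell out.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def k_fold_cv(x, y, degrees, k=5, seed=0):
    """Return {degree: (mean training MSE, mean validation MSE)} over k folds."""
    rng = np.random.default_rng(seed)
    # (randomly) divide the data into K parts
    folds = np.array_split(rng.permutation(len(x)), k)
    results = {}
    for d in degrees:                       # for each model
        train_err, val_err = [], []
        for i in range(k):                  # for each of the K folds
            held_out = folds[i]
            rest = np.concatenate([folds[j] for j in range(k) if j != i])
            # train on everything but one fold
            coefs = P.polyfit(x[rest], y[rest], d)
            # measure and store the training and validation error
            train_err.append(np.mean((P.polyval(x[rest], coefs) - y[rest]) ** 2))
            val_err.append(np.mean((P.polyval(x[held_out], coefs) - y[held_out]) ** 2))
        # compute and store the average error across all folds
        results[d] = (float(np.mean(train_err)), float(np.mean(val_err)))
    return results


# Synthetic data (an assumption); the first 50 points form the final test set
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 250)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 250)
x_test, y_test = x[:50], y[:50]
x_cv, y_cv = x[50:], y[50:]

cv = k_fold_cv(x_cv, y_cv, degrees=range(1, 9))

# pick the model with the lowest average validation error
best_degree = min(cv, key=lambda d: cv[d][1])

# evaluate its performance on the final, held out test set
final_coefs = P.polyfit(x_cv, y_cv, best_degree)
test_error = float(np.mean((P.polyval(x_test, final_coefs) - y_test) ** 2))
print("best degree:", best_degree, "test MSE: %.3f" % test_error)
```

Comparing the two numbers stored for each degree shows the pattern from the bias-variance figure: average training error keeps falling as the degree grows, while average validation error need not.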

This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 

Recently uploaded (20)

How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 

Modeling Social Data, Lecture 7: Model complexity and generalization

  • 1. Model complexity and generalization APAM E4990 Modeling Social Data Jake Hofman Columbia University March 3, 2017 Jake Hofman (Columbia University) Model complexity and generalization March 3, 2017 1 / 10
  • 2. Overfitting (à la xkcd)
  • 3. Overfitting (à la xkcd)
  • 4. Complexity: Our models should be complex enough to explain the past, but simple enough to generalize to the future
  • 5. Bias-variance tradeoff
  • 6. Bias-variance tradeoff. [Figure 2.11, from page 38 of "2. Overview of Supervised Learning": test and training error as a function of model complexity. Vertical axis: prediction error. Horizontal axis: model complexity, low to high. Curves: training sample and test sample. Low complexity is labeled high bias / low variance; high complexity is labeled low bias / high variance. Surrounding text on the page: "… be close to f(x0). As k grows, the neighbors are further away, and then anything can happen. The variance term is simply the variance of an average here, and decreases as the inverse of k. So as k varies, there is a bias–variance tradeoff."] Simple models may be “wrong” (high bias), but fits don’t vary a lot with different samples of training data (low variance)
  • 7. Bias-variance tradeoff. [Same Figure 2.11 and page excerpt as on the previous slide.] Flexible models can capture more complex relationships (low bias), but are also sensitive to noise in the training data (high variance)
  • 8. Bigger models ≠ Better models
  • 9. Cross-validation
      Textbook excerpt: "… set error of the final chosen model will underestimate the true test error, sometimes substantially. It is difficult to give a general rule on how to choose the number of observations in each of the three parts, as this depends on the signal-to-noise ratio in the data and the training sample size. A typical split might be 50% for training, and 25% each for validation and testing: [Figure: data divided into Train, Validation, and Test parts] The methods in this chapter are designed for situations where there is insufficient data to split it into three parts. Again it is too difficult to give a general rule on how much training data is enough; among other things, this depends on the signal-to-noise ratio of the underlying function, and the complexity of the models being fit to the data."
      ◦ Randomly split our data into three sets
      ◦ Fit models on the training set
      ◦ Use the validation set to find the best model
      ◦ Quote final performance of this model on the test set
  • 10. K-fold cross-validation: Estimates of generalization error from one train / validation split can be noisy, so shuffle data and average over K distinct validation partitions instead
  • 11. K-fold cross-validation: pseudocode
      (randomly) divide the data into K parts
      for each model
          for each of the K folds
              train on everything but one fold
              measure the error on the held out fold
              store the training and validation error
          compute and store the average error across all folds
      pick the model with the lowest average validation error
      evaluate its performance on a final, held out test set
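
The K-fold procedure on the last slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's own code: the synthetic data (a noisy sine curve), the choice of k-nearest-neighbor regression as the model family, and the names `knn_predict`, `mse`, `k_fold_cv`, `k_neighbors`, and `n_folds` are all assumptions made for the example. Note that `n_folds` plays the role of K on the slides, while `k_neighbors` is the k of the textbook excerpt: a small number of neighbors gives a flexible model, a large number gives a simple one.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, points, k):
    """Mean squared error of k-NN (fit on `train`) evaluated on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


def k_fold_cv(data, k_neighbors, n_folds):
    """Return (average training error, average validation error) over the folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]  # divide the data into parts
    train_errs, val_errs = [], []
    for i, held_out in enumerate(folds):
        # Train on everything but one fold, measure error on the held-out fold.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        train_errs.append(mse(train, train, k_neighbors))
        val_errs.append(mse(train, held_out, k_neighbors))
    return sum(train_errs) / n_folds, sum(val_errs) / n_folds


# Hypothetical synthetic data: y = sin(x) plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(240):
    x = rng.uniform(0, 2 * math.pi)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))
rng.shuffle(data)

# Set aside a final test set before any model selection happens.
test, rest = data[:60], data[60:]

candidates = [1, 3, 5, 10, 20, 50, 100]  # each neighbor count is one "model"
results = {k: k_fold_cv(rest, k, n_folds=5) for k in candidates}
for k, (train_err, val_err) in results.items():
    print(f"k={k:3d}  train MSE={train_err:.3f}  validation MSE={val_err:.3f}")

# Pick the model with the lowest average validation error,
# then evaluate it once on the held-out test set.
best_k = min(results, key=lambda k: results[k][1])
test_err = mse(rest, test, best_k)
print(f"chosen k={best_k}, test MSE={test_err:.3f}")
```

With one neighbor, each training point is its own nearest neighbor, so the training error is zero even though the validation error is not. This is the pattern in the bias-variance figure: training error keeps falling as the model becomes more flexible, so it cannot be used to choose among models.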