Galit Shmuéli Georgetown University October 30, 2009 To Explain or To Predict? Explanatory vs. Predictive Modeling in Scientific Research
The path to discovery
Explain Predict
What are “explaining” and “predicting”?
Statistical modeling in social science research. Purpose: test causal theory (“explain”). Association-based statistical models. Prediction nearly absent.
Lesson #1: Whether statisticians like it or not, in the social sciences, association-based statistical models  are used for testing causal theory. Justification: a strong underlying theoretical model provides the causality.
Definition: Explanatory Model. A statistical model used for testing causal theory (“proper” or not).
Definition: Predictive Model. An empirical model used for predicting new records/scenarios.
Multi-page sections with theoretical justifications of each hypothesis
Concept operationalization (poverty, trust, anger, economic stability, well-being): 4 pages of such tables
Statistical model (here: path analysis)
“Statistical” conclusions
Research conclusions
Lesson #2: In the social sciences, empirical analysis is mainly used for testing causal theory. Empirical prediction is considered un-academic. Some statisticians share this view: “The two goals in analyzing data... I prefer to describe as ‘management’ and ‘science’. Management seeks profit... Science seeks truth.” Parzen, Statistical Science 2001
Prediction in the  Information Systems literature
Predictive goal stated? Predictive power assessed?
1072 articles, of which 52 empirical with predictive claims. “Examples of [predictive] theory in IS do not come readily to hand, suggesting that they are not common” Gregor, MISQ 2006
Breakdown of the 52 “predictive” articles
Why predict? Scientific use of empirical models. To explain: test causal theory. To predict: (utility), relevance, new theory, predictability.
Why are statistical explanatory models different from predictive models?
Theory vs. its manifestation?
“The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
Given the research environment in the social sciences, two critically important points are: (1) Explanatory power and predictive accuracy cannot be inferred from one another. (2) The “best” explanatory model is (nearly) never the “best” predictive model, and vice versa.
Point #1: Explanatory power ≠ predictive power. Cannot infer one from the other.
What is R²?
In-sample vs. out-of-sample evaluation
Performance evaluation. Explanatory: interpretation, p-values, R², goodness-of-fit (danger: type I, II errors). Predictive: out-of-sample prediction accuracy, costs, run time (danger: over-fitting).
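A minimal sketch of why in-sample fit and out-of-sample accuracy need to be reported separately. Everything here is an assumption made for illustration and not taken from the slides: the simulated data (y = 2x + noise), the 1-nearest-neighbour predictor standing in for a very flexible model, and the helper names `r_squared` and `one_nn_predict`.

```python
import random

def r_squared(y_true, y_pred):
    """R² = 1 - SSE/SST, with SST taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - sse / sst

def one_nn_predict(train_x, train_y, x):
    """Predict with the y of the closest training x (1-nearest neighbour)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

rng = random.Random(1)  # fixed seed so the run is reproducible

def draw(n):
    # Hypothetical data-generating process: y = 2x + Gaussian noise
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

train_x, train_y = draw(100)
hold_x, hold_y = draw(100)

# In-sample: each training point is its own nearest neighbour, so the fit is perfect.
r2_in = r_squared(train_y, [one_nn_predict(train_x, train_y, x) for x in train_x])
# Out-of-sample: the same model applied to records it has never seen.
r2_out = r_squared(hold_y, [one_nn_predict(train_x, train_y, x) for x in hold_x])

print("in-sample R2:", r2_in)
print("out-of-sample R2:", r2_out)
```

The in-sample R² is perfect by construction, which says nothing about how the model performs on the holdout records; only the second number speaks to predictive accuracy.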
Suggestion for social scientists: Report predictive accuracy in addition to explanatory power
Predictive Power Explanatory Power
Point #2 Best explanatory model ≠ Best predictive model
Predict ≠ Explain. “We should mention that not all data features were found to be useful. For example, we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However, we concluded that they could not help at all for improving the accuracy of well tuned collaborative filtering models.” Bell et al., 2008
Predict ≠ Explain. The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%. “We are planning to… develop predictive models for bioavailability and bioequivalence” Lester M. Crawford, 2005, Acting Commissioner of Food & Drugs
Let’s dig in
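Explanatory goal: minimize model bias. Predictive goal: minimize MSE (model bias + sampling variance).
What is optimized? Bias (explanatory) or prediction MSE (predictive). Prediction MSE = Var(Y) + bias² + estimation variance, where Var(Y) is uncontrollable, bias² reflects model misspecification, and estimation variance is sampling variance.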
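For reference, the decomposition behind this slide can be written out under squared-error loss. The symbols are assumptions introduced here and do not appear on the slide: f(x) is the true mean of Y at x, and \hat{f}(x) is the model estimated from a sample.

```latex
\mathrm{E}\!\left[\bigl(Y - \hat{f}(x)\bigr)^{2}\right]
  = \underbrace{\mathrm{Var}(Y)}_{\text{uncontrollable}}
  + \underbrace{\bigl(\mathrm{E}[\hat{f}(x)] - f(x)\bigr)^{2}}_{\text{bias}^{2}\ \text{(model misspecification)}}
  + \underbrace{\mathrm{Var}\bigl(\hat{f}(x)\bigr)}_{\text{estimation (sampling variance)}}
```

An explanatory analysis concentrates on driving the middle term to zero, while a predictive analysis accepts some bias whenever that lowers the sum of the last two terms.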
Linear regression example: true model and its estimated model (MSE1) vs. underspecified model and its estimated model (MSE2). MSE2 < MSE1 when: σ² is large; |β2| is small; corr(x1, x2) is high; the range of the x’s is limited.
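The conditions on this slide can be checked with a small simulation. This is only a sketch under stated assumptions: the slide's model equations did not survive extraction, so the data-generating model used here (y = β1·x1 + β2·x2 + noise, no intercept, x2 correlated with x1), the function name `simulate`, and all parameter values are hypothetical.

```python
import random

def simulate(n_train=20, n_test=50, reps=1000, beta1=1.0, beta2=0.1,
             rho=0.9, sigma=3.0, seed=0):
    """Return (mse_full, mse_under): average holdout squared error of
    the correctly specified model (x1 and x2) and the underspecified one (x1 only)."""
    rng = random.Random(seed)

    def draw(n):
        rows = []
        for _ in range(n):
            x1 = rng.gauss(0.0, 1.0)
            # x2 is constructed to have correlation rho with x1
            x2 = rho * x1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
            y = beta1 * x1 + beta2 * x2 + rng.gauss(0.0, sigma)
            rows.append((x1, x2, y))
        return rows

    sse_full = sse_under = 0.0
    for _ in range(reps):
        train, test = draw(n_train), draw(n_test)
        # Sums needed for the least-squares normal equations (no intercept)
        s11 = sum(x1 * x1 for x1, _, _ in train)
        s12 = sum(x1 * x2 for x1, x2, _ in train)
        s22 = sum(x2 * x2 for _, x2, _ in train)
        t1 = sum(x1 * y for x1, _, y in train)
        t2 = sum(x2 * y for _, x2, y in train)
        det = s11 * s22 - s12 * s12
        b1 = (s22 * t1 - s12 * t2) / det   # full model: coefficient on x1
        b2 = (s11 * t2 - s12 * t1) / det   # full model: coefficient on x2
        g1 = t1 / s11                      # underspecified model: x1 only
        for x1, x2, y in test:
            sse_full += (y - (b1 * x1 + b2 * x2)) ** 2
            sse_under += (y - g1 * x1) ** 2
    total = reps * n_test
    return sse_full / total, sse_under / total

# Noisy data, small beta2, highly correlated predictors, small sample
mse_full, mse_under = simulate()
print("full model MSE:", mse_full, "underspecified MSE:", mse_under)

# Opposite regime: little noise, large beta2, uncorrelated predictors, larger sample
mse_full_b, mse_under_b = simulate(n_train=200, reps=200, beta2=3.0,
                                   rho=0.0, sigma=0.5)
print("full model MSE:", mse_full_b, "underspecified MSE:", mse_under_b)
```

In the first setting the "wrong" model tends to predict better because dropping x2 removes more estimation variance than the bias it adds; in the second setting the omitted-variable bias dominates and the correctly specified model wins.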
Two statistical modeling paths (photo: China's Diverging Paths, by Clark Smith)
Goal definition → Design & collection → Data preparation → EDA → Variables? → Methods? → Evaluation, validation & model selection → Model use & reporting
Study design & data collection: Observational or experiment? Primary or secondary data? Instrument (reliability + validity vs. measurement accuracy). How much data? How to sample? Hierarchical data.
Data preparation: missing values; partitioning; reduced-feature models.
Summary stats, plots, outliers, trends; interactive visualization; PCA, SVD.
Which variables? Multicollinearity? A, B, A*B? Theory, associations; ex-post availability.
Methods / models: bias, variance; blackbox / interpretable; mapping to theory; ridge regression, ensembles, boosting, PLS, PCR.
Evaluation, validation & model selection: model fit ≠ validation. Explanatory power: theoretical model, empirical model, data. Predictive power: training data, empirical model, holdout data, over-fitting analysis.
Model use. Explanatory: inference, test causal theory, null hypothesis. Predictive: predictions, (utility), relevance, new theory, predictability, predictive performance, over-fitting analysis, naïve/baseline.
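The predictive-side items (holdout performance, over-fitting analysis, naïve baseline) can be sketched as follows. This is an illustrative example only: the simulated data, the 70/30 partition, and the helper names `fit_line` and `rmse` are assumptions, not part of the talk.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def rmse(ys, preds):
    """Root mean squared error of predictions."""
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5

rng = random.Random(7)  # fixed seed for reproducibility
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]  # hypothetical data

# Partition: first 140 records for training, last 60 held out
train_x, hold_x = xs[:140], xs[140:]
train_y, hold_y = ys[:140], ys[140:]

a, b = fit_line(train_x, train_y)
naive = sum(train_y) / len(train_y)  # naive baseline: always predict the training mean

rmse_train = rmse(train_y, [a + b * x for x in train_x])
rmse_hold = rmse(hold_y, [a + b * x for x in hold_x])
rmse_naive = rmse(hold_y, [naive] * len(hold_y))

print("training RMSE:", rmse_train)        # compare with holdout RMSE to spot over-fitting
print("holdout RMSE:", rmse_hold)          # predictive performance
print("naive holdout RMSE:", rmse_naive)   # benchmark the model has to beat
```

A large gap between training and holdout RMSE would point to over-fitting, and a holdout RMSE no better than the naive benchmark would mean the model has little predictive power regardless of how well it fits.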
Goal definition → Design & collection → Data preparation → EDA → Variables? → Methods? → Evaluation, validation & model selection → Model use & reporting
How does all this impact  research  in the (social) sciences?
Three current problems: (1) prediction underappreciated; (2) distinction blurred; (3) inappropriate modeling/assessment. “While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.” Helmer & Rescher, 1959
Why? What can be done? Statisticians should acknowledge the difference and teach it!
It’s time for Change To Predict To Explain

How to Make a Pirate ship Primary Education.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 

To Explain Or To Predict?

  • 1. Galit Shmuéli Georgetown University October 30, 2009 To Explain or To Predict? Explanatory vs. Predictive Modeling in Scientific Research
  • 2. The path to discovery
  • 5. Statistical modeling in social science research Purpose: test causal theory (“explain”) Association-based statistical models Prediction nearly absent
  • 6. Lesson #1: Whether statisticians like it or not, in the social sciences, association-based statistical models are used for testing causal theory. Justification: a strong underlying theoretical model provides the causality.
  • 7. Definition: Explanatory Model A statistical model used for testing causal theory (“proper” or not)
  • 8. Definition: Predictive Model An empirical model used for predicting new records/scenarios
  • 9.
  • 10. Multi-page sections with theoretical justifications of each hypothesis
  • 11. Concept operationalization Poverty Trust Anger Economic stability Well-being 4 pages of such tables
  • 12. Statistical model (here: path analysis)
  • 15. Lesson #2 In the social sciences, empirical analysis is mainly used for testing causal theory. Empirical prediction is considered un-academic. Some statisticians share this view: The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth. Parzen, Statistical Science 2001
  • 16. Prediction in the Information Systems literature
  • 18. 1072 articles, of which 52 empirical with predictive claims. “Examples of [predictive] theory in IS do not come readily to hand, suggesting that they are not common” Gregor, MISQ 2006
  • 19. Breakdown of the 52 “predictive” articles
  • 20. Why Predict? Scientific use of empirical models To Predict To Explain test causal theory (utility) relevance new theory predictability
  • 21. Why are statistical explanatory models different than predictive models?
  • 22. Theory vs. its manifestation ?
  • 23. “The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
  • 24. Given the research environment in the social sciences, two critically important points are: Explanatory power and predictive accuracy cannot be inferred from one another. The “best” explanatory model is (nearly) never the “best” predictive model, and vice versa.
  • 25. Point #1 Explanatory Power ≠ Predictive Power. Cannot infer one from the other
  • 28. Performance Evaluation. Explanatory: interpretation, p-values, R², goodness-of-fit. Danger: type I, II errors. Predictive: out-of-sample prediction accuracy, costs, run time. Danger: over-fitting
  • 29. Suggestion for social scientists: Report predictive accuracy in addition to explanatory power
  • 31. Point #2 Best explanatory model ≠ Best predictive model
  • 32. Predict ≠ Explain “We should mention that not all data features were found to be useful. For example, we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However, we concluded that they could not help at all for improving the accuracy of well tuned collaborative filtering models.” Bell et al., 2008 + ?
  • 33. Predict ≠ Explain The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125% “We are planning to… develop predictive models for bioavailability and bioequivalence” Lester M. Crawford, 2005 Acting Commissioner of Food & Drugs
  • 35. Explanatory goal: minimize model bias. Predictive goal: minimize MSE (model bias + sampling variance)
  • 36. What is Optimized? Bias or Prediction MSE. Var(Y) = uncontrollable; bias² = model misspecification; estimation (sampling variance)
  • 37. Linear Regression Example. Underspecified model; estimated model. True model; estimated model. MSE2 < MSE1 when: σ² large, |β2| small, corr(x1, x2) high, limited range of x’s
  • 38. China's Diverging Paths, photo by Clark Smith. Two statistical modeling paths
  • 39. Design & Collection Data Preparation Goal Definition EDA Variables? Methods? Model Use & Reporting Evaluation, Validation & Model Selection
  • 40. Hierarchical data. Study design & data collection. Observational or experiment? Primary or secondary data? Instrument (reliability + validity vs. measurement accuracy). How much data? How to sample?
  • 41. reduced-feature models partitioning Data preparation missing
  • 42. summary stats, plots, outliers, trends. Interactive visualization. PCA, SVD
  • 43. Which variables? Multicollinearity? A, B, A*B? theory, associations, ex-post availability
  • 44. Methods / Models bias variance Blackbox / interpretable Mapping to theory ridge regression ensembles boosting PLS PCR
  • 45. Model fit ≠ Validation Explanatory power Empirical model Theoretical model Data Evaluation, Validation & Model Selection Training data Empirical model Over-fitting analysis Holdout data Predictive power
  • 46. Model Use Inference Test causal theory Null hypothesis Predictions (utility) Relevance New theory Predictability Predictive performance Over-fitting analysis Naïve/baseline
  • 47. Design & Collection Data Preparation Goal Definition EDA Variables? Methods? Model Use & Reporting Evaluation, Validation, & Model Selection
  • 48. How does all this impact research in the (social) sciences?
  • 49. Three Current Problems Prediction underappreciated Distinction blurred Inappropriate modeling/assessment “While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.” Helmer & Rescher, 1959
  • 50. Why? What can be done? Statisticians should acknowledge the difference and teach it!
  • 51. It’s time for Change To Predict To Explain
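
The linear regression example on slide 37 can be checked by simulation. The following is a minimal sketch, not the presenter's code. It assumes a true model y = x1 + β2·x2 + ε with standard normal predictors, and it takes MSE1 to be the holdout error of the estimated true model and MSE2 that of the estimated underspecified model (x1 only). The function name holdout_mse and all parameter values are illustrative assumptions.

```python
import numpy as np


def holdout_mse(beta2, sigma, rho, n_train, n_test=500, reps=1000, seed=0):
    """Average holdout MSE of the estimated true model (x1, x2) and of the
    estimated underspecified model (x1 only) over repeated simulated samples.

    Assumed data-generating process (hypothetical, for illustration):
        y = 1.0 * x1 + beta2 * x2 + noise,  noise ~ N(0, sigma^2),
        corr(x1, x2) = rho, both predictors with unit variance.
    Returns (mse_full, mse_under).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    n = n_train + n_test
    totals = {"full": 0.0, "under": 0.0}
    for _ in range(reps):
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 1.0 * x[:, 0] + beta2 * x[:, 1] + rng.normal(0.0, sigma, size=n)
        designs = {
            "full": np.column_stack([np.ones(n), x]),         # intercept, x1, x2
            "under": np.column_stack([np.ones(n), x[:, 0]]),  # intercept, x1
        }
        for name, X in designs.items():
            # Fit by least squares on the training rows only.
            coef, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
            # Evaluate squared error on the holdout rows.
            totals[name] += np.mean((y[n_train:] - X[n_train:] @ coef) ** 2)
    return totals["full"] / reps, totals["under"] / reps


# Setting matching the slide's conditions: large sigma, small |beta2|,
# highly correlated predictors, small training sample.
noisy = holdout_mse(beta2=0.05, sigma=3.0, rho=0.95, n_train=20)
# Contrasting setting: small sigma, large |beta2|, weak correlation, more data.
clean = holdout_mse(beta2=2.0, sigma=1.0, rho=0.2, n_train=200)

print("noisy setting  (full, under):", noisy)
print("clean setting  (full, under):", clean)
```

In the first setting the underspecified model tends to show the lower holdout MSE, because dropping x2 adds little bias but removes estimation variance. In the second setting the correctly specified model predicts better.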

Editor's Notes

  1. Example: confidence interval vs. prediction interval. Lift, costs
  2. Relevance; reality check; predictability
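
The first note contrasts a confidence interval with a prediction interval. A minimal sketch of that contrast for simple linear regression follows. The data are simulated, the function name intervals_at is hypothetical, and the critical value 2.101 is the 97.5% t quantile for 18 degrees of freedom, supplied by hand as an assumption so that the sketch needs only NumPy.

```python
import numpy as np


def intervals_at(x, y, x0, t_crit):
    """Confidence interval for the mean response and prediction interval for a
    new observation at x0, from a simple linear regression of y on x.

    t_crit is the t critical value for n - 2 degrees of freedom, passed in by
    the caller (an assumption of this sketch, to avoid a SciPy dependency).
    """
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    slope = np.sum((x - x_bar) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x_bar
    resid = y - (intercept + slope * x)
    s2 = np.sum(resid ** 2) / (n - 2)  # residual variance estimate
    fit = intercept + slope * x0
    # Uncertainty about the regression line only (explanatory focus).
    se_mean = np.sqrt(s2 * (1.0 / n + (x0 - x_bar) ** 2 / sxx))
    # Uncertainty about a new record: line uncertainty plus noise variance.
    se_pred = np.sqrt(s2 * (1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sxx))
    ci = (fit - t_crit * se_mean, fit + t_crit * se_mean)
    pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
    return ci, pi


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=20)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=20)

ci, pi = intervals_at(x, y, x0=5.0, t_crit=2.101)  # 18 degrees of freedom
print("95% confidence interval for the mean response:", ci)
print("95% prediction interval for a new observation:", pi)
```

The prediction interval is always the wider of the two, since it carries the extra noise variance of an individual new record.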