Galit Shmueli
Israel Statistical Association
& Tel Aviv University
July 9, 2012


 To Explain or To Predict?
Points for discussion: goo.gl/gcjlN

Twitter: #explainpredict
Road Map
Definitions
Explanatory-dominated social sciences
Explanatory ≠ predictive modeling
 Why?
 Different modeling paths
 Explanatory vs. predictive power


So what?
Definitions

Explanatory modeling:
Theory-based, statistical testing of
causal hypotheses

Explanatory power:
Strength of relationship in statistical
model
Definitions

Predictive modeling:
Empirical method for predicting new
observations

Predictive power:
Ability to accurately predict new
observations
Statistical modeling in
     social science research



Purpose: test causal theory (“explain”)
           Association-based statistical models
                         Prediction nearly absent
Explanatory modeling à la social sciences
Start with a causal
theory

Generate causal
hypotheses on
constructs

Operationalize constructs → Measurable variables

Fit statistical model

Statistical inference → Causal conclusions
In the social sciences,

data analysis is mainly used for testing
            causal theory.

     “If it explains, it predicts”
“Empirical prediction alone
            is un-scientific”

Some statisticians share this view:

   The two goals in analyzing data... I prefer to describe
   as “management” and “science”. Management seeks
   profit... Science seeks truth.

                        - Parzen, Statistical Science 2001
52 “predictive” articles among 1,072
in Information Systems top journals
Why Predict? for Scientific Research
          new theory
          develop measures
          compare theories
          improve theory
          assess relevance
          predictability

Shmueli & Koppius, “Predictive Analytics in IS Research”
(MISQ, 2011)
“A good explanatory model will also
predict well”
“You must understand the underlying
causes in order to predict”
Philosophy of Science
“Explanation and prediction have the
same logical structure”
                Hempel & Oppenheim, 1948

  “It becomes pertinent to investigate the
  possibilities of predictive procedures
  autonomous of those used for explanation”
                             Helmer & Rescher, 1959

         “Theories of social and human behavior
         address themselves to two distinct goals of
         science: (1) prediction and (2) understanding”
                                Dubin, Theory Building, 1969
Why statistical

explanatory modeling
       differs from

predictive modeling
   Shmueli (2010), Statistical Science
Theory vs. its manifestation




                     ?
Notation

Theoretical constructs: 𝒳, 𝒴
Causal theoretical model: 𝒴=ℱ(𝒳)
Measurable variables: X, Y
Statistical model: E(Y)=f(X)
Four aspects                 𝒴=ℱ(𝒳)
                             E(Y)=f(X)
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
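The fourth aspect can be made concrete with the standard decomposition of expected prediction error. This is a sketch under stated assumptions: squared-error loss, a new observation Y at a point x with Y = f(x) + ε, noise variance σ², and a fitted model written as \hat{f}. The hat notation and the symbol EPE are assumptions added here and do not appear on the slides.

```latex
\begin{aligned}
\mathrm{EPE}(x) &= E\{Y - \hat{f}(x)\}^2 \\
&= \underbrace{\mathrm{Var}(Y)}_{\sigma^2}
 + \underbrace{\{E[\hat{f}(x)] - f(x)\}^2}_{\text{bias}^2}
 + \underbrace{E\{\hat{f}(x) - E[\hat{f}(x)]\}^2}_{\text{variance}}
\end{aligned}
```

Read this way, explanatory modeling concentrates on driving the bias term toward zero, while predictive modeling targets the sum of the bias and variance terms and may accept some bias in exchange for lower variance.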
“The goal of finding models that are
predictively accurate differs from the
goal of finding models that are true.”
Point #1
Best explanatory model


              ≠
       Best predictive model
Four aspects                 𝒴=ℱ(𝒳)
                             E(Y)=f(X)
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
Predict ≠ Explain
               “we tried to benefit from an extensive
               set of attributes describing each of the
               movies in the dataset. Those attributes
               certainly carry a significant signal and
               can explain some of the user behavior.
               However… they could not help at all
               for improving the [predictive]
               accuracy.”
                                         Bell et al., 2008
Predict ≠ Explain
The FDA considers two products
bioequivalent if the 90% CI of the
relative mean of the generic to brand
formulation is within 80%-125%




“We are planning to… develop predictive models for bioavailability
and bioequivalence”
                                           Lester M. Crawford, 2005
                                Acting Commissioner of Food & Drugs
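The bioequivalence rule quoted on this slide is an inferential criterion, and it can be written down in a few lines. The sketch below assumes the confidence interval is expressed as a ratio (0.80 to 1.25). The function names and the normal-approximation helper are hypothetical illustrations, not the FDA's actual statistical procedure.

```python
import math
import statistics
from typing import Sequence, Tuple

# Limits quoted on the slide: 80%-125% for the generic/brand ratio.
LOWER_LIMIT = 0.80
UPPER_LIMIT = 1.25


def is_bioequivalent(ci_low: float, ci_high: float) -> bool:
    """Return True if the whole 90% CI lies inside [0.80, 1.25]."""
    return LOWER_LIMIT <= ci_low and ci_high <= UPPER_LIMIT


def approx_ratio_ci(log_ratios: Sequence[float], z: float = 1.645) -> Tuple[float, float]:
    """Rough 90% CI for the mean ratio from per-subject log(generic/brand).

    Assumption: a normal approximation (z = 1.645), used only to keep the
    example dependency-free. It is illustrative, not a regulatory method.
    """
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    half_width = z * statistics.stdev(log_ratios) / math.sqrt(n)
    return math.exp(mean - half_width), math.exp(mean + half_width)


print(is_bioequivalent(0.92, 1.08))  # interval fully inside the limits
print(is_bioequivalent(0.78, 1.10))  # lower bound falls below 0.80
```

Note that the rule tests a population-level mean ratio. It does not, by itself, predict the response of a new individual patient, which is the distinction the slide title draws.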
Goal Definition → Design & Collection → Data Preparation → EDA → Variables? Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
Study design
    & data collection
Observational or experiment?
Primary or secondary data?
Instrument (reliability+validity vs. measurement accuracy)
How much data?
How to sample?

                             Hierarchical data
Data Preprocessing




   missing values
   reduced-feature models
   partitioning
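Partitioning, the step that matters most for predictive modeling, can be sketched as a random split into training and holdout sets. The function name `partition`, the 30% holdout fraction, and the seed are assumptions chosen for illustration.

```python
import random


def partition(rows, holdout_fraction=0.3, seed=0):
    """Randomly split rows into (training, holdout) lists."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


train, holdout = partition(range(10), holdout_fraction=0.3, seed=42)
print(len(train), len(holdout))
```

The holdout rows are set aside before any model is fitted, so that predictive power is measured on observations the model has not seen.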
Data exploration & reduction

                   Interactive
                  visualization
                      PCA
                      SVD
Which Variables?



                        endogeneity
                          ex-post
                         availability
causation associations
  Multicollinearity? A, B, A*B?
Methods / Models
                    Blackbox / interpretable
                    Mapping to theory


   variance                       bias




Shrinkage models
           ensembles
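Shrinkage trades a little bias for lower variance. A minimal sketch for a single centered predictor with no intercept, where the ridge estimate has a closed form. The function names and the toy data are made up for illustration.

```python
def ols_slope(x, y):
    """Least-squares slope for centered x and y (no intercept)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx


def ridge_slope(x, y, penalty):
    """Ridge slope: minimizes sum((y - b*x)^2) + penalty * b^2.

    The penalty enters the denominator, pulling the estimate toward 0.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + penalty)


# Toy centered data (an assumption, not from the slides).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

print(ols_slope(x, y))           # 2.0
print(ridge_slope(x, y, 10.0))   # 1.0
```

The ridge estimate is deliberately biased toward zero. Whether that bias pays off can only be judged by predictive accuracy on holdout data, not by in-sample fit.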
Model fit ≠ Validation

Explanatory power: Theoretical model, Empirical model, Data

Evaluation, Validation & Model Selection

Predictive power: Empirical model, Training data, Holdout data, Over-fitting analysis
Model Use

Explanatory: test causal theory
  Inference
  Null hypothesis

Predictive: new theory, develop measures, compare theories, improve theory, assess relevance, predictability
  Predictive performance
  Naïve/baseline
  Over-fitting analysis
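Predictive performance is only meaningful relative to a naïve baseline. The sketch below compares a fitted line with a baseline that always predicts the training mean, both scored on holdout data. The data are noise-free toy values and the helper names are assumptions.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy data (an assumption, not from the slides).
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
hold_x, hold_y = [5.0, 6.0], [10.0, 12.0]

intercept, slope = fit_line(train_x, train_y)
model_rmse = rmse(hold_y, [intercept + slope * v for v in hold_x])

# Naïve baseline: always predict the training mean.
naive = sum(train_y) / len(train_y)
naive_rmse = rmse(hold_y, [naive] * len(hold_y))

print(model_rmse, naive_rmse)
```

A model that cannot beat the naïve baseline on holdout data has little predictive power, whatever its coefficients and p-values look like.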
Point #2

Explanatory            Predictive
  Power         ≠        Power

Cannot infer one from the other
Performance Metrics

Explanatory: interpretation, p-values, R², goodness-of-fit, type I, II errors
Predictive: out-of-sample, prediction accuracy, costs, training vs. holdout, over-fitting
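The training vs. holdout comparison is what exposes over-fitting. As a sketch, take a model that simply memorizes the training data (a one-nearest-neighbour rule, chosen here as an assumption for illustration). Its fit to the training data is perfect, yet that says nothing about its error on new observations.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def nearest_neighbor_predict(train_x, train_y, new_x):
    """Predict with the y of the closest training x (pure memorization)."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - new_x))
    return train_y[idx]


# Toy data (an assumption, not from the slides).
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 5.0]
hold_x, hold_y = [1.4, 3.6], [1.8, 3.9]

train_rmse = rmse(train_y, [nearest_neighbor_predict(train_x, train_y, v) for v in train_x])
hold_rmse = rmse(hold_y, [nearest_neighbor_predict(train_x, train_y, v) for v in hold_x])

print(train_rmse)  # 0.0: every training point is its own nearest neighbour
print(hold_rmse)   # strictly positive on the holdout set
```

The gap between the two numbers is the over-fitting signal. In-sample measures such as R² or goodness-of-fit cannot reveal it.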
[Figure: Predictive Power vs. Explanatory Power]
The predictive power of an
explanatory model has important
scientific value


Relevance, reality check, predictability
In “explanatory” fields
Prediction underappreciated

Distinction blurred
Unfamiliar with predictive
modeling/assessment
  “While the value of scientific prediction… is beyond
  question… the inexact sciences [do not] have… the
  use of predictive expertise well in hand.”
                               Helmer & Rescher, 1959
How does all this impact
   Scientific Research?
What can be done?

   acknowledge
incorporate prediction into
         curriculum
What happens in other fields?

     Epidemiology
         Engineering
             Life sciences

What about “predictive only”
fields?           http://goo.gl/gcjlN
Shmueli (2010), “To Explain or To Predict?”, Statistical Science
Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ

More Related Content

What's hot

Slideshare Presentation of Qualitative Data
Slideshare   Presentation of Qualitative DataSlideshare   Presentation of Qualitative Data
Slideshare Presentation of Qualitative DataDavin Marcus Raja
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceAmit Sharma
 
First Year Report, PhD presentation
First Year Report, PhD presentationFirst Year Report, PhD presentation
First Year Report, PhD presentationBang Xiang Yong
 
STATA - Introduction
STATA - IntroductionSTATA - Introduction
STATA - Introductionstata_org_uk
 
Grounded theory methodology of qualitative data analysis
Grounded theory methodology of qualitative data analysisGrounded theory methodology of qualitative data analysis
Grounded theory methodology of qualitative data analysisDr. Shiv S Tripathi
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisVishwas N
 
Grounded theory
Grounded theoryGrounded theory
Grounded theoryAamiruvas
 
Assumptions of Linear Regression - Machine Learning
Assumptions of Linear Regression - Machine LearningAssumptions of Linear Regression - Machine Learning
Assumptions of Linear Regression - Machine LearningKush Kulshrestha
 
Ordinal logistic regression
Ordinal logistic regression Ordinal logistic regression
Ordinal logistic regression Dr Athar Khan
 
Meaning of Constructs, Concepts & Variables
Meaning of Constructs, Concepts & VariablesMeaning of Constructs, Concepts & Variables
Meaning of Constructs, Concepts & VariablesSahin Sahari
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersFunctional Imperative
 
Confirmatory Factor Analysis Presented by Mahfoudh Mgammal
Confirmatory Factor Analysis Presented by Mahfoudh MgammalConfirmatory Factor Analysis Presented by Mahfoudh Mgammal
Confirmatory Factor Analysis Presented by Mahfoudh MgammalDr. Mahfoudh Hussein Mgammal
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualizationDr. Hamdan Al-Sabri
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testingpraveen3030
 
difference between the qualitative and quantitative researcher, variables, co...
difference between the qualitative and quantitative researcher, variables, co...difference between the qualitative and quantitative researcher, variables, co...
difference between the qualitative and quantitative researcher, variables, co...laraib asif
 
Hypothesis Formulation
Hypothesis FormulationHypothesis Formulation
Hypothesis FormulationSarang Bhola
 
Stata statistics
Stata statisticsStata statistics
Stata statisticsizahn
 

What's hot (20)

Slideshare Presentation of Qualitative Data
Slideshare   Presentation of Qualitative DataSlideshare   Presentation of Qualitative Data
Slideshare Presentation of Qualitative Data
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
First Year Report, PhD presentation
First Year Report, PhD presentationFirst Year Report, PhD presentation
First Year Report, PhD presentation
 
STATA - Introduction
STATA - IntroductionSTATA - Introduction
STATA - Introduction
 
Grounded theory methodology of qualitative data analysis
Grounded theory methodology of qualitative data analysisGrounded theory methodology of qualitative data analysis
Grounded theory methodology of qualitative data analysis
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Data Analysis
Data AnalysisData Analysis
Data Analysis
 
SEM
SEMSEM
SEM
 
Grounded theory
Grounded theoryGrounded theory
Grounded theory
 
Assumptions of Linear Regression - Machine Learning
Assumptions of Linear Regression - Machine LearningAssumptions of Linear Regression - Machine Learning
Assumptions of Linear Regression - Machine Learning
 
Ordinal logistic regression
Ordinal logistic regression Ordinal logistic regression
Ordinal logistic regression
 
Meaning of Constructs, Concepts & Variables
Meaning of Constructs, Concepts & VariablesMeaning of Constructs, Concepts & Variables
Meaning of Constructs, Concepts & Variables
 
Introduction to Machine Learning Classifiers
Introduction to Machine Learning ClassifiersIntroduction to Machine Learning Classifiers
Introduction to Machine Learning Classifiers
 
Confirmatory Factor Analysis Presented by Mahfoudh Mgammal
Confirmatory Factor Analysis Presented by Mahfoudh MgammalConfirmatory Factor Analysis Presented by Mahfoudh Mgammal
Confirmatory Factor Analysis Presented by Mahfoudh Mgammal
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Data cleaning-outlier-detection
Data cleaning-outlier-detectionData cleaning-outlier-detection
Data cleaning-outlier-detection
 
difference between the qualitative and quantitative researcher, variables, co...
difference between the qualitative and quantitative researcher, variables, co...difference between the qualitative and quantitative researcher, variables, co...
difference between the qualitative and quantitative researcher, variables, co...
 
Hypothesis Formulation
Hypothesis FormulationHypothesis Formulation
Hypothesis Formulation
 
Stata statistics
Stata statisticsStata statistics
Stata statistics
 

Viewers also liked

Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Galit Shmueli
 
What is Predictive About Partial Least Squares?
What is Predictive About Partial Least Squares?What is Predictive About Partial Least Squares?
What is Predictive About Partial Least Squares?Galit Shmueli
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...Galit Shmueli
 
Interpreting machine learning models
Interpreting machine learning modelsInterpreting machine learning models
Interpreting machine learning modelsandosa
 
To Explain Or To Predict?
To Explain Or To Predict?To Explain Or To Predict?
To Explain Or To Predict?Galit Shmueli
 
SAP HANA SPS09 - Predictive Analysis Library
SAP HANA SPS09 - Predictive Analysis LibrarySAP HANA SPS09 - Predictive Analysis Library
SAP HANA SPS09 - Predictive Analysis LibrarySAP Technology
 
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...Jean-Antoine Moreau
 
SAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
SAP HANA SPS10- Predictive Analysis Library and Application Function ModelerSAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
SAP HANA SPS10- Predictive Analysis Library and Application Function ModelerSAP Technology
 
Who to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsWho to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsNicola Barbieri
 
ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-Koichiro Gibo
 
Research report purposes and classifications
Research report purposes and classificationsResearch report purposes and classifications
Research report purposes and classificationsAnn Vitug
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodologysh_neha252
 
Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Elsevier
 
Research methodology ppt babasab
Research methodology ppt babasab Research methodology ppt babasab
Research methodology ppt babasab Babasab Patil
 
Research Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsResearch Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsAhmed-Refat Refat
 

Viewers also liked (20)

Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)Predictive Model Selection in PLS-PM (SCECR 2015)
Predictive Model Selection in PLS-PM (SCECR 2015)
 
What is Predictive About Partial Least Squares?
What is Predictive About Partial Least Squares?What is Predictive About Partial Least Squares?
What is Predictive About Partial Least Squares?
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
 
Interpreting machine learning models
Interpreting machine learning modelsInterpreting machine learning models
Interpreting machine learning models
 
To Explain Or To Predict?
To Explain Or To Predict?To Explain Or To Predict?
To Explain Or To Predict?
 
SAP HANA SPS09 - Predictive Analysis Library
SAP HANA SPS09 - Predictive Analysis LibrarySAP HANA SPS09 - Predictive Analysis Library
SAP HANA SPS09 - Predictive Analysis Library
 
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
DATA SCIENCE Lesson 5 Data Science Predictive Modeling and Modelling Methodol...
 
SAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
SAP HANA SPS10- Predictive Analysis Library and Application Function ModelerSAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
SAP HANA SPS10- Predictive Analysis Library and Application Function Modeler
 
Who to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsWho to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanations
 
ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-
 
Explanation Slides
Explanation SlidesExplanation Slides
Explanation Slides
 
Research report purposes and classifications
Research report purposes and classificationsResearch report purposes and classifications
Research report purposes and classifications
 
Employee Engagement
Employee EngagementEmployee Engagement
Employee Engagement
 
Employee engagement
Employee engagementEmployee engagement
Employee engagement
 
Research Methodology Lecture for Master & Phd Students
Research Methodology  Lecture for Master & Phd StudentsResearch Methodology  Lecture for Master & Phd Students
Research Methodology Lecture for Master & Phd Students
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017Gender Report Infographic: Elsevier 2017
Gender Report Infographic: Elsevier 2017
 
Research methodology ppt babasab
Research methodology ppt babasab Research methodology ppt babasab
Research methodology ppt babasab
 
Research methodology notes
Research methodology notesResearch methodology notes
Research methodology notes
 
Research Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and MethodsResearch Methods: Basic Concepts and Methods
Research Methods: Basic Concepts and Methods
 

Similar to To explain or to predict

Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingGalit Shmueli
 
Statistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingStatistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingGalit Shmueli
 
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Galit Shmueli
 
1.model building
1.model building1.model building
1.model buildingVinod Sahu
 
Research Methodology
Research MethodologyResearch Methodology
Research MethodologyAneel Raza
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...jemille6
 
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxDeliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxtheodorelove43763
 
Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Aalto University
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Miningbutest
 
2 types of research
2 types of research2 types of research
2 types of researchNaveed Saeed
 
Lec 2 types of research
Lec 2 types of researchLec 2 types of research
Lec 2 types of researchNaveed Saeed
 
Bps managing dissertation
Bps managing dissertationBps managing dissertation
Bps managing dissertationChuck Eesley
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyjemille6
 
TPCMFinalACone
TPCMFinalAConeTPCMFinalACone
TPCMFinalAConeAdam Cone
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkStats Statswork
 
Theory Building in Business Research
Theory Building in Business ResearchTheory Building in Business Research
Theory Building in Business ResearchRajesh Timane, PhD
 
Pharmacokinetic pharmacodynamic modeling
Pharmacokinetic pharmacodynamic modelingPharmacokinetic pharmacodynamic modeling
Pharmacokinetic pharmacodynamic modelingMeghana Gowda
 

Similar to To explain or to predict (20)

Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, Describing
 
Statistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingStatistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and Predicting
 
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
 
1.model building
1.model building1.model building
1.model building
 
Research Methodology
Research MethodologyResearch Methodology
Research Methodology
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
 
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docxDeliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
Deliverable 5 - Hypothesis Tests for Two SamplesCompetencyForm.docx
 
Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"
 
Presentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data MiningPresentation on Machine Learning and Data Mining
Presentation on Machine Learning and Data Mining
 
2 types of research
2 types of research2 types of research
2 types of research
 
Lec 2 types of research
Lec 2 types of researchLec 2 types of research
Lec 2 types of research
 
man0 ppt.pptx
man0 ppt.pptxman0 ppt.pptx
man0 ppt.pptx
 
Bps managing dissertation
Bps managing dissertationBps managing dissertation
Bps managing dissertation
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severely
 
Mgmt 802 week 1(1)
Mgmt 802 week 1(1)Mgmt 802 week 1(1)
Mgmt 802 week 1(1)
 
TPCMFinalACone
TPCMFinalAConeTPCMFinalACone
TPCMFinalACone
 
How to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - StatsworkHow to establish and evaluate clinical prediction models - Statswork
How to establish and evaluate clinical prediction models - Statswork
 
Theory Building in Business Research
Theory Building in Business ResearchTheory Building in Business Research
Theory Building in Business Research
 
The Business Research Method
The Business Research MethodThe Business Research Method
The Business Research Method
 
Pharmacokinetic pharmacodynamic modeling
Pharmacokinetic pharmacodynamic modelingPharmacokinetic pharmacodynamic modeling
Pharmacokinetic pharmacodynamic modeling
 

More from Galit Shmueli

“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modification“Improving” prediction of human behavior using behavior modification
“Improving” prediction of human behavior using behavior modificationGalit Shmueli
 
Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Repurposing Classification & Regression Trees for Causal Research with High-D...
Repurposing Classification & Regression Trees for Causal Research with High-D...Galit Shmueli
 
Behavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchBehavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchGalit Shmueli
 
Reinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomReinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomGalit Shmueli
 
Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS TaipeiBehavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS TaipeiGalit Shmueli
 
Repurposing predictive tools for causal research
Repurposing predictive tools for causal researchRepurposing predictive tools for causal research
Repurposing predictive tools for causal researchGalit Shmueli
 
Workshop on Information Quality
Workshop on Information QualityWorkshop on Information Quality
Workshop on Information QualityGalit Shmueli
 
Behavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareBehavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareGalit Shmueli
 
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Galit Shmueli
 
Prediction-based Model Selection in PLS-PM
Prediction-based Model Selection in PLS-PMPrediction-based Model Selection in PLS-PM
Prediction-based Model Selection in PLS-PMGalit Shmueli
 
When Prediction Met PLS: What We learned in 3 Years of Marriage
When Prediction Met PLS: What We learned in 3 Years of MarriageWhen Prediction Met PLS: What We learned in 3 Years of Marriage
When Prediction Met PLS: What We learned in 3 Years of MarriageGalit Shmueli
 
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...
A Tree-Based Approach  for Addressing Self-selection in Impact Studies with B...A Tree-Based Approach  for Addressing Self-selection in Impact Studies with B...
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...Galit Shmueli
 
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...Galit Shmueli
 
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...Galit Shmueli
 
Research Using Behavioral Big Data (BBD)
Research Using Behavioral Big Data (BBD)Research Using Behavioral Big Data (BBD)
Research Using Behavioral Big Data (BBD)Galit Shmueli
 
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral Issues
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral IssuesAnalyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral Issues
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral IssuesGalit Shmueli
 
Information Quality: A Framework for Evaluating Empirical Studies
Information Quality: A Framework for Evaluating Empirical Studies Information Quality: A Framework for Evaluating Empirical Studies
Information Quality: A Framework for Evaluating Empirical Studies Galit Shmueli
 
E.SUN Academic Award presentation (Jan 2016)
E.SUN Academic Award presentation (Jan 2016)E.SUN Academic Award presentation (Jan 2016)
E.SUN Academic Award presentation (Jan 2016)Galit Shmueli
 
Big Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesBig Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesGalit Shmueli
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)Galit Shmueli
 

To explain or to predict

  • 1. Galit Shmuéli Israel Statistical Association & Tel Aviv University July 9, 2012 To Explain or To Predict?
  • 2. Points for discussion: goo.gl/gcjlN Twitter: #explainpredict
  • 3. Road Map Definitions Explanatory-dominated social sciences Explanatory ≠ predictive modeling Why? Different modeling paths Explanatory vs. predictive power So what?
  • 4. Definitions Explanatory modeling: Theory-based, statistical testing of causal hypotheses Explanatory power: Strength of relationship in statistical model
  • 5. Definitions Predictive modeling: Empirical method for predicting new observations Predictive power: Ability to accurately predict new observations
  • 6. Statistical modeling in social science research Purpose: test causal theory (“explain”) Association-based statistical models Prediction nearly absent
  • 7. Explanatory modeling à-la social sciences Start with a causal theory Generate causal hypotheses on constructs Operationalize constructs → Measurable variables Fit statistical model Statistical inference → Causal conclusions
  • 8. In the social sciences, data analysis is mainly used for testing causal theory. “If it explains, it predicts”
  • 9. “Empirical prediction alone is un-scientific” Some statisticians share this view: The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth. - Parzen, Statistical Science 2001
  • 10. 52 “predictive” articles among 1,072 in Information Systems top journals
  • 11. Why Predict? for Scientific Research new theory develop measures compare theories improve theory assess relevance predictability Shmueli & Koppius, “Predictive Analytics in IS Research” (MISQ, 2011)
  • 12. “A good explanatory model will also predict well” “You must understand the underlying causes in order to predict”
  • 13. Philosophy of Science “Explanation and prediction have the same logical structure” Hempel & Oppenheim, 1948 “It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation” Helmer & Rescher, 1959 “Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding” Dubin, Theory Building, 1969
  • 14. Why statistical explanatory modeling differs from predictive modeling Shmueli (2010), Statistical Science
  • 15. Theory vs. its manifestation ?
  • 16. Notation Theoretical constructs: X, Y Causal theoretical model: Y=F(X) Measurable variables: X, Y Statistical model: E(Y)=f(X)
  • 17. Four aspects Y=F(X) E(Y)=f(X) 1. Theory – Data 2. Causation – Association 3. Retrospective – Prospective 4. Bias - Variance
  • 18. “The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
  • 19. Point #1 Best explanatory model ≠ Best predictive model
  • 20. Four aspects Y=F(X) E(Y)=f(X) 1. Theory - Data 2. Causation – Association 3. Retrospective – Prospective 4. Bias - Variance
  • 21. Predict ≠ Explain “we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.” Bell et al., 2008
  • 22. Predict ≠ Explain The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125% “We are planning to… develop predictive models for bioavailability and bioequivalence” Lester M. Crawford, 2005 Acting Commissioner of Food & Drugs
  • 23. Goal Definition → Design & Collection → Data Preparation → EDA → Variables? → Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
  • 24. Study design & data collection Observational or experiment? Primary or secondary data? Instrument (reliability+validity vs. measurement accuracy) How much data? How to sample? Hierarchical data
  • 25. Data Preprocessing: missing values, reduced-feature models, partitioning
  • 26. Data exploration & reduction Interactive visualization PCA SVD
  • 27. Which Variables? endogeneity ex-post availability causation associations Multicollinearity? A, B, A*B?
  • 28. Methods / Models Blackbox / interpretable Mapping to theory variance bias Shrinkage models ensembles
  • 29. Evaluation, Validation & Model Selection. Explanatory power: Theoretical model, Empirical model, Data; Model fit ≠ Validation. Predictive power: Empirical model, Training data, Holdout data, Over-fitting analysis
  • 30. Model Use. Explanatory: test causal theory; Inference; Null hypothesis. Predictive: new theory, develop measures, compare theories, improve theory, assess relevance, predictability; Predictive performance; Naïve/baseline; Over-fitting analysis
  • 31. Point #2 Explanatory Power ≠ Predictive Power Cannot infer one from the other
  • 32. Performance Metrics. Explanatory: interpretation, p-values, R², goodness-of-fit, type I,II errors. Predictive: out-of-sample prediction accuracy, costs, training vs. holdout, over-fitting
  • 33. Predictive Power Explanatory Power
  • 34. The predictive power of an explanatory model has important scientific value Relevance, reality check, predictability
  • 35. In “explanatory” fields Prediction underappreciated Distinction blurred Unfamiliar with predictive modeling/assessment “While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.” Helmer & Rescher, 1959
  • 36. How does all this impact Scientific Research?
  • 37. What can be done? acknowledge incorporate prediction into curriculum
  • 38. What happens in other fields? Epidemiology Engineering Life sciences What about “predictive only” fields? http://goo.gl/gcjlN
  • 39. Shmueli (2010), “To Explain or To Predict?”, Statistical Science Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ
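
The distinction on slides 29 to 32 between model fit on training data and performance on holdout data can be sketched in a few lines. This is only an illustration under stated assumptions, not the method used in the talk: the synthetic data, the polynomial degrees (1 and 8), and the helper names `r_squared` and `fit_and_score` are all hypothetical. It computes in-sample R² (a goodness-of-fit measure) and holdout RMSE (an out-of-sample accuracy measure) for two nested models.

```python
import numpy as np


def r_squared(y, y_hat):
    """In-sample goodness of fit: 1 - SS_residual / SS_total."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def fit_and_score(x_train, y_train, x_hold, y_hold, degree):
    """Fit a polynomial by least squares on the training partition only,
    then report training R^2 and holdout RMSE."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_r2 = r_squared(y_train, np.polyval(coefs, x_train))
    hold_err = y_hold - np.polyval(coefs, x_hold)
    hold_rmse = float(np.sqrt(np.mean(hold_err ** 2)))
    return train_r2, hold_rmse


# Synthetic data (an assumption for illustration): a linear signal plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 60)
y = 2.0 * x + rng.normal(0.0, 0.5, 60)

# Partition into training and holdout sets.
x_train, y_train = x[:30], y[:30]
x_hold, y_hold = x[30:], y[30:]

results = {d: fit_and_score(x_train, y_train, x_hold, y_hold, d) for d in (1, 8)}
for degree, (train_r2, hold_rmse) in results.items():
    print(f"degree={degree}  training R^2={train_r2:.3f}  holdout RMSE={hold_rmse:.3f}")
```

Because the degree-1 model is nested in the degree-8 model, the larger model can never have a lower training R²; whether it also predicts the holdout set better is a separate empirical question, which is why the two numbers are reported side by side.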