Statistical Modeling in 3D: Describing, Explaining, Predicting
University of Padova, Jun 15, 2018
Galit Shmueli 徐茉莉
Institute of Service Science
My Academic Path
1997-2000 (PhD, Statistics): Israel Institute of Technology, Faculty of IE & M
2000-2002: Carnegie Mellon Univ., Department of Statistics
2002-2012: Univ. of Maryland, Smith School of Business
2011-2014: Indian School of Business, Hyderabad, India
2014-…: National Tsing Hua Univ., Institute of Service Science
My Research
‘Entrepreneurial’ statistical &
data mining modeling
Interdisciplinary
Statistical Strategy
• To Explain or To Predict?
• Information Quality
• Data Mining for Causality
• Predicting with Causal Models
Road Map
1. Definitions
2. Monopolies & confusion in academia & industry
3. Explanatory, predictive, descriptive modeling &
evaluation are different
Why?
Different modeling paths
Explanatory power vs. predictive power
4. Where next?
Definitions: Explain
Explanatory modeling: theory-based, statistical testing of causal hypotheses
Explanatory power: strength of relationship in statistical model
Definitions: Predict
Predictive modeling: empirical method for predicting new observations
Predictive power: ability to accurately predict new observations
Definitions: Describe
Descriptive modeling: statistical model for approximating a distribution or relationship
Descriptive power: goodness of fit, generalizable to population
Monopolies in Different Fields
Explain: Social Sciences
Predict: Machine Learning
Describe: Statistics
Social sciences & management research
Domination of “Explain”
Purpose: test causal theory (“explain”)
Association-based statistical models
Prediction & description nearly absent
Start with a causal theory
Generate causal hypotheses on constructs
Operationalize constructs → measurable variables
Fit statistical model
Classic journal paper
Statistical inference → causal conclusions
Misconception #1:
The same model is best for explaining, describing, predicting
Social Sci & Mgmt: Build explanatory model and use it to “predict”
“A good explanatory model will also predict well”
“You must understand the underlying causes in order to predict”
“To examine the predictive power of the
proposed model, we compare it to four models
in terms of R2 adjusted”
Misconception #1:
The same model is best for explaining, describing, predicting
CS/eng/stat: Build a predictive model and use it to “explain”
in cs / stat / engineering / industry
2014 6th International Conference on Mobile
Computing, Applications and Services
(Agent-based modeling using census data)
“our model is able to provide both predictions of how the
population may vote and why they are voting this way”…
2009 IEEE International Conference on Systems, Man and
Cybernetics
Misconception #2:
explain > predict or predict > explain
Emanuel Parzen, Comment on
“Statistical Modeling: The Two Cultures”
Statistical Science 2001
“Correlation supersedes causation, and
science can advance even without
coherent models, unified theories, or
really any mechanistic explanation at all”
*Chris Anderson is the editor in chief of Wired
Philosophy of Science
“Explanation and prediction have the
same logical structure”
Hempel & Oppenheim, 1948
“It becomes pertinent to investigate the
possibilities of predictive procedures
autonomous of those used for explanation”
Helmer & Rescher, 1959
“Theories of social and human behavior
address themselves to two distinct goals of
science: (1) prediction and (2) understanding”
Dubin, Theory Building, 1969
Why statistical
explanatory modeling
predictive modeling
descriptive modeling
are different
Explanatory Model:
test/quantify causal effect between constructs for
“average” unit in population
Descriptive Model:
test/quantify distribution or correlation structure for
measured “average” unit in population
Predictive Model:
predict values for new/future individual units
Different Scientific Goals
Different generalization
Theory vs. its manifestation
?
Notation
Theoretical constructs: 𝒳, 𝒴
Causal theoretical model: 𝒴 = 𝓕(𝒳)
Measurable variables: X, Y
Statistical model: E(y) = f(X)
Breiman, “Stat Modeling: The Two Cultures”, Stat Science, 2001
Five aspects to consider
Theory – Data
Causation – Association
Retrospective – Prospective
Bias – Variance
Average unit – Individual unit
“The goal of finding models
that are predictively accurate
differs from the goal of finding
models that are true.”
But there’s more than bias-variance
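For reference, the bias-variance pair above is usually written as the textbook decomposition of expected squared prediction error at a new point x_0. This is a standard identity, not taken from the slides, and it assumes y = f(x) + e with E[e] = 0 and Var(e) = sigma^2; the symbols f, f-hat, and sigma are notation introduced here for illustration.

```latex
\mathrm{E}\!\left[\big(y_0 - \hat{f}(x_0)\big)^2\right]
  = \underbrace{\sigma^2}_{\text{irreducible noise}}
  + \underbrace{\big(\mathrm{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathrm{Var}\big(\hat{f}(x_0)\big)}_{\text{variance}}
```

Read this way, a model chosen to minimize bias (the explanatory priority) need not minimize the sum, which is what matters for predicting individual new units.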
Example: Regression Model for Explanation
y_i | x_i = b_0 + b_1 x_i + b_2 x_controls + e_i
• b_1: parameter of interest (inference)
• x_controls: chosen to avoid omitted variable bias (better to over-specify)
• x_i, y_i: measures of the X, Y constructs
• Underlying model: X → Y
• Danger: endogeneity
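A minimal sketch of why controls are chosen to avoid omitted variable bias, assuming a made-up data-generating process in which a control z drives both x and y. The coefficients (2.0 for x, 3.0 for z), the sample size, and the helper names `slope` and `residuals` are illustrative assumptions, not from the slides. The "with control" estimate uses the Frisch-Waugh-Lovell partialling-out result so that only simple regressions are needed.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


random.seed(0)
n = 5000
# Hypothetical data-generating process: the control z drives both x and y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + random.gauss(0, 1) for zi in z]
y = [1.0 + 2.0 * xi + 3.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

# Short regression (control omitted): b1 absorbs part of z's effect.
b1_short = slope(x, y)

# Long regression (control included), via Frisch-Waugh-Lovell:
# partial z out of both x and y, then regress residual on residual.
b1_long = slope(residuals(z, x), residuals(z, y))

print(f"b1 without control: {b1_short:.2f}")
print(f"b1 with control:    {b1_long:.2f}  (true value 2.0)")
```

In this simulation the estimate without the control is pulled well away from the true 2.0, while the estimate with the control recovers it closely.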
Example: Regression Model for Description
y_i | x_i = b_0 + b_1 x_1i + … + b_p x_pi + e_i
• b_1, …, b_p: parameters of interest (inference)
• Predictors chosen because related to Y
• Danger: multicollinearity
• All variables treated/interpreted as observable
• Remain in model only if statistically significant
• Residual analysis for goodness of fit & to test assumptions
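A minimal sketch of the residual-analysis step in descriptive modeling, assuming a made-up curved relationship that a straight line cannot capture. The data-generating process and the helper names `fit_line` and `corr` are illustrative assumptions. The check correlates the residuals with x squared, which is one simple way to detect unmodeled curvature.

```python
import random


def fit_line(x, y):
    """Simple OLS fit; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def corr(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


random.seed(1)
# Hypothetical data: y depends on x and on x squared.
x = [random.uniform(-2, 2) for _ in range(500)]
y = [1.0 + 0.5 * xi + xi ** 2 + random.gauss(0, 0.3) for xi in x]

b0, b1 = fit_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Goodness of fit of the straight-line model.
y_mean = sum(y) / len(y)
r2 = 1 - sum(e ** 2 for e in resid) / sum((yi - y_mean) ** 2 for yi in y)

# Residual check: leftover structure related to x squared signals misspecification.
curvature = corr(resid, [xi ** 2 for xi in x])

print(f"R^2 of linear fit: {r2:.2f}")
print(f"corr(residuals, x^2): {curvature:.2f}")
```

A strong correlation between the residuals and x squared indicates that the linear description misses part of the relationship, even though the slope itself is estimated without trouble.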
Example: Regression Model for Prediction
y_i | x_i = b_0 + b_1 x_1i + … + b_p x_pi + e_i
• y_i for new i’s: quantity of interest (prediction)
• Predictors chosen because possibly correlated with Y
• Danger: over-fitting
• All variables treated as observable, available at time of prediction
• Retain only if improve out-of-sample prediction
• Evaluate overfitting (train vs holdout)
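A minimal sketch of the "retain only if it improves out-of-sample prediction" rule, assuming two made-up candidate predictors (`x_signal`, `x_noise`) and a deliberately simplified one-predictor-at-a-time screen against a mean-only baseline. The names, the data-generating process, and the 50/50 split are illustrative assumptions, not the author's procedure.

```python
import random


def fit_line(x, y):
    """Simple OLS fit; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


def rmse(actual, predicted):
    """Root mean squared error."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


random.seed(2)
n = 400
# Hypothetical candidates: one related to y, one pure noise.
candidates = {
    "x_signal": [random.gauss(0, 1) for _ in range(n)],
    "x_noise": [random.gauss(0, 1) for _ in range(n)],
}
y = [3.0 + 1.5 * s + random.gauss(0, 1) for s in candidates["x_signal"]]

split = n // 2  # first half for training, second half held out
y_train, y_hold = y[:split], y[split:]

# Baseline: predict the training mean for every holdout unit.
baseline = sum(y_train) / len(y_train)
baseline_rmse = rmse(y_hold, [baseline] * len(y_hold))

holdout_rmse = {}
for name, x in candidates.items():
    b0, b1 = fit_line(x[:split], y_train)  # fit on training data only
    holdout_rmse[name] = rmse(y_hold, [b0 + b1 * xi for xi in x[split:]])

# Retain a predictor only if it beats the baseline on the holdout set.
retained = [name for name, err in holdout_rmse.items() if err < baseline_rmse]

print(f"baseline holdout RMSE: {baseline_rmse:.2f}")
for name, err in holdout_rmse.items():
    print(f"{name}: holdout RMSE {err:.2f}")
print("retained:", retained)
```

The criterion here is holdout error rather than statistical significance. A pure-noise predictor can still pass such a screen by chance on a single split, which is one reason repeated splits or cross-validation are often preferred.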
Point #1
Best explanatory model ≠ best predictive model ≠ best descriptive model
Predict ≠ Explain
+ ?
“we tried to benefit from an
extensive set of attributes
describing each of the movies in
the dataset. Those attributes
certainly carry a significant signal
and can explain some of the user
behavior. However… they could
not help at all for improving the
[predictive] accuracy.”
Bell et al., 2008
Predict ≠ Describe
Election Polls
“There is a subtle, but important, difference between
reflecting current public sentiment and predicting the
results of an election. Surveys have focused largely on
the former… [as opposed to] survey based prediction
models [that are] focused entirely on analysis and
projection”
Kenett, Pfefferman & Steinberg (2017) “Election Polls – A Survey, A Critique,
and Proposals”, Annual Rev of Stat & its Applications
Goal Definition → Design & Collection → Data Preparation → EDA → Variables? Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
Observational or experiment?
Primary or secondary data?
Instrument (reliability+validity vs. measurement accuracy)
How much data?
How to sample?
Study design
& data collection
predict: increase group size
explain/describe: increase #groups
Multilevel (nested) data
School
Class
Student
Data preprocessing
Reduced-Feature Models
Saar-Tsechansky & Provost, JMLR 2007
Data exploration, viz, reduction
PCA
Factor Analysis
(interpretable)
Dimension Reduction
(fast, small)
Which variables?
multicollinearity
causation associations
endogeneity
ex-post
availability
identifiability
A, B, A*B
leading,
coincident,
lagging indicators
ensembles
long/short regression
omitted variables bias
shrinkage models
variance
bias
Methods / Models
blackbox / interpretable
mapping to theory
Evaluation, Validation & Model Selection
Training data, statistical model, holdout data: predictive power, over-fitting analysis
Theoretical model, statistical model, data: validation
Model fit ≠ explanatory power
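A minimal sketch of an over-fitting analysis with training and holdout data. It swaps in k-nearest-neighbour averaging (not a model from the slides) because its flexibility is controlled by a single number k. The sine-shaped data-generating process, the sample sizes, and the helper names are illustrative assumptions.

```python
import math
import random


def knn_predict(x_train, y_train, x_new, k):
    """Average the y-values of the k training points closest to x_new."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    return sum(y_train[i] for i in nearest) / k


def rmse(actual, predicted):
    """Root mean squared error."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


def simulate(n):
    """Hypothetical data: smooth signal plus noise."""
    x = [random.uniform(0, 6) for _ in range(n)]
    y = [math.sin(xi) + random.gauss(0, 0.5) for xi in x]
    return x, y


random.seed(3)
x_train, y_train = simulate(300)
x_hold, y_hold = simulate(300)

results = {}
for k in (1, 25):
    train_pred = [knn_predict(x_train, y_train, xi, k) for xi in x_train]
    hold_pred = [knn_predict(x_train, y_train, xi, k) for xi in x_hold]
    results[k] = (rmse(y_train, train_pred), rmse(y_hold, hold_pred))
    print(f"k={k:2d}  train RMSE={results[k][0]:.2f}  holdout RMSE={results[k][1]:.2f}")
```

With k=1 each training point is its own nearest neighbour, so the training error is essentially zero while the holdout error is not. The gap between the two columns is the over-fitting signal, and it is invisible if only fit to the training data is reported.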
Point #2
Cannot infer one from the others
explanatory
power
predictive
power
descriptive
power
out-of-sample
Performance
Metrics
type I,II errors
goodness-of-fit
p-values
overall, specific
over-fitting
costs
prediction accuracy
interpretation
training vs holdout
R2
Explanatory Power
Predictive
Power
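A minimal sketch of why one kind of power cannot be inferred from another, assuming a made-up tiny effect (slope 0.05) and a large sample. It uses the t-statistic of the slope as a stand-in for the inference-side metrics (p-values) and holdout R² as the prediction-side metric. That pairing is a simplification chosen for illustration, and the helper name `fit_line_with_se` is hypothetical.

```python
import random


def fit_line_with_se(x, y):
    """Simple OLS; returns intercept, slope, and the slope's standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = (sse / (n - 2) / sxx) ** 0.5
    return b0, b1, se_b1


random.seed(4)
n = 40000
# Hypothetical data: a real but very small effect of x on y.
x = [random.gauss(0, 1) for _ in range(2 * n)]
y = [0.05 * xi + random.gauss(0, 1) for xi in x]

x_train, y_train = x[:n], y[:n]
x_hold, y_hold = x[n:], y[n:]

b0, b1, se_b1 = fit_line_with_se(x_train, y_train)
t_stat = b1 / se_b1  # large |t| means a very small p-value

# Holdout R^2: improvement over predicting the training mean.
mean_train = sum(y_train) / n
sse_model = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x_hold, y_hold))
sse_mean = sum((yi - mean_train) ** 2 for yi in y_hold)
holdout_r2 = 1 - sse_model / sse_mean

print(f"slope t-statistic: {t_stat:.1f}")
print(f"holdout R^2:       {holdout_r2:.4f}")
```

The slope is detected with high statistical confidence, yet the model barely improves on the mean when predicting new observations.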
Convinced
?
Currently in Academia
(social sciences, management)
• Theory-based explanatory modeling
• Prediction underappreciated
• Distinction blurred
• Unfamiliar with predictive modeling –
getting better
How/why use prediction
(predictive models + evaluation)
for scientific research
beyond project-specific
solution/utility/profit?
The predictive power of an
explanatory/descriptive model
has important scientific value
relevance, reality check, predictability
Generate new theory
Develop measures
Compare theories
Improve theory
Assess relevance
Evaluate predictability
Prediction for Scientific Research
Shmueli & Koppius, “Predictive Analytics in Information Systems Research”
MIS Quarterly, 2011
Currently in Industry
(and machine learning)
• Data-driven predictive modeling
• Prediction over-appreciated
• Distinction blurred
• A-B testing
• Unfamiliar with theory-based
explanatory modeling
Will the
customer
pay?
What causes
non-payment?
Implications:
Short-term solutions
Shallow/no understanding
Ethical, social, human pitfalls
Shmueli (2017) “Research Dilemmas With
Behavioral Big Data”, Big Data, vol 5(2),
pp. 98-119
How to do theory-based
explanatory modeling with
Behavioral Big Data?
Explain + Predict + Describe

More Related Content

What's hot

Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Galit Shmueli
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualizationDr. Hamdan Al-Sabri
 
Business Development Analysis
Business Development Analysis Business Development Analysis
Business Development Analysis Manpreet Chandhok
 
Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Collaboration with Statistician? 矩陣視覺化於探索式資料分析Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Collaboration with Statistician? 矩陣視覺化於探索式資料分析台灣資料科學年會
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceAmit Sharma
 
Applications: Prediction
Applications: PredictionApplications: Prediction
Applications: PredictionNBER
 
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)台灣資料科學年會
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET Journal
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisgokulprasath06
 
IRJET- Machine Learning: Introduction, Algorithms and Implementation
IRJET-  	  Machine Learning: Introduction, Algorithms and ImplementationIRJET-  	  Machine Learning: Introduction, Algorithms and Implementation
IRJET- Machine Learning: Introduction, Algorithms and ImplementationIRJET Journal
 
DoWhy Python library for causal inference: An End-to-End tool
DoWhy Python library for causal inference: An End-to-End toolDoWhy Python library for causal inference: An End-to-End tool
DoWhy Python library for causal inference: An End-to-End toolAmit Sharma
 
Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Aalto University
 
Explainable AI in Healthcare
Explainable AI in HealthcareExplainable AI in Healthcare
Explainable AI in Healthcarevonaurum
 
Keynote by Agus Sudjianto, Wells Fargo - Interpretable Machine Learning - H2O...
Keynote by Agus Sudjianto, Wells Fargo - Interpretable Machine Learning - H2O...Keynote by Agus Sudjianto, Wells Fargo - Interpretable Machine Learning - H2O...
Keynote by Agus Sudjianto, Wells Fargo - Interpretable Machine Learning - H2O...Sri Ambati
 
Nuts and bolts
Nuts and boltsNuts and bolts
Nuts and boltsNBER
 
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation TechniquesReview on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniquesijtsrd
 

What's hot (20)

Shmueli
ShmueliShmueli
Shmueli
 
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
 
Lime
LimeLime
Lime
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Business Development Analysis
Business Development Analysis Business Development Analysis
Business Development Analysis
 
Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Collaboration with Statistician? 矩陣視覺化於探索式資料分析Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Collaboration with Statistician? 矩陣視覺化於探索式資料分析
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
Applications: Prediction
Applications: PredictionApplications: Prediction
Applications: Prediction
 
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data Science
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Predictive data analytics models and their applications
Predictive data analytics models and their applicationsPredictive data analytics models and their applications
Predictive data analytics models and their applications
 
IRJET- Machine Learning: Introduction, Algorithms and Implementation
IRJET-  	  Machine Learning: Introduction, Algorithms and ImplementationIRJET-  	  Machine Learning: Introduction, Algorithms and Implementation
IRJET- Machine Learning: Introduction, Algorithms and Implementation
 
DoWhy Python library for causal inference: An End-to-End tool
DoWhy Python library for causal inference: An End-to-End toolDoWhy Python library for causal inference: An End-to-End tool
DoWhy Python library for causal inference: An End-to-End tool
 
Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"Inverse Modeling for Cognitive Science "in the Wild"
Inverse Modeling for Cognitive Science "in the Wild"
 
Explainable AI in Healthcare
Explainable AI in HealthcareExplainable AI in Healthcare
Explainable AI in Healthcare
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Keynote by Agus Sudjianto, Wells Fargo - Interpretable Machine Learning - H2O...
Keynote by Agus Sudjianto, Wells Fargo - Interpretable Machine Learning - H2O...Keynote by Agus Sudjianto, Wells Fargo - Interpretable Machine Learning - H2O...
Keynote by Agus Sudjianto, Wells Fargo - Interpretable Machine Learning - H2O...
 
Nuts and bolts
Nuts and boltsNuts and bolts
Nuts and bolts
 
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation TechniquesReview on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
Review on Algorithmic and Non Algorithmic Software Cost Estimation Techniques
 

Similar to Statistical Modeling in 3D: Describing, Explaining and Predicting

Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingGalit Shmueli
 
To Explain Or To Predict?
To Explain Or To Predict?To Explain Or To Predict?
To Explain Or To Predict?Galit Shmueli
 
1.model building
1.model building1.model building
1.model buildingVinod Sahu
 
Rsqrd AI: Recent Advances in Explainable Machine Learning Research
Rsqrd AI: Recent Advances in Explainable Machine Learning ResearchRsqrd AI: Recent Advances in Explainable Machine Learning Research
Rsqrd AI: Recent Advances in Explainable Machine Learning ResearchSanjana Chowdhury
 
Chao Wrote Some trends that influence human resource are, Leade.docx
Chao Wrote Some trends that influence human resource are, Leade.docxChao Wrote Some trends that influence human resource are, Leade.docx
Chao Wrote Some trends that influence human resource are, Leade.docxsleeperharwell
 
Chao Wrote Some trends that influence human resource are, Leade.docx
Chao Wrote Some trends that influence human resource are, Leade.docxChao Wrote Some trends that influence human resource are, Leade.docx
Chao Wrote Some trends that influence human resource are, Leade.docxketurahhazelhurst
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016Anita de Waard
 
G. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsG. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsIstituto nazionale di statistica
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineeringJannatulFerdous160
 
Relevance of statistics sgd-slideshare
Relevance of statistics sgd-slideshareRelevance of statistics sgd-slideshare
Relevance of statistics sgd-slideshareSanjeev Deshmukh
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis Peter Reimann
 
algorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyalgorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyPaolo Missier
 
KatKennedy REU D.C. Poster
KatKennedy REU D.C. PosterKatKennedy REU D.C. Poster
KatKennedy REU D.C. PosterKatlynn Kennedy
 
An introduction to machine learning in biomedical research: Key concepts, pr...
An introduction to machine learning in biomedical research:  Key concepts, pr...An introduction to machine learning in biomedical research:  Key concepts, pr...
An introduction to machine learning in biomedical research: Key concepts, pr...FranciscoJAzuajeG
 
Statistical Techniques for Processing & Analysis of Data Part 9.pdf
Statistical Techniques for Processing & Analysis of Data Part 9.pdfStatistical Techniques for Processing & Analysis of Data Part 9.pdf
Statistical Techniques for Processing & Analysis of Data Part 9.pdfAdebisiAdetayo1
 
Tech meetup Data Driven - Codemotion
Tech meetup Data Driven - Codemotion Tech meetup Data Driven - Codemotion
Tech meetup Data Driven - Codemotion antimo musone
 
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...Intel® Software
 
West-Vanderbilt-Talk--Revised-22March2017.ppt
West-Vanderbilt-Talk--Revised-22March2017.pptWest-Vanderbilt-Talk--Revised-22March2017.ppt
West-Vanderbilt-Talk--Revised-22March2017.pptkait23
 

Similar to Statistical Modeling in 3D: Describing, Explaining and Predicting (20)

Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, Describing
 
To Explain Or To Predict?
To Explain Or To Predict?To Explain Or To Predict?
To Explain Or To Predict?
 
1.model building
1.model building1.model building
1.model building
 
Rsqrd AI: Recent Advances in Explainable Machine Learning Research
Rsqrd AI: Recent Advances in Explainable Machine Learning ResearchRsqrd AI: Recent Advances in Explainable Machine Learning Research
Rsqrd AI: Recent Advances in Explainable Machine Learning Research
 
Chao Wrote Some trends that influence human resource are, Leade.docx
Chao Wrote Some trends that influence human resource are, Leade.docxChao Wrote Some trends that influence human resource are, Leade.docx
Chao Wrote Some trends that influence human resource are, Leade.docx
 
Chao Wrote Some trends that influence human resource are, Leade.docx
Chao Wrote Some trends that influence human resource are, Leade.docxChao Wrote Some trends that influence human resource are, Leade.docx
Chao Wrote Some trends that influence human resource are, Leade.docx
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
Data Analysis
Data Analysis Data Analysis
Data Analysis
 
G. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statisticsG. Barcaroli, The use of machine learning in official statistics
G. Barcaroli, The use of machine learning in official statistics
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineering
 
Relevance of statistics sgd-slideshare
Relevance of statistics sgd-slideshareRelevance of statistics sgd-slideshare
Relevance of statistics sgd-slideshare
 
Exploratory data analysis
Exploratory data analysis Exploratory data analysis
Exploratory data analysis
 
algorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparencyalgorithmic-decisions, fairness, machine learning, provenance, transparency
algorithmic-decisions, fairness, machine learning, provenance, transparency
 
KatKennedy REU D.C. Poster
KatKennedy REU D.C. PosterKatKennedy REU D.C. Poster
KatKennedy REU D.C. Poster
 
An introduction to machine learning in biomedical research: Key concepts, pr...
An introduction to machine learning in biomedical research:  Key concepts, pr...An introduction to machine learning in biomedical research:  Key concepts, pr...
An introduction to machine learning in biomedical research: Key concepts, pr...
 
Statistical Techniques for Processing & Analysis of Data Part 9.pdf
Statistical Techniques for Processing & Analysis of Data Part 9.pdfStatistical Techniques for Processing & Analysis of Data Part 9.pdf
Statistical Techniques for Processing & Analysis of Data Part 9.pdf
 
PREDICT 422 - Module 1.pptx
PREDICT 422 - Module 1.pptxPREDICT 422 - Module 1.pptx
PREDICT 422 - Module 1.pptx
 
Tech meetup Data Driven - Codemotion
Tech meetup Data Driven - Codemotion Tech meetup Data Driven - Codemotion
Tech meetup Data Driven - Codemotion
 
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
Data Analytics, Machine Learning, and HPC in Today’s Changing Application Env...
 
West-Vanderbilt-Talk--Revised-22March2017.ppt
West-Vanderbilt-Talk--Revised-22March2017.pptWest-Vanderbilt-Talk--Revised-22March2017.ppt
West-Vanderbilt-Talk--Revised-22March2017.ppt
 

More from Galit Shmueli

Behavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchBehavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchGalit Shmueli
 
Reinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomReinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomGalit Shmueli
 
Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS TaipeiBehavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS TaipeiGalit Shmueli
 
Workshop on Information Quality
Workshop on Information QualityWorkshop on Information Quality
Workshop on Information QualityGalit Shmueli
 
Behavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareBehavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareGalit Shmueli
 
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Researcher Dilemmas  using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...Galit Shmueli
 
Prediction-based Model Selection in PLS-PM
Prediction-based Model Selection in PLS-PMPrediction-based Model Selection in PLS-PM
Prediction-based Model Selection in PLS-PMGalit Shmueli
 
When Prediction Met PLS: What We learned in 3 Years of Marriage
When Prediction Met PLS: What We learned in 3 Years of MarriageWhen Prediction Met PLS: What We learned in 3 Years of Marriage
When Prediction Met PLS: What We learned in 3 Years of MarriageGalit Shmueli
 
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...
A Tree-Based Approach  for Addressing Self-selection in Impact Studies with B...A Tree-Based Approach  for Addressing Self-selection in Impact Studies with B...
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...Galit Shmueli
 
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...Galit Shmueli
 
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...Galit Shmueli
 
Research Using Behavioral Big Data (BBD)
Research Using Behavioral Big Data (BBD)Research Using Behavioral Big Data (BBD)
Research Using Behavioral Big Data (BBD)Galit Shmueli
 
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral Issues
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral IssuesAnalyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral Issues
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral IssuesGalit Shmueli
 
Information Quality: A Framework for Evaluating Empirical Studies
Information Quality: A Framework for Evaluating Empirical Studies Information Quality: A Framework for Evaluating Empirical Studies
Information Quality: A Framework for Evaluating Empirical Studies Galit Shmueli
 
E.SUN Academic Award presentation (Jan 2016)
E.SUN Academic Award presentation (Jan 2016)E.SUN Academic Award presentation (Jan 2016)
E.SUN Academic Award presentation (Jan 2016)Galit Shmueli
 
Big Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesBig Data & Analytics in the Digital Creative Industries
Big Data & Analytics in the Digital Creative IndustriesGalit Shmueli
 
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)Galit Shmueli
 
Introducing the NTHU-EZTABLE Kaggle Contest (Predicting Repeat Restaurant Boo...
Introducing the NTHU-EZTABLE Kaggle Contest (Predicting Repeat Restaurant Boo...Introducing the NTHU-EZTABLE Kaggle Contest (Predicting Repeat Restaurant Boo...
Introducing the NTHU-EZTABLE Kaggle Contest (Predicting Repeat Restaurant Boo...Galit Shmueli
 
Opening Data With Kaggle
Opening Data With KaggleOpening Data With Kaggle
Opening Data With KaggleGalit Shmueli
 
Linear Probability Models and Big Data: Kosher or Not?
Linear Probability Models and Big Data: Kosher or Not?Linear Probability Models and Big Data: Kosher or Not?
Linear Probability Models and Big Data: Kosher or Not?Galit Shmueli
 

More from Galit Shmueli (20)

Behavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare ResearchBehavioral Big Data & Healthcare Research
Behavioral Big Data & Healthcare Research
 
Reinventing the Data Analytics Classroom
Reinventing the Data Analytics ClassroomReinventing the Data Analytics Classroom
Reinventing the Data Analytics Classroom
 
Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS TaipeiBehavioral Big Data & Healthcare Research: Talk at WiDS Taipei
Behavioral Big Data & Healthcare Research: Talk at WiDS Taipei
 
Workshop on Information Quality
Workshop on Information QualityWorkshop on Information Quality
Workshop on Information Quality
 
Behavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should CareBehavioral Big Data: Why Quality Engineers Should Care
Behavioral Big Data: Why Quality Engineers Should Care
 
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Researcher Dilemmas using Behavioral Big Data in Healthcare (INFORMS DMDA Wo...
Prediction-based Model Selection in PLS-PM
When Prediction Met PLS: What We learned in 3 Years of Marriage
A Tree-Based Approach for Addressing Self-selection in Impact Studies with B...
Research Using Behavioral Big Data: A Tour and Why Mechanical Engineers Shoul...
A Tree-Based Approach for Addressing Self-Selection in Impact Studies with Bi...
Research Using Behavioral Big Data (BBD)
Analyzing Behavioral Big Data: Methodological, Practical, Ethical & Moral Issues
Information Quality: A Framework for Evaluating Empirical Studies
E.SUN Academic Award presentation (Jan 2016)
Big Data & Analytics in the Digital Creative Industries
On Information Quality: Can Your Data Do The Job? (SCECR 2015 Keynote)
Introducing the NTHU-EZTABLE Kaggle Contest (Predicting Repeat Restaurant Boo...
Opening Data With Kaggle
Linear Probability Models and Big Data: Kosher or Not?


Statistical Modeling in 3D: Describing, Explaining and Predicting

  • 1. Statistical Modeling in 3D: Describing, Explaining, Predicting. University of Padova, Jun 15, 2018. Galit Shmueli 徐茉莉, Institute of Service Science
  • 2. My Academic Path: 1997-2000 (PhD, Statistics) Israel Institute of Technology, Faculty of IE & M; 2000-2002 Carnegie Mellon Univ., Department of Statistics; 2002-2012 Univ. of Maryland, Smith School of Business; 2011-2014 Indian School of Business, Hyderabad, India; 2014-… National Tsing Hua Univ., Institute of Service Science. My Research: ‘Entrepreneurial’ statistical & data mining modeling; Interdisciplinary; Statistical Strategy: • To Explain or To Predict? • Information Quality • Data Mining for Causality • Predicting with Causal Models
  • 3. Road Map: 1. Definitions 2. Monopolies & confusion in academia & industry 3. Explanatory, predictive, descriptive modeling & evaluation are different (Why? Different modeling paths; Explanatory power vs. predictive power) 4. Where next?
  • 4. Definitions: Explain. Explanatory modeling: theory-based, statistical testing of causal hypotheses. Explanatory power: strength of relationship in statistical model
  • 5. Definitions: Predict. Predictive modeling: empirical method for predicting new observations. Predictive power: ability to accurately predict new observations
  • 6. Definitions: Describe. Descriptive modeling: statistical model for approximating a distribution or relationship. Descriptive power: goodness of fit, generalizable to population
  • 8.
  • 9. Social sciences & management research: Domination of “Explain”. Purpose: test causal theory (“explain”). Association-based statistical models. Prediction & description nearly absent
  • 10. Classic journal paper: Start with a causal theory; generate causal hypotheses on constructs; operationalize constructs → measurable variables; fit statistical model; statistical inference → causal conclusions
  • 11. Misconception #1: The same model is best for explaining, describing, predicting. Social Sci & Mgmt: Build explanatory model and use it to “predict”. “A good explanatory model will also predict well”. “You must understand the underlying causes in order to predict”. “To examine the predictive power of the proposed model, we compare it to four models in terms of R2 adjusted”
  • 12. Misconception #1: The same model is best for explaining, describing, predicting. CS/eng/stat: Build a predictive model and use it to “explain” (in cs / stat / engineering / industry). 2014 6th International Conference on Mobile Computing, Applications and Services (Agent-based modeling using census data). “our model is able to provide both predictions of how the population may vote and why they are voting this way”… 2009 IEEE International Conference on Systems, Man and Cybernetics
  • 13. Misconception #2: explain > predict or predict > explain. Emanuel Parzen, Comment on “Statistical Modeling: The Two Cultures”, Statistical Science 2001. “Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all” (Chris Anderson*). *Chris Anderson is the editor in chief of Wired
  • 14.
  • 15. Philosophy of Science: “Explanation and prediction have the same logical structure” (Hempel & Oppenheim, 1948). “It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation” (Helmer & Rescher, 1959). “Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding” (Dubin, Theory Building, 1969)
  • 16. Why statistical explanatory modeling, predictive modeling, and descriptive modeling are different
  • 17. Different Scientific Goals, Different Generalization. Explanatory Model: test/quantify causal effect between constructs for “average” unit in population. Descriptive Model: test/quantify distribution or correlation structure for measured “average” unit in population. Predictive Model: predict values for new/future individual units
  • 18. Theory vs. its manifestation ?
  • 19. Notation. Theoretical constructs: $\mathcal{X}, \mathcal{Y}$. Causal theoretical model: $\mathcal{Y} = \mathcal{F}(\mathcal{X})$. Measurable variables: $X, Y$. Statistical model: $E(Y) = f(X)$. Breiman, “Stat Modeling: The Two Cultures”, Stat Science, 2001
  • 20. Five aspects to consider: Theory – Data; Causation – Association; Retrospective – Prospective; Bias – Variance; Average unit – Individual unit
  • 21. “The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
  • 22. But there’s more than bias-variance
  • 23. Example: Regression Model for Explanation. $y_i \mid x_i = b_0 + b_1 x_i + b_2 x_{\text{controls}} + e_i$. $b_1$: parameter of interest (inference). Controls: chosen to avoid omitted variable bias (better to over-specify). $x_i$, $y_i$: measures of the X, Y constructs. Underlying model: X → Y. Danger: endogeneity
  • 24. Example: Regression Model for Description. $y_i \mid x_i = b_0 + b_1 x_{1i} + \dots + b_p x_{pi} + e_i$. Coefficients: parameters of interest (inference). Variables chosen because related to Y. Danger: multicollinearity. All variables treated/interpreted as observable. Remain in model only if statistically significant. Residual analysis for GoF & to test assumptions
  • 25. Example: Regression Model for Prediction. $y_i \mid x_i = b_0 + b_1 x_{1i} + \dots + b_p x_{pi} + e_i$. $y_i$: quantity of interest for new i’s (prediction). Variables chosen because possibly correlated with Y. Danger: over-fitting. All variables treated as observable, available at time of prediction. Retain only if they improve out-of-sample prediction. Evaluate overfitting (train vs holdout)
  • 27. Predict ≠ Explain + ? “we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.” Bell et al., 2008
  • 28. Predict ≠ Describe Election Polls “There is a subtle, but important, difference between reflecting current public sentiment and predicting the results of an election. Surveys have focused largely on the former… [as opposed to] survey based prediction models [that are] focused entirely on analysis and projection” Kenett, Pfefferman & Steinberg (2017) “Election Polls – A Survey, A Critique, and Proposals”, Annual Rev of Stat & its Applications
  • 30. Study design & data collection: Observational or experiment? Primary or secondary data? Instrument (reliability + validity vs. measurement accuracy). How much data? How to sample? Multilevel (nested) data (School, Class, Student): predict: increase group size; explain/describe: increase #groups
  • 32. Data exploration, viz, reduction: PCA; Factor Analysis (interpretable); Dimension Reduction (fast, small)
  • 34. Methods / Models: blackbox / interpretable; mapping to theory; ensembles; long/short regression; omitted variables bias; shrinkage models; variance; bias
  • 35. Evaluation, Validation & Model Selection: training data; statistical model; holdout data; predictive power; over-fitting analysis; theoretical model; statistical model; data; validation; model fit ≠ explanatory power
  • 36. Point #2: Cannot infer one from the others: explanatory power, predictive power, descriptive power
  • 37. Performance Metrics: out-of-sample; type I, II errors; goodness-of-fit; p-values (overall, specific); over-fitting; costs; prediction accuracy; interpretation; training vs holdout; R2
  • 40.
  • 41. Currently in Academia (social sciences, management): • Theory-based explanatory modeling • Prediction underappreciated • Distinction blurred • Unfamiliar with predictive modeling – getting better. How/why use prediction (predictive models + evaluation) for scientific research beyond project-specific solution/utility/profit?
  • 42. The predictive power of an explanatory/descriptive model has important scientific value: relevance, reality check, predictability
  • 43. Prediction for Scientific Research: Generate new theory; Develop measures; Compare theories; Improve theory; Assess relevance; Evaluate predictability. Shmueli & Koppius, “Predictive Analytics in Information Systems Research”, MIS Quarterly, 2011
  • 44. Currently in Industry (and machine learning): • Data-driven predictive modeling • Prediction over-appreciated • Distinction blurred • A-B testing • Unfamiliar with theory-based explanatory modeling. Will the customer pay? What causes non-payment?
  • 45. Implications: Short-term solutions; Shallow/no understanding; Ethical, social, human pitfalls. Shmueli (2017) “Research Dilemmas With Behavioral Big Data”, Big Data, vol 5(2), pp. 98-119. How to do theory-based explanatory modeling with Behavioral Big Data?
  • 46. Explain + Predict + Describe
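
The train-vs-holdout comparison named on the prediction and evaluation slides can be sketched in a few lines. This is only an illustration under stated assumptions, not code from the talk: the simulated linear data, the 30/30 split, the polynomial degrees 1 and 10, and the helper names `r_squared` and `fit_and_score` are all hypothetical choices made here. It shows that in-sample R2 (model fit) can only go up when terms are added, while holdout R2 (predictive power) has to be measured separately.

```python
import numpy as np


def r_squared(y, y_hat):
    """R^2 of predictions y_hat against observed y (relative to the mean of y)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_and_score(degree, x_train, y_train, x_hold, y_hold):
    """Fit a polynomial by least squares on the training set only,
    then report R^2 on the training set and on the holdout set."""
    coefs = np.polyfit(x_train, y_train, degree)
    r2_train = r_squared(y_train, np.polyval(coefs, x_train))
    r2_hold = r_squared(y_hold, np.polyval(coefs, x_hold))
    return r2_train, r2_hold


# Hypothetical data: the true relationship is linear with noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 60)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, 60)

# Assumed split: first half for training, second half held out.
x_train, y_train = x[:30], y[:30]
x_hold, y_hold = x[30:], y[30:]

results = {d: fit_and_score(d, x_train, y_train, x_hold, y_hold) for d in (1, 10)}
for degree, (r2_train, r2_hold) in results.items():
    print(f"degree={degree:2d}  train R2={r2_train:.3f}  holdout R2={r2_hold:.3f}")
```

Because the degree-1 model is nested in the degree-10 model, the training R2 of the larger model is never lower; whether its holdout R2 is better or worse depends on the data, which is why fit on the training sample alone says little about predictive power.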