Presentation at special event "To Explain or To Predict?" at Tel Aviv University, July 9, 2012. Event co-organized by the Israel Statistical Association and Tel Aviv University's Department of Statistics and OR.

Published in:
Technology

- Galit Shmueli. Israel Statistical Association & Tel Aviv University, July 9, 2012. To Explain or To Predict?
- Points for discussion: goo.gl/gcjlN. Twitter: #explainpredict
- Road Map: Definitions; Explanatory-dominated social sciences; Explanatory ≠ predictive modeling (Why? Different modeling paths; Explanatory vs. predictive power); So what?
- Definitions. Explanatory modeling: theory-based, statistical testing of causal hypotheses. Explanatory power: strength of relationship in statistical model.
- Definitions. Predictive modeling: empirical method for predicting new observations. Predictive power: ability to accurately predict new observations.
- Statistical modeling in social science research. Purpose: test causal theory (“explain”). Association-based statistical models. Prediction nearly absent.
- Explanatory modeling à la social sciences: start with a causal theory; generate causal hypotheses on constructs; operationalize constructs → measurable variables; fit statistical model; statistical inference → causal conclusions.
- In the social sciences, data analysis is mainly used for testing causal theory. “If it explains, it predicts”
- “Empirical prediction alone is un-scientific.” Some statisticians share this view: “The two goals in analyzing data... I prefer to describe as ‘management’ and ‘science’. Management seeks profit... Science seeks truth.” (Parzen, Statistical Science, 2001)
- 52 “predictive” articles among 1,072 in Information Systems top journals
- Why predict? For scientific research: new theory; develop measures; compare theories; improve theory; assess relevance; predictability. Shmueli & Koppius, “Predictive Analytics in IS Research” (MISQ, 2011)
- “A good explanatory model will also predict well.” “You must understand the underlying causes in order to predict.”
- Philosophy of Science. “Explanation and prediction have the same logical structure” (Hempel & Oppenheim, 1948). “It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation” (Helmer & Rescher, 1959). “Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding” (Dubin, Theory Building, 1969).
- Why statistical explanatory modeling differs from predictive modeling. Shmueli (2010), Statistical Science
- Theory vs. its manifestation?
- Notation. Theoretical constructs: 𝒳, 𝒴. Causal theoretical model: 𝒴 = F(𝒳). Measurable variables: X, Y. Statistical model: E(Y) = f(X).
- Four aspects (𝒴 = F(𝒳) vs. E(Y) = f(X)): (1) Theory – Data; (2) Causation – Association; (3) Retrospective – Prospective; (4) Bias – Variance.
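
The bias-variance aspect can be made concrete with the standard decomposition of expected squared prediction error. This is a textbook identity, not something written on the slides. The symbols $\hat{f}$ (the model fitted to training data), $\sigma^2$ (the noise variance), and $x$ (the inputs of a new observation) are notation assumed here for illustration.

```latex
% Assumes Y = f(x) + \varepsilon with E(\varepsilon) = 0 and Var(\varepsilon) = \sigma^2,
% and that \hat{f} is estimated from training data independent of the new observation.
\begin{align*}
\mathrm{EPE}(x) &= E\big[(Y - \hat{f}(x))^2\big] \\
  &= \sigma^2
     + \big(E[\hat{f}(x)] - f(x)\big)^2
     + E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big] \\
  &= \text{irreducible noise} + \text{bias}^2 + \text{variance}.
\end{align*}
```

One way to read this: explanatory modeling concentrates on keeping the bias term small so that f stays close to the theoretical F, while predictive modeling targets the sum of squared bias and variance and may accept some bias in exchange for lower variance.
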
- “The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
- Point #1: Best explanatory model ≠ Best predictive model
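
A small simulation can illustrate Point #1. The sketch below is an illustration under assumed settings, not an analysis from the talk: the data-generating model, the coefficients, the sample sizes, and the function name `holdout_mse_comparison` are all hypothetical. It fits the correctly specified two-predictor regression and a deliberately underspecified one-predictor regression on small training samples, then compares their average squared error on holdout data.

```python
import math
import random


def holdout_mse_comparison(n_train=20, n_test=200, n_reps=1000,
                           rho=0.9, beta1=1.0, beta2=0.1, sigma=1.0, seed=0):
    """Average holdout MSE of the true model vs. an underspecified one.

    Assumed data-generating process (hypothetical, for illustration only):
        y = beta1*x1 + beta2*x2 + noise, with corr(x1, x2) = rho.
    Both regressions are fit by least squares without an intercept.
    """
    rng = random.Random(seed)

    def draw(n):
        rows = []
        for _ in range(n):
            x1 = rng.gauss(0.0, 1.0)
            x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
            y = beta1 * x1 + beta2 * x2 + rng.gauss(0.0, sigma)
            rows.append((x1, x2, y))
        return rows

    total_full = total_reduced = 0.0
    for _ in range(n_reps):
        train, test = draw(n_train), draw(n_test)

        # Sufficient statistics for the normal equations.
        s11 = sum(x1 * x1 for x1, _, _ in train)
        s22 = sum(x2 * x2 for _, x2, _ in train)
        s12 = sum(x1 * x2 for x1, x2, _ in train)
        s1y = sum(x1 * y for x1, _, y in train)
        s2y = sum(x2 * y for _, x2, y in train)

        # Full (correctly specified) model: solve the 2x2 normal equations.
        det = s11 * s22 - s12 * s12
        b1_full = (s22 * s1y - s12 * s2y) / det
        b2_full = (s11 * s2y - s12 * s1y) / det

        # Reduced (underspecified) model: regress y on x1 only.
        b1_reduced = s1y / s11

        total_full += sum((y - b1_full * x1 - b2_full * x2) ** 2
                          for x1, x2, y in test) / n_test
        total_reduced += sum((y - b1_reduced * x1) ** 2
                             for x1, _, y in test) / n_test

    return total_full / n_reps, total_reduced / n_reps


mse_full, mse_reduced = holdout_mse_comparison()
print(f"true two-predictor model: holdout MSE = {mse_full:.3f}")
print(f"underspecified model:     holdout MSE = {mse_reduced:.3f}")
```

Under these assumed settings the underspecified model tends to have the lower holdout error. Here x2 has a small effect and is highly correlated with x1, so estimating its coefficient from 20 observations adds more variance than omitting it adds bias.
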
- Four aspects (𝒴 = F(𝒳) vs. E(Y) = f(X)): (1) Theory – Data; (2) Causation – Association; (3) Retrospective – Prospective; (4) Bias – Variance.
- Predict ≠ Explain. “we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.” Bell et al., 2008
- Predict ≠ Explain. The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%. “We are planning to… develop predictive models for bioavailability and bioequivalence.” Lester M. Crawford, 2005, Acting Commissioner of Food & Drugs
- Goal Definition → Design & Data Collection → Data Preparation → EDA → Variables? → Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
- Study design & data collection: Observational or experiment? Primary or secondary data? Instrument (reliability + validity vs. measurement accuracy). How much data? How to sample? Hierarchical data.
- Data preprocessing: missing values; reduced-feature models; partitioning
- Data exploration & reduction: interactive visualization; PCA; SVD
- Which variables? Endogeneity; ex-post availability; causation; associations; multicollinearity? A, B, A*B?
- Methods / Models: blackbox / interpretable; mapping to theory; variance; bias; shrinkage models; ensembles
- Evaluation, Validation & Model Selection. Explanatory: theoretical model, empirical model, data; model fit ≠ validation; explanatory power. Predictive: empirical model, training data, holdout data; over-fitting analysis; predictive power.
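
The training-versus-holdout comparison behind an over-fitting analysis can be sketched in a few lines. The example below is a hypothetical illustration, not material from the talk: the sine-plus-noise data, the sample sizes, and the choice of a k-nearest-neighbour regressor are all assumptions. With k = 1 the model reproduces every training point, so its training error says nothing about how it predicts new observations.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN predictions (fit on train) over data."""
    return sum((y - knn_predict(train, x, k)) ** 2 for x, y in data) / len(data)


def make_data(n, rng, sigma=0.5):
    # Hypothetical data-generating process: y = sin(x) + Gaussian noise.
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        data.append((x, math.sin(x) + rng.gauss(0.0, sigma)))
    return data


rng = random.Random(1)
train = make_data(200, rng)
holdout = make_data(500, rng)

# results[k] = (training MSE, holdout MSE)
results = {}
for k in (1, 15):
    results[k] = (mse(train, train, k), mse(train, holdout, k))
    print(f"k={k:2d}: training MSE = {results[k][0]:.3f}, "
          f"holdout MSE = {results[k][1]:.3f}")
```

The gap between the two columns is the over-fitting signal: the k = 1 fit looks perfect on the training data, while the smoother k = 15 fit has the better holdout error under these assumptions.
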
- Model Use. Explanatory: test causal theory (inference; null hypothesis). Predictive: new theory, develop measures, compare theories, improve theory, assess relevance, predictability (predictive performance; naïve/baseline; over-fitting analysis).
- Point #2: Explanatory Power ≠ Predictive Power. Cannot infer one from the other.
- Performance Metrics. Explanatory: interpretation; p-values; R²; goodness-of-fit; type I, II errors. Predictive: out-of-sample prediction accuracy; costs; training vs. holdout; over-fitting.
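
The two families of metrics can disagree on the same data. The sketch below is a hypothetical illustration, not an example from the talk: the weak-effect data-generating process, the sample sizes, and the helper names are assumptions. It reports an in-sample t-statistic for the slope next to the holdout RMSE of the fitted model and of a naive baseline that always predicts the training mean.

```python
import math
import random


def simulate(n, rng, slope=0.05):
    # Hypothetical process: a real but very weak linear effect of x on y.
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        data.append((x, slope * x + rng.gauss(0.0, 1.0)))
    return data


def fit_simple_regression(data):
    """Least-squares intercept, slope, slope t-statistic, and mean of y."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in data)
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / std_err, mean_y


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


rng = random.Random(7)
train, holdout = simulate(20000, rng), simulate(20000, rng)
intercept, slope, t_stat, train_mean = fit_simple_regression(train)

rmse_model = rmse([y - intercept - slope * x for x, y in holdout])
rmse_naive = rmse([y - train_mean for _, y in holdout])  # naive baseline
gain = 1.0 - rmse_model / rmse_naive

print(f"slope t-statistic (in-sample): {t_stat:.1f}")
print(f"holdout RMSE: model = {rmse_model:.4f}, naive = {rmse_naive:.4f}")
print(f"relative RMSE reduction over naive baseline: {gain:.2%}")
```

Under these assumptions the slope is statistically significant at conventional levels, yet the holdout error is almost the same as that of the naive baseline. A significant in-sample effect does not by itself imply useful predictive power.
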
- [Figure: axes labeled Predictive Power and Explanatory Power]
- The predictive power of an explanatory model has important scientific value: relevance, reality check, predictability.
- In “explanatory” fields: prediction underappreciated; distinction blurred; unfamiliar with predictive modeling/assessment. “While the value of scientific prediction… is beyond question… the inexact sciences [do not] have… the use of predictive expertise well in hand.” Helmer & Rescher, 1959
- How does all this impact scientific research?
- What can be done? Acknowledge; incorporate prediction into curriculum.
- What happens in other fields? Epidemiology; engineering; life sciences. What about “predictive only” fields? http://goo.gl/gcjlN
- Shmueli (2010), “To Explain or To Predict?”, Statistical Science. Shmueli & Koppius (2011), “Predictive Analytics in IS Research”, MISQ.
