Transcript of "Shmueli"

  1. To Explain or To Predict?
     Galit Shmueli
     Israel Statistical Association & Tel Aviv University, July 9, 2012
  2. Points for discussion: goo.gl/gcjlN
     Twitter: #explainpredict
  3. Road Map
     • Definitions
     • Explanatory-dominated social sciences
     • Explanatory ≠ predictive modeling. Why? Different modeling paths; explanatory vs. predictive power
     • So what?
  4. Definitions
     Explanatory modeling: theory-based, statistical testing of causal hypotheses
     Explanatory power: strength of relationship in statistical model
  5. Definitions
     Predictive modeling: empirical method for predicting new observations
     Predictive power: ability to accurately predict new observations
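The two definitions on slides 4 and 5 can be illustrated with a minimal sketch. It is not taken from the talk: the simulated data, the simple linear model, and the choice of in-sample R² as "strength of relationship" and RMSE on new observations as "predictive accuracy" are all assumptions made for illustration.

```python
import math
import random


def fit_ols(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys, a, b):
    """In-sample R^2: share of variance in ys captured by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def rmse(xs, ys, a, b):
    """Root mean squared prediction error of the fitted line on (xs, ys)."""
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


rng = random.Random(0)  # fixed seed so the sketch is reproducible


def simulate(n):
    """Hypothetical data-generating process, assumed only for this example."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


x_old, y_old = simulate(50)   # data used to fit the model
x_new, y_new = simulate(50)   # new observations, never seen during fitting

a, b = fit_ols(x_old, y_old)
explanatory_power = r_squared(x_old, y_old, a, b)  # measured on the fitting data
predictive_power = rmse(x_new, y_new, a, b)        # measured on new observations

print(f"in-sample R^2: {explanatory_power:.3f}")
print(f"RMSE on new observations: {predictive_power:.3f}")
```

The point of the sketch is only that the two quantities are computed on different data and answer different questions; which metrics are appropriate depends on the study.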
  6. Statistical modeling in social science research
     Purpose: test causal theory (“explain”)
     • Association-based statistical models
     • Prediction nearly absent
  7. Explanatory modeling à la social sciences
     • Start with a causal theory
     • Generate causal hypotheses on constructs
     • Operationalize constructs → Measurable variables
     • Fit statistical model
     • Statistical inference → Causal conclusions
  8. In the social sciences, data analysis is mainly used for testing causal theory.
     “If it explains, it predicts”
  9. “Empirical prediction alone is un-scientific”
     Some statisticians share this view:
     The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth. (Parzen, Statistical Science, 2001)
  10. 52 “predictive” articles among 1,072 in Information Systems top journals
  11. Why Predict? for Scientific Research
     • new theory
     • develop measures
     • compare theories
     • improve theory
     • assess relevance
     • predictability
     Shmueli & Koppius, “Predictive Analytics in IS Research” (MISQ, 2011)
  12. “A good explanatory model will also predict well”
     “You must understand the underlying causes in order to predict”
  13. Philosophy of Science
     “Explanation and prediction have the same logical structure” (Hempel & Oppenheim, 1948)
     “It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation” (Helmer & Rescher, 1959)
     “Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding” (Dubin, Theory Building, 1969)
  14. Why statistical explanatory modeling differs from predictive modeling
     Shmueli (2010), Statistical Science
  15. Theory vs. its manifestation ?
  16. Notation
     Theoretical constructs: 𝒳, 𝒴
     Causal theoretical model: 𝒴 = ℱ(𝒳)
     Measurable variables: X, Y
     Statistical model: E(Y) = f(X)
  17. Four aspects: 𝒴 = ℱ(𝒳) vs. E(Y) = f(X)
     1. Theory – Data
     2. Causation – Association
     3. Retrospective – Prospective
     4. Bias – Variance
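The fourth aspect on slide 17, bias versus variance, can be made concrete with the standard decomposition of expected prediction error under squared-error loss. This is the textbook identity rather than a formula shown on the slide. It assumes $Y = f(x) + \varepsilon$ with $E(\varepsilon) = 0$ and a new observation independent of the fitting data; $\hat f$ (the estimated model) and EPE are symbols introduced here, and $\operatorname{Var}(Y)$ is the variance of $Y$ at the given $x$.

```latex
\begin{align*}
\mathrm{EPE}(x) &= E\big[(Y - \hat f(x))^2\big] \\
  &= E\big[(Y - f(x))^2\big]
     + \big(E[\hat f(x)] - f(x)\big)^2
     + E\big[(\hat f(x) - E[\hat f(x)])^2\big] \\
  &= \operatorname{Var}(Y) + \mathrm{Bias}^2 + \operatorname{Var}\big(\hat f(x)\big)
\end{align*}
```

Read this way, explanatory modeling concentrates on driving the bias term down so that $f$ represents the theory faithfully, while predictive modeling minimizes the sum of bias and variance, and may accept some bias in exchange for lower variance.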
  18. “The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
  19. Point #1
     Best explanatory model ≠ Best predictive model
  20. Four aspects: 𝒴 = ℱ(𝒳) vs. Y = f(X)
     1. Theory – Data
     2. Causation – Association
     3. Retrospective – Prospective
     4. Bias – Variance
  21. Predict ≠ Explain
     “we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.” (Bell et al., 2008)
  22. Predict ≠ Explain
     The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%
     “We are planning to… develop predictive models for bioavailability and bioequivalence” (Lester M. Crawford, 2005, Acting Commissioner of Food & Drugs)
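The decision rule quoted on slide 22 can be written as a small check. This is only a sketch of the interval comparison as worded on the slide, not the FDA's procedure: it assumes the 90% confidence interval for the generic-to-brand ratio has already been computed elsewhere, treats the limits as inclusive, and uses a hypothetical function name.

```python
def is_bioequivalent(ci_lower, ci_upper, low=0.80, high=1.25):
    """Return True if a 90% CI for the generic/brand mean ratio lies within [low, high].

    ci_lower and ci_upper are ratios (1.0 means identical means), assumed to be
    computed beforehand. Inclusive limits are an assumption of this sketch.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return low <= ci_lower and ci_upper <= high


# Hypothetical intervals, made up for illustration only.
print(is_bioequivalent(0.92, 1.10))  # interval entirely inside 80%-125%
print(is_bioequivalent(0.78, 1.05))  # lower limit falls below 80%
```

The rule is a statement about estimated means, which is why it sits on a "Predict ≠ Explain" slide: it says nothing directly about how well an individual patient's response can be predicted.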
  23. [Diagram: modeling steps] Goal Definition → Design & Collection → Data Preparation → EDA → Variables? → Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
  24. Study design & data collection
     • Observational or experiment?
     • Primary or secondary data?
     • Instrument (reliability + validity vs. measurement accuracy)
     • How much data?
     • How to sample?
     • Hierarchical data
  25. Data Preprocessing
     • missing
     • reduced-feature models
     • partitioning
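The "partitioning" item on slide 25 refers to setting aside data for evaluating predictions. A minimal sketch of a random training/holdout split follows; the function name, the 30% holdout share, and the fixed seed are assumptions chosen for illustration, not values from the talk.

```python
import random


def partition(records, holdout_fraction=0.3, seed=0):
    """Randomly split records into (training, holdout) lists.

    The holdout set is kept aside and used only to assess predictive power.
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


training, holdout = partition(list(range(10)))
print(len(training), "training records,", len(holdout), "holdout records")
```

In an explanatory study all the data would typically go into estimation to maximize statistical power, which is one reason partitioning appears as a predictive-modeling step.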
  26. Data exploration & reduction
     • Interactive visualization
     • PCA
     • SVD
  27. Which Variables?
     • endogeneity
     • ex-post availability
     • causation
     • associations
     • Multicollinearity?
     • A, B, A*B?
  28. Methods / Models
     • Blackbox / interpretable
     • Mapping to theory
     • bias
     • variance
     • Shrinkage models
     • ensembles
  29. Evaluation, Validation & Model Selection
     Model fit ≠ Validation
     Explanatory power: Theoretical model, Empirical model, Data
     Predictive power: Empirical model, Training data, Holdout data, Over-fitting analysis
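The over-fitting analysis named on slide 29 compares error on the training data with error on the holdout data. The sketch below uses a nearest-neighbour average as a stand-in flexible model. The simulated data, the choice of k-nearest neighbours, and the values of k are assumptions made for illustration and do not come from the talk.

```python
import math
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def rmse(train, data, k):
    """RMSE of the k-NN model (fitted on train) when predicting the points in data."""
    squared = [(y - knn_predict(train, x, k)) ** 2 for x, y in data]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(1)  # fixed seed so the sketch is reproducible


def simulate(n):
    """Hypothetical noisy process, assumed only for this example."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 6)
        data.append((x, math.sin(x) + rng.gauss(0, 0.5)))
    return data


training = simulate(100)
holdout = simulate(100)

# For each k: (error on the training data, error on the holdout data).
results = {k: (rmse(training, training, k), rmse(training, holdout, k)) for k in (1, 10)}
for k, (train_err, holdout_err) in results.items():
    print(f"k={k:2d}  training RMSE={train_err:.3f}  holdout RMSE={holdout_err:.3f}")
```

With k=1 every training point is its own nearest neighbour, so the training error is zero while the holdout error is not. That gap is the signature of over-fitting, and it is why model fit on the training data cannot stand in for validation.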
  30. Model Use
     Explanatory: test causal theory (Inference, Null hypothesis)
     Predictive: new theory, develop measures, compare theories, improve theory, assess relevance, predictability (Predictive performance, Naïve/baseline, Over-fitting analysis)
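The "Naïve/baseline" label on slide 30 suggests judging predictive performance against a simple benchmark rather than in isolation. A minimal sketch follows, assuming the naive benchmark is "always predict the training mean" and using made-up numbers; the model predictions are hypothetical placeholders for whatever model is being assessed.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def naive_predictions(train_y, n):
    """Naive benchmark (an assumption of this sketch): predict the training mean every time."""
    mean_y = sum(train_y) / len(train_y)
    return [mean_y] * n


# Made-up values for illustration only.
train_y = [3.0, 5.0, 7.0, 9.0]
holdout_y = [4.0, 8.0, 10.0]
model_pred = [4.5, 7.5, 9.0]  # hypothetical output of the model under assessment

model_rmse = rmse(holdout_y, model_pred)
baseline_rmse = rmse(holdout_y, naive_predictions(train_y, len(holdout_y)))

print(f"model RMSE on holdout:    {model_rmse:.3f}")
print(f"baseline RMSE on holdout: {baseline_rmse:.3f}")
print("model beats naive baseline:", model_rmse < baseline_rmse)
```

A holdout error only becomes informative next to such a benchmark: a model that cannot beat the naive prediction has little predictive power, however significant its coefficients are.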
  31. Point #2
     Explanatory Power ≠ Predictive Power
     Cannot infer one from the other
  32. [Diagram: Performance Metrics] Labels: interpretation; p-values; R²; goodness-of-fit; type I, II errors; out-of-sample prediction accuracy; costs; training vs. holdout; over-fitting
  33. [Figure: Predictive Power vs. Explanatory Power]
  34. The predictive power of an explanatory model has important scientific value
     Relevance, reality check, predictability
  35. In “explanatory” fields
     • Prediction underappreciated
     • Distinction blurred
     • Unfamiliar with predictive modeling/assessment
     “While the value of scientific prediction… is beyond question… the inexact sciences [do not] have… the use of predictive expertise well in hand.” (Helmer & Rescher, 1959)
  36. How does all this impact Scientific Research?
  37. What can be done?
     • acknowledge
     • incorporate prediction into curriculum
  38. What happens in other fields?
     • Epidemiology
     • Engineering
     • Life sciences
     What about “predictive only” fields?
     http://goo.gl/gcjlN
  39. Shmueli (2010), “To Explain or To Predict?”, Statistical Science
     Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ