    Shmueli Presentation Transcript

    • Galit Shmueli, Israel Statistical Association & Tel Aviv University, July 9, 2012: To Explain or To Predict?
    • Points for discussion: goo.gl/gcjlN; Twitter: #explainpredict
    • Road Map: Definitions; Explanatory-dominated social sciences; Explanatory ≠ predictive modeling (Why? Different modeling paths; Explanatory vs. predictive power); So what?
    • Definitions. Explanatory modeling: theory-based, statistical testing of causal hypotheses. Explanatory power: strength of relationship in a statistical model.
    • Definitions. Predictive modeling: empirical method for predicting new observations. Predictive power: ability to accurately predict new observations.
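
The two definitions can be contrasted numerically. The sketch below is only an illustration under stated assumptions: the data are simulated (hypothetical), in-sample R² stands in for "strength of relationship", and RMSE on observations not used for fitting stands in for "ability to accurately predict new observations". The slides do not prescribe these particular measures.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(xs, ys, a, b):
    """Explanatory power proxy: share of variance in ys captured by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def rmse(xs, ys, a, b):
    """Predictive power proxy: root mean squared error of the line on (xs, ys)."""
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))

rng = random.Random(0)
# Hypothetical data-generating process: y = 2 + 3x + noise
x_all = [rng.uniform(0, 10) for _ in range(200)]
y_all = [2 + 3 * x + rng.gauss(0, 1) for x in x_all]

# Fit on the first 150 observations, treat the last 50 as "new" observations
x_fit, y_fit = x_all[:150], y_all[:150]
x_new, y_new = x_all[150:], y_all[150:]

a, b = fit_line(x_fit, y_fit)
in_sample_r2 = r_squared(x_fit, y_fit, a, b)   # measured on the fitting data
new_data_rmse = rmse(x_new, y_new, a, b)       # measured on unseen data
```

The first quantity is computed on the data used to fit the model, the second on data the model never saw, which is the distinction the two definitions draw.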
    • Statistical modeling in social science research. Purpose: test causal theory (“explain”). Association-based statistical models. Prediction nearly absent.
    • Explanatory modeling à la social sciences: (1) Start with a causal theory; (2) Generate causal hypotheses on constructs; (3) Operationalize constructs → measurable variables; (4) Fit statistical model; (5) Statistical inference → causal conclusions
    • In the social sciences, data analysis is mainly used for testing causal theory. “If it explains, it predicts”
    • “Empirical prediction alone is un-scientific”. Some statisticians share this view: The two goals in analyzing data... I prefer to describe as “management” and “science”. Management seeks profit... Science seeks truth. (Parzen, Statistical Science, 2001)
    • 52 “predictive” articles among 1,072 in Information Systems top journals
    • Why predict, for scientific research? New theory; develop measures; compare theories; improve theory; assess relevance; predictability. Shmueli & Koppius, “Predictive Analytics in IS Research” (MISQ, 2011)
    • “A good explanatory model will also predict well”. “You must understand the underlying causes in order to predict”
    • Philosophy of Science: “Explanation and prediction have the same logical structure” (Hempel & Oppenheim, 1948). “It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation” (Helmer & Rescher, 1959). “Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding” (Dubin, Theory Building, 1969)
    • Why statistical explanatory modeling differs from predictive modeling. Shmueli (2010), Statistical Science
    • Theory vs. its manifestation?
    • Notation. Theoretical constructs: 𝒳, 𝒴. Causal theoretical model: 𝒴=ℱ(𝒳). Measurable variables: X, Y. Statistical model: E(Y)=f(X)
    • Four aspects (𝒴=ℱ(𝒳) vs. E(Y)=f(X)): 1. Theory – Data; 2. Causation – Association; 3. Retrospective – Prospective; 4. Bias – Variance
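
The fourth aspect is commonly written as a decomposition of the expected squared prediction error at a point x. The form below is a standard textbook statement rather than something shown on the slide. It assumes Y = f(x) + ε with E(ε) = 0 and Var(ε) = σ², a fitted estimate \hat f, and a new observation independent of the data used for fitting; \hat f and σ² are notation introduced here.

```latex
\begin{aligned}
\mathrm{EPE}(x) &= E\big[(Y - \hat f(x))^2\big] \\
                &= \sigma^2
                 + \big(E[\hat f(x)] - f(x)\big)^2
                 + E\big[(\hat f(x) - E[\hat f(x)])^2\big] \\
                &= \text{irreducible noise} + \text{bias}^2 + \text{variance}
\end{aligned}
```

Read this way, a model chosen to keep bias small need not minimize the sum, because accepting some bias can reduce variance by more.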
    • “The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
    • Point #1: Best explanatory model ≠ best predictive model
    • Four aspects (𝒴=ℱ(𝒳) vs. E(Y)=f(X)): 1. Theory – Data; 2. Causation – Association; 3. Retrospective – Prospective; 4. Bias – Variance
    • Predict ≠ Explain. “we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.” (Bell et al., 2008)
    • Predict ≠ Explain. The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%. “We are planning to… develop predictive models for bioavailability and bioequivalence” (Lester M. Crawford, 2005, Acting Commissioner of Food & Drugs)
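
The interval rule quoted on this slide can be sketched as a simple check. Everything beyond the 80%-125% limits and the 90% level is an assumption of this sketch: the per-subject log ratios are made-up numbers, the "relative mean" is treated as a geometric mean ratio, and a normal quantile is used for the interval for simplicity, which is not a description of the regulatory procedure.

```python
import math
from statistics import NormalDist, mean, stdev

def ratio_ci_90(log_ratios):
    """Approximate 90% CI for the mean ratio (generic / brand).

    Works on the log scale and uses a normal quantile (a simplifying
    assumption of this sketch), then exponentiates back to a ratio.
    """
    n = len(log_ratios)
    center = mean(log_ratios)
    se = stdev(log_ratios) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.95)  # two-sided 90% interval
    return math.exp(center - z * se), math.exp(center + z * se)

def is_bioequivalent(ci, lower=0.80, upper=1.25):
    """Rule from the slide: the whole 90% CI must lie within 80%-125%."""
    low, high = ci
    return lower <= low and high <= upper

# Hypothetical per-subject ratios of generic to brand exposure
close = [math.log(r) for r in (0.97, 1.02, 1.05, 0.99, 1.01, 0.95, 1.04, 1.00)]
shifted = [math.log(r) for r in (1.30, 1.42, 1.25, 1.38, 1.35, 1.28, 1.45, 1.33)]

close_ci = ratio_ci_90(close)
shifted_ci = ratio_ci_90(shifted)
close_ok = is_bioequivalent(close_ci)      # interval sits inside the limits
shifted_ok = is_bioequivalent(shifted_ci)  # interval exceeds the upper limit
```

The check is an inference about a population mean ratio, which is why the slide contrasts it with a predictive model for individual outcomes.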
    • Goal Definition → Design & Collection → Data Preparation → EDA → Variables? → Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
    • Study design & data collection: Observational or experiment? Primary or secondary data? Instrument (reliability + validity vs. measurement accuracy). How much data? How to sample? Hierarchical data.
    • Data preprocessing: missing values; reduced-feature models; partitioning
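
Partitioning, the last item on this slide, can be sketched as a random split into a training set and a holdout set. The function name, the 30% holdout fraction, and the seed are illustrative choices, not taken from the talk.

```python
import random

def partition(records, holdout_fraction=0.3, seed=0):
    """Randomly split records into (training, holdout) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Hypothetical usage: ten record ids, 30% held out
training, holdout = partition(range(10), holdout_fraction=0.3, seed=42)
```

The holdout records are set aside before any model is fitted, so that they can later play the role of new observations.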
    • Data exploration & reduction: interactive visualization; PCA; SVD
    • Which variables? Causation vs. associations; endogeneity; ex-post availability; multicollinearity? A, B, A*B?
    • Methods / Models: black-box vs. interpretable; mapping to theory; bias vs. variance; shrinkage models; ensembles
    • Evaluation, Validation & Model Selection: Model fit ≠ validation. Explanatory power: theoretical model vs. empirical model vs. data. Predictive power: empirical model; training data vs. holdout data; over-fitting analysis.
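
A small over-fitting analysis shows why model fit is not validation. The sketch uses nearest-neighbor regression on simulated data; both the method and the data are choices made here for illustration and do not come from the talk. With k = 1 each training point is its own nearest neighbor, so the training error is zero, yet the holdout error is worse than with a smoother k = 15.

```python
import math
import random

def knn_predict(train, x, k):
    """Average y of the k training points whose x is nearest to the query x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def rmse(train, data, k):
    """Root mean squared error of k-NN predictions (fitted on train) over data."""
    errors = [(y - knn_predict(train, x, k)) ** 2 for x, y in data]
    return math.sqrt(sum(errors) / len(errors))

rng = random.Random(1)

def sample(n):
    # Hypothetical process: y = sin(x) + noise
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.5)) for x in xs]

train, holdout = sample(200), sample(200)

fit_err_1 = rmse(train, train, 1)         # error on the data used for fitting
holdout_err_1 = rmse(train, holdout, 1)   # error on unseen data
fit_err_15 = rmse(train, train, 15)
holdout_err_15 = rmse(train, holdout, 15)
```

Comparing the training and holdout errors for each k is the kind of training vs. holdout comparison the slide lists under predictive power.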
    • Model Use. Explanatory: test causal theory; inference; null hypothesis. Predictive: new theory, develop measures, compare theories, improve theory, assess relevance, predictability; predictive performance; naïve/baseline; over-fitting analysis.
    • Point #2: Explanatory power ≠ predictive power. Cannot infer one from the other.
    • Performance metrics. Explanatory: interpretation, p-values, R², goodness-of-fit, type I and II errors. Predictive: out-of-sample prediction accuracy, costs, training vs. holdout, over-fitting.
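
The "naïve/baseline" item can be made concrete by comparing a model's holdout error with that of a benchmark that always predicts the training mean. The numbers and function names below are hypothetical, and the training mean is only one possible choice of naive benchmark.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def naive_baseline(train_y, n):
    """Naive benchmark: predict the training mean for every new observation."""
    m = sum(train_y) / len(train_y)
    return [m] * n

def relative_improvement(holdout_y, model_pred, train_y):
    """Fraction by which the model's holdout RMSE beats the naive benchmark's."""
    base = rmse(holdout_y, naive_baseline(train_y, len(holdout_y)))
    return 1 - rmse(holdout_y, model_pred) / base

# Hypothetical numbers
train_y = [10.0, 12.0, 14.0, 16.0]
holdout_y = [11.0, 15.0]
model_pred = [11.5, 14.0]
gain = relative_improvement(holdout_y, model_pred, train_y)
```

A positive value means the model predicts the holdout data better than the benchmark; a value near zero or below suggests the model adds little predictive value regardless of how well it fits.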
    • [Figure: axes labeled Predictive Power and Explanatory Power]
    • The predictive power of an explanatory model has important scientific value: relevance, reality check, predictability
    • In “explanatory” fields: prediction underappreciated; distinction blurred; unfamiliar with predictive modeling/assessment. “While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.” (Helmer & Rescher, 1959)
    • How does all this impact Scientific Research?
    • What can be done? Acknowledge; incorporate prediction into curriculum
    • What happens in other fields? Epidemiology; engineering; life sciences. What about “predictive only” fields? http://goo.gl/gcjlN
    • Shmueli (2010), “To Explain or To Predict?”, Statistical Science. Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ