Minghui Conference Cross-Validation Talk

Transcript

  • 1. Challenges with the Use of Cross-validation for Comparing Structured Models. Wei Wang, joint work with Andrew Gelman. Department of Statistics, Columbia University. April 13, 2013.
  • 2. Overview: 1. Multilevel Models; 2. Decision-Theoretic Model Assessment Framework; 3. Data and Model; 4. Results.
  • 3. Overview (section divider; next: 1. Multilevel Models).
  • 4–5. Bayesian Interpretation of Multilevel Models. Multilevel models have long been used to handle data with group structure, e.g., a longitudinal study with multiple observations per participant, or a national survey with various demographic and geographic variables. From a Bayesian point of view, multilevel modeling partially pools the group-level estimates through a prior, as opposed to running a separate analysis for each group (no pooling) or analyzing the data as if there were no group structure (complete pooling).
  • 6–7. Multilevel Models for Deeply Nested Data Structures. Our substantive interest is survey data with deeply nested structure arising from several categorical demographic-geographic variables, e.g., state, income, education, ethnicity, etc. One typical conundrum is how many interactions among these demographic-geographic variables to include in the model.
  • 8. Three Prototypes of Models. In the simple case of two predictors, the three prototype models are shown below. The response $y_{ij}$ is binary.
    Complete Pooling model: $\mathbb{E}[y_{ij}] = g^{-1}(\mu_{ij})$, with $\mu_{ij} = \mu_0 + a_i + b_j$.
    No Pooling model: $\mathbb{E}[y_{ij}] = g^{-1}(\mu_{ij})$, with $\mu_{ij} = \mu_0 + a_i + b_j + r_{ij}$.
    Partial Pooling model: $\mathbb{E}[y_{ij}] = g^{-1}(\mu_{ij})$, with $\mu_{ij} = \mu_0 + a_i + b_j + \gamma_{ij}$ and $\gamma_{ij} \sim \Phi(\cdot)$.
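To make the three mean structures concrete, here is a minimal numpy sketch (not from the talk; the parameter values, array names, and the choice of a logistic inverse link are illustrative assumptions), building $\mu_{ij}$ for a small two-way layout under each prototype.

```python
import numpy as np

def g_inv(x):
    # One common choice of inverse link g^{-1}: the logistic function.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
I, J = 5, 4
mu0 = -0.2
a = rng.normal(0.0, 0.5, size=I)        # row effects (e.g., state)
b = rng.normal(0.0, 0.5, size=J)        # column effects (e.g., income)

# Complete pooling: no interaction term at all.
mu_complete = mu0 + a[:, None] + b[None, :]

# No pooling: a free interaction r_ij for every cell (arbitrary values here, just for illustration).
r = rng.normal(0.0, 1.0, size=(I, J))
mu_no = mu_complete + r

# Partial pooling: interactions gamma_ij drawn from a common prior Phi;
# sigma is a hypothetical fixed value here, but would be estimated in a real fit.
sigma = 0.3
gamma = rng.normal(0.0, sigma, size=(I, J))
mu_partial = mu_complete + gamma

E_y = g_inv(mu_partial)                  # expected binary response per cell
```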
  • 9. Overview (section divider; next: 2. Decision-Theoretic Model Assessment Framework).
  • 10. True Model, Pseudo-True Model, and Actual Belief Model. We assume there is a true underlying model $p_t(\cdot)$ from which the observations (both the available and the future ones) come. While acknowledging that the true distribution is never accessible, some researchers propose basing the discussion on a rich enough Actual Belief Model, which supposedly fully reflects the uncertainty about future data (Bernardo and Smith 1994).
  • 11. M-closed, M-completed, and M-open Views. In the M-closed view, the true model is assumed to be included in an enumerable collection of candidate models, and the Actual Belief Model is the Bayesian Model Averaging predictive distribution. In the M-completed view, the Actual Belief Model $p(\tilde{y} \mid D, M)$ is considered the best available description of the uncertainty about future data. In the M-open view, explicit specification of the Actual Belief Model is avoided; instead, sample re-use methods (such as cross-validation) use the observed data as a proxy for draws from the true distribution.
  • 12. A Decision-Theoretic Framework. We define a loss function $l(\tilde{y}, a_M)$, the loss incurred by our inferential action $a_M$, based on a model $M$, in the face of a future observation $\tilde{y}$. The predictive loss of the inferential action $a_M$ is
    $L_p(p_t, M, D, l) = \mathbb{E}_{p_t(\tilde{y})}\bigl[l(\tilde{y}, a_M)\bigr] = \int l(\tilde{y}, a_M)\, p_t(\tilde{y})\, d\tilde{y}.$
    It is often convenient and theoretically desirable to use the whole posterior predictive distribution as $a_M$ and the log loss as $l(\cdot,\cdot)$:
    $L_{\mathrm{pred}}(p_t, M, D) = \mathbb{E}_{p_t}\bigl[-\log p(\tilde{y} \mid D, M)\bigr] = -\int p_t(\tilde{y}) \log p(\tilde{y} \mid D, M)\, d\tilde{y}.$
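As an illustration of the log-loss version of the predictive loss, here is a small Monte Carlo sketch (my own illustration, not the authors' code): if we could draw from a stand-in for $p_t$ and evaluate the model's posterior predictive density pointwise, the expected loss is just an average of negative log predictive densities. The toy distributions below are assumptions chosen to match the calibration example later in the talk.

```python
import numpy as np

def expected_log_loss(draw_from_pt, log_pred_density, n_sims=100_000, seed=0):
    rng = np.random.default_rng(seed)
    y_tilde = draw_from_pt(rng, n_sims)           # samples y~ from the stand-in for p_t
    return -np.mean(log_pred_density(y_tilde))    # approximates -∫ p_t(y~) log p(y~|D,M) dy~

# Toy usage: p_t is Bernoulli(0.4); the model's posterior predictive is Bernoulli(0.41).
loss = expected_log_loss(
    lambda rng, n: rng.binomial(1, 0.4, size=n),
    lambda y: y * np.log(0.41) + (1 - y) * np.log(1 - 0.41),
)
print(loss)   # close to 0.673 (cf. the calibration example on slide 30)
```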
  • 13–14. Decision-Theoretic Framework, Cont'd. For the model selection task, from a pool of candidate models $\{M_k : k \in K\}$, we should select the model that minimizes the expected predictive loss:
    $\min_{M_k : k \in K} \; -\int p_t(\tilde{y}) \log p(\tilde{y} \mid D, M_k)\, d\tilde{y}.$
    For the model assessment task for a particular model $M$, we look at the Kullback-Leibler divergence between the true model and the posterior predictive distribution, which we call the predictive error:
    $\mathrm{Err}(p_t, M, D) = -\int p_t(\tilde{y}) \log p(\tilde{y} \mid D, M)\, d\tilde{y} + \int p_t(\tilde{y}) \log p_t(\tilde{y})\, d\tilde{y} = \mathrm{KL}\bigl(p_t(\cdot) \,\|\, p(\cdot \mid D, M)\bigr).$
  • 15–18. Estimating the Expected Predictive Loss. The central obstacle to computing the expected predictive loss is that we do not know the true distribution $p_t(\cdot)$. An M-closed or M-completed view substitutes a reference distribution for the true distribution. From an M-open view, plugging in the available sample gives the training loss, which has a downward bias, since the sample is used twice:
    $L_{\mathrm{training}}(M, D) = -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid D, M).$
    There are two main approaches to obtaining an (approximately) unbiased estimate of the predictive loss: bias correction, which leads to the various information criteria, and held-out evaluation, which leads to leave-one-out cross-validation and k-fold cross-validation.
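The downward bias of the training loss can be seen in a tiny simulation (my own illustration, under assumed toy values): estimate a Bernoulli probability from a small sample and compare the in-sample log loss with the log loss on fresh data from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true, n, reps = 0.4, 50, 2000
gaps = []
for _ in range(reps):
    y_train = rng.binomial(1, p_true, size=n)
    p_hat = (y_train.sum() + 1) / (n + 2)        # lightly smoothed estimate, avoids log(0)
    y_new = rng.binomial(1, p_true, size=n)      # fresh data from the same distribution

    def avg_log_loss(y):
        return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

    gaps.append(avg_log_loss(y_new) - avg_log_loss(y_train))

print(np.mean(gaps))   # positive on average: the training loss is optimistic (downward biased)
```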
  • 19–21. Estimation Methods. There is a long list of information criteria: AIC, BIC, DIC, TIC, NIC, WAIC, etc. Leave-one-out (LOO) cross-validation has been shown to be asymptotically equivalent to AIC/WAIC, but its computational burden is huge, and the importance-sampling approximation introduces a new problem: the reliability of the importance weights. We instead use the computationally convenient k-fold cross-validation, in which the data set is randomly partitioned into k parts; in each fold, one part is used as the testing set while the rest serve as the training set.
  • 22–23. k-fold Cross-Validation. The k-fold cross-validation estimate of the predictive loss is
    $L_{\mathrm{CV}}(M, D) = -\frac{1}{N} \sum_{k=1}^{K} \sum_{i \in \mathrm{test}_k} \log p(y_i \mid D_k, M) = -\frac{1}{N} \sum_{i=1}^{N} \log p(y_i \mid D_{(i)}, M),$
    where $D_k$ is the training data in fold $k$ and $D_{(i)}$ is the training data for the fold that holds out observation $i$. To estimate the predictive error, we also need an estimate of the entropy of the true distribution; we use the training loss of the saturated model as a surrogate:
    $-\int p_t(\tilde{y}) \log p_t(\tilde{y})\, d\tilde{y} \approx -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid D, M_{\mathrm{saturated}}).$
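A generic k-fold cross-validation loop might look like the following sketch (my own illustration, not the talk's code; `fit` and `log_pred` are hypothetical placeholders for the model-specific fitting routine and log predictive density).

```python
import numpy as np

def kfold_cv_loss(y, x, fit, log_pred, K=5, seed=0):
    """k-fold CV estimate of the predictive loss (average held-out negative log density)."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, K)
    total = 0.0
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        model_k = fit(y[train], x[train])                      # fit on the training part D_k
        total += np.sum(log_pred(model_k, y[test], x[test]))   # log p(y_i | D_k, M) on the held-out part
    return -total / n
```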
  • 24. Overview (section divider; next: 3. Data and Model).
  • 25. Data Set. Cooperative Congressional Election Survey (CCES) 2006: N = 30,000; 71 social and political response outcomes; deeply nested demographic variables, e.g., state, income, education, ethnicity, gender, etc.
  • 26. Data Set, Cont'd. Figure: A sample of the questions in the CCES 2006 survey.
  • 27. Model Setup. For demonstration, we consider only two demographic variables, state and income, together with their interaction. The responses are all yes-no binary outcomes.
    Complete Pooling: $\pi_{j_1 j_2} = \mathrm{logit}^{-1}\bigl(\beta^{\mathrm{stt}}_{j_1} + \beta^{\mathrm{inc}}_{j_2}\bigr)$
    No Pooling: $\pi_{j_1 j_2} = \mathrm{logit}^{-1}\bigl(\beta^{\mathrm{stt}}_{j_1} + \beta^{\mathrm{inc}}_{j_2} + \beta^{\mathrm{stt} \times \mathrm{inc}}_{j_1 j_2}\bigr)$
    Partial Pooling: $\pi_{j_1 j_2} = \mathrm{logit}^{-1}\bigl(\beta^{\mathrm{stt}}_{j_1} + \beta^{\mathrm{inc}}_{j_2} + \beta^{\mathrm{stt} \times \mathrm{inc}}_{j_1 j_2}\bigr)$, with $\beta^{\mathrm{stt} \times \mathrm{inc}}_{j_1 j_2} \sim \Phi(\cdot)$
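For concreteness, here is a minimal sketch (my assumption, not the authors' code; the sizes and variable names are hypothetical) of the indicator design matrices that the complete pooling and no pooling logistic regressions would use. The partial pooling model uses the same interaction indicators, but shrinks their coefficients toward zero through the common prior, which requires a hierarchical (lmer- or Stan-style) fit rather than a plain logistic regression.

```python
import numpy as np

def one_hot(codes, n_levels):
    out = np.zeros((len(codes), n_levels))
    out[np.arange(len(codes)), codes] = 1.0
    return out

rng = np.random.default_rng(0)
n_state, n_inc, n_resp = 50, 5, 1000             # hypothetical sizes
state = rng.integers(0, n_state, size=n_resp)    # state code per respondent
inc = rng.integers(0, n_inc, size=n_resp)        # income code per respondent

X_complete = np.hstack([one_hot(state, n_state), one_hot(inc, n_inc)])   # main effects only
X_cells = one_hot(state * n_inc + inc, n_state * n_inc)                  # state-income cell indicators
X_nopool = np.hstack([X_complete, X_cells])                              # main effects + free interactions
# Partial pooling: same interaction indicators, coefficients shrunk by the prior Phi above.
```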
  • 28–29. k-fold Cross-Validation Estimate. Due to computational constraints, we use the maximum a posteriori (MAP) plug-in estimate instead of the full Bayesian posterior predictive distribution:
    $p(\tilde{y} \mid D, M) \approx p\bigl(\tilde{y} \mid \hat{\pi}_{ij}(D), M\bigr).$
    Under this setup, the cross-validation estimate of the predictive loss is
    $L_{\mathrm{CV}}(M, D) = -\frac{1}{N} \sum_{k=1}^{K} \sum_{l \in \mathrm{test}_k} \log p(y_l \mid D_k, M)$
    $= -\frac{1}{N} \sum_{k=1}^{K} \sum_{i,j} \Bigl[ y^{\mathrm{test}_k}_{ij} \log \hat{\pi}_{ij}(D^{\mathrm{train}_k}) + \bigl(n^{\mathrm{test}_k}_{ij} - y^{\mathrm{test}_k}_{ij}\bigr) \log\bigl(1 - \hat{\pi}_{ij}(D^{\mathrm{train}_k})\bigr) \Bigr]$
    $= -\frac{1}{N} \sum_{i,j} \sum_{k=1}^{K} \Bigl[ y^{\mathrm{test}_k}_{ij} \log \hat{\pi}_{ij}(D^{\mathrm{train}_k}) + \bigl(n^{\mathrm{test}_k}_{ij} - y^{\mathrm{test}_k}_{ij}\bigr) \log\bigl(1 - \hat{\pi}_{ij}(D^{\mathrm{train}_k})\bigr) \Bigr]$
    $\approx -\frac{1}{N} \sum_{i,j} \Bigl[ y_{ij} \log \hat{\pi}_{ij}(D^{\mathrm{train}}) + (n_{ij} - y_{ij}) \log\bigl(1 - \hat{\pi}_{ij}(D^{\mathrm{train}})\bigr) \Bigr]$
    $= -\sum_{i,j} \frac{n_{ij}}{N} \Bigl[ \tilde{\pi}_{ij} \log \hat{\pi}_{ij}(D^{\mathrm{train}}) + (1 - \tilde{\pi}_{ij}) \log\bigl(1 - \hat{\pi}_{ij}(D^{\mathrm{train}})\bigr) \Bigr],$
    where $y_{ij}$ and $n_{ij}$ are the total successes and trials in cell $(i,j)$, $\tilde{\pi}_{ij} = y_{ij}/n_{ij}$ is the empirical cell proportion, and the approximation treats the fitted $\hat{\pi}_{ij}(D^{\mathrm{train}_k})$ as roughly common across folds.
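The last line of the derivation is easy to compute directly from cell-level counts. A minimal sketch (not the authors' code), assuming arrays `pi_hat`, `y`, and `n` indexed by the state-income cells:

```python
import numpy as np

def cell_log_loss(pi_hat, y, n):
    """Cell-level CV loss: -sum_ij (n_ij/N) [pi~_ij log pi^_ij + (1-pi~_ij) log(1-pi^_ij)].
    pi_hat: fitted cell probabilities from the training data (assumed strictly between 0 and 1);
    y, n: held-out successes and trials per cell."""
    N = n.sum()
    pi_tilde = y / np.maximum(n, 1)              # empirical cell proportions (0 where a cell is empty)
    per_cell = pi_tilde * np.log(pi_hat) + (1 - pi_tilde) * np.log(1 - pi_hat)
    return -np.sum((n / N) * per_cell)           # empty cells get zero weight via n_ij / N
```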
  • 30. Calibration of the Improvement. Suppose we have only one cell, with true proportion 0.4; the good model gives a posterior estimate of the proportion of roughly 0.41, and the lesser models give estimates of 0.44 or 0.38. Then the predictive loss under the good model is $-[0.4 \log(0.41) + 0.6 \log(0.59)] = 0.67322$, and under the two lesser models it is $-[0.4 \log(0.44) + 0.6 \log(0.56)] = 0.67628$ and $-[0.4 \log(0.38) + 0.6 \log(0.62)] = 0.67386$. So the improvement in predictive loss from a lesser model to the good model is between about 0.0006 and 0.003. The lower bound is $-[0.4 \log(0.4) + 0.6 \log(0.6)] = 0.67301$, so the predictive error of the good model is about 0.0002.
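A quick way to check the arithmetic in this example (my own sketch):

```python
import numpy as np

def loss(p_hat, p_true=0.4):
    # Expected negative log predictive density of a Bernoulli(p_hat) model under Bernoulli(p_true) data.
    return -(p_true * np.log(p_hat) + (1 - p_true) * np.log(1 - p_hat))

print(round(loss(0.41), 5))   # 0.67322  (good model)
print(round(loss(0.44), 5))   # 0.67628  (lesser model, overestimating the proportion)
print(round(loss(0.38), 5))   # 0.67386  (lesser model, underestimating the proportion)
print(round(loss(0.40), 5))   # 0.67301  (lower bound: entropy of the true cell)
```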
  • 31. Overview (section divider; next: 4. Results).
  • 32. Cross-Validation Results on All Outcomes. Figure: Measure of fit (estimated predictive error) for all response outcomes in the CCES 2006 survey data, under the complete pooling, partial pooling, and no pooling models; responses are ordered by the lower bound (training loss of the saturated model). The no pooling model fits very badly; the predictive error of partial pooling is mostly at or below that of complete pooling, but the differences seem small.
  • 33–35. Compare Partial Pooling and Complete Pooling. In the previous figure, no pooling apparently does very badly, while the differences between partial pooling and complete pooling seem small, so we need to calibrate them further. A summary of the differences in estimated predictive error between partial pooling and complete pooling across all outcomes: Min. -0.0003405, 1st Qu. 0.0001821, Median 0.0003827, Mean 0.0006041, 3rd Qu. 0.0005630, Max. 0.0053770. Comparing with the calibration example above, this improvement in predictive loss does correspond to some meaningful improvement in prediction accuracy.
  • 36–37. Simulations Based on Real Data. We want to explore how the structure of the data affects the relative performance of the different models; specifically, we are interested in the total sample size and in how balanced the cells are in size. We generated simulated data sets based on the real data: we use the estimates from the multilevel model fit to the real data and enlarge the total sample size by 2, 3, and 4 times, either keeping the original (highly unequal) relative cell proportions or making the proportions roughly equal; a sketch of this data-generation step follows below.
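A sketch of the kind of data-generation step described here (my assumption of the procedure, not the authors' code): resample binomial counts from fitted cell probabilities, scaling the total sample size and either preserving or equalizing the cell sizes.

```python
import numpy as np

def simulate_counts(pi_hat, n_orig, scale=2, balanced=False, seed=0):
    """Simulate cell counts from fitted probabilities pi_hat, with the total sample size
    scaled by `scale`, keeping the original cell proportions or making them roughly equal."""
    rng = np.random.default_rng(seed)
    N_new = int(scale * n_orig.sum())
    if balanced:
        n_new = np.full(n_orig.shape, N_new // n_orig.size)           # roughly equal cell sizes
    else:
        n_new = np.round(N_new * n_orig / n_orig.sum()).astype(int)   # original (unequal) proportions
    y_new = rng.binomial(n_new, pi_hat)                               # simulated yes-counts per cell
    return y_new, n_new
```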
  • 38. Simulation Results: Total Sample Size. [Three-panel figure.] Figure: Estimated predictive error of all response outcomes for the ``augmented'' data sets, under the complete pooling, partial pooling, and no pooling models; responses are ordered by the lower bound.
  • 39. Simulation Results: Total Sample Size, House Republican Vote. Figure: Predictive error of the three models as the sample size grows (roughly 50,000 to 200,000); the outcome under consideration is the Republican vote in the House election.
  • 40. Simulation Results: Balancedness of the Structure. Figure: Measure of fit (predictive error) for all responses, ordered by the lower bound. The data set is simulated from the real data set and has the same total sample size, but all demographic-geographic cells are kept balanced.
  • 41. Conclusions. Cross-validation is not a very sensitive instrument for comparing multilevel models; careful calibration is needed to better understand the results. We also explored how different aspects of the data-set structure affect the margin of improvement.