Analytic Methods and Issues in CER from Observational Data

Presented by UCSF researcher Charles McCulloch, PhD. View more related presentations and resources at http://accelerate.ucsf.edu/research/cer

Speaker notes (selected slides):
  • Explain carefully: Y axis is delta BDI. Red is treatment group, blue is control. Change in BDI values for blue and red are well interspersed. So average values are about the same. No treatment effect? Perhaps there is confounding by age. Older participants are less likely to adopt the internet intervention (very few red dots on right of graph). And older participants do better in either treatment group. And, for a given age the treatment group seems to do better. In the older age range, we have very few comparators. This lack of overlap makes it almost impossible to understand the relationship between age and delta BDI (among the older ages) in the treatment group. So difficult to model.
  • We fit a regression model. Being careful, neither nonlinearity in the relationship nor interactions are even close to statistically significant, so we stick with the basic model, which is all that many people would fit. Statistically significant treatment effect. Can interpret causally if we believe the only difference between the groups is their ages and the model is correct.
  • Lack of overlap is a serious concern because it is difficult to determine if differences are causal or due to the incorrect model. Can see the lack of overlap here, but wouldn’t be able to in a more realistic example with many predictors.
  • Here is the true model from which I generated the (made-up) data. Observations: I fit the wrong model; there is little or no treatment effect in the older age group; and since most of the data for comparison is in the younger age group (where the treatment effect is larger), I overestimated the average effect across all ages. Note: the data-generating model was 4 + (age - 30) x 3.5/40 for treatment and 2 + (age - 30) x 6/40 for control. So the average causal effect at a given age is the difference, 3.875 - 0.0625 x age, and the ACE is 3.875 - 0.0625 x mean(age) = 3.875 - 0.0625 x 45.65 = 1.02.
  • What!? Treatment effect is now estimated to be 3.2! And not statistically significant. What is going on? When you enter the interaction, the treatment effect is the estimated (extrapolated) value at age=0. Need to center variables so a value of 0 is a value of interest.
  • Statistically significant treatment effect.
  • I learned this in 1978 …. From a book written in 1959 …. Quoting a result from a 1923 paper by Jerzy Neyman.
  • Entries in the table are change in BDI. So the first patient was in the Trt group and improved by 8 points; had they been in the Ctl group, they would also have improved, but only by 4 points. Patient 3 improved by 3 points; had they been in the Ctl group they would have done better and improved by 4 points. Notes: the causal effect need not be the same in each patient. It need not even be in the same direction. It could differ across subgroups of the population, e.g., women, or those who would choose the treatment.
  • Statisticians have enough bad press without such technical definitions.
  • Generic – treatment and control. Marginal – because it is the average effect. Structural – because it is causal.
  • Assume likelihood of choosing the intervention only depends on whether age < 50 or not.
  • Doesn’t tell you how to adjust.
  • Perhaps effect isn’t as big among the elderly?
  • If you want an estimate of ACE in a subgroup, use weights for that subgroup.
  • No longer stat significant. Counting one person ten times introduces significant amounts of variability. Small errors in that one observation get magnified.
  • The filled circles represent the (model-based) predicted value for a participant in the treatment or control groups. The open circles represent the (model-based) predicted potential outcome for the participants. So, for example, a participant in the treatment group has a predicted potential outcome in the control group. And vice versa. Now have an estimate of the individual causal effect, which you can average to get ACE. Can either compare predicted potential outcome with predicted value or with the real value.
  • This used to be restricted to numerical, approximately continuous outcomes, but modern software allows a wide variety of outcome and predictor types.
  • Cigarette tax may influence smoking, but no direct effect on birthweight. Issues: cigarette tax may not strongly predict smoking. Could cigarette tax be associated with other omitted variables? Suppose localities with high cigarette taxes also have strong maternal educational programs? These could lead to reduced smoking and (through other pathways, e.g., nutrition) better outcomes.
  • Issue: cigarette tax may not strongly predict smoking.
Transcript:
1. Analytic Methods and Issues in CER from Observational Data
CER Symposium, January 2012
Charles E. McCulloch, Division of Biostatistics, University of California, San Francisco

2. Outline
• Some preliminary thoughts
• Motivating example
• The good old days and why they weren't so good
• Some statistical methods
    • Potential outcomes and marginal structural models
    • Propensity scores
    • Inverse probability weighting
    • Regression estimation
    • Instrumental variables
• Some newer ideas
• Recommendations

3. Observational CER
• One of the objectives of CER is to use observational databases to answer effectiveness questions (which are invariably causal).
• Basically, trading what might be highly selected data that is subject to confounding for
• a wealth of data available easily and cheaply, e.g., a clinical database.

4. To keep in mind:
• "When a selection procedure is biased, taking a large sample does not help. It just repeats the basic mistake on a larger scale." (a passage boxed for emphasis in the Stats 101 text by Freedman et al.)
• More generally: what can large samples overcome, if anything?
• An under-appreciated form of selection bias in clinical databases is that the availability of data may be driven by unobserved outcomes or responses to treatment.
• Put together, using a clinical database may be one of the least good ways to estimate causal effects.

5. Viewpoint
• Both randomized and observational studies have a role in CER.
• How can we be as careful as possible when analyzing and interpreting the results of observational studies, and, in particular,
• what role can statistical analysis methods play in elucidating causal effects?
• Goal: explain some of the newer approaches, why they are needed, and their limitations. Focus on the conceptual.

6. Example: treatment of depression
• Does addition of an internet-based cognitive behavioral component aid in treatment of depression?
• Outcome = change in Beck Depression Inventory (BDI).
• Control group treatment is a team care approach, which has proven especially effective in the elderly.
• Observational study based on clinical data.
• So CER!

7. Example: Depression Data (figure: change in BDI versus age; treatment group in red, control in blue)
8. The good old days
• The issue: the treatment (the predictor of interest) is confounded by age (another predictor) since
    • a) age is associated with the outcome (change in BDI), and
    • b) age is associated with treatment.
• The solution: adjust for age in a multipredictor model (see the sketch below).
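The slide's fix, adjusting for the measured confounder in a multipredictor model, can be sketched in a few lines. This is a minimal illustration on made-up data loosely patterned on the depression example, not the presenter's actual analysis; the variable names (age, trt, delta_bdi), the data-generating process, and the statsmodels-based workflow are all assumptions for the sketch.

```python
# Minimal sketch of regression adjustment for a measured confounder (age),
# on synthetic data loosely patterned on the depression example.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(30, 70, n)
trt = rng.binomial(1, np.where(age < 50, 0.5, 0.1))   # older participants rarely take up the intervention
delta_bdi = 2 + 0.15 * (age - 30) + 2.0 * trt + rng.normal(0, 2, n)
df = pd.DataFrame({"age": age, "trt": trt, "delta_bdi": delta_bdi})

unadjusted = smf.ols("delta_bdi ~ trt", data=df).fit()        # ignores the confounder
adjusted = smf.ols("delta_bdi ~ trt + age", data=df).fit()    # adjusts for age
print("unadjusted trt estimate:", round(unadjusted.params["trt"], 2))
print("age-adjusted trt estimate:", round(adjusted.params["trt"], 2))
```

As in the slides, the unadjusted comparison understates the treatment effect here because the older (better-prognosis) participants are mostly in the control group.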
9. Example: treatment effect 1.4 (95% CI 0.6, 2.3), adjusting for age.

10. Issues with regression adjustment
• Causal estimate is defined by a characteristic of the regression model.
• What if the model is wrong (linearity/interaction)?
• How will we know? (Lack of overlap/extrapolation.)
• Lack of a comparison group for older ages (plenty of controls, not many treated).

11. Issues with regression adjustment (true model): figure showing the treatment- and control-group relationships used to generate the data.

12. Issues with regression adjustment (fit interaction)
Treatment effect 3.2 (95% CI -0.6, 7.1); previously 1.4 (95% CI 0.6, 2.3).
13. Regression adjustment
• To fix the issue in this linear regression situation, we can just center age. Use
• cage = age - Ave(age) = age - 42.7
• as a predictor instead of age in the model.
• Then the treatment effect is estimated to be 1.4 (95% CI 0.5, 2.2).
• But this points out the danger in using a statistical model to define the causal effect. (A small sketch of centering follows.)
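Continuing the same made-up data, a quick sketch of why centering matters once a treatment-by-age interaction is in the model: without centering, the trt coefficient is the extrapolated effect at age 0; with the centered cage, it is the effect at the mean age. This reuses the synthetic df from the sketch under slide 8.

```python
# Sketch: with a treatment-by-age interaction, "trt" is the effect at age = 0
# unless age is centered. Reuses the synthetic df from the sketch under slide 8.
df["cage"] = df["age"] - df["age"].mean()
uncentered = smf.ols("delta_bdi ~ trt * age", data=df).fit()
centered = smf.ols("delta_bdi ~ trt * cage", data=df).fit()
print("trt coefficient, uncentered (effect at age 0):", round(uncentered.params["trt"], 2))
print("trt coefficient, centered (effect at mean age):", round(centered.params["trt"], 2))
```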
14. Another problem
• The old definition of confounding doesn't really address causality. The definition is completely data-based; no information about the nature of the variables is used.
• What if the "other" predictor is a mediator? For example, suppose the variable we adjust for is perception of stress instead of age (with those having higher stress less likely to use the additional internet therapy).
• Then conventional wisdom is that we shouldn't adjust for it.

15. Message
• Define your causal estimand.
• Don't let the statistical method define the target of your interest.
• At the very least, be cognizant of the causal target of a statistical procedure.

16. Counterfactuals
• Imagine a hypothetical experiment in which you get to observe each participant under both the treatment and control conditions, holding all else the same: Y_trt, Y_ctl. Like a perfect cross-over experiment.
• Often, we only get to observe one of Y_trt or Y_ctl, depending on whether the participant is in the treatment or control condition.
17. Counterfactuals

Patient   Group   Outcome under Ctl   Outcome under Trt   Difference
1         Trt     4                   8                    4
2         Ctl     0                   3                    3
3         Trt     4                   3                   -1
4         Trt     3                   4                    1
…
Average in population: 1.02 = Average Causal Effect
18. Counterfactuals
• Counter – factual
• Against – the truth
• = Lying
• Better? "Potential outcomes framework" or "hypothetical outcomes framework"
19. Potential Outcomes – Average Causal Effect
• A reasonable target of inference is sometimes the average causal effect (ACE): the average of the individual causal effects across the entire population.
• Or perhaps the ACE in a subset of the population, e.g., the causal effect of a smoking cessation program among smokers. (A toy illustration of this bookkeeping follows.)
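To make the potential-outcomes bookkeeping concrete, here is a toy calculation using the first four patients from the table on slide 17: if both outcomes were observable, the ACE would simply be the mean of the individual differences. (The 1.02 on the slide is the average over the whole hypothetical population, not just these four rows.)

```python
# Toy illustration of the potential-outcomes table on slide 17 (first four patients).
y_ctl = [4, 0, 4, 3]               # outcome each patient would have under control
y_trt = [8, 3, 3, 4]               # outcome each patient would have under treatment
diffs = [t - c for t, c in zip(y_trt, y_ctl)]
print(diffs)                       # [4, 3, -1, 1]: individual causal effects
print(sum(diffs) / len(diffs))     # 1.75 for these four patients only
```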
20. Marginal structural models
• Consider the averages of Y_trt and Y_ctl across the population (Ave(Y_1) and Ave(Y_0)), with A = 1 indicating assignment to treatment and 0 otherwise.
• A causal model:
  Ave(Y_A) = Ave(Y_0) + [Ave(Y_1) - Ave(Y_0)]A
           = Ave(Y_0) + [ACE]A
           = b_0 + b_1 A
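Restating the slide's model in LaTeX (the coefficient symbols did not survive extraction, so the betas below are reconstructed placeholders in standard notation):

```latex
\mathrm{Ave}(Y_{A}) \;=\; \mathrm{Ave}(Y_{0}) + \bigl[\mathrm{Ave}(Y_{1})-\mathrm{Ave}(Y_{0})\bigr]A
                    \;=\; \beta_{0} + \beta_{1}A, \qquad \beta_{1}=\mathrm{ACE}.
```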
21. The new order and the way forward
• Confounding occurs when an estimation method does not estimate the causal estimand, e.g., the average causal effect.
• The 800-lb gorilla when trying to conduct CER from observational (especially clinical) databases is dealing with confounding.
• How can we estimate causal effects while doing our best to eliminate confounding?

22. Propensity scores
• Let prop(x) be the probability of being on treatment as a function of x, the variables that determine treatment.
• In our example, suppose temporarily that the probability of selecting treatment only depends on age.

23. Propensity scores: example (figure; the probability of choosing the intervention is assumed to depend only on whether age < 50)

24. Propensity scores: theory
• Very important theoretical properties:
• You only need to adjust for prop(x).
• Consider individuals with the same value of prop(x): the ones receiving treatment have the same distribution of x as those who do not. So complete overlap in the variables x is guaranteed and extrapolation is not a problem.
    25. 25. <ul><li>Mean values : Ave(Trt)= 5.0, </li></ul><ul><li>Ave(Ctl) = 4.6 </li></ul><ul><li>(Est=0.4, p=0.57, via t-test) </li></ul><ul><li>Within propensity score categories </li></ul><ul><li>Prop=1/2: Ave(Trt)=4.8, Ave(Ctl)=3.3, Est=1.5 </li></ul><ul><li>Prop=1/10: Ave(Trt)=6.8, Ave(Ctl)=6.0, Est=0.8 </li></ul><ul><li> (Est=1.4, CI [0.4, 2.3], adj for propen) </li></ul>Propensity scores: Mean values : Ave(Trt)= 5.0, Ave(Ctl) = 4.6 (Est=0.4, p=0.57, via t-test) Within propensity score categories Prop=1/2: Ave(Trt)=4.8, Ave(Ctl)=3.3, Est=1.5 Prop=1/10: Ave(Trt)=6.8, Ave(Ctl)=6.0, Est=0.8
26. Propensity scores: practical issues
• Often divide propensity scores into quintiles in order to adjust.
• What if not all the variables that determine treatment are measured? Or included correctly in the model?
• Suggests being more inclusive with both predictors and interactions,
• and handling continuous predictors with flexible functional forms.
• So something that is easier with large databases.
27. Propensity scores: estimating the ACE (causal estimand)
• If the treatment effects vary within strata of propensity scores, then you need to weight the estimates according to the overall sample:
    • Prop = 1/2: Est = 1.5, N = 20, 64.5% of sample
    • Prop = 1/10: Est = 0.8, N = 11, 35.5% of sample
• Estimated ACE = 0.645 x 1.5 + 0.355 x 0.8 = 1.25
• Can weight to other causal estimands. (A sketch of the full workflow follows.)
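The propensity-score workflow on slides 22 to 27 can be sketched as: fit a treatment model, stratify on the estimated propensity, estimate the effect within strata, and weight by stratum size. This is a minimal sketch continuing the synthetic df from the earlier sketches (quintiles here, rather than the two-category toy example on the slide); in real data, strata with no treated or no control subjects would need extra care.

```python
# Sketch: propensity-score stratification and a stratum-weighted ACE estimate.
# Continues the synthetic df (age, trt, delta_bdi) from the earlier sketches.
ps_model = smf.logit("trt ~ age", data=df).fit(disp=0)     # treatment model
df["ps"] = ps_model.predict(df)                            # estimated propensity score
df["stratum"] = pd.qcut(df["ps"], 5, labels=False, duplicates="drop")

ace = 0.0
for _, grp in df.groupby("stratum"):
    effect = grp.loc[grp.trt == 1, "delta_bdi"].mean() - grp.loc[grp.trt == 0, "delta_bdi"].mean()
    ace += effect * len(grp) / len(df)                     # weight strata by their share of the sample
print("stratified propensity-score ACE estimate:", round(ace, 2))
```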
28. Inverse probability weighting
• Instead of adjusting for the propensity score, we could use it to weight the participants.
• E.g., if a participant is in the treatment group and has a propensity of 1/10, then we would count that person 10 times. In that way we inflate the contribution of that participant to balance the groups.
• For our data: Trt estimate = 1.2 (CI -0.2, 2.5). (A minimal IPW sketch follows.)
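The same propensity scores can instead be used as weights, as the slide describes. A minimal sketch continuing from the stratification sketch above; the instability from very small probabilities mentioned on the next slide shows up here as huge weights.

```python
# Sketch: inverse probability weighting with the propensity scores estimated above.
# Treated subjects get weight 1/ps, controls 1/(1 - ps).
treated = (df["trt"] == 1).to_numpy()
w = np.where(treated, 1.0 / df["ps"], 1.0 / (1.0 - df["ps"]))
ipw_trt = np.average(df.loc[treated, "delta_bdi"], weights=w[treated])
ipw_ctl = np.average(df.loc[~treated, "delta_bdi"], weights=w[~treated])
print("IPW ACE estimate:", round(ipw_trt - ipw_ctl, 2))
```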
29. IPW: comments
• Don't need quintiles.
• Can use with longitudinal studies and time-dependent confounding.
• Small probabilities (large weights) cause instability. This leads to subjective rules to deal with large weights.
30. Regression estimation
• When taking a model-based approach, we could get an estimate of the causal effect for each person, then calculate the average causal effect.
• This is especially useful when the regression model is not a linear regression model (e.g., a logistic model), because the model estimate based on the "average" subject is not the same as the average of the individual subjects' estimates. (A sketch follows.)
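Regression (G-) estimation can be sketched as: fit an outcome model, predict every participant's outcome with treatment set to 1 and then to 0, and average the differences. To echo the slide's point about nonlinear models, the sketch below uses a logistic model for an invented binary "responder" outcome (an illustrative cutoff on the synthetic data), where averaging individual predictions is not the same as predicting at the average covariate value.

```python
# Sketch: G-computation / regression estimation with a logistic outcome model.
# "responder" is an invented binary outcome derived from the synthetic delta_bdi.
df["responder"] = (df["delta_bdi"] >= 5).astype(int)
outcome_model = smf.logit("responder ~ trt + age", data=df).fit(disp=0)

p_if_treated = outcome_model.predict(df.assign(trt=1))   # predicted potential outcome, everyone treated
p_if_control = outcome_model.predict(df.assign(trt=0))   # predicted potential outcome, everyone control
print("G-estimate of the average risk difference:", round((p_if_treated - p_if_control).mean(), 3))
```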
31. Regression estimation (figure: model-based predicted causal effects for a treated subject and for a control subject). ACE estimated to be 1.2 (CI 0.4, 2.1).

32. Regression estimation
• With sufficient data, can fit separate models for the treatment and control groups.
• Also called G-estimation. Well known to economists as marginal estimates, and built into the current version of Stata. Can get marginal estimates for subpopulations, e.g., the causal effect in users of the intervention or in younger participants.
• But the average of conditional models may not be of scientific interest.
33. Doubly robust estimators
• There are techniques that allow you to combine the features of propensity score or IPW estimators and regression estimation.
• Can, e.g., adjust for propensity score quintiles and also use regression estimation.
• Or use IPW and regression methods.
• Gives some protection against getting either the propensity scores or the regression model wrong. (One such combination is sketched below.)
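One common way to combine the two ingredients is an augmented IPW estimator: start from the IPW terms and correct them with predictions from an outcome regression. The sketch below is one such form, shown for the continuous outcome on the same synthetic data; it is meant only to illustrate the "either model can be wrong" protection, not as the definitive doubly robust method.

```python
# Sketch: an augmented IPW (doubly robust) ACE estimate combining the propensity
# scores (e) with an outcome regression, on the synthetic data used above.
out_model = smf.ols("delta_bdi ~ trt + age", data=df).fit()
m1 = out_model.predict(df.assign(trt=1)).to_numpy()       # outcome-model prediction under treatment
m0 = out_model.predict(df.assign(trt=0)).to_numpy()       # outcome-model prediction under control

T = df["trt"].to_numpy()
Y = df["delta_bdi"].to_numpy()
e = df["ps"].to_numpy()

ey1 = np.mean(T * Y / e - (T - e) / e * m1)                     # augmented estimate of Ave(Y1)
ey0 = np.mean((1 - T) * Y / (1 - e) + (T - e) / (1 - e) * m0)   # augmented estimate of Ave(Y0)
print("doubly robust (AIPW) ACE estimate:", round(ey1 - ey0, 2))
```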
34. Instrumental variables
• All of the techniques described previously depend on the difficult-to-verify and hard-to-achieve assumption that all the variables needed to control for confounding have been measured and properly incorporated in the models.
• This is especially true once we start trying to mine clinical databases for CER purposes.
• The technique of instrumental variables avoids this assumption.

35. Instrumental variables (IVs)
• An instrument is a variable which:
    • is a determinant of the treatment,
    • is uncorrelated with any variables that jointly determine treatment and the outcome, and
    • has its entire effect mediated through treatment.
36. IVs
• The classic example of an instrument is randomization to treatment, because 1) it is the primary determinant of being on treatment, 2) randomization guarantees lack of correlation with confounders, and 3) the randomization itself is unrelated to the outcome other than through assignment to treatment.
• By using the instrument it is possible to get estimates of the causal effect of treatment.
• Angrist: "Intuitively, instrumental variables solve the omitted (confounders) problem by using only part of the variability in (treatment), specifically, a part that is uncorrelated with the omitted variables, to estimate the relationship between (treatment) and (outcome)."
37. IVs: examples of instruments
• Effect of maternal smoking on birthweight; IV = state cigarette tax.
• Effect of surgery on health outcomes; IV = distance to care center.
• "Natural experiments"

38. IVs: causal estimand
• IVs do not estimate the ACE.
• Instead they estimate the local average treatment effect (LATE): the average treatment effect among those who can be induced to change treatment by a change in the instrument.
• For example, in the maternal smoking example, women for whom changing the tax could induce a change in smoking behavior.
39. IVs: idea in linear regression
• Regress the treatment on the instrument and get the predicted values. These are a function of the instrument and hence represent a portion of the variation in treatment that is unconfounded.
• Regress the outcome on the predicted treatment to get an estimate of the causal effect. (A sketch on made-up data follows.)
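The two-stage recipe on this slide can be sketched on entirely made-up data with an invented "tax" instrument, echoing the cigarette-tax example on slide 37. Everything here, including the variable names, effect sizes, and data-generating process, is fabricated for illustration, and the second-stage standard errors printed by plain OLS are not valid; real analyses should use dedicated IV routines.

```python
# Sketch: two-stage least squares on made-up data. An unmeasured confounder u drives
# both smoking and birthweight; the (hypothetical) tax instrument shifts smoking but
# affects birthweight only through smoking. True causal effect of smoking = -1.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
tax = rng.normal(0, 1, n)                                  # instrument
u = rng.normal(0, 1, n)                                    # unmeasured confounder
smoking = 0.5 * tax + u + rng.normal(0, 1, n)              # treatment, confounded by u
birthweight = -1.0 * smoking + 2.0 * u + rng.normal(0, 1, n)
d = pd.DataFrame({"tax": tax, "smoking": smoking, "birthweight": birthweight})

naive = smf.ols("birthweight ~ smoking", data=d).fit()
stage1 = smf.ols("smoking ~ tax", data=d).fit()              # stage 1: treatment on instrument
d["smoking_hat"] = stage1.fittedvalues
stage2 = smf.ols("birthweight ~ smoking_hat", data=d).fit()  # stage 2: outcome on predicted treatment
print("naive OLS estimate:", round(naive.params["smoking"], 2))     # badly confounded
print("2SLS estimate:", round(stage2.params["smoking_hat"], 2))     # close to the true -1
```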
40. IVs: drawbacks
• The main drawback of the instrumental variables approach is the leap of faith required to believe the assumptions, which are not verifiable in practice.
• If an instrumental variable is only weakly associated with treatment, then the estimate based on IVs may be quite imprecise.

41. Newer ideas
• Not much new under the sun.
• Not too surprising, since many of us have been doing CER for decades.
• A few new ideas, such as propensity score calibration: suppose you want to do a propensity score analysis but your clinical database is short on measured confounders. Build your propensity score model in a separate cohort (which need not have outcomes) and figure out the degree of misclassification and its consequence for the analysis.

42. Recommendations
• Measure confounders or consider trying instrumental variables.
• Regression estimation/G-estimation is a good idea.
• If using multivariate adjustment:
    • Be liberal in including predictors, interactions, and nonlinear relationships.
    • Center your variables.
• Consider using propensity scores in strata, perhaps in addition to one of the above two methods.
• Be cautious with use of IPW with small probabilities.
• It's the confounding. Doh!
43. Be wary of methods promising easy causal estimation from observational databases
• Sensitivity analyses are almost always a good idea (different methods, degree of confounding needed to overturn results).
44. Contact: [email_address]
Recommended articles:
• Average Causal Effects From Nonrandomized Studies: A Practical Guide and Simulated Example. JL Schafer, J Kang. Psychological Methods 2008, 279–313. (somewhat technical but still readable)
• Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. JD Angrist, AB Krueger. J Econ Perspectives 2001, 69–85.
