Metrics Game Submission
Enoch Chan
Neil Cho
Yuan Fei
Jessica Koh
May 15, 2015
1 Introduction
Cawley and Meyerhoefer (2012) use instrumental variables to estimate the impact of obesity on
medical costs. Their structural model of medical spending is a two-part model that looks at the
marginal effect of obesity on healthcare costs. The study uses the eldest child’s Body Mass Index
(BMI) as an instrumental variable for the individual’s BMI due to the inherent genetic link to
correct for endogeneity and attenuation bias. In this report we are able to replicate both their
main non-IV and IV results while further extending the results of their baseline model using newer
data. Additionally, we address possible limitations of endogeneity in the instrument and issues of
underestimation due to the selective nature of our sample.
2 Data
Cawley and Meyerhoefer (C&M hereafter) use Medical Expenditure Panel Survey (MEPS) data
from 2000 to 2005 to estimate the effect of obesity on medical costs. They limit their sample to adults
aged 20 to 64 with biological children between the ages of 11 and 20, excluding pregnant women
and possible outliers. We apply the same criteria to the publicly-available MEPS data in order to
first replicate their findings. We then expand our sample in order to improve upon existing results
in the literature. In particular, we look at healthcare cost data from 2000 to 2012 normalized to
2005 dollars.
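The normalization to 2005 dollars can be sketched as below. This is a minimal illustration rather than the procedure used in this report: the index values are made-up placeholders, and the names `PRICE_INDEX` and `to_2005_dollars` are hypothetical. In practice one would substitute a published price index for the placeholder values.

```python
# Hypothetical price index by year (placeholder values, NOT real price data).
PRICE_INDEX = {2000: 88.0, 2005: 100.0, 2012: 117.0}

BASE_YEAR = 2005


def to_2005_dollars(amount, year, index=PRICE_INDEX):
    """Rescale a nominal amount observed in `year` into base-year (2005) dollars."""
    return amount * index[BASE_YEAR] / index[year]


# Express 1,000 nominal dollars from each year in 2005 dollars.
for yr in sorted(PRICE_INDEX):
    print(yr, round(to_2005_dollars(1000.0, yr), 2))
```

Amounts from years before 2005 are scaled up and amounts from later years are scaled down, so that expenditures from different survey years are comparable.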
3 Model Replication
3.1 Model Description and Implementation
The two-part model of C&M consists of: 1) a logit model estimating the probability that a respondent
has positive medical expenditure, and 2) a Gamma GLM with a log link estimating the amount
that a respondent spends conditional on spending a nonzero amount. The independent variable
of interest is either the BMI level or a dummy variable indicating obesity. They instrument this
variable with the BMI level or obesity indicator of the adult’s eldest child. They control for various
factors, specifically: gender, ethnicity, age (indicator variables for whether age in years is 20-34,
35-44, 45-54, 55-64), education level, census region (northeast, midwest, south, or west), MSA status,
employment, household composition (average age of all household members) and fixed effects for
year.
We are able to implement the two-part model described above. As in C&M, we check the relevance of the
instruments using a first-stage F-test, and we find that all instruments are relevant. To replicate
their instrumental variable results, we first regress our endogenous variable (respondents’ BMI or
obesity) on the instruments and controls. We then use the predicted value of respondents’ BMI or
obesity from this regression as our independent variable of interest in the two-part model. We use
robust standard errors for the regression and cluster them by primary sampling unit at the
family level.
C&M include a second-degree polynomial of their instrument (eldest child’s BMI) in their two-part
regression. We incorporate this nonlinearity through the mechanics of our first regression, i.e. we
regress respondents’ BMI on a second-degree polynomial of eldest child’s BMI and then use the
predicted value as an independent variable in the two-part model. Our results are summarized in
Tables 1 and 2.
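The first-stage step described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for this report: the data are simulated, a single control (age) stands in for the full control set, and the helper names `solve`, `ols` and `first_stage` are hypothetical. The sketch regresses parent BMI on the child's BMI and its square, computes the F-statistic for excluding the two instrument terms, and returns the fitted values that would enter the two-part model.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, y):
    """Least squares via the normal equations: (coefficients, fitted values, RSS)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coef, fitted, rss


def first_stage(parent_bmi, child_bmi, controls):
    """Regress parent BMI on child BMI, its square, and the controls.

    Returns the fitted parent BMI and the F-statistic for jointly
    excluding the two instrument terms.
    """
    full = [[1.0, z, z * z] + list(c) for z, c in zip(child_bmi, controls)]
    restricted = [[1.0] + list(c) for c in controls]
    _, fitted, rss_full = ols(full, parent_bmi)
    _, _, rss_restricted = ols(restricted, parent_bmi)
    q = 2  # number of excluded instrument terms
    dof = len(parent_bmi) - len(full[0])
    f_stat = ((rss_restricted - rss_full) / q) / (rss_full / dof)
    return fitted, f_stat


# Simulated data for illustration only (not MEPS data).
rng = random.Random(0)
n = 500
age = [rng.uniform(20, 64) for _ in range(n)]
child = [rng.gauss(22, 4) for _ in range(n)]
parent = [12 + 0.6 * z + 0.02 * a + rng.gauss(0, 3) for z, a in zip(child, age)]

bmi_hat, f_stat = first_stage(parent, child, [[a] for a in age])
print("first-stage F-statistic:", round(f_stat, 1))
```

The fitted values `bmi_hat` would then replace observed BMI in both the logit and the GLM. The F-statistic here assumes homoskedastic errors; a clustered version would be needed to match the standard errors described above.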
3.2 Replication Results
Table 1 reports the point estimates from both stages (Logit and GLM) of our replication using
BMI and an indicator of obesity as alternative measures of obesity. Both IV and non-IV results are
reported.
Table 2 reports the marginal effect estimates from both stages of our replication with both measures
of obesity. The marginal effect is defined as the estimated change in medical expenditure per unit
change in BMI/obesity. The marginal effect for BMI and obesity is 125.696 (40.306) and 2062.187
(732.5577) for IV estimation, and 68.848 (9.540) and 647.113 (116.860) for non-IV estimation
(standard errors in parentheses). These results are very comparable to C&M’s results. The
corresponding marginal effects that they report are 149 (35) and 2741 (745) for BMI/obesity IV
estimation, and 49 (9) and 656 (113) for BMI/obesity non-IV estimation.
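One common way to obtain such a marginal effect from a two-part model is sketched below; it is an illustration, not necessarily the exact computation behind Table 2. With a logit first part and a log-link GLM second part, expected spending is E[y | x] = P(y > 0 | x) * E[y | y > 0, x], and the marginal effect of a continuous regressor is the derivative of that product. The function names are hypothetical, the intercepts are made up, and only the two BMI slopes are taken from the non-IV estimates in Table 1.

```python
import math


def _parts(x, alpha, beta):
    """Return (P(y > 0 | x), E[y | y > 0, x]) for a logit and a log-link GLM."""
    p = 1.0 / (1.0 + math.exp(-sum(a * v for a, v in zip(alpha, x))))
    mu = math.exp(sum(b * v for b, v in zip(beta, x)))
    return p, mu


def expected_spending(x, alpha, beta):
    """Unconditional expected spending: probability of spending times conditional mean."""
    p, mu = _parts(x, alpha, beta)
    return p * mu


def marginal_effect(x, alpha, beta, j):
    """Derivative of expected spending with respect to the continuous regressor x[j]."""
    p, mu = _parts(x, alpha, beta)
    # Product rule: change in participation probability plus change in conditional mean.
    return p * (1.0 - p) * alpha[j] * mu + p * mu * beta[j]


def average_marginal_effect(rows, alpha, beta, j):
    """Average the marginal effect over a sample of covariate vectors."""
    return sum(marginal_effect(x, alpha, beta, j) for x in rows) / len(rows)


x = [1.0, 27.0]        # intercept term and a BMI of 27
alpha = [0.5, 0.025]   # logit part: made-up intercept, non-IV BMI slope from Table 1
beta = [7.0, 0.016]    # GLM part: made-up intercept, non-IV BMI slope from Table 1
print(marginal_effect(x, alpha, beta, j=1))
```

For the binary obesity indicator, the analogous quantity is the difference in `expected_spending` with the indicator set to one and to zero, rather than a derivative.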
Table 1: Point Estimates from Two-Part Model (2000 to 2005 data)
                        IV (total expenditure)                     Non-IV (total expenditure)
               (1)       (2)        (3)       (4)        (5)        (6)        (7)        (8)
VARIABLES      logit     glm        logit     glm        logit      glm        logit      glm
BMI            0.000     0.037***                        0.025***   0.016***
               (0.011)   (0.013)                         (0.002)    (0.003)
Obesity                             0.136     0.600***                         0.240***   0.152***
                                    (0.190)   (0.224)                          (0.030)    (0.036)
Observations   40,472    40,472     40,472    40,472     40,232     40,232     40,232     40,232
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Table 2: Marginal Effect Estimates from Two-Part Model (2000 to 2005 data)
                  IV (total expenditure)      Non-IV (total expenditure)
VARIABLES         (1)           (2)           (3)           (4)
BMI               125.696**                   68.848***
                  (40.306)                    (9.540)
Obesity                         2062.187*                   647.113***
                                (732.5577)                  (116.860)
Observations      40,472        40,472        40,232        40,232
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
3.3 Limitations
C&M acknowledge that their model has several limitations. The first is associated with the nature of
the instrumental variables approach. There is no definitive way to check the validity of the instrument.
The assumptions they make in justifying the use of BMI of the oldest child as an instrument are
the following: the weight of a biological family member is strongly correlated with the weight of a
respondent, but uncorrelated with the residual medical care costs. However, the authors acknowledge
that it is possible that the genes that influence weight may also affect other unknown factors that
could affect residual medical care costs.
The second limitation to this model is a data problem. The data on BMI and medical care costs is
available only for a single interval of time for all observations. Access to longitudinal data would
increase our understanding of the long-term effects of obesity on medical costs. Other limitations
include possible mismeasurement in the data and the limited generalizability of the sample: because
of the nature of the instrument, we restrict our sample to adults with biological
children. Our estimate may therefore understate the true effect of obesity in the population, owing to the
positive health spillovers resulting from family responsibility.
4 Modifications
Given our successful replication, we propose several modifications to the model. We
retain C&M’s empirical two-part model as the appropriate estimation approach for our
data, as opposed to other models that also address the problem of a large proportion of zero
values. We agree with C&M about the validity of the two-part model
because zero health expenditures are actual outcomes (i.e. true zeros): in such cases no money
is spent on health care. This is in contrast to potential outcomes, where the zeros we observe
in the data are “missing values”. The classic example is non-working women. Their
observed wages are zero, but only because they choose not to enter the workforce. If
they did, they would earn a positive wage that we do not observe (i.e. the zeros we observe are not
“true zeros”). Accordingly, papers such as Frondel and Vance (2012) have argued that in the case
of true zeros, a two-part model is more appropriate.
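The role of true zeros can be illustrated with a small numerical sketch. The spending values below are made up and the function name `two_part_summary` is hypothetical. Because the zeros are genuine outcomes, they belong in the overall mean, and that mean factors exactly into the two quantities that the two parts of the model target.

```python
def two_part_summary(spending):
    """Split spending into the share with y > 0 and the mean among those with y > 0."""
    positive = [y for y in spending if y > 0]
    share_positive = len(positive) / len(spending)
    mean_if_positive = sum(positive) / len(positive)
    return share_positive, mean_if_positive


# Made-up annual expenditures; the zeros are respondents who spent nothing.
spending = [0.0, 0.0, 120.0, 0.0, 950.0, 40.0, 0.0, 3100.0]

share, cond_mean = two_part_summary(spending)
overall_mean = sum(spending) / len(spending)

# The overall mean equals P(y > 0) times E[y | y > 0].
print(share, cond_mean, overall_mean)
```

Under a selection story such as unobserved wages, the zeros would instead be missing values and this decomposition would not describe the quantity of interest.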
The first basic modification we implement is to expand the time horizon of the analysis, since a
larger dataset is now available. We include data from 2000 to 2012. We find that the marginal effect of obesity on
medical expenditure has increased relative to the smaller time horizon. Our results are summarized in
Table 3. This model constitutes our baseline estimate of the marginal effect of obesity on healthcare
costs.
Table 3: Marginal Effect Estimates from Two-Part Model (2000 to 2012 data)
                  IV (total expenditure)      Non-IV (total expenditure)
VARIABLES         (1)           (2)           (3)           (4)
BMI               169.425***                  85.616***
                  (30.313)                    (7.234)
Obesity                         3297.591***                 982.399***
                                (558.406)                   (97.053)
Observations      88,880        88,880        88,880        88,880
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
4.1 Further Controls
One improvement that we make is to add two control variables that account for the general health
status and insurance coverage of each individual. As previously mentioned, the baseline
model inspired by C&M does not take into account genetic factors that may affect
residual health expenditure through mechanisms not directly related to obesity.
Specifically, although this is difficult to test without expertise in biology, genes that affect
weight might also affect other health-related factors that in turn affect medical expenditure. To
examine whether these unknown factors influence the results from the baseline model, we
include two more control variables in our model. One measures each respondent’s perception of his or
her own health, and the other indicates the respondent’s insurance coverage status. We
want to filter out the incremental health expense caused by diseases that are unrelated
to obesity but still have genetic causes. Health perception allows us to use respondents’
self-knowledge of their health to control for such non-obesity-related diseases. A respondent who
reports being less healthy is more likely to be suffering from diseases unrelated
to obesity, which may incur medical expenses that should be controlled for. Health perception
alone, however, does not completely filter out the effect of non-obesity-related diseases, since
obese people will tend to consider themselves unhealthy. Insurance coverage could play a role by
additionally controlling for the non-obesity-related diseases that health perception does not
capture. Insurance companies will be more likely to deny advanced insurance coverage to obese
people than to people with latent diseases with genetic causes. Hence, together health perception and
insurance coverage could help sharpen the precision of our estimation. Our results are summarized
in Table 4.
Table 4: Marginal Effect Estimates from Two-Part Model with Further Controls (2000 to 2012 data)
                  IV (total expenditure)      Non-IV (total expenditure)
VARIABLES         (1)           (2)           (3)           (4)
BMI               93.136***                   35.979***
                  (29.527)                    (6.676)
Obesity                         1956.029***                 293.408***
                                (552.047)                   (90.678)
Observations      88,880        88,880        88,880        88,880
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
The marginal effect of BMI/obesity on healthcare expenditure decreases substantially
after we include the additional controls, which we interpret as support for including
them.
4.2 Investigating Time Heterogeneity
Additionally, we observe that our estimates of the marginal effects of obesity increase considerably
when we expand the sample from the 2000-2005 time period to the 2000-2012 time period (before
we implement further controls). This difference is seen by comparing Table 3 to Table 2, as these
two regressions contain the same controls and differ only in the sample period. Specifically, a one-unit
increase in BMI causes a $169 increase in medical expenditure compared to $125 in the smaller
sample, and “becoming obese” as measured by the obesity indicator causes a $3,297 increase in
medical expenditure compared to $2,062 in the smaller sample. Thus, we proceed by investigating
whether there exists time heterogeneity. In other words, we investigate whether the marginal effect
of BMI/obesity on medical expenditure, accounting for all our controls including the ones above,
has changed over time.
We implement this investigation by interacting the year fixed effects with the instrumented
BMI/obesity variable in both parts of the two-part model. We leave out the dummy for year 2000 to avoid
perfect multicollinearity. We then examine the marginal effects of the interaction terms. We find that
none of the interaction terms shows a significant marginal effect for either measure of obesity,
except for the year 2012. According to Frondel and Vance (2012), the significance of the interaction
term is sufficient to establish that there is heterogeneity. However, the actual magnitude and sign of
the effect are hard to determine. The Affordable Care Act (“Obamacare”), which was signed into law in
2010 and whose provisions were still being phased in during 2012, could be one potential source of this
heterogeneity. We cannot confirm this from our findings thus far,
and the causes of such time heterogeneity may be a fruitful topic for further research.
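The construction of the interaction terms can be sketched as follows. This is a minimal illustration with made-up values; the function name `interaction_columns` and the column names are hypothetical. Each non-base year gets a dummy and a dummy-times-BMI column, and the base year 2000 is omitted so that the columns are not perfectly collinear with the intercept and the main BMI term.

```python
def interaction_columns(bmi_hat, years, base_year=2000):
    """Build year dummies and BMI-by-year interactions, omitting the base year."""
    levels = sorted(set(years) - {base_year})
    names = ["bmi_hat"]
    names += [f"year_{y}" for y in levels]
    names += [f"bmi_hat_x_year_{y}" for y in levels]
    rows = []
    for b, yr in zip(bmi_hat, years):
        dummies = [1.0 if yr == y else 0.0 for y in levels]
        # Main effect, then year dummies, then the interactions.
        rows.append([b] + dummies + [b * d for d in dummies])
    return names, rows


# Made-up instrumented BMI values for three respondents in different years.
bmi_hat = [24.0, 31.5, 27.2]
years = [2000, 2011, 2012]

names, rows = interaction_columns(bmi_hat, years)
for name, column in zip(names, zip(*rows)):
    print(name, column)
```

A respondent observed in the base year has zeros in every dummy and interaction column, so the coefficient on `bmi_hat` is the base-year effect and each interaction coefficient measures the difference from it.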
4.3 Final Model Results and Discussion
Following these investigations, our final model adds the two control variables to C&M’s
instrumented two-part model. These results are presented in Table 4.
Our results indicate that a one-unit increase in BMI causes a $93 increase in medical expenditure,
and being obese causes a $1,956 increase in medical expenditure. This suggests that
the results from C&M may be overestimated. C&M acknowledge that their estimates are higher
than previous research has shown. Our modifications yield an intermediate value between C&M’s
estimates and those of the previous literature.
5 Conclusion
This report attempts to improve the econometric model suggested by C&M to better measure the
causal impact of obesity on medical costs. Their model adopts an instrumental variables approach
that uses BMI level and obesity status of the oldest child in each household as an instrument for
each respondent’s corresponding BMI/obesity level. It uses a two-part model involving a logit and
gamma GLM with a log link to estimate the marginal effects of BMI level and obesity on health
expenditure. Although they argue that their choice of instrument is both relevant and exogenous,
we believe that their model does not take into account the possibility that the genes that affect
weight affect other health-related factors. In order to address this issue, we add new control variables
that reflect the health status and insurance coverage of each respondent. We conclude that the
marginal effect of obesity on medical expenditure decreases from $3,297 to $1,956 after the inclusion
of these variables.
6 References
Cawley, John and Meyerhoefer, Chad. “The medical care costs of obesity: An instrumental
variables approach.” Journal of Health Economics, 2012, 31(1), pp. 219-230.
Frondel, Manuel and Vance, Colin. “On Interaction Effects: The Case of Heckit and Two-Part
Models.” Ruhr Economic Papers, 2012, 309(1), pp. 1-21.