1. Publicly Available Secondary Data Sources: An Overview and an Example from the Medical Expenditure Panel Survey (MEPS)
Marion R. Sills, MD, MPH
Department of Pediatrics
University of Colorado Health Sciences Center
2. Goals
• How do I find secondary data sets?
• Once I find one, how do I know it’s right for me and my research question?
• Example of a MEPS analysis
4. Goals
• How do I find secondary data sets?
• Once I find one, how do I know it’s right for me and my research question?
– What types of questions was it designed to answer?
– What data elements are available?
– How can I figure out if those data elements are useful to me?
• Example of an HCUP and a MEPS analysis
5. Health Data Online
• Agency for Healthcare Research and Quality (AHRQ)
• CDC WONDER
• HRSA
• National Center for Health Statistics (NCHS)
• Partners in Information Access for the Public Health Workforce
6. Two Examples
• HCUP (KID) used for a background statement in a manuscript
• MEPS used for a full analysis for a manuscript
7. HCUP--KID
• The only all-payer inpatient care database for children in the United States
• Contains data from 2-3 million pediatric hospital discharges
• Online data available via HCUPnet
8. HCUP--KID
• Question: What is the utilization of inpatient resources for asthma among children?
• Use: A background statement for a grant, demonstrating why the proposed study is important
10. MEPS Analysis Example
• Question: What is the association between parental mental health (MH) status and
– pediatric healthcare utilization patterns
– access to care measures
• Use: A manuscript describing this association
11. Background: MEPS
• Conducted by
• Agency for Healthcare Research and Quality (AHRQ)
• National Center for Health Statistics (NCHS)
• MEPS sample drawn from NCHS’s National Health Interview Survey (NHIS)
• Started data collection in 1996
14. Background: MEPS
• MEPS
• gives info about US health care use and costs
• improves accuracy of economic projections
• Who has used MEPS data:
• policymakers
• health care administrators
• businesses
• researchers
16. Background: MEPS
• Questions it was designed to address
– How have the growth of managed care, changes in private health insurance, and changes in the healthcare delivery system affected the kinds, amount, and cost of health care?
– Who benefits, and who bears the costs?
17. Background: MEPS
• MEPS collects data on
• the specific health services US residents use
• how frequently they use them
• the cost of these services
• how they are paid for
• the cost, scope, and breadth of health insurance held by the US population
18. Background: MEPS
• MEPS unique for
• the degree of detail in its data
• its ability to link data on health services spending and health insurance to demographic, employment, economic, health status, and other characteristics
19. Questions MEPS Can/Cannot Answer
• CAN
– How do health care use, insurance, and spending vary for different groups?
– How do access to care and satisfaction with care vary for different groups?
• CANNOT
– What are estimates of disease, prevalence of health conditions, or mortality/morbidity?
– What is the frequency of treatments or costs associated with specific treatments?
21. Structure of MEPS: Household Component (HC)
• From nationally representative sample of
households
• Unit of analysis can be:
• Family/household
• Individual
• Healthcare encounter
24. Structure of MEPS: HC
• Household level
– includes respondents whether or not they seek health care
– respondent report of health-related experiences
25. Structure of MEPS HC: N
Year MEPS HC sample size (N persons)
1996 21,571
1997 32,636
1998 22,953
1999 22,365
2000 22,839
2001 33,556
2002 39,165
2003 34,215
2004 34,403
2005 33,961
26. Weighting
• Sample based on complex, stratified,
multi-stage, probability design
• Estimates need to be weighted to reflect
sample design and survey non-response
– If unweighted, results are biased
• Use appropriate methods to calculate
standard error to allow for complex design
– If not, standard error is underestimated
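To see why unweighted results are biased, here is a minimal Python sketch with invented numbers (not MEPS data). It assumes a hypothetical population in which a small, high-spending group is oversampled, as MEPS oversamples some subgroups; each sampled person's weight is the number of population members that person represents.

```python
# Minimal sketch with made-up numbers (NOT MEPS data): why an unweighted mean
# is biased when the design oversamples a subgroup.

def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_w = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_w

# Hypothetical population: 900 people in group A spend $500 each and
# 100 people in group B spend $1,500 each, so the true mean is $600.
# The hypothetical sample oversamples group B: 10 people from each group.
sample_values = [500] * 10 + [1500] * 10
sample_weights = [90] * 10 + [10] * 10  # 10*90 = 900 (group A), 10*10 = 100 (group B)

unweighted = sum(sample_values) / len(sample_values)
weighted = weighted_mean(sample_values, sample_weights)

print(f"unweighted mean: {unweighted:.0f}")  # 1000, pulled toward group B
print(f"weighted mean:   {weighted:.0f}")    # 600, the population mean
```

Weighting fixes the point estimate here; it does nothing by itself for the standard error, which is the second bullet's concern.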
28. Weighting
• Basic software procedures assume simple random
sampling (SRS)
– MEPS not SRS
– Point estimates correct (if weighted)
– Standard errors usually too small
• Software to account for complex design
– SUDAAN (stand-alone or callable within SAS)
– Stata (svy commands)
– SAS (8.2 or later) (survey procedures)
– SPSS (complex survey features in 13.0 or later)
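The standard-error point can be illustrated with a minimal Python sketch of Taylor-series linearization for a weighted mean. It assumes PSUs are drawn with replacement within strata and ignores finite population corrections, roughly the setup that `design=wr` requests in SUDAAN. The data are invented: two strata with two PSUs each, where people in the same PSU have similar values. Treating every person as an independent draw (one stratum, each person its own PSU) reproduces the SRS standard error, which with equal weights is the familiar s/√n.

```python
import math
from collections import defaultdict

def taylor_mean_se(y, w, stratum, psu):
    """Weighted mean and Taylor-linearized SE for a stratified, clustered
    sample, treating PSUs as drawn with replacement within strata
    (no finite population correction)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized contribution of each person to the ratio sum(w*y)/sum(w),
    # summed to the PSU level. PSU labels are treated as nested within strata.
    psu_totals = defaultdict(float)
    for yi, wi, h, j in zip(y, w, stratum, psu):
        psu_totals[(h, j)] += wi * (yi - mean) / total_w
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("a stratum has fewer than 2 PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, math.sqrt(variance)

# Invented data: 2 strata x 2 PSUs x 3 people, equal weights.
# People in the same PSU look alike (within-cluster correlation).
y = [10] * 3 + [30] * 3 + [12] * 3 + [28] * 3
w = [1.0] * 12
stratum = [1] * 6 + [2] * 6
psu = [1] * 3 + [2] * 3 + [1] * 3 + [2] * 3

design_mean, design_se = taylor_mean_se(y, w, stratum, psu)
# "SRS" view: a single stratum in which every person is their own PSU.
srs_mean, srs_se = taylor_mean_se(y, w, [1] * 12, list(range(12)))

print(f"mean {design_mean:.1f}; design-based SE {design_se:.2f}; SRS SE {srs_se:.2f}")
# mean 20.0; design-based SE 6.40; SRS SE 2.73
```

Both calls give the same point estimate; only the variance calculation differs, which is why the SRS-based standard error comes out too small when outcomes cluster within PSUs.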
29. Example: Using the 2002 MEPS full year consolidated file (PUF HC-070) as the analytic file, the following statements will produce accurate estimates of the
average total expenditures in 2002 for children younger than 18 years of age ($1,085.82) and the corresponding standard error ($70.28).
SAS
proc surveymeans;
stratum varstr;
cluster varpsu;
weight perwt02f;
var totexp02;
domain agegroup;
run;
Note: The domain statement in this example will generate estimates for all categories of the variable agegroup (a hypothetical constructed analytic variable
where the youngest group is children under 18). There is no option within the surveymeans procedure to select only a specific population subgroup (e.g.,
agegroup=1).
SUDAAN
proc descript filetype=sas design=wr;
nest varstr varpsu;
weight perwt02f;
var totexp02;
subpopn agegroup=1;
Note: The subpopn statement in this example generates estimates for children under 18 (where agegroup is a constructed analytic variable that is equal to 1 for
children under 18).
Stata (release 8 syntax; in release 9.0, svymean was replaced by svy: mean)
svyset [pweight=perwt02f], strata(varstr) psu(varpsu)
svymean totexp02, subpop(children)
Stata 9.0 and later
svyset varpsu [pweight=perwt02f], strata(varstr)
svy, subpop(children): mean totexp02
Note: The subpop() option in this example generates estimates for children under 18 only (where children is a constructed variable set equal to 1 for persons
under 18 and set equal to 0 for all other persons).
SPSS
csplan analysis
/plan file='filename'
/planvars analysisweight=perwt02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=totexp02
/subpop table=agegroup
/mean
/statistics se.
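The notes on the domain and subpop statements above reflect a general rule: estimate a subgroup within the full design rather than deleting the other records first. Below is a minimal Python sketch using the same with-replacement Taylor approximation and invented data. Records outside the domain stay in the file and contribute zero, but every PSU still counts. In this toy sample no child falls in one PSU of stratum 1, so subsetting first leaves that stratum with a single PSU and the variance cannot be computed. This is one common failure of subsetting; in general it can also simply give the wrong standard error.

```python
import math
from collections import defaultdict

def domain_mean_se(y, w, stratum, psu, in_domain):
    """Weighted domain (subpopulation) mean with a with-replacement Taylor SE.
    Records outside the domain contribute zero but keep their PSU in the design."""
    dom_w = sum(wi for wi, d in zip(w, in_domain) if d)
    mean = sum(wi * yi for wi, yi, d in zip(w, y, in_domain) if d) / dom_w
    psu_totals = defaultdict(float)
    for yi, wi, h, j, d in zip(y, w, stratum, psu, in_domain):
        # Every PSU gets an entry, even if none of its members is in the domain.
        psu_totals[(h, j)] += (wi * (yi - mean) / dom_w) if d else 0.0
    by_stratum = defaultdict(list)
    for (h, _), z in psu_totals.items():
        by_stratum[h].append(z)
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError("a stratum has fewer than 2 PSUs")
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return mean, math.sqrt(variance)

# Invented data: 2 strata x 2 PSUs x 2 people; "child" marks the domain.
y = [100, 200, 50, 60, 300, 40, 500, 100]
w = [1.0] * 8
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]
child = [True, True, False, False, True, False, True, True]

mean, se = domain_mean_se(y, w, stratum, psu, child)
print(f"domain approach: mean {mean:.1f}, SE {se:.2f}")  # mean 240.0, SE 37.95

# Subsetting first: stratum 1 keeps only PSU 1, so its variance is undefined.
keep = [i for i, c in enumerate(child) if c]
try:
    domain_mean_se([y[i] for i in keep], [w[i] for i in keep],
                   [stratum[i] for i in keep], [psu[i] for i in keep],
                   [True] * len(keep))
    subset_failed = False
except ValueError as err:
    subset_failed = True
    print("subsetting first:", err)
```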
36. MEPS Analysis Example
• Parental Mental Health and Child Healthcare Utilization
• Objective: to show the association between parental mental health (MH) status and
– pediatric healthcare utilization patterns
– access to care measures
37. Methods
• Data source: MEPS HC, 1996-99
• Inclusion criteria
• 0-18 years old
• ≥ 1 parent in MEPS
39. Methods: Conceptual Model
• Primary independent variable: # of parents with MH dx
• Outcomes: healthcare-utilization variables; access-to-care variables
• Covariates: child’s demographics; child’s chronic illness; year; parent’s education
• MEPS file sources:
– Parent’s Full Year File
– Parent’s Medical Conditions File
– Child’s Full Year File
– Child’s Visit Files
42. Methods
• Other independent variables:
– Child’s age, gender, race/ethnicity, urbanicity, census region, family size, income, insurance, and chronic illness
– Year
– Parent’s education
43. Methods: Analysis
• Bivariate analyses
• Logistic regression: to determine associations between the primary independent variable and
– healthcare-utilization variables
– access-to-care variables
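Survey weights enter the logistic regression as well. As a minimal sketch (not the authors' code), the Python below fits a weighted logistic regression by Newton-Raphson. This gives the pseudo-maximum-likelihood point estimates that survey procedures such as Stata's `svy: logit` or SAS PROC SURVEYLOGISTIC report; their design-based standard errors are not computed here. The toy data are an invented weighted 2×2 table, for which the fitted odds ratio should equal the weighted cross-product ratio (20×90)/(80×10) = 2.25.

```python
import math

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def weighted_logit(X, y, w, iters=25, tol=1e-10):
    """Maximize sum_i w_i * loglik_i by Newton-Raphson (point estimates only)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # weighted score vector
        info = [[0.0] * p for _ in range(p)]  # weighted information matrix
        for xi, yi, wi in zip(X, y, w):
            eta = sum(b * x for b, x in zip(beta, xi))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for r in range(p):
                grad[r] += wi * (yi - mu) * xi[r]
                for c in range(p):
                    info[r][c] += wi * mu * (1.0 - mu) * xi[r] * xi[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Invented weighted 2x2 table: columns are [intercept, exposed]; y = had the outcome.
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [1, 0, 1, 0]
w = [10.0, 90.0, 20.0, 80.0]

beta = weighted_logit(X, y, w)
print(f"odds ratio for exposure: {math.exp(beta[1]):.2f}")  # 2.25
```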
44. Results
• 31,062 children in 1996-99; weighted estimate of 76 million children/year
• 18% (13 million) with ≥ 1 parent with a MH diagnosis
– 89% (12 million) with 1 parent with MH diagnosis
– 11% (1.5 million) with 2 parents with MH diagnosis
45. Results: Bivariate
Significant Association Between Parents’ MH and Both ER Visits and Hospitalizations
% of children with visit, by number of parents with MH diagnosis:
                              ER visit   Hospitalization   WCC visit
0 parents with MH diagnosis     11.7          2.8            40.1
1 parent with MH diagnosis      14.6          3.4            40.2
2 parents with MH diagnosis     15.0          5.0            38.6
46. Results: Bivariate
Association Between Child’s Mean Total Expenditures and Parent’s MH
Mean total child healthcare expenditures, by number of parents with MH diagnosis:
0 parents with MH diagnosis   $744
1 parent with MH diagnosis    $935
2 parents with MH diagnosis   $1,817
47. Regression Results
Increased Acute Care Visits and Expenditures
Odds ratio (95% CI) by # parents with MH diagnosis (referent = 0)
                          1 parent            2 parents
Had WCC visit             1.06 (0.95, 1.19)   0.99 (0.77, 1.27)
Had ER/Hosp visit         1.22 (1.08, 1.36)   1.32 (1.05, 1.67)
Had health expenditures   1.34 (1.17, 1.54)   1.67 (1.13, 2.45)
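One way to read the regression results above: if each interval is a symmetric 95% Wald interval on the log-odds scale (an assumption; the slide does not say how the intervals were computed), the log-scale standard error, a z statistic, and an approximate two-sided p-value can be backed out from the reported numbers. The published values are rounded, so the results are approximations. An interval that excludes 1 corresponds to p < 0.05.

```python
import math

def summarize_or(odds_ratio, lower, upper, z_crit=1.96):
    """Back out log-scale SE, z, and a two-sided p-value from an OR and its
    95% CI, assuming a symmetric Wald interval on the log-odds scale."""
    log_se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / log_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    excludes_one = lower > 1 or upper < 1
    return log_se, z, p_value, excludes_one

# Odds ratios (95% CI) copied from the regression results slide (referent = 0 parents).
reported = {
    ("Had WCC visit", "1 parent"): (1.06, 0.95, 1.19),
    ("Had WCC visit", "2 parents"): (0.99, 0.77, 1.27),
    ("Had ER/Hosp visit", "1 parent"): (1.22, 1.08, 1.36),
    ("Had ER/Hosp visit", "2 parents"): (1.32, 1.05, 1.67),
    ("Had health expenditures", "1 parent"): (1.34, 1.17, 1.54),
    ("Had health expenditures", "2 parents"): (1.67, 1.13, 2.45),
}

results = {key: summarize_or(*vals) for key, vals in reported.items()}
for (outcome, group), (log_se, z, p, sig) in results.items():
    print(f"{outcome:25s} {group:9s} z={z:5.2f} p~{p:.3f} CI excludes 1: {sig}")
```

Under this assumption, the WCC intervals include 1, while both ER/hospitalization and both expenditure estimates fall below p = 0.05, matching the speaker notes.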
48. Conclusions
• Parent’s MH diagnoses associated with
child’s
• costlier patterns of health care utilization
• higher overall healthcare costs
Editor's Notes
The title of this talk is Data Analysis in MEPS.
The Agency for Healthcare Research and Quality (AHRQ), formerly called the Agency for Health Care Policy and Research, began fielding MEPS in March 1996. AHRQ conducts MEPS in conjunction with the National Center for Health Statistics (NCHS).
Helpful overview screen
The Medical Expenditure Panel Survey (MEPS) is a vital resource designed to continually provide policymakers, health care administrators, businesses, and others with timely, comprehensive information about health care use and costs in the United States, and to improve the accuracy of their economic projections.
MEPS is designed to help understand how the dramatic growth of managed care, changes in private health insurance, and other dynamics of today’s market-driven health care delivery system have affected, and are likely to affect, the kinds, amounts, and costs of health care that Americans use. MEPS also is necessary for projecting who benefits from, and who bears the cost of, changes to existing health policy and the creation of new policies.
MEPS collects data on the specific health services that Americans use, how frequently they use them, the cost of these services, and how they are paid for, as well as data on the cost, scope, and breadth of private health insurance held by and available to the U.S. population.
MEPS is unparalleled for the degree of detail in its data, as well as its ability to link data on health services spending and health insurance to the demographic, employment, economic, health status, and other characteristics of survey respondents. Moreover, MEPS is the only national survey that provides a foundation for estimating the impact of changes in sources of payment and insurance coverage on different economic groups or special populations of interest, such as the poor, elderly, families, veterans, the uninsured, and racial and ethnic minorities.
MEPS has 2 major components; only the data files for the HC are available on-line. The rest of this talk will focus on the HC.
The HC collects data on a sample of families and individuals across the Nation, drawn from a nationally representative subsample of households that participated in the prior year’s NCHS National Health Interview Survey. The objective is to produce annual estimates for a variety of measures of health status, health insurance coverage, health care use and expenditures, and sources of payment for health services. These data are particularly important because statisticians and researchers use them to generalize to people in the civilian noninstitutionalized population of the United States, as well as to conduct research in which the family is the unit of analysis.
The MEPS-HC sampling reflects an oversampling of blacks and Hispanics. In certain years MEPS oversamples additional policy-relevant subgroups.
This design allows linkage back to the previous year’s NHIS for analytic purposes.
At each interview the MEPS-HC collects detailed data on
Demographic Characteristics--including age, race/ethnicity, sex, marital status, and family relationships.
Charges and Payments--by payer source.
Health Status--including overall physical and mental health status, activity and functional limitations.
Utilization--MEPS collects data about all hospital (emergency room, inpatient and outpatient events), physician services, home health care, and prescribed medicines.
Employment--for all persons 16+ for each job (including retirement jobs): employment status, roster of all jobs, hours worked, job tenure, wages, types of business, whether health insurance was offered.
Health Insurance--both private and public health insurance status throughout the reference period and for each month, who the policy holder is, the source of coverage (employer-sponsored or privately purchased), who is covered, whether or not it is an HMO, and the type of plan (self or family coverage). Availability of coverage from the employer is ascertained, and if health insurance was available from the employer, whether or not the person elected coverage.
The HC uses an overlapping panel design, where each year, a new panel starts.
Data are collected at person and household levels in a series of five in-person interviews over a two-and-a-half-year period, using computer-assisted personal interviewing (CAPI), to collect two full years of data.
All data for a household is reported by a single household respondent. At each interview the questionnaire collects information about each household member, and the survey builds on this information from interview to interview.
Each of the five household interviews takes on average 90 minutes to conduct.
Why is this important? Researchers may need to consider pooling years of data to obtain adequate sample sizes for certain types of subpopulation analysis. As a rule of thumb, 100 unweighted cases are needed to support national estimates.
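The pooling advice can be sketched in Python. The record layout below is hypothetical (plain dictionaries, not MEPS file variables) and the counts are invented. The sketch checks the 100-case rule of thumb for each year and for the pooled file, and divides each weight by the number of years pooled so that the weighted total reads as an average annual figure. That division is the usual convention for pooled MEPS years, but it should be checked against AHRQ's pooling documentation; variance estimation across pooled years is not shown.

```python
from collections import Counter

MIN_UNWEIGHTED = 100  # rule-of-thumb minimum for national estimates (from the note above)

def pooled_subgroup_summary(records):
    """records: dicts with hypothetical keys 'year', 'weight', 'in_subgroup'."""
    years = {r["year"] for r in records}
    per_year = Counter(r["year"] for r in records if r["in_subgroup"])
    pooled_n = sum(per_year.values())
    # Dividing each weight by the number of pooled years turns the pooled
    # weighted total into an average annual estimate.
    annual_total = sum(r["weight"] / len(years) for r in records if r["in_subgroup"])
    return per_year, pooled_n, annual_total

# Invented example: subgroup counts per year, every subgroup member weighted 1,000.
counts = {1996: 40, 1997: 35, 1998: 30, 1999: 45}
records = [
    {"year": yr, "weight": 1000.0, "in_subgroup": True}
    for yr, n in counts.items()
    for _ in range(n)
]

per_year, pooled_n, annual_total = pooled_subgroup_summary(records)
for yr in sorted(per_year):
    status = "ok" if per_year[yr] >= MIN_UNWEIGHTED else "too few"
    print(yr, per_year[yr], status)
print("pooled:", pooled_n, "ok" if pooled_n >= MIN_UNWEIGHTED else "too few")
print(f"average annual weighted total: {annual_total:,.0f}")
```

No single year reaches 100 subgroup cases here, but the four pooled years together do (150).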
To access data.
At this point, you can explore data in two ways. One is via the public use files, or PUFs, which you can download and analyze offline. The other is through MEPSnet, which allows you to analyze certain variables online.
Here is an example of a simple query in MEPSnet, for children under age 6. You can see that the standard error is large, especially for the “low income” group, which only has 277 unweighted subjects.
Querying the total population, with its much larger unweighted sample, shrinks the standard error.
Getting back to our data selection slide, let’s now look at the types of data files available.
Our data source was the MEPS HC.
We included all children in the 1996-1999 MEPS surveys who were 0-18 years old and who had at least one parent with data in MEPS. We explored looking at the subset of children with asthma, but did not have sufficient sample size for subgroup analyses.
Here’s the conceptual model of our logistic regression, used to determine associations between our primary independent variable and both healthcare-utilization and access-to-care variables.
Here are the three different MEPS file types we used: full year files, medical conditions files, and visit files.
Our outcome measures included utilization variables and access-to-care variables. We combined ED and hospital visits because of insufficient numbers to analyze these outcomes separately—if a child had visited the ED and/or had been discharged from the hospital during the year, they were considered to have had at least one ED/IP visit. Well-child-care visits were determined based on the visit category and ICD-9 codes for WCC. Each of these variables—ED/IP and WCC—is dichotomized to “at least one visit” and “no visits”.
Total expenditures were evaluated dichotomously, based on whether or not the child had any health expenditures for the year.
Access-to-care variables were dichotomous, and included questions regarding whether the child had a usual source of care (USC), whether the child had changed providers during the year, whether the child seeks their USC for a new health problem, whether the child seeks their USC for preventive health care, and whether the child had any difficulty obtaining usual care.
Our primary independent variable was the number of parents at home with a MH diagnosis, defined by the ICD-9 codes shown. In MEPS, the diagnosis is based on report by the household member responding to the survey.
For our second set of analyses, based on the subgroup of children who had at least one parent with a MH diagnosis, our primary independent variable indicated whether at least one mentally ill parent had not seen a healthcare provider for their MH diagnosis. We defined an “untreated” parent as one with a MH diagnosis who answered “no” to the question “have you ever seen a doctor or other medical person about this condition?” in all five rounds of their MEPS involvement. If, for example, a child had two mentally-ill parents and only one had seen a provider for treatment, the primary independent variable would indicate an untreated parent.
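The two exposure definitions above can be sketched in Python. Everything here is hypothetical: the input layout, the IDs, and the placeholder diagnosis codes (the study's actual ICD-9 codes are not listed in these notes). The sketch also assumes that a parent with more than one MH condition counts as untreated only if the answer was "no" for every MH condition in every round.

```python
# Hypothetical placeholders: the study's actual ICD-9 codes are not reproduced here.
MH_CODES = {"mh_code_1", "mh_code_2"}

def parent_status(conditions):
    """conditions: parent_id -> list of (code, saw_provider_by_round), where
    saw_provider_by_round holds one bool per interview round (True = "yes")."""
    status = {}
    for pid, conds in conditions.items():
        mh_rounds = [rounds for code, rounds in conds if code in MH_CODES]
        has_mh = bool(mh_rounds)
        # Assumption: "untreated" = has an MH diagnosis and answered "no"
        # in every round for every MH condition.
        untreated = has_mh and not any(any(rounds) for rounds in mh_rounds)
        status[pid] = (has_mh, untreated)
    return status

def child_exposure(child_parents, status):
    """child_id -> (number of parents with MH dx, any untreated MH parent).
    The untreated flag is None for children with no MH-diagnosed parent."""
    exposure = {}
    for cid, parent_ids in child_parents.items():
        mh_parents = [p for p in parent_ids if status.get(p, (False, False))[0]]
        any_untreated = any(status[p][1] for p in mh_parents) if mh_parents else None
        exposure[cid] = (len(mh_parents), any_untreated)
    return exposure

conditions = {
    "P1": [("mh_code_1", [False] * 5)],                         # MH dx, never saw a provider
    "P2": [("mh_code_2", [False, True, False, False, False])],  # MH dx, saw a provider in round 2
    "P3": [("other_code", [True] * 5)],                         # no MH dx
}
child_parents = {"C1": ["P1", "P2"], "C2": ["P2", "P3"], "C3": ["P3"]}

exposure = child_exposure(child_parents, parent_status(conditions))
print(exposure)  # {'C1': (2, True), 'C2': (1, False), 'C3': (0, None)}
```

Child C1 mirrors the example in the note: two mentally ill parents, only one of whom saw a provider, so the child is flagged as having an untreated parent.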
Other independent variables included the child’s demographics and the presence of a chronic illness in the child, as well as the study year. We also included parental education, using mom’s education level if the child had 2 parents in MEPS.
We used bivariate analyses and logistic regression to determine associations between our primary independent variable and both healthcare-utilization and access-to-care variables.
MEPS contains a total of 31,052 children from years 1996-99 who met our inclusion criteria. This yields a weighted sample size of 76 million children a year. Of these, 18% had at least one parent with a MH diagnosis. Of those 18%, 89% had 1 parent and 11% had 2 parents with a MH diagnosis.
The bivariate analysis showed that the presence of at least one parent with a MH diagnosis was associated with the child having had an ER visit or a hospitalization. We found no significant association between the parent’s MH and the child’s WCC visits.
The number of parents with MH diagnoses was associated with the child’s total healthcare expenditures, with all 3 pairwise t-test comparisons significant between groups.
ED/inpatient utilization was higher among children with one parent with a MH diagnosis, as was the probability that the child had any healthcare expenditures. For both, the point estimate was higher if both parents had MH diagnoses. We found no association between parental MH diagnoses and WCC visits. The finding of increased acute care utilization and no change in WCC visits is consistent with the findings of our previously mentioned literature review.
In this nationally-representative sample, we found a high prevalence of children with at least one parent with a MH diagnosis.
Children with mentally ill parents had higher utilization of costlier forms of healthcare (more ED and inpatient services) and incurred higher health care expenditures than did children of parents without mental illness. Parents with MH diagnoses reported greater barriers to accessing primary care for their child, specifically, difficulty in finding and staying with a USC.