The exchangeability of self-reports and administrative health care resource use measurements: assessment of the methodological reporting quality
Cindy Yvonne Noben a,*, Angelique de Rijk b, Frans Nijhuis c, Jan Kottner d, Silvia Evers a
a Department of Health Services Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, PO Box 616, 6200 MD Maastricht, The Netherlands
b Department of Social Medicine, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
c Department of Work and Organizational Psychology, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands
d Department of Dermatology and Allergy, Charité-Universitätsmedizin Berlin, Berlin, Germany
Accepted 28 September 2015; Published online 2 February 2016
Abstract
Objective: To assess the exchangeability of self-reported and administrative health care resource use measurements for cost estimation.
Study Design and Setting: In a systematic review (NHS EED and MEDLINE), reviewers evaluated, in duplicate, the methodological reporting quality of studies comparing the validation evidence of instruments measuring health care resource use. The appraisal tool, Methodological Reporting Quality (MeRQ), was developed by merging aspects from the Guidelines for Reporting Reliability and Agreement Studies and the Standards for Reporting Diagnostic Accuracy.
Results: Of 173 studies, 35 full-text articles were assessed for eligibility, and 16 articles were included in this study. In seven articles, more than 75% of the reporting criteria assessed by MeRQ were rated "good." Most studies scored at least "fair" on most of the reporting quality criteria. In the end, six studies scored "good" on the minimal criteria for reporting. Varying levels of agreement among the different data sources were found, with correlations ranging from 0.14 up to 0.93 and with occurrences of both random and systematic errors.
Conclusion: The validation evidence of the small number of studies with adequate MeRQ cautiously supports the exchangeability of the self-reported and administrative resource use measurement methods. © 2016 Elsevier Inc. All rights reserved.
Keywords: Health care; Utilization; Self-report; Data collection; Validation studies; Systematic review
Conflict of interest: None.
Funding: None.
* Corresponding author. Tel.: +31 43 38 81 834; fax: +31 43 38 84162.
E-mail address: c.noben@maastrichtuniversity.nl (C.Y. Noben).
http://dx.doi.org/10.1016/j.jclinepi.2015.09.019
0895-4356/© 2016 Elsevier Inc. All rights reserved.
Journal of Clinical Epidemiology 74 (2016) 93–106

What is new?

Key findings
• Results of good methodological studies assessed by MeRQ cautiously indicate the exchangeability of self-reported and administrative resource use measurement methods.

What this adds to what was known?
• Only a few studies were conducted using rigorous methods.

What is the implication and what should change now?
• The methodological reporting quality of health care resource use measurements should be assessed before using the data for cost estimations.
1. Introduction
Essential to the economic evaluation of health care treat-
ments is a robust method for collecting data on health care
resource use, which can be used to calculate the related
costs. Health care resource quantities can be defined in
different ways. The cost types for potential inclusion in
an economic evaluation are commonly classified into health
care costs, patient and family costs, and costs in other sec-
tors (absenteeism and presenteeism) [1,2]. Universally
recognized methods of collecting resource use data are
lacking, and a wide variety of methods can be used.
Resource use data are most often measured through patient
self-reports (e.g., cost questionnaires, diaries, interviews) or
through administrative records (e.g., administrative systems
of care providers or insurance companies).
Both methods have advantages and disadvantages.
Cautiousness is warranted when calculating costs from pa-
tient self-reports because patients may not know or be able
to recall precisely their length of stay and diagnoses, two of
the primary cost drivers [3]. Furthermore, self-reports are
more burdensome for patients, and the different timing of
data collection might influence accuracy. Administrative
data collection methods are assumed to be the most objec-
tive and accurate source of information, although there is no
common reference standard for measuring resource use.
However, as health care provision has become more inte-
grated, economic evaluations increasingly need to include
information from many different care providers. It is
becoming more difficult and expensive to review the re-
cords from the many providers involved in administering
treatments to patients. Furthermore, given prolonged sur-
vival rates and the aging population, it is expensive to
obtain longitudinal treatment information by accessing
medical records at multiple points in time. Even when
administrative data are available and accurate, limits on
benefit coverage might require the collection of health care
utilization data from other sources (i.e., alternative health
care and out-of-pocket payments) to calculate the economic
costs. The major advantages and disadvantages of patient
self-reported vs. administrative measurements are listed in
Table 1.
Considerable research effort has been concentrated on the appropriate measurement of outcomes and the policy implications of economic evaluation, whereas the methods for collecting resource use data have been relatively neglected [4]. No standard methodology for estimating health care resource use has been developed yet. Consequently, the choice of a collection method for resource use data is often a matter of practicality rather than a decision based on methodological considerations.
Previous studies have primarily focused on the accuracy of patient recall and the impact of patient self-reported methods on the accuracy of estimates [3,5,6], and not on the methodological reporting quality (MeRQ) of these studies.
Because of different definitions, the lack of standard statis-
tics regarding the validation evidence (e.g., agreement, reli-
ability, validity, and accuracy), and inconsistent MeRQ,
caution is warranted when interpreting the study findings.
Similar to all other areas of empirical research, high
MeRQ is a prerequisite for interpreting the validation evi-
dence provided. The aim of this systematic review is therefore to assess, based on the MeRQ, whether the validation evidence of studies on health care resource use measurements intended for cost estimation supports the exchangeability of self-reported and administrative methods of data collection. The research questions are as follows:
1. What are the characteristics of studies that compare self-reported with administrative methods of measuring health care resource use?
2. What is the MeRQ of the studies from which this ev-
idence is derived?
3. What validation evidence is reported when comparing
self-reported with administrative methods of collect-
ing health care resource use data that is intended for
cost estimation?
4. Given adequate MeRQ, does the validation evidence
support the exchangeability of both self-reported and
administrative measurement methods for resource use?
2. Methods
This systematic review was planned, conducted, and re-
ported in adherence to the PRISMA reporting guidelines [7].
2.1. Study eligibility
Original research in any language that had as a stated purpose the comparison of resource use data gathered using patient self-reports with data gathered using administrative methods, for application in costing studies, was included in this review. The identified health care resource use had to serve as an item for costing and for application in trial-based economic evaluations. Therefore, studies that solely modeled or described patterns of health care use, or studies that examined the impact of various patient characteristics on congruency, were excluded from the analysis. Only
studies that contained validation arguments in the interpre-
tation of each study were eligible. Minimal requirements
for validation arguments included identification of health
care resource use as an item for costing; identification of
collection methods for resource use data; and identification
of validation evidence (i.e., agreement, reliability, and val-
idity). Because of the lack of consistent terminology, some
uncertainty existed as to the appropriate application of
these specific terms. In cases of conflict, consensus was
reached through discussion among the reviewers.
Patient self-reports were considered if they were made
solely by the patient; articles using proxies for self-
reports were therefore excluded. Any type of direct health
care resource use (e.g., a visit to a clinical provider, medi-
cation usage) and any type of data collection instrument or
method (e.g., diary, interview, questionnaire, GP record, in-
surance database) were considered.
2.2. Study identification
A search in NHS EED, by means of the Cochrane Li-
brary or the Center for Reviews and Dissemination, along
with a supplementary search in MEDLINE (PubMed) was
conducted; this is generally an appropriate strategy for identifying all relevant published economic evaluation studies related to health care [8,9]. Details of the search strategy are provided in Appendix A at www.jclinepi.com.
Both text words and MeSH terms were combined with AND and OR for "health care," "utilization," "patient self-report," "administrative data," "cost analysis," "validity," "accuracy," "reliability," and "agreement."
citation lists of included studies were hand searched to iden-
tify additional studies. International experts were consulted
via the ‘‘International Health Economics Discussion List’’
to identify a wider range of relevant studies [10]. An archive
of collection methods for health care utilization data called
Database of Instruments for Resource Use Measurement [4]
was used to test the sensitivity of the search. Titles and ab-
stracts were screened for inclusion, and full texts were
screened once the articles were judged eligible for inclusion
or uncertain. In cases of conflict, consensus was reached
through discussion among the reviewers.
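Purely as a hypothetical illustration of how such terms might be combined (the exact search strings used in this review are provided in Appendix A), a MEDLINE query could take the form: ("health care"[MeSH Terms] OR utilization) AND ("self report" OR "administrative data") AND ("cost analysis" OR validity OR accuracy OR reliability OR agreement).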
2.3. Instrument for methodological reporting quality
A new tool for appraising the MeRQ for health care
resource use measurement was developed on the basis of
two guidelines for reporting: Guidelines for Reporting Reli-
ability and Agreement Studies (GRRAS) [11] and the Stan-
dards for Reporting Diagnostic Accuracy (STARD) [12].
The MeRQ tool aimed to assess the methodological quality
of studies reporting on the validation evidence of administra-
tive data compared to self-reported data for cost estimation
in economic evaluation. STARD is a guideline for testing
diagnostic accuracy and thus for research including a refer-
ence test. Although this is less relevant for research on
health care resource use because reference tests regarding
data collection methods for health care resource use are
not available, overlap was found in some reporting domains.
As mentioned in Section 1, administrative data collection methods are sometimes assumed to be the gold standard. Therefore, information on this type of accuracy was included in the review as a subconcept of validity. GRRAS complemented STARD in the MeRQ tool by emphasizing items specific to reliability and agreement. The adaptation of both guidelines into the appraisal tool for evaluating MeRQ can be found in Appendix B at www.jclinepi.com. Adaptations were made by focusing on the methodological content and by deleting the directions and layout positions. From both instruments, all aspects were selected that represented (1) the reporting quality of studies comparing measurements (part 1 of the instrument) and (2) the MeRQ of the validation evidence presented in those studies comparing measurements (part 2 of the instrument).
Finally, these were rephrased into questions to score the
items in terms of level of quality. Table 2 provides an over-
view of the main criteria for the evaluation of the validity,
reliability, or agreement of self-reported health care resource
use measurement instruments compared with health care
resource use measured using administrative data.
Table 1. Comparing the advantages and disadvantages of patient self-reported vs. administrative measurements of resource use

Self-reports
Advantages: low costs; possibility to include all societal costs (formal and informal, reimbursed and nonreimbursed); higher visibility of additional nonspecific costs (not directly related to the disease or intervention).
Disadvantages: selective nonresponse; lower compliance in completing over a longer period; burdensome for patients; tendency to record more socially acceptable answers (respondent bias, recall bias).

Administrative data
Advantages: no burden for patients; data already registered; accurate and detailed information (on type and volume of care delivered).
Disadvantages: limited feasibility (privacy, time lag); up (wrong) coding; cooperation of many different databases necessary; no visibility of additional resource use not directly related to the health care institution and/or insurance company (no routine records for patient and family resource use, additional expenditures, input of informal caregivers, presenteeism, etc.); impossible to include unrelated costs, informal costs, and out-of-pocket expenses.
Table 2. Criteria for the evaluation of the validity, reliability, or agreement of a self-reported health care resource use measurement instrument compared with health care resource use measured using administrative data

MeRQ (Methodological Reporting Quality)
Part 1: The reporting quality of studies comparing measurements. Each study...
... explicitly identifies the study aim
... proposes validation arguments and strategies for interpreting validation evidence
... clearly reports the measurement methods, including relevant evidence
... provides a rationale for the relationship between the measurement methods
... elaborates on the measurement procedures of both the self-reports and the administrative data
... describes the time interval between the data collections
... reports on the population characteristics
Part 2: The methodological reporting quality of the validation evidence presented in the studies comparing measurements. Assessing the validity, reliability, or agreement, the studies defined...
... the statistical methods for validation estimates
... the retrieved validation estimates (statistics)
... the methods for statistical uncertainty
... the interpretation of the validation evidence
Part 3: Additional information
The development of MeRQ followed a systematic process to appraise the studies included in this review. The MeRQ tool provides an operational definition of the properties considered most important for this research, as it had to combine the accuracy and the agreement perspectives, and no such tool was available. Consequently, the tool was built on existing evidence and available consensus (i.e., the GRRAS and STARD instruments). Next, the items in the MeRQ tool were systematically selected and formulated. Both GRRAS and STARD were used to select those elements regarding MeRQ that apply to agreement studies. MeRQ thus builds on the work of experts with substantial contributions and publications in the field of agreement and reliability research (e.g., the GRRAS group); additionally, one of the coauthors of this review is a member of the GRRAS group. Those elements from GRRAS and STARD that apply to reporting on the exchangeability of self-reports and administrative data, and thus to agreement studies, were then systematically selected. During several collective meetings, the authors extensively discussed the selection and formulation of these elements until consensus was reached. Because two authors have substantial expertise in psychometrics, we could ensure that the selection of elements was critically assessed and justified from a psychometric point of view.
MeRQ was rated using the following scoring criteria: good (explicit reporting in the study); fair (reporting makes it possible to deduce what is expected); poor (not explicitly reported and difficult to deduce what is expected); not reported (no information provided or to be deduced, also unregistered or unreported [e.g., no information on the measurement procedure]); not addressed (not designating the measurement, also undirected [e.g., information on the measurement procedures, although not clear whether they relate to the self-reports or the administrative data]); or not applicable. For each study, the quality of reporting on the following criteria was assessed: (1) explicit identification of the study as an evaluation of the reliability, agreement, or validity (accuracy) of a self-reported health care resource use measurement instrument compared with health care resource use measured using administrative data; (2) validation arguments and strategies for interpreting validation evidence; (3) measurement methods (eligibility criteria) with a critical review of evidence relevant for the study; (4) rationale for the relationship between the self-reports and the administrative data; (5) methods or measurement procedures for the self-reports; (6) methods or measurement procedures for the administrative data; (7) time interval between self-report and administrative data collection; (8) population characteristics; (9) statistical methods for validation estimates; (10) validation estimates; (11) methods for statistical uncertainty; and (12) concluding validation argument and/or interpretation of validation evidence.
All authors (C.Y.N., A.d.R., S.E., F.N., and J.K.) contributed to the rating of the MeRQ of the studies. The included articles were divided among paired groups of reviewers. First, the scoring was conducted individually, and then the results were compared pairwise. The pairs were C.Y.N. and S.E. (pair 1), S.E. and J.K. (pair 2), J.K. and A.d.R. (pair 3), A.d.R. and C.Y.N. (pair 4), and C.Y.N. and F.N. (pair 5). In cases of conflicting scores, which occurred in five studies [13-17], a third reviewer was added to the pair and consensus was reached through discussion.
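Purely as an illustration of the threshold later applied in the Results (at least 9 of the 12 criteria rated "good"), a study's ratings could be tabulated as in the following sketch; the per-criterion ratings shown here are hypothetical (see Table 4 for the actual scores).

```python
from collections import Counter

# Rating levels used by the MeRQ tool, as defined above.
LEVELS = ("good", "fair", "poor", "not reported", "not addressed", "not applicable")

# Hypothetical ratings for one study across the 12 reporting criteria.
ratings = ["good", "good", "fair", "good", "good", "good",
           "not reported", "good", "good", "good", "good", "good"]

counts = Counter(ratings)
share_good = counts["good"] / len(ratings)
meets_threshold = counts["good"] >= 9   # threshold used in Section 3.3
print(counts, round(share_good, 2), meets_threshold)
```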
2.4. Validation evidence
There are different concepts in the literature regarding
the assessment of measurement validity (i.e., reliability,
agreement, validity, and accuracy) [18]. In this systematic
review, validation evidence is considered an umbrella term
whereby the terms reliability, agreement, and validity (and
accuracy as a subconcept of validity) are used to provide in-
formation about the quality of the measurements. Reli-
ability is defined here as the ratio of variability between
the subjects to the total variability of all measurements in
the sample. Reliability therefore reflects the ability of a
measurement to differentiate between subjects or objects
(i.e., discriminative ability). Statistical approaches include
intraclass correlation and kappa calculations. Agreement
is defined as the degree to which scores are identical and
is often expressed in percentages (i.e., proportions of agree-
ment, ranges, limits of agreement). Diagnostic accuracy refers to the degree of agreement between the information from the test being evaluated (index test) and a reference standard, and can be expressed in many ways (e.g., sensitivity, specificity, likelihood ratios, odds ratios, and the area under a receiver operating characteristic curve). For costing measures, the concept of diagnostic accuracy cannot be used, as there is no consensus on what the reference standard should be. However, as mentioned in Section 1, administrative data collection methods are often assumed to be the gold standard, although this assumption is not supported by scientific data. If information on this type of accuracy was available, it was included in the review as a subconcept of validity. Complete and accurate reporting makes it possible to detect the potential for bias in the study (internal validity) and to evaluate the generalizability and applicability of the results (external validity).
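To make the distinction between these concepts concrete, the following sketch (not part of the original review; the paired counts and variable names are hypothetical) computes a simple two-method intraclass correlation, a proportion of agreement with Cohen's kappa, and sensitivity/specificity under the assumption, questioned above, that the administrative data act as the reference standard.

```python
import numpy as np

# Hypothetical paired GP-visit counts for 8 patients measured by both
# methods (illustrative only; not data from any included study).
self_report = np.array([2, 0, 5, 1, 3, 0, 4, 2], dtype=float)
admin_data = np.array([2, 1, 4, 1, 3, 0, 6, 2], dtype=float)

# Reliability: one-way ICC, i.e., between-subject variance relative to
# the total variance of all measurements (k = 2 methods per subject).
scores = np.stack([self_report, admin_data], axis=1)   # subjects x methods
subject_means = scores.mean(axis=1)
ms_between = 2 * np.var(subject_means, ddof=1)          # between-subject MS
ms_within = np.sum((scores - subject_means[:, None]) ** 2) / len(scores)
icc = (ms_between - ms_within) / (ms_between + ms_within)

# Agreement: proportion of identical scores, plus Cohen's kappa on the
# dichotomized question "any use reported at all?".
prop_agreement = np.mean(self_report == admin_data)
x, y = self_report > 0, admin_data > 0
po = np.mean(x == y)                                    # observed agreement
pe = x.mean() * y.mean() + (1 - x.mean()) * (1 - y.mean())  # chance level
kappa = (po - pe) / (1 - pe)

# Accuracy (meaningful only if administrative data are treated as the
# reference standard, an assumption the review calls into question).
tp, fn = np.sum(x & y), np.sum(~x & y)
tn, fp = np.sum(~x & ~y), np.sum(x & ~y)
sensitivity, specificity = tp / (tp + fn), tn / (tn + fp)

print(icc, prop_agreement, kappa, sensitivity, specificity)
```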
2.5. Best evidence
Studies with ‘‘good’’ reporting on four ‘‘minimal criteria
for reporting’’ of MeRQ were compared in detail. These
minimal criteria include as follows: (1) clear description
of the measurement methods (eligibility criteria) with a crit-
ical review of evidence relevant for the study; (2) rationale
for the relationship between the self-report and the adminis-
trative data; (3) methods or measurement procedures for
self-reports; and (4) methods or measurement procedures
for the administrative data. A ‘‘good reported’’ score on
these criteria is essential to meaningfully interpret the re-
maining criteria as presented in the MeRQ (i.e., explicit
identification of the study as an evaluation of the validity,
reliability, agreement, or accuracy of a self-reported health
96 C.Y. Noben et al. / Journal of Clinical Epidemiology 74 (2016) 93e106
care resource use measurement instrument compared with
administrative data; proposed validation arguments and stra-
tegies for interpreting validation evidence; time interval be-
tween self-reports and reference standard data collection
mentioned; statistical methods for validation estimates
defined; validation estimates reported; methods for statisti-
cal uncertainty defined; concluding validation argument
and/or interpretation of validation evidence presented).
The ‘‘minimal reporting criteria’’ can therefore be seen
as prerequisites and are based on the collective opinion of
the authors because the rating of the aforementioned broad-
er criteria would not have an interpretable meaning if the
four minimal reporting criteria were not at least rated
‘‘good reported.’’ In essence, the research team concluded
that the broader criteria are conditional on ‘‘the four mini-
mal criteria of reporting.’’
2.6. Data extraction
Data were extracted independently. Basic information
was extracted on the year of publication and country of
origin, study design (e.g., sample of enrollees, case-
control study, randomized controlled trial), description of
the population (e.g., workers, patients/disease group) and
the sample size, the type of health care resource use
measured (e.g., medication, GP visits, hospital admissions),
the method of self-report (e.g., questionnaire, diary), the
type of administrative measure (e.g., hospital record, insur-
ance record), and the data collection period. See Table 3.
The MeRQ score is reported per study by presenting the
score on each of the reporting criteria (Table 4). The
scoring was abstracted independently and in duplicate for
all studies and all criteria, and conflicts were resolved by
consensus. Validation evidence, as presented in Table 5,
included the main statistical analysis, the validation esti-
mates, and the statistical uncertainty. In the end, studies that
scored ‘‘good’’ on four ‘‘minimal criteria for reporting’’ of
MeRQ were compared in detail.
3. Results
3.1. Study selection
Our search in MEDLINE (PubMed) resulted in 69 hits,
and 88 hits were found in the NHS Economic Evaluation
Database, which is included in The Cochrane Library. Af-
ter consulting experts, 11 articles were added. Citation
tracking and hand searching resulted in nine additional hits.
Four duplicates were removed, and the 173 remaining re-
cords were screened by title and abstract. At this stage,
138 records were excluded. The main reasons for exclusion
were that the studies did not aim to measure health care
resource use for the purpose of costing analysis, did not
provide information on the estimation of health care costs,
and did not address the economic significance or the
impact on the costs of care but solely described patterns
of health care use or provided information on how popula-
tion characteristics influenced the agreement or disagree-
ment between both measurement methods. Thirty-five full-text articles were assessed for eligibility, of which another 19 were excluded. The most common reasons for exclusion were that the studies reported proxy-recorded resource use; the main aim was not the comparison of health care resource use measured via self-reports with use measured via administrative methods; the studies did not measure health care resource use as items for costing or did not address the economic impact on costs of care; the studies were case studies or reviews; or the resource use was solely focused on the usage of one specific type of medication. In the end, 16 articles were included in this comprehensive literature review [13-17,19-29]. A flow chart describes the process of inclusion (see Fig. 1).

Fig. 1. Study flow diagram (PRISMA, preferred reporting items for systematic review and meta-analysis). Records identified through database searching: PubMed (n = 69), NHS EED (n = 88); additional records identified through other sources (n = 20). After removal of duplicates, 173 records were screened, of which 138 were excluded based on title and abstract. Of the 35 full-text articles assessed for eligibility, 19 were excluded (no identification of items for costing, n = 9; proxy-reported use, n = 2; review or case study, n = 3; focus on one medication type, n = 3; no comparison between self-reported and administrative methods, n = 2), leaving 16 included studies.
3.2. Characteristics of the studies
Table 3 presents an overview of the characteristics of the
selected studies (N = 16).
Most studies examined at least one of the following types of resource use: specialist care [13,14,19-25,27-29], primary care [13,16,21,25,26], medication use [17,20,26,27,29], or hospitalization [13-15,21,23,24,28,29]. Most studies collected information on multiple types of resource use [13,14,20,21,23-28]. Eleven studies were conducted in Europe [13,14,16,17,19,21,22,24,25,28,29], two in Canada [20,23], one in the USA [15], one in Australia [27], and one in New Zealand [26]. The population characteristics varied from patients with mental health complaints [13,24,25,29] to musculoskeletal disorders [19,22,26], chronic disorders [14,15,20,23], primary care attenders [16], acute ward patients [28], and cancer patients [17,21,27]. Most studies were conducted as randomized controlled trials [14,19,21,22,25-29], of which two were conducted in a multicenter setting [13,17]. In four studies, the study design was based on a sample of enrollees from another study [24], program [15], or health care service [16,20]. One study used a population-based cohort [23].
In eight studies, health care resource use was measured
with a questionnaire [16,17,20,22,25,26,28,29], in three
studies with an interview [13,23,24], in another three
studies with a diary [17,19,27], in two studies with a tele-
phone survey [15,21], and one with a booklet [14]. The
studies that measured health care resource use based on
administrative records used data from medical records
[13], including care provider registrations [14,20,24], re-
cord forms [16,17,21,22,25,26], and provider reports
[28]. Six studies used health insurance data
[15,19,20,23,27,29].
In most studies, the data collection was retrospective [13,15,16,21-26,28,29], with the largest recall period being 12 months [13,22,23]. In three studies that used diaries for self-reports, the data were collected prospectively [14,19,27]. One study collected the data on a daily basis [20]. Another study collected the data both prospectively (via a diary) and retrospectively (via questionnaires) [17].
Table 3. Characteristics of studies comparing measurement methods for health care resource use

Byford (2007) [13]. Country: England and Scotland. Design: multicenter RCT. Population: adults with recurrent episodes of deliberate self-harm (N = 272). Resource use: inpatient days; outpatient, accident, and ER attendance; contacts with GPs, practice nurses, community psychologists, psychiatric nurses, and occupational therapists. Self-report: interviews (Client Service Receipt Inventory). Administrative record: medical records. Data collection: retrospective; baseline, 6, and 12 mo after trial entry.

Goossens (2000) [19]. Country: The Netherlands. Design: RCT. Population: patients with chronic low back pain (N = 40). Resource use: physiotherapy contacts and specialist contacts. Self-report: cost diary. Administrative record: health insurance company data. Data collection: prospective; each diary covered a period of up to 4 weeks.

Guerriere (2006) [20]. Country: Canada. Design: sample of enrollees from a cystic fibrosis clinic. Population: patients with cystic fibrosis (N = 110). Resource use: home-based health care professional appointments, health care professional appointments outside home, laboratory and diagnostic tests, prescription medications. Self-report: questionnaire (The Ambulatory and Home Care Record). Administrative record: Ontario Health Insurance Plan database, Ontario Home Care Administrative System database, hospital pharmacy database. Data collection: daily completion over a 4-week data collection period.

Hoogendoorn (2009) [14]. Country: The Netherlands. Design: RCT. Population: patients with chronic obstructive pulmonary disease (N = 175). Resource use: daycare treatment, hospital admissions, hospital days, medical specialist visits, physiotherapist visits, respiratory nurse visits, dietician visits. Self-report: cost booklets. Administrative record: care provider registrations. Data collection: prospective; each booklet covered a period of 4 weeks, collected every 2 mo.

Janson (2005) [21]. Country: Sweden. Design: RCT. Population: patients with potentially curable colon cancer (N = 40). Resource use: outpatient consultations, doctor consultations, GP and district nurse contacts, surgical ward readmissions. Self-report: telephone surveys. Administrative record: clinical case record forms. Data collection: retrospective; every 25 weeks with a 12-week follow-up period.

Kennedy (2002) [22]. Country: United Kingdom. Design: RCT. Population: patients with a musculoskeletal condition (N = 52). Resource use: physiotherapy. Self-report: questionnaires. Administrative record: outpatient department records, hospital physiotherapy department's records. Data collection: retrospective; 3 and 12 mo after initial outpatient appointment.

Longobardi (2011) [23]. Country: Canada. Design: population-based cohort. Population: adults with inflammatory bowel disease (IBD) (N = 352). Resource use: hospital and physician visits (IBD and non-IBD related). Self-report: standardized interview. Administrative record: health insurance provider databases. Data collection: retrospective; 12 mo.

Marks (2003) [15]. Country: United States. Design: sample of enrollees in a diabetes management program. Population: diabetics (N = 1,230). Resource use: hospitalizations, emergency room visits. Self-report: telephone assessment. Administrative record: health insurance claims. Data collection: retrospective; 6 mo.

Mirandola (1999) [24]. Country: Italy. Design: sample of enrollees from the South-Verona Outcome Project. Population: patients in contact with the South-Verona Community Psychiatric Service (N = 339). Resource use: contacts with psychiatrists, psychologists, community and outpatient nurses, and social workers; admissions in sheltered accommodations; rehabilitation group contacts; day hospital contacts. Self-report: interview (Client Service Receipt Interview). Administrative record: psychiatric case register. Data collection: retrospective; twice yearly.

Mistry (2005) [25]. Country: United Kingdom. Design: RCT. Population: patients consulting their GP regarding symptoms of depression (N = 85). Resource use: contacts with GPs and other practice staff, contacts with community mental health teams, psychiatric and general hospital contacts. Self-report: questionnaires. Administrative record: GP records (community-based or manual records). Data collection: retrospective; 3-mo intervals over the course of 1 yr.

Patel (2005) [16]. Country: England. Design: random sample of consecutive primary care attenders. Population: primary care attenders (N = 229). Resource use: GP visits. Self-report: questionnaires (variant of the Client Service Receipt Inventory). Administrative record: GP case records. Data collection: retrospective; 6 mo.

Pinto (2011) [26]. Country: New Zealand. Design: RCT. Population: patients with hip or knee osteoarthritis (N = 50). Resource use: GP contacts, total number of medications. Self-report: questionnaire (Osteoarthritis Cost and Consequences Questionnaire). Administrative record: medical records. Data collection: retrospective; 3 mo.

Pollicino (2002) [27]. Country: Australia. Design: RCT. Population: patients with apparently resectable non-small-cell lung cancer (N = 24). Resource use: broad types of services (unreferred attendances, specialist attendances, diagnostic imaging, pathology tests). Self-report: diaries. Administrative record: Health Insurance Commission. Data collection: prospective; fortnightly the first year and monthly the year after.

Richards (2003) [28]. Country: United Kingdom. Design: RCT. Population: acute ward patients aged 60 yr or older (N = 185). Resource use: hospital admissions, visits by community health care staff. Self-report: interviewer-administered questionnaire. Administrative record: health care provider reports (Patient Administration System, Integrated Community System, GP questionnaire). Data collection: retrospective; 12 weeks.

Van den Brink (2004) [17]. Country: The Netherlands. Design: multicenter RCT. Population: patients with rectal cancer (N = 147). Resource use: medications, care products. Self-report: diary and questionnaire. Administrative record: medical records (hospital admission system and pharmacy records). Data collection: prospective (diary), weekly from discharge to 3 mo and monthly from 3 to 6 mo; retrospective (questionnaire), at 3 and 6 mo.

Zentner (2012) [29]. Country: Germany. Design: cluster RCT. Population: people with severe mental illness (N = 82). Resource use: hospital admissions, ambulant psychiatric contacts, medication. Self-report: questionnaire (Client Sociodemographic and Service Receipt Inventory). Administrative record: health insurance administrative records (AOK). Data collection: retrospective; 6 mo for hospital admissions, 3 mo for ambulant contacts, 1 mo for medication.

Abbreviations: RCT, randomized controlled trial; GP, general practitioner.
3.3. MeRQ scores
Table 4 summarizes the MeRQ as assessed by the reviewers using the MeRQ tool. The reporting of the elements was scored "good," "fair," "poor," "not reported," "not addressed," or "not applicable." In seven of the sixteen studies, more than 75% of the criteria (at least 9) were considered "good" (explicit identification) [21-24,26,28,29]. Overall, most studies scored at least "fair" on most of the MeRQ criteria.
One study [19] did not explicitly identify its aim as evaluating the validity of self-reported measurements compared with administrative measurements. It was only at the end of the methods section that the aim of comparing both methods was mentioned. The deducibility was low, resulting in a poor reporting quality score on the first criterion. Another study [13] did not address any validation arguments or strategies for interpreting the validation evidence. The same study lacked a comprehensive description of the measurement procedures for self-reports and administrative data; the only information provided concerned the health care services that were measured and the time intervals between both collection methods. This study scored "poor" on two quality criteria. Seven of 16 studies did not report the time intervals between the self-reported data collection and the administrative data collection [14,15,21-25]. Pollicino et al. [27] did not report statistical methods for the assessment of validation estimates; they instead reported the difference counts between agreement and disagreement. Furthermore, no methods for statistical uncertainty were reported [27]. Five other studies also failed to report methods for statistical uncertainty [14,17,19,25,29]. Van den Brink et al. [17] calculated concordance between types of medication and care products reported using both methods. Percentages were provided in figures and text; however, it was impossible to deduce the statistical methods used and the validation estimates reported. Although Mirandola et al. [24] reported a bias correction factor for precision, methods for statistical uncertainty were lacking, and the study scored "poor" on this reporting element.
Table 4. MeRQ scores
Criteria (in column order): (1) aim to compare validation evidence; (2) strategies for interpreting validation evidence; (3) instrument eligibility criteria; (4) rationale for relation; (5) methods or procedures for self-reports; (6) methods or procedures for administrative data; (7) time interval; (8) population characteristics; (9) statistical methods for validation estimates; (10) type of validation evidence; (11) validation estimates; (12) statistical uncertainty; (13) conclusion and interpretation of validation evidence.

Byford (2007):        ± ± ± + − − + ± +  Agreement  + + +
Goossens (2000):      − ± + + + ± + + +  Validity   + ? +
Guerriere (2006):     ± − ± ± − + + + +  Agreement  + + +
Hoogendoorn (2009):   ± ± ± + + + ? + +  Agreement  + ? +
Janson (2005):        ± + + ± + + ? + +  Agreement  + + +
Kennedy (2002):       + ± + + + + ? + +  Agreement  + + +
Longobardi (2011):    + + + + + + ? + +  Validity   + + +
Marks (2003):         ± ± ± + + + + + +  Agreement  + + ±
Mirandola (1999):     + + + + + + ? + +  Agreement  + − +
Mistry (2005):        + + ± ± ± ± + ± +  Agreement  + + ±
Patel (2005):         ± ± − + ± ± + ± +  Agreement  + + +
Pinto (2011):         + + + + + + + + +  Validity   + + +
Pollicino (2002):     + ? + + − − + − ?  Agreement  ? ? −
Richards (2003):      ± + + + + + + ± +  Agreement  + + +
Van den Brink (2004): ± + ± + + + + ± +  Validity   + ± ±
Zentner (2012):       + + + + + + + + +  Validity   + − +

Abbreviation: MeRQ, Methodological Reporting Quality.
Key: +, good (explicit reporting in the study); ±, fair (reporting in the study makes it possible to deduce what is expected); −, poor (not explicitly reported in the study and difficult to deduce what is expected); ?, not reported.
Table 5. Validation evidence

Byford (2007) [13]. Statistical analysis: Lin concordance correlation coefficient (rc). Validation estimates: inpatient days (rc 0.658); outpatient appointments (rc 0.231); accident and ER attendance (rc 0.760); general practitioner contacts (rc 0.631); practice nurse contacts (rc 0.056); community psychologist contacts (rc 0.063); community psychiatric nurse contacts (rc 0.349); occupational therapist contacts (rc −0.001). Statistical uncertainty: 95% confidence interval; 95% limits of agreement.

Goossens (2000) [19]. Statistical analysis: nonparametric Wilcoxon signed rank test. Validation estimates: no significant difference between the two methods for contacts with specialists (P = 0.93); significant difference for physiotherapist visits (P = 0.021). Statistical uncertainty: NA.

Guerriere (2006) [20]. Statistical analysis: observed agreement (Po), simple kappa (k), quadratic weighted kappa (kw). Validation estimates: physiotherapy (Po 1.00; k 1.00); clinic visits (Po 0.98; k 0.90); hospitalization (Po 0.99; k 0.88); diagnostic tests (Po 0.94; k 0.78); home-based nursing or personal support (Po 0.97; k 0.75); general practitioner visits (Po 0.96; k 0.69); laboratory tests (Po 0.92; k 0.51); medical specialist visits (Po 0.85; k 0.41); diabetes therapy (Po 0.95; kw 0.5); antibiotics/antifungals (Po 0.81; kw 0.63); gastrointestinal therapy (Po 0.78; kw 0.55); pulmonary therapy (Po 0.78; kw 0.5). Statistical uncertainty: 95% confidence interval.

Hoogendoorn (2009) [14]. Statistical analysis: Spearman rank correlation coefficient (rho), proportion of perfect agreement (p.a. %). Validation estimates: daycare treatment (rho 0.55; p.a. 87%); daycare treatment for COPD (rho 0.49; p.a. 98%); hospital admissions (rho 0.93; p.a. 88%); hospital admissions for COPD (rho 0.94; p.a. 95%); total hospital days (rho 0.91; p.a. 79%); total hospital days for COPD (rho 0.93; p.a. 93%); medical specialist visits (rho 0.70; p.a. 8%); physiotherapy (rho 0.86; p.a. 7%); respiratory nurse visits (rho 0.65; p.a. 11%); dietician visits (rho 0.64; p.a. 29%). Statistical uncertainty: NA.

Janson (2005) [21]. Statistical analysis: Spearman correlation coefficient (SCC). Validation estimates: outpatient consultations doctor (SCC 0.66); general practitioner visits (SCC 0.49); district nurse visits (SCC 0.39); surgical ward visits (SCC 0.72). Statistical uncertainty: 95% confidence interval.

Kennedy (2002) [22]. Statistical analysis: intraclass correlation coefficient (ICC). Validation estimates: physiotherapy (ICC 0.54). Statistical uncertainty: 95% confidence interval.

Longobardi (2011) [23]. Statistical analysis: concordance correlation coefficient (CCC), Pearson correlation coefficient (PCC). Validation estimates: physician visits (CCC 0.58; PCC 0.68); hospital overnights, IBD related (CCC 0.80; PCC 0.88); hospital overnights, non-IBD related (CCC 0.86; PCC 0.93). Statistical uncertainty: bias correction factor.

Marks (2003) [15]. Statistical analysis: kappa (k). Validation estimates: hospitalizations (k 0.6366); emergency room visits (k 0.5390). Statistical uncertainty: 95% confidence interval.

Mirandola (1999) [24]. Statistical analysis: Lin concordance correlation coefficient (rc). Validation estimates: psychiatrists community contacts (rc 0.521); psychiatrists outpatient contacts (rc 0.538); psychologists community contacts (rc 0.495); psychologists outpatient contacts (rc 0.539); nurses community contacts (rc 0.45); nurses outpatient contacts (rc 0.94); social workers community contacts (rc 0.183); social workers outpatient contacts (rc 0.186); admissions in sheltered accommodations (rc 0.999); rehabilitation group contacts (rc 0.165); day hospital contacts (rc 0.140); admissions to acute psychiatric ward (rc 0.953). Statistical uncertainty: NA.

Mistry (2005) [25]. Statistical analysis: weighted kappa (kw). Validation estimates: general practitioner surgery contacts (kw 0.370); general practitioner home visits (kw 0.485); psychiatric hospital and community service (kw 0.278); social services (kw 0.021); other hospital and specialist services (kw 0.430). Statistical uncertainty: NA.

Patel (2005) [16]. Statistical analysis: Lin concordance correlation coefficient (rc). Validation estimates: general practitioner visits (rc 0.756). Statistical uncertainty: 95% limits of agreement.

Pinto (2011) [26]. Statistical analysis: Lin concordance correlation coefficient (rc). Validation estimates: general practitioner contacts (rc 0.41); total number of medications (rc 0.63). Statistical uncertainty: 95% limits of agreement; 95% confidence interval.

Pollicino (2002) [27]. Statistical analysis: extent of agreement (absolute numbers, agreed on number of services used). Validation estimates: unreferred visits (extent of agreement 3/23); specialist visits (9/22); diagnostic imaging (5/20); pathology tests (1/14). Statistical uncertainty: NA.

Richards (2003) [28]. Statistical analysis: percentage crude agreement (%), (quadratic) weighted kappa (kw). Validation estimates: hospital readmissions (85.7%; kw 0.42); general practitioner surgery visits (83.0%; kw 0.61); patient-requested general practitioner home visits (41.7%; kw 0.59); routine general practitioner home visits (71.0%; kw 0.71). Statistical uncertainty: 95% confidence interval.

Van den Brink (2004) [17]. Statistical analysis: percentage concordance. Validation estimates: medications and care products (25%). Statistical uncertainty: NA.

Zentner (2012) [29]. Statistical analysis: Spearman rank correlation coefficient. Validation estimates: hospital admissions (0.53); ambulant psychiatric contacts (0.33); medications (0.31). Statistical uncertainty: NA.

Abbreviation: NA, not addressed.
3.4. Validation evidence
The main statistical analysis and statistical uncertainty
analyses, as well as the validation estimates, were also
captured during basic data abstraction and presented in
Table 5.
The statistical methods used to calculate the health care resource use validation estimates varied widely. Most studies examined the correlation between self-reported and administrative measurements of health care resource use, with calculations including the Spearman (rank) correlation coefficient [14,21,29], the Pearson correlation coefficient [17,23], the intraclass correlation coefficient [22], and, primarily, Lin's concordance correlation coefficient [13,16,23,24,26]. The Lin coefficient combines measurements of precision and accuracy to determine whether the data significantly deviate from perfect concordance. In an attempt to measure agreement, another four studies calculated a (weighted) kappa [15,20,25,28]. Wilcoxon signed rank tests were used in one study to test differences in estimations of nonnormal data distributions [19]. Agreement and proportion-perfect agreement were considered in the analyses of five studies [14,17,20,27,28]. In 56% of the studies, statistical uncertainty was defined. Most studies presented the 95% confidence intervals around the validation estimates [13,15,20-22,26,28]. The 95% limits of agreement were calculated in three studies [13,16,26].
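As an illustration of the two statistics most directly tied to exchangeability, the sketch below (hypothetical data; not drawn from any included study) implements Lin's concordance correlation coefficient and the Bland-Altman 95% limits of agreement.

```python
import numpy as np

def lin_ccc(x, y):
    """Lin's concordance correlation coefficient: downweights both poor
    correlation (precision) and shifts in location or scale (accuracy)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.cov(x, y, ddof=1)[0, 1]
    return 2 * sxy / (x.var(ddof=1) + y.var(ddof=1) + (x.mean() - y.mean()) ** 2)

def limits_of_agreement(x, y):
    """Bland-Altman 95% limits of agreement for paired measurements."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return d.mean() - 1.96 * d.std(ddof=1), d.mean() + 1.96 * d.std(ddof=1)

# Hypothetical paired counts of specialist visits (illustrative only).
self_report = [3, 1, 0, 4, 2, 5, 1, 2]
admin_data = [3, 2, 0, 3, 2, 6, 1, 2]
print(lin_ccc(self_report, admin_data))
print(limits_of_agreement(self_report, admin_data))
```

Unlike a plain correlation coefficient, which captures only the strength of association, both statistics are sensitive to systematic shifts between the two data sources, which is why they speak more directly to exchangeability.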
3.5. Best evidence
In the end, six studies [22-24,26,28,29] scored "good" on all four MeRQ "minimal criteria for reporting." These studies were assessed and compared in detail. Three studies assessed the validity of the methods [23,26,29], whereas the other three studies assessed the methods based on their levels of agreement [22,24,28].
Longobardi et al. [23] assessed the validity of self-
reported data gathered via standardized interviews
compared with health care utilization drawn from an
administrative database over a 12-month period for inflam-
matory bowel disease (IBD) patients. The authors
concluded that there was good recall of medical contacts.
However, systematic errors occurred in the reported frequency of physician visits (underreporting by 35-45%) and overnight stays in the hospital (overreporting by 25-35%) [23]. There was a reasonably good correlation of self-reported data, with adequate concordance (0.59-0.86) and Pearson (0.68-0.93) correlation coefficients for both IBD- and non-IBD-related overnight stays in the hospital. Discrepancies in hospitalizations were due to failures to estimate a stay within the recall period, omitting stays following an emergency room visit, mistakenly including day surgery as hospitalization, and misclassifying an IBD stay. The tendency to report higher numbers of overnight stays in the interviews was not statistically significantly different from the numbers in the administrative database; a difference of this magnitude would likely have been significant in a larger sample with a higher base rate of hospitalizations. The self-reported data also yielded lower numbers of physician visits relative to the administrative data, despite the significant correlation. Regardless of whether IBD was inactive, fluctuating, or active across the full 12 months, the reported numbers of physician visits were lower by one-third when comparing self-reports with administrative data [23].
Pinto et al. [26] compared the validity of questionnaires
and medical records regarding health care resource use over
a 3-month period. Lin concordance correlation coefficients
were fair to good, ranging from 0.41 for GP contacts to
0.63 for the number of medications reported. Generally, patients reported more medication usage and fewer GP contacts when compared to the medical records. However, the mean differences between the estimation periods were small: −0.22 for medication use and 0.06 for GP contacts. Participants reported their nonuse of health care more accurately than their use. GP contacts and medication usage had higher levels of sensitivity (93-100%) than specificity (63-100%). The 3-month measurement period might explain this, as the use of health care might be relatively low over this short period of time. Cost estimations for GP contacts were in agreement with database results (Lin's concordance correlation coefficient 0.502).
Zentner et al. [29] assessed the validity of self-reported
data gathered using a questionnaire compared with data
drawn from health insurance administrative records over a
6-month period. Correlation coefficients were moderate to
low (0.53 to 0.31) and reflected statistical differences when
comparing one measurement method to the other. Hospital
admissions measured over a 6-month recall period, ambu-
lant psychiatric contacts measured over a 3-month recall
period, and medication use measured over a 1-month recall
period were all reported lower in the questionnaire data
when compared to the insurance records. The low correla-
tions between health care resource use measured using
questionnaires and insurance registrations were significant,
indicating a difference between the measurement methods
[29].
Kennedy et al. [22] compared health care use measured
via questionnaires with use measured via hospital physio-
therapy department records. Agreement was measured by
an intraclass correlation coefficient. The intraclass correla-
tion coefficient of 0.54 (95% CI 0.31-0.70) reasonably supports the exchangeability of the questionnaire and the department records. However, self-reports indicated more resource use than the hospital records; a mean of 1.3 (95% CI 0.4-2.2) fewer visits were recorded in the physiotherapy department notes [22].
Mirandola et al. [24] assessed the agreement of self-
reported health care use measured via interviews with psy-
chiatric case registers over a 6-month period (biannually).
Twelve health care categories were defined, and agreement
between self-reports and administrative data varied greatly
for each category. Overall, agreement was particularly
weak for day-patient (hospital) contacts (0.14) and very
strong for nurses’ outpatient contacts, number of acute psy-
chiatric ward admissions, and sheltered accommodation (all
rc O 0.9). These results translate into overall agreement
(0.926) on the total costs. The findings support that self-
reported and administrative measurements of health care
use are exchangeable when estimating the total cost of ser-
vices. Nonetheless, as the interview tends to report higher
costs of day and community care compared to the case reg-
isters (rc 0.14 and rc 0.407, respectively) and tends to
result in lower outpatient costs (rc 0.422) when compared
to the registers, one should acknowledge the importance of
adequate procedures for analyzing the complexity of costs
components [24].
Richards et al. [28] assessed the agreement of self-
reported data via interviewer-administered questionnaires
compared with health care providers’ reports over a 12-
week period. Recall periods varied for different resources (4-12 weeks). Agreement and systematic differences in the distribution of paired responses were examined to assess the comparability of the methods. Within a range of 42-93%, agreement was over 72%. Systematic differences were observed when comparing self-reports with administrative data for routine primary care home visits (overreporting). Self-reports indicated less resource use for hospital readmissions, requested primary care home visits, surgery consultations, district nursing, and physiotherapy when compared to provider records over short time frames (4-12 weeks). The number of visits was only gathered for hospital readmissions, routine and requested primary home care, and surgery visits. Kappa coefficients were moderate or good for the four comparisons (kappa 0.42-0.86). The 95% confidence intervals of the kappa coefficients were quite wide for all comparisons. Systematic bias was indicated for hospital readmissions and surgery visits (P < 0.05). Marginally significant differences (P = 0.05-0.10) were indicated for requested and routine primary home care visits. There were adequate levels of crude agreement between patient-reported use of health care services and usage data from health care providers. However, the analysis of bias between pairs revealed systematic differences and more marginally significant differences in the response patterns. Except for primary care routine home visits, patients reported less health care use compared to health care provider records [28].
4. Conclusions
This study draws attention to (1) the MeRQ and (2) the
validation evidence of studies comparing self-reported with
administrative data collection methods used in measuring
health care resource use for cost estimation. Based on the six studies with adequate MeRQ presented in this review, the exchangeability of self-reported and administrative health care resource use measurements can only be cautiously supported by the presented validation evidence (e.g., because of differences in reported numbers of visits and heterogeneous or weak correlation coefficients). No evidence can be provided to suggest that one measurement method is better than the other.
Throughout these six studies, when measured, patients
generally tended to report lower resource use, mainly
regarding physician and physiotherapy contacts, when
compared to the resource use gathered via administrative
data. The most commonly reported source of validation ev-
idence was the calculation of correlation coefficients indi-
cating the strength and direction of association. This
approach is insufficient to indicate agreement or absolute
measurement errors. Further attention should be paid to
methods that enhance adequate MeRQ and the trustworthi-
ness of validation evidence supporting the exchangeability of self-reports and administrative data for the purpose of cost estimation. Notwithstanding, self-reported data appear to
be able to provide adequate general estimates of health care
use when administrative data are not available.
4.1. Limitations
A limitation of the study is the limited generalizability of the findings due to the great diversity among the included studies. First, most studies collected data on multiple types of resource use and in divergent populations. However, the validation evidence
was not necessarily more trustworthy when the population
being assessed (e.g., patients with mental health com-
plaints) was directly linked to the type of health care
resource use relevant to the illness (e.g., psychologists).
Second, as most studies were conducted within randomized controlled trials, their main focus was on assessing the effectiveness and/or cost-effectiveness of an intervention or policy adjustment. Comparing the
self-reported data with administrative data was usually not
the primary aim. Third, sample sizes were small. In most
cases, existing populations gathered for a completely
different research purpose were assessed in a secondary
analysis on agreement levels.
4.2. Discussion
Methodological reporting quality was assessed with the newly developed appraisal tool MeRQ, which was derived by combining items from existing reporting guidelines, that is, GRRAS
and STARD. The MeRQ tool was developed to ensure
the reporting of key elements required to assess the valida-
tion evidence and the risk of bias. It was difficult to
appraise the methodological rigor when key information
was reported vaguely or not at all. Interpreting validation
evidence requires knowledge of the methods and statistics
but even then, there are no standard statistics, and different
interpretations are possible. Although the use of the MeRQ
tool was feasible in the current systematic review, further
research is warranted to assess its validity.
Adequate MeRQ was found in a minority of the studies.
Six studies assessed in the best evidence section demonstrate the complexity of comparing different data sources. Data sets contained both random and systematic errors. Systematic errors were generally detected by comparing two imperfect sources of information, and taking directional trends into account might explain the discrepancies.
Possible reasons for random sources of error could include
problems in recall (for self-reports) or data entry errors
(for administrative data). In both, interpretation or coding
differences might explain random errors. In addition, dif-
ferential bias may occur in trials examining innovative ser-
vices, particularly where the boundaries between specific
service providers are blurred from the patient’s perspec-
tive. These discrepancies in health care resource use that
relate to cost elements might be of concern, especially
when they have a large impact on the overall costs of care
(e.g., hospital admissions, complex care packages).
Although the difference between two methods of data collection might be statistically significant, it might not be economically relevant, or vice versa. For example, if the health care resource use being researched (e.g., physiotherapy contacts) was one of the dominant costs in the evaluation (i.e., due to the population-specific health complaints), then the addition or subtraction of one extra visit per patient may not be statistically significant but would be economically relevant. Therefore, one needs to properly address, a priori, the usage and identification of items for making cost estimations.
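To illustrate with purely hypothetical numbers: at an assumed unit cost of €30 per physiotherapy visit in a trial with 500 participants, a discrepancy of one visit per patient between the two data sources shifts the estimated total costs by €15,000, an economically relevant amount even where the per-patient difference does not reach statistical significance.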
4.3. Implications for research and practice
The assumption made in some studies that administrative databases should serve as the reference standard is questionable because these databases also contain errors. When using such data for cost estimations, statistical corrections might be relevant; however, corrections for cost data from administrative databases when computing incremental cost-effectiveness ratios are so far unavailable [30]. Previous reviews have identified limitations of these databases [3,5,6]. The health economic community does not yet have a culture of validating resource use measurement, and there is a lack of evidence regarding the collection of resource use data [31]. Reviews such as the current one might aid in assessing MeRQ and thereby in deducing the available validation evidence. The validation evidence of the (small) number of studies with adequate MeRQ cautiously supports the exchangeability of the self-reported and administrative resource use measurements. Coherent methodological quality that supports the interpretation of estimates, together with rigorous, well-reported research providing strategically collected validation evidence, is needed.
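To illustrate why unvalidated cost inputs matter downstream, the sketch below computes the same incremental cost-effectiveness ratio (ICER) from cost estimates attributed to each data source; every number is invented, and the point is only that the choice of measurement method propagates directly into the ratio.

```python
# Hypothetical sketch: the same trial effect combined with cost estimates
# from two different resource use measurement methods yields two ICERs.
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra effect."""
    return (cost_new - cost_old) / (effect_new - effect_old)

costs_self_report = {"intervention": 4200.0, "control": 3900.0}  # invented
costs_admin = {"intervention": 4450.0, "control": 3875.0}        # invented
qalys = {"intervention": 0.72, "control": 0.68}  # same effects in both

for label, costs in [("self-report", costs_self_report),
                     ("administrative", costs_admin)]:
    ratio = icer(costs["intervention"], costs["control"],
                 qalys["intervention"], qalys["control"])
    print(f"ICER ({label}): {ratio:,.0f} per QALY")
```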
References
[1] Drummond M, McGuire A. Economic evaluation in health care: merging theory with practice. New York: Oxford University Press; 2001.
[2] Hakkaart L, Tan SS, Bouwmans CAM. Handleiding voor kostenonderzoek. Methoden en standaard kostprijzen voor economische evaluaties in de gezondheidszorg. Rotterdam: Instituut voor Medische Technology Assessment, Erasmus Universiteit Rotterdam. CVZ College voor zorgverzekeringen; 2010.
[3] Bhandari A, Wagner T. Self-reported utilization of health care services: improving measurement and accuracy. Med Care Res Rev 2006;63:217–35.
[4] Ridyard CH, Hughes DA. Development of a database of instruments for resource-use measurement: purpose, feasibility, and design. Value Health 2012;15:650–5.
[5] Evans C, Crawford B. Patient self-reports in pharmacoeconomic studies. Their use and impact on study validity. Pharmacoeconomics 1999;15:241–56.
[6] Harlow SD, Linet MS. Agreement between questionnaire data and medical records. The evidence for accuracy of recall. Am J Epidemiol 1989;129:233–48.
[7] Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol 2009;62:1006–12.
[8] Sassi F, Archard L, McDaid D. Searching literature databases for health care economic evaluations: how systematic can we afford to be? Med Care 2002;40:387–94.
[9] Alton V, Eckerlund I, Norlund A. Health economic evaluations: how to find them. Int J Technol Assess Health Care 2006;22:512–7.
[10] International Health Economics Discussion List, 2013. Available at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A25ind1305L5healthecon-discussP5R16115healthecon-discuss95AJ5ond5NoþMatch%3BMatch%3BMatchesz54. Accessed May 22, 2013.
[11] Kottner J, Audige L, Brorson S, Donner A, Gajewski BJ, Hrobjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96–106.
[12] Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003;326:41–4.
[13] Byford S, Leese M, Knapp M, Seivewright H, Cameron S, Jones V, et al. Comparison of alternative methods of collection of service use data for the economic evaluation of health care interventions. Health Econ 2007;16:531–6.
[14] Hoogendoorn M, van Wetering CR, Schols AM, Rutten-van Molken MP. Self-report versus care provider registration of healthcare utilization: impact on cost and cost-utility. Int J Technol Assess Health Care 2009;25:588–95.
[15] Marks AS, Lee DW, Slezak J, Berger J, Patel H, Johnson KE. Agreement between insurance claim and self-reported hospital and emergency room utilization data among persons with diabetes. Dis Manag 2003;6:199–205.
[16] Patel A, Rendu A, Moran P, Leese M, Mann A, Knapp M. A comparison of two methods of collecting economic data in primary care. Fam Pract 2005;22:323–7.
[17] van den Brink M, van den Hout WB, Stiggelbout AM, van de Velde CJ, Kievit J. Cost measurement in economic evaluations of health care: whom to ask? Med Care 2004;42:740–6.
[18] Streiner DL, Kottner J. Recommendations for reporting the results of studies of instrument and scale development and testing. J Adv Nurs 2014;70:1970–9.
[19] Goossens ME, Rutten-van Molken MP, Vlaeyen JW, van der Linden SM. The cost diary: a method to measure direct and indirect costs in cost-effectiveness research. J Clin Epidemiol 2000;53:688–95.
[20] Guerriere DN, Ungar WJ, Corey M, Croxford R, Tranmer JE, Tullis E, et al. Evaluation of the ambulatory and home care record: agreement between self-reports and administrative data. Int J Technol Assess Health Care 2006;22:203–10.
[21] Janson M, Carlsson P, Haglind E, Anderberg B. Data validation in an economic evaluation of surgery for colon cancer. Int J Technol Assess Health Care 2005;21:246–52.
[22] Kennedy AD, Leigh-Brown AP, Torgerson DJ, Campbell J, Grant A. Resource use data by patient report or hospital records: do they agree? BMC Health Serv Res 2002;2:2.
[23] Longobardi T, Walker JR, Graff LA, Bernstein CN. Health service utilization in IBD: comparison of self-report and administrative data. BMC Health Serv Res 2011;11:137.
[24] Mirandola M, Bisoffi G, Bonizzato P, Amaddeo F. Collecting psychiatric resources utilisation data to calculate costs of care: a comparison between a service receipt interview and a case register. Soc Psychiatry Psychiatr Epidemiol 1999;34:541–7.
[25] Mistry H, Buxton M, Longworth L, Chatwin J, Peveler R. Comparison of general practitioner records and patient self-report questionnaires for estimation of costs. Eur J Health Econ 2005;6:261–6.
[26] Pinto D, Robertson MC, Hansen P, Abbott JH. Good agreement between questionnaire and administrative databases for health care use and costs in patients with osteoarthritis. BMC Med Res Methodol 2011;11:45.
[27] Pollicino C, Viney R, Haas M. Measuring health system resource use for economic evaluation: a comparison of data sources. Aust Health Rev 2002;25:171–8.
[28] Richards SH, Coast J, Peters TJ. Patient-reported use of health service resources compared with information from health providers. Health Soc Care Community 2003;11:510–8.
[29] Zentner N, Baumgartner I, Becker T, Puschner B. Health service costs in people with severe mental illness: patient report vs. administrative records. Psychiatr Prax 2012;39:122–8.
[30] Johnston K, Buxton MJ, Jones DR, Fitzpatrick R. Assessing the costs of healthcare technologies in clinical trials. Health Technol Assess 1999;3:1–76.
[31] Thorn JC, Coast J, Cohen D, Hollingworth W, Knapp M, Noble SM, et al. Resource-use measurement based on patient recall: issues and challenges for economic evaluation. Appl Health Econ Health Pol 2013;11:155–61.
Appendix A. Search strategy. Database: MEDLINE (PubMed)

(measure* OR correlat* OR evaluat* OR accuracy OR accurate OR precision OR agreement OR valid*[Title/Abstract]) AND (self-report[MeSH Terms] OR self-report[Text Word] OR self-assessment[Text Word] OR patient self-report[Text Word] OR patient-reported[Text Word] OR questionnaires[MeSH Terms] OR recall[Text Word] OR interviews as topic[MeSH Terms] OR health care surveys[MeSH Terms]) AND (medical records[MeSH Terms] OR records as topic[MeSH Terms] OR medical records systems, computerized[MeSH Terms] OR medical record administrators[MeSH Terms]) AND (health resources[MeSH Terms] OR delivery of health care[MeSH Terms] OR utilization[Text Word] OR resource use[Text Word]) AND data collection[MeSH Terms] AND cost analysis.
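For readers who want to rerun a simplified form of this strategy programmatically, the sketch below uses Biopython's Entrez utilities (assuming Biopython is installed and an e-mail address is supplied, as NCBI requires); the query string is an abridged excerpt of the strategy above, not a replacement for it.

```python
# Sketch: submit an abridged version of the Appendix A query to PubMed.
from Bio import Entrez

Entrez.email = "your.name@example.org"  # placeholder; NCBI requires a real address

query = (
    "(agreement[Title/Abstract] OR valid*[Title/Abstract]) "
    "AND (self-report[MeSH Terms] OR questionnaires[MeSH Terms]) "
    "AND medical records[MeSH Terms] "
    "AND (health resources[MeSH Terms] OR utilization[Text Word])"
)

handle = Entrez.esearch(db="pubmed", term=query, retmax=20)
record = Entrez.read(handle)
handle.close()
print(f"{record['Count']} records found; first IDs: {record['IdList'][:5]}")
```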
Appendix B. MeRQ tool: criteria of Methodological Reporting Quality as adapted from the Standards for Reporting Diagnostic Accuracy (STARD) and the Guidelines for Reporting Reliability and Agreement Studies (GRRAS)

Study and reviewer identification
Reviewer: / First author: / Year of publication: / Title: / Language: / Country:

Methodological Reporting Quality (MeRQ)
Each study criterion (reporting element with operational definition) is rated as one of: Good / Fair / Poor / Not reported / Not addressed / Not applicable.

(1) Explicit identification of the study as an evaluation of the reliability, agreement, or validity (accuracy) of a self-reported health care resource use measurement instrument compared with health care resource use measured using administrative data
(2) Proposed validation arguments and strategies for interpreting validation evidence
(3) The measurement methods (eligibility criteria) with a critical review of evidence relevant for the study
(4) Rationale for the relationship between the self-report and the administrative data
(5) Methods or measurement procedures for the self-reports
(6) Methods or measurement procedures for the administrative data
(7) Time interval between self-reports and administrative data collection
(8) Population characteristics (including sample size, sampling methods, recruitment)

Indicate whether the study examines validity, reliability, or agreement (according to the information provided in the article and your own assessment), and rate the following criteria for the chosen concept:
(9) Statistical methods for validation estimates defined
(10) Validation estimates defined
(11) Methods for statistical uncertainty defined (variability, confidence intervals)
(12) Concluding validation argument/interpretation of validation evidence

Additional information
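As a minimal sketch of how ratings from this form can be tallied, the following hypothetical example encodes one study's ratings and applies the review's summary rule that more than 75% of rated criteria scoring ‘‘good’’ marks a well-reported study; the criterion names and ratings below are illustrative only.

```python
# Hypothetical tally of MeRQ ratings for a single study.
def proportion_good(ratings: dict[str, str]) -> float:
    """Share of applicable criteria rated 'good'."""
    rated = [r for r in ratings.values() if r != "not applicable"]
    return sum(r == "good" for r in rated) / len(rated)

example_study = {  # invented ratings for the eight part 1 criteria
    "aim identified": "good",
    "validation strategy": "fair",
    "measurement methods": "good",
    "rationale for relation": "good",
    "self-report procedures": "good",
    "administrative procedures": "good",
    "time interval": "not reported",
    "population characteristics": "good",
}

share = proportion_good(example_study)
verdict = "well reported" if share > 0.75 else "below threshold"
print(f"{share:.0%} of criteria rated good -> {verdict}")
```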
106.e2 C.Y. Noben et al. / Journal of Clinical Epidemiology 74 (2016) 93e106

More Related Content

What's hot

Evidence based nursing resources
Evidence based nursing resourcesEvidence based nursing resources
Evidence based nursing resourcesDonna Chow
 
Health Technology Assessment- Overview
Health Technology Assessment- OverviewHealth Technology Assessment- Overview
Health Technology Assessment- Overviewshashi sinha
 
Protocol on burnout
Protocol on burnoutProtocol on burnout
Protocol on burnoutRosebenter
 
Dimensions of value, assessment, and decision making
Dimensions of value, assessment, and decision making Dimensions of value, assessment, and decision making
Dimensions of value, assessment, and decision making Office of Health Economics
 
Patient Data Collection Methods. Retrospective Insights.
Patient Data Collection Methods. Retrospective Insights.Patient Data Collection Methods. Retrospective Insights.
Patient Data Collection Methods. Retrospective Insights.QUESTJOURNAL
 
Unit 8 evidence based practice
Unit 8  evidence based practiceUnit 8  evidence based practice
Unit 8 evidence based practiceChanda Jabeen
 
Systematic reviews of
Systematic reviews ofSystematic reviews of
Systematic reviews ofislam kassem
 
MAST and its application in RENEWING HEALTH
MAST and its application in RENEWING HEALTHMAST and its application in RENEWING HEALTH
MAST and its application in RENEWING HEALTHAnna Kotzeva
 
Pathway 2.0 for RWE and MA 2015 -John Cai
Pathway 2.0 for RWE and MA 2015 -John CaiPathway 2.0 for RWE and MA 2015 -John Cai
Pathway 2.0 for RWE and MA 2015 -John CaiJohn Cai
 
Evaluation
EvaluationEvaluation
Evaluationbodo-con
 
Outcome measures and their importance in physiotherapy practice and research
Outcome measures and their importance in physiotherapy practice and researchOutcome measures and their importance in physiotherapy practice and research
Outcome measures and their importance in physiotherapy practice and researchAkhilaNatesan
 
Can CER and Personalized Medicine Work Together
Can CER and Personalized Medicine Work TogetherCan CER and Personalized Medicine Work Together
Can CER and Personalized Medicine Work TogetherJohn Cai
 
Relationships of Providers’ Accountability of Nursing Documentations in the C...
Relationships of Providers’ Accountability of Nursing Documentations in the C...Relationships of Providers’ Accountability of Nursing Documentations in the C...
Relationships of Providers’ Accountability of Nursing Documentations in the C...IJEAB
 

What's hot (20)

Evidence based nursing resources
Evidence based nursing resourcesEvidence based nursing resources
Evidence based nursing resources
 
Health Technology Assessment- Overview
Health Technology Assessment- OverviewHealth Technology Assessment- Overview
Health Technology Assessment- Overview
 
The importance of_measuring_outcomes
The importance of_measuring_outcomesThe importance of_measuring_outcomes
The importance of_measuring_outcomes
 
Evidence based medicine
Evidence based medicineEvidence based medicine
Evidence based medicine
 
Revue system
Revue systemRevue system
Revue system
 
Protocol on burnout
Protocol on burnoutProtocol on burnout
Protocol on burnout
 
MADLENARTICLES
MADLENARTICLESMADLENARTICLES
MADLENARTICLES
 
Dimensions of value, assessment, and decision making
Dimensions of value, assessment, and decision making Dimensions of value, assessment, and decision making
Dimensions of value, assessment, and decision making
 
Patient Data Collection Methods. Retrospective Insights.
Patient Data Collection Methods. Retrospective Insights.Patient Data Collection Methods. Retrospective Insights.
Patient Data Collection Methods. Retrospective Insights.
 
Unit 8 evidence based practice
Unit 8  evidence based practiceUnit 8  evidence based practice
Unit 8 evidence based practice
 
Systematic reviews of
Systematic reviews ofSystematic reviews of
Systematic reviews of
 
MAST and its application in RENEWING HEALTH
MAST and its application in RENEWING HEALTHMAST and its application in RENEWING HEALTH
MAST and its application in RENEWING HEALTH
 
Research design (1)
Research design (1)Research design (1)
Research design (1)
 
Evidence based medicine
Evidence based medicineEvidence based medicine
Evidence based medicine
 
Pathway 2.0 for RWE and MA 2015 -John Cai
Pathway 2.0 for RWE and MA 2015 -John CaiPathway 2.0 for RWE and MA 2015 -John Cai
Pathway 2.0 for RWE and MA 2015 -John Cai
 
What Is Evidence Based Practice
What Is Evidence Based Practice What Is Evidence Based Practice
What Is Evidence Based Practice
 
Evaluation
EvaluationEvaluation
Evaluation
 
Outcome measures and their importance in physiotherapy practice and research
Outcome measures and their importance in physiotherapy practice and researchOutcome measures and their importance in physiotherapy practice and research
Outcome measures and their importance in physiotherapy practice and research
 
Can CER and Personalized Medicine Work Together
Can CER and Personalized Medicine Work TogetherCan CER and Personalized Medicine Work Together
Can CER and Personalized Medicine Work Together
 
Relationships of Providers’ Accountability of Nursing Documentations in the C...
Relationships of Providers’ Accountability of Nursing Documentations in the C...Relationships of Providers’ Accountability of Nursing Documentations in the C...
Relationships of Providers’ Accountability of Nursing Documentations in the C...
 

Similar to 2016_The exchangeability of self-reports and administrative health care resource use measurements

INTERGRATIVE REVIEW 14Equip.docx
INTERGRATIVE REVIEW             14Equip.docxINTERGRATIVE REVIEW             14Equip.docx
INTERGRATIVE REVIEW 14Equip.docxvrickens
 
International Journal for Quality in Health Care 2003; Volume .docx
International Journal for Quality in Health Care 2003; Volume .docxInternational Journal for Quality in Health Care 2003; Volume .docx
International Journal for Quality in Health Care 2003; Volume .docxnormanibarber20063
 
Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdf
Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdfWare DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdf
Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdfJoseErnestoBejarCent
 
Operations research within UK healthcare: A review
Operations research within UK healthcare: A reviewOperations research within UK healthcare: A review
Operations research within UK healthcare: A reviewHarender Singh
 
AMIA 2015 Registries in Accountable Care poster
AMIA 2015 Registries in Accountable Care posterAMIA 2015 Registries in Accountable Care poster
AMIA 2015 Registries in Accountable Care posterKent State University
 
An Empirical Study of the Co-Creation of Values of Healthcare Consumers – The...
An Empirical Study of the Co-Creation of Values of Healthcare Consumers – The...An Empirical Study of the Co-Creation of Values of Healthcare Consumers – The...
An Empirical Study of the Co-Creation of Values of Healthcare Consumers – The...Healthcare and Medical Sciences
 
Prevencion de ulceras por presion y protocolo de tratamiento
Prevencion de ulceras por presion y protocolo de tratamientoPrevencion de ulceras por presion y protocolo de tratamiento
Prevencion de ulceras por presion y protocolo de tratamientoGNEAUPP.
 
Evaluation TableUse this document to complete the evaluati
Evaluation TableUse this document to complete the evaluatiEvaluation TableUse this document to complete the evaluati
Evaluation TableUse this document to complete the evaluatiBetseyCalderon89
 
Can clinical governance deliver quality improvement.pdf
Can clinical governance deliver quality improvement.pdfCan clinical governance deliver quality improvement.pdf
Can clinical governance deliver quality improvement.pdfCarlos Vicente Gazzo Serrano
 
826 Unertl et al., Describing and Modeling WorkflowResearch .docx
826 Unertl et al., Describing and Modeling WorkflowResearch .docx826 Unertl et al., Describing and Modeling WorkflowResearch .docx
826 Unertl et al., Describing and Modeling WorkflowResearch .docxevonnehoggarth79783
 
Real-World Evidence Studies_ Introduction, Purpose, and Data Collection Strat...
Real-World Evidence Studies_ Introduction, Purpose, and Data Collection Strat...Real-World Evidence Studies_ Introduction, Purpose, and Data Collection Strat...
Real-World Evidence Studies_ Introduction, Purpose, and Data Collection Strat...ProRelix Research
 
The Value of Knowing and Knowing the Value: Improving the Health Technology A...
The Value of Knowing and Knowing the Value: Improving the Health Technology A...The Value of Knowing and Knowing the Value: Improving the Health Technology A...
The Value of Knowing and Knowing the Value: Improving the Health Technology A...Office of Health Economics
 
New microsoft office power point presentation
New microsoft office power point presentationNew microsoft office power point presentation
New microsoft office power point presentationEmani Aparna
 
An Overview of Patient Satisfaction and Perceived Care of Quality
An Overview of Patient Satisfaction and Perceived Care of QualityAn Overview of Patient Satisfaction and Perceived Care of Quality
An Overview of Patient Satisfaction and Perceived Care of Qualityijtsrd
 
Written Leadership in nursing.docx
Written Leadership in nursing.docxWritten Leadership in nursing.docx
Written Leadership in nursing.docxwrite22
 
Deborah Schaler - EHPR 2014 - Patient Feedback Research
Deborah Schaler - EHPR 2014 - Patient Feedback ResearchDeborah Schaler - EHPR 2014 - Patient Feedback Research
Deborah Schaler - EHPR 2014 - Patient Feedback ResearchDeborah Schaler
 
Computer Decision Support Systems and Electronic Health Records: Am J Pub Hea...
Computer Decision Support Systems and Electronic Health Records: Am J Pub Hea...Computer Decision Support Systems and Electronic Health Records: Am J Pub Hea...
Computer Decision Support Systems and Electronic Health Records: Am J Pub Hea...Lorenzo Moja
 

Similar to 2016_The exchangeability of self-reports and administrative health care resource use measurements (20)

INTERGRATIVE REVIEW 14Equip.docx
INTERGRATIVE REVIEW             14Equip.docxINTERGRATIVE REVIEW             14Equip.docx
INTERGRATIVE REVIEW 14Equip.docx
 
International Journal for Quality in Health Care 2003; Volume .docx
International Journal for Quality in Health Care 2003; Volume .docxInternational Journal for Quality in Health Care 2003; Volume .docx
International Journal for Quality in Health Care 2003; Volume .docx
 
EVIDENCE BASED NURSING PRACTICE
EVIDENCE BASED NURSING PRACTICE EVIDENCE BASED NURSING PRACTICE
EVIDENCE BASED NURSING PRACTICE
 
Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdf
Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdfWare DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdf
Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdf
 
Operations research within UK healthcare: A review
Operations research within UK healthcare: A reviewOperations research within UK healthcare: A review
Operations research within UK healthcare: A review
 
AMIA 2015 Registries in Accountable Care poster
AMIA 2015 Registries in Accountable Care posterAMIA 2015 Registries in Accountable Care poster
AMIA 2015 Registries in Accountable Care poster
 
An Empirical Study of the Co-Creation of Values of Healthcare Consumers – The...
An Empirical Study of the Co-Creation of Values of Healthcare Consumers – The...An Empirical Study of the Co-Creation of Values of Healthcare Consumers – The...
An Empirical Study of the Co-Creation of Values of Healthcare Consumers – The...
 
s12911-022-01756-2.pdf
s12911-022-01756-2.pdfs12911-022-01756-2.pdf
s12911-022-01756-2.pdf
 
Prevencion de ulceras por presion y protocolo de tratamiento
Prevencion de ulceras por presion y protocolo de tratamientoPrevencion de ulceras por presion y protocolo de tratamiento
Prevencion de ulceras por presion y protocolo de tratamiento
 
Evaluation TableUse this document to complete the evaluati
Evaluation TableUse this document to complete the evaluatiEvaluation TableUse this document to complete the evaluati
Evaluation TableUse this document to complete the evaluati
 
Can clinical governance deliver quality improvement.pdf
Can clinical governance deliver quality improvement.pdfCan clinical governance deliver quality improvement.pdf
Can clinical governance deliver quality improvement.pdf
 
826 Unertl et al., Describing and Modeling WorkflowResearch .docx
826 Unertl et al., Describing and Modeling WorkflowResearch .docx826 Unertl et al., Describing and Modeling WorkflowResearch .docx
826 Unertl et al., Describing and Modeling WorkflowResearch .docx
 
Real-World Evidence Studies_ Introduction, Purpose, and Data Collection Strat...
Real-World Evidence Studies_ Introduction, Purpose, and Data Collection Strat...Real-World Evidence Studies_ Introduction, Purpose, and Data Collection Strat...
Real-World Evidence Studies_ Introduction, Purpose, and Data Collection Strat...
 
The Value of Knowing and Knowing the Value: Improving the Health Technology A...
The Value of Knowing and Knowing the Value: Improving the Health Technology A...The Value of Knowing and Knowing the Value: Improving the Health Technology A...
The Value of Knowing and Knowing the Value: Improving the Health Technology A...
 
Abpi rd conference_towse_ 20_nov_2014
Abpi rd conference_towse_ 20_nov_2014Abpi rd conference_towse_ 20_nov_2014
Abpi rd conference_towse_ 20_nov_2014
 
New microsoft office power point presentation
New microsoft office power point presentationNew microsoft office power point presentation
New microsoft office power point presentation
 
An Overview of Patient Satisfaction and Perceived Care of Quality
An Overview of Patient Satisfaction and Perceived Care of QualityAn Overview of Patient Satisfaction and Perceived Care of Quality
An Overview of Patient Satisfaction and Perceived Care of Quality
 
Written Leadership in nursing.docx
Written Leadership in nursing.docxWritten Leadership in nursing.docx
Written Leadership in nursing.docx
 
Deborah Schaler - EHPR 2014 - Patient Feedback Research
Deborah Schaler - EHPR 2014 - Patient Feedback ResearchDeborah Schaler - EHPR 2014 - Patient Feedback Research
Deborah Schaler - EHPR 2014 - Patient Feedback Research
 
Computer Decision Support Systems and Electronic Health Records: Am J Pub Hea...
Computer Decision Support Systems and Electronic Health Records: Am J Pub Hea...Computer Decision Support Systems and Electronic Health Records: Am J Pub Hea...
Computer Decision Support Systems and Electronic Health Records: Am J Pub Hea...
 

More from Cindy Noben

2015_Discrete choice experiments versus rating scale exercises to evaluate th...
2015_Discrete choice experiments versus rating scale exercises to evaluate th...2015_Discrete choice experiments versus rating scale exercises to evaluate th...
2015_Discrete choice experiments versus rating scale exercises to evaluate th...Cindy Noben
 
Comparative cost effectiveness of two interventions to promote work functioni...
Comparative cost effectiveness of two interventions to promote work functioni...Comparative cost effectiveness of two interventions to promote work functioni...
Comparative cost effectiveness of two interventions to promote work functioni...Cindy Noben
 
Protecting and promoting mental health of nurses in the hospital setting: is ...
Protecting and promoting mental health of nurses in the hospital setting: is ...Protecting and promoting mental health of nurses in the hospital setting: is ...
Protecting and promoting mental health of nurses in the hospital setting: is ...Cindy Noben
 
Quality appraisal of generic self reported instruments measuring healht relat...
Quality appraisal of generic self reported instruments measuring healht relat...Quality appraisal of generic self reported instruments measuring healht relat...
Quality appraisal of generic self reported instruments measuring healht relat...Cindy Noben
 
Design of a trial based economic evaluation on the cost effectiveness of empl...
Design of a trial based economic evaluation on the cost effectiveness of empl...Design of a trial based economic evaluation on the cost effectiveness of empl...
Design of a trial based economic evaluation on the cost effectiveness of empl...Cindy Noben
 
E-book Proefschrift Cindy Noben
E-book Proefschrift Cindy NobenE-book Proefschrift Cindy Noben
E-book Proefschrift Cindy NobenCindy Noben
 

More from Cindy Noben (6)

2015_Discrete choice experiments versus rating scale exercises to evaluate th...
2015_Discrete choice experiments versus rating scale exercises to evaluate th...2015_Discrete choice experiments versus rating scale exercises to evaluate th...
2015_Discrete choice experiments versus rating scale exercises to evaluate th...
 
Comparative cost effectiveness of two interventions to promote work functioni...
Comparative cost effectiveness of two interventions to promote work functioni...Comparative cost effectiveness of two interventions to promote work functioni...
Comparative cost effectiveness of two interventions to promote work functioni...
 
Protecting and promoting mental health of nurses in the hospital setting: is ...
Protecting and promoting mental health of nurses in the hospital setting: is ...Protecting and promoting mental health of nurses in the hospital setting: is ...
Protecting and promoting mental health of nurses in the hospital setting: is ...
 
Quality appraisal of generic self reported instruments measuring healht relat...
Quality appraisal of generic self reported instruments measuring healht relat...Quality appraisal of generic self reported instruments measuring healht relat...
Quality appraisal of generic self reported instruments measuring healht relat...
 
Design of a trial based economic evaluation on the cost effectiveness of empl...
Design of a trial based economic evaluation on the cost effectiveness of empl...Design of a trial based economic evaluation on the cost effectiveness of empl...
Design of a trial based economic evaluation on the cost effectiveness of empl...
 
E-book Proefschrift Cindy Noben
E-book Proefschrift Cindy NobenE-book Proefschrift Cindy Noben
E-book Proefschrift Cindy Noben
 

2016_The exchangeability of self-reports and administrative health care resource use measurements

  • 1. The exchangeability of self-reports and administrative health care resource use measurements: assessement of the methodological reporting quality Cindy Yvonne Nobena,*, Angelique de Rijkb , Frans Nijhuisc , Jan Kottnerd , Silvia Eversa a Department of Health Services Research, Faculty of Health, Medicine and Life Sciences, Maastricht University, PO Box 616, 6200 MD Maastricht, The Netherlands b Department of Social Medicine, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands c Department of Work and Organizational Psychology, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands d Department of Dermatology and Allergy, Charite-Universit€atsmedizin Berlin, Germany Accepted 28 September 2015; Published online 2 February 2016 Abstract Objective: To assess the exchangeability of self-reported and administrative health care resource use measurements for cost estimation. Study Design and Setting: In a systematic review (NHS EED and MEDLINE), reviewers evaluate, in duplicate, the methodological reporting quality of studies comparing the validation evidence of instruments measuring health care resource use. The appraisal tool Meth- odological Reporting Quality (MeRQ) is developed by merging aspects form the Guidelines for Reporting Reliability and Agreement Studies and the Standards for Reporting Diagnostic Accuracy. Results: Out of 173 studies, 35 full-text articles are assessed for eligibility. Sixteen articles are included in this study. In seven articles, more than 75% of the reporting criteria assessed by MERQ are considered ‘‘good.’’ Most studies score at least ‘‘fair’’ on most of the re- porting quality criteria. In the end, six studies score ‘‘good’’ on the minimal criteria for reporting. Varying levels of agreement among the different data sources are found, with correlations ranging from 0.14 up to 0.93 and with occurrences of both random and systematic errors. Conclusion: The validation evidence of the small number of studies with adequate MeRQ cautiously supports the exchangeability of both the self-reported and administrative resource use measurement methods. Ó 2016 Elsevier Inc. All rights reserved. Keywords: Health care; Utilization; Self-report; Data collection; Validation studies; Systematic review 1. Introduction Essential to the economic evaluation of health care treat- ments is a robust method for collecting data on health care resource use, which can be used to calculate the related costs. Health care resource quantities can be defined in different ways. The cost types for potential inclusion in an economic evaluation are commonly classified into health care costs, patient and family costs, and costs in other sec- tors (absenteeism and presenteeism) [1,2]. Universally recognized methods of collecting resource use data are lacking, and a wide variety of methods can be used. Resource use data are most often measured through patient self-reports (e.g., cost questionnaires, diaries, interviews) or through administrative records (e.g., administrative systems of care providers or insurance companies). Both methods have advantages and disadvantages. Cautiousness is warranted when calculating costs from pa- tient self-reports because patients may not know or be able to recall precisely their length of stay and diagnoses, two of the primary cost drivers [3]. Furthermore, self-reports are more burdensome for patients, and the different timing of data collection might influence accuracy. 
Administrative data collection methods are assumed to be the most objec- tive and accurate source of information although there is no common reference standard for measuring resource use. However, as health care provision has become more inte- grated, economic evaluations increasingly need to include information from many different care providers. It is becoming more difficult and expensive to review the re- cords from the many providers involved in administering treatments to patients. Furthermore, given prolonged sur- vival rates and the aging population, it is expensive to Conflict of interest: None. Funding: None. * Corresponding author. Tel.: þ31 43 38 81 834; fax: þ31 43 38 84162. E-mail address: c.noben@maastrichtuniversity.nl (C.Y. Noben). http://dx.doi.org/10.1016/j.jclinepi.2015.09.019 0895-4356/Ó 2016 Elsevier Inc. All rights reserved. Journal of Clinical Epidemiology 74 (2016) 93e106
  • 2. What is new? Key findings Results of good methodological studies assessed by MeRQ cautiously indicate the exchangeability for both self-reported and administrative resource use measurement methods. What this adds to what was known? Only a few studies were conducted using rigorous methods. What is the implication and what should change now? The methodological reporting quality of health care resource use measurements should be assessed before using the data for cost estimations. obtain longitudinal treatment information by accessing medical records at multiple points in time. Even when administrative data are available and accurate, limits on benefit coverage might require the collection of health care utilization data from other sources (i.e., alternative health care and out-of-pocket payments) to calculate the economic costs. The major advantages and disadvantages of patient self-reported vs. administrative measurements are listed in Table 1. Considerable research effort is concentrated on the appropriate measurement of outcomes and the policy impli- cations of economic evaluation. The methods for collecting the resource use data are relatively neglected [4]. No stan- dard methodology for estimating health care resource use is developed yet. Consequently, the choice for a collection method for resource use data is often a matter of practi- cality instead of a decision based on methodological considerations. Previous studies primarily focus on the accuracy of pa- tient recall and the impact of patient self-reported methods on the accuracy of estimates [3,5,6], and not on the meth- odological reporting quality (MeRQ) of these studies. Because of different definitions, the lack of standard statis- tics regarding the validation evidence (e.g., agreement, reli- ability, validity, and accuracy), and inconsistent MeRQ, caution is warranted when interpreting the study findings. Similar to all other areas of empirical research, high MeRQ is a prerequisite for interpreting the validation evi- dence provided. The aim of this systematic review is there- fore to assess, based on the MeRQ, whether the validation evidence of studies on health care resource use measure- ments that are intended for cost estimation support the exchangeability of both self-reported and administrative methods of data collection. The research questions are as follows: 1. Whatarethe characteristicsofstudiesthatcompareself- reported with administrative methods of measuring health care resource use? 2. What is the MeRQ of the studies from which this ev- idence is derived? 3. What validation evidence is reported when comparing self-reported with administrative methods of collect- ing health care resource use data that is intended for cost estimation? 4. Given adequate MeRQ, does the validation evidence support the exchangeability of both self-reported and administrative measurement methods for resource use? 2. Methods This systematic review was planned, conducted, and re- ported in adherence to the PRISMA reporting guidelines [7]. 2.1. Study eligibility Original research in any language that had as a stated purpose the comparison of resource use data gathered using patient self-reports with data gathered using administrative methods for application in costing studies is included in this review. Identification of health care resource use should serve as an item for costing and application in trial-based economic evaluations. 
Therefore, studies that solely modeled or described patterns of health care use or studies that examined the impact of various patients’ characteris- tics on congruency were excluded from the analysis. Only studies that contained validation arguments in the interpre- tation of each study were eligible. Minimal requirements for validation arguments included identification of health care resource use as an item for costing; identification of collection methods for resource use data; and identification of validation evidence (i.e., agreement, reliability, and val- idity). Because of the lack of consistent terminology, some uncertainty existed as to the appropriate application of these specific terms. In cases of conflict, consensus was reached through discussion among the reviewers. Patient self-reports were considered if they were made solely by the patient; articles using proxies for self- reports were therefore excluded. Any type of direct health care resource use (e.g., a visit to a clinical provider, medi- cation usage) and any type of data collection instrument or method (e.g., diary, interview, questionnaire, GP record, in- surance database) were considered. 2.2. Study identification A search in NHS EED, by means of the Cochrane Li- brary or the Center for Reviews and Dissemination, along with a supplementary search in MEDLINE (PubMed) was conducted, which is generally an appropriate strategy for identifying all published economic evaluation studies related 94 C.Y. Noben et al. / Journal of Clinical Epidemiology 74 (2016) 93e106
  • 3. to health care that are relevant [8,9]. Details of the search strategy are provided in Appendix A at www.jclinepi.com. Both text words and MeSH terms were combined with AND and OR for ‘‘health care,’’ ‘‘utilization,’’ ‘‘patient self-report,’’ ‘‘administrative data,’’ ‘‘cost analysis,’’ ‘‘valid- ity,’’ ‘‘accuracy,’’ ‘‘reliability,’’ ‘‘agreement.’’ Reference and citation lists of included studies were hand searched to iden- tify additional studies. International experts were consulted via the ‘‘International Health Economics Discussion List’’ to identify a wider range of relevant studies [10]. An archive of collection methods for health care utilization data called Database of Instruments for Resource Use Measurement [4] was used to test the sensitivity of the search. Titles and ab- stracts were screened for inclusion, and full texts were screened once the articles were judged eligible for inclusion or uncertain. In cases of conflict, consensus was reached through discussion among the reviewers. 2.3. Instrument for methodological reporting quality A new tool for appraising the MeRQ for health care resource use measurement was developed on the basis of two guidelines for reporting: Guidelines for Reporting Reli- ability and Agreement Studies (GRRAS) [11] and the Stan- dards for Reporting Diagnostic Accuracy (STARD) [12]. The MeRQ tool aimed to assess the methodological quality of studies reporting on the validation evidence of administra- tive data compared to self-reported data for cost estimation in economic evaluation. STARD is a guideline for testing diagnostic accuracy and thus for research including a refer- ence test. Although this is less relevant for research on health care resource use because reference tests regarding data collection methods for health care resource use are not available, overlap was found in some reporting domains. As mentioned in the Section 1, administrative data collec- tion methods are sometimes assumed to be the gold stan- dard. Therefore, information on this type of accuracy was included in the review as a subconcept of validity. GRRAS complimented STARD in the MeRQ by emphasizing on items specific to reliability and agreement. The operated adaptation of both guidelines into the appraisal tool for eval- uating the MeRQ can be found in Appendix B at www. jclinepi.com. Adaptations were made by focusing on the methodological content and by deleting the directions and layout positions. From both instruments, all aspects that rep- resented (1) reporting quality of studies comparing measure- ments (part 1 of the instrument) and (2) MeRQ of the validation evidence presented in those studies comparing measurements (part 2 of the instrument) were selected. Finally, these were rephrased into questions to score the items in terms of level of quality. Table 2 provides an over- view of the main criteria for the evaluation of the validity, reliability, or agreement of self-reported health care resource use measurement instruments compared with health care resource use measured using administrative data. Table 1. Comparing the advantages and disadvantages of patient self-reported vs. 
administrative measurements of resource use Measurement of resource use Advantages Disadvantages Self-reports Low costs Possibility to include all societal costs (formal and informal, reimbursed and nonreimbursed) Higher visibility of additional nonspecific costs (not directly related to the disease or intervention) Selective nonresponse Lower compliance in completing over a longer period Burdensome for patients Tendency to record more socially acceptable answers (respondent bias, recall bias) Administrative data No burden for patients Data already registered Accurate and detailed information (on type and volume of care delivered) Limited feasibility (privacy, time lag) Up (wrong) coding Cooperation of many different databases necessary No visibility of additional resource use not directly related to the health care institution and/or insurance company (no routine records for patient and family resource use, additional expenditures, input of informal caregivers, presenteeism, etc) Impossible to include unrelated costs, informal costs and out-of-pocket expenses Table 2. Criteria for the evaluation of the validity, reliability, or agreement of a self-reported health care resource use measurement instrument compared with health care resource use measured using administrative data MeRQ (Methodological Reporting Quality) Part 1: The reporting quality of studies comparing measurements Each study. . explicitly identifies of the study aim . proposes validation arguments and strategies for interpreting validation evidence . clearly reports the measurement methods; including relevant evidence . provides a rationale for the relationship between the measurement methods . elaborates on the measurement procedures of both the self-reports and the administrative data . describes the time interval between the data collections . reports on the population characteristics Part 2: The methodological reporting quality of the validation evidence presented in the studies comparing measurements Assessing the validity, reliability, or agreement, the studies defined. . the statistical methods for validation estimates . the retrieved validation estimates (statistics) . the methods for statistical uncertainty . the interpretation of the validation evidence Part 3: Additional information 95C.Y. Noben et al. / Journal of Clinical Epidemiology 74 (2016) 93e106
  • 4. Developing MeRQ was systematically processed to appraise the available studies included in this review. The MeRQ tool provides an operational definition of the proper- ties considered most important for this research as it had to combine the accuracy and agreement perspective, and such a tool is not available at the moment. Consequently, the tool was built on existing evidence and available consensus (i.e., the GRRAS and STARD instruments). Next, the items in the MeRQ tool were systematically selected and formu- lated. Both GRRAS and STARD were used to select those elements regarding MeRQ that apply to agreement studies. Creating MeRQ was thus built on the work done by experts who had substantial contributions and publications in the field of agreement and reliability work (e.g., the GRRAS group). Additionally, one of the coauthors of this review is a member of the GRRAS group. Then, those elements from GRRAS and STARD that apply to the assessment to report on the exchangeability of self-reports and administrative data, and thus on agreement studies, were systematically selected. During several collective meetings with the au- thors, the selection and formulation of these elements were extensively discussed until consensus was reached. Because two authors have substantial expertise in psychometrics, we could assure the selection of elements was critically assessed and justified from a psychometric point of view. MeRQ was rated using the scoring criteria: good (explicit reporting in the study); fair (reporting makes it possible to deduce what is expected); poor (not explicitly reported and difficult to deduce what is expected), not reported (no information provided or to be deduced, also unregistered, un- reported [e.g., no information on the measurement proce- dure]), not addressed (not designating the measurement, also undirected [e.g., information on the measurement proce- dures, although not clear whether they relate to the self- reports or the administrative data]), or not applicable. For each study, the quality of reporting on following criteria was assessed: (1) explicit identification of the study as an evaluation of the reliability, agreement or validity (accuracy) of a self-reported health care resource use measurement in- strument compared with health care resource use measured using administrative data; (2) validation arguments and stra- tegies for interpreting validation evidence; (3) measurement methods (eligibility criteria) with a critical review of evidence relevant for the study; (4) rationale for the relationship be- tween the self-reports and the administrative data; (5) methods or measurement procedures for the self-reports; (6) methods or measurement procedures for the administrative data; (7) time interval between self-reports and administrative data collection; (8) population characteristics; (9) statistical methods for validation estimates; (10) validation estimates; (11) methods for statistical uncertainty; (12) concluding vali- dation argument and/or interpretation of validation evidence. All authors (C.Y.N., A.d.R., S.E., F.N., and J.K.) contrib- uted to the rating of the MeRQ of the studies. The included articles were divided per paired group of reviewers. First, the scoring was conducted individually, and then the results of the scoring were pairwise compared. The pairing was C.N. and S.E. (pair 1), S.E. and J.K. (pair 2), J.K. and A.d.R. (pair 3), A.d.R. and C.N. (pair 4), C.N. and F.N. (pair 5). 
In cases of conflicting scores, which occurred in five studies [13e17], a third reviewer was added to the pair and consensus was reached through discussion. 2.4. Validation evidence There are different concepts in the literature regarding the assessment of measurement validity (i.e., reliability, agreement, validity, and accuracy) [18]. In this systematic review, validation evidence is considered an umbrella term whereby the terms reliability, agreement, and validity (and accuracy as a subconcept of validity) are used to provide in- formation about the quality of the measurements. Reli- ability is defined here as the ratio of variability between the subjects to the total variability of all measurements in the sample. Reliability therefore reflects the ability of a measurement to differentiate between subjects or objects (i.e., discriminative ability). Statistical approaches include intraclass correlation and kappa calculations. Agreement is defined as the degree to which scores are identical and is often expressed in percentages (i.e., proportions of agree- ment, ranges, limits of agreement). Diagnostic accuracy re- fers to the degree of agreement between the information from the test being evaluated (index test) and reference standard, and can be expressed in many ways, that is, sensi- tivity, specificity, likelihood ratios, odds ratio, area under a receiver operating curve. For costing measures, the concept of diagnostic accuracy cannot be used as there is no consensus on what the reference standard should be. How- ever, as mentioned in the Section 1, administrative data collection methods are often, but not supported by scientific data, assumed to be the gold standard. If information on this type of accuracy was available, it is included in the review as a subconcept of validity. Complete and accurate report- ing makes it possible to detect the potential for bias in the study (internal validity) and to evaluate the generalizability and applicability of the results (external validity). 2.5. Best evidence Studies with ‘‘good’’ reporting on four ‘‘minimal criteria for reporting’’ of MeRQ were compared in detail. These minimal criteria include as follows: (1) clear description of the measurement methods (eligibility criteria) with a crit- ical review of evidence relevant for the study; (2) rationale for the relationship between the self-report and the adminis- trative data; (3) methods or measurement procedures for self-reports; and (4) methods or measurement procedures for the administrative data. A ‘‘good reported’’ score on these criteria is essential to meaningfully interpret the re- maining criteria as presented in the MeRQ (i.e., explicit identification of the study as an evaluation of the validity, reliability, agreement, or accuracy of a self-reported health 96 C.Y. Noben et al. / Journal of Clinical Epidemiology 74 (2016) 93e106
  • 5. care resource use measurement instrument compared with administrative data; proposed validation arguments and stra- tegies for interpreting validation evidence; time interval be- tween self-reports and reference standard data collection mentioned; statistical methods for validation estimates defined; validation estimates reported; methods for statisti- cal uncertainty defined; concluding validation argument and/or interpretation of validation evidence presented). The ‘‘minimal reporting criteria’’ can therefore be seen as prerequisites and are based on the collective opinion of the authors because the rating of the aforementioned broad- er criteria would not have an interpretable meaning if the four minimal reporting criteria were not at least rated ‘‘good reported.’’ In essence, the research team concluded that the broader criteria are conditional on ‘‘the four mini- mal criteria of reporting.’’ 2.6. Data extraction Data were extracted independently. Basic information was extracted on the year of publication and country of origin, study design (e.g., sample of enrollees, case- control study, randomized controlled trial), description of the population (e.g., workers, patients/disease group) and the sample size, the type of health care resource use measured (e.g., medication, GP visits, hospital admissions), the method of self-report (e.g., questionnaire, diary), the type of administrative measure (e.g., hospital record, insur- ance record), and the data collection period. See Table 3. The MeRQ score is reported per study by presenting the score on each of the reporting criteria (Table 4). The scoring was abstracted independently and in duplicate for all studies and all criteria, and conflicts were resolved by consensus. Validation evidence, as presented in Table 5, included the main statistical analysis, the validation esti- mates, and the statistical uncertainty. In the end, studies that scored ‘‘good’’ on four ‘‘minimal criteria for reporting’’ of MeRQ were compared in detail. 3. Results 3.1. Study selection Our search in MEDLINE (PubMed) resulted in 69 hits, and 88 hits were found in the NHS Economic Evaluation Database, which is included in The Cochrane Library. Af- ter consulting experts, 11 articles were added. Citation tracking and hand searching resulted in nine additional hits. Four duplicates were removed, and the 173 remaining re- cords were screened by title and abstract. At this stage, 138 records were excluded. The main reasons for exclusion were that the studies did not aim to measure health care resource use for the purpose of costing analysis, did not provide information on the estimation of health care costs, and did not address the economic significance or the impact on the costs of care but solely described patterns of health care use or provided information on how popula- tion characteristics influenced the agreement or disagree- ment between both measurement methods. Thirty-five full-text articles were assessed for eligibility, whereof another 19 full-text articles were excluded. 
The most com- mon reasons for exclusion were that the studies reported proxy-recorded resource use; the main aim was not the comparison of health care resource use measured via self-reports with use measured via administrative methods; the studies did not measure health care resource use as items of costing or did not address the economic impact on costs of care; the studies were a case-study or review; or the resource use was solely focused on the usage of one specific type of medication. In the end, 16 articles were included in this comprehensive literature review [13e17,19e29]. A flow chart describes the process of in- clusion (see Fig. 1). 3.2. Characteristics of the studies Table 3 presents an overview of the characteristics of the selected studies (N 5 16). Moststudiesexaminedatleast one ofthe followingtypesof resource use: specialist care [13,14,19e25,27e29], primary care [13,16,21,25,26], medication use [17,20,26,27,29], or hospitalization [13e15,21,23,24,28,29]. Most studies collected information on multiple types of resource use [13,14,20,21,23e28]. Eleven studies were conducted in Eu- rope [13,14,16,17,19,21,22,24,25,28,29], two in Canada [20,23], one in the USA [15], one in Australia [27], and one in New Zealand [26]. The population characteristics varied from patients with mental health complaints [13,24,25,29] to musculoskeletal disorders [19,22,26], chronic disorders [14,15,20,23], primary care attenders [16], acute ward patients [28], and cancer patients [17,21,27]. In most studies, a randomized controlled trial was conducted [14,19,21,22,25e29],ofwhichtwowereconductedina multi- center setting [13,17]. In four studies, the study design was based on a sample of enrollees from another study [24], pro- gram [15], or health care service [16,20]. One study used a population-based cohort [23]. In eight studies, health care resource use was measured with a questionnaire [16,17,20,22,25,26,28,29], in three studies with an interview [13,23,24], in another three studies with a diary [17,19,27], in two studies with a tele- phone survey [15,21], and one with a booklet [14]. The studies that measured health care resource use based on administrative records used data from medical records [13], including care provider registrations [14,20,24], re- cord forms [16,17,21,22,25,26], and provider reports [28]. Six studies used health insurance data [15,19,20,23,27,29]. In most studies, the data collection was retrospective [13,15,16,21e26,28,29] with the largest recall period of 97C.Y. Noben et al. / Journal of Clinical Epidemiology 74 (2016) 93e106
Table 3. Characteristics of studies comparing measurement methods for health care resource use

Byford (2007) [13]. Country: England and Scotland. Design: multicenter RCT. Population: adults with recurrent episodes of deliberate self-harm (N = 272). Resource use: inpatient days; outpatient, accident, and ER attendance; GP, practice nurse, community psychologist, psychiatric nurse, and occupational therapist contacts. Self-report: interviews (Client Service Receipt Inventory). Administrative record: medical records. Data collection: retrospective, at baseline and 6 and 12 mo after trial entry.

Goossens (2000) [19]. Country: The Netherlands. Design: RCT. Population: patients with chronic low back pain (N = 40). Resource use: physiotherapy contacts and specialist contacts. Self-report: cost diary. Administrative record: health insurance company data. Data collection: prospective, each diary covering a period up to 4 weeks.

Guerriere (2006) [20]. Country: Canada. Design: sample of enrollees from a cystic fibrosis clinic. Population: patients with cystic fibrosis (N = 110). Resource use: home-based health care professional appointments, health care professional appointments outside home, laboratory and diagnostic tests, prescription medications. Self-report: questionnaire (The Ambulatory and Home Care Record). Administrative record: Ontario Health Insurance Plan database, Ontario Home Care Administrative System database, hospital pharmacy database. Data collection: daily completion over a 4-week data collection period.

Hoogendoorn (2009) [14]. Country: The Netherlands. Design: RCT. Population: patients with chronic obstructive pulmonary disease (N = 175). Resource use: daycare treatment, hospital admissions, hospital days, medical specialist visits, physiotherapist visits, respiratory nurse visits, dietician visits. Self-report: cost booklets. Administrative record: care provider registrations. Data collection: prospective, each booklet covering a period of 4 weeks, collected every 2 mo.

Janson (2005) [21]. Country: Sweden. Design: RCT. Population: patients with potentially curable colon cancer (N = 40). Resource use: outpatient consultations, doctor consultations, GP and district nurse, surgical ward readmissions. Self-report: telephone surveys. Administrative record: clinical case record forms. Data collection: retrospective, every 25 weeks with a 12-week follow-up period.

Kennedy (2002) [22]. Country: United Kingdom. Design: RCT. Population: patients with a musculoskeletal condition (N = 52). Resource use: physiotherapy. Self-report: questionnaires. Administrative record: outpatient department records, hospital physiotherapy department's records. Data collection: retrospective, 3 and 12 mo after initial outpatient appointment.

Longobardi (2011) [23]. Country: Canada. Design: population-based cohort. Population: adults with inflammatory bowel disease (IBD) (N = 352). Resource use: hospital and physician visits (IBD and non-IBD related). Self-report: interview (standardized). Administrative record: health insurance provider databases. Data collection: retrospective, 12 mo.

Marks (2003) [15]. Country: United States. Design: sample of enrollees in a diabetes management program. Population: diabetics (N = 1,230). Resource use: hospitalizations, emergency room visits. Self-report: telephone assessment. Administrative record: health insurance claims. Data collection: retrospective, 6 mo.

Mirandola (1999) [24]. Country: Italy. Design: sample of enrollees from the South-Verona Outcome Project. Population: patients in contact with the South-Verona Community Psychiatric Service (N = 339). Resource use: contacts with psychiatrists, psychologists, community and outpatient nurses, and social workers; admissions in sheltered accommodations; rehabilitation group contacts; day hospital contacts. Self-report: interview (Client Service Receipt Interview). Administrative record: psychiatric case register. Data collection: retrospective, twice yearly.

Mistry (2005) [25]. Country: United Kingdom. Design: RCT. Population: patients consulting their GP regarding symptoms of depression (N = 85). Resource use: contacts with GPs and other practice staff, contacts with community mental health teams, psychiatric and general hospital contacts. Self-report: questionnaires. Administrative record: GP records (community-based or manual records). Data collection: retrospective, at 3-mo intervals over the course of 1 yr.

Patel (2005) [16]. Country: England. Design: random sample of consecutive primary care attenders. Population: primary care attenders (N = 229). Resource use: GP visits. Self-report: questionnaires (variant of the Client Service Receipt Inventory). Administrative record: GP case records. Data collection: retrospective, 6 mo.

Pinto (2011) [26]. Country: New Zealand. Design: RCT. Population: patients with hip or knee osteoarthritis (N = 50). Resource use: GP contacts, total number of medications. Self-report: questionnaire (Osteoarthritis Cost and Consequences Questionnaire). Administrative record: medical records. Data collection: retrospective, 3 mo.

Pollicino (2002) [27]. Country: Australia. Design: RCT. Population: patients with apparently resectable non-small-cell lung cancer (N = 24). Resource use: broad types of services (unreferred attendances, specialist attendances, diagnostic imaging, pathology tests). Self-report: diaries. Administrative record: Health Insurance Commission. Data collection: prospective, fortnightly the first year and monthly the year after.

Richards (2003) [28]. Country: United Kingdom. Design: RCT. Population: acute ward patients aged 60 yr or older (N = 185). Resource use: hospital admissions, visits by community health care staff. Self-report: interviewer-administered questionnaire. Administrative record: health care provider reports (Patient Administration System, Integrated Community System, GP questionnaire). Data collection: retrospective, 12 weeks.

Van den Brink (2004) [17]. Country: The Netherlands. Design: multicenter RCT. Population: patients with rectal cancer (N = 147). Resource use: medications, care products. Self-report: diary, questionnaire. Administrative record: medical records (hospital admission system and pharmacy records). Data collection: prospective (diary), weekly from discharge to 3 mo and monthly from 3 to 6 mo; retrospective (questionnaire), at 3 and 6 mo.

Zentner (2012) [29]. Country: Germany. Design: cluster RCT. Population: people with severe mental illness (N = 82). Resource use: hospital admissions, ambulant psychiatric contacts, medication. Self-report: questionnaire (Client Sociodemographic and Service Receipt Inventory). Administrative record: health insurance administrative records (AOK). Data collection: retrospective, 6 mo for hospital admissions, 3 mo for ambulant contacts, 1 mo for medication.

Abbreviations: RCT, randomized controlled trial; GP, general practitioner.
3.3. MeRQ scores

Table 4 summarizes the methodological reporting quality of each study, as assessed by the reviewers using the MeRQ tool. The reporting of the elements was scored "good," "fair," "poor," "not reported," "not addressed," or "not applicable." In seven of the 16 studies, more than 75% of the criteria (≥9) were considered "good" (explicitly reported) [21–24,26,28,29]. Overall, most studies scored at least "fair" on most of the MeRQ criteria.

One study [19] did not explicitly identify its aim as evaluating the validity of self-reported measurements compared with administrative measurements; only at the end of the methods section was the aim of comparing both methods mentioned. The deducibility was low, resulting in a poor reporting quality for the first criterion. Another study [13] did not address any validation arguments or strategies for interpreting the validation evidence. The same study lacked a comprehensive description of the measurement procedures for self-reports and administrative data; the only information provided concerned the health care services that were measured and the time intervals between both collection methods. This study scored "poor" on two quality criteria. Seven of 16 studies did not report the time intervals between the self-reported data collection and the administrative data collection [14,15,21–25]. Pollicino et al. [27] did not report statistical methods for the assessment of validation estimates; they instead reported the difference counts between agreement and disagreement. Furthermore, no methods for statistical uncertainty were reported [27]. Five other studies also failed to report methods for statistical uncertainty [14,17,19,25,29]. Van den Brink et al. [17] calculated concordance between types of medication and care products reported using both methods; percentages were provided in figures and text, but it was impossible to deduce the statistical methods used and the validation estimates reported. Although Mirandola et al. [24] reported a bias correction factor for precision, methods for statistical uncertainty were lacking, and the study scored "poor" on this reporting element.

Table 4. MeRQ scores
Scored criteria, in column order: (1) aim to compare validation evidence; (2) strategies for interpreting validation evidence; (3) instrument eligibility criteria; (4) rationale for relation; (5) methods or procedures for self-reports; (6) methods or procedures for administrative data; (7) time interval; (8) population characteristics; (9) statistical methods for validation estimates; then the type of validation evidence examined; (10) validation estimates; (11) statistical uncertainty; (12) conclusion and interpretation of validation evidence.

Byford (2007):        ± ± ± + − − + ± + | Agreement | + + +
Goossens (2000):      − ± + + + ± + + + | Validity  | + ? +
Guerriere (2006):     ± − ± ± − + + + + | Agreement | + + +
Hoogendoorn (2009):   ± ± ± + + + ? + + | Agreement | + ? +
Janson (2005):        ± + + ± + + ? + + | Agreement | + + +
Kennedy (2002):       + ± + + + + ? + + | Agreement | + + +
Longobardi (2011):    + + + + + + ? + + | Validity  | + + +
Marks (2003):         ± ± ± + + + + + + | Agreement | + + ±
Mirandola (1999):     + + + + + + ? + + | Agreement | + − +
Mistry (2005):        + + ± ± ± ± + ± + | Agreement | + + ±
Patel (2005):         ± ± − + ± ± + ± + | Agreement | + + +
Pinto (2011):         + + + + + + + + + | Validity  | + + +
Pollicino (2002):     + ? + + − − + − ? | Agreement | ? ? −
Richards (2003):      ± + + + + + + ± + | Agreement | + + +
Van den Brink (2004): ± + ± + + + + ± + | Validity  | + ± ±
Zentner (2012):       + + + + + + + + + | Validity  | + − +

Abbreviation: MeRQ, Methodological Reporting Quality.
+, good (explicit reporting in the study); ±, fair (reporting in the study makes it possible to deduce what is expected); −, poor (not explicitly reported in the study and difficult to deduce what is expected); ?, not reported.
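A minimal sketch, in Python, of how the ">75% of the criteria (≥9 of 12) rated good" tally above can be reproduced from Table 4; the three rows are transcribed from the table, while the symbol encoding and the code itself are illustrative additions, not the authors' procedure.

# Tally "good" MeRQ ratings per study (illustrative, not the authors' code).
# Encoding of Table 4 rows: '+' good, '0' fair, '-' poor, '?' not reported.
MERQ_EXCERPT = {
    "Byford (2007)":    "000+--+0++++",  # 6 of 12 criteria rated good
    "Pinto (2011)":     "++++++++++++",  # 12 of 12 criteria rated good
    "Pollicino (2002)": "+?++--+-???-",  # 4 of 12 criteria rated good
}

THRESHOLD = 9  # "more than 75% of the criteria" corresponds to >=9 of 12

for study, scores in MERQ_EXCERPT.items():
    n_good = scores.count("+")
    verdict = "meets" if n_good >= THRESHOLD else "does not meet"
    print(f"{study}: {n_good}/12 rated good; {verdict} the >75% criterion")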
Table 5. Validation evidence

Byford (2007) [13]. Statistical analysis: Lin concordance correlation coefficient (rc). Validation estimates: inpatient days (rc 0.658); outpatient appointments (rc 0.231); accident and ER attendance (rc 0.760); general practitioner contacts (rc 0.631); practice nurse contacts (rc 0.056); community psychologist contacts (rc 0.063); community psychiatric nurse contacts (rc 0.349); occupational therapist contacts (rc −0.001). Statistical uncertainty: 95% confidence interval; 95% limits of agreement.

Goossens (2000) [19]. Statistical analysis: nonparametric Wilcoxon signed rank test. Validation estimates: no significant difference between the two methods for contacts with specialists (P = 0.93); significant difference for physiotherapist visits (P = 0.021). Statistical uncertainty: NA.

Guerriere (2006) [20]. Statistical analysis: observed agreement (Po), simple kappa (k), quadratic weighted kappa (kw). Validation estimates: physiotherapy (Po 1.00; k 1.00); clinic visits (Po 0.98; k 0.90); hospitalization (Po 0.99; k 0.88); diagnostic tests (Po 0.94; k 0.78); home-based nursing or personal support (Po 0.97; k 0.75); general practitioner visits (Po 0.96; k 0.69); laboratory tests (Po 0.92; k 0.51); medical specialist visits (Po 0.85; k 0.41); diabetes therapy (Po 0.95; kw 0.5); antibiotics/antifungals (Po 0.81; kw 0.63); gastrointestinal therapy (Po 0.78; kw 0.55); pulmonary therapy (Po 0.78; kw 0.5). Statistical uncertainty: 95% confidence interval.

Hoogendoorn (2009) [14]. Statistical analysis: Spearman rank correlation coefficient (rho); proportion of perfect agreement (p.a., %). Validation estimates: daycare treatment (rho 0.55; p.a. 87%); daycare treatment for COPD (rho 0.49; p.a. 98%); hospital admissions (rho 0.93; p.a. 88%); hospital admissions for COPD (rho 0.94; p.a. 95%); total hospital days (rho 0.91; p.a. 79%); total hospital days for COPD (rho 0.93; p.a. 93%); medical specialist visits (rho 0.70; p.a. 8%); physiotherapy (rho 0.86; p.a. 7%); respiratory nurse visits (rho 0.65; p.a. 11%); dietician visits (rho 0.64; p.a. 29%). Statistical uncertainty: NA.

Janson (2005) [21]. Statistical analysis: Spearman correlation coefficient (SCC). Validation estimates: outpatient consultations, doctor (SCC 0.66); general practitioner visits (SCC 0.49); district nurse visits (SCC 0.39); surgical ward visits (SCC 0.72). Statistical uncertainty: 95% confidence interval.

Kennedy (2002) [22]. Statistical analysis: intraclass correlation coefficient (ICC). Validation estimates: physiotherapy (ICC 0.54). Statistical uncertainty: 95% confidence interval.

Longobardi (2011) [23]. Statistical analysis: concordance correlation coefficient (CCC); Pearson correlation coefficient (PCC). Validation estimates: physician visits (CCC 0.58; PCC 0.68); hospital overnights, IBD related (CCC 0.80; PCC 0.88); hospital overnights, non-IBD related (CCC 0.86; PCC 0.93). Statistical uncertainty: bias correction factor.

Marks (2003) [15]. Statistical analysis: kappa (k). Validation estimates: hospitalizations (k 0.6366); emergency room visits (k 0.5390). Statistical uncertainty: 95% confidence interval.

Mirandola (1999) [24]. Statistical analysis: Lin concordance correlation coefficient (rc). Validation estimates: psychiatrists, community contacts (rc 0.521); psychiatrists, outpatient contacts (rc 0.538); psychologists, community contacts (rc 0.495); psychologists, outpatient contacts (rc 0.539); nurses, community contacts (rc 0.45); nurses, outpatient contacts (rc 0.94); social workers, community contacts (rc 0.183); social workers, outpatient contacts (rc 0.186); admissions in sheltered accommodations (rc 0.999); rehabilitation group contacts (rc 0.165); day hospital contacts (rc 0.140); admissions to acute psychiatric ward (rc 0.953). Statistical uncertainty: NA.

Mistry (2005) [25]. Statistical analysis: weighted kappa (kw). Validation estimates: general practitioner surgery contacts (kw 0.370); general practitioner home visits (kw 0.485); psychiatric hospital and community service (kw 0.278); social services (kw 0.021); other hospital and specialist services (kw 0.430). Statistical uncertainty: NA.

Patel (2005) [16]. Statistical analysis: Lin concordance correlation coefficient (rc). Validation estimates: general practitioner visits (rc 0.756). Statistical uncertainty: 95% limits of agreement.

Pinto (2011) [26]. Statistical analysis: Lin concordance correlation coefficient (rc). Validation estimates: general practitioner contacts (rc 0.41); total number of medications (rc 0.63). Statistical uncertainty: 95% limits of agreement; 95% confidence interval.

Pollicino (2002) [27]. Statistical analysis: extent of agreement (absolute numbers, agreed-on number of services used). Validation estimates: unreferred visits (3/23); specialist visits (9/22); diagnostic imaging (5/20); pathology tests (1/14). Statistical uncertainty: NA.

Richards (2003) [28]. Statistical analysis: percentage crude agreement (%); (quadratic) weighted kappa (kw). Validation estimates: hospital readmissions (85.7%; kw 0.42); general practitioner surgery visits (83.0%; kw 0.61); patient-requested general practitioner home visits (41.7%; kw 0.59); routine general practitioner home visits (71.0%; kw 0.71). Statistical uncertainty: 95% confidence interval.

Van den Brink (2004) [17]. Statistical analysis: percentage concordance. Validation estimates: medications and care products (25%). Statistical uncertainty: NA.

Zentner (2012) [29]. Statistical analysis: Spearman rank correlation coefficient. Validation estimates: hospital admissions (0.53); ambulant psychiatric contacts (0.33); medications (0.31). Statistical uncertainty: NA.

Abbreviation: NA, not addressed.
3.4. Validation evidence

The main statistical analyses and statistical uncertainty analyses, as well as the validation estimates, were also captured during basic data abstraction and are presented in Table 5. The statistical methods used to calculate the health care resource use validation estimates varied widely. Most studies examined the correlation between self-reported and administrative measurements of health care resource use, with calculations including the Spearman (rank) correlation coefficient [14,21,29], the Pearson correlation coefficient [17,23], the intraclass correlation coefficient [22], and, most often, Lin's concordance correlation coefficient [13,16,23,24,26]. The Lin coefficient combines measures of precision and accuracy to determine whether the data deviate significantly from perfect concordance. In an attempt to measure agreement, another four studies calculated a (weighted) kappa [15,20,25,28]. One study used Wilcoxon signed rank tests to test for differences between estimates, given nonnormal data distributions [19]. Agreement and proportion of perfect agreement were considered in the analyses of five studies [14,17,20,27,28]. In 56% of the studies, statistical uncertainty was defined. Most studies presented the 95% confidence intervals around the validation estimates [13,15,20–22,26,28]. The 95% limits of agreement were calculated in three studies [13,16,26].
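Because several of the estimates in Table 5 and in the best-evidence comparison below rest on Lin's coefficient and on limits of agreement, the standard textbook forms are given here for reference; they are our addition, not equations reproduced from the included studies. For self-reported values x and administrative values y,

\rho_c \;=\; \frac{2\rho\,\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} \;=\; \rho \cdot C_b,
\qquad
C_b \;=\; \frac{2}{\dfrac{\sigma_x}{\sigma_y} + \dfrac{\sigma_y}{\sigma_x} + \dfrac{(\mu_x - \mu_y)^2}{\sigma_x \sigma_y}},

where \rho is the Pearson correlation (precision) and C_b \le 1 is the bias correction factor (accuracy), the quantity reported, for example, alongside the concordance coefficients of Longobardi et al. The Bland-Altman 95% limits of agreement are \bar{d} \pm 1.96\, s_d, with \bar{d} and s_d the mean and standard deviation of the paired differences d = x - y.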
3.5. Best evidence

In the end, six studies [22–24,26,28,29] scored "good" on all four MeRQ "minimal criteria for reporting." These studies were assessed and compared in detail. Three studies assessed the validity of the methods [23,26,29], whereas the other three assessed the methods based on their levels of agreement [22,24,28].

Longobardi et al. [23] assessed the validity of self-reported data gathered via standardized interviews compared with health care utilization drawn from an administrative database over a 12-month period for inflammatory bowel disease (IBD) patients. The authors concluded that there was good recall of medical contacts.
However, systematic errors occurred in the reported frequency of physician visits (underreporting by 35–45%) and overnight stays in the hospital (overreporting by 25–35%) [23]. Self-reported data correlated reasonably well with the administrative data, with adequate concordance (0.59–0.86) and Pearson (0.68–0.93) correlation coefficients for both IBD- and non-IBD-related overnight stays in the hospital. Discrepancies in hospitalizations were due to failure to place a stay within the recall period, omission of stays following an emergency room visit, mistaken inclusion of day surgery as hospitalization, and misclassification of an IBD stay. The tendency to report higher numbers of overnight stays in the interviews was not statistically significantly different from the numbers in the administrative database, although a difference of this magnitude would likely have been significant in a larger sample with a higher base rate of hospitalizations. The self-reported data also showed lower numbers of physician visits relative to the administrative data, despite the significant correlation. Regardless of whether IBD was inactive, fluctuating, or active across the full 12 months, self-reported physician visits were lower by about one-third compared with the administrative data [23].

Pinto et al. [26] compared the validity of questionnaires and medical records regarding health care resource use over a 3-month period. Lin concordance correlation coefficients were fair to good, ranging from 0.41 for GP contacts to 0.63 for the number of medications reported. Generally, patients reported more medication usage and fewer GP contacts than appeared in the medical records. However, the mean differences between the estimation periods were small: −0.22 for medication use and 0.06 for GP contacts. Participants more accurately reported their nonuse than their use of health care. GP contacts and medication usage had higher levels of sensitivity (93–100%) than specificity (63–100%). The 3-month measurement period might explain this, as the use of health care might be relatively low over this short period of time. Cost estimates for GP contacts agreed with the database results (Lin's concordance correlation coefficient 0.502).
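For clarity, the sensitivity and specificity figures can be read against the usual two-by-two classification; the assignment below (medical record as reference, recorded use as "positive") is our assumption, as Pinto et al. are not quoted on it in the text above:

\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP},

where TP counts record-documented use that was also self-reported and TN counts record-documented nonuse that was also self-reported as nonuse. Under this framing, the interpretation of each figure depends directly on which source serves as the reference standard and which category is treated as positive.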
Zentner et al. [29] assessed the validity of self-reported data gathered using a questionnaire compared with data drawn from health insurance administrative records over a 6-month period. Correlation coefficients were moderate to low (0.53 to 0.31) and reflected statistical differences between the two measurement methods. Hospital admissions measured over a 6-month recall period, ambulant psychiatric contacts measured over a 3-month recall period, and medication use measured over a 1-month recall period were all reported lower in the questionnaire data when compared to the insurance records.
The low correlations between health care resource use measured using questionnaires and insurance registrations were significant, indicating a difference between the measurement methods [29].

Kennedy et al. [22] compared health care use measured via questionnaires with use measured via hospital physiotherapy department records. Agreement was measured by an intraclass correlation coefficient. The intraclass correlation coefficient of 0.54 (95% CI 0.31–0.70) reasonably supports the exchangeability of the questionnaire and the department records. However, self-reports indicated more resource use than the hospital records: a mean of 1.3 (95% CI 0.4–2.2) fewer visits was recorded in the physiotherapy department notes [22].

Mirandola et al. [24] assessed the agreement of self-reported health care use measured via interviews with psychiatric case registers over a 6-month period (biannually). Twelve health care categories were defined, and agreement between self-reports and administrative data varied greatly across categories. Overall, agreement was particularly weak for day-patient (hospital) contacts (0.14) and very strong for nurses' outpatient contacts, number of acute psychiatric ward admissions, and sheltered accommodation (all rc > 0.9). These results translate into overall agreement (0.926) on the total costs. The findings support that self-reported and administrative measurements of health care use are exchangeable when estimating the total cost of services. Nonetheless, as the interview tends to yield higher costs of day and community care than the case registers (rc 0.14 and rc 0.407, respectively) and lower outpatient costs (rc 0.422), one should acknowledge the importance of adequate procedures for analyzing the complexity of cost components [24].

Richards et al. [28] assessed the agreement of self-reported data via interviewer-administered questionnaires compared with health care providers' reports over a 12-week period. Recall periods varied for different resources (4–12 weeks). Agreement and systematic differences in the distribution of paired responses were examined to assess the comparability of the methods. Agreement ranged from 42% to 93% and was generally over 72%. Systematic differences were observed when comparing self-reports with administrative data for routine primary care home visits (overreporting). Self-reports indicated less resource use for hospital readmissions, requested primary care home visits, surgery consultations, district nursing, and physiotherapy than provider records over these short time frames (4–12 weeks). The number of visits was gathered only for hospital readmissions, routine and requested primary home care, and surgery visits. Kappa coefficients were moderate or good for these four comparisons (kappa 0.42–0.86), although the 95% confidence intervals of the kappa coefficients were quite wide for all comparisons. Systematic bias was indicated for hospital readmissions and surgery visits (P < 0.05), and marginally significant differences (P = 0.05–0.10) were indicated for requested and routine primary home care visits. There were adequate levels of crude agreement between patient-reported use of health care services and usage data from health care providers; however, the analysis of bias between pairs revealed systematic differences and further marginally significant differences in the response patterns. Except for primary care routine home visits, patients reported less health care use than appeared in the health care provider records [28].
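As an illustration of how the agreement statistics used across these best-evidence studies can be computed, the following Python sketch applies crude percentage agreement, quadratic weighted kappa (as in Richards et al.), and the Spearman rank correlation (as in Zentner et al.) to simulated paired visit counts; the data and the package choices (NumPy, SciPy, scikit-learn) are ours, not the studies'.

import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
admin = rng.poisson(2.0, size=100)          # visit counts in provider records
noise = rng.integers(-1, 2, size=100)       # recall error of -1, 0, or +1 visits
self_report = np.maximum(admin + noise, 0)  # self-reported counts, floored at 0

crude = np.mean(self_report == admin)       # exact-count (crude) agreement
kw = cohen_kappa_score(self_report, admin, weights="quadratic")
rho, p = spearmanr(self_report, admin)

print(f"crude agreement: {crude:.0%}")
print(f"quadratic weighted kappa: {kw:.2f}")
print(f"Spearman rho: {rho:.2f} (P = {p:.4f})")

As the conclusions below note, correlation-type statistics such as Spearman's rho capture only the strength and direction of association, whereas the kappa and crude agreement capture how often the two sources actually coincide.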
4. Conclusions

This study draws attention to (1) the methodological reporting quality (MeRQ) and (2) the validation evidence of studies comparing self-reported with administrative data collection methods used in measuring health care resource use for cost estimation. Based on the small number of six studies with adequate MeRQ presented in this review, the exchangeability of self-reported and administrative health care resource use measurements can only be cautiously supported by the presented validation evidence (e.g., because of differences in reported numbers of visits and heterogeneous or weak correlation coefficients). No evidence can be provided to suggest that one measurement is better than the other. Across these six studies, when measured, patients generally tended to report lower resource use, mainly regarding physician and physiotherapy contacts, than the resource use gathered via administrative data.

The most commonly reported source of validation evidence was the calculation of correlation coefficients indicating the strength and direction of association. This approach is insufficient to indicate agreement or absolute measurement errors. Further attention should be paid to methods that enhance adequate MeRQ and the trustworthiness of validation evidence supporting the exchangeability of self-reports and administrative data for the purpose of cost estimation. Notwithstanding, self-reported data appear to be able to provide adequate general estimates of health care use when administrative data are not available.

4.1. Limitations

A limitation of the study is the limited generalizability of the findings due to the great diversity among studies. First, most studies collected data on multiple types of resource use and in divergent populations. However, the validation evidence was not necessarily more trustworthy when the population being assessed (e.g., patients with mental health complaints) was directly linked to the type of health care resource use relevant to the illness (e.g., psychologists). Second, as most studies were conducted as randomized controlled trials, their main focus was on assessing the effectiveness and/or cost-effectiveness of an intervention or policy adjustment; comparing the self-reported data with administrative data was usually not the primary aim.
Third, sample sizes were small. In most cases, existing populations gathered for a completely different research purpose were assessed in a secondary analysis of agreement levels.

4.2. Discussion

Methodological reporting quality was assessed with a newly developed appraisal tool, MeRQ, derived by combining items from existing guidelines, namely GRRAS and STARD. The MeRQ tool was developed to ensure the reporting of the key elements required to assess the validation evidence and the risk of bias. It was difficult to appraise the methodological rigor when key information was reported vaguely or not at all. Interpreting validation evidence requires knowledge of the methods and statistics, but even then there are no standard statistics, and different interpretations are possible. Although the use of the MeRQ tool was feasible in the current systematic review, further research is warranted to assess its validity.

Adequate MeRQ was found in a minority of the studies. The six studies assessed in the best evidence section demonstrate the complexity of comparing different data sources. Data sets contained both random and systematic errors. Systematic errors were generally detected by comparing two imperfect sources of information, and taking directional trends into account might explain the discrepancies. Possible sources of random error include problems in recall (for self-reports) and data entry errors (for administrative data); in both, interpretation or coding differences might explain random errors. In addition, differential bias may occur in trials examining innovative services, particularly where the boundaries between specific service providers are blurred from the patient's perspective. These discrepancies in health care resource use that relate to cost elements might be of concern, especially when they have a large impact on the overall costs of care (e.g., hospital admissions, complex care packages).

Although the difference between two methods of data collection might be statistically significant, it might not be economically relevant, or vice versa. For example, if the health care resource use being researched (e.g., physiotherapy contacts) were one of the dominant costs in the evaluation (i.e., due to population-specific health complaints), then the addition or subtraction of one extra visit per patient might not be statistically significant but would be economically relevant. Therefore, the usage and identification of items for making cost estimations need to be properly addressed a priori.
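A hypothetical calculation, with invented numbers purely for illustration, makes the distinction concrete: a systematic difference of \Delta v visits per patient at a unit cost c across n patients shifts the total cost estimate by

\Delta C = n \cdot c \cdot \Delta v,

so, with n = 200 patients and c = 30 euros per physiotherapy visit, a one-visit difference (\Delta v = 1) shifts total costs by 6,000 euros, a sum that can be economically material even when the per-patient difference is not statistically significant.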
4.3. Implications for research and practice

The assumption made in some studies that administrative databases should be the reference standard is questionable, because these databases also contain errors. When using such data for cost estimations, statistical corrections might be relevant; however, corrections for cost data from administrative databases when computing incremental cost-effectiveness ratios are so far unavailable [30]. Previous reviews identified limitations of these databases [3,5,6]. The health economic community does not yet have a culture of validating resource use measurement, and there is a lack of evidence regarding the collection of resource use data [31]. Reviews such as this one might aid in the assessment of MeRQ to deduce the validation evidence available. The validation evidence of a (small) number of studies with adequate MeRQ cautiously supports the exchangeability of the self-reported and administrative resource use measurements. Coherent methodological quality that supports the interpretation of estimates, and rigorous, well-reported research providing strategically collected validation evidence, are needed.

References

[1] Drummond M, McGuire A. Economic evaluation in health care: merging theory with practice. New York: Oxford University Press; 2001.
[2] Hakkaart L, Tan SS, Bouwmans CAM. Handleiding voor kostenonderzoek. Methoden en standaard kostprijzen voor economische evaluaties in de gezondheidszorg. Rotterdam: Instituut voor Medische Technology Assessment, Erasmus Universiteit Rotterdam, CVZ College voor zorgverzekeringen; 2010.
[3] Bhandari A, Wagner T. Self-reported utilization of health care services: improving measurement and accuracy. Med Care Res Rev 2006;63:217–35.
[4] Ridyard CH, Hughes DA. Development of a database of instruments for resource-use measurement: purpose, feasibility, and design. Value Health 2012;15:650–5.
[5] Evans C, Crawford B. Patient self-reports in pharmacoeconomic studies. Their use and impact on study validity. Pharmacoeconomics 1999;15:241–56.
[6] Harlow SD, Linet MS. Agreement between questionnaire data and medical records. The evidence for accuracy of recall. Am J Epidemiol 1989;129:233–48.
[7] Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol 2009;62:1006–12.
[8] Sassi F, Archard L, McDaid D. Searching literature databases for health care economic evaluations: how systematic can we afford to be? Med Care 2002;40:387–94.
[9] Alton V, Eckerlund I, Norlund A. Health economic evaluations: how to find them. Int J Technol Assess Health Care 2006;22:512–7.
[10] International Health Economics Discussion List, 2013. Available at https://www.jiscmail.ac.uk/cgi-bin/webadmin?A25ind1305L5healthecon-discussP5R16115healthecon-discuss95AJ5ond5NoþMatch%3BMatch%3BMatchesz54. Accessed May 22, 2013.
[11] Kottner J, Audige L, Brorson S, Donner A, Gajewski BJ, Hrobjartsson A, et al. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol 2011;64:96–106.
[12] Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003;326:41–4.
[13] Byford S, Leese M, Knapp M, Seivewright H, Cameron S, Jones V, et al. Comparison of alternative methods of collection of service use data for the economic evaluation of health care interventions. Health Econ 2007;16:531–6.
[14] Hoogendoorn M, van Wetering CR, Schols AM, Rutten-van Molken MP. Self-report versus care provider registration of healthcare utilization: impact on cost and cost-utility. Int J Technol Assess Health Care 2009;25:588–95.
[15] Marks AS, Lee DW, Slezak J, Berger J, Patel H, Johnson KE. Agreement between insurance claim and self-reported hospital and emergency room utilization data among persons with diabetes. Dis Manag 2003;6:199–205.
[16] Patel A, Rendu A, Moran P, Leese M, Mann A, Knapp M. A comparison of two methods of collecting economic data in primary care. Fam Pract 2005;22:323–7.
[17] van den Brink M, van den Hout WB, Stiggelbout AM, van de Velde CJ, Kievit J. Cost measurement in economic evaluations of health care: whom to ask? Med Care 2004;42:740–6.
[18] Streiner DL, Kottner J. Recommendations for reporting the results of studies of instrument and scale development and testing. J Adv Nurs 2014;70:1970–9.
[19] Goossens ME, Rutten-van Molken MP, Vlaeyen JW, van der Linden SM. The cost diary: a method to measure direct and indirect costs in cost-effectiveness research. J Clin Epidemiol 2000;53:688–95.
[20] Guerriere DN, Ungar WJ, Corey M, Croxford R, Tranmer JE, Tullis E, et al. Evaluation of the ambulatory and home care record: agreement between self-reports and administrative data. Int J Technol Assess Health Care 2006;22:203–10.
[21] Janson M, Carlsson P, Haglind E, Anderberg B. Data validation in an economic evaluation of surgery for colon cancer. Int J Technol Assess Health Care 2005;21:246–52.
[22] Kennedy AD, Leigh-Brown AP, Torgerson DJ, Campbell J, Grant A. Resource use data by patient report or hospital records: do they agree? BMC Health Serv Res 2002;2:2.
[23] Longobardi T, Walker JR, Graff LA, Bernstein CN. Health service utilization in IBD: comparison of self-report and administrative data. BMC Health Serv Res 2011;11:137.
[24] Mirandola M, Bisoffi G, Bonizzato P, Amaddeo F. Collecting psychiatric resources utilisation data to calculate costs of care: a comparison between a service receipt interview and a case register. Soc Psychiatry Psychiatr Epidemiol 1999;34:541–7.
[25] Mistry H, Buxton M, Longworth L, Chatwin J, Peveler R. Comparison of general practitioner records and patient self-report questionnaires for estimation of costs. Eur J Health Econ 2005;6:261–6.
[26] Pinto D, Robertson MC, Hansen P, Abbott JH. Good agreement between questionnaire and administrative databases for health care use and costs in patients with osteoarthritis. BMC Med Res Methodol 2011;11:45.
[27] Pollicino C, Viney R, Haas M. Measuring health system resource use for economic evaluation: a comparison of data sources. Aust Health Rev 2002;25:171–8.
[28] Richards SH, Coast J, Peters TJ. Patient-reported use of health service resources compared with information from health providers. Health Soc Care Community 2003;11:510–8.
[29] Zentner N, Baumgartner I, Becker T, Puschner B. Health service costs in people with severe mental illness: patient report vs. administrative records. Psychiatr Prax 2012;39:122–8.
[30] Johnston K, Buxton MJ, Jones DR, Fitzpatrick R. Assessing the costs of healthcare technologies in clinical trials. Health Technol Assess 1999;3:1–76.
[31] Thorn JC, Coast J, Cohen D, Hollingworth W, Knapp M, Noble SM, et al. Resource-use measurement based on patient recall: issues and challenges for economic evaluation. Appl Health Econ Health Policy 2013;11:155–61.
Appendix A. Search strategy

Database: MEDLINE (PubMed)

(measure* OR correlat* OR evaluat* OR accuracy OR accurate OR precision OR agreement OR valid* [Title/Abstract]) AND (self-report [MeSH Terms] OR self-report [Text Word] OR self-assessment [Text Word] OR patient self-report [Text Word] OR patient-reported [Text Word] OR questionnaires [MeSH Terms] OR recall [Text Word] OR interviews as topic [MeSH Terms] OR health care surveys [MeSH Terms]) AND (medical records [MeSH Terms] OR records as topic [MeSH Terms] OR medical records systems, computerized [MeSH Terms] OR medical record administrators [MeSH Terms]) AND (health resources [MeSH Terms] OR delivery of health care [MeSH Terms] OR utilization [Text Word] OR resource use [Text Word]) AND data collection [MeSH Terms] AND cost analysis.

Appendix B. MeRQ tool: criteria of Methodological Reporting Quality as adapted from the Standards for Reporting Diagnostic Accuracy (STARD) and the Guidelines for Reporting Reliability and Agreement Studies (GRRAS)

Study and reviewer identification: reviewer; first author; year of publication; title; language; country.

For each study criterion and reporting element (operational definition) below, the reporting is rated as: Good / Fair / Poor / Not reported / Not addressed / Not applicable.

Explicit identification of the study as an evaluation of the reliability, agreement, or validity (accuracy) of a self-reported health care resource use measurement instrument compared with health care resource use measured using administrative data.
Proposed validation arguments and strategies for interpreting validation evidence.
The measurement methods (eligibility criteria), with a critical review of evidence relevant to the study.
Rationale for the relationship between the self-report and the administrative data.
Methods or measurement procedures for the self-reports.
Methods or measurement procedures for the administrative data.
Time interval between self-reports and administrative data collection.
Population characteristics (including sample size, sampling methods, recruitment).

Does this study examine validity, reliability, or agreement? Choose one according to the information provided in the article and complete the corresponding block.

Validity:
Statistical methods for validation estimates defined.
Validation estimates defined.
Methods for statistical uncertainty defined (variability, confidence intervals).

Reliability:
Statistical methods for validation estimates defined.
Validation estimates defined.
Methods for statistical uncertainty defined (variability, confidence intervals).

Agreement:
Statistical methods for validation estimates defined.
Validation estimates defined.
Methods for statistical uncertainty defined (variability, confidence intervals).

Concluding validation argument/interpretation of validation evidence.

Additional information.