Evaluation and Program Planning, Vol. 6, pp. 247-263, 1983
Printed in the USA. All rights reserved.
0149-7189/83 $3.00 + .00
Copyright © 1984 Pergamon Press Ltd
DEFINING AND MEASURING PATIENT SATISFACTION
WITH MEDICAL CARE
JOHN E. WARE, JR., MARY K. SNYDER, W. RUSSELL WRIGHT, AND ALLYSON R. DAVIES
The Rand Corporation
ABSTRACT
This paper describes the development of Form II of the Patient Satisfaction Questionnaire
(PSQ), a self-administered survey instrument designed for use in general population studies.
The PSQ contains 55 Likert-type items that measure attitudes toward the more salient char-
acteristics of doctors and medical care services (technical and interpersonal skills of providers,
waiting time for appointments, office waits, emergency care, costs of care, insurance cov-
erage, availability of hospitals, and other resources) and satisfaction with care in general.
Scales are balanced to control for acquiescent response set. Scoring rules for 18 multi-item
subscales and eight global scales were standardized following replication of item analyses in
four field tests. Internal-consistency and test-retest estimates indicate satisfactory reliability
for studies involving group comparisons. The PSQ well represents the content of characteris-
tics of providers and services described most often in the literature and in response to open-
ended questions. Empirical tests of validity have also produced generally favorable results.
The Patient Satisfaction Questionnaire (PSQ) was
developed at Southern Illinois University (SIU) School
of Medicine during a study funded by the National
Center For Health Services Research and Develop-
ment. The major goals of the SIU project were to
develop a short, self-administered satisfaction survey
that would be applicable in general population studies
and would yield reliable and valid measures of con-
cepts that had both theoretical and practical impor-
tance to the planning, administration, and evaluation
of health services delivery programs. The SIU work led
to the development and testing of numerous instru-
ments including several patient satisfaction ques-
tionnaires as well as measures of the importance
placed on different features of medical care services.
We summarize here the conceptual work and empirical
results from the SIU studies that have been available
only in technical reports (Ware, Snyder, & Wright,
1976a, 1976b). We focus on Form II, which has proven
to be the most comprehensive and reliable version of
the PSQ.
CONCEPTUALIZING PATIENT SATISFACTION
In theory, a patient satisfaction rating is a personal
evaluation of health care services and providers. It is
wrong to equate all information derived from patient
surveys with patient satisfaction (Ware, 1981). For ex-
ample, patient satisfaction ratings are distinct from
reports about providers and care. Reports are inten-
tionally more factual and objective. Satisfaction rat-
ings are intentionally more subjective; they attempt to
capture a personal evaluation of care that cannot be
known by observing care directly. For example, pa-
tients can be asked to report the length of time spent
with their provider or to rate whether they were given
enough time. Although satisfaction ratings are some-
times criticized because they do not correspond per-
fectly with objective reality or with the perceptions of
providers or administrators of care, this is their unique
strength. They bring new information to the satisfac-
tion equation. We believe that differences in satisfac-
tion mirror the realities of care to a substantial extent;
these differences also reflect personal preferences as
well as expectations (see Ware et al., 1976b, pp. 433-
463, 607-622).
This research and preparation of this manuscript were supported by the National Center for Health Services Research and Development and
by the Health Insurance Study grant from the Department of Health and Human Services.
Reprint requests and inquiries should be sent to John E. Ware, Jr., Behavioral Sciences Department, The Rand Corporation, 1700 Main
Street, Santa Monica, CA 90406.
Thus, a patient satisfaction rating is both a measure
of care and a measure of the patient who provides
the rating. During the development of the PSQ, we
attempted to determine what satisfaction ratings
measure-features of the care or the patient. This
distinction is important for studies that attempt to use
satisfaction ratings as a source of information about
specific aspects of care. Specifically, when dissatisfac-
tion is detected, should care be changed or should pa-
tients be changed (i.e., their expectations, preferences,
and standards) to increase satisfaction?
During field tests of Form I of the PSQ, we measured
separately the importance placed on each characteristic
of doctors and services described by PSQ items. We
also measured independently how often each charac-
teristic was observed or experienced. We noted signifi-
cant effects of patient expectations and value pref-
erences on satisfaction ratings. These effects, however,
proved to be of more theoretical than practical in-
terest because they were small relative to the impact of
experiences reported by patients. For example, the
length of time a patient had to wait to see a doctor
determined satisfaction with office waits substantially
more than expectations or preferences for short and
long office waits. Hence, a satisfaction rating seems to
be much more a measure of care than it is a measure of
the patient, although the latter is a part of the message.
Another important conceptual issue is the nature
and number of dimensions of patient satisfaction. As
described below, we attempted to build a taxonomy of
these characteristics that would provide a framework
for classifying the content of satisfaction measures and
for evaluating the content validity of the PSQ. The
taxonomy we have derived during studies of the PSQ
posits that several different characteristics of providers
and medical care services influence patient satisfac-
tion, and that patients develop distinct attitudes toward
each of these characteristics. Brief definitions of each dimension appear below, along with examples of item content:
Interpersonal manner: features of the way in which providers interact personally with patients (e.g., concern, friendliness, courtesy, disrespect, rudeness).
Technical quality: competence of providers and adherence to high standards of diagnosis and treatment (e.g., thoroughness, accuracy, unnecessary risks, making mistakes).
Accessibility/convenience: factors involved in arranging to receive medical care (e.g., time and effort required to get an appointment, waiting time at office, ease of reaching care location).
Finances: factors involved in paying for medical services (e.g., reasonable costs, alternative payment arrangements, comprehensiveness of insurance coverage).
Efficacy/outcomes: the results of medical care encounters (e.g., helpfulness of medical care providers in improving or maintaining health).
Continuity: sameness of provider and/or location of care (e.g., see same physician).
Physical environment: features of setting in which care is delivered (e.g., orderly facilities and equipment, pleasantness of atmosphere, clarity of signs and directions).
Availability: presence of medical care resources (e.g., enough hospital facilities and providers in area).
The preceding order of these dimensions reflects the
relative frequency of their inclusion in studies of pa-
tient satisfaction before the PSQ. The first four (inter-
personal manner, technical quality, accessibility/
convenience, and finances) were by far the most commonly measured features of care in patient satisfaction studies.
RESEARCH STRATEGY AND DATA SOURCES
The strategy for developing and testing the PSQ fo-
cused on improving the reliability and validity of items
and multi-item scales and reducing the costs (dollar
and time) required for their administration. That pro-
cess began with a survey (the Seven-County Study)
that included over 900 items administered in person by
trained interviewers (Chu, Ware, & Wright, 1973;
Ware, Wright, Snyder, & Chu, 1975). Ultimately,
Form II of the PSQ was much shorter and was self-
administered with success (Ware et al., 1976a).
Of necessity, the research began without an agreed-upon conceptual framework for defining and measuring patient satisfaction and with many unanswered questions about methodological issues. Hence, the instruments were field tested over a 4-year period in an iterative process that included formulations of models of the dimensions of patient satisfaction, construction of measures of those dimensions, empirical tests of the measures and models, and refinements in both. This iterative process included 12 studies of patient satisfaction; some involved secondary analyses of data provided by others.¹

¹We gratefully acknowledge the cooperation of individuals who provided satisfaction questionnaire data for analysis, including: Barbara Hulka and John Cassel at the University of North Carolina, James Greenley and Richard Schoenherr at the University of Wisconsin, and LuAnn Aday and Ronald Andersen at the University of Chicago.

Studies of Form II of the PSQ were replicated in four independent field tests, including three general population household surveys (East St. Louis, Illinois; Sangamon County, Illinois; and Los Angeles County, California) and a survey of patients enrolled in a family practice center (Springfield, Illinois).
Sample sizes in these four sites ranged from 323 to
640, and their sociodemographic characteristics varied
considerably. Two thirds to four fifths of the respon-
dents in the East St. Louis, Sangamon County, and
Family Practice samples were women; in Los Angeles
County, almost two thirds were men. In East St. Louis,
90% of respondents were nonwhite; in Sangamon
County, 3% were nonwhite, and in Los Angeles
County, 35%. (Data on race were not obtained for the
Family Practice sample.) Median age in the three
household surveys was about 45 years; the Family Prac-
tice sample was younger, with a median age of 32
years. Median annual family incomes (in 1974 dollars)
ranged from a low of $5,400 in East St. Louis to
$9,500 in Los Angeles, and approximately $12,000 in
the Sangamon County and Family Practice samples.
Median educational levels were close to 12 years in
three samples; the median was 14 years in the Family
Practice sample. In summary, the samples ranged
from a chiefly nonwhite and socioeconomically disad-
vantaged sample in East St. Louis to predominantly
white, middle-class samples in Sangamon County and
the Family Practice center.
Content of the PSQ
Our research began by formulating hypotheses about
the nature and number of specific characteristics of
providers and medical care services that should be rep-
resented by PSQ items to achieve content validity. An
outline of satisfaction constructs was developed from
the content of available instruments, published books
and articles from the health services research literature,
and the responses of convenient samples of persons to
open-ended questions about their experiences with
doctors and medical care services. The latter studies
were designed to generate new items. We sought to
achieve a comprehensive specification of patient satis-
faction constructs and a good understanding of the
words people actually use when they talk about medi-
cal care services. This knowledge helped in choosing
the specific vernacular used to construct PSQ items.
The item-generation studies consisted of three tasks:
(a) making sentence fragments into statements of opin-
ions about medical care (e.g., write favorable and un-
favorable opinions using the words cost of care);
(b) expressing comments about the most- and least-
liked aspects of medical care; and (c) responses from
group sessions in which participants were asked to
compose and discuss statements of opinion that re-
flected favorable and unfavorable sentiments about
medical care. These three tasks yielded a pool of ap-
proximately 2,300 items, which were sorted into content
categories by independent judges. The resulting con-
tent outline and constructs identified from other in-
struments and the literature were integrated into a tax-
onomy on which we based initial hypotheses about the
nature and number of satisfaction constructs. Redun-
dancies and ambiguities were identified and the item
pool was reduced to about 500 edited items, each
describing only one characteristic of medical care ser-
vices.
Data-Gathering and Other
Methodological Considerations
A number of methodological studies addressed ques-
tions about data-gathering methods, the structure of
PSQ items, instructions to respondents, and other pro-
cedural issues. Some decisions were made after review-
ing the literature and consulting with experts; other
decisions were made after formal study. These deci-
sions are explained and selected methodological results
are summarized in the following paragraphs. Refer-
ences are provided to the more complete documenta-
tion of results from our studies of methodological
issues.
Choice of Likert-Type Items. A standardized patient
satisfaction item has two parts: the item stem and the
response scale. The item stem describes a specific
feature of care or care in general. The response scale
defines the choices used to evaluate that feature. PSQ
item stems, response choices, and scoring rules were
standardized to facilitate administration and to max-
imize reliability and validity. We chose the traditional
approach to attitude measurement in which the item is
structured as a statement of opinion, such as “It’s hard
to get an appointment for medical care right away,”
and response choices range from strongly agree to
strongly disagree.
Several different questionnaire formats were tested.
The format we recommend places the precoded re-
sponses to the right of the items, and labels these
responses at the head of each page, as shown in Fig-
ure 1.
In general population studies designed to measure
satisfaction with the respondent’s total medical care
experience, instructions are offered as follows:
On the following pages are some statements about medi-
cal care. Please read each one carefully, keeping in mind
the medical care you are receiving now. If you have not
received medical care recently, think about what you
would expect if you needed care today. On the line next to
each statement circle the number for the opinion which is
closest to your own view.
[Figure 1. PSQ Item Format: the item stem (e.g., "I'm very satisfied with the medical care I receive") is printed at the left, with the precoded response choices to the right.]

These instructions are followed by an example and further explanation of how to use the response scale. The instructions end with the following:
Some statements look similar to others, but each state-
ment is different. You should answer each statement by
itself. This is not a test of what you know. There are no
right or wrong answers. We are only interested in your
opinions or best impression. Please circle only one num-
ber for each statement.
This traditional Likert-type approach has several
advantages. First, use of identical response scales for
all items facilitates the task of completing a survey.
Once respondents become familiar with the response
choices, they can listen to or read each item stem and
quickly indicate their response. When choices differ
from item to item, more time and effort are involved.
Second, it is usually easier to format a questionnaire
when the same response choices are used for each item.
Such questionnaires can often be printed on fewer
pages. Third, we found it easier to revise items with the
goal of changing the distribution of item responses
(e.g., reduce skewness) when item stems were struc-
tured as statements of opinion. Examples of how PSQ
items were reworded in more favorable or more un-
favorable terms to manipulate response distributions
are reported in detail elsewhere (Ware et al., 1976a,
pp. 171-179). This manipulation was also done for
items structured as questions about satisfaction with
response choices that defined levels of satisfaction,
although it was more difficult and frequently required
awkward wording.
Number of Response Choices. A key assumption
underlying our work was that satisfaction itself is a
continuum. Our goal in choosing an item response
scale was, therefore, that the responses should place
people as precisely as possible along that continuum in
terms of their attitudes toward services and providers.
The better each item performed in this regard, the
fewer the items required per scale. A response scale
with only two choices-agree versus disagree or satis-
fied versus dissatisfied- was judged to be too coarse.
Published studies and analyses of pretest data sug-
gested that five choices yielded more information and
more reliable responses than did two or three. Any fur-
ther increase in reliability with seven response choices
did not seem to warrant the resulting increase in ques-
tionnaire length and the additional complexity of for-
matting items. Thus, the response scale chosen for the
PSQ asks the respondent to select one of five choices
to report strength of agreement or disagreement
(strongly agree, agree, not sure, disagree, strongly dis-
agree).
Focus on Personal Versus General Care Experiences.
Another important characteristic of patient satisfac-
tion rating items is whether they focus on the respon-
dent’s personal care experiences or those of people in
general. An example of an item with a general referent
is “It takes most people a long time to get to the place
where they receive medical care.” The same item can
be structured to focus on the respondent’s personal ex-
perience: “It takes me a long time to get to the place
where I receive medical care.” Both kinds of items
have been used widely in patient satisfaction surveys
(Snyder & Ware, 1975). The main reason for being in-
terested in items with a more general referent was to
reduce the number of items left unanswered because of
inapplicability. The validity and other psychometric
characteristics of these general items, however, had
not been studied systematically.
To examine these characteristics, we studied 10 pairs
of items that measured satisfaction constructs in three
general categories: access to care (2 pairs), finances (2
pairs), and quality of care (6 pairs). Items in each pair
differed only in terms of whether they asked about the
respondent’s own care or care received by people in
general (as in the examples above). These item pairs
were interspersed throughout a special 78-item version
of the PSQ fielded in the Los Angeles and Sangamon
County field tests (total n = 952). Paired items were
compared in terms of test-retest reliability (6-week in-
terval), factorial validity (similarity of correlations
across derived satisfaction factors), predictive validity
(in relation to five health and illness behaviors), and
differences in mean scores and variances. Results from
both field tests supported the same conclusions. No
noteworthy differences in reliability or validity coeffi-
cients were observed between items in the same pair.
Mean scores for items evaluating personal care experiences were consistently and significantly more favorable than for items that described the experiences of people in general. Explanations for differences in mean scores
are discussed elsewhere (Snyder et al., 1975). The prac-
tical implication is that the difference in item referent
(personal vs. general) has little or no impact on reli-
ability or validity. Hence, the choice between the two
Defining and Measuring Satisfaction 251
kinds of items was made with other considerations in
mind.
Administration Methods. Development and validation
of the PSQ required the design of oral interview sched-
ules and various self-administered questionnaires.
Our analyses of administration methods examined
their effects on response rates, completeness of data,
data-gathering costs, characteristics of respondents
and nonrespondents, and satisfaction levels. We also
examined the effects of asking other questions before
administration of the PSQ.
Response rates were not completely determined by
administration method. For example, in a randomized-
groups experiment during the Los Angeles County
field test, approximately 69% of those who were asked
to self-administer and return a questionnaire booklet
by mail returned the booklet as compared with a 95%
completion rate for those whose self-administration
was supervised by a trained interviewer. Other field
tests showed no difference between return rates for
groups who self-administered the PSQ with and with-
out supervision (the latter with mail-back). The
vigorousness of follow-up seemed to be the more im-
portant factor in determining completion rates when
mail-back was relied upon. Further, we detected no
difference in data quality between supervised and un-
supervised self-administration of the PSQ. Supervision
by a trained interviewer increased data-gathering costs
about 5-fold.
In the Los Angeles County field test, characteristics
of respondents and nonrespondents to a mail-back
survey were compared. These characteristics were
documented during an interview before the ques-
tionnaire was either dropped off for self-administra-
tion and mailed return or completed under super-
vision. The drop-off/mail-back method resulted in
significant underrepresentation of persons aged 40 and
younger, nonwhites, and low-income persons. Com-
parison of satisfaction scores for persons in the mail-
back and hand-back groups suggested that those who
were more satisfied with the quality of their care are
less likely to return questionnaires.
We also examined whether differences in satisfac-
tion levels might be caused by differences in when the
PSQ was administered during a longer interview
schedule. We randomly varied whether the PSQ was
self-administered before or after a series of questions
that asked about use of health care services and com-
pliance with medical regimens. We hypothesized that
questions about health care experiences might increase
the salience of attitudes toward those experiences (i.e.,
medical care satisfaction). For 14 of the 18 PSQ scales,
scores tended to be lower for those who answered
questions about health care experiences first. Scores
on scales measuring access to care in emergencies,
costs of care, and payment mechanisms were signifi-
cantly lower. These results suggest that administration
procedures (and particularly the placement of satisfac-
tion questions) in a longer survey should be standard-
ized. Further research is necessary to determine
whether satisfaction ratings are more or less valid if
obtained after a review of recent health care exper-
iences.
The length of time required to complete the PSQ
was systematically measured. Considerable variability
was observed across respondents. On average, respon-
dents took about 11-13 seconds to complete each PSQ
item. Thus, the 55 items used to score scales con-
structed from Form II of the PSQ take about 11
minutes to complete. The 43-item PSQ short form
takes 8-9 minutes on average.² Administration times tend to be somewhat longer for disadvantaged respondents (i.e., low education, low income).

²The 43-item PSQ short form was developed for use in Rand's Health Insurance Experiment, a randomized controlled trial designed to estimate the effects of different health care financing arrangements and organizations on patient satisfaction. This short form was also fielded in a national study of access to health care services (Aday, Andersen, & Fleming, 1980). The short form questionnaire and scoring instructions are available from the authors.
Response Set Effects. Beginning with Form I, all ver-
sions of the PSQ contained a balance of favorably and
unfavorably worded items to control for bias due to
acquiescent response set (ARS), a tendency to agree
with statements of opinion regardless of content. ARS
bias was a noteworthy problem that became apparent
at several stages during development of the PSQ, in-
cluding: empirical studies of item groupings (i.e.,
factor analyses of items), estimation of internal-
consistency reliability, and during comparisons of
group differences in satisfaction (Ware, 1978). More
recently, similar problems have surfaced in other
studies of health care attitudes (Winkler, Kanouse, &
Ware, 1982).
During tests of the PSQ, 40% to 60% of respon-
dents manifested some degree of ARS, and 2% to 10%
demonstrated substantial ARS tendencies. (We used
11 matched pairs of favorably and unfavorably worded
items that measured the same feature of care and ex-
tremely worded validity check items to identify such
respondents.) Effects of ARS bias included: appear-
ance of method rather than trait factors in item analy-
ses (i.e., factors defined by differences in the direction
of item wording, not differences in characteristics of
medical care); inflated reliability estimates for un-
balanced multi-item scales; and seriously biased com-
parisons of mean differences between groups of re-
spondents differing in educational attainment, income,
and age. (These effects were also observed in analyses
of responses to the Thurstone scales constructed by
Hulka and her colleagues [Hulka, Zyzanski, Cassel, & Thompson, 1970].) For example, differences in satis-
faction with quality of care between education groups
were substantially overestimated by PSQ scales con-
structed entirely from favorably worded items, and
were missed entirely by scales constructed entirely
from unfavorably worded items. The balanced PSQ
Technical Quality satisfaction subscale, which was not
correlated with ARS, detected significant differences
in satisfaction between education groups (Ware,
1978).
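The matched-pair logic suggests a simple screening rule. The sketch below is our own illustration, not the authors' published procedure (which also used extremely worded validity-check items); the pair numbers passed in are placeholders.

```python
# Flag acquiescent response set (ARS) using matched item pairs that state
# the same feature of care in opposite (favorable/unfavorable) directions.
AGREE = {1, 2}  # raw response codes: 1 = strongly agree, 2 = agree

def ars_score(responses: dict[int, int], pairs: list[tuple[int, int]]) -> float:
    """Fraction of matched pairs on which the respondent agreed with both
    the favorable and the unfavorable statement (0.0 = no ARS evidence)."""
    hits = sum(1 for fav, unfav in pairs
               if responses.get(fav) in AGREE and responses.get(unfav) in AGREE)
    return hits / len(pairs)
```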
We also studied two other types of response set that might bias patient satisfaction ratings: opposition response set (ORS, a tendency to disagree with statements regardless of content), and socially desirable response set (SDRS). ORS proved to be very rare and thus of little concern. SDRS was common but did not correlate with ratings of satisfaction with medical care (see Ware et al., 1976b, pp. 537-588).

PSQ Items and Descriptive Statistics
Following the Seven-County Study and several small-
sample pretests of instructions and instrument format,
80 Likert-type items were self-administered in Form I
of the PSQ during a survey of households in three
southern Illinois counties, the Tri-County Study (Ware
& Snyder, 1975). Each item was worded as a statement
of opinion and items were evenly divided between
favorable and unfavorable statements. Analyses of
items in Form I led to substantial revisions and to con-
struction of Form II of the PSQ. Only 4 items from
Form I were retained without revision; 59 were revised
and retained, and 5 new items were written for Form
II.
The verbatim content of all 68 PSQ Form II items
appears in Table 1; items are listed in the order of their
administration. Before testing multi-item PSQ scales,
we evaluated item descriptive statistics. Specifically,
we checked distributions of item scores to determine whether revisions in item wording would be necessary to achieve roughly symmetrical (if not normal) response distributions. These characteristics are desirable for items to be used in simple summated ratings scales.
TABLE 1
ITEMS IN FORM II OF THE PSQ

Item
Number   Item Content

1*    I'm very satisfied with the medical care I receive.
2     Doctors let their patients tell them everything that the patient thinks is important.
3*    Doctors ask what foods patients eat and explain why certain foods are best.
4*    I think you can get medical care easily even if you don't have money with you.
5*    I hardly ever see the same doctor when I go for medical care.
6*    Doctors are very careful to check everything when examining their patients.
7     We need more doctors in this area who specialize.
8*    If more than one family member needs medical care, we have to go to different doctors.
9*    Medical insurance coverage should pay for more expenses than it does.
10*   I think my doctor's office has everything needed to provide complete medical care.
11    Doctors never keep their patients waiting, even for a minute.
12*   Places where you can get medical care are very conveniently located.
13    Doctors act like they are doing their patients a favor by treating them.
14*   The amount charged for medical care services is reasonable.
15    Doctors always tell their patients what to expect during treatment.
16*   Most people receive medical care that could be better.
17    Most people are not encouraged to get a yearly exam when they go for medical care.
18*   If I have a medical question, I can reach someone for help without any problem.
19*   In an emergency, it's very hard to get medical care quickly.
20    I can arrange for payment of medical bills later if I'm short of money now.
21*   I am happy with the coverage provided by medical insurance plans.
22*   Doctors always treat their patients with respect.
23*   I see the same doctor just about every time I go for medical care.
24    The amount charged for lab tests and x-rays is extremely high.
25*   Doctors don't advise patients about ways to avoid illness or injury.
26*   Doctors never recommend surgery (an operation) unless there is no other way to solve the problem.
27    Doctors hurt many more people than they help.
28*   Doctors hardly ever explain the patient's medical problems to him.
29*   Doctors always do their best to keep the patient from worrying.
30*   Doctors aren't as thorough as they should be.
31*   It's hard to get an appointment for medical care right away.
32*   There are enough doctors in this area who specialize.
33*   Doctors always avoid unnecessary patient expenses.
34*   Most people are encouraged to get a yearly exam when they go for medical care.
35*   Office hours when you can get medical care are good for most people.
36*   Without proof that you can pay, it's almost impossible to get admitted to the hospital.
37    People have to wait too long for emergency care.
38    Medical insurance plans pay for most medical expenses a person might have.
39*   Sometimes doctors make the patient feel foolish.
40*   My doctor's office lacks some things needed to provide complete medical care.
41    Doctors always explain the side effects of the medicine they prescribe.
42*   There are enough hospitals in this area.
43*   It takes me a long time to get to the place where I receive medical care.
44    Just about all doctors make house calls.
45*   The care I have received from doctors in the last few years is just about perfect.
46    Doctors don't care if their patients worry.
47*   Sometimes doctors take unnecessary risks in treating their patients.
48    In an emergency, you can always get medical care.
49*   The fees doctors charge are too high.
50    Doctors are very thorough.
51*   The medical problems I've had in the past are ignored when I seek care for a new medical problem.
52*   Parking is a problem when you have to get medical care.
53*   There are enough family doctors around here.
54    Doctors never expose their patients to unnecessary risk.
55*   Doctors respect their patient's feelings.
56    It's cash in advance when you need medical care.
57    Doctors never look at their patient's medical records.
58*   There are things about the medical care I receive that could be better.
59    When doctors are unsure of what's wrong with you, they always call in a specialist.
60    When I seek care for a new medical problem, they always check up on the problems I've had before.
61*   More hospitals are needed in this area.
62    Doctors seldom explain why they order lab tests and x-rays.
63    I think the amount charged for emergency room service is reasonable.
64    Sometimes doctors miss important information which their patients give them.
65    My doctor treats everyone in my family when they need care.
66*   Doctors cause some people to worry a lot because they don't explain medical problems to patients.
67*   There is a big shortage of family doctors around here.
68    Sometimes doctors cause their patients unnecessary medical expenses.
*     People are usually kept waiting a long time when they are at the doctor's office. (short-form item with no Form II number)

Note. Items marked with an asterisk are included in the 43-item short form of the PSQ; one item in that form (the last listed) does not appear in Form II. In addition, four items (11, 27, 44, and 57) were used only as validity checks.
Because questionnaire responses for all PSQ items were precoded so that "strongly agree" equaled 1 and "strongly disagree" equaled 5, responses to the favorably worded items were recoded as shown in Table 2. Means and standard deviations for the 68 PSQ Form II items in four field tests appear in Table 3. All items are scored so that a higher number indicates a more favorable evaluation of medical care.
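To make the recoding step concrete, here is a minimal sketch in Python; the input format and function name are our own illustration rather than part of the original scoring materials.

```python
# Recode raw PSQ responses (1 = strongly agree ... 5 = strongly disagree)
# so that a higher score always indicates a more favorable evaluation.
# Item lists follow Table 2; the dict-of-ints input format is illustrative.

FAVORABLE = {1, 2, 3, 4, 6, 10, 12, 14, 15, 18, 20, 21, 22, 23, 26, 29,
             32, 33, 34, 35, 38, 41, 42, 45, 48, 50, 53, 54, 55, 59, 60, 63, 65}
VALIDITY_CHECKS = {11, 27, 44, 57}  # excluded from all scale scores

def score_item(item_number: int, raw_response: int) -> int:
    """Return the scored response for one PSQ item (raw codes 1-5)."""
    if not 1 <= raw_response <= 5:
        raise ValueError("PSQ responses must be coded 1-5")
    # Favorable statements are reversed so agreement yields a high score;
    # unfavorable statements keep the raw coding (disagreement scores high).
    return 6 - raw_response if item_number in FAVORABLE else raw_response
```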
Constructing Multi-Item Subscales
Our experiences in analyzing 87 items from the Seven-
County Study (Chu et al., 1973; Ware, Miller, &
Snyder, 1973) convinced us that an individual ques-
tionnaire item is not a very satisfactory unit of analysis
for a study of the structure of patient attitudes about
doctors and medical care services. An item score is
coarse, less reliable, and substantially influenced by
the direction of item wording and other methodologi-
cal features in addition to the construct(s) being
measured. Although the Seven-County Study gave us
TABLE 2
ITEM SCORING RULES FOR FORM II OF THE PSQ

Scoring                     Item Numbersᵃ

1 = Strongly disagree       1, 2, 3, 4, 6, 10, 12, 14,
2 = Disagree                15, 18, 20, 21, 22, 23,
3 = Not sure                26, 29, 32, 33, 34, 35,
4 = Agree                   38, 41, 42, 45, 48, 50,
5 = Strongly agree          53, 54, 55, 59, 60, 63, 65

5 = Strongly disagree       5, 7, 8, 9, 13, 16, 17, 19,
4 = Disagree                24, 25, 28, 30, 31, 36,
3 = Not sure                37, 39, 40, 43, 46, 47,
2 = Agree                   49, 51, 52, 56, 58, 61,
1 = Strongly agree          62, 64, 66, 67, 68

ᵃThe four validity-check items (numbers 11, 27, 44, and 57) are not included (see text).
254 JOHN E. WARE et al.
TABLE 3
ITEM DESCRIPTIVE STATISTICS, PSQ FORM II

Item    East St. Louis    Sangamon County    Los Angeles County
No.ᵃ    Mean    SD        Mean    SD         Mean    SD
1 3.50 1.21 3.67 1.05 3.60 1.08
2 3.52 1.13 3.43 1.07 3.49 1.04
3 3.23 1.21 3.01 1.08 3.09 1.11
4 2.61 1.25 3.14 1.11 2.43 1.18
5* 3.44 1.28 3.89 1.03 3.62 1.16
6 3.03 1.29 3.00 1.15 3.03 1.16
7* 1.81 0.98 3.02 1.13 2.83 1.08
8* 2.98 1.22 2.74 1.18 3.01 1.13
9* 1.93 0.94 2.27 1.08 2.02 0.94
10 3.08 1.71 3.50 0.98 3.45 1.06
11 1.80 1.05 1.50 0.80 1.71 0.87
12 2.98 1.22 3.48 0.93 3.24 1.10
13* 2.91 1.17 3.28 1.08 3.25 1.11
14 2.58 1.11 2.53 1.13 2.25 1.08
15 2.89 1.20 3.04 1.09 2.93 1.09
16* 2.29 0.93 2.59 0.86 2.45 0.93
17* 2.64 1.17 3.02 1.08 2.92 1.09
18 2.83 1.24 3.28 1.08 3.16 1.16
19* 2.40 1.24 3.09 1.10 2.90 1.19
20 3.36 1.08 3.71 0.76 3.36 1.01
21 2.89 1.20 2.95 1.11 2.88 1.15
22 3.48 1.07 3.40 1.00 3.38 1.00
23 3.63 1.12 3.78 0.90 3.60 1.08
24* 2.07 0.91 2.17 0.92 2.12 0.96
25* 2.84 1.17 2.92 1.05 2.99 1.12
26 3.40 1.11 3.28 0.96 3.13 1.04
27* 3.58 0.92 4.09 0.70 3.90 0.83
28* 2.89 1.20 3.42 1.04 3.38 1.03
29 3.52 1.03 3.41 0.93 3.46 0.88
30* 2.55 1.04 2.84 1.01 2.74 0.98
31* 2.42 1.12 2.37 1.13 2.70 1.13
32 2.25 1.13 3.19 1.00 3.08 0.96
33 2.66 1.02 2.66 0.95 2.48 0.95
34 3.11 1.09 3.26 0.96 3.14 1.01
35 3.17 1.03 3.30 0.97 3.06 1.04
36* 2.27 1.18 2.68 1.02 2.17 1.01
37* 2.03 1.10 2.55 1.07 2.48 1.04
38 3.15 1.06 2.95 1.02 2.90 1.08
39* 2.79 1.07 3.07 1.02 3.06 1.02
40* 2.72 1.05 3.31 0.91 3.25 0.96
41 2.96 1.16 2.77 1.03 2.90 1.06
42 2.28 1.18 3.06 1.06 3.26 0.99
43* 3.30 1.16 3.72 0.83 3.53 0.98
44 1.56 0.82 1.49 0.67 1.59 0.74
45 3.02 1.18 3.20 1.07 3.10 1.10
46* 3.48 0.94 3.46 0.91 3.42 0.89
47* 2.98 0.97 3.37 0.86 3.21 0.88
48 2.91 1.21 3.41 0.91 3.16 1.04
49* 2.22 0.97 2.40 1.04 2.05 0.89
50 2.87 1.05 2.96 0.95 2.98 0.95
51* 3.32 1.08 3.48 0.87 3.46 0.90
52* 2.88 1.11 3.14 1.12 3.17 1.04
53 2.15 1.02 2.19 0.89 2.72 0.97
54 2.98 0.97 3.10 0.83 3.12 0.84
55 3.45 0.98 3.46 0.90 3.50 0.80
56* 3.21 1.07 3.89 0.67 3.22 1.04
57 2.26 0.92 2.06 0.66 2.15 0.69
58* 2.38 1.04 2.66 1.00 2.58 0.99
59 3.45 1.11 3.43 0.89 3.42 0.94
60 3.55 1.00 3.42 0.85 3.51 0.86
61* 1.93 0.97 3.01 1.04 3.08 0.96
TABLE 3 (continued)

Item    East St. Louis    Sangamon County    Los Angeles County
No.ᵃ    Mean    SD        Mean    SD         Mean    SD
62’ 2.90 1.16 3.27 1.01 3.22 1.03
63 2.45 1.04 2.54 1.Ol 2.46 0.97
64* 2.58 0.98 2.85 0.85 2.83 0.86
65 3.19 1.19 2.96 1.17 3.15 1.10
66’ 2.66 1.08 2.98 1.oo 3.02 0.98
67’ 2.03 0.98 2.28 0.88 2.69 0.93
68* 2.48 1.oo 2.78 0.93 2.59 0.89
ᵃItems are listed in the order they appear in Form II of the PSQ; see Table 1 for content.
*These items define unfavorable attitudes; their raw scores have been recoded here following the item scoring rules in Table 2.
our first “picture” of the structure of patient satisfac-
tion, the picture was not very clear because of these
methodological problems.
A major goal in our studies of Form I was to test
empirically our taxonomy of patient satisfaction con-
structs. If supported, this taxonomy would provide the
“blueprint” for Form II. Our progress toward this goal
would be limited by the adequacy of the measures
available for model testing. To increase chances for
success, we adopted the concept of a Factored
Homogeneous Item Dimension (FHID) developed by
Comrey (1961). He used this technique because of its
advantages in solving various measurement problems
in personality research; we discuss these advantages in
reference to the PSQ elsewhere (Ware & Snyder, 1975;
Ware et al., 1976a).
Simply stated, a FHID is a group of items that has
satisfied both logical and statistical criteria. The logical
criterion is that the items have very similar content (ap-
pear highly conceptually related). Abbreviated exam-
ples of the content of items from a FHID measuring
attitude toward the interpersonal manner of providers
are: Doctors treat their patients with respect, doctors
make patients feel foolish, and doctors act like they
are doing patients a favor by treating them. We labeled
this FHID Respect. Empirically, items in the same
FHID must share substantially more variance with
each other than with items in other FHIDs. Items that
fulfill these criteria are combined to yield a single score
that serves as the unit of analysis in subsequent analy-
ses. The FHID strategy, which is in contrast to the
common practice of using a single questionnaire item
as the unit of analysis, was employed extensively in
evaluations of Forms I and II of the PSQ. Results for
Form I are reported elsewhere (Ware & Snyder, 1975;
Ware et al., 1976a, pp. 167-179).
Our evaluation of item groupings hypothesized for
Form II of the PSQ was conducted in two phases.
First, 20 hypothesized FHIDs were tested with data
from the Sangamon County field test (n = 432). Seven
matrices of inter-item correlations, each containing
five or six FHIDs, were factor analyzed. Inspection of
seven factor matrices had several advantages. The
number of PSQ items per matrix ranged from only 14
to 16 for a subjects/variables ratio of greater than
25/1 in all matrices. Further, each FHID could be
tested against more than one combination of other
FHIDs. (It is much more difficult to validate a FHID
against other FHIDs that measure conceptually similar
as opposed to dissimilar constructs.) In addition to
testing specific hypotheses about Form II item group-
ings, analyses of the seven matrices also provided a
thorough test for unhypothesized satisfaction factors.
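As a rough illustration of the kind of computation involved (not the authors' exact procedure), unrotated loadings can be extracted from an inter-item correlation matrix as follows:

```python
import numpy as np

def principal_loadings(corr: np.ndarray, n_factors: int) -> np.ndarray:
    """Unrotated principal-component loadings from an inter-item
    correlation matrix; a simplified stand-in for the factor analyses
    described in the text."""
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(eigvals[top])
```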
Results from the FHID validation studies in San-
gamon County confirmed 17 of the 20 FHIDs hypoth-
esized to measure specific dimensions of patient
satisfaction with doctors and medical care services.
These FHIDs included 51 items. Results also confirmed
an 18th FHID of four items that measured satisfaction
with medical care in general. These 18 item groupings
(FHIDs) and higher-order factors (global scales), iden-
tified in Table 4, were subjected to multitrait scaling
tests during the second step of our item analyses. The
multitrait analyses were performed independently in
each of the four field tests.
The multitrait analyses involved inspection of item-
scale correlation matrices to evaluate each item in rela-
tion to two criteria: first, based on the logic of Likert
(1932) scaling, whether each item had a substantial
linear relationship with the total score for its hypothe-
sized scale; and second, based on the logic of discrimi-
nant validity, whether each item correlated higher with
its hypothesized scale than with other scales. (In these
analyses, we used a modified version of the Analysis of
Item-Test Homogeneity (ANLITH) program developed
by Thomas Gronek at IBM and Thomas Tyler at the
Academic Computing Facility, Southern Illinois
University.) Additional details regarding specific
criteria for scaling “successes” and “failures” are
reported elsewhere (Ware et al., 1976a, pp. 179-210).
Item-scale correlations were corrected for overlap
using the technique recommended by Howard and
256 JOHN E. WARE et al.
TABLE 4
VALIDATED ITEM GROUPINGS FOR PSQ SUBSCALES

Dimension/Item Grouping           Item Numbers

Access to care (nonfinancial)
1. Emergency care                 19, 37, 48
2. Convenience of services        12, 43
3. Access                         18, 31
Financial aspects
4. Cost of care                   14, 24, 49, 63
5. Payment mechanisms             4, 20, 36, 56
6. Insurance coverage             9, 21, 38
Availability of resources
7. Family doctors                 53, 67
8. Specialists                    7, 32
9. Hospitals                      42, 61
Continuity of care
10. Family                        8, 65
11. Self                          5, 23
Technical quality
12. Quality/competence            3, 6, 17, 25, 30, 34, 50, 51, 60
13. Prudence-risks                47, 54
14. Doctor's facilities           10, 40
Interpersonal manner
15. Explanations                  28, 62, 66
16. Consideration                 22, 26, 29, 39, 55
17. Prudence-expenses             33, 68
Overall satisfaction
18. General satisfaction          1, 16, 45, 58

Note. Source: Adapted from Figure 21 in Ware, Snyder, and Wright (1976a), p. 198.
Forehand (1962). This correction provided more strin-
gent tests of scaling criteria by removing the effect of
the item being evaluated from the total scale score.
Because the scales were short, each item had a con-
siderable influence on the total scale score.
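One simple way to implement this idea is to correlate each item with the sum of the remaining items in its scale. The sketch below takes that rest-score approach; it is our illustration, not Howard and Forehand's exact algebraic correction or the ANLITH program.

```python
import numpy as np

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlate each item (column) with the sum of the remaining items,
    removing the spurious inflation caused by the item's own contribution
    to the total score. `items` is an (n_respondents, n_items) array."""
    totals = items.sum(axis=1)
    out = np.empty(items.shape[1])
    for j in range(items.shape[1]):
        rest = totals - items[:, j]  # scale score with item j removed
        out[j] = np.corrcoef(items[:, j], rest)[0, 1]
    return out
```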
Multitrait scaling is not as complete as convergent-
discriminant validation with the multitrait-multimethod
(MTMM) approach described by Campbell and Fiske
(1959). Only one measurement method is represented
in the matrix in multitrait scaling. We would argue,
however, that our approach is a “cousin” of the
MTMM approach and that it is superior to traditional
analyses of item internal-consistency because it pro-
vides discriminant tests of item validity across traits (in
this case satisfaction constructs) that are measured by
the same method.
Results of the multitrait scaling analyses were more
than satisfactory for all 18 subscales in all four sites.
Only 11 correlations (corrected for overlap) between
items and their hypothesized scales were below 0.30 in
220 tests across four sites. Of 3,740 tests of the item
discriminant validity criterion (the second and more
stringent criterion just defined), approximately 98%
were favorable. Items in six of the hypothesized scales
(the three Availability scales, Cost of Care, Insurance
Coverage, and Doctor’s Facilities) passed the criterion
in 100% of the item-discriminant validity tests in all
four field tests. The largest number of discrepancies
were observed for items in the Access to Care, Prudence-
Risks, and Prudence-Expenses subscales; most of
TABLE 5
GLOBAL SATISFACTION SCALES SCORED FROM FORM II OF THE PSQ

Global Scale            Combine These Items/Subscales:

Access to care          (19 + 37 + 48) + (12 + 43) + (18 + 31)
                        Emergency care + Convenience of services + Access to care
Availability            (53 + 67) + (42 + 61) + (7 + 32)
                        Availability/family doctors + Availability/hospitals + Availability/specialists
Finances                (14 + 24 + 49 + 63) + (9 + 21 + 38) + (4 + 20 + 36 + 56)
                        Cost of care + Insurance coverage + Payment mechanisms
Continuity              (8 + 65) + (5 + 23)
                        Continuity of care/family + Continuity of care/self
Interpersonal manner    (22 + 26 + 29 + 39 + 55) + (28 + 62 + 66)
                        Consideration + Explanations
Quality total           (10 + 40) + (47 + 54) + (3 + 6 + 17 + 25 + 30 + 34 + 50 + 51 + 60)
                        Doctor's facilities + Prudence/risks + Quality/competence
Access total            Access + Finances
Doctor conduct total    Interpersonal manner + Quality total
TABLE 6
MEANS AND STANDARD DEVIATIONS FOR PSQ FORM II SCALES

                         Number  Highest   ESL        SAC        FP         LAC
                         of      Possible
Scale                    Items   Score     Mean  SD   Mean  SD   Mean  SD   Mean  SD

Nonfinancial access
  Emergency care           3      15       7.3  2.7   9.1  2.4   9.2  2.6   8.6  2.7
  Convenience              2      10       6.3  1.9   7.2  1.5   7.2  1.6   6.8  1.8
  Access to care           2      10       5.2  1.9   5.6  1.8   6.3  2.0   5.9  1.9
Financial access
  Cost of care             4      20       9.2  2.7   9.6  3.1  11.0  3.1   8.9  2.9
  Insurance coverage       3      15       7.9  2.3   8.2  2.6   7.6  2.6   7.9  2.7
  Payment mechanisms       4      20      11.4  2.9  13.4  2.3  13.9  2.5  11.2  3.1
Availability
  Family doctors           2      10       4.1  1.7   4.5  1.6   4.7  1.6   5.4  1.7
  Specialists              2      10       4.0  1.8   6.2  1.9   6.6  1.9   5.9  1.8
  Hospitals                2      10       4.1  1.9   6.1  2.0   6.6  2.1   6.4  1.8
Continuity of care
  Family                   2      10       6.2  2.0   5.7  2.1   6.7  2.2   6.4  2.2
  Self                     2      10       7.1  2.0   7.7  1.6   7.6  2.0   7.2  2.0
Humaneness
  Consideration            5      25      16.6  3.8  16.6  3.6  16.3  4.2  16.6  3.5
  Explanations             3      15       8.5  2.6   9.7  2.4  10.0  2.6   9.6  2.4
Technical quality
  Doctors' facilities      2      10       5.8  2.0   6.8  1.7   6.6  1.8   6.7  1.9
  Prudence/risks           2      10       6.0  1.4   6.5  1.4   6.4  1.6   6.3  1.5
  Quality/competence       9      45      27.0  6.0  27.9  5.9  28.4  6.8  27.9  5.6
  Prudence/expenses        2      10       5.1  1.6   5.4  1.6   5.6  1.8   5.1  1.7
Overall satisfaction
  General satisfaction     4      20      11.2  3.0  12.1  3.1  12.1  3.2  11.8  3.1

Note. Field Tests: East St. Louis (ESL), Sangamon County (SAC), Family Practice Center (FP), & Los Angeles County (LAC).
these discrepancies were noted in data from the East
St. Louis field test, which provided the most econom-
ically disadvantaged sample. Thus, with relatively few exceptions, the internal consistency of hypothesized subscales and the discriminant validity of the 55 PSQ item scores were demonstrated successfully in four in-
dependent studies. (The four general satisfaction items
proved to be substantially internally consistent. They
were not expected to show high discriminant validity
and, in fact, correlated significantly, if not substan-
tially, with the 17 other subscales.)
On the strength of these findings, the 18 PSQ Form
II subscales were scored using the item groupings
shown in Table 4. Scale scores were calculated by
computing the simple algebraic sum of the items in the
scale, after scoring the items as shown in Table 2.
Items were constructed and modified, as necessary, to
achieve nearly equal (unit) variances; item content was
modified, as necessary, so that items in the same scale
(FHID) would have approximately the same correla-
tions with their primary factor, and no other substan-
tial correlations. These goals were generally met and it
was not necessary, therefore, to standardize items or
to use factor coefficients to weight them differently.
Higher scores on all scales indicate more favorable at-
titudes.
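A minimal sketch of the subscale step, using the Table 4 groupings (only three of the 18 subscales shown for brevity); the data structure is our own illustration, and item scores are assumed to have been recoded per Table 2.

```python
# Item groupings from Table 4 (three of the 18 subscales shown for brevity).
SUBSCALES = {
    "emergency_care": [19, 37, 48],
    "cost_of_care": [14, 24, 49, 63],
    "general_satisfaction": [1, 16, 45, 58],
}

def subscale_score(scored: dict[int, int], items: list[int]) -> int:
    """Simple algebraic sum of the (already recoded) item scores in a
    subscale; higher totals indicate more favorable attitudes."""
    return sum(scored[i] for i in items)
```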
Logical and empirically verified groupings of PSQ
subscales were used to compute global satisfaction
scores. The item groupings for global scales were hy-
pothesized from the taxonomy of satisfaction con-
structs and the higher-order factor structure of PSQ
subscales (discussed later). Scoring rules for six global
PSQ Form II scales are defined in Table 5. The global
scales are computed after scoring the items as shown in
Table 2 and the subscales as shown in Table 4. De-
scriptive statistics (means and standard deviations) for
the subscales and global scales in the four field tests
appear in Table 6.
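The global step is a further summation over subscale items; a minimal sketch for the Finances global scale of Table 5, again assuming recoded item scores (our illustration):

```python
# Finances global scale (Table 5): Cost of care + Insurance coverage +
# Payment mechanisms, computed directly from recoded item scores.
FINANCES_ITEMS = [14, 24, 49, 63] + [9, 21, 38] + [4, 20, 36, 56]

def finances_global(scored: dict[int, int]) -> int:
    """Sum the recoded scores of all items in the Finances subscales."""
    return sum(scored[i] for i in FINANCES_ITEMS)
```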
RELIABILITY AND STABILITY ANALYSES
We placed considerable emphasis on evaluating the re-
liability of the PSQ. This emphasis stemmed from
several considerations. First, reliability estimates had
been published rarely for the satisfaction measures
developed before the PSQ, and we found no published
estimates of test-retest reliability or intertemporal
stability of such measures (Ware, Davies-Avery, &
Stewart, 1978). Second, reliability estimates were
essential to interpret results of validity studies (e.g., an
MTMM matrix). Finally, because internal consistency
reliability estimates are a direct function of item
homogeneity, these analyses provided further evidence
regarding the appropriateness of the PSQ item group-
ings. (Homogeneity estimates, or average inter-item
correlations, for the subscales and global scales appear
in Ware et al., 1976b, pp. 299-321).
Both internal consistency and test-retest methods of
estimating reliability were used. Internal consistency
reliability was estimated, using coefficient α (Cronbach, 1951), from data obtained during a single administra-
tion of the PSQ in each of four field tests. Estimates
of test-retest reliability were obtained by computing
product-moment correlations between scores for the
same respondents on two administrations of the PSQ
approximately 6 weeks apart in two field tests (East St.
Louis and Sangamon County).
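Both estimators are standard and easy to reproduce; the sketch below is our illustration, not the authors' code, computing coefficient α from one administration and the test-retest correlation from two.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha (Cronbach, 1951) for an (n_respondents, k_items)
    score matrix: alpha = k/(k-1) * (1 - sum(item variances)/var(total))."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def test_retest(time1: np.ndarray, time2: np.ndarray) -> float:
    """Product-moment correlation between scale scores at two administrations."""
    return float(np.corrcoef(time1, time2)[0, 1])
```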
Internal consistency (ICR) and test-retest (TRT)
reliability estimates for the PSQ subscales and global
scales appear in Table 7. For the 18 subscales, 68 of the
72 ICR estimates exceeded the 0.50 standard recom-
mended for studies that involve group comparisons
(Helmstadter, 1964). For 17 subscales administered
twice, 28 of the 34 TRT estimates equaled or exceeded
that criterion (such estimates were not available for
General Satisfaction). These results were encouraging,
particularly because more than half of the Form II
subscales were each constructed from only two ques-
tionnaire items.
Test-retest coefficients for single-item measures
were much less favorable in the two field tests that
repeated administrations of the PSQ. Approximately
75% of the 55 items failed to achieve the 0.50 standard
for test-retest reliability in East St. Louis, and approx-
imately one third failed to meet that standard in San-
gamon County. Thus, multi-item PSQ subscales repre-
sent a substantial improvement in reliability over
single-item measures. These gains over single-item
measures are particularly important in studies of
disadvantaged respondents.
The reliability of PSQ scores improved further, even for disadvantaged respondents, when the 18 subscales
were aggregated into global satisfaction scores (see the
lower part of Table 7). The highest reliability coeffi-
cients were observed for the global Quality of Care
TABLE 7
SUMMARY OF RELIABILITY ESTIMATES FOR SATISFACTION SCALES

                                        Field Tests
                                  SAC         ESL         FP    LAC
Scale Name                    k   ICR  TRT    ICR  TRT    ICR   ICR

Access to care                2   .56  .71    .53  .52    .65   .49
Convenience of services       2   .57  .58    .48  .44    .47   .58
Emergency care                3   .68  .66    .63  .56    .72   .70
Availability-family doctors   2   .72  .46    .68  .52    .78   .62
Availability-hospitals        2   .91  .87    .80  .66    .93   .80
Availability-specialists      2   .74  .74    .71  .52    .80   .67
Continuity of care-family     2   .73  .79    .54  .64    .68   .32
Continuity of care-self       2   .51  .52    .52  .59    .83   .66
Cost of care                  4   .73  .73    .60  .63    .70   .70
Insurance coverage            3   .71  .73    .51  .48    .76   .64
Payment mechanisms            4   .50  .51    .51  .59    .57   .63
Consideration                 5   .81  .74    .77  .68    .84   .74
Explanations                  3   .70  .74    .64  .48    .75   .71
Prudence-expenses             2   .66  .58    .47  .50    .78   .57
Doctor's facilities           2   .82  .73    .73  .72    .84   .75
Prudence-risks                2   .60  .46    .23  .39    .69   .54
Quality/competence            9   .83  .74    .77  .70    .87   .79
General satisfaction          4   .77  NA     .62  NA     .73   .70

Global scales
Access to care                7   .72  .72    .73  .62    .74   .77
Financial aspects            10   .66  .75    .60  .69    .70   .76
Access total                 17   .79  .79    .78  .71    .81   .84
Availability                  6   .66  .75    .74  .62    .57   .73
Continuity of care            4   .59  .74    .43  .63    .73   .52
Doctor conduct               23   .92  .82    .88  .78    .94   .90

Note. Source: based on Tables 54, 56, 59, and 60 in Ware, Snyder, and Wright (1976a).
Field Tests: East St. Louis (ESL), Sangamon County (SAC), Family Practice Center (FP), & Los Angeles County (LAC).
scale, because it is the longest and most homogeneous scale. Although correlations among subscales in the same global scale should be substantial and positive, this standard was not always met. The poor reliability of the global Availability of Resources scale in the Family Practice study was traced to a negative correlation between two of its component subscales. The access subscales also tended to be less highly intercorrelated than subscales used to construct other global scales. Given such results, the interpretation of the aggregate measures may be problematic.

The PSQ subscales and global scales tended to be less reliable in East St. Louis than in other field tests. Consistent with this finding, comparisons of scale reliabilities within field tests for groups formed on the basis of demographic and socioeconomic variables (age, gender, education, and income) indicated that satisfaction ratings tend to be less reliable for persons reporting less income or education.

Results (data not presented) regarding the stability of satisfaction levels over a 2-year interval came from a follow-up study of respondents in a field test of Form I of the PSQ. Correlations between scores for scales administered approximately 2 years apart ranged from 0.34 for Availability to 0.60 for Nonfinancial Access and 0.61 for Doctor Conduct. (These are lower-bound stability estimates, because the PSQ forms were not identical on both administrations.) The results suggest that satisfaction is relatively stable over time. Therefore, precision in hypothesis-testing is likely to improve significantly with a repeated-measures design and covariation on initial satisfaction levels.
VALIDITY OF THE PSQ
Validation, or determining the meaning of scores and
how to interpret a difference of a particular size, is an
ongoing process for the PSQ and looms as the greatest
challenge for satisfaction measurement in general.
This process proceeds in the absence of direct mea-
sures of patient satisfaction or of agreed-upon satis-
faction “criteria” that can be used to evaluate validity.
This problem is common in psychological measure-
ment. A solution that is becoming standard is the
strategy of construct validation. This approach exam-
ines a wide range of variables to determine the extent
to which an instrument produces results that are con-
sistent with what would be expected for the construct
to be measured (APA, 1974). A major difficulty in ap-
plying the construct validation method to patient
satisfaction measures is the lack of well-specified
theory. Specifically, what results should one expect for
a valid measure of patient satisfaction?
In the face of this dilemma, several approaches were
used to test the validity of the PSQ: (a) a systematic
review of content validity; (b) factor analytic studies of
the structure of items and subscales; (c) studies of
convergent-discriminant validity that compared results
across alternative methods of measuring patient satis-
faction; and (d) studies of the predictive validity of
PSQ scales in relation to health and illness behaviors
thought to be influenced by individual differences in
patient satisfaction. Our experiences with the first
three kinds of validity studies are documented in detail
elsewhere (Ware et al., 1976b, pp. 323-588) and are
summarized briefly here. Studies of predictive validity
are discussed in a companion paper in this issue (Ware
& Davies, 1983).
In developing the PSQ, we sought to capture the
most salient characteristics of services and providers
that might influence patient satisfaction with care.
Given this goal, content validity is a relevant standard
and has been systematically investigated for the PSQ.
The match between PSQ items and the taxonomy of
characteristics of services and providers that has evolved
using information from a variety of sources is quite
good. (The PSQ is systematically compared with this
taxonomy by Ware et al., 1976b, pp. 373-378; and by
Ware et al., 1978). However, potential areas of im-
provement in the content of PSQ items have been iden-
tified (particularly in the areas of quality of care and
finances, as noted below).
Although the PSQ is more comprehensive than its
predecessors, there are still more distinguishable
features of medical care services than PSQ subscales.
The great majority of these features are assessed by
one or more PSQ items. However, for many if not
most studies of patient satisfaction, a single-item
measure is not a very desirable unit of analysis. Thus,
the PSQ subscales represent a deliberate compromise
between respondent burden and content validity and
other psychometric standards. Specifically, the PSQ
attempts to strike a balance between the number of
different satisfaction constructs measured and how
well each construct is measured while holding ad-
ministration time well below 15 minutes. All but 2 of
the 18 subscales contain two to four items each. The
two subscales measuring satisfaction with the technical
and interpersonal skills of providers are longer,
because these features of care seem most influential in
determining patient satisfaction and are more difficult
to distinguish.
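The arithmetic behind this compromise can be sketched with the standard Spearman-Brown prophecy formula, a textbook psychometric result (the authors do not report this exact calculation, and the average inter-item correlation below is assumed for illustration):

def spearman_brown(avg_inter_item_r, n_items):
    """Projected reliability of a scale built from n_items parallel items."""
    k, r = n_items, avg_inter_item_r
    return (k * r) / (1 + (k - 1) * r)

# With an assumed average inter-item correlation of .30, most of the
# reliability gain comes from the first few items:
for k in (1, 2, 3, 4, 6, 8):
    print(f"{k} items -> projected reliability {spearman_brown(0.30, k):.2f}")
# Two to four items already yield reliabilities of .46 to .63, while
# doubling to eight items adds comparatively little, one rationale for
# short subscales when administration time must stay under 15 minutes.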
Standards of empirical validity derive from the in-
tended uses of an instrument. The PSQ was designed
with the diverse goals of several types of study in
mind. First, it was designed to measure patient satis-
faction as an outcome of care. For this application, the
PSQ must detect the amount of satisfaction and dis-
satisfaction produced by different systems of care
(e.g., fee-for-service vs. prepaid group practice) as well
as by different facilities. Because competing systems of
care might involve different tradeoffs (e.g., increased
access versus provider continuity), an overall satisfac-
tion score is particularly useful in summarizing satis-
faction outcomes. Second, the PSQ was designed to
provide programmatically useful information about
the major sources of satisfaction and dissatisfaction.
For this use, the information it provides about satis-
faction must relate to the distinct features of care. The
validity issue most relevant to this application is
whether PSQ subscales measure different dimensions
of satisfaction and how each subscale should be inter-
preted with regard to a specific feature of care. Finally,
the PSQ was designed to be useful in studies of patient
behavior. This application requires that its predictive
validity be established.
A major feature of the PSQ that is important for
several of its intended applications is its structure. If
there are distinct features of medical care services that
cause differences in patient satisfaction, then a valid
satisfaction measure should be multidimensional. The
validity of the PSQ in this regard rests on a rather
substantial body of empirical evidence. First, the scal-
ing studies involved many tests of item discriminant
validity. Results showed that groupings of items cor-
responding to the PSQ subscales measure different
things. These tests were repeated using subscales as the
unit of analysis, and findings were notably consistent
across four independent field tests in diverse popula-
tions. Specifically, four higher-order factors (quality
of care, access to care, availability of resources, and
continuity of care) were observed and replicated. The
pattern of correlations for each subscale across factors
also showed little variance across field tests and be-
tween groups who had and had not used medical care
services recently. These patterns were evaluated em-
pirically by estimating similarity coefficients using
methods described by Kaiser, Hunka, and Bianchini
(1971).
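As an illustration of what such a cross-study comparison involves, the sketch below computes Tucker's coefficient of congruence, a simpler relative of the Kaiser, Hunka, and Bianchini (1971) procedure actually used; the factor loadings are invented, not taken from the field tests:

import numpy as np

def congruence(a, b):
    """Tucker's coefficient of congruence between two factor loading vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Hypothetical loadings of six subscales on an access-to-care factor
# in two field tests (values invented for illustration):
east_st_louis = np.array([0.72, 0.65, 0.58, 0.15, 0.10, 0.08])
sangamon = np.array([0.70, 0.61, 0.63, 0.12, 0.18, 0.05])

print(f"congruence = {congruence(east_st_louis, sangamon):.3f}")
# Values near 1.0 (conventionally above .90) are read as evidence that
# the same factor was recovered in both samples.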
The weight of empirical evidence regarding the gen-
eralizability of the item and higher-order factor
analyses clearly indicates that PSQ items and subscales
measure distinct dimensions. Differences in the face
validity of items in each subscale also support this con-
clusion. Further, the higher-order factor structure of
PSQ subscales is strikingly similar to the major
features of health care services that are distinguished
in the published literature. This evidence strongly sug-
gests that the PSQ measures the same things that are
written about in this literature. Only the most ardent
supporter of construct validation by factor analysis,
however, is likely to accept from this evidence alone
that the PSQ measures distinct dimensions of patient
satisfaction.
The evidence summarized above constitutes a sound
psychometric basis for scoring and interpreting
distinct factors defined by PSQ items. The content of
these factors suggests that item responses reflect dif-
ferences in satisfaction with the specific characteristics
of doctors and medical care services described by the
scale labels (e.g., finances, interpersonal manner). We
conducted a number of empirical studies to test the ap-
propriateness of this conclusion. These studies focused
on how well the PSQ agrees with the results of other
methods of measuring patient satisfaction. These
studies, described in detail elsewhere (Ware et al.,
1976b, pp. 379-463), are summarized briefly here.
Every field test of the PSQ included open-ended
questions about recent care experiences and other
events that may have changed sentiments regarding
doctors and medical care services. These questions
were included to test for previously unidentified
satisfaction constructs and to validate PSQ scores. In
two field tests (East St. Louis, Sangamon County) of
Form II, these responses were formally analyzed to
test hypotheses about the validity of the PSQ. Two
questions were addressed: (a) Does the PSQ discrimi-
nate between persons who describe negative health
care experiences and those who report positive ex-
periences or no events affecting their sentiments? (b)
Do the PSQ subscales predict the specific sources of
satisfaction and dissatisfaction reported in descrip-
tions of these experiences? For example, are responses
to items in the Technical Quality subscale more sen-
sitive to problems with technical quality than to prob-
lems with finances?
For several reasons, we were not able to perform all
of the planned analyses of responses to open-ended
questions. In both field tests, the majority of respon-
dents preferred not to discuss their experiences verbally.
A practical implication of this result is that, in ad-
dition to costing less than personal interviews, a stan-
dardized self-administered satisfaction survey yields
much higher completion rates than unstructured
interviews. In East St. Louis, only 3 of 323
respondents volunteered a favorable statement about
doctors or medical care services in response to open-
ended questions. Hence, a traditional sensitivity-
specificity analysis was not possible. In both field
tests, some features of services were not mentioned
frequently enough to test the sensitivity of the cor-
responding PSQ subscale. Only four dimensions of
care (technical quality, access, finances, and interper-
sonal manner) were mentioned frequently enough to
permit any kind of empirical analysis. Hence, we com-
pared responses to open-ended questions against the
four PSQ global scales corresponding to these four
problem areas.
In East St. Louis, complaints were expressed about
(in order of prevalence): technical quality, access,
finances, and interpersonal manner of providers. With
one exception, the PSQ scales showed good conver-
gent and discriminant validity in identifying persons
who made these complaints. Respondents who voiced
complaints tended to score lower than noncomplaining
respondents (approximately the 19th to 35th percentiles, on
average, across subscales corresponding to the subject
matter of the complaints). This supports the conver-
gent validity of PSQ subscales. Further, persons who
complained about technical quality (n = 30) scored
lower on the Technical Quality subscale (at the 25th
percentile, on average) than on the other three PSQ
subscales studied (access, finances, and interpersonal
manner). The other three PSQ scales showed a similar
pattern of results in support of their sensitivity and
specificity in detecting specific problems with care.
We encountered one noteworthy exception to this
pattern of favorable discriminant validity results
across the four complaint groups and corresponding
PSQ global scales in East St. Louis. The interpersonal
manner of providers was rated very unfavorably (14th-
20th percentiles of the Humaneness Scale distribution)
by all groups who complained (regardless of what was
complained about). This pattern of results, which was
also apparent in other tests of discriminant validity,
raises interesting questions about the dynamics of pa-
tient satisfaction in the area of provider “caring.” Are
practices that have problems with access, finances, and
other features of care also more likely to produce un-
satisfactory doctor-patient relationships? Are patients
inclined to blame their doctor(s) for long waits in the
waiting room, financial difficulties, and so on? Further
research is necessary to determine the extent to which
dimensions of service satisfaction are not orthogonal.
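The logic of this convergent-discriminant check can be expressed compactly; in the sketch below (all percentile values invented, not the study's data), each complaint group's lowest-scoring scale is compared with the scale matching the complaint:

import numpy as np

scales = ["technical quality", "access", "finances", "interpersonal"]

# Mean percentile score of each complaint group on each PSQ scale
# (rows keyed by complaint topic; columns follow the order above):
mean_percentiles = {
    "technical quality": np.array([25, 38, 41, 19]),
    "access": np.array([36, 22, 40, 17]),
}

for complaint, row in mean_percentiles.items():
    lowest = scales[int(np.argmin(row))]
    print(f"complained about {complaint!r}: lowest mean is on {lowest!r}")
# Note that the invented interpersonal column is lowest in both groups,
# mimicking the Humaneness exception described in the text: every
# complaint group rated interpersonal manner unfavorably.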
The pattern of results observed in the Sangamon
County study, where the number of positive and
negative comments was large enough to permit a more
traditional correlational analysis, also supported both
the convergent and discriminant validity of the PSQ
subscales. However, the fact that many respondents
commented about more than one feature of care com-
plicated interpretations. In general, for respondents
making only one complaint about their care, scores for
the PSQ scale that corresponded to the content of the
complaint tended to be lowest.
Other validity studies of PSQ subscales focused on
access variables and compared PSQ subscales with
patient reports using standardized questions about ob-
jective features of services, including: distance to care
facilities (in miles, travel time); availability of
emergency care; and proportion of costs paid by out-
side sources (e.g., insurance). These analyses were
replicated in three field tests (East St. Louis,
Sangamon County, and Family Practice). We also
tested whether satisfied patients were more likely to
report a regular source of care and (in the family prac-
tice center study) to claim a particular facility as their
regular source of care.
Tests based on these criteria support the discrimi-
nant validity of PSQ access-related subscales. For ex-
ample, the access criteria correlated higher with the
PSQ access-related subscales (Access to Care,
Emergency Care, Convenience) than with other PSQ
subscales; in fact, most correlations with other PSQ
subscales were not statistically significant. Further,
analyses comparing the PSQ with other standardized
report and rating measures provided support for the
interpretation of PSQ access-related subscales as
evaluations. For example, although the access-related
subscales (particularly the Convenience subscale) cor-
related significantly with reported miles traveled and
travel time, their correlations with a standardized
evaluative rating of travel time were consistently much
higher.
The validity of the PSQ subscales that measure
availability of resources could not be evaluated in our
field tests because such a study requires a geographic
area as the unit of analysis. Results relevant to this
validity issue have been reported by Aday, Andersen,
& Fleming (1980). They have linked the PSQ Avail-
ability of Resources subscales convincingly to indepen-
dent measures of medical resources per capita.
We also examined several multitrait-multimethod
matrices that correlated PSQ subscales and global
scales with measures based on other methods, in-
cluding ratings of care on a satisfaction continuum
(very satisfied vs. very dissatisfied) and a method that
combined measures of the frequency of health care
events with the importance placed on those events.
These analyses provide strong support for the con-
vergent and discriminant validity of PSQ subscales
and global scales as measures of patient satisfaction.
Noteworthy exceptions, however, included results for
the technical and interpersonal subscales, as discussed
later. Some problems with the use of satisfaction
rating scales (i.e., scales using a very satisfied vs.
very dissatisfied response continuum) were also noted
in comparisons with the PSQ. For example, correla-
tions among satisfaction ratings seem to be high,
relative to correlations among PSQ subscales, despite
the higher reliability of the latter. This result suggests a
strong halo or method effect of ratings on a satisfac-
tion continuum, or a lack of discriminant validity.
We conclude our discussion of validity with com-
ments regarding unanswered questions about PSQ
measures of the quality of care. Throughout our
studies of the PSQ, we have observed substantial cor-
relations (ranging up to .70) among PSQ quality of
care subscales and global scales (measures of the
technical and interpersonal skills of providers). Some
access measures also correlate substantially with these
quality of care subscales. When this pattern of results
was first observed in tests of Form I, we attributed it to
the references to "doctor" in many PSQ items. Item revi-
sions in Form II items deleted most such references to
focus attention on specific features of care rather than
on doctors in general. Substantial correlations among
quality of care items, however, have persisted although
PSQ item analyses and analyses of other items clearly
indicate that patients can distinguish among specific
quality of care features (Ware et al., 1976a; Ware et
al., 1975).
In convergent-discriminant tests of PSQ measures
(using the rigorous MTMM method) we sometimes en-
countered problems with discriminant validity of
scales assessing technical and interpersonal skills of
providers as well as access to providers. According to
the logic of convergent-discriminant validation, one
should be able to measure a trait well enough that
measures of the same trait using different methods
correlate more highly than measures of different traits
based on the same method. The opposite pattern of
results is encountered all too often.
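In code, the Campbell and Fiske (1959) comparison looks like the following sketch; the correlation matrix is invented to reproduce the problematic pattern described, not taken from the PSQ analyses:

import numpy as np

# Correlation matrix for two traits measured by two methods, ordered:
# (PSQ technical, PSQ interpersonal, rating technical, rating interpersonal)
R = np.array([
    [1.00, 0.62, 0.48, 0.40],
    [0.62, 1.00, 0.41, 0.47],
    [0.48, 0.41, 1.00, 0.70],
    [0.40, 0.47, 0.70, 1.00],
])

monotrait_heteromethod = [R[0, 2], R[1, 3]]   # same trait, different method
heterotrait_monomethod = [R[0, 1], R[2, 3]]   # different trait, same method

print("validity diagonal (same trait, different method):", monotrait_heteromethod)
print("same-method correlations (different traits):     ", heterotrait_monomethod)
# Discriminant validity requires the first pair to exceed the second;
# here .48 and .47 fall short of .62 and .70, the "opposite pattern"
# the text reports encountering all too often.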
Despite these reasons to reserve judgment regard-
ing the discriminant validity of PSQ subscales mea-
suring the interpersonal and technical skills of pro-
viders, we have little doubt that they measure patient
satisfaction. These scales perform very well in relation
to a wide range of criterion variables, and are con-
sistently (across studies) the best predictors of satisfac-
tion with care in general and of continuity of care. At
issue is their discriminant validity in relation to the
particular quality of care attributes they are supposed
to measure (interpersonal manner versus technical
skills) and in relation to the provider's accessibility
to patients.
An analysis of correlations among the subscales in
question, taking into account the reliability of each
subscale, leaves no doubt that each subscale measures
something not measured by the others. Unfortunately,
we have no basis for evaluating the size of these inter-
scale correlations because we do not know the extent
to which the attributes of providers in question are
correlated in the real world. Are friendlier doctors
likely to be more thorough in examining their patients?
Are doctors who show more courtesy and respect when
they see their patients also more likely to return their
patients’ phone calls in a timely manner? If so,
substantial correlations among measures of the
technical and interpersonal skills of providers and
general access reflect favorably on their validity. We
believe that there is something to this argument,
although we also suspect that PSQ items can be con-
structed to better discriminate between the interper-
sonal and technical skills of providers. These hy-
potheses are now being tested (Ware, Kane, Davies, &
Brook, in press).
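The interscale analysis referred to above corresponds to the classical correction for attenuation; the sketch below applies the standard formula with invented reliability and correlation values (the paper does not print the ones actually used):

import math

def disattenuated_r(r_xy, rel_x, rel_y):
    """Correlation between true scores implied by an observed correlation
    and the reliabilities of the two scales."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Invented values for two quality-of-care subscales:
observed_r = 0.65
rel_technical = 0.88
rel_interpersonal = 0.85

r_true = disattenuated_r(observed_r, rel_technical, rel_interpersonal)
print(f"disattenuated correlation = {r_true:.2f}")
# About .75: clearly below 1.0, so each subscale measures something
# the other does not, as the text concludes.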
CONCLUSIONS
Our experience in developing the Patient Satisfaction Questionnaire (PSQ) and testing it in the field has led us to a number of conclusions about the nature of the patient satisfaction concept and important methodological considerations in its measurement. Although much empirical work remains to be done before a complete model of patient satisfaction can be specified, we are convinced of the importance of several features of that model. First, patient satisfaction with medical care is a multidimensional concept, with dimensions that correspond to the major characteristics of providers and services. Second, the realities of care are reflected in patients' satisfaction ratings. Finally, the influence of patients' expectations, preferences for specific features of care, and other hypothetical constructs on patient satisfaction remains to be determined.

Consistent with this preliminary model of the patient satisfaction concept, the PSQ was constructed to measure patient satisfaction in general as well as satisfaction with specific features of care. This permits testing of more focused hypotheses and makes results more useful from a programmatic point of view. The PSQ also reflects our solutions to a number of methodological problems, namely: relying on self-administration to reduce data-gathering costs and increase confidentiality; structuring items as statements of opinion with an agree-disagree response format to reduce the skewness of response distributions; balancing scales to control for acquiescence; scoring multi-item scales to achieve minimum standards of reliability; and validating scales using the logic of construct validity in the absence of agreed-upon criteria. These solutions have served us well, and we recommend them and the PSQ to others.
REFERENCES
ADAY, L. A., ANDERSEN, R., & FLEMING, G. V. Health care in the U.S.: Equitable for whom? Beverly Hills: Sage Publications, 1980.

AMERICAN PSYCHOLOGICAL ASSOCIATION. Standards for educational and psychological tests. Washington, DC: American Psychological Association, 1974.

CAMPBELL, D. T., & FISKE, D. W. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 1959, 56, 81-105.

CHU, G. C., WARE, J. E., JR., & WRIGHT, W. R. Health related research in southernmost Illinois: A preliminary report. (Tech. Rep. No. HCP-73-6.) Springfield, IL: Southern Illinois University, School of Medicine, 1973.

COMREY, A. L. Factored homogeneous item dimensions in personality research. Educational and Psychological Measurement, 1961, 21, 417-431.

CRONBACH, L. J. Coefficient alpha and the internal structure of tests. Psychometrika, 1951, 16, 297-334.

HELMSTADTER, G. C. Principles of psychological measurement. New York: Appleton-Century-Crofts, 1964.

HOWARD, K. I., & FOREHAND, G. G. A method for correcting item-total correlations for the effect of relevant item inclusion. Educational and Psychological Measurement, 1962, 22, 731-735.

HULKA, B. S., ZYZANSKI, S. J., CASSEL, J. C., & THOMPSON, S. J. Scale for the measurement of attitudes toward physicians and primary medical care. Medical Care, 1970, 8, 429-436.

KAISER, H. F., HUNKA, S., & BIANCHINI, J. C. Relating factors between studies based upon different individuals. Multivariate Behavioral Research, 1971, 6, 409-422.

LIKERT, R. A technique for the measurement of attitudes. Archives of Psychology, 1932 (No. 140), 1-55.

SNYDER, M. K., & WARE, J. E., JR. Differences in satisfaction with health care services as a function of recipient: Self or others. (P-5488.) Santa Monica, CA: The Rand Corporation, 1975.

WARE, J. E., JR. Effects of acquiescent response set on patient satisfaction ratings. Medical Care, 1978, 16, 327-336.

WARE, J. E., JR. How to survey patient satisfaction. Drug Intelligence and Clinical Pharmacy, 1981, 15, 892-899.

WARE, J. E., JR., & DAVIES, A. R. Behavioral consequences of consumer dissatisfaction with medical care. Evaluation and Program Planning, 1983, 6, 291-297.

WARE, J. E., JR., DAVIES-AVERY, A., & STEWART, A. L. The measurement and meaning of patient satisfaction. Health and Medical Care Services Review, 1978, 1, 1-15.

WARE, J. E., JR., KANE, R. L., DAVIES, A. R., & BROOK, R. H. The patient role in assessing medical care process. Santa Monica, CA: The Rand Corporation, in press.

WARE, J. E., JR., MILLER, W. G., & SNYDER, M. K. Comparison of factor analytic methods in the development of health-related indexes from questionnaire data. (NTIS No. PB 239-517/AS.) Springfield, VA: National Technical Information Service, 1973.

WARE, J. E., JR., & SNYDER, M. K. Dimensions of patient attitudes regarding doctors and medical care services. Medical Care, 1975, 13, 669-682.

WARE, J. E., JR., SNYDER, M. K., & WRIGHT, W. R. Development and validation of scales to measure patient satisfaction with health care services: Volume I of a final report. Part A: Review of literature, overview of methods, and results regarding construction of scales. (NTIS No. PB 288-329.) Springfield, VA: National Technical Information Service, 1976. (a)

WARE, J. E., JR., SNYDER, M. K., & WRIGHT, W. R. Development and validation of scales to measure patient satisfaction with health care services: Volume I of a final report. Part B: Results regarding scales constructed from the patient satisfaction questionnaire and measures of other health care perceptions. (NTIS No. PB 288-330.) Springfield, VA: National Technical Information Service, 1976. (b)

WARE, J. E., JR., WRIGHT, W. R., SNYDER, M. K., & CHU, G. C. Consumer perceptions of health care services: Implications for academic medicine. Journal of Medical Education, 1975, 50, 839-848.

WINKLER, J. D., KANOUSE, D. E., & WARE, J. E., JR. Controlling for acquiescence response set in scale development. Journal of Applied Psychology, 1982, 67, 555-561.

More Related Content

Similar to Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdf

Quality Improvement paper
Quality Improvement paperQuality Improvement paper
Quality Improvement paperEllen Huff
 
Effects of provider patient relationship on the rate of patient’s recovery am...
Effects of provider patient relationship on the rate of patient’s recovery am...Effects of provider patient relationship on the rate of patient’s recovery am...
Effects of provider patient relationship on the rate of patient’s recovery am...Alexander Decker
 
Wessex AHSN: Evaluating patient outcomes in Wessex
Wessex AHSN: Evaluating patient outcomes in WessexWessex AHSN: Evaluating patient outcomes in Wessex
Wessex AHSN: Evaluating patient outcomes in WessexHealth Innovation Wessex
 
Cadth 2015 a5 1 pt engage symp marshall cadth apr 13 2015 present
Cadth 2015 a5 1  pt engage symp marshall cadth apr 13 2015 presentCadth 2015 a5 1  pt engage symp marshall cadth apr 13 2015 present
Cadth 2015 a5 1 pt engage symp marshall cadth apr 13 2015 presentCADTH Symposium
 
Professional med j_q_2013_20_6_973_980
Professional med j_q_2013_20_6_973_980Professional med j_q_2013_20_6_973_980
Professional med j_q_2013_20_6_973_980Vikram Aripaka
 
Analysis of Outpatient Satisfaction on the Quality of Pharmaceutical Services...
Analysis of Outpatient Satisfaction on the Quality of Pharmaceutical Services...Analysis of Outpatient Satisfaction on the Quality of Pharmaceutical Services...
Analysis of Outpatient Satisfaction on the Quality of Pharmaceutical Services...Harrizul Rivai
 
2016_The exchangeability of self-reports and administrative health care resou...
2016_The exchangeability of self-reports and administrative health care resou...2016_The exchangeability of self-reports and administrative health care resou...
2016_The exchangeability of self-reports and administrative health care resou...Cindy Noben
 
Written Leadership in nursing.docx
Written Leadership in nursing.docxWritten Leadership in nursing.docx
Written Leadership in nursing.docxwrite22
 
Patients' satisfaction towards doctors treatment
Patients' satisfaction towards doctors treatmentPatients' satisfaction towards doctors treatment
Patients' satisfaction towards doctors treatmentmustafa farooqi
 
Patients' satisfaction towards doctors treatment
Patients' satisfaction towards doctors treatmentPatients' satisfaction towards doctors treatment
Patients' satisfaction towards doctors treatmentmustafa farooqi
 
1) Write a paper of 900  words regarding the statistical significanc.docx
1) Write a paper of 900  words regarding the statistical significanc.docx1) Write a paper of 900  words regarding the statistical significanc.docx
1) Write a paper of 900  words regarding the statistical significanc.docxlindorffgarrik
 
International Journal for Quality in Health Care 2003; Volume .docx
International Journal for Quality in Health Care 2003; Volume .docxInternational Journal for Quality in Health Care 2003; Volume .docx
International Journal for Quality in Health Care 2003; Volume .docxnormanibarber20063
 
Standardized Bedside ReportingOne of the goals of h.docx
Standardized Bedside ReportingOne of the goals of h.docxStandardized Bedside ReportingOne of the goals of h.docx
Standardized Bedside ReportingOne of the goals of h.docxwhitneyleman54422
 
Harvard style research paper nursing evidenced based practice
Harvard style research paper   nursing evidenced based practiceHarvard style research paper   nursing evidenced based practice
Harvard style research paper nursing evidenced based practiceCustomEssayOrder
 
Overview of Patient Experience Definitions and Measurement Tools
Overview of Patient Experience Definitions and Measurement ToolsOverview of Patient Experience Definitions and Measurement Tools
Overview of Patient Experience Definitions and Measurement ToolsInnovations2Solutions
 
Patient satisfaction the importance of its measurement in improving the quali...
Patient satisfaction the importance of its measurement in improving the quali...Patient satisfaction the importance of its measurement in improving the quali...
Patient satisfaction the importance of its measurement in improving the quali...Alexander Decker
 

Similar to Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdf (20)

Quality Improvement paper
Quality Improvement paperQuality Improvement paper
Quality Improvement paper
 
Effects of provider patient relationship on the rate of patient’s recovery am...
Effects of provider patient relationship on the rate of patient’s recovery am...Effects of provider patient relationship on the rate of patient’s recovery am...
Effects of provider patient relationship on the rate of patient’s recovery am...
 
Wessex AHSN: Evaluating patient outcomes in Wessex
Wessex AHSN: Evaluating patient outcomes in WessexWessex AHSN: Evaluating patient outcomes in Wessex
Wessex AHSN: Evaluating patient outcomes in Wessex
 
Measuring job satisfaction and impact of demographic characteristics among Do...
Measuring job satisfaction and impact of demographic characteristics among Do...Measuring job satisfaction and impact of demographic characteristics among Do...
Measuring job satisfaction and impact of demographic characteristics among Do...
 
Cadth 2015 a5 1 pt engage symp marshall cadth apr 13 2015 present
Cadth 2015 a5 1  pt engage symp marshall cadth apr 13 2015 presentCadth 2015 a5 1  pt engage symp marshall cadth apr 13 2015 present
Cadth 2015 a5 1 pt engage symp marshall cadth apr 13 2015 present
 
Professional med j_q_2013_20_6_973_980
Professional med j_q_2013_20_6_973_980Professional med j_q_2013_20_6_973_980
Professional med j_q_2013_20_6_973_980
 
Dr hatem el bitar quality text (4)
Dr hatem el bitar quality text (4)Dr hatem el bitar quality text (4)
Dr hatem el bitar quality text (4)
 
Analysis of Outpatient Satisfaction on the Quality of Pharmaceutical Services...
Analysis of Outpatient Satisfaction on the Quality of Pharmaceutical Services...Analysis of Outpatient Satisfaction on the Quality of Pharmaceutical Services...
Analysis of Outpatient Satisfaction on the Quality of Pharmaceutical Services...
 
2016_The exchangeability of self-reports and administrative health care resou...
2016_The exchangeability of self-reports and administrative health care resou...2016_The exchangeability of self-reports and administrative health care resou...
2016_The exchangeability of self-reports and administrative health care resou...
 
Written Leadership in nursing.docx
Written Leadership in nursing.docxWritten Leadership in nursing.docx
Written Leadership in nursing.docx
 
EVIDENCE BASED NURSING PRACTICE
EVIDENCE BASED NURSING PRACTICE EVIDENCE BASED NURSING PRACTICE
EVIDENCE BASED NURSING PRACTICE
 
MADLENARTICLES
MADLENARTICLESMADLENARTICLES
MADLENARTICLES
 
Patients' satisfaction towards doctors treatment
Patients' satisfaction towards doctors treatmentPatients' satisfaction towards doctors treatment
Patients' satisfaction towards doctors treatment
 
Patients' satisfaction towards doctors treatment
Patients' satisfaction towards doctors treatmentPatients' satisfaction towards doctors treatment
Patients' satisfaction towards doctors treatment
 
1) Write a paper of 900  words regarding the statistical significanc.docx
1) Write a paper of 900  words regarding the statistical significanc.docx1) Write a paper of 900  words regarding the statistical significanc.docx
1) Write a paper of 900  words regarding the statistical significanc.docx
 
International Journal for Quality in Health Care 2003; Volume .docx
International Journal for Quality in Health Care 2003; Volume .docxInternational Journal for Quality in Health Care 2003; Volume .docx
International Journal for Quality in Health Care 2003; Volume .docx
 
Standardized Bedside ReportingOne of the goals of h.docx
Standardized Bedside ReportingOne of the goals of h.docxStandardized Bedside ReportingOne of the goals of h.docx
Standardized Bedside ReportingOne of the goals of h.docx
 
Harvard style research paper nursing evidenced based practice
Harvard style research paper   nursing evidenced based practiceHarvard style research paper   nursing evidenced based practice
Harvard style research paper nursing evidenced based practice
 
Overview of Patient Experience Definitions and Measurement Tools
Overview of Patient Experience Definitions and Measurement ToolsOverview of Patient Experience Definitions and Measurement Tools
Overview of Patient Experience Definitions and Measurement Tools
 
Patient satisfaction the importance of its measurement in improving the quali...
Patient satisfaction the importance of its measurement in improving the quali...Patient satisfaction the importance of its measurement in improving the quali...
Patient satisfaction the importance of its measurement in improving the quali...
 

Recently uploaded

Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Miss joya
 
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girls Service Gurgaon
 
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...delhimodelshub1
 
Call Girls LB Nagar 7001305949 all area service COD available Any Time
Call Girls LB Nagar 7001305949 all area service COD available Any TimeCall Girls LB Nagar 7001305949 all area service COD available Any Time
Call Girls LB Nagar 7001305949 all area service COD available Any Timedelhimodelshub1
 
Call Girl Price Amritsar ❤️🍑 9053900678 Call Girls in Amritsar Suman
Call Girl Price Amritsar ❤️🍑 9053900678 Call Girls in Amritsar SumanCall Girl Price Amritsar ❤️🍑 9053900678 Call Girls in Amritsar Suman
Call Girl Price Amritsar ❤️🍑 9053900678 Call Girls in Amritsar SumanCall Girls Service Chandigarh Ayushi
 
Leading transformational change: inner and outer skills
Leading transformational change: inner and outer skillsLeading transformational change: inner and outer skills
Leading transformational change: inner and outer skillsHelenBevan4
 
Vip sexy Call Girls Service In Sector 137,9999965857 Young Female Escorts Ser...
Vip sexy Call Girls Service In Sector 137,9999965857 Young Female Escorts Ser...Vip sexy Call Girls Service In Sector 137,9999965857 Young Female Escorts Ser...
Vip sexy Call Girls Service In Sector 137,9999965857 Young Female Escorts Ser...Call Girls Noida
 
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking ModelsDehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking Modelsindiancallgirl4rent
 
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...High Profile Call Girls Chandigarh Aarushi
 
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...soniya singh
 
No Advance 9053900678 Chandigarh Call Girls , Indian Call Girls For Full Ni...
No Advance 9053900678 Chandigarh  Call Girls , Indian Call Girls  For Full Ni...No Advance 9053900678 Chandigarh  Call Girls , Indian Call Girls  For Full Ni...
No Advance 9053900678 Chandigarh Call Girls , Indian Call Girls For Full Ni...Vip call girls In Chandigarh
 
Call Girls in Mohali Surbhi ❤️🍑 9907093804 👄🫦 Independent Escort Service Mohali
Call Girls in Mohali Surbhi ❤️🍑 9907093804 👄🫦 Independent Escort Service MohaliCall Girls in Mohali Surbhi ❤️🍑 9907093804 👄🫦 Independent Escort Service Mohali
Call Girls in Mohali Surbhi ❤️🍑 9907093804 👄🫦 Independent Escort Service MohaliHigh Profile Call Girls Chandigarh Aarushi
 
Call Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any TimeCall Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any Timedelhimodelshub1
 
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meetpriyashah722354
 
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near MeVIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Memriyagarg453
 
Russian Call Girls Hyderabad Saloni 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Saloni 9907093804 Independent Escort Service Hyd...Russian Call Girls Hyderabad Saloni 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Saloni 9907093804 Independent Escort Service Hyd...delhimodelshub1
 
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...Gfnyt.com
 

Recently uploaded (20)

Call Girl Lucknow Gauri 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
Call Girl Lucknow Gauri 🔝 8923113531  🔝 🎶 Independent Escort Service LucknowCall Girl Lucknow Gauri 🔝 8923113531  🔝 🎶 Independent Escort Service Lucknow
Call Girl Lucknow Gauri 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
 
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
Vip Kolkata Call Girls Cossipore 👉 8250192130 ❣️💯 Available With Room 24×7
 
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service GurgaonCall Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
Call Girl Gurgaon Saloni 9711199012 Independent Escort Service Gurgaon
 
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Indira 9907093804 Independent Escort Service Hyd...
 
Call Girls LB Nagar 7001305949 all area service COD available Any Time
Call Girls LB Nagar 7001305949 all area service COD available Any TimeCall Girls LB Nagar 7001305949 all area service COD available Any Time
Call Girls LB Nagar 7001305949 all area service COD available Any Time
 
Call Girl Price Amritsar ❤️🍑 9053900678 Call Girls in Amritsar Suman
Call Girl Price Amritsar ❤️🍑 9053900678 Call Girls in Amritsar SumanCall Girl Price Amritsar ❤️🍑 9053900678 Call Girls in Amritsar Suman
Call Girl Price Amritsar ❤️🍑 9053900678 Call Girls in Amritsar Suman
 
Leading transformational change: inner and outer skills
Leading transformational change: inner and outer skillsLeading transformational change: inner and outer skills
Leading transformational change: inner and outer skills
 
Vip sexy Call Girls Service In Sector 137,9999965857 Young Female Escorts Ser...
Vip sexy Call Girls Service In Sector 137,9999965857 Young Female Escorts Ser...Vip sexy Call Girls Service In Sector 137,9999965857 Young Female Escorts Ser...
Vip sexy Call Girls Service In Sector 137,9999965857 Young Female Escorts Ser...
 
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking ModelsDehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
Dehradun Call Girls Service 7017441440 Real Russian Girls Looking Models
 
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
Call Girls Service Chandigarh Grishma ❤️🍑 9907093804 👄🫦 Independent Escort Se...
 
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...
Gurgaon iffco chowk 🔝 Call Girls Service 🔝 ( 8264348440 ) unlimited hard sex ...
 
No Advance 9053900678 Chandigarh Call Girls , Indian Call Girls For Full Ni...
No Advance 9053900678 Chandigarh  Call Girls , Indian Call Girls  For Full Ni...No Advance 9053900678 Chandigarh  Call Girls , Indian Call Girls  For Full Ni...
No Advance 9053900678 Chandigarh Call Girls , Indian Call Girls For Full Ni...
 
Call Girls in Mohali Surbhi ❤️🍑 9907093804 👄🫦 Independent Escort Service Mohali
Call Girls in Mohali Surbhi ❤️🍑 9907093804 👄🫦 Independent Escort Service MohaliCall Girls in Mohali Surbhi ❤️🍑 9907093804 👄🫦 Independent Escort Service Mohali
Call Girls in Mohali Surbhi ❤️🍑 9907093804 👄🫦 Independent Escort Service Mohali
 
Call Girls in Lucknow Esha 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
Call Girls in Lucknow Esha 🔝 8923113531  🔝 🎶 Independent Escort Service LucknowCall Girls in Lucknow Esha 🔝 8923113531  🔝 🎶 Independent Escort Service Lucknow
Call Girls in Lucknow Esha 🔝 8923113531 🔝 🎶 Independent Escort Service Lucknow
 
Call Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any TimeCall Girls Secunderabad 7001305949 all area service COD available Any Time
Call Girls Secunderabad 7001305949 all area service COD available Any Time
 
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Subhash Nagar Delhi reach out to us at 🔝9953056974🔝
 
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
 
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near MeVIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
 
Russian Call Girls Hyderabad Saloni 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Saloni 9907093804 Independent Escort Service Hyd...Russian Call Girls Hyderabad Saloni 9907093804 Independent Escort Service Hyd...
Russian Call Girls Hyderabad Saloni 9907093804 Independent Escort Service Hyd...
 
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
❤️♀️@ Jaipur Call Girl Agency ❤️♀️@ Manjeet Russian Call Girls Service in Jai...
 

Ware DEFINING AND MEASURING PATIENT SATISFACTION 1983.pdf

  • 1. Evaluation andProgram Planning, Vol. 6, pp. 247-263, 1983 Printed in the USA. All rights reserved. 0149-7189/83 $3.00 + .OO Copyright c 1984 Pergamon Press Ltd DEFINING AND MEASURING PATIENT SATISFACTION WITH MEDICAL CARE JOHN E. WARE, JR., MARY K. SNYDER, W. RUSSELL WRIGHT, AND ALLYSON R. DAVIES The Rand Corporation ABSTRACT This paper describes the development of Form II of the Patient Satisfaction Questionnaire (PSQI, a self-administered survey instrument designed for use in general population studies. The PSQ contains 55 Likert-type items that measure attitudes toward the more salient char- acteristics of doctors and medical care services (technical and interpersonal skills of providers, waiting time for appointments, office waits, emergency care, costs of care, insurance cov- erage, availability of hospitals, and other resources) and satisfaction with care in general. Scales are balanced to control for acquiescent response set. Scoring rules for 18 multi-item subscales and eight global scales were standardized following replication of item analyses in four field tests. Internal-consistency and test-retest estimates indicate satisfactory reliability for studies involving group comparisons. The PSQ well represents the content of characteris- tics of providers and services described most often in the literature and in response to open- ended questions. Empirical tests of validity have also produced generally favorable results. The Patient Satisfaction Questionnaire (PSQ) was developed at Southern Illinois University (SIU) School of Medicine during a study funded by the National Center For Health Services Research and Develop- ment. The major goals of the SIU project were to develop a short, self-administered satisfaction survey that would be applicable in general population studies and would yield reliable and valid measures of con- cepts that had both theoretical and practical impor- tance to the planning, administration, and evaluation of health services delivery programs. The SIU work led to the development and testing of numerous instru- ments including several patient satisfaction ques- tionnaires as well as measures of the importance placed on different features of medical care services. We summarize here the conceptual work and empirical results from the SIU studies that have been available only in technical reports (Ware, Snyder, & Wright, 1976a, 1976b). We focus on Form II, which has proven to be the most comprehensive and reliable version of the PSQ. CONCEPTUALIZING PATIENT SATISFACTION In theory, a patient satisfaction rating is a personal evaluation of health care services and providers. It is wrong to equate all information derived from patient surveys with patient satisfaction (Ware, 1981). For ex- ample, patient satisfaction ratings are distinct from reports about providers and care. Reports are inten- tionally more factual and objective. Satisfaction rat- ings are intentionally more subjective; they attempt to capture a personal evaluation of care that cannot be known by observing care directly. For example, pa- tients can be asked to report the length of time spent with their provider or to rate whether they were given enough time. Although satisfaction ratings are some- times criticized because they do not correspond per- fectly with objective reality or with the perceptions of providers or administrators of care, this is their unique strength. They bring new information to the satisfac- tion equation. 
We believe that differences in satisfac- tion mirror the realities of care to a substantial extent; these differences also reflect personal preferences as well as expectations (see Ware et al., 1976b, pp. 433- 463, 607-622). This research and preparation of this manuscript were supported by the National Center for Health Services Research and Development and by the Health Insurance Study grant from the Department of Health and Human Services. Reprint requests and inquiries should be sent to John E. Ware, Jr., Behavioral Sciences Department, The Rand Corporation, 1700 Main Street, Santa Monica, CA 90406. 247
  • 2. 248 JOHN E. WAKE et al. Thus, a patient satisfaction rating is both a measure of care and a measure of the patient who provides the rating. During the development of the PSQ, we attempted to determine what satisfaction ratings measure-features of the care or the patient. This distinction is important for studies that attempt to use satisfaction ratings as a source of information about specific aspects of care. Specifically, when dissatisfac- tion is detected, should care be changed or should pa- tients be changed (i.e., their expectations, preferences, and standards) to increase satisfaction? dimension appear below, along with examples of item content: During field tests of Form I of the PSQ, we measured separately the importance placed oneach characteristic of doctors and services described by PSQ items. We also measured independently how often each charac- teristic was observed or experienced. We noted signifi- cant effects of patient expectations and value pref- erences on satisfaction ratings. These effects, however, proved to be of more theorectical than practical in- terest because they were small relative to the impact of experiences reported by patients. For example, the length of time a patient had to wait to see a doctor determined satisfaction with office waits substantially more than expectations or preferences for short and long office waits. Hence, a satisfaction rating seems to be much more a measure of care than it is a measure of the patient, although the latter is a part of the message. Another important conceptual issue is the nature and number of dimensions of patient satisfaction. As described beIow, we attempted to build a taxonomy of these characteristics that would provide a framework for classifying the content of satisfaction measures and for evaluating the content validity of the PSQ. The taxonomy we have derived during studies of the PSQ posits that several different characteristics of providers and medical care services influence patient satisfac- tion, and that patients develop distinct attitudes toward each of these characteristics. Brief definitions of each Inter~ersonai manner: features of the way in which providers interact personally with patients (e.g., concern, friendliness, courtesy, disrespect, rude- ness). Technical qua~jty: competence of providers and ad- herence to high standards of diagnosis and treat- ment (e.g., thoroughness, accuracy, unnecessary risks, making mistakes). Accessibility/convenience: factors involved in ar- ranging to receive medical care (e.g., time and effort required to get an appointment, waiting time at of- fice, ease of reaching care location). Finances: factors involved in paying for medical ser- vices (e.g., reasonable costs, alternative payment ar- rangements, comprehensiveness of insurance cover- age). ESficacy/outcomes: the results of medical care en- counters (e.g., helpfulness of medical care providers in improving or maintaining health). Continuity: sameness of provider and/or location of care (e.g., see same physician). Physical envjron~ent: features of setting in which care is delivered (e.g., orderly facilities and equip- ment, pleasantness of atmosphere, clarity of signs and directions). Availability: presence of medical care resources (e-g., enough hospital facilities and providers in area). The preceding order of these dimensions reflects the relative frequency of their inclusion in studies of pa- tient satisfaction before the PSQ. 
The first four (inter- personal manner, technical quality, accessibility/ convenience, and finances) were by far the most com- monly measured features of care measured in patient satisfaction studies. RESEARCH STRATEGY AND DATA SOURCES The strategy for developing and testing the PSQ fo- cused on improving the reliability and validity of items and multi-item scales and reducing the costs (dollar and time) required for their administration. That pro- cess began with a survey (the Seven-County Study) that included over 900 items administered in person by trained interviewers (Chu, Ware, & Wright, 1973; Ware, Wright, Snyder, & Chu, 1975). Ultimately, Form II of the PSQ was much shorter and was self- administered with success (Ware et al., 1976a). iterative process that included formulations of models of the dimensions of patient satisfaction, construction of measures of those dimensions, empirical tests of the measures and models, and refinements in both. This iterative process included 12 studies of patient satisfac- tion; some involved secondary analyses of data pro- vided by others.’ Studies of Form II of the PSQ were replicated in four independent field tests, including three general Of necessity, the research began without an agreed- upon conceptual framework for defining and measur- ing patient satisfaction and with many unanswered questions about methodological issues. Hence, the instruments were field tested over a 4-year period in an ‘We gratefully acknowledge the cooperation of individuals who pro- vided satisfaction questionnaire data for analysis, including: Barbara Hulka and John Cassel at the University of North Carolina, James Greenley and Richard Schoenherr at the University of Wis- consin, and LuAnn Aday and Ronald Andersen at the University of Chicago.
  • 3. Defining and Measuring Satisfaction 249 population household surveys (East St. Louis, Illinois; Sangamon County, Illinois; and Los Angeles County, California) and a survey of patients enrolled in a fam- ily practice center (Springfield, Illinois). Sample sizes in these four sites ranged from 323 to 640, and their sociodemographic characteristics varied considerably. Two thirds to four fifths of the respon- dents in the East St. Louis, Sangamon County, and Family Practice samples were women; in Los Angeles County, almost two thirds were men. In East St. Louis, 90% of respondents were nonwhite; in Sangamon County, 3% were nonwhite, and in Los Angeles County, 35%. (Data on race were not obtained for the Family Practice sample.) Median age in the three household surveys was about 45 years; the Family Prac- tice sample was younger, with a median age of 32 years. Median annual family incomes (in 1974 dollars) ranged from a low of $5,400 in East St. Louis to $9,500 in Los Angeles, and approximately $12,000 in the Sangamon County and Family Practice samples. Median educational levels were close to 12 years in three samples; the median was 14 years in the Family Practice sample. In summary, the samples ranged from a chiefly nonwhite and socioeconomically disad- vantaged sample in East St. Louis to predominantly white, middle-class samples in Sangamon County and the Family Practice center. Content of the PSQ Our research began by formulating hypotheses about the nature and number of specific characteristics of providers and medical care services that should be rep- resented by PSQ items to achieve content validity. An outline of satisfaction constructs was developed from the content of available instruments, published books and articles from the health services research literature, and the responses of convenient samples of persons to open-ended questions about their experiences with doctors and medical care services. The latter studies were designed to generate new items. We sought to achieve a comprehensive specification of patient satis- faction constructs and a good understanding of the words people actually use when they talk about medi- cal care services. This knowledge helped in choosing the specific vernacular used to construct PSQ items. The item-generation studies consisted of three tasks: (a) making sentence fragments into statements of opin- ions about medical care (e.g., write favorable and un- favorable opinions using the words cost of cure); (b) expressing comments about the most- and least- liked aspects of medical care; and (c) responses from group sessions in which participants were asked to compose and discuss statements of opinion that re- flected favorable and unfavorable sentiments about medical care. These three tasks yielded a pool of ap- proximately 2,300 items, that were sorted into content categories by independent judges. The resulting con- tent outline and constructs identified from other in- struments and the literature were integrated into a tax- onomy on which we based initial hypotheses about the nature and number of satisfaction constructs. Redun- dancies and ambiguities were identified and the item pool was reduced to about 500 edited items, each describing only one characteristic of medical care ser- vices. Data-Gathering and other Methodological Considerations A number of methodological studies addressed ques- tions about data-gathering methods, the structure of PSQ items, instructions to respondents, and other pro- cedural issues. 
Some decisions were made after review- ing the literature and consulting with experts; other decisions were made after formal study. These deci- sions are explained and selected methodological results are summarized in the following paragraphs. Refer- ences are provided to the more complete documenta- tion of results from our studies of methodological issues. Choice of Likert-Type Items. A standardized patient satisfaction item has two parts: the item stem and the response scale. The item stem describes a specific feature of care or care in general. The response scale defines the choices used to evaluate that feature. PSQ item stems, response choices, and scoring rules were standardized to facilitate administration and to max- imize reliability and validity. We chose the traditional approach to attitude measurement in which the item is structured as a statement of opinion, such as “It’s hard to get an appointment for medical care right away,” and response choices range from strongly agree to strongly disagree. Several different questionnaire formats were tested. The format we recommend places the preceded re- sponses to the right of the items, and labels these responses at the head of each page, as shown in Fig- ure 1. In general population studies designed to measure satisfaction with the respondent’s total medical care experience, instructions are offered as follows: On the following pages are some statements about medi- cal care. Please read each one carefully, keeping in mind the medical care you are receiving now. If you have not received medical care recently, think about what you would expect if you needed care today. On the line next to each statement circle the number for the opinion which is closest to your own view. These instructions are followed by an example and fur- ther explanation of how to use the response scale. The instructions end with the following:
  • 4. 250 JOHN E. WARE et al. I’m very satisfied with the medical care I receive ~1 Figure 1. PSQ Item Format. Some statements look similar to others, but each state- ment is different. You should answer each statement by itself. This is not a test of what you know. There are no right or wrong answers. We are only interested in your opinions or best impression. Please circle only one num- ber for each statement. This traditional Likert-type approach has several advantages. First, use of identical response scales for all items facilitates the task of completing a survey. Once respondents become familiar with the response choices, they can listen to or read each item stem and quickly indicate their response. When choices differ from item to item, more time and effort is involved. Second, it is usually easier to format a questionnaire when the same response choices are used for each item. Such questionnaires can often be printed on fewer pages. Third, we found it easier to revise items with the goal of changing the distribution of item responses (e.g., reduce skewness) when item stems were struc- tured as statements of opinion. Examples of how PSQ items were reworded in more favorable or more un- favorable terms to manipulate response distributions are reported in detail elsewhere (Ware et al., 1976a, pp. 171-179). This manipulation was also done for items structured as questions about satisfaction with response choices that defined levels of satisfaction, although it was more difficult and frequently required awkward wording. Nztmber of Response Choices. A key assumption underlying our work was that satisfaction itself is a continuum. Our goal in choosing an item response scale was, therefore, that the responses should place people as precisely as possible along that continuum in terms of their attitudes toward services and providers. The better each item performed in this regard, the fewer the items required per scale. A response scale with only two choices-agree versus disagree or satis- fied versus dissatisfied- was judged to be too coarse. Published studies and analyses of pretest data sug- gested that five choices yielded more information and more reliable responses than did two or three. Any fur- ther increase in reliability with seven response choices did not seem to warrant the resulting increase in ques- tionnaire length and the additional complexity of for- matting items. Thus, the response scale chosen for the PSQ asks the respondent to select one of five choices to report strength of agreement or disagreement (strongly agree, agree, not sure, disagree, strongly dis- agree). Focus on Personal Versus Genera/ Care Experiences. Another important characteristic of patient satisfac- tion rating items is whether they focus on the respon- dent’s personal care experiences or those of people in general. An example of an item with a general referent is “It takes most people a long time to get to the place where they receive medical care.” The same item can be structured to focus on the respondent’s personal ex- perience: “It takes me a long time to get to the place where I receive medical care.” Both kinds of items have been used widely in patient satisfaction surveys (Snyder & Ware, 1975). The main reason for being in- terested in items with a more general referent was to reduce the number of items left unanswered because of inapplicability. The validity and other psychometric characteristics of these general items, however, had not been studied systematically. 
To examine these characteristics, we studied 10 pairs of items that measured satisfaction constructs in three general categories: access to care (2 pairs), finances (2 pairs), and quality of care (6 pairs). Items in each pair differed only in terms of whether they asked about the respondent's own care or care received by people in general (as in the examples above). These item pairs were interspersed throughout a special 78-item version of the PSQ fielded in the Los Angeles and Sangamon County field tests (total n = 952). Paired items were compared in terms of test-retest reliability (6-week interval), factorial validity (similarity of correlations across derived satisfaction factors), predictive validity (in relation to five health and illness behaviors), and differences in mean scores and variances. Results from both field tests supported the same conclusions. No noteworthy differences in reliability or validity coefficients were observed between items in the same pair. Mean scores for items evaluating personal care experiences were consistently and significantly more favorable than mean scores for items that described the experiences of people in general. Explanations for differences in mean scores are discussed elsewhere (Snyder & Ware, 1975). The practical implication is that the difference in item referent (personal vs. general) has little or no impact on reliability or validity. Hence, the choice between the two kinds of items was made with other considerations in mind.
Administration Methods. Development and validation of the PSQ required the design of oral interview schedules and various self-administered questionnaires. Our analyses of administration methods examined their effects on response rates, completeness of data, data-gathering costs, characteristics of respondents and nonrespondents, and satisfaction levels. We also examined the effects of asking other questions before administration of the PSQ.

Response rates were not completely determined by administration method. For example, in a randomized-groups experiment during the Los Angeles County field test, approximately 69% of those who were asked to self-administer and return a questionnaire booklet by mail returned the booklet, as compared with a 95% completion rate for those whose self-administration was supervised by a trained interviewer. Other field tests showed no difference between return rates for groups who self-administered the PSQ with and without supervision (the latter with mail-back). The vigor of follow-up seemed to be the more important factor in determining completion rates when mail-back was relied upon. Further, we detected no difference in data quality between supervised and unsupervised self-administration of the PSQ. Supervision by a trained interviewer increased data-gathering costs about 5-fold.

In the Los Angeles County field test, characteristics of respondents and nonrespondents to a mail-back survey were compared. These characteristics were documented during an interview before the questionnaire was either dropped off for self-administration and mail return or completed under supervision. The drop-off/mail-back method resulted in significant underrepresentation of persons aged 40 and younger, nonwhites, and low-income persons. Comparison of satisfaction scores for persons in the mail-back and hand-back groups suggested that those who were more satisfied with the quality of their care are less likely to return questionnaires.

We also examined whether differences in satisfaction levels might be caused by differences in when the PSQ was administered during a longer interview schedule. We randomly varied whether the PSQ was self-administered before or after a series of questions that asked about use of health care services and compliance with medical regimens. We hypothesized that questions about health care experiences might increase the salience of attitudes toward those experiences (i.e., medical care satisfaction). For 14 of the 18 PSQ scales, scores tended to be lower for those who answered questions about health care experiences first. Scores on scales measuring access to care in emergencies, costs of care, and payment mechanisms were significantly lower. These results suggest that administration procedures (and particularly the placement of satisfaction questions) in a longer survey should be standardized. Further research is necessary to determine whether satisfaction ratings are more or less valid if obtained after a review of recent health care experiences.

The length of time required to complete the PSQ was systematically measured. Considerable variability was observed across respondents. On average, respondents took about 11-13 seconds to complete each PSQ item. Thus, the 55 items used to score scales constructed from Form II of the PSQ take about 11 minutes to complete.
The 43-item PSQ short form takes 8-9 minutes on average. (The short form was developed for use in Rand's Health Insurance Experiment, a randomized controlled trial designed to estimate the effects of different health care financing arrangements and organizations on patient satisfaction. It was also fielded in a national study of access to health care services [Aday, Andersen, & Fleming, 1980]. The short form questionnaire and scoring instructions are available from the authors.) Administration times tend to be somewhat longer for disadvantaged respondents (i.e., low education, low income).

Response Set Effects. Beginning with Form I, all versions of the PSQ contained a balance of favorably and unfavorably worded items to control for bias due to acquiescent response set (ARS), a tendency to agree with statements of opinion regardless of content. ARS bias was a noteworthy problem that became apparent at several stages during development of the PSQ, including: empirical studies of item groupings (i.e., factor analyses of items), estimation of internal-consistency reliability, and comparisons of group differences in satisfaction (Ware, 1978). More recently, similar problems have surfaced in other studies of health care attitudes (Winkler, Kanouse, & Ware, 1982).

During tests of the PSQ, 40% to 60% of respondents manifested some degree of ARS, and 2% to 10% demonstrated substantial ARS tendencies. (We used 11 matched pairs of favorably and unfavorably worded items that measured the same feature of care, and extremely worded validity check items, to identify such respondents.) Effects of ARS bias included: appearance of method rather than trait factors in item analyses (i.e., factors defined by differences in the direction of item wording, not differences in characteristics of medical care); inflated reliability estimates for unbalanced multi-item scales; and seriously biased comparisons of mean differences between groups of respondents differing in educational attainment, income, and age. (These effects were also observed in analyses of responses to the Thurstone scales constructed by Hulka and her colleagues [Hulka, Zyzanski, Cassel, & Thompson, 1970].)
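The matched-pair logic lends itself to a simple screening computation. The sketch below is our illustration, not the original SIU/Rand scoring program; the data, the pair indexes, and the agreement cutoff are invented. It counts, for each respondent, the matched pairs in which both the favorably and the unfavorably worded member are endorsed, a logically inconsistent pattern that signals ARS:

    import numpy as np

    # Illustrative ARS screen (not the original SIU/Rand program). We assume
    # raw responses are precoded 1 = strongly agree ... 5 = strongly disagree,
    # and that `pairs` lists (favorable_item, unfavorable_item) column indexes
    # for matched pairs stating the same feature of care in opposite directions.
    def ars_index(responses, pairs, agree_cutoff=2):
        """Count, per respondent, pairs where BOTH members are endorsed."""
        responses = np.asarray(responses)
        hits = np.zeros(responses.shape[0], dtype=int)
        for fav, unfav in pairs:
            both_agree = (responses[:, fav] <= agree_cutoff) & \
                         (responses[:, unfav] <= agree_cutoff)
            hits += both_agree.astype(int)
        return hits

    # Two hypothetical respondents answering four items (two matched pairs).
    pairs = [(0, 1), (2, 3)]
    r = np.array([[1, 2, 2, 1],    # agrees with everything: 2 hits (ARS suspect)
                  [1, 5, 2, 4]])   # consistent responder: 0 hits
    print(ars_index(r, pairs))     # -> [2 0]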
For example, differences in satisfaction with quality of care between education groups were substantially overestimated by PSQ scales constructed entirely from favorably worded items, and were missed entirely by scales constructed entirely from unfavorably worded items. The balanced PSQ Technical Quality satisfaction subscale, which was not correlated with ARS, detected significant differences in satisfaction between education groups (Ware, 1978).

We also studied two other types of response set that might bias patient satisfaction ratings: opposition response set (ORS, a tendency to disagree with statements regardless of content) and socially desirable response set (SDRS). ORS proved to be very rare and thus of little concern. SDRS was common but did not correlate with ratings of satisfaction with medical care (see Ware et al., 1976b, pp. 537-588).

PSQ Items and Descriptive Statistics

Following the Seven-County Study and several small-sample pretests of instructions and instrument format, 80 Likert-type items were self-administered in Form I of the PSQ during a survey of households in three southern Illinois counties, the Tri-County Study (Ware & Snyder, 1975). Each item was worded as a statement of opinion and items were evenly divided between favorable and unfavorable statements. Analyses of items in Form I led to substantial revisions and to construction of Form II of the PSQ. Only 4 items from Form I were retained without revision; 59 were revised and retained, and 5 new items were written for Form II. The verbatim content of all 68 PSQ Form II items appears in Table 1; items are listed in the order of their administration.

Before testing multi-item PSQ scales, we evaluated item descriptive statistics. Specifically, we checked distributions of item scores to determine whether revisions in item wording would be necessary to achieve roughly symmetrical (if not normal) response distributions. These characteristics are desirable for items to be used in simple summated rating scales.

TABLE 1
ITEMS IN FORM II OF THE PSQ

1.* I'm very satisfied with the medical care I receive.
2. Doctors let their patients tell them everything that the patient thinks is important.
3.* Doctors ask what foods patients eat and explain why certain foods are best.
4.* I think you can get medical care easily even if you don't have money with you.
5.* I hardly ever see the same doctor when I go for medical care.
6.* Doctors are very careful to check everything when examining their patients.
7. We need more doctors in this area who specialize.
8.* If more than one family member needs medical care, we have to go to different doctors.
9.* Medical insurance coverage should pay for more expenses than it does.
10.* I think my doctor's office has everything needed to provide complete medical care.
11. Doctors never keep their patients waiting, even for a minute.
12.* Places where you can get medical care are very conveniently located.
13. Doctors act like they are doing their patients a favor by treating them.
14.* The amount charged for medical care services is reasonable.
15. Doctors always tell their patients what to expect during treatment.
16.* Most people receive medical care that could be better.
17. Most people are not encouraged to get a yearly exam when they go for medical care.
18.* If I have a medical question, I can reach someone for help without any problem.
19.* In an emergency, it's very hard to get medical care quickly.
20. I can arrange for payment of medical bills later if I'm short of money now.
21.* I am happy with the coverage provided by medical insurance plans.
22.* Doctors always treat their patients with respect.
23.* I see the same doctor just about every time I go for medical care.
24. The amount charged for lab tests and x-rays is extremely high.
25.* Doctors don't advise patients about ways to avoid illness or injury.
26.* Doctors never recommend surgery (an operation) unless there is no other way to solve the problem.
27. Doctors hurt many more people than they help.
28.* Doctors hardly ever explain the patient's medical problems to him.
29.* Doctors always do their best to keep the patient from worrying.
30.* Doctors aren't as thorough as they should be.
31.* It's hard to get an appointment for medical care right away.
32.* There are enough doctors in this area who specialize.
33.* Doctors always avoid unnecessary patient expenses.
34.* Most people are encouraged to get a yearly exam when they go for medical care.
35.* Office hours when you can get medical care are good for most people.
36.* Without proof that you can pay, it's almost impossible to get admitted to the hospital.
37. People have to wait too long for emergency care.
TABLE 1 (continued)

38. Medical insurance plans pay for most medical expenses a person might have.
39.* Sometimes doctors make the patient feel foolish.
40.* My doctor's office lacks some things needed to provide complete medical care.
41. Doctors always explain the side effects of the medicine they prescribe.
42.* There are enough hospitals in this area.
43.* It takes me a long time to get to the place where I receive medical care.
44. Just about all doctors make house calls.
45.* The care I have received from doctors in the last few years is just about perfect.
46. Doctors don't care if their patients worry.
47.* Sometimes doctors take unnecessary risks in treating their patients.
48. In an emergency, you can always get medical care.
49.* The fees doctors charge are too high.
50. Doctors are very thorough.
51.* The medical problems I've had in the past are ignored when I seek care for a new medical problem.
52.* Parking is a problem when you have to get medical care.
53.* There are enough family doctors around here.
54. Doctors never expose their patients to unnecessary risk.
55.* Doctors respect their patient's feelings.
56. It's cash in advance when you need medical care.
57. Doctors never look at their patient's medical records.
58.* There are things about the medical care I receive that could be better.
59. When doctors are unsure of what's wrong with you, they always call in a specialist.
60. When I seek care for a new medical problem, they always check up on the problems I've had before.
61.* More hospitals are needed in this area.
62. Doctors seldom explain why they order lab tests and x-rays.
63. I think the amount charged for emergency room service is reasonable.
64. Sometimes doctors miss important information which their patients give them.
65. My doctor treats everyone in my family when they need care.
66.* Doctors cause some people to worry a lot because they don't explain medical problems to patients.
67.* There is a big shortage of family doctors around here.
68. Sometimes doctors cause their patients unnecessary medical expenses.
_* People are usually kept waiting a long time when they are at the doctor's office.

Note. Items marked with an asterisk are included in the 43-item short form of the PSQ; one item in that form (the last listed) does not appear in Form II. In addition, four items (11, 27, 44, and 57) were used only as validity checks.

Because questionnaire responses for all PSQ items were precoded so that "strongly agree" equaled 1 and "strongly disagree" equaled 5, responses to the favorably worded items were recoded as shown in Table 2. Means and standard deviations for the 68 PSQ Form II items in three field tests appear in Table 3. All items are scored so that a higher number indicates a more favorable evaluation of medical care.

Constructing Multi-Item Subscales

Our experiences in analyzing 87 items from the Seven-County Study (Chu et al., 1973; Ware, Miller, & Snyder, 1973) convinced us that an individual questionnaire item is not a very satisfactory unit of analysis for a study of the structure of patient attitudes about doctors and medical care services.
An item score is coarse, less reliable, and substantially influenced by the direction of item wording and other methodological features in addition to the construct(s) being measured. Although the Seven-County Study gave us our first "picture" of the structure of patient satisfaction, the picture was not very clear because of these methodological problems.

TABLE 2
ITEM SCORING RULES FOR FORM II OF THE PSQ

Scoring: 1 = Strongly disagree, 2 = Disagree, 3 = Not sure, 4 = Agree, 5 = Strongly agree
Item numbers: 1, 2, 3, 4, 6, 10, 12, 14, 15, 18, 20, 21, 22, 23, 26, 29, 32, 33, 34, 35, 38, 41, 42, 45, 48, 50, 53, 55, 59, 60, 63, 65

Scoring: 5 = Strongly disagree, 4 = Disagree, 3 = Not sure, 2 = Agree, 1 = Strongly agree
Item numbers: 5, 7, 8, 9, 13, 16, 17, 19, 24, 25, 28, 30, 31, 36, 37, 39, 40, 43, 46, 47, 49, 51, 52, 54, 56, 58, 61, 62, 64, 66, 67, 68

Note. The four validity-check items (numbers 11, 27, 44, and 57) are not included (see text).
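In programmatic terms, the Table 2 rules reduce to a single reversal step. A minimal sketch follows (our illustration, assuming raw responses precoded 1 = strongly agree through 5 = strongly disagree, as described in the text):

    # Minimal sketch of the Table 2 recoding (illustration only). Raw responses
    # are precoded 1 = strongly agree ... 5 = strongly disagree for every item;
    # items in the first Table 2 block are reversed so that, for all scored
    # items, a higher score means a more favorable evaluation of care.
    REVERSED = {1, 2, 3, 4, 6, 10, 12, 14, 15, 18, 20, 21, 22, 23, 26, 29, 32,
                33, 34, 35, 38, 41, 42, 45, 48, 50, 53, 55, 59, 60, 63, 65}
    VALIDITY_CHECKS = {11, 27, 44, 57}   # never included in scale scores

    def score_item(item_number, raw_response):
        """Recode one raw response (1-5) into its final item score (1-5)."""
        if item_number in VALIDITY_CHECKS:
            raise ValueError("validity-check items are not scored")
        if item_number in REVERSED:
            return 6 - raw_response   # strongly agree (raw 1) becomes 5
        return raw_response           # second Table 2 block keeps raw direction

    print(score_item(1, 1))    # agreeing with "I'm very satisfied ..." -> 5
    print(score_item(49, 1))   # agreeing with "fees ... are too high" -> 1

Because every scored item then runs in the same direction on the same 1-5 metric, the subscale scores described later can be formed by simple summation.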
TABLE 3
ITEM DESCRIPTIVE STATISTICS, PSQ FORM II

          East St. Louis   Sangamon County   Los Angeles County
Item No.  Mean    SD       Mean    SD        Mean    SD
1         3.50    1.21     3.67    1.05      3.60    1.08
2         3.52    1.13     3.43    1.07      3.49    1.04
3         3.23    1.21     3.01    1.08      3.09    1.11
4         2.61    1.25     3.14    1.11      2.43    1.18
5*        3.44    1.28     3.89    1.03      3.62    1.16
6         3.03    1.29     3.00    1.15      3.03    1.16
7*        1.81    0.98     3.02    1.13      2.83    1.08
8*        2.98    1.22     2.74    1.18      3.01    1.13
9*        1.93    0.94     2.27    1.08      2.02    0.94
10        3.08    1.71     3.50    0.98      3.45    1.06
11        1.80    1.05     1.50    0.80      1.71    0.87
12        2.98    1.22     3.48    0.93      3.24    1.10
13*       2.91    1.17     3.28    1.08      3.25    1.11
14        2.58    1.11     2.53    1.13      2.25    1.08
15        2.89    1.20     3.04    1.09      2.93    1.09
16*       2.29    0.93     2.59    0.86      2.45    0.93
17*       2.64    1.17     3.02    1.08      2.92    1.09
18        2.83    1.24     3.28    1.08      3.16    1.16
19*       2.40    1.24     3.09    1.10      2.90    1.19
20        3.36    1.08     3.71    0.76      3.36    1.01
21        2.89    1.20     2.95    1.11      2.88    1.15
22        3.48    1.07     3.40    1.00      3.38    1.00
23        3.63    1.12     3.78    0.90      3.60    1.08
24*       2.07    0.91     2.17    0.92      2.12    0.96
25*       2.84    1.17     2.92    1.05      2.99    1.12
26        3.40    1.11     3.28    0.96      3.13    1.04
27*       3.58    0.92     4.09    0.70      3.90    0.83
28*       2.89    1.20     3.42    1.04      3.38    1.03
29        3.52    1.03     3.41    0.93      3.46    0.88
30*       2.55    1.04     2.84    1.01      2.74    0.98
31*       2.42    1.12     2.37    1.13      2.70    1.13
32        2.25    1.13     3.19    1.00      3.08    0.96
33        2.66    1.02     2.66    0.95      2.48    0.95
34        3.11    1.09     3.26    0.96      3.14    1.01
35        3.17    1.03     3.30    0.97      3.06    1.04
36*       2.27    1.18     2.68    1.02      2.17    1.01
37*       2.03    1.10     2.55    1.07      2.48    1.04
38        3.15    1.06     2.95    1.02      2.90    1.08
39*       2.79    1.07     3.07    1.02      3.06    1.02
40*       2.72    1.05     3.31    0.91      3.25    0.96
41        2.96    1.16     2.77    1.03      2.90    1.06
42        2.28    1.18     3.06    1.06      3.26    0.99
43*       3.30    1.16     3.72    0.83      3.53    0.98
44        1.56    0.82     1.49    0.67      1.59    0.74
45        3.02    1.18     3.20    1.07      3.10    1.10
46*       3.48    0.94     3.46    0.91      3.42    0.89
47*       2.98    0.97     3.37    0.86      3.21    0.88
48        2.91    1.21     3.41    0.91      3.16    1.04
49*       2.22    0.97     2.40    1.04      2.05    0.89
50        2.87    1.05     2.96    0.95      2.98    0.95
51*       3.32    1.08     3.48    0.87      3.46    0.90
52*       2.88    1.11     3.14    1.12      3.17    1.04
53        2.15    1.02     2.19    0.89      2.72    0.97
54        2.98    0.97     3.10    0.83      3.12    0.84
55        3.45    0.98     3.46    0.90      3.50    0.80
56*       3.21    1.07     3.89    0.67      3.22    1.04
57        2.26    0.92     2.06    0.66      2.15    0.69
58*       2.38    1.04     2.66    1.00      2.58    0.99
59        3.45    1.11     3.43    0.89      3.42    0.94
60        3.55    1.00     3.42    0.85      3.51    0.86
61*       1.93    0.97     3.01    1.04      3.08    0.96
TABLE 3 (continued)

          East St. Louis   Sangamon County   Los Angeles County
Item No.  Mean    SD       Mean    SD        Mean    SD
62*       2.90    1.16     3.27    1.01      3.22    1.03
63        2.45    1.04     2.54    1.01      2.46    0.97
64*       2.58    0.98     2.85    0.85      2.83    0.86
65        3.19    1.19     2.96    1.17      3.15    1.10
66*       2.66    1.08     2.98    1.00      3.02    0.98
67*       2.03    0.98     2.28    0.88      2.69    0.93
68*       2.48    1.00     2.78    0.93      2.59    0.89

Note. Items are listed in the order they appear in Form II of the PSQ; see Table 1 for content. Items marked with an asterisk define unfavorable attitudes; their raw scores have been recoded here following the item scoring rules in Table 2.

A major goal in our studies of Form I was to test empirically our taxonomy of patient satisfaction constructs. If supported, this taxonomy would provide the "blueprint" for Form II. Our progress toward this goal would be limited by the adequacy of the measures available for model testing. To increase chances for success, we adopted the concept of a Factored Homogeneous Item Dimension (FHID) developed by Comrey (1961). He used this technique because of its advantages in solving various measurement problems in personality research; we discuss these advantages in reference to the PSQ elsewhere (Ware & Snyder, 1975; Ware et al., 1976a).

Simply stated, a FHID is a group of items that has satisfied both logical and statistical criteria. The logical criterion is that the items have very similar content (appear highly conceptually related). Abbreviated examples of the content of items from a FHID measuring attitude toward the interpersonal manner of providers are: doctors treat their patients with respect, doctors make patients feel foolish, and doctors act like they are doing patients a favor by treating them. We labeled this FHID Respect. Empirically, items in the same FHID must share substantially more variance with each other than with items in other FHIDs. Items that fulfill these criteria are combined to yield a single score that serves as the unit of analysis in subsequent analyses. The FHID strategy, which is in contrast to the common practice of using a single questionnaire item as the unit of analysis, was employed extensively in evaluations of Forms I and II of the PSQ. Results for Form I are reported elsewhere (Ware & Snyder, 1975; Ware et al., 1976a, pp. 167-179).

Our evaluation of item groupings hypothesized for Form II of the PSQ was conducted in two phases. First, 20 hypothesized FHIDs were tested with data from the Sangamon County field test (n = 432). Seven matrices of inter-item correlations, each containing five or six FHIDs, were factor analyzed. Inspection of seven factor matrices had several advantages. The number of PSQ items per matrix ranged from only 14 to 16, for a subjects/variables ratio of greater than 25/1 in all matrices. Further, each FHID could be tested against more than one combination of other FHIDs. (It is much more difficult to validate a FHID against other FHIDs that measure conceptually similar as opposed to dissimilar constructs.) In addition to testing specific hypotheses about Form II item groupings, analyses of the seven matrices also provided a thorough test for unhypothesized satisfaction factors.

Results from the FHID validation studies in Sangamon County confirmed 17 of the 20 FHIDs hypothesized to measure specific dimensions of patient satisfaction with doctors and medical care services.
These FHIDs included 51 items. Results also confirmed an 18th FHID of four items that measured satisfaction with medical care in general. These 18 item groupings (FHIDs) and higher-order factors (global scales), identified in Table 4, were subjected to multitrait scaling tests during the second step of our item analyses. The multitrait analyses were performed independently in each of the four field tests.

The multitrait analyses involved inspection of item-scale correlation matrices to evaluate each item in relation to two criteria: first, based on the logic of Likert (1932) scaling, whether each item had a substantial linear relationship with the total score for its hypothesized scale; and second, based on the logic of discriminant validity, whether each item correlated higher with its hypothesized scale than with other scales. (In these analyses, we used a modified version of the Analysis of Item-Test Homogeneity (ANLITH) program developed by Thomas Gronek at IBM and Thomas Tyler at the Academic Computing Facility, Southern Illinois University.) Additional details regarding specific criteria for scaling "successes" and "failures" are reported elsewhere (Ware et al., 1976a, pp. 179-210). Item-scale correlations were corrected for overlap using the technique recommended by Howard and Forehand (1962).
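The overlap correction is easy to state computationally: an item is correlated with the sum of the other items in its hypothesized scale, never with a total that contains itself. The following sketch is ours, with randomly generated stand-in data; the ANLITH program itself is not reproduced here. It shows both the corrected item-scale correlation and the discriminant comparison against a rival scale:

    import numpy as np

    # Sketch of the corrected item-scale correlation (Howard & Forehand, 1962)
    # used in the multitrait analyses; the data are random stand-ins.
    def corrected_item_scale_r(items, scale_cols, item_col):
        """Correlate an item with a scale total from which it is removed."""
        total = items[:, scale_cols].sum(axis=1)
        if item_col in scale_cols:
            total = total - items[:, item_col]   # remove overlap
        return np.corrcoef(items[:, item_col], total)[0, 1]

    rng = np.random.default_rng(0)
    x = rng.integers(1, 6, size=(200, 5)).astype(float)  # 200 respondents, 5 items
    own = corrected_item_scale_r(x, [0, 1, 2], 0)    # item 0 vs. its own scale
    rival = corrected_item_scale_r(x, [3, 4], 0)     # item 0 vs. a rival scale
    print(round(own, 2), round(rival, 2))  # discriminant test: own should exceed rival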
This correction provided more stringent tests of scaling criteria by removing the effect of the item being evaluated from the total scale score. Because the scales were short, each item had a considerable influence on the total scale score.

Multitrait scaling is not as complete as convergent-discriminant validation with the multitrait-multimethod (MTMM) approach described by Campbell and Fiske (1959): only one measurement method is represented in the matrix in multitrait scaling. We would argue, however, that our approach is a "cousin" of the MTMM approach and that it is superior to traditional analyses of item internal consistency because it provides discriminant tests of item validity across traits (in this case, satisfaction constructs) that are measured by the same method.

Results of the multitrait scaling analyses were more than satisfactory for all 18 subscales in all four sites. Only 11 correlations (corrected for overlap) between items and their hypothesized scales were below 0.30 in 220 tests across four sites. Of 3,740 tests of the item discriminant validity criterion (the second and more stringent criterion just defined), approximately 98% were favorable. Items in six of the hypothesized scales (the three Availability scales, Cost of Care, Insurance Coverage, and Doctor's Facilities) passed the criterion in 100% of the item-discriminant validity tests in all four field tests. The largest number of discrepancies were observed for items in the Access to Care, Prudence-Risks, and Prudence-Expenses subscales; most of these discrepancies were noted in data from the East St. Louis field test, which provided the most economically disadvantaged sample. Thus, with relatively few exceptions, the internal consistency of hypothesized subscales and the discriminant validity of the 55 PSQ item scores were demonstrated successfully in four independent studies. (The four general satisfaction items proved to be substantially internally consistent. They were not expected to show high discriminant validity and, in fact, correlated significantly, if not substantially, with the 17 other subscales.)

On the strength of these findings, the 18 PSQ Form II subscales were scored using the item groupings shown in Table 4. Scale scores were calculated by computing the simple algebraic sum of the items in the scale, after scoring the items as shown in Table 2. Items were constructed and modified, as necessary, to achieve nearly equal (unit) variances; item content was modified, as necessary, so that items in the same scale (FHID) would have approximately the same correlations with their primary factor, and no other substantial correlations. These goals were generally met, and it was not necessary, therefore, to standardize items or to use factor coefficients to weight them differently. Higher scores on all scales indicate more favorable attitudes.

TABLE 4
VALIDATED ITEM GROUPINGS FOR PSQ SUBSCALES

Access to care (nonfinancial)
  1. Emergency care: items 19, 37, 48
  2. Convenience of services: items 12, 43
  3. Access: items 18, 31
Financial aspects
  4. Cost of care: items 14, 24, 49, 63
  5. Payment mechanisms: items 4, 20, 36, 56
  6. Insurance coverage: items 9, 21, 38
Availability of resources
  7. Family doctors: items 53, 67
  8. Specialists: items 7, 32
  9. Hospitals: items 42, 61
Continuity of care
  10. Family: items 8, 65
  11. Self: items 5, 23
Technical quality
  12. Quality/competence: items 3, 6, 17, 25, 30, 34, 50, 51, 60
  13. Prudence-risks: items 47, 54
  14. Doctor's facilities: items 10, 40
Interpersonal manner
  15. Explanations: items 28, 62, 66
  16. Consideration: items 22, 26, 29, 39, 55
  17. Prudence-expenses: items 33, 68
Overall satisfaction
  18. General satisfaction: items 1, 16, 45, 58

Note. Source: Adapted from Figure 21 in Ware, Snyder, and Wright (1976a), p. 198.

Logical and empirically verified groupings of PSQ subscales were used to compute global satisfaction scores. The item groupings for global scales were hypothesized from the taxonomy of satisfaction constructs and the higher-order factor structure of PSQ subscales (discussed later). Scoring rules for the global PSQ Form II scales are defined in Table 5. The global scales are computed after scoring the items as shown in Table 2 and the subscales as shown in Table 4.

TABLE 5
GLOBAL SATISFACTION SCALES SCORED FROM FORM II OF THE PSQ

Access to care = Emergency care + Convenience of services + Access:
  (19 + 37 + 48) + (12 + 43) + (18 + 31)
Availability = Availability/family doctors + Availability/hospitals + Availability/specialists:
  (53 + 67) + (42 + 61) + (7 + 32)
Finances = Cost of care + Insurance coverage + Payment mechanisms:
  (14 + 24 + 49 + 63) + (9 + 21 + 38) + (4 + 20 + 36 + 56)
Continuity = Continuity of care/family + Continuity of care/self:
  (8 + 65) + (5 + 23)
Interpersonal manner = Consideration + Explanations:
  (22 + 26 + 29 + 39 + 55) + (28 + 62 + 66)
Quality total = Doctor's facilities + Prudence/risks + Quality/competence + Prudence/expenses:
  (10 + 40) + (47 + 54) + (3 + 6 + 17 + 25 + 30 + 34 + 50 + 51 + 60) + (33 + 68)
Access total = Access to care + Finances
Doctor conduct total = Interpersonal manner + Quality total
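Read together, Tables 2, 4, and 5 define a complete scoring pipeline. The sketch below is an illustration under those stated scoring rules, not distributed scoring software; the dictionary of scored responses belongs to a hypothetical respondent:

    # Sketch of Form II scoring as defined in Tables 4 and 5 (illustration
    # only). `scored` maps item number -> scored (1-5) response for one
    # hypothetical respondent, already recoded per Table 2; the groupings
    # reproduce the nonfinancial access rows of Table 4.
    SUBSCALES = {
        "emergency_care": [19, 37, 48],
        "convenience": [12, 43],
        "access": [18, 31],
    }

    def subscale_score(scored, items):
        return sum(scored[i] for i in items)   # simple algebraic sum

    def global_access(scored):
        # Table 5: Access to care = Emergency care + Convenience + Access
        return sum(subscale_score(scored, items) for items in SUBSCALES.values())

    scored = {19: 4, 37: 3, 48: 4, 12: 5, 43: 4, 18: 3, 31: 2}
    print(subscale_score(scored, SUBSCALES["emergency_care"]))   # 11 (range 3-15)
    print(global_access(scored))                                 # 25 (range 7-35)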
Descriptive statistics (means and standard deviations) for the subscales and global scales in the four field tests appear in Table 6.

TABLE 6
MEANS AND STANDARD DEVIATIONS FOR PSQ FORM II SCALES

(Columns: number of items; highest possible score; then mean and SD for each field test.)

                        No. of  Highest  ESL        SAC        FP         LAC
                        Items   Score    Mean SD    Mean SD    Mean SD    Mean SD
Nonfinancial access
  Emergency care          3      15      7.3  2.7   9.1  2.4   9.2  2.6   8.6  2.7
  Convenience             2      10      6.3  1.9   7.2  1.5   7.2  1.6   6.8  1.8
  Access to care          2      10      5.2  1.9   5.6  1.8   6.3  2.0   5.9  1.9
Financial access
  Cost of care            4      20      9.2  2.7   9.6  3.1   11.0 3.1   8.9  2.9
  Insurance coverage      3      15      7.9  2.3   8.2  2.6   7.6  2.6   7.9  2.7
  Payment mechanisms      4      20      11.4 2.9   13.4 2.3   13.9 2.5   11.2 3.1
Availability
  Family doctors          2      10      4.1  1.7   4.5  1.6   4.7  1.6   5.4  1.7
  Specialists             2      10      4.0  1.8   6.2  1.9   6.6  1.9   5.9  1.8
  Hospitals               2      10      4.1  1.9   6.1  2.0   6.6  2.1   6.4  1.8
Continuity of care
  Family                  2      10      6.2  2.0   5.7  2.1   6.7  2.2   6.4  2.2
  Self                    2      10      7.1  2.0   7.7  1.6   7.6  2.0   7.2  2.0
Humaneness
  Consideration           5      25      16.6 3.8   16.6 3.6   16.3 4.2   16.6 3.5
  Explanations            3      15      8.5  2.6   9.7  2.4   10.0 2.6   9.6  2.4
Technical quality
  Doctors' facilities     2      10      5.8  2.0   6.8  1.7   6.6  1.8   6.7  1.9
  Prudence/risks          2      10      6.0  1.4   6.5  1.4   6.4  1.6   6.3  1.5
  Quality/competence      9      45      27.0 6.0   27.9 5.9   28.4 6.8   27.9 5.6
  Prudence/expenses       2      10      5.1  1.6   5.4  1.6   5.6  1.8   5.1  1.7
Overall satisfaction
  General satisfaction    4      20      11.2 3.0   12.1 3.1   12.1 3.2   11.8 3.1

Note. Field tests: East St. Louis (ESL), Sangamon County (SAC), Family Practice Center (FP), and Los Angeles County (LAC).

RELIABILITY AND STABILITY ANALYSES

We placed considerable emphasis on evaluating the reliability of the PSQ. This emphasis stemmed from several considerations.
First, reliability estimates had been published rarely for the satisfaction measures developed before the PSQ, and we found no published estimates of test-retest reliability or intertemporal stability of such measures (Ware, Davies-Avery, & Stewart, 1978). Second, reliability estimates were
essential to interpret results of validity studies (e.g., an MTMM matrix). Finally, because internal-consistency reliability estimates are a direct function of item homogeneity, these analyses provided further evidence regarding the appropriateness of the PSQ item groupings. (Homogeneity estimates, or average inter-item correlations, for the subscales and global scales appear in Ware et al., 1976b, pp. 299-321.)

Both internal-consistency and test-retest methods of estimating reliability were used. Internal-consistency reliability was estimated, using coefficient α (Cronbach, 1951), from data obtained during a single administration of the PSQ in each of four field tests. Estimates of test-retest reliability were obtained by computing product-moment correlations between scores for the same respondents on two administrations of the PSQ approximately 6 weeks apart in two field tests (East St. Louis and Sangamon County).

Internal-consistency (ICR) and test-retest (TRT) reliability estimates for the PSQ subscales and global scales appear in Table 7. For the 18 subscales, 68 of the 72 ICR estimates exceeded the 0.50 standard recommended for studies that involve group comparisons (Helmstadter, 1964). For 17 subscales administered twice, 28 of the 34 TRT estimates equaled or exceeded that criterion (such estimates were not available for General Satisfaction). These results were encouraging, particularly because more than half of the Form II subscales were each constructed from only two questionnaire items.

Test-retest coefficients for single-item measures were much less favorable in the two field tests that repeated administrations of the PSQ. Approximately 75% of the 55 items failed to achieve the 0.50 standard for test-retest reliability in East St. Louis, and approximately one third failed to meet that standard in Sangamon County. Thus, multi-item PSQ subscales represent a substantial improvement in reliability over single-item measures. These gains over single-item measures are particularly important in studies of disadvantaged respondents.

The reliability of PSQ scores improved further, even for disadvantaged respondents, when the 18 subscales were aggregated into global satisfaction scores (see the lower part of Table 7).
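Both estimators are standard and compact to state: coefficient alpha is k/(k-1) times [1 - (sum of item variances)/(variance of the total score)], and test-retest reliability is the product-moment correlation between the two administrations. The sketch below (our illustration, run on simulated rather than field-test data) computes both:

    import numpy as np

    # Sketch of the two reliability estimators, run on simulated data.
    def cronbach_alpha(items):
        """Coefficient alpha: k/(k-1) * (1 - sum(item vars)/var(total))."""
        items = np.asarray(items, dtype=float)   # respondents x items
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    def test_retest_r(scores_t1, scores_t2):
        """Product-moment correlation between two administrations."""
        return np.corrcoef(scores_t1, scores_t2)[0, 1]

    rng = np.random.default_rng(1)
    latent = rng.normal(3, 1, size=(300, 1))              # simulated satisfaction
    scale = np.clip(np.rint(latent + rng.normal(0, .7, (300, 4))), 1, 5)
    retest = scale.sum(axis=1) + rng.normal(0, 1, 300)    # noisy readministration
    print(round(cronbach_alpha(scale), 2))                # internal consistency
    print(round(test_retest_r(scale.sum(axis=1), retest), 2))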
TABLE 7
SUMMARY OF RELIABILITY ESTIMATES FOR SATISFACTION SCALES

                                   SAC          ESL          FP     LAC
Scale Name                   k    ICR   TRT    ICR   TRT    ICR    ICR
Access to care               2    .56   .71    .53   .52    .65    .49
Convenience of services      2    .57   .58    .48   .44    .47    .58
Emergency care               3    .68   .66    .63   .56    .72    .70
Availability-family doctors  2    .72   .46    .68   .52    .78    .62
Availability-hospitals       2    .91   .87    .80   .66    .93    .80
Availability-specialists     2    .74   .74    .71   .52    .80    .67
Continuity of care-family    2    .73   .79    .54   .64    .68    .32
Continuity of care-self      2    .51   .52    .52   .59    .83    .66
Cost of care                 4    .73   .73    .60   .63    .70    .70
Insurance coverage           3    .71   .73    .51   .48    .76    .64
Payment mechanisms           4    .50   .51    .51   .59    .57    .63
Consideration                5    .81   .74    .77   .68    .84    .74
Explanations                 3    .70   .74    .64   .48    .75    .71
Prudence-expenses            2    .66   .58    .47   .50    .78    .57
Doctor's facilities          2    .82   .73    .73   .72    .84    .75
Prudence-risks               2    .60   .46    .23   .39    .69    .54
Quality/competence           9    .83   .74    .77   .70    .87    .79
General satisfaction         4    .77   NA     .62   NA     .73    .70

Global scales
Access to care               7    .72   .72    .73   .62    .74    .77
Financial aspects           10    .66   .75    .60   .69    .70    .76
Access total                17    .79   .79    .78   .71    .81    .84
Availability                 6    .66   .75    .74   .62    .57    .73
Continuity of care           4    .59   .74    .43   .63    .73    .52
Doctor conduct              23    .92   .82    .88   .78    .94    .90

Note. Source: Based on Tables 54, 56, 59, and 60 in Ware, Snyder, and Wright (1976a). Field tests: East St. Louis (ESL), Sangamon County (SAC), Family Practice Center (FP), and Los Angeles County (LAC).
The highest reliability coefficients were observed for the global Quality of Care scale, because it is the longest and most homogeneous scale. Although correlations among subscales in the same global scale should be substantial and positive, this standard was not always met. The poor reliability of the global Availability of Resources scale in the Family Practice study was traced to a negative correlation between two of its component subscales. The access subscales also tended to be less highly intercorrelated than subscales used to construct other global scales. Given such results, the interpretation of the aggregate measures may be problematic.

The PSQ subscales and global scales tended to be less reliable in East St. Louis than in other field tests. Consistent with this finding, comparisons of scale reliabilities within field tests for groups formed on the basis of demographic and socioeconomic variables (age, gender, education, and income) indicated that satisfaction ratings tend to be less reliable for persons reporting less income or education.

Results (data not presented) regarding the stability of satisfaction levels over a 2-year interval came from a follow-up study of respondents in a field test of Form I of the PSQ. Correlations between scores for scales administered approximately 2 years apart ranged from 0.34 for Availability to 0.60 for Nonfinancial Access and 0.61 for Doctor Conduct. (These are lower-bound stability estimates, because the PSQ forms were not identical on both administrations.) The results suggest that satisfaction is relatively stable over time. Therefore, precision in hypothesis testing is likely to improve significantly with a repeated-measures design and covariation on initial satisfaction levels.

VALIDITY OF THE PSQ

Validation, or determining the meaning of scores and how to interpret a difference of a particular size, is an ongoing process for the PSQ and looms as the greatest challenge for satisfaction measurement in general. This process proceeds in the absence of direct measures of patient satisfaction or of agreed-upon satisfaction "criteria" that can be used to evaluate validity. This problem is common in psychological measurement. A solution that is becoming standard is the strategy of construct validation. This approach examines a wide range of variables to determine the extent to which an instrument produces results that are consistent with what would be expected for the construct to be measured (APA, 1974). A major difficulty in applying the construct validation method to patient satisfaction measures is the lack of well-specified theory. Specifically, what results should one expect for a valid measure of patient satisfaction?

In the face of this dilemma, several approaches were used to test the validity of the PSQ: (a) a systematic review of content validity; (b) factor analytic studies of the structure of items and subscales; (c) studies of convergent-discriminant validity that compared results across alternative methods of measuring patient satisfaction; and (d) studies of the predictive validity of PSQ scales in relation to health and illness behaviors thought to be influenced by individual differences in patient satisfaction. Our experiences with the first three kinds of validity studies are documented in detail elsewhere (Ware et al., 1976b, pp. 323-588) and are summarized briefly here. Studies of predictive validity are discussed in a companion paper in this issue (Ware & Davies, 1983).
In developing the PSQ, we sought to capture the most salient characteristics of services and providers that might influence patient satisfaction with care. Given this goal, content validity is a relevant standard and has been systematically investigated for the PSQ. The match between PSQ items and the taxonomy of characteristics of services and providers that has evolved using information from a variety of sources is quite good. (The PSQ is systematically compared with this taxonomy by Ware et al., 1976b, pp. 373-378, and by Ware et al., 1978.) However, potential areas of improvement in the content of PSQ items have been identified (particularly in the areas of quality of care and finances, as noted below).

Although the PSQ is more comprehensive than its predecessors, there are still more distinguishable features of medical care services than PSQ subscales. The great majority of these features are assessed by one or more PSQ items. However, for many if not most studies of patient satisfaction, a single-item measure is not a very desirable unit of analysis. Thus, the PSQ subscales represent a deliberate compromise between respondent burden on the one hand and content validity and other psychometric standards on the other. Specifically, the PSQ attempts to strike a balance between the number of different satisfaction constructs measured and how well each construct is measured, while holding administration time well below 15 minutes. All but 2 of the 18 subscales contain two to four items each. The two subscales measuring satisfaction with the technical and interpersonal skills of providers are longer, because these features of care seem most influential in determining patient satisfaction and are more difficult to distinguish.

Standards of empirical validity derive from the intended uses of an instrument. The PSQ was designed with the diverse goals of several types of study in mind. First, it was designed to measure patient satisfaction as an outcome of care. For this application, the PSQ must detect the amount of satisfaction and dissatisfaction produced by different systems of care (e.g., fee-for-service vs. prepaid group practice) as well as by different facilities. Because competing systems of
care might involve different tradeoffs (e.g., increased access versus provider continuity), an overall satisfaction score is particularly useful in summarizing satisfaction outcomes. Second, the PSQ was designed to provide programmatically useful information about the major sources of satisfaction and dissatisfaction. For this use, the information it provides about satisfaction must relate to the distinct features of care. The validity issue most relevant to this application is whether PSQ subscales measure different dimensions of satisfaction and how each subscale should be interpreted with regard to a specific feature of care. Finally, the PSQ was designed to be useful in studies of patient behavior. This application requires that its predictive validity be established.

A major feature of the PSQ that is important for several of its intended applications is its structure. If there are distinct features of medical care services that cause differences in patient satisfaction, then a valid satisfaction measure should be multidimensional. The validity of the PSQ in this regard rests on a rather substantial body of empirical evidence. First, the scaling studies involved many tests of item discriminant validity. Results showed that groupings of items corresponding to the PSQ subscales measure different things. These tests were repeated using subscales as the unit of analysis, and findings were notably consistent across four independent field tests in diverse populations. Specifically, four higher-order factors (quality of care, access to care, availability of resources, and continuity of care) were observed and replicated. The pattern of correlations for each subscale across factors also showed little variance across field tests and between groups who had and had not used medical care services recently. These patterns were evaluated empirically by estimating similarity coefficients using methods described by Kaiser, Hunka, and Bianchini (1971).

The weight of empirical evidence regarding the generalizability of the item and higher-order factor analyses clearly indicates that PSQ items and subscales measure distinct dimensions. Differences in the face validity of items in each subscale also support this conclusion. Further, the higher-order factor structure of PSQ subscales is strikingly similar to the major features of health care services that are distinguished in the published literature. This evidence strongly suggests that the PSQ measures the same things that are written about in this literature. Only the most ardent supporter of construct validation by factor analysis, however, is likely to accept from this evidence alone that the PSQ measures distinct dimensions of patient satisfaction.

The evidence summarized above constitutes a sound psychometric basis for scoring and interpreting distinct factors defined by PSQ items. The content of these factors suggests that item responses reflect differences in satisfaction with the specific characteristics of doctors and medical care services described by the scale labels (e.g., finances, interpersonal manner). We conducted a number of empirical studies to test the appropriateness of this conclusion. These studies focused on how well the PSQ agrees with the results of other methods of measuring patient satisfaction. These studies, described in detail elsewhere (Ware et al., 1976b, pp. 379-463), are summarized briefly here.
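The Kaiser, Hunka, and Bianchini (1971) procedure relates factors across samples through orthogonal rotation; as a simpler stand-in that conveys the idea of the factor-replication checks mentioned above, the sketch below computes Tucker's congruence coefficient between one factor's loadings in two field tests (the loadings are invented):

    import numpy as np

    # The Kaiser-Hunka-Bianchini method relates factors across samples via
    # orthogonal rotation; as a simpler stand-in, Tucker's congruence
    # coefficient quantifies how similar one factor's loadings are in two
    # field tests. The loadings below are invented for illustration.
    def congruence(a, b):
        """Tucker's phi between two loading vectors for the same items."""
        a, b = np.asarray(a, float), np.asarray(b, float)
        return a @ b / np.sqrt((a @ a) * (b @ b))

    site1 = [.71, .65, .70, .12, .08]   # hypothetical loadings, field test 1
    site2 = [.68, .62, .74, .15, .05]   # hypothetical loadings, field test 2
    print(round(congruence(site1, site2), 3))   # near 1.0: a replicated factor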
Every field test of the PSQ included open-ended questions about recent care experiences and other events that may have changed sentiments regarding doctors and medical care services. These questions were included to test for previously unidentified satisfaction constructs and to validate PSQ scores. In two field tests (East St. Louis, Sangamon County) of Form II, these responses were formally analyzed to test hypotheses about the validity of the PSQ. Two questions were addressed: (a) Does the PSQ discriminate between persons who describe negative health care experiences and those who report positive experiences or no events affecting their sentiments? (b) Do the PSQ subscales predict the specific sources of satisfaction and dissatisfaction reported in descriptions of these experiences? For example, are responses to items in the Technical Quality subscale more sensitive to problems with technical quality than to problems with finances?

For several reasons, we were not able to perform all of the planned analyses of responses to open-ended questions. In both field tests, the majority of respondents preferred not to discuss their experiences verbally. A practical implication of this result is that, in addition to costing less than personal interviews, completion rates using a standardized self-administered satisfaction survey are much higher than with unstructured interviews. In East St. Louis, only 3 of 323 respondents volunteered a favorable statement about doctors or medical care services in response to open-ended questions. Hence, a traditional sensitivity-specificity analysis was not possible. In both field tests, some features of services were not mentioned frequently enough to test the sensitivity of the corresponding PSQ subscale. Only four dimensions of care (technical quality, access, finances, and interpersonal manner) were mentioned frequently enough to permit any kind of empirical analysis. Hence, we compared responses to open-ended questions against the four PSQ global scales corresponding to these four problem areas.

In East St. Louis, complaints were expressed about (in order of prevalence): technical quality, access, finances, and interpersonal manner of providers. With
one exception, the PSQ scales showed good convergent and discriminant validity in identifying persons who made these complaints. Respondents who voiced complaints tended to score lower than noncomplaining respondents (scoring, on average, between approximately the 19th and 35th percentiles across subscales corresponding to the subject matter of the complaints). This supports the convergent validity of PSQ subscales. Further, persons who complained about technical quality (n = 30) scored lower on the Technical Quality subscale (at the 25th percentile, on average) than on the other three PSQ subscales studied (access, finances, and interpersonal manner). The other three PSQ scales showed a similar pattern of results in support of their sensitivity and specificity in detecting specific problems with care.

We encountered one noteworthy exception to this pattern of favorable discriminant validity results across the four complaint groups and corresponding PSQ global scales in East St. Louis. The interpersonal manner of providers was rated very unfavorably (14th to 20th percentiles of the Humaneness Scale distribution) by all groups who complained, regardless of what was complained about. This pattern of results, which was also apparent in other tests of discriminant validity, raises interesting questions about the dynamics of patient satisfaction in the area of provider "caring." Are practices that have problems with access, finances, and other features of care also more likely to produce unsatisfactory doctor-patient relationships? Are patients inclined to blame their doctor(s) for long waits in the waiting room, financial difficulties, and so on? Further research is necessary to determine the extent to which dimensions of service satisfaction are not orthogonal.

The pattern of results observed in the Sangamon County study, where the number of positive and negative comments was large enough to permit a more traditional correlational analysis, also supported both the convergent and discriminant validity of the PSQ subscales. However, the fact that many respondents commented about more than one feature of care complicated interpretations. In general, for respondents making only one complaint about their care, scores for the PSQ scale that corresponded to the content of the complaint tended to be lowest.

Other validity studies of PSQ subscales focused on access variables and compared PSQ subscales with patient reports using standardized questions about objective features of services, including: distance to care facilities (in miles, travel time); availability of emergency care; and proportion of costs paid by outside sources (e.g., insurance). These analyses were replicated in three field tests (East St. Louis, Sangamon County, and Family Practice). We also tested whether satisfied patients were more likely to report a regular source of care and (in the family practice center study) to claim a particular facility as their regular source of care.

Tests based on these criteria support the discriminant validity of PSQ access-related subscales. For example, the access criteria correlated higher with the PSQ access-related subscales (Access to Care, Emergency Care, Convenience) than with other PSQ subscales; in fact, most correlations with other PSQ subscales were not statistically significant.
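The logic of these criterion checks can be illustrated in a few lines. The values below are invented, not field-test results: an access-related subscale should track an objective access criterion such as reported travel time, while a subscale measuring an unrelated construct should not:

    import numpy as np

    # Invented-data sketch of the criterion check: an access-related subscale
    # should correlate with an objective access criterion (reported travel
    # time), while a subscale measuring another construct should not.
    rng = np.random.default_rng(2)
    travel_time = rng.normal(20, 8, 400)                        # minutes (criterion)
    convenience = -0.5 * travel_time + rng.normal(0, 6, 400)    # access-related scale
    quality = rng.normal(30, 5, 400)                            # unrelated scale
    print(round(np.corrcoef(travel_time, convenience)[0, 1], 2))   # clearly negative
    print(round(np.corrcoef(travel_time, quality)[0, 1], 2))       # near zero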
Further, analyses comparing the PSQ with other standardized report and rating measures provided support for the interpretation of PSQ access-related subscales as evaluations. For example, although the access-related subscales (particularly the Convenience subscale) correlated significantly with reported miles traveled and travel time, their correlations with a standardized evaluative rating of travel time were consistently much higher.

The validity of the PSQ subscales that measure availability of resources could not be evaluated in our field tests, because such a study requires a geographic area as the unit of analysis. Results relevant to this validity issue have been reported by Aday, Andersen, and Fleming (1980). They have linked the PSQ Availability of Resources subscales convincingly to independent measures of medical resources per capita.

We also examined several multitrait-multimethod matrices that correlated PSQ subscales and global scales with measures based on other methods, including ratings of care on a satisfaction continuum (very satisfied vs. very dissatisfied) and a method that combined measures of the frequency of health care events with the importance placed on those events. These analyses provide strong support for the convergent and discriminant validity of PSQ subscales and global scales as measures of patient satisfaction. Noteworthy exceptions, however, included results for the technical and interpersonal subscales, as discussed later. Some problems with the use of satisfaction rating scales (i.e., scales using a very satisfied vs. very dissatisfied response continuum) were also noted in comparisons with the PSQ. For example, correlations among satisfaction ratings seem to be high, relative to correlations among PSQ subscales, despite the higher reliability of the latter. This result suggests a strong halo or method effect of ratings on a satisfaction continuum, or a lack of discriminant validity.

We conclude our discussion of validity with comments regarding unanswered questions about PSQ measures of the quality of care. Throughout our studies of the PSQ, we have observed substantial correlations (in the .60-.70 range) among PSQ quality of care subscales and global scales (measures of the technical and interpersonal skills of providers). Some access measures also correlate substantially with these
quality of care subscales. When this pattern of results was first observed in tests of Form I, we attributed it to the references to doctors in many PSQ items. Item revisions in Form II deleted most such references, to focus attention on specific features of care rather than on doctors in general. Substantial correlations among quality of care items, however, have persisted, although PSQ item analyses and analyses of other items clearly indicate that patients can distinguish among specific quality of care features (Ware et al., 1976a; Ware et al., 1975).

In convergent-discriminant tests of PSQ measures (using the rigorous MTMM method) we sometimes encountered problems with the discriminant validity of scales assessing technical and interpersonal skills of providers, as well as access to providers. According to the logic of convergent-discriminant validation, one should be able to measure a trait well enough that measures of the same trait using different methods correlate more highly than measures of different traits based on the same method. The opposite pattern of results is encountered all too often.

Despite these reasons to reserve judgment regarding the discriminant validity of PSQ subscales measuring the interpersonal and technical skills of providers, we have little doubt that they measure patient satisfaction. These scales perform very well in relation to a wide range of criterion variables, and are consistently (across studies) the best predictors of satisfaction with care in general and of continuity of care. At issue is their discriminant validity in relation to the particular quality of care attributes they are supposed to measure (interpersonal manner versus technical skills) and in relation to the provider's accessibility to patients.

An analysis of correlations among the subscales in question, taking into account the reliability of each subscale, leaves no doubt that each subscale measures something not measured by the others. Unfortunately, we have no basis for evaluating the size of these interscale correlations because we do not know the extent to which the attributes of providers in question are correlated in the real world. Are friendlier doctors likely to be more thorough in examining their patients? Are doctors who show more courtesy and respect when they see their patients also more likely to return their patients' phone calls in a timely manner? If so, substantial correlations among measures of the technical and interpersonal skills of providers and general access reflect favorably on their validity. We believe that there is something to this argument, although we also suspect that PSQ items can be constructed to better discriminate between the interpersonal and technical skills of providers. These hypotheses are now being tested (Ware, Kane, Davies, & Brook, in press).

CONCLUSIONS

Our experience in developing the Patient Satisfaction Questionnaire (PSQ) and testing it in the field has led us to a number of conclusions about the nature of the patient satisfaction concept and important methodological considerations in its measurement. Although much empirical work remains to be done before a complete model of patient satisfaction can be specified, we are convinced of the importance of several features of that model. First, patient satisfaction with medical care is a multidimensional concept, with dimensions that correspond to the major characteristics of providers and services. Second, the realities of care are reflected in patients' satisfaction ratings. Finally, the influence of patients' expectations, preferences for specific features of care, and other hypothetical constructs on patient satisfaction remains to be determined.
Consistent with this preliminary model of the patient satisfaction concept, the PSQ was constructed to measure patient satisfaction in general as well as satisfaction with specific features of care. This permits testing of more focused hypotheses and makes results more useful from a programmatic point of view. The PSQ also reflects our solutions to a number of methodological problems, namely: relying on self-administration to reduce data-gathering costs and increase confidentiality; structuring items as statements of opinion with an agree-disagree response format to reduce the skewness of response distributions; balancing scales to control for acquiescence; scoring multi-item scales to achieve minimum standards of reliability; and validating scales using the logic of construct validity, in the absence of agreed-upon criteria. These solutions have served us well, and we recommend them and the PSQ to others.

REFERENCES

ADAY, L. A., ANDERSEN, R., & FLEMING, G. V. Health care in the U.S.: Equitable for whom? Beverly Hills: Sage Publications, 1980.

AMERICAN PSYCHOLOGICAL ASSOCIATION. Standards for educational and psychological tests. Washington, DC: American Psychological Association, 1974.

CAMPBELL, D. T., & FISKE, D. W. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 1959, 56, 81-105.
CHU, G. C., WARE, J. E., JR., & WRIGHT, W. R. Health related research in southernmost Illinois: A preliminary report. (Tech. Rep. No. HCP-73-6.) Springfield, IL: Southern Illinois University, School of Medicine, 1973.

COMREY, A. L. Factored homogeneous item dimensions in personality research. Educational and Psychological Measurement, 1961, 21, 417-431.

CRONBACH, L. J. Coefficient α and the internal structure of tests. Psychometrika, 1951, 16, 297-334.

HELMSTADTER, G. C. Principles of psychological measurement. New York: Appleton-Century-Crofts, 1964.

HOWARD, K. I., & FOREHAND, G. G. A method for correcting item-total correlations for the effect of relevant item inclusion. Educational and Psychological Measurement, 1962, 22, 731-735.

HULKA, B. S., ZYZANSKI, S. J., CASSEL, J. C., & THOMPSON, S. J. Scale for the measurement of attitudes toward physicians and primary medical care. Medical Care, 1970, 8, 429-436.

KAISER, H. F., HUNKA, S., & BIANCHINI, J. C. Relating factors between studies based upon different individuals. Multivariate Behavioral Research, 1971, 6, 409-422.

LIKERT, R. A technique for the measurement of attitudes. Archives of Psychology, 1932, (No. 140), 1-55.

SNYDER, M. K., & WARE, J. E., JR. Differences in satisfaction with health care services as a function of recipient: Self or others. (P-5488.) Santa Monica, CA: The Rand Corporation, 1975.

WARE, J. E., JR. Effects of acquiescent response set on patient satisfaction ratings. Medical Care, 1978, 16, 327-336.

WARE, J. E., JR. How to survey patient satisfaction. Drug Intelligence and Clinical Pharmacy, 1981, 15, 892-899.

WARE, J. E., JR., & DAVIES, A. R. Behavioral consequences of consumer dissatisfaction with medical care. Evaluation and Program Planning, 1983, 6, 291-297.

WARE, J. E., JR., DAVIES-AVERY, A., & STEWART, A. L. The measurement and meaning of patient satisfaction. Health and Medical Care Services Review, 1978, 1, 1-15.

WARE, J. E., JR., KANE, R. L., DAVIES, A. R., & BROOK, R. H. The patient role in assessing medical care process. Santa Monica, CA: The Rand Corporation, in press.

WARE, J. E., JR., MILLER, W. G., & SNYDER, M. K. Comparison of factor analytic methods in the development of health-related indexes from questionnaire data. (NTIS No. PB 239-517/AS.) Springfield, VA: National Technical Information Service, 1973.

WARE, J. E., JR., & SNYDER, M. K. Dimensions of patient attitudes regarding doctors and medical care services. Medical Care, 1975, 13, 669-682.

WARE, J. E., JR., SNYDER, M. K., & WRIGHT, W. R. Development and validation of scales to measure patient satisfaction with health care services: Volume I of a final report. Part A: Review of literature, overview of methods, and results regarding construction of scales. (NTIS No. PB 288-329.) Springfield, VA: National Technical Information Service, 1976. (a)

WARE, J. E., JR., SNYDER, M. K., & WRIGHT, W. R. Development and validation of scales to measure patient satisfaction with health care services: Volume I of a final report. Part B: Results regarding scales constructed from the patient satisfaction questionnaire and measures of other health care perceptions. (NTIS No. PB 288-330.) Springfield, VA: National Technical Information Service, 1976. (b)

WARE, J. E., JR., WRIGHT, W. R., SNYDER, M. K., & CHU, G. C. Consumer perceptions of health care services: Implications for academic medicine. Journal of Medical Education, 1975, 50, 839-848.
WINKLER, J. D., KANOUSE, D. E., & WARE, J. E., JR. Controlling for acquiescence response set in scale development. Journal of Applied Psychology, 1982, 67, 555-561.