Comparison of therapist to patient judgment bias in low vision

Therapist Judgment Bias and Reliability Relative to that of Patients in
the Estimation of Functional Ability from Ordinal Ratings
Robert W. Massof,1 Theresa M. Smith,2 Lisa S. Foret,3 Guy Davis,3 and Kyoko Fujiwara1
1Lions Vision Research and Rehabilitation Center, Wilmer Eye Institute, Johns Hopkins
University School of Medicine
2Department of Occupational Therapy and Rehabilitation Sciences, University of Texas
Medical Branch Galveston
3Evangeline Home Health, Lake Charles, LA
Supported by grant EY022322 from the National Eye Institute, National Institutes of Health,
Bethesda, MD.

Abstract
Objective: To present and evaluate a measurement model for estimating the judgment bias of
therapists and patients when rating functional ability. Design: Observational study of the
agreement between therapist ratings and patient self-ratings of functional ability. Setting:
Measures made by telephone interview and in the patient’s home. Participants: Forty-five home
health care patients who have a secondary diagnosis of low vision. Main Outcome Measures:
Functional ability estimated from Rasch analysis of patient difficulty ratings of calibrated items
(activity goals) in the Activity Inventory (AI) and therapist ratings using a FIM scale of the same
activity goals, both at initial evaluation and again after discharge. Results: A linear relationship
was observed between functional ability measures estimated from therapist ratings and measures
estimated from patient self-ratings with the same slope, but different intercepts, for measures
obtained at baseline and at post-rehabilitation follow-up. Conclusions: The observed linear
relationship between measures estimated from therapist ratings and measures estimated from
patient ratings confirms the model prediction. The intercept corresponds to the difference
between the therapist’s judgment bias and the average judgment bias of all patients. Relative to
patient judgments, the therapist’s estimate of functional ability at baseline was less than the
patients’ estimates; it was greater than the patients’ estimates at follow-up. The slope of the line
corresponds to the square root of the ratio of the between-patient plus within-patient variance in
judgment bias to the within-therapist variance in judgment bias. The results indicate that
between-patient variance is almost 3 times the within-therapist variance.

Introduction
Rehabilitation medicine employs three different approaches to estimate the functional ability of patients: 1) measures of task performance time and/or accuracy;1 2) patient ratings of their own ability and/or frequency of performing activities;2 and 3) ratings by a therapist or proxy of a patient’s ability and/or frequency of performing activities.3 Functional ability is a trait of the patient. Task performance time and accuracy, patient ratings, and therapist ratings are only indicators of functional ability. Measurements of functional ability per se must be inferred from the observed indicators. Because functional ability is a property of the patient, valid and unbiased measures of functional ability estimated from the three different approaches should agree.
Measurement validity refers to the accuracy of the assumption that the estimated measure is linear with the magnitude of the variable of interest. Measurement bias refers to the agreement (or disagreement) between different measures of the same variable when the variable magnitude has not changed between measures. In the case of functional ability, measurement validity and bias can be influenced by the sample of activities selected for observation and, in the case of ratings, by properties of the judge.
This paper is concerned with comparing functional ability measures estimated from ratings by patients to functional ability measures estimated from ratings by a therapist. More specifically, this paper focuses on the estimation of relative biases and measurement uncertainties of judges when comparing functional ability measures estimated from a therapist’s judgments to functional ability measures estimated from patient judgments of themselves. We first present a model of patient self-ratings and a parallel model of therapist ratings of the patient, explicitly identifying respective biases and sources of variance in the observations, and show how the two sets of ratings are related. We then test the model with a substantive example using low vision rehabilitation of visually impaired home health care patients.
Model of Patient Self-Ratings and Therapist Ratings
Using ordered rating scale categories (e.g., level of “difficulty” or level of “independence”), both the patient and the therapist are asked to judge the patient’s ability to perform specific activities, referred to as “items”. The true ability of patient $n$, which we are attempting to estimate from the patient’s and therapist’s ratings, is $\alpha_n$. The ability required to perform each of the items, $\rho_j$ for item $j$, is a property of the item that is independent of the judge (whether patient or therapist). The model assumes that both the patient and therapist are judging the magnitude of the patient’s functional reserve for the activity described by the item, which is the difference between the ability of patient $n$ and the ability required by item $j$, i.e., $\alpha_n - \rho_j$. Both the patient and therapist are instructed in the use of the ratings, but each develops their own criteria for each rating category that they will assign to a patient/item pair. These criteria, or “thresholds”, can be thought of as boundaries between neighboring categories on a continuous functional reserve scale. The thresholds are denoted as $\tau_{kx}$ for the boundary set by judge $k$ between rating category $x-1$ and rating category $x$ ($k = n$ in the case of patient self-judgment).
Although the value of $\rho_j$ is independent of the judge, judges’ estimates of $\rho_j$ are likely to be biased. If $\hat{\rho}_{kj}$ is the estimate of $\rho_j$ by judge $k$, then $\hat{\rho}_{kj} = \rho_j + \epsilon_{kj}$, where $\epsilon_{kj}$ is the bias of judge $k$ in estimating the ability required by item $j$. Similarly, if the average threshold for rating category $x$ across a population of judges is $\bar{\tau}_x$, then $\tau_{kx} = \bar{\tau}_x + \eta_{kx}$, where $\eta_{kx}$ is the bias of judge $k$, relative to the average judge, in the choice of threshold for rating category $x$. In the case of therapists or proxies, the population of judges would refer to all therapists or to all proxies, respectively. If we define $\bar{\epsilon}_k$ to be the average bias of judge $k$ across items and $\bar{\eta}_k$ to be the average bias of judge $k$ across rating category thresholds, then we can re-express the bias terms as the sum of a fixed variable (the average) and a random variable ($\delta$), i.e., $\epsilon_{kj} = \bar{\epsilon}_k + \delta_{\epsilon kj}$ and $\eta_{kx} = \bar{\eta}_k + \delta_{\eta kx}$ (if there is only a single judge contributing to the estimate of $\bar{\tau}_x$, then $\bar{\eta}_k = 0$). In each case, the random variable has an expected value of zero and incorporates variance associated with real differences in bias between items and/or categories, estimation uncertainty, and parameter instability.
The judge assigns rating category $x$ to item $j$ if the estimated functional reserve exceeds the judge’s criterion for category $x$ (and all lower categories) and is less than the criterion for category $x+1$ (and all higher categories), i.e.,

$$\tau_{k1}, \dots, \tau_{kx} < \alpha_n - \hat{\rho}_{kj} < \tau_{k,x+1}, \dots, \tau_{km}. \tag{1a}$$
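As a minimal sketch (not the authors’ software), the deterministic rule in eq. (1a) can be written in Python: when a judge’s thresholds are in increasing order, the assigned category is the lowest category plus the number of thresholds that the perceived functional reserve exceeds. The function name `assign_category` and the threshold values are hypothetical illustrations.

```python
from bisect import bisect_left
from typing import Sequence


def assign_category(reserve: float, thresholds: Sequence[float], lowest: int = 1) -> int:
    """Deterministic category assignment of eq. (1a).

    thresholds[i] is the judge's boundary between category lowest+i and
    lowest+i+1, and must be strictly increasing. The returned category is
    the lowest category plus the number of thresholds below the reserve.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_left returns the count of thresholds strictly less than reserve.
    return lowest + bisect_left(list(thresholds), reserve)


# Hypothetical thresholds for a 7-category scale (boundaries 1|2, ..., 6|7).
tau_k = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
alpha_n, rho_hat_kj = 0.4, -0.3        # hypothetical person and estimated item measures
print(assign_category(alpha_n - rho_hat_kj, tau_k))
```

Requiring ordered thresholds mirrors the structure of eq. (1a); disordered thresholds (discussed for the FIM data in the Results) cannot be handled by this simple counting rule without further conventions.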
Substituting the definitions presented in the preceding paragraph and, for judge $k$, combining the random variables into a single random term and the fixed bias variables into a single fixed term, expression (1a) can be expanded to make the fixed and random variables explicit, i.e.,

$$\bar{\tau}_1 + \delta_{kj1}, \dots, \bar{\tau}_x + \delta_{kjx} < \alpha_n - \rho_j - \beta_k < \bar{\tau}_{x+1} + \delta_{kj,x+1}, \dots, \bar{\tau}_m + \delta_{kjm}, \tag{1b}$$

where $\delta_{kjx} = \delta_{\eta kx} + \delta_{\epsilon kj}$ and $\beta_k = \bar{\epsilon}_k + \bar{\eta}_k$. The judgment bias of judge $k$ is summarized by the bias term $\beta_k$, and the reliability of judge $k$ is summarized by the variance of $\delta_{kjx}$, which we designate as $\sigma_{kjx}^2$.
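The following sketch simulates eq. (1b) for a single judge, assuming (for illustration only) that the random terms $\delta_{kjx}$ are independent normal draws with a common SD. It shows that a positive fixed bias $\beta_k$ lowers the functional reserve the judge perceives and therefore shifts that judge’s ratings downward. `simulate_rating` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_rating(alpha_n, rho_j, beta_k, tau_bar, sd_k, rng, lowest=1):
    """Draw one rating for judge k following eq. (1b).

    tau_bar: increasing average thresholds (boundaries lowest|lowest+1, ...).
    beta_k:  judge k's fixed bias; sd_k: SD of the random terms delta_kjx.
    Assumption of this sketch: delta_kjx are independent N(0, sd_k**2) draws.
    """
    reserve = alpha_n - rho_j - beta_k  # middle term of eq. (1b)
    perturbed = np.asarray(tau_bar, float) + rng.normal(0.0, sd_k, size=len(tau_bar))
    # Count the perturbed thresholds that the reserve exceeds.
    return lowest + int(np.sum(perturbed < reserve))


rng = np.random.default_rng(0)
tau_bar = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical average thresholds (7 categories)
for beta_k in (0.0, 1.0):                   # unbiased vs. positively biased judge
    ratings = [simulate_rating(0.4, -0.3, beta_k, tau_bar, 0.5, rng) for _ in range(2000)]
    print(f"beta_k = {beta_k}: mean rating = {np.mean(ratings):.2f}")
```

When the perturbed thresholds stay in order, counting exceeded thresholds reproduces eq. (1b) exactly; when the random terms reorder them, counting is one reasonable convention rather than something the model specifies.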
Rasch analysis is used routinely to estimate the average expected rating category thresholds ($\bar{\tau}_x$ for rating category $x$), the true person measures ($\alpha_n$ for person $n$), and the true item measures ($\rho_j$ for item $j$) from distributions of observed ratings across persons and items.4 Judgment bias, $\beta_k$, affects the accuracy of the estimates, and the variance of the random terms, $\sigma_{kjx}^2$, affects estimation precision (i.e., reliability). Rasch models assume homogeneity of variance, i.e., $\sigma_{kjx}^2$ is the same for all persons, items, and rating category thresholds (a requirement of unidimensional measures). Homogeneity of variance means that $\sigma_{kjx}^2 = \sigma_k^2$. Rasch models also assume that the random terms are statistically independent of one another.4 Various statistical tests are used to evaluate how well the set of observed ratings conforms to these assumptions of the Rasch model.4
In the case of patient self-judgment, when there are $N$ patients there also are $N$ judges. However, Rasch models typically (but not necessarily) assume that there is just a single judge, which in effect is the average of the judges. In this case, when $\sigma_k^2$ refers to the average of $N$ judges, it must include variance between judges, $\sigma_{bn}^2$, as well as variance within judges, $\sigma_n^2$. We therefore define the variance of the average patient judge to be

$$\sigma_P^2 = \sigma_{bn}^2 + \frac{1}{N}\sum_{n=1}^{N} \sigma_n^2, \tag{2}$$

the sum of between-patient variance and average within-patient variance. When a single therapist is the judge, the variance of the therapist can be attributed entirely to the variance within the judge, $\sigma_T^2 = \sigma_k^2$. To complete the definition of terms for our model, the fixed judgment bias of each patient is $\beta_n$ and the fixed judgment bias of the therapist is $\beta_T$.
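Eq. (2) is an instance of the law of total variance. A quick simulation, assuming hypothetical normally distributed per-patient offsets (between-patient SD $\sigma_{bn}$) and hypothetical per-patient within-judge SDs $\sigma_n$, illustrates that the variance of the pooled random terms matches $\sigma_{bn}^2$ plus the average $\sigma_n^2$. All numbers are illustrative, not study data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_obs = 2000, 50                      # hypothetical: N patient-judges, n_obs ratings each
sd_between = 1.2                         # hypothetical between-patient SD (sigma_bn)
sd_within = rng.uniform(0.5, 1.0, N)     # hypothetical within-patient SDs (sigma_n)

offsets = rng.normal(0.0, sd_between, N)                      # one offset per patient-judge
noise = rng.normal(0.0, sd_within[:, None], size=(N, n_obs))  # within-judge variation
pooled = (offsets[:, None] + noise).ravel()                   # treated as one "average judge"

predicted = sd_between**2 + np.mean(sd_within**2)  # right-hand side of eq. (2)
print(f"pooled variance = {pooled.var():.3f}, eq. (2) = {predicted:.3f}")
```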
In practice, Rasch models normalize the estimated person and item measures to the square root of the judge’s variance and ignore the judge’s bias (unless it is made explicit in a facet model5). Thus, person measures estimated from patient self-judgments are expressed as

$$\hat{\alpha}_{nP} = (\alpha_n + \beta_P)/\sigma_P \tag{3}$$

for person $n$, where $\beta_P = \frac{1}{N}\sum_{n=1}^{N}\beta_n$ is the average bias across patients. The person measures estimated from a therapist’s judgments are expressed as

$$\hat{\alpha}_{nT} = (\alpha_n + \beta_T)/\sigma_T \tag{4}$$

for the same person $n$. Because both eqs. (3) and (4) are linear functions of the true person measure, $\alpha_n$, solving eq. (3) for $\alpha_n$ and substituting into eq. (4) gives the expected relationship between person measures estimated from a therapist’s ratings and corresponding person measures estimated from patients rating themselves:

$$\hat{\alpha}_{nT} = \frac{\sigma_P}{\sigma_T}\,\hat{\alpha}_{nP} + \frac{\beta_T - \beta_P}{\sigma_T}, \tag{5}$$

a linear relationship for which the slope is the ratio of the standard deviation for the average patient to the standard deviation for the therapist and the intercept is the weighted difference between therapist and average patient judgment biases.
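A noiseless numerical check of eq. (5), using hypothetical values for the biases and SDs: generating measures from eqs. (3) and (4) and fitting a straight line recovers the slope $\sigma_P/\sigma_T$ and the intercept $(\beta_T-\beta_P)/\sigma_T$. The helper `rasch_measure` is a hypothetical name.

```python
import numpy as np


def rasch_measure(alpha, beta, sigma):
    """Eqs. (3)/(4): measure reported for true ability alpha by a judge with bias beta and SD sigma."""
    return (np.asarray(alpha, float) + beta) / sigma


alpha = np.linspace(-2.0, 2.0, 9)   # hypothetical true abilities
beta_P, sigma_P = 0.5, 2.0          # hypothetical average-patient bias and SD
beta_T, sigma_T = -0.3, 1.0         # hypothetical therapist bias and SD

a_P = rasch_measure(alpha, beta_P, sigma_P)   # eq. (3)
a_T = rasch_measure(alpha, beta_T, sigma_T)   # eq. (4)

slope, intercept = np.polyfit(a_P, a_T, 1)    # straight-line fit of a_T on a_P
print(slope, sigma_P / sigma_T)               # eq. (5) slope
print(intercept, (beta_T - beta_P) / sigma_T) # eq. (5) intercept
```

With real data the points scatter around this line, which is why the Results fit it by regression rather than reading it off exactly.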
Methods
Research Design
The present study is part of a larger observational study still in progress. Data reported here were collected before and after usual occupational therapy intervention provided in the participant’s home by one occupational therapist who has specialty training in low vision rehabilitation and 12 years of experience providing rehabilitation services to home health care patients with low vision.
Participants
Eligibility criteria for the study were: 1) patients were new to the occupational therapist; 2) patients were adults admitted to home health care; 3) patients met the visual impairment diagnostic criteria for Medicare or other third-party coverage of low vision rehabilitation services;6 and 4) patients understood English and had hearing good enough to participate in telephone interviews. Forty-five low vision patients participated in this study.
Procedures
The study conformed to the tenets of the Declaration of Helsinki and was approved by the Johns Hopkins Institutional Review Board. After the patient consented to participate, one of the investigators administered the Activity Inventory (AI),7-9 an adaptive rating scale instrument, by telephone interview. Participants rated the importance of the 50 activity goals in the AI and rated the difficulty of those goals that were rated to be at least “slightly important”. In the instructions to the participant, both importance and difficulty ratings were qualified as referring to the ability to perform the activity “without depending on another person”. Goals included in this study were those that the participant also rated to be at least “slightly difficult”. In addition, participants rated the difficulty of tasks in the AI that are nested under goals that were rated to be at least slightly important and slightly difficult.
At the time of the initial patient evaluation, the occupational therapist was provided with a list of the AI goals and subsidiary tasks that were rated by the participant to be at least slightly difficult; however, the actual ratings assigned by the participant to each goal and task were not revealed. After completing the initial patient evaluation, the occupational therapist assigned a FIM scale score3,10 to each of the participant-identified AI goals. Table 1 lists the FIM rating scale categories. The occupational therapist then developed the patient’s plan of care and provided rehabilitation services following usual procedures. At discharge, the occupational therapist again used the FIM scale to rate the participant’s functional independence level for the same AI goals that were rated at the initial evaluation. The AI was re-administered to the participant by telephone interview one to two months after discharge from occupational therapy.
Data Analysis
Rasch analysis, using the Andrich rating scale model11 (Winsteps,12 version 3.65), was employed to estimate the visual ability of each participant before and after rehabilitation on a continuous interval scale from the participants’ difficulty ratings of the AI goals. The item measures for the 50 goals in the AI item bank and the response category thresholds for levels of difficulty were anchored to values estimated from the difficulty ratings of 3200 low vision patients.13 Rasch analysis also was performed on the occupational therapist’s FIM scale ratings of each patient’s AI goals, using the same anchored item measures for the goals. In the analysis of FIM ratings, the ratings of each participant obtained at the initial patient evaluation and those obtained at discharge were stacked and analyzed together to estimate response category thresholds for the 7 FIM scale categories. An information-weighted mean square fit statistic (infit) and the standard error were estimated for each response category threshold and for each person measure.
Table 1. Functional Independence Measure (FIM) Scale Categories

| FIM score | Description |
|---|---|
| 1 | Totally dependent: patient able to perform less than 25% of the task |
| 2 | Maximal assistance required: patient able to perform 25% of the task |
| 3 | Moderate assistance required: patient able to perform 50% of the task |
| 4 | Minimal assistance required: patient able to perform 75% of the task |
| 5 | Supervision or set-up required: patient performs task without direct assistance |
| 6 | Modified independence: patient requires assistive equipment or more time, or there is a safety concern |
| 7 | Independent: no assistance required, patient able to perform 100% of the task |
Results
Participants
Complete data were obtained from 41 of the 45 enrolled participants. All participants resided in Louisiana. The 45 enrolled participants consisted of 15 males (33%) and 30 females (67%) between the ages of 30 and 98 years (median = 80, SD = 17). Measured binocular visual acuity with habitual correction ranged from 20/20 to 20/900 (median = 20/65, SD = 0.52 logMAR); 3 participants had no light perception in either eye and 2 participants had only light perception in the better eye. Among participants with measurable visual acuity, binocular log contrast sensitivity ranged from 0.07 to 1.67 (normal > 1.6; median = 1.02, SD = 0.44). For binocular central visual field measures (12.5°), 35% of participants had central scotomas (blind spots), 20% had hemi- or quad-field defects, 27% had contracted visual fields, and visual fields could not be measured in 18%.
FIM Rating Scale Evaluation
The therapist used all 7 of the FIM scale response categories to rate AI goals selected by participants at baseline and/or at follow-up. As shown in the Table 2 columns labeled Baseline count and Follow-up count, FIM scale scores of 4 or less were used most frequently at baseline and FIM scale scores of 5 or 6 were used most frequently at follow-up. The category threshold corresponds to the value of functional reserve (the difference between the estimated person measure and the estimated item measure) at which the probability of using FIM score $x$ is equal to the probability of using FIM score $x-1$, for $x = 2$ to 7. The ordering of thresholds should agree with the ordering of the FIM scale scores. The thresholds are ordered for response categories 2 through 6. The threshold for response category 7 is disordered (1.63, below the category 6 threshold of 2.83). However, FIM scale score 7 was assigned rarely: it represents only 1.3% of the total number of FIM scale scores assigned.
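To make the threshold definition concrete, here is a minimal sketch of category probabilities in the Andrich rating scale model, using the FIM thresholds reported in Table 2. It assumes the standard Andrich parameterization in which the log-odds of category $x$ over category $x-1$ equals the functional reserve minus $\tau_x$; it does not reproduce Winsteps’ internals, and `rsm_probabilities` is a hypothetical helper.

```python
import numpy as np


def rsm_probabilities(reserve, thresholds):
    """Andrich rating scale model probabilities for categories 1..m.

    reserve: person measure minus item measure (logits).
    thresholds: Andrich thresholds for categories 2..m (boundary x-1|x), in any order.
    """
    # Cumulative sums of (reserve - tau_x); the lowest category has an empty sum (0).
    steps = np.concatenate(([0.0], np.cumsum(reserve - np.asarray(thresholds, float))))
    weights = np.exp(steps - steps.max())  # subtract the max for numerical stability
    return weights / weights.sum()


# Category thresholds for FIM scores 2..7 from Table 2.
fim_tau = [-2.88, -2.03, -1.11, 1.55, 2.83, 1.63]
p = rsm_probabilities(fim_tau[2], fim_tau)   # reserve at the 3|4 threshold
print(np.round(p, 3))
print(np.isclose(p[3], p[2]))                # P(score 4) equals P(score 3) here
```

Evaluating the function at each threshold confirms that adjacent categories are equally probable there. Because the threshold for score 7 (1.63) lies below that for score 6 (2.83), score 6 is never the single most probable category at any functional reserve in this model, which is one way to see what the disordering means.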
The Rasch model predicts the response category assigned to every combination of person and item measures. The residual is defined to be the difference between the FIM scale score observed for each person/item combination and the FIM scale score predicted (expected) for the corresponding person and item measure estimates. The infit mean square is the ratio of the observed sum of squared residuals for FIM ratings, which is expected to be distributed as $\chi^2$, to the sum of squared residuals expected by the Rasch model, which corresponds to the expected value of $\chi^2$. The expected value of $\chi^2$ is equal to its degrees of freedom; thus, the infit mean square is expected to be distributed as $\chi^2/df$, which in turn has an expected value of 1.0.4 The infit mean square is interpreted as the ratio of the observed variance in the residuals to the expected variance. Infit mean square values greater than 1.0 indicate that the observed variance is greater than expected. As can be seen in the last two columns of Table 2, the observed variance in residuals for response category 6 is more than twice the expected variance both at baseline and at follow-up. As a rule of thumb, infit mean squares greater than 1.3 are considered to be indicative of excessive observed variance.14 With that criterion, only FIM response categories 1 through 3 at baseline and 4 and 5 at follow-up behave as expected by the Rasch model, which suggests inconsistency in the use of the other FIM response categories across patients and/or across items.
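The infit mean square described above can be sketched as the sum of squared residuals divided by the sum of model variances, where each response’s expected score and variance are computed from its model category probabilities. This is a generic illustration under that definition (the name `infit_mean_square` is hypothetical), not the Winsteps code.

```python
import numpy as np


def infit_mean_square(observed, probs, categories):
    """Information-weighted (infit) mean square for a set of ratings.

    observed:   observed scores, one per person/item response.
    probs:      model category probabilities, shape (n_responses, n_categories).
    categories: the score attached to each category column.
    """
    observed = np.asarray(observed, float)
    probs = np.asarray(probs, float)
    cats = np.asarray(categories, float)
    expected = probs @ cats                    # model-expected score per response
    variance = probs @ cats**2 - expected**2   # model variance of each score
    residual = observed - expected
    return float(np.sum(residual**2) / np.sum(variance))


# Hypothetical example: three responses on a 3-category (1-3) scale.
probs = [[0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
print(infit_mean_square([2, 3, 1], probs, [1, 2, 3]))
```

Weighting by the summed model variance is what makes the statistic “information-weighted”: responses far from the person’s level, which carry little information, contribute little to both numerator and denominator.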
Table 2. Functional Independence Measure (FIM) response counts, estimated category thresholds in the Andrich model, and information-weighted mean square residuals (Infit) at baseline and follow-up by rating scale response category.

| FIM score | Baseline count | Follow-up count | Category threshold (logits) | Baseline infit | Follow-up infit |
|---|---|---|---|---|---|
| 1 | 103 | 28 | NA | 1.27 | 3.47 |
| 2 | 107 | 25 | -2.88 | 1.2 | 3.17 |
| 3 | 124 | 16 | -2.03 | 1.29 | 2.06 |
| 4 | 145 | 41 | -1.11 | 1.61 | 1.01 |
| 5 | 31 | 123 | 1.55 | 1.71 | 0.91 |
| 6 | 4 | 212 | 2.83 | 2.25 | 2.14 |
| 7 | 3 | 10 | 1.63 | 1.75 | 2.83 |

Infit mean squares also were estimated for each participant at baseline by summing observed squared residuals and expected squared residuals across goals. For degrees of freedom of 25 or greater, the cube root of $\chi^2/df$ is well approximated by a normal distribution.15 Therefore, the infit mean square for each participant was transformed to a standard normal deviate and expressed as a z-score.4 Figure 1 illustrates the distribution of infit z-scores on the abscissa and the distribution of person measures, i.e., estimated functional ability, on the ordinate for all 41 participants. The solid vertical line indicates the expected value of the infit z-score and the dashed vertical lines define the range of plus-and-minus two standard deviations from the expected value. The majority of participants’ infit mean square z-scores are symmetrically distributed about the expected value of 0 and fall in the expected range of ±2 SD. These results are consistent with the expectations of a valid measure. However, there are seven clear outliers for whom the observed variance in the residuals is more than two standard deviations greater than the expected variance. The functional abilities of these outliers fall in the middle of the participants’ distribution of functional ability (on the vertical axis).
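The cube-root normalization cited above (Wilson and Hilferty) can be sketched as follows, assuming the mean square is distributed as $\chi^2/df$. Winsteps’ reported ZSTD is based on the same cube-root idea but uses a model-based SD of the mean square in place of the df-based term, so its values may differ slightly from this textbook version; `mean_square_to_z` is a hypothetical name.

```python
import math


def mean_square_to_z(ms: float, df: float) -> float:
    """Wilson-Hilferty transform of a mean square (assumed chi2(df)/df) to a z-score.

    Under that assumption ms**(1/3) is approximately normal with
    mean 1 - 2/(9*df) and variance 2/(9*df).
    """
    if ms <= 0 or df <= 0:
        raise ValueError("ms and df must be positive")
    v = 2.0 / (9.0 * df)
    return (ms ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)


# Hypothetical participant: infit mean square 2.0 over 25 degrees of freedom.
z = mean_square_to_z(2.0, 25)
print(f"z = {z:.2f}, outside +/-2 SD: {abs(z) > 2}")
```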

Figure 1. Distribution of infit z-scores (INFIT MNSQ ZSTD) across items for each participant on the abscissa and the distribution of FIM-estimated person measures (anchored AI goals) on the ordinate.
Comparison of Functional Ability Estimates from AI and FIM Ratings
Because all AI item measures were anchored to calibrated values, i.e., $\rho_j$ in eq. (1b), person measure estimates from patients’ difficulty ratings and person measure estimates for the same patients from the therapist’s FIM ratings are expected to be in the same units of functional ability. However, the Andrich rating scale model assumes that the variance in judgment bias is constant, thereby normalizing the true values of functional ability, i.e., $\alpha_n$ in eqs. (3) and (4), to the standard deviation of judgment bias, i.e., $\sigma_P$ in eq. (3) and $\sigma_T$ in eq. (4). Thus, we expect the standard errors of the two sets of estimated person measures to agree. There is no significant difference (paired t-test, p = 0.93) between the standard error of the person measure estimated from patient difficulty ratings (mean = 0.414) and the standard error of the person measure estimated from therapist FIM ratings (mean = 0.415).
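The comparison of standard errors above uses the ordinary paired t statistic. A minimal standard-library sketch of that statistic follows; `paired_t` is a hypothetical helper and the input values are hypothetical, not the study’s standard errors. The p-value additionally requires the t distribution (for example, `scipy.stats.ttest_rel` returns both the statistic and the p-value).

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for the paired t test of matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples of length >= 2")
    d = [x - y for x, y in zip(a, b)]         # within-patient differences
    se = stdev(d) / math.sqrt(len(d))         # standard error of the mean difference
    return mean(d) / se, len(d) - 1


# Hypothetical per-patient standard errors from two kinds of ratings.
se_patient = [0.40, 0.43, 0.41, 0.42, 0.39]
se_therapist = [0.41, 0.42, 0.42, 0.41, 0.40]
print(paired_t(se_patient, se_therapist))
```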
It is possible that FIM ratings could be different enough from difficulty ratings that using item measures anchored to values estimated from difficulty ratings is not appropriate for the FIM scale. If so, variance in residuals should be greater for FIM ratings than for difficulty ratings. With the exception of the FIM outliers noted above, Figure 2 illustrates that the z-scores for transformed infit mean squares for the two sets of estimates of person measures at baseline are within the range of values expected by the $\chi^2$ distribution (±2 SD box).
Figure 2. Z-scores for transformed infit mean squares for person measures estimated from therapist FIM ratings (ordinate) vs. transformed infit mean squares for person measures estimated from patients’ difficulty ratings (abscissa).
Measures of functional ability, both at baseline and post-discharge, were estimated from patients’ difficulty ratings of those AI goals that were rated at baseline to be at least slightly important. Measures of functional ability also were estimated for the same patients at baseline and at discharge from the therapist’s FIM scale ratings of the same set of AI goals for each patient. For measures based on patients’ difficulty ratings and measures based on the therapist’s FIM scale scores, the mean functional ability at baseline was subtracted from each corresponding baseline measure, and the mean functional ability at post-discharge was subtracted from each corresponding post-discharge measure. Figure 3 is a scatter plot comparing measures based on patients’ difficulty ratings of the important AI goals (abscissa) to measures based on the occupational therapist’s FIM scale ratings of the same AI goals (ordinate) for baseline (filled circles) and post-discharge (open circles) measures relative to their respective means. Bivariate linear regression, minimizing the orthogonal distance of data points from the regression line (i.e., the first principal component), was performed on the combined baseline and post-discharge data. The slope of the regression line is 1.96 and the intercept is -0.04. The Pearson correlation is 0.52.
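Orthogonal regression, as described here, fits the line along the first principal component of the centered data. A minimal NumPy sketch under that description (function names hypothetical): the slope is the ratio of the components of the leading eigenvector of the 2×2 covariance matrix, and the line passes through the centroid.

```python
import numpy as np


def orthogonal_regression(x, y):
    """Fit y = a + b*x by minimizing orthogonal distances (first principal component)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xm, ym = x.mean(), y.mean()
    cov = np.cov(np.vstack([x - xm, y - ym]))  # 2x2 sample covariance matrix
    _, eigvecs = np.linalg.eigh(cov)           # eigh returns eigenvalues in ascending order
    vx, vy = eigvecs[:, -1]                    # leading eigenvector = principal axis
    slope = vy / vx
    return slope, ym - slope * xm              # line passes through the centroid


def pearson_r(x, y):
    return float(np.corrcoef(x, y)[0, 1])


x = [0.0, 1.0, 2.0, 3.0]   # hypothetical data with equal variances in x and y
y = [0.0, 2.0, 1.0, 3.0]
print(orthogonal_regression(x, y), pearson_r(x, y), np.polyfit(x, y, 1)[0])
```

Unlike ordinary least squares, this fit treats both axes symmetrically. For two variables with equal variance it returns a slope of ±1 regardless of the correlation, whereas the least-squares slope of y on x equals the correlation (0.8 for the toy data above).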
Figure 3. Person measures based on patients’ difficulty ratings of important AI goals (abscissa) compared with occupational therapist FIM scale ratings of the same AI goals (ordinate) for baseline (PRE) and post-discharge (POST) measures, each relative to its respective mean.
Figure 4 illustrates scatter plots of the unadjusted functional ability measures estimated from the occupational therapist’s FIM scale ratings of AI goals (ordinate) versus the unadjusted functional ability measures estimated from the patients’ difficulty ratings of the same AI goals (abscissa) at baseline (filled circles) and at post-discharge follow-up (open circles). The lines fit to the data by orthogonal regression have the same slope (1.96), which was estimated from the regression line fit to the combined data in Figure 3. The intercepts are -1.02 for the baseline measures and 1.63 for the post-discharge measures. The dashed lines illustrate the respective mean functional ability measures. The difference between the vertical dashed lines is the intervention effect (difference between the means) estimated from patient difficulty ratings (Cohen’s effect size = 0.49), and the difference between the horizontal dashed lines is the intervention effect estimated from the therapist’s FIM scale ratings (Cohen’s effect size = 3.28).
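With the slope held at the common value, the orthogonal-distance intercept for each occasion is simply the one that passes the line through that occasion’s centroid. A sketch follows, together with one common variant of Cohen’s effect size (mean difference divided by the pooled SD). The paper does not state which variant was used, so the pooled-SD choice is an assumption; both function names and the data values are hypothetical.

```python
import numpy as np


def intercept_for_fixed_slope(x, y, slope):
    """Orthogonal-distance intercept when the slope is held fixed.

    Orthogonal residuals are (y - a - slope*x) / sqrt(1 + slope**2); minimizing
    their sum of squares over a gives a = mean(y) - slope*mean(x), i.e. the line
    passes through the centroid.
    """
    return float(np.mean(y) - slope * np.mean(x))


def cohens_d(pre, post):
    """Cohen's d as (mean(post) - mean(pre)) / pooled SD (one common variant)."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    n1, n2 = len(pre), len(post)
    pooled_var = ((n1 - 1) * pre.var(ddof=1) + (n2 - 1) * post.var(ddof=1)) / (n1 + n2 - 2)
    return float((post.mean() - pre.mean()) / np.sqrt(pooled_var))


# Hypothetical baseline measures (x: patient-based, y: therapist-based).
x_pre = [-0.8, -0.2, 0.1, 0.5]
y_pre = [-2.4, -1.5, -0.9, 0.2]
print(intercept_for_fixed_slope(x_pre, y_pre, 1.96))
print(cohens_d([0.0, 1.0, 2.0], [2.0, 3.0, 4.0]))
```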
Figure 4. Unadjusted functional ability measures estimated from the occupational therapist’s FIM scale ratings of AI goals (ordinate) versus unadjusted functional ability measures estimated from patients’ difficulty ratings of the same AI goals (abscissa) at baseline (filled circles, PRE) and at post-discharge follow-up (open circles, POST).
Discussion and Conclusions
The linear relationship between functional ability estimated from patient difficulty ratings and functional ability estimated from the therapist’s FIM scale ratings confirms the expectations of the model expressed by eqs. (3) and (4), which lead to the specific prediction of a linear function expressed by eq. (5). If we interpret the results in Figure 4 in terms of eq. (5), then we conclude from the slope of the regression lines that $\sigma_P = 1.96\,\sigma_T$, both at baseline and at post-discharge follow-up. This result means that the variance in bias for the average of the patients is nearly 4 times ($1.96^2 \approx 3.84$) the within-person variance in bias for our single therapist. If we can assume that the average variance in bias within patients is approximately the same as the within-person variance in bias of our sole therapist, then in eq. (2), $\frac{1}{N}\sum_{n=1}^{N}\sigma_n^2 \cong \sigma_T^2$, and substituting $1.96^2\,\sigma_T^2$ for $\sigma_P^2$ in eq. (2), we obtain an estimate of the standard deviation of bias between patients of $\sigma_{bn} = 1.69\,\sigma_T$.
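The arithmetic behind this estimate, under the same assumption that the average within-patient variance equals $\sigma_T^2$, can be laid out step by step; it also shows where the abstract’s statement that between-patient variance is almost 3 times the within-therapist variance comes from:

$$
\begin{aligned}
\sigma_P^2 &= (1.96\,\sigma_T)^2 = 3.8416\,\sigma_T^2,\\
\sigma_{bn}^2 &= \sigma_P^2 - \frac{1}{N}\sum_{n=1}^{N}\sigma_n^2 \cong 3.8416\,\sigma_T^2 - \sigma_T^2 = 2.8416\,\sigma_T^2,\\
\sigma_{bn} &\cong \sqrt{2.8416}\,\sigma_T \approx 1.69\,\sigma_T.
\end{aligned}
$$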
From eq. (5), the intercepts of the regression lines in Figure 4 correspond to the difference between the therapist’s fixed bias and the fixed bias of the average patient, in within-therapist standard deviation units. The baseline intercept (-1.02) indicates that the fixed bias for the average patient, $\beta_P$, is 1.02 logits greater than the therapist’s fixed bias, $\beta_T$. However, post-discharge (intercept 1.63), the therapist’s fixed bias is 1.63 logits greater than the fixed bias of the average patient. From the patients’ perspective, the therapist is underestimating patients’ functional abilities at baseline and overestimating patients’ functional abilities at post-discharge follow-up. From the therapist’s perspective, the patients are overestimating their functional abilities at baseline and underestimating their functional abilities at post-discharge follow-up.
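Reading the two intercepts through eq. (5) makes the signs explicit:

$$
\begin{aligned}
\text{baseline:}\quad &\frac{\beta_T-\beta_P}{\sigma_T} = -1.02 \;\Rightarrow\; \beta_P-\beta_T = 1.02\,\sigma_T,\\
\text{follow-up:}\quad &\frac{\beta_T-\beta_P}{\sigma_T} = 1.63 \;\Rightarrow\; \beta_T-\beta_P = 1.63\,\sigma_T.
\end{aligned}
$$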
We cannot draw any conclusions from this study about why the difference between therapist and average patient bias is negative at baseline and positive at post-discharge follow-up. One could speculate that patients tend to be stoic and/or stubborn, underestimating the magnitude of their problems at baseline and underestimating improvements in their function at follow-up. Anecdotally, during evaluation therapists often see evidence of problems that patients deny or do not recognize (e.g., pills on the floor, stained clothing, signs of poor hygiene). Therapists also report that patients may be able to perform a task after therapy but refuse to accept the required adaptation as an improvement over dependency. From another viewpoint, a cynic might claim that the therapist is exaggerating the patient’s problems at baseline and exaggerating the success of therapy at follow-up, making the intervention look more effective than it actually is. However, in the final analysis we can only estimate differences between people in judgment biases; we cannot know their values relative to a ground truth.
The purpose of this study has been to present and test a model of judgment bias and show how judgment bias can influence measures estimated by psychometric models from observer magnitude estimates. The observation of a linear relationship between continuous interval-scale measures estimated from ordinal patient ratings and equivalent measures estimated from ordinal therapist ratings confirms the linear prediction of the model. Grounded in a simple axiomatic scaling theory, the model provides plausible interpretations of the slopes and intercepts of the linear relationships in terms of fixed and random bias parameters. This model can be used as a tool to study the effects of independent variables on judgment bias or to compare differences between judges.

References
1. Owsley C, Sloane M, McGwin G Jr, Ball K. Timed instrumental activities of daily living tasks: relationship to cognitive function and everyday performance assessments in older adults. Gerontology. 2002;48:254-265.
2. McHorney CA, Haley SM, Ware JE Jr. Evaluation of the MOS SF-36 Physical Functioning
Scale (PF-10): II, Comparison of relative precision using Likert and Rasch scoring methods.
J Clin Epidemiol. 1997;50:451-461.
3. Granger CV, Deutsch A, Linn RT. Rasch analysis of the Functional Independence Measure
(FIM) Mastery Test. Arch Phys Med Rehabil. 1998;79:52-57.
4. Massof RW. Understanding Rasch and Item Response Theory models: Applications to the estimation and validation of interval latent trait measures from responses to rating scale questionnaires. Ophthalmic Epidemiol. 2011;18:1-19.
5. Fisher AG. The assessment of IADL motor skills: An application of many-faceted Rasch
analysis. Am J Occup Ther. 1993;47:319-329.
6. U.S. Department of Health & Human Services, Centers for Medicare and Medicaid Services.
(2002). Program memorandum intermediaries/carriers: Transmittal AB-02-078, May 29,
2002. Baltimore, MD: Government Printing Office.
7. Massof RW, Hsu CT, Baker FH, Barnett GD, Park WL, Deremeik JT, Rainey C, Epstein C.
Visual disability variables. I: The importance and difficulty of activity goals for a sample of
low vision patients. Arch Phys Med Rehabil. 2005;86:946-953.
8. Massof RW, Hsu CT, Baker FH, Barnett GD, Park WL, Deremeik JT, Rainey C, Epstein C.
Visual disability variables. II: The difficulty of tasks for a sample of low vision patients.
Arch Phys Med Rehabil. 2005;86:954-967.

9. Massof RW, Ahmadian L, Grover LL, Deremeik JT, Goldstein JE, Rainey C, Epstein C, Barnett GD. The Activity Inventory: an adaptive visual function questionnaire. Optom Vis Sci. 2007;84:763-774.
10. Centers for Medicare/Medicaid Services. The Inpatient Rehabilitation Facility-Patient Assessment Instrument Training Manual. 2004. Available from https://www.cms.gov/medicare/medicare-fee-for-service-payment/inpatientrehabfacpps/irfpai.html
11. Andrich D. A rating formulation for ordered response categories. Psychometrika. 1978;43:561-573.
12. Linacre JM, Wright BD. A user’s guide to Winsteps: Rasch-model computer program. Chicago, IL: MESA Press; 2001.
13. Goldstein JE, Chun MW, Fletcher DC, Deremeik JT, Massof RW. Visual ability of patients seeking outpatient low vision services in the United States. JAMA Ophthalmol. 2014;132:1169-1177.
14. Bond T, Fox CM. Applying the Rasch model: Fundamental measurement in the human sciences. 2nd ed. New York, NY: Routledge; 2007.
15. Wilson EB, Hilferty MM. The distribution of chi-square. Proc Natl Acad Sci USA. 1931;17:684-688.
