Therapist Judgment Bias and Reliability Relative to that of Patients in
the Estimation of Functional Ability from Ordinal Ratings
Robert W. Massof,1 Theresa M. Smith,2 Lisa S. Foret,3 Guy Davis,3 and Kyoko Fujiwara1
1Lions Vision Research and Rehabilitation Center, Wilmer Eye Institute, Johns Hopkins
University School of Medicine
2Department of Occupational Therapy and Rehabilitation Sciences, University of Texas
Medical Branch Galveston
3Evangeline Home Health, Lake Charles, LA
Supported by grant EY022322 from the National Eye Institute, National Institutes of Health,
Bethesda, MD.
Abstract
Objective: To present and evaluate a measurement model for estimating the judgment bias of
therapists and patients when rating functional ability. Design: Observational study of the
agreement between therapist ratings and patient self-ratings of functional ability. Setting:
Measures made by telephone interview and in the patient’s home. Participants: Forty-five home
health care patients who have a secondary diagnosis of low vision. Main Outcome Measures:
Functional ability estimated from Rasch analysis of patient difficulty ratings of calibrated items
(activity goals) in the Activity Inventory (AI) and therapist ratings using a FIM scale of the same
activity goals, both at initial evaluation and again after discharge. Results: A linear relationship
was observed between functional ability measures estimated from therapist ratings and measures
estimated from patient self-ratings with the same slope, but different intercepts, for measures
obtained at baseline and at post-rehabilitation follow-up. Conclusions: The observed linear
relationship between measures estimated from therapist ratings and measures estimated from
patient ratings confirms the model prediction. The intercept corresponds to the difference
between the therapist’s judgment bias and the average judgment bias of all patients. Relative to
patient judgments, the therapist’s estimate of functional ability at baseline was less than the
patients’ estimates; it was greater than the patients’ estimates at follow-up. The slope of the line
corresponds to the square root of the ratio of the between-patient plus within-patient variance in
judgment bias to the within-therapist variance in judgment bias. The results indicate that
between-patient variance is almost 3 times the within-therapist variance.
Introduction
Rehabilitation medicine employs three different approaches to estimate the functional ability of
patients: 1) measures of task performance time and/or accuracy [1]; 2) patient ratings of their own
ability and/or frequency of performing activities [2]; and 3) ratings by a therapist or proxy of a
patient’s ability and/or frequency of performing activities [3]. Functional ability is a trait of the
patient. Task performance time and accuracy, patient ratings, and therapist ratings are only
indicators of functional ability. Measurements of functional ability per se must be inferred from
the observed indicators. Because functional ability is a property of the patient, valid and unbiased
measures of functional ability estimated from the three different approaches should agree.
Measurement validity refers to the accuracy of the assumption that the estimated measure is
linear with the magnitude of the variable of interest. Measurement bias refers to the agreement
(or disagreement) between different measures of the same variable when the variable magnitude
has not changed between measures. In the case of functional ability, measurement validity and
bias can be influenced by the sample of activities selected for observation and, in the case of
ratings, by properties of the judge.
This paper is concerned with comparing functional ability measures estimated from ratings by
patients to functional ability measures estimated from ratings by a therapist. More specifically,
this paper focuses on the estimation of relative biases and measurement uncertainties of judges
when comparing functional ability measures estimated from a therapist’s judgments to functional
ability measures estimated from patient judgments of themselves. We first present a model of
patient self-ratings and a parallel model of therapist ratings of the patient, explicitly identifying
respective biases and sources of variance in the observations, and show how the two sets of
ratings are related. We then test the model with a substantive example using low vision
rehabilitation of visually impaired home health care patients.
Model of Patient Self-Ratings and Therapist Ratings
Using ordered rating scale categories (e.g., level of “difficulty” or level of “independence”), both
the patient and the therapist are asked to judge the patient’s ability to perform specific activities,
referred to as “items”. The true ability of patient n, which we are attempting to estimate from the
patient’s and therapist’s ratings, is $\alpha_n$. The ability required to perform each of the items, $\rho_j$ for
item j, is a property of the item that is independent of the judge (whether patient or therapist).
The model assumes that both the patient and therapist are judging the magnitude of the patient’s
functional reserve for the activity described by the item, which is the difference between the
ability of patient n and the ability required by item j, i.e., $\alpha_n - \rho_j$. Both the patient and therapist
are instructed in the use of the ratings, but they develop their own criteria for each rating
category that they will assign to a patient/item pair. These criteria, or “thresholds”, can be
thought of as boundaries between neighboring categories on a continuous functional reserve
scale. The thresholds are denoted as $\tau_{kx}$ for the boundary set by judge k between rating category
x-1 and rating category x (k = n in the case of patient self-judgment).
Although the value of $\rho_j$ is independent of the judge, judges’ estimates of $\rho_j$ are likely to be
biased. If $\hat{\rho}_{kj}$ is the estimate of $\rho_j$ by judge k, then $\hat{\rho}_{kj} = \rho_j + \epsilon_{kj}$, where $\epsilon_{kj}$ is the bias of judge
k in estimating the ability required by item j. Similarly, the average threshold for rating category
x across a population of judges is $\bar{\tau}_x$; therefore, $\tau_{kx} = \bar{\tau}_x + \eta_{kx}$, where $\eta_{kx}$ is the bias of judge k,
relative to the average judge, in the choice of threshold for rating category x. In the case of
therapists or proxies, the population of judges would refer to all therapists or to all proxies,
respectively. If we define $\bar{\epsilon}_k$ to be the average bias of judge k across items and $\bar{\eta}_k$ to be the
average bias of judge k across rating category thresholds, then we can re-express the bias terms
as the sum of a fixed variable (average) and a random variable ($\delta$), i.e., $\epsilon_{kj} = \bar{\epsilon}_k + \delta_{\epsilon_{kj}}$ and
$\eta_{kx} = \bar{\eta}_k + \delta_{\eta_{kx}}$ (if there is only a single judge contributing to the estimate of $\bar{\tau}_x$, then $\bar{\eta}_k = 0$).
In each case, the random variable has an expected value of zero and incorporates variance
associated with real differences in bias between items and/or categories, estimation uncertainty,
and parameter instability.
The judge assigns rating category x to item j if the estimated functional reserve exceeds the
judge’s criterion for category x (and all lower categories) and is less than the criterion for
category x+1 (and all higher categories), i.e.,
$$\tau_{k1}, \cdots, \tau_{kx} < \alpha_n - \hat{\rho}_{kj} < \tau_{k,x+1}, \cdots, \tau_{km}. \tag{1a}$$
Substituting the definitions presented in the preceding paragraph and, for judge k, combining the
random variables into a single random term and combining the fixed bias variables into a single
fixed term, expression (1a) can be expanded to make the fixed and random variables explicit, i.e.,
$$\bar{\tau}_1 + \delta_{kj1}, \cdots, \bar{\tau}_x + \delta_{kjx} < \alpha_n - \rho_j - \beta_k < \bar{\tau}_{x+1} + \delta_{kj,x+1}, \cdots, \bar{\tau}_m + \delta_{kjm} \tag{1b}$$
where $\delta_{kjx} = \delta_{\eta_{kx}} + \delta_{\epsilon_{kj}}$ and $\beta_k = \bar{\epsilon}_k + \bar{\eta}_k$. The judgment bias of judge k is summarized with
the bias term $\beta_k$ and the reliability of judge k is summarized by the variance of $\delta_{kjx}$, which we
designate as $\sigma_{kjx}^2$.
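The rating rule in expressions (1a) and (1b) can be illustrated with a minimal sketch. It assumes the thresholds are ordered, so that the assigned category equals the number of thresholds exceeded by the judged functional reserve. The function name `assign_category` and all numerical values are hypothetical and are not taken from the study.

```python
import numpy as np

def assign_category(alpha_n, rho_j, beta_k, tau_bar, delta=None):
    """Return the number of thresholds exceeded by the judged functional reserve.

    tau_bar : average thresholds (tau_bar_1 ... tau_bar_m), assumed ordered.
    delta   : optional random terms delta_kjx, one per threshold (default: zeros).
    A return value of x means tau_x < reserve < tau_{x+1}; 0 means below tau_1.
    """
    tau_bar = np.asarray(tau_bar, dtype=float)
    if delta is None:
        delta = np.zeros_like(tau_bar)
    thresholds = tau_bar + np.asarray(delta, dtype=float)
    reserve = alpha_n - rho_j - beta_k  # middle term of expression (1b)
    return int(np.sum(reserve > thresholds))

# Hypothetical values in logits, chosen only for illustration.
tau_bar = [-2.0, -1.0, 0.0, 1.0, 2.0]
unbiased = assign_category(alpha_n=0.5, rho_j=0.0, beta_k=0.0, tau_bar=tau_bar)
biased = assign_category(alpha_n=0.5, rho_j=0.0, beta_k=1.0, tau_bar=tau_bar)
print(unbiased, biased)
```

In this sketch a positive fixed bias $\beta_k$ lowers the judged functional reserve, so the same patient/item pair receives a lower category from the biased judge.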
Rasch analysis is used routinely to estimate the average expected rating category thresholds ($\bar{\tau}_x$
for rating category x), the true person measures ($\alpha_n$ for person n), and the true item measures ($\rho_j$
for item j) from distributions of observed ratings across persons and items [4]. Judgment bias, $\beta_k$,
affects the accuracy of the estimates, and the variance of the random terms, $\sigma_{kjx}^2$, affects
estimation precision (i.e., reliability). Rasch models assume homogeneity of variance, i.e., $\sigma_{kjx}^2$ is
the same for all persons, items, and rating category thresholds (a requirement of unidimensional
measures). Homogeneity of variance means that $\sigma_{kjx}^2 = \sigma_k^2$. Rasch models also assume that the
random terms are statistically independent of one another [4]. Various statistical tests are used to
evaluate how well the set of observed ratings conforms to these assumptions of the Rasch model [4].
In the case of patient self-judgment, when there are N patients there also are N judges. However,
Rasch models typically (but not necessarily) assume that there is just a single judge, which in
effect is the average of the judges. In this case, when $\sigma_k^2$ refers to the average of N judges, it
must include variance between judges, $\sigma_{b_n}^2$, as well as variance within judges, $\sigma_n^2$. We therefore
define the variance of the average patient judge to be
$$\sigma_P^2 = \sigma_{b_n}^2 + \sum_{n=1}^{N} \sigma_n^2 / N, \tag{2}$$
the sum of between-patient variance and average within-patient variance. When a single therapist
is the judge, the variance of the therapist can be attributed entirely to the variance within the
judge, $\sigma_T^2 = \sigma_k^2$. To complete the definition of terms for our model, the fixed judgment bias of
each patient is $\beta_n$ and the fixed judgment bias of the therapist is $\beta_T$.
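Eq. (2) amounts to a short calculation, sketched here. The function name `average_patient_variance` and the variances passed to it are hypothetical and serve only to show how the between-patient and average within-patient terms combine.

```python
def average_patient_variance(between_var, within_vars):
    """Eq. (2): between-patient variance plus the mean within-patient variance."""
    within_vars = list(within_vars)
    return between_var + sum(within_vars) / len(within_vars)

# Hypothetical variances (squared logits) for three patient judges.
sigma_p_squared = average_patient_variance(2.0, [1.0, 1.0, 1.0])
print(sigma_p_squared)
```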
In practice, Rasch models normalize the estimated person and item measures to the square root
of the judge’s variance and ignore the judge’s bias (unless made explicit in a facet model [5]). Thus,
person measures estimated from patient self-judgments are expressed as
$$\hat{\alpha}_{nP} = (\alpha_n + \beta_P) / \sigma_P \tag{3}$$
for person n, where $\beta_P = \sum_{n=1}^{N} \beta_n / N$, the average bias across patients. The person measures
estimated from a therapist’s judgments are expressed as
$$\hat{\alpha}_{nT} = (\alpha_n + \beta_T) / \sigma_T \tag{4}$$
for the same person n. Because both eqs. (3) and (4) are linear functions of the true person
measure, $\alpha_n$, we expect the relationship between person measures estimated from a therapist’s
ratings and corresponding person measures estimated from patients rating themselves to be
$$\hat{\alpha}_{nT} = \frac{\sigma_P}{\sigma_T} \hat{\alpha}_{nP} + \frac{\beta_T - \beta_P}{\sigma_T}, \tag{5}$$
a linear relationship for which the slope is the ratio of the standard deviation for the average
patient to the standard deviation for the therapist and the intercept is the weighted difference
between therapist and average patient judgment biases.
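Eq. (5) follows from solving eq. (3) for $\alpha_n$, which gives $\alpha_n = \sigma_P \hat{\alpha}_{nP} - \beta_P$, and substituting the result into eq. (4). The sketch below checks this numerically under noise-free conditions. All parameter values are hypothetical, and the function names are introduced here for illustration only.

```python
import numpy as np

def patient_estimate(alpha, beta_p, sigma_p):
    """Eq. (3): person measure as estimated from patient self-ratings."""
    return (alpha + beta_p) / sigma_p

def therapist_estimate(alpha, beta_t, sigma_t):
    """Eq. (4): person measure as estimated from therapist ratings."""
    return (alpha + beta_t) / sigma_t

def predicted_line(beta_p, sigma_p, beta_t, sigma_t):
    """Eq. (5): slope and intercept of therapist vs. patient estimates."""
    return sigma_p / sigma_t, (beta_t - beta_p) / sigma_t

# Hypothetical true abilities and judge parameters (logits).
alpha = np.linspace(-3.0, 3.0, 13)
beta_p, sigma_p = 0.5, 2.0
beta_t, sigma_t = -0.5, 1.0

x = patient_estimate(alpha, beta_p, sigma_p)
y = therapist_estimate(alpha, beta_t, sigma_t)
fit_slope, fit_intercept = np.polyfit(x, y, 1)  # straight-line fit to simulated pairs
pred_slope, pred_intercept = predicted_line(beta_p, sigma_p, beta_t, sigma_t)
print(fit_slope, fit_intercept, pred_slope, pred_intercept)
```

Because the simulated estimates contain no random error, an ordinary least-squares fit is enough to recover the predicted line; with noisy estimates on both axes an errors-in-variables fit would be the more natural choice.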
Methods
Research Design
The present study is part of a larger observational study still in progress. Data reported here were
collected before and after usual occupational therapy intervention provided in the participant’s home
by one occupational therapist who has specialty training in low vision rehabilitation and 12 years
of experience providing rehabilitation services to home health care patients with low vision.
Participants
Eligibility criteria for the study were: 1) patients were new to the occupational therapist; 2)
patients were adults admitted to home health care; 3) patients met the visual impairment
diagnostic criteria for Medicare or other third party coverage of low vision rehabilitation
services [6]; and 4) patients understood English and had good enough hearing to be able to
participate in telephone interviews. Forty-five low vision patients participated in this study.
Procedures
The study conformed to the tenets of the Declaration of Helsinki and was approved by the Johns
Hopkins Institutional Review Board. After the patient consented to participate, one of the
investigators administered the Activity Inventory (AI) [7-9], an adaptive rating scale instrument, by
telephone interview. Participants rated the importance of the 50 activity goals in the AI, and rated
the difficulty of those goals that were rated to be at least “slightly important”. In the instructions
to the participant, both importance and difficulty ratings were qualified as referring to the ability
to perform the activity “without depending on another person”. Goals included in this study were
those that the participant also rated to be at least “slightly difficult”. In addition, participants rated the
difficulties of tasks in the AI that are nested under goals that were rated to be at least slightly
important and slightly difficult.
At the time of the initial patient evaluation, the occupational therapist was provided with a list of
the AI goals and subsidiary tasks that were rated by the participant to be at least slightly difficult;
however, the actual ratings assigned by the participant to each goal and task were not revealed.
After completing the initial patient evaluation, the occupational therapist assigned a FIM scale
score [3,10] to each of the participant-identified AI goals. Table 1 lists the FIM rating scale
categories. The occupational therapist then developed the patient’s plan of care and provided
rehabilitation services following usual procedures. At discharge the occupational therapist again
used the FIM scale to rate the participant’s functional independence level for the same AI goals
that were rated at the initial evaluation. The AI was re-administered to the participant by
telephone interview one to two months after discharge from occupational therapy.
Data Analysis
Rasch analysis, using the Andrich rating scale model [11] (Winsteps 3.65 [12]), was employed to
estimate the visual ability of each participant before and after rehabilitation on a continuous
interval scale from the participants’ difficulty ratings of the AI goals. The item measures for the
50 goals in the AI item bank and the response category thresholds for levels of difficulty were
anchored to values estimated from the difficulty ratings of 3200 low vision patients [13]. Rasch
analysis also was performed on the FIM scale ratings of each patient’s AI goals by the
occupational therapist using the same anchored item measures for the goals. In the case of
analysis of FIM ratings, participant’s ratings obtained prior to the initial patient evaluation and
ratings obtained post-discharge were stacked and analyzed together to estimate response
category thresholds for the 7 FIM scale categories. An information-weighted mean square fit
statistic (infit) and the standard error were estimated for each response category threshold and for
each person measure.
Table 1
Functional Independence Measure (FIM) Scale Categories
FIM score   Description
1           Totally dependent – patient able to perform less than 25% of the task
2           Maximal assistance required – patient able to perform 25% of the task
3           Moderate assistance required – patient able to perform 50% of the task
4           Minimal assistance required – patient able to perform 75% of the task
5           Supervision or set-up required – patient performs task without direct assistance
6           Modified independence – patient requires assistive equipment or more time, or there is a safety concern
7           Independent – no assistance required, patient able to perform 100% of the task
Results
Participants
Complete data were obtained from 41 of the 45 enrolled participants. All participants resided in
Louisiana. Participants consisted of 15 males (33%) and 30 females (67%) between the ages of
30 and 98 years (median = 80, SD = 17). Measured binocular visual acuity with habitual
correction ranged from 20/20 to 20/900 (median = 20/65, SD = 0.52 logMAR); 3 participants
had no light perception in either eye and 2 participants had only light perception in the better eye.
Among participants with measurable visual acuity, binocular log contrast sensitivity ranged from
0.07 to 1.67 (normal > 1.6; median = 1.02, SD = 0.44). For binocular central visual field measures
(12.5°), 35% of participants had central scotomas (blind spots), 20% had hemi- or quad-field
defects, 27% had contracted visual fields, and visual fields could not be performed for 18%.
FIM Rating Scale Evaluation
The therapist used all 7 of the FIM scale response categories to rate AI goals selected by
participants at baseline and/or at follow-up. As shown in the Table 2 columns labeled Baseline
Count and Follow-up Count, FIM scale scores of 4 or less were used most frequently at baseline
and FIM scale scores of 5 or 6 were used most frequently at follow-up. The category threshold
corresponds to the value of functional reserve (difference between the estimated person measure
and estimated item measure) at which the probability of using FIM score x is equal to the
probability of using FIM score x-1, for x = 2 to 7. The ordering of thresholds should agree with
the ordering of the FIM scale scores. The thresholds are ordered for response categories 2
through 6. The threshold for response category 7 is disordered. However, the assignment of FIM
scale score 7 occurred rarely: it represents only 1.3% of the total number of FIM scale scores
assigned.
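The threshold definition above can be illustrated with a sketch of the category probabilities in the Andrich rating scale model, written in its standard textbook form and using the thresholds reported in Table 2. This is an illustration under that assumption, not a reproduction of the Winsteps computation, and the function names are hypothetical.

```python
import numpy as np

# Category thresholds for FIM scores 2..7 as reported in Table 2 (logits).
THRESHOLDS = {2: -2.88, 3: -2.03, 4: -1.11, 5: 1.55, 6: 2.83, 7: 1.63}

def category_probabilities(reserve, thresholds):
    """Andrich rating scale model probabilities for a given functional reserve."""
    scores = sorted(thresholds)
    taus = np.array([thresholds[s] for s in scores])
    # Log-numerator for the lowest category is 0; each higher category adds (reserve - tau).
    log_num = np.concatenate(([0.0], np.cumsum(reserve - taus)))
    p = np.exp(log_num - log_num.max())  # subtract the maximum for numerical stability
    p /= p.sum()
    lowest = scores[0] - 1
    return {lowest + i: float(pi) for i, pi in enumerate(p)}

def disordered_scores(thresholds):
    """Scores whose threshold is not above the threshold of the preceding score."""
    scores = sorted(thresholds)
    return [b for a, b in zip(scores, scores[1:]) if thresholds[b] <= thresholds[a]]

p_at_tau5 = category_probabilities(THRESHOLDS[5], THRESHOLDS)
print(p_at_tau5[4], p_at_tau5[5])     # equal when the reserve sits on the threshold for score 5
print(disordered_scores(THRESHOLDS))  # scores with disordered thresholds
```

When the functional reserve equals the threshold for score x, the model assigns equal probability to scores x and x-1, which is the definition used in the text. The ordering check flags score 7, whose threshold (1.63) lies below that of score 6 (2.83).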
The Rasch model predicts the response category assigned to every combination of person and
item measures. The residual is defined to be the difference between the FIM scale score observed
for each person/item combination and the FIM scale score predicted for the corresponding
person and item measure estimates. The infit mean square is the ratio of the observed sums of
squared residuals for FIM ratings, which are expected to be distributed as $\chi^2$, to the sums of
squared residuals expected by the Rasch model, which corresponds to the expected value of $\chi^2$.
The expected value of $\chi^2$ is equal to the degrees of freedom; thus, the infit mean square is
expected to be distributed as $\chi^2$/df, which in turn has an expected value of 1.0 [4]. The infit mean
square is interpreted as the ratio of the observed variance in the residuals to the expected
variance. Infit mean square values greater than 1.0 indicate that the observed variance is greater
than expected. As can be seen in the last two columns of Table 2, the observed variance in
residuals for response category 6 is more than twice the expected variance both at baseline and at
follow-up. As a rule of thumb, infit mean squares greater than 1.3 are considered to be indicative
of excessive observed variance [14]. With that criterion, only FIM response categories 1 through 3
at baseline and 4 and 5 at follow-up behave as expected by the Rasch model, which suggests
inconsistency in the use of the other FIM response categories across patients and/or across items.
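A minimal sketch of the infit mean square as described above: the sum of squared residuals divided by the sum of the model-expected squared residuals (the model variances). The observed scores, expected scores, and variances below are hypothetical; in practice the expected values and variances come from the fitted Rasch model.

```python
import numpy as np

INFIT_CUTOFF = 1.3  # rule-of-thumb criterion cited in the text

def infit_mean_square(observed, expected, variance):
    """Sum of squared residuals divided by the sum of model-expected variances."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    variance = np.asarray(variance, dtype=float)
    return float(np.sum((observed - expected) ** 2) / np.sum(variance))

# Hypothetical ratings for three person/item combinations.
mnsq = infit_mean_square(observed=[3, 5, 4],
                         expected=[3.5, 4.0, 4.5],
                         variance=[0.5, 0.5, 0.5])
excessive = mnsq > INFIT_CUTOFF
print(mnsq, excessive)
```

A value near 1.0, as in this example, means that the observed residual variance matches the variance expected by the model.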
Table 2
Functional Independence Measure (FIM) response counts, estimated category thresholds in the Andrich
model, and information-weighted mean square residuals (Infit) at baseline and follow-up by rating scale
response category.
FIM score   Baseline count   Follow-up count   Category threshold   Baseline infit   Follow-up infit
1           103              28                NA                   1.27             3.47
2           107              25                -2.88                1.2              3.17
3           124              16                -2.03                1.29             2.06
4           145              41                -1.11                1.61             1.01
5           31               123               1.55                 1.71             0.91
6           4                212               2.83                 2.25             2.14
7           3                10                1.63                 1.75             2.83
Infit mean squares also were estimated for each participant at baseline by summing observed
squared residuals and expected squared residuals across goals. For degrees of freedom of 25 or
greater, the cube-root of the $\chi^2$ distribution is well approximated by a normal distribution [15].
Therefore, the infit mean square for each participant was transformed to a standard normal
deviate and expressed as a z-score [4]. Figure 1 illustrates the distribution of infit z-scores on the
abscissa and the distribution of person measures, i.e., estimated functional ability, on the ordinate
for all 41 participants. The solid vertical line indicates the expected value of the infit z-score and
the dashed vertical lines define the range of plus-and-minus two standard deviations from the
expected value. The majority of participants’ infit mean square z-scores are symmetrically
distributed about the expected value of 0 and fall in the expected range of ±2 SD. These results
are consistent with the expectations of a valid measure. However, there are seven clear outliers
where the observed variance in the residuals is more than two standard deviations greater than
the expected variance. The functional abilities of these outliers fall in the middle of the
participants’ distribution of functional ability (on the vertical axis).
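The cube-root transformation of an infit mean square to a z-score can be sketched with the textbook Wilson-Hilferty approximation [15], in which the cube root of $\chi^2$/df is approximately normal with mean 1 - 2/(9 df) and variance 2/(9 df). This is an assumption about the general form of the transformation; the exact formula applied by the analysis software may differ, and the function name is hypothetical.

```python
import math

def infit_z_score(mnsq, df):
    """Wilson-Hilferty cube-root transformation of a mean square (chi-square/df)."""
    var = 2.0 / (9.0 * df)   # approximate variance of the cube root
    mean = 1.0 - var         # approximate mean of the cube root
    return (mnsq ** (1.0 / 3.0) - mean) / math.sqrt(var)

# Hypothetical infit mean squares with 25 degrees of freedom.
z_expected = infit_z_score(1.0, 25)
z_high = infit_z_score(2.0, 25)
print(z_expected, z_high, z_high > 2.0)
```

Under this approximation a mean square of 1.0 maps to a z-score close to 0, while a mean square of 2.0 with 25 degrees of freedom falls outside the ±2 SD range.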
Figure 1. Distribution of infit z-scores across items for each participant on the abscissa and the
distribution of person measures on the ordinate.
Comparison of Functional Ability Estimates from AI and FIM Ratings207
Because all AI item measures were anchored to calibrated values, i.e., 𝜌𝑗 in eq. (1b), person208
measure estimates from patients’ difficulty ratings and person measure estimates for the same209
patients from the therapist’s FIM ratings are expected to be in the same units of functional210
ability. However, the Andrich rating scale model assumes that the variance in judgment bias is211
constant, thereby normalizing the true values of functional ability, i.e., 𝛼 𝑛 in eq. (3) and eq. (4),212
to the standard deviation of judgment bias, i.e., 𝜎𝑃 in eq. (3) and 𝜎 𝑇 in eq. (4). Thus, we expect213
the standard errors of the two sets of estimated person measures to agree. There is no significant214
difference (paired t-test, p=0.93) between the standard error of the person measure estimated215
from patient difficulty ratings (mean = 0.414) and the standard error of the person measure216
estimated from therapist FIM ratings (mean = 0.415).217
[Figure 1 axes: abscissa, INFIT MNSQ (zstd); ordinate, FIM-estimated person measure (anchored AI goals).]
It is possible that FIM ratings could be different enough from difficulty ratings that using item
measures anchored with values estimated from difficulty ratings is not appropriate for the FIM
scale. If so, variance in residuals should be greater for FIM ratings than for difficulty ratings.
With the exception of the FIM outliers noted above, Figure 2 illustrates that the z-scores for
transformed infit mean squares for the two sets of estimates of person measures at baseline are
within the range of values expected by the $\chi^2$ distribution (±2 SD box).
Figure 2. Z-scores for transformed infit mean squares for person measures estimated from therapist FIM
ratings (ordinate) vs. transformed infit mean squares for person measures estimated from patients’
difficulty ratings (abscissa).
[Figure 2 axes: abscissa, INFIT MNSQ ZSTD (AI); ordinate, INFIT MNSQ ZSTD (FIM).]
Measures of functional ability, both at baseline and post-discharge, were estimated from patients’
difficulty ratings of those AI goals that were rated at baseline to be at least slightly important.
Measures of functional ability also were estimated for the same patients at baseline and at
discharge from the therapist’s ratings of the same set of AI goals for each patient using FIM scale
scores. For measures based on patients’ difficulty ratings and measures based on the therapist’s
FIM scale scores, the mean functional ability at baseline was subtracted from each corresponding
baseline measure and the mean functional ability at post-discharge was subtracted from each
corresponding post-discharge measure. Figure 3 is a scatter plot comparing measures based on
patients’ difficulty ratings of the important AI goals (abscissa) to the occupational therapist FIM
scale ratings of the same AI goals (ordinate) for baseline (filled circles) and post-discharge (open
circles) measures relative to their respective means. Bivariate linear regression, minimizing
orthogonal distance of data points from the regression line (i.e., principal component), was
performed on the combined baseline and post-discharge data. The slope of the regression line is
1.96 and the intercept is -0.04. The Pearson correlation is 0.52.
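Orthogonal (principal component) regression of the kind described above can be sketched as follows: the line passes through the bivariate mean in the direction of the eigenvector of the covariance matrix with the largest eigenvalue. The function name and the data are hypothetical, and the sketch assumes the fitted line is not vertical.

```python
import numpy as np

def orthogonal_regression(x, y):
    """Fit y = slope * x + intercept by minimizing orthogonal distances."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.cov(x, y)                        # 2 x 2 sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    direction = eigvecs[:, np.argmax(eigvals)]  # first principal component
    slope = direction[1] / direction[0]       # assumes a non-vertical line
    intercept = y.mean() - slope * x.mean()   # line passes through the means
    return float(slope), float(intercept)

# Hypothetical, noise-free data lying on y = 2x + 1.
demo_slope, demo_intercept = orthogonal_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(demo_slope, demo_intercept)
```

Unlike ordinary least squares, this fit treats both variables as subject to error, which suits the comparison of two sets of estimated measures.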
Figure 3. Comparing person measures based on patients’ difficulty ratings of important AI goals
(abscissa) to occupational therapist FIM scale ratings of the same AI goals (ordinate) for baseline and
post-discharge measures relative to their respective means.
[Figure 3 axes: abscissa, Functional ability (patient difficulty ratings) - Mean; ordinate, Functional ability (OT FIM scale) - Mean; legend, PRE and POST.]
Figure 4 illustrates scatter plots of the unadjusted functional ability measures estimated from the
occupational therapist FIM scale ratings of AI goals (ordinate) versus the unadjusted functional
ability measures estimated from the patient’s difficulty ratings of the same AI goals (abscissa) at
baseline (filled circles) and at post-discharge follow-up (open circles). The lines fit to the data by
orthogonal regression have the same slope (1.96), which was estimated from the regression line
fit to the combined data in Figure 3. The intercepts are -1.02 for the baseline measures and 1.63
for the post-discharge measures. The dashed lines illustrate the respective mean functional ability
measures. The difference between the vertical dashed lines is the intervention effect (difference
between the means) estimated from patient difficulty ratings (translates to Cohen’s effect size =
0.49) and the difference between the horizontal dashed lines is the intervention effect estimated
from the therapist’s FIM scale ratings (Cohen’s effect size = 3.28).
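When the slope is held fixed, the best-fitting line (by orthogonal or vertical distance alike) passes through the point of means, so each intercept is the mean of the ordinate values minus the slope times the mean of the abscissa values. The sketch below shows that calculation. The data are hypothetical, because the individual measures are not listed in the text, and the function name is introduced only for illustration.

```python
import numpy as np

def intercept_for_fixed_slope(x, y, slope):
    """Intercept of the best-fitting line with a predetermined slope."""
    return float(np.mean(y) - slope * np.mean(x))

# Hypothetical measures (logits) for one time point, with the slope fixed in advance.
x_demo = [0.0, 1.0, 2.0]
y_demo = [1.0, 3.0, 5.0]
demo_fixed_intercept = intercept_for_fixed_slope(x_demo, y_demo, slope=2.0)
print(demo_fixed_intercept)
```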
Figure 4. Unadjusted functional ability measures estimated from the occupational therapist FIM scale
ratings of AI goals (ordinate) versus unadjusted functional ability measures estimated from patient’s
difficulty ratings of same AI goals (abscissa) at baseline (filled circles) and at post-discharge follow-up
(open circles).
[Figure 4 axes: abscissa, Functional ability (patient difficulty rating); ordinate, Functional ability (OT FIM scale rating); legend, PRE and POST.]
Discussion and Conclusions
The linear relationship between functional ability estimated from patient difficulty ratings and
functional ability estimated from the therapist’s FIM scale ratings confirms the expectations of
the model expressed by eqs. (3) and (4), which lead to the specific prediction of a linear function
expressed by eq. (5). If we interpret the results in Figure 4 in terms of eq. (5), then we must
conclude from the slope of the regression lines that $\sigma_P = 1.96\sigma_T$, both at baseline and at
post-discharge follow-up. This result means that the variance in bias for the average of the patients is
nearly 4 times that of the within-person variance in bias for our single therapist. If we can assume
that the average variance in bias within patients is approximately the same as the within-person
variance in bias of our sole therapist, then in eq. (2), $\sum_{n=1}^{N} \sigma_n^2 / N \cong \sigma_T^2$, and substituting $1.96^2 \sigma_T^2$
for $\sigma_P^2$ in eq. (2), we obtain an estimate for the standard deviation of bias between patients of
$\sigma_{b_n} = 1.69\sigma_T$.
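The arithmetic behind these figures can be checked directly. With the slope of 1.96 and the stated assumption that the average within-patient variance equals the within-therapist variance, eq. (2) gives a between-patient variance of $(1.96^2 - 1)\sigma_T^2$. The variable names below are hypothetical.

```python
import math

slope = 1.96  # sigma_P / sigma_T, from the regression lines

# Variance of the average patient judge relative to the therapist: sigma_P^2 / sigma_T^2.
total_variance_ratio = slope ** 2

# Eq. (2) with average within-patient variance set equal to sigma_T^2:
# sigma_b^2 / sigma_T^2 = slope^2 - 1.
between_variance_ratio = total_variance_ratio - 1.0
between_sd_ratio = math.sqrt(between_variance_ratio)

print(round(total_variance_ratio, 2),
      round(between_variance_ratio, 2),
      round(between_sd_ratio, 2))
```

The three ratios are about 3.84 (nearly 4 times), 2.84 (almost 3 times), and 1.69, matching the values quoted in the abstract and in the paragraph above.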
From eq. (5), the intercepts of the regression lines in Figure 4 correspond to the difference
between the fixed bias of the average patient and the therapist’s fixed bias, in within-therapist
standard deviation units. The intercept for baseline measures indicates that fixed bias for the
average patient, $\beta_P$, is 1.02 logits greater than the therapist’s fixed bias, $\beta_T$. However,
post-discharge the therapist’s fixed bias is 1.63 logits greater than the fixed bias of the average
patient. From the patients’ perspective, the therapist is underestimating patients’ functional
abilities at baseline and overestimating patients’ functional abilities at post-discharge follow-up.
From the therapist’s perspective, the patients are overestimating their functional abilities at
baseline and underestimating their functional abilities at post-discharge follow-up.
We cannot draw any conclusions from this study about why the difference between therapist and
average patient bias is negative at baseline and positive at post-discharge follow-up. One could
speculate that patients tend to be stoic and/or stubborn, underestimating the magnitude of their
problems at baseline and underestimating improvements in their function at follow-up.
Anecdotally, during evaluation therapists often see evidence of problems that patients deny or do
not recognize (e.g., seeing pills on the floor, stained clothing, signs of poor hygiene). Therapists
also report that patients may be able to perform a task after therapy, but refuse to accept the
required adaptation as an improvement over dependency. From another viewpoint, a cynic might
claim that the therapist is exaggerating the patient’s problems at baseline and exaggerating the
success of therapy at follow-up, making the intervention look more effective than it actually is.
However, in the final analysis we can estimate only differences between people in judgment
biases; we cannot know their values relative to a ground truth.
The purpose of this study has been to present and test a model of judgment bias and show how
judgment bias can influence measures estimated by psychometric models from observer
magnitude estimates. The observation of a linear relationship between continuous interval-scale
measures estimated from ordinal patient ratings and equivalent measures estimated from ordinal
therapist ratings confirms the linear prediction of the model. Grounded in a simple axiomatic
scaling theory, the model provides plausible interpretations of the slopes and intercepts of the
linear relationships in terms of fixed and random bias parameters. This model can be used as a
tool to study the effects of independent variables on judgment bias or compare differences
between judges.
References
1. Owsley C, Sloane M, McGwin G Jr, Ball K. Timed instrumental activities of daily living
tasks: relationship to cognitive function and everyday performance assessments in older
adults. Gerontology 2002;48:254-265.
2. McHorney CA, Haley SM, Ware JE Jr. Evaluation of the MOS SF-36 Physical Functioning
Scale (PF-10): II, Comparison of relative precision using Likert and Rasch scoring methods.
J Clin Epidemiol. 1997;50:451-461.
3. Granger CV, Deutsch A, Linn RT. Rasch analysis of the Functional Independence Measure
(FIM) Mastery Test. Arch Phys Med Rehabil. 1998;79:52-57.
4. Massof RW. Understanding Rasch and Item Response Theory models: Applications to the
estimation and validation of interval latent trait measures from responses to rating scale
questionnaires. Ophthal Epidemiol. 2011;18:1-19.
5. Fisher AG. The assessment of IADL motor skills: An application of many-faceted Rasch
analysis. Am J Occup Ther. 1993;47:319-329.
6. U.S. Department of Health & Human Services, Centers for Medicare and Medicaid Services.
(2002). Program memorandum intermediaries/carriers: Transmittal AB-02-078, May 29,
2002. Baltimore, MD: Government Printing Office.
7. Massof RW, Hsu CT, Baker FH, Barnett GD, Park WL, Deremeik JT, Rainey C, Epstein C.
Visual disability variables. I: The importance and difficulty of activity goals for a sample of
low vision patients. Arch Phys Med Rehabil. 2005;86:946-953.
8. Massof RW, Hsu CT, Baker FH, Barnett GD, Park WL, Deremeik JT, Rainey C, Epstein C.
Visual disability variables. II: The difficulty of tasks for a sample of low vision patients.
Arch Phys Med Rehabil. 2005;86:954-967.
9. Massof RW, Ahmadian L, Grover LL, Deremeik JT, Goldstein JE, Rainey C, Epstein C,
Barnett GD. The Activity Inventory: an adaptive visual function questionnaire. Optom Vis
Sci. 2007;84:763-774.
10. Centers for Medicare/Medicaid Services. (2004). The Inpatient Rehabilitation Facility-Patient
Assessment Instrument Training Manual. Available from
https://www.cms.gov/medicare/medicare-fee-for-service-payment/inpatientrehabfacpps/irfpai.html
11. Andrich D. A rating formulation for ordered response categories. Psychometrika.
1978;43:561-573.
12. Linacre JM, Wright BD. A user's guide to Winsteps: Rasch model computer program.
Chicago, IL: MESA Press; 2001.
13. Goldstein JE, Chun MW, Fletcher DC, Deremeik JT, Massof RW. Visual ability of patients
seeking outpatient low vision services in the United States. JAMA Ophthalmol.
2014;132:1169-1177.
14. Bond T, Fox CM. Applying the Rasch model: Fundamental measurement in the human
sciences. 2nd ed. New York, NY: Routledge; 2007.
15. Wilson EB, Hilferty MM. The distribution of chi-square. Proc Natl Acad Sci USA.
1931;17:684-688.

Comparison of therapist to patient judgment bias in low vision

  • 1. 1 Therapist Judgment Bias and Reliability Relative to that of Patients in the Estimation of Functional Ability from Ordinal Ratings Robert W. Massof,1 Theresa M. Smith,2 Lisa S. Foret,3 Guy Davis,3 and Kyoko Fujiwara1 1Lions Vision Research and Rehabilitation Center, Wilmer Eye Institute, Johns Hopkins University School of Medicine 2Department of Occupational Therapy and Rehabilitation Sciences, University of Texas Medical Branch Galveston 3Evangeline Home Health, Lake Charles, LA Supported by grant EY022322 from the National Eye Institute, National Institutes of Health, Bethesda, MD.
  • 2. 2 Abstract Objective: To present and evaluate a measurement model for estimating the judgment bias of therapists and patients when rating functional ability. Design: Observational study of the agreement between therapist ratings and patient self-ratings of functional ability. Setting: Measures made by telephone interview and in the patient’s home. Participants: Forty-five home health care patients who have a secondary diagnosis of low vision. Main Outcome Measures: Functional ability estimated from Rasch analysis of patient difficulty ratings of calibrated items (activity goals) in the Activity Inventory (AI) and therapist ratings using a FIM scale of the same activity goals, both at initial evaluation and again after discharge. Results: A linear relationship was observed between functional ability measures estimated from therapist ratings and measures estimated from patient self-ratings with the same slope, but different intercepts, for measures obtained at baseline and at post-rehabilitation follow-up. Conclusions: The observed linear relationship between measures estimated from therapist ratings and measures estimated from patient ratings confirms the model prediction. The intercept corresponds to the difference between the therapist’s judgment bias and the average judgment bias of all patients. Relative to patient judgments, the therapist’s estimate of functional ability at baseline was less than the patients’ estimates; it was greater than the patients’ estimates at follow-up. The slope of the line corresponds to the square root of the ratio of the between-patient plus within-patient variance in judgment bias to the within-therapist variance in judgment bias. The results indicate that between-patient variance is almost 3 times the within-therapist variance.
  • 3. 3 1 Introduction2 Rehabilitation medicine employs three different approaches to estimate the functional ability of3 patients: 1) measures of task performance time and/or accuracy;1 2) patient ratings of their own4 ability and/or frequency of performing activities;2 and 3) ratings by a therapist or proxy of a5 patient’s ability and/or frequency of performing activities.3 Functional ability is a trait of the6 patient. Task performance time and accuracy, patient ratings, and therapist ratings only are7 indicators of functional ability. Measurements of functional ability per se must be inferred from8 the observed indicators. Because functional ability is a property of the patient, valid and unbiased9 measures of functional ability estimated from the three different approaches should agree.10 Measurement validity refers to the accuracy of the assumption that the estimated measure is11 linear with the magnitude of the variable of interest. Measurement bias refers to the agreement12 (or disagreement) between different measures of the same variable when the variable magnitude13 has not changed between measures. In the case of functional ability, measurement validity and14 bias can be influenced by the sample of activities selected for observation and, in the case of15 ratings, by properties of the judge.16 This paper is concerned with comparing functional ability measures estimated from ratings by17 patients to functional ability measures estimated from ratings by a therapist. More specifically,18 this paper focuses on the estimation of relative biases and measurement uncertainties of judges19 when comparing functional ability measures estimated from a therapist’s judgments to functional20 ability measures estimated from patient judgments of themselves. 
We first present a model of21 patient self-ratings and a parallel model of therapist ratings of the patient, explicitly identifying22 respective biases and sources of variance in the observations, and show how the two sets of23
  • 4. 4 ratings are related. We then test the model with a substantive example using low vision24 rehabilitation of visually impaired home health care patients.25 Model of Patient Self-Ratings and Therapist Ratings26 Using ordered rating scale categories (e.g., level of “difficulty” or level of “independence”), both27 the patient and the therapist are asked to judge the patient’s ability to perform specific activities,28 referred to as “items”. The true ability of patient n, which we are attempting to estimate from the29 patient’s and therapist’s ratings, is 𝛼 𝑛. The ability required to perform each of the items, 𝜌𝑗 for30 item j, is a property of the item that is independent of the judge (whether patient or therapist).31 The model assumes that both the patient and therapist are judging the magnitude of the patient’s32 functional reserve for the activity described by the item, which is the difference between the33 ability of patient n and the ability required by item j, i.e., 𝛼 𝑛 − 𝜌𝑗 . Both the patient and therapist34 are instructed in the use of the ratings, but they develop their own criteria for each rating35 category that they will assign to a patient/item pair. These criteria, or “thresholds”, can be36 thought of as boundaries between neighboring categories on a continuous functional reserve37 scale. The thresholds are denoted as 𝜏 𝑘𝑥 for the boundary set by judge k between rating category38 x-1 and rating category x (k  n in the case of patient self-judgment).39 Although the value of 𝜌𝑗 is independent of the judge, judges’ estimates of 𝜌𝑗 are likely to be40 biased. If 𝜌̂ 𝑘𝑗 is the estimate of 𝜌𝑗 by judge k, then 𝜌̂ 𝑘𝑗 = 𝜌𝑗 + 𝜖 𝑘𝑗 where 𝜖 𝑘𝑗 is the bias of judge41 k in estimating the ability required by item j. Similarly, the average threshold for rating category42 x across a population of judges is 𝜏̅ 𝑥, therefore, 𝜏 𝑘𝑥 = 𝜏̅ 𝑥 + 𝜂 𝑘𝑥 where 𝜂 𝑘𝑥 is the bias of judge k,43 relative to the average judge, in the choice of threshold for rating category x. 
In the case of44 therapists or proxies, the population of judges would refer to all therapists or to all proxies,45
  • 5. 5 respectively. If we define 𝜖̅𝑘 to be the average bias of judge k across items and 𝜂̅ 𝑘 to be the46 average bias of judge k across rating category thresholds, then we can re-express the bias terms47 as the sum of a fixed variable (average) and a random variable (), i.e., 𝜖 𝑘𝑗 = 𝜖̅𝑘 + 𝛿 𝜖 𝑘𝑗 and48 𝜂 𝑘𝑥 = 𝜂̅ 𝑘 + 𝛿 𝜂 𝑘𝑥 (if there is only a single judge contributing to the estimate of 𝜏̅ 𝑥, then 𝜂̅ 𝑘 = 0).49 In each case, the random variable has an expected value of zero and incorporates variance50 associated with real differences in bias between items and/or categories, estimation uncertainty,51 and parameter instability.52 The judge assigns rating category x to item j if the estimated functional reserve exceeds the53 judge’s criterion for category x (and all lower categories) and is less than the criterion for54 category x+1 (and all higher categories), i.e.,55 𝜏 𝑘1, ⋯ , 𝜏 𝑘𝑥 < 𝛼 𝑛 − 𝜌̂ 𝑘𝑗 < 𝜏 𝑘𝑥+1, ⋯, 𝜏 𝑘𝑚. (1a)56 Substituting the definitions presented in the preceding paragraph and, for judge k, combining the57 random variables into a single random term and combining the fixed bias variables into a single58 fixed term, expression (1a) can be expanded to make the fixed and random variables explicit, i.e.,59 𝜏̅1 + 𝛿 𝑘𝑗1, ⋯, 𝜏̅ 𝑥 + 𝛿 𝑘𝑗𝑥 < 𝛼 𝑛 − 𝜌𝑗 − 𝛽 𝑘 < 𝜏̅ 𝑥+1 + 𝛿 𝑘𝑗𝑥+1, ⋯, 𝜏̅ 𝑚 + 𝛿 𝑘𝑗𝑚 (1b)60 where 𝛿 𝑘𝑗𝑥 = 𝛿 𝜂 𝑘𝑥 + 𝛿 𝜖 𝑘𝑗 and 𝛽 𝑘 = 𝜖̅𝑘 + 𝜂̅ 𝑘. The judgment bias of judge k is summarized with61 the bias term 𝛽 𝑘 and the reliability of judge k is summarized by the variance of 𝛿 𝑘𝑗𝑥, which we62 designate as 𝜎𝑘𝑗𝑥 2 .63 Rasch analysis is used routinely to estimate the average expected rating category thresholds (𝜏̅𝑥64 for rating category x), the true person measures (𝛼 𝑛 for person n), and the true item measures (𝜌𝑗65 for item j) from distributions of observed ratings across persons and items.4 Judgment bias, 𝛽 𝑘,66 affects the accuracy of the estimates and the variance of the random terms, 𝜎𝑘𝑗𝑥 2 , affects67
  • 6. 6 estimation precision (i.e., reliability). Rasch models assume homogeneity of variance, i.e., 𝜎𝑘𝑗𝑥 2 is68 the same for all persons, items, and rating category thresholds (a requirement of unidimensional69 measures). Homogeneity of variance means that 𝜎𝑘𝑗𝑥 2 = 𝜎𝑘 2 . Rasch models also assume that the70 random terms are statistically independent of one another.4 Various statistical tests are used to71 evaluate how well the set of observed ratings conform to these assumptions of the Rasch model.472 In the case of patient self-judgment, when there are N patients there also are N judges. However,73 Rasch models typically (but not necessarily) assume that there is just a single judge, which in74 effect is the average of the judges. In this case, when 𝜎𝑘 2 is referring to the average of N judges, it75 must include variance between judges, 𝜎 𝑏 𝑛 2 , as well as variance within judges, 𝜎𝑛 2 . We therefore76 define the variance of the average patient judge to be77 𝜎𝑃 2 = 𝜎 𝑏 𝑛 2 + ∑ 𝜎𝑛 2 𝑁⁄𝑁 𝑛=1 , (2)78 the sum of between patient variance and average within patient variance. When a single therapist79 is the judge, the variance of the therapist can be attributed entirely to the variance within the80 judge, 𝜎 𝑇 2 = 𝜎𝑘 2 . To complete the definition of terms for our model, the fixed judgment bias of81 each patient is 𝛽 𝑛 and the fixed judgment bias of the therapist is 𝛽 𝑇.82 In practice, Rasch models normalize the estimated person and item measures to the square root83 of the judge’s variance and ignore the judge’s bias (unless made explicit in a facet model5). Thus,84 person measures estimated from patient self-judgments are expressed as85 𝛼̂ 𝑛𝑃 = ( 𝛼 𝑛 + 𝛽 𝑃) 𝜎𝑃⁄ (3)86 for person n, where 𝛽 𝑃 = ∑ 𝛽 𝑛 𝑁 𝑁 𝑛=1 , the average bias across patients. The person measures87 estimated from a therapist’s judgments are expressed as88 𝛼̂ 𝑛𝑇 = ( 𝛼 𝑛 + 𝛽 𝑇) 𝜎 𝑇⁄ (4)89
  • 7. 7 for the same person n. Because both eqs.(3) and (4) are linear functions of the true person90 measure, 𝛼 𝑛, we expect the relationship between person measures estimated from a therapist’s91 ratings and corresponding person measures estimated from patients rating themselves to be92 𝛼̂ 𝑛𝑇 = 𝜎 𝑃 𝜎 𝑇 𝛼̂ 𝑛𝑃 + 𝛽 𝑇−𝛽 𝑃 𝜎 𝑇 , (5)93 a linear relationship for which the slope is the ratio of the standard deviation for the average94 patient to the standard deviation for the therapist and the intercept is the weighted difference95 between therapist and average patient judgment biases.96 Methods97 ResearchDesign98 The present study is part of a larger observational study still in progress. Data reported here were99 collected pre and post usual occupational therapy intervention provided in the participant’s home100 by one occupational therapist who has specialty training in low vision rehabilitation and 12 years101 of experience providing rehabilitation services to home health care patients with low vision.102 Participants103 Eligibility criteria for the study were: 1) patients were new to the occupational therapist; 2)104 patients were adults admitted to home health care; 3) patients met the visual impairment105 diagnostic criteria for Medicare or other third party coverage of low vision rehabilitation106 services;6 and 4) patients understood English and had good enough hearing to be able to107 participate in telephone interviews. Forty-five low vision patients participated in this study.108 Procedures109 The study conformed to the tenets of the Declaration of Helsinki and was approved by the Johns110 Hopkins Institutional Review Board. After the patient consented to participate, one of the111
  • 8. 8 investigators administered the Activity Inventory (AI),7-9 an adaptive rating scale instrument, by112 telephone interview. Participants rated the importance of the 50 activity goals in the AI, and rated113 the difficulty of those goals that were rated to be at least “slightly important”. In the instructions114 to the participant, both importance and difficulty ratings were qualified as to be able to perform115 the activity “without depending on another person”. Goals included in this study were those that116 the participant also rated to be at least “slightly difficult”. In addition, participants rated the117 difficulties of tasks in the AI that are nested under goals that were rated to be at least slightly118 important and slightly difficult.119 At the time of the initial patient evaluation, the occupational therapist was provided with a list of120 the AI goals and subsidiary tasks that were rated by the participant to be at least slightly difficult,121 however, the actual ratings assigned by the participant to each goal and task were not revealed.122 After completing the initial patient evaluation, the occupational therapist assigned a FIM scale123 score3,10 to each of the participant-identified AI goals. Table 1 lists the FIM rating scale124 categories. The occupational therapist then developed the patient’s plan of care and provided125 rehabilitation services following usual procedures. At discharge the occupational therapist again126 used the FIM scale to rate the participant’s functional independence level for the same AI goals127 that were rated at the initial evaluation. 
The AI was re-administered to the participant by128 telephone interview one to two months after discharge from occupational therapy.129 Data Analysis130 Rasch analysis, using the Andrich rating scale model11 (Winsteps 3.6512), was employed to131 estimate the visual ability of each participant before and after rehabilitation on a continuous132 interval scale from the participants’ difficulty ratings of the AI goals. The item measures for the133 50 goals in the AI item bank and the response category thresholds for levels of difficulty were134
  • 9. 9 anchored to values estimated from the difficulty ratings of 3200 low vision patients.13 Rasch135 analysis also was performed on the FIM scale ratings of each patient’s AI goals by the136 occupational therapist using the same anchored item measures for the goals. In the case of137 analysis of FIM ratings, participant’s ratings obtained prior to the initial patient evaluation and138 ratings obtained post-discharge were stacked and analyzed together to estimate response139 category thresholds for the 7 FIM scale categories. An information-weighted mean square fit140 statistic (infit) and the standard error were estimated for each response category threshold and for141 each person measure.142 FIM score Description 1 Totally dependent – patient able to perform less than 25 % of the task 2 Maximal assistance required – patient able to perform 25% of the task 3 Moderate assistance required – patient able to perform 50% of the task 4 Minimal assistance required – patient able to perform 75% of the task 5 Supervision or set-up required – patient performs task without direct assistance 6 Modified independence – patient requires assistive equipment, more time, or safety concern 7 Independent – no assistance required, patient able to perform 100% of the task Table 1143 Functional Independence Measure (FIM) Scale Categories144 145 Results146 Participants147
Complete data were obtained from 41 of the 45 enrolled participants. All participants resided in Louisiana. Participants consisted of 15 males (33%) and 30 females (67%) between the ages of 30 and 98 years (median = 80, SD = 17). Measured binocular visual acuity with habitual correction ranged from 20/20 to 20/900 (median = 20/65, SD = 0.52 logMAR); 3 participants had no light perception in either eye and 2 participants had only light perception in the better eye. Among participants with measurable visual acuity, binocular log contrast sensitivity ranged from 0.07 to 1.67 (normal > 1.6; median = 1.02, SD = 0.44). For binocular central visual field measures (12.5°), 35% of participants had central scotomas (blind spots), 20% had hemi- or quad-field defects, 27% had contracted visual fields, and visual fields could not be performed on 18%.

FIM Rating Scale Evaluation

The therapist used all 7 of the FIM scale response categories to rate AI goals selected by participants at baseline and/or at follow-up. As shown in the Table 2 columns labeled Baseline Count and Follow-up Count, FIM scale scores of 4 or less were used most frequently at baseline and FIM scale scores of 5 or 6 were used most frequently at follow-up. The category threshold corresponds to the value of functional reserve (difference between the estimated person measure and estimated item measure) at which the probability of using FIM score x is equal to the probability of using FIM score x-1, for x = 2 to 7. The ordering of thresholds should agree with the ordering of the FIM scale scores. The thresholds are ordered for response categories 2 through 6. The threshold for response category 7 is disordered.
However, FIM scale score 7 was assigned rarely: it represents only 1.3% of the total number of FIM scale scores assigned.

The Rasch model predicts the response category assigned to every combination of person and item measures. The residual is defined to be the difference between the FIM scale score observed for each person/item combination and the FIM scale score predicted for the corresponding person and item measure estimates. The infit mean square is the ratio of the observed sum of squared residuals for FIM ratings, which is expected to be distributed as χ², to the sum of squared residuals expected by the Rasch model, which corresponds to the expected value of χ². The expected value of χ² is equal to the degrees of freedom; thus, the infit mean square is expected to be distributed as χ²/df, which in turn has an expected value of 1.0 [4]. The infit mean square is interpreted as the ratio of the observed variance in the residuals to the expected variance. Infit mean square values greater than 1.0 indicate that the observed variance is greater than expected. As can be seen in the last two columns of Table 2, the observed variance in residuals for response category 6 is more than twice the expected variance both at baseline and at follow-up. As a rule of thumb, infit mean squares greater than 1.3 are considered to be indicative of excessive observed variance [14]. With that criterion, only FIM response categories 1 through 3 at baseline and 4 and 5 at follow-up behave as expected by the Rasch model, which suggests inconsistency in the use of the other FIM response categories across patients and/or across items.

Table 2. Functional Independence Measure (FIM) response counts, estimated category thresholds in the Andrich model, and information-weighted mean square residuals (Infit) at baseline and follow-up by rating scale response category.

FIM score   Baseline count   Follow-up count   Category threshold   Baseline infit   Follow-up infit
1           103              28                NA                   1.27             3.47
2           107              25                -2.88                1.2              3.17
3           124              16                -2.03                1.29             2.06
4           145              41                -1.11                1.61             1.01
5           31               123               1.55                 1.71             0.91
6           4                212               2.83                 2.25             2.14
7           3                10                1.63                 1.75             2.83

Infit mean squares also were estimated for each participant at baseline by summing observed squared residuals and expected squared residuals across goals. For degrees of freedom of 25 or greater, the cube root of the χ² distribution is well approximated by a normal distribution [15]. Therefore, the infit mean square for each participant was transformed to a standard normal deviate and expressed as a z-score [4]. Figure 1 illustrates the distribution of infit z-scores on the abscissa and the distribution of person measures, i.e., estimated functional ability, on the ordinate for all 41 participants. The solid vertical line indicates the expected value of the infit z-score and the dashed vertical lines define the range of plus-and-minus two standard deviations from the expected value. The majority of participants’ infit mean square z-scores are symmetrically distributed about the expected value of 0 and fall in the expected range of ±2 SD. These results are consistent with the expectations of a valid measure. However, there are seven clear outliers where the observed variance in the residuals is more than two standard deviations greater than the expected variance. The functional abilities of these outliers fall in the middle of the participants’ distribution of functional ability (on the vertical axis).
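The quantities described in this subsection (category probabilities at a given functional reserve, the infit mean square, and its cube-root transformation to a z-score) can be illustrated with a minimal Python sketch. It is not the Winsteps implementation. The function names are hypothetical, the thresholds are copied from Table 2, and the use of a plain degrees-of-freedom argument in the Wilson-Hilferty step is a simplifying assumption.

```python
import math

# Category thresholds for FIM scores 2..7, copied from Table 2 (logits).
FIM_THRESHOLDS = [-2.88, -2.03, -1.11, 1.55, 2.83, 1.63]

def category_probabilities(reserve, thresholds):
    """Andrich rating scale model probabilities for scores 1..m.

    reserve is the person measure minus the item measure (functional reserve).
    thresholds holds one value per score from 2 to m.
    """
    log_numerators = [0.0]  # lowest category has an empty sum
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (reserve - tau))
    peak = max(log_numerators)  # subtract the peak for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score_and_variance(reserve, thresholds):
    """Model-expected score and model variance of the score."""
    probs = category_probabilities(reserve, thresholds)
    scores = range(1, len(probs) + 1)
    expected = sum(x * p for x, p in zip(scores, probs))
    variance = sum((x - expected) ** 2 * p for x, p in zip(scores, probs))
    return expected, variance

def infit_mean_square(observed, reserves, thresholds):
    """Sum of observed squared residuals over the sum of model variances."""
    squared_residuals = 0.0
    model_variance = 0.0
    for x, r in zip(observed, reserves):
        expected, variance = expected_score_and_variance(r, thresholds)
        squared_residuals += (x - expected) ** 2
        model_variance += variance
    return squared_residuals / model_variance

def wilson_hilferty_z(mean_square, df):
    """Cube-root (Wilson-Hilferty) normal approximation for chi-square/df."""
    spread = math.sqrt(2.0 / (9.0 * df))
    return (mean_square ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / spread
```

At a functional reserve equal to the threshold for score 4 (-1.11), `category_probabilities(-1.11, FIM_THRESHOLDS)` assigns equal probability to scores 3 and 4, which is the threshold definition given in the text.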
Figure 1. Distribution of infit z-scores across items for each participant on the abscissa (INFIT MNSQ, zstd) and the distribution of person measures on the ordinate (FIM-estimated person measure, anchored AI goals).

Comparison of Functional Ability Estimates from AI and FIM Ratings

Because all AI item measures were anchored to calibrated values, i.e., $\rho_j$ in eq. (1b), person measure estimates from patients’ difficulty ratings and person measure estimates for the same patients from the therapist’s FIM ratings are expected to be in the same units of functional ability. However, the Andrich rating scale model assumes that the variance in judgment bias is constant, thereby normalizing the true values of functional ability, i.e., $\alpha_n$ in eq. (3) and eq. (4), to the standard deviation of judgment bias, i.e., $\sigma_P$ in eq. (3) and $\sigma_T$ in eq. (4). Thus, we expect the standard errors of the two sets of estimated person measures to agree. There is no significant difference (paired t-test, p = 0.93) between the standard error of the person measure estimated from patient difficulty ratings (mean = 0.414) and the standard error of the person measure estimated from therapist FIM ratings (mean = 0.415).
It is possible that FIM ratings could be different enough from difficulty ratings that using item measures anchored with values estimated from difficulty ratings is not appropriate for the FIM scale. If so, variance in residuals should be greater for FIM ratings than for difficulty ratings. With the exception of the FIM outliers noted above, Figure 2 illustrates that the z-scores for transformed infit mean squares for the two sets of estimates of person measures at baseline are within the range of values expected by the χ² distribution (±2 SD box).

Figure 2. Z-scores for transformed infit mean squares for person measures estimated from therapist FIM ratings (ordinate; INFIT MNSQ ZSTD, FIM) vs. transformed infit mean squares for person measures estimated from patients’ difficulty ratings (abscissa; INFIT MNSQ ZSTD, AI).

Measures of functional ability, both at baseline and post-discharge, were estimated from patients’ difficulty ratings of those AI goals that were rated at baseline to be at least slightly important. Measures of functional ability also were estimated for the same patients at baseline and at discharge from the therapist’s ratings of the same set of AI goals for each patient using FIM scale scores. For measures based on patients’ difficulty ratings and measures based on the therapist’s FIM scale scores, the mean functional ability at baseline was subtracted from each corresponding baseline measure and the mean functional ability at post-discharge was subtracted from each
corresponding post-discharge measure. Figure 3 is a scatter plot comparing measures based on patients’ difficulty ratings of the important AI goals (abscissa) to measures based on the occupational therapist’s FIM scale ratings of the same AI goals (ordinate) for baseline (filled circles) and post-discharge (open circles) measures relative to their respective means. Bivariate linear regression, minimizing orthogonal distance of data points from the regression line (i.e., principal component), was performed on the combined baseline and post-discharge data. The slope of the regression line is 1.96 and the intercept is -0.04. The Pearson correlation is 0.52.

Figure 3. Comparing person measures based on patients’ difficulty ratings of important AI goals (abscissa; functional ability, patient difficulty ratings, minus mean) to occupational therapist FIM scale ratings of the same AI goals (ordinate; functional ability, OT FIM scale, minus mean) for baseline (PRE) and post-discharge (POST) measures relative to their respective means.

Figure 4 illustrates scatter plots of the unadjusted functional ability measures estimated from the occupational therapist FIM scale ratings of AI goals (ordinate) versus the unadjusted functional ability measures estimated from the patient’s difficulty ratings of the same AI goals (abscissa) at baseline (filled circles) and at post-discharge follow-up (open circles). The lines fit to the data by orthogonal regression have the same slope (1.96), which was estimated from the regression line fit to the combined data in Figure 3. The intercepts are -1.02 for the baseline measures and 1.63 for the post-discharge measures. The dashed lines illustrate the respective mean functional ability
measures. The difference between the vertical dashed lines is the intervention effect (difference between the means) estimated from patient difficulty ratings (translates to Cohen’s effect size = 0.49) and the difference between the horizontal dashed lines is the intervention effect estimated from the therapist’s FIM scale ratings (Cohen’s effect size = 3.28).

Figure 4. Unadjusted functional ability measures estimated from the occupational therapist FIM scale ratings of AI goals (ordinate; functional ability, OT FIM scale rating) versus unadjusted functional ability measures estimated from patient’s difficulty ratings of same AI goals (abscissa; functional ability, patient difficulty rating) at baseline (PRE, filled circles) and at post-discharge follow-up (POST, open circles).

Discussion and Conclusions

The linear relationship between functional ability estimated from patient difficulty ratings and functional ability estimated from the therapist’s FIM scale ratings confirms the expectations of the model expressed by eqs. (3) and (4), which lead to the specific prediction of a linear function expressed by eq. (5). If we interpret the results in Figure 4 in terms of eq. (5), then we must conclude from the slope of the regression lines that $\sigma_P = 1.96\sigma_T$, both at baseline and at post-discharge follow-up. This result means that the variance in bias for the average of the patients is nearly 4 times ($1.96^2 \approx 3.84$) that of the within person variance in bias for our single therapist. If we can assume that the average variance in bias within patients is approximately the same as the within person
variance in bias of our sole therapist, then in eq. (2), $\sum_{n=1}^{N} \sigma_n^2 / N \cong \sigma_T^2$, and substituting $1.96^2 \sigma_T^2$ for $\sigma_P^2$ in eq. (2), we obtain an estimate for the standard deviation of bias between patients of $\sigma_{b_n} = 1.69\sigma_T$.

From eq. (5), the intercepts of the regression lines in Figure 4 correspond to the difference between the fixed bias of the average patient and the therapist’s fixed bias, in within-therapist standard deviation units. The intercept for baseline measures indicates that the fixed bias for the average patient, $\beta_P$, is 1.02 logits greater than the therapist’s fixed bias, $\beta_T$. However, post-discharge the therapist’s fixed bias is 1.63 logits greater than the fixed bias of the average patient. From the patients’ perspective, the therapist is underestimating patients’ functional abilities at baseline and overestimating patients’ functional abilities at post-discharge follow-up. From the therapist’s perspective, the patients are overestimating their functional abilities at baseline and underestimating their functional abilities at post-discharge follow-up.

We cannot draw any conclusions from this study about why the difference between therapist and average patient bias is negative at baseline and positive at post-discharge follow-up. One could speculate that patients tend to be stoic and/or stubborn, underestimating the magnitude of their problems at baseline and underestimating improvements in their function at follow-up. Anecdotally, during evaluation therapists often see evidence of problems that patients deny or do not recognize (e.g., seeing pills on the floor, stained clothing, signs of poor hygiene). Therapists also report that patients may be able to perform a task after therapy, but refuse to accept the required adaptation as an improvement over dependency.
From another viewpoint, a cynic might claim that the therapist is exaggerating the patient’s problems at baseline and exaggerating the success of therapy at follow-up, making the intervention look more effective than it actually is.
However, in the final analysis we can estimate only differences between people in judgment biases; we cannot know their values relative to a ground truth.

The purpose of this study has been to present and test a model of judgment bias and show how judgment bias can influence measures estimated by psychometric models from observer magnitude estimates. The observation of a linear relationship between continuous interval-scale measures estimated from ordinal patient ratings and equivalent measures estimated from ordinal therapist ratings confirms the linear prediction of the model. Grounded in a simple axiomatic scaling theory, the model provides plausible interpretations of the slopes and intercepts of the linear relationships in terms of fixed and random bias parameters. This model can be used as a tool to study the effects of independent variables on judgment bias or compare differences between judges.
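The orthogonal regression used for Figures 3 and 4, and the arithmetic behind the between-patient estimate of 1.69 given in the Discussion, can be sketched in Python as follows. The function names are hypothetical and the data in the usage lines are made up for illustration. The additive form assumed for eq. (2), with patient variance equal to between-patient variance plus average within-patient variance, is inferred from the reported numbers rather than quoted from the equation itself.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def orthogonal_slope(x, y):
    """Slope of the line that minimizes summed squared orthogonal distances
    (the first principal axis of the centered data)."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

def intercept_for_slope(x, y, slope):
    """Intercept of the best-fitting line with a fixed slope; the line
    passes through the point of means."""
    return _mean(y) - slope * _mean(x)

def between_patient_sd(slope):
    """Between-patient SD of bias in units of sigma_T.

    Assumes sigma_P = slope * sigma_T and
    sigma_P**2 = sigma_b**2 + sigma_T**2 (inferred form of eq. (2)).
    """
    return math.sqrt(slope ** 2 - 1.0)

# Made-up points lying exactly on y = 2x + 1, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
slope = orthogonal_slope(x, y)
print(slope, intercept_for_slope(x, y, slope))

# Reported slope of 1.96 reproduces the value given in the Discussion.
print(round(between_patient_sd(1.96), 2))  # 1.69
```

Fixing the slope from the combined, mean-centered data and then computing a separate intercept for each occasion mirrors the description of Figure 4. Whether the authors' lines pass exactly through the group means is an assumption of this sketch.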
References

1. Owsley C, Sloane M, McGwin G Jr, Ball K. Timed instrumental activities of daily living tasks: relationship to cognitive function and everyday performance assessments in older adults. Gerontology. 2002;48:254-265.
2. McHorney CA, Haley SM, Ware JE Jr. Evaluation of the MOS SF-36 Physical Functioning Scale (PF-10): II. Comparison of relative precision using Likert and Rasch scoring methods. J Clin Epidemiol. 1997;50:451-461.
3. Granger CV, Deutsch A, Linn RT. Rasch analysis of the Functional Independence Measure (FIM) Mastery Test. Arch Phys Med Rehabil. 1998;79:52-57.
4. Massof RW. Understanding Rasch and Item Response Theory models: Applications to the estimation and validation of interval latent trait measures from responses to rating scale questionnaires. Ophthal Epidemiol. 2011;18:1-19.
5. Fisher AG. The assessment of IADL motor skills: An application of many-faceted Rasch analysis. Am J Occup Ther. 1993;47:319-329.
6. U.S. Department of Health & Human Services, Centers for Medicare and Medicaid Services. Program memorandum intermediaries/carriers: Transmittal AB-02-078, May 29, 2002. Baltimore, MD: Government Printing Office; 2002.
7. Massof RW, Hsu CT, Baker FH, Barnett GD, Park WL, Deremeik JT, Rainey C, Epstein C. Visual disability variables. I: The importance and difficulty of activity goals for a sample of low vision patients. Arch Phys Med Rehabil. 2005;86:946-953.
8. Massof RW, Hsu CT, Baker FH, Barnett GD, Park WL, Deremeik JT, Rainey C, Epstein C. Visual disability variables. II: The difficulty of tasks for a sample of low vision patients. Arch Phys Med Rehabil. 2005;86:954-967.
9. Massof RW, Ahmadian L, Grover LL, Deremeik JT, Goldstein JE, Rainey C, Epstein C, Barnett GD. The Activity Inventory: an adaptive visual function questionnaire. Optom Vis Sci. 2007;84:763-774.
10. Centers for Medicare/Medicaid Services. The Inpatient Rehabilitation Facility-Patient Assessment Instrument Training Manual. 2004. Available from https://www.cms.gov/medicare/medicare-fee-for-service-payment/inpatientrehabfacpps/irfpai.html
11. Andrich D. A rating formulation for ordered response categories. Psychometrika. 1978;43:561-573.
12. Linacre JM, Wright BD. A user's guide to Winsteps: Rasch model computer program. Chicago, IL: MESA Press; 2001.
13. Goldstein JE, Chun MW, Fletcher DC, Deremeik JT, Massof RW. Visual ability of patients seeking outpatient low vision services in the United States. JAMA Ophthalmol. 2014;132:1169-1177.
14. Bond T, Fox CM. Applying the Rasch model: Fundamental measurement in the human sciences. 2nd ed. New York, NY: Routledge; 2007.
15. Wilson EB, Hilferty MM. The distribution of chi-square. Proc Natl Acad Sci USA. 1931;17:684-688.