This document presents a model for comparing measures of functional ability estimated from patient self-ratings to measures estimated from therapist ratings. The model accounts for judgment bias and variance in ratings between patients and therapists. The study applied the model to data from 45 home health patients rated on functional activities by patients and therapists. It found a linear relationship between measures from patient and therapist ratings, with different intercepts reflecting differences in judgment bias. The slope indicated between-patient variance was almost 3 times greater than within-therapist variance.
Data Reduction and Classification for Lumosity Data (Yao Yao)
In 2015 researchers at Lumos Labs, the Lumosity cognitive training games platform maker, sought to determine if cognitive training (via the Lumosity platform) would result in cognitive performance gains. Can the randomization grouping of participants in the original study be predicted? Utilizing cognitive ability measurements, participant activity measurements, and participants’ ages, we attempt to predict randomization grouping utilizing linear discriminant analysis and principal component analysis techniques.
https://github.com/yaowser/LDA-PCA-Lumosity-Categorical-Prediction
PRIORITIZING THE BANKING SERVICE QUALITY OF DIFFERENT BRANCHES USING FACTOR A... (ijmvsc)
In recent years, India’s service industry has been developing rapidly. The objective of the study is to explore the dimensions of customer-perceived service quality in the context of the Indian banking industry. To categorize customer needs into quality dimensions, factor analysis (FA) was carried out on customer responses obtained through a questionnaire survey. The Analytic Hierarchy Process (AHP) is employed to determine the weights of the banking service quality dimensions. The priority structure of the quality dimensions helps banking management allocate resources effectively to achieve greater customer satisfaction. The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is used to obtain the final ranking of the different branches.
A hypothesis is usually considered the principal instrument in research and quality control. Its main function is to suggest new experiments and observations. In fact, many experiments are carried out with the deliberate object of testing hypotheses. Decision makers often face situations in which they want to test a hypothesis on the basis of available information and then make decisions based on that test. In Six Sigma methodology, hypothesis testing is a substantive tool used in the analysis phase of a Six Sigma project so that improvement proceeds in the right direction.
A useful reliability analysis is available in this file. It can be used to explain the reliability of all variables; reliability should be high before proceeding to further analysis.
Webinar slides: alternatives to the p-value and power (nQuery)
What are the alternatives to the p-value & power? What is the next step for sample size determination? We will explore these issues in this free webinar presented by nQuery
Summary of current research on routine outcome measurement, feedback, the validity, reliability, and effectiveness of the ORS and SRS (or PCOMS Outcome Management System)
Avoid overfitting in precision medicine: How to use cross-validation to relia... (Nicole Krämer)
The identification of patient subgroups who may derive benefit from a treatment is of crucial importance in precision medicine. Many different algorithms have been proposed and studied in the literature.
We illustrate that many of these algorithms overfit in the sense that the treatment benefit for the identified patients is substantially overestimated. Further, we show that with cross-validation, it is possible to obtain more realistic estimates.
2020 trends in biostatistics what you should know about study design - slid... (nQuery)
2020 Trends In Biostatistics - What you should know about study design.
In this free webinar you will learn about:
-Adaptive designs in confirmatory trials
-Using external data in study planning
-Innovative designs in early-stage trials
To watch the full webinar:
https://www.statsols.com/webinar/2020-trends-in-biostatistics-what-you-should-know-about-study-design
Basic pension questions answered. Easy to understand, no jargon, plain English on the topic of pensions for young people. Understanding the difference between a state pension and a private pension. I'm young, what does it mean to me? Pension pros and cons.
Crowdfunding for Sustainable Entrepreneurship and Innovation (Crowdsourcing Week)
By Walter Vassallo and Chiara Candelise. Presented at Crowdsourcing Week Europe 2016. For more information and details on our next event, visit www.crowdsourcingweek.com.
Discriminant analysis (DA) is a technique used to analyze research data when the criterion (dependent) variable is categorical and the predictor (independent) variables are interval in nature. A categorical variable here means that the dependent variable is divided into a number of categories.
DA is typically used when the groups are already defined prior to the study.
The end result of DA is a model that can be used for the prediction of group memberships. This model allows us to understand the relationship between the set of selected variables and the observations. Furthermore, this model will enable one to assess the contributions of different variables.
Data Processing and Statistical Treatment: Spreads and Correlation (Janet Penilla)
A hyperlinked presentation that opens with the objectives of the topic. It covers the variance and then the standard deviation, with examples, and explains when to use the sample standard deviation versus the population standard deviation and what type of data is appropriate for calculating a standard deviation. It also covers correlation techniques (Pearson product-moment correlation, Spearman rank-order correlation coefficient, and the t-test for correlation).
36086 Topic Discussion3Number of Pages 2 (Double Spaced).docx (rhetttrevannion)
36086 Topic: Discussion3
Number of Pages: 2 (Double Spaced)
Number of sources: 1
Writing Style: APA
Type of document: Essay
Academic Level: Master
Category: Psychology
Language Style: English (U.S.)
Order Instructions: Attached
I will upload the instructions
Reference/Module
Learning Objectives
• Explain what the χ² goodness-of-fit test is and what it does.
• Calculate a χ² goodness-of-fit test.
• List the assumptions of the χ² goodness-of-fit test.
• Calculate the χ² test of independence.
• Interpret the χ² test of independence.
• Explain the assumptions of the χ² test of independence.
The Chi-Square (χ²) Goodness-of-Fit Test: What It Is and What It Does
The chi-square (χ²) goodness-of-fit test is used for comparing categorical information against what we would expect based on previous knowledge. As such, it tests what are called observed frequencies (the frequency with which participants fall into a category) against expected frequencies (the frequency expected in a category if the sample data represent the population). It is a nondirectional test, meaning that the alternative hypothesis is neither one-tailed nor two-tailed. The alternative hypothesis for a χ² goodness-of-fit test is that the observed data do not fit the expected frequencies for the population, and the null hypothesis is that they do fit the expected frequencies for the population. There is no conventional way to write these hypotheses in symbols, as we have done with the previous statistical tests. To illustrate the χ² goodness-of-fit test, let's look at a situation in which its use would be appropriate.
chi-square (χ²) goodness-of-fit test: A nonparametric inferential procedure that determines how well an observed frequency distribution fits an expected distribution.
observed frequencies: The frequency with which participants fall into a category.
expected frequencies: The frequency expected in a category if the sample data represent the population.
Calculations for the χ² Goodness-of-Fit Test
Suppose that a researcher is interested in determining whether the teenage pregnancy rate at a particular high school is different from the rate statewide. Assume that the rate statewide is 17%. A random sample of 80 female students is selected from the target high school. Seven of the students are either pregnant now or have been pregnant previously. The χ² goodness-of-fit test measures the observed frequencies against the expected frequencies. The observed and expected frequencies are presented in Table 21.1.
TABLE 21.1 Observed and expected frequencies for χ² goodness-of-fit example
FREQUENCIES   PREGNANT   NOT PREGNANT
Observed      7          73
Expected      14         66
As can be seen in the table, the observed frequencies represent the number of high school females in the sample of 80 who were pregnant versus not pregnant. The expected frequencies represent what we would expect based on chance, given what is known about the population. In this case, we would expect 17% of the females to be pregnant (0.17 × 80 = 13.6, rounded to 14), which leaves 66 expected to be not pregnant.
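The comparison of observed and expected frequencies can be sketched in a few lines of Python. This is an illustrative calculation only: the function name `chi_square_gof` is made up for this example, and it applies the standard χ² statistic, the sum of (observed − expected)² / expected, to the frequencies in Table 21.1.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Frequencies from Table 21.1: [pregnant, not pregnant]
observed = [7, 73]
expected = [14, 66]

statistic = chi_square_gof(observed, expected)
degrees_of_freedom = len(observed) - 1  # number of categories minus one

print(f"chi-square = {statistic:.2f}, df = {degrees_of_freedom}")
```

With two categories there is one degree of freedom, so the statistic would be compared against the critical χ² value of 3.841 for α = .05.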
EXERCISE 24 UNDERSTANDING PEARSONS r, EFFECT SIZE, AND PERCEN.docx (SANSKAR20)
EXERCISE 24 UNDERSTANDING PEARSON'S r, EFFECT SIZE, AND PERCENTAGE OF VARIANCE EXPLAINED
STATISTICAL TECHNIQUE IN REVIEW
Review the statistical information regarding Pearson's Product-Moment Correlation Coefficient presented in Exercise 23. In this exercise, you
will need to apply that information to gain an understanding of interpreting Pearson r results presented in a mirror-image table. A mirror-image
table, as the name implies, has the same labels in the same order for both the x- and y-axes. Frequently, letters or numbers are assigned to each
label, and only the letter or number designator is used to label one of the axes. To find the r value for a pair of variables, look both along the
labeled or y-axis in the table below and then along the x-axis, using the letter designator assigned to the variable you want to know the
relationship for, and find the cell in the table with the r value. Below is an example of a mirror-image table that compares hours of class attended,
hours studying, and final grade as a percentage. The results in the table are intended as an example of a mirror-image table and are not based on
research. If you were asked to identify the r value for the relationship between hours of class attended and the final grade as a percentage, the
answer would be r = 0.72, and between hours studying and final grade as a percentage, the answer would be r = 0.78. The dash (–) marks located
on the diagonal line of the table represent the variable's correlation with itself, which is always a perfect positive correlation or r = +1.00.
VARIABLES A B C
A. Hours of class attended – 0.44 0.72
B. Hours studying 0.44 – 0.78
C. Final grade as a percentage 0.72 0.78 –
Effect Size of an r Value
In determining the strength of a relationship, remember that a weak relationship is r < 0.3 or r > −0.3, a moderate relationship is r = 0.3 to 0.5 or −0.3 to −0.5, and a strong relationship is r > 0.5 or r < −0.5. The r value is equal to the effect size or the strength of a relationship. In the table above, the relationship between hours of class attended and hours of studying is r = 0.44 and the effect size = 0.44. The effect size is used in power analysis to determine sample size for future studies. The strength of the effect size is the same as that for the r values, with a weak effect size < 0.3 or > −0.3, a moderate effect size 0.3 to 0.5 or −0.3 to −0.5, and a strong effect size > 0.5 or < −0.5. The smaller the effect size, the greater the sample size needed to detect significant relationships in future studies. Thus the larger the effect size, the smaller the sample size that is needed to determine significant relationships. The determination of study sample sizes with power analysis is presented in Exercise 12.
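The strength categories described above depend only on the magnitude of r, so they can be sketched as a small lookup on |r|. The function name `relationship_strength` is hypothetical, and the cut-offs (0.3 and 0.5) are the ones quoted in the exercise text.

```python
def relationship_strength(r):
    """Classify a Pearson r as weak, moderate, or strong by its absolute value."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson r must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude < 0.3:
        return "weak"
    if magnitude <= 0.5:
        return "moderate"
    return "strong"


# r values from the example mirror-image table
for label, r in [("A-B", 0.44), ("A-C", 0.72), ("B-C", 0.78)]:
    print(label, r, relationship_strength(r))
```

Working with the absolute value avoids writing separate rules for positive and negative correlations.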
Percentage of Variance Explained in a Relationship
Percentage of variance explained is a calculation based on a Pearson's r value. The purpose for calculating the percentage of variance expla ...
Comparison of therapist to patient judgment bias in low vision
Therapist Judgment Bias and Reliability Relative to that of Patients in
the Estimation of Functional Ability from Ordinal Ratings
Robert W. Massof,1 Theresa M. Smith,2 Lisa S. Foret,3 Guy Davis,3 and Kyoko Fujiwara1
1Lions Vision Research and Rehabilitation Center, Wilmer Eye Institute, Johns Hopkins
University School of Medicine
2Department of Occupational Therapy and Rehabilitation Sciences, University of Texas
Medical Branch Galveston
3Evangeline Home Health, Lake Charles, LA
Supported by grant EY022322 from the National Eye Institute, National Institutes of Health,
Bethesda, MD.
Abstract
Objective: To present and evaluate a measurement model for estimating the judgment bias of
therapists and patients when rating functional ability. Design: Observational study of the
agreement between therapist ratings and patient self-ratings of functional ability. Setting:
Measures made by telephone interview and in the patient’s home. Participants: Forty-five home
health care patients who have a secondary diagnosis of low vision. Main Outcome Measures:
Functional ability estimated from Rasch analysis of patient difficulty ratings of calibrated items
(activity goals) in the Activity Inventory (AI) and therapist ratings using a FIM scale of the same
activity goals, both at initial evaluation and again after discharge. Results: A linear relationship
was observed between functional ability measures estimated from therapist ratings and measures
estimated from patient self-ratings with the same slope, but different intercepts, for measures
obtained at baseline and at post-rehabilitation follow-up. Conclusions: The observed linear
relationship between measures estimated from therapist ratings and measures estimated from
patient ratings confirms the model prediction. The intercept corresponds to the difference
between the therapist’s judgment bias and the average judgment bias of all patients. Relative to
patient judgments, the therapist’s estimate of functional ability at baseline was less than the
patients’ estimates; it was greater than the patients’ estimates at follow-up. The slope of the line
corresponds to the square root of the ratio of the between-patient plus within-patient variance in
judgment bias to the within-therapist variance in judgment bias. The results indicate that
between-patient variance is almost 3 times the within-therapist variance.
Introduction
Rehabilitation medicine employs three different approaches to estimate the functional ability of patients: 1) measures of task performance time and/or accuracy [1]; 2) patient ratings of their own ability and/or frequency of performing activities [2]; and 3) ratings by a therapist or proxy of a patient’s ability and/or frequency of performing activities [3]. Functional ability is a trait of the patient. Task performance time and accuracy, patient ratings, and therapist ratings only are indicators of functional ability. Measurements of functional ability per se must be inferred from the observed indicators. Because functional ability is a property of the patient, valid and unbiased measures of functional ability estimated from the three different approaches should agree. Measurement validity refers to the accuracy of the assumption that the estimated measure is linear with the magnitude of the variable of interest. Measurement bias refers to the agreement (or disagreement) between different measures of the same variable when the variable magnitude has not changed between measures. In the case of functional ability, measurement validity and bias can be influenced by the sample of activities selected for observation and, in the case of ratings, by properties of the judge.
This paper is concerned with comparing functional ability measures estimated from ratings by patients to functional ability measures estimated from ratings by a therapist. More specifically, this paper focuses on the estimation of relative biases and measurement uncertainties of judges when comparing functional ability measures estimated from a therapist’s judgments to functional ability measures estimated from patient judgments of themselves. We first present a model of patient self-ratings and a parallel model of therapist ratings of the patient, explicitly identifying respective biases and sources of variance in the observations, and show how the two sets of ratings are related. We then test the model with a substantive example using low vision rehabilitation of visually impaired home health care patients.
Model of Patient Self-Ratings and Therapist Ratings
Using ordered rating scale categories (e.g., level of “difficulty” or level of “independence”), both the patient and the therapist are asked to judge the patient’s ability to perform specific activities, referred to as “items”. The true ability of patient n, which we are attempting to estimate from the patient’s and therapist’s ratings, is $\alpha_n$. The ability required to perform each of the items, $\rho_j$ for item j, is a property of the item that is independent of the judge (whether patient or therapist). The model assumes that both the patient and therapist are judging the magnitude of the patient’s functional reserve for the activity described by the item, which is the difference between the ability of patient n and the ability required by item j, i.e., $\alpha_n - \rho_j$. Both the patient and therapist are instructed in the use of the ratings, but they develop their own criteria for each rating category that they will assign to a patient/item pair. These criteria, or “thresholds”, can be thought of as boundaries between neighboring categories on a continuous functional reserve scale. The thresholds are denoted as $\tau_{kx}$ for the boundary set by judge k between rating category x-1 and rating category x (k = n in the case of patient self-judgment).
Although the value of $\rho_j$ is independent of the judge, judges’ estimates of $\rho_j$ are likely to be biased. If $\hat{\rho}_{kj}$ is the estimate of $\rho_j$ by judge k, then $\hat{\rho}_{kj} = \rho_j + \epsilon_{kj}$ where $\epsilon_{kj}$ is the bias of judge k in estimating the ability required by item j. Similarly, the average threshold for rating category x across a population of judges is $\bar{\tau}_x$, therefore, $\tau_{kx} = \bar{\tau}_x + \eta_{kx}$ where $\eta_{kx}$ is the bias of judge k, relative to the average judge, in the choice of threshold for rating category x. In the case of therapists or proxies, the population of judges would refer to all therapists or to all proxies, respectively. If we define $\bar{\epsilon}_k$ to be the average bias of judge k across items and $\bar{\eta}_k$ to be the average bias of judge k across rating category thresholds, then we can re-express the bias terms as the sum of a fixed variable (average) and a random variable ($\delta$), i.e., $\epsilon_{kj} = \bar{\epsilon}_k + \delta_{\epsilon_{kj}}$ and $\eta_{kx} = \bar{\eta}_k + \delta_{\eta_{kx}}$ (if there is only a single judge contributing to the estimate of $\bar{\tau}_x$, then $\bar{\eta}_k = 0$). In each case, the random variable has an expected value of zero and incorporates variance associated with real differences in bias between items and/or categories, estimation uncertainty, and parameter instability.
The judge assigns rating category x to item j if the estimated functional reserve exceeds the judge’s criterion for category x (and all lower categories) and is less than the criterion for category x+1 (and all higher categories), i.e.,
$$\tau_{k1}, \cdots, \tau_{kx} < \alpha_n - \hat{\rho}_{kj} < \tau_{k,x+1}, \cdots, \tau_{km}. \tag{1a}$$
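The decision rule in expression (1a) can be sketched as a simple threshold lookup. This is a minimal illustration, assuming the judge's thresholds are ordered from lowest to highest; the function name `assign_category` and the numeric values are hypothetical and do not come from the study.

```python
from bisect import bisect_left


def assign_category(alpha_n, rho_hat_kj, thresholds_k):
    """Return the rating category x that judge k assigns to a patient/item pair.

    alpha_n      -- ability of patient n
    rho_hat_kj   -- judge k's estimate of the ability required by item j
    thresholds_k -- judge k's thresholds tau_k1..tau_km, sorted ascending
    The category equals the number of thresholds lying below the judged
    functional reserve (0 when the reserve is below every threshold).
    """
    if list(thresholds_k) != sorted(thresholds_k):
        raise ValueError("thresholds must be in ascending order")
    functional_reserve = alpha_n - rho_hat_kj
    return bisect_left(thresholds_k, functional_reserve)


# Hypothetical judge with three thresholds (four rating categories, 0..3)
thresholds = [-1.0, 0.0, 1.5]
print(assign_category(alpha_n=2.0, rho_hat_kj=1.5, thresholds_k=thresholds))
```

Here the functional reserve is 0.5, which exceeds the first two thresholds but not the third, so the sketch returns category 2.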
Substituting the definitions presented in the preceding paragraph and, for judge k, combining the random variables into a single random term and combining the fixed bias variables into a single fixed term, expression (1a) can be expanded to make the fixed and random variables explicit, i.e.,
$$\bar{\tau}_1 + \delta_{kj1}, \cdots, \bar{\tau}_x + \delta_{kjx} < \alpha_n - \rho_j - \beta_k < \bar{\tau}_{x+1} + \delta_{kj,x+1}, \cdots, \bar{\tau}_m + \delta_{kjm} \tag{1b}$$
where $\delta_{kjx} = \delta_{\eta_{kx}} + \delta_{\epsilon_{kj}}$ and $\beta_k = \bar{\epsilon}_k + \bar{\eta}_k$. The judgment bias of judge k is summarized with the bias term $\beta_k$ and the reliability of judge k is summarized by the variance of $\delta_{kjx}$, which we designate as $\sigma_{kjx}^2$.
Rasch analysis is used routinely to estimate the average expected rating category thresholds ($\bar{\tau}_x$ for rating category x), the true person measures ($\alpha_n$ for person n), and the true item measures ($\rho_j$ for item j) from distributions of observed ratings across persons and items [4]. Judgment bias, $\beta_k$, affects the accuracy of the estimates and the variance of the random terms, $\sigma_{kjx}^2$, affects estimation precision (i.e., reliability). Rasch models assume homogeneity of variance, i.e., $\sigma_{kjx}^2$ is the same for all persons, items, and rating category thresholds (a requirement of unidimensional measures). Homogeneity of variance means that $\sigma_{kjx}^2 = \sigma_k^2$. Rasch models also assume that the random terms are statistically independent of one another [4]. Various statistical tests are used to evaluate how well the set of observed ratings conform to these assumptions of the Rasch model [4].
In the case of patient self-judgment, when there are N patients there also are N judges. However, Rasch models typically (but not necessarily) assume that there is just a single judge, which in effect is the average of the judges. In this case, when $\sigma_k^2$ is referring to the average of N judges, it must include variance between judges, $\sigma_{b_n}^2$, as well as variance within judges, $\sigma_n^2$. We therefore define the variance of the average patient judge to be
$$\sigma_P^2 = \sigma_{b_n}^2 + \sum_{n=1}^{N} \sigma_n^2 / N, \tag{2}$$
the sum of between patient variance and average within patient variance. When a single therapist is the judge, the variance of the therapist can be attributed entirely to the variance within the judge, $\sigma_T^2 = \sigma_k^2$. To complete the definition of terms for our model, the fixed judgment bias of each patient is $\beta_n$ and the fixed judgment bias of the therapist is $\beta_T$.
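Equation (2) amounts to adding the between-patient variance to the mean of the within-patient variances. The following sketch shows that arithmetic; the function name and the variance values are hypothetical and serve only to illustrate the formula.

```python
def average_patient_judge_variance(between_patient_variance, within_patient_variances):
    """Variance of the 'average patient judge' as defined in equation (2).

    between_patient_variance -- variance in judgment between patients
    within_patient_variances -- one within-judge variance per patient (length N)
    """
    n_patients = len(within_patient_variances)
    if n_patients == 0:
        raise ValueError("at least one patient is required")
    mean_within = sum(within_patient_variances) / n_patients
    return between_patient_variance + mean_within


# Hypothetical values: three patients with different within-judge variances
sigma_p_squared = average_patient_judge_variance(2.0, [0.5, 1.0, 1.5])
print(sigma_p_squared)
```

For a single therapist judge there is no between-judge component, so the corresponding variance is just that judge's within-judge variance.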
In practice, Rasch models normalize the estimated person and item measures to the square root of the judge’s variance and ignore the judge’s bias (unless made explicit in a facet model [5]). Thus, person measures estimated from patient self-judgments are expressed as
$$\hat{\alpha}_{nP} = (\alpha_n + \beta_P) / \sigma_P \tag{3}$$
for person n, where $\beta_P = \sum_{n=1}^{N} \beta_n / N$, the average bias across patients. The person measures estimated from a therapist’s judgments are expressed as
$$\hat{\alpha}_{nT} = (\alpha_n + \beta_T) / \sigma_T \tag{4}$$
for the same person n. Because both eqs. (3) and (4) are linear functions of the true person measure, $\alpha_n$, we expect the relationship between person measures estimated from a therapist’s ratings and corresponding person measures estimated from patients rating themselves to be
$$\hat{\alpha}_{nT} = \frac{\sigma_P}{\sigma_T} \hat{\alpha}_{nP} + \frac{\beta_T - \beta_P}{\sigma_T}, \tag{5}$$
a linear relationship for which the slope is the ratio of the standard deviation for the average patient to the standard deviation for the therapist and the intercept is the weighted difference between therapist and average patient judgment biases.
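The linear relationship in equation (5) can be checked numerically: generate person measures from equations (3) and (4) for the same true abilities, fit a straight line, and compare the fitted slope and intercept with the predicted values. The sketch below does this without any estimation noise. All parameter values are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
def patient_measure(alpha, beta_p, sigma_p):
    """Equation (3): person measure estimated from patient self-ratings."""
    return (alpha + beta_p) / sigma_p


def therapist_measure(alpha, beta_t, sigma_t):
    """Equation (4): person measure estimated from therapist ratings."""
    return (alpha + beta_t) / sigma_t


def least_squares_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical biases and standard deviations (assumptions, not study results)
BETA_P, SIGMA_P = 0.2, 1.5
BETA_T, SIGMA_T = -0.4, 1.0
true_abilities = [-2.0, -1.0, 0.0, 0.5, 1.0, 3.0]

patient_based = [patient_measure(a, BETA_P, SIGMA_P) for a in true_abilities]
therapist_based = [therapist_measure(a, BETA_T, SIGMA_T) for a in true_abilities]

slope, intercept = least_squares_line(patient_based, therapist_based)
print("fitted slope:", slope, "predicted:", SIGMA_P / SIGMA_T)
print("fitted intercept:", intercept, "predicted:", (BETA_T - BETA_P) / SIGMA_T)
```

Because the simulated measures contain no random error, the fitted line reproduces the predicted slope and intercept up to floating-point rounding; with real ratings the fit would only approximate them.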
Methods
Research Design
The present study is part of a larger observational study still in progress. Data reported here were collected pre and post usual occupational therapy intervention provided in the participant’s home by one occupational therapist who has specialty training in low vision rehabilitation and 12 years of experience providing rehabilitation services to home health care patients with low vision.
Participants
Eligibility criteria for the study were: 1) patients were new to the occupational therapist; 2) patients were adults admitted to home health care; 3) patients met the visual impairment diagnostic criteria for Medicare or other third party coverage of low vision rehabilitation services [6]; and 4) patients understood English and had good enough hearing to be able to participate in telephone interviews. Forty-five low vision patients participated in this study.
Procedures
The study conformed to the tenets of the Declaration of Helsinki and was approved by the Johns Hopkins Institutional Review Board. After the patient consented to participate, one of the investigators administered the Activity Inventory (AI) [7-9], an adaptive rating scale instrument, by telephone interview. Participants rated the importance of the 50 activity goals in the AI, and rated the difficulty of those goals that were rated to be at least “slightly important”. In the instructions to the participant, both importance and difficulty ratings were qualified as to be able to perform the activity “without depending on another person”. Goals included in this study were those that the participant also rated to be at least “slightly difficult”. In addition, participants rated the difficulties of tasks in the AI that are nested under goals that were rated to be at least slightly important and slightly difficult.
At the time of the initial patient evaluation, the occupational therapist was provided with a list of the AI goals and subsidiary tasks that were rated by the participant to be at least slightly difficult, however, the actual ratings assigned by the participant to each goal and task were not revealed. After completing the initial patient evaluation, the occupational therapist assigned a FIM scale score [3,10] to each of the participant-identified AI goals. Table 1 lists the FIM rating scale categories. The occupational therapist then developed the patient’s plan of care and provided rehabilitation services following usual procedures. At discharge the occupational therapist again used the FIM scale to rate the participant’s functional independence level for the same AI goals that were rated at the initial evaluation. The AI was re-administered to the participant by telephone interview one to two months after discharge from occupational therapy.
Data Analysis
Rasch analysis, using the Andrich rating scale model [11] (Winsteps 3.65 [12]), was employed to estimate the visual ability of each participant before and after rehabilitation on a continuous interval scale from the participants’ difficulty ratings of the AI goals. The item measures for the 50 goals in the AI item bank and the response category thresholds for levels of difficulty were anchored to values estimated from the difficulty ratings of 3200 low vision patients [13]. Rasch analysis also was performed on the FIM scale ratings of each patient’s AI goals by the occupational therapist using the same anchored item measures for the goals. In the case of analysis of FIM ratings, participant’s ratings obtained prior to the initial patient evaluation and ratings obtained post-discharge were stacked and analyzed together to estimate response category thresholds for the 7 FIM scale categories. An information-weighted mean square fit statistic (infit) and the standard error were estimated for each response category threshold and for each person measure.
Table 1
Functional Independence Measure (FIM) Scale Categories
FIM score   Description
1           Totally dependent – patient able to perform less than 25% of the task
2           Maximal assistance required – patient able to perform 25% of the task
3           Moderate assistance required – patient able to perform 50% of the task
4           Minimal assistance required – patient able to perform 75% of the task
5           Supervision or set-up required – patient performs task without direct assistance
6           Modified independence – patient requires assistive equipment, more time, or safety concern
7           Independent – no assistance required, patient able to perform 100% of the task
Results
Participants
Complete data were obtained from 41 of the 45 enrolled participants. All participants resided in Louisiana. Participants consisted of 15 males (33%) and 30 females (67%) between the ages of 30 and 98 years old (median = 80, SD = 17). Measured binocular visual acuity with habitual correction ranged from 20/20 to 20/900 (median = 20/65, SD = 0.52 logMAR); 3 participants had no light perception in either eye and 2 participants had only light perception in the better eye. Among participants with measurable visual acuity, binocular log contrast sensitivity ranged from 0.07 to 1.67 (normal > 1.6; median = 1.02, SD = 0.44). For binocular central visual field measures (12.5°), 35% of participants had central scotomas (blind spots), 20% had hemi- or quad-field defects, 27% had contracted visual fields, and visual fields could not be performed on 18%.
FIM Rating Scale Evaluation
The therapist used all 7 of the FIM scale response categories to rate AI goals selected by
participants at baseline and/or at follow-up. As shown in the Table 2 columns labeled Baseline
Count and Follow-up Count, FIM scale scores of 4 or less were used most frequently at baseline
and FIM scale scores of 5 or 6 were used most frequently at follow-up. The category threshold
corresponds to the value of functional reserve (difference between the estimated person measure
and estimated item measure) at which the probability of using FIM score x is equal to the
probability of using FIM score x-1, for x = 2 to 7. The ordering of thresholds should agree with
the ordering of the FIM scale scores. The thresholds are ordered for response categories 2
through 6. The threshold for response category 7 is disordered. However, the assignment of FIM
scale score 7 occurred rarely: it represents only 1.3% of the total number of FIM scale scores
assigned.
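The threshold definition above can be illustrated with a minimal sketch of the Andrich rating scale model, assuming the standard adjacent-category form in which the log-odds of score x over score x-1 equal the functional reserve minus the threshold for x. The threshold values are copied from Table 2; the function name `category_probabilities` is hypothetical and is not taken from the study's software.

```python
import math

# Category thresholds (logits) for FIM scores 2..7, copied from Table 2.
# Score 1 has no threshold (listed as NA in the table).
THRESHOLDS = {2: -2.88, 3: -2.03, 4: -1.11, 5: 1.55, 6: 2.83, 7: 1.63}


def category_probabilities(reserve, thresholds=THRESHOLDS):
    """Return {FIM score: probability} for a given functional reserve.

    Assumes the adjacent-category (Andrich) form:
    log(P[x] / P[x-1]) = reserve - threshold[x].
    """
    scores = [1] + sorted(thresholds)
    log_weights = []
    cumulative = 0.0  # log-weight of the lowest category is fixed at 0
    for score in scores:
        if score in thresholds:
            cumulative += reserve - thresholds[score]
        log_weights.append(cumulative)
    peak = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(w - peak) for w in log_weights]
    total = sum(weights)
    return {s: w / total for s, w in zip(scores, weights)}


# At a functional reserve equal to the threshold for score 4,
# scores 3 and 4 are equally probable.
probs = category_probabilities(-1.11)
print(round(probs[3], 4), round(probs[4], 4))
```

Evaluating the function at each threshold in turn shows the same equality for every adjacent pair of scores, including the disordered threshold for score 7.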
The Rasch model predicts the response category assigned to every combination of person and
item measures. The residual is defined to be the difference between the FIM scale score observed
for each person/item combination and the FIM scale score predicted for the corresponding
person and item measure estimates. The infit mean square is the ratio of the observed sums of
squared residuals for FIM ratings, which are expected to be distributed as $\chi^2$, to the sums of
squared residuals expected by the Rasch model, which corresponds to the expected value of $\chi^2$.
The expected value of $\chi^2$ is equal to the degrees of freedom; thus, the infit mean square is
expected to be distributed as $\chi^2/\mathrm{df}$, which in turn has an expected value of 1.0.4 The infit mean
square is interpreted as the ratio of the observed variance in the residuals to the expected
variance. Infit mean square values greater than 1.0 indicate that the observed variance is greater
than expected. As can be seen in the last two columns of Table 2, the observed variance in
residuals for response category 6 is more than twice the expected variance both at baseline and at
follow-up. As a rule of thumb, infit mean squares greater than 1.3 are considered to be indicative
of excessive observed variance.14 With that criterion, only FIM response categories 1 through 3
at baseline and 4 and 5 at follow-up behave as expected by the Rasch model, which suggests
inconsistency in the use of the other FIM response categories across patients and/or across items.
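As a minimal sketch of the ratio described above, the infit mean square can be computed from observed scores, model-expected scores, and model variances. The helper names and the small numeric inputs are hypothetical illustrations, not values from the study, and the model expectations are assumed to be supplied by whatever Rasch software produced the estimates.

```python
def expected_and_variance(probabilities):
    """Model-expected score and variance from {score: probability}."""
    mean = sum(score * p for score, p in probabilities.items())
    variance = sum((score - mean) ** 2 * p for score, p in probabilities.items())
    return mean, variance


def infit_mean_square(observed, expected, variances):
    """Sum of squared residuals divided by the sum of model variances.

    Values near 1.0 mean the observed residual variance matches the
    variance the model expects; values above 1.0 mean excess variance.
    """
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / sum(variances)


# Hypothetical example: three ratings whose residuals match expectation.
print(infit_mean_square([4, 5, 3], [3.5, 4.5, 3.5], [0.25, 0.25, 0.25]))
```

Summing the numerator and denominator over all ratings in a response category gives a category-level statistic; summing over a participant's goals gives a person-level statistic.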
Table 2
Functional Independence Measure (FIM) response counts, estimated category thresholds in the Andrich
model, and information-weighted mean square residuals (Infit) at baseline and follow-up by rating scale
response category.
Infit mean squares also were estimated for each participant at baseline by summing observed
squared residuals and expected squared residuals across goals. For degrees of freedom of 25 or
greater, the cube root of the $\chi^2$ distribution is well approximated by a normal distribution.15
Therefore, the infit mean square for each participant was transformed to a standard normal
deviate and expressed as a z-score.4 Figure 1 illustrates the distribution of infit z-scores on the
abscissa and the distribution of person measures, i.e., estimated functional ability, on the ordinate
for all 41 participants. The solid vertical line indicates the expected value of the infit z-score and
the dashed vertical lines define the range of plus-and-minus two standard deviations from the
expected value. The majority of participants’ infit mean square z-scores are symmetrically
distributed about the expected value of 0 and fall in the expected range of ±2 SD. These results
are consistent with the expectations of a valid measure. However, there are seven clear outliers
where the observed variance in the residuals is more than two standard deviations greater than
the expected variance. The functional abilities of these outliers fall in the middle of the
participants’ distribution of functional ability (on the vertical axis).
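The cube-root transformation mentioned above can be sketched with the Wilson-Hilferty approximation, under which the cube root of a chi-square variate divided by its degrees of freedom is approximately normal with mean 1 - 2/(9 df) and variance 2/(9 df). The function name is hypothetical, and treating the degrees of freedom as a known input is an assumption; the study's software may standardize the statistic somewhat differently.

```python
import math


def infit_z_score(mean_square, df):
    """Approximate standard normal deviate for a mean square (chi-square / df).

    Wilson-Hilferty: (chi2/df) ** (1/3) is roughly normal with
    mean 1 - 2/(9*df) and variance 2/(9*df).
    """
    center = 1.0 - 2.0 / (9.0 * df)
    spread = math.sqrt(2.0 / (9.0 * df))
    return (mean_square ** (1.0 / 3.0) - center) / spread


# Hypothetical example: a mean square of 2.0 with 25 degrees of freedom
# falls outside the +/- 2 SD range.
print(infit_z_score(2.0, 25) > 2.0)
```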
FIM score   Baseline count   Follow-up count   Category threshold   Baseline infit   Follow-up infit
1           103              28                NA                   1.27             3.47
2           107              25                -2.88                1.2              3.17
3           124              16                -2.03                1.29             2.06
4           145              41                -1.11                1.61             1.01
5           31               123               1.55                 1.71             0.91
6           4                212               2.83                 2.25             2.14
7           3                10                1.63                 1.75             2.83
Figure 1. Distribution of infit z-scores across items for each participant on the abscissa and the
distribution of person measures on the ordinate.
Comparison of Functional Ability Estimates from AI and FIM Ratings
Because all AI item measures were anchored to calibrated values, i.e., $\rho_j$ in eq. (1b), person
measure estimates from patients’ difficulty ratings and person measure estimates for the same
patients from the therapist’s FIM ratings are expected to be in the same units of functional
ability. However, the Andrich rating scale model assumes that the variance in judgment bias is
constant, thereby normalizing the true values of functional ability, i.e., $\alpha_n$ in eq. (3) and eq. (4),
to the standard deviation of judgment bias, i.e., $\sigma_P$ in eq. (3) and $\sigma_T$ in eq. (4). Thus, we expect
the standard errors of the two sets of estimated person measures to agree. There is no significant
difference (paired t-test, p = 0.93) between the standard error of the person measure estimated
from patient difficulty ratings (mean = 0.414) and the standard error of the person measure
estimated from therapist FIM ratings (mean = 0.415).
[Figure 1 axes: INFIT MNSQ (zstd) on the abscissa; FIM-estimated person measure (anchored AI goals) on the ordinate.]
It is possible that FIM ratings could be different enough from difficulty ratings that using item
measures anchored with values estimated from difficulty ratings is not appropriate for the FIM
scale. If so, variance in residuals should be greater for FIM ratings than for difficulty ratings.
With the exception of the FIM outliers noted above, Figure 2 illustrates that the z-scores for
transformed infit mean squares for the two sets of estimates of person measures at baseline are
within the range of values expected by the $\chi^2$ distribution (±2 SD box).
Figure 2. Z-scores for transformed infit mean squares for person measures estimated from therapist FIM
ratings (ordinate) vs. transformed infit mean squares for person measures estimated from patients’
difficulty ratings (abscissa).
Measures of functional ability, both at baseline and post-discharge, were estimated from patients’
difficulty ratings of those AI goals that were rated at baseline to be at least slightly important.
Measures of functional ability also were estimated for the same patients at baseline and at
discharge from the therapist’s ratings of the same set of AI goals for each patient using FIM scale
scores. For measures based on patients’ difficulty ratings and measures based on the therapist’s
FIM scale scores, the mean functional ability at baseline was subtracted from each corresponding
baseline measure and the mean functional ability at post-discharge was subtracted from each
corresponding post-discharge measure. Figure 3 is a scatter plot comparing measures based on
patients’ difficulty ratings of the important AI goals (abscissa) to the occupational therapist FIM
scale ratings of the same AI goals (ordinate) for baseline (filled circles) and post-discharge (open
circles) measures relative to their respective means. Bivariate linear regression, minimizing
orthogonal distance of data points from the regression line (i.e., principal component), was
performed on the combined baseline and post-discharge data. The slope of the regression line is
1.96 and the intercept is -0.04. The Pearson correlation is 0.52.
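Orthogonal (principal component) regression of this kind can be sketched as follows. The closed-form slope below is the direction of the first principal axis of the centered data; the function name and the example data are hypothetical and are not the study's measurements.

```python
import math


def orthogonal_regression(x, y):
    """Fit y = slope * x + intercept by minimizing perpendicular distances.

    The slope is the direction of the first principal component of the
    centered (x, y) data; the line passes through the point of means.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical, perfectly linear data: the fit recovers slope 2, intercept 1.
print(orthogonal_regression([0, 1, 2, 3], [1, 3, 5, 7]))
```

Unlike ordinary least squares, this fit treats both variables as measured with error, which suits a comparison of two sets of estimated person measures. With a common slope fixed in advance, a group-specific intercept follows from that group's means as mean_y - slope * mean_x.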
Figure 3. Comparing person measures based on patients’ difficulty ratings of important AI goals
(abscissa) to occupational therapist FIM scale ratings of the same AI goals (ordinate) for baseline and
post-discharge measures relative to their respective means.
Figure 4 illustrates scatter plots of the unadjusted functional ability measures estimated from the
occupational therapist FIM scale ratings of AI goals (ordinate) versus the unadjusted functional
ability measures estimated from the patient’s difficulty ratings of the same AI goals (abscissa) at
baseline (filled circles) and at post-discharge follow-up (open circles). The lines fit to the data by
orthogonal regression have the same slope (1.96), which was estimated from the regression line
fit to the combined data in Figure 3. The intercepts are -1.02 for the baseline measures and 1.63
for the post-discharge measures. The dashed lines illustrate the respective mean functional ability
measures. The difference between the vertical dashed lines is the intervention effect (difference
between the means) estimated from patient difficulty ratings (translates to Cohen’s effect size =
0.49) and the difference between the horizontal dashed lines is the intervention effect estimated
from the therapist’s FIM scale ratings (Cohen’s effect size = 3.28).
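The text does not state which standard deviation underlies the reported effect sizes. As a minimal sketch, assuming the common pooled-standard-deviation form of Cohen's d, the difference between post-discharge and baseline means can be standardized as follows; the function name and the example values are hypothetical.

```python
import math


def cohens_d(baseline, follow_up):
    """Difference in means divided by the pooled sample standard deviation.

    Assumption: pooled-SD form of Cohen's d; the study may have used a
    different standardizer (for example, the baseline SD alone).
    """
    n1, n2 = len(baseline), len(follow_up)
    mean1 = sum(baseline) / n1
    mean2 = sum(follow_up) / n2
    var1 = sum((v - mean1) ** 2 for v in baseline) / (n1 - 1)
    var2 = sum((v - mean2) ** 2 for v in follow_up) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (mean2 - mean1) / pooled_sd


# Hypothetical person measures in logits.
print(cohens_d([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```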
Figure 4. Unadjusted functional ability measures estimated from the occupational therapist FIM scale
ratings of AI goals (ordinate) versus unadjusted functional ability measures estimated from patient’s
difficulty ratings of same AI goals (abscissa) at baseline (filled circles) and at post-discharge follow-up
(open circles).
Discussion and Conclusions
The linear relationship between functional ability estimated from patient difficulty ratings and
functional ability estimated from the therapist’s FIM scale ratings confirms the expectations of
the model expressed by eqs. (3) and (4), which lead to the specific prediction of a linear function
expressed by eq. (5). If we interpret the results in Figure 4 in terms of eq. (5), then we must
conclude from the slope of the regression lines that $\sigma_P = 1.96\sigma_T$, both at baseline and at
post-discharge follow-up. This result means that the variance in bias for the average of the patients is
nearly 4 times ($1.96^2 \approx 3.84$) that of the within-person variance in bias for our single therapist.
If we can assume that the average variance in bias within patients is approximately the same as the
within-person variance in bias of our sole therapist, then in eq. (2),
$\sum_{n=1}^{N} \sigma_n^2 / N \cong \sigma_T^2$, and substituting $1.96^2\sigma_T^2$ for $\sigma_P^2$ in eq. (2),
we obtain an estimate for the standard deviation of bias between patients of
$\sigma_{b_n} = 1.69\sigma_T$.
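The arithmetic behind this estimate can be written out as a short worked derivation. Eq. (2) is not reproduced in this section, so the first line is an assumed form (total patient variance equals between-patient variance plus average within-patient variance) chosen because it reproduces the reported value.

```latex
\begin{align*}
\sigma_P^2 &= \sigma_{b_n}^2 + \frac{1}{N}\sum_{n=1}^{N}\sigma_n^2
  && \text{(assumed form of eq. (2))} \\
(1.96\,\sigma_T)^2 &\approx \sigma_{b_n}^2 + \sigma_T^2
  && \text{(slope result and } \textstyle\sum_n \sigma_n^2/N \cong \sigma_T^2 \text{)} \\
\sigma_{b_n}^2 &\approx (1.96^2 - 1)\,\sigma_T^2 = 2.8416\,\sigma_T^2 \\
\sigma_{b_n} &\approx \sqrt{2.8416}\,\sigma_T \approx 1.69\,\sigma_T
\end{align*}
```

Under this assumption the between-patient variance in bias is about 2.84 times the within-therapist variance.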
From eq. (5), the intercepts of the regression lines in Figure 4 correspond to the difference
between the fixed bias of the average patient and the therapist’s fixed bias, in within-therapist
standard deviation units. The intercept for baseline measures indicates that fixed bias for the
average patient, $\beta_P$, is 1.02 logits greater than the therapist’s fixed bias, $\beta_T$. However,
post-discharge the therapist’s fixed bias is 1.63 logits greater than the fixed bias of the average
patient. From the patients’ perspective, the therapist is underestimating patients’ functional
abilities at baseline and overestimating patients’ functional abilities at post-discharge follow-up.
From the therapist’s perspective, the patients are overestimating their functional abilities at
baseline and underestimating their functional abilities at post-discharge follow-up.
We cannot draw any conclusions from this study about why the difference between therapist and
average patient bias is negative at baseline and positive at post-discharge follow-up. One could
speculate that patients tend to be stoic and/or stubborn, underestimating the magnitude of their
problems at baseline and underestimating improvements in their function at follow-up.
Anecdotally, during evaluation therapists often see evidence of problems that patients deny or do
not recognize (e.g., seeing pills on the floor, stained clothing, signs of poor hygiene). Therapists
also report that patients may be able to perform a task after therapy, but refuse to accept the
required adaptation as an improvement over dependency. From another viewpoint, a cynic might
claim that the therapist is exaggerating the patient’s problems at baseline and exaggerating the
success of therapy at follow-up, making the intervention look more effective than it actually is.
However, in the final analysis we can estimate only differences between people in judgment
biases; we cannot know their values relative to a ground truth.
The purpose of this study has been to present and test a model of judgment bias and show how
judgment bias can influence measures estimated by psychometric models from observer
magnitude estimates. The observation of a linear relationship between continuous interval-scale
measures estimated from ordinal patient ratings and equivalent measures estimated from ordinal
therapist ratings confirms the linear prediction of the model. Grounded in a simple axiomatic
scaling theory, the model provides plausible interpretations of the slopes and intercepts of the
linear relationships in terms of fixed and random bias parameters. This model can be used as a
tool to study the effects of independent variables on judgment bias or compare differences
between judges.
References
1. Owsley C, Sloane M, McGwin G Jr, Ball K. Timed instrumental activities of daily living
tasks: relationship to cognitive function and everyday performance assessments in older
adults. Gerontology 2002;48:254-265.
2. McHorney CA, Haley SM, Ware JE Jr. Evaluation of the MOS SF-36 Physical Functioning
Scale (PF-10): II, Comparison of relative precision using Likert and Rasch scoring methods.
J Clin Epidemiol. 1997;50:451-461.
3. Granger CV, Deutsch A, Linn RT. Rasch analysis of the Functional Independence Measure
(FIM) Mastery Test. Arch Phys Med Rehabil. 1998;79:52-57.
4. Massof RW. Understanding Rasch and Item Response Theory models: Applications to the
estimation and validation of interval latent trait measures from responses to rating scale
questionnaires. Ophthal Epidemiol. 2011;18:1-19.
5. Fisher AG. The assessment of IADL motor skills: An application of many-faceted Rasch
analysis. Am J Occup Ther. 1993;47:319-329.
6. U.S. Department of Health & Human Services, Centers for Medicare and Medicaid Services.
(2002). Program memorandum intermediaries/carriers: Transmittal AB-02-078, May 29,
2002. Baltimore, MD: Government Printing Office.
7. Massof RW, Hsu CT, Baker FH, Barnett GD, Park WL, Deremeik JT, Rainey C, Epstein C.
Visual disability variables. I: The importance and difficulty of activity goals for a sample of
low vision patients. Arch Phys Med Rehabil. 2005;86:946-953.
8. Massof RW, Hsu CT, Baker FH, Barnett GD, Park WL, Deremeik JT, Rainey C, Epstein C.
Visual disability variables. II: The difficulty of tasks for a sample of low vision patients.
Arch Phys Med Rehabil. 2005;86:954-967.
9. Massof RW, Ahmadian L, Grover LL, Deremeik JT, Goldstein JE, Rainey C, Epstein C,
Barnett GD. The Activity Inventory: an adaptive visual function questionnaire. Optom Vis
Sci. 2007;84:763-774.
10. Centers for Medicare/Medicaid Services. (2004). The Inpatient Rehabilitation Facility-
Patient Assessment Instrument Training Manual. Available from
https://www.cms.gov/medicare/medicare-fee-for-service-
payment/inpatientrehabfacpps/irfpai.html
11. Andrich D. A rating formulation for ordered response categories. Psychometrika 1978;43:561-
573.
12. Linacre JM, Wright BD. A user's guide to Winsteps: Rasch model computer program.
Chicago, IL: MESA Press; 2001.
13. Goldstein JE, Chun MW, Fletcher DC, Deremeik JT, Massof RW. Visual ability of patients
seeking outpatient low vision services in the United States. JAMA Ophthalmol
2014;132:1169-1177.
14. Bond T, Fox CM. Applying the Rasch model: Fundamental measurement in the human
sciences. 2nd ed. New York, NY: Routledge; 2007.
15. Wilson EB, Hilferty MM. The distribution of chi-square. Proc Natl Acad Sci USA
1931;17:684-688.