2. OUTLINE
• What is reliability of a scale?
• What are the broad approaches to reliability? – repeatability and internal consistency
• How is repeatability or reproducibility assessed? – for categorical variables and for continuous variables
3. Definition of Reliability
Reliability usually “refers to the consistency of scores obtained by the same persons when they are reexamined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions” (Anastasi & Urbina, 1997).
4. Reliability
• Reliability is the extent to which a scale provides the same numerical score each time it is administered, provided no true change has actually occurred.
• Reliability is a necessary but not sufficient consideration in scale development.
5. Approaches to reliability testing
Reliability
• Internal consistency
  – Cronbach’s alpha
  – Split-half reliability
• Repeatability
  – Test-retest reliability
  – Inter-observer reliability
  – Alternative-forms reliability
6. Reliability
• Internal consistency reliability is a measure of how well the items on a test measure the same construct or idea.
  – Cronbach’s alpha
  – Split-half reliability
• Repeatability measures the variation in measurements taken by a single instrument or person under the same conditions.
  (Reproducibility measures whether an entire study or experiment can be reproduced in its entirety.)
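Internal consistency via Cronbach’s alpha can be sketched in a few lines. The data below (five hypothetical respondents answering three Likert-type items) is made up purely for illustration:

```python
# Cronbach's alpha from a respondents-by-items score matrix.
# The score matrix below is hypothetical, for illustration only.

def cronbach_alpha(scores):
    """scores: list of rows, one row per respondent; columns are items."""
    k = len(scores[0])                       # number of items
    def var(xs):                             # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Five respondents, three items (hypothetical Likert scores 1-5):
scores = [[4, 5, 4], [2, 2, 3], [3, 3, 3], [5, 5, 4], [1, 2, 2]]
print(round(cronbach_alpha(scores), 3))  # → 0.943
```

Items that track the same construct produce highly correlated columns, which inflates the variance of the row totals relative to the summed item variances and pushes alpha toward 1.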
8. Statistical Assessment of Repeatability
Repeatability
• Continuous variables
  – Intraclass correlation coefficient (ICC)
  – Bland-Altman plot for agreement
• Categorical variables
  – Kappa statistic for agreement
9. Intraclass Correlation (ICC)
• When one is interested in the relationship between variables of a common class, one uses an intraclass correlation coefficient.
• It is, as a general matter, a ratio of two variances:

  ICC = Variance due to rated subjects (patients)
        ÷ (Variance due to subjects + Variance due to judges + Residual variance)

Interpretation:
• ICC < 0.7 – low reliability
• ICC 0.7 – 0.89 – moderate to good reliability
• ICC > 0.89 – excellent reliability
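The variance ratio above can be estimated from a two-way ANOVA decomposition; the sketch below computes a single-measures, two-way random-effects ICC (often labelled ICC(2,1)), which is one common choice matching the subjects + judges + residual breakdown on the slide. The rating matrix (two raters scoring four patients) is hypothetical:

```python
# ICC(2,1): two-way random-effects, single-measures intraclass correlation,
# estimated from the variance components in the slide's ratio.
# The ratings matrix below is hypothetical, for illustration only.

def icc_2_1(ratings):
    """ratings: rows = subjects (patients), columns = raters (judges)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # judges
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)                                  # mean squares
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    # Subject variance over (subject + judge + residual) variance:
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

ratings = [[9, 8], [6, 5], [8, 8], [4, 5]]  # 4 patients x 2 raters (hypothetical)
print(round(icc_2_1(ratings), 3))  # → 0.903
```

With these toy numbers the ICC lands above 0.89, i.e. “excellent reliability” by the slide’s cut-offs; identical columns (perfect rater agreement) would give exactly 1.0.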
11. Kappa Statistics
• The Kappa statistic (or value) is a metric that compares an observed accuracy with an expected accuracy (random chance).
• The Kappa coefficient (κ) is a statistic which measures inter-rater agreement for qualitative (categorical) items.
12. Kappa Statistics

                     Rater 1 – positive   Rater 1 – negative
  Rater 2 – positive         a                    b
  Rater 2 – negative         c                    d

Kappa = (observed agreement – chance agreement) / (1 – chance agreement)

Observed agreement: Pr(a) = (a + d) / (a + b + c + d)
Expected agreement: Pr(e) = [(a + b)(a + c) + (c + d)(b + d)] / (a + b + c + d)²
κ = (Pr(a) – Pr(e)) / (1 – Pr(e))

Interpretation:
• Kappa < 0.4 – poor reliability
• Kappa 0.41 – 0.74 – moderate to good reliability
• Kappa > 0.74 – excellent reliability
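Cohen’s kappa from the 2×2 agreement table is a direct translation of these formulas. The counts in the example call are hypothetical:

```python
# Cohen's kappa from the 2x2 agreement table (cells a, b, c, d as on the slide).

def cohens_kappa(a, b, c, d):
    n = a + b + c + d
    p_obs = (a + d) / n                                      # Pr(a): observed agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # Pr(e): chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical counts: 20 both-positive, 5 + 10 disagreements, 15 both-negative
print(round(cohens_kappa(20, 5, 10, 15), 3))  # → 0.4
```

With these counts Pr(a) = 35/50 = 0.70 and Pr(e) = 0.50, so κ = 0.4 – right at the boundary between “poor” and “moderate to good” by the slide’s cut-offs; perfect agreement (b = c = 0) gives κ = 1.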