Slide 5

Norm-Referenced Reliability, estimated by:
• Test-retest
• Parallel form
• Internal consistency

Criterion-Referenced Reliability, estimated by:
• Test-retest
• Parallel form
• Inter-rater and intra-rater
Slide 6

Test-retest (for affective measures):
1. Administer the instrument under standardized conditions.
2. Re-administer the instrument under the same conditions.
3. Determine the correlation between the two scores.

Parallel form:
• Administer the two forms of the instrument and identify the correlation between them.
• The two forms should have equal means and standard deviations, equal correlations with a third variable, and be constructed with the same objective and procedure.
• Can assess equivalence & stability.

Internal consistency (for cognitive measures):
• Consistency of a single measure on one occasion.
• Cronbach’s alpha is the preferred index of internal-consistency reliability. Why?
• KR-20 & KR-21 are special cases of alpha, used when data are dichotomous.
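Step 3 of the test-retest procedure above can be sketched in code: the reliability estimate is the Pearson correlation between the two administrations. This is a minimal illustration with invented scores, not data from the slides.

```python
# Test-retest reliability sketch: Pearson correlation between two
# administrations of the same instrument. Scores are invented examples.

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [12, 15, 11, 18, 14, 16]   # first administration
time2 = [13, 14, 10, 19, 15, 17]   # re-administration, same conditions
print(round(pearson_r(time1, time2), 2))
```

A high correlation (close to +1) indicates stable scores across the two occasions.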
Slide 7

The α coefficient:
• Equal to the mean of all possible split-half coefficients associated with a set of data.
• An indicator of the consistency of items within the same instrument.
• Affected by test length, total test variance, the shape of the resulting distribution of test scores, and response rate.
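The α coefficient described above can be computed directly from a persons × items score matrix using the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). The data matrix below is invented for illustration.

```python
# Cronbach's alpha sketch from a respondents x items matrix (invented data).

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)   # population variance

def cronbach_alpha(scores):
    """scores: one row of item scores per respondent."""
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # one column per item
    item_vars = sum(variance(col) for col in items)
    totals = [sum(row) for row in scores]    # total score per respondent
    return k / (k - 1) * (1 - item_vars / variance(totals))

data = [
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(data), 2))
```

Note how adding items (longer test) or increasing total-score variance raises α, matching the factors listed on the slide.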
Slide 15

• The two parallel forms assess the same content domain and have relatively homogeneous items.
Slide 16

Inter-rater and intra-rater / Criterion-Referenced Reliability

Expected rating errors:
• Error of standards
• Halo error
• Logic error
• Similarity error
• Central tendency error
Slide 29

Multitrait-Multimethod Approach (Example)

The matrix crosses Same/Different Construct with Same/Different Method:
• Method 1: rating scale; Method 2: checklist
• Construct 1: bonding; Construct 2: perinatal care

1- Reliability should be high as a prerequisite for validity.
2- Convergent validity should be high (correlation between different methods measuring the same construct).
3- Construct validity is evidenced when heterotrait-monomethod correlations are lower than the correlations mentioned in point 2 (a function of trait, not method).
4- Discriminant validity is evidenced when heterotrait-heteromethod correlations are the lowest among all previously mentioned correlations.

This example can be applied to more than 2 constructs.
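The expected ordering in points 1-4 can be checked mechanically once the multitrait-multimethod correlations are tabulated. This is a toy illustration with invented correlation values for the bonding/perinatal-care example (B = bonding, P = perinatal care; suffix 1 = rating scale, 2 = checklist).

```python
# MTMM ordering sketch with invented correlations for a 2-trait x 2-method design.

corr = {
    ("B1", "B1"): 0.90,  # same trait, same method  -> reliability (point 1)
    ("B1", "B2"): 0.70,  # same trait, diff method  -> convergent validity (point 2)
    ("B1", "P1"): 0.30,  # diff trait, same method  -> heterotrait-monomethod (point 3)
    ("B1", "P2"): 0.10,  # diff trait, diff method  -> heterotrait-heteromethod (point 4)
}

reliability = corr[("B1", "B1")]
convergent = corr[("B1", "B2")]
het_mono = corr[("B1", "P1")]
het_hetero = corr[("B1", "P2")]

# Expected pattern: reliability > convergent > heterotrait-monomethod
#                   > heterotrait-heteromethod
print(reliability > convergent > het_mono > het_hetero)
```

If the pattern holds for every trait-method pairing, the matrix supports both convergent and discriminant validity.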
Slide 36

Proportion of correct answers in group:

Item No.   Upper 1/4   Lower 1/4   Item Discrimination Index (D)
                                   (range: -1.00 to +1.00)
1          90%         20%         0.7
2          80%         70%         0.1
3          100%        0%          1
4          100%        100%        0
5          50%         50%         0
6          20%         60%         -0.4

Adapted from: www.distance.fsu.edu/docs/

A positive D value is desirable. A negative D value usually indicates that an item is faulty and needs improvement, because the item is not discriminating in the same way as the total test.
• D values greater than +0.20 are desirable for a norm-referenced measure.
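The D values in the table above come from a simple difference: the proportion correct in the upper quarter minus the proportion correct in the lower quarter. The sketch below reproduces the table.

```python
# Item discrimination index: D = p(upper 1/4 correct) - p(lower 1/4 correct).

def discrimination_index(p_upper, p_lower):
    return round(p_upper - p_lower, 2)

# (upper-group proportion, lower-group proportion) for items 1-6 of the table
items = [(0.90, 0.20), (0.80, 0.70), (1.00, 0.00),
         (1.00, 1.00), (0.50, 0.50), (0.20, 0.60)]

for i, (up, low) in enumerate(items, start=1):
    print(i, discrimination_index(up, low))
```

Item 6 yields D = -0.4, flagging it as faulty; items 4 and 5 yield D = 0, meaning they do not discriminate at all.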
Slide 41

1- Content specialists:
• Two or more content specialists examine the format and content of each item.
• Item-objective congruence focuses on content validity at the item level.
• If more than one objective is used for a measure, the items that measure each objective are usually treated separately.

2- Determination of Interrater Agreement:
• Interrater agreement can be evaluated by:
  1- The index of content validity (CVI).
  2- P0 and K as measures of inter-rater agreement, with acceptable levels P0 ≥ 0.80 & K ≥ 0.25.
• Too low P0 and K are indicators of ???

3- Average Congruency Percentage:
• The percentage of items rated congruent by each judge is calculated.
• The mean percentage across all judges is the “average congruency percentage”.
• An average congruency percentage of 90% or higher is acceptable.
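The average congruency percentage described in point 3 is straightforward to compute: take each judge's percentage of items rated congruent, then average across judges. The ratings below are invented for illustration.

```python
# Average congruency percentage sketch (invented judge ratings).

def average_congruency(ratings):
    """ratings: one list per judge; True = item rated congruent with its objective."""
    per_judge = [100 * sum(r) / len(r) for r in ratings]  # percent per judge
    return sum(per_judge) / len(per_judge)                # mean across judges

judge1 = [True, True, True, True, False]   # rates 4 of 5 items congruent -> 80%
judge2 = [True, True, True, True, True]    # rates all 5 items congruent -> 100%

pct = average_congruency([judge1, judge2])
print(pct)
```

Here the result is 90.0, which just meets the 90% acceptability criterion stated on the slide.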
Slide 44

1- Item-Objective or Item-Subscale Congruence:
• Based on the ratings of two or more content specialists, who assign a value of +1 (definitely a measure), 0 (undecided), or -1 (not a measure) to each item according to the item’s congruence with the measure’s objective.
• The index is computed using formula (6.1) and ranges from -1 to +1.

2- Item Difficulty:
• The item p level is calculated for each item.
• The item p level should be higher for the group known to possess more of a specified trait or attribute than for the group known to possess less.

3- Item Discrimination:
• Focuses on measurement of performance changes (e.g., pretest/posttest) or differences (e.g., experienced/inexperienced) between the groups.
• Referred to as D′; directly related to the property of decision validity.
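The item-difficulty check in point 2 can be sketched directly: the p level is the proportion answering the item correctly, compared between a group known to possess more of the trait and one known to possess less. The responses below are invented for illustration.

```python
# Item p level sketch: proportion correct per known group (invented responses).

def p_level(responses):
    """responses: 1 = correct, 0 = incorrect for a single item."""
    return sum(responses) / len(responses)

experienced = [1, 1, 1, 0, 1]     # group expected to possess more of the trait
inexperienced = [1, 0, 0, 0, 1]   # group expected to possess less

p_exp, p_inexp = p_level(experienced), p_level(inexperienced)
print(p_exp, p_inexp, p_exp > p_inexp)
```

When p_exp exceeds p_inexp, the item behaves as expected for a criterion-referenced measure; the reverse pattern would flag the item for review.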
Slide 45

• A useful adjunct item-discrimination index is provided through the use of P0 or K.
• A negative discrimination index is usually due to a faulty item.
Slide 46

References
1. McBride, D. L., LeVasseur, S. A., & Li, D. (2013). Development and validation of a web-based survey on the use of personal communication devices by hospital registered nurses: Pilot study. JMIR Research Protocols, 2(2).
2. Sriratanaprapat, J., Chaowalit, A., & Suttharangsee, W. (2012). Development and psychometric evaluation of the Thai Nurses' Job Satisfaction Scale. Pacific Rim International Journal of Nursing Research, 16(3).
3. Yildirim, Y., Tokem, Y., Bozkurt, N., Fadiloglu, C., Uyar, M., & Uslu, R. (2011). Reliability and validity of the Turkish version of the Memorial Symptom Assessment Scale in cancer patients. Asian Pacific Journal of Cancer Prevention, 12, 3389-3396.
Thank you!