This document discusses the concept of reliability in psychological testing. It defines reliability and explains sources of error variance like test construction, administration, and scoring. It then describes different methods used to estimate reliability, including test-retest, parallel forms, split-half, and internal consistency. Guidelines are provided for interpreting reliability coefficients depending on the purpose and importance of the test. The true score model of measurement is introduced along with alternatives like generalizability theory and item response theory.
Outline
THE CONCEPT OF RELIABILITY
- Sources of Error Variance
RELIABILITY ESTIMATES
- Test-Retest Reliability Estimates
- Parallel-Forms & Alternate-Forms Reliability Estimates
- Split-Half Reliability Estimates
- Other Methods of Estimating Internal Consistency
- Measures of Inter-Scorer Reliability
USING & INTERPRETING A COEFFICIENT OF RELIABILITY
- The Purpose of the Reliability Coefficient
- The Nature of the Test
- The True Score Model of Measurement & Alternatives to It
Parallel-Forms & Alternate-Forms Reliability Estimates
Parallel forms (of a test): for each form of the test, the means and the variances of observed test scores are equal
Alternate forms (of a test): simply different versions of a test that have been constructed so as to be parallel
Example: alternate forms are typically designed to be equivalent with respect to variables such as CONTENT & LEVEL OF DIFFICULTY
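A parallel- or alternate-forms estimate (often called a coefficient of equivalence) is obtained by correlating the same testtakers' scores on the two forms. The sketch below computes a Pearson r by hand; the function name `pearson_r` and the six pairs of scores are made up for illustration. Applying the same computation to scores from two administrations of one test gives a test-retest estimate.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (n >= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around each mean
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of the same six testtakers on Form A and Form B
form_a = [12, 15, 18, 20, 22, 25]
form_b = [13, 14, 19, 21, 21, 26]
print(f"coefficient of equivalence r = {pearson_r(form_a, form_b):.3f}")
```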
Other Methods of Estimating Internal Consistency
• KUDER-RICHARDSON FORMULAS
• CRONBACH ALPHA
• AVERAGE PROPORTIONAL DISTANCE (APD)
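The Kuder-Richardson and coefficient alpha entries above can be sketched in a few lines. This is a minimal illustration, assuming complete data with one row per testtaker and one column per item, and using population variances throughout; `kr20`, `cronbach_alpha`, and the response matrix are hypothetical. KR-20 applies to dichotomous (right/wrong) items, while alpha also handles non-dichotomous items.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha. item_scores: one row per testtaker, one column per item."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*item_scores))                # transpose rows into item columns
    totals = [sum(row) for row in item_scores]     # total score per testtaker
    sum_item_var = sum(pvariance(col) for col in items)
    # Raises ZeroDivisionError if every testtaker has the same total score
    return (k / (k - 1)) * (1 - sum_item_var / pvariance(totals))

def kr20(item_scores):
    """KR-20 for items scored 0 (wrong) or 1 (right)."""
    k = len(item_scores[0])
    n = len(item_scores)
    totals = [sum(row) for row in item_scores]
    pq = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n          # proportion answering the item correctly
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / pvariance(totals))

# Hypothetical right/wrong responses: 5 testtakers x 4 items
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(f"KR-20 = {kr20(responses):.3f}, alpha = {cronbach_alpha(responses):.3f}")
```

For these made-up 0/1 responses both functions return 0.8: with dichotomous items each item variance equals p(1 - p), so KR-20 is the special case of alpha.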
Measures of Inter-Scorer Reliability
• The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure
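Inter-scorer reliability can be summarized in several ways. The sketch below computes simple percent agreement and Cohen's kappa (an agreement index corrected for chance, swapped in here and not named in the slides) for two raters who assign categorical codes; the function names and rating lists are hypothetical. For continuous ratings, a correlation between the raters' scores is commonly used instead.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of cases on which the two raters gave the same code."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Observed agreement corrected for agreement expected by chance."""
    n = len(r1)
    p_obs = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each rater's marginal proportions, summed over codes
    p_chance = sum(c1[code] * c2[code] for code in set(c1) | set(c2)) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical codes from two raters observing the same six behaviors
rater_1 = ["yes", "yes", "no", "no", "yes", "no"]
rater_2 = ["yes", "no", "no", "no", "yes", "no"]
print(f"agreement = {percent_agreement(rater_1, rater_2):.2f}, "
      f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")
```

Here the raters agree on 5 of 6 cases (about .83), but because chance agreement is .50, kappa is about .67.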
Guide question
How high should the coefficient of reliability be?
Answer:
1. “on a continuum relative to the purpose and importance of the decisions to be made on the basis of scores on the test”
2. If reliability coefficients are treated like grades:
A: the .90s (.95 or higher for the most important decisions)
B: the .80s (below .85 is a clear B-)
Barely passing: .65 through the .70s, bordering on failing (unacceptable)
Nature of the test
• Homogeneous or heterogeneous
• Dynamic or static
• Restriction or inflation of range
• Speed test or power test
• Criterion-referenced tests
The true score model of measurement and alternatives to it
• Classical test theory
• Domain sampling theory and
generalizability theory
• Item response theory
LAYMAN: reliability is a synonym for dependability or consistency
PSYCHOMETRICS: reliability refers to consistency in measurement
In everyday conversation reliability always connotes something positive; in the psychometric sense it refers only to something that is consistent, not necessarily consistently good or bad.
Reliability coefficient: an index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance
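RANDOM ERROR: unpredictable fluctuations and inconsistencies of other variables in the measurement process, e.g., sudden noise from a school rally or a passing truck
SYSTEMATIC ERROR: error that is constant or proportionate to the true value being measured, e.g., using a ruler that is one-tenth of an inch longer than it should be, or the weighing scale on The Biggest Loser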
One source of variance during test construction is ITEM SAMPLING or CONTENT SAMPLING
From the perspective of a test creator, a challenge in test development is to maximize the proportion of the total variance that is true variance and to minimize the proportion of the total variance that is error variance.
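Sources of error variance during test administration:
Test environment (TE): room temperature, level of lighting, ventilation, and noise
Testtaker variables (TTV): emotional problems, physical discomfort, lack of sleep, and drugs or medication
Examiner-related variables (ERV): the examiner's physical appearance and demeanor, and even the presence or absence of an examiner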
COMPUTER SCORING: with the advent of modern technology, computer-scorable items have virtually eliminated error variance caused by scorer differences in many tests
SUBJECTIVE TESTS: scoring relies on judgment, so a well-trained professional should score as objectively as possible
ASSESSMENT PURPOSES: assessment is consequential, so decision making must be systematic and objective (e.g., a group co-leader in a group process)
Methods that don't fall under the different types of error variance
Before the reliability estimates: it's hard to know the true variance of a given test or assessment method, so the best we can do is estimate it as well as we can.
COEFFICIENT OF STABILITY: when the interval between testings is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of stability
SIMILARITY between parallel-/alternate-forms and test-retest estimates:
1. Two test administrations with the same group are required
2. Test scores may be affected by factors such as motivation, fatigue, or intervening events
SPEARMAN-BROWN FORMULA: relates psychometric reliability to test length; psychometricians use it to predict the reliability of a test after changing the test length
1. KR-20: essentially tells you whether the exam as a whole discriminated between students who mastered the subject matter and those who did not.
KR-20 generally ranges between 0.0 and +1.0, but it can fall below 0.0 with smaller sample sizes.
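2. Cronbach's alpha: can be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula; unlike KR-20, it can be used with items that are not scored dichotomously.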
3. The APD is a measure that focuses on the degree of difference that exists between item scores
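The APD is usually described as a three-step procedure: take the absolute difference between every pair of item scores, average those differences, and divide the average by the number of response options minus one. The sketch below follows those steps for each testtaker and then averages across testtakers; that aggregation order, the function name, and the sample scores are assumptions for illustration. Lower values mean item scores sit closer together, that is, greater internal consistency.

```python
from itertools import combinations
from statistics import mean

def average_proportional_distance(item_scores, n_options):
    """Hypothetical APD sketch for a Likert-type scale.

    item_scores: one row per testtaker, one column per item (no missing data).
    n_options: number of response options per item (e.g. 7 for a 1-7 scale).
    """
    if n_options < 2:
        raise ValueError("need at least two response options")
    per_person = []
    for row in item_scores:
        if len(row) < 2:
            raise ValueError("need at least two items per testtaker")
        # Step 1: absolute difference for every pair of items
        diffs = [abs(a - b) for a, b in combinations(row, 2)]
        # Step 2: average those differences
        per_person.append(mean(diffs))
    # Step 3: divide the average difference by (number of options - 1)
    return mean(per_person) / (n_options - 1)

# Hypothetical answers from two testtakers on a 3-item, 7-point scale
scores = [[7, 6, 7], [3, 3, 4]]
print(f"APD = {average_proportional_distance(scores, n_options=7):.3f}")
```

For these scores each testtaker's average pairwise difference is 2/3, so the APD is (2/3) / 6 = 1/9, about .11.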
Restriction or inflation of range: Is the correlation analysis restricted (or inflated) by the sampling procedure used?
- Is the range of variances employed appropriate to the objective of the correlational analysis?