2. Contents:
Reliability overview.
How to estimate reliability (the different types of reliability).
Calculating reliability.
Factors affecting test reliability.
Improving test reliability.
The relationship between validity and reliability.
3. Definition of Reliability:
• Reliability is the extent to which the test is consistent in
measuring whatever it does measure.
(Sims, 2015)
• A measure is reliable to the extent that independent but
comparable measures of the same trait or construct of a given
object agree.
(Zhu & Han, 2011)
• Reliability indicates whether a measurement device can
measure the same characteristic consistently over time.
(Blerkom, 2009)
• A reliable test is consistent and dependable.
(Brown, 2004)
4. THE THEORETICAL CONCEPT OF
RELIABILITY:
• Observed scores: scores that the students actually obtain.
• True scores: scores that students should get if the measurement
device worked perfectly.
• Error variance, also known as measurement error, is anything that
causes a student’s observed score to differ from his or her true
score.
Variance (Observed) = Variance (True) + Variance (Error)
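As a minimal numeric sketch of this decomposition (all scores are hypothetical, and the error scores are constructed to be uncorrelated with the true scores, as classical test theory assumes):

```python
# Classical test theory: Var(observed) = Var(true) + Var(error),
# which holds when the error scores are uncorrelated with the true scores.
# All numbers below are hypothetical, for illustration only.

def variance(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

true_scores  = [10, 12, 14, 16]   # scores if the measure were perfect
error_scores = [1, -1, -1, 1]     # measurement error (mean 0)
observed     = [t + e for t, e in zip(true_scores, error_scores)]

print(variance(observed))                              # 6.0
print(variance(true_scores) + variance(error_scores))  # 6.0  (5.0 + 1.0)
```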
7. Unsystematic variation:
• The variation that contributes to unreliability.
• Changes that represent a regression in one's score.
Systematic variation:
• Changes that represent a steady increase in one's score.
• This variation contributes to reliability.
8. • Reliability ranges from 0 to 1.
• r = 0: minimum, completely unreliable.
• r = 1: maximum, completely reliable.
• Standard error of measurement (SEM) = the standard deviation
of all error scores obtained from a given measure in different
situations.
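A common way to estimate the SEM from a test's statistics is SEM = SD × sqrt(1 - r), where SD is the standard deviation of the observed scores and r is the test's reliability. A minimal sketch with hypothetical numbers:

```python
import math

# SEM = SD * sqrt(1 - r), the standard formula relating the SEM
# to the score SD and the reliability coefficient r.
# The SD of 10 and reliability of 0.91 below are hypothetical.

def sem(sd, reliability):
    """Standard error of measurement."""
    return sd * math.sqrt(1 - reliability)

print(round(sem(10, 0.91), 2))  # 3.0
```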
9. Types of reliability:
1- Test-retest reliability:
• Test-retest reliability is concerned with the
test’s stability over time.
(Blerkom, 2009)
• To measure the reliability between two test administrations,
we use the Pearson correlation coefficient.
• For more than two administrations, we use the intraclass
correlation, which can also be used for two.
10. The Pearson correlation:
• The Pearson correlation is the most commonly
used measure of correlation, sometimes called
the Pearson product-moment correlation.
• It is simply the average of the z-score
products, and it measures the strength of the
linear relationship between two characteristics.
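This definition, the average of the paired z-score products, can be sketched directly in code; the test/retest scores below are invented for illustration:

```python
# Pearson correlation computed as the average of z-score products.
# The test-retest scores below are hypothetical.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    # average of the products of the paired z-scores
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / n

test_scores   = [70, 75, 80, 85, 90]
retest_scores = [72, 74, 79, 86, 89]
print(round(pearson(test_scores, retest_scores), 3))  # 0.985
```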
16. 2- Parallel-form reliability:
• Determines if two forms are equivalent
• More common for standardized tests
• Can be measured with the correlation
coefficient.
(Diagram: Version 1 and Version 2 of a self-esteem scale
yielding consistent scores, labeled "reliable".)
17. 3- Internal consistency
reliability:
-Split-half reliability:
• Split-half reliability separates the test into two halves and
then compares them.
• The most common approach is to split the test in two by
separating the odd-numbered and even-numbered items.
• The KR-20 can be used to calculate reliability for tests
whose items have a single right answer; it cannot be used for
Likert scales.
18. KR-20 FORMULA
• KR-20 is a formula developed by Kuder and
Richardson (1937) to measure internal
consistency reliability.
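The standard KR-20 formula is KR-20 = (k / (k - 1)) × (1 - Σpq / σ²), where k is the number of items, p is the proportion of students answering an item correctly, q = 1 - p, and σ² is the variance of the total scores. A minimal sketch with hypothetical 0/1 item data:

```python
# KR-20 for right/wrong (0/1) items. Data are hypothetical:
# each row is one student's answers across four items.

def kr20(items):
    k = len(items[0])                      # number of items
    n = len(items)                         # number of students
    totals = [sum(row) for row in items]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    # sum of p*q over items: p = proportion correct, q = 1 - p
    pq = sum((sum(row[i] for row in items) / n)
             * (1 - sum(row[i] for row in items) / n) for i in range(k))
    return (k / (k - 1)) * (1 - pq / var_total)

answers = [[1, 1, 1, 0],
           [1, 1, 0, 0],
           [1, 0, 1, 1],
           [0, 0, 0, 0]]
print(round(kr20(answers), 3))  # 0.556
```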
19. Split-half reliability and Likert scales:
• A Likert-scale test is one whose items do not have a single
correct answer.
• Split-half reliability for tests using a Likert scale can be
estimated with coefficient alpha (Cronbach's alpha), which is
easiest to calculate using SPSS.
• The Spearman-Brown formula is the easiest to calculate
by hand, but less accurate.
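As a sketch of both estimates, Cronbach's alpha computed from hypothetical Likert responses, and the Spearman-Brown prediction applied to a hypothetical half-test correlation of 0.6:

```python
# Cronbach's alpha for Likert items, and the Spearman-Brown
# prediction of full-test reliability from a half-test correlation.
# All data below are hypothetical.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one row of item scores per student."""
    k = len(items[0])
    item_vars = sum(variance([row[i] for row in items]) for i in range(k))
    total_var = variance([sum(row) for row in items])
    return (k / (k - 1)) * (1 - item_vars / total_var)

def spearman_brown(r_half):
    """Full-test reliability predicted from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

likert = [[4, 5, 4],
          [3, 3, 4],
          [5, 5, 5],
          [2, 3, 2]]
print(round(cronbach_alpha(likert), 2))  # 0.94
print(round(spearman_brown(0.6), 2))     # 0.75
```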
21. 4- Inter- and intra-rater
reliability:
• Inter-rater unreliability occurs when two or more
scorers yield inconsistent scores on the same test
(Brown, 2004)
• To calculate inter- and intra-rater reliability
we can use:
• Cohen's kappa for two raters
• The intraclass correlation (ICC) for two or more raters
22. Kappa coefficient:
• The kappa coefficient is used in inter-rater
reliability to indicate the extent of agreement
between the raters.
• Kappa measures agreement between two individuals,
which is why it is used in inter-rater reliability
calculations. The value of kappa is:
• K = (Po - Pe) / (1 - Pe)
where Po is the observed proportion of agreement and Pe is
the proportion of agreement expected by chance.
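A minimal sketch of this calculation with hypothetical pass/fail judgments from two raters:

```python
# Cohen's kappa: K = (Po - Pe) / (1 - Pe), where Po is the observed
# agreement and Pe the agreement expected by chance.
# The two raters' pass/fail judgments below are hypothetical.

def cohen_kappa(rater1, rater2):
    n = len(rater1)
    po = sum(a == b for a, b in zip(rater1, rater2)) / n
    categories = set(rater1) | set(rater2)
    pe = sum((rater1.count(c) / n) * (rater2.count(c) / n)
             for c in categories)
    return (po - pe) / (1 - pe)

r1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
r2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohen_kappa(r1, r2), 2))  # 0.67
```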
23. Improving test reliability:
• Use consistent criteria for a correct response.
• Provide students with good test-environment
conditions.
• Administer the test properly.
• Give uniform attention to all test papers throughout
the evaluation time.
• Read through tests at least twice to check for
your consistency.
• If you make "mid-stream" modifications to what
you consider a correct response, go back
and apply the same standards to all.
24. THE RELATIONSHIP BETWEEN
RELIABILITY AND VALIDITY:
• Reliability is necessary for validity.
• A test cannot be valid unless it is reliable.
• However, not every reliable test is valid.
• Determining the reliability of a test is an
important first step, but not the defining step, in
determining its validity.