Establishing the English Language Test Reliability
Reliability
Contents:
• Reliability overview.
• How to estimate reliability (different types of reliability).
• Calculating and computing reliability.
• Factors affecting test reliability.
• Improving test reliability.
• The relationship between validity and reliability.
Definition of Reliability:
• Reliability is the extent to which a test is consistent in measuring whatever it does measure. (Sims, 2015)
• A measure is reliable to the extent that independent but comparable measures of the same trait or construct of a given object agree. (Zhu & Han, 2011)
• Reliability indicates whether a measurement device can measure the same characteristics consistently over time. (Blerkom, 2009)
• A reliable test is consistent and dependable. (Brown, 2004)
THE THEORETICAL CONCEPT OF RELIABILITY:
• Observed scores: scores that the students actually obtain.
• True scores: scores that students should get if the measurement
device worked perfectly.
• Error variance, also known as measurement error, is anything that
causes a student’s observed score to differ from his or her true
score.
Variance (Observed) = Variance (True) + Variance (Error)

Reliability = Variance (True) / Variance (Observed)

Worked example: 10 = 7 + 3
Reliability = 7 / 10 = .70
• 70% of the observed-score variation comes from true scores.
• 30% of the observed-score variation comes from measurement error.
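As a minimal sketch of this decomposition, the Python snippet below plugs in the hypothetical variance values from the worked example above and computes reliability as the ratio of true-score variance to observed-score variance.

```python
# Hypothetical variance components from the worked example above.
variance_true = 7.0    # variance attributable to true scores
variance_error = 3.0   # variance attributable to measurement error

variance_observed = variance_true + variance_error   # 10.0
reliability = variance_true / variance_observed      # 0.70

print(f"Observed variance: {variance_observed}")
print(f"Reliability: {reliability:.2f}")  # 70% true-score variation, 30% error
```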
• Unsystematic variation: the variation that contributes to unreliability; changes that represent a random shift (e.g., a regression) in one's score.
• Systematic variation: changes that represent a steady increase in one's score; this variation contributes to reliability.
• Reliability ranges from 0 to 1.
• r = 0: completely unreliable (minimum).
• r = 1: completely reliable (maximum).
• Standard error of measurement (SEM): the standard deviation of all error scores obtained from a given measure in different situations.
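In practice the error scores themselves are not observed, so the SEM is usually estimated from the test's standard deviation and its reliability coefficient. A minimal sketch, assuming the common estimate SEM = s·√(1 − r) and hypothetical values:

```python
import math

# Hypothetical values: observed-score standard deviation and reliability coefficient.
sd_observed = 5.0
reliability = 0.70

# Common estimate of the standard error of measurement.
sem = sd_observed * math.sqrt(1 - reliability)
print(f"SEM is approximately {sem:.2f} score points")
```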
Types of reliability:
1- Test-retest reliability:
• Test-retest reliability is concerned with the
test’s stability over time.
(Blerkom, 2009)
• To estimate reliability across two test administrations, we use the Pearson correlation coefficient.
• For more than two administrations, we use the intraclass correlation, which can also be used for two.
The Pearson correlation:
• The Pearson correlation is the most commonly used measure of correlation, sometimes called the Pearson product-moment correlation.
• It is simply the average of the sum of the z-score products, and it measures the strength of the linear relationship between two characteristics (see the sketch below).
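A minimal sketch of this definition, using hypothetical test and retest scores: it computes the average of the z-score products (z-scores based on the population standard deviation, ddof=0) and checks the result against scipy's pearsonr.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for the same students on a test and a retest.
test = np.array([12, 15, 9, 18, 14, 11, 16, 13], dtype=float)
retest = np.array([13, 14, 10, 17, 15, 10, 18, 12], dtype=float)

# Pearson r as the average of the z-score products.
z_test = (test - test.mean()) / test.std()
z_retest = (retest - retest.mean()) / retest.std()
r_manual = (z_test * z_retest).mean()

r_scipy, _ = pearsonr(test, retest)
print(f"r (z-score products) = {r_manual:.3f}")
print(f"r (scipy.pearsonr)   = {r_scipy:.3f}")  # same value
```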
INTRACLASS CORRELATION USING SPSS:
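The SPSS steps themselves are not reproduced in the text; as a hedged alternative, the sketch below computes a one-way random-effects ICC, often written ICC(1,1), directly with NumPy for hypothetical scores from three test administrations.

```python
import numpy as np

# Hypothetical scores: rows = students, columns = three test administrations.
scores = np.array([
    [12, 13, 12],
    [15, 14, 16],
    [ 9, 10,  9],
    [18, 17, 18],
    [14, 15, 13],
], dtype=float)

n, k = scores.shape
grand_mean = scores.mean()
row_means = scores.mean(axis=1)

# One-way random-effects ICC(1,1): between-subject vs. within-subject mean squares.
ms_between = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
ms_within = np.sum((scores - row_means[:, None]) ** 2) / (n * (k - 1))

icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1,1) = {icc:.3f}")
```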
2- Parallel-form reliability:
• Determines if two forms are equivalent
• More common for standardized tests
• Can be measured with the correlation coefficient (a brief sketch follows below).
• Example: two forms of a self-esteem scale (Version 1 and Version 2) that yield comparable scores indicate a reliable measure.
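A minimal sketch of that example, reusing pearsonr on hypothetical scores from the two parallel forms; a high positive correlation suggests the forms are equivalent.

```python
from scipy.stats import pearsonr

# Hypothetical scores of the same respondents on two parallel forms of a self-esteem scale.
version_1 = [34, 28, 41, 25, 38, 30, 36]
version_2 = [33, 30, 40, 27, 37, 29, 38]

r, _ = pearsonr(version_1, version_2)
print(f"Parallel-form reliability estimate: r = {r:.3f}")
```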
3- Internal consistency
reliability:
-Split-half reliability:
• In split-half reliability, the test is separated into halves and the two halves are compared.
• The most common approach is to split the test into two parts by separating the odd-numbered and even-numbered items.
• The KR-20 can be used to calculate reliability for tests whose items have a single right answer; it cannot be used for Likert-scale items.
KR-20 FORMULA:
• The KR-20 is a formula developed by Kuder and Richardson (1937) to measure internal-consistency reliability.
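The formula itself is not reproduced in the text; the standard form is KR-20 = (k / (k − 1)) · (1 − Σpᵢqᵢ / σ²ₓ), where k is the number of items, pᵢ the proportion answering item i correctly, qᵢ = 1 − pᵢ, and σ²ₓ the variance of total scores. A minimal sketch with hypothetical right/wrong (1/0) item data:

```python
import numpy as np

# Hypothetical dichotomous item data: rows = students, columns = items (1 = correct).
items = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
], dtype=float)

k = items.shape[1]                   # number of items
p = items.mean(axis=0)               # proportion correct per item
q = 1 - p
total_var = items.sum(axis=1).var()  # variance of total scores (population, ddof=0)

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 = {kr20:.3f}")
```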
Split-half reliability with Likert scales:
• A Likert-scale test uses items that do not have a single correct answer.
• Split-half reliability for tests using a Likert scale can be estimated with coefficient alpha (Cronbach's alpha), which is easiest to calculate using SPSS.
• The Spearman-Brown formula is the easiest to calculate by hand but is less accurate (both estimates are sketched below).
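A minimal sketch, using hypothetical Likert-scale responses, of both estimates mentioned above: coefficient alpha computed from item variances, and the Spearman-Brown correction applied to an odd/even split-half correlation.

```python
import numpy as np

# Hypothetical Likert-scale responses (1-5): rows = respondents, columns = items.
items = np.array([
    [4, 5, 4, 4, 5, 4],
    [2, 3, 2, 3, 2, 3],
    [5, 5, 4, 5, 5, 5],
    [3, 3, 3, 2, 3, 3],
    [4, 4, 5, 4, 4, 5],
    [1, 2, 1, 2, 2, 1],
], dtype=float)

k = items.shape[1]

# Cronbach's alpha from item variances and total-score variance.
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Spearman-Brown: correlate odd- and even-numbered half scores, then correct.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)
r_halves = np.corrcoef(odd_half, even_half)[0, 1]
spearman_brown = 2 * r_halves / (1 + r_halves)

print(f"Cronbach's alpha = {alpha:.3f}")
print(f"Spearman-Brown r = {spearman_brown:.3f}")
```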
4- Inter- and intra-rater reliability:
• Inter-rater unreliability occurs when two or more scorers yield inconsistent scores for the same test. (Brown, 2004)
• To calculate inter- and intra-rater reliability we can use:
• Cohen's kappa for two raters
• The intraclass correlation (ICC) for two or more raters
Kappa coefficients:
• The kappa coefficient is used in inter-rater reliability to indicate the extent of agreement between the raters.
• Kappa measures agreement between two individuals, which is why it is used in inter-rater reliability calculations. The value of kappa (κ) is:

κ = (Po − Pe) / (1 − Pe)

where Po is the observed proportion of agreement and Pe is the proportion of agreement expected by chance.
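A minimal sketch of this computation for two hypothetical raters, showing Po and Pe explicitly and checking the result against scikit-learn's cohen_kappa_score:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Hypothetical pass/fail ratings of the same essays by two raters.
rater_1 = np.array(["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"])
rater_2 = np.array(["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"])

categories = np.unique(np.concatenate([rater_1, rater_2]))

# Observed agreement: proportion of essays both raters scored the same way.
p_o = np.mean(rater_1 == rater_2)

# Chance agreement: sum over categories of the product of each rater's marginal proportions.
p_e = sum(np.mean(rater_1 == c) * np.mean(rater_2 == c) for c in categories)

kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa (manual)  = {kappa:.3f}")
print(f"kappa (sklearn) = {cohen_kappa_score(rater_1, rater_2):.3f}")  # same value
```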
Improving test reliability:
• Use consistent criteria for a correct response.
• Provide students with good test-environment conditions.
• Administer the test under well-organized, consistent conditions.
• Give uniform attention to all test sets throughout the evaluation period.
• Read through the tests at least twice to check your own consistency.
• If you make "mid-stream" modifications to what you consider a correct response, go back and apply the same standards to all papers.
THE RELATIONSHIP BETWEEN
RELIABILITY AND VALIDITY:
• Reliability is necessary for validity.
• A test cannot be valid unless it is reliable.
• However, a reliable test is not necessarily valid.
• Determining the reliability of a test is an important first step, but not the defining step, in determining the validity of a test.
