Reliability
nazbatool149@yahoo.com
Overview of Reliability and
Validity
 Reliability refers to the consistency or stability of test
scores
 Validity refers to the extent to which a test measures
what it is supposed to measure.
Reliability is a necessary but not sufficient condition for validity
(i.e., if you are going to have validity, you must have reliability,
but reliability in and of itself is not enough to ensure validity).
Thus: A reliable test may or may not be valid.
An unreliable test can never be valid.
Example
 Assume you weigh 125 pounds. If you
weigh yourself five times and get 135, 134,
134, 135, and 136, then your scale is reliable
but not valid. The scores were consistent
but wrong: not valid!
Example
 A research example of this phenomenon would be
a questionnaire designed to assess job satisfaction
that asked questions such as,
 "Do you like to watch hockey games?"
 "Do you like to eat pizza?"
 As you can readily imagine, the responses to these
questions would probably remain stable over time,
thus, demonstrating highly reliable scores.
 However, are the questions valid when one is
attempting to measure job satisfaction? Of course
not, as they have nothing to do with an individual’s
level of job satisfaction.
Reliability
 Reliability is defined as the extent to which
a questionnaire, test, observation or any
measurement procedure produces the
same or consistent results on repeated
trials.
Reliability
 Synonym for dependability or consistency
 Consistency in measurement
 A test may be reliable in one context and
unreliable in another
 Different types and degrees of reliability
CONCEPT
X = T + E
where X = observed score, T = true score, and E = error.
For example, the magnitude of a certain psychological trait
(such as extraversion) as measured by the test will be due to the
true amount of extraversion and not other factors.
Concept of reliability
 Scores on a test may vary due to:
1. The true ability of the test taker
2. Other factors
 If we use X to represent the observed score, E to
represent error (a score component resulting from random
and irrelevant influences), and T to represent the
true score (the part of the observed score not affected
by error), then the observed score equals
the true score plus the error score:
X = T + E
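As a concrete illustration with hypothetical numbers: if a test taker's
true score is T = 100 and random error adds E = +3 on one occasion, the
observed score is X = 103; on another occasion an error of E = -2 yields
X = 98. The true score is unchanged; only error moves the observed score.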
Variance
 A statistic useful in describing sources of test
score variability is the variance, σ²
 Variance due to true differences = true variance, σ²_tr
 Variance due to irrelevant and random sources = error variance, σ²_e
 Total variance: σ² = σ²_tr + σ²_e
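To make the decomposition concrete, here is a minimal simulation sketch in
Python (not from the slides; the standard deviations and sample size are
made-up values): independent true scores and errors are generated, and the
observed-score variance comes out as approximately true variance plus error
variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: true-score SD = 10, error SD = 5 (made-up values).
true_scores = rng.normal(loc=50, scale=10, size=100_000)  # T
errors = rng.normal(loc=0, scale=5, size=100_000)         # E
observed = true_scores + errors                           # X = T + E

var_true = true_scores.var()
var_error = errors.var()
var_total = observed.var()

# Because T and E are independent, total variance is close to true + error.
print(f"total variance {var_total:.1f} vs true + error {var_true + var_error:.1f}")

# Proportion of total variance that is true variance (about .80 here).
print(f"reliability = {var_true / var_total:.2f}")
```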
CONCEPT
 The same decomposition applies to variance.
 The term reliability refers to the proportion of
total variance attributed to true variance.
 Because true differences are assumed to be stable,
they are expected to yield consistent results upon
repeated administrations of the same test as well
as on equivalent forms of the test.
 Because error variance may increase or decrease
a test score by varying amounts, the consistency of the
test score, and thus its reliability, can be affected.
Reliability in terms of Variance
 Reliability refers to the proportion (amount)
of total variance attributed to true
variance. The greater the proportion of total
variance attributed to true variance,
the more reliable the test.
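In symbols (standard classical test theory notation, not shown on the
original slide), the reliability coefficient is the ratio
r = σ²_tr / σ² = σ²_tr / (σ²_tr + σ²_e). For example, with a true variance
of 8 and an error variance of 2 (hypothetical numbers),
reliability = 8 / 10 = .80.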
Sources of Error Variance
 Test Construction – item sampling and content
sampling
 Test Administration – sources of variance in this
case influence the test taker's attention and
motivation
 Examiner-Related Variables
 Test Scoring and Interpretation – as soon as a
psychological measure uses anything but
objective-type items, the scorer or scoring system
becomes a source of error variance
Measurement of Reliability
 Reliability is usually determined using a
correlation coefficient (it is called a reliability
coefficient in this context).
 A correlation coefficient is a measure of
relationship that varies from -1 through 0 to +1;
the farther the number is from zero,
the stronger the correlation.
 For example, minus one (-1.00) indicates a
perfect negative correlation, zero indicates
no correlation at all, and positive one (+1.00)
indicates a perfect positive correlation.
Reliability Coefficient
 A reliability coefficient is an index of
reliability
 It is the proportion that indicates the ratio
of true-score variance to total variance.
 Reliability coefficients of .70 or higher are
generally considered to be acceptable
for research purposes. Reliability
coefficients of .90 or higher are needed to
make decisions that have impacts on
people's lives (e.g., the clinical uses of
tests).
Types
 Test-Retest
 Parallel/Alternate Forms
 Split-Half
 Internal Consistency
Test Retest Reliability
 Estimate of reliability obtained by correlating pairs
of scores from the same people on two different
administrations of the same test.
 To measure something relatively stable over time
e.g., personality.
 Passage of time can be a source of error variance.
 Proper use requires exploring possible intervening
factors between the two administrations
(experience, memory, fatigue, motivation).
 Practice effect
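As a minimal sketch of how a test-retest coefficient is computed (the
scores below are made-up data, not from the slides): correlate the scores
from the two administrations.

```python
import numpy as np

# Hypothetical scores for the same ten people on two administrations
# of the same test (made-up data).
time1 = np.array([12, 18, 9, 22, 15, 11, 20, 17, 14, 19])
time2 = np.array([13, 17, 10, 21, 14, 12, 21, 16, 15, 18])

# The test-retest reliability estimate is the Pearson correlation
# between the two sets of scores.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r_test_retest:.2f}")
```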
Parallel/ALTERNATE forms
reliability
 The degree of relationship between various
forms of a test can be evaluated by means of an
alternate-forms or parallel-forms coefficient of
reliability, known as the coefficient of equivalence.
 Parallel forms: when the means and variances for
both forms are equal.
 Alternate forms: different versions of the test
constructed so as to be parallel.
 Equivalent with respect to content and difficulty.
Parallel/ALTERNATE forms
reliability
 Two test administrations for the same
group are required.
 Test scores may be affected by
factors such as motivation, fatigue,
practice, learning, or therapy
Parallel/ALTERNATE forms
reliability
 Sources of error variance are the same as in
test-retest reliability
 An additional source is item sampling
The success of this method hinges on the
equivalence of the two forms of the test.
SPLIT HALF RELIABILITY
 An estimate obtained by correlating two pairs of scores
obtained from equivalent halves of a single test administered
once.
 Useful when it is impractical or undesirable to assess reliability
with two tests or administer a test twice.
 Ways to divide the test in half:
 Randomly assign items to the two halves
 Odd-numbered items to one half of the test and even-numbered items to the
other half (odd-even reliability)
 Make the halves equivalent with respect to content and difficulty
After dividing the test into halves, a correlation between the half-scores is
computed and then corrected with the Spearman-Brown formula (see the sketch
below).
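A minimal sketch of the odd-even procedure in Python, assuming a
hypothetical examinees-by-items score matrix (all values made up): correlate
the half-test totals, then apply the Spearman-Brown correction for doubling
the test length.

```python
import numpy as np

# Hypothetical item-response matrix: 6 test takers x 8 items (made-up data).
scores = np.array([
    [4, 3, 4, 5, 3, 4, 4, 5],
    [2, 2, 3, 2, 1, 2, 2, 3],
    [5, 4, 5, 5, 4, 5, 4, 5],
    [3, 3, 2, 3, 3, 3, 2, 3],
    [1, 2, 1, 2, 2, 1, 1, 2],
    [4, 4, 5, 4, 4, 4, 5, 4],
])

# Odd-even split: odd-numbered items (columns 0, 2, ...) vs. even-numbered.
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)

# Pearson correlation between the two half-test scores.
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction for the full-length test (doubling the halves).
r_split_half = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.3f}, split-half reliability = {r_split_half:.3f}")
```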
internal consistency
 Inter-item consistency refers to the degree
of correlation among all items on a scale.
 It is useful in assessing the homogeneity of a
test, i.e., the degree to which the test contains
items that measure a single trait.
 Heterogeneity: the degree to which a test
measures different factors.
Spearman Brown formula
 Spearman Brown formula is used to
estimate internal consistency reliability
from a correlation of two halves of a test.
 It is used to assess reliability for various
lengths of a test.
 How does shortening a test affect
reliability?
 How many items are needed to attain a
desired level of reliability?
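The slide names the formula without stating it; in its standard form, the
Spearman-Brown prophecy formula predicts the reliability of a test whose
length is changed by a factor n: r' = nr / (1 + (n - 1)r), where r is the
reliability of the existing test. A small Python helper (the example values
are hypothetical):

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test's length changes by a factor of n.

    r: reliability of the existing test (e.g., a half-test correlation)
    n: ratio of new length to old length (n=2 doubles, n=0.5 halves)
    """
    return (n * r) / (1 + (n - 1) * r)

# Doubling a half-test whose halves correlate .70 -> predicted reliability .824
print(round(spearman_brown(0.70, 2), 3))
```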
Other Methods of measuring
internal consistency
 Kuder-Richardson formula: inter-item
consistency for dichotomous items.
 Coefficient Alpha: developed by Cronbach
(1951); the mean of all possible split-half
correlations, corrected by the Spearman-Brown
formula.
 Appropriate for non-dichotomous items.
 Ranges from 0 to 1.
 Preferred statistic for measuring internal
consistency.
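In practice, coefficient alpha is usually computed from item and total-score
variances rather than by literally averaging every split-half correlation;
the standard variance form is α = (k / (k - 1)) · (1 - Σσ²_item / σ²_total)
for k items. A minimal sketch (any examinees-by-items array, such as the
hypothetical matrix in the split-half sketch, would work):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha from an (examinees x items) score matrix."""
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```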
Other Methods of measuring
internal consistency
 Average Proportional Distance (APD): a
measure of the internal consistency of a test
that focuses on the degree of difference
between item scores.