Reliability of Test
Dr. Sarat Kumar Rout
Assistant Professor, Department of Education
Ravenshaw University
Email: saratrout2007@rediffmail.com
Meaning of Reliability
• Reliability refers to the precision or accuracy of measured scores.
• Reliability refers to the stability of a test measure.
• Reliability is the degree to which a Practice, Procedure, or
Test (PPT) produces stable and consistent results when
repeated on the same individuals/students on different
occasions, or with different sets of equivalent items, when
all other factors are held constant.
Meaning of Reliability
• Reliability is one of the important
characteristics of a good test.
• (explanation and generalization of results)
Example of tests:
Achievement test;
Intelligence test;
Creativity test; and
Personality test…..etc
Logical Meaning of Reliability of a Test
• Whenever we measure something (attribute or trait)
either in the Physical or Social science, the
measurement involves some kind of error.
(Sources of error – observers/scoring, instruments,
instability of the attribute, guessing…..etc)
• In other words, reliability is the extent to which the
Practice, Procedure, or Test (PPT) is free from error
(random/measurement and systematic) in any
measurement (Physical science or Social science).
Logical Meaning of Reliability of a Test
• In terms of an equation, it can be written as:
XT = X∞ + Xe
where
XT = the actual obtained score
X∞ = the true score
Xe = the error score
Logical Meaning of Reliability of a Test
• Whenever we administer a test to examinees,
we would like to know how much of their
scores reflects "truth" and how much reflects
“error”.
• A measure of reliability provides us with an
estimate of the proportion of variability in
examinees' obtained scores that is due to true
differences among examinees on the attribute(s)
measured by the test.
Logical Meaning of Reliability of a Test
• When measurement is free from error, reliability
is perfect and the reliability index is +1.00.
• But reliability is never perfect.
Logical Meaning of Reliability of a Test
• Since any obtained score is the sum of the true
score and an error score, the total variance of a test
is likewise divided into two components: true
variance and error variance.
• Variance= square of the standard deviation
• In terms of equation, it may be written as:
σ²T = σ²∞ + σ²e
σ²T = total score variance
σ²∞ = true score variance
σ²e = error score variance
Logical Meaning of Reliability of a Test
• Thus the variance of the total score is equal to the
variance of the true score plus the variance of the error score.
• In classical test theory, the reliability of test scores
is logically defined as:
“the proportion of the true variance”
• The proportions of true variance and error
variance are found by dividing each by the total variance:
• Proportion of the true variance = σ²∞ ⁄ σ²T
• Proportion of the error variance = σ²e ⁄ σ²T
Logical Meaning of Reliability of a Test
• Now, the reliability coefficient rtt = σ²∞ ⁄ σ²T ………..(i)
or
• reliability coefficient rtt = 1 − σ²e ⁄ σ²T ……………(ii)
• Suppose an achievement test in mathematics is
administered to a group of 50 students. The
hypothetical total score variance, true score
variance and error score variance are as follows:
• Total variance = 58.36, true variance = 43.19 and error
variance = 15.17
• By equation (i): σ²∞ ⁄ σ²T = 43.19/58.36 = 0.74
• By equation (ii): 1 − σ²e ⁄ σ²T = 1 − 15.17/58.36 = 1 − 0.26 = 0.74
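The worked example above can be sketched in a few lines of Python, using the slide's hypothetical variances:

```python
# Reliability coefficient from the hypothetical variances in the example above.
total_var = 58.36   # total score variance
true_var = 43.19    # true score variance
error_var = 15.17   # error score variance

r_tt_i = true_var / total_var        # equation (i): true variance / total variance
r_tt_ii = 1 - error_var / total_var  # equation (ii): 1 - error variance / total variance

print(round(r_tt_i, 2))   # 0.74
print(round(r_tt_ii, 2))  # 0.74
```

Both equations give the same result because the true and error variances sum (to within rounding) to the total variance.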
What is reliability coefficient?
• Study Tip: Remember that, in contrast to other
correlation coefficients, the reliability
coefficient is never squared to interpret it but is
interpreted directly as a measure of true score
variability. A reliability coefficient of .89
means that 89% of variability in obtained
scores is true score variability.
What is reliability coefficient?
• The reliability coefficient is symbolized with
the letter "r" and a subscript that contains
two of the same letters or numbers (e.g., "rtt").
• The subscript indicates that the correlation
coefficient was calculated by correlating a test
with itself rather than with some other
measure.
What is reliability coefficient?
• Most methods for estimating reliability produce a
reliability coefficient, which is a correlation
coefficient that ranges in value from 0.0 to +1.0.
• When a test's reliability coefficient is 0.0, this
means that all variability in obtained test scores is
due to measurement error.
• Conversely, when a test's reliability coefficient is +
1.0, this indicates that all variability in scores
reflects true score variability.
What is reliability coefficient?
Taken from page 3-3 of the U.S. Department of Labor’s “Testing and Assessment:
An Employer’s Guide to Good Practices” (2000).
http://www.onetcenter.org/dl_files/empTestAsse.pdf
What is reliability coefficient?
• Note that a reliability coefficient does not provide
any information about what is actually being
measured by a test.
• A reliability coefficient only indicates whether the
attribute measured by the test— whatever it is—
is being assessed in a consistent, precise way.
• Whether the test is actually assessing what it was
designed to measure is addressed by an analysis
of the test's validity.
Methods of Estimating Reliability Coefficient
•A test's true score variance is not known,
however, so reliability must be estimated rather
than calculated directly.
•There are several ways to estimate a test's
reliability coefficient:
1.Test-Retest Reliability
2.Alternate Forms Reliability
3.Internal Consistency Reliability
•Each involves assessing the consistency of an
examinee's scores over time, across different
content samples, or across different scorers.
Methods for Estimating Reliability
• The common assumption underlying each of these
reliability techniques is that consistent variability is true
score variability, while inconsistent variability reflects
random error.
• The selection of a method for estimating reliability
depends on the nature of the test.
• Each method not only entails different procedures but
is also affected by different sources of error. For many
tests, more than one method should be used.
1. Test-Retest Reliability
• The test-retest method for estimating reliability
involves administering the same test to the same
group of examinees on two different occasions
and then correlating the two sets of scores.
• When using this method, the reliability
coefficient indicates the degree of stability
(consistency) of examinees' scores over time and
is also known as the coefficient of stability.
Test-Retest Reliability
• The primary sources of measurement error for
test-retest reliability are any random factors
related to the time that passes between the
two administrations of the test.
• These time sampling factors include random
fluctuations in examinees over time (e.g.,
changes in anxiety or motivation) and random
variations in the testing situation.
Test-Retest Reliability
•Memory and practice also contribute to error when they
have random carryover effects; i.e., when they affect
many or all examinees but not in the same way.
Despite these limitations:
•Test-retest reliability is appropriate for measuring attributes
that are relatively stable over time.
(Aptitude; Achievement – speed and power tests)
•Test-retest reliability is also appropriate for measuring
heterogeneous tests.
2. Alternate (Equivalent, Parallel) Forms Reliability
• To assess a test's alternate forms reliability, two
equivalent forms of the test are administered to the
same group of examinees and the two sets of scores
are correlated.
• Alternate forms reliability indicates the consistency
of responding to different item samples (the two
test forms) and, when the forms are administered at
different times, the consistency of responding over
time.
Alternate (Equivalent, Parallel) Forms Reliability
• The alternate forms reliability coefficient is also
called the coefficient of equivalence when the two
forms are administered at about the same time.
• The primary source of measurement error for
alternate forms reliability is content sampling, time
sampling or error introduced by an interaction
between different examinees' knowledge and the
different content assessed by the items included in
the two forms (eg: Form A and Form B)
Alternate (Equivalent, Parallel) Forms Reliability
•The items in Form A might be a better match of one
examinee's knowledge than items in Form B, while
the opposite is true for another examinee.
•In this situation, the two scores obtained by each
examinee will differ, which will lower the alternate
forms reliability coefficient.
•When administration of the two forms is separated
by a period of time, time sampling factors also
contribute to error.
Alternate (Equivalent, Parallel) Forms Reliability
• Like test-retest reliability, alternate forms reliability is
not appropriate when the attribute measured by the
test is likely to fluctuate over time or when scores are
likely to be affected by repeated measurement.
• If the same strategies required to solve problems on
Form A are used to solve problems on Form B, even if
the problems on the two forms are not identical,
there are likely to be practice effects.
Alternate (Equivalent, Parallel) Forms Reliability
• When these effects differ for different examinees
(i.e., are random), practice will serve as a source of
measurement error.
• Although alternate forms reliability is considered by
some experts as the most rigorous method for
estimating reliability, it is not often assessed due to
the difficulty in developing two forms of the same
test that are truly equivalent. (Discuss criteria of
parallel test)
3. Internal Consistency Estimates of Reliability
• We have discussed that reliability estimates can be obtained
by administering the same test to the same examinees and by
correlating the results: Test/Retest.
• We have also seen that reliability estimates can be obtained
by administering two parallel or alternate forms of a test
and then correlating those results: Parallel & Alternate Forms.
• In both of the above cases, the test constructor or researcher
must administer two exams, and they are sometimes given at
different times to reduce carryover effects.
• Here we will see that it is also possible to obtain a reliability
estimate using only a single test.
• The most common way to obtain a reliability estimate from
a single test is the split-half approach/method.
Split-Half approach to Reliability
• When using the Split-half approach, one gives a
single test to a group of examinees.
• Later, the test is divided into two parts, which may be
considered to be alternate forms of one another.
• In fact, the split is not arbitrary; an attempt should
be made to choose the two halves so that they are
parallel or essentially equivalent, e.g., by the odd-even
method.
• Then the reliability of the whole test is estimated by
using the Spearman-Brown formula.
Split-Half approach to Reliability
• Using the Spearman-Brown formula:
• Here we assume the two test halves (t and t’) are
parallel forms.
• The two halves are then correlated, producing the half-test
reliability coefficient, rtt’.
• But this is only a measure of the reliability of one half
of the test.
• The reliability of the entire test will be greater than
the reliability of a half test.
• The Spearman-Brown formula for estimating the reliability of the
entire test/whole test is therefore:
• Reliability coefficient rh = 2 × rtt’ / (1 + rtt’)
Split-Half approach to Reliability
Reliability coefficient of the half test (rtt’) and of the
entire/whole test (rh):
Half test (rtt’)   Whole test (rh)
0.00               0.00
0.20               0.33
0.40               0.57
0.60               0.75
0.80               0.89
1.00               1.00
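The table above can be reproduced with a small Python function implementing the Spearman-Brown correction:

```python
# Spearman-Brown correction: whole-test reliability from a half-test correlation.
def spearman_brown(r_half: float) -> float:
    """Estimate entire-test reliability rh from the half-test coefficient rtt'."""
    return 2 * r_half / (1 + r_half)

# Reproduce the half-test / whole-test table above.
for r in [0.00, 0.20, 0.40, 0.60, 0.80, 1.00]:
    print(f"{r:.2f} -> {spearman_brown(r):.2f}")
# 0.00 -> 0.00, 0.20 -> 0.33, 0.40 -> 0.57, 0.60 -> 0.75, 0.80 -> 0.89, 1.00 -> 1.00
```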
ii. Cronbach’s coefficient α approach
• On the other hand, the two test halves may not be
parallel forms.
• This is confirmed when it is determined that the two
halves have unequal variances.
• In these situations, it is best to use a different
approach to estimating reliability.
• Cronbach’s coefficient α
• α can be used to estimate the reliability of the entire test.
Cronbach’s coefficient α approach
Cronbach’s coefficient α = 2 [σh² − (σt1² + σt2²)] / σh²
σh² = variance of the entire test, h
σt1² = variance of the half test, t1
σt2² = variance of the half test, t2
• If the variances of the two test halves are equal,
the Spearman-Brown formula and Cronbach’s α
will produce identical results.
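A minimal sketch of the two-half Cronbach's α, computing the three variances from hypothetical half-test scores (the score lists are illustrative, not from the slides):

```python
# Two-half Cronbach's alpha from variances of the whole test and its halves.
from statistics import pvariance

half1 = [10, 14, 12, 18, 16, 11]  # hypothetical scores on half t1
half2 = [11, 13, 14, 17, 15, 10]  # hypothetical scores on half t2
whole = [a + b for a, b in zip(half1, half2)]  # entire-test scores

var_h = pvariance(whole)  # variance of the entire test
alpha = 2 * (var_h - (pvariance(half1) + pvariance(half2))) / var_h
print(round(alpha, 2))
```

Unlike the Spearman-Brown correction, this form does not assume the halves have equal variances.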
• Content sampling is a source of error for both split-half
reliability and coefficient alpha.
• For split-half reliability, content sampling refers to the
error resulting from differences between the content of
the two halves of the test (i.e., the items included in
one half may better fit the knowledge of some
examinees than items in the other half).
• For coefficient alpha, content (item) sampling refers to
differences between individual test items rather than
between test halves.
iii. Kuder-Richardson Formulas-20 & 21
When test items are scored dichotomously
(right or wrong), a variation of coefficient
alpha known as the Kuder-Richardson
Formula 20 (KR-20) can be used.
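KR-20 can be stated as KR-20 = (k/(k−1)) · (1 − Σpq/σ²T), where k is the number of items, p is the proportion answering each item correctly, q = 1 − p, and σ²T is the total-score variance. A sketch with a hypothetical response matrix:

```python
# KR-20 for dichotomously scored items; the response matrix is hypothetical.
from statistics import pvariance

# rows = examinees, columns = items (1 = right, 0 = wrong)
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
k = len(responses[0])                     # number of items
totals = [sum(row) for row in responses]  # total score per examinee

# Sum of p*q over items, where p = proportion correct on the item.
pq_sum = 0.0
for i in range(k):
    p = sum(row[i] for row in responses) / len(responses)
    pq_sum += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
print(round(kr20, 2))
```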
Kuder-Richardson Formulas-20 & 21
Internal Consistency Reliability
•The methods for assessing internal consistency
reliability are useful when a test is designed to
measure a single characteristic, when the characteristic
measured by the test fluctuates over time, or when
scores are likely to be affected by repeated exposure to
the test.
•They are not appropriate for assessing the reliability of
speed tests because, for these tests, they tend to
produce spuriously high coefficients. (For speed tests,
alternate forms reliability is usually the best choice.)
Factors That Affect The Reliability Coefficient
The magnitude of the reliability coefficient is affected
not only by the sources of error discussed earlier, but
also by the length of the test, the range of the test
scores, and the probability that the correct response
to items can be selected by guessing.
– Test Length
– Range of Test Scores
– Guessing
1. Test Length
•The larger the sample of the attribute being
measured by a test, the less the relative effects of
measurement error and the more likely the sample
will provide dependable, consistent information.
•Consequently, a general rule is that the longer the
test, the larger the test's reliability coefficient.
•The Spearman-Brown prophecy formula is most
associated with split-half reliability but can actually
be used whenever a test developer wants to
estimate the effects of lengthening or shortening a
test on its reliability coefficient.
Test Length
For instance, if a 100-item test has a reliability
coefficient of .84, the Spearman-Brown formula
could be used to estimate the effects of increasing
the number of items to 150 or reducing the number
to 50.
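The general prophecy formula is r_new = n·r / (1 + (n − 1)·r), where n is the factor by which the test length changes. Applying it to the 100-item example above:

```python
# Spearman-Brown prophecy formula: predicted reliability when test length
# is multiplied by factor n (n > 1 lengthens, n < 1 shortens the test).
def prophecy(r: float, n: float) -> float:
    return n * r / (1 + (n - 1) * r)

# 100-item test with r = .84, as in the example above.
print(round(prophecy(0.84, 150 / 100), 2))  # lengthened to 150 items: 0.89
print(round(prophecy(0.84, 50 / 100), 2))   # shortened to 50 items: 0.72
```

Note that n = 2 reduces this to the split-half correction 2r/(1 + r).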
A problem with the Spearman-Brown formula is that
it does not always yield an accurate estimate of
reliability: In general, it tends to overestimate a test's
true reliability (Gay, 1992).
Test Length
• This is most likely to be the case when the added
items do not measure the same content domain
as the original items and/or are more susceptible
to the effects of measurement error.
• Note that, when used to correct the split-half
reliability coefficient, the situation is more
complex, and this generalization does not always
apply: When the two halves are not equivalent in
terms of their means and standard deviations,
the Spearman-Brown formula may either over- or
underestimate the test's actual reliability.
2.Range of Test Scores
• Since the reliability coefficient is a correlation
coefficient, it is maximized when the range of
scores is unrestricted.
• When examinees are heterogeneous, the
range of scores is maximized.
• The range is also affected by the difficulty
level of the test items.
Range of Test Scores
• When all items are either very difficult or very
easy, all examinees will obtain either low or
high scores, resulting in a restricted range.
• Therefore, the best strategy is to choose items
so that the average difficulty level is in the
mid-range (p = .50).
3. Guessing
• A test's reliability coefficient is also affected by
the probability that examinees can guess the
correct answers to test items.
• As the probability of correctly guessing answers
increases, the reliability coefficient decreases.
• All other things being equal, a true/false test will
have a lower reliability coefficient than a four-
alternative multiple-choice test which, in turn,
will have a lower reliability coefficient than a free
recall test.
General points about reliability
• No test is perfectly reliable or perfectly unreliable.
Reliability is not an absolute property; it is always
a matter of degree.
• Reliability is a necessary but not a sufficient condition
for validity.
• Reliability is primarily statistical.
Why reliability is an important characteristics of a good test ?
No matter how well the objectives are written, or how clever the
items, the quality and usefulness of an examination are predicated
on Validity and Reliability.
• Without reliability and validity, one cannot test
hypotheses.
• Without testing hypothesis, one cannot support a
theory.
• Without a supported theory, one cannot explain why
events occur.
• Without adequate explanation, one cannot develop any
effective material or non-material technologies.
Crayon Activity Handout For the Crayon A
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 

Reliability of test

  • 1. Reliability of Test Dr. Sarat Kumar Rout, Assist. Prof., Department of Education, Ravenshaw University. Email: saratrout2007@rediffmail.com
  • 2. Meaning of Reliability • It refers to the precision or accuracy of the measurement of a score. • Reliability refers to the stability of a test measure. • Reliability is the degree to which a Practice, Procedure, or Test (PPT) produces stable and consistent results when repeated/re-examined on the same individuals/students on different occasions, or with different sets of equivalent items, when all other factors are held constant.
  • 3. Meaning of Reliability • Reliability is one of the important characteristics of a good test. • (explanation and generalization of results) Examples of tests: achievement test; intelligence test; creativity test; personality test, etc.
  • 4. Logical Meaning of Reliability of a Test • Whenever we measure something (an attribute or trait) in either the physical or the social sciences, the measurement involves some kind of error. (Sources of error: observers/scoring, instruments, instability of the attribute, guessing, etc.) • Put another way, reliability is the extent to which the Practice, Procedure, or Test (PPT) is free from error (random/measurement and systematic) in any measurement, whether in the physical or the social sciences.
  • 5. Logical Meaning of Reliability of a Test • In terms of an equation, it can be written as: XT = X∞ + Xe, where XT = the actual obtained score, X∞ = the true score, and Xe = the error score.
  • 6. Logical Meaning of Reliability of a Test • Whenever we administer a test to examinees, we would like to know how much of their scores reflects "truth" and how much reflects "error". • A measure of reliability provides us with an estimate of the proportion of variability in examinees' obtained scores that is due to true differences among examinees on the attribute(s) measured by the test.
  • 7. Logical Meaning of Reliability of a Test • When measurement is free from error, reliability is perfect and the reliability index is +1.00. • But reliability is never perfect.
  • 8. Logical Meaning of Reliability of a Test • Since any obtained score is divided into a true score plus an error score, the total variance of a test is also divided into two components: true variance and error variance. • Variance = square of the standard deviation. • In terms of an equation, it may be written as: σ²T = σ²∞ + σ²e, where σ²T = total score variance, σ²∞ = true score variance, σ²e = error score variance.
  • 9. Logical Meaning of Reliability of a Test • Thus the variance of the total score equals the variance of the true score plus the variance of the error score. • In classical test theory, the reliability of test scores is logically defined as the "proportion of the true variance." • The proportions of true variance and error variance are found by dividing each by the total variance: • Proportion of true variance = σ²∞ / σ²T • Proportion of error variance = σ²e / σ²T
  • 10. Logical Meaning of Reliability of a Test • Now, reliability coefficient rtt = σ²∞ / σ²T ………..(i) or • reliability coefficient rtt = 1 − σ²e / σ²T ……………… (ii) • Suppose an achievement test in mathematics is administered to a group of 50 students. The hypothetical total score variance, true score variance, and error score variance are as follows: • Total variance = 58.36, true variance = 43.19, and error variance = 15.17 • By equation (i): σ²∞ / σ²T = 43.19/58.36 = 0.74 • By equation (ii): 1 − σ²e / σ²T = 1 − 15.17/58.36 = 1 − 0.26 = 0.74
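The worked example above can be sketched in code. This is a minimal illustration using only the hypothetical variances given on the slide; both equations (i) and (ii) necessarily agree, since true variance and error variance sum to the total variance.

```python
# Hypothetical variances from the slide's example (50 students).
total_var = 58.36   # total score variance, sigma^2_T
true_var = 43.19    # true score variance,  sigma^2_inf
error_var = 15.17   # error score variance, sigma^2_e

# Equation (i): reliability as the proportion of true variance.
r_tt_i = true_var / total_var

# Equation (ii): one minus the proportion of error variance.
r_tt_ii = 1 - error_var / total_var

print(round(r_tt_i, 2), round(r_tt_ii, 2))  # both 0.74
```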
  • 11. What is reliability coefficient? • Study Tip: Remember that, in contrast to other correlation coefficients, the reliability coefficient is never squared to interpret it but is interpreted directly as a measure of true score variability. A reliability coefficient of .89 means that 89% of variability in obtained scores is true score variability.
  • 12. What is reliability coefficient? •The reliability coefficient is symbolized with the letter "r" and a subscript that contains two of the same letters or numbers (e.g., ''rtt''). • The subscript indicates that the correlation coefficient was calculated by correlating a test with itself rather than with some other measure.
  • 13. What is reliability coefficient? • Most methods for estimating reliability produce a reliability coefficient, which is a correlation coefficient that ranges in value from 0.0 to +1.0. • When a test's reliability coefficient is 0.0, this means that all variability in obtained test scores is due to measurement error. • Conversely, when a test's reliability coefficient is +1.0, this indicates that all variability in scores reflects true score variability.
  • 14. What is reliability coefficient? Taken from page 3-3 of the U.S. Department of Labor’s “Testing and Assessment: An Employer’s Guide to Good Practices” (2000). http://www.onetcenter.org/dl_files/empTestAsse.pdf
  • 15. What is reliability coefficient? • Note that a reliability coefficient does not provide any information about what is actually being measured by a test. • A reliability coefficient only indicates whether the attribute measured by the test — whatever it is — is being assessed in a consistent, precise way. • Whether the test is actually assessing what it was designed to measure is addressed by an analysis of the test's validity.
  • 16. Methods of Estimating Reliability Coefficient • A test's true score variance is not known, however, so reliability must be estimated rather than calculated directly. • There are several ways to estimate a test's reliability coefficient: 1. Test-Retest Reliability 2. Alternate Forms Reliability 3. Internal Consistency Reliability • Each involves assessing the consistency of an examinee's scores over time, across different content samples, or across different scorers.
  • 17. Methods for Estimating Reliability • The common assumption for each of these reliability techniques is that consistent variability is true score variability, while variability that is inconsistent reflects random error. • The selection of a method for estimating reliability depends on the nature of the test. • Each method not only entails different procedures but is also affected by different sources of error. For many tests, more than one method should be used.
  • 18. 1. Test-Retest Reliability • The test-retest method for estimating reliability involves administering the same test to the same group of examinees on two different occasions and then correlating the two sets of scores. • When using this method, the reliability coefficient indicates the degree of stability (consistency) of examinees' scores over time and is also known as the coefficient of stability.
  • 19. Test-Retest Reliability • The primary sources of measurement error for test-retest reliability are any random factors related to the time that passes between the two administrations of the test. • These time sampling factors include random fluctuations in examinees over time (e.g., changes in anxiety or motivation) and random variations in the testing situation.
  • 20. Test-Retest Reliability • Memory and practice also contribute to error when they have random carryover effects, i.e., when they affect many or all examinees but not in the same way. • Despite these limitations, test-retest reliability is appropriate for measuring attributes that are relatively stable over time (aptitude, achievement — speed and power tests). • Test-retest reliability is also appropriate for measuring heterogeneous tests.
  • 21. 2. Alternate (Equivalent, Parallel) Forms Reliability • To assess a test's alternate forms reliability, two equivalent forms of the test are administered to the same group of examinees and the two sets of scores are correlated. • Alternate forms reliability indicates the consistency of responding to different item samples (the two test forms) and, when the forms are administered at different times, the consistency of responding over time.
  • 22. Alternate (Equivalent, Parallel) Forms Reliability • The alternate forms reliability coefficient is also called the coefficient of equivalence when the two forms are administered at about the same time. • The primary sources of measurement error for alternate forms reliability are content sampling, time sampling, or error introduced by an interaction between different examinees' knowledge and the different content assessed by the items included in the two forms (e.g., Form A and Form B).
  • 23. Alternate (Equivalent, Parallel) Forms Reliability •The items in Form A might be a better match of one examinee's knowledge than items in Form B, while the opposite is true for another examinee. •In this situation, the two scores obtained by each examinee will differ, which will lower the alternate forms reliability coefficient. •When administration of the two forms is separated by a period of time, time sampling factors also contribute to error.
  • 24. Alternate (Equivalent, Parallel) Forms Reliability • Like test-retest reliability, alternate forms reliability is not appropriate when the attribute measured by the test is likely to fluctuate over time or when scores are likely to be affected by repeated measurement. • If the same strategies required to solve problems on Form A are used to solve problems on Form B, even if the problems on the two forms are not identical, there are likely to be practice effects.
  • 25. Alternate (Equivalent, Parallel) Forms Reliability • When these effects differ for different examinees (i.e., are random), practice will serve as a source of measurement error. • Although alternate forms reliability is considered by some experts to be the most rigorous method for estimating reliability, it is not often assessed due to the difficulty of developing two forms of the same test that are truly equivalent. (Discuss criteria of parallel tests)
  • 26. 3. Internal Consistency Estimates of Reliability • We have discussed that reliability estimates can be obtained by administering the same test to the same examinees and correlating the results: test/retest. • We have also seen that reliability estimates can be obtained by administering two parallel or alternate forms of a test and then correlating those results: parallel and alternate forms. • In both of the above cases, the test constructor or researcher must administer two exams, and they are sometimes given at different times to reduce carryover effects. • Here we will see that it is also possible to obtain a reliability estimate using only a single test. • The most common way to obtain a reliability estimate from a single test is through the split-half approach/method.
  • 27. Split-Half approach to Reliability • When using the split-half approach, one gives a single test to a group of examinees. • Later, the test is divided into two parts, which may be considered to be alternate forms of one another. • In fact, the split should not be arbitrary; an attempt should be made to choose the two halves so that they are parallel or essentially equivalent, i.e., the odd-even method. • Then the reliability of the whole test is estimated by using the Spearman-Brown formula.
  • 28. Split-Half approach to Reliability • Using the Spearman-Brown formula: • Here we assume the two test halves (t and t') are parallel forms. • The two halves are correlated, producing the half-test reliability coefficient, rtt'. • But this is only a measure of the reliability of one half of the test. • The reliability of the entire test will be greater than the reliability of the half test. • The Spearman-Brown formula for estimating the reliability of the entire/whole test is therefore: • Reliability coefficient rhh' = 2rtt' / (1 + rtt')
  • 29. Split-Half approach to Reliability • Reliability coefficient of the half test (rtt') and of the entire/whole test (rhh'): 0.00 → 0.00; 0.20 → 0.33; 0.40 → 0.57; 0.60 → 0.75; 0.80 → 0.89; 1.00 → 1.00
  • 30. ii. Cronbach’s coefficient α approach • On the other hand, the two test halves may not be parallel forms. • This is confirmed when it is determined that the two halves have unequal variances. • In these situations, it is best to use a different approach to estimating reliability: Cronbach’s coefficient α. • α can be used to estimate the reliability of the entire test.
  • 31. Cronbach’s coefficient α approach • Cronbach’s coefficient α = 2[σh² − (σt1² + σt2²)] / σh², where σh² = variance of the entire test h, σt1² = variance of the half test t1, σt2² = variance of the half test t2. • If the variances of the two test halves are equal, then the Spearman-Brown formula and Cronbach’s α will produce identical results.
  • 32. • Content sampling is a source of error for both split-half reliability and coefficient alpha. • For split-half reliability, content sampling refers to the error resulting from differences between the content of the two halves of the test (i.e., the items included in one half may better fit the knowledge of some examinees than items in the other half). • For coefficient alpha, content (item) sampling refers to differences between individual test items rather than between test halves.
  • 33. iii. Kuder-Richardson Formulas-20 & 21 When test items are scored dichotomously (right or wrong), a variation of coefficient alpha known as the Kuder-Richardson Formula 20 (KR-20) can be used.
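The slide names KR-20 without stating it; the standard formula is KR-20 = (k/(k−1)) · (1 − Σpq/σ²X), where k is the number of items, p is the proportion of examinees answering an item correctly, q = 1 − p, and σ²X is the variance of total scores. A sketch with a hypothetical dichotomous item matrix:

```python
# Sketch of KR-20 for dichotomously scored (right/wrong) items.
# The 6-examinee x 5-item score matrix is hypothetical.
from statistics import pvariance

item_scores = [
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1],
]

k = len(item_scores[0])   # number of items
n = len(item_scores)      # number of examinees

# Total-score variance (population variance of examinee totals).
totals = [sum(row) for row in item_scores]
var_total = pvariance(totals)

# p = proportion correct per item; q = 1 - p.
p = [sum(row[j] for row in item_scores) / n for j in range(k)]
sum_pq = sum(pj * (1 - pj) for pj in p)

# KR-20 = (k / (k - 1)) * (1 - sum(pq) / total variance)
kr20 = (k / (k - 1)) * (1 - sum_pq / var_total)
print(round(kr20, 2))
```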
  • 35. Internal Consistency Reliability •The methods for assessing internal consistency reliability are useful when a test is designed to measure a single characteristic, when the characteristic measured by the test fluctuates over time, or when scores are likely to be affected by repeated exposure to the test. •They are not appropriate for assessing the reliability of speed tests because, for these tests, they tend to produce spuriously high coefficients. (For speed tests, alternate forms reliability is usually the best choice.)
  • 36. Factors That Affect The Reliability Coefficient The magnitude of the reliability coefficient is affected not only by the sources of error discussed earlier, but also by the length of the test, the range of the test scores, and the probability that the correct response to items can be selected by guessing. – Test Length – Range of Test Scores – Guessing
  • 37. 1. Test Length •The larger the sample of the attribute being measured by a test, the less the relative effects of measurement error and the more likely the sample will provide dependable, consistent information. •Consequently, a general rule is that the longer the test, the larger the test's reliability coefficient. •The Spearman-Brown prophecy formula is most associated with split-half reliability but can actually be used whenever a test developer wants to estimate the effects of lengthening or shortening a test on its reliability coefficient.
  • 38. Test Length For instance, if a 100-item test has a reliability coefficient of .84, the Spearman-Brown formula could be used to estimate the effects of increasing the number of items to 150 or reducing the number to 50. A problem with the Spearman-Brown formula is that it does not always yield an accurate estimate of reliability: In general, it tends to overestimate a test's true reliability (Gay, 1992).
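The lengthening example above follows the general Spearman-Brown prophecy formula r_new = n·r / (1 + (n − 1)·r), where n is the factor by which the test length is multiplied. A small sketch:

```python
# Spearman-Brown prophecy formula: predicted reliability when a test's
# length is multiplied by a factor n (n > 1 lengthens, n < 1 shortens).
def prophecy(r: float, n: float) -> float:
    """Predicted reliability of a test whose length is scaled by n."""
    return (n * r) / (1 + (n - 1) * r)

# The slide's example: a 100-item test with reliability .84.
r_100 = 0.84
print(round(prophecy(r_100, 150 / 100), 2))  # lengthened to 150 items
print(round(prophecy(r_100, 50 / 100), 2))   # shortened to 50 items
```

Consistent with the general rule stated above, the predicted coefficient rises when items are added and falls when items are removed; as the slide cautions, the formula assumes the added items sample the same content domain, so it tends to overestimate in practice.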
  • 39. Test Length • This is most likely to be the case when the added items do not measure the same content domain as the original items and/or are more susceptible to the effects of measurement error. • Note that, when used to correct the split-half reliability coefficient, the situation is more complex, and this generalization does not always apply: When the two halves are not equivalent in terms of their means and standard deviations, the Spearman-Brown formula may either over- or underestimate the test's actual reliability.
  • 40. 2.Range of Test Scores • Since the reliability coefficient is a correlation coefficient, it is maximized when the range of scores is unrestricted. • When examinees are heterogeneous, the range of scores is maximized. • The range is also affected by the difficulty level of the test items.
  • 41. Range of Test Scores • When all items are either very difficult or very easy, all examinees will obtain either low or high scores, resulting in a restricted range. • Therefore, the best strategy is to choose items so that the average difficulty level is in the mid-range (p = .50).
  • 42. 3. Guessing • A test's reliability coefficient is also affected by the probability that examinees can guess the correct answers to test items. • As the probability of correctly guessing answers increases, the reliability coefficient decreases. • All other things being equal, a true/false test will have a lower reliability coefficient than a four-alternative multiple-choice test which, in turn, will have a lower reliability coefficient than a free recall test.
  • 43. General points about reliability • No test is perfectly reliable or perfectly unreliable. Reliability is not an absolute property; it is always a matter of degree. • Reliability is a necessary but not a sufficient condition for validity. • Reliability is primarily statistical.
  • 45. Why is reliability an important characteristic of a good test? No matter how well the objectives are written, or how clever the items, the quality and usefulness of an examination are predicated on validity and reliability. • Without reliability and validity, one cannot test hypotheses. • Without testing hypotheses, one cannot support a theory. • Without a supported theory, one cannot explain why events occur. • Without adequate explanation, one cannot develop any effective material or non-material technologies.