Reliability
Evaluation of Measurement Instruments
• Reliability has to do with the consistency of the instrument.
- Internal Consistency (Consistency of the items)
- Test-retest Reliability (Consistency over time)
- Interrater Reliability (Consistency between raters)
- Split-half Methods
- Alternate Forms Methods
• Validity of an instrument has to do with the ability to
measure what it is supposed to measure and the extent to
which it predicts outcomes.
- Face Validity
- Construct & Content Validity
- Convergent & Divergent Validity
- Predictive Validity
- Discriminant Validity
Reliability
• Reliability is synonymous with consistency. It is the degree to
which test scores for an individual test taker or group of test
takers are consistent over repeated applications.
• No psychological test is completely consistent; however, a
measurement that is unreliable is worthless.
For Example
A student receives a score of 100 on one intelligence test and
114 on another, or imagine that every time you stepped on a
scale it showed a different weight.
Would you keep using these measurement tools?
• The consistency of test scores is critically important in
determining whether a test can provide good measurement.
Reliability (cont.)
• Because no unit of measurement is exact, any time you measure
something (observed score), you are really measuring two things
1. True Score - the amount of observed score that truly represents
what you are intending to measure.
2. Error Component - the amount of other variables that can impact
the observed score
Observed Test Score = True Score + Errors of Measurement
For Example - if you weigh yourself today and weigh 140 lbs. and
then weigh yourself tomorrow and weigh 142 lbs., is the 2 pound
increase a true measure of your weight gain or could other variables
be involved?
Other variables may include: food intake, placement of scale, error
in the scale itself.
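The weight example above can be sketched as a tiny simulation, assuming (purely as an illustration) a Gaussian error component around a fixed true score; the function name and numbers are hypothetical:

```python
import random

random.seed(0)  # make the illustration repeatable

TRUE_WEIGHT = 140.0  # the hypothetical "true score" in lbs

def observed_weight():
    # Error component: food intake, scale placement, error in the scale
    # itself, etc., modeled here (an assumption) as random noise.
    error = random.gauss(0, 1.5)
    return TRUE_WEIGHT + error

# Five weigh-ins: each observed score = true score + a different error
readings = [round(observed_weight(), 1) for _ in range(5)]
print(readings)
```

The readings cluster around 140 lbs but rarely equal it exactly, which is the sense in which an observed score always carries an error component.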
Why Do Test Scores Vary?
Possible Sources of Variability of Scores (pg. 110)
- General Ability to comprehend instructions
- Stable response sets (e.g., answering “C” option more frequently)
- The element of chance of getting a question right
- Conditions of testing
- Unreliability or bias in grading or rating performance
- Motivation
- Emotional Strain
Measurement Error
• Any fluctuation in test scores that results from factors related to
the measurement process that are irrelevant to what is being
measured.
• The difference between the observed score and the true score is
called the error score: S error = S observed - S true
• Developing better tests with less random measurement error is
better than simply documenting the amount of error.
Measurement Error is Reduced By:
- Writing items clearly
- Making instructions easily understood
- Adhering to proper test administration
- Providing consistent scoring
Determining Reliability
• There are several ways that a measurement's reliability
can be determined, depending on the type of
measurement and the supporting data required.
They include:
- Internal Consistency
- Test-retest Reliability
- Interrater Reliability
- Split-half Methods
- Odd-even Reliability
- Alternate Forms Methods
Internal Consistency
• Measures the reliability of a test based solely on the number of
items on the test and the intercorrelation among the items.
Therefore, it compares each item to every other item.
• If a scale is measuring a construct, then overall the items on that
scale should be highly correlated with one another.
• There are two common ways of measuring internal consistency …
1. Cronbach’s Alpha: .80 to .95 (Excellent)
.70 to .80 (Very Good)
.60 to .70 (Satisfactory)
<.60 (Suspect)
2. Item-Total Correlations - the correlation of the item with the
remainder of the items (.30 is the minimum acceptable item-total
correlation).
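As a minimal sketch (the data and function name are hypothetical), Cronbach's alpha can be computed from the item variances and the variance of the total scores:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item (columns are respondents)."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]        # total score per respondent
    sum_item_var = sum(pvariance(it) for it in items)   # sum of the item variances
    total_var = pvariance(totals)                       # variance of the total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Hypothetical 4-item scale answered by 5 respondents
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [3, 3, 4, 2, 5],
    [4, 3, 5, 2, 3],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))  # → 0.9, "excellent" by the ranges above
```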
Internal Consistency (cont.)
Internal consistency estimates are a function of:
The Number of Items - if we think of each test item as an
observation of behaviour, more items strengthen the estimate
--- i.e., there is more behaviour to observe.
Average Intercorrelation - the extent to which each item represents
the observation of the same thing observed.
The more you observe a construct, with greater consistency = Reliability
Split Half & Odd-Even Reliability
Split Half - refers to determining a correlation between the first
half of the measurement and the second half of the measurement
(i.e., we would expect answers to the first half to be similar to the
second half).
Odd-Even - refers to the correlation between even items and odd
items of a measurement tool.
• In this sense, we are using a single test to create two tests,
eliminating the need for additional items and multiple
administrations.
• Since in both of these types only 1 administration is needed and
the groups are determined by the internal components of the test,
it is referred to as an internal consistency measure.
Split Half & Odd-Even Reliability
Possible Advantages
• Simplest method - easy to perform
• Time and Cost Effective
Possible Disadvantages
• Many ways of splitting the items
• Each split yields a somewhat different reliability estimate
• Which is the real reliability of the test?
Test-retest Reliability
• Test-retest reliability is usually measured by computing
the correlation coefficient between scores of two
administrations.
Test-retest Reliability (cont.)
• The amount of time allowed between measures is critical.
• The shorter the time gap, the higher the correlation; the longer
the time gap, the lower the correlation, because real change can
occur between the two observations as time passes.
• Optimum time between administrations is 2 to 4 weeks.
• If a scale is measuring a construct consistently, then there should
not be radical changes on the scores between administrations ---
unless something significant happened.
• The rationale behind this method is that any difference between
the scores on the test and the retest should be due solely to
measurement error.
Test-retest Reliability (cont.)
• It is hard to specify one acceptable test-retest correlation
since what is considered acceptable depends on the type of
scale, the use of the scale, and the time between testings.
For example - it is not always clear whether differences in test
scores should be regarded as measurement error or as real change.
Possible sources of difference in scores between tests: experience,
the characteristic being measured may change over time
(e.g., a reading test), carryover effects (e.g., remembering the test)
Test-retest Reliability (cont.)
• A minimum correlation of at least .50 is expected.
• The higher the correlation (in a positive direction) the
higher the test-retest reliability
• The biggest problem with this type of reliability is what is
called the memory effect: a respondent may recall the answers
from the original test, thereby inflating the reliability
estimate.
• Also, is it practical?
Interrater Reliability
• Whenever you use humans as a part of your measurement
procedure, you have to worry about whether the results you get
are reliable or consistent. People are notorious for their
inconsistency. We are easily distractible. We get tired of doing
repetitive tasks. We daydream. We misinterpret.
Interrater Reliability (cont.)
• For some scales it is important to assess interrater
reliability.
• Interrater reliability means that if two different raters
scored the scale using the scoring rules, they should
attain the same result.
• Interrater reliability is usually measured by computing
the correlation coefficient between the scores of two
raters for the set of respondents.
• Here the criterion of acceptability is pretty high (e.g., a
correlation of at least .9), but what is considered
acceptable will vary from situation to situation.
Parallel/Alternate Forms Method
Parallel/Alternate Forms Method - refers to the
administration of two alternate forms of the same
measurement device and then comparing the
scores.
• Both forms are administered to the same person and
the scores are correlated. If the two produce the
same results, then the instrument is considered
reliable.
Parallel/Alternate Forms Method (cont.)
• A correlation between these two forms is computed just as in
the test-retest method.
Advantages
• Eliminates the problem of memory effect.
• Reactivity effects (i.e., experience of taking the test) are
also partially controlled.
• Can address a wider array of sampling of the entire
domain than the test-retest method.
Parallel/Alternate Forms Method (cont.)
Possible Disadvantages
• Are the two forms of the test actually measuring
the same thing?
• More Expensive
• Requires additional work to develop two
measurement tools.
Factors Affecting Reliability
• Administrator Factors
• Number of Items on the instrument
• The Instrument Taker
• Heterogeneity of the Items
• Heterogeneity of the Group Members
• Length of Time between Test and Retest
Administrator Factors
• Poor or unclear directions given during
administration or inaccurate scoring can affect
reliability.
For Example - say you were told that your scores on
being social determined your promotion. The results
are more likely to reflect what you think the
administrators want than what your behavior actually is.
Number of Items on the Instrument
• The larger the number of items, the greater the
chance for high reliability.
For Example - it makes sense when you consider that
twenty questions about your leadership style are more
likely to get a consistent result than four questions.
• Remedy: Use longer tests or accumulate
scores from short tests.
The Test Taker
For Example - if you took an instrument in August
when you had a terrible flu and then in December
when you were feeling quite good, we might see a
difference in your response consistency. If you were
under considerable stress of some sort, or if you were
interrupted while answering the instrument
questions, you might give different responses.
Heterogeneity
Heterogeneity of the Items -- The greater the
heterogeneity (differences in the kind or difficulty
of the questions) of the items, the greater
the chance for high reliability correlation
coefficients.
Heterogeneity of the Group Members -- The greater
the heterogeneity of the group members in the
preferences, skills, or behaviors being tested, the
greater the chance for high reliability correlation
coefficients.
Length of Time between Test and Retest
• The shorter the time, the greater the chance for high
reliability correlation coefficients.
• As we have experiences, we tend to adjust our views a little
from time to time. Therefore, the time interval between the
first time we took an instrument and the second time is
really an "experience" interval.
• Experience happens, and it influences how we see things.
Because internal consistency has no time lapse, one can
expect it to have the highest reliability correlation
coefficient.
How High Should Reliability Be?
• A highly reliable test is always preferable to a test with
lower reliability.
> .80 (Excellent)
.70 to .80 (Very Good)
.60 to .70 (Satisfactory)
<.60 (Suspect)
• A reliability coefficient of .80 indicates that 20% of the
variability in test scores is due to measurement error.
Generalizability Theory
Theory of measurement that attempts to determine the
sources of consistency and inconsistency
• It is necessary to obtain multiple observations for the same
group of individuals on all the variables that might contribute to
measurement error (e.g., scores across occasions, across
scorers, across alternative forms).
• Allows for the evaluation of interaction effects from different
types of error sources.
Generalizability Theory (cont.)
• If feasible, it is a more thorough procedure for identifying the
error component that may enter scores.
• Useful in complex measurement situations where:
1. The conditions of measurement affect test scores.
2. Test scores are used for several different purposes.
For Example - measurements involving subjectivity (e.g.,
interviews, rating scales) involve bias. Therefore, human
judgement could be considered a "condition of measurement."
Standard Error of Measurement (SEM)
• SEM is a statistic used to build confidence intervals around
obtained scores. It represents the spread of the hypothetical
distribution of scores we would see if someone took a test an
infinite number of times.
• A measure that allows one to predict the range of fluctuation that
is likely to occur in a single individual's score because of
irrelevant, chance factors. This measurement is used in analyzing
the reliability of the test in obtaining the "true" score.
• Indicates how much variability in test scores can be expected as
a result of measurement error.
• SEM is a function of two factors: the reliability of the test and
the variability of test scores. The formula for SEM is:
SEM = SD × √(1 − reliability)
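A direct translation of the formula, with hypothetical numbers (an IQ-style scale with SD = 15 and reliability = .89):

```python
import math

def standard_error_of_measurement(sd, reliability):
    # SEM = SD * sqrt(1 - reliability)
    return sd * math.sqrt(1 - reliability)

sem = standard_error_of_measurement(sd=15, reliability=0.89)
print(round(sem, 2))  # → 4.97
```

Note how the SEM shrinks as reliability rises: with reliability = 1.0 the SEM would be 0, i.e., no measurement error at all.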
Standard Error of Measurement (cont.)
• The most common use of the SEM is the production of the
confidence intervals. The SEM is an estimate of how much
error there is in a test.
• The SEM can be looked at in the same way as standard
deviations. Sixty-eight percent of the time, the true score
would fall between plus one SEM and minus one SEM of the
observed score; we could be 68% sure that the student's true
score lies within ±1 SEM. Within ±2 SEM, the true score
would be found about 95% of the time.
• Or, if the student took the test 100 times, about 68 times
the true score would fall within ±1 SEM.
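Following that logic, here is a hedged sketch of building the ±1 SEM and ±2 SEM bands around one hypothetical observed score (again assuming SD = 15 and reliability = .89):

```python
import math

def true_score_band(observed, sd, reliability, n_sems=1):
    """Return the (low, high) band of +/- n_sems SEMs around an observed score."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - n_sems * sem, observed + n_sems * sem

# Hypothetical observed score of 110
lo68, hi68 = true_score_band(110, 15, 0.89)            # ~68% band (+/- 1 SEM)
lo95, hi95 = true_score_band(110, 15, 0.89, n_sems=2)  # ~95% band (+/- 2 SEM)
print((round(lo68, 1), round(hi68, 1)))
print((round(lo95, 1), round(hi95, 1)))
```

The wider band buys more confidence at the cost of precision, which is exactly the trade-off the slide describes.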

More Related Content

Similar to Evaluation of Measurement Instruments.ppt

Characteristics of a good test
Characteristics of a good test Characteristics of a good test
Characteristics of a good test
Arash Yazdani
 
Reliability and validity1
Reliability and validity1Reliability and validity1
Reliability and validity1
MMIHS
 
Meaning and Methods of Estimating Reliability of Test.pptx
Meaning and Methods of Estimating Reliability of Test.pptxMeaning and Methods of Estimating Reliability of Test.pptx
Meaning and Methods of Estimating Reliability of Test.pptx
sarat68
 
Reliability of test
Reliability of testReliability of test
Reliability of test
Sarat Rout
 
Reliability and validity- research-for BSC/PBBSC AND MSC NURSING
Reliability and validity- research-for BSC/PBBSC AND MSC NURSINGReliability and validity- research-for BSC/PBBSC AND MSC NURSING
Reliability and validity- research-for BSC/PBBSC AND MSC NURSING
SUCHITRARATI1976
 
EM&E.pptx
EM&E.pptxEM&E.pptx
EM&E.pptx
Hafiz20006
 
Reliability & validity
Reliability & validityReliability & validity
Reliability & validityshefali84
 
What makes a good testA test is considered good” if the .docx
What makes a good testA test is considered good” if the .docxWhat makes a good testA test is considered good” if the .docx
What makes a good testA test is considered good” if the .docx
mecklenburgstrelitzh
 
RELIABILITY AND VALIDITY
RELIABILITY AND VALIDITYRELIABILITY AND VALIDITY
RELIABILITY AND VALIDITY
Joydeep Singh
 
Validity and reliability of the instrument
Validity and reliability of the instrumentValidity and reliability of the instrument
Validity and reliability of the instrument
Bhumi Patel
 
Questionnaire measurement (1).pptx
Questionnaire measurement (1).pptxQuestionnaire measurement (1).pptx
Questionnaire measurement (1).pptx
ChetanGarg52
 
JC-16-23June2021-rel-val.pptx
JC-16-23June2021-rel-val.pptxJC-16-23June2021-rel-val.pptx
JC-16-23June2021-rel-val.pptx
saurami
 
Characteristics of effective tests and hiring
Characteristics of effective tests and hiringCharacteristics of effective tests and hiring
Characteristics of effective tests and hiring
Binibining Kalawakan
 
Rep
RepRep
Rep
Cedy_28
 
Validity and reliability in assessment.
Validity and reliability in assessment. Validity and reliability in assessment.
Validity and reliability in assessment.
Tarek Tawfik Amin
 
Characteristics of Good Evaluation Instrument
Characteristics of Good Evaluation InstrumentCharacteristics of Good Evaluation Instrument
Characteristics of Good Evaluation Instrument
Suresh Babu
 
Ag Extn.504 :- RESEARCH METHODS IN BEHAVIOURAL SCIENCE
Ag Extn.504 :-  RESEARCH METHODS IN BEHAVIOURAL SCIENCE  Ag Extn.504 :-  RESEARCH METHODS IN BEHAVIOURAL SCIENCE
Ag Extn.504 :- RESEARCH METHODS IN BEHAVIOURAL SCIENCE
Pradip Limbani
 
Qualities of Good Test.pdf
Qualities of Good Test.pdfQualities of Good Test.pdf
Qualities of Good Test.pdf
FaheemGul17
 
Method of measuring test reliability
Method of measuring test reliabilityMethod of measuring test reliability
Method of measuring test reliability
namrata227
 

Similar to Evaluation of Measurement Instruments.ppt (20)

Characteristics of a good test
Characteristics of a good test Characteristics of a good test
Characteristics of a good test
 
Reliability and validity1
Reliability and validity1Reliability and validity1
Reliability and validity1
 
Meaning and Methods of Estimating Reliability of Test.pptx
Meaning and Methods of Estimating Reliability of Test.pptxMeaning and Methods of Estimating Reliability of Test.pptx
Meaning and Methods of Estimating Reliability of Test.pptx
 
Reliability of test
Reliability of testReliability of test
Reliability of test
 
Reliability and validity- research-for BSC/PBBSC AND MSC NURSING
Reliability and validity- research-for BSC/PBBSC AND MSC NURSINGReliability and validity- research-for BSC/PBBSC AND MSC NURSING
Reliability and validity- research-for BSC/PBBSC AND MSC NURSING
 
EM&E.pptx
EM&E.pptxEM&E.pptx
EM&E.pptx
 
Reliability & validity
Reliability & validityReliability & validity
Reliability & validity
 
What makes a good testA test is considered good” if the .docx
What makes a good testA test is considered good” if the .docxWhat makes a good testA test is considered good” if the .docx
What makes a good testA test is considered good” if the .docx
 
RELIABILITY AND VALIDITY
RELIABILITY AND VALIDITYRELIABILITY AND VALIDITY
RELIABILITY AND VALIDITY
 
Validity and reliability of the instrument
Validity and reliability of the instrumentValidity and reliability of the instrument
Validity and reliability of the instrument
 
Questionnaire measurement (1).pptx
Questionnaire measurement (1).pptxQuestionnaire measurement (1).pptx
Questionnaire measurement (1).pptx
 
JC-16-23June2021-rel-val.pptx
JC-16-23June2021-rel-val.pptxJC-16-23June2021-rel-val.pptx
JC-16-23June2021-rel-val.pptx
 
Characteristics of effective tests and hiring
Characteristics of effective tests and hiringCharacteristics of effective tests and hiring
Characteristics of effective tests and hiring
 
Rep
RepRep
Rep
 
Validity and reliability in assessment.
Validity and reliability in assessment. Validity and reliability in assessment.
Validity and reliability in assessment.
 
Characteristics of Good Evaluation Instrument
Characteristics of Good Evaluation InstrumentCharacteristics of Good Evaluation Instrument
Characteristics of Good Evaluation Instrument
 
Ag Extn.504 :- RESEARCH METHODS IN BEHAVIOURAL SCIENCE
Ag Extn.504 :-  RESEARCH METHODS IN BEHAVIOURAL SCIENCE  Ag Extn.504 :-  RESEARCH METHODS IN BEHAVIOURAL SCIENCE
Ag Extn.504 :- RESEARCH METHODS IN BEHAVIOURAL SCIENCE
 
Qualities of Good Test.pdf
Qualities of Good Test.pdfQualities of Good Test.pdf
Qualities of Good Test.pdf
 
Reliability and validity
Reliability and  validityReliability and  validity
Reliability and validity
 
Method of measuring test reliability
Method of measuring test reliabilityMethod of measuring test reliability
Method of measuring test reliability
 

More from CityComputers3

unit 04-social interaction (1) 2 (2).pptx
unit 04-social interaction (1) 2 (2).pptxunit 04-social interaction (1) 2 (2).pptx
unit 04-social interaction (1) 2 (2).pptx
CityComputers3
 
Chapter 12- Curriculum Issues and Trends
Chapter 12- Curriculum Issues and TrendsChapter 12- Curriculum Issues and Trends
Chapter 12- Curriculum Issues and Trends
CityComputers3
 
6573 Unit 8.pptx
6573 Unit 8.pptx6573 Unit 8.pptx
6573 Unit 8.pptx
CityComputers3
 
6573-Unit-5.pptx
6573-Unit-5.pptx6573-Unit-5.pptx
6573-Unit-5.pptx
CityComputers3
 
6573 unit 2.pptx
6573 unit 2.pptx6573 unit 2.pptx
6573 unit 2.pptx
CityComputers3
 
Air Pollutaion.pptx
Air Pollutaion.pptxAir Pollutaion.pptx
Air Pollutaion.pptx
CityComputers3
 
junk food.pptx
junk food.pptxjunk food.pptx
junk food.pptx
CityComputers3
 
Leadership final .pptx
Leadership final .pptxLeadership final .pptx
Leadership final .pptx
CityComputers3
 
5-Performance Management by Jamshed (2).pptx
5-Performance Management by Jamshed (2).pptx5-Performance Management by Jamshed (2).pptx
5-Performance Management by Jamshed (2).pptx
CityComputers3
 
Qualitative Research 001.pdf
Qualitative Research 001.pdfQualitative Research 001.pdf
Qualitative Research 001.pdf
CityComputers3
 
qualitative research final.pdf
qualitative research final.pdfqualitative research final.pdf
qualitative research final.pdf
CityComputers3
 
Qualitative Research (01).ppt
Qualitative Research (01).pptQualitative Research (01).ppt
Qualitative Research (01).ppt
CityComputers3
 
qualitative research.pdf
qualitative research.pdfqualitative research.pdf
qualitative research.pdf
CityComputers3
 
qualitative.ppt
qualitative.pptqualitative.ppt
qualitative.ppt
CityComputers3
 
classification od computer types.pptx
classification od computer types.pptxclassification od computer types.pptx
classification od computer types.pptx
CityComputers3
 
computing in educaiton.pdf
computing in educaiton.pdfcomputing in educaiton.pdf
computing in educaiton.pdf
CityComputers3
 
types of computers.pdf
types of computers.pdftypes of computers.pdf
types of computers.pdf
CityComputers3
 
types of computer.pdf
types of computer.pdftypes of computer.pdf
types of computer.pdf
CityComputers3
 
Unit 1 Performance Management Overview.ppt
Unit 1 Performance Management Overview.pptUnit 1 Performance Management Overview.ppt
Unit 1 Performance Management Overview.ppt
CityComputers3
 
Performance Management.ppt
Performance Management.pptPerformance Management.ppt
Performance Management.ppt
CityComputers3
 

More from CityComputers3 (20)

unit 04-social interaction (1) 2 (2).pptx
unit 04-social interaction (1) 2 (2).pptxunit 04-social interaction (1) 2 (2).pptx
unit 04-social interaction (1) 2 (2).pptx
 
Chapter 12- Curriculum Issues and Trends
Chapter 12- Curriculum Issues and TrendsChapter 12- Curriculum Issues and Trends
Chapter 12- Curriculum Issues and Trends
 
6573 Unit 8.pptx
6573 Unit 8.pptx6573 Unit 8.pptx
6573 Unit 8.pptx
 
6573-Unit-5.pptx
6573-Unit-5.pptx6573-Unit-5.pptx
6573-Unit-5.pptx
 
6573 unit 2.pptx
6573 unit 2.pptx6573 unit 2.pptx
6573 unit 2.pptx
 
Air Pollutaion.pptx
Air Pollutaion.pptxAir Pollutaion.pptx
Air Pollutaion.pptx
 
junk food.pptx
junk food.pptxjunk food.pptx
junk food.pptx
 
Leadership final .pptx
Leadership final .pptxLeadership final .pptx
Leadership final .pptx
 
5-Performance Management by Jamshed (2).pptx
5-Performance Management by Jamshed (2).pptx5-Performance Management by Jamshed (2).pptx
5-Performance Management by Jamshed (2).pptx
 
Qualitative Research 001.pdf
Qualitative Research 001.pdfQualitative Research 001.pdf
Qualitative Research 001.pdf
 
qualitative research final.pdf
qualitative research final.pdfqualitative research final.pdf
qualitative research final.pdf
 
Qualitative Research (01).ppt
Qualitative Research (01).pptQualitative Research (01).ppt
Qualitative Research (01).ppt
 
qualitative research.pdf
qualitative research.pdfqualitative research.pdf
qualitative research.pdf
 
qualitative.ppt
qualitative.pptqualitative.ppt
qualitative.ppt
 
classification od computer types.pptx
classification od computer types.pptxclassification od computer types.pptx
classification od computer types.pptx
 
computing in educaiton.pdf
computing in educaiton.pdfcomputing in educaiton.pdf
computing in educaiton.pdf
 
types of computers.pdf
types of computers.pdftypes of computers.pdf
types of computers.pdf
 
types of computer.pdf
types of computer.pdftypes of computer.pdf
types of computer.pdf
 
Unit 1 Performance Management Overview.ppt
Unit 1 Performance Management Overview.pptUnit 1 Performance Management Overview.ppt
Unit 1 Performance Management Overview.ppt
 
Performance Management.ppt
Performance Management.pptPerformance Management.ppt
Performance Management.ppt
 

Recently uploaded

How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
bennyroshan06
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
Vivekanand Anglo Vedic Academy
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 

Recently uploaded (20)

How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptxMARUTI SUZUKI- A Successful Joint Venture in India.pptx
MARUTI SUZUKI- A Successful Joint Venture in India.pptx
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
How to Break the cycle of negative Thoughts

Evaluation of Measurement Instruments.ppt

- General ability to comprehend instructions
- Stable response sets (e.g., answering the "C" option more frequently)
- The element of chance in getting a question right
- Conditions of testing
- Unreliability or bias in grading or rating performance
- Motivation
- Emotional strain
Measurement Error
• Any fluctuation in test scores that results from factors related to the measurement process that are irrelevant to what is being measured.
• The difference between the observed score and the true score is called the error score:
S error = S observed - S true
• Developing better tests with less random measurement error is better than simply documenting the amount of error.
Measurement Error is Reduced By:
- Writing items clearly
- Making instructions easily understood
- Adhering to proper test administration
- Providing consistent scoring
Determining Reliability
• There are several ways that a measurement's reliability can be determined, depending on the type of measurement and the supporting data required. They include:
- Internal Consistency
- Test-retest Reliability
- Interrater Reliability
- Split-half Methods
- Odd-even Reliability
- Alternate Forms Methods
Internal Consistency
• Measures the reliability of a test based solely on the number of items on the test and the intercorrelation among the items. Therefore, it compares each item to every other item.
• If a scale is measuring a construct, then overall the items on that scale should be highly correlated with one another.
• There are two common ways of measuring internal consistency:
1. Cronbach's Alpha:
.80 to .95 (Excellent)
.70 to .80 (Very Good)
.60 to .70 (Satisfactory)
<.60 (Suspect)
2. Item-Total Correlations - the correlation of the item with the remainder of the items (.30 is the minimum acceptable item-total correlation).
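The coefficient-alpha computation can be sketched in a few lines of Python. This is a minimal sketch assuming the standard alpha formula; the item scores below are invented for illustration.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, where each item is a
    list of scores for the same respondents (population variances)."""
    k = len(item_scores)
    # Total score for each respondent across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Three hypothetical items answered by five respondents.
items = [[1, 2, 3, 4, 5],
         [2, 2, 3, 5, 5],
         [1, 3, 3, 4, 4]]
alpha = cronbach_alpha(items)  # roughly .95 -- "Excellent" on the scale above
```

Note that if every item were identical, alpha would be exactly 1.0; as the items become less intercorrelated, alpha falls toward (and can go below) zero.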
Internal Consistency (cont.)
Internal consistency estimates are a function of:
• The Number of Items - if we think of each test item as an observation of behaviour, high internal consistency strengthens the relationship --- i.e., there is more of it to observe.
• Average Intercorrelation - the extent to which each item represents an observation of the same thing.
The more often you observe a construct, and the more consistently, the greater the reliability.
Split Half & Odd-Even Reliability
• Split Half - refers to determining a correlation between the first half of the measurement and the second half of the measurement (i.e., we would expect answers to the first half to be similar to the second half).
• Odd-Even - refers to the correlation between the even items and the odd items of a measurement tool.
• In this sense, we are using a single test to create two tests, eliminating the need for additional items and multiple administrations.
• Since in both of these types only one administration is needed and the groups are determined by the internal components of the test, it is referred to as an internal consistency measure.
Split Half & Odd-Even Reliability (cont.)
Possible Advantages
• Simplest method - easy to perform
• Time and cost effective
Possible Disadvantages
• Many ways of splitting
• Each split yields a somewhat different reliability estimate
• Which is the real reliability of the test?
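An odd-even split can be sketched in Python as follows: sum each respondent's odd-numbered items and even-numbered items, then correlate the two half-scores, as described above. The items and scores here are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def odd_even_reliability(item_scores):
    """Correlate respondents' totals on odd-numbered items
    with their totals on even-numbered items."""
    odd = [sum(s) for s in zip(*item_scores[0::2])]   # items 1, 3, ...
    even = [sum(s) for s in zip(*item_scores[1::2])]  # items 2, 4, ...
    return pearson(odd, even)

# Four hypothetical items, five respondents.
items = [[1, 2, 3, 4, 5],
         [2, 2, 3, 5, 5],
         [1, 3, 3, 4, 4],
         [2, 3, 4, 4, 5]]
r = odd_even_reliability(items)
```

Splitting the items differently (first half vs. second half, or another partition) would give a somewhat different coefficient, which is exactly the disadvantage noted above.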
Test-retest Reliability
• Test-retest reliability is usually measured by computing the correlation coefficient between the scores of two administrations.
Test-retest Reliability (cont.)
• The amount of time allowed between measures is critical.
• The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation. This is because the two observations are related over time.
• The optimum time between administrations is 2 to 4 weeks.
• If a scale is measuring a construct consistently, then there should not be radical changes in the scores between administrations --- unless something significant happened.
• The rationale behind this method is that any difference between the scores of the test and the retest should be due to measurement error alone.
Test-retest Reliability (cont.)
• It is hard to specify one acceptable test-retest correlation, since what is considered acceptable depends on the type of scale, the use of the scale, and the time between testings.
For Example - it is not always clear whether differences in test scores should be regarded as measurement error or as real change.
Possible sources of difference in scores between tests: experience, the characteristic being measured may change over time (e.g., a reading test), and carryover effects (e.g., remembering the test).
Test-retest Reliability (cont.)
• A minimum correlation of at least .50 is expected.
• The higher the correlation (in a positive direction), the higher the test-retest reliability.
• The biggest problem with this type of reliability is what is called the memory effect: a respondent may recall the answers from the original test, thereby inflating the reliability estimate.
• Also, is it practical?
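Computing a test-retest coefficient is simply the Pearson correlation between the two administrations. A minimal sketch in Python, with invented scores for five test takers:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five test takers, administered 2-4 weeks apart.
time1 = [98, 112, 105, 121, 90]
time2 = [101, 110, 107, 118, 93]
r = pearson(time1, time2)  # well above the .50 minimum noted above
```

The same computation serves for interrater reliability (correlating two raters' scores for the same respondents) and for the parallel-forms method (correlating scores on the two forms) discussed below.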
Interrater Reliability
• Whenever you use humans as a part of your measurement procedure, you have to worry about whether the results you get are reliable or consistent. People are notorious for their inconsistency. We are easily distractible. We get tired of doing repetitive tasks. We daydream. We misinterpret.
Interrater Reliability (cont.)
• For some scales it is important to assess interrater reliability.
• Interrater reliability means that if two different raters scored the scale using the scoring rules, they should attain the same result.
• Interrater reliability is usually measured by computing the correlation coefficient between the scores of two raters for the same set of respondents.
• Here the criterion of acceptability is quite high (e.g., a correlation of at least .9), but what is considered acceptable will vary from situation to situation.
Parallel/Alternate Forms Method
• Parallel/Alternate Forms Method - refers to the administration of two alternate forms of the same measurement device and then comparing the scores.
• Both forms are administered to the same person and the scores are correlated. If the two produce the same results, then the instrument is considered reliable.
Parallel/Alternate Forms Method (cont.)
• A correlation between the two forms is computed just as in the test-retest method.
Advantages
• Eliminates the problem of the memory effect.
• Reactivity effects (i.e., the experience of taking the test) are also partially controlled.
• Can sample a wider array of the entire domain than the test-retest method.
Parallel/Alternate Forms Method (cont.)
Possible Disadvantages
• Are the two forms of the test actually measuring the same thing?
• More expensive
• Requires additional work to develop two measurement tools
Factors Affecting Reliability
• Administrator Factors
• Number of Items on the Instrument
• The Instrument Taker
• Heterogeneity of the Items
• Heterogeneity of the Group Members
• Length of Time between Test and Retest
Administrator Factors
• Poor or unclear directions given during administration, or inaccurate scoring, can affect reliability.
For Example - say you were told that your scores on being social determined your promotion. The result is more likely to be what you think they want than what your behavior actually is.
Number of Items on the Instrument
• The larger the number of items, the greater the chance for high reliability.
For Example - it makes sense when you ponder that twenty questions on your leadership style are more likely to get a consistent result than four questions.
• Remedy: use longer tests or accumulate scores from short tests.
The Test Taker
For Example - if you took an instrument in August when you had a terrible flu and then in December when you were feeling quite good, we might see a difference in your response consistency. If you were under considerable stress of some sort, or if you were interrupted while answering the instrument questions, you might give different responses.
Heterogeneity
• Heterogeneity of the Items - the greater the heterogeneity (differences in the kind of questions or difficulty of the questions) of the items, the greater the chance for high reliability correlation coefficients.
• Heterogeneity of the Group Members - the greater the heterogeneity of the group members in the preferences, skills, or behaviors being tested, the greater the chance for high reliability correlation coefficients.
Length of Time between Test and Retest
• The shorter the time, the greater the chance for high reliability correlation coefficients.
• As we have experiences, we tend to adjust our views a little from time to time. Therefore, the time interval between the first time we took an instrument and the second time is really an "experience" interval.
• Experience happens, and it influences how we see things. Because internal consistency has no time lapse, one can expect it to have the highest reliability correlation coefficient.
How High Should Reliability Be?
• A highly reliable test is always preferable to a test with lower reliability.
.80 or greater (Excellent)
.70 to .80 (Very Good)
.60 to .70 (Satisfactory)
<.60 (Suspect)
• A reliability coefficient of .80 indicates that 20% of the variability in test scores is due to measurement error.
Generalizability Theory
• A theory of measurement that attempts to determine the sources of consistency and inconsistency in scores.
• It is necessary to obtain multiple observations for the same group of individuals on all the variables that might contribute to measurement error (e.g., scores across occasions, across scorers, across alternative forms).
• Allows for the evaluation of interaction effects from different types of error sources.
Generalizability Theory (cont.)
• If feasible, it is a more thorough procedure for identifying the error components that may enter scores.
• Particularly useful in complex situations where:
1. The conditions of measurement affect test scores.
2. Test scores are used for several different purposes.
For Example - measurement involving subjectivity (e.g., interviews, rating scales) involves bias. Therefore, human judgement could be considered a "condition of measurement."
Standard Error of Measurement (SEM)
• SEM is a statistic used to build a confidence interval around an obtained score. It represents the spread of the hypothetical distribution of scores we would have if someone took a test an infinite number of times.
• A measure that allows one to predict the range of fluctuation that is likely to occur in a single individual's score because of irrelevant, chance factors. This measurement is used in analyzing the reliability of the test in estimating the "true" score.
• Indicates how much variability in test scores can be expected as a result of measurement error.
• SEM is a function of two factors: the reliability of the test and the variability of the test scores. The formula for SEM is:
SEM = SD x sqrt(1 - reliability)
Standard Error of Measurement (cont.)
• The most common use of the SEM is the construction of confidence intervals. The SEM is an estimate of how much error there is in a test.
• The SEM can be interpreted in the same way as a standard deviation. About 68% of the time the true score would fall between plus one SEM and minus one SEM of the obtained score, so we can be 68% sure that the student's true score lies within +/- one SEM. Within +/- two SEM, the true score would be found about 95% of the time.
• Or, if the student took the test 100 times, about 68 times the true score would fall within +/- one SEM of the obtained score.
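The SEM formula and the resulting confidence bands can be sketched as follows. The SD of 15 and the reliability of .91 are illustrative values, not taken from the text.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test with SD = 15 and reliability = .91.
error = sem(15, 0.91)  # 15 * sqrt(0.09) = 4.5

# Confidence bands around an obtained score of 100.
obtained = 100
band_68 = (obtained - error, obtained + error)          # true score ~68% of the time
band_95 = (obtained - 2 * error, obtained + 2 * error)  # true score ~95% of the time
```

Note how the SEM shrinks as reliability rises: with a perfectly reliable test (reliability = 1.0) the SEM is zero, and the obtained score equals the true score.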