Fundamentals of Classical Test Theory
(CTT)
Dr. Jai Singh
National Accreditation Board for Education and
Training (NABET) -QCI
Objectives-
• To understand the construct and latent traits.
• To know about measuring latent traits.
• To understand the terminology in test
construction.
• To know about fundamentals of CTT.
• To know various assumptions of CTT.
• To critically evaluate use of CTT and its
limitations.
Constructs and Measures
Constructs are theoretical terms that refer to unobserved, idealized entities.
A construct's height, weight, or depth cannot be measured because
constructs are not concrete objects in the visible world.
In psychology, a construct is any complex psychological concept: a skill,
attribute, or ability that is based on one or more established theories.
Constructs exist in the human brain and are not directly observable.
In psychology and cognitive science, constructs include a person's
motivation, intelligence, anxiety, fear, anger, personality, love,
attachment, memory, creativity, learning outcomes, and attention.
Measures are the observations used in science to learn about constructs.
These include things like reaction times, accuracy scores, and response
frequencies.
Latent Traits
Latent traits are a specific kind of construct.
– Relatively stable qualities of individuals that are
changeable, but only over the long term.
• Transient things, such as “attention,” are not traits.
– Latent traits include everyday things like attitudes,
personality, preferences, and dispositions (e.g.,
“talkative”).
– Latent traits also include many kinds of things that
educators are interested in:
• Ability, aptitude, creativity, expertise, learning
outcomes and intelligence.
Measuring Latent Traits
It is important to recognize that no single measure of a
latent trait is ever taken to be a perfectly accurate
measure of that trait.
– Instead, different kinds of “measures” or “tests” are
seen as “tapping into” the latent trait.
– Different measures may “tap into” a latent trait in
different ways, capturing some aspects of the trait
better than others.
– Multiple measures can provide “converging”
evidence.
Psychometric Tests
• Psychometric tests are standardized tests, and
they are designed to assess a particular
variable.
• Psychometric tests are scientific and systematic
ways to test someone's ability to do a job or to
measure their personality or some mental
ability (such as math achievement or learning
outcomes like language outcomes).
• Psychometrics means the study of developing
measurements.
Measuring body temperature
• Using temperature to indicate illness
• Measurement tool: a mercury thermometer, a glass vacuum tube with a
bulb of mercury at one end.
Measuring body temperature
To make an inference from a temperature reading to
illness, we rely on theory regarding:
– Thermal equilibrium via conduction.
– The proportionality of mercury expansion to a
conceptual temperature scale.
– Relationship between mouth and core body
temperature.
– Relationship between core body temperature and
illness.
At each stage, error may intrude
• Thermal equilibrium may not have been reached (e.g.
thermometer removed too quickly).
• Expansion of mercury is also affected by other things (e.g. air
pressure).
• Mouth temperature may not reflect core body temperature
(e.g. after a hot cup of tea).
• Core body temperature does not vary with all illnesses, and
is not even completely stable in health.
Identify the sources of errors in measuring students’
attributes.
Test developer's concerns when constructing tests –
• Quality of test items
• How examinees respond to the items
• A reliable and valid tool
A psychometrician generally uses psychometric
techniques to determine the validity and reliability.
Construction of Test based on CTT
Table of Specification (Blueprint) for Class 5 Science
Rows are content areas; columns are instructional objectives, split by item type.

Content Areas               | Knowledge 40%       | Understanding 35%   | Application 25%     | Total
                            |   SU   MC   MT   TF |   SU   MC   MT   TF |   SU   MC   MT   TF |
1) Food and Health          |    1    2    2    1 |    1    2    1    1 |    1    1    1    1 |    15
2) Plant Life               |    0    2    2    1 |    0    1    2    1 |    1    0    1    1 |    12
3) Animal Life              |    1    1    2    1 |    1    0    1    2 |    0    1    1    1 |    12
4) Force, Work and Energy   |    2    2    1    2 |    1    2    1    1 |    1    1    1    1 |    16
5) Weight, Volume & Density |    2    2    1    2 |    2    1    2    1 |    2    1    1    1 |    18
6) The Environment          |    2    1    2    1 |    1    1    1    2 |    1    2    1    0 |    15
7) The Rocks and Minerals   |    1    1    2    1 |    1    1    1    1 |    0    1    1    1 |    12
Total                       |    9   11   12    9 |    7    8    9    9 |    6    7    7    6 |   100

SU = Supply Type, MC = Multiple Choice, MT = Matching, TF = True/False
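The totals in the blueprint can be checked mechanically. The sketch below re-enters the counts from the table and sums them by content area and by instructional objective. The variable names (`blueprint`, `row_totals`, `objective_totals`) are illustrative and not part of the original slides.

```python
# Item counts per content area, copied from the blueprint table.
# Each row holds 12 counts in this order:
# Knowledge (SU, MC, MT, TF), Understanding (SU, MC, MT, TF),
# Application (SU, MC, MT, TF).
blueprint = {
    "Food and Health":          [1, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1],
    "Plant Life":               [0, 2, 2, 1, 0, 1, 2, 1, 1, 0, 1, 1],
    "Animal Life":              [1, 1, 2, 1, 1, 0, 1, 2, 0, 1, 1, 1],
    "Force, Work and Energy":   [2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1],
    "Weight, Volume & Density": [2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1],
    "The Environment":          [2, 1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 0],
    "The Rocks and Minerals":   [1, 1, 2, 1, 1, 1, 1, 1, 0, 1, 1, 1],
}

# Total number of items planned for each content area.
row_totals = {area: sum(counts) for area, counts in blueprint.items()}
grand_total = sum(row_totals.values())

# Sum each block of four columns to get the items per objective.
objectives = ["Knowledge", "Understanding", "Application"]
objective_totals = {
    name: sum(sum(counts[4 * i: 4 * i + 4]) for counts in blueprint.values())
    for i, name in enumerate(objectives)
}

print(row_totals)
print(objective_totals, grand_total)
```

With these counts the objectives come to 41, 33 and 26 items out of 100, which is close to the planned weights of 40%, 35% and 25%.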
Examples
1) The instrument used to measure earthquakes is
known as:
(a) Seismograph
(b) Quake meter
(c) Barometer
(d) None of the above
2) How many seismograph stations are needed to
locate the epicenter of an earthquake?
(a) 2
(b) 3
(c) 4
(d) 5
3) In which situation can spring tides occur?
(a) The moon, sun and earth are at a right angle,
with the earth at the apex
(b) The moon is farthest from the earth
(c) The sun is closest to the earth
(d) The moon, sun and earth are in the same line
4) The Tropic of Cancer passes through
(a) India and Iran
(b) Iran and Pakistan
(c) India and Saudi Arabia
(d) Iran and Iraq
1) The area of a semicircle is
(a) πR²
(b) πR²/2
(c) 2πR
(d) πR
2) In the circle given below, if
AB is a diameter and angle a = 30°,
then the value of angle b
will be
(a) 45°
(b) 60°
(c) 90°
(d) 55°
Higher Score Achievers and Low Score Achievers
Higher group: top 27% of examinees by total score
Lower group: bottom 27% of examinees by total score
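A minimal sketch of how the two groups might be formed from total scores. It assumes the group size is 27% of the examinees rounded to the nearest whole number; the function name `split_upper_lower` and the scores are made up for illustration.

```python
def split_upper_lower(total_scores, fraction=0.27):
    """Return (upper, lower) lists of examinee indices ranked by total score."""
    n_group = max(1, round(len(total_scores) * fraction))
    # Rank examinee indices from highest to lowest total score.
    ranked = sorted(range(len(total_scores)),
                    key=lambda i: total_scores[i], reverse=True)
    return ranked[:n_group], ranked[-n_group:]


# Hypothetical total scores for 10 examinees (index 0 to 9).
scores = [42, 35, 48, 20, 31, 27, 45, 18, 39, 24]
upper, lower = split_upper_lower(scores)
print(upper)  # indices of the three highest scorers
print(lower)  # indices of the three lowest scorers
```

The examinees in between are left out of the item analysis, so the two groups contrast clearly on overall performance.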
Difficulty of an Item
The difficulty of an item is understood as the
proportion of persons who answer the test
item correctly.
– The higher this proportion, the lower the
difficulty.
– The greater the difficulty of an item, the lower its
index.
Index of Difficulty
$$\mathrm{DI} = \frac{\mathrm{RU} + \mathrm{RL}}{T} \times 100$$
RU= The number in upper group who answered correctly
RL=The number in lower group who answered correctly
T= The total number who tried the item
Hypothetical Example-
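The worked example from the original slide is not available, so the sketch below applies the formula to made-up numbers. The function name `difficulty_index` and the counts are assumptions for illustration only.

```python
def difficulty_index(ru, rl, t):
    """Percentage of examinees in both groups who answered the item correctly."""
    if t <= 0:
        raise ValueError("t must be positive")
    return 100 * (ru + rl) / t


# Made-up item: 20 examinees in the upper group and 10 in the lower group
# answered correctly, out of 50 examinees who tried the item.
di = difficulty_index(ru=20, rl=10, t=50)
print(di)  # 60.0
```

A value of 60 means that 60% of the examinees answered correctly. A harder item would give a lower index.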
Discrimination
A good item should discriminate between those who score
high on the test and those who score low.
We would expect that –
- those having a high overall test score would have a high
probability of being able to answer the item.
- those having low test scores would have a low probability of
answering the item correctly.
The higher the discrimination index, the better the item can
determine the difference between those with high test scores and
those with low ones.
Formula for item discriminating power
$$\text{Item discriminating power} = \frac{\mathrm{RU} - \mathrm{RL}}{T/2}$$
Where
RU= Students from upper group who got the answer correct.
RL= Students from lower group who got the answer correct.
T/2 = half of the total number of pupils included in the item analysis.
Hypothetical Example-
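The worked example from the original slide is not available, so the sketch below uses made-up counts. The function name `discriminating_power` is illustrative, and the groups are assumed to be of equal size, so T/2 is the size of one group.

```python
def discriminating_power(ru, rl, t):
    """Difference in correct answers between groups, relative to one group's size."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (ru - rl) / (t / 2)


# Made-up item: 50 examinees in the analysis (25 per group);
# 20 in the upper group and 10 in the lower group answered correctly.
dp = discriminating_power(ru=20, rl=10, t=50)
print(round(dp, 2))  # 0.4

# An item that the lower group answers better gets a negative value.
reversed_dp = discriminating_power(ru=5, rl=15, t=50)
print(round(reversed_dp, 2))  # -0.4
```

The value ranges from -1 to +1. A negative value suggests that the item or its answer key needs review.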
Test Theories/Models
– Classical Test Theory (CTT)
– Item Response Theory (IRT)
• Both theories make it possible to predict outcomes of psychological tests by identifying
parameters of item difficulty and the ability of test takers.
• Both aim to improve the reliability and validity of psychological
tests, and both approaches provide measures of validity and reliability.
Classical Test Theory
Classical Test Theory is used to predict an
individual’s latent trait based on an observed
total score on an instrument.
Continued-
• In CTT, the true score predicts the level of the
latent variable.
• The random error is assumed to be normally distributed with
a mean of 0 and a standard deviation equal to the standard error of measurement.
• The random errors are uncorrelated with each
other and also are uncorrelated to the true
scores.
Mathematical Model of CTT
Observed test scores (X) are composed of a true score (T) and an error
score (E).
– The true and the error scores are independent.
– Charles Spearman: reduce random error as much as possible, thereby making tests better.
Illustrated in the formula: X = T + E
Where –
X = Observed (total) score
T = True score
E = Error score
The variables were established by Spearman (1904) and Novick (1966).
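The model X = T + E can be explored with a small simulation. The sketch below assumes normally distributed true scores (mean 60, SD 10) and independent errors (mean 0, SD 5). These numbers are invented for illustration. It also uses the standard CTT definition of reliability as var(T) / var(X), which the slide does not state.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 10_000
true_scores = [random.gauss(60, 10) for _ in range(n)]   # T
errors = [random.gauss(0, 5) for _ in range(n)]          # E, mean 0
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

var_t = statistics.pvariance(true_scores)
var_e = statistics.pvariance(errors)
var_x = statistics.pvariance(observed)

# When T and E are uncorrelated, var(X) is close to var(T) + var(E),
# and var(T) / var(X) is the reliability of the observed scores.
reliability = var_t / var_x
print(round(var_x, 1), round(var_t + var_e, 1), round(reliability, 2))
```

With these settings the reliability should come out near 100 / 125 = 0.8. Shrinking the error SD pushes it toward 1, which is the sense in which reducing random error makes a test better.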
Classical Test Theory
Classical test theory (CTT) in psychometrics
is all about reliability.
• Reliability refers to how consistent a test or measure is.
• CTT has three base terms: test (observed) score, error,
and true score.
• Example: you take a math exam and get an 85.
• Test score: 85.
• Error: sound (noise),
- a mistake in the test, or
- parts of the external environment that cannot be fully controlled but
that affect testing.
But psychometrics assumes everyone has, in theory, a
true score.
- We can estimate this true score with an equation.
Does the true score reflect
true ability?
Why does the true score vary
without intervention?
Standard error of measurement
$$S_m = S\sqrt{1 - r}$$
where S is the standard deviation of the observed scores and r is the reliability coefficient of the test.
The standard error of measurement is the standard deviation of the distribution of random errors for each individual.
– A larger standard error of measurement means less certainty about the accuracy of a score.
– A small standard error of measurement means high accuracy: the individual's score is probably
close to the true score.
Use of the standard error of measurement –
create confidence intervals around specific observed scores.
The lower and upper bounds of the confidence interval approximate the range in which the
true score lies.
Will the distribution of random errors be the same for all individuals?
Why do scores vary over different administrations to the same subjects?
Is error not due to item characteristics, administration, environment, and the nature of the
tool?
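A minimal sketch of the standard error of measurement and a confidence interval around one observed score. The score SD of 10, the reliability of 0.84 and the observed score of 85 are made-up values. The multiplier z = 1.96 gives an approximate 95% interval and assumes normally distributed errors. The function names are illustrative.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """S_m = S * sqrt(1 - r), for a reliability coefficient between 0 and 1."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed, sem, z=1.96):
    """Lower and upper bound around an observed score."""
    return observed - z * sem, observed + z * sem


# Made-up test: score SD of 10, reliability of 0.84, observed score of 85.
sem = standard_error_of_measurement(sd=10, reliability=0.84)
low, high = confidence_interval(observed=85, sem=sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

Here the standard error is about 4 points, so the interval runs from roughly 77.2 to 92.8. A more reliable test would give a narrower interval for the same observed score.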
Error Distribution
Student   Obtained Score   True Score   Error
1         85               80           +5
2         69               72           -3
3         48               45           +3
4         82               85           -3
5         39               43           -4
6         45               41           +4
7         78               79           -1
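The error column follows from the model E = X - T. The short sketch below recomputes it from the obtained and true scores in the table. In practice true scores are not observable, so a table like this is illustrative. The variable names are not from the original slides.

```python
# Obtained (X) and true (T) scores for the seven students in the table.
obtained = [85, 69, 48, 82, 39, 45, 78]
true = [80, 72, 45, 85, 43, 41, 79]

# Error is the difference between the obtained and the true score.
errors = [x - t for x, t in zip(obtained, true)]
mean_error = sum(errors) / len(errors)

print(errors)                # [5, -3, 3, -3, -4, 4, -1]
print(round(mean_error, 2))  # 0.14
```

Positive and negative errors largely cancel, so the mean error is close to zero, in line with the assumption that random error has a mean of 0.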
Assumption of Classical Test Theory
• Varying responses of examinees are due only to
variation in ability of interest.
• All other potential sources of variation existing in
the testing materials such as external conditions
or internal conditions of examinees are assumed
to be constant.
Continued-
• Each individual has a true score which would be
obtained if there were no errors in measurement.
• The difference between the true score and the observed
test score results from measurement error.
• Error is often assumed to be a random variable having a
normal distribution.
• Tests are fallible, imprecise tools. The true score for an
individual will not change with repeated applications of
the same test.
Shortcomings of CTT
• Examinee characteristics and test characteristics cannot be separated:
each can only be interpreted in the context of the other.
• Reliability is defined as "the correlation between test scores on parallel forms of a test".
There are differing opinions about what parallel tests are, so reliability coefficients
provide either lower-bound estimates of reliability or reliability estimates with
unknown biases.
• Standard error of measurement:
the standard error of measurement is assumed to be the same for all
examinees, regardless of ability.
• Measurement accuracy and attribute level:
a common estimate of measurement precision is assumed to be equal
for all individuals irrespective of their attribute levels.
• CTT is test oriented rather than item oriented:
it cannot help us predict how well an individual or even a group
of examinees might do on a test item.
Limitations of CTT
Sample Dependent-
The focus of the analysis is –
– total test score;
– frequency of correct responses (to indicate question difficulty);
– frequency of responses (to examine distracters);
– reliability of the test and item-total correlation (to evaluate discrimination
at the item level).
• One limitation is that these statistics relate to the sample under scrutiny, so all
the statistics that describe items and questions are sample dependent.
This critique may be less relevant where successive samples are
reasonably representative and do not vary across time, but this needs to
be confirmed. Complex strategies have been proposed to overcome this
limitation.
CTT: Limitations
• Item analysis from a CTT perspective "is
essentially sample-based descriptive statistics".
- This means that, for example, difficulty value (DV) and discriminating power (DP)
values are only representative of the specific
sample of examinees from which they were
calculated,
- so that making generalizations across
different groups of examinees, or across
different test formats, may not be possible.
Need for More Complex Analytic
Approaches
More complex assessment situations include:
– measuring test taker
performance at different points in time
(pre/post);
– using different test forms;
– using different items of different difficulty;
– having different raters assign scores;
– scoring different elements of a performance exam.
CTT VS. IRT
CTT:
• The test is the unit of analysis.
• Measures with more items (longer) are more reliable
than their shorter counterparts.
• Comparing scores from different measures can
only be done when the test forms/measures are
parallel.
• Item properties depend on a representative sample.
IRT:
• The item is the unit of analysis.
• Measures with fewer items (shorter) can be more reliable
than their longer counterparts.
• Item responses from different measures can be compared as
long as they are measuring the same latent trait.
• Item properties do not depend on a representative
sample.
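The CTT claim that longer measures are more reliable is usually quantified with the Spearman-Brown formula. The formula is not given in the slides and is brought in here only as an illustration. It assumes the added or removed items are parallel to the existing ones. The starting reliability of 0.70 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)


# Made-up test with reliability 0.70.
doubled = spearman_brown(0.70, 2)    # twice as many parallel items
halved = spearman_brown(0.70, 0.5)   # half as many items
print(round(doubled, 3), round(halved, 3))
```

Under these assumptions, doubling the test raises the predicted reliability to about 0.82, and halving it lowers it to about 0.54.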
CTT VS. IRT
CTT:
• Position on the latent trait continuum is
derived by comparing the test score with the
scores of a reference group.
• All items on the measure must have the
same response categories.
IRT:
• Position on the latent trait continuum is
derived by comparing the distance between
items on the ability scale.
• Items on a measure can
have different response categories.
Thanks to All

Classical Test Theory (CTT)- By Dr. Jai Singh

  • 1. Fundamentals of Classical Test Theory (CTT) Dr. Jai Singh National Accreditation Board for Education and Training (NABET) -QCI
  • 2. Objectives- • To understand the construct and latent traits. • To know about measuring latent traits. • To understand the terminology in test construction. • To know about fundamentals of CTT. • To know various assumptions of CTT. • To critically evaluate use of CTT and its limitations.
  • 3. Constructs and Measures constructs are theoretical terms that refer to unobserved, idealized entities. A Construct's height, weight or depth cannot be measured because constructs are not concrete materials in the visible world. In psychology, Construct refers to any complex psychological concept. Construct is a skill, attribute, or ability that is based on one or more established theories. Constructs exist in the human brain and are not directly observable. In psychology and cognitive science, constructs include terms like person's motivation, Intelligence, anxiety, and fear, anger, personality, love, attachment, memory, creativity, learning outcomes and attention. Measures are the observations used in science to learn about constructs. These include things like reaction times, accuracy scores, and response frequencies.
  • 4. Latent Traits Latent traits are a specific kind of construct. – Relatively stable qualities of individuals that are changeable, but only over the long term. • Transient things, such as “attention,” are not traits. – Latent traits include everyday things like attitudes, personality, preferences, and dispositions (e.g., “talkative”). – Latent traits also include many kinds of things that educators are interested in: • Ability, aptitude, creativity, expertise, learning outcomes and intelligence.
  • 5. Measuring Latent Traits It is important to recognize that no single measure of a latent trait is ever taken to be a perfectly accurate measure of that trait. – Instead, different kinds of “measures” or “tests” are seen as “tapping into” the latent trait. – Different measures may “tap into” a latent trait in different ways, capturing some aspects of the trait better than others. – Multiple measures can provide “converging” evidence.
  • 6. Psychometric Tests • Psychometric tests are standardized tests, and they are designed to assess a particular variable. • Psychometric tests- scientific and systematic ways to test someone's ability to do a job or measure their personality or some mental ability (like math achievement, learning outcomes-language outcomes etc.). • Psychometrics means the study of developing measurements.
  • 7. Measuring body temperature • Using temperature to indicate illness • Measurement tool: a mercury • thermometer - a glass vacuum tube with a bulb of mercury at one end.
  • 8. Measuring body temperature To make inference between taking temperature and illness – theory regarding: Thermal equilibrium via conduction. – The proportionality of mercury density with a conceptual temperature scale. – Relationship between mouth and core body temperature. – Relationship between core body temperature and illness.
  • 9. At each stage, error may intrude • Thermal equilibrium may not have been reached (e.g. thermometer removed too quickly). • – • Expansion of mercury also affected by other things (e.g. air pressure). • – • Mouth temperature may not reflect core body temperature (e.g. after a hot cup of tea). • – • Core body temperature does not vary with all illnesses, and is not even completely stable in health. Identify the sources of errors in measuring students’ attributes.
  • 10. Test developer’s concerns – • Quality of test items • How examinees respond to it when constructing tests • Reliable and Valid tool A psychometrician generally uses psychometric techniques to determine the validity and reliability.
  • 11. Construction of Test based on CTT Table of Specification (Blue Print) for Class 5th Science Instructional Objective Content Areas Knowledge 40% Understanding 35% Application 25% Total SU MC MT TF SU MC MT TF SU MC MT TF 1) Food and Health 1 2 2 1 1 2 1 1 1 1 1 1 15 2)Plant Life 0 2 2 1 0 1 2 1 1 0 1 1 12 3)Animal Life 1 1 2 1 1 0 1 2 0 1 1 1 12 4)Force Work and Energy 2 2 1 2 1 2 1 1 1 1 1 1 16 5)Weight, Volume & Density 2 2 1 2 2 1 2 1 2 1 1 1 18 6)The Environment 2 1 2 1 1 1 1 2 1 2 1 0 15 7)The Rocks and Minerals 1 1 2 1 1 1 1 1 0 1 1 1 12 Total 9 11 12 9 7 8 9 9 6 7 7 6 100 SU= Supply Type, MC=Multiple Choice, MT=Matching, TF=True False
  • 12. Examples 1) Instrument used to measure earthquake is known as- (a)Seismograph (b) Quake meter (c ) Barometer (d) None of above 2) How many seismograph stations are needed to locate the epicenter of earthquake? (a) 2 (b) 3 (c ) 4 (d) 5 3) In which situation spring tides can occur? (a) The moon , sun and earth are at right angle with the earth at apex (b) The moon is the farthest from the earth (c )The sun is closest to the earth (d) The moon , sun and earth are at the same line 4) The topic of cancer passes through (a) India and Iran (b) Iran and Pakistan (c) India and Saudi Arabia (d) Iran and Iraq 1) The area of semi circle is (a) ∏R2 (b) ∏R2/2 (c) 2∏R (d) ∏R 2) In a circle given below, if AB is diameter, angle a=300 Then the value of angle b will be (a) 450 (b) 600 (c) 900 (d) 550
  • 13.
  • 14. Higher Score Achiever and Low Score Achiever Higher Group 27% Lower Group 27%
  • 15. Difficulty of an Item The difficulty of an item is understood as the proportion of the persons who answer a test item correctly. – The higher this proportion- the lower the difficulty – the greater the difficulty of an item- the lower its index
  • 16. Index of Difficulty DI= RU+RL x 100 T RU= The number in upper group who answered correctly RL=The number in lower group who answered correctly T= The total number who tried the item Hypothetical Example-
  • 17. Discrimination A good item should discriminate between those who score high on the test and those who score low. We would expect that – - those having a high overall test score would have a high probability of being able to answer the item. - those having low test scores would have a low probability of answering the item correctly. The higher the discrimination index, the better the item can determine the difference between those with high test scores and those with low ones.
  • 18. Formula for item discriminating power Item discriminating power = RU-RL T/2 Where RU= Students from upper group who got the answer correct. RL= Students from lower group who got the answer correct. T/2 = half of the total number of pupils included in the item analysis. Hypothetical Example-
  • 19. Test Theories/Models: Classical Test Theory (CTT) and Item Response Theory (IRT) • Both theories make it possible to predict outcomes of psychological tests by identifying parameters of item difficulty and the ability of test takers. • Both aim to improve the reliability and validity of psychological tests, and both provide measures of validity and reliability.
  • 20. Classical Test Theory Classical Test Theory is used to predict an individual’s latent trait based on an observed total score on an instrument.
  • 21. Continued- • In CTT, the true score predicts the level of the latent variable. • The random error is assumed to have a mean of 0 and is often taken to be normally distributed; its standard deviation is the standard error of measurement rather than a fixed value. • The random errors are uncorrelated with each other and are also uncorrelated with the true scores.
  • 22. Mathematical Model of CTT Observed test scores (X) are composed of a true score (T) and an error score (E); the true and error scores are independent. Charles Spearman's aim was to reduce random error as much as possible, thereby making tests better. The model is expressed by the formula X = T + E, where X = total (observed) score, T = true score, and E = error score. The variables were established by Spearman (1904) and Novick (1966).
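Because the true and error scores are uncorrelated, the decomposition X = T + E carries over to variances, which is what connects the model to reliability. A short derivation follows. The symbols σ² (variance) and r (reliability coefficient) are notation chosen here, not taken from the slides.

```latex
\begin{align}
X &= T + E \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2
  \quad \text{(since } \operatorname{Cov}(T, E) = 0\text{)} \\
r &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2} \\
\sigma_E &= \sigma_X \sqrt{1 - r}
\end{align}
```

The last line is the standard deviation of the error scores, the quantity that the slide on the standard error of measurement expresses in terms of S and r.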
  • 23. Classical Test Theory Classical test theory (CTT) in psychometrics is all about reliability. • Reliability refers to how consistent a test or measure is. • CTT has three base terms: test (observed) score, error, and true score. • Example: you take a math exam and get an 85. • Test score: 85. • Error: noise, a mistake in the test, or aspects of the external environment that cannot be fully controlled but that affect testing. Psychometrics assumes that everyone has, in theory, a true score, and that this true score can be calculated with an equation. Does the true score reflect true ability? Why would the true score vary without intervention?
  • 24. Standard error of measurement $S_m = S\sqrt{1 - r}$, where S is the standard deviation of the test scores and r is the reliability coefficient. This is the standard deviation of the distribution of random errors for each individual. The larger the standard error of measurement, the less certain the accuracy of the score; the smaller it is, the higher the accuracy and the more likely the individual's observed score is close to the true score. Use: the standard error of measurement is used to create confidence intervals around specific observed scores; the lower and upper bounds of the interval bracket the likely value of the true score. Will the distribution of random errors be the same for all individuals? Why do scores vary over different administrations to the same subjects? Is error not also due to item characteristics, administration, environment, and the nature of the tool?
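The standard error of measurement and a confidence interval around an observed score can be sketched as follows. The function names, the standard deviation of 10 and the reliability of 0.91 are hypothetical values chosen for illustration. The multiplier z = 1.96 assumes normally distributed errors and a 95% interval.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = S * sqrt(1 - r)."""
    if sd < 0:
        raise ValueError("sd must be non-negative")
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Interval observed +/- z * SEM (z = 1.96 assumes normal errors, 95%)."""
    return observed - z * sem, observed + z * sem


# Hypothetical test: score SD of 10 and reliability of 0.91.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
low, high = confidence_interval(observed=85, sem=sem)
print(f"SEM = {sem:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

With these made-up values the SEM is about 3, so an observed score of 85 gives an interval of roughly 79.1 to 90.9. A less reliable test (smaller r) widens the interval for the same observed score.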
  • 25. Error Distribution
      Student   Obtained Score   True Score   Error
      1         85               80           +5
      2         69               72           -3
      3         48               45           +3
      4         82               85           -3
      5         39               43           -4
      6         45               41           +4
      7         78               79           -1
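The error column follows directly from X = T + E, so E = X − T. The short check below recomputes it from the obtained and true scores on this slide. It is illustrative only, since true scores are not observable in practice. The variable names are chosen here.

```python
# Obtained (X) and true (T) scores for the seven students on the slide.
obtained = [85, 69, 48, 82, 39, 45, 78]
true_scores = [80, 72, 45, 85, 43, 41, 79]

# E = X - T for each student.
errors = [x - t for x, t in zip(obtained, true_scores)]
mean_error = sum(errors) / len(errors)

print(errors)
print(f"mean error = {mean_error:.2f}")
```

The errors are both positive and negative and their mean is close to zero (about 0.14 for these seven students), which matches the assumption that random error averages out.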
  • 26. Assumption of Classical Test Theory • Varying responses of examinees are due only to variation in ability of interest. • All other potential sources of variation existing in the testing materials such as external conditions or internal conditions of examinees are assumed to be constant.
  • 27. Continued- • Each individual has a true score, which would be obtained if there were no errors in measurement. • The difference between the true score and the observed test score results from measurement error. • Error is often assumed to be a random variable with a normal distribution. • Tests are fallible, imprecise tools. The true score for an individual will not change with repeated applications of the same test.
  • 28. Shortcomings of CTT • Examinee characteristics and test characteristics cannot be separated: each can only be interpreted in the context of the other. • Reliability is defined as "the correlation between test scores on parallel forms of a test". Because opinions differ on what parallel tests are, reliability coefficients provide either lower-bound estimates of reliability or reliability estimates with unknown biases. • Standard error of measurement: it is assumed to be the same for all examinees, regardless of ability. • Measurement accuracy and attribute level: a common estimate of measurement precision is assumed to apply equally to all individuals, irrespective of their attribute levels. • CTT is test oriented rather than item oriented: it cannot help us predict how well an individual, or even a group of examinees, might do on a test item.
  • 29. Limitations of CTT Sample dependent. The focus of the analysis is: • total test score; • frequency of correct responses (to indicate question difficulty); • frequency of responses (to examine distracters); • reliability of the test and item-total correlation (to evaluate discrimination at the item level). • One limitation is that these statistics relate to the sample under scrutiny, so all the statistics that describe items and questions are sample dependent. This critique may not be particularly relevant where successive samples are reasonably representative and do not vary across time, but this needs to be confirmed, and complex strategies have been proposed to overcome this limitation.
  • 30. CTT: Limitations • Item analysis from a CTT perspective "is essentially sample-based descriptive statistics". This means that, for example, DV and DP values are only representative of the specific sample of examinees from which they were calculated, so generalizations across different groups of examinees, or across different test formats, may not be possible.
  • 31. Need for More Complex Analytic Approaches More complex assessment situations include: measuring test taker performance at different points in time (pre/post); using different test forms; using different items of different difficulty; having different raters assign scores; and scoring different elements of a performance exam.
  • 32. CTT vs. IRT CTT: • The test is the unit of analysis. • Measures with more items (longer) are more reliable than their shorter counterparts. • Scores from different measures can only be compared when the test forms/measures are parallel. • Item properties depend on a representative sample. IRT: • The item is the unit of analysis. • Measures with fewer items (shorter) can be more reliable than their longer counterparts. • Item responses from different measures can be compared as long as they measure the same latent trait. • Item properties do not depend on a representative sample.
  • 33. CTT vs. IRT CTT: • Position on the latent trait continuum is derived by comparing the test score with the scores of a reference group. • All items on the measure must have the same response categories. IRT: • Position on the latent trait continuum is derived by comparing the distance between items on the ability scale. • Items on the measure can have different response categories.