Educational Assessment and Examinations
Service (EAES)
August, 2023
Dembel View Hotel, Adama
Outlines
• Introduction
• Basic concepts of CTT
• Basic concepts of IRT
• Reliability and Validity
• Differential Item Functioning
• Item analysis using Software
Introduction
• Testing is concerned with turning performance into numbers (Baxten, 1998)
• 13% of student failures in class are attributable to faulty test items (World Watch, 2005)
• In a US study, Masters et al. (2001) identified 2,233 minor and major violations of item-writing guidelines.
• It is estimated that 90% of test items are of poor quality (Wilen, W.W, 1992)
• Teachers have difficulty developing plausible distractors for MCQs, and only 52% of all distractors were functioning effectively (Tarrant, Ware & Mohammed, 2009)
• What about the quality of tests in the case of Ethiopia?
• Thus, item analysis is very important for maintaining the quality of a test
3
Cont.…
4
Item Analysis
• A method used to evaluate test items, typically for the purpose of test construction and revision.
• A process that examines students' responses to individual test items and to the test as a whole.
• Useful to improve items and eliminate ambiguous/misleading items.
• Helps to identify specific areas of subject content that need greater emphasis/clarity.
• Suggests ways of improving the measurement of a test.
• Valuable for increasing skill in test construction.
Cont.…
Purpose of Item Analysis
• Helps to bring a match between what is taught and what is assessed; care should be taken with the sampling of items and their difficulty level.
• Helps to understand and make decisions about poorly performing items.
• Helps to improve test items and identify unfair or biased items.
• Helps to carefully align instruction with the grade-level expectations from which standardized test items are derived.
Item Analysis Methods
Qualitative
• A non-numerical method for analyzing test items that does not employ students' responses.
• It instead considers test objectives, content validity, and technical item quality, such as:
Matching items and objectives
Editing poorly written items
Improving the content validity of the test
Evaluating the items against the table of specifications and item-writing guidelines
Quantitative
• A numerical method for analyzing test items based on students' responses.
• It includes:
Difficulty/b-parameter
Discrimination/a-parameter
Option analysis
Reliability
Differential item functioning, etc.
6
Cont.…
Item Analysis Methods
Quantitative | Qualitative
Quantitative methods include:
• Difficulty/b-parameter
• Discrimination/a-parameter
• Option Analysis
• Reliability
• Validity
• Differential Item Functioning/DIF
Basic concepts of Classical Test Theory and Item
Response Theory
• CTT and IRT are the two primary psychometric paradigms.
• They are mathematical approaches to how test items are analyzed.
• They differ quite substantially in substance and complexity, even though they both nominally do the same thing.
• So there is no single best answer to the question of using either CTT or IRT.
• In many cases, BOTH are necessary and can be used depending on the purpose of the item analysis.
• However, CTT and IRT have some differences.
8
Comparison of CTT vs. IRT
9
Feature | CTT | IRT
Ability-item relationship | Linear | Logistic curve
Invariance of item & person statistics | No | Yes
Difficulty | P-value | b-parameter
Discrimination | D (item-total) | a-parameter
Adaptive testing | Rare | Suitable
Reliability | Depends on test length | Does not depend on test length
Equating | Complicated | Automatic
Item-model fit | No | Yes
Sample size needed | Small | Large
Option analysis | Preferable | Rare
Classical Test Theory and Item Response Theory
Classical Test Theory (CTT)
• These are the easiest and most widely used form of analyses.
• The statistics can be computed by readily available statistical packages (or even by hand).
• They are performed on the test as a whole rather than on individual items.
• Although item statistics can be generated, they apply only to that group of students on that collection of items.
• CTT assumes that each person has a true score (T) that would be obtained if there were no errors (E) in measurement.
• Unfortunately, test users never observe a person's true score, only an observed score (X).
• Thus, CTT is based on the true score model: X = T + E
• In CTT we assume that the error (see the brief simulation below):
Is normally distributed
Is uncorrelated with the true score
Has a mean of zero
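The true score model can be made concrete with a short simulation. The sketch below is illustrative only: the sample size, score scale, and error variance are assumptions rather than values from the slides. It simply shows that the simulated error has a mean near zero, is essentially uncorrelated with the true score, and that reliability can then be read as the share of observed-score variance due to true scores.

```python
# A minimal simulation of the CTT true score model X = T + E (illustrative;
# the sample size and variances are assumed, not taken from the slides).
import numpy as np

rng = np.random.default_rng(0)
n_students = 1000

T = rng.normal(loc=50, scale=10, size=n_students)  # unobservable true scores
E = rng.normal(loc=0, scale=5, size=n_students)    # error: mean 0, independent of T
X = T + E                                          # observed scores

print("mean of E:", round(E.mean(), 2))                  # close to 0
print("corr(T, E):", round(np.corrcoef(T, E)[0, 1], 2))  # close to 0
# Under this model, reliability is the share of observed variance due to true scores.
print("var(T)/var(X):", round(T.var() / X.var(), 2))
```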
Item Difficulty and Discrimination in CTT
Item Difficulty Level (P)
• The percentage of students who answered the item correctly.
• Calculation (a short computational example follows below):
P = (Upper group correct + Lower group correct) / Total group, or
P = (# of students answering correctly) / (Total # of students)
• The range is between 0% and 100% (0.00 and 1.00).
• The higher the value, the easier the item; the lower the value, the harder the item.
• An item with a p-value of 0.0 or 1.0 does not contribute to measuring individual differences.
• The ideal value of item difficulty is 0.50.
• A small number of easy or difficult items may be included to motivate or differentiate the test takers.
12
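As a small illustration of the second formula (p = number correct / total students), the sketch below scores one item across a hypothetical group; the 0/1 response vector is made up for the example.

```python
# A minimal sketch of the difficulty index (p-value) from scored responses
# (1 = correct, 0 = incorrect); the responses below are hypothetical.
import numpy as np

item_scores = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])  # ten students, one item
p = item_scores.mean()  # proportion correct = number correct / total students
print(f"difficulty index p = {p:.2f}")  # 0.70, i.e. a fairly easy item
```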
The distribution of items by difficulty level in a test
13
Author | Item type | Percentage
Sugianto [2020] | Very easy | 10%
Sugianto [2020] | Easy | 20%
Sugianto [2020] | Moderate | 40%
Sugianto [2020] | Difficult | 20%
Sugianto [2020] | Challenging | 10%
Arifin (2009), option 1 (based on the purpose of the test) | Difficult / Medium / Easy | 25% / 50% / 25%
Arifin (2009), option 2 | Difficult / Medium / Easy | 20% / 60% / 20%
Arifin (2009), option 3 | Difficult / Medium / Easy | 15% / 70% / 15%
Interpretation of difficulty index (p-value)
14
Author | Difficulty index | Interpretation
Uddin et al. (2020) | > 80% | Easy
Uddin et al. (2020) | 30–80% | Moderate
Uddin et al. (2020) | < 30% | Difficult
Kaur, Singla et al. (2016) | > 80 | Easy
Kaur, Singla et al. (2016) | 40–80 | Moderate
Kaur, Singla et al. (2016) | < 39 | Difficult
Sugianto (2020) | 90% | Easy
Sugianto (2020) | 50% | Moderate
Sugianto (2020) | 10% | Difficult
Jaipurkar et al. (2021) | > 70% | Too easy
Jaipurkar et al. (2021) | 50–60% | Excellent/ideal
Jaipurkar et al. (2021) | 30–70% | Good/acceptable/average
Obon and Rey (2019) | > 0.76 | Easy
Obon and Rey (2019) | 0.26–0.75 | Right difficulty (Retain)
Obon and Rey (2019) | 0–0.25 | Difficult (Revise/Discard)
Bhat and Prasad (2021) | > 70% | Easy
Bhat and Prasad (2021) | 30–70% | Good
Bhat and Prasad (2021) | < 30% | Difficult
Item Discrimination Power (D)
• The ability of an item to elicit different responses from students with different abilities/skills.
• The computed difference between the percentage of high achievers and the percentage of low achievers who got the item right.
• The maximum range of the discrimination index is from -1.0 to +1.0.
• The higher the value of D, the more adequately the item discriminates (the highest value is 1.0).
• Values close to 0 mean most students performed the same on the item.
15
D = (Correct Upper - Correct Lower) / (1/2 Total)
(A short computational example follows below.)
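The formula can be applied directly once the upper- and lower-group correct counts are known. The sketch below is a minimal illustration; it plugs in the counts used later for Item X (11 correct in the upper group, 7 in the lower group, 30 students in total).

```python
# A minimal sketch of D = (upper correct - lower correct) / (half of the total group).
def discrimination_index(upper_correct: int, lower_correct: int, total_students: int) -> float:
    return (upper_correct - lower_correct) / (total_students / 2)

# Example counts taken from the Item X activity later in the deck.
print(round(discrimination_index(upper_correct=11, lower_correct=7, total_students=30), 2))  # 0.27
```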
Discrimination Index (D)
• Positive Discrimination Index: those who did well on the overall test chose the correct answer for a particular item more often than those who did poorly on the overall test.
• Negative Discrimination Index: those who did poorly on the overall test chose the correct answer for a particular item more often than those who did well on the overall test.
• Zero Discrimination Index: those who did well and those who did poorly on the overall test chose the correct answer for a particular item with equal frequency.
 Negative discriminators (-) (This is never what we want)
Non-discriminators (0) (This may or may not be what we want)
Positive discriminators (+) (This is usually what we want)
Interpretation of Discrimination power (D)
17
Author Discrimination power Interpretation
Elfaki, Bahamdan et
al. [2015]
• ≥0.35 Excellent
• 0.25–0.34 Good
• 0.21–0.24 Acceptable
• ≤ 0.20 Poor
Obon and Rey [2019]
• ≥ 0.50 Very Good
• 0.40–0.49 Good (Very Usable)
• 0.30–0.39 Fair Quality (Usable Item)
• 0.20–0.29 Poor (Revise)
• ≤ 0.20 Very Poor (Critically Revise/Discard)
Bhat and Prasad
[2021]
• > 0.35 Excellent
• 0.2–0.35 Good
• < 0.2 Poor
Sugianto [2020]
• >0.40 Very good
• 0.30–0.39 Good
• 0.20–0.29 Marginal & need improvement
• <0.19 Poor, rejected/ improved by revision
Interpretation of Discrimination power(D)
18
Author Discrimination power Interpretation
Aljehani, Pullishery et al. [2020] and Sharma [2021]
• ≥ 0.40 Very good item (Keep)
• 0.30–0.39 Good item (Keep)
• 0.20–0.29 Moderate & fair (Keep)
• < 0.20 Marginal, Revise/Discard
• Negative Worst & Definitely Discard
Ramzan, Imran et al.
[2020]
• > 0.30 Excellent
• 0.20–0.29 Good
• 0–0.19 Poor
• <0 Defective & Discard
Uddin et al. [2020]
• ≥ 0.35 Excellent
• 0.25–0.34 Good
• 0.21–0.24 Acceptable
• < 0.20 Poor
Item Analysis for Partial Credit Items
Partial credit items (short answer, essay) can be analyzed using the following formulas:
P = (UGSP + LGSP) / (WI x T) x 100
D = (UGSP - LGSP) / (WI x 1/2T)
Where: UGSP = upper group sum of points on the item
LGSP = lower group sum of points on the item
WI = weight (maximum score) of the item
T = total number of students
19
Example
• The maximum score of a short answer item was 3. In the upper group, 6 students scored 3 points each and 4 students scored 2 points each. In the lower group, 4 students scored 2 points each and 6 students scored 1 point each. Find P and D for the item (a computational sketch follows below).
• P = (UGSP + LGSP) / (WI x T)
= ((6x3 + 4x2) + (4x2 + 6x1)) / (3 x 20) = 40/60 = 0.67 (67%)
• D = (UGSP - LGSP) / (WI x 1/2T)
= (26 - 14) / (3 x 10) = 12/30 = 0.40
20
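The same arithmetic can be scripted. The sketch below reuses the worked example above (item weight 3, ten students in each group); the function name is an assumption for illustration.

```python
# A minimal sketch of P and D for a partial-credit item, using the worked example above.
def partial_credit_indices(upper_points, lower_points, item_weight, total_students):
    ugsp, lgsp = sum(upper_points), sum(lower_points)       # group sums of points
    p = (ugsp + lgsp) / (item_weight * total_students)      # difficulty (as a proportion)
    d = (ugsp - lgsp) / (item_weight * total_students / 2)  # discrimination
    return p, d

upper = [3] * 6 + [2] * 4   # upper group: six students with 3 points, four with 2
lower = [2] * 4 + [1] * 6   # lower group: four students with 2 points, six with 1
p, d = partial_credit_indices(upper, lower, item_weight=3, total_students=20)
print(f"P = {p:.2f}, D = {d:.2f}")  # P = 0.67, D = 0.40
```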
Option Analysis
• Analysis of how well the high and low groups are responding to the item's options (a small computational sketch follows below).
• Compare the performance of the highest- and lowest-scoring students on the distractor options.
• Fewer of the top performers should choose each of the distractors as their answer compared to the bottom performers.
• A good distractor attracts more students from the lower group than from the upper group.
• It is not desirable to have one of the distractors chosen more often than the correct answer.
• If so, that distractor may be too similar to the correct answer, and/or there may be something in either the stem or the alternatives that is misleading.
• For the keyed answer, the difference between upper and lower performers is expected to be positive, while for the distractors it is expected to be negative.
21
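A minimal option-analysis sketch is shown below. The response vectors for the upper and lower groups are hypothetical; the point is simply that the upper-lower difference should be positive for the key and negative for each distractor.

```python
# A minimal sketch of an option (distractor) analysis with hypothetical responses; C is the key.
from collections import Counter

upper_choices = ["C", "C", "C", "D", "C", "C", "B", "C", "C", "D"]  # top scorers
lower_choices = ["A", "D", "C", "B", "D", "A", "C", "B", "D", "C"]  # bottom scorers

key = "C"
upper, lower = Counter(upper_choices), Counter(lower_choices)

for option in ["A", "B", "C", "D"]:
    diff = upper[option] - lower[option]
    role = "key" if option == key else "distractor"
    # Expect a positive difference for the key and negative differences for distractors.
    print(f"{option} ({role}): upper={upper[option]}, lower={lower[option]}, difference={diff:+d}")
```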
Activity 1
Consider the case below
Suppose your students chose the options of a four-alternative multiple-choice item as shown below.
Let C be the correct answer.
A B C* D
3 0 18 9
Questions
• How does this information help us?
• Is the item too difficult/easy for the students?
• What is the difficulty level value?
• What is the discrimination index value?
• Are the distractors of the items effective?
• Should this item be eliminated?
Item X
A B C* D
3 0 18 9
Solving the difficulty index for Item X:
p = (Number selecting the correct answer) / (Total number taking the test)
p = 18/30 = 0.60
• Thus, the difficulty level of the item is 0.60 (60%); the item is moderate.
As Bhat and Prasad (2021) suggest:
If p > 0.70, the item is considered relatively easy.
If p < 0.30, the item is considered relatively difficult.
Item X
A B C* D
3 0 18 9
Solving the discrimination index for Item X, given that 11 of the upper group and 7 of the lower group answered correctly:
D = (Correct Upper - Correct Lower) / (1/2 Total)
D = (11 - 7) / 15 = 4/15 = 0.27
• The discrimination power of item X is 0.27 and positive.
• More students who did well on the overall test answered the item correctly than students who did poorly on the overall test.
• Thus, it has good discrimination power.
As Bhat and Prasad (2021) suggest: if the D index is between 0.20 and 0.35, the item has good discrimination power.
Implication
Item X: Difficulty Level (p) = 0.60, Discrimination Index (D) = 0.27
A B C* D
3 0 18 9
1. Should item X be eliminated? NO
Item X is a moderately difficult item with positive (desirable) discrimination ability.
2. Should any distractor(s) be modified? YES
• Option B ought to be modified or replaced, as no one chose it.
• Option A also needs revision.
Item Response Theory (IRT)
• IRT – refers to a family of latent trait models used to establish
psychometric properties of items and scales
• Sometimes known as modern psychometrics because in large-scale
assessment, testing programs and professional testing IRT has
almost completely replaced CTT
• IRT has many advantages over CTT that have brought IRT into
more frequent use
• Three Basic Components of IRT are:
Item Response Function (IRF) – a mathematical function that relates the latent trait to the probability of endorsing an item
Item Information Function – an indication of item quality; an item's ability to differentiate among respondents
Invariance – position on the latent trait can be estimated from any items with known IRFs, and item characteristics are population-independent within a linear transformation
Cont.…
Item Response Function (IRF)
• It characterizes the relation between a latent variable/ability and the probability of endorsing an item.
• The IRF models the relationship between examinee trait level, item properties, and the probability of endorsing the item.
• Examinee trait level is signified by the Greek letter theta (θ) and typically has a mean of 0 and a standard deviation of 1.
• IRFs can be plotted as Item Characteristic Curves (ICCs), graphical functions that represent the probability of endorsing the item as a function of the respondent's ability.
IRF - Item Parameters in IRT
Location - b-parameter
• An item's location is defined as the amount of the latent trait needed to have a .5 probability of endorsing the item.
• The higher the b-parameter, the higher on the trait level a respondent needs to be in order to endorse the item.
• It is analogous to difficulty level in CTT.
• Like z-scores, the values of b typically range from -3 to +3.
Discrimination/Slope - a-parameter
• Indicates the steepness of the IRF at the item's location.
• It indicates how strongly related the item is to the latent trait, like loadings in a factor analysis.
• Items with high discriminations are better at differentiating respondents around the location point; vice versa for items with low discriminations.
• It typically ranges from 0 to 2 and should never be negative.
Cont.…
Pseudo-guessing –c - parameter
• The inclusion of a “c” parameter suggests that respondents very low
on the trait may still choose the correct answer.
• In other words respondents with low trait levels may still have a
small probability of endorsing an item
• This is mostly used with multiple choice testing and the value should
not vary excessively from the reciprocal of the number of choices.
• In general, it is the probability of getting the item correct by guessing
alone and varies from 0 to 1. For instance, c = 0.20 means that at all
ability levels, the probability of getting the item correct by guessing
alone is 0.20
Upper asymptote - d-parameter
• The inclusion of a "d" parameter suggests that respondents very high on the latent trait are not guaranteed (i.e., have less than 1.0 probability) to endorse the item.
• This is often the case for items that are difficult to endorse.
IRT - Logistic models
The 4-parameter logistic model:
P(X = 1 | θ, a, b, c, d) = c + (d - c) · exp[a(θ - b)] / (1 + exp[a(θ - b)])
Where
• θ represents examinee trait level
• b is the item difficulty that determines the location of the IRF
• a is the item's discrimination that determines the steepness of the IRF
• c is a lower asymptote parameter for the IRF
• d is an upper asymptote parameter for the IRF
ICC for 4PLM
31
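The 4PL IRF above can be written as a small function. The sketch below is illustrative, with made-up parameter values; the constrained models that follow (3PL, 2PL, 1PL) fall out of it by fixing d = 1, then c = 0, then a = 1.

```python
# A minimal sketch of the 4PL item response function P(X=1 | theta, a, b, c, d).
# Setting d=1 gives the 3PL, additionally c=0 the 2PL, and additionally a=1 the 1PL.
import numpy as np

def irf_4pl(theta, a, b, c=0.0, d=1.0):
    """Probability of endorsing/answering the item correctly under the 4PL model."""
    return c + (d - c) / (1.0 + np.exp(-a * (theta - b)))  # logistic form of the IRF

theta = np.linspace(-3, 3, 7)  # a range of trait levels (hypothetical)
print(irf_4pl(theta, a=1.2, b=0.5, c=0.2, d=0.95).round(2))
```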
Cont.…
The 3-parameter logistic model
• If the upper asymptote parameter is set to 1.0, then the model is
termed a 3PLM.
• In this model, individuals at low trait levels have a non-zero
probability of endorsing the item.
P(X = 1 | θ, a, b, c) = c + (1 - c) · exp[a(θ - b)] / (1 + exp[a(θ - b)])
ICC for 3PLM
33
Cont.…
The 2-parameter logistic model:
• If the lower asymptote parameter is constrained to zero, then the
model is termed a 2PLM.
• In the 2PLM, IRFs vary both in their discrimination and
difficulty (i.e., location) parameters.
P(X = 1 | θ, a, b) = exp[a(θ - b)] / (1 + exp[a(θ - b)])
ICC for 2PLM
35
Cont.…
The 1-parameter logistic model:
• If the item discrimination is set to 1.0, the result is a 1PLM.
• A 1PLM assumes that all scale items relate to the latent trait equally and that items vary only in difficulty (equivalent to having equal factor loadings across items).
• Mathematically, the most basic IRT model, the 1PLM, is identical to the Rasch model; however, there are some differences:
In Rasch, the model is held to be superior, and data which do not fit the model are discarded.
Rasch does not permit abilities to be estimated for extreme items and persons.
P(X = 1 | θ, b) = exp[(θ - b)] / (1 + exp[(θ - b)])
ICCs for 1PLM
37
Activity
38
• Which item do you think is the most difficult?
• Which item do you think best differentiates the learners?
Cont.….
In IRT:
• If the ability of the student > the difficulty of the item, what do you think P(1) will be?
• If the ability of the student < the difficulty of the item, what do you think P(1) will be?
• If the ability of the student = the difficulty of the item, what do you think P(1) will be?
Test Response Curve/TCC
• The test response curve/test characteristic curve (TCC) is the sum of the item response functions/ICCs.
• A TCC relates the latent trait to the expected number-correct score on the test (as sketched below).
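Since a TCC is just the sum of the ICCs, it is easy to sketch. The item parameters below are hypothetical, and the 3PL form is used only for illustration.

```python
# A minimal sketch of a test characteristic curve (TCC): at each trait level the expected
# number-correct score is the sum of the item ICCs. Item parameters are hypothetical.
import numpy as np

def irf_3pl(theta, a, b, c):
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

items = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 1.2, 0.2)]  # (a, b, c) per item
theta = np.linspace(-3, 3, 7)

tcc = sum(irf_3pl(theta, a, b, c) for a, b, c in items)  # sum of ICCs
print(tcc.round(2))  # rises from near the guessing floor toward the number of items (3)
```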
Differential Item Functioning (DIF)
What is DIF?
• DIF is a statistical characteristic of an item that shows the extent to
which the item might be measuring different abilities for members
of separate subgroups (gender, location, language, ethnicity etc.)
• DIF occurs when one group of examinees has a different expected
item score than comparable examinees from another group.
• An item is considered free of differential functioning (DIF) if the
item response function is the same across groups (Zwick, 1990).
• DIF means that either the item performs differently or measures
something different. If the item shows DIF it means that the item is
less valid for one subgroup (Steinberg & Thissen, 2006).
• A fundamental aspect of all DIF is the matching of students in the
reference and focal groups on some measure of ability (Clauser &
Mazor, 1998).
• The focal group is the one of interest and usually represents a
minority group, while a reference group represents a larger group.
Types of DIF
Uniform DIF
• DIF is in the same direction across the entire ability spectrum; the item response curves for the two groups do not cross
• DIF involves the location (b) parameters
• DIF appears as a significant main (group) effect in regression analyses predicting item response
Non-uniform DIF
• An item favors one group at certain ability levels and the other group at other levels
• DIF involves the discrimination (a) parameters
• DIF appears as a significant group-by-ability interaction in regressions predicting item response
45
Uniform DIF Non-uniform DIF
46
ETS: DIF Classification Levels
ETS rules for classifying the magnitude of DIF, stated in terms of the common odds ratio, are as follows:
• "A" items have:
 (a) a CMH p-value greater than 0.05, or
 (b) a common odds ratio strictly between 0.65 and 1.53.
• "B" items are neither "A" nor "C" items.
• "C" items have:
 (a) a common odds ratio less than 0.53, and the upper bound of the 95% confidence interval for the common odds ratio is less than 0.65, or
 (b) a common odds ratio greater than 1.89, and the lower bound of the 95% confidence interval is greater than 1.53.
(A minimal sketch of the common odds ratio computation follows.)
47
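The classification above rests on the Mantel-Haenszel common odds ratio computed across score strata. The sketch below is a minimal illustration of that statistic only (it omits the CMH significance test and the confidence interval); the stratified 2x2 counts are hypothetical.

```python
# A minimal sketch of the Mantel-Haenszel common odds ratio used in DIF screening.
# Examinees are stratified by total score; each stratum is a 2x2 table of
# (group: reference/focal) x (item: correct/incorrect). Counts below are hypothetical.
import numpy as np

strata = [  # [[ref_correct, ref_incorrect], [focal_correct, focal_incorrect]] per stratum
    np.array([[20, 30], [15, 35]]),
    np.array([[40, 20], [30, 30]]),
    np.array([[55, 10], [45, 20]]),
]

num = sum(t[0, 0] * t[1, 1] / t.sum() for t in strata)  # sum of a_k * d_k / n_k
den = sum(t[0, 1] * t[1, 0] / t.sum() for t in strata)  # sum of b_k * c_k / n_k
or_mh = num / den
print(f"MH common odds ratio = {or_mh:.2f}")  # values near 1.0 suggest little or no DIF
```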
Reliability
• A reliable test produces results which are accurate and consistent.
• Reliability is the degree to which scores are free of "measurement error" (higher reliability = less measurement error).
• Reliability coefficients range from 0.00 to 1.00.
• The ideal reliability is above 0.80, and at the very least not below 0.70.
• Types of reliability:
Test-retest
Parallel forms
Split-half
Internal consistency
• Can be calculated by (a sketch of Cronbach's alpha follows below):
Split-half
KR20 (Kuder-Richardson Formula 20)
KR21 (Kuder-Richardson Formula 21)
Cronbach's Alpha
48
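The sketch below computes Cronbach's alpha (which reduces to KR-20 for dichotomously scored items) from a small, made-up 0/1 score matrix; it is illustrative only.

```python
# A minimal sketch of Cronbach's alpha (equivalent to KR-20 for 0/1 items).
# Rows = students, columns = items; the score matrix is hypothetical.
import numpy as np

scores = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
])

k = scores.shape[1]                          # number of items
item_var = scores.var(axis=0, ddof=1)        # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
alpha = (k / (k - 1)) * (1 - item_var.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```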
Interpretation for Reliability
49
Author | Interpretation of Cronbach's alpha (KR20)
Robinson, Shaver et al. (1999) | ≥ 0.80 Exemplary; 0.70–0.79 Extensive; 0.60–0.69 Moderate; < 0.60 Minimal
Cicchetti (1994) | > 0.90 Excellent; 0.80–0.90 Good; 0.70–0.80 Fair; < 0.70 Unacceptable
Axelson and Kreiter (2019) | > 0.90 needed for very high stakes tests; 0.80–0.89 acceptable for moderate stakes tests; 0.70–0.79 acceptable for lower stakes assessments; < 0.70 might be useful as a component of an overall composite score
Obon and Rey (2019) | > 0.90 Excellent reliability; 0.80–0.90 Very good for a classroom test; 0.70–0.80 Good for a classroom test; 0.60–0.70 Somewhat low; 0.50–0.60 Suggests need for revision of the test; < 0.50 Questionable reliability
Hassan and Hod (2017) | > 0.7 Excellent; 0.6–0.7 Acceptable; 0.5–0.6 Poor; < 0.5 Unacceptable; < 0.30 Unreliable
Validity
• The extent to which measures indicate what they are intended to measure.
• Establishes that the measure covers the full range of the concept's meaning, i.e., covers all dimensions of a concept.
• Internal statistical validity can be applied to check unidimensionality.
• It depends more on "good" expert judgment.
50
Validity Vs. Reliability
Neither Valid nor Reliable | Reliable but not Valid | Valid & Reliable
• Reliability is a necessary condition for validity, but it is not sufficient.
• Reliability is a prerequisite for measurement validity.
• One needs reliability, but it is not enough for validity.
Test Item Analysis Using Software
• There are software products that perform the computations of both CTT and IRT statistics.
• There are also those that perform calculations solely for CTT or for IRT.
• For example:
 CITAS, ITEMAN, Lertap, and TAP are widely used packages that perform only CTT analyses.
 BILOG-MG, flexMIRT, ICL, MULTILOG, PARSCALE, PARAM-3PL, Winsteps, Xcalibre, IRT PRO, NOHARM, and TESTFACT are packages used only for IRT.
 jMetrik, R, Mplus, and IATA are software products that can compute statistics for both CTT and IRT.
• Among the software packages that facilitate IRT computations, only jMetrik, PARAM-3PL, NOHARM, and R are free of charge.
52
Hands-on Practice
The commonly used, open-source, user-friendly test analysis programs to be discussed in this session are:
• TAP: Test Analysis Program (CTT)
• IATA: Item And Test Analysis (CTT and IRT)
• jMetrik (CTT and IRT)
Data Preparation
• Capture data in Excel/Access/SPSS/text format.
• Design a test map containing the answer key, content domain, cognitive domain, etc.
• Insert/import or open the student response data in the software (TAP, IATA, jMetrik, etc.).
• Fill in all necessary information required by the particular software.
• Analyze the data and save/print the results.
• Identify any exam items that may require revision.
• For each identified item, list your observation and a hypothesis about the nature of the problem.
54
TAP: Test Analysis Program
TAP: Test Analysis Program …
• The Test Analysis Program (TAP) is designed as a powerful, easy-to-
use (and free!) test analysis package.
• TAP is a classical test and item analysis program: it performs test analyses and item analyses based on CTT.
TAP output provides:
• Examinee Analysis, including percentage correct, letter grade and
confidence intervals for each student and aggregate descriptive
statistics for the group.
• It generates report for each student indicating his/her score,
responses to each item and the correct answers to items missed.
• Item and Test Analysis, including item difficulty, point biserial,
discrimination index and various statistics if item deleted (KR20,
scale mean and standard deviation, etc.).
• Options Analysis, including high group and low group item
difficulty for correct answer and distracters.
TAP: Test Analysis Program …
• TAP uses text files.
• Any text file can be inserted into the TAP data editor window, but for it to work with TAP, the text file must be formatted as follows (a hypothetical example follows the list below):
 Every case must be on a single row,
 The EXAMINEE ID LABEL must be at the far left of each
row,
 The ITEM data must be numeric and must be in single columns
with NO SPACES between the item scores,
 The first line must be the ANSWER KEY, formatted like the
data (but with no numbers until the first correct answer). After
you insert your data, any text in the ANSWER KEY must be
deleted.
 Also, you will need to fill in the appropriate data editor fields:
# examinees, # items, id label length, # options for each
item and INCLUDE each item
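Putting those rules together, a TAP input file might look roughly like the hypothetical sketch below: the answer key on the first line (blank where the ID would be), then one examinee per row with the ID at the far left and the item responses as an unbroken string of digits. This is an assumed illustration of the layout, not a file taken from TAP's documentation.

```
          1342213412
Student01 1342213412
Student02 1242113412
Student03 3342213411
```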
TAP: Test Analysis Program …
Steps to Run TAP
1. TAP Software Installation and Setup
• Search for Test Analysis Program (TAP) in Google, then open the website https://people.ohio.edu/brooksg/
2. Under Programs Available (click on the program name to go to the Description and Download Link for the Program), click the link TAP: Test Analysis Program (last updated December 2018).
3. The program will be downloaded and ready to launch.
4. Click Run to open TAP.
5. Entering test data directly into TAP:
 Enter new data and go to the data editor.
 Importing test data: TAP also provides the option of entering data from an existing text (.txt) file.
Steps to Run TAP …
Steps to Run TAP …
In the data editor screen, enter the following information:
• descriptive information in the Title and Comments
sections.
• Input the number of examinees, number of items,
missing data symbol and ID label (student name)
length in the appropriate fields.
• In the Answer Key field, enter the numbers
corresponding to the correct answers as a string with
no delimiters.
• In the # Options field, enter the number of options
corresponding to each question.
• The Item Included field allows the user to eliminate
items from the analysis or set alternative correct
answers. Say Y to include or N to exclude.
• In the Data screen, enter the student identification
information and scores. Align the score data with the
guide above the Data screen.
• When all information is entered, click on either Save
File or Close and Analyze at the bottom of the Data
Entry screen.
Steps to Run TAP …
5. Saving Data Files
• Test data entered by the user or created by TAP's random data generator can be saved as TAP files for archival purposes, future modification, or analysis.
• Data files can be saved in TAP’s data editor window.
i. Choose Save TAP file under the File menu.
ii. Select the location and save the TAP file.
iii. To open file later, choose Open TAP file under the File
menu in the Data Editor Screen.
6. Analyzing Tests with TAP
Once you have a set of test scores in the Data Editor either by
direct entering or importing your own test, you can run the
analysis by clicking Analyze (F9).
To retrieve the full analysis click on the View Full Results box
• Response Data and Key for TAP Analysis
• TAP
63
IATA: Item And Test Analysis
• The item and test analysis software (IATA) is intended to help
national assessment practitioners, researchers, and others analyze
test item data as well as build effective assessment tools.
• IATA was designed to offer a user-friendly way to address many
statistical considerations related to national assessments.
• It targets specifically those who are interested in analyzing test
data, creating a new test from an item bank, or comparing or
scaling test items between different samples.
• The overarching goal of IATA is to increase the usability and
interpretability of test scores.
• The primary goal of test development from a statistical perspective
is to reduce the error of measurement. To reduce error of
measurement, IATA identifies problematic items that contribute to
error so that they may be revised, replaced, or removed altogether.
• The second goal is to establish meaningful and consistent scales on
which to report test scores.
IATA: Item And Test Analysis …
• IATA can read and write a variety of common data table formats (
Access, Excel, SPSS, delimited text files) if they are formatted
correctly.
• If the data are not formatted with the correct structure, IATA will
not be able to carry out the analyses.
• Database-compatible formats such as Access or SPSS already take care of most data formatting issues.
• However, if the data are stored in a less restrictive format, such as
Excel or text file, the following conventions should be followed:
The names of variables should appear in the cell at the top of each column (header).
The name of each variable must be distinct from the names of other variables in a data
file. The names of variables must begin with a letter & should not contain any spaces.
The data range must not contain any empty rows or columns. The data range is the
rectangle of cells that contain data, beginning with the variable name of the first
variable to appear in the data file and ending with the value of the last variable in the
bottom-most row.
The data range must begin at the first cell in the spreadsheet or file. In Excel, this cell
is labelled “A1.” In text files, this is the top-left cursor position in the text file.
IATA: Item And Test Analysis …
IATA: Item And Test Analysis …
• There are two main types of data produced by and used in the
analysis of assessments: response data and item data.
1. Response data are produced by the individual learners as they
answer questions on a test.
• It includes the response of each student to each test item. Should
record codes representing the options endorsed by each student
(e.g., A, B, C, D, etc. or it may be numbers).
• It may also include other useful demographic information on
variables for analyzing test results such as age, grade, gender,
school, and region.
• A unique respondent identifier (ID) should be included; otherwise IATA will automatically produce one.
• In order to score the response data, for most analyses, an answer
key must be loaded into IATA.
IATA: Item And Test Analysis …
• Treatment of Missing and Omitted Data:
When a student does not provide a response to a test item, rather than leaving the data field blank, a missing value code is used to record why the response is missing. There are two types of missing responses: missing and omitted.
Common conventions use specific values for the different types of non-response data. See Greaney and Kellaghan (2012) for information on response codes. Common values used are:
9 for missing responses, where students have not responded at all to an item;
8 for multiple responses, which typically occur in multiple choice tests when students provide more than one response, and in open-ended items when student responses are illegible; and
7 for omitted or not-presented items, which might be used in a rotated booklet design.
Regardless of the specific codes used, you must specify how IATA is to treat each non-response code, as either missing or omitted.
IATA: Item And Test Analysis …
• Item naming:
• It is important to assign a unique name to each item in a
national assessment program (see Anderson and Morgan 2008;
Greaney and Kellaghan 2012) for different purposes such as
linking, future use (IB), etc. (e.g. MA04M17001).
• Variables reserved by IATA
• During the analysis of response data, IATA calculates a variety of working variables whose names are reserved and should not be used as names of test items or questionnaire variables.
• These are: Xweight, Missing, PercentScore, PercentError, Percentile, RawZScore, Zscore, IRTscore, IRTerror, IRTskew, IRTkurt, TrueScore, Level, and the "@" symbol.
IATA: Item And Test Analysis …
2. Item Data:
• A test is a specific collection of questions that evaluate a
common domain of proficiency or knowledge. Individual
questions on a test are referred to as items.
• IATA produces and uses item data files with a specific format.
• An item data file contains all the information required to perform
statistical analysis of items and may contain the parameters used
to describe the statistical properties of items.
• It is simply a bank file that describes the items.
IATA: Item And Test Analysis …
• Required Variables in an item data file are the following.
• Example
IATA Result interpretations
• Uses traffic symbols:
Symbol Meaning
Green circles indicate no problems.
A yellow diamond indicates that the results are less than optimal. This
indicator is used to suggest that modifications may be required to either
the analysis specifications or the items themselves. However, the item is
not introducing any significant error into the analysis results.
A red warning triangle appears beside any potentially problematic items.
This indicator is used either to indicate items that could not be included
in the analysis due to problems with the data or specifications, or to
recommend a more detailed examination of the specifications or
underlying data and test item. When this indicator appears, it does not
necessarily mean that there is a problem, but it does suggest that the
overall analysis results may be more accurate if the indicated test item
were removed or if the analysis were re-specified.
IATA analysis workflows and interfaces (Steps)
Double click or right click to open IATA.
Click main menu.
IATA analysis workflows and interfaces (Steps) …
There are five workflows
available in IATA:
1. Response data analysis,
2. Response data analysis
with linking,
3. Linking item data,
4. Selecting optimal test
items, and
5. Developing and
assigning performance
standards
• There are 10 different tasks in IATA and the workflows in which
they are used
Task
Workflow:
A. Response data analysis
B. Response data analysis with linking
C. Linking item data
D. Selecting optimal test items
E. Developing and assigning performance
standards.
A B C D E
1. Loading data ● ● ● ● ●
2. Setting analysis specifications ● ●
3. Analyzing test items ● ●
4. Analyzing test dimensionality ● ●
5. Analyzing differential item functioning ● ●
6. Linking ● ●
7. Scaling test results ● ●
8. Selecting optimal test items ● ● ●
9. Informing development of performance
standards
● ● ●
10. Saving results ● ● ● ● ●
• Response Data and Key for IATA Analysis
• IATA
76
JMetrik
• JMetrik software is one of the open source programs that can be
used in the context of IRT and CTT.
Download and Install jMetrik
• jMetrik is a free application that runs on any Windows, MacOS, or
Linux computer that has Java installed.
• jMetrik is available from this url:
https://itemanalysis.com/jmetrik-download/
• Make sure that you have updated Java, or you may have problems getting jMetrik to run properly. You can download Java from this URL:
https://www.java.com/en/download/
77
JMetrik
78
JMetrik Software Main Screen
JMetrik….
79
Data Creation
Opening database window
80
Data Transfer window
81
Data Definition Window I
82
Data Definition Window II
83
Item Scoring window
84
DIF analysis window
85
• Response Data and Key for JMetrik Analysis
• JMetrik
86
87
More Related Content

Similar to ITEM ANALYSIS 2023.pptx uses for exam development especially national examination development.

evaluations Item Analysis for teachers.pdf
evaluations  Item Analysis for teachers.pdfevaluations  Item Analysis for teachers.pdf
evaluations Item Analysis for teachers.pdfBatMan752678
 
Item analysis- 1st yr Msc[n] research
Item analysis- 1st yr Msc[n] researchItem analysis- 1st yr Msc[n] research
Item analysis- 1st yr Msc[n] researchSUCHITRARATI1976
 
Item analysis and validation
Item analysis and validationItem analysis and validation
Item analysis and validationKEnkenken Tan
 
Tools construction and validation of Tools
Tools construction and validation of ToolsTools construction and validation of Tools
Tools construction and validation of ToolsSrinivasan Padmanaban
 
ESE444/544 - Types of Assessment
ESE444/544 - Types of AssessmentESE444/544 - Types of Assessment
ESE444/544 - Types of Assessmentamacargel
 
CHAPTER 6 Assessment of Learning 1
CHAPTER 6 Assessment of Learning 1CHAPTER 6 Assessment of Learning 1
CHAPTER 6 Assessment of Learning 1FriasKentOmer
 
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...iosrjce
 
Item and Distracter Analysis
Item and Distracter AnalysisItem and Distracter Analysis
Item and Distracter AnalysisSue Quirante
 
Psychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling YouPsychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling YouExamSoft
 
tryout test, item analysis (difficulty, discrimination)
tryout test, item analysis (difficulty, discrimination)tryout test, item analysis (difficulty, discrimination)
tryout test, item analysis (difficulty, discrimination)April Gealene Alera
 
educatiinar.pptx
educatiinar.pptxeducatiinar.pptx
educatiinar.pptxNithuNithu7
 
Caveon webinar series Standard Setting for the 21st Century, Using Informa...
Caveon webinar series    Standard Setting for the 21st Century, Using Informa...Caveon webinar series    Standard Setting for the 21st Century, Using Informa...
Caveon webinar series Standard Setting for the 21st Century, Using Informa...Caveon Test Security
 
ITEM ANALYSIS.pptx
ITEM ANALYSIS.pptxITEM ANALYSIS.pptx
ITEM ANALYSIS.pptxRizaGarganza
 

Similar to ITEM ANALYSIS 2023.pptx uses for exam development especially national examination development. (20)

Teaching technology2
Teaching technology2Teaching technology2
Teaching technology2
 
evaluations Item Analysis for teachers.pdf
evaluations  Item Analysis for teachers.pdfevaluations  Item Analysis for teachers.pdf
evaluations Item Analysis for teachers.pdf
 
Item analysis- 1st yr Msc[n] research
Item analysis- 1st yr Msc[n] researchItem analysis- 1st yr Msc[n] research
Item analysis- 1st yr Msc[n] research
 
Item analysis and validation
Item analysis and validationItem analysis and validation
Item analysis and validation
 
Tools construction and validation of Tools
Tools construction and validation of ToolsTools construction and validation of Tools
Tools construction and validation of Tools
 
ESE444/544 - Types of Assessment
ESE444/544 - Types of AssessmentESE444/544 - Types of Assessment
ESE444/544 - Types of Assessment
 
item analysics
item analysicsitem analysics
item analysics
 
CHAPTER 6 Assessment of Learning 1
CHAPTER 6 Assessment of Learning 1CHAPTER 6 Assessment of Learning 1
CHAPTER 6 Assessment of Learning 1
 
Item analysis
Item analysisItem analysis
Item analysis
 
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
Analysis of Multiple Choice Questions (MCQs): Item and Test Statistics from a...
 
Item and Distracter Analysis
Item and Distracter AnalysisItem and Distracter Analysis
Item and Distracter Analysis
 
Psychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling YouPsychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling You
 
Item Analysis
Item AnalysisItem Analysis
Item Analysis
 
Item analysis with spss software
Item analysis with spss softwareItem analysis with spss software
Item analysis with spss software
 
DepEd Item Analysis
DepEd Item AnalysisDepEd Item Analysis
DepEd Item Analysis
 
tryout test, item analysis (difficulty, discrimination)
tryout test, item analysis (difficulty, discrimination)tryout test, item analysis (difficulty, discrimination)
tryout test, item analysis (difficulty, discrimination)
 
educatiinar.pptx
educatiinar.pptxeducatiinar.pptx
educatiinar.pptx
 
Item analysis ppt
Item analysis pptItem analysis ppt
Item analysis ppt
 
Caveon webinar series Standard Setting for the 21st Century, Using Informa...
Caveon webinar series    Standard Setting for the 21st Century, Using Informa...Caveon webinar series    Standard Setting for the 21st Century, Using Informa...
Caveon webinar series Standard Setting for the 21st Century, Using Informa...
 
ITEM ANALYSIS.pptx
ITEM ANALYSIS.pptxITEM ANALYSIS.pptx
ITEM ANALYSIS.pptx
 

Recently uploaded

Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 

Recently uploaded (20)

Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 

ITEM ANALYSIS 2023.pptx uses for exam development especially national examination development.

  • 1. Educational Assessment and Examinations Service (EAES) August, 2023 Dembel View Hotel, Adama
  • 2. Outlines • Introduction • Basic concepts of CTT • Basic concepts of IRT • Reliability and Validity • Differential Item Functioning • Item analysis using Software
  • 3. Introduction • Test is concerned with Turing performance in numbers (Baxten, 1998) • 13% of students who fail in the class are caused by faulty test items (World Watch, 2005) • Masters et al. (2001) in US , 2,233 minor and major violations of item-writing guidelines were identified. • It is estimated that 90% of the testing items are out of quality (Wilen, W.W, 1992) • Teachers have difficulty in developing plausible distractors in MCQs and only 52% of all distractors were functioning effectively ( Tarrant, Ware & Mohammed, 2009) • What about in case of Ethiopia with respect to quality test?? • Thus, item analysis is very important to keep the quality of the test 3
  • 4. Cont.… 4 Item Analysis A method used to evaluate test items, typically for the purpose of test construction and revision. A process to examines students individual test items and test as a whole. Useful to improve items and eliminate ambiguous/ misleading items Help to identify specific areas of subject content that need greater emphasis/clarity Suggest ways of improving the measurement of a test Valuable to increase the skill in test construction
  • 5. Cont.… Helps to bring a match between what is taught and what is assessed. Care should be taken on sampling of items and its difficulty level. Help to understand and make decisions about poor performing items Helps to improve test items and identify unfair or biased items Help to carefully align instruction with the Grade level expectations from which a standardized test items derived. Purpose of Item Analysis
  • 6. Item Analysis Methods Qualitative • A non – numerical method for analyzing test items not employing students responses. • Rather considering test objectives, content validity, and technical item quality such as Matching items and objectives Editing poorly written items Improving the content validity of the test Evaluating the items across table of specification and item writing guidelines Quantitative • A numerical method for analyzing test items based on students response. • It includes: Difficulty/b-parameter Discrimination/ a-parameter Option analysis Reliability Differential item functioning , etc. 6
  • 7. Cont.… Item Analysis Methods Quantitative Qualitative Difficulty/b-parameter Discrimination/a-parameter Option Analysis Reliability Validity Differential Item Functioning/DIF
  • 8. Basic concepts of Classical Test Theory and Item Response Theory • CTT and IRT are the two primary psychometric paradigms. • They are a mathematical approaches to how tests items are analyzed. • They differ quite substantially in substance and complexity, even though they both nominally do the same thing. • So there is no single best answer to the question of using either CTT or IRT. • Since in many cases, BOTH are necessary and can be used based on the purpose of the item analysis. • However, CTT and IRT have some differences . 8
  • 9. Comparison of CTT vs. IRT 9 Feature CTT IRT Ability-Item relationship Linear Logistic curve Invariance of item & person statistics No Yes Difficulty P-value b-parameter Discrimination D(item-total) a-parameter Adaptive Testing Rare Suitable Reliability Depend on test length Don't depend on test length Equating Complicated Automatic Item – Model Fit No Yes Sample size needed Small Large Option analysis Preferable Rare
  • 10. Classical Test Theory and Item Response Theory
  • 11. Classical Test Theory (CTT) • They are the easiest and most widely used form of analyses. • The statistics can be computed by readily available statistical packages (or even by hand). • They are performed on the test as a whole rather than on the item • Although item statistics can be generated, they apply only to that group of students on that collection of items • CTT assumes that each person has a true score (T), that would be obtained if there were no errors (E) in measurement. • Unfortunately, test users never observe a person's true score, only an observed score (X) • Thus, CTT is based on the true score model: • In CTT we assume that the error : Is normally distributed Uncorrelated with true score Has a mean of Zero X T E  
  • 12. Item Difficulty and Discrimination in CTT Item Difficulty Level (P) • The percentage of students who answered the item correctly • Calculation: or • The range is b/n 0% and 100% (0.0&1.00) • The higher the value, the easier the item and the lower the value, the harder the item • An item with a p value of .0 or 1.0 does not contribute to measure individual differences • Ideal value of an item difficulty is 0.50 • Small number of easy or difficult items may be included to motivate or differentiate the test takers 12 P= Upper group + Lower group Total group P= # Correct students Total students
  • 13. The distribution items by difficulty levels in a Test 13 Author Type of Items Percentage Sugianto [2020] Very Easy 10% Easy 20% Moderate 40% Difficult 20% Challenging 10% Arifin (2009)- three options based on the purpose of test Difficult Medium Easy 25% 50% 25% Difficult Medium Easy 20% 60% 20% Difficult Medium Easy 15% 70% 15%
  • 14. Interpretation of difficulty index (p-value). 14 Author Difficulty index Interpretation Uddin et al. (2020) >80% Easy 30–80% Moderate <30% Difficult Kaur, Singla et al. (2016) >80 Easy 40–80 Moderate <39 Difficult Sugianto (2020) 90% Easy 50% Moderate 10% Difficult Jaipurkar et al. (2021) >70% Too easy 50–60% Excellent/ideal 30–70% Good/acceptable/average Obon and Rey (2019) > 0.76 Easy 0.26–0.75 Right difficult (Retain) 0–0.25 Difficult (Revise/Discard) Bhat and Prasad (2021) >70% Easy 30–70% Good <30% Difficult
  • 15. Item Discrimination Power (D) • Ability of items to elicit different responses from students with different abilities/skills. • The computed difference between the percentage of high achievers and the percentage of low achievers who got the item right. • The maximum range of the Discrimination Index is from -1.0 to +1.0 • The higher the value of D, the more adequately the item discriminates (highest value is 1.0) • Values close to 0 means most students performed the same on an item 15 D = (Correct Upper) - (Correct Lower) (1/2 Total)
  • 16. Discrimination Index (D) • Those who did well on the overall test chose the correct answer for a particular item more often than those who did poorly on the overall test. Positive Discrimination Index • Those who did poorly on the overall test chose the correct answer for a particular item more often than those who did well on the overall test. Negative Discrimination Index • Those who did well and those who did poorly on the overall test chose the correct answer for a particular item with equal frequency Zero Discrimination Index  Negative discriminators (-) (This is never what we want) Non-discriminators (0) (This may or may not be what we want) Positive discriminators (+) (This is usually what we want)
  • 17. Interpretation of Discrimination power (D) 17 Author Discrimination power Interpretation Elfaki, Bahamdan et al. [2015] • ≥0.35 Excellent • 0.25–0.34 Good • 0.21–0.24 Acceptable • ≤ 0.20 Poor Obon and Rey [2019] • ≥ 0.50 Very Good • 0.40–0.49 Good (Very Usable) • 0.30–0.39 Fair Quality (Usable Item) • 0.20–0.29 Poor (Revised) • ≤ 0.20 Very Poor (Critically Revised/ Discard Bhat and Prasad [2021] • > 0.35 Excellent • 0.2–0.35 Good • < 0.2 Poor Sugianto [2020] • >0.40 Very good • 0.30–0.39 Good • 0.20–0.29 Marginal & need improvement • <0.19 Poor, rejected/ improved by revision
  • 18. Interpretation of Discrimination power(D) 18 Author Discrimination power Interpretation Aljehani, Pullisheryet al. [2020] and Sharma [2021] • ≥ 0.40 Very good item (Keep) • 0.30–0.39 Good item (Keep) • 0.20–0.29 Moderate & fair (Keep) • < 0.20 Marginal, Revise/Discard • Negative Worst & Definitely Discard Ramzan, Imran et al. [2020] • > 0.30 Excellent • 0.20–0.29 Good • 0–0.19 Poor • <0 Defective & Discard Uddin et al. [2020] • ≥ 0.35 Excellent • 0.25–0.34 Good • 0.21–0.24 Acceptable • < 0.20 Poor
  • 19. Item Analysis for Partial Credit Items Partial credit items (short answer, essay) items can be analyzed using the following formula: P = UGSP + LGSP x 100 WI x T D = UGSP - LGSP WI x 1/2T Where: UGSP-Upper group sum point on item LGSP-Lower group sum point on item WI- Weight of an Item T- Total number of students 19
  • 20. Example • The maximum point of a short answer item was 3. Among the upper groups of students, each 6 of them scored 3 points and 4 of them scored 2 points. From the lower group, each 4 of them 2 and 6 of them scored 1 point. Find P and D of an item. • P = UGSP + LGSP x 100 WI x T = (6*3+4*2)+(4*2+6*1) = 40/60 = 0.67 3*20 • D = UGSP - LGSP WI x 1/2T = 26-14 = 8/30 = 0.27 3*10 20
  • 21. Option Analysis • Analysis of how well the high and low groups are responding to the item's options. • Compare the performance of the highest- and lowest-scoring students on the distractor options. • Fewer of the top performers should choose each distractor compared with the bottom performers. • A good distractor attracts more students from the lower group than from the upper group. • It is not desirable to have a distractor chosen more often than the correct answer. • If that happens, the distractor may be too similar to the correct answer and/or something in the stem or the alternatives may be misleading. • For the keyed answer, the difference between upper and lower performers is expected to be positive, while for the distractors it is expected to be negative.
  • 22. Activity 1 • Consider the case below. Suppose your students chose the options to a four-alternative multiple-choice item (Item X) as follows, with C as the correct answer: A = 3, B = 0, C* = 18, D = 9. Questions: • How does this information help us? • Is the item too difficult/easy for the students? • What is the difficulty level value? • What is the discrimination index value? • Are the distractors of the item effective? • Should this item be eliminated?
  • 23. Solving the difficulty index for Item X (A = 3, B = 0, C* = 18, D = 9): p = (Number selecting correct answer) / (Total number taking the test) = 18/30 = 0.60 • Thus, the difficulty level of the item is 0.60 (60%), so the item is of moderate difficulty. • As per Bhat and Prasad (2021): if p > 0.70 the item is considered relatively easy; if p < 0.30 the item is considered relatively difficult.
  • 24. Solving the Discrimination Index for Item X (A = 3, B = 0, C* = 18, D = 9), given that the numbers of upper- and lower-group students answering correctly are 11 and 7 respectively: D = [(Correct Upper) - (Correct Lower)] / (1/2 × Total) = (11 - 7) / 15 = 4/15 ≈ 0.27 • The discrimination power of Item X is 0.27 and positive. • More students who did well on the overall test answered the item correctly than students who did poorly on the overall test. • Thus, it has good discrimination power. • As per Bhat and Prasad (2021): if the D index is 0.20–0.35, the item has good discrimination power.
  • 25. Implication (Item X: A = 3, B = 0, C* = 18, D = 9) • Difficulty Level (p) = 0.60; Discrimination Index (D) = 0.27 • 1. Should Item X be eliminated? NO: Item X is a moderately difficult item with positive (desirable) discrimination ability. • 2. Should any distractor(s) be modified? YES: Option B ought to be modified or replaced, as no one chose it; Option A also needs revision, since very few students selected it.
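  A minimal option-analysis sketch for Item X. Only the option totals (A = 3, B = 0, C = 18, D = 9) and the upper/lower counts for the key (11 and 7) come from the slides above; the way the remaining counts are split between the two groups of 15 is an assumption made up for illustration:

    # Hedged sketch: compare upper vs. lower group counts per option.
    upper_counts = {"A": 1, "B": 0, "C": 11, "D": 3}   # 15 upper-group examinees (assumed split)
    lower_counts = {"A": 2, "B": 0, "C": 7,  "D": 6}   # 15 lower-group examinees (assumed split)
    key = "C"

    for option in ["A", "B", "C", "D"]:
        diff = upper_counts[option] - lower_counts[option]
        role = "key" if option == key else "distractor"
        # Expected sign: positive for the key, negative for distractors;
        # a distractor chosen by no one (like B here) is not functioning.
        print(f"{option} ({role}): upper - lower = {diff}")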
  • 26. Item Response Theory (IRT) • IRT refers to a family of latent trait models used to establish the psychometric properties of items and scales. • Sometimes known as modern psychometrics because, in large-scale assessment and professional testing programs, IRT has almost completely replaced CTT. • IRT has many advantages over CTT that have brought it into more frequent use. • The three basic components of IRT are: Item Response Function (IRF) – a mathematical function that relates the latent trait to the probability of endorsing an item; Item Information Function – an indication of item quality and of an item's ability to differentiate among respondents; Invariance – a respondent's position on the latent trait can be estimated from any items with known IRFs, and item characteristics are population independent (up to a linear transformation).
  • 27. Cont.… Item Response Function (IRF) • It characterizes the relation between a latent variable/ability and the probability of endorsing an item. • The IRF models the relationship between examinee trait level, item properties, and the probability of endorsing the item. • Examinee trait level is denoted by the Greek letter theta (θ) and typically has mean = 0 and standard deviation = 1. • IRFs can be displayed as Item Characteristic Curves (ICCs), graphs that represent the probability of endorsing the item as a function of the respondent's ability.
  • 28. IRF – Item Parameters in IRT • Location – b-parameter: • An item's location is defined as the amount of the latent trait needed to have a .5 probability of endorsing the item. • The higher the "b" parameter, the higher on the trait a respondent needs to be in order to endorse the item. • It is analogous to the difficulty level in CTT. • Like Z scores, values of b typically range from -3 to +3. • Discrimination/Slope – a-parameter: • Indicates the steepness of the IRF at the item's location. • It indicates how strongly related the item is to the latent trait, like loadings in factor analysis. • Items with high discrimination are better at differentiating respondents around the location point, and vice versa for items with low discrimination. • It typically ranges from 0 to 2 and should never be negative.
  • 29. Cont.… • Pseudo-guessing – c-parameter: • The inclusion of a "c" parameter suggests that respondents very low on the trait may still choose the correct answer. • In other words, respondents with low trait levels may still have a small probability of endorsing the item. • This is mostly used with multiple-choice testing, and the value should not differ greatly from the reciprocal of the number of choices. • In general, it is the probability of getting the item correct by guessing alone and varies from 0 to 1. For instance, c = 0.20 means that at all ability levels the probability of getting the item correct by guessing alone is 0.20. • Upper asymptote – d-parameter: • The inclusion of a "d" parameter suggests that respondents very high on the latent trait are not guaranteed (i.e., have less than 1 probability) to endorse the item. • This is often the case for an item that is difficult to endorse.
  • 30. IRT – Logistic models • The 4-parameter logistic model (4PLM): P(X = 1 | θ, a, b, c, d) = c + (d - c) · e^(a(θ - b)) / (1 + e^(a(θ - b))), where: • θ represents the examinee trait level • b is the item difficulty that determines the location of the IRF • a is the item's discrimination that determines the steepness of the IRF • c is the lower asymptote parameter of the IRF • d is the upper asymptote parameter of the IRF
  • 32. Cont.… The 3-parameter logistic model • If the upper asymptote parameter is set to 1.0, the model is termed a 3PLM: P(X = 1 | θ, a, b, c) = c + (1 - c) · e^(a(θ - b)) / (1 + e^(a(θ - b))) • In this model, individuals at low trait levels have a non-zero probability of endorsing the item.
  • 34. Cont.… The 2-parameter logistic model • If the lower asymptote parameter is constrained to zero, the model is termed a 2PLM: P(X = 1 | θ, a, b) = e^(a(θ - b)) / (1 + e^(a(θ - b))) • In the 2PLM, IRFs vary both in their discrimination and difficulty (i.e., location) parameters.
  • 36. Cont.… The 1-parameter logistic model • If the item discrimination is set to 1.0, the result is a 1PLM: P(X = 1 | θ, b) = e^(θ - b) / (1 + e^(θ - b)) • A 1PLM assumes that all scale items relate to the latent trait equally and that items vary only in difficulty (equivalent to having equal factor loadings across items). • Mathematically, the 1PLM is identical to the Rasch model; however, there are some differences in practice: • In the Rasch tradition, the model takes priority, and data that do not fit the model are discarded. • Rasch does not permit abilities to be estimated for extreme items and persons (all correct or all incorrect).
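  A minimal sketch of the logistic family above; setting d = 1 gives the 3PLM, c = 0 the 2PLM, and a = 1 the 1PLM. The function name, parameter defaults, and example values are illustrative assumptions:

    import numpy as np

    def irf_4pl(theta, a, b, c=0.0, d=1.0):
        """Probability of a correct/endorsed response under the 4PL model:
        P = c + (d - c) * exp(a*(theta - b)) / (1 + exp(a*(theta - b)))."""
        z = a * (np.asarray(theta) - b)
        return c + (d - c) / (1.0 + np.exp(-z))

    theta = np.linspace(-3, 3, 7)
    print(irf_4pl(theta, a=1.2, b=0.5, c=0.2))   # a 3PL item (d defaults to 1)
    print(irf_4pl(0.5, a=1.2, b=0.5, c=0.2))     # at theta = b, P = c + (1 - c)/2 = 0.6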
  • 38. Activity • Which item do you think is the most difficult? • Which item do you think best differentiates the learners?
  • 39. Cont.… In IRT: • If the ability of the student is greater than the difficulty of the item, what do you think P(X = 1), the probability of a correct response, will be? • If the ability of the student is less than the difficulty of the item, what do you think P(X = 1) will be? • If the ability of the student equals the difficulty of the item, what do you think P(X = 1) will be?
  • 40. Test Response Curve / Test Characteristic Curve (TCC) • The TCC is the sum of the item response functions/ICCs of all items in the test. • A TCC plots the expected total (number-correct) score as a function of the latent trait.
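  A minimal sketch of a TCC as the sum of ICCs, using five hypothetical 2PL items; the item parameters and function name are invented for illustration:

    import numpy as np

    def icc_2pl(theta, a, b):
        """2PL item characteristic curve: P = exp(a*(theta - b)) / (1 + exp(a*(theta - b)))."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    # Hypothetical (a, b) parameters for a 5-item test.
    items = [(1.0, -1.5), (0.8, -0.5), (1.2, 0.0), (1.5, 0.8), (0.9, 1.6)]

    theta = np.linspace(-3, 3, 61)
    tcc = sum(icc_2pl(theta, a, b) for a, b in items)   # expected number-correct score
    print(tcc[::10])   # rises from near 0 toward 5 as ability increases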
  • 41. Differential Item Functioning (DIF) What is DIF? • DIF is a statistical characteristic of an item that shows the extent to which the item might be measuring different abilities for members of separate subgroups (gender, location, language, ethnicity etc.) • DIF occurs when one group of examinees has a different expected item score than comparable examinees from another group. • An item is considered free of differential functioning (DIF) if the item response function is the same across groups (Zwick, 1990). • DIF means that either the item performs differently or measures something different. If the item shows DIF it means that the item is less valid for one subgroup (Steinberg & Thissen, 2006). • A fundamental aspect of all DIF is the matching of students in the reference and focal groups on some measure of ability (Clauser & Mazor, 1998). • The focal group is the one of interest and usually represents a minority group, while a reference group represents a larger group.
  • 42. Types of DIF • Uniform DIF: • The DIF is in the same direction across the entire ability spectrum; the item response curves for the two groups do not cross. • The DIF involves the location (b) parameters. • DIF appears as a significant main (group) effect in regression analyses predicting item response. • Non-uniform DIF: • An item favors one group at certain ability levels and the other group at other levels; the curves cross. • The DIF involves the discrimination (a) parameters. • DIF appears as a significant group-by-ability interaction in regressions predicting item response.
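  A minimal sketch of the regression check described above, using logistic regression with a group main effect (uniform DIF) and a group-by-ability interaction (non-uniform DIF). The simulated data, coefficient values, and variable names are invented for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1000
    ability = rng.normal(size=n)                   # matching variable (e.g., total score or theta)
    group = rng.integers(0, 2, size=n)             # 0 = reference group, 1 = focal group
    # Simulated item responses with a small uniform DIF effect against the focal group.
    logit = 1.2 * ability - 0.4 * group
    y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    X = sm.add_constant(np.column_stack([ability, group, ability * group]))
    fit = sm.Logit(y, X).fit(disp=False)
    print(fit.params)    # const, ability, group (uniform DIF), ability*group (non-uniform DIF)
    print(fit.pvalues)   # significant group term -> uniform DIF; significant interaction -> non-uniform DIF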
  • 44. ETS: DIF Classification Levels • ETS rules for classifying the magnitude of DIF, stated in terms of the common odds ratio, are as follows: • "A" items have: (a) a CMH p-value greater than 0.05, or (b) a common odds ratio strictly between 0.65 and 1.53. • "B" items are neither "A" nor "C" items. • "C" items have: (a) a common odds ratio less than 0.53 and an upper bound of the 95% confidence interval for the common odds ratio below 0.65, or (b) a common odds ratio greater than 1.89 and a lower bound of the 95% confidence interval above 1.53.
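  A small sketch applying the A/B/C rules above to one item's Mantel-Haenszel results; the function and argument names, and the example values, are invented for illustration:

    def ets_dif_class(odds_ratio, ci_lower, ci_upper, cmh_p_value):
        """Classify DIF magnitude per the ETS rules stated above (common odds ratio metric)."""
        # "A": non-significant CMH test, or odds ratio close to 1.
        if cmh_p_value > 0.05 or 0.65 < odds_ratio < 1.53:
            return "A"
        # "C": large effect whose confidence interval also lies outside the moderate range.
        if (odds_ratio < 0.53 and ci_upper < 0.65) or (odds_ratio > 1.89 and ci_lower > 1.53):
            return "C"
        return "B"   # everything else

    print(ets_dif_class(odds_ratio=2.10, ci_lower=1.60, ci_upper=2.75, cmh_p_value=0.001))  # "C"
    print(ets_dif_class(odds_ratio=1.20, ci_lower=0.95, ci_upper=1.52, cmh_p_value=0.30))   # "A"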
  • 45. Reliability • A reliable test produces results that are accurate and consistent. • Reliability is the degree to which scores are free of "measurement error" (higher reliability = less measurement error). • Reliability coefficients range from 0.00 to 1.00. • The ideal reliability is >0.80, and it should at least not fall below 0.70. • Types of reliability: Test-retest, Parallel Forms, Split-half, Internal Consistency. • Internal consistency can be calculated by: Split-half, KR-20 (Kuder-Richardson Formula 20), KR-21 (Kuder-Richardson Formula 21), Cronbach's Alpha.
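  A minimal sketch of Cronbach's alpha computed from a scored (0/1) response matrix; for dichotomously scored items this corresponds to KR-20. The tiny data matrix and function name are invented for illustration:

    import numpy as np

    def cronbach_alpha(scores):
        """alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        item_vars = scores.var(axis=0, ddof=1)        # variance of each item
        total_var = scores.sum(axis=1).var(ddof=1)    # variance of examinee total scores
        return k / (k - 1) * (1 - item_vars.sum() / total_var)

    # Invented 5-examinee x 4-item scored matrix (1 = correct, 0 = incorrect).
    data = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
    print(round(cronbach_alpha(data), 2))   # about 0.70 for this toy data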
  • 46. Interpretation for Reliability (Cronbach's alpha / KR-20)
  Robinson, Shaver et al. (1999): ≥0.80 Exemplary; 0.70–0.79 Extensive; 0.60–0.69 Moderate; <0.60 Minimal
  Cicchetti (1994): >0.90 Excellent; 0.80–0.90 Good; 0.70–0.80 Fair; <0.70 Unacceptable
  Axelson and Kreiter (2019): >0.90 needed for very high stakes tests; 0.80–0.89 acceptable for moderate stakes tests; 0.70–0.79 acceptable for lower stakes assessments; <0.70 might be useful as a component of an overall composite score
  Obon and Rey (2019): >0.90 Excellent reliability; 0.80–0.90 Very good for a classroom test; 0.70–0.80 Good for a classroom test; 0.60–0.70 Somewhat low; 0.50–0.60 Suggests need for revision of the test; <0.50 Questionable reliability
  Hassan and Hod (2017): >0.7 Excellent; 0.6–0.7 Acceptable; 0.5–0.6 Poor; <0.5 Unacceptable; <0.30 Unreliable
  • 47. Validity • The extent to which a measure indicates what it is intended to measure. • Establishes that the measure covers the full range of the concept's meaning, i.e., covers all dimensions of the concept. • Internal statistical analysis can be applied to check unidimensionality. • Validity depends more on "good" expert judgment.
  • 48. Validity vs. Reliability • (Figure panels: Neither Valid nor Reliable | Reliable but not Valid | Valid & Reliable) • Reliability is a necessary condition for validity but not sufficient. • Reliability is a prerequisite for measurement validity. • One needs reliability, but it is not enough for validity.
  • 49. Test Item Analysis Using Software • There are software products that perform the computations of both CTT and IRT statistics, and others that perform calculations for only one of them. • For example: CITAS, ITEMAN, Lertap, and TAP are widely used packages for CTT only. BILOG-MG, flexMIRT, ICL, MULTILOG, PARSCALE, PARAM-3PL, Winsteps, Xcalibre, IRT PRO, NOHARM, and TESTFACT are packages used only for IRT. JMetrik, R, Mplus, and IATA can compute statistics for both CTT and IRT. • Among the software packages that support IRT computations, only jMetrik, PARAM-3PL, NOHARM, and R are free of charge.
  • 50. Hands-on Practice • The commonly used open-source and user-friendly test analysis programs to be discussed in this session are: • TAP: Test Analysis Program (CTT) • IATA: Item And Test Analysis (CTT and IRT) • JMetrik (CTT and IRT)
  • 51. Data Preparation • Capture data in Excel/Access/SPSS/text. • Design a test map containing the answer key, content domain, cognitive domain, etc. • Insert/import or open the student response data in the software (TAP, IATA, JMetrik, etc.). • Fill in all necessary information required by the particular software. • Analyze the data and save/print the results. • Identify any exam items that may require revision. • For each identified item, list your observation and a hypothesis about the nature of the problem.
  • 53. TAP: Test Analysis Program … • The Test Analysis Program (TAP) is designed as a powerful, easy-to-use (and free!) test analysis package. • TAP is a classical test and item analysis program: it performs test analyses and item analyses based on CTT. TAP output provides: • Examinee Analysis, including percentage correct, letter grade and confidence intervals for each student, and aggregate descriptive statistics for the group. • It generates a report for each student indicating his/her score, responses to each item, and the correct answers to items missed. • Item and Test Analysis, including item difficulty, point biserial, discrimination index, and various "if item deleted" statistics (KR-20, scale mean and standard deviation, etc.). • Options Analysis, including high-group and low-group item difficulty for the correct answer and the distractors.
  • 54. TAP: Test Analysis Program … • TAP uses text files. • Any text file can be inserted into the TAP data editor window, but for it to work with TAP the file must be formatted as follows: • Every case must be on a single row. • The EXAMINEE ID LABEL must be at the far left of each row. • The ITEM data must be numeric and must be in single columns with NO SPACES between the item scores. • The first line must be the ANSWER KEY, formatted like the data (but with no numbers until the first correct answer). After you insert your data, any text in the ANSWER KEY must be deleted. • Also, you will need to fill in the appropriate data editor fields: # examinees, # items, ID label length, # options for each item, and INCLUDE each item.
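  An illustrative sketch of what such a file might look like, built only from the formatting rules listed above; the examinee IDs, responses, and exact column spacing are assumptions rather than details from the TAP documentation:

    # Hedged sketch: assemble a tiny TAP-style text file following the rules above.
    # IDs, responses, and spacing are invented; consult the TAP manual for exact requirements.
    answer_key = "          132414"      # first line: key digits aligned with the item columns
    responses = [
        "STUDENT01 132414",              # ID label at the far left, then numeric item responses
        "STUDENT02 122413",              # no spaces between the item scores themselves
        "STUDENT03 432411",
    ]
    with open("tap_example.txt", "w") as f:
        f.write("\n".join([answer_key] + responses) + "\n")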
  • 55. TAP: Test Analysis Program …
  • 56. Steps to Run TAP • 1. TAP software installation and setup: search for "Test Analysis Program (TAP)" in Google, then open the website https://people.ohio.edu/brooksg/ • 2. Under Programs Available (click on the program name to go to the description and download link), click the link TAP: Test Analysis Program (last updated December 2018). • 3. The program will be downloaded and launched, ready to begin. • 4. Click Run to open TAP. • 5. Enter test data directly into TAP (enter new data and go to the data editor), or import test data; TAP also provides the option of loading data from an existing text (.txt) file.
  • 57. Steps to Run TAP …
  • 58. Steps to Run TAP … In the data editor screen, enter the following information: • Descriptive information in the Title and Comments sections. • The number of examinees, number of items, missing data symbol, and ID label (student name) length in the appropriate fields. • In the Answer Key field, the numbers corresponding to the correct answers, as a string with no delimiters. • In the # Options field, the number of options corresponding to each question. • The Item Included field allows the user to eliminate items from the analysis or set alternative correct answers; enter Y to include or N to exclude an item. • In the Data screen, enter the student identification information and scores, aligning the score data with the guide above the Data screen. • When all information is entered, click either Save File or Close and Analyze at the bottom of the Data Entry screen.
  • 59. Steps to Run TAP … • 5. Saving data files: Test data entered by the user or created by TAP's random data generator can be saved as TAP files for archiving, future modification, or analysis. • Data files can be saved in TAP's data editor window: i. Choose Save TAP file under the File menu. ii. Select the location and save the TAP file. iii. To open the file later, choose Open TAP file under the File menu in the Data Editor screen. • 6. Analyzing tests with TAP: Once you have a set of test scores in the Data Editor, either by direct entry or by importing your own test, you can run the analysis by clicking Analyze (F9). To retrieve the full analysis, click the View Full Results box.
  • 60. Response Data and Key for TAP Analysis
  • 61. IATA: Item And Test Analysis • The item and test analysis software (IATA) is intended to help national assessment practitioners, researchers, and others analyze test item data as well as build effective assessment tools. • IATA was designed to offer a user-friendly way to address many statistical considerations related to national assessments. • It targets specifically those who are interested in analyzing test data, creating a new test from an item bank, or comparing or scaling test items between different samples. • The overarching goal of IATA is to increase the usability and interpretability of test scores. • The primary goal of test development from a statistical perspective is to reduce the error of measurement. To reduce error of measurement, IATA identifies problematic items that contribute to error so that they may be revised, replaced, or removed altogether. • The second goal is to establish meaningful and consistent scales on which to report test scores.
  • 62. IATA: Item And Test Analysis … • IATA can read and write a variety of common data table formats (Access, Excel, SPSS, delimited text files) if they are formatted correctly. • If the data are not formatted with the correct structure, IATA will not be able to carry out the analyses. • Database-compatible formats such as Access or SPSS already take care of most data formatting issues. • However, if the data are stored in a less restrictive format, such as Excel or a text file, the following conventions should be followed: • The names of variables should appear in the cell at the top of each column (header). • The name of each variable must be distinct from the names of other variables in the data file. • Variable names must begin with a letter and should not contain any spaces. • The data range must not contain any empty rows or columns. • The data range is the rectangle of cells that contain data, beginning with the name of the first variable in the data file and ending with the value of the last variable in the bottom-most row. • The data range must begin at the first cell in the spreadsheet or file; in Excel this cell is labelled "A1", and in text files it is the top-left cursor position.
  • 63. IATA: Item And Test Analysis …
  • 64. IATA: Item And Test Analysis … • There are two main types of data produced by and used in the analysis of assessments: response data and item data. • 1. Response data are produced by individual learners as they answer questions on a test. • They include the response of each student to each test item, recorded as codes representing the options endorsed by each student (e.g., A, B, C, D, or numeric codes). • They may also include other useful demographic variables for analyzing test results, such as age, grade, gender, school, and region. • A unique respondent identifier (ID) should be included; otherwise IATA will automatically produce one. • In order to score the response data, for most analyses an answer key must be loaded into IATA.
  • 65. IATA: Item And Test Analysis … • Treatment of Missing and Omitted Data: When a student does not provide a response to a test item, rather than leaving the data field blank, a missing value code is used to record why the response is missing. There are two types of missing responses: missing and omitted. Common conventions use specific values for the different types of non-response data (see Greaney and Kellaghan 2012 for information on response codes). Common values are: 9 for missing responses, where students have not responded at all to an item; 8 for multiple responses, which typically occur in multiple-choice tests when students mark more than one option and in open-ended items when student responses are illegible; and 7 for omitted or not-presented items, which might be used in a rotated booklet design. Regardless of the specific codes used, you must specify how IATA is to treat each non-response code, as either missing or omitted.
  • 66. IATA: Item And Test Analysis … • Item naming: It is important to assign a unique name to each item in a national assessment program (see Anderson and Morgan 2008; Greaney and Kellaghan 2012) for purposes such as linking and future use in an item bank (e.g., MA04M17001). • Variables reserved by IATA: During the analysis of response data, IATA calculates a variety of working variables whose names are reserved and should not be used as names of test items or questionnaire variables. • These are: Xweight, Missing, PercentScore, PercentError, Percentile, RawZScore, Zscore, IRTscore, IRTerror, IRTskew, IRTkurt, TrueScore, Level, and the "@" symbol.
  • 67. IATA: Item And Test Analysis … • 2. Item Data: • A test is a specific collection of questions that evaluate a common domain of proficiency or knowledge; individual questions on a test are referred to as items. • IATA produces and uses item data files with a specific format. • An item data file contains all the information required to perform statistical analysis of items and may contain the parameters used to describe the statistical properties of items. • It is simply a bank file that describes the items.
  • 68. IATA: Item And Test Analysis … • Required variables in an item data file, with an example (see the table on the original slide).
  • 69. IATA Result Interpretation • IATA uses traffic-light symbols: • Green circle: indicates no problems. • Yellow diamond: indicates that the results are less than optimal; modifications may be required to either the analysis specifications or the items themselves, but the item is not introducing any significant error into the analysis results. • Red warning triangle: appears beside any potentially problematic items; it indicates items that could not be included in the analysis due to problems with the data or specifications, or recommends a more detailed examination of the specifications or the underlying data and test item. When this indicator appears, it does not necessarily mean there is a problem, but it does suggest that the overall analysis results may be more accurate if the indicated item were removed or if the analysis were re-specified.
  • 70. IATA analysis workflows and interfaces (Steps) Double click or right click to open IATA. Click main menu.
  • 71. IATA analysis workflows and interfaces (Steps) … There are five workflows available in IATA: 1. Response data analysis, 2. Response data analysis with linking, 3. Linking item data, 4. Selecting optimal test items, and 5. Developing and assigning performance standards
  • 72. There are 10 different tasks in IATA, used across the five workflows (A. Response data analysis; B. Response data analysis with linking; C. Linking item data; D. Selecting optimal test items; E. Developing and assigning performance standards):
  1. Loading data: used in all five workflows (A–E)
  2. Setting analysis specifications: used in two workflows
  3. Analyzing test items: used in two workflows
  4. Analyzing test dimensionality: used in two workflows
  5. Analyzing differential item functioning: used in two workflows
  6. Linking: used in two workflows
  7. Scaling test results: used in two workflows
  8. Selecting optimal test items: used in three workflows
  9. Informing development of performance standards: used in three workflows
  10. Saving results: used in all five workflows (A–E)
  • 73. Response Data and Key for IATA Analysis
  • 74. JMetrik • jMetrik is one of the open-source programs that can be used for both IRT and CTT analyses. • Download and install jMetrik: • jMetrik is a free application that runs on any Windows, macOS, or Linux computer that has Java installed. • jMetrik is available from this URL: https://itemanalysis.com/jmetrik-download/ • Make sure that you have updated Java or you may have problems getting jMetrik to run properly. You can download Java from this URL: https://www.java.com/en/download/
  • 83. Response Data and Key for JMetrik Analysis