Criteria to Consider When Constructing Good Tests
A. Validity – the degree to which a test measures what it is intended to
measure. It reflects the usefulness of the test for a given purpose and is the most
important criterion of a good examination.
Factors Influencing the Validity of Tests in General
1. Appropriateness of Test – it should measure the abilities, skills and
information it is supposed to measure.
2. Directions – they should indicate how the learners should answer and
record their answers.
3. Reading Vocabulary and Sentence Structure – these should be suited to
the intellectual level, maturity and background experience of the
learners.
4. Difficulty of Items – the items should be neither too difficult nor
too easy, so that the test can discriminate the bright from the slow pupils.
5. Construction of Test Items – the items should not provide clues, so the test does not become
a test on clues, nor be ambiguous, so it does not become a test on interpretation.
6. Length of the Test – it should be just long enough to measure
what it is supposed to measure, and not so short that it cannot
adequately measure the performance we want to measure.
7. Arrangement of Items – the items should be arranged in
ascending level of difficulty, starting with the easy ones, so that the
pupils will be encouraged to continue taking the test.
8. Patterns of Answers – the correct answers should not form a pattern that pupils can detect and use in
answering the test.
Ways of Establishing Validity
1. Face Validity – is done by examining the physical appearance of the test.
2. Content Validity – is done through a careful and critical examination of
the objectives of the test so that it reflects the curricular objectives.
3. Criterion-related Validity – is established statistically by correlating a set of
scores obtained from a test with the scores obtained from
another external predictor or measure.
a. Concurrent Validity – describes the present status of the individual
by correlating the sets of scores obtained from two measures given
concurrently.
b. Predictive Validity – describes the future performance of an
individual by correlating the sets of scores obtained from two
measures given at a longer time interval.
4. Construct Validity – is established statistically by comparing
psychological traits or factors that theoretically influence scores in a test.
a. Convergent Validity – is established if the instrument correlates with a measure of
a similar trait other than the one it is intended to measure, e.g. a
Critical Thinking Test may be correlated with a Creative Thinking Test.
b. Divergent Validity – is established if the instrument describes only
the intended trait and not other traits, e.g. a Critical Thinking Test
may not be correlated with a Reading Comprehension Test.
B. Reliability – refers to the consistency of scores obtained by the same person
when retested using the same instrument or one that is parallel to it.

Factors Affecting Reliability
1. Length of the Test – as a general rule, the longer the test, the higher the
reliability. A longer test provides a more adequate sample of the behavior
being measured and is less distorted by chance factors like guessing.
2. Difficulty of the Test – ideally, achievement tests should be constructed
such that the average score is 50 percent correct and the scores range from
near zero to perfect. The bigger the spread of the scores, the more reliable the
measured difference is likely to be. As a rule of thumb, a test is considered reliable if its coefficient of
correlation is not less than 0.85.
3. Objectivity – can be obtained by eliminating the bias, opinions or
judgments of the person who checks the test.
Method | Type of Reliability Measure | Procedure | Statistical Measure
A. Test-Retest | Measure of stability | Give a test twice to the same group, with any time interval between tests from several minutes to several years. | Pearson r
B. Equivalent Forms | Measure of equivalence | Give parallel forms of the test with close time intervals between forms. | Pearson r
C. Test-Retest with Equivalent Forms | Measure of stability and equivalence | Give parallel forms of the test with increased time intervals between forms. | Pearson r
D. Split Half | Measure of internal consistency | Give a test once. Score equivalent halves of the test, e.g. odd- and even-numbered items. | Pearson r and Spearman-Brown Formula
E. Kuder-Richardson | Measure of internal consistency | Give the test once, then correlate the proportion/percentage of the students passing and not passing a given item. | Kuder-Richardson Formula 20 and 21
Formulas for Measures of Correlation Used in Establishing Test Validity & Reliability

Pearson r
$$ r = \frac{\dfrac{\sum XY}{N} - \left(\dfrac{\sum X}{N}\right)\left(\dfrac{\sum Y}{N}\right)}{\sqrt{\dfrac{\sum X^2}{N} - \left(\dfrac{\sum X}{N}\right)^2}\;\sqrt{\dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2}} $$
Where:
X – scores in a test
Y – scores in a retest
N – number of examinees

Spearman-Brown Formula
$$ \text{reliability of the whole test} = \frac{2r_{oe}}{1 + r_{oe}} $$
Where:
r_oe – reliability coefficient using the split-half or odd-even procedure

Kuder-Richardson Formula 20
$$ KR_{20} = \frac{K}{K-1}\left[1 - \frac{\sum pq}{S^2}\right] $$
Where:
K – number of items
p – proportion of the examinees who got the item right
q – proportion of the examinees who got the item wrong
S^2 – variance, or the square of the standard deviation

Kuder-Richardson Formula 21
$$ KR_{21} = \frac{K}{K-1}\left[1 - \frac{K\bar{p}\bar{q}}{S^2}\right] $$
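The Pearson r and Spearman-Brown formulas can be traced step by step in a short script. The following is only a sketch: the function names and the four-examinee data set are hypothetical, and the odd-even split assumes items are scored 1 (right) or 0 (wrong).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r computed with the raw-score formula given in the text."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    sd_x = sqrt(sum(a * a for a in x) / n - mean_x ** 2)
    sd_y = sqrt(sum(b * b for b in y) / n - mean_y ** 2)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

def spearman_brown(r_oe):
    """Step a half-test correlation up to the reliability of the whole test."""
    return 2 * r_oe / (1 + r_oe)

def split_half_reliability(item_scores):
    """item_scores: one row per examinee, one 0/1 entry per item."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd_half, even_half))

# Hypothetical data: 4 examinees, 4 items (1 = right, 0 = wrong)
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
print(round(split_half_reliability(item_scores), 3))  # 0.625
```

Note that `pearson_r` divides by the two standard deviations, so it fails with a division by zero if either set of scores has no spread at all.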
Interpretation of the Pearson r correlation value
1 – perfect positive correlation
Between 0.5 and 1 – high positive correlation
0.5 – positive correlation
Between 0 and 0.5 – low positive correlation
0 – zero correlation
Between 0 and −0.5 – low negative correlation
−0.5 – negative correlation
Between −0.5 and −1 – high negative correlation
−1 – perfect negative correlation
C. Administrability – the test should be administered with ease, clarity and
uniformity so that the scores obtained are comparable. Uniformity can be obtained
by setting the time limit and standardizing the oral instructions.
D. Scorability – the test should be easy to score: the directions for scoring
are clear, the scoring key is simple, and provisions for answer sheets are made.
E. Economy – the test should be given in the cheapest way, which means that
answer sheets should be provided so that the test papers can be reused from time to time.
F. Adequacy – the test should contain a wide sampling of items to determine the
educational outcomes or abilities, so that the resulting scores are
representative of the total performance in the areas measured.
G. Authenticity – the test should simulate real-life situations.
 Shapes of the Frequency Polygons
1. Normal – bell-shaped curve
2. Positively skewed – most scores are below the mean and there are extremely high scores; $\bar{x} > \hat{x}$ (the mean is greater than the mode)
3. Negatively skewed – most scores are above the mean and there are extremely low scores; $\bar{x} < \hat{x}$ (the mean is lower than the mode)
4. Leptokurtic – highly peaked and the tails are more elevated above the baseline
5. Mesokurtic – moderately peaked
6. Platykurtic – flattened peak
7. Bimodal Curve – curve with two peaks or modes
8. Polymodal Curve – curve with three or more modes
9. Rectangular Distribution – there is no mode
 Four Types of Measurement Scales
Measurement Scale | Characteristics | Example
1. Nominal | Groups and labels data | Gender (1-male, 2-female)
2. Ordinal | Ranks data; distances between points are indefinite | Income (1-low, 2-average, 3-high)
3. Interval | Distances between points are equal; no absolute zero point | Test scores and temperature (a score of zero in a test does not mean no knowledge at all)
4. Ratio | All of the above, except that it has an absolute zero point | Height, weight (a zero weight means no weight at all)
Where (for Kuder-Richardson Formula 21): $\bar{p} = \dfrac{\bar{X}}{K}$; $\bar{q} = 1 - \bar{p}$
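With these definitions, both Kuder-Richardson formulas can be computed from a table of right/wrong item scores. The sketch below is illustrative only: the function names and the data are hypothetical, and it assumes the variance S^2 is the population variance of the total scores (dividing by N), which the text does not specify.

```python
def variance(scores):
    """Population variance (divides by N); an assumption, see the lead-in."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((s - mean) ** 2 for s in scores) / n

def kr20(item_scores):
    """KR-20 from a table with one row per examinee and one 0/1 entry per item."""
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    s2 = variance(totals)
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion who got item j right
        sum_pq += p * (1 - p)                       # q = 1 - p
    return k / (k - 1) * (1 - sum_pq / s2)

def kr21(k, mean, s2):
    """KR-21 needs only the number of items, the mean and the variance."""
    p_bar = mean / k
    q_bar = 1 - p_bar
    return k / (k - 1) * (1 - k * p_bar * q_bar / s2)

# Hypothetical data: 4 examinees, 4 items (1 = right, 0 = wrong)
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
totals = [sum(row) for row in item_scores]
mean_total = sum(totals) / len(totals)
print(round(kr20(item_scores), 3))                          # 0.75
print(round(kr21(4, mean_total, variance(totals)), 3))      # 0.667
```

KR-21 treats every item as equally difficult, so for the same data it does not exceed KR-20; here it comes out lower.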

Measures of Central Tendency and Variability
Assumptions When Used | Appropriate Statistical Tool: Measure of Central Tendency (describes the representative value of a set of data) | Appropriate Statistical Tool: Measure of Variability (describes the degree of spread or dispersion of a set of data)
When the frequency distribution is regular, symmetrical or normal; usually used when the data are numeric (interval or ratio) | Mean – the arithmetic average | Standard Deviation – the root-mean-square of the deviations from the mean
When the frequency distribution is irregular or skewed; usually used when the data are ordinal | Median – the middle score in a group of scores that are ranked | Quartile Deviation – the average deviation of the 1st and 3rd quartiles from the median
When the distribution of scores is normal and a quick answer is needed; usually used when the data are nominal | Mode – the score that occurs most frequently | Range – the difference between the highest and lowest scores in a set of observations
I. Procedure in the Computation of the Measures of Central Tendency
A. Mean
Procedure:
1. Mean of Ungrouped Data: used for few cases (N < 30)
a. Get the sum of the scores (ΣX)
b. Divide the sum by the number of cases (N)
Formula: $\bar{X} = \dfrac{\sum X}{N}$
2. Mean of Grouped Data: used for many cases (N ≥ 30)
Two methods of computing the mean of grouped data are discussed here.
a. Using the Midpoint Method
Procedure:
1) Group the data in the form of a frequency distribution
2) Compute the midpoint (M) of each class interval
3) Multiply the midpoints by their frequencies (M × F)
4) Get the sum of the products of the midpoints and frequencies (ΣMF)
5) Divide the sum by the number of cases (N)
Formula: $\bar{X} = \dfrac{\sum MF}{N}$
b. Using the Class Deviation Method
Procedure:
1) Choose any class interval as your arbitrary starting point or origin
2) Get the midpoint of the class interval that you have chosen as your starting point. Call this
your assumed mean (AM)
3) Get the deviation (D) of each class interval from the class interval where the assumed mean
is. The deviation of the class interval where the assumed mean is located is 0. Add one
(+1) for each successive class interval higher than this point of origin (+1, +2, ...) and subtract one (−1) for each
successive class interval lower than the origin (−1, −2, ...).
4) Multiply the frequencies by their corresponding deviations (FD)
5) Add the products of the frequencies and deviations (ΣFD)
6) Divide the sum by the number of cases (ΣFD/N)
7) Multiply the quotient by the size of the class interval (i)
8) Add the product to the assumed mean
Formula: $\bar{X} = AM + i\left(\dfrac{\sum FD}{N}\right)$
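The two grouped-data methods should give the same mean whichever class is chosen as the origin. The sketch below checks this on a hypothetical frequency distribution; the data, the function names and the (lower limit, upper limit, frequency) layout are assumptions made for illustration, and the classes must be listed in ascending order with equal sizes.

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
classes = [(10, 14, 2), (15, 19, 5), (20, 24, 8), (25, 29, 4), (30, 34, 1)]

def mean_midpoint(classes):
    """Midpoint method: sum of (midpoint x frequency) divided by N."""
    n = sum(f for _, _, f in classes)
    sum_mf = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    return sum_mf / n

def mean_class_deviation(classes, origin_index):
    """Class deviation method: AM + i * (sum of FD / N)."""
    n = sum(f for _, _, f in classes)
    lo, hi, _ = classes[origin_index]
    assumed_mean = (lo + hi) / 2
    size = classes[1][0] - classes[0][0]  # class interval size i
    # Deviation D is 0 at the origin, +1, +2, ... above it and -1, -2, ... below it
    sum_fd = sum(f * (k - origin_index) for k, (_, _, f) in enumerate(classes))
    return assumed_mean + size * (sum_fd / n)

print(mean_midpoint(classes))            # 21.25
print(mean_class_deviation(classes, 2))  # 21.25
```

Choosing a class near the middle as the origin keeps the deviations small, which is the practical reason for the method when computing by hand.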

B. Median
1. Median of Ungrouped Data
There are several ways of computing the median of ungrouped data. The procedure
depends on the case at hand.
Case 1: The total number of cases is an odd number
Procedure:
1.) Arrange the scores from the highest to lowest or vice versa
2.) Get the middlemost score. The score is the median score
Case 2: The total number of cases is an even number
Procedure:
1.) Arrange the scores from highest to lowest or vice versa.
2.) Get the two middlemost scores
3.) Compute the average of the two middlemost scores. The average is the median score.
Case 3: The middlemost score occurs twice, thrice, or more times
Procedure:
1.) Get the middlemost score/s, the scores identical to it/them, and its/their counterparts either
above or below the middlemost score/s
2.) Compute their average. The average score is the median.
2. Median for Grouped Data
Procedure:
1.) Add up or accumulate the frequencies starting from the lowest to the highest class interval. Call
this the cumulative frequency (CF).
2.) Find one half of the number of cases in the distribution (N/2).
3.) Find the cumulative frequency that is equal to, or closest to but higher than, half of the
number of cases. The class containing this frequency is the median class.
4.) Find the lowest limit (LL) of the median class.
5.) Get the cumulative frequency of the class below the median class (CFb).
6.) Subtract this from half of the number of cases in the distribution (N/2 – CFb).
7.) Get the frequency of the median class (FMdn).
8.) Find the size of the class interval (i), then apply the formula below.
Formula:
$$ \tilde{X} = LL + i\left(\frac{\frac{N}{2} - CF_b}{F_{Mdn}}\right) $$
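The median procedures can be sketched in a few lines. For ungrouped data, Cases 1 and 2 match what Python's standard `statistics.median` does. For grouped data, the function below follows the eight steps; its name and the data are hypothetical, and it assumes LL is the exact lower boundary of the class (for example 19.5 for the class 20-24), since the text says only "lowest limit".

```python
import statistics

# Ungrouped data, Case 1 (odd N) and Case 2 (even N)
print(statistics.median([7, 3, 5]))     # 5
print(statistics.median([7, 3, 5, 9]))  # 6.0

def grouped_median(classes, i):
    """classes: (lowest_limit, frequency) pairs in ascending order; i: class size."""
    n = sum(f for _, f in classes)
    half = n / 2
    cf_below = 0  # cumulative frequency below the current class (CFb)
    for lowest_limit, f in classes:
        if cf_below + f >= half:  # first class whose CF reaches N/2: the median class
            return lowest_limit + i * ((half - cf_below) / f)
        cf_below += f
    raise ValueError("empty distribution")

# Hypothetical distribution with exact lower boundaries as LL
classes = [(9.5, 2), (14.5, 5), (19.5, 8), (24.5, 4), (29.5, 1)]
print(grouped_median(classes, 5))  # 21.375
```

If the stated lower limit (20 rather than 19.5) is preferred as LL, only the first entry of each pair changes; the rest of the computation is the same.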
C. Mode
Procedure:
1. Mode of Ungrouped Data
• Get the most frequent score.
• When there are three or more modes, the distribution is called polymodal or multimodal.
• When there is no mode, it is described as a rectangular distribution.
2. Mode for Grouped Data
a. Crude Mode – refers to the midpoint of the class interval with the highest frequency.
Procedure:
1.) Find the class interval with the highest frequency
2.) Get the midpoint of that class interval
3.) The midpoint of the class interval with the highest frequency is the crude mode
Where (in the formula for the median of grouped data):
LL = lowest limit of the median class
i = size of the class interval
N/2 = half of the number of cases
CFb = cumulative frequency below the median class
FMdn = frequency of the median class

b. Refined Mode – refers to the mode obtained from an ordered arrangement or a class
frequency distribution
Procedure:
1.) Get the mean and the median of the grouped data.
2.) Multiply the median by three (3Mdn)
3.) Multiply the mean by two (2Mn)
4.) Subtract 2Mn from 3Mdn to get the mode ($\hat{X}$)
Formula: $\hat{X} = 3Mdn - 2Mn$
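The crude and refined modes can be compared on the same data. This is a minimal sketch: the frequency distribution is hypothetical, and the mean (21.25) and median (21.375) plugged into the refined-mode formula are assumed values for that distribution, computed with the exact lower boundary as LL.

```python
# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
classes = [(10, 14, 2), (15, 19, 5), (20, 24, 8), (25, 29, 4), (30, 34, 1)]

def crude_mode(classes):
    """Midpoint of the class interval with the highest frequency."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2

def refined_mode(median, mean):
    """Refined mode from the formula in the text: 3 * Mdn - 2 * Mn."""
    return 3 * median - 2 * mean

print(crude_mode(classes))          # 22.0
print(refined_mode(21.375, 21.25))  # 21.625
```

When two classes tie for the highest frequency, `max` returns the first one it meets, so a bimodal distribution needs separate handling.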
 How will you interpret the Measures of Central Tendency?
1.) The value that represents a set of data will be the basis in determining whether the group is
performing better or poorer than the other groups.
II. Procedure in the Computation of the Measures of Variability
A. Range (R)
1. For Ungrouped Data – the difference between the highest and lowest scores
2. For Grouped Data – the difference between the upper limit of the highest class interval and
the lower limit of the lowest class interval.
B. Standard Deviation (SD)
Procedure for Ungrouped Data
1.) Find the mean ($\bar{X}$).
2.) Subtract the mean from each score to get the deviation ($d = X - \bar{X}$).
3.) Square each deviation ($d^2$).
4.) Get the sum of the squared deviations ($\sum d^2$).
5.) Divide the sum by the number of cases minus one ($\sum d^2 / (N - 1)$).
6.) Get the square root of the answer.
Formula: $SD = \sqrt{\dfrac{\sum d^2}{N-1}}$
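The six steps translate directly into code. The sketch below uses hypothetical scores and a hypothetical function name, and compares the result with `statistics.stdev` from the Python standard library, which also divides by N − 1.

```python
from math import sqrt
import statistics

def sample_sd(scores):
    """Standard deviation of ungrouped data, dividing by N - 1 as in the text."""
    n = len(scores)
    mean = sum(scores) / n                        # step 1
    sum_d2 = sum((x - mean) ** 2 for x in scores)  # steps 2 to 4
    return sqrt(sum_d2 / (n - 1))                 # steps 5 and 6

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical raw scores
print(round(sample_sd(scores), 3))
print(round(statistics.stdev(scores), 3))  # same value from the standard library
```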
Procedure for Grouped Data
A. Using the Class Deviation Method
1.) As in the computation of the mean, get the deviation (d) of each class interval and the product of its
frequency and deviation (fd).
2.) Multiply the product of the frequency and the deviation by the deviation (fd²).
3.) Get the sum of the products of the frequency and squared deviation (Σfd²).
4.) Compute the standard deviation using the formula below.
Formula:
$$ SD = I\sqrt{\frac{\sum fd^2}{N} - \frac{\left(\sum fd\right)^2}{N^2}} $$
B. Using the Midpoint Method
1.) Multiply each midpoint (M) by the corresponding product of the frequency and
midpoint (FM).
2.) Write the products of M and FM in another column and label it FM².
3.) Use the formula below to compute the standard deviation.
Formula:
$$ SD = \sqrt{\frac{\sum FM^2}{N} - \bar{X}^2} $$
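Where (in the class deviation formula):
I = size of the class interval (the same quantity written as i in the mean and median formulas)
N = number of cases
Σfd = sum of the products of frequency and deviation
Σfd² = sum of the products of frequency and squared deviation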
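Both grouped-data methods should yield the same standard deviation. The following sketch checks this on a hypothetical frequency distribution; the data and function names are assumptions, and, as in the two grouped formulas, the divisor is N rather than N − 1.

```python
from math import sqrt

# Hypothetical frequency distribution: (lower limit, upper limit, frequency)
classes = [(10, 14, 2), (15, 19, 5), (20, 24, 8), (25, 29, 4), (30, 34, 1)]

def sd_class_deviation(classes, origin_index):
    """SD = I * sqrt(sum(fd^2)/N - (sum(fd))^2 / N^2)."""
    n = sum(f for _, _, f in classes)
    size = classes[1][0] - classes[0][0]  # class interval size I
    sum_fd = sum(f * (k - origin_index) for k, (_, _, f) in enumerate(classes))
    sum_fd2 = sum(f * (k - origin_index) ** 2 for k, (_, _, f) in enumerate(classes))
    return size * sqrt(sum_fd2 / n - sum_fd ** 2 / n ** 2)

def sd_midpoint(classes):
    """SD = sqrt(sum(F * M^2)/N - mean^2), with M the class midpoint."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
    mean = sum(m * f for m, f in mids) / n
    sum_fm2 = sum(f * m * m for m, f in mids)
    return sqrt(sum_fm2 / n - mean ** 2)

print(round(sd_class_deviation(classes, 2), 3))
print(round(sd_midpoint(classes), 3))  # same value as the line above
```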

 How will you interpret the standard deviation?
1.) The results will help you determine whether the group is homogeneous or not.
2.) The results will also help you determine the number of students that fall below and above
the average performance.
Study how to do this:
• Mean – 1 SD and Mean + 1 SD give the limits of average ability
• The point right below Mean – 1 SD is the upper limit of below-average ability
• The point right above Mean + 1 SD is the lower limit of above-average ability
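These limits amount to a simple three-way classification. The sketch below is one possible reading, with a hypothetical function name and hypothetical figures; it treats scores exactly at Mean – 1 SD or Mean + 1 SD as average, which the text does not state explicitly.

```python
def ability_band(score, center, spread):
    """Classify a score against center - spread and center + spread."""
    if score < center - spread:
        return "below average"
    if score > center + spread:
        return "above average"
    return "average"  # scores on either limit are counted as average (assumption)

# Hypothetical class with mean 75 and SD 10, so the average band is 65 to 85
for score in (60, 70, 90):
    print(score, ability_band(score, center=75, spread=10))
```

The same function can be used with the median as the center and the quartile deviation as the spread.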
C. Quartile Deviation (QD)
1. Procedure in the Computation of QD for Ungrouped Data
1.) Arrange the scores in descending or ascending order.
2.) Compute the rank of Q1, i.e. ¼(N). The result tells the rank of the Q1 score in the ordered
arrangement, counting from the bottom.
3.) Look for the score in this rank.
4.) Compute the rank of Q3, i.e. ¾(N). The result tells the rank of the Q3 score.
5.) Look for the Q3 score in this rank.
6.) Compute the QD:
$$ QD = \frac{Q_3 - Q_1}{2} $$
2. Procedure in the Computation of QD for Grouped Data
1.) Compute the value of the 1st quartile:
$$ Q_1 = LL + \left(\frac{\frac{N}{4} - CF_b}{F_q}\right) i $$
2.) Compute the value of the 3rd quartile:
$$ Q_3 = LL + \left(\frac{\frac{3N}{4} - CF_b}{F_q}\right) i $$
3.) Compute the quartile deviation (half of the interquartile range):
$$ QD = \frac{Q_3 - Q_1}{2} $$
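Q1 and Q3 for grouped data use the same interpolation, with N/4 and 3N/4 as the target ranks. The sketch below is illustrative: the function name and data are hypothetical, and LL is assumed to be the exact lower boundary of each class.

```python
def grouped_quantile(classes, i, fraction):
    """Interpolated score below which `fraction` of the cases fall.

    classes: (lowest_limit, frequency) pairs in ascending order; i: class size.
    """
    n = sum(f for _, f in classes)
    target = fraction * n  # N/4 for Q1, 3N/4 for Q3
    cf_below = 0           # cumulative frequency below the current class
    for lowest_limit, f in classes:
        if f > 0 and cf_below + f >= target:
            return lowest_limit + i * ((target - cf_below) / f)
        cf_below += f
    raise ValueError("empty distribution")

# Hypothetical distribution with exact lower boundaries as LL, class size 5
classes = [(9.5, 2), (14.5, 5), (19.5, 8), (24.5, 4), (29.5, 1)]
q1 = grouped_quantile(classes, 5, 0.25)
q3 = grouped_quantile(classes, 5, 0.75)
qd = (q3 - q1) / 2
print(q1, q3, qd)  # 17.5 24.5 3.5
```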
 How will you interpret the quartile deviation?
The results will also tell whether the group is homogeneous or not. They will also tell
how many of the students fall below or above the region of acceptable
performance. To do this, study the guide below.
• Median – 1 QD and Median + 1 QD give the limits of average ability
• The point right below Median – 1 QD is the upper limit of below-average
ability
• The point right above Median + 1 QD is the lower limit of above-average ability
STANDARD SCORES
• Indicate the pupil's relative position by showing how far his or her raw score is
above or below the average
• Express the pupil's performance in terms of standard units from the mean
• Are represented by the normal probability curve, commonly called the
normal curve
• Are used to provide a common unit for comparing raw scores from different tests
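1. PERCENTILE
• tells the percentage of examinees that
lie below one's score.
Formula:
$$ P_a = LL + i\left[\frac{aN - CF_b}{F_{P_a}}\right] $$
Where:
aN – a% of the number of cases (a written as a proportion, e.g. 0.90 for the 90th percentile)
LL – lowest limit of the class containing a% of N
CFb – cumulative frequency below the class containing a% of N
FPa – frequency of the class containing a% of N
Where (in the quartile formulas for grouped data under Quartile Deviation):
Q1 – stands for the 1st quartile
LL – lowest limit of the quartile class
N/4 – one-fourth of the total number of cases
CFb – cumulative frequency below the quartile class
Fq – frequency of the class where the first quartile score falls
i – size of the class interval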
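As a worked illustration of the percentile formula, take a hypothetical distribution (not from the source) of N = 20 scores in classes of size i = 5, in which the class with lowest limit 24.5 has a frequency of 4 and a cumulative frequency of 15 lies below it. For the 90th percentile, aN = 0.90 × 20 = 18, which falls in that class because its cumulative frequency reaches 19, so:

$$ P_{90} = 24.5 + 5\left[\frac{18 - 15}{4}\right] = 24.5 + 3.75 = 28.25 $$

Under these assumed figures, 90 percent of the examinees scored below about 28.25.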

2. Z-SCORES
• tell the number of standard deviations by which a given raw score lies above or below the mean
Formula: $Z = \dfrac{X - \bar{X}}{SD}$
Note:
The z-score is negative when $X < \bar{X}$
The z-score is positive when $X > \bar{X}$
3. T-SCORES
• refer to any set of normally distributed standard scores that has a mean of
50 and a standard deviation of 10
• are computed after converting raw scores to z-scores, to get rid of negative values
Formula: $T\text{-score} = 50 + 10(Z)$
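A short sketch shows how z-scores and T-scores put raw scores from different tests on a common scale. The function names, the two tests and all figures are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations a raw score lies from the mean."""
    return (x - mean) / sd

def t_score(z):
    """Rescale a z-score to a mean of 50 and a standard deviation of 10."""
    return 50 + 10 * z

# Hypothetical pupil: raw 40 in Math (mean 30, SD 5), raw 80 in English (mean 70, SD 10)
z_math = z_score(40, 30, 5)      # 2.0
z_english = z_score(80, 70, 10)  # 1.0
print(t_score(z_math), t_score(z_english))  # 70.0 60.0
```

Although the English raw score is higher, the pupil stands further above the group in Math, which is the kind of comparison raw scores alone cannot show.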
ASSIGNING GRADES/MARKS/RATINGS
A. Marking/Grading – the process of assigning value to a performance
B. Marks/Grades/Ratings – symbols which:
Could be in –
• Percent, such as: 70%, 75%, 80%, etc.
• Letters, such as: A, B, C, D, or F
• Numbers, such as: 1, 2, 3, 4, or 5
• Descriptive expressions, such as:
Outstanding (O),
Very Satisfactory (VS),
Satisfactory (S),
Moderately Satisfactory (MS),
Needs Improvement (NI), etc.
[Note: Any symbol can be used provided that it has a uniform meaning to all concerned]
Could represent –
• How a student is performing in relation to other students (Norm-Referenced
Grading)
• The extent to which a student has mastered a particular body of knowledge
(Criterion-Referenced Grading)
• How a student is performing in relation to a teacher's judgment of his or her
potential (Grading in Relation to Teacher's Judgment)
Could be for –
• Certification, which gives assurance that a student has mastered a specific
content or achieved a certain level of accomplishment
• Selection, which provides a basis for identifying or grouping students for certain
educational paths or programs
• Direction, which provides information for diagnosis and planning
• Motivation, which emphasizes specific material or skills to be learned and
helps students to understand and improve their performance
Could be based on –
• Examination results or test data
• Observations of student work
• Group evaluation activities
• Class discussions and recitations
• Homework
• Notebooks and note taking
• Reports, themes and research papers
• Discussions and debates
• Portfolios
• Projects
• Attitudes, etc.
Could be assigned by –
• Criterion-referenced grading, or grading based on fixed or absolute
standards, where a grade is assigned based on how well a student has met the
criteria or the well-defined objectives of a course that were spelled out in
advance.
It is then up to the student to earn the grade he or she wants to
receive, regardless of how other students in the class have performed. This
is done by transmuting test scores into marks or ratings.
• Norm-referenced grading, or grading based on relative standards, where
a student's grade reflects his or her level of achievement relative to the
performance of other students in the class.
In this system the grade is assigned based on the average of test
scores. The rating scales that are used in assigning grades are:
1.) The four-point rating scale, which uses the median and quartile deviation
of the test scores to group the scores into four groups; each group is
assigned the corresponding grade of A, B, C, or D, or 1, 2, 3, or 4.
2.) The five-point rating scale, which uses the median and quartile deviation
of the test scores to group the scores into five groups; each group is assigned
the corresponding grade of A, B, C, D, or F, or 1, 2, 3, 4, or 5.
• Point or Percentage Grading System, whereby the teacher assigns
points or percentages to various tests and class activities depending on
their importance. The total of these points will be the basis for the grade
assigned to the student.
• Contract Grading System, where each student agrees to work for a
particular grade according to agreed-upon standards.
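The point or percentage grading system is essentially a weighted sum. The sketch below shows one way it might be computed; the components, weights and scores are hypothetical and are not a recommended weighting.

```python
def weighted_grade(scores, weights):
    """Combine component scores (in percent) using weights that sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[name] * weight for name, weight in weights.items())

# Hypothetical weights reflecting the importance the teacher gives each component
weights = {"exams": 0.5, "projects": 0.3, "homework": 0.2}
scores = {"exams": 80, "projects": 90, "homework": 70}
print(round(weighted_grade(scores, weights), 2))  # 81.0
```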
 Guidelines in Grading Students
1.) Explain your grading system to the students early in the course and remind them
of the grading policies regularly.
2.) Base grades on a predetermined and reasonable set of standards.
3.) Base your grades on as much objective evidence as possible.
4.) Base grades on the student's attitude as well as achievement, especially at the
elementary and high school level.
5.) Base grades on the student's relative standing compared to classmates.
6.) Base grades on a variety of sources.
7.) As a rule, do not change grades.
8.) Become familiar with the grading policy of your school and with your colleagues'
standards.
9.) When failing a student, closely follow school procedures.
10.) Record grades on report cards and cumulative records.
11.) Guard against bias in grading.
12.) Keep pupils informed of their standing in the class.

