2. What is testing?
• “A test is a sample of behavior, products, answers, or
performances from a particular domain” (Carrington, 1994)
“A test will predict performance levels, and the learner
will somehow reconstruct its parts in meaningful
situations when necessary” (McCann, 2000)
“Testing is generally concerned with turning
performance into numbers.” (Baxter, 1997)
Test [scores] can give parents useful information about
their children. Retrieved May 26, 2016 from http://PAREonline.net/getvn.asp?v=1&n=1
“A test is an instrument or systematic procedure for
measuring a sample of behavior” (Gronlund and Linn 1990)
3. Why Are Students Tested?
• Tests help teachers, principals, and
administrators:
– evaluate and improve the school district
– evaluate and improve the individual school
– identify a child's academic strengths
– identify areas where a child may need to improve
• REMEMBER: Children are never measured
on the basis of ONE test alone.
Retrieved May 26, 2016 from http://PAREonline.net/getvn.asp?v=1&n=1
4. Why pay attention to test construction?
• Teachers who were trained in test construction and
analysis prepared tests that were more valid and reliable
(Magno, 2003)
• Studies have suggested that faulty test items affect
students’ comprehension and ability to provide accurate
answers to the items (Koksal, 2004; Leighton and
Gokiert, 2005)
5. Of the students who got low grades in exams, 13% did so
because of faulty test questions (WORLDWATCH: The Philadelphia
Trumpet, August 2005)
6. • Poorly designed test items can lead to inaccurate
measurements of learning and provide false information
regarding student performance as well as instructional
effectiveness (Education Up Close, 2005).
7. • Test length also affects test quality.
According to Wells and Wollack (2003), longer tests
produce higher reliabilities and validities.
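One standard way to estimate how length changes reliability is the Spearman-Brown prophecy formula. It is a general psychometric result and is not taken from Wells and Wollack (2003). The sketch below assumes the added items are comparable in quality to the existing ones; the function name and the sample values are illustrative only.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability when a test is lengthened by `length_factor`.

    Assumes the added items are parallel to (as good as) the existing ones.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a test with reliability 0.60 is doubled in length.
predicted = spearman_brown(0.60, 2)
print(round(predicted, 2))  # 0.75
```

The gain shrinks as the test grows, so adding items helps most when the original test is short.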
8. • Any item answered correctly or incorrectly because of
extraneous factors in the item results in misleading
feedback to both examinee and examiner (Frey, 2007).
9. Characteristics of Good Tests
• Validity
– refers to the accuracy of an assessment
• Reliability
– the consistency with which a test measures
what it is supposed to measure
• Usability
– the test can be administered with ease,
clarity and uniformity
Retrieved May 26, 2016 from http://fcit.usf.edu/assessment/basic/basicc.html
10. Characteristics of Good Tests
• Scorability
– easy to score
• Interpretability
– test results can be properly interpreted and
are a major basis for making sound
educational decisions
• Economy
– the test can be reused without compromising
the validity and reliability
13. Table of Specifications (TOS)
• A two-way chart that relates the learning
outcomes to the course content
• It enables the teacher to prepare a test
containing a representative sample of
student behavior in each of the areas
tested.
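A two-way chart of this kind can be sketched in a few lines. Everything in the example is a hypothetical placeholder: the content areas, the learning outcomes, the weights, and the rule of allocating items in proportion to those weights are assumptions for illustration, not part of the source slides.

```python
# All topics, outcome levels, and weights below are hypothetical placeholders.
TOTAL_ITEMS = 40

# Share of the test given to each content area (assumed, e.g. from teaching time).
content_weights = {"Fractions": 0.5, "Decimals": 0.3, "Percent": 0.2}

# Share of each content area's items given to each learning outcome (assumed).
outcome_weights = {"Knowledge": 0.5, "Comprehension": 0.25, "Application": 0.25}


def build_tos(total_items, content_weights, outcome_weights):
    """Return {content: {outcome: item_count}} for a two-way chart."""
    table = {}
    for content, c_weight in content_weights.items():
        table[content] = {
            outcome: round(total_items * c_weight * o_weight)
            for outcome, o_weight in outcome_weights.items()
        }
    return table


tos = build_tos(TOTAL_ITEMS, content_weights, outcome_weights)
for content, row in tos.items():
    print(content, row, "row total:", sum(row.values()))
```

With other weights, rounding may leave the cell counts one or two items off the intended total, so the chart should be checked and adjusted by hand.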
15. Teachers who used the table of specifications to design test
items generated tests with higher validity and reliability than
those who did not (Linn and Gronlund, 1995).
16. Commonly Used Test Formats
• Multiple Choice
• True or False
• Matching Type
• Fill in the blanks (Sentence Completion)
• Essay
18. Rules for Writing Multiple-Choice Items
• When checking the stems for correctness:
– Ensure that the stem asks a clear question.
– Ensure that the reading level is appropriate for the students.
– Ensure that the stem is grammatically correct.
– Avoid negatively stated stems.
• When using incomplete statements, place the blank space at the
end.
• All options should be homogeneous and nearly equal in length.
• Stem (question) should contain only one main idea.
• Keep all options either singular or plural.
• Have four or five responses per stem (question).
(Gronlund, N., Assessment of Student Achievement, 7th ed., Pearson Education, Inc., Boston, 2003)
19. PARTS
• Stem
– is the section of a multiple-choice item that poses the
problem that the students must answer.
– Stems can be in the form of a question or an
incomplete sentence.
– Poorly written stems fail to state the problem clearly
when they are vague, full of irrelevant data, or
negatively written.
http://www.duq.edu/about/centers-and-institutes/center-for-teaching-excellence/teaching-and-learning/multiple-choice-exam-construction
20. PARTS
• Alternatives
– consist of the answer and distractors that are
inferior or incorrect.
– Common mistakes in writing exam alternatives have
to do with how the various alternatives relate.
– They should be:
• mutually exclusive
• homogeneous
• plausible and consistently phrased
http://www.duq.edu/about/centers-and-institutes/center-for-teaching-excellence/teaching-and-learning/multiple-choice-exam-construction
21. Poorly Written Stems
Avoid vague stems by stating the problem in
the stem:
Poor Example
California:
a. Contains the tallest mountain in the United States.
b. Has an eagle on its state flag.
c. Is the second largest state in terms of area.
*d. Was the location of the Gold Rush of 1849.
Good Example
What is the main reason so many people moved to California in 1849?
a. California land was fertile, plentiful, and inexpensive.
*b. Gold was discovered in central California.
c. The east was preparing for a civil war.
d. They wanted to establish religious settlements.
22. Avoid wordy stems by removing irrelevant data:
Poor Example
Suppose you are a mathematics professor who wants to determine whether or
not your teaching of a unit on probability has had a significant effect on your
students. You decide to analyze their scores from a test they took before the
instruction and their scores from another exam taken after the instruction.
Which of the following t-tests is appropriate to use in this situation?
*a. Dependent samples.
b. Heterogeneous samples.
c. Homogeneous samples.
d. Independent samples.
Good Example
When analyzing your students’ pretest and posttest scores to determine if your
teaching has had a significant effect, an appropriate statistic to use is the t-test
for:
*a. Dependent samples.
b. Heterogeneous samples.
c. Homogeneous samples.
d. Independent samples.
23. Poorly Written Alternatives
Avoid Overlapping Alternatives
Poor Example
What is the average effective radiation dose from chest CT?
a. 1-8 mSv
b. 8-16 mSv
c. 16-24 mSv
d. 24-32 mSv
Good Example
What is the average effective radiation dose from chest CT?
a. 1-7 mSv
b. 8-15 mSv
c. 16-24 mSv
d. 25-32 mSv
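Overlap between numeric alternatives is easy to miss by eye, so it can help to check it mechanically. The sketch below assumes each alternative is an inclusive whole-number range; the function name and the second, non-overlapping set of ranges are illustrative.

```python
def find_overlaps(options):
    """Return neighbouring option labels whose inclusive ranges share a value.

    `options` maps a label to an inclusive (low, high) range. Options are
    sorted by lower bound and each one is compared with the next one.
    """
    labels = sorted(options, key=lambda label: options[label][0])
    overlaps = []
    for first, second in zip(labels, labels[1:]):
        if options[second][0] <= options[first][1]:
            overlaps.append((first, second))
    return overlaps


# The ranges from the poor example: 8, 16, and 24 each fall in two options.
poor = {"a": (1, 8), "b": (8, 16), "c": (16, 24), "d": (24, 32)}
print(find_overlaps(poor))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]

# A hypothetical set in which every value belongs to exactly one option.
exclusive = {"a": (1, 7), "b": (8, 15), "c": (16, 24), "d": (25, 32)}
print(find_overlaps(exclusive))  # []
```

An empty result means no neighbouring ranges share a boundary value, which is what mutually exclusive alternatives require.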
24. Avoid Dissimilar Alternatives
Poor Example
Idaho is widely known as:
*a. The largest producer of potatoes in the United States.
b. The location of the tallest mountain in the United States.
c. The state with a beaver on its flag.
d. The “Treasure State.”
Good Example
Idaho is widely known for its:
a. Apples.
b. Corn.
*c. Potatoes.
d. Wheat.
25. Avoid implausible alternatives
Poor Example
Which of the following artists is known for painting the ceiling of the Sistine
Chapel?
a. Warhol.
b. Flintstone.
*c. Michelangelo.
d. Santa Claus.
Good Example
Which of the following artists is known for painting the ceiling of the Sistine
Chapel?
a. Botticelli.
b. da Vinci.
*c. Michelangelo.
d. Raphael.
27. True or False
•Each statement is clearly true or clearly false.
•Trivial details should not make a statement false.
•Statements are written concisely without more
elaboration than necessary.
•Statements are NOT quoted exactly from text.
28. •Emphasize quantitative terms over
qualitative terms.
•Avoid specific determiners, which usually
give a clue to the answer.
•False = all, always, never, every, none, only
•True = generally, sometimes, usually, maybe,
often
•Avoid negative statements.
•Whenever a controversial statement is used, the
authority should be quoted.
•Avoid a discernible pattern in the answers.
29. Express the item statement as simply
and as clearly as possible.
Undesirable:
• When you see a highway
with a marker that reads,
“Interstate 80” you know
that the construction and
upkeep of that road is
maintained by the state
and federal government
Desirable:
• The construction and
maintenance of interstate
highways are provided by
both state and federal
governments.
30. Express a single idea in each
test item
Undesirable:
• Water will boil at a higher
temperature if the
atmospheric pressure on
its surface is increased
and more heat is applied
to the container.
Desirable:
• Water will boil at a higher
temperature if the
atmospheric pressure on
its surface is increased.
31. Avoid the use of extreme
modifiers or qualifiers.
Undesirable:
• All sessions of Congress
are called by the
President. (F)
• The Supreme Court
frequently rules on the
constitutionality of law.
(T)
• An objective test is
generally easier to score
than an essay test. (T)
Desirable:
• The sum of the angles of
a triangle is always 180°.
(T)
• The galvanometer is the
instrument usually used
for the metering of
electrical energy used in
a home. (F)
32. Extreme Modifiers:
• all
• none
• always
• never
• only
• nobody
• invariably
• no one
• best
• absolutely
• worst
• absolutely not
• everybody
• certainly
• everyone
• certainly not
Qualifiers:
• usually
• frequently
• often
• sometimes
• some
• many
• much
• probably
• a majority
• apt to
• most
• might
• a few
• unlikely
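The two word lists above can be used to screen draft true-false statements for words that may cue the answer. This is a minimal sketch, assuming simple whole-word matching is enough; the function name is illustrative, and a flagged word only marks a statement for review, since some uses are legitimate.

```python
import re

# Word lists copied from the slide above.
EXTREME_MODIFIERS = [
    "all", "none", "always", "never", "only", "nobody", "invariably",
    "no one", "best", "absolutely", "worst", "absolutely not",
    "everybody", "certainly", "everyone", "certainly not",
]
QUALIFIERS = [
    "usually", "frequently", "often", "sometimes", "some", "many", "much",
    "probably", "a majority", "apt to", "most", "might", "a few", "unlikely",
]


def find_terms(statement, terms):
    """Return the listed terms that occur as whole words or phrases."""
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, statement, flags=re.IGNORECASE):
            found.append(term)
    return found


print(find_terms("All sessions of Congress are called by the President.",
                 EXTREME_MODIFIERS))  # ['all']
print(find_terms("The Supreme Court frequently rules on the "
                 "constitutionality of law.", QUALIFIERS))  # ['frequently']
```

Whole-word matching keeps "called" from being flagged for containing "all".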
33. Avoid lifting statements from the text, lecture or
other materials so that memory alone will not
permit a correct answer.
Undesirable:
• For every action there is
an opposite and equal
reaction.
Desirable:
• If you were to stand in a
canoe and throw a life
jacket forward to another
canoe, chances are your
canoe would jerk
backward.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
34. Avoid the use of unfamiliar
vocabulary
Undesirable:
• According to some
politicians, the raison
d’etre for capital
punishment is retribution
Desirable:
• According to some
politicians, justification for
capital punishment is
retribution.
Writing Hint… One method for developing true-false items is to
write a set of true statements that cover the content, then convert
approximately half of them to false statements. Remember:
When changing items to false (as well as in writing the true
statements initially), state the items positively, avoiding negatives
or double negatives.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
36. Directions: "Place the letter of the term in the right hand
column on the line to the left of the definition column."
Circle the letter(s) that describe the best way to revise
these directions:
A. Add: “Match the following”
B. Add: “Each term may not be used more than once”
C. Change the order of the directions provided
D. No changes needed
Problem: Faulty directions.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
42. Omit only significant words from the statement.
Do not omit so many words from the statement that the
intended meaning is lost.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
43. Be sure there is only one correct response.
If possible, put the blank at the end of a statement
rather than at the beginning.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
45. Formulate the question so that the task is clearly
defined for the student.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
46. Choose a scoring model.
• The major task in scoring essay tests is to maintain
consistency, to make sure that answers of equal quality
are given the same number of points. There are two
approaches to scoring essay items: (1) analytic or point
method and (2) holistic or rating method.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
47. Analytic
Before scoring, prepare an ideal answer in which the
major components are defined and assigned point
values.
Read and compare the student’s answer with the
model answer. If all the necessary elements are
present, the student receives the maximum number
of points.
Partial credit is given based on the elements
included in the answer. In order to arrive at the
overall exam score, the teacher adds the points
earned on the separate questions.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
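The analytic method can be sketched as a lookup against the model answer. The elements, their point values, and the function name below are hypothetical; deciding whether an element is actually present in a student's answer is still the rater's judgment.

```python
# Hypothetical model answer: each required element and its point value.
MODEL_ANSWER = {
    "states the thesis": 2,
    "gives supporting evidence": 5,
    "draws a conclusion": 3,
}


def analytic_score(elements_present, model_answer=MODEL_ANSWER):
    """Sum the points for the model-answer elements the rater found."""
    unknown = set(elements_present) - set(model_answer)
    if unknown:
        raise ValueError(f"not in the model answer: {sorted(unknown)}")
    # Use a set so an element listed twice is only counted once.
    return sum(model_answer[element] for element in set(elements_present))


full = analytic_score(MODEL_ANSWER.keys())
partial = analytic_score(["states the thesis", "draws a conclusion"])
print(full, partial)  # 10 5
```

Because every answer is checked against the same list of elements, answers of equal quality receive the same number of points, and the overall exam score is the sum of the question scores.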
48. Holistic:
This method involves considering the student’s answer
as a whole and judging the total quality of the answer
relative to other student responses or the total quality
of the answer based on certain criteria that you
develop.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
49. Prepare students to take essay
exams
• Essay tests are valid measures of student achievement
only if students know how to take them. Many college
freshmen do not know how to take an essay exam,
because they have not been required to learn this skill in
high school.
• Take some class time to tell students how to prepare for
and how to take an essay exam. Use old exam
questions and let students see what an "A" answer looks
like and how it differs from a "C" answer.
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
50. REFERENCES
This presentation is patterned after the PowerPoint presentation of:
Arnel O. Rivera. Faculty Member. BNHS-Villa Maria.CAS, LPU-Cavite
http://pareonline.net/genpare.asp?cx=partner-pub-8146434030680546%3A8994471369&cof=FORID%3A9&ie=UTF-8&wh=5&q=validity+and+reliability
http://www.cte.cornell.edu/documents/Test%20Construction%20Manual.pdf
http://arc.duke.edu/documents/The%20difference%20between%20assessment%20and%20evaluation.pdf
http://www.edudemic.com/summative-and-formative-assessments/
http://pareonline.net/getvn.asp?v=5&n=2
http://www.duq.edu/about/centers-and-institutes/center-for-teaching-excellence/teaching-and-learning/multiple-choice-exam-construction
http://www.k-state.edu/ksde/alp/resources/Handout-Module6.pdf
Editor's Notes
Example: VALIDITY: Let’s imagine a bathroom scale that consistently tells you that you weigh 130 pounds. The reliability (consistency) of this scale is very good, but it is not accurate (valid) because you actually weigh 145 pounds (perhaps you re-set the scale in a weak moment)!
Example: RELIABILITY: If you weigh five pounds of potatoes in the morning, and the scale is reliable, the same scale should register five pounds for the potatoes an hour later (unless, of course, you peeled and cooked them)