Daniel Tefera, PhD
School of Psychology
CELS, AAU
There is little doubt among
educational practitioners
about the special value of
assessment as a basic
condition for effective
learning.
 There are FOUR steps in test
development:
◦ Stating instructional objectives
◦ Developing test specification
◦ Selecting item format
◦ Preparing relevant test items
 Educational objectives can be defined
as statements that describe the end
results of learning
◦ goals and purposes that it is hoped will be
realized through the process of education
◦ desired behavioral changes that educators
want to bring about in their students.
 An acceptable instructional objective
has two components:
◦ a content element, and
◦ a behavioral element
 The Cognitive Process Dimension
◦ Remember
◦ Understand
◦ Apply
◦ Analyze
◦ Evaluate
◦ Create
 A classroom test becomes a valid
measure of the instructional objectives
and course content if it is developed
based on a table of specifications (TOS)
 TOS enables the test constructor to obtain
a representative sample of pupil
behavior in each of the areas to be
measured
A test specification is a detailed description of a test, often
called a blueprint, that specifies the
number or proportion of items that
assess each content and process/skill
area
 Developing Test Blue-Print
◦ Obtaining a comprehensive list of
instructional objectives
◦ Outlining the course content
 A list of major topics to be covered or
 A more detailed list of topics and subtopics
Content Areas by Objectives/Behavioral Dimensions (number of items)

| Content Areas | Defines, recalls, lists … | Comprehends, explains, interprets … | Demonstrates, applies, prepares … | Illustrates, separates, analyzes … | Total |
|---|---|---|---|---|---|
| Science and the scientific approach | 2 | 1 | - | - | 3 |
| Science & common sense | 1 | 1 | - | - | 2 |
| Methods of knowing | 1 | 2 | - | - | 3 |
| The two broad views of science | - | 2 | - | - | 2 |
| The aims of science, scientific explanation and theory | 1 | 1 | 1 | - | 3 |
| Goals of scientific research | 1 | 1 | - | - | 2 |
| Classification of research | 1 | 2 | 2 | 1 | 6 |
| Types of research | 1 | 2 | 3 | 3 | 9 |
| The research proposal | - | 1 | 2 | 3 | 6 |
| Review of related literature | - | 1 | 2 | 1 | 4 |
| Total | 8 | 14 | 10 | 8 | 40 |
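The blueprint above can also be held as a small data structure so that its row and column totals are checked mechanically rather than by hand. The sketch below is only an illustration: the cell counts come from the table, while the short dimension names and all function names are assumptions introduced here and are not part of the slides.

```python
# Shorthand labels for the four behavioral dimensions of the blueprint.
# These names are an assumption made for this sketch; the slides label the
# columns with verbs (defines/recalls, comprehends/explains, ...).
DIMENSIONS = ["recall", "comprehension", "application", "analysis"]

# Number of items per content area and dimension ("-" in the table is 0).
BLUEPRINT = {
    "Science and the scientific approach": [2, 1, 0, 0],
    "Science & common sense": [1, 1, 0, 0],
    "Methods of knowing": [1, 2, 0, 0],
    "The two broad views of science": [0, 2, 0, 0],
    "The aims of science, scientific explanation and theory": [1, 1, 1, 0],
    "Goals of scientific research": [1, 1, 0, 0],
    "Classification of research": [1, 2, 2, 1],
    "Types of research": [1, 2, 3, 3],
    "The research proposal": [0, 1, 2, 3],
    "Review of related literature": [0, 1, 2, 1],
}


def row_totals(blueprint):
    """Items allotted to each content area."""
    return {topic: sum(counts) for topic, counts in blueprint.items()}


def column_totals(blueprint):
    """Items allotted to each behavioral dimension, in column order."""
    return [sum(column) for column in zip(*blueprint.values())]


def grand_total(blueprint):
    """Total number of items on the test."""
    return sum(sum(counts) for counts in blueprint.values())


def topic_proportions(blueprint):
    """Share of the whole test given to each content area."""
    total = grand_total(blueprint)
    return {topic: sum(counts) / total for topic, counts in blueprint.items()}


if __name__ == "__main__":
    print(dict(zip(DIMENSIONS, column_totals(BLUEPRINT))))
    print(grand_total(BLUEPRINT))
```

Comparing `topic_proportions` with the share of teaching time given to each topic is one way to judge whether the sample of items is representative.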
 The decision about which type of item format
to use will depend on:
◦ the cognitive process to be measured,
◦ the strengths and weaknesses of each item type for
the process and content to be measured, and
◦ the degree of precision needed in the test results
 Other practical factors
◦ The number of students taking the test
◦ The skill of the test constructor
◦ Time available for constructing and scoring the test
 Matching items to intended outcomes
 Obtaining a representative sample
◦ Carefully developed test plan needed
◦ The length of a test is also an important
factor in obtaining a representative
sample. Test length depends on such
factors as the
 purpose of testing
 type of item format used
 age and educational level of the pupils,
 the amount of computation or quantitative thinking
required by the item, &
 level of reliability needed for effective test use.
 Selecting proper item difficulty.
◦ The difficulty of the items to be included
in a classroom test depends largely on
whether the test is being designed:
 to describe the specific learning tasks
pupils can perform (i.e., criterion-
referenced test) or
 to rank the pupils in order of their
achievement (i.e. norm-referenced test).
 Eliminating irrelevant barriers to
the answer.
◦ Some common barriers
 Ambiguous statements
 Excessive wordiness
 Difficult vocabulary
 Complex sentence structure
 Unclear instructions
 Preventing unintended clues to the
answer.
◦ Some common clues
 Grammatical inconsistencies
 Verbal association
 Specific determiners
 Phrasing of correct responses
 Length of correct response
 Location of correct responses
 General Principles in Preparing Paper
and Pencil Tests
◦ Make the instructions for each type of question
simple and brief
◦ Use and continually refer to the table of
specifications while you are writing test items.
◦ Write items that require specific understanding
or ability developed in that course, not just
general intelligence or test taking skills.
◦ Do not suggest the answer to one question in the
body of another question
◦ Be sure that each item has a correct or best
answer on which experts in the field would agree
◦ Avoid gender, religious, and other bias in stating the
test items
◦ Do not write questions in the negative. If you must
use negatives, highlight them, as they may
mislead students into answering incorrectly.
 Suggestions for Writing Better True-False Items
◦ Avoid using specific determiners such as sometimes,
usually, all, always, none, under certain conditions,
may be, never, might, only, etc.
 Poor: T F All of the lakes in Ethiopia were formed by
volcanic action.
 Good: T F The lakes in the rift valley were formed by
volcanic action.
◦ Avoid the use of negative statements, and
particularly double negatives.
 Poor: T F Tuberculosis is not a non-communicable
disease.
 Good: T F Tuberculosis is a communicable disease.
◦ Do not take statements directly out of textbooks and use them
as true-false items; write each statement
in your own words
◦ Do not make the true statements consistently
longer than the false statements, or vice versa.
◦ Avoid the use of more than one idea in an item
unless it is a cause-effect item. If it is a cause-
effect item, it should be stated so that students will
react to the effect and not the cause.
 Poor: T F Bleeding of the gum is associated with
gingivitis, which can be cured by the sufferer by daily
brushing of teeth.
 Good: T F Daily brushing of the teeth will cure gingivitis.
◦ The crucial part of a true-false item should be placed at
the end of the item
 Poor: T F The economic situation of the southern states was
a major cause of the Civil War.
 Good: T F A major cause of the Civil War was the economic
situation of the southern states.
◦ When using opinion, the source should be identified unless
the ability to distinguish fact from opinion is being
measured
 Poor: T F According to most botanists, Mendel is
considered to be the greatest botanist.
 Good: T F According to Brett Moulding, Mendel is
considered to be the greatest botanist.
◦ Keep true and false statements at approximately the
same length, and be sure that there are approximately
equal numbers of true and false items
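Some of the true-false suggestions above are mechanical enough to be screened automatically before a human review. The following is a minimal sketch under stated assumptions: the determiner list is copied from the slide, while the function names and the idea of storing the answer key as a list of booleans are hypothetical choices made for illustration.

```python
import re

# Specific determiners named in the suggestions above.
SPECIFIC_DETERMINERS = [
    "sometimes", "usually", "all", "always", "none",
    "under certain conditions", "may be", "never", "might", "only",
]


def find_specific_determiners(statement):
    """Return the determiners that occur as whole words in a statement."""
    lowered = statement.lower()
    return [
        word for word in SPECIFIC_DETERMINERS
        if re.search(r"\b" + re.escape(word) + r"\b", lowered)
    ]


def true_false_counts(answer_key):
    """Return (number of true items, number of false items).

    `answer_key` is a hypothetical list of booleans, one per item.
    """
    n_true = sum(1 for answer in answer_key if answer)
    return n_true, len(answer_key) - n_true


if __name__ == "__main__":
    poor = "All of the lakes in Ethiopia were formed by volcanic action."
    good = "The lakes in the rift valley were formed by volcanic action."
    print(find_specific_determiners(poor))
    print(find_specific_determiners(good))
```

A flagged word is only a prompt to reread the item; whether the determiner actually gives the answer away still needs the item writer's judgment.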
Instruction: Match A and B
        A                         B
____1. Abebe Bikila          A. Ethiopian philanthropist
____2. Girma W/Giorgis       B. Author of ‘Fikir Eske Mekabir’
____3. Abebech Gobena        C. Ex-president of Ethiopia
____4. Mahmud Ahmed          D. Ethiopian singer
____5. Haddis Alemayehu      E. Won at Rome Olympics
Problems inherent in matching items
Lack of homogeneity
Poor instruction
Wrong order of lists
Equal number of premises and responses
Easy guessing
 Suggestions for Writing Better Matching Items
◦ Keep the set of statements in a single matching
exercise homogeneous
◦ Explain completely the intended basis for
matching
◦ All responses should function as plausible options
for each premise
◦ Matching exercises should be completed on a
single page. Splitting the exercise is
 confusing
 distracting
 time consuming for the student
◦ Keep the number of stimuli (premises) and responses
unequal. Avoid “perfect matching”.
◦ Use short lists of responses and premises. The reasons
are:
 longer lists make maintaining homogeneity difficult,
 longer matching exercises overload a test with one kind of
behavior, and
 longer lists require too much examinee “searching time”
◦ Keep statements in the response column short and list
them in some logical order
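Two of the matching-item suggestions above (unequal list sizes and short lists) can be checked by counting. The sketch below assumes a hypothetical cut-off of 10 entries for a "short" list, since the slides give no number; the function name is likewise invented for illustration.

```python
def check_matching_exercise(premises, responses, max_list_length=10):
    """Return a list of warnings for one matching exercise.

    `max_list_length` is an assumed threshold, not a value from the slides.
    """
    problems = []
    if len(premises) == len(responses):
        problems.append("perfect matching: premises and responses are equal in number")
    if len(premises) > max_list_length or len(responses) > max_list_length:
        problems.append("list too long: consider splitting the exercise")
    return problems


if __name__ == "__main__":
    premises = ["Abebe Bikila", "Girma W/Giorgis", "Abebech Gobena",
                "Mahmud Ahmed", "Haddis Alemayehu"]
    responses = ["Ethiopian philanthropist", "Author of 'Fikir Eske Mekabir'",
                 "Ex-president of Ethiopia", "Ethiopian singer",
                 "Won at Rome Olympics"]
    print(check_matching_exercise(premises, responses))
```

Homogeneity and a clear basis for matching cannot be counted this way and still need to be judged by reading the exercise.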
 Multiple-choice items are unique among
objective test items because they enable you
to measure behavior at the higher levels of the
taxonomy of educational objectives
 Problems inherent in multiple choice items
◦ Grammatical clue
◦ Stem clue
◦ Opinionated items
◦ The stem fails to present a problem
◦ An item has more than one defensible answer
◦ The length of options
◦ The use of “all of the above” and “none of the
above”
 Suggestions for Constructing Multiple-Choice
Item Format
◦ The stem should pose a clear question or problem
and should contain as much of the item as possible.
Be sure the stem of the item is meaningful without the
alternatives.
 Poor: Evolution _________.
 Better: Who developed the theory of evolution?
◦ The item stem should include as much of the item as
possible but should be free from irrelevant material.
 Poor: Most of South America was settled by colonists
from Spain. How would you account for the large
number of Spanish colonists settling there?
 Better: Spanish colonists settled most of South America in
search of _______.
◦ Use a negatively stated item stem only
when significant learning outcomes require
it.
 Although negatively stated items are
generally to be avoided, there are
occasions where they are useful
 These are mainly areas where the wrong
information, or wrong procedure, can have
dire consequences.
 Poor: Which one of the following is not a safe driving
practice on icy roads ____.
 Better: Which one of the following is NOT a safe driving
practice on icy roads ____.
◦ All of the alternatives should be
grammatically consistent with the stem of the
item, so that grammar gives no clue to the answer
 Poor: An electric transformer can be used _________.
A. for storing up electricity
B. to increase the voltage of alternating current
C. it converts electrical energy into mechanical energy
D. alternating current is changed to direct current
 Better: An electric transformer can be used to _______.
A. store up electricity
B. increase the voltage of alternating current
C. convert electrical energy into mechanical energy
D. change alternating current to direct current
◦ An item should contain only one correct or clearly
best answer
Poor: 2 x 2 = __ (options A and C are both correct)
A. 4    C. 6 - 2
B. 6    D. 5
◦ Items used to measure understanding or ability to
apply principles should
 Use pictorial, graphical, or tabular stimuli
 Use analogies that demonstrate relationships among
terms
 Require the application of previously learned principles
or procedures to novel situations
◦ All distracters should be plausible. The purpose of a
distracter is to distract the uninformed away from the
correct answer. One factor contributing to the
plausibility of distracters is their homogeneity.
 Poor: Who discovered the North Pole?
 A. Christopher Columbus    C. Robert Peary
 B. Ferdinand Magellan      D. Marco Polo
 Better: Who discovered the North Pole?
 A. Roald Amundsen          C. Robert Peary
 B. Richard Byrd            D. Robert Scott
◦ Be sure no unintentional clues to the
correct answer are given
 Clang Association
 Poor: The function of the platelets in the blood
is to help in ________.
A. carrying oxygen to the cells    C. clotting of the blood
B. carrying food to the cells      D. fighting disease
 Better: Which of the following structures in the blood
helps in forming blood clots?
 A. Red blood cells    C. Platelets
 B. Lymphocytes        D. Monocytes
 Length Clues
 Poor: The term side effect of a drug refers
to________.
A. additional benefits from the drug
B. the chain effect of drug action
C. the influence of drugs on crime
D. any action of the drug in the body other than the one
the doctor wanted the drug to have
 Better: Which one of the following, if it occurred, would be a
side effect of aspirin for a man who had been taking two
aspirin tablets every 3 hours for a heavy cold and slight
fever?
A. Normal body temperature
B. Reduction in frequency of coughing
C. Easier breathing
D. Ringing in the ears
 Grammatical Inconsistency
 Poor: U.S. Grant is an _______.
A. alcoholic B. trader C. pirate D. musician
 Better: U.S. Grant is a/an _______.
A. alcoholic B. trader C. pirate D. musician
◦ Use the option “none of these” or “none of the
above” only when the keyed answer can be
classified unequivocally as correct or incorrect
◦ Avoid the use of “all of the above” in the multiple-
choice item
◦ Each alternative position should hold the correct answer
approximately an equal number of times, but the
positions should be arranged in random order
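The last suggestion, together with the earlier point about length clues, lends itself to a quick automated screen. The sketch below is illustrative only: the function names, the dictionary layout for options, and the 1.5 length ratio are assumptions made here, and the keyed answers for the two drug items are assumed to be D because the slides do not mark them.

```python
from collections import Counter


def key_position_counts(answer_key):
    """Count how often each option position holds the keyed answer."""
    return Counter(answer_key)


def key_is_conspicuously_long(options, key, ratio=1.5):
    """Flag an item whose keyed option is much longer than every distracter.

    `ratio` is an assumed threshold, not a value from the slides.
    """
    key_length = len(options[key])
    longest_distracter = max(
        len(text) for label, text in options.items() if label != key
    )
    return key_length >= ratio * longest_distracter


if __name__ == "__main__":
    poor_item = {
        "A": "additional benefits from the drug",
        "B": "the chain effect of drug action",
        "C": "the influence of drugs on crime",
        "D": "any action of the drug in the body other than the one "
             "the doctor wanted the drug to have",
    }
    better_item = {
        "A": "Normal body temperature",
        "B": "Reduction in frequency of coughing",
        "C": "Easier breathing",
        "D": "Ringing in the ears",
    }
    print(key_is_conspicuously_long(poor_item, "D"))
    print(key_is_conspicuously_long(better_item, "D"))
    print(key_position_counts(["A", "C", "B", "D", "A", "B", "C", "D"]))
```

Roughly equal counts from `key_position_counts` suggest that no position is favored; the order of keyed positions within the test should still be random.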
 Suggestions for Constructing Supply Items
◦ Require short, definite, clear-cut, & explicit
answers.
 E.g., Haddis Alemayehu wrote a book entitled
___.
◦ Specify and announce in advance whether
scoring will take spelling into account
◦ The blanks should be at or near the end of the
statement, so that the response logically
follows the stimulus
◦ There should preferably be only one blank
per test item; if there is more than one, the blanks should be
for a related series
 E.g., Poor: __observed great diversity in __in the __.
 Good: Who was given credit for the early
development of the theory of evolution?
◦ Each blank in all items should be the same
length
◦ When writing the item, do not include any
specific determiners (clues).
 E.g., Poor: When an animal eats plants, it is said to be
an ______________.
 Better: When an animal eats plants, it is said to be
a/an ____________.
◦ When writing an item, do not take a statement
directly from a textbook or from the lecture,
but write the item so as to test understanding
rather than rote memory
◦ In testing for comprehension of terms and
knowledge of definitions, it is often better to
supply the term and require a definition than to
provide a definition and require the term
 E.g., Poor: What is the general measurement
term describing the consistency with which
items in a test measure the same thing?
 Good: Define “internal consistency reliability.”
Classroom Tests (By Dr. Daniel T.)_2.ppt


Editor's Notes

  • #7 the format of items, responses, and the desired psychometric properties of the items and test such as: the distribution of item difficulty and discrimination indices.