TEST DEVELOPMENT
STEPS IN TEST
DEVELOPMENT
• Test Conceptualization
• Test Construction
• Test Tryout
• Item Analysis
• Test Revision
STEP 1: TEST
CONCEPTUALIZATION
• The process can often be traced to a thought such as
“There ought to be a test designed to measure _____ in
such and such a way”
• An emerging phenomenon or pattern of behavior might
serve as the stimulus for test conceptualization
• Pilot Work: the generalized term for preliminary research
surrounding the creation of the test prototype
• Items must be subjected to pilot studies to evaluate whether
or not they should be included in the final form of the test
STEP 1: TEST
CONCEPTUALIZATION
• Criterion-Referenced: based on the amount of
knowledge and/or the level of competence (mastery of
material); employed in licensing
• Norm-Referenced: based on the performance of a
specific group with an existing base of knowledge and
skills; employed in educational contexts
STEP 2: TEST
CONSTRUCTION
• Scaling
• setting rules for assigning numbers in measurement
• the process by which a measuring device is designed and
calibrated, and by which numbers are assigned to different
amounts of the trait, attribute, or characteristic being
measured
STEP 2: TEST
CONSTRUCTION
• Scaling Methods
• Rankings of Experts
• A panel of experts ranks the behavioral indicators,
yielding a meaningful numerical score
• Method of Equal-Appearing Intervals
• Developed by L. L. Thurstone (1929)
• A large number of true-false statements reflecting positive
and negative attitudes is collected and sorted by judges
• The resulting items fall on an interval scale
• Reliability and validity analyses are important for determining
each item’s appropriateness and usefulness
• An item with a large standard deviation across judges’ ratings
would be dropped as ambiguous (see the sketch below)
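The scale-value and ambiguity computations lend themselves to a short illustration. Below is a minimal Python sketch, with made-up judge ratings and a hypothetical cutoff, of how an item's scale value (here, the median judge rating) is obtained and how a high-spread item would be flagged for dropping.

```python
# Minimal sketch of equal-appearing-intervals scoring; the ratings and
# the MAX_SD cutoff are hypothetical illustrations, not fixed values.
import statistics

# Judges sort each statement on an 11-point favorableness continuum
ratings = {
    "Statement A": [9, 10, 9, 8, 10],
    "Statement B": [2, 3, 2, 1, 3],
    "Statement C": [1, 6, 11, 3, 9],  # judges disagree -> ambiguous
}

MAX_SD = 2.0  # ambiguity cutoff; the exact value is a design choice

for item, judge_values in ratings.items():
    scale_value = statistics.median(judge_values)  # the item's scale value
    spread = statistics.stdev(judge_values)        # disagreement among judges
    verdict = "keep" if spread <= MAX_SD else "drop"
    print(f"{item}: scale value = {scale_value}, SD = {spread:.2f} -> {verdict}")
```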
STEP 2: TEST
CONSTRUCTION
• Scaling Methods
• Method of Absolute Scaling
• Obtaining a measure of absolute item difficulty based
on results for different age groups of testtakers
• Commonly used in group achievement and aptitude
testing
• Likert Scale
• Consists of ordered responses on a continuum
• Total score is obtained by adding the scores from the
individual items
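Since the Likert total is just a sum of item scores, a short sketch makes the arithmetic concrete. The items, responses, and reverse-keyed set below are hypothetical; reverse keying is included because it is the usual companion to mixing positively and negatively worded items.

```python
# Minimal sketch of cumulative Likert scoring on a hypothetical 5-point
# agreement scale (1 = strongly disagree ... 5 = strongly agree).
responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}
reverse_keyed = {"q2", "q4"}  # negatively worded items are flipped

total = sum((6 - score) if q in reverse_keyed else score
            for q, score in responses.items())
print(total)  # 4 + 4 + 5 + 5 = 18
```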
STEP 2: TEST
CONSTRUCTION
• Scaling Methods (cont’d)
• Guttman Scales
• Respondents who endorse a stronger statement will also
endorse the milder ones (see the sketch after this list)
• Method of Empirical Keying
• Test items are selected based entirely on how well they
discriminate a criterion group from a normative sample
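As noted for Guttman scales above, a respondent's endorsements should form a cumulative pattern. A minimal sketch of that consistency check, assuming the items are ordered from mildest to strongest:

```python
# Minimal sketch of a Guttman-pattern check; responses are assumed to be
# ordered from the mildest item to the strongest (1 = endorse, 0 = reject).
def is_guttman_consistent(pattern):
    """A valid pattern is a run of endorsements followed by rejections."""
    return "01" not in "".join(map(str, pattern))

print(is_guttman_consistent([1, 1, 1, 0]))  # True: milder items all endorsed
print(is_guttman_consistent([1, 0, 1, 0]))  # False: stronger endorsed, milder not
```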
STEP 2: TEST
CONSTRUCTION
• Scaling Methods (cont’d)
• Method of Rational Scaling
• All scale items correlate positively with each other and
with the total score for each scale
• Method of Paired Comparisons
• Testtakers are presented with pairs of stimuli and asked
to compare them, choosing one (see the sketch after this list)
• Categorical Scaling
• Stimuli are placed into one of two or more alternative
categories that differ quantitatively with respect to
some continuum.
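For the method of paired comparisons, scoring is typically a tally of how often each stimulus is chosen. A minimal sketch with hypothetical stimuli and selections:

```python
# Minimal sketch of tallying paired-comparison choices; the stimuli and
# the testtaker's selections are hypothetical.
from collections import Counter

pairs = [("A", "B"), ("A", "C"), ("B", "C")]  # every stimulus vs. every other
choices = ["A", "A", "C"]                     # stimulus picked from each pair

tally = Counter(choices)
print(tally)  # Counter({'A': 2, 'C': 1}): A is preferred most often
```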
STEP 2: TEST
CONSTRUCTION
• Writing Items
• Define clearly what you want to measure
• Generate an item pool
• Avoid exceptionally long items
• Keep the level of difficulty appropriate for those who
will take the test
• Avoid double-barreled items that convey two or more
ideas at the same time
• Consider mixing positively and negatively worded items
STEP 2: TEST
CONSTRUCTION
• Approaches to Test Construction:
• Rational (Theoretical) Approach
• Reliance on reason and logic over data collection for
statistical analysis
• Empirical Approach
• Reliance on data gathering to identify items that relate to the
construct
• Bootstrap
• Combination of the rational and empirical approaches: items
are first written based on a theory, then an empirical approach
is used to identify the items most highly related to the construct
STEP 2: TEST
CONSTRUCTION
• Item Format: form, plan, structure, arrangement, and
layout of individual test items
• Multiple choice
• Matching
• Binary-choice (e.g., true or false)
• Short Answer
STEP 2: TEST
CONSTRUCTION
• Scoring Models
• Cumulative
• the score reflects the number of responses that match the key;
the more keyed responses, the higher the testtaker’s standing
on the construct being measured
• Class/Category
• the placement of an individual into a particular class for
description or prediction
• Ipsative
• the indication of how an individual performed on one scale
relative to other scales within the same test (see the sketch below)
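The contrast between cumulative and ipsative scoring is easiest to see side by side. A minimal sketch with two hypothetical scales:

```python
# Minimal sketch contrasting cumulative and ipsative interpretations of the
# same hypothetical two-scale test (scores are made up for illustration).
scale_scores = {"assertiveness": 18, "agreeableness": 12}

# Cumulative: each scale score stands on its own and can be compared
# across testtakers.
print(scale_scores["assertiveness"])  # 18

# Ipsative: each scale is expressed relative to the person's own total,
# so it describes relative strengths within the individual only.
total = sum(scale_scores.values())
print({s: v / total for s, v in scale_scores.items()})
# {'assertiveness': 0.6, 'agreeableness': 0.4}
```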
STEP 3: TEST TRYOUT
• The test should be tried out on people who are
similar in critical respects to the people for whom the
test was designed
n = 5A to 10A
where A = number of items on the questionnaire and
n = number of participants (see the sketch after this list)
• For validation purposes, there should be at least 20
participants in each group
• A good test discriminates among testtakers
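The 5-to-10-participants-per-item rule of thumb reduces to one multiplication; a minimal sketch (the 40-item count is hypothetical):

```python
# Minimal sketch of the tryout sample-size rule of thumb: n = 5A to 10A.
def tryout_sample_size(n_items, per_item=5):
    """Participants needed, at `per_item` testtakers per questionnaire item."""
    return n_items * per_item

print(tryout_sample_size(40))      # 200 participants at the 5-per-item floor
print(tryout_sample_size(40, 10))  # 400 participants at the 10-per-item ceiling
```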
STEP 4: ITEM ANALYSIS
• Item-Difficulty Index
• The proportion of the total number of testtakers who
answered the item correctly (see the sketch after this list)
• The difficulty of the whole test can be found by averaging
the item-difficulty indices
• Item-Reliability Index
• Indication of the test’s internal consistency
• Factor analysis can be used to examine it
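The item-difficulty index is just a proportion, so a small sketch (with made-up 0/1 response data) shows both the per-item index and the test-level average:

```python
# Minimal sketch of the item-difficulty index: proportion of testtakers
# answering each item correctly (1 = correct, 0 = incorrect; data made up).
item_responses = {
    "item1": [1, 1, 1, 0, 1, 1, 0, 1],
    "item2": [1, 0, 0, 0, 1, 0, 0, 1],
}

difficulty = {item: sum(r) / len(r) for item, r in item_responses.items()}
print(difficulty)  # {'item1': 0.75, 'item2': 0.375}

# Test difficulty: the mean of the item-difficulty indices
print(sum(difficulty.values()) / len(difficulty))  # 0.5625
```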
STEP 4: ITEM ANALYSIS
• Item-Validity Index
• Indicates the degree to which a test is measuring what
it intends to measure
• Can be calculated from the item-score standard
deviation and the correlation between the item score
and the criterion score
• Item-Discrimination Index
• Indicates how well an item discriminates between high
scorers and low scorers on the test (see the sketch below)
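One common form of the item-discrimination index is d = (U − L) / n, where U and L are the numbers of correct answers in the upper- and lower-scoring groups and n is the size of one group. A minimal sketch with made-up counts:

```python
# Minimal sketch of the item-discrimination index d = (U - L) / n
# (the upper/lower group counts below are made up).
def discrimination_index(upper_correct, lower_correct, group_size):
    return (upper_correct - lower_correct) / group_size

print(discrimination_index(27, 10, 32))  # ~0.53: favors high scorers, as desired
print(discrimination_index(5, 25, 32))   # negative: low scorers did better; flag item
```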
STEP 4: ITEM ANALYSIS
• Considerations:
• Guessing
• Item fairness
• Speed Tests
• Qualitative Item Analysis
• Comparison of individual test items with one another and
with the test as a whole
STEP 4: ITEM ANALYSIS
• “Think Aloud” Test Administration
• Innovative approach to cognitive assessment in which
respondents verbalize their thoughts as they occur
• Expert Panels
• Sensitivity Review
• Testtakers could be interviewed
STEP 5: TEST REVISION
• Popular culture changes
• Adequacy of test norms
• Changes in reliability or validity
• Theoretical modifications
STEP 5: TEST REVISION
• Cross-Validation
• Revalidation of a test on a sample of testtakers other
than those on whom test performance was originally
found to be a valid predictor of some criterion
• Co-validation
• A validation process conducted on two or more tests
using the same sample of testtakers
STEP 5: TEST REVISION
• Quality Assurance
• Anchor Protocol
• A model protocol produced by a highly authoritative scorer,
used to guide scoring and resolve the discrepancies that go
along with it (see the sketch after this list)
• Scoring Drift
• A discrepancy between the scoring in an anchor protocol
and the scoring in another protocol
• Evaluate properties of existing tests and guide in revisions
• Determine measurement equivalence across populations
• Development of item banks
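A scoring-drift check amounts to comparing a scorer's marks against the anchor protocol and flagging large discrepancies. A minimal sketch with hypothetical scores and tolerance:

```python
# Minimal sketch of a scoring-drift check against an anchor protocol
# (the protocol scores and the tolerance are hypothetical).
anchor = {"protocol1": 4, "protocol2": 3, "protocol3": 5}
scorer = {"protocol1": 4, "protocol2": 1, "protocol3": 5}

TOLERANCE = 1  # largest acceptable difference before flagging drift

drift = {k: scorer[k] - anchor[k]
         for k in anchor if abs(scorer[k] - anchor[k]) > TOLERANCE}
print(drift)  # {'protocol2': -2} -> rescore or retrain the scorer
```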
