Applied Psych Test Design: Part B - Test and Item Development

The Art and Science of Applied Test Development. This is the second in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.

Presentation Transcript

  • The Art and Science of Test Development—Part B Test and item development Kevin S. McGrew, PhD. Educational Psychologist Research Director Woodcock-Muñoz Foundation The basic structure and content of this presentation is grounded extensively on the test development procedures developed by Dr. Richard Woodcock
  • The Art and Science of Test Development The above titled topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence. Part A: Planning, development frameworks & domain/test specification blueprints Part B: Test and Item Development Part C: Use of Rasch Technology Part D: Develop norm (standardization) plan Part E: Calculate norms and derived scores Part F: Psychometric/technical and statistical analysis: Internal Part G: Psychometric/technical and statistical analysis: External The current module is designated by red bold font lettering
  • Substantive Stage of Test Development: Develop Test Design and Specification Blueprint – Item Scale Development. Theoretical domain = Cattell-Horn-Carroll (CHC) theory of cognitive abilities – Gv domain & 3 selected narrow Gv abilities. Measurement or empirical domain: there is a universe of potential Vz item tasks. Which type/format should be selected? What types of tasks, items, format, materials, examiner responses, etc.? How do we minimize construct-irrelevant variance and maximize construct-relevant variance?
  • Visualization (Vz) Item Development: How do you do it? Visualization (Vz): The ability to apprehend a spatial form, object, or scene and match it with another spatial object, form, or scene with the requirement to rotate it (one or more times) in two or three dimensions. Requires the ability to mentally imagine, manipulate or transform objects or visual patterns (without regard to speed of responding). What type, or types, of items are to be used? • Stimulus presentation (e.g., visual-graphic, oral, etc.) • Response mode (e.g., verbal, pointing, drawing, etc.) • etc. What physical materials are needed and how should they appear? • Test books • Test records • Manipulatives • Audio tapes/CDs • etc. How is the test to be scored? • Dichotomous (0/1), polychotomous (0/1/2/3/4/…) • By hand, machine, computer • Scoring rubrics/guides • etc.
  • Visualization (Vz) Item Development: How do you do it? (cont.) Use definition to identify essential task requirements Visualization (Vz): The ability to apprehend a spatial form, object, or scene and match it with another spatial object, form, or scene with the requirement to rotate it (one or more times) in two or three dimensions. Requires the ability to mentally imagine, manipulate or transform objects or visual patterns (without regard to speed of responding). Possible sources for prototype items (select list) • Tests that appear to measure same ability on other test batteries • Source books • Carroll’s 1993 book lists tasks he found in factor analysis review • International Directory of Spatial Tests (1983) • Journal articles (old classic articles; recent articles for newer ideas) • Published test resources (e.g., ETS Kit of Factor-Referenced Cognitive Tests) • Internet • Domain related books • Create your own ideas • etc [Some examples from select sources follow on next series of slides]
  • Vz test design decision: Unspeeded 3-D Block Rotation type task
  • Vz test design decision: Unspeeded 3-D Block Rotation type task Primary reasons for decision • Historical research base supporting construct validity of these tasks (construct validity) • No manipulatives (actual blocks) required (minimize construct-irrelevant variance: no psychomotor abilities involved) • Ease of administration and scoring (increased reliability) • Items could span a wide developmental age-range (intended use of battery) • Primary elements in Vz definition included in this type of task (content/construct validity) • Added 3-D Gv task to WJ III – all other Gv tests were 2-D (content/face validity; goal to increase predictive validity of Gv cluster) • Believed block format would be enjoyed by young children (part of intended examinees) • Future considerations: Readily adaptable to possible computer administration
  • Substantive Stage of Test Development: Develop Test Design and Specification Blueprint – Item Scale Development. Theoretical domain = Cattell-Horn-Carroll (CHC) theory of cognitive abilities – Gv domain & 3 selected narrow Gv abilities. Measurement or empirical domain: develop a pool of items, from easy to difficult, that will then be ordered from low to high on the underlying Block Rotation measurement scale (analogy = developing a yardstick or ruler), running from low ability/easy items to high ability/difficult items.
  • This is the “try out” phase of the new test idea. Pay extra attention to the development of very “easy” and very “difficult” items. Be prepared for the possibility that your initial item type/format may not be the one you end up with. The empirical data may send you in a different direction, or down a slightly modified path. Typically these initial prototype try-out items are not of publication quality. Often the hardest part of item development is the development of the examiner's verbal instructions, which need to be succinct, easy to understand, not complicated, etc.
  • This is one of the reasons for the try-out phase: to catch possible problems with the items.
  • This is typically done by the authors themselves (and other trained staff) with very small numbers of subjects (as few as 10-12) to see how the instructions and ideas work. Carefully observe examinee performance and make note of any ideas regarding how to improve the items and instructions. • It is sometimes useful to do a post-testing interview of the examinee about their experience (e.g., “how did you go about working the problems?”). Need to try out with young/immature and mature subjects. • Be careful not to use only very select subjects (friends and children of university faculty). Make changes in items, instructions, materials, etc. and continue to try out items with additional subjects. Repeat this iterative process as many times as necessary. • More times are better: better safe than sorry down the road. It is wise to have multiple individuals try out the items to gather multiple sources of feedback.
  • Structural (Internal) Stage of Test Development. Purpose: Examine the internal relations among the measures used to operationalize the theoretical construct domain (i.e., intelligence or cognitive abilities). Questions asked: Do the observed measures “behave” in a manner consistent with the theoretical domain definition of intelligence? Method and concepts: • Internal domain studies • Item/subscale intercorrelations • Item response theory (IRT). Characteristics of strong test validity program: • Moderate item internal consistency • Items/measures are representative of the empirical domain • Items fit the theoretical structure
  • Prepare and evaluate “first draft” version of the test. Develop 2 to 3 times as many items as needed for the final test. • In our Block Rotation example, we crafted an initial set of 44 items. Develop a preliminary examiner test record for the test. You must have a standardized test record system for quality control purposes. Administer to 50+ subjects (administered by authors and trained staff). • Need to try out with young/immature and mature subjects. Carefully observe examinee performance and make note of any ideas regarding how to improve the items and instructions. Run initial IRT/Rasch on data to get some sense of item difficulty. Make modifications that are desirable, including dropping and initial re-ordering of items.
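To make the "get some sense of item difficulty" step concrete, here is a minimal sketch in Python. It is not a Rasch calibration and not what WINSTEPS does: it only converts each item's proportion correct into a centered log-odds value, a crude first look that is often good enough to spot items that are badly out of order. The function names and the small data set are hypothetical; `None` marks an item that was not administered.

```python
import math

def rough_item_difficulty(responses):
    """Crude log-odds difficulty per item (NOT a full Rasch calibration).

    responses: list of examinee rows; each entry is 1 (correct),
    0 (incorrect), or None (item not administered).
    """
    n_items = len(responses[0])
    logits = []
    for j in range(n_items):
        scored = [row[j] for row in responses if row[j] is not None]
        # Add half a success and half a failure so p is never exactly 0 or 1.
        p = (sum(scored) + 0.5) / (len(scored) + 1.0)
        logits.append(math.log((1.0 - p) / p))  # higher value = harder item
    mean = sum(logits) / n_items
    return [d - mean for d in logits]  # center the scale at 0

def order_easy_to_hard(difficulties):
    """Return item indices sorted from easiest to hardest."""
    return sorted(range(len(difficulties)), key=lambda j: difficulties[j])

# Hypothetical try-out data: 4 examinees, 3 items.
data = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, None],
]
difficulties = rough_item_difficulty(data)
print(order_easy_to_hard(difficulties))
```

In this toy data the second item is answered correctly by everyone, so it sorts first (easiest). A real first-draft analysis would still go through proper IRT/Rasch software.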
  • Prepare and evaluate “calibration” version of the test. Administer to 200 to 300+ subjects. • Need to administer to a “rectangular” (aka uniform) distribution of subjects. Important concept: You are “norming the scale” – not yet constructing the “norms for the test”. …is better than… [cont. next slide]
  • Prepare and evaluate “calibration” version of the test. Then develop “norming” test. Enter calibration data and run IRT/Rasch (WINSTEPS; BIGSTEPS). • [Note – sample Rasch output will be displayed later] Revise calibration test as indicated based on Rasch findings. • Drop, add, possibly re-order items • Improve instructions and item scoring keys. Have an expert panel review items for potential bias. Prepare “norming” test form. [Note: There is also a “norming-calibration” process that will not be covered during this broad stroke presentation. If this process is used one would prepare the norming-calibration form of the test at this stage.]
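One simple way to plan for a rectangular sample is to set an equal recruiting quota for every band of the ability or age range, then track which bands are still short. The sketch below assumes this quota approach; the band labels, the function names, and the recruited counts are hypothetical, and the slides do not prescribe any particular procedure.

```python
def rectangular_quota(total_subjects, bands):
    """Split a target sample size as evenly as possible across bands."""
    base, extra = divmod(total_subjects, len(bands))
    # The first `extra` bands absorb the remainder, one subject each.
    return {band: base + (1 if i < extra else 0) for i, band in enumerate(bands)}

def shortfall(quota, recruited):
    """How many more subjects each band needs to reach its quota."""
    return {band: max(0, need - recruited.get(band, 0))
            for band, need in quota.items()}

# Hypothetical age bands for a calibration sample of 300 subjects.
bands = ["3-5", "6-8", "9-12", "13-17", "18-39", "40+"]
quota = rectangular_quota(300, bands)
print(quota)

# Hypothetical recruiting progress: the extremes are usually hardest to fill.
recruited = {"3-5": 12, "6-8": 50, "9-12": 55, "13-17": 50, "18-39": 48, "40+": 20}
print(shortfall(quota, recruited))
```

Equal quotas keep enough examinees at the low and high ends of the scale, which is where a convenience sample tends to be thin and where the very easy and very difficult items get their data.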
  • Theoretical domain (CHC): g, with broad abilities Gf, Gv, Glr, Gs, Gc, Gsm, Ga (cylinders = broad CHC abilities; circles = narrow CHC abilities). Measurement or empirical domain: the process just described occurs for all the tests you are developing.
  • Specification of the test design blueprint is critically important. Development of operational definitions of the constructs to be measured is important, and should flow from theory and research. You don’t always need to re-invent the wheel when deciding on item format, type, etc. Lots of existing resources are often available for ideas. Rectangular distributions are critical during item development and try-out phases. Pay extra attention to the development of very “easy” and very “difficult” items. Don’t grow too attached to your items or tests. Some will not survive. IRT/Rasch analysis, although made easy with contemporary software, is not for novices/rookies. You need to know, or have a consultant who knows, how these programs work. • Common serious error by novices (one example): not leaving “blanks” for items not administered below the basal or above the ceiling. This makes it all but impossible to know how items are “behaving”. IRT technology/software knows how to handle the blanks. DON’T impute 1’s below the basal and 0’s above the ceiling. It makes it all but impossible to then construct a test from the pool of items. 1/0 imputation can be done later (in new separate files) for internal consistency reliability calculations or special Rasch-based reasons.
  • You should develop 2-3 times as many items as you will need in the final test. Spend considerable time and effort in developing concise standardized examiner instructions. Edit, revise, edit, revise, etc. More small-sample item try-outs, followed by iterative revisions and new item development, are better than fewer. Test authors must be intimately involved in item development and initial item try-outs.
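The warning about blanks can be illustrated with a small Python sketch. It uses simple proportions rather than Rasch estimation, and the records, function names, and the rule for locating the basal and ceiling (first and last administered item) are all simplifying assumptions. The point is only that imputed 1's and 0's are then counted as if they were observed responses.

```python
def item_summary(responses, item):
    """Return (number of scored responses, proportion correct) for one item."""
    scored = [row[item] for row in responses if row[item] is not None]
    if not scored:
        return 0, None
    return len(scored), sum(scored) / len(scored)

def impute_basal_ceiling(row):
    """The practice warned against: 1's below the first administered item,
    0's above the last administered item. Shown only for comparison."""
    administered = [i for i, r in enumerate(row) if r is not None]
    if not administered:
        return list(row)
    first, last = administered[0], administered[-1]
    filled = list(row)
    for i, r in enumerate(row):
        if r is None and i < first:
            filled[i] = 1
        elif r is None and i > last:
            filled[i] = 0
    return filled

# Hypothetical records: None = item not administered (left blank).
records = [
    [1, 1, 0, None, None],     # stopped at the ceiling
    [None, None, 1, 1, 0],     # started above the basal
    [None, 1, 1, 0, None],
]
imputed = [impute_basal_ceiling(r) for r in records]

print(item_summary(records, 3))   # based only on examinees who saw the item
print(item_summary(imputed, 3))   # includes a response nobody actually gave
```

With blanks, the fourth item is summarized from the two examinees who actually attempted it. After imputation it appears to have three responses and looks harder, although the extra "failure" was never observed. Keeping the blanks in the calibration file, and doing any 1/0 imputation later in a separate file, avoids this.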
  • Test design engineering (Woodcock, 1970, 1975, 1999, 2008)
  • Test design engineering section will be part of a future “advanced module”
  • End of Part B. Additional steps in the test development process will be presented in subsequent modules as they are developed.