Applied Psych Test Design: Part A--Planning, development frameworks & domain/test specification blueprints
The Art and Science of Applied Test Development. This is the first in a series of PowerPoint modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification and IRT-Rasch scaling). The presentations are intended to be conceptual, not statistical, in nature. Feedback is appreciated.


    1. The Art and Science of Test Development, Part A: Planning, development frameworks & domain/test specification blueprints
       Kevin S. McGrew, PhD; Educational Psychologist; Research Director, Woodcock-Muñoz Foundation
       The basic structure and content of this presentation are grounded extensively in the test development procedures developed by Dr. Richard Woodcock.
    2. The Art and Science of Test Development
       This topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence.
       • Part A: Planning, development frameworks & domain/test specification blueprints
       • Part B: Test and Item Development
       • Part C: Use of Rasch Technology
       • Part D: Develop norm (standardization) plan
       • Part E: Calculate norms and derived scores
       • Part F: Psychometric/technical and statistical analysis: Internal
       • Part G: Psychometric/technical and statistical analysis: External
       The current module (Part A) is designated on the original slide by red bold lettering.
    3. “In an ever-changing world, psychological testing remains the flagship of applied psychology”
       Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8(4), 341-349.
    4. Desirable Personality Traits of Test Developers
       • Obsessive-compulsive
       • Intellectually inquisitive
       • Masochistic
       • 1/99% I/P ratio
       • Sadistic
       • Tough-skinned
    5. Desirable Personality Traits of Test Developers
       • Willingness to take risks and giant leaps of faith
    6. The approach to test development used: Item Response Theory (IRT)
       Classical Test Theory (CTT): X = T + E (observed score = true score + error)
       CTT, and IRT vs. CTT comparisons, are not covered in this presentation.
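The CTT equation X = T + E on slide 6 can be made concrete with a small simulation. This is a minimal sketch that is not part of the presentation, which sets CTT aside. It assumes normally distributed true scores, errors drawn independently of true scores, and invented values (mean 100, true-score SD 15, error SD 5). Under CTT, reliability is the share of observed-score variance that is true-score variance. These assumed values therefore imply 15² / (15² + 5²) = 0.90.

```python
import random
import statistics


def simulate_ctt(n_examinees=10_000, true_mean=100.0, true_sd=15.0,
                 error_sd=5.0, seed=1):
    """Simulate the CTT model X = T + E for one test administration.

    All parameter values are illustrative assumptions, not values from the source.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_examinees)]
    # Errors are drawn independently of true scores (a core CTT assumption).
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


def reliability(true_scores, observed):
    """Reliability = var(T) / var(X): the true-score share of observed variance."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


if __name__ == "__main__":
    T, E, X = simulate_ctt()
    # Should land close to the theoretical 0.90, up to sampling error.
    print(f"estimated reliability: {reliability(T, X):.3f}")
```

With real data T is never observed, which is why reliability has to be estimated indirectly (for example, from internal consistency).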
    7. The bible of test development: The “Joint Standards”
    8. Test development is a complex series of interconnected steps
       • The reality of the complexity of test development is not fully appreciated by most test users
       • The following complex flowcharts are intended to illustrate the magnitude of the overall project complexity
       • This presentation will focus on the more general, broad-stroke test development framework
       • The process is much more non-linear than depicted by flowcharts and presentations
    9. “Generic” Woodcock test development flowchart
    10. Test/Battery Development: Practical “Broad Stroke” Framework (Woodcock)
    11. A detailed description for a test, often called a test blueprint, that specifies:
        • The number or proportion of items that assess each content and process/skill area
        • The format of items, responses, and scoring rubrics and procedures, and
        • The desired psychometric properties of the items and test, such as the distribution of item difficulty and discrimination indices
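The three parts of this blueprint definition map naturally onto a data structure. The sketch below is a hypothetical illustration, not Woodcock's or the Standards' procedure. `Item`, `Blueprint`, and `check_pool` are invented names, and the area labels borrow the narrow Gv abilities defined on slide 37. A single difficulty window stands in for the fuller "distribution of item difficulty" that a real blueprint would specify.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    area: str          # content or process/skill area the item was written for
    fmt: str           # item/response format
    difficulty: float  # difficulty on whatever scale the developer uses (assumed)


@dataclass
class Blueprint:
    area_counts: dict        # area -> number of items the blueprint calls for
    allowed_formats: set     # permitted item/response formats
    difficulty_range: tuple  # (lowest, highest) acceptable item difficulty


def check_pool(blueprint, items):
    """Return human-readable discrepancies between an item pool and a blueprint."""
    problems = []
    counts = Counter(item.area for item in items)
    # 1. Number of items per content/skill area
    for area, target in blueprint.area_counts.items():
        if counts[area] != target:
            problems.append(f"{area}: {counts[area]} items, blueprint calls for {target}")
    for area in sorted(set(counts) - set(blueprint.area_counts)):
        problems.append(f"{area}: area not in blueprint")
    # 2. Item format and 3. difficulty, checked item by item
    low, high = blueprint.difficulty_range
    for item in items:
        if item.fmt not in blueprint.allowed_formats:
            problems.append(f"{item.item_id}: format {item.fmt!r} not in blueprint")
        if not low <= item.difficulty <= high:
            problems.append(f"{item.item_id}: difficulty {item.difficulty} outside [{low}, {high}]")
    return problems


if __name__ == "__main__":
    bp = Blueprint(area_counts={"Visualization": 2, "Spatial Relations": 1},
                   allowed_formats={"multiple-choice"},
                   difficulty_range=(-3.0, 3.0))
    pool = [Item("vz1", "Visualization", "multiple-choice", -1.2),
            Item("vz2", "Visualization", "multiple-choice", 0.4),
            Item("sr1", "Spatial Relations", "open-response", 3.5)]
    for line in check_pool(bp, pool):
        print(line)
```

Keeping the blueprint as data, rather than burying it in prose, makes it easy to re-run the check every time items are added or dropped during tryout.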
    12. Test/Battery Development: Common Conceptual Psychometric Validity Framework (Benson, 1998 summary)
    13. Substantive Stage of Test Development
        Purpose: Define the theoretical and empirical/measurement domains of interest (e.g., intelligence or cognitive abilities: cognitive + achievement)
        Questions asked: How should intelligence be defined and operationally measured?
        Method and concepts:
        • Theory development & validation
        • Generate definitions
        • Item and scale development
        • Content validation
        • Evaluate construct underrepresentation and construct irrelevancy
        Characteristics of a strong test validity program:
        • A strong psychological theory plays a prominent role
        • Theory provides a well-specified and bounded domain of constructs
        • The empirical domain includes measures of all potential constructs (i.e., adequate construct representation)
        • The empirical domain includes measures that contain only reliable variance related to the theoretical constructs (i.e., construct relevance)
    14. Structural (Internal) Stage of Test Development
        Purpose: Examine the internal relations among the measures used to operationalize the theoretical construct domain (i.e., intelligence or cognitive abilities)
        Questions asked: Do the observed measures “behave” in a manner consistent with the theoretical domain definition of intelligence?
        Method and concepts:
        • Internal domain studies
        • Item/subscale intercorrelations
        • Exploratory/confirmatory factor analysis
        • Item response theory (IRT)
        • Multitrait-multimethod matrix
        • Generalizability theory
        Characteristics of a strong test validity program:
        • Moderate item internal consistency
        • Measures co-vary in a manner consistent with the intended theoretical structure
        • Factors reflect trait rather than method variance
        • Items/measures are representative of the empirical domain
        • Items fit the theoretical structure
        • The theoretical/empirical model is deemed plausible (especially when compared against other competing models) based on substantive and statistical criteria
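Slide 14 lists moderate item internal consistency as a characteristic of a strong validity program. Cronbach's coefficient alpha is one common internal-consistency index; the slide itself does not name a statistic. Below is a minimal sketch that assumes a complete examinee-by-item score matrix with no missing responses. The 1/0 data are invented.

```python
import statistics


def cronbach_alpha(scores):
    """Coefficient alpha for a list of examinee rows, each holding k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across examinees
    item_vars = [statistics.pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of examinees' total (summed) scores
    total_var = statistics.pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical right/wrong (1/0) scores: 5 examinees x 4 items
    data = [[1, 1, 1, 0],
            [1, 1, 0, 0],
            [1, 0, 0, 0],
            [1, 1, 1, 1],
            [0, 0, 0, 0]]
    print(round(cronbach_alpha(data), 3))  # 0.8
```

Alpha is 0 when items are uncorrelated. It rises as items become more strongly intercorrelated and as more items are added.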
    15. External Stage of Test Development
        Purpose: Examine the external relations between the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics
        Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
        Method and concepts:
        • Group differentiation
        • Structural equation modeling
        • Correlation of observed measures with other measures
        • Multitrait-multimethod matrix
        Characteristics of a strong test validity program:
        • Focal constructs vary in theorized ways with other constructs
        • Measures of the constructs differentiate existing groups that are known to differ on the constructs
        • Measures of focal constructs correlate with other validated measures of the same constructs
        • Theory-based hypotheses are supported, particularly when compared to rival hypotheses
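A core comparison in a multitrait-multimethod analysis is that measures of a focal construct should correlate with validated measures of the same construct. They should also correlate more strongly with those measures than with measures of different constructs. Below is a minimal sketch using invented scores for five examinees. The test names are hypothetical, and a real study would use a full MTMM matrix and far larger samples.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def convergent_check(new_test, same_construct, other_construct):
    """Compare convergent r (same construct) with discriminant r (different construct)."""
    r_convergent = pearson_r(new_test, same_construct)
    r_discriminant = pearson_r(new_test, other_construct)
    # Expected pattern: convergent magnitude exceeds discriminant magnitude.
    return r_convergent, r_discriminant, abs(r_convergent) > abs(r_discriminant)


if __name__ == "__main__":
    # Hypothetical scores: a new Gv test, an established Gv test, and a Gc test
    new_gv = [1, 2, 3, 4, 5]
    other_gv = [2, 4, 6, 8, 10]
    gc_test = [3, 1, 4, 2, 5]
    print(convergent_check(new_gv, other_gv, gc_test))  # (1.0, 0.5, True)
```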
    16. Practical “Broad Stroke” Framework: Typical Questions to Ask (Woodcock)
        What is the intended purpose?
        Who are the potential users?
        Who are the intended examinees?
        What domain(s) of behavior are to be measured and in what proportion?
          • Content/substantive validity
          • Maximize construct representation
          • Minimize construct-irrelevant variance
        What type, or types, of items are to be used?
        How is the test to be scored?
          • By hand, machine, computer
          • Scoring rubrics/guides
          • Correction for guessing
        What types of derived scores will be provided?
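One traditional correction for guessing, listed above under test scoring, is the formula score S = R − W/(k − 1). Here R is the number right, W the number wrong (omitted items are not counted), and k the number of answer options per item. The slide does not say which correction Woodcock used, so treat this only as an illustration of the classic formula. The scores in the example are invented.

```python
def formula_score(num_right, num_wrong, num_options):
    """Classic correction for guessing: S = R - W / (k - 1).

    Omitted items count as neither right nor wrong.
    """
    if num_options < 2:
        raise ValueError("need at least two answer options")
    return num_right - num_wrong / (num_options - 1)


if __name__ == "__main__":
    # 40 right, 12 wrong, 8 omitted on a hypothetical 60-item, 4-option test
    print(formula_score(40, 12, 4))  # 36.0

    # Blind guessing on 100 four-option items is expected to yield
    # 25 right and 75 wrong, which the formula corrects to 0.
    print(formula_score(25, 75, 4))  # 0.0
```

The second case shows the rationale for the formula. The penalty per wrong answer is sized so that pure guessing earns, on average, no credit.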
    17. Practical “Broad Stroke” Framework: Typical Questions to Ask (Woodcock)
        How are the scores to be interpreted?
          • Types of profiles to provide
        What physical materials are needed and how should they appear?
          • Test books
          • Test records
          • Manipulatives
          • Audio tapes/CDs
          • Computer disks
          • Scoring keys
          • Manuals
          • Training materials
          • etc.
    18. This presentation is an integration of the practical and psychometric test/battery frameworks
        • Practical “Broad Stroke” Framework
        • Common Conceptual Psychometric Validity Framework
    19. Substantive Stage
        Structural (Internal) & External Stages
    20. Examples used in this presentation come from the domain of intelligence or cognitive abilities (cognitive + achievement)
        Based on the presenter’s experience as a coauthor of the Woodcock-Johnson Battery, Third Edition (WJ III; 2001)
    21. Typically there are two types of test specification blueprints
        • Well-defined a priori (typically theory-based) blueprints
        • Less well-defined (emerging) data-driven (empirical) blueprints
    22. Possible theory-based intelligence model test design blueprints (select examples)
        • Gardner MI theory
        • Das-Naglieri PASS theory
        • Cattell-Horn-Carroll (CHC) theory
    23. Possible emerging, empirical, or pragmatic intelligence model test design blueprints (select examples)
        • Original Wechsler Verbal/Nonverbal model
        • 1977 WJ Pragmatic Decision-Making model
    24. Substantive Stage of Test Development
        Purpose: Define the theoretical and empirical/measurement domains of interest (e.g., intelligence or cognitive abilities: cognitive + achievement)
        Questions asked: How should intelligence be defined and operationally measured?
        Method and concepts:
        • Theory development & validation
        Characteristics of a strong test validity program:
        • A strong psychological theory plays a prominent role
        • Theory provides a well-specified and bounded domain of constructs
    25. • The psychometric approach is the dominant approach, has inspired the most research, and is used most widely in practical settings (p. 77)
        • Several theorists argue that there are many different “intelligences” (systems of abilities), only a few of which can be captured by standard psychometric tests (p. 78)
    26. CHC Theory Defined
        • Combination of research by Raymond Cattell, John Horn, and John Carroll
        • The most empirically supported, psychometric-based, contemporary description of the structure of human cognitive abilities
        • Based on the analyses of hundreds of data sets that were not restricted to a particular test battery
        • The theory describes cognitive abilities as a function of degree of breadth/generality
          – Broad and narrow cognitive abilities
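The breadth-based structure described here, drawn as strata in slide 30, can be represented as a small lookup table. This is a hypothetical sketch; `BROAD`, `NARROW`, `stratum`, and `broad_parent` are invented names. The broad-ability codes and names follow the CHC legend on slide 30, restricted to the ten in the integrated model. Only the three narrow Gv abilities defined on slide 37 are filled in, and the other 80+ narrow abilities are omitted. Treating g as Stratum III follows Carroll's model, although, as slide 30 notes, Carroll and Cattell-Horn disagreed on the validity of the general factor.

```python
# Stratum II (broad) CHC abilities; codes and names as in the slide 30 legend
BROAD = {
    "Gf": "Fluid reasoning",
    "Gc": "Comprehension-knowledge",
    "Gsm": "Short-term memory",
    "Gv": "Visual processing",
    "Ga": "Auditory processing",
    "Glr": "Long-term storage and retrieval",
    "Gs": "Processing speed",
    "Gt": "Decision and reaction speed",
    "Grw": "Reading and writing",
    "Gq": "Quantitative knowledge",
}

# Stratum I (narrow) abilities, keyed by broad parent.
# Only the Gv abilities defined in this presentation are listed.
NARROW = {
    "Gv": {"SR": "Spatial Relations", "Vz": "Visualization", "MV": "Visual Memory"},
}


def stratum(code):
    """Return the Carroll stratum (3, 2, or 1) for an ability code."""
    if code == "g":
        return 3
    if code in BROAD:
        return 2
    if any(code in narrow for narrow in NARROW.values()):
        return 1
    raise KeyError(f"unknown ability code: {code}")


def broad_parent(narrow_code):
    """Return the broad (Stratum II) ability a narrow ability is classified under."""
    for broad, narrow in NARROW.items():
        if narrow_code in narrow:
            return broad
    raise KeyError(f"no broad parent recorded for: {narrow_code}")


if __name__ == "__main__":
    for code in ("g", "Gv", "Vz"):
        print(code, "-> stratum", stratum(code))
    parent = broad_parent("Vz")
    print("Vz is classified under", parent, "-", BROAD[parent])
```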
    27. Carroll and Cattell-Horn Models (figure: broad-ability correspondence)
        Cattell-Horn                       Carroll
        (none)                             g
        Fluid Intelligence (Gf)            Fluid Intelligence (Gf)
        Quantitative Knowledge (Gq)        (none)
        Crystallized Intelligence (Gc)     Crystallized Intelligence (Gc)
        Short-Term Memory (Gsm)            General Memory & Learning (Gy)
        Visual Processing (Gv)             Broad Visual Perception (Gv)
        Auditory Processing (Ga)           Broad Auditory Perception (Gu)
        Long-Term Retrieval (Glr)          Broad Retrieval Ability (Gr)
        Processing Speed (Gs)              Broad Cognitive Speediness (Gs)
        Correct Decision Speed (CDS)       Decision/Reaction Time/Speed (Gt)
        Reading/Writing (Grw)              (none)
    28. “...most disciplines have a common set of terms and definitions (i.e., a standard nomenclature) that facilitates communication among professionals and guards against misinterpretations. In chemistry, this standard nomenclature is reflected in the ‘Table of Periodic Elements’. Carroll (1993a) has provided an analogous table for intelligence...” (Flanagan & McGrew, 1998)
    29. The verdict is unanimous re: the importance of Carroll’s (1993) work
        Richard Snow (1993): “John Carroll has done a magnificent thing. He has reviewed and reanalyzed the world’s literature on individual differences in cognitive abilities…no one else could have done it… it defines the taxonomy of cognitive differential psychology for many years to come.”
        Burns (1994): Carroll’s book “is simply the finest work of research and scholarship I have read and is destined to be the classic study and reference work on human abilities for decades to come” (p. 35).
        John Horn (1998): A “tour de force summary and integration” that is the “definitive foundation for current theory” (p. 58). Horn compared Carroll’s summary to “Mendelyev’s first presentation of a periodic table of elements in chemistry” (p. 58).
        Arthur Jensen (2004): “…on my first reading this tome, in 1993, I was reminded of the conductor Hans von Bülow’s exclamation on first reading the full orchestral score of Wagner’s Die Meistersinger, ‘It’s impossible, but there it is!’” “Carroll’s magnum opus thus distills and synthesizes the results of a century of factor analyses of mental tests. It is virtually the grand finale of the era of psychometric description and taxonomy of human cognitive abilities. It is unlikely that his monumental feat will ever be attempted again by anyone, or that it could be much improved on. It will long be the key reference point and a solid foundation for the explanatory era of differential psychology that we now see burgeoning in genetics and the brain sciences” (p. 5).
    30. Carroll and Cattell-Horn Broad Ability Correspondence
        (Figure; vertically aligned ovals represent similar broad domains.)
        A. Carroll Three-Stratum Model: Stratum III (general): g; Stratum II (broad): Gf, Gc, Gy, Gv, Gu, Gr, Gs, Gt
        B. Cattell-Horn Extended Gf-Gc Model: Gf, Gc, Gv, Ga, Gs, CDS, Grw, Gq, Gsm (SAR), Glm (TSR)
        C. Cattell-Horn-Carroll (CHC) Integrated Model: g; Gf, Gc, Gsm, Gv, Ga, Glr, Gs, Gt, Grw, Gq (missing g-to-broad ability arrows acknowledge that Carroll and Cattell-Horn disagreed on the validity of the general factor)
        D. Tentatively identified Stratum II (broad) domains (1): Gkn, Gh, Gk, Go, Gp, Gps
        Notes. Broad ability factor codes are based on Carroll (1993) and Horn and Blankson (2005). See Table 1 for additional explanation. 80+ Stratum I (narrow) abilities have been identified under the Stratum II broad abilities; they are not listed here due to space limitations (see Table 1). Placement of g to the left side of the Carroll Three-Stratum Model (A) is consistent with Carroll’s (1993) published figures, a placement reflecting his finding that the broad abilities towards the left (e.g., Gf, Gc) had the highest loadings on the g factor. The placement of the Stratum II Grw and Gq factors in the Cattell-Horn Extended Gf-Gc Model (B) is not consistent with this g-broad ability representation, as Grw and Gq typically demonstrate high g loadings. Grw and Gq are placed to the right in B to reflect their absence in model A.
        CHC Broad (Stratum II) Ability Domains (see Table 1 for definitions):
          Gf    Fluid reasoning
          Gc    Comprehension-knowledge
          Gsm   Short-term memory
          Gv    Visual processing
          Ga    Auditory processing
          Glr   Long-term storage and retrieval
          Gs    Processing speed
          Gt    Decision and reaction speed
          Grw   Reading and writing
          Gq    Quantitative knowledge
          Gkn   General (domain-specific) knowledge
          Gh    Tactile abilities
          Gk    Kinesthetic abilities
          Go    Olfactory abilities
          Gp    Psychomotor abilities
          Gps   Psychomotor speed
        (1) See McGrew (2004, 2005) for the literature review supporting these domains.
        © Institute for Applied Psychometrics, LLC; Kevin S. McGrew, 7-22-08
    31. Substantive Stage of Test Development: Develop Test Design and Specification Blueprint
        (Figure: Theoretical Domain Specification showing g and the broad abilities Gf, Gv, Glr, Gs, Gc, Gsm, Ga. Cylinders = broad CHC abilities; circles = narrow CHC abilities.)
        • What is the theoretical domain?
        • How should intelligence be defined?
        • What intelligence theory has the best validity evidence?
        Answer: Cattell-Horn-Carroll (CHC) theory of cognitive abilities
    32. Substantive Stage of Test Development: Develop Test Design and Specification Blueprint
        (Figure: Theoretical Domain = Cattell-Horn-Carroll (CHC) theory of cognitive abilities, showing g and the broad abilities Gf, Gv, Glr, Gs, Gc, Gsm, Ga. Cylinders = broad CHC abilities; circles = narrow CHC abilities.)
        What broad and narrow ability domain(s) are to be measured and in what proportion?
        • The answer relates to questions regarding the intended purpose of the battery, intended examinees, and intended users.
        • How do we assure adequate construct representation? How do we define the broad and narrow ability constructs?
        • Content validity is important.
    33. Substantive Stage of Test Development: Develop Test Design and Specification Blueprint
        (Figure: Theoretical Domain = CHC theory of cognitive abilities, showing g and the broad abilities Gf, Gv, Glr, Gs, Gc, Gsm, Ga. Cylinders = broad CHC abilities; circles = narrow CHC abilities.)
        Example domain used to illustrate the process: Gv (Visual Processing)
    34. What narrow Gv ability domain(s) are to be measured and in what proportion?
        • The answer relates to questions regarding the intended purpose of the battery, intended examinees, and intended users.
        • How do we assure adequate construct representation?
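Once the relative emphasis of each narrow ability has been decided, the question "in what proportion?" becomes an integer allocation of a fixed item budget. Below is a minimal sketch using the largest-remainder (Hamilton) method, which is one reasonable rounding rule among several rather than a procedure the presentation specifies. The 37-item budget and the 5:3:2 weights for Vz, SR, and MV are invented for illustration.

```python
import math


def allocate_items(total_items, weights):
    """Split total_items across abilities in proportion to weights (largest remainder).

    Each ability first gets the integer part of its exact quota; the items left
    over go, one apiece, to the abilities with the largest fractional remainders.
    """
    total_weight = sum(weights.values())
    quotas = {k: total_items * w / total_weight for k, w in weights.items()}
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical emphasis: Visualization weighted most heavily
    plan = allocate_items(37, {"Vz": 5, "SR": 3, "MV": 2})
    print(plan)  # {'Vz': 19, 'SR': 11, 'MV': 7}
```

Unlike rounding each quota independently, this rule always returns counts that sum exactly to the item budget.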
    35. Substantive Stage of Test Development
        Purpose: Define the theoretical and empirical/measurement domains of interest (e.g., intelligence or cognitive abilities: cognitive + achievement)
        Questions asked: How should intelligence be defined and operationally measured?
        Method and concepts:
        • Generate definitions
        Characteristics of a strong test validity program:
        • The empirical domain includes measures of all potential constructs (i.e., adequate construct representation)
    36. Definition of broad Gv (Visual Processing)
        • Ability to perceive, analyze, synthesize, and think with visual patterns
        • Ability to store and recall visual representations
        • Fluent thinking with stimuli that are visual in the “mind’s eye”
        What narrow Gv ability domain(s) are to be measured and in what proportion?
        • The answer relates to questions regarding the intended purpose of the battery, intended examinees, and intended users.
        • How do we assure adequate construct representation?
    37. Narrow Gv ability definitions
        Spatial Relations (SR): Ability to rapidly perceive and manipulate relatively simple visual patterns or to maintain orientation with respect to objects in space.
        Visualization (Vz): The ability to apprehend a spatial form, object, or scene and match it with another spatial object, form, or scene with the requirement to rotate it (one or more times) in two or three dimensions. Requires the ability to mentally imagine, manipulate, or transform objects or visual patterns (without regard to speed of responding).
        Visual Memory (MV): Ability to form and store a mental representation or image of a visual stimulus and then recognize or recall it later.
        We will focus on one: Visualization (Vz)
    38. Substantive Stage of Test Development
        Purpose: Define the theoretical and empirical/measurement domains of interest (e.g., intelligence or cognitive abilities: cognitive + achievement)
        Questions asked: How should intelligence be defined and operationally measured?
        Method and concepts:
        • Content validation
        Characteristics of a strong test validity program
    39. Content validity evidence
        Knowledge and skills covered (sampled) by the test items should be representative of the larger population domain of knowledge and skills.
        Content validity evidence refers to logical or empirical analyses of the adequacy with which the test content represents the content domain and of the relevance of the content domain to the proposed interpretation of test scores (Joint Test Standards).
        This is a non-statistical type of validity that involves “the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured” (Anastasi & Urbina, 1997).
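Content validation itself is a judgment process, but once reviewers have classified each item, the bookkeeping for construct representation can be automated. In this hypothetical sketch, the function and data are invented. The inputs are the narrow abilities a test is meant to sample and the ability code reviewers assigned to each item. The function counts items per ability and lists abilities with no items, which signal possible construct underrepresentation. It also flags items classified outside the intended domain, which are candidates for construct-irrelevant variance.

```python
def coverage_report(domain, item_tags):
    """Summarize how reviewer-classified items cover an intended ability domain.

    domain:    set of narrow-ability codes the test is meant to sample
    item_tags: dict mapping item id -> ability code assigned by content reviewers
    """
    counts = {}
    outside = []
    for item_id, code in item_tags.items():
        if code in domain:
            counts[code] = counts.get(code, 0) + 1
        else:
            outside.append(item_id)
    return {
        "counts": counts,                         # items per intended ability
        "missing": sorted(domain - set(counts)),  # intended abilities with no items
        "outside_domain": sorted(outside),        # items tagged to other abilities
    }


if __name__ == "__main__":
    gv_domain = {"SR", "Vz", "MV"}
    tags = {"item01": "Vz", "item02": "Vz", "item03": "SR", "item04": "Gf"}
    print(coverage_report(gv_domain, tags))
    # {'counts': {'Vz': 2, 'SR': 1}, 'missing': ['MV'], 'outside_domain': ['item04']}
```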
    40. Content validity evidence: One example
    41. Content validity evidence: One example (cont., for all tests in battery)
        Etc.
    42. Content validity evidence: Another example in the domain of reading: Logical-theoretical skill hierarchy task analysis model
    43. End of Part A
        Additional steps in the test development process will be presented in subsequent modules as they are developed.
