Applied Psych Test Design: Part A--Planning, development frameworks & domain/test specification blueprints

    1. The Art and Science of Test Development, Part A: Planning, development frameworks & domain/test specification blueprints. Kevin S. McGrew, PhD, Educational Psychologist, Research Director, Woodcock-Muñoz Foundation. The basic structure and content of this presentation is grounded extensively on the test development procedures developed by Dr. Richard Woodcock.
    2. The Art and Science of Test Development
       The above titled topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence.
       Part A: Planning, development frameworks & domain/test specification blueprints
       Part B: Test and Item Development
       Part C: Use of Rasch Technology
       Part D: Develop norm (standardization) plan
       Part E: Calculate norms and derived scores
       Part F: Psychometric/technical and statistical analysis: Internal
       Part G: Psychometric/technical and statistical analysis: External
       The current module is designated by red bold font lettering
    3. “In an ever-changing world, psychological testing remains the flagship of applied psychology” Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8 (4), 341-349.
    4. Desirable Personality Traits of Test Developers • Obsessive-compulsive • Intellectually inquisitive • Masochistic • 1/99 % I/P ratio • Sadistic • Tough-skinned
    5. Desirable Personality Traits of Test Developers • Willingness to take risks and giant leaps of faith
    6. The approach to test development used: Item Response Theory (IRT). Classical Test Theory (CTT: X = T + E, observed score = true score + error), and IRT vs CTT comparisons, are not covered in this presentation
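The classical identity quoted on this slide (observed score = true score + error) can be illustrated with a small simulation. This is only a sketch: the function name, score scale, standard deviations, and sample size are assumptions chosen for illustration and do not come from the presentation.

```python
import random
import statistics


def simulate_ctt(n_examinees=5000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate X = T + E and return (true, error, observed) score lists."""
    rng = random.Random(seed)
    # Assumed score scale: true scores centred on 100 (illustrative only).
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n_examinees)]
    # Errors are drawn independently of true scores, with mean zero.
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


true_scores, errors, observed = simulate_ctt()
# Under CTT, reliability is the share of observed variance due to true scores.
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(f"estimated reliability: {reliability:.2f}")
```

With the assumed standard deviations, the theoretical reliability is 10² / (10² + 5²) = 0.80, and the simulated estimate should land close to that value.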
    7. The bible of test development: The “Joint Standards”
    8. Test development is a complex series of interconnected steps • The reality of the complexity of test development is not fully appreciated by most test users • The following complex flow-charts are intended to illustrate the magnitude of the overall project complexity • This presentation will focus on the more general, broad stroke test development framework • The process is much more non-linear than depicted by flow charts and presentations
    9. “Generic” Woodcock test development flowchart
    10. Test/Battery Development: Practical “Broad Stroke” Framework (Woodcock)
    11. A detailed description for a test, often called a test blueprint, that specifies: • The number or proportion of items that assess each content and process/skill area • The format of items, response, and scoring rubrics and procedures, and • The desired psychometric properties of the items and test such as the distribution of item difficulty and discrimination abilities
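One way to picture such a blueprint is as a small data structure that records each area's share of items and item format, then converts the shares into item counts. The sketch below is hypothetical: the class and function names, the proportions, the item formats, and the 40-item total are all assumptions; only the narrow ability names come from later slides.

```python
from dataclasses import dataclass


@dataclass
class DomainSpec:
    """One content or process/skill area in a hypothetical blueprint."""
    name: str
    proportion: float   # share of items assigned to this area
    item_format: str    # e.g. "multiple choice", "recognition"


def allocate_items(specs, total_items):
    """Turn blueprint proportions into whole item counts (largest-remainder rounding)."""
    if abs(sum(s.proportion for s in specs) - 1.0) > 1e-9:
        raise ValueError("blueprint proportions must sum to 1")
    raw = [s.proportion * total_items for s in specs]
    counts = [int(r) for r in raw]
    # Hand any leftover items to the areas with the largest fractional parts.
    leftovers = total_items - sum(counts)
    order = sorted(range(len(specs)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    return {s.name: c for s, c in zip(specs, counts)}


# Illustrative proportions only; a real blueprint would justify each value.
blueprint = [
    DomainSpec("Visualization", 0.5, "multiple choice"),
    DomainSpec("Spatial Relations", 0.3, "multiple choice"),
    DomainSpec("Visual Memory", 0.2, "recognition"),
]
item_counts = allocate_items(blueprint, total_items=40)
print(item_counts)
```

A fuller version would also carry the desired psychometric properties named on the slide, such as a target distribution of item difficulty, as extra fields on each area.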
    12. Test/Battery Development: Common Conceptual Psychometric Validity Framework (Benson, 1998 summary)
    13. Substantive Stage of Test Development
        Purpose: Define the theoretical and empirical/measurement domains of interest (e.g., intelligence or cognitive abilities – cognitive + achievement)
        Questions asked: How should intelligence be defined and operationally measured?
        Method and concepts: • Theory development & validation • Generate definitions • Item and scale development • Content validation • Evaluate construct underrepresentation and construct irrelevancy
        Characteristics of strong test validity program: • A strong psychological theory plays a prominent role • Theory provides a well-specified and bounded domain of constructs • The empirical domain includes measures of all potential constructs (i.e., adequate construct representation) • The empirical domain includes measures that only contain reliable variance related to the theoretical constructs (i.e., construct relevance)
    14. Structural (Internal) Stage of Test Development
        Purpose: Examine the internal relations among the measures used to operationalize the theoretical construct domain (i.e., intelligence or cognitive abilities)
        Questions asked: Do the observed measures “behave” in a manner consistent with the theoretical domain definition of intelligence?
        Method and concepts: • Internal domain studies • Item/subscale intercorrelations • Exploratory/confirmatory factor analysis • Item response theory (IRT) • Multitrait-Multimethod matrix • Generalizability theory
        Characteristics of strong test validity program: • Moderate item internal consistency • Measures co-vary in a manner consistent with the intended theoretical structure • Factors reflect trait rather than method variance • Items/measures are representative of the empirical domain • Items fit the theoretical structure • The theoretical/empirical model is deemed plausible (especially when compared against other competing models) based on substantive and statistical criteria
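Item internal consistency, listed on this slide, is commonly summarized with Cronbach's alpha. The sketch below is one illustrative way to compute it; the slide does not name a specific coefficient, so the choice of alpha, the function name, and the tiny made-up response matrix are all assumptions.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of examinees and columns of items."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item (column) across examinees.
    item_vars = [statistics.variance(col) for col in zip(*item_scores)]
    # Sample variance of each examinee's total score.
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Made-up responses: 5 examinees x 4 items, scored 0/1.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

For this toy matrix the item variances sum to 1.0 and the total-score variance is 2.5, so alpha works out to (4/3) × (1 − 1/2.5) = 0.8. Real analyses would use far more examinees and would sit alongside the factor-analytic and IRT methods the slide lists.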
    15. External Stage of Test Development
        Purpose: Examine the external relations among the focal construct (i.e., intelligence or cognitive abilities) and other constructs and/or subject characteristics
        Questions asked: Do the focal constructs and observed measures “fit” within a network of expected construct relations (i.e., the nomological network)?
        Method and concepts: • Group differentiation • Structural equation modeling • Correlation of observed measures with other measures • Multitrait-Multimethod matrix
        Characteristics of strong test validity program: • Focal constructs vary in theorized ways with other constructs • Measures of the constructs differentiate existing groups that are known to differ on the constructs • Measures of focal constructs correlate with other validated measures of the same constructs • Theory-based hypotheses are supported, particularly when compared to rival hypotheses
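The "correlation of observed measures with other measures" method can be sketched as a comparison of two Pearson correlations: one with a validated measure of the same construct, one with a measure of a theoretically distinct construct. The scores below are made up, and the function and variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up scores for six examinees (illustration only).
new_test = [12, 15, 9, 20, 17, 11]
same_construct = [30, 34, 25, 41, 37, 29]   # validated measure of the same construct
other_construct = [5, 9, 7, 6, 8, 4]        # measure of a theoretically distinct construct

convergent_r = pearson_r(new_test, same_construct)
discriminant_r = pearson_r(new_test, other_construct)
print(f"same construct: r = {convergent_r:.2f}")
print(f"other construct: r = {discriminant_r:.2f}")
```

Under the slide's criteria, the pattern one hopes to see is a clearly higher correlation with the same-construct measure than with the distinct one. With only six invented examinees, the numbers here show the mechanics and nothing more.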
    16. Practical “Broad Stroke” Framework: Typical Questions to Ask (Woodcock)
        What is the intended purpose?
        Who are the potential users?
        Who are the intended examinees?
        What domain(s) of behavior are to be measured and in what proportion? • Content/substantive validity • Maximize construct representation • Minimize construct irrelevant variance
        What type, or types, of items are to be used?
        How is the test to be scored? • By hand, machine, computer • Scoring rubrics/guides • Correction for guessing
        What types of derived scores will be provided?
    17. Practical “Broad Stroke” Framework: Typical Questions to Ask (Woodcock)
        How are the scores to be interpreted? • Types of profiles to provide
        What physical materials are needed and how should they appear? • Test books • Test records • Manipulatives • Audio tapes/CDs • Computer disks • Scoring keys • Manuals • Training materials • etc.
    18. This presentation is an integration of the practical and psychometric test/battery frameworks: Practical “Broad Stroke” Framework + Common Conceptual Psychometric Validity Framework
    19. Substantive Stage; Structural (Internal) & External Stages
    20. Examples used in this presentation come from the domain of intelligence or cognitive abilities (cognitive + achievement). Based on the presenter's experience as a coauthor of the Woodcock-Johnson Battery, Third Edition (WJ III; 2001)
    21. Typically there are two types of test specification blueprints • Well defined a priori (typically theory-based) blueprints • Less well-defined (emerging) data-driven (empirical) blueprints
    22. Possible theory-based intelligence model test design blueprints (select examples) Gardner MI theory Das-Naglieri PASS Theory Cattell-Horn-Carroll (CHC) theory
    23. Possible emerging, empirical, or pragmatic intelligence model test design blueprints (select examples) Original Wechsler Verbal/Nonverbal model 1977 WJ Pragmatic Decision-Making model
    24. Substantive Stage of Test Development
        Purpose: Define the theoretical and empirical/measurement domains of interest (e.g., intelligence or cognitive abilities – cognitive + achievement)
        Questions asked: How should intelligence be defined and operationally measured?
        Method and concepts: • Theory development & validation
        Characteristics of strong test validity program: • A strong psychological theory plays a prominent role • Theory provides a well-specified and bounded domain of constructs
    25. • Psychometric approach: is the dominant approach, has inspired the most research, is used most widely in practical settings (p. 77). • Several theorists argue that there are many different “intelligences” (systems of abilities), only a few of which can be captured by standard psychometric tests (p. 78)
    26. CHC Theory Defined • Combination of research by Raymond Cattell, John Horn, and John Carroll • The most empirically-supported, psychometric-based, contemporary description of the structure of human cognitive abilities • Based on the analyses of hundreds of data sets that were not restricted to a particular test battery • The theory describes cognitive abilities as a function of degree of breadth/generality – Broad and narrow cognitive abilities
    27. Comparison: Carroll and Cattell-Horn Model
        Carroll: g; Fluid Intelligence (Gf); Crystallized Intelligence (Gc); Gen. Memory & Learning (Gy); Broad Visual Perception (Gv); Broad Auditory Perception (Gu); Broad Retrieval Ability (Gr); Broad Cognitive Speediness (Gs); Dec/Reaction Time/Speed (Gt)
        Cattell-Horn: Fluid Intelligence (Gf); Quantitative Knowledge (Gq); Crystallized Intelligence (Gc); Short-Term Memory (Gsm); Visual Processing (Gv); Auditory Processing (Ga); Long-Term Retrieval (Glr); Processing Speed (Gs); Correct Decision Speed (CDS); Reading/Writing (Grw)
    28. ...most disciplines have a common set of terms and definitions (i.e., a standard nomenclature) that facilitates communication among professionals and guards against misinterpretations. In chemistry, this standard nomenclature is reflected in the ‘Table of Periodic Elements’. Carroll (1993a) has provided an analogous table for intelligence….. (Flanagan & McGrew, 1998)
    29. The verdict is unanimous re: the importance of Carroll’s (1993) work
        Richard Snow (1993): “John Carroll has done a magnificent thing. He has reviewed and reanalyzed the world’s literature on individual differences in cognitive abilities…no one else could have done it… it defines the taxonomy of cognitive differential psychology for many years to come.”
        Burns (1994): Carroll’s book “is simply the finest work of research and scholarship I have read and is destined to be the classic study and reference work on human abilities for decades to come” (p. 35).
        John Horn (1998): A “tour de force summary and integration” that is the “definitive foundation for current theory” (p. 58). Horn compared Carroll’s summary to “Mendelyev’s first presentation of a periodic table of elements in chemistry” (p. 58).
        Arthur Jensen (2004): “…on my first reading this tome, in 1993, I was reminded of the conductor Hans von Bülow’s exclamation on first reading the full orchestral score of Wagner’s Die Meistersinger, ‘It’s impossible, but there it is!’” “Carroll’s magnum opus thus distills and synthesizes the results of a century of factor analyses of mental tests. It is virtually the grand finale of the era of psychometric description and taxonomy of human cognitive abilities. It is unlikely that his monumental feat will ever be attempted again by anyone, or that it could be much improved on. It will long be the key reference point and a solid foundation for the explanatory era of differential psychology that we now see burgeoning in genetics and the brain sciences” (p. 5).
    30. Carroll and Cattell-Horn Broad Ability Correspondence (vertically-aligned ovals represent similar broad domains)
        A. Carroll Three-Stratum Model: Stratum III (general): g. Stratum II (broad): Gf Gc Gy Gv Gu Gr Gs Gt
        B. Cattell-Horn Extended Gf-Gc Model: SAR TSR Gf Gc Gv Ga Gs CDS Grw Gq Gsm Glm
        C. Cattell-Horn-Carroll (CHC) Integrated Model: g; Stratum II (broad): Gf Gc Gsm Gv Ga Glr Gs Gt Grw Gq (Missing g-to-broad ability arrows acknowledges that Carroll and Cattell-Horn disagreed on the validity of the general factor)
        D. Tentatively identified Stratum II (broad) domains (1): Gkn Gh Gk Go Gp Gps
        Notes. Broad ability factor codes based on Carroll (1993) and Horn and Blankson (2005). See Table 1 for additional explanation. 80+ Stratum I (narrow) abilities have been identified under the Stratum II broad abilities. They are not listed here due to space limitations (see Table 1). Placement of g to the left-side of the Carroll Three-Stratum Model (A) is consistent with Carroll's (1993) published figures, a placement reflecting his finding that the broad abilities towards the left (e.g., Gf, Gc) had the highest loadings on the g-factor. The placement of the Stratum II Grw and Gq factors in the Cattell-Horn Extended Gf-Gc Model (B) is not consistent with this g-broad ability representation as Grw and Gq (broad) typically demonstrate high g-loadings. Grw and Gq are placed to the right in B to reflect their absence in model A.
        CHC Broad (Stratum II) Ability Domains (see Table 1 for definitions): Gf Fluid reasoning; Gc Comprehension-knowledge; Gsm Short-term memory; Gv Visual processing; Ga Auditory processing; Glr Long-term storage and retrieval; Gs Processing speed; Gt Decision and reaction speed; Grw Reading and writing; Gq Quantitative knowledge; Gkn General (domain-specific) knowledge; Gh Tactile abilities; Gk Kinesthetic abilities; Go Olfactory abilities; Gp Psychomotor abilities; Gps Psychomotor speed
        (1) See McGrew (2004, 2005) for literature review supporting these domains
        © Institute for Applied Psychometrics, LLC Kevin S. McGrew 7-22-08
    31. Substantive Stage of Test Development: Develop Test Design and Specification Blueprint. Theoretical Domain Specification [figure: g above the broad abilities Gf, Gv, Glr, Gs, Gc, Gsm, Ga; cylinders = broad CHC abilities, circles = narrow CHC abilities] • What is the theoretical domain? • How should intelligence be defined? • What intelligence theory has the best validity evidence? Answer: Cattell-Horn-Carroll (CHC) theory of cognitive abilities
    32. Substantive Stage of Test Development: Develop Test Design and Specification Blueprint. Theoretical Domain = Cattell-Horn-Carroll (CHC) theory of cognitive abilities [figure: g above the broad abilities Gf, Gv, Glr, Gs, Gc, Gsm, Ga; cylinders = broad CHC abilities, circles = narrow CHC abilities] What broad and narrow ability domain(s) are to be measured and in what proportion? • Answer relates to questions regarding intended purpose of battery, intended examinees, and intended users. • How do we assure adequate construct representation? How do we define the broad and narrow ability constructs? • Content validity important
    33. Substantive Stage of Test Development: Develop Test Design and Specification Blueprint. Theoretical Domain = Cattell-Horn-Carroll (CHC) theory of cognitive abilities [figure: g above the broad abilities Gf, Gv, Glr, Gs, Gc, Gsm, Ga; cylinders = broad CHC abilities, circles = narrow CHC abilities] Example domain to be used for illustration of process: Gv (Visual Processing)
    34. What narrow Gv ability domain(s) are to be measured and in what proportion? • Answer relates to questions regarding intended purpose of battery, intended examinees, and intended users. • How do we assure adequate construct representation?
    35. Substantive Stage of Test Development
        Purpose: Define the theoretical and empirical/measurement domains of interest (e.g., intelligence or cognitive abilities – cognitive + achievement)
        Questions asked: How should intelligence be defined and operationally measured?
        Method and concepts: • Generate definitions
        Characteristics of strong test validity program: • The empirical domain includes measures of all potential constructs (i.e., adequate construct representation)
    36. Definition of broad Gv (Visual Processing) • Ability to perceive, analyze, synthesize and think with visual patterns • Ability to store and recall visual representations • Fluent thinking with stimuli that are visual in the “mind’s eye” What narrow Gv ability domain(s) are to be measured and in what proportion? • Answer relates to questions regarding intended purpose of battery, intended examinees, and intended users. • How do we assure adequate construct representation?
    37. Narrow Gv ability definitions Spatial Relations (SR): Ability to rapidly perceive and manipulate relatively simple visual patterns or to maintain orientation with respect to objects in space. Visualization (Vz): The ability to apprehend a spatial form, object, or scene and match it with another spatial object, form, or scene with the requirement to rotate it (one or more times) in two or three dimensions. Requires the ability to mentally imagine, manipulate or transform objects or visual patterns (without regard to speed of responding). Visual Memory (MV): Ability to form and store a mental representation or image of a visual stimulus and then recognize or recall it later. We will focus on one: Visualization (Vz)
    38. Substantive Stage of Test Development
        Purpose: Define the theoretical and empirical/measurement domains of interest (e.g., intelligence or cognitive abilities – cognitive + achievement)
        Questions asked: How should intelligence be defined and operationally measured?
        Method and concepts: • Content validation
        Characteristics of strong test validity program
    39. Content validity evidence Knowledge and skills covered (sampled) by the test items should be representative of the larger population domain of knowledge and skills. Refers to logical or empirical analyses of the adequacy with which the test content represents the content domain and of the relevance of the content domain to the proposed interpretation of test scores (Joint Test Standards) This is a non-statistical type of validity that involves “the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured” (Anastasi & Urbina, 1997)
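Although content validity is described above as non-statistical, part of the "systematic examination" can be sketched as a simple bookkeeping check: compare how items are distributed across domains with the proportions a blueprint calls for. Everything below is hypothetical, including the function name, the tolerance, the target proportions, and the item classifications; the domain codes (Vz, SR, MV) are the narrow Gv abilities defined on slide 37.

```python
from collections import Counter


def coverage_gaps(item_domains, target_proportions, tolerance=0.05):
    """Return {domain: (actual, target)} where the item share misses the target by more than tolerance."""
    if not item_domains:
        raise ValueError("no items to check")
    counts = Counter(item_domains)
    total = len(item_domains)
    gaps = {}
    for domain, target in target_proportions.items():
        actual = counts.get(domain, 0) / total
        if abs(actual - target) > tolerance:
            gaps[domain] = (actual, target)
    return gaps


# Hypothetical expert classification of a 10-item draft test.
item_domains = ["Vz"] * 7 + ["SR"] * 3
# Hypothetical blueprint targets for the three narrow abilities.
targets = {"Vz": 0.5, "SR": 0.3, "MV": 0.2}
gaps = coverage_gaps(item_domains, targets)
for domain, (actual, target) in gaps.items():
    print(f"{domain}: {actual:.0%} of items vs. target {target:.0%}")
```

In this made-up draft, Visualization is overrepresented and Visual Memory has no items at all, which is the kind of construct underrepresentation a content review is meant to catch. The judgment about which domain each item belongs to still rests with expert reviewers; the code only tallies their classifications.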
    40. Content validity evidence: One example
    41. Content validity evidence: One example (cont. – for all tests in battery) Etc…….
    42. Content validity evidence: Another example in the domain of reading: Logical—theoretical skill hierarchy task analysis model
    43. End of Part A Additional steps in test development process will be presented in subsequent modules as they are developed