Applied Psych Test Design: Part E--Calculate norms and derived scores

The Art and Science of Applied Test Development. This is the fifth in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.

Published in: Business, Technology

  1. The Art and Science of Test Development, Part E: Calculate norms and derived scores. Kevin S. McGrew, PhD, Educational Psychologist, Research Director, Woodcock-Muñoz Foundation. The basic structure and content of this presentation are grounded extensively in the test development procedures developed by Dr. Richard Woodcock.
  2. The Art and Science of Test Development. This topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence. The current module is Part E (shown in bold red lettering on the slide).
     • Part A: Planning, development frameworks & domain/test specification blueprints
     • Part B: Test and Item Development
     • Part C: Use of Rasch Technology
     • Part D: Develop norm (standardization) plan
     • Part E: Calculate norms and derived scores
     • Part F: Psychometric/technical and statistical analysis: Internal
     • Part G: Psychometric/technical and statistical analysis: External
  3. Norm: A standard or range of values that represents the typical performance of a group or of an individual (of a certain age, for example) against which comparisons can be made.
  4. How do we construct age-based norms from standardization norm data? Answer: Curve fitting of sorted subsample data points is the engine that drives the development of all derived scores.
  5. Block Rotation summary: final Rasch for the publication test, with graphic item map. n = 37 norming items (0-74 RS points); n = 4,722 norm subjects. [Figure: graphic display of the distribution of Block Rotation person abilities on the publication-test W-score scale, 432 to 546.] These Block Rotation W-scores are then used for developing test “norms” and for validity research.
  6. WJ III “classic” norm calculation procedures (n = 8,000+ norm subjects). [Figure: balls numbered 1 to 8,000, each labeled with an age and a W-score; each ball represents an individual norm subject.]
     1. Sort the 8,000 subjects from youngest to oldest (CA in months).
     2. Divide the sorted pool of subjects into successively older blocks of n = 50.
     3. Calculate the “weighted” (US Census derived subject weights) median (average) CAMOS (X) and REF W (Y) for each block, giving Mdn CA x1, x2, ... and Mdn W y1, y2, ...
     4. Plot the median CAMOS (x1, x2, ...) against the median REF W (y1, y2, ...) and smooth the curve.
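Steps 1 to 3 of slide 6 can be sketched in a few lines of Python. This is a minimal illustration, not the WJ III software: the tuple layout `(age_months, w_score, weight)`, the function names, and the particular weighted-median rule (first value at which cumulative weight reaches half the total) are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical record layout: (age in months, W-score, census-derived weight).
Subject = Tuple[float, float, float]


def weighted_median(values: List[float], weights: List[float]) -> float:
    """Return the value at which cumulative weight first reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def block_medians(subjects: List[Subject], block_size: int = 50) -> List[Tuple[float, float]]:
    """Sort by age, cut into successive blocks, return (median age, median W) per block."""
    ordered = sorted(subjects, key=lambda s: s[0])
    points = []
    for start in range(0, len(ordered), block_size):
        block = ordered[start:start + block_size]
        ages = [s[0] for s in block]
        w_scores = [s[1] for s in block]
        weights = [s[2] for s in block]
        points.append((weighted_median(ages, weights), weighted_median(w_scores, weights)))
    return points
```

The returned list of (x, y) points is what step 4 would plot and smooth. The last block may hold fewer than `block_size` subjects; how the original procedure handled that remainder is not stated on the slide.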
  7. Example: Letter-Word ID Ref W (20-120 months), raw data points. [Figure: Ref W (y-axis, 250 to 550) plotted against age in months (x-axis, 20 to 120).] Each data point is a “sample” that contains “sampling error”; this accounts for the “bounce” between data points. How do we deal with this sampling error (bounce) to construct norms and derived scores?
  8. Letter-Word ID Ref W (20-120 months): polynomial curve generated solution (using special curve fitting software). [Figure: the same data, Ref W (y-axis, 250 to 550) against age in months (x-axis, 20 to 120), with the smoothed curve overlaid.] The smoothed curve represents the best approximation of the population average norm W-score for a test (Reference W or REF W).
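The smoothing step on slide 8 can be illustrated with an ordinary least-squares polynomial fit. The slides only say that special curve fitting software was used, so the sketch below swaps in a generic textbook method (normal equations solved by Gaussian elimination) and should not be read as the procedure actually used; `polyfit` and `polyval` are names chosen for this example.

```python
from typing import List


def polyfit(xs: List[float], ys: List[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations A c = b, where A[i][j] = sum of x^(i+j).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def polyval(coeffs: List[float], x: float) -> float:
    """Evaluate the fitted polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result
```

Fitting the block medians (x = median age, y = median W) and then calling `polyval` at any age gives a smoothed REF W. Normal equations become numerically unstable for high degrees or large x values, so ages in months would likely need rescaling before a fit of this kind.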
  9. Obtaining Developmental Scores (age/grade equivalents): A W-score of 450 (for the Letter-Word Identification test) = 2.4 grade equivalent; a W-score of 400 = 1.3 grade equivalent. Smoothed age curves are used in the same manner to obtain age equivalents.
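Reading a grade equivalent off the smoothed curve amounts to inverting it: find the grade at which REF W equals the obtained W-score. The sketch below does this by linear interpolation over a table of (grade, REF W) points. The table holds only the two pairs quoted on slide 9; the straight line between them is an assumption made for illustration, since the real smoothed curve is not linear.

```python
from typing import List, Tuple

# The two (grade, REF W) pairs quoted on slide 9 for Letter-Word Identification.
LETTER_WORD_CURVE: List[Tuple[float, float]] = [(1.3, 400.0), (2.4, 450.0)]


def grade_equivalent(w: float, curve: List[Tuple[float, float]]) -> float:
    """Invert a monotonically increasing (grade, REF W) table by linear interpolation."""
    if w <= curve[0][1]:
        return curve[0][0]  # clamp below the table
    for (g0, w0), (g1, w1) in zip(curve, curve[1:]):
        if w0 <= w <= w1:
            return g0 + (w - w0) / (w1 - w0) * (g1 - g0)
    return curve[-1][0]  # clamp above the table
```

A published table would have many more points, and out-of-range scores would be reported differently than the simple clamping used here.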
  10. Developing norms and derived scores: What does a tested person’s score on a test mean when compared to the appropriate reference group? (Age norms will be used as the example.) The meaning of a Block Rotation W-score of X (e.g., 477) will have different interpretations when compared to different age group norm subsamples (2, 3, 4, and 5 yr olds). Measures of relative standing (percentile rank, standard score) derive meaning based on how far away the person’s W-score is from average (for age). [Figure: Block Rotation W-scale running from 431.6 to 545.7, with 477 marked.]
  11. Obtaining Measures of Relative Standing: A subject's W-score for a specific measure is compared to the average W-score for that subject's specific age (age norms) or grade (grade norms). This average is called the Reference W (REF W). For example, the expected average REF W for someone tested at grade 3.0 (grade norms) is 472.5, so an obtained score of 472.5 would be SS = 100, PR = 50.
  12. Obtaining Scores of Relative Standing: A subject's obtained W-score for a specific measure is compared to the distribution (mean and SD) of W-scores for that subject's specific age (age norms) or grade (grade norms). The “mean” is the smoothed “Ref W” value for a specific age/grade. The “SD” is the smoothed SD (10/90) for a specific age/grade. SS (M = 100; SD = 15) = (z × 15) + 100; e.g., z = -1 gives SS = 85.
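The standard score formula on slide 12 can be written out directly. In this sketch the reading of “SD (10/90)” as two separate smoothed SDs, with SD10 applied to scores below REF W and SD90 to scores at or above it, is an assumption, as are the function names. The percentile rank uses a plain normal-curve conversion, which is only one of the transformation options the deck mentions.

```python
from statistics import NormalDist


def standard_score(w: float, ref_w: float, sd10: float, sd90: float) -> float:
    """SS = (z x 15) + 100, with z measured from REF W in SD units."""
    # Assumption: SD10 describes the lower half of the distribution, SD90 the upper half.
    sd = sd10 if w < ref_w else sd90
    z = (w - ref_w) / sd
    return z * 15 + 100


def percentile_rank(ss: float) -> float:
    """Percentile rank of a standard score under a normal curve (M = 100, SD = 15)."""
    return 100 * NormalDist(mu=100, sigma=15).cdf(ss)
```

With the slide 11 example (REF W = 472.5 at grade 3.0), an obtained W of 472.5 gives SS = 100 and PR = 50 whatever the SD values are. With a hypothetical SD10 of 12, a W of 460.5 gives z = -1 and SS = 85.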
  13. Custom software generated norm “setup” data file example (Block Rotation): X and Y columns used as input for graphing and polynomial curve fitting. Note: These examples are from original WJ III 2001 norms and not the subsequent WJ III NU (2007) norms.
  14. Original Block Rotation Reference W age-based curve fitting: A real-world example of the “art + science” of constructing norms. Solution A: up to 230 months (note: age scale is a log scale). [Figure: Block Rotation Ref-W (y-axis, 460 to 510) against age in months (x-axis, log scale, 12 to 1200). Plot header: r^2=0.12670607; Eqn 8160 Line(a,b), Robust None; 7667WLO, Eqn 7667 Chebyshev=>Std Rational Order 8/9.]
  15. Original Block Rotation Reference W age-based curve fitting: A real-world example of the “art + science” of constructing norms. Solution B: 231 to 1200 months (note: age scale is a regular interval scale, not a log scale). [Figure: Block Rotation Ref-W (y-axis, 460 to 510) against age in months (x-axis, 12 to 1068). Plot header: r^2=0.12670607; Eqn 8160 Line(a,b), Robust None; 6870WHI, Eqn 6870 Chebyshev=>Std Polynomial Order 20.]
  16. Original Block Rotation Reference W age-based curve fitting: A real-world example of the “art + science” of constructing norms. Curve Solution A is “feathered/blended” with Curve Solution B at 230 months to give a single final solution. Sometimes more than 2 curve parts are needed for age norms.
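One simple way to picture the “feathering” on slide 16 is a linear crossfade: below the join only Solution A is used, above it only Solution B, and inside a short window around 230 months the two are averaged with a sliding weight. The slides do not say how the blending was actually done, so the crossfade, the `half_width` parameter, and the function name are assumptions for illustration.

```python
from typing import Callable

Curve = Callable[[float], float]


def blend(curve_a: Curve, curve_b: Curve, join: float = 230.0, half_width: float = 12.0) -> Curve:
    """Return one curve that crossfades from curve_a to curve_b around the join age."""
    lo, hi = join - half_width, join + half_width

    def blended(age: float) -> float:
        if age <= lo:
            return curve_a(age)
        if age >= hi:
            return curve_b(age)
        t = (age - lo) / (hi - lo)  # 0 at the start of the window, 1 at the end
        return (1 - t) * curve_a(age) + t * curve_b(age)

    return blended
```

The same helper could be chained when more than two curve parts are needed, by blending the result with a third curve at a later join age.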
  17. Final smoothed curves serve as the mechanism for the published norms, either in the form of equations in software or, as on the next slide, tables of values in test manuals.
  18. Tables of values for published norms in test manuals. [Table: columns Age (in months) and Reference W; the values are not captured in this transcript.]
  19. Custom software generated norm “setup” data file example (Block Rotation): X and Y columns used as input for graphing and polynomial curve fitting.
  20. Original Block Rotation SD90 age-based curve fitting: A real-world example of the “art + science” of constructing norms. Fitted equation (Rank 2502, Eqn 7938): y=(a+cx^(0.5)+ex+gx^(1.5)+ix^2)/(1+bx^(0.5)+dx+fx^(1.5)+hx^2+jx^(2.5)) [NL]. Fit statistics: r^2=0.48235094, DF Adj r^2=0.41998358, FitStdErr=1.6978814, Fstat=8.6968999. Coefficients: a=15.791894, b=0.66619087, c=1.7270779, d=-0.2462822, e=-1.0287721, f=0.02543265, g=0.082451267, h=-0.00095281528, i=-0.0010522608, j=1.5044367e-05. [Figure: Block Rotation SD90 (y-axis, 1 to 15) against age in months (x-axis, log scale, 12 to 1200).] The same is done for SD10.
  21. Original Block Rotation SD90 age-based curve fitting: A real-world example of the “art + science” of constructing norms. This is the same fit as on slide 20 (Rank 2502, Eqn 7938, identical equation, fit statistics, and coefficients), shown on a regular interval age scale. [Figure: Block Rotation SD90 (y-axis, 1 to 15) against age in months (x-axis, 12 to 1068).] The same is done for SD10.
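The fitted SD90 equation on slides 20 and 21 (Eqn 7938) can be evaluated directly, which is roughly what “equations in software” means in practice. The sketch below copies the coefficients from the slide. It assumes that x is age in months, which the slide does not state explicitly, and the function name is made up for the example.

```python
# Coefficients copied from the slide's fit report (Eqn 7938).
A, B, C, D, E = 15.791894, 0.66619087, 1.7270779, -0.2462822, -1.0287721
F, G, H, I, J = 0.02543265, 0.082451267, -0.00095281528, -0.0010522608, 1.5044367e-05


def sd90_curve(x: float) -> float:
    """Evaluate the rational function from the slide; x is assumed to be age in months."""
    root = x ** 0.5
    numerator = A + C * root + E * x + G * x ** 1.5 + I * x ** 2
    denominator = 1 + B * root + D * x + F * x ** 1.5 + H * x ** 2 + J * x ** 2.5
    return numerator / denominator
```

The function is only evaluated here, not validated against the published norms. A rational function can misbehave wherever its denominator approaches zero, so any real use would need checks across the full age range.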
  22. Final smoothed curves serve as the mechanism for the published norms, either in the form of equations in software or, as on the next slide, tables of values in test manuals.
  23. Tables of values for published norms in test manuals. [Table: columns Age (in months) and SD (in W units); the values are not captured in this transcript.]
  24. Obtaining Scores of Relative Standing: A subject's W-score for a specific measure is then compared to the distribution of W-scores for that subject's specific age (age norms) or grade (grade norms). [Figure: three curves, labeled Smoothed SD90, Smoothed REF W (average), and Smoothed SD10.] Note: These are NOT the curves for Block Rotation. They are from another measure and are used here as an example.
  25. • More is better. The larger the sample, the smaller the sampling error associated with computed scores.
      • When calculating norm curves, use medians for each age (or grade) block, not means.
      • Special test(s)-cluster consistency checks and procedures need to be used to prevent test(s)-cluster score anomalies.
      • Apply the proposed norms for each measure to the actual norm data as a quality control procedure.
      • If concurrent validity data are available (correlations with other published and respected measures of similar abilities/constructs), it may be wise to apply the proposed norms to your tests and then compare the respective sets of derived scores to the external measures via correlations and descriptive statistics (means and SDs). This may be particularly informative if you begin to question the variability in your norm sample data (too restricted or too variable). You are using other established test batteries as crude “benchmarks”.
  26. • Use of bootstrap re-sampling methods in curve fitting
      • Special proprietary iterative curve fitting Q/A procedures for selecting the best possible curve from a pool of plausible curves
      • Different subject weighting procedures
      • Calculating other measure norms: cluster norms (combinations of tests); differentially weighted cluster norms (e.g., WJ III GIA cluster); discrepancy norms
      • Special test-cluster consistency checks and procedures
      • Creating special Rasch (W-score) based interpretative scoring options and features (e.g., RPI, instructional ranges), explained in a separate PPT module
      • Special test-length correction procedures for calculation of reliabilities and correlations
      • Linear vs area (normalization) transformation of scores; Woodcock combined approach
  27. With publication of WJ III NU norms, we now use bootstrap generated “sticks” and not raw single data points.
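The idea behind bootstrap “sticks” can be sketched as follows: instead of plotting one median per block, resample the block with replacement many times, take the median each time, and plot the range those medians cover. Reading a “stick” as a percentile range of bootstrap medians is an assumption made for this example; the slides do not define the term, and the function name and default settings are hypothetical.

```python
import random
from statistics import median
from typing import List, Tuple


def bootstrap_median_stick(
    values: List[float],
    n_boot: int = 1000,
    seed: int = 0,
    lower: float = 0.05,
    upper: float = 0.95,
) -> Tuple[float, float]:
    """Return the (lower, upper) percentile range of bootstrap medians for one block."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    medians = sorted(
        median(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lo_idx = int(lower * (n_boot - 1))
    hi_idx = int(upper * (n_boot - 1))
    return medians[lo_idx], medians[hi_idx]
```

A curve fitted through such ranges reflects how uncertain each block median is, rather than treating every raw point as equally trustworthy. This unweighted version ignores the census-derived subject weights used elsewhere in the deck.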
  28. WJ III NU bootstrapping: If you really want to know, check out ASB9.
  29. WJ III NU bootstrapping: If you really want to know, check out ASB9.
  30. WJ III NU bootstrapping: If you really want to know, check out ASB9.
  31. End of Part E. Additional steps in the test development process will be presented in subsequent modules as they are developed.
