The Art and Science of Test Development—Part F

Psychometric/technical statistical analysis: Internal


Applied Psych Test Design: Part F--Psychometric/technical statistical analysis:  Internal
The Art and Science of Applied Test Development. This is the fifth in a series of PPT modules explicating the development of psychological tests in the domain of cognitive ability using contemporary methods (e.g., theory-driven test specification; IRT-Rasch scaling; etc.). The presentations are intended to be conceptual and not statistical in nature. Feedback is appreciated.

Published in: Technology, Business

Applied Psych Test Design: Part F--Psychometric/technical statistical analysis: Internal

  1. The Art and Science of Test Development, Part F
     Psychometric/technical statistical analysis: Internal
     Kevin S. McGrew, PhD, Educational Psychologist, Research Director, Woodcock-Muñoz Foundation
     The basic structure and content of this presentation is grounded extensively on the test development procedures developed by Dr. Richard Woodcock.
  2. The Art and Science of Test Development
     The above-titled topic is presented in a series of sequential PowerPoint modules. It is strongly recommended that the modules (A-G) be viewed in sequence.
     Part A: Planning, development frameworks & domain/test specification blueprints
     Part B: Test and Item Development
     Part C: Use of Rasch Technology
     Part D: Develop norm (standardization) plan
     Part E: Calculate norms and derived scores
     Part F: Psychometric/technical and statistical analysis: Internal
     Part G: Psychometric/technical and statistical analysis: External
     The current module (Part F) is designated by red bold font lettering in the original slides.
  3. “In god we trust….all others must show data” (unknown source)
     Test authors and publishers have a standards-based responsibility to provide supporting psychometric technical information regarding tests and the battery, typically in the form of a series of technical chapters in the manual or a separate technical manual.
  4. Calculate psychometric/measurement statistics for technical manual/chapters
     Use Joint Test Standards as a guide
  5. Theoretical Domain - CHC: g; Gf, Gv, Glr, Gs, Gc, Gsm, Ga
     Internal evidence is focused on relations between and among variables (measures or latent constructs) within the designed battery
     Measurement or empirical domain
  6. Calculate summary statistics (n, means, SDs, SEM) and reliabilities for all tests and clusters by technical age groups, etc.
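The summary statistics named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for any published battery: the function name `summary_stats`, the scores, and the reliability value are all hypothetical, and the SEM uses the classical formula SEM = SD × sqrt(1 − reliability).

```python
import math
import statistics


def summary_stats(scores, reliability):
    """Return n, mean, SD, and SEM for one test within one age group."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1 denominator)
    # Classical test theory: SEM = SD * sqrt(1 - reliability)
    sem = sd * math.sqrt(1.0 - reliability)
    return {"n": n, "mean": mean, "sd": sd, "sem": sem}


# Hypothetical standard scores for one technical age group.
# The reliability of .91 is made up for illustration.
scores = [85, 92, 100, 104, 110, 97, 115, 101]
stats = summary_stats(scores, reliability=0.91)
print(stats)
```

In practice the same function would be run once per test or cluster and per technical age group, with the reliability estimated from the data rather than supplied by hand.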
  7. Special reliability analyses required for speeded tests
     Traditional test-retest reliability analysis
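A traditional test-retest coefficient is the Pearson correlation between scores from two administrations of the same test. The sketch below assumes paired scores for the same examinees; the data and the name `pearson_r` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least 2 scores each")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical raw scores on a speeded test, same six examinees, two occasions.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 12]
retest_r = pearson_r(time1, time2)
print(round(retest_r, 3))
```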
  8. Special reliability analyses for all tests
     More complex repeated measures reliability analysis (McArdle and Woodcock, 1989; see WJ-R Technical Manual)
  9. Provide evidence based on internal structure (internal validity)
  10. Structural (Internal) Stage of Test Development
      Purpose: Examine the internal relations among the measures used to operationalize the theoretical construct domain (i.e., intelligence or cognitive abilities)
      Questions asked: Do the observed measures “behave” in a manner consistent with the theoretical domain definition of intelligence?
      Method and concepts:
        • Internal domain studies
        • Item/subscale intercorrelations
        • Exploratory/confirmatory factor analysis
      Characteristics of strong test validity program:
        • Measures co-vary in a manner consistent with the intended theoretical structure
        • Factors reflect trait rather than method variance
        • Items/measures are representative of the empirical domain
  11. Structural/internal validity evidence: Test and cluster inter-correlation matrices by technical age groups, etc.
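An inter-correlation matrix for one age group can be sketched as follows. The test names and scores are hypothetical placeholders, and the helper names are illustrative only; a real analysis would repeat this for every technical age group.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data):
    """Map each pair of test names to the correlation of their scores."""
    names = list(data)
    return {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}


# Hypothetical scores for six examinees in one age group.
age_group_scores = {
    "test_a": [10, 12, 9, 15, 14, 11],
    "test_b": [20, 23, 19, 27, 25, 22],
    "test_c": [5, 9, 4, 6, 8, 7],
}
matrix = correlation_matrix(age_group_scores)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in matrix])
```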
  12. Structural/internal validity
      Confirmatory factor analysis by major age groups (exploratory factor analysis if not theory-driven test blueprint)
  13. Structural/internal validity
      Confirmatory factor analysis by major age groups (exploratory factor analysis if not theory-driven test blueprint)
      (figure: factor model diagram; legible coefficients: .53, .67, .40, .42, .43)
  14. Structural (Internal) Stage of Test Development
      Purpose: Examine the internal relations among the measures used to operationalize the theoretical construct domain (i.e., intelligence or cognitive abilities)
      Questions asked: Do the observed measures “behave” in a manner consistent with the theoretical domain definition of intelligence?
      Method and concepts:
        • Exploratory/confirmatory factor analysis
      Characteristics of strong test validity program:
        • The theoretical/empirical model is deemed plausible (especially when compared against other competing models) based on substantive and statistical criteria
  15. Structural/internal validity: Confirmatory factor analysis model comparisons by major age groups
      The WJ III factor structure model provided the best fit to the data when compared to six alternative models.

      Fit statistics
      Model                               Chi-square   df     AIC         RMSEA
      WJ III CHC 7-factor                 13189.16     536    13377.16    0.056 (0.055-0.057)
      Gc/Gsm/Gs/Gv+Gf (WAIS 4-factor)     15113.99     537    15301.00    0.060 (0.059-0.061)
      Gc/Gsm/Gq/Gv+Gf (SB IV 4-factor)    20379.58     537    20565.58    0.070 (0.069-0.071)
      Gf-Gc Dichotomous (KAIT)            23145.12     549    23307.12    0.074 (0.073-0.075)
      PASS 4-factor *                     25198.46     542    25374.46    0.077 (0.078-0.079)
      g single factor                     65314.78     1170   65524.78    0.086 (0.085-0.086)
      Null model                          215827.54    1219   215939.54   0.153 (0.153-0.154)

      The conclusion was the same across 5 age-differentiated samples.
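The model comparison on this slide amounts to ranking competing models on fit indices, where lower AIC and lower RMSEA indicate better fit. The sketch below copies the chi-square, df, and AIC values reported on the slide and ranks the models by AIC. The `rmsea` helper uses one common point-estimate formula, sqrt(max(chi2 − df, 0) / (df × (N − 1))); the sample size is not given on the slide, so the call to `rmsea` uses clearly hypothetical toy values rather than the slide's data.

```python
import math

# (chi-square, df, AIC) as reported on the slide.
models = {
    "WJ III CHC 7-factor": (13189.16, 536, 13377.16),
    "Gc/Gsm/Gs/Gv+Gf (WAIS 4-factor)": (15113.99, 537, 15301.00),
    "Gc/Gsm/Gq/Gv+Gf (SB IV 4-factor)": (20379.58, 537, 20565.58),
    "Gf-Gc Dichotomous (KAIT)": (23145.12, 549, 23307.12),
    "PASS 4-factor": (25198.46, 542, 25374.46),
    "g single factor": (65314.78, 1170, 65524.78),
    "Null model": (215827.54, 1219, 215939.54),
}


def rmsea(chi_square, df, n):
    """RMSEA point estimate (one common formulation, assumed here)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Rank models from best to worst by AIC (lower is better).
ranked = sorted(models, key=lambda name: models[name][2])
best = ranked[0]
print("Best-fitting model by AIC:", best)

# Toy illustration only: chi-square, df, and n below are hypothetical.
toy_rmsea = rmsea(150.0, 100, 501)
print(round(toy_rmsea, 4))
```

Statistical fit is only one criterion; as the preceding slide notes, a model should also be judged plausible on substantive grounds.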
  16. WJ III General Intellectual Ability (GIA) as a differentially weighted measure of g (general intelligence)
      Therefore need to provide internal validity evidence for test g-weights
      (figure: GIA (g) diagram; annotation: Tests at this end are weighted (“counted”) more in the GIA score)
  17. Internal validity evidence example: g-loadings for differentially weighted General Intellectual Ability cluster
  18. Provide evidence based on internal structure: Developmental evidence?
  19. Developmental evidence in the form of differential growth curves of measures
  20. Provide Test Fairness Evidence
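  21. Structural/internal validity
      Evaluating structural invariance with Multiple Group CFA
      Groups compared: White = Non-White (figure: factor model diagram for each group)
  22. Structural/internal validity
      Evaluating structural invariance with Multiple Group CFA
      Groups compared: Male = Female (figure: factor model diagram for each group)
  23. Structural/internal validity
      Evaluating structural invariance with Multiple Group CFA
      Groups compared: Hispanic = Non-Hispanic (figure: factor model diagram for each group)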
  24. Test fairness evidence: Item Level Analyses: Differential Item Functioning (DIF)
        • Male/Female
        • White/Non-White
        • Hispanic/Non-Hispanic
  25. Test fairness evidence: Item Level Analyses: Differential Item Functioning (DIF)
        • Male/Female
        • White/Non-White
        • Hispanic/Non-Hispanic
      Results combined with results from Bias Sensitivity Review Panels
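The slides do not say which DIF method was used. As one widely used option, the sketch below computes the Mantel-Haenszel common odds ratio for a single item, matching examinees on total score, and converts it to the ETS delta scale (−2.35 × ln of the odds ratio). The function name, group labels, and counts are all hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses):
    """Mantel-Haenszel DIF for one item.

    responses: iterable of (group, total_score, item_correct), where group
    is "ref" or "focal". Examinees are stratified by total_score.
    Returns (common odds ratio, ETS delta).
    """
    # Per stratum: [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, total, correct in responses:
        cell = (0 if correct else 1) + (0 if group == "ref" else 2)
        strata[total][cell] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these counts")
    alpha = num / den
    delta = -2.35 * math.log(alpha)  # negative values favor the reference group
    return alpha, delta


def expand(group, total, n_correct, n_incorrect):
    """Build hypothetical response records from cell counts."""
    return [(group, total, True)] * n_correct + [(group, total, False)] * n_incorrect


# Hypothetical item where the focal group does worse at every matched score.
with_dif = (expand("ref", 5, 30, 10) + expand("focal", 5, 20, 20)
            + expand("ref", 8, 35, 5) + expand("focal", 8, 30, 10))
alpha, delta = mantel_haenszel_dif(with_dif)
print(round(alpha, 3), round(delta, 3))
```

An odds ratio near 1 (delta near 0) suggests little DIF for that item. As slide 25 notes, statistical flags like this are combined with results from bias sensitivity review panels.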
  26. Lack of rigor and quality control in all prior/earlier stages will “rattle through the data” and rear its ugly head when performing the final statistical analysis
      Shortcuts in prior stages will “bite you in the ____” as you attempt to perform final statistical analysis
      Data screening, data screening, data screening!!!! Do this prior to performing final statistical analysis
        • Compute extensive descriptive statistical analysis for all variables (e.g., histograms, scatterplots, box-whisker plots, etc.)
        • More than means and SDs. Also calculate median, skew, kurtosis, n-tiles, etc.
      Deliberately planned and sophisticated “front end” data collection shortcuts (e.g., matrix sampling) introduce an extreme level of “back end” complexity to routine statistical/psychometric analysis
      Know your limits, level of expertise, and skills. Even those with extensive test development experience often need access to trusted measurement/statistical consultants
      (cont. next slide)
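The descriptive screening recommended on this slide (median, skew, kurtosis, n-tiles, in addition to means and SDs) can be sketched with the Python standard library. The data and the function name `screen` are hypothetical; skew and kurtosis use the simple moment-based (population) formulas, which is an assumption, since statistical packages differ in the corrections they apply.

```python
import statistics


def screen(values):
    """Descriptive screening statistics for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("variable is constant; skew and kurtosis undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "n": n,
        "mean": mean,
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "skew": m3 / m2 ** 1.5,                # moment-based skewness
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
        "quartiles": statistics.quantiles(values, n=4),
    }


# Hypothetical raw scores; the 25 stands in for a possible data-entry error.
raw = [3, 4, 4, 5, 5, 5, 6, 6, 7, 25]
report = screen(raw)
print(report)
```

Here the mean (7.0) sits well above the median (5), and the large positive skew points at the outlying value, which is the kind of problem that is easier to catch before the final analyses than after.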
  27. Don’t be seduced by, or completely reliant on, factor analysis as the primary internal/structural validity tool
        • An example: Inability of CFA to differentiate closely related latent constructs (e.g., Gc and Reading/Writing, or Grw) doesn’t prove they are the same. Need to examine other evidence (e.g., very different developmental growth curves for Gc and Grw)
      Published statistics/psychometric information needs to be based on final publication-length tests
        • Often need to use test-length correction formulas (e.g., KR-21) for test reliabilities
        • Correlations between short and/or long norming versions of a test, which differ in test length (number of items) from the publication-length test, may need special adjustments/corrections
      Back up, back up, back up!!!! Don’t let a dead hard drive or computer destroy your work and progress. Do it constantly. Build redundancy into your files and people skill sets
      Sad fact: The majority of test users do NOT pay attention to the fancy and special psychometric/statistical analyses you report in technical chapters or manuals. Be prepared for post-publication education via other methods.
      Post-manual publication technical reports of special/sophisticated analyses are good when publication timeline pressures dictate making difficult decisions.
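To illustrate a test-length correction, the sketch below uses the Spearman-Brown prophecy formula, r_new = k·r / (1 + (k − 1)·r), where k is the ratio of new to old test length. This is a swapped-in example: the slide names KR-21, and the slides do not show which correction was actually applied. Item counts and the reliability value are hypothetical.

```python
def spearman_brown(reliability, length_ratio):
    """Project reliability to a test length_ratio times as long."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_ratio <= 0:
        raise ValueError("length_ratio must be positive")
    return (length_ratio * reliability) / (1.0 + (length_ratio - 1.0) * reliability)


# Hypothetical case: a 60-item norming form is shortened to 40 items.
norming_items = 60
publication_items = 40
norming_reliability = 0.90

k = publication_items / norming_items
projected = spearman_brown(norming_reliability, k)
print(round(projected, 3))
```

The formula assumes the removed or added items are comparable to the rest, so a projected value is a planning estimate; it does not replace reliabilities computed on the publication-length test.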
  28. Exploratory-driven confirmatory factor analysis is often used by test developers to explore unexpected characteristics of tests (often called “model generation modeling” in SEM/CFA literature)
      Different approaches to DIF (differential item functioning)
      Multiple group CFA to test invariance (by age, by gender, by ...)
        • Different degrees of measurement invariance can be tested
      Traditional definition of psychometric bias and appropriate/inappropriate statistical methods
      Equating (e.g., Form A/B) methods and evidence
      Methods for calculating prediction models that account for regression to the mean and that are sensitive to developmental (age) X content interactions
      Complex repeated measures reliability analyses to tease out test stability, internal consistency, and trait stability sources of score variance (see WJ-R Technical Manual)
  29. End of Part F
      Additional steps in the test development process will be presented in subsequent modules as they are developed
