Published in:
Health & Medicine


- 1. How does health psychology measure up?<br />A critical look at measurement in health psychology<br />Matthew Hankins<br />16th September 2011<br />
- 2. The empirical basis of Health Psychology<br />Why do Health Psychologists collect data?<br />Theory generation, esp. identifying constructs<br />Theory corroboration<br />Measuring outcomes (trials etc.)<br />The value of such activities is therefore critically dependent on the quality of the data<br />
- 3. Questionnaire measures<br />Majority of data collected by Health Psychologists is generated by questionnaire measures (‘scales’)<br />Questionnaires vary in the quality of data that they generate<br />Validity: extent to which the questionnaire measures what is intended<br />Reliability: extent to which variance in data reflects variance in construct measured<br />Index of measurement error<br />
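
The reliability definition on this slide can be written compactly in classical test theory notation. The symbols below are assumptions introduced for illustration (they do not appear on the slides): an observed score $X$ is a true score $T$ plus an error $E$ that is uncorrelated with $T$.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Read this way, reliability is the share of observed-score variance that reflects variance in the construct, and $1 - \rho_{XX'}$ is the share attributable to measurement error.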
- 4. Pragmatic approach<br />Validity<br />Unidimensionality (factor analysis)<br />Associations between measures<br />Discrimination between known groups<br />Reliability<br />Estimated by Cronbach’s Alpha<br />Or test-retest correlation<br />
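
As a rough illustration of the reliability estimate named on this slide, the sketch below computes Cronbach's alpha from item and total-score variances. The function name and the tiny data set are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import pvariance

def cronbach_alpha(rows):
    """rows: list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Variance of each item across respondents
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    # Variance of the summed scale score
    total_var = pvariance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents x 3 items, each scored 0-3
data = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 2],
    [2, 3, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))
```

Alpha rises when items covary strongly relative to their individual variances. It says nothing about whether the summed score actually orders people on the construct, which is the point the later slides develop.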
- 5. Scale development<br />Combination of these approaches is derived from ‘Classical Test Theory’ (CTT)<br />Originated with Spearman (1904)<br />Landmark text: Guilford 2nd ed. (1954)<br />Fully developed by Lord & Novick (1968)<br />Further developments: ‘item-response theory’ (IRT)<br />E.g. Rasch model (1960)<br />CTT implicit in most empirical Health Psychology research<br />
- 6. CTT vs. IRT<br />Argument tends to be that IRT is superior to CTT<br />In particular, it is argued that IRT is ‘objective’ measurement<br />For large samples, differences more apparent than real:<br />Strong correlations between CTT data & IRT data<br />And differences tend to be smaller than the margin of error<br />If data treated as ordinal, perfect correlation between CTT & Rasch data<br />
- 7. What is a scale?<br />A scale orders people on the construct of interest<br />Both CTT & IRT agree that a person’s position on the dimension can be estimated from the item scores<br />Strength of IRT is that it does not assume that a set of correlated items forms a scale<br />Implicit in CTT: if items load on same factor, we automatically assume that they form a scale<br />[Figure: Persons A, B, C and D ordered from Low to High on the construct]<br />
- 8. Scaling problem<br />Whether a set of items forms a scale is a hypothesis (Guttman 1950)<br />Formally tested whether items formed ‘Guttman scales’<br />“In contemporary psychometric practice, it is the rule rather than the exception that two people having the same score on a test will have [endorsed] different items… Such scores are crude empirical devices known to have some predictive efficiency, but they cannot be called measurements in any strict sense” (Loevinger 1948)<br />Additionally, there is no rational basis for adding up a set of ordinal Likert scores unless they have been shown to scale<br />
- 9. Example: PHQ-9<br />Feeling tired + Little interest in doing things + Poor appetite, each on several days in last 2 weeks<br />Scale score = +3<br />Thoughts of hurting yourself in some way nearly every day in last 2 weeks<br />Scale score = +3<br />Are these responses really equivalent?<br />
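
The arithmetic behind this slide can be sketched as follows, using the conventional 0-3 scoring of PHQ-9 response options. The item labels are abbreviated from the slide, and every item not listed is assumed to be answered "not at all"; the dictionaries and function name are hypothetical.

```python
# PHQ-9 response options are conventionally scored 0-3.
SCORES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}

# Person A: three low-intensity symptoms, each on several days.
person_a = {
    "feeling tired": "several days",
    "little interest": "several days",
    "poor appetite": "several days",
}
# Person B: a single high-intensity symptom, nearly every day.
person_b = {"thoughts of hurting yourself": "nearly every day"}

def scale_score(responses):
    """Sum the numeric codes of the endorsed response options."""
    return sum(SCORES[answer] for answer in responses.values())

score_a = scale_score(person_a)
score_b = scale_score(person_b)
print(score_a, score_b)  # 3 3
```

Both response patterns add up to the same scale score even though they share no endorsed item, which is exactly the equivalence the slide questions.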
- 10. Implications<br />If a set of items is merely assumed to form a scale, then we cannot be sure that the scale score accurately ranks people on the construct of interest<br />People with different positions may be assigned the same score<br />People with the same position may be assigned different scores<br />Unless we test the hypothesis, assessing reliability & validity is pointless<br />
- 11. Disordered categories<br />What we would like: interval scales<br />What we think we have: ordinal scales<br />What we probably have: disordered categories<br />A scale that cannot rank-order people is not a scale<br />
- 12. Item ‘difficulty’ (intensity)<br />The problem arises because CTT does not account for item difficulty or intensity<br />Some items are endorsed at low levels of the construct<br />‘Low intensity item’<br />Endorsement may indicate low or high level of construct<br />Some items are endorsed at high levels of the construct<br />‘High intensity item’<br />Endorsement indicates high level of construct<br />
- 13. Example: PHQ-9<br />Feeling tired on several days is a low intensity item<br />Endorsed at low level of depression<br />But may also be endorsed at higher levels of depression<br />[Figure: item endorsed (Yes, Yes, Yes, Yes) at every point from Low to High depression]<br />
- 14. Example: PHQ-9<br />Thoughts of hurting yourself in some way nearly every day in last 2 weeks is a high intensity item<br />Endorsed at high level of depression<br />But not endorsed at lower levels of depression<br />[Figure: item not endorsed (No, No, No) until the High end of depression (Yes)]<br />
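
One simple way to see item intensity in data is the proportion of respondents who endorse each item: the lower the proportion, the more intense the item. The sketch below uses hypothetical responses for four people ordered from low to high depression, mirroring the Yes/No patterns of the two PHQ-9 slides; it is an illustration, not a PHQ-9 analysis.

```python
# Hypothetical responses for four people ordered from low to high depression
# (1 = endorsed, 0 = not endorsed).
responses = {
    "feeling tired (several days)": [1, 1, 1, 1],
    "thoughts of hurting yourself (nearly every day)": [0, 0, 0, 1],
}

def endorsement_rate(item_scores):
    """Proportion of respondents endorsing the item."""
    return sum(item_scores) / len(item_scores)

rates = {item: endorsement_rate(scores) for item, scores in responses.items()}

# Sort from most endorsed (low intensity) to least endorsed (high intensity).
by_intensity = sorted(rates, key=rates.get, reverse=True)
for item in by_intensity:
    print(f"{rates[item]:.2f}  {item}")
```

A plain summed score ignores this ordering and treats an endorsement of either item as contributing in the same way.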
- 15. How CTT fails to deal with item intensity<br />Factor analysis groups items of similar intensity<br />Factor analysis of a unidimensional construct will produce more than one ‘factor’<br />These ‘factors’ are simply sets of items with similar intensities<br />
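
The mechanism claimed on this slide can be illustrated with a small simulation. The sketch below builds perfectly unidimensional, deterministic (Guttman-type) data and computes the Pearson correlations that a factor analysis would start from. All names, cut-points and the sample of 100 evenly spaced people are assumptions made for illustration; no factor analysis is actually run.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation (equals phi for two dichotomous items)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# 100 hypothetical people spread evenly along ONE latent dimension.
theta = [(i + 0.5) / 100 for i in range(100)]

# Deterministic items: endorsed once theta exceeds the item's intensity.
intensities = {"low_1": 0.2, "low_2": 0.3, "high_1": 0.7, "high_2": 0.8}
items = {name: [1 if t > cut else 0 for t in theta]
         for name, cut in intensities.items()}

r_low = pearson(items["low_1"], items["low_2"])      # two low-intensity items
r_high = pearson(items["high_1"], items["high_2"])   # two high-intensity items
r_mixed = pearson(items["low_1"], items["high_2"])   # low vs high intensity
print(round(r_low, 2), round(r_high, 2), round(r_mixed, 2))
```

Although every item measures the same single dimension without error, items of similar intensity correlate much more strongly with each other than with items of different intensity. A correlation matrix with that block structure is what can lead a factor analysis to report several 'factors'.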
- 16. Example: GHQ-12<br />Many studies report 2- or 3-factor solutions<br />‘Factors’ simply group items by intensity<br />[Figure: GHQ-12 item numbers positioned along a Low-to-High psychiatric morbidity axis]<br />
- 17. How CTT fails to deal with item intensity<br />Selecting items on basis of factor analysis exacerbates problem, but simultaneously conceals it<br />Items are selected on basis of similar intensities, creating scales with limited range but high reliability<br />[Figure: GHQ-12 item numbers positioned along a Low-to-High psychiatric morbidity axis; a second panel retains only items 7, 4, 1, 12, 8 and 3]<br />
- 18. Why Rasch modelling is not the answer<br />Rasch modelling explicitly takes into account item intensities<br />Stochastic Guttman scale<br />Additionally claims to produce interval scaling & ‘objective’ measurement<br />Increasingly popular in Health Psychology<br />
- 19. Problems<br />Rasch models require very large samples to allow estimation of person and item parameters<br />Very strong assumptions, e.g. logistic item-response curve<br />The data must fit the model, not the other way round<br />Discards useful data to fit arbitrary assumptions<br />Interval scaling is a questionable gain if psychological constructs are not quantitative in the first place<br />
- 20. Non-parametric IRT (NPIRT)<br />E.g. Mokken (1971)<br />Takes into account item intensities<br />Stochastic Guttman scale<br />Claims only to rank order people<br />Very weak assumptions<br />Retains data<br />Complements CTT<br />Uses simple scale score<br />
- 21. 21<br />
- 22. PROMIS project<br />NIH funded project since 2004 ($100m)<br />Establish a domain framework and develop candidate items for adult and paediatric Patient Reported Outcome Measures<br />Questionnaires developed using published methodology<br />Scaling methods include NPIRT and Graded Response Model (GRM)<br />
- 23. Summary<br />The credibility of Health Psychology research & practice rests on its empirical evidence base<br />This evidence base relies on the quality of questionnaire data<br />The quality of questionnaire data may be compromised by the use of inappropriate methods<br />We should stop relying on factor analysis & reliability coefficients & test the hypothesis that a set of items constitutes a scale<br />
- 24. Examples of NPIRT<br />
- 25. Mokken (1971) proposed two models<br />Monotone homogeneity model (MH)<br />Doubly monotone model (DM)<br />Scales fitting the MH model rank order people on the attribute of interest<br />Corollary is that scales not fitting the MH model do not rank order people on the attribute of interest <br />
- 26. Select items for the scale based on homogeneity<br />Assess whether the resulting scale fits the MH model<br />Scaling procedure and the MH model based on the following minimal assumptions: <br />For all items, if person A has a higher degree of X than person B, A’s probability of endorsing an item will be equal to or higher than B’s<br />Local independence: item scores are uncorrelated for the same degree of attribute<br />
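
The two assumptions listed on this slide can be stated formally. The notation is an assumption introduced here, not taken from the slides: $\theta$ is a person's degree of the attribute and $X_i \in \{0, 1\}$ is the response to item $i$ of a $k$-item scale.

```latex
\begin{align}
  % Monotonicity: more of the attribute never lowers the endorsement probability
  \theta_A > \theta_B \;&\Rightarrow\; P(X_i = 1 \mid \theta_A) \ge P(X_i = 1 \mid \theta_B)
  \quad \text{for all items } i \\
  % Local independence: responses are independent once the attribute is held fixed
  P(X_1 = x_1, \dots, X_k = x_k \mid \theta) \;&=\; \prod_{i=1}^{k} P(X_i = x_i \mid \theta)
\end{align}
```

The product form of local independence is the usual textbook statement and is slightly stronger than the slide's wording; it implies that item scores are uncorrelated at any fixed degree of the attribute.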
- 27. If the purpose of the scale is to rank order people on a given attribute then the scale must be monotone homogeneous<br />Probability of item being endorsed must be monotone nondecreasing against attribute*<br />i.e. probability of item endorsement does not decrease with an increase in the measured attribute<br />* As estimated from the remaining items of the scale<br />
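
A minimal sketch of this check, assuming dichotomous items: the attribute is estimated by the rest score (the sum of the remaining items), and the proportion endorsing the item is compared across adjacent rest-score groups. The function names and data are hypothetical. Dedicated software typically also merges small groups and tests violations for significance, which this sketch does not attempt.

```python
def rest_score_curve(rows, item):
    """Proportion endorsing `item` at each rest score (sum of the other items)."""
    groups = {}
    for row in rows:
        rest = sum(row) - row[item]
        groups.setdefault(rest, []).append(row[item])
    return {rest: sum(v) / len(v) for rest, v in sorted(groups.items())}

def monotonicity_violations(curve):
    """Count adjacent rest-score groups where the endorsement proportion drops."""
    props = list(curve.values())
    return sum(1 for lower, higher in zip(props, props[1:]) if higher < lower)

# Hypothetical dichotomous data: 8 respondents x 4 items (1 = endorsed).
data = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
curve = rest_score_curve(data, item=2)
print(curve, monotonicity_violations(curve))
```

For the third item in this toy data the endorsement proportion never falls as the rest score rises, so no violation is counted. A curve that dips at higher rest scores would be flagged.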
- 28. For this GHQ-12 item the probability of endorsement reaches 50% at a low level of psychological distress<br />It is therefore a low intensity item: people endorsing this item are signalling a low level of distress<br />Note that probability (Y-axis) increases with increase in class score (X-axis)<br />
- 29. For this GHQ-12 item the probability of endorsement reaches 50% at a high level of psychological distress<br />It is therefore a high intensity item: people endorsing this item are signalling a high level of distress<br />Note that probability (Y-axis) also increases with increase in class score (X-axis), but curves:<br />Do not have the same slope<br />Are not required to have the same shape<br />
- 30. If two items belong to a unidimensional scale, then:<br />Endorsing the more intense item entails that the less intense item also be endorsed<br />Endorsing the less intense item does not entail that the more intense item be endorsed<br />For a Guttman scale, these are deterministic statements<br />For a Mokken scale, these are probabilistic statements<br />
- 31. Less intense item<br />More intense item<br />A Guttman error occurs when the more intense item is endorsed but not the less intense item<br />Too many Guttman errors imply that items are not measuring the same attribute<br />
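
For a single pair of dichotomous items, Guttman errors are one cell of the 2 x 2 cross-tabulation. The responses below are hypothetical and serve only to show which cell is meant.

```python
from collections import Counter

# Hypothetical responses from ten people (1 = endorsed, 0 = not endorsed).
less_intense = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
more_intense = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]

# Cross-tabulate the pairs (less_intense, more_intense).
table = Counter(zip(less_intense, more_intense))

# A Guttman error: more intense item endorsed, less intense item not endorsed.
guttman_errors = table[(0, 1)]
print(dict(table), guttman_errors)
```

Here only the seventh respondent falls in the error cell; the other three cells are all consistent with the ordering of the two items.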
- 32. This asymmetrical relationship between item pairs can be summarised with Loevinger’s H <br />H is the coefficient of homogeneity between two items i and j<br />Ranges from 0.0 to 1.0<br />0.0 indicates no association between items<br />1.0 indicates perfect association, given the differences in item intensity<br />1.0 also indicates no Guttman errors<br />Mokken (1971) developed H for scale development<br />Hij: Homogeneity of pair of items<br />Hi : Homogeneity of item i with all items<br />H : Homogeneity of scale<br />
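
One common way to compute the pairwise coefficient is as one minus the ratio of observed Guttman errors to the number expected if the two items were statistically independent. The sketch below assumes dichotomous items; the function name and data are hypothetical, and the less intense item is taken to be the one endorsed more often.

```python
def pair_h(x, y):
    """Loevinger's H for two dichotomous items: 1 - observed/expected Guttman errors."""
    n = len(x)
    # The item endorsed more often is treated as the less intense ("easy") one.
    easy, hard = (x, y) if sum(x) >= sum(y) else (y, x)
    observed = sum(1 for e, h in zip(easy, hard) if h == 1 and e == 0)
    # Expected errors under independence: n * P(hard = 1) * P(easy = 0).
    # This is zero (and H undefined) if an item is endorsed by everyone or no one.
    expected = n * (sum(hard) / n) * (1 - sum(easy) / n)
    return 1 - observed / expected

less_intense = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
more_intense = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
h_with_error = pair_h(less_intense, more_intense)

# The same marginals with no Guttman errors give H = 1.
perfect = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
h_perfect = pair_h(less_intense, perfect)
print(h_with_error, h_perfect)
```

With no errors H equals 1, and with as many errors as independence would predict H equals 0. In real data H can also fall below 0 when errors are more frequent than chance; such item pairs are excluded by the requirement that all Hij be positive.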
- 33. All Hij > 0<br />Start with item pair with highest Hij<br />Select third item to maximise scale H<br />Proceed until H reaches threshold value c<br />Produces a unidimensional scale<br />c = 0.3; weak scale<br />c = 0.4; medium scale<br />c = 0.5; strong scale<br />c = 1.0; perfect Guttman scale<br />
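
The selection steps on this slide can be sketched as a greedy loop. This is a simplified illustration under stated assumptions, not Mokken's full procedure: items are dichotomous, scale H is computed as one minus total observed over total expected Guttman errors across item pairs, no significance tests are applied, and all function names and data are hypothetical.

```python
from itertools import combinations

def error_counts(x, y):
    """Observed and expected Guttman errors for two dichotomous items."""
    n = len(x)
    easy, hard = (x, y) if sum(x) >= sum(y) else (y, x)
    observed = sum(1 for e, h in zip(easy, hard) if h == 1 and e == 0)
    expected = n * (sum(hard) / n) * (1 - sum(easy) / n)
    return observed, expected

def scale_h(items, names):
    """Scale H over all pairs of the named items (pair H when two names are given)."""
    obs = exp = 0.0
    for a, b in combinations(names, 2):
        o, e = error_counts(items[a], items[b])
        obs, exp = obs + o, exp + e
    return 1 - obs / exp

def select_items(items, c=0.3):
    """Start with the best pair, then add items while scale H stays at or above c."""
    selected = list(max(combinations(items, 2), key=lambda p: scale_h(items, p)))
    if scale_h(items, selected) < c:
        return []
    remaining = [name for name in items if name not in selected]
    while remaining:
        # Only consider items with positive H against every selected item.
        candidates = [name for name in remaining
                      if all(scale_h(items, (name, s)) > 0 for s in selected)]
        if not candidates:
            break
        best = max(candidates, key=lambda name: scale_h(items, selected + [name]))
        if scale_h(items, selected + [best]) < c:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: q1-q3 are ordered by intensity; q4 runs against them.
items = {
    "q1": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "q2": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "q3": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "q4": [0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
}
chosen = select_items(items, c=0.3)
print(chosen, scale_h(items, chosen))
```

In this toy data the three consistently ordered items are retained and the fourth is rejected because its H with every other item is negative.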
- 34. Results for GHQ-12<br />Step Item Scale H<br />1 p6d 0.79<br />1 n4d 0.79<br />2 n6d 0.73<br />3 n5d 0.68<br />4 n2d 0.64<br />5 n3d 0.61<br />6 p5d 0.59<br />7 p3d 0.57<br />8 p4d 0.55<br />9 n1d 0.53<br />10 p2d 0.51<br />11 p1d 0.50<br />=> the items of the GHQ-12 form a strong unidimensional scale <br />
- 35. Monotone homogeneity model: GHQ-12<br />Item H #vi maxvi zmax #zsig<br />p1d 0.44 0 0.00 0.00 0<br />n1d 0.45 0 0.00 0.00 0<br />p2d 0.43 1 0.06 0.99 0<br />p3d 0.50 0 0.00 0.00 0<br />n2d 0.55 0 0.00 0.00 0<br />n3d 0.51 0 0.00 0.00 0<br />p4d 0.47 0 0.00 0.00 0<br />p5d 0.50 1 0.05 0.90 0<br />n4d 0.56 0 0.00 0.00 0<br />n5d 0.50 0 0.00 0.00 0<br />n6d 0.56 1 0.05 0.93 0<br />p6d 0.53 1 0.04 0.68 0<br />Small deviations from MH model but none significant<br />
- 36.
- 37.
- 38. Conclusion<br />The GHQ-12 is a strongly homogeneous unidimensional scale<br />Small deviations from monotone homogeneity, none significant<br />The GHQ-12 summed score can rank order people by the measured attribute<br />i.e. it can serve as an ordinal measure of severity of psychiatric impairment<br />Compare to results of EFA/CFA studies<br />
- 39. Example: Northwick Park dependency scale<br />Item selection from pool of 16 items<br />Item Scale H<br />Q8 0.93<br />Q5 0.93<br />Q9 0.93<br />Q2 0.91<br />Q1 0.88<br />Q13 0.87<br />Q7 0.84<br />Q12 0.82<br />Q6 0.79<br />Q14 0.76<br />Q4 0.74<br />Q3 0.70<br />Q11 0.67<br />Q15 0.62<br />14 items form unidimensional scale<br />
- 40. Two items with serious violations of monotone homogeneity<br />Item H #vi maxvi zmax #zsig<br />Q3 0.45 6 0.25 2.88 4<br />Q11 0.32 5 0.28 3.43 2<br />Q3: help required using toilet (urination)<br />Q11: help required with drinking<br />
- 41.
- 42. These items decrease in probability at the top end of the scale<br />With extreme dependency, patients require less help with drinking and emptying bladder<br />Because at this extreme, they are more likely to be tube-fed and catheterised<br />Hence, for these items, probability of endorsement decreases as dependency increases<br />Scale is not monotone homogeneous<br />The summed score will not rank order people on the measured attribute<br />
