Successfully reported this slideshow.
Your SlideShare is downloading. ×

Metric Calibration of Psychological Instruments (Dissertation Senate presentation)

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 40 Ad

Metric Calibration of Psychological Instruments (Dissertation Senate presentation)

Download to read offline

Powerpoint slides of my dissertation project, presented to the Senate at The University of Western Ontario on June 22, 2011 in London, Ontario, Canada (Etienne P. LeBel, etiennelebel.com)

Abstract: Inspired by the history of the development of instruments in the physical sciences, and by past psychology giants, the following dissertation aimed to advance basic psychological science by investigating the metric calibration of psychological instruments. The over-arching goal of the dissertation was to demonstrate that it is both useful and feasible to calibrate the metric of psychological instruments so as to render their metrics non-arbitrary. Concerning utility, a conceptual analysis was executed delineating four categories of proposed benefits of non-arbitrary metrics including (a) help in the interpretation of data, (b) facilitation of construct validity research, (c) contribution to theory development, and (d) facilitation of general accumulation of knowledge. With respect to feasibility, the metric calibration approach was successfully applied to instruments of seven distinct constructs commonly studied in psychology, across three empirical demonstration studies and re-analyses of other researchers’ data. Extending past research, metric calibration was achieved in these empirical demonstration studies by finding empirical linkages between scores of the measures and specifically configured theoretically-relevant behaviors argued to reflect particular locations (i.e., ranges) of the relevant underlying psychological dimension. More generally, such configured behaviors can serve as common reference points to calibrate the scores of different instruments, rendering the metric of those instruments non-arbitrary.

LeBel, Etienne, "The Utility and Feasibility of Metric Calibration for Basic Psychological Research" (2011). Electronic Thesis and Dissertation Repository. 174.
https://ir.lib.uwo.ca/etd/174

Powerpoint slides of my dissertation project, presented to the Senate at The University of Western Ontario on June 22, 2011 in London, Ontario, Canada (Etienne P. LeBel, etiennelebel.com)

Abstract: Inspired by the history of the development of instruments in the physical sciences, and by past psychology giants, the following dissertation aimed to advance basic psychological science by investigating the metric calibration of psychological instruments. The over-arching goal of the dissertation was to demonstrate that it is both useful and feasible to calibrate the metric of psychological instruments so as to render their metrics non-arbitrary. Concerning utility, a conceptual analysis was executed delineating four categories of proposed benefits of non-arbitrary metrics including (a) help in the interpretation of data, (b) facilitation of construct validity research, (c) contribution to theory development, and (d) facilitation of general accumulation of knowledge. With respect to feasibility, the metric calibration approach was successfully applied to instruments of seven distinct constructs commonly studied in psychology, across three empirical demonstration studies and re-analyses of other researchers’ data. Extending past research, metric calibration was achieved in these empirical demonstration studies by finding empirical linkages between scores of the measures and specifically configured theoretically-relevant behaviors argued to reflect particular locations (i.e., ranges) of the relevant underlying psychological dimension. More generally, such configured behaviors can serve as common reference points to calibrate the scores of different instruments, rendering the metric of those instruments non-arbitrary.

LeBel, Etienne, "The Utility and Feasibility of Metric Calibration for Basic Psychological Research" (2011). Electronic Thesis and Dissertation Repository. 174.
https://ir.lib.uwo.ca/etd/174

Advertisement
Advertisement

More Related Content

Slideshows for you (20)

Similar to Metric Calibration of Psychological Instruments (Dissertation Senate presentation) (20)

Advertisement

Metric Calibration of Psychological Instruments (Dissertation Senate presentation)

  1. 1. The Utility and Feasibility of Metric Calibration for Basic Psychological Research Etienne LeBel The University of Western Ontario
  2. 2. “…being so disinterested in our variables that we do not care about their units can hardly be desirable” (Tukey, 1969, p. 89). JOHN TUKEY "...psychologists have to start respecting the units they work with, or develop measurement units they can respect enough so that researchers can agree to use them" (Cohen, 1994, p. 1001). JACOB COHEN Inspirational Quotations
  3. 3. Over-arching Goal • Both useful and feasible to calibrate the metric of instruments in basic psychological research
  4. 4. Outline • Definitions and basic concepts • Metric calibration strategies • Past metric calibration research • Utility of Metric Calibration • Feasibility: 3 Empirical demonstration studies • Limitations and Future Directions
  5. 5. Definitions and Basic Concepts • Metric: unit of measurement used to quantify the amount of something • E.g., Celsius metric (°C) • Fridge range = -10 to +50 °C • Freezer range = -50 to +70 °C Fridge: Freezer:
  6. 6. Definitions and Basic Concepts • Metric: unit of measurement used to quantify the amount of something • E.g., Beck's Depression Inventory • Metric = 0 to 63 (BDI; Beck & Steer, 1987) • E.g., Self-report Depression Scale • Metric = 25 to 100 (SDS; Zung, 1965)
  7. 7. Definitions and Basic Concepts • Arbitrary metric: • Scores not inherently meaningful, other than relative interpretation • Formally: Unknown where a given score locates an individual on the underlying psychological dimension (Blanton & Jaccard, 2006a, 2006b) SDSBDI
  8. 8. Underlying Dimension Freezing Reference Point Thermometer CThermometer B SDS BDIMDI Underlying Dimension Behavioral Reference Point Thermometer A Boiling Reference Point 100°B 50°B 150°B 50°B50°B 100°B Cooking thermometer Indoor thermometer Outdoor thermometer
  9. 9. Main Strategies of Metric Calibration • Strategy 1 • Mapping scores to qualitatively distinct behaviors • Strategy 2 • Mapping scores to gradation of behaviors (Blanton & Jaccard, 2006a, 2006b; Sechrest et al., 1996) • Strategy 3 • Experimental approach • Manipulate construct to extreme levels
  10. 10. Metric Calibration Strategy 1 • Map scores to qualitatively distinct theoretically-relevant behaviors 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0 5 10 15 20 25 30 35 40 Beck Depression Inventory Scores ProbabilityofSuicide Attempt Underlying Dimension Behavioral Reference Point
  11. 11. Metric Calibration Strategy 2 • Map scores to gradations of theoretically- relevant behaviors Underlying Dimension Ref. Point 8 Hrs/day Ref. Point 2 Hrs/day Ref. Point 12 Hrs/day
  12. 12. Ideal Characteristics of Behavioral Reference Points • Theoretically-relevant • Interpretationally clear (e.g., 1 or 0; hrs/day) • Objective • Unambiguous construct-wise • Also, theoretically-configured context
  13. 13. Past Metric Calibration Research • Specific areas of applied psychology: • Clinical psychology (Kazdin, 1999, 2001; Harman et al., 2001; Sechrest et al., 1996) • Sport psychology (Andersen, McCullagh, & Wilson, 2007) • Forensic psychology (Pirelli et al., 2011; Hanson, 2009; Hanson et al., in press) • Arbitrary metrics in psychology (Blanton & Jaccard, 2006a, 2006b)
  14. 14. Utility of Metric Calibration 1. Help in the interpretation of data a. Enhance interpretability of statistical effects b. Facilitate extraction of more information from data patterns c. Help overcome limitations of NHST 2. Facilitate construct validity research a. Help shed brighter light on psychological constructs b. Help with conceptual challenges (e.g., construct definition) c. Benchmark for detecting problems/improving measures
  15. 15. Utility of Metric Calibration 3. Contribute to theoretical development a. Facilitate theoretical debates involving absolute claims b. Allow more precise theorizing via enhanced scientific language c. Preliminary platform for quantitative testing of theories (Meehl, 1978) 3. Facilitate general accumulation of knowledge a. Calibration findings valuable information in their own right b. Guiding framework for cataloguing magnitude of psychological effects c. Facilitate phenomenon-based research (Rozin, 2001)
  16. 16. Feasibility of Metric Calibration • Empirical demonstration studies • Study 1: Need for cognition (NFC), task persistence (TP), conscientiousness • Study 2: Self-enhancement • Study 3: Risk-taking
  17. 17. Study 1: NFC and TP • Participants • 94 UWO introductory psychology undergraduates • 69 females, 25 males (age = 18.5, SD = 2.2) • Procedure & Materials • Need for cognition measure • Task persistence measure • Word association decision task • Anagram Persistence task • Demographics & Debriefing questions
  18. 18. Study 1: Materials • Need for cognition (NFC) • Tendency to engage in cognitively effortful activities and enjoy thinking in its own right (Cacioppo & Petty, 1982) • 18-item scale (Cacioppo, Petty, & Kao, 1984) • E.g. item: “I find satisfaction in deliberating hard for long hours.” • E.g. item: “Thinking is not my idea of fun” (R ) 1= Extremely Uncharacteristic 2 = Somewhat Uncharacteristic 3 = Uncertain 4 = Somewhat Characteristic 5 = Extremely Characteristic
  19. 19. Study 1: Materials • NFC behavioral reference point • Cognitively effortful (vs. simpler) Remotes Association Task (RAT) (Mednick & Mednick, 1967)
  20. 20. Study 1: Materials • Task persistence • Tendency to persist in an effortful behavior or frustration- inducing activity (Steinberg et al., 2007) • 2-item self-report measure (Steinberg et al., 2007) • Item 1: “I will keep trying the same thing over again even when I have not had success the first time” • Item 2: “I will often continue to work on something, even after other people have given up.” 1= Very untrue, not at all like me 2 = Somewhat untrue or not like me 3 = Somewhat true or like me 4 = Very true, very much like me
  21. 21. Study 1: Materials • Task persistence behavioral reference point • Anagram persistence task (Brandon et al., 2003; Quinn et al., 1996)
  22. 22. Study 1: Results: NFC Wald’s χ2 = 9.71, B = 1.20, odds ratio (OR) = 3.33, p < .002 Underlying Dimension Behavioral Reference Point 5 4 3 2 1 NFC Task 1: 62% Task 2: 38%
  23. 23. Study 1: Results: Task Persistence Linear: B = 0.18, β = r = .15, p < .15 Cubic model: F(3, 90) = 2.00, p < .10
  24. 24. Study 1: Discussion • Enhance MMR analyses • Re-analysis of O’Hara et al. (2009) Conventional +/- 1 SD approach Using calibrated values (75% NFC behavior) (25% NFC behavior) NFC scores centered on 3.8 (50% NFC behavior)
  25. 25. Study 2 Demonstration • Self-enhancement measures • Background context • Pan-cultural self-enhancement debate (Sedikides et al., 2003; Heine, 2005)
  26. 26. Study 2 • Participants • 97 UWO introductory psychology undergraduates • 50 females, 47 males (age = 18.9, SD = 1.3) • Procedure & Materials • 2 self-enhancement measures • Filler task (RAT) • Over-claiming technique • Balanced Inventory of Desirable Responding • Demographics & Debriefing questions
  27. 27. Study 2: Materials • Self-enhancement • Tendency to view characteristics of oneself in an overly positive manner (Hogan & Nicholson, 1988) • Better-than-average judgments (Alicke et al., 1995; Gaertner et al., 2008) • Rate extent to which each listed trait describes yourself relative to the average Western student of your own age and gender POSITIVE: dependable intelligent considerate observant polite respectful cooperative reliable friendly creative NEGATIVE: gullible disobedient snobbish lazy disrespectful mean unforgiving vain uncivil unpleasant 1 = Much worse than the average university student of my age and gender 4 = As well as the average university student of my age and gender 7 = Much better than the average university student of my age and gender
  28. 28. Study 2: Materials • Self-enhancement behavioral reference point • Over-claiming technique variant (OCT; Paulhus et al., 2003) • 150 items (10 categories of 15 items) • 3 non-existent items (foils) per category; 30 foils total • Behavioral index: # of foils claimed as familiar PLEASE INDICATE FOR EACH ITEM WHETHER YOU ARE FAMILIAR WITH THE ITEM OR NOT, BY CLICKING THE APPROPRIATE RESPONSE OPTION: 0 = Never heard of it 1 = Familiar with it
  29. 29. Study 2: Results Linear: B = 1.88, β = r = .29, p < .004 Cubic model: F(3, 94)= 5.91, p < .004
  30. 30. Study 3 Demonstration • Risk-taking measures • Demonstrate metric calibration for: • Measures capturing state-like constructs • Behavioral measures
  31. 31. Study 3 • Participants • 99 individuals from UWO campus • Compensated $5 + earnings in BART task • 39 females, 58 males, 2 non-specified (age = 24.5, SD = 5.5) • Procedure & Materials • Balloon Analogue Risk Task (BART) • Columbia Card Task (CCT) • Risky gambles Lottery task • Two self-report risk-taking measures • Demographics & Debriefing questions
  32. 32. Study 3: Materials • Risk-taking • Behavior involving possibility of gains but with potential negative consequences (Ben-zur & Zeidner, 2009; Lejuez et al., 2002) • Balloon Analogue Risk Task (BART) (Lejuez et al., 2002) • Ps inflate 30 simulated balloons onscreen • Each balloon pump worth 1 cent • If balloon explodes, money is lost for that trial • Scoring: mean # of pumps (non-exploding trials)
  33. 33. Study 3: Materials • Columbia Card Task (CCT) – hot version (Figner et al., 2009) • Ps sequentially turn over cards in 4 x 8 array • Accumulate as many points as possible • Can continue unless loss card turned
  34. 34. Study 3: Materials • Behavioral reference points • Risky gambles in lottery risk task (Hsee & Weber, 1999) • If Option B selected, experimenter would actually flip a coin • Risky gambles on lotteries with larger sure bets reflective of higher risk-taking reference point Lottery Option A Option B 1 $6 for certain Flip a coin. Receive $10 if heads, receive $0 if tails. 2 $2 for certain Flip a coin. Receive $10 if heads, receive $0 if tails. 3 $8 for certain Flip a coin. Receive $10 if heads, receive $0 if tails. 4 $5 for certain Flip a coin. Receive $10 if heads, receive $0 if tails. 5 $4 for certain Flip a coin. Receive $10 if heads, receive $0 if tails.
  35. 35. Study 3: Results: BART Wald’s χ2 = 4.85, B = .03, odds ratio (OR) = 1.03, p < .03
  36. 36. Study 3: Results: CCT $4 safe bet: Wald’s χ2 = 3.24, B = .08, odds ratio (OR) = 1.08, p < .07 $6 safe bet: Wald’s χ2 = 5.78, B = .30, odds ratio (OR) = 1.35, p < .02
  37. 37. Study 3: Discussion • BART & CCT calibrated to common $4 reference point • Implication: • Enhanced interpretation of data patterns • Proposed benefit 1. b) extraction of more information Underlying Dimension $4 Reference Point BART 30 25 20 15 10 5 0 90 80 70 60 50 40 30 20 10 0 CCT 10°R10°R 15°R 20°R 13°R
  38. 38. Limitations & Caveats • Small sample sizes • Consensus re: reference points
  39. 39. Future Directions • Richer behavioral reference points • E.g., EAR (Mehl et al., 2002) • E.g., Eye-tracking • Experimental approach • Capture behavioral manifestations beyond naturally-occurring levels • Item Response Theory approach (Lord, 1980) • Model distinct and ordered behavioral reference points
  40. 40. END • Thanks to all who have helped: • Conceptual: • Bertram, Kurt, Chris, Paul, Yang • Data collection: • Scott Leith • Assigning Cohen (1994): • Lorne

Editor's Notes

  • -Argue that it is both useful and feasible calibrate the metric of instruments in basic psychological research, AS TO RENDER THE METRIC OF OUR INSTRUMENTS NON-ARBITRARY
    -By useful I mean that metric calibration can help us with (enhance) data interpretation, and construct validity, contribute to theoretical development, and facilitate general accumulation of knowledge. We’ll get back to these later.
    -[Also, I want to briefly mention at this point that metric calibration is a fairly unchartered territory in psychology given that only a handful of conceptual and empirical papers exist on metric calibration, all of which have been in specific applied areas of psychology. Hence, my contribution involves elaborating on the much BROADER utility and feasibility of metric calibration for psychological research more generally.]
  • [start building elements of the diagram immediately here..&amp;gt;!!]
    [actually can translate my dual-probe thermometer into the conceptual diagram and it’s perfect because they can represent Thermometer B &amp; C..]
  • [start building elements of the diagram immediately here..&amp;gt;!!]
    [actually can translate my dual-probe thermometer into the conceptual diagram and it’s perfect because they can represent Thermometer B &amp; C..]
  • An interesting fact about metrics is that virtually all measures in psychology have a metric which can be considered arbitrary. And so returning to our depression instruments, informally this means that the scores from the depression instruments are not inherently meaningful in themselves, other than a relative interpretation…blah,blah,blah
    Then after stating B&amp;J’s definition of arbitrary metrics, can further draw parallels between metric of thermometers and metrics of depression instruments, and then bring in the idea of reference points (and add to diagram), and then mention basic metric calibration idea of connecting scores to a common reference point as to render metric non-arbitrary (And then explicitly state that this is the basic idea of metric calibration and is the focus of my dissertation [my research problem]). Then can re-iterate my over-arching goal.
  • To clarify this idea, and further unpack the nature of arbitrary metrics, I will return to the more concrete world of thermometers. Imagine it’s the year 1600 and you have the following three thermometers….
    [actually can translate my dual-probe thermometer into the conceptual diagram and it’s perfect because they can represent Thermometer B &amp; C..]
    And this is the basic essence of metric calibration: RESEARCH PROBLEM:
  • Map observed scores to qualitatively distinct theoretically-relevant behaviors, specifically configured to reflect particular locations of the underlying dimension (can cover consensus issue later when describing ideal characteristics)
    e.g., presence or absence of theoretically-relevant behavior
  • And what I mean by “specifically-configured” is that ideally, behaviors chosen to serve as external reference points should possess the following chracteristics:
    (above &amp; beyond the fact that ideally the behavior should be configured and assesses such that it can be argued to reflect a particular location on the underlying dimension)
  • Entries in bold will be elaborated upon and/or demonstrated using preliminary results from my empirical demonstration studies.
  • The main goal of Study 1 was to provide a preliminary demonstration of the metric calibration approach applied to instruments of constructs commonly studied in psychology.
  • It was explained that Task 1 was less cognitively challenging in the sense that the answer would relate to the 3 stem words in the *SAME* way WHEREAS in Task 2 the answer would relate to the 3 stem words in a different way for each word. After seeing these examples (and corresponding answers) Ps decided which task they wanted to complete and proceeded to complete the chosen task.
  • Practically advantageous 2-item self-report measure of task persistence that has been used in past research
  • As a behavioral reference point, I used a commonly used anagram persistence task
  • Briefly mention in passing that NFC scale midpoint of 3 corresponds only to a probability of about 28% of choosing the cognitively effortful task
  • 1-unit increase in TP scores corresponded to an increase of 11 seconds in actual persistence on the near-impossible anagrams in the APT
    -cubic function explained approx 3x more variance; replicated in Joseph Ditre and Thomas brandon’s data set, etc……
  • Primary goal of Study 2 was to provide a preliminary demonstration of the feasibility and utility of the metric approach with regard to contributing to thoretical development
  • No restrictions were imposed on participant sex, age, or ethnicity. No experimental conditions were examined, hence all participants completed the same measures and tasks in the same order
  • It was explained that Task 1 was less cognitively challenging in the sense that the answer would relate to the 3 stem words in the *SAME* way WHEREAS in Task 2 the answer would relate to the 3 stem words in a different way for each word. After seeing these examples (and corresponding answers) Ps decided which task they wanted to complete and proceeded to complete the chosen task.
  • 1-unit increase in TP scores corresponded to an increase of 11 seconds in actual persistence on the near-impossible anagrams in the APT
    -cubic function explained approx 3x more variance; replicated in Joseph Ditre and Thomas brandon’s data set, etc……
  • Primary goal of Study 3 was to demonstrate the utility and feasibility of calibrating the scores of measures capturing predominantly state-like constructs and behavioral measures, in order to better demonstrate the proposed benefits relevant in experimental contexts
  • No restrictions were imposed on participant sex, age, or ethnicity. No experimental conditions were examined, hence all participants completed the same measures and tasks in the same order
  • Risk-taking is typically defined as the purposive enacting of a behavior that involves the possibility of some postivie consequences or gains but with some potential negative consequences.
  • Risk-taking is typically defined as the purposive enacting of a behavior that involves the possibility of some postivie consequences or gains but with some potential negative consequences.
  • Following Tversky &amp; Kahneman (1981) it was explicitly mentioned that two participants (chosen at random) would actually the money associated with their choices.
  • $4 item: safe bet = 42%, gamble = 56%
    $6 item: safe bet = 78%, gamble = 20%
  • [so in this sense, there are huge opportunities for ego-enhancement in the area of metric calibration; your name could go down in history when linked to a certain metric for a certain construct… e.g., degrees Olson for the unit of measurement for the attitude construct (though °O doesn’t really look very good)]
  • Could briefly mention in passing, that another reason the approach is valuable is that one can connect convenient cheap measures to inconvenient ecologically valid behavioral manifestations, and then be able to interpret lab results w.r.t. these ecologically valid behaviors (a way to connect basic &amp; applied research)
  • (Non-exhaustive list)
  • Enhancements:
    More meaningful calibrated values
    Overcomes sampling error issue
    Could yield different patters that are theoretically important
  • Provide methodological machinery to more directly tackle theoretical questions involving absolute claims….
    Enhancements:
    More meaningful calibrated values
    Overcomes sampling error issue
    Could yield different patters that are theoretically important

×