Measuring Variables and Sampling
Roadmap
                        
Today: Begin Exam 2 material (Chapters 5, 6, 4)
   Scales of measurement
   Psychometric properties
      Reliability
      Validity


Tuesday:
   Finish chapter 5
   Discuss Exam 1
Zoom out: where are we?
           
We have:
   A research question
   An idea for a research design
   A hypothesis
But how do we measure what we’re interested
 in?
Scales of Measurement
We study variables and need to measure them
 accurately
4 scales of measurement
   Nominal
   Ordinal
   Interval
   Ratio
Nominal Scale
                
Symbols classify or categorize into GROUPS or
 TYPES
   Name, Categorize, Classify
   Caution: numbers may be used as group labels, but
    they carry no quantitative meaning
Examples- gender, marital status, experimental
 condition
Ordinal Scale
                  
A rank order scale of measurement
Examples- order of finish, letter grade in class,
 social class (low, med., high)
Allows you to determine which person is higher
 or lower but not how much higher or lower.
   Can’t compare the size of the differences between ranks
Interval Scale
                  
Rank ordering PLUS equal intervals of distance
 between adjacent numbers
Example- Celsius and Fahrenheit temperature, IQ
 scores, year
Now you can compare the size of differences between scores
Equal distances but no absolute zero point
Ratio Scale
                   
Rank ordering and equal intervals PLUS an absolute
 zero point
Absolute zero = absence of variable
Examples- Kelvin temperature, income, weight,
 height, response time.
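
A small sketch of the interval vs. ratio distinction, using the temperature examples from the slides. The two temperatures (10 and 20 degrees Celsius) are made-up values chosen only for illustration.

```python
# Interval vs. ratio: differences are meaningful on both scales,
# but ratios are only meaningful when there is an absolute zero.

def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

# Hypothetical temperatures, chosen for illustration only.
cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Equal intervals: the two temperatures are 10 units apart on either scale.
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# Ratios: 20 C / 10 C gives 2.0, but 20 C is not "twice as hot" as 10 C,
# because 0 C is not the absence of temperature.
ratio_c = warm_c / cold_c
# On the Kelvin scale the ratio is interpretable (about 1.035).
ratio_k = warm_k / cold_k

print(f"Difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"Ratio on Celsius: {ratio_c:.3f} (not interpretable)")
print(f"Ratio on Kelvin:  {ratio_k:.3f} (interpretable)")
```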
Psychometric properties
            
Reliability: Consistency/stability of scores
Validity: Are you measuring what you are trying
 to measure?
Ideally, we want:
   Measures that are reliable
   Inferences that are valid


Reliability is necessary but not sufficient in order
 to have validity
Think about a Target
        



Measuring Reliability
                 
4 Primary types
    Test-Retest Reliability
    Equivalent- Forms Reliability
    Internal Consistency Reliability
    Interrater Reliability
Indicate level of reliability with a reliability coefficient
    Correlation; should be positive and strong (> .70)
Test- Retest
                  
Refers to consistency over time
Same measure administered twice (with a time
 interval between)
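
One way to compute a test-retest reliability coefficient is a Pearson correlation between the two administrations. This is a minimal sketch: the scores are invented, and the helper name `pearson_r` is just a label used here.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same six people on the same measure,
# administered twice with a time interval between.
time1 = [12, 18, 25, 9, 30, 21]
time2 = [14, 17, 27, 10, 28, 22]

r = pearson_r(time1, time2)

# Rule of thumb from the slides: positive and strong (> .70).
meets_rule_of_thumb = r > 0.70

print(f"Test-retest r = {r:.2f}; meets > .70 rule of thumb: {meets_rule_of_thumb}")
```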
Equivalent-Forms Reliability
            
Equivalent forms- two versions of the same
 measure
   Administer to the same group of people
Problem- hard to develop equivalent measures

Example: SAT, GRE
Internal Consistency
               
Consistency with which test items measure a
 single construct.
More items increase reliability, but we use as
 few items as possible
   Why?
Example: Internal Consistency
               
I feel sad
I feel down
I feel depressed
I feel miserable
I feel awful
Example: Internal Consistency
               
I feel hungry
I feel happy
I have green eyes
Big Bird is scary
I like turtles
   http://www.youtube.com/watch?v=CMNry4PE93Y
Internal Consistency
               
Measured using coefficient alpha (α)
   a.k.a. Cronbach’s alpha
   Should be .7 or higher
High values mean the items are measuring the
 same construct
If your scale measures more than 1 thing, each
 construct gets its own coefficient α
Interrater Reliability
Interrater reliability- consistency of ratings made
  by different judges
   GRE writing section
   Expressive writing studies
   Correlation between ratings should be strong/positive
Interobserver Agreement
Percentage of times different observers agree
   % of times raters agree- easy to calculate and
    understand
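
Percent agreement is simple enough to sketch in a few lines. The codes below ("on-task" / "off-task") and both raters' judgments are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of observations on which two raters gave the same code."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must code the same number of observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Hypothetical codes from two observers watching the same ten intervals.
rater_a = ["on-task", "off-task", "on-task", "on-task", "off-task",
           "on-task", "on-task", "off-task", "on-task", "on-task"]
rater_b = ["on-task", "off-task", "on-task", "off-task", "off-task",
           "on-task", "on-task", "on-task", "on-task", "on-task"]

agreement = percent_agreement(rater_a, rater_b)
print(f"Interobserver agreement: {agreement:.0f}%")
```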
Validity
                       
Accuracy of inferences or interpretations made
 on the basis of scores
Measuring schizophrenia, or love
   We can’t directly observe it!
   It’s the accuracy of the interpretation from the test
Validity
                      
Construct
Operationalization
Important to consider:
   Does your operationalization truly reflect what you’re
    measuring?
Validation
Never-ending process
Obtaining Validity: Based on Content
                          
Content validity: judgment of the degree to
 which items adequately represent a construct’s
 domain.
   Do items appear to represent the thing you’re trying to
    measure? (face validity)
   Does your measure exclude any important parts of
    what you’re trying to measure?
   Does your test measure something besides what you
    wanted? (i.e., include irrelevant items)
Obtaining Validity: Based on Internal Structure
                         
Some constructs are multidimensional and need
 measures that address all dimensions
Homogeneity: degree to which a set of items
 measures a single construct
   Item-to-total correlation
   Coefficient alpha
Obtaining Validity: Based on Relations to Other Variables
                         
Criterion-related validity: degree to which scores
 predict or relate to an already established test
Two types of criterion validity:
   Predictive: using your measure to predict future
    performance
   Concurrent: using your measure to predict current
    performance on the same construct, or a related one.
Obtaining Validity: Based on Relations to Other Variables
                       
Convergent validity: relationship between your
 measure and other measures of that same
 construct
Discriminant validity: evidence that scores from
 your measure are NOT similar to scores of tests
 on different constructs.
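
Convergent and discriminant evidence can both be illustrated with correlations. In this sketch all three sets of scores are invented: a new depression measure, an established depression measure (same construct), and an extraversion measure (a different construct, picked here only as an example).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same six people on three measures.
new_depression = [10, 14, 18, 22, 26, 30]
established_depression = [11, 13, 19, 21, 27, 29]   # same construct
extraversion = [10, 8, 11, 9, 10, 9]                # different construct

# Convergent evidence: strong correlation with a measure of the same construct.
r_convergent = pearson_r(new_depression, established_depression)

# Discriminant evidence: weak correlation with a measure of a different construct.
r_discriminant = pearson_r(new_depression, extraversion)

print(f"Convergent r   = {r_convergent:.2f}")
print(f"Discriminant r = {r_discriminant:.2f}")
```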
Appropriate Use of Reliability and Validity Info
                          
Reliability and validity info apply to the measure
 of interest in the reported sample
   Situation-specific, not broad
Standardized tests: norming group
   If you want to use a test with a group not represented
    in the norming group, be cautious
Report R & V for your own sample, and be wary of
 articles that make blanket statements about a
 measure’s R & V

Chapter 5
