3. Survey Administration Modes
• Personally administered
◦ Confined to a local area
◦ Can collect all responses within a short period of time,
with a high response rate
◦ Can motivate respondents by explaining the purpose and
clarifying doubts on the spot
• Mailed/Internet questionnaires:
◦ Paper and electronic
◦ No geographical or time boundaries
◦ Lower response rate
◦ Lots of concerns over response biases
4. Survey Design: Practical Issues
Should you use existing questions/scales or create
your own?
How many questions to include per variable of interest?
What should be the order of questions?
Place the most important variables first or early on the
survey?
Put predictors before the criteria on the survey, or
counterbalance and analyse for differences?
Should demographics go first or last?
Should I include negatively worded questions or other attention
checks?
5.
Item Formats: Likert Format
Most common!
Ask respondents to indicate a degree of
agreement with/endorsement of each individual
question/statement
Response options worded to have roughly equal
intervals
Number of response options:
Usually 5 to 7. Adding more options does not
improve data quality.
6.
Examples of Response Formats in
Opinion/Attitude surveys
Points on the continuum (1–5) for each type of scale:

Type of Scale | 1 | 2 | 3 | 4 | 5
Agreement | Strongly disagree | Disagree | Neither agree nor disagree | Agree | Strongly agree
Satisfaction | Very dissatisfied | Dissatisfied | Neither satisfied nor dissatisfied | Satisfied | Very satisfied
Frequency | Never | Seldom | About half the time | Often | Always
Effectiveness | Very ineffective | Ineffective | Neither effective nor ineffective | Effective | Very effective
Quality | Very poor | Poor | Average | Good | Very good
Expectancy | Much worse than expected | Worse than expected | As expected | Better than expected | Much better than expected
Extent | To no extent | To a small extent | To a moderate extent | To a great extent | To a very great extent
7.
Item Formats: Open-Ended
Require respondents to answer in their own words
• Advantages:
• can provide more in-depth, qualitative answers to
questions (write three reasons why you decided to
leave this job)
• may be better for info on
sensitive/taboo/controversial topics
• may allow respondents to ventilate feelings on
emotional issues (e.g., “how do you really feel about the
downsizing?”)
• Disadvantages:
• takes a lot more time to complete, to process, and to
analyze
• people vary drastically in writing ability
8. Main Issues with Surveys: Advanced Topic
A. Social Desirability Bias
-tendency to give socially approved responses
B. Wording Effects
-obtaining different results depending upon how
the questions are worded
C. Response Set Tendency
- a tendency to respond to questions in a
particular way that is unrelated to the
content of the questions
D. Low Response Rates
10. Why worry about measurement?
Let’s assume you wanted to find out:
What is the relationship between job satisfaction (IV)
and turnover intentions (DV) among employees in
your organization?
“H1: There is a negative correlation between JS and TI”
“H0: …….. “
And imagine you somehow know each employee’s TRUE
levels of job satisfaction and turnover intention
Then you could plot relations or do some correlation/regression
analyses to answer your question…
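If true scores were somehow known, the analysis itself would be simple. A minimal sketch with simulated data (the sample size and the −0.4 effect size are illustrative assumptions, not results from any real study):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Simulated "true" job satisfaction, and turnover intentions built
# with an assumed negative relationship (the -0.4 slope is illustrative).
job_sat = rng.normal(0.0, 1.0, n)
turnover_int = -0.4 * job_sat + rng.normal(0.0, 1.0, n)

# H1 predicts a negative correlation between JS and TI.
r = np.corrcoef(job_sat, turnover_int)[0, 1]
```

With true scores in hand, testing H1 against H0 reduces to checking whether r differs reliably from zero.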
12. The problem!
• You don’t know what the TRUE levels of job
satisfaction and turnover intentions are, because
these are attitudes and “reside in people’s heads”
• Somehow, you need to measure these
• Invent some ways to estimate these; make
invisible visible
• In science, we call this “create operational
definitions” of your variables
• IMPORTANT! Your measurements will never be
perfect; there will always be some error!
13.
Hypothesis:        IV                          DV
“True” world:      job satisfaction     →      turnover intentions
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Observed world:    Operational definition 1    Operational definition 2
                   (co-worker rating)          (self-reported intention)
                            Observed Corr

There are 4 implied relations (arrows),
but only 1 is observed.
So, to test the hypothesis, we must estimate OP1 and OP2
14.
Reliability
• Definition:
– Proportion of true variance that is contained in
observed variance
– If your measurement is perfect (no error; reliability = 1),
then true corr = observed corr
– If not (reliability < 1), then |true corr| > |observed corr|
– Note that, if we know reliabilities, we can correct observed
corr to get true corr
r_xx' = σ²_T / σ²_X  (true-score variance over observed-score variance)
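The correction noted in the last bullet (recovering the true correlation from an observed one) can be sketched as follows; the reliabilities and the observed correlation below are illustrative numbers, and `disattenuate` is a hypothetical helper name:

```python
import math

def disattenuate(r_observed, rel_x, rel_y):
    """Classical correction for attenuation: estimate the true-score
    correlation from an observed correlation and the two reliabilities."""
    return r_observed / math.sqrt(rel_x * rel_y)

# E.g., an observed JS-TI correlation of -.30, with reliabilities of
# .80 and .75, implies a noticeably stronger true correlation.
r_true = disattenuate(-0.30, 0.80, 0.75)
```

Note that with perfect reliabilities (both = 1) the observed correlation is returned unchanged, matching the definition above.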
15.
Measuring HR Variables
• So, how would you measure…
– Happiness?
– Intelligence?
– Job satisfaction?
– Leadership potential?
– Organizational commitment?
• Would you…
– ask or survey people? observe their behaviors?
• How many questions or observations would provide a
sufficiently good measure?
16.
What is a test?
• A “test” is a procedure for obtaining a sample
of behavior…
• When you buy a car, you usually like to know its
“health” or “fitness”, so you do a standard car
inspection/test
• Complication for HR: most variables are
“dispositional”
– refer not to states of objects or people, but to tendencies to
behave in certain ways (e.g., aggression, intelligence,
attitude)
– only a handful of variables refer to directly observable
phenomena (e.g., sales performance, turnover), and these
also need to be measured well
17.
3 basic concepts in understanding
what a test is…
• A Test
– Focuses on a Particular Domain/Variable
– Is a Sample of Behavior, Products, Answers, or
Performances from that Domain.
– Permits the User to Make Inferences About the
Larger Domain of Interest
• Are scores accurate (reliable and valid)?
18. Standardization!
Hallmark of Good Measurement
• You need to control extraneous factors so they don’t
influence observed scores (create more error)
– The assessment content is identical for all people assessed
– The administration of the assessment is identical for all
people
– The rules for assigning scores to responses are the same for
everyone (even for interviews or essays)
• Consultants get lots of projects to
develop/update/maintain various measures
19. Realities of Testing in Work Settings
• Often, must measure individuals on
– One occasion
– Timed conditions (30-60 min.)
– Multiple variables
• So, must use efficient methods
– Create many measurement opportunities (multiple, simple
questions)
– Objective scoring (more reliable; no judgment involved)
– Possibly, adaptive item selection
• Accept the tradeoff between efficiency (e.g., multiple
choice) and realism (e.g., work samples)
20. Ch. 1
What is “Psychometrics”?
• Branch of applied statistics focusing on test theory
– Mathematical methods used to construct, score, and validate
psychological tests
• Research published in scholarly journals
– Range from theoretical/mathematical (Psychometrika; Applied
Psychological Measurement) to applied (Personnel Psychology)
– Information about specific tests found in technical manuals and books, such
as the Mental Measurements Yearbook
• There are international published standards for test use and ethical
conduct
21. Ways to Describe Tests
• Performance: Maximum vs. Typical
– Maximum: What can you do?
• Grocery clerk can scan 20 items per minute when motivated
• Cognitive ability tests represent maximum performance tasks
– Typical: What do you usually do?
• Grocery clerk scans 10 items per minute on most days
• Personality tests measure typical performance – across situations
• Behavioral observations
– Examinees scored according to behaviors observed by a rater
– Used frequently in work and clinical settings (assessment centers)
• Self/observer reports
– Subjects indicate their level of agreement or preference concerning
statements reflecting, say, attitudes or behaviors (survey research)
– Response distortion is a big problem (e.g., faking on personality tests)
22. Test Administration Modes
• Paper and pencil (P&P)/electronic page-turners
– Most basic; presents test items in predetermined
order
• Research showed no difference between electronic and
P&P administration regarding constructs measured
– Advantages: high control, easy data collection/entry,
low cost
23. Test Administration Modes
• Multimedia assessments
– Innovative, higher fidelity tasks using video, audio,
and internet technology
– Advantages:
• Higher face validity
• May measure variables difficult to assess by P&P (e.g., soft
skills)
– Disadvantages: high cost, fewer measurement opportunities
(time consuming; scoring is difficult!)
24. Test Administration Modes
• Multi-Stage Tests (MSTs)
– Administer groups of items (testlets) similar in content, but
different in average difficulty
– Examinees routed through the test based on how many questions
they answered correctly on the previous form
– Advantages: Multiple parallel tests can be constructed at once
via automated test assembly; forms can be reviewed before
administration; item selection (routing) is simple
– Disadvantage: Less informative (efficient) than true CAT
25. Multi-Stage Test (MST):
3 Stages, 12 Testlets, 6 Bins
[Diagram: testlets 1–12 arranged across the 3 stages and 6 difficulty bins]
4 Possible Paths through Test
26. Computerized Tests Appear
in Many Forms
• Computerized Adaptive Test (CAT)
– Items selected/tailored to each examinee to provide
max information about his/her true score; score
updated after each answer to guide selection of next
item
– Advantage: Shorter, more informative tests
– Disadvantage: Really complicated and expensive to
deploy
27. How do we know that the
test/scale/survey is any good?
There should be evidence of test
1. Reliability – stability, repeatability, consistency
of scores
– Ratio of true to total score variance
2. Construct validity – observed scores represent
true scores of the intended variable and not
something else
– Content, convergent, discriminant validities
29.
Estimating Reliability:
Test-Retest Method
• 1) Administer test to group of individuals
• 2) Re-administer same test to same group
• 3) Correlate scores on two administrations
• The correlation between scores on the two
administrations is an estimate of test reliability
(aka, temporal stability)
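The three steps above can be sketched with simulated data, assuming scores are true score plus independent error on each occasion (all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
true_score = rng.normal(50.0, 10.0, 200)

# Steps 1-2: two administrations of the same test to the same group,
# each observed score = true score + independent measurement error.
time1 = true_score + rng.normal(0.0, 5.0, 200)
time2 = true_score + rng.normal(0.0, 5.0, 200)

# Step 3: the correlation between administrations estimates reliability.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
```

With these variances, the expected reliability is 100 / (100 + 25) = .80, consistent with the true-to-observed variance ratio definition.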
30. Ch. 3
Potential Problems with the
Test-Retest Method
• True change in person; skill increases or decreases
– Especially likely if long time passes
• Carry-over: remember answers from first testing
– Especially if little time passed
• Practice effects (reactivity): people improve with
repeated testing
– Look up answers they think they missed
– Perhaps get coaching
• Only appropriate for stable attributes
32. Ch. 3
Estimating Reliability:
Alternate/Parallel Forms Method
1) Develop two tests that are as equivalent as
possible in content, format, and psychometric
properties
2) Administer both forms to a group of individuals
3) Correlate scores on alternate forms
The correlation between scores on alternate forms
is an estimate of the reliability of either form.
33. Ch. 3
Advantages of the
Alternate Forms Method
• Carry-over
– Reduced, because the questions on the two forms differ
– Can immediately administer the second form; reduces the
chance of true change
• Reactivity
– Partially controlled
– First test may influence answers on second, but effect
won’t be as strong as if same test used
34. Ch. 3
Disadvantages of the
Alternate Forms Method
• Two administrations still required
• Difficult and expensive to develop alternate forms
• Consecutive administration of forms eliminates
time lag, but can cause examinee fatigue, which
affects reliability
36.
Estimating Reliability:
Split-Half Methods
• 1) Administer test to a group of individuals
• 2) Divide test into halves, viewed as alternate
forms
• 3) Correlate scores on each half to get the
reliability of either half-test
• 4) Apply correction formula (Spearman-Brown) to
estimate reliability of whole test.
r_XX' = 2 · r_xx' / (1 + r_xx')

(Spearman-Brown: r_xx' is the half-test correlation,
r_XX' the estimated full-test reliability)
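Steps 2–4 can be sketched as a small function using an odd-even split (`split_half_reliability` is a hypothetical name; the simulated data in the usage example are illustrative):

```python
import numpy as np

def split_half_reliability(items):
    """Odd-even split-half reliability with the Spearman-Brown
    correction. `items` is an (n_people, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
    even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    # Spearman-Brown: step the half-test correlation up to full length.
    return 2.0 * r_half / (1.0 + r_half)
```

For perfectly parallel halves (r_half = 1) the correction returns 1; for r_half = .75 it returns about .86, so the whole test is estimated to be more reliable than either half.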
37.
Advantages and Disadvantages of the
Split-Half Methods
• Advantages
– Requires only one test administration
– Carry-over, reactivity, true change minimized
• Disadvantages
– Many ways to form splits; yield different values
• Ordered (first/second half), odd-even (best)
– Not appropriate for speeded tests (items easy enough to
yield all perfect scores if given time)
38. Ch. 3
Coefficient Alpha:
Internal Consistency Methods
• Equals the average over all possible split halves

r_xx' = α = [k / (k − 1)] · [1 − (Σᵢ σ²ᵢ) / σ²_X], where

k represents the number of test items, i = 1, ..., k;
σ²ᵢ represents the variance of the i-th item;
σ²_X represents the variance of the whole test.
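The alpha formula translates directly into code; a minimal sketch (`cronbach_alpha` is a hypothetical helper name):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_people, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # sigma^2_i, per item
    total_var = items.sum(axis=1).var(ddof=1)  # sigma^2_X, whole test
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)
```

As a sanity check: perfectly parallel (identical) items give α = 1, while uncorrelated items give α near 0.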
39.
Advantages and Disadvantages of the
Internal Consistency Methods
• Advantages
– Requires only one test administration
– Carry-over, reactivity, true change minimized
• Disadvantages
– Not appropriate for speeded tests
– Require homogeneity of content
40.
How to Maximize Alpha?
α = [k / (k − 1)] · [1 − (Σᵢ σ²(xᵢ)) / σ²_X]

σ²_X = Σᵢ σ²(xᵢ) + Σᵢ≠ⱼ cov(xᵢ, xⱼ)

⇒ maximize item covariance: the larger the covariances among
the items, the larger σ²_X grows relative to Σᵢ σ²(xᵢ), and the
higher α becomes.
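The decomposition of total-test variance into item variances plus item covariances is easy to verify numerically (the 3-item covariance matrix below is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(7)
items = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.5, 0.5],
         [0.5, 1.0, 0.5],
         [0.5, 0.5, 1.0]],
    size=2000,
)

# Total test variance equals the sum of every entry of the item
# covariance matrix: item variances (diagonal) plus item
# covariances (off-diagonal).
total_var = items.sum(axis=1).var(ddof=1)
cov_sum = np.cov(items, rowvar=False).sum()
```

Since the item variances here are fixed at 1, only larger off-diagonal covariances can grow σ²_X, which is why maximizing alpha means maximizing item covariance.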
41. Reliability Alone is Not Enough
• Does your measure have assesses what it
claims to assess?
• Does it have construct validity? This includes
showing that
– test content is representative of the domain –
content validity
– Scale scores are related to similar scores from
measures of similar variables – convergent validity
– Scale scores are unrelated to scores from irrelevant
measures
42. • Extent to which the test content
adequately represents the meaning of
the concept
• Ask a panel of experts if the items
assess what they are intended to
measure.
• Sometimes called “face validity”
43. • Organizational Attraction is a positive affective feeling that job
seekers hold toward an organization
1. For me, this company would be a good place to work.
2. I would not be interested in this company except as a last
resort. (R)
3. This company is attractive to me as a place for employment.
4. I am interested in learning more about this company.
5. A job at this company is very appealing to me.
1 = Strongly Disagree, 2 = Disagree, 3 = Slightly Disagree,
4 = Neither Agree nor Disagree, 5 = Slightly Agree,
6 = Agree, 7 = Strongly Agree
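Scoring such a scale requires flipping the reverse-keyed item (item 2, marked “(R)”) before averaging. A minimal sketch (`score_attraction` is a hypothetical helper name):

```python
import numpy as np

def score_attraction(responses, reverse=(1,), n_points=7):
    """Mean scale score on a 1..n_points response scale; items listed
    in `reverse` (0-based indices) are reverse-scored first."""
    scored = np.asarray(responses, dtype=float).copy()
    for i in reverse:
        # Reverse-score: 1 becomes n_points, n_points becomes 1, etc.
        scored[..., i] = (n_points + 1) - scored[..., i]
    return scored.mean(axis=-1)

# A highly attracted respondent rates the reverse-keyed item low;
# reverse-scoring flips that 1 to a 7 before averaging.
score = score_attraction([7, 1, 7, 6, 7])
```

Forgetting the reverse-scoring step would both lower the mean score and depress the scale's reliability, since item 2 would then covary negatively with the others.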
44. 1. Define the domain
2. Develop/sample content of measure
◦ Several items or questions that measure
the construct
3. Response scale: how study participants
will respond to the items
◦ E.g., a 5-point scale ranging from “strongly
disagree” to “strongly agree”
4. Estimate the reliability of the measure
◦ Most scales have reliabilities of .70 or higher