Develop a Scale in 8 Steps

Agenda
• Step 1: Determine what you want to measure
• Step 2: Generate an item pool
• Step 3: Determine a response format
• Step 4: Have experts review the item pool
• Step 5: Inclusion of validation items
• Step 6: Administer items to a development sample
• Step 7: Evaluate the items
• Step 8: Optimize scale length

Step 1: Determine What You Want to Measure
• Theory is key for clarity
– Ground the content of the scale in theories on the
construct of interest
– Limit the bounds of the construct so that it does
not drift into unintended domains
– Specify a theoretical model to guide the scale’s
development
• Can be as simple as a well-formulated definition of the
construct being measured
• Can be as involved as a description of how the new
construct will relate to existing constructs

• Specificity is key to clarity
– Constructs relate better to each other when they
match in levels of specificity
• Do you want your measure to assess very specific
behaviors or be a more global measure of the construct?
• Actively decide the level of specificity that is
appropriate based on the intended use of the scale
– Areas to consider when actively deciding your
scale’s specificity:
• Content domain, setting, population

• Be clear about what to include
– Is your construct distinct from others?
– Does the measure match your goals for its use?
– Avoid using items that might “cross over” into a
related construct
– Be cautious of similar items that may assess very
different phenomena
• Know the frame of reference for and intended purpose
of your scale

Step 2: Generate an Item Pool
• Create and select items with the specific
measurement goal in mind
– Use your description of the scale’s purpose to guide
this process
– Each item is a test of the strength of the latent
variable
– Think creatively about the construct of interest

• Be over inclusive and redundant
– Theoretical models that guide scale development
are based on redundancy
– Content that is common across many items will add up,
while their irrelevant idiosyncrasies tend to cancel out
– Redundancy lets you compare similar items and keep
the one you prefer
– While redundancy is most prevalent in the initial
item pool, some redundancy in the final item pool
is desirable
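A quick simulation can make the aggregation argument concrete. The sketch below assumes a simple model in which every hypothetical item equals the latent variable plus independent noise (the noise level, sample size, and seed are arbitrary illustration values). It shows how the correlation between the summed score and the latent variable climbs as redundant items are added.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sum_score_validity(n_items, n_people=2000, noise_sd=1.5, seed=1):
    """Correlation between a latent variable and the sum of n_items noisy items.

    Assumed model (illustration only): item = latent + independent noise,
    with latent ~ N(0, 1) and noise ~ N(0, noise_sd**2).
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n_people)]
    totals = [sum(t + rng.gauss(0, noise_sd) for _ in range(n_items)) for t in latent]
    return pearson(latent, totals)


for k in (1, 3, 10, 30):
    print(f"{k:>2} items: r(total, latent) = {sum_score_validity(k):.2f}")
```

Under this model the expected correlation is 1 / sqrt(1 + noise_sd² / k): about .55 for a single item, .90 for ten, and .96 for thirty, because the shared latent content adds up across items while the independent noise partly cancels.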

• How many items do you need?
– More than you plan to include in the final scale
– Lots of items increase your chances of good
internal consistency
– Initial pool can be three to four times larger than
the final pool

• Starting the writing process
– Focus less on quality and more on expressing relevant ideas
– Write quickly and uncritically
– Be critical after you have 3 to 4 times as many items as you
need
Identify a variety of ways to state the central concept the scale
is intended to measure
• Paraphrase the construct of interest
• Create additional statements that get at the same idea somewhat
differently
• Seek alternative ways to express important ideas
• Try this:

Activity!
Help Kevin define his loneliness by creating statements that get at the idea somewhat differently!
Try this by paraphrasing the construct or by expressing the idea of loneliness in other ways!

• Good Items
– Unambiguous
– Targets the appropriate
reading level for the
intended sample
– Understandable without
special instructions
– Specific
– Avoid jargon
– Avoid asking opinions
– Avoid biased language
• Bad Items
– Exceptionally lengthy
– Unnecessarily wordy
– Multiple negatives
– Double-barreled items
– Ambiguous pronoun
references
– Nonmonotonic questions

• Positively and negatively worded items
– Positively worded = Items indicating high levels
of the latent variable when endorsed
– Negatively worded = Items indicating low levels
of the latent variable when endorsed
– Purpose of including both in a scale is to counter
acquiescence bias (a tendency to agree with items
regardless of their content)
– Can be confusing to respondents
– Reverse worded items can perform poorly

Why is this item poor?
• A sample item from the Attitudes Towards
Monkeys Scale (ATMS; Hilgeman & Cramer,
2006):
• “I love monkeys because they are furry and
magnanimous.”

• A sample item from the Attitudes on Statistics
Scale (ASS; Cramer & Hilgeman, 2006):
• “I enjoy statistical analysis of complex models
especially when it involves homoscedasticity
and logarithmic data transformations.”

• A sample item from the Chandler Intelligence
Inventory (CI2; Chandler, 2006):
• “Not nobody is as smart as Joe Chandler.”

Step 3: Determine a Response Format
• This step should occur at the same time you
are generating items so they are compatible
• Example response formats:
– Thurstone Paired Comparison Scale

Thurstone Paired Comparison Scale
• Between each pair of things, which one is more important to
you, personally (when buying food in general)?
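Paired-comparison data are commonly turned into interval-level scale values with Thurstone's Case V model. The sketch below is a minimal version under stated assumptions: `wins[i][j]` is a hypothetical count of judges who chose stimulus i over stimulus j, proportions are clipped away from 0 and 1 (the 0.02 bound is an arbitrary choice) so the inverse normal stays finite, and the lowest stimulus is anchored at zero.

```python
from statistics import NormalDist


def thurstone_case_v(wins, n_judges, clip=0.02):
    """Thurstone Case V scale values from a paired-comparison win matrix.

    wins[i][j] = number of judges who preferred stimulus i over stimulus j.
    Returns one scale value per stimulus, with the lowest anchored at 0.
    """
    k = len(wins)
    inv_cdf = NormalDist().inv_cdf
    values = []
    for i in range(k):
        z_sum = 0.0  # the diagonal (i vs. itself, p = .5) contributes z = 0
        for j in range(k):
            if i == j:
                continue
            p = wins[i][j] / n_judges
            p = min(max(p, clip), 1 - clip)  # avoid infinite z for unanimous pairs
            z_sum += inv_cdf(p)
        values.append(z_sum / k)  # mean over all k columns, diagonal included
    lowest = min(values)
    return [v - lowest for v in values]


# Hypothetical data: 50 judges compared price, taste, and nutrition pairwise.
labels = ["price", "taste", "nutrition"]
wins = [
    [0, 30, 20],  # price beat taste 30 times and nutrition 20 times
    [20, 0, 15],
    [30, 35, 0],
]
for label, value in zip(labels, thurstone_case_v(wins, n_judges=50)):
    print(f"{label:<10} {value:.2f}")
```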

• Example response formats (continued):
– Guttman Scaling
“I think the following contains pornographic materials.”

                           Subject
                           A      B      C      Scale value
Adult movies rated XXX     [Yes]  [Yes]  [Yes]  4
Playboy magazine           [Yes]  [Yes]  [No]   3
Lingerie ads               [Yes]  [No]   [No]   2
New York Times             [No]   [No]   [No]   1
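In a Guttman scale the items are cumulative, so a respondent's total score should reproduce their full response pattern. One common check is the coefficient of reproducibility. The sketch below counts errors as departures from the ideal pattern implied by each respondent's total (the Goodenough-Edwards approach). Subjects A to C reproduce the example above with subjects as rows; the extra respondent "D" is a hypothetical non-cumulative case added for contrast.

```python
def reproducibility(responses):
    """Guttman coefficient of reproducibility (Goodenough-Edwards error count).

    responses[s][i] is 1 if subject s endorsed item i, else 0.
    """
    n_subjects, n_items = len(responses), len(responses[0])
    # Cumulative order: most often endorsed item first.
    order = sorted(range(n_items), key=lambda i: -sum(r[i] for r in responses))
    errors = 0
    for r in responses:
        score = sum(r)
        ideal = [1 if rank < score else 0 for rank in range(n_items)]
        observed = [r[i] for i in order]
        errors += sum(o != e for o, e in zip(observed, ideal))
    return 1 - errors / (n_subjects * n_items)


# Items: XXX movies, Playboy, lingerie ads, New York Times (1 = "Yes").
subjects = {
    "A": [1, 1, 1, 0],
    "B": [1, 1, 0, 0],
    "C": [1, 0, 0, 0],
}
print(reproducibility(list(subjects.values())))  # 1.0 (perfectly cumulative)
print(reproducibility(list(subjects.values()) + [[0, 0, 1, 0]]))  # 0.875 (adds "D")
```

Values of about .90 or higher are conventionally read as evidence that the items form a cumulative scale.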

• Equally weighted items
–All items in the scale are viewed as
equivalent “detectors” of the construct of
interest
–They are imperfect indicators but can be
aggregated into an acceptably reliable scale
–Allows for a variety of response options

• Optimum number of response categories
– Variability is important
• Have lots of items
• Have lots of response options within items
– Respondents must be able to meaningfully
discriminate between options
• Ability to discriminate between response options may depend
on the specific wording or physical placement of the
options
– Investigator’s ability and willingness to record a
large number of values for each item

• Optimum number of response categories
(cont’d)
– Odd or even number of response options
depends on the investigator’s purpose
• Odd = implies a central “neutral” point
• Even = forces commitment in one direction
• Neither is superior to the other

• Types of response formats
– Likert Scale
• Item is presented as a declarative statement, followed
by response options
• Response options are worded so they have roughly
equal intervals of agreement
• Used most frequently to measure opinions, attitudes,
beliefs
• Must consider how strongly you should word items in
the initial item pool

• Types of response formats (continued)
–Semantic Differential
• Used in reference to one or more stimuli which
are followed by a list of adjective pairs
representing opposite ends of a continuum
• Adjectives can be bipolar or unipolar
(depending on the intended purpose of the
scale)

• Types of response formats (continued)
– Visual analog scale
• Continuous line between a pair of descriptors
representing opposite ends of a continuum
• Respondent marks the point on the line that represents
their level of what is being measured
• Investigator determines/assigns scores to each point
selected
• Disadvantages = marks at the same point may not mean
the same thing to different individuals
• Advantages = very sensitive and useful for measuring the
construct before and after some intervening event; makes it
hard for respondents to recall and repeat earlier marks, which
reduces response bias across repeated measurements
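Scoring a paper visual analog scale usually comes down to measuring the distance of the mark from one anchor. The sketch below assumes a hypothetical 100 mm line scored 0 to 100 from the left anchor; the line length is a parameter because printed lines vary.

```python
def vas_score(mark_mm, line_mm=100.0):
    """Convert a mark's distance (mm) from the left anchor into a 0-100 score."""
    if not 0 <= mark_mm <= line_mm:
        raise ValueError("the mark must lie on the line")
    return 100.0 * mark_mm / line_mm


# Hypothetical respondent measured before and after an intervening event.
before = vas_score(62)
after = vas_score(41)
print(f"change = {after - before:+.1f} points")
```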

• Item time frames
– When formatting items you need to consider what
time frame will be specified or implied by your
scale
– Not making a reference to a time frame =
implying a universal time perspective
– Choose it actively rather than passively
– Use theory to guide your decision

Step 4: Have Experts Review the Item Pool
• Ask people who are knowledgeable in the
content area to review your initial item pool
– Helps maximize the content validity of your scale
– Confirms or invalidates your definition of the
phenomenon
• Have them rate how relevant they think each
item is to what you intend to measure
– Especially important if you are creating a measure
that will consist of separate scales to measure
multiple constructs

• This step parallels hypothesis testing
– Hypothesis = Your thoughts about what each item
measures
– Data (confirming or disconfirming) = Your
experts’ responses
• How to do it:
– Give them a working definition of the construct
– Ask them to rate the relevance of each item to the
construct as you have defined it
– Ask for comments on individual items (e.g.,
clarity, conciseness, alternative wordings)
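Relevance ratings are easy to summarize per item. One common summary (not named in the slides) is the item-level content validity index: the share of experts who rate an item as relevant. The sketch assumes a 4-point relevance scale on which 3 or 4 counts as relevant; the loneliness items and the ratings are hypothetical.

```python
def item_cvi(ratings, relevant_min=3):
    """Proportion of experts whose rating is at least relevant_min."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)


# Hypothetical ratings from five experts (1 = not relevant ... 4 = highly relevant).
expert_ratings = {
    "I often feel left out.": [4, 4, 3, 4, 3],
    "I enjoy quiet evenings at home.": [2, 1, 3, 2, 2],
}
for item, ratings in expert_ratings.items():
    print(f"{item:<35} I-CVI = {item_cvi(ratings):.2f}")
```

Low-scoring items are candidates for revision or removal, though the decision to drop an item remains yours.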

• Experts can also offer alternative ways to
measure the construct of interest
• Final decision to include or exclude items is
your responsibility
– Experts may not understand principles of scale
construction
– Attend to their suggestions, but make your own
informed decisions about how to actually use their
advice

• Consider running a focus group
– Meet with a small group of individuals to get
detailed feedback on their opinions (~5-10 people)
– Gives you feedback from a sample that is similar
to the sample you will eventually give the scale to
– Especially important if working with “special”
samples (e.g., children, detainees, elderly)

• Things you might do in a focus group
– Identify difficult-to-read items and ask whether they are
confusing or hard to read (checks reading level)
– Identify items you are unsure measure what you intend
and ask participants what the items mean to them
(checks your construct validity)
• Ask “How would you answer this statement?” and “Why
would you answer it that way?”
– For each item, ask whether it is something members of
your eventual sample would say

Step 5: Inclusion of Validation Items
• Sometimes you may want to include
items that will determine the validity
of the final scale
–Items that might detect flaws or
problems (e.g., socially desirable responding)
• May also consider including separate
measures of validity rather than
establishing your own validity items

Step 6: Administer Items
• Administer your initial pool of items along
with construct-related and validity items
• How many participants should you recruit?
– Depends on the length of the scale
• Fewer items require fewer participants
• When the ratio of participants to items is low
correlations among items can be substantially
influenced by chance factors
– Depends on how representative the development
sample is
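To see why small development samples are risky, it helps to look at how uncertain a single correlation is. The sketch below uses Fisher's z transformation to get an approximate 95% confidence interval for an inter-item correlation of .30 (an arbitrary illustrative value) at several sample sizes.

```python
from math import atanh, sqrt, tanh


def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


for n in (30, 100, 300):
    low, high = r_confidence_interval(0.30, n)
    print(f"n = {n:>3}: r = .30, 95% CI [{low:.2f}, {high:.2f}]")
```

With 30 respondents the interval runs from slightly below zero to about .60, so an item's apparent fit can be largely chance; with 300 it narrows to roughly .19 to .40.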

• Possible nonrepresentativeness of the
developmental sample
– Level of the attribute may differ from its level in the
population for which the scale is intended
– Sample is qualitatively different from the target
population
• The underlying structure that emerges may be a property
of the development sample rather than of the construct

Step 7: Evaluate the Items
• The ultimate quality sought in an item is a high correlation
with the true score of the latent variable
– We can make inferences about this relation by
examining the correlations among items
– Higher correlations among items → higher
individual item reliabilities
– More reliable individual items → more reliable
scale
– We therefore want items to be highly
intercorrelated in a correlation matrix
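Inspecting the inter-item correlation matrix is the first check. A minimal sketch in plain Python, assuming responses are stored with one row per respondent and one column per item (the five-respondent data set is hypothetical):

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(data):
    """data: rows = respondents, columns = items. Returns a k x k matrix."""
    columns = list(zip(*data))
    k = len(columns)
    return [[1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]


def mean_inter_item_r(data):
    """Average of the off-diagonal (item-pair) correlations."""
    r = correlation_matrix(data)
    pairs = list(combinations(range(len(r)), 2))
    return sum(r[i][j] for i, j in pairs) / len(pairs)


# Hypothetical 1-5 responses from five people to three items.
data = [[1, 2, 1], [2, 1, 3], [3, 4, 3], [4, 3, 4], [5, 5, 4]]
for row in correlation_matrix(data):
    print(" ".join(f"{v:5.2f}" for v in row))
print(f"mean inter-item r = {mean_inter_item_r(data):.2f}")
```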

• Reverse scored items
– Items may have verbal descriptors for the
response options in the same order but reverse the
numbers associated with the options
– Keep both the verbal descriptors and the numbers in the
same order for every item, but enter reversed values at
the time of data entry
• Error prone and tedious method
– Reverse score the items electronically
• Easiest and least error prone method
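Reverse scoring electronically is a one-line transformation: on a scale running from `low` to `high`, a response x becomes (low + high) - x. A minimal sketch, assuming a 1-5 response scale and a hypothetical set of negatively worded item names:

```python
def reverse_score(value, low=1, high=5):
    """Reverse-score one response on a low..high response scale."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} response range")
    return (low + high) - value


# Hypothetical responses; "q2_neg" is a negatively worded item.
responses = {"q1": 4, "q2_neg": 2, "q3": 5}
negatively_worded = {"q2_neg"}

scored = {
    item: reverse_score(value) if item in negatively_worded else value
    for item, value in responses.items()
}
print(scored)  # {'q1': 4, 'q2_neg': 4, 'q3': 5}
```

Keeping the list of reversed items in one place, rather than recoding by hand at data entry, is what makes this the least error-prone option.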

• Item-scale correlations
– We want highly intercorrelated items so we need
each individual item to correlate substantially with
the collection of remaining items
• Two types of item-scale correlations
– Corrected = correlates the item being evaluated with
the total of all other scale items, excluding itself
– Uncorrected = correlates the item being evaluated
with the total of all scale items, including itself
• Inflated, because the item contributes to the total
• Tells how representative the item is of the whole scale
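Both kinds of item-scale correlation can be computed directly. In the sketch below (plain Python, one row per respondent, hypothetical data), the corrected version correlates each item with the total of the remaining items, so the item's own variance does not inflate the result.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(data):
    """Return (uncorrected, corrected) item-total correlations for each item.

    data: rows = respondents, columns = items (already reverse-scored).
    """
    totals = [sum(row) for row in data]
    uncorrected, corrected = [], []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        rest = [t - x for t, x in zip(totals, item)]  # total without item j
        uncorrected.append(pearson(item, totals))
        corrected.append(pearson(item, rest))
    return uncorrected, corrected


data = [[1, 2, 1], [2, 1, 3], [3, 4, 3], [4, 3, 4], [5, 5, 4]]
unc, cor = item_total_correlations(data)
for j, (u, c) in enumerate(zip(unc, cor), start=1):
    print(f"item {j}: uncorrected = {u:.2f}, corrected = {c:.2f}")
```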

• Item variances
– We want scale items with relatively high variances
– A development sample that is diverse with respect
to the attribute of interest will provide a range of
scores for any given item (i.e., good variance)
• Item means
– We want means close to the center of the range of
possible scores
– Items whose means are too near the extremes will have
low variances → poor correlations with other items
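Item means and variances are quick to screen. The sketch below reports each item's mean, sample variance, and the distance of its mean from the midpoint of the response range; the 1-5 range and the data are assumptions for illustration, and what counts as "too near the extremes" is left to your judgment.

```python
from statistics import mean, variance


def describe_items(data, low=1, high=5):
    """Mean, sample variance, and |mean - midpoint| for each item column."""
    midpoint = (low + high) / 2
    report = []
    for j in range(len(data[0])):
        column = [row[j] for row in data]
        m = mean(column)
        report.append({
            "item": j + 1,
            "mean": m,
            "variance": variance(column),
            "distance_from_mid": abs(m - midpoint),
        })
    return report


# Hypothetical 1-5 responses; item 2 sits near the ceiling.
data = [[1, 5], [3, 5], [5, 5], [3, 4]]
for stats in describe_items(data):
    print(stats)
```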

• Coefficient alpha
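–Indicator of a scale’s reliability (how
successful you’ve been)
–Typically ranges from 0.0 to 1.0 (it can be negative
when items covary negatively)
– .70 is commonly treated as the minimum acceptable value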
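Coefficient alpha can be computed from the item variances and the variance of the total score: alpha = k / (k - 1) × (1 - Σ item variances / total-score variance), where k is the number of items. A minimal sketch (plain Python, one row per respondent, hypothetical data):

```python
from statistics import variance


def cronbach_alpha(data):
    """Cronbach's alpha for data with rows = respondents, columns = items.

    Assumes negatively worded items have already been reverse-scored.
    """
    k = len(data[0])
    item_variances = [variance([row[j] for row in data]) for j in range(k)]
    total_variance = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


data = [[1, 2, 1], [2, 1, 3], [3, 4, 3], [4, 3, 4], [5, 5, 4]]
print(f"alpha = {cronbach_alpha(data):.2f}")
```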

Step 8: Optimize Scale Length
• Scale length affects reliability
–Alpha is influenced by the degree of
covariation among items and the number of
items in the scale
–For items with average inter-item correlations,
adding items will increase alpha and removing
items will decrease alpha
–Shorter scales are less burdensome to
participants
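The joint effect of item count and covariation is easiest to see in the standardized form of alpha, alpha = k·r̄ / (1 + (k - 1)·r̄), where r̄ is the average inter-item correlation (the same form as the Spearman-Brown formula). The sketch holds r̄ at an assumed .25 and varies k:

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


for k in (5, 10, 20, 40):
    print(f"{k:>2} items: alpha = {standardized_alpha(k, 0.25):.3f}")
```

With r̄ = .25, alpha rises from .625 with 5 items to about .769, .870, and .930 with 10, 20, and 40 items. The returns diminish, which is why the gain in reliability has to be weighed against the burden of a longer scale.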

• Dropping bad items
– Dropping items whose correlations with the other items
are sufficiently below average will raise alpha
– Retaining items with only slightly below-average
correlations can yield a higher alpha than dropping
them, because the loss of scale length outweighs the gain
• Adjusting scale length
– Use reliability analyses in SPSS (e.g., alpha if item
deleted) to decide
– Items with the lowest item-scale correlations should
be dropped first
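The "alpha if item deleted" figure that reliability procedures such as SPSS report can be reproduced by recomputing alpha with each item left out in turn. A plain-Python sketch with hypothetical data, in which item 3 is deliberately unrelated to the other two:

```python
from statistics import variance


def cronbach_alpha(data):
    """Cronbach's alpha; rows = respondents, columns = items."""
    k = len(data[0])
    item_variances = [variance([row[j] for row in data]) for j in range(k)]
    total_variance = variance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def alpha_if_deleted(data):
    """Alpha recomputed with each item removed in turn (needs 3+ items)."""
    k = len(data[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != drop] for row in data])
        for drop in range(k)
    ]


data = [[1, 1, 3], [2, 2, 1], [3, 3, 4], [4, 4, 1], [5, 5, 5]]
print(f"full scale: alpha = {cronbach_alpha(data):.2f}")
for j, a in enumerate(alpha_if_deleted(data), start=1):
    print(f"without item {j}: alpha = {a:.2f}")
```

Here dropping item 3 raises alpha, while dropping either of the other items lowers it, matching the rule that items with the weakest correlations are the first candidates for removal.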

• Adjusting scale length (continued)
– Communality (i.e., squared multiple correlation) =
extent to which the item shares variance with the
other items
• Items with low communality estimates should be
dropped
– Should see convergence across these methods
– Also consider that alpha becomes a more dependable
estimate of reliability as the number of items increases
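Squared multiple correlations can be computed from the inverse of the inter-item correlation matrix: SMC_i = 1 - 1 / (R⁻¹)_ii. The sketch below uses a small Gauss-Jordan inversion so it has no dependencies; it assumes the correlation matrix is nonsingular, and the example matrix is hypothetical. In practice a numerical library would handle the inversion.

```python
def invert(matrix):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """SMC of each item with all other items: 1 - 1 / (R^-1)_ii."""
    inv = invert(corr)
    return [1 - 1 / inv[i][i] for i in range(len(corr))]


# Hypothetical correlations; item 4 shares little variance with the rest.
R = [
    [1.00, 0.55, 0.50, 0.10],
    [0.55, 1.00, 0.45, 0.05],
    [0.50, 0.45, 1.00, 0.15],
    [0.10, 0.05, 0.15, 1.00],
]
for j, smc in enumerate(squared_multiple_correlations(R), start=1):
    print(f"item {j}: SMC = {smc:.2f}")
```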

• Split samples
– A large developmental sample may be split into
two samples
• First sample used to compute alpha, evaluate items,
adjust length, arriving at your final item set
• Second sample used to replicate findings
• Consistency across the two samples gives you
confidence in your estimates
– Limitations of this approach
• The two halves are not separated by time
• Any special conditions during data collection apply to both halves
• Both halves responded to the longer initial item pool, not to the final scale
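A random split is simple to do reproducibly. The sketch below shuffles respondents with a fixed seed (an arbitrary value, recorded so the split can be repeated) and returns a development half and a replication half; alpha and the item statistics would then be computed separately in each.

```python
import random


def split_sample(rows, seed=2024):
    """Randomly split respondents into development and replication halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


respondents = [f"R{i:03d}" for i in range(1, 201)]  # hypothetical IDs
development, replication = split_sample(respondents)
print(len(development), len(replication))  # 100 100
```

A random split does not remove the limitations above: both halves still come from the same administration.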
