2. Scale Development for the Web
• …is just like scale development for paper-and-
pencil instruments, except…
• Diminishing response rates make shorter
scales (3-5 items) more critical
• Having catchy, interesting, easy-to-read item
content also encourages persistence with your
study
• Reduced overall instrument length affects
choices of the generality/breadth of measures
3. Scale Development Steps
• Scale/concept development and definition (literature & researcher)
• Item generation (subject matter experts)
• Item review (subject matter experts)
• Pilot test psychometrics (item variance, internal consistency)
• Cull items based on statistical and judgmental criteria (researcher;
subject matter experts)
• Secondary pilot test with initial evidence of nomological network
(researcher; subject matter experts)
• Preliminary analysis of validation evidence (researcher; subject
matter experts)
• Validation with experimental evidence or multi-trait, multi-method
matrix (researcher; subject matter experts)
• Publication of psychometric and validity evidence (researcher)
4. Scale/concept development and definition
(literature & researcher)
• The development of any scale should begin with
a literature review of related concepts or
constructs
• Based on ideas in the literature the researcher
should develop a definition of the new construct
to be measured
• The new construct should be defined positively
(what it is) and negatively (what it isn’t)
• The rationale for creating the new construct and
measure should be fleshed out at this time
5. Item generation (subject matter experts)
• Armed with the construct definition, a panel of experts
(faculty, students, industry experts, practitioners, etc.)
can generate an initial pool of items
• The pool should contain 5-10 times as many items as
one expects to include in the final measure
• One can use a range of brainstorming techniques to
generate item ideas
• Web surveys can be useful for collecting item ideas!
• The response format should be considered at this time
as well; depending on the construct, a Likert,
frequency, intensity, pair-choice, checklist, semantic
differential or other scale format may be suitable
6. Item review (subject matter experts)
• Generally, after an initial item generation
activity, one should use sorting techniques
to organize the items into factors or banks
• Sorting can also be used for review by new
SMEs; reviewed items can be kept, held for
editing or discarded
• Final item pool should be presented with
appropriate response format to a final set of
SMEs prior to pilot testing
7. Pilot test psychometrics (item variance, internal
consistency)
• Without worrying too much about validity
concerns at this stage, the items should be
fielded for response by a group of appropriate
participants
• Generally, a minimum of responses per item
fielded should be collected
• After item data are collected, screened, and
cleaned, calculate basic item statistics such as
mean, variance, skewness, inter-item
correlations, and internal consistency
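
The item statistics named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a prescribed procedure: the `responses` data are hypothetical, the helper names (`skewness`, `pearson`, `cronbach_alpha`) are invented here, and Cronbach's alpha is assumed as the internal consistency index since the slide does not name one.

```python
from statistics import mean, variance

# Hypothetical pilot data: one row per respondent, one column per item (1-5 Likert).
responses = [
    [4, 5, 4, 2],
    [3, 3, 4, 5],
    [5, 5, 5, 1],
    [2, 2, 3, 4],
    [4, 4, 5, 3],
    [1, 2, 2, 3],
]

def skewness(xs):
    """Moment-based (population) skewness of one item's responses."""
    m, n = mean(xs), len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def cronbach_alpha(rows):
    """Cronbach's alpha from respondent-by-item rows (sample variances)."""
    k = len(rows[0])
    items = list(zip(*rows))
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

items = list(zip(*responses))
for j, col in enumerate(items, start=1):
    print(f"item {j}: mean={mean(col):.2f} var={variance(col):.2f} "
          f"skew={skewness(col):.2f}")
for a in range(len(items)):
    for b in range(a + 1, len(items)):
        print(f"r(item {a + 1}, item {b + 1}) = {pearson(items[a], items[b]):.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```

Items with very low variance or a negative correlation with most other items are the ones to look at first in the culling step.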
8. Cull items based on statistical and judgmental
criteria (researcher; subject matter experts)
• Use the basic statistics to delete (or hold for
editing) those items that performed poorly
• If there are sufficient data, some preliminary work
with exploratory factor analysis can be used to
assess factor purity and make decisions about
whether a unitary or faceted scale is more
desirable
• Items with borderline statistical properties should
be considered for editing by SMEs before
completely discarding: use a combination of
statistical and judgmental criteria to decide
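
One common statistical screen for poorly performing items is the corrected item-total correlation (each item correlated with the sum of the remaining items). The sketch below is illustrative only: the data are hypothetical, the function names are invented here, and the 0.30 cutoff is an assumed rule of thumb rather than a value taken from these slides. Flagged items are candidates for SME review, not automatic deletion.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0

def corrected_item_total(rows):
    """Correlate each item with the sum of the *other* items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, rest))
    return result

def flag_items(rows, cutoff=0.30):
    """Return zero-based indices of items below the (assumed) cutoff."""
    return [j for j, r in enumerate(corrected_item_total(rows)) if r < cutoff]

# Hypothetical pilot data: the third item does not track the other two.
pilot = [
    [1, 1, 3],
    [2, 2, 1],
    [3, 3, 2],
    [4, 4, 3],
    [5, 5, 1],
]

for j, r in enumerate(corrected_item_total(pilot), start=1):
    print(f"item {j}: corrected item-total r = {r:.2f}")
print("held for review (zero-based):", flag_items(pilot))
```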
9. Secondary pilot test with initial evidence of
nomological network (researcher; subject matter
experts)
• The second pilot test will generally be on a diminished set
of items, but not necessarily the final set; there may be
rewritten items that have not been fielded before
• Field the items together with a few other related measures,
some where a strong correlation is expected and some
where no correlation is expected
• Here the demands of statistical power are stronger because
you are looking both for significant correlations with other
measures and some nil correlations as well; demonstrating
a null result requires more statistical power; consult
Cohen’s “A Power Primer” for guidance: use regression
models
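
To get a rough feel for the sample sizes involved, the sketch below estimates the n needed to detect a correlation of a given size with a two-tailed test. It uses the Fisher z approximation, which is an assumption made here for illustration; it is not taken from Cohen's tables, and its results can differ from them by a few cases. The function name `n_for_correlation` is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n to detect correlation r (two-tailed) via Fisher's z."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for the two-tailed test
    z_power = z(power)           # quantile matching the desired power
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)

for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f}: n is roughly {n_for_correlation(r)}")
```

The required n climbs steeply as the correlation of interest shrinks, which is why arguing that a correlation is effectively nil (smaller than some trivially small value) demands far more data than detecting a strong one.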
10. Preliminary analysis of validation evidence
(researcher; subject matter experts)
• This is the final adjustment step prior to an actual
validity run; items can be discarded at this stage,
but any rewriting should be very minimal
• Depending upon the amount of data you have
collected and the maturity of earlier processes, it
is possible to perform confirmatory factor
analysis on these data
• The output of this stage should be a scale that is
considered final and basically ready for
publication (after the collection of another batch
of validity evidence)
11. Validation with experimental evidence or multi-
trait, multi-method matrix (researcher; subject
matter experts)
• This is the “official” validation, whose statistical results will be
reported for publication: give this study as much care and attention as
any substantive study of a research topic
• Experimental validation procedures have several merits; a
manipulated independent variable is not subject to the common
method variance critique; the choice of a manipulation must be
based in theory, hopefully the same theory that was initially used to
define the construct; experimental methods (when successful) help
allay concerns of spurious correlations with other measures
• Short of experimental evidence, another powerful strategy is the
multi-trait, multi-method matrix; it is quite challenging to find
measures captured by alternate methods; MTMM, when successful,
is good for showing how the new measure is uniquely positioned to
avoid capturing variance of unrelated measures while being related
to, but distinct from, similar constructs
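
The logic of reading an MTMM matrix can be sketched as follows: sort each correlation into a block according to whether the two measures share a trait, a method, or neither, then compare the blocks. Everything below is hypothetical (trait labels, method labels, and correlation values are invented for illustration), and the block averages are a simplification of a full MTMM inspection.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measures, each identified by a (trait, method) pair.
measures = [("A", "self"), ("A", "peer"), ("B", "self"), ("B", "peer")]

# Hypothetical correlations between each pair of measures.
correlations = {
    (("A", "self"), ("A", "peer")): 0.55,
    (("A", "self"), ("B", "self")): 0.30,
    (("A", "self"), ("B", "peer")): 0.10,
    (("A", "peer"), ("B", "self")): 0.12,
    (("A", "peer"), ("B", "peer")): 0.25,
    (("B", "self"), ("B", "peer")): 0.50,
}

def classify(m1, m2):
    """Name the MTMM block for a pair of distinct (trait, method) measures."""
    same_trait = m1[0] == m2[0]
    same_method = m1[1] == m2[1]
    if same_trait and not same_method:
        return "convergent"              # monotrait-heteromethod
    if same_method and not same_trait:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"

def summarize(measures, correlations):
    """Average the correlations falling in each MTMM block."""
    blocks = {}
    for m1, m2 in combinations(measures, 2):
        blocks.setdefault(classify(m1, m2), []).append(correlations[(m1, m2)])
    return {name: mean(values) for name, values in blocks.items()}

summary = summarize(measures, correlations)
for name, value in summary.items():
    print(f"{name}: mean r = {value:.3f}")
```

Support for the new measure would show up as convergent correlations that are clearly larger than both heterotrait blocks; heterotrait-monomethod values that rival the convergent ones point to shared method variance.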
12. Publication of psychometric and validity
evidence (researcher)
• Not many new scale developments get this far,
and there is generally a dearth of journals that
will publish validation studies
• Nonetheless, this is the sine qua non of
validation: peer review of the techniques used
to support the goodness and usability of the
new scale