CH. 7
 Scaling is a procedure for the assignment of
numbers (or other symbols) to a property of
objects in order to impart some of the
characteristics of numbers to the properties in
question
Four Scales of Measurement;
 Nominal scales
 Ordinal scales
 Interval scales
 Ratio scales
Measurement and Scaling
 A scale is a mechanism by which individuals are
distinguished as to how they differ from one another on the
variables of interest.
 A scale is a continuous series of categories: any
series of items arranged progressively according to
value or magnitude, into which an item can be placed
according to its quantification.
 Four popular scales in business research are:
1. Nominal scales
2. Ordinal scales
3. Interval scales
4. Ratio scales
SCALES
 Nominal Scales: splits data into groups, e.g., men,
women
 Ordinal Scales: ranks data in some order, e.g.,
exercising for 20 minutes is good, for 30 minutes is
better, for 40 minutes is best
 Interval Scales: place data on a continuum, e.g.,
a rating from 1 (very low) to 5 (very high)
 Ratio Scales: start with an absolute zero and indicate
proportion, e.g., on a scale of 0, 5, 10, ten is twice as big as five
 A Nominal Scale is the simplest of the four
scale types: the numbers or letters
assigned to objects serve as labels for
identification or classification
 Example: variable of gender
 Males = 1, Females = 2
 Sales Zone A = Islamabad, Sales Zone B = Rawalpindi
 Drink A = Pepsi Cola, Drink B = 7-Up, Drink C = Mirinda
Measurement and Scaling
 An Ordinal Scale is one that arranges objects or
alternatives according to their magnitude
 Examples:
 Career Opportunities = Moderate, Good, Excellent
 Investment Climate = Bad, inadequate, fair, good, very
good
 Merit = A grade, B grade, C grade, D grade
A problem with ordinal scales is that the difference
between categories on the scale is hard to quantify, i.e.,
excellent is better than good, but how much better is
excellent?
Measurement and Scaling
 An Interval Scale allows us to perform certain arithmetical
operations on the data collected from respondents. This scale
measures the distance between any two points on the scale
 It taps the differences and the magnitudes of the differences in
the variable
Measurement and Scaling
 A Ratio Scale is a scale that possesses absolute
rather than relative qualities and has an absolute
zero point.
 Examples:
 Money
 Weight
 Distance
 Temperature on the Kelvin Scale
Interval scales allow comparisons of the differences of
magnitude (e.g. of attitudes) as well as determinations of
the actual strength of the magnitude
Measurement and Scaling
Type of Scale | Numerical Operation | Descriptive Statistics
Nominal | Counting | Frequency in each category, percentage in each category, mode
Ordinal | Rank ordering | Median, range, percentile ranking
Interval | Arithmetic operations on intervals between numbers | Mean, standard deviation, variance
Ratio | Arithmetic operations on actual quantities | Geometric mean, coefficient of variation
Measurement and Scaling
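The permissible statistics for each scale level can be computed directly. A minimal Python sketch, using hypothetical response data (the variable names and values are illustrative assumptions, not from the slides):

```python
from statistics import mean, median, mode, pstdev
from math import prod

# Hypothetical responses, one list per measurement level (illustrative only).
gender = [1, 2, 2, 1, 2]             # nominal: 1 = male, 2 = female
ranks = [1, 3, 2, 5, 4]              # ordinal: preference ranks
ratings = [3, 4, 5, 4, 2]            # interval: 1-5 rating scale
incomes = [40_000, 55_000, 52_000, 61_000, 48_000]  # ratio: true zero point

# Nominal: counting only -> mode (most frequent category).
print(mode(gender))

# Ordinal: rank ordering -> median and range.
print(median(ranks), max(ranks) - min(ranks))

# Interval: arithmetic on intervals -> mean, standard deviation.
print(mean(ratings), pstdev(ratings))

# Ratio: arithmetic on actual quantities -> geometric mean,
# coefficient of variation.
geometric_mean = prod(incomes) ** (1 / len(incomes))
cv = pstdev(incomes) / mean(incomes)
print(round(geometric_mean, 1), round(cv, 3))
```

Note that each level supports every statistic permitted at the levels below it: the mean is meaningful for ratio data, but the mode is the only central-tendency measure meaningful for nominal data.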
Illustration: ranking runners in a race
 Nominal: numbers assigned to runners (e.g., 7, 8, 3)
 Ordinal: rank order of winners (first, second, third place)
 Interval: performance rating on a 0 to 10 scale (e.g., 9.6, 9.1, 8.2)
 Ratio: time to finish, in seconds (e.g., 13.4, 14.1, 15.2)
Four Scales of Measurement;
Classification of Scaling Techniques;
Scales
 Nominal: dichotomous, category
 Ordinal: fixed sum, graphic rating
 Interval: Likert, semantic differential, numerical, itemized rating, Stapel
 Ratio
Nominal scales focus on only requiring a
respondent to provide some type of
descriptor as the raw response
Example.
Please indicate your current marital status.
__Married __ Single __ Single, never married __ Widowed
Four Scales of Measurement;
Ordinal scales allow the respondent to
express “relative magnitude” between the raw
responses to a question
Example.
Which one statement best describes your opinion of an Intel PC
processor?
__ Higher than AMD’s PC processor
__ About the same as AMD’s PC processor
__ Lower than AMD’s PC processor
Four Scales of Measurement;
 Interval scales demonstrate the absolute
differences between each scale point
Example.
How likely are you to recommend the new phone to a friend?
Definitely will not Definitely will
1 2 3 4 5 6 7
Four Scales of Measurement;
Ratio scales allow for the identification of
absolute differences between each scale point,
and absolute comparisons between raw
responses
Example 1.
Please circle the number of children under 18 years of age
currently living in your household.
0 1 2 3 4 5 6 7 (if more than 7, please specify ___.)
Four Scales of Measurement;
Chapter 7
MEASUREMENT:
SCALING, RELIABILITY,
VALIDITY
Rating scales
 Have several response categories and are used to obtain
responses with regard to the object, event, or person studied.
Ranking scales
 Make comparisons between or among objects, events,
persons and extract the preferred choices and ranking
among them.
Methods of Scaling;
Measurement scales that allow a respondent to register
the degree (or amount) of a characteristic or attribute
possessed by an object directly on the scale.
Rating Scales;
1. Dichotomous scale
2. Category scale
3. Likert scale
4. Numerical scales
5. Semantic differential scale
6. Itemized rating scale
7. Constant sum scale
8. Stapel scale
9. Graphic scale
10. Consensus scale
Types of Rating Scale Formats:
Dichotomous scale
 Is used to obtain a Yes or No answer.
 Nominal scale
Do you own a car?
 Yes
 No
Rating Scales Formats;
Category scale
 Uses multiple items to elicit a single response.
 Nominal scale
Rating Scales Formats;
A Category rating scale is one in which the response
options provided for a closed-ended question are labeled
with specific verbal descriptions.
Example:
Please rate car model A on each of the following
dimensions:
Poor Fair Good V. good Excellent
a) Durability [ ] [ ] [ ] [ ] [ ]
b) Fuel consumption [ ] [ ] [ ] [ ] [ ]
Rating Scales Formats;
A simple category scale with only two response categories
(or scale points) both of which are labeled.
Example:
Please rate brand A on each of the following dimensions:
poor excellent
a) Durability [ ] [ ]
b) Fuel consumption [ ] [ ]
Rating Scales Formats;
Likert scale
 Is designed to examine how strongly subjects
agree or disagree with statements on a
5-point scale.
 Interval scale
Rating Scales Formats;
The Likert Scale (Summated Ratings Scale)
 A multiple item rating scale in which the degree of an attribute
possessed by an object is determined by asking respondents to
agree or disagree with a series of positive and/or negative
statements describing the object.
 Example: Attitude toward buying from the Internet
(Totally disagree / Disagree / Neutral / Agree / Totally agree)
a) Shopping takes much longer on the Internet [ ] [ ] [ ] [ ] [ ]
b) It is a good thing that Saudi consumers have the
opportunity to buy products through the Internet [ ] [ ] [ ] [ ] [ ]
c) Buying products over the Internet is not a
sensible thing to do [ ] [ ] [ ] [ ] [ ]
Rating Scales Formats;
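A summated Likert score is obtained by reverse-coding negatively worded statements before adding the item scores. A minimal Python sketch; the items, responses, and 5-point-scale assumption are illustrative, not from the slides:

```python
# Summated (Likert) scoring sketch. Items, responses, and the assumption
# of a 5-point scale are illustrative.
SCALE_MAX = 5  # 1 = totally disagree ... 5 = totally agree

def summated_score(responses, negative_items):
    """Sum the item scores, reverse-coding negatively worded statements."""
    total = 0
    for item, score in responses.items():
        if item in negative_items:
            score = SCALE_MAX + 1 - score  # 5 -> 1, 4 -> 2, ...
        total += score
    return total

responses = {"a": 2, "b": 4, "c": 1}       # one respondent's answers
negative = {"a", "c"}                      # statements worded against the attitude
print(summated_score(responses, negative)) # higher = more favorable attitude
```

Without the reverse-coding step, agreement with a negative statement ("Buying products over the Internet is not a sensible thing to do") would wrongly raise the favorability score.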
Likert scale
My work is very interesting
 Strongly disagree
 Disagree
 Neither agree nor disagree
 Agree
 Strongly agree
Rating Scales Formats;
Semantic differential scale
 Several bipolar attributes are identified at the
extremes of the scale, and respondents are asked to
indicate their attitudes.
 Interval scale
Rating Scales Formats;
A Semantic Differential rating scale is one in which bipolar adjectives
are placed at both ends (or poles) of the scale, and response
options are expressed as “semantic” space.
Example:
Please rate car model A on each of the following dimensions:
Durable ---:-X-:---:---:---:---:--- Not durable
Low fuel consumption ---:---:---:---:---:-X-:--- High fuel consumption
Rating Scales Formats;
Numerical scale
 Similar to the semantic differential scale, with the difference
that numbers on a 5-point or 7-point scale are provided, with
bipolar adjectives at both ends.
 Interval scale
Poor Excellent
Durability 1 2 3 4 5 6 7
Durable Not Durable
Durability 1 2 3 4 5 6 7
Rating Scales Formats;
Itemized rating scale
 A 5-point or 7-point scale with anchors, as needed, is
provided for each item and the respondent states the
appropriate number on the side of each item, or circles the
relevant number against each item.
 Interval scale
I will be changing my job within the next 12 months
1 2 3 4 5
Very Unlikely / Unlikely / Neither Unlikely Nor Likely / Likely / Very Likely
Rating Scales Formats;
Fixed or constant sum scale
 The respondents are here asked to distribute a given number
of points across various items.
 Ordinal scale
Rating Scales Formats;
 A Constant-Sum rating scale is one in which respondents divide a
constant sum among different attributes of an object (usually to
indicate the relative importance of each attribute).
 Assumed to have ratio level properties.
 Example: Divide 100 points among the following dimensions to
indicate their level of importance to you when you purchase a
car:
Durability
Fuel Consumption
Total 100
Rating Scales Formats;
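Constant-sum responses are only usable if the allocations actually add up to the fixed total. A minimal sketch of a pre-analysis validation check; the attribute names are illustrative assumptions:

```python
# Hypothetical validation of a constant-sum response before analysis:
# allocations must be non-negative and add up to the fixed total.
def valid_constant_sum(allocation, total=100):
    return (all(points >= 0 for points in allocation.values())
            and sum(allocation.values()) == total)

print(valid_constant_sum({"durability": 60, "fuel consumption": 40}))  # valid
print(valid_constant_sum({"durability": 70, "fuel consumption": 40}))  # sums to 110
```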
Stapel scale
 This scale simultaneously measures both the direction and
intensity of the attitude toward the items under study.
 A simplified version of the semantic differential scale in which
a single adjective or descriptive phrase is used instead of
bipolar adjectives.
 Interval data
Model A
-3 -2 -1 Durable Car 1 2 3
-3 -2 -1 Good Fuel Consumption 1 2 3
Rating Scales Formats;
The Stapel scale is a unipolar rating scale with ten categories
numbered from -5 to +5, without a neutral point (zero). This scale
is usually presented vertically.
Example (SEARS): each descriptor has its own +5 to -5 column,
marked with an X at the chosen point
 HIGH QUALITY: X at -4
 POOR SERVICE: X at +2
The data obtained by using a Stapel scale can be analyzed in the
same way as semantic differential data.
Rating Scales Formats;
Graphic rating scale
 A graphical representation helps respondents indicate their
answers to a particular question by placing a mark at the
appropriate point on the line.
 Rating scales in which respondents rate an object on a
graphic continuum, usually a straight line.
 Modified versions are the ladder scale and happy face scale.
 Ordinal scale
Rating Scales Formats;
Graphic Rating Scales
Rating Scales Formats;
Paired Comparison
 Used when, among a small number of objects, respondents are
asked to choose between two objects at a time.
Example: Choose any combination
Package -A 512 kbps 8 GB Rs: 750
Package -B 1 Mbps 8 GB Rs: 850
Package -C 512 Kbps 12 GB Rs: 900
Package -D 1 Mbps 12 GB Rs: 1000
Rating Scales Formats;
Ranking Scales Formats;
Forced Choice
 Enables respondents to rank objects relative to one another,
among the alternatives provided.
Ranking Scales Formats;
Comparative Scale
 Provides a benchmark or a point of reference to assess
attitudes toward the current object, event, or situation under
study.
Ranking Scales Formats;
Characteristics of Different Types of Rating Scales

Rating Scale | Subject must: | Advantages | Disadvantages
2. Category scale | Indicate a response category | Flexible, easy to respond | Ambiguous items, few categories, only gross distinctions
3. Likert scale | Evaluate statements on a 5-point scale | Easiest scale to construct | Hard to judge what a single score means
4. Semantic differential and numerical scales | Choose points between bipolar adjectives on relative dimensions | Easy to construct, norms exist for comparison, e.g., profile analysis | Bipolar adjectives must be found, data may be ordinal, not interval
5. Constant sum scale | Divide a constant sum among response alternatives | Scale approximates an interval measure | Difficult for respondents with low education levels
6. Stapel scale | Choose point on scale with one center adjective | Easier to construct than semantic differential | Endpoints are numerical, not verbal
7. Graphic scale | Choose a point on a continuum | Visual impact, unlimited scale points | No standard answers
8. Graphic scale (picture response) | Choose a visual picture | Visual impact, easy for poor readers | Hard to attach a verbal explanation to response
Goodness of Measures
Goodness of Measures;
Understanding Validity and Reliability
Figure 8.1: Illustrations of Possible Reliability and Validity
Situations in Measurement
 Situation 1: Neither reliable nor valid
 Situation 2: Highly reliable but not valid
 Situation 3: Highly reliable and valid
Testing Goodness of Measures: Forms of Reliability and Validity

Goodness of data
 Reliability (accuracy in measurement)
  Stability: test-retest reliability, parallel-form reliability
  Consistency: interitem consistency reliability, split-half reliability
 Validity (are we measuring the right thing?)
  Face validity
  Logical (content) validity
  Congruent (construct) validity: convergent, discriminant
  Criterion-related validity: predictive, concurrent
Goodness of Measures
 It is important to make sure that the instrument that we develop to
measure a particular concept is indeed accurately measuring the
variable, and that in fact, we are actually measuring the concept
that we set out to measure.
 This ensures that in operationally defining perceptual and
attitudinal variables, we have not overlooked some important
dimensions and elements or included some irrelevant ones.
Goodness of Measures;
Item Analysis
 Item analysis is done to see whether the items in the instrument
belong there or not.
 Each item is examined for its ability to discriminate between those
subjects whose total scores are high and those with low scores.
 In item analysis, the means of the high-score group and the
low-score group are tested to detect significant differences
through the t-values.
 The items with a high t-value (i.e., those that discriminate well
between the two groups) are then included in the instrument.
Goodness of Measures;
Reliability
 The reliability of a measure indicates the extent to which it
is without bias (error free) and hence ensures consistent
measurement across time and across the various items in
the instrument.
 In other words, the reliability of a measure is an indication
of the stability and consistency with which the instrument
measures the concept and helps to assess the “goodness”
of a measure.
Goodness of Measures;
Stability of Measures
 The ability of a measure to remain the same over time —despite
uncontrollable testing conditions or the state of the respondents
themselves—is indicative of its stability and low vulnerability to
changes in the situation.
 This attests to its “goodness” because the concept is stably
measured, no matter when it is done. Two tests of stability are
test-retest reliability and parallel-form reliability.
Goodness of Measures;
Test-Retest Reliability
 The reliability coefficient obtained with a repetition of the same
measure on a second occasion is called test-retest reliability.
 That is, when a questionnaire is administered to a set of
respondents now, and again to the same respondents, say,
several weeks to six months later, the correlation between
the scores obtained at the two different times from the same
set of respondents is called the test-retest coefficient.
 The higher it is, the better the test-retest reliability, and
consequently, the stability of the measure across time.
Goodness of Measures;
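The test-retest coefficient is simply the Pearson correlation between the two administrations. A minimal Python sketch with hypothetical scores (the data are illustrative assumptions):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores from the same respondents at two points in time.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
print(round(pearson(time1, time2), 3))  # the test-retest coefficient
```

The closer the coefficient is to 1, the more stable the measure across time.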
Parallel-Form Reliability
 When responses on two comparable sets of measures tapping
the same construct are highly correlated, we have parallel-form
reliability.
 Both forms have similar items and the same response format, the
only changes being the wordings and the order or sequence of
the questions.
 What we try to establish here is the error variability resulting
from wording and ordering of the questions.
 If two such comparable forms are highly correlated the measures
are reasonably reliable.
Goodness of Measures;
Interitem Consistency Reliability
 This is a test of the consistency of respondents’ answers to all
the items in a measure.
 To the degree that items are independent measures of the same
concept, they will be correlated with one another.
 The most popular test of interitem consistency reliability is
Cronbach’s coefficient alpha (Cronbach, 1951), used for
multipoint-scaled items; the Kuder-Richardson formulas
(Kuder & Richardson, 1937) are used for dichotomous items.
 The higher the coefficients, the better the measuring instrument.
Goodness of Measures;
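Cronbach's alpha can be computed from the item variances and the variance of the respondents' total scores: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals), where k is the number of items. A sketch with hypothetical data (the item scores are illustrative assumptions):

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list of scores per item, all from the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    item_var_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Three hypothetical 5-point items answered by five respondents.
item_scores = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 5],
    [4, 2, 5, 3, 4],
]
print(round(cronbach_alpha(item_scores), 3))
```

When the items move together across respondents, the variance of the totals grows relative to the sum of the item variances, pushing alpha toward 1.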
Split-Half Reliability
 Split-half reliability reflects the correlations between two halves
of an instrument.
 The estimates would vary depending on how the items in the
measure are split into two halves.
 Split-half reliabilities could be higher than Cronbach’s alpha only
in the circumstance of there being more than one underlying
response dimension tapped by the measure and when certain
other conditions are met as well.
 Hence, in almost all cases, Cronbach’s alpha can be considered
a perfectly adequate index of the interitem consistency reliability.

Goodness of Measures;
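One common way to split the instrument is into odd- and even-numbered items; the half-test correlation is then stepped up to full test length with the Spearman-Brown formula, 2r / (1 + r). A sketch with hypothetical data (the odd-even split and the item scores are illustrative assumptions):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

def split_half_reliability(items):
    """Correlate odd- and even-numbered item totals, then apply the
    Spearman-Brown correction for full test length."""
    odd = [sum(scores) for scores in zip(*items[0::2])]
    even = [sum(scores) for scores in zip(*items[1::2])]
    r = pearson(odd, even)
    return 2 * r / (1 + r)  # Spearman-Brown step-up

# Four hypothetical items, five respondents each.
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 5],
    [4, 2, 5, 3, 4],
    [5, 2, 4, 3, 5],
]
print(round(split_half_reliability(items), 3))
```

As the slides note, the estimate depends on how the items are split; a different split of the same items can yield a different coefficient, which is one reason Cronbach's alpha is usually preferred.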
Understanding Validity and Reliability
5. Validity
Several types of validity tests are used to test the goodness of measures and
writers use different terms to denote them. For the sake of clarity, we may
group validity tests under three broad headings: content validity,
criterion-related validity, and construct validity.
5.1 Content Validity
Content validity ensures that the measure includes an adequate and
representative set of items that tap the concept. The more the scale items
represent the domain or universe of the concept being measured, the greater
the content validity. To put it differently, content validity is a function of how
well the dimensions and elements of a concept have been delineated.
Face validity is considered by some as a basic and very minimum index of
content validity. Face validity indicates that the items intended to
measure a concept do, on the face of it, look like they measure the concept.
Goodness of Measures;
Criterion-Related Validity
 Criterion-related validity is established when the measure differentiates
individuals on a criterion it is expected to predict. This can be done by
establishing concurrent validity or predictive validity.
 Concurrent validity is established when the scale discriminates individuals
who are known to be different; that is, they should score differently on the
instrument.
Goodness of Measures;
5.3 Construct Validity
Construct validity testifies to how well the results obtained from the use of the
measure fit the theories around which the test is designed. This is assessed
through convergent and discriminant validity, which are explained below.
Convergent validity is established when the scores obtained with two different
instruments measuring the same concept are highly correlated.
Discriminant Validity is established when, based on theory, two variables are
predicted to be uncorrelated, and the scores obtained by measuring them are
indeed empirically found to be so.
Goodness of Measures;
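Both convergent and discriminant validity checks reduce to inspecting correlations: scores from two instruments tapping the same concept should correlate highly, while scores on theoretically unrelated variables should not. A sketch with hypothetical, constructed scores (all data are illustrative assumptions):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

# Hypothetical scores: two instruments measuring the same concept
# (convergent check) and a theoretically unrelated variable
# (discriminant check).
instrument_a = [10, 14, 8, 16, 12]
instrument_b = [11, 15, 9, 17, 13]
unrelated = [5, 5, 2, 2, 4]

print(round(pearson(instrument_a, instrument_b), 3))  # high -> convergent evidence
print(round(pearson(instrument_a, unrelated), 3))     # low -> discriminant evidence
```

In practice these checks are extended with factor analysis and multitrait-multimethod analysis, as the summary slides note.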
Thanks
Chapter 9:
Measurement: Scaling, Reliability, Validity
Table 9.1 Types of Validity

Validity | Description
Content validity | Does the measure adequately measure the concept?
Face validity | Do “experts” validate that the instrument measures what its name suggests it measures?
Criterion-related validity | Does the measure differentiate in a manner that helps to predict a criterion variable?
Concurrent validity | Does the measure differentiate in a manner that helps to predict a criterion variable currently?
Predictive validity | Does the measure differentiate individuals in a manner as to help predict a future criterion?
Construct validity | Does the instrument tap the concept as theorized?
Convergent validity | Do two instruments measuring the same concept correlate highly?
Discriminant validity | Does the measure have a low correlation with a variable that is supposed to be unrelated to this variable?
Reliability
 Indicates the extent to which a measure is without bias (error
free) and hence ensures consistent measurement
across time and across the various items in the
instrument.
Goodness of Measures
Stability of measures:
 Test-retest reliability
 Parallel-form reliability
 Correlation
Internal consistency of measures:
 Interitem consistency reliability
 Cronbach’s alpha
 Split-half reliability
 Correlation
Goodness of Measures-Reliability
Validity
 Ensures the ability of a scale to measure the intended concept.
 Content validity
 Criterion related validity
 Construct validity
Goodness of Measures-Validity
Content validity
 Ensures that the measure includes an adequate and
representative set of items that tap the concept.
 A panel of judges
Goodness of Measures-Validity
Criterion related validity
 Is established when the measure differentiates individuals on a
criterion it is expected to predict
 Concurrent validity: established when the scale differentiates
individuals who are known to be different
 Predictive validity: indicates the ability of measuring
instrument to differentiate among individuals with reference
to future criterion
 Correlation
Goodness of Measures-Validity
Construct validity
 Testifies to how well the results obtained from the use of the
measure fit the theories around which the test is designed.
 Convergent validity: established when the scores obtained
with two different instruments measuring the same concept are
highly correlated
 Discriminant validity: established when, based on theory, two
variables are predicted to be uncorrelated, and the scores
obtained by measuring them are indeed empirically found to
be so
 Correlation, factor analysis, convergent-discriminant
techniques, multitrait-multimethod analysis
Goodness of Measures-Validity
Research Method for Business chapter 7

  • 1.
  • 2.
     Scaling isa procedure for the assignment of numbers (or other symbols) to a property of objects in order to import some of the characteristics of numbers to properties in question Scaling
  • 3.
  • 4.
    4 Measurement and Scaling A scale is a mechanism by which individuals are distinguished as to how they differ from one another on the variables of interest.  A scale is a continuous series of categories and has been defined as any series of items that are arranged progressively according to value or magnitude, into which an item can be placed according to its quantification  Four popular scales in business research are: 1. Nominal scales 2. Ordinal scales 3. Interval scales 4. Ratio scales
  • 5.
    SCALES  Nominal Scales:splits data into groups, e.g., men, women  Ordinal Scales: ranks data in some order, e.g., exercising for 20 minutes is good, for 30 minutes is better, for 40 minutes is best  Interval Scales: sets data on a continuum, e.g. 1 2 3 4 5 very low very high  Ratio Scales: starts with absolute zero and indicates proportion, e.g. 0 5 10 ten is twice as big as five
  • 6.
    6  A NominalScale is the simplest of the four scale types and in which the numbers or letters assigned to objects serve as labels for identification or classification  Example: variable of gender  Males = 1, Females = 2  Sales Zone A = Islamabad, Sales Zone B = Rawalpindi  Drink A = Pepsi Cola, Drink B = 7-Up, Drink C = Miranda Measurement and Scaling
  • 7.
    7  An OrdinalScale is one that arranges objects or alternatives according to their magnitude  Examples:  Career Opportunities = Moderate, Good, Excellent  Investment Climate = Bad, inadequate, fair, good, very good  Merit = A grade, B grade, C grade, D grade A problem with ordinal scales is that the difference between categories on the scale is hard to quantify, i.e.., excellent is better than good but how much is excellent better? Measurement and Scaling
  • 8.
    8  An IntervalScale allows us to perform certain arithmetical operations on the data collected from respondents. This scale measure the distance between any two points on the scale  It taps the differences and the magnitudes of the differences in the variable----Example: Measurement and Scaling
  • 9.
    9  A RatioScale is a scale that possesses absolute rather than relative qualities and has an absolute zero point.  Examples:  Money  Weight  Distance  Temperature on the Kelvin Scale Interval scales allow comparisons of the differences of magnitude (e.g. of attitudes) as well as determinations of the actual strength of the magnitude Measurement and Scaling
  • 10.
    10 Type of Scale Numerical Operation Descriptive Statistics NominalCounting Frequency in each category, percentage in each category, mode Ordinal Rank Ordering Median, range, percentile ranking Interval Arithmetic Operations on Intervals between numbers Mean, standard deviation, variance Ratio Arithmetic Operations on actual quantities Geometric mean, coefficient of variation Measurement and Scaling
  • 11.
    Nominal Numbers Assigned to Runners OrdinalRankOrder of Winners Interval Performance Rating on a 0 to 10 Scale Ratio Time to Finish, in Seconds 7 38 Third place Second place First place Finish Finish 8.2 9.1 9.6 15.2 14.1 13.4 Four Scales of Measurement;
  • 12.
    Classification of ScalingTechniques; Scales Nominal Ordinal Fixed sum Graphic rating Interval Ratio
  • 13.
    Classification of ScalingTechniques; Scales Nominal Ordinal Interval Likert Semantic differential Numerical Itemized rating Staple Ratio
  • 14.
    Classification of ScalingTechniques; Scales Nominal Dichotomous Category Ordinal Fixed sum Graphic rating Interval Likert Semantic differential Numerical Itemized rating Staple Ratio
  • 15.
    Nominal scales focuson only requiring a respondent to provide some type of descriptor as the raw response Example. Please indicate your current martial status. __Married __ Single __ Single, never married __ Widowed Four Scales of Measurement;
  • 16.
    Ordinal scales allowthe respondent to express “relative magnitude” between the raw responses to a question Example. Which one statement best describes your opinion of an Intel PC processor? __ Higher than AMD’s PC processor __ About the same as AMD’s PC processor __ Lower than AMD’s PC processor Four Scales of Measurement;
  • 17.
     Interval scalesdemonstrate the absolute differences between each scale point Example. How likely are you to recommend the new phone to a friend? Definitely will not Definitely will 1 2 3 4 5 6 7 Four Scales of Measurement;
  • 18.
    Ratio scales allowfor the identification of absolute differences between each scale point, and absolute comparisons between raw responses Example 1. Please circle the number of children under 18 years of age currently living in your household. 0 1 2 3 4 5 6 7 (if more than 7, please specify ___.) Four Scales of Measurement;
  • 19.
  • 20.
    Rating scales  Haveseveral response categories and are used to obtain responses with regard to the object, event, or person studied. Ranking scales  Make comparisons between or among objects, events, persons and extract the preferred choices and ranking among them. Methods of Scaling;
  • 21.
    Measurement scales thatallow a respondent to register the degree (or amount) of a characteristic or attribute possessed by an object directly on the scale. Rating Scales; 1. Dichotomous scale 2. Category scale 3. Likert scale 4. Numerical scales 5. Semantic differential scale 6. Itemized rating scale 7. Constant sum scale 8. Stapel scale 9. Graphic scale 10. Consensus scale Types of rating scales Formats:
  • 22.
    Dichotomous scale  Isused to obtain a Yes or No answer.  Nominal scale Do you own a car?  Yes  No Rating Scales Formats;
  • 23.
    Category scale  Usesmultiple items to elicit a single response.  Nominal scale Rating Scales Formats;
  • 24.
    A Category ratingscale which the response options provided for a closed-ended question are labeled with specific verbal descriptions. Example: Please rate car model A on each of the following dimensions: Poor Fair Good V. good Excellent a) Durability [ ] [ ] [ ] [ ] [ ] b) Fuel consumption [ ] [ ] [ ] [ ] [ ] Rating Scales Formats;
  • 25.
    A simple categoryscale with only two response categories (or scale points) both of which are labeled. Example: Please rate brand A on each of the following dimensions: poor excellent a) Durability [ ] [ ] b) Fuel consumption [ ] [ ] Rating Scales Formats;
  • 26.
    Likert scale  Isdesigned to examine how strongly subjects agree or disagree with statements on a 5-point scale.  Interval scale Rating Scales Formats;
  • 27.
    The Likert Scale(Summated Ratings Scale)  A multiple item rating scale in which the degree of an attribute possessed by an object is determined by asking respondents to agree or disagree with a series of positive and/or negative statements describing the object.  Example: Totally disagree Disagree Neutral Agree Totally agree a) Shopping takes much longer on the Internet [ ] [ ] [ ] [ ] [ ] b) It is a good thing that Saudi consumers have the opportunity to buy products through the [ ] [ ] [ ] [ ] [ ] c) Buying products over the Internet is not a sensible thing to do [ ] [ ] [ ] [ ] [ ] Attitude toward buying from the Internet Rating Scales Formats;
  • 28.
    Likert scale My workis very interesting  Strongly disagree  Disagree  Neither agree nor disagree  Agree  Strongly agree Rating Scales Formats;
  • 29.
    Semantic differential scale Several bipolar attributes are identified at the extremes of the scale, and respondents are asked to indicate their attitudes.  Interval scale Rating Scales Formats;
  • 30.
    A Semantic Differentialrating scale in which bipolar adjectives are placed at both ends (or poles) of the scale, and response options are expressed as “semantic” space. Example: Please rate car model A on each of the following dimensions: Durable ---:-X-:---:---:---:---:--- Not durable Low fuel consumption ---:---:---:---:---:-X-:--- High fuel consumption Rating Scales Formats;
    Numerical scale  Similarto the semantic differential scale, with the difference that numbers on a 5-point or 7-point scale are provided, with bipolar adjectives at both ends.  Interval scale Poor Excellent Durability 1 2 3 4 5 6 7 Durable Not Durable Durability 1 2 3 4 5 6 7 Rating Scales Formats;
Rating Scale Formats

Itemized rating scale: a 5-point or 7-point scale with anchors, as needed, is provided for each item; the respondent writes the appropriate number beside each item or circles the relevant number against it. Treated as an interval scale.
Example: "I will be changing my job within the next 12 months."
1 Very unlikely   2 Unlikely   3 Neither unlikely nor likely   4 Likely   5 Very likely
Rating Scale Formats

Fixed or constant-sum scale: respondents are asked to distribute a given number of points across various items. Ordinal scale.
Rating Scale Formats

A constant-sum rating scale is one in which respondents divide a constant sum among different attributes of an object, usually to indicate the relative importance of each attribute. Assumed to have ratio-level properties.
Example: Divide 100 points among the following dimensions to indicate their level of importance to you when you purchase a car:
Durability        ____
Fuel consumption  ____
Total              100
    Stapel scale  Thisscale simultaneously measure both the direction and intensity of the attitude toward the items under study.  A simplified version of the semantic differential scale in which a single adjective or descriptive phrase is used instead of bipolar adjectives.  Interval data Model A -3 -2 -1 Durable Car 1 2 3 -3 -2 -1 Good Fuel Conaumption 1 2 3 Rating Scales Formats;
Rating Scale Formats

The Stapel scale is a unipolar rating scale with ten categories numbered from -5 to +5, without a neutral point (zero). It is usually presented vertically.
Example (SEARS): respondents rate the store on single adjectives, each on its own +5 to -5 column; here HIGH QUALITY is marked at +2 and POOR SERVICE at -4.
The data obtained with a Stapel scale can be analyzed in the same way as semantic differential data.
Rating Scale Formats

Graphic rating scale: a graphic continuum, usually a straight line, on which respondents indicate their answer to a question by placing a mark at the appropriate point. Modified versions include the ladder scale and the happy-face scale. Ordinal scale.
    Paired Comparison  Usedwhen, among a small number of objects, respondents are asked to choose between two objects at a time. Example; Choose any combination Package -A 512 kbps 8 GB Rs: 750 Package -B 1 Mbps 8 GB Rs: 850 Package -C 512 Kbps 12 GB Rs: 900 Package -D 1 Mbps 12 GB Rs: 1000 Rating Scales Formats;
    Forced Choice  Enablerespondents to rank objects relative to one another, among the alternatives provided. Ranking Scales Formats;
    Comparative Scale  Providesa benchmark or a point of reference to assess attitudes toward the current object, event, or situation under study. Ranking Scales Formats;
Different Types of Rating Scales

Scale / Subject must / Advantages / Disadvantages:
2. Category scale: indicate a response category. Advantages: flexible, easy to respond to. Disadvantages: ambiguous items, few categories, only gross distinctions.
3. Likert scale: evaluate statements on a 5-point scale. Advantages: easiest scale to construct. Disadvantages: hard to judge what a single score means.
4. Semantic differential and numerical scales: choose points between bipolar adjectives on relevant dimensions. Advantages: easy to construct; norms exist for comparison (e.g., profile analysis). Disadvantages: bipolar adjectives must be found; data may be ordinal, not interval.
5. Constant-sum scale: divide a constant sum among response alternatives. Advantages: scale approximates an interval measure. Disadvantages: difficult for respondents with low education levels.
6. Stapel scale: choose a point on a scale with one center adjective. Advantages: easier to construct than the semantic differential. Disadvantages: endpoints are numerical, not verbal.
7. Graphic scale: choose a point on a continuum. Advantages: visual impact, unlimited scale points. Disadvantages: no standard answers.
8. Graphic scale (picture response): choose a visual picture. Advantages: visual impact, easy for poor readers. Disadvantages: hard to attach a verbal explanation to a response.
Figure 8.1 Illustrations of Possible Reliability and Validity Situations in Measurement
Situation 1: neither reliable nor valid
Situation 2: highly reliable but not valid
Situation 3: highly reliable and valid
Testing Goodness of Measures: Forms of Reliability and Validity

Goodness of data:
 Reliability (accuracy in measurement)
   Stability: test-retest reliability; parallel-form reliability
   Consistency: interitem consistency reliability; split-half reliability
 Validity (are we measuring the right thing?)
   Content (logical) validity: face validity
   Construct (congruent) validity: convergent; discriminant
   Criterion-related validity: predictive; concurrent
    Goodness of Measures It is important to make sure that the instrument that we develop to measure a particular concept is indeed accurately measuring the variable, and that in fact, we are actually measuring the concept that we set out to measure.  This ensures that in operationally defining perceptual and attitudinal variables, we have not overlooked some important dimensions and elements or included some irrelevant ones. Goodness of Measures;
    Item Analysis  Itemanalysis is done to see if the items in the instrument belong there or not.  Each item is examined for its ability to discriminate between those subjects whose total scores are high, and those will low scores.  In item analysis, the means between the high-score group and the low-score group are tested to detect significant differences through the t-values.  The items with a high t-value (test which is able to identify the highly discriminating items in the instrument) are then included in the instrument. Goodness of Measures;
    Reliability  The reliabilityof a measure indicates the extent to which it is without bias (error free) and hence ensures consistent measurement across time and across the various items in the instrument.  In other words, the reliability of a measure is an indication of the stability and consistency with which the instrument measures the concept and helps to assess the “goodness” of a measure. Goodness of Measures;
    Stability of Measures The ability of a measure to remain the same over time —despite uncontrollable testing conditions or the state of the respondents themselves—is indicative of its stability and low vulnerability to changes in the situation.  This attests to its “goodness” because the concept is stably measured, no matter when it is done. Two tests of stability are test-retest reliability and parallel-form reliability. Goodness of Measures;
    Test-Retest Reliability  Thereliability coefficient obtained with a repetition of the same measure on a second occasion is called test-retest reliability.  That is, when a questionnaire is administered to a set of respondents now, and again to the same respondents, says several weeks to 6 months later, then the correlation between the scores obtained at the two different times from one and the same set of respondents is called the test-retest coefficient.  The higher it is, the better the test-retest reliability, and consequently, the stability of the measure across time. Goodness of Measures;
  • 61.
    Parallel-Form Reliability  Whenresponses on two comparable sets of measures tapping the same construct are highly correlated, we have parallel-form reliability.  Both forms have similar items and the same response format, the only changes being the wordings and the order or sequence of the questions.  What we try to establish here is the error variability resulting from wording and ordering of the questions.  If two such comparable forms are highly correlated the measures are reasonably reliable. Goodness of Measures;
  • 62.
Goodness of Measures

Interitem consistency reliability is a test of the consistency of respondents' answers to all the items in a measure. To the degree that items are independent measures of the same concept, they will be correlated with one another. The most popular tests of interitem consistency reliability are Cronbach's coefficient alpha (Cronbach, 1951), used for multipoint-scaled items, and the Kuder-Richardson formulas (Kuder & Richardson, 1937), used for dichotomous items. The higher the coefficients, the better the measuring instrument.
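Cronbach's alpha can be computed directly from its definition, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score), where k is the number of items. A minimal sketch with invented data (the `cronbach_alpha` helper and the scores are illustrative, not from the slides):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns.
    Each inner list holds one item's scores across all respondents."""
    k = len(items)
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items)
                          / variance(totals))

# Hypothetical data: three 5-point items answered by five respondents.
items = [[4, 5, 3, 4, 2],
         [4, 4, 3, 5, 2],
         [5, 4, 2, 4, 3]]
alpha = cronbach_alpha(items)  # roughly 0.86 for these data
```

Perfectly parallel items drive alpha to 1, while uncorrelated items drive it toward 0, matching the rule that higher coefficients indicate a better measuring instrument.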
  • 63.
    Split-Half Reliability  Split-halfreliability reflects the correlations between two halves of an instrument.  The estimates would vary depending on how the items in the measure are split into two halves.  Split-half reliabilities could be higher than Cronbach’s alpha only in the circumstance of there being more than one underlying response dimension tapped by the measure and when certain other conditions are met as well.  Hence, in almost all cases, Cronbach’s alpha can be considered a perfectly adequate index of the interitem consistency reliability.  Goodness of Measures;
Goodness of Measures

5. Validity
Several types of validity test are used to assess the goodness of measures, and writers use different terms to denote them. For the sake of clarity, we may group validity tests under three broad headings: content validity, criterion-related validity, and construct validity.

5.1 Content Validity
Content validity ensures that the measure includes an adequate and representative set of items that tap the concept. The more the scale items represent the domain or universe of the concept being measured, the greater the content validity. Put differently, content validity is a function of how well the dimensions and elements of a concept have been delineated. Face validity is considered by some to be a basic and very minimal index of content validity: it indicates that the items intended to measure a concept do, on the face of it, look like they measure the concept.
  • 66.
    Criterion-Related Validity  Criterion-relatedvalidity is established when the measure differentiates individuals on a criterion it is expected to predict. This can be done by establishing con-current validity or predictive validity, as explained below.  Concurrent validity is established when the scale discriminates individuals who are known to be different; that is, they should score differently on the instrument as in the example that follows. Goodness of Measures;
  • 67.
Goodness of Measures

5.3 Construct Validity
Construct validity testifies to how well the results obtained from the use of the measure fit the theories around which the test is designed. It is assessed through convergent and discriminant validity. Convergent validity is established when the scores obtained with two different instruments measuring the same concept are highly correlated. Discriminant validity is established when, based on theory, two variables are predicted to be uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so.
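The convergent/discriminant logic can be sketched with plain correlations. All data below are invented for illustration (two hypothetical job-satisfaction instruments and a theoretically unrelated variable); in practice, construct validity is also examined with factor analysis or a multitrait-multimethod matrix.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two paired score lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / (sum((a - mx) ** 2 for a in x)
                  * sum((b - my) ** 2 for b in y)) ** 0.5

# Hypothetical data: two instruments measuring job satisfaction for the
# same six respondents, plus a variable that theory says is unrelated.
satisfaction_a = [14, 18, 11, 20, 16, 13]
satisfaction_b = [15, 17, 10, 19, 17, 12]
height_cm      = [180, 170, 168, 172, 165, 177]

convergent_r   = pearson_r(satisfaction_a, satisfaction_b)  # expected high
discriminant_r = pearson_r(satisfaction_a, height_cm)       # expected near zero
```

A high correlation between the two instruments supports convergent validity, while a near-zero correlation with the theoretically unrelated variable supports discriminant validity.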
  • 68.
  • 69.
Chapter 9: Measurement: Scaling, Reliability, Validity

Table 9.1 Types of Validity
Content validity: Does the measure adequately measure the concept?
Face validity: Do "experts" validate that the instrument measures what its name suggests it measures?
Criterion-related validity: Does the measure differentiate in a manner that helps to predict a criterion variable?
Concurrent validity: Does the measure differentiate in a manner that helps to predict a criterion variable currently?
Predictive validity: Does the measure differentiate individuals in a manner that helps to predict a future criterion?
Construct validity: Does the instrument tap the concept as theorized?
Convergent validity: Do two instruments measuring the same concept correlate highly?
Discriminant validity: Does the measure have a low correlation with a variable that is supposed to be unrelated to it?
  • 70.
    Reliability  Indicates theextent to which it is without bias (error free) and hence ensures consistent measurement across time and across the various items in the instrument. Goodness of Measures
  • 71.
    Stability of measures: Test-retest reliability  Parallel-form reliability  Correlation Internal consistency of measures:  Interitem consistency reliability  Cronbach’s alpha  Split-half reliability  Correlation Goodness of Measures-Reliability
  • 72.
    Validity  Ensures theability of a scale to measure the intended concept.  Content validity  Criterion related validity  Construct validity Goodness of Measures-Validity
  • 73.
    Content validity  Ensuresthat the measure includes an adequate and representative set of items that tap the concept.  A panel of judges Goodness of Measures-Validity
  • 74.
    Criterion related validity Is established when the measure differentiates individuals on a criterion it is expected to predict  Concurrent validity: established when the scale differentiates individuals who are known to be different  Predictive validity: indicates the ability of measuring instrument to differentiate among individuals with reference to future criterion  Correlation Goodness of Measures-Validity
  • 75.
    Construct validity  Testifiesto how well the results obtained from the use of the measure fit the theories around which the test is designed.  Convergent validity: established when the scores obtained with two different instrument measuring the same concept are highly correlated  Discriminant validity: established when, based on theory, two variables are predicted to be uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so  Correlation, factor analysis, convergent-discriminant techniques, multitrait-multimethod analysis Goodness of Measures-Validity
Diagram 9.1 Testing Goodness of Measures: Forms of Reliability and Validity

Goodness of data:
 Reliability (accuracy in measurement)
   Stability: test-retest reliability; parallel-form reliability
   Consistency: interitem consistency reliability; split-half reliability
 Validity (are we measuring the right thing?)
   Content (logical) validity: face validity
   Construct (congruent) validity: convergent; discriminant
   Criterion-related validity: predictive; concurrent