Pampers Case
In an increasingly competitive diaper market, P&G’s marketing
department wanted to formulate new approaches to the
construction and marketing of Pampers to position them
effectively against Huggies without cannibalizing Luvs. They
surveyed 300 mothers of infants. Each was given a randomly
selected brand of diaper (either Pampers, Luvs, or Huggies) and
asked to rate that diaper on nine attributes and to give her
overall preference for the brand. Preference was obtained on a
7-point Likert scale (1=not at all preferred, 7=greatly
preferred). Diaper ratings on the nine attributes were also
obtained on 7-point Likert scale (1=very unfavorable, 7=very
favorable). The study was designed so that each of the three
brands appeared 100 times. The goal of the study was to learn
which attributes of diapers were most important in influencing
purchase preference (Y). The nine attributes used in study were:
Variable  Attribute      Marketing options
X1        count per box  Desire large counts per box?
X2        price          Pay a premium price?
X3        value          Promote high value
X4        skin care      Offer high degree of skin care
X5        style          Prints/color vs. plain diapers
X6        absorbency     Regular vs. superabsorbency
X7        leakage        Narrow/tapered vs. regular crotch
X8        comfort/size   Extra padding and form-fitting gathers
X9        taping         Re-sealable tape vs. regular tape
Question (will be discussed in week 8):
If you don’t have SPSS software at home, you may be able to
download a trial version (good for 21 days) from
spss.com > software > statistics family > PASW statistics 17.0 >
click “free trial” and download.
1. Run a regression analysis for brand preference that includes
all independent variables in the model, and describe how
meaningful the model is. Interpret the results for management.
6. Correlation and Regression
Statistics Associated with Frequency Distribution Measures of
Location
The mean, or average value, is the most commonly used
measure of central tendency. The mean, $\bar{X}$, is given by
$$\bar{X} = \sum_{i=1}^{n} X_i / n$$
where
X_i = observed values of the variable X
n = number of observations (sample size)
The mode is the value that occurs most frequently. It represents
the highest peak of the distribution. The mode is a good
measure of location when the variable is inherently categorical
or has otherwise been grouped into categories.
The median of a sample is the middle value when the data are
arranged in ascending or descending order.
http://www.city-data.com/
Skewness. The tendency of the deviations from the mean to be
larger in one direction than in the other. It can be thought of as
the tendency for one tail of the distribution to be heavier than
the other.
Kurtosis is a measure of the relative peakedness or flatness of
the curve defined by the frequency distribution. Measured as
excess kurtosis (the form SPSS reports), the kurtosis of
a normal distribution is zero. If the kurtosis is positive, then the
distribution is more peaked than a normal distribution. A
negative value means that the distribution is flatter than a
normal distribution.
Statistics Associated with Frequency Distribution Measures of
Shape
Find the mean, median, mode, and range for the following list
of values:
13, 18, 13, 14, 13, 16, 14, 21, 13
The mean is the usual average:
(13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 15
The median is the middle value, so I'll have to rewrite the list in
order:
13, 13, 13, 13, 14, 14, 16, 18, 21
There are nine numbers in the list, so the middle one will be the
(9 + 1) ÷ 2 = 10 ÷ 2 = 5th number (the median is the mean of the
middle two values if there are an even number of numbers):
13, 13, 13, 13, 14, 14, 16, 18, 21. So the median is 14.
The mode is the number that is repeated more often than any
other: 13 is the mode.
The largest value in the list is 21, and the smallest is 13, so the
range is 21 – 13 = 8.
The range measures the spread of the data. It is simply the
difference between the largest and smallest values in the
sample.
Range = Xlargest – Xsmallest.
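The worked example above can be checked with a few lines of Python. This is only a sketch using the standard-library statistics module; the variable names are illustrative.

```python
import statistics

# The nine values from the worked example above.
values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)           # sum of the values divided by n
median = statistics.median(values)       # middle value of the sorted list
mode = statistics.mode(values)           # most frequently occurring value
value_range = max(values) - min(values)  # Xlargest - Xsmallest

print(mean, median, mode, value_range)
```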
The variance is the mean squared deviation from the mean. The
variance can never be negative. The variance is a measure of
how far a set of numbers is spread out from the mean.
http://www.mathsisfun.com/data/standard-deviation.html
Deviation is the difference between the value of an observation
and the mean of the variable, that is, a value minus its mean:
x - meanx. The standard deviation is based on the squares of
these differences. In SPSS, select Analyze, Correlate, Bivariate; click
Options; check Cross-product deviations and covariances.
The standard deviation is the square root of the variance.
Statistics Associated with Frequency Distribution Measures of
Variability
$$s_x = \sqrt{\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1}}$$
Covariance is a measure of how much the deviations
of two variables match. For a sample of n cases the equation is:
cov(x,y) = SUM[(x - meanx)(y - meany)]/(n - 1). In SPSS, select
Analyze, Correlate, Bivariate; click Options; check Cross-product
deviations and covariances.
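As a rough illustration of the variability measures above, the sketch below computes the sample variance, standard deviation, and covariance by hand. The data are hypothetical, and the n - 1 divisor is an assumption chosen to match the sample standard deviation formula.

```python
import math

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean_x = sum(xs) / len(xs)
    return sum((x - mean_x) ** 2 for x in xs) / (len(xs) - 1)

def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

var_x = sample_variance(x)
sd_x = math.sqrt(var_x)          # standard deviation = square root of variance
cov_xy = sample_covariance(x, y)
print(var_x, sd_x, cov_xy)
```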
Correlation is a bivariate measure of association (strength) of
the relationship between two variables. It varies from 0 (random
relationship) to 1 (perfect linear relationship) or -1 (perfect
negative linear relationship). It is usually reported in terms of
its square (r2), interpreted as percent of variance explained. For
instance, if r2 is .25, then the independent variable is said to
explain 25% of the variance in the dependent variable. In SPSS,
select Analyze, Correlate, Bivariate; check Pearson.
Correlation
Pearson's r, the most common type, is sometimes
also called product-moment correlation.
Pearson's r is a measure of association which varies from -1 to
+1, with 0 indicating no relationship (random pairing of values)
and 1 indicating perfect relationship. In SPSS, select Analyze,
Correlate, Bivariate; check Pearson (the default).
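For readers without SPSS, Pearson's r can be sketched directly from its definition. The function name and the data below are illustrative assumptions, not part of the case.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2  # read as the share of variance explained
print(round(r, 4), round(r_squared, 4))
```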
Multiple Regression
The multiple regression equation takes the
form y = b1x1 + b2x2 + ... + bnxn + c.
The b's are regression coefficients, representing the amount the
dependent variable y changes when the corresponding
independent variable changes 1 unit, with the other independents
held constant. The c is the constant, where the
regression line intercepts the y axis, representing the amount
the dependent y will be when all the independent variables are
0.
The standardized versions of the b coefficients are the beta
weights, and the ratio of the beta coefficients is the ratio of the
relative predictive power of the independent variables.
Associated with multiple regression is R2, the squared multiple
correlation, which is the percent of variance in the dependent
variable explained collectively by all of the independent variables.
How big a sample size do I need to do multiple regression ?
According to Tabachnick and Fidell (2001: 117), a rule of
thumb for testing b coefficients is to have N >= 104 + m, where
m = number of independent variables.
Another popular rule of thumb is that there must be at least 20
times as many cases as independent variables.
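Applied to the Pampers case (nine independent variables, 300 respondents), the two rules of thumb work out as follows. The function names are hypothetical helpers, not SPSS features.

```python
def min_cases_tabachnick_fidell(m):
    """N >= 104 + m rule for testing individual b coefficients."""
    return 104 + m

def min_cases_twenty_to_one(m):
    """At least 20 cases per independent variable."""
    return 20 * m

m = 9    # independent variables X1..X9 in the Pampers study
n = 300  # mothers surveyed

print(min_cases_tabachnick_fidell(m), min_cases_twenty_to_one(m))
print(n >= min_cases_tabachnick_fidell(m) and n >= min_cases_twenty_to_one(m))
```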
Statistics Associated with
Regression Analysis
Regression coefficient. The estimated parameter b is usually
referred to as the non-standardized regression coefficient.
Standardized regression coefficient. Also termed the beta
coefficient or beta weight, it is the regression coefficient
obtained when the variables are standardized. In simple
(bivariate) regression,
Byx = Bxy = rxy
Sum of squared errors. The distances of all the points from the
regression line are squared and added together to arrive at the
sum of squared errors, which is a measure of total error, $\sum e_j^2$.
Conducting Regression Analysis
Plot the Scatter Diagram
A scatter diagram, or scattergram, is a plot of the values of two
variables for all the cases or observations.
The most commonly used technique for fitting a straight line to
a scattergram is the least-squares procedure.
In fitting the line, the least-squares procedure
minimizes the sum of squared errors, $\sum e_j^2$.
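A minimal sketch of the least-squares procedure for one independent variable is shown below. It uses the closed-form slope and intercept, with hypothetical data; the function names are illustrative.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the SSE."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
sse = sum_squared_errors(x, y, a, b)
print(a, b, sse)
```

Any other line through the same points, for example one with a slightly shifted intercept, gives a larger sum of squared errors.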
Determine the Strength and Significance of Association:
Significance of r with t test
t statistic. A t statistic with n - 2 degrees of freedom (in
simple regression) can be used to test the null hypothesis that
no linear relationship exists between X and Y, or H0: rho = 0,
where rho is the population correlation. One
tests this hypothesis using
this formula: t = [r*SQRT(n-2)]/[SQRT(1-r2)]. If the absolute
value of the computed t is as high as or higher than the table t
value, then the
researcher concludes the correlation is significant (that is,
significantly different from 0). In practice, most computer
programs compute the significance of correlation for the
researcher without need for manual methods.
T test: H0: b = 0; H1: b is not equal to zero
F test: H0: R-sqr = 0; H1: R-sqr is greater than zero
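The t formula for a correlation is easy to compute by hand. The sketch below uses a hypothetical r and n; the result would still need to be compared with a table t value at n - 2 degrees of freedom.

```python
import math

def t_for_correlation(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical values, for illustration only.
r = 0.6
n = 27

t = t_for_correlation(r, n)
degrees_of_freedom = n - 2
print(round(t, 3), degrees_of_freedom)
```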
Determine the Strength and Significance of Association: F test
Another, equivalent test for examining the significance of the
linear relationship between X and Y (significance of b) is the
test for the significance of the coefficient of determination.
The hypotheses in this case are:
H0: R2pop = 0
H1: R2pop > 0
F test. The F test is used to test the null hypothesis that the
coefficient of multiple determination in the population, R2pop,
is zero. This is equivalent to testing the null hypothesis
H0: b1 = b2 = ... = bk = 0, which
is the same as testing the significance of the regression model
as a whole. The test statistic has an F distribution with k and (n
- k - 1) degrees of freedom (in multiple regression), where k =
number of terms in the equation not counting the constant. F =
[R2/k]/[(1 - R2 )/(n - k - 1)].
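The F formula can be sketched as a small function. The R2 value below is hypothetical and is not a result from the Pampers data; k = 9 and n = 300 follow the case description.

```python
def f_statistic(r2, k, n):
    """F = [R2 / k] / [(1 - R2) / (n - k - 1)], with k and n - k - 1 df."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

r2 = 0.5  # hypothetical R2, for illustration only
k = 9     # independent variables, not counting the constant
n = 300   # sample size

f_value = f_statistic(r2, k, n)
df = (k, n - k - 1)
print(round(f_value, 3), df)
```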
Significance level: p value
In statistics, a result is called
statistically significant if it is unlikely to have occurred by
chance. The decision is often made using the p-value (see Sig. in
the output table): if the p-value is less than the significance
level, then the null hypothesis is rejected, and we can say, “the
null hypothesis is rejected”. The smaller the p-value, the
more significant the result is said to be.
Variables
Dependent variable. The dependent variable is the
predicted variable in the regression equation.
Independent variables are the predictor variables in the
regression equation.
Dummy variables are a way of adding the values of a nominal or
ordinal variable to a regression equation. The standard approach
to modeling categorical variables is to include the categorical
variables in the regression equation by converting each level of
each categorical variable into a variable of its own, usually
coded 0 or 1. For instance, the categorical variable "region"
may be converted into dummy variables such as "East," "West,"
"North," or "South." Typically "1" means the attribute of
interest is present (e.g., South = 1 means the case is from the
region South). We have to leave one of the levels out of the
regression model to avoid perfect multicollinearity (singularity;
redundancy), which will prevent a solution (for example, we
may leave out "North" to avoid singularity).
Regression with Dummy Variables
Product Usage    Original Variable   Dummy Variable Code
Category         Code                D1   D2   D3
Nonusers         1                   1    0    0
Light Users      2                   0    1    0
Medium Users     3                   0    0    1
Heavy Users      4                   0    0    0
$$\hat{Y}_i = a + b_1 D_1 + b_2 D_2 + b_3 D_3$$
In this case, "heavy users" has been selected as a reference
category and has not been directly included in the regression
equation.
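The coding scheme in the product usage table can be sketched as follows. The function and constant names are illustrative; "Heavy Users" is left out as the reference category, as in the slide.

```python
CATEGORIES = ["Nonusers", "Light Users", "Medium Users", "Heavy Users"]
REFERENCE = "Heavy Users"  # omitted level, to avoid perfect multicollinearity

def dummy_code(category):
    """Return [D1, D2, D3] for one product usage category."""
    levels = [c for c in CATEGORIES if c != REFERENCE]
    return [1 if category == level else 0 for level in levels]

for category in CATEGORIES:
    print(category, dummy_code(category))
```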
Conducting Multiple Regression Analysis
Strength of Association
$$R^2 = \frac{SS_{reg}}{SS_y}$$
R2, the squared multiple correlation, also called the coefficient of
multiple determination, is the percent of the variance in the
dependent explained uniquely or jointly by the independents.
Adjusted R-Square is an adjustment for the fact that when one
has a large number of independents, R2 can become artificially
high simply by chance.
When used for the case of a few independents, R2 and adjusted
R2 will be close. When there are many independents, adjusted
R2 may be noticeably lower. Always use adjusted R2 when
comparing models with different numbers of independents.
R2 is adjusted for the number of independent variables and the
sample size by using the following formula:
$$\text{Adjusted } R^2 = R^2 - \frac{k(1 - R^2)}{n - k - 1}$$
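A short sketch of the adjustment, Adjusted R2 = R2 - k(1 - R2)/(n - k - 1), is given below. The R2 value is hypothetical; k and n follow the Pampers case description.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R2 for k independent variables and sample size n."""
    return r2 - k * (1 - r2) / (n - k - 1)

r2 = 0.5  # hypothetical R2, for illustration only
n = 300   # sample size
k = 9     # number of independent variables

adj = adjusted_r_squared(r2, n, k)
# Algebraically equivalent form often seen in textbooks.
adj_alt = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj, 4), round(adj_alt, 4))
```

With only nine independents and 300 cases the adjustment is small, which matches the remark that R2 and adjusted R2 are close when there are few independents.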
Assumptions
Normality
The error term is normally distributed.
The distribution of the variables is normal: regression assumes
that the variables have normal distributions.
Linearity
The means of all these normal distributions of Y, given X, lie
on a straight line with slope b.
Absence of high multicollinearity
No outliers
Normality
A histogram of standardized residuals should show a roughly
normal curve.
Skewness and kurtosis can also be used to check normality of
the variables.
P-P plot: Another alternative for the same purpose is the
normal probability plot, with the observed cumulative
probabilities of occurrence of the standardized residuals on the
Y axis and of expected normal probabilities of occurrence on
the X axis, such that a 45-degree line will appear when
observed conforms to normally expected.
The variance of the error term is constant.
Homoscedasticity (also spelled homoskedasticity): Lack of
homoscedasticity may mean (1) there is an interaction effect
between a measured independent variable and an unmeasured
independent variable not in the model; or (2) that some
independent variables are skewed while others are not.
The error terms are uncorrelated. In other words, the
observations have been drawn independently.
The Durbin-Watson statistic is a test to see if the assumption of
independent observations is met, which is the same as testing to
see if autocorrelation is present. As a rule of thumb, a Durbin-
Watson statistic in the range of 1.5 to 2.5 means the researcher
may reject the notion that data are autocorrelated (serially
dependent) and instead may assume independence of
observations.
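The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4, with values near 2 suggesting no autocorrelation. The sketch below uses made-up residual series only to show the two extremes.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals, for illustration only.
alternating = [1, -1, 1, -1]  # sign flips every case: negative autocorrelation
clustered = [1, 1, -1, -1]    # similar neighbours: positive autocorrelation

print(durbin_watson(alternating), durbin_watson(clustered))
```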
Homoscedasticity: the variance of the error terms is constant
Nonconstant error variance
(heteroscedasticity) can indicate the need to respecify the
model to include omitted independent variables. Nonconstant
error variance can be observed by requesting simple residual
plots, as in the illustrations below, where "Training" as
independent is used to predict "Score" as dependent: a plot of the
dependent on the X axis against standardized predicted values
on the Y axis. For the homoscedasticity assumption to be met,
observations should be spread about the regression line
similarly for the entire X axis. In the illustration below, which
is heteroscedastic, the spread is much narrower for low values
than for high values of the X variable, Score.
This plot shows heteroscedasticity
The variance of the error term
should be constant for all values of the independent variables.
Heteroscedasticity occurs when the variance of the error term
is not constant.
The presence of heteroscedasticity can invalidate statistical
tests of significance.
Residuals
Residuals are the difference between the observed
values and those predicted by the regression equation.
Unstandardized residuals, referenced as RESID in
SPSS, refer in a regression context to the linear difference
between the location of an observation (point) and the
regression line (or plane or surface) in multidimensional space.
Standardized residuals are residuals after they have
been constrained to a mean of zero and a standard deviation of
1. A rule of thumb is that outliers are points whose standardized
residual is greater than 3.3 in absolute value (corresponding to
the .001 alpha level). SPSS will list
"Std. Residual" if "casewise diagnostics" is requested under the
Statistics button.
Studentized residuals are constrained only to have a standard
deviation of 1, but are not constrained to a mean of 0.
Studentized deleted residuals are residuals which have been
constrained to have a standard deviation of 1, after the standard
deviation is calculated leaving the given case out.
Multicollinearity and its problems
Multicollinearity refers to
excessive correlation of the predictor variables. When
correlation is excessive (some use the rule of thumb of r > .90),
standard errors of the b and beta coefficients become large,
making it difficult or impossible to assess the relative
importance of the predictor variables.
Multicollinearity is less important where the research purpose is
sheer prediction since the predicted values of the dependent
remain stable, but multicollinearity is a severe problem when
the research purpose includes causal modeling.
Multicollinearity can result in several problems, including:
The partial regression coefficients may not be estimated
precisely. The standard errors are likely to be high.
The magnitudes as well as the signs of the partial regression
coefficients may change from sample to sample.
It becomes difficult to assess the relative importance of the
independent variables in explaining the variation in the
dependent variable.
Predictor variables may be incorrectly included or removed in
stepwise regression.
Test of multicollinearity
Inspection of the correlation matrix reveals only bivariate
multicollinearity, with the typical criterion being bivariate
correlations > .90. To assess multivariate multicollinearity, one
uses tolerance or VIF (Variance-inflation factor)
Tolerance: Tolerance is 1 - R2 for the regression of a given
independent variable on all the other independents. As a rule of
thumb, if tolerance is less than .20, a
problem with multicollinearity is indicated. In SPSS, select
Analyze, Regression, Linear; click Statistics; check Collinearity
diagnostics to get tolerance.
VIF is the variance inflation factor, which is simply the
reciprocal of tolerance. VIF >= 4 is an arbitrary but common
cut-off criterion for deciding when a given independent variable
displays "too much" multicollinearity: values above 4 suggest a
multicollinearity problem. Some researchers use the more
lenient cutoff of 5.0 or even 10.0 to signal when
multicollinearity is a problem.
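The relationship between tolerance and VIF can be sketched as below. It assumes the standard definition of tolerance as 1 - R2 from regressing one independent variable on the others; the R2 value is hypothetical (with only two predictors it would equal their squared correlation).

```python
def tolerance(r2_j):
    """1 - R2 from regressing independent variable j on the other independents."""
    return 1 - r2_j

def vif(r2_j):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1 / tolerance(r2_j)

# Hypothetical: two predictors correlated at r = .90, so R2_j = .81.
r2_j = 0.9 ** 2

tol = tolerance(r2_j)
inflation = vif(r2_j)
print(round(tol, 2), round(inflation, 2))
print(tol < 0.20, inflation >= 4)  # both rules of thumb flag a problem
```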
Run simple regression
A simple procedure for adjusting for multicollinearity is to use
only variables with low multicollinearity.
Alternatively, the set of independent variables can be
transformed into a new set of predictors that are mutually
independent by using techniques such as principal components
analysis or factor analysis.
More specialized techniques, such as ridge regression, can also
be used.
Remedies of Multicollinearity
Outliers
The removal of outliers from the data set under analysis can at
times dramatically affect the performance of a regression
model. Outliers should be removed if there is reason to believe
that other variables not in the model explain why the outlier
cases are unusual -- that is, outliers may well be cases which
need a separate model. Alternatively, outliers may suggest that
additional explanatory variables need to be brought into the
model (that is, the model needs re-specification).
We can check outliers with any one of the five measures of case
influence statistics (DfBeta, standardized DfBeta, DfFit,
standardized DfFit, and the covariance ratio) and distance
measures (Mahalanobis, Cook's D, and leverage).
We can confidently use the following three ways to detect
outliers: casewise diagnostics, Mahalanobis distance (D2), and
DfBeta.
Check Influential cases (outliers) with Influence statistics
Influence statistics in SPSS are selected under the Save button
dialog.
DfBeta, called standardized DfBeta in SPSS, measures the
change in b coefficients (measured in standard errors) due to
excluding a case from the dataset. A DfBeta coefficient is
computed for every observation. If DfBeta > 0, the case
increases the slope; if < 0, the case decreases the slope. The
case may be considered an influential outlier if |DfBeta| > 2. In
an alternative rule of thumb, a case may be an outlier if
|DfBeta| > 2/SQRT(n).
Standardized DfBeta. Once DfBeta is standardized, it is easier
to interpret. The threshold of SDFBETA is usually set at ±2.
DfFit. DfFit measures how much the estimate (predicted value)
changes as a result of a particular observation being dropped
from analysis. The DfFit measure is quite similar to Cook's D.
Standardized DfFit. Once DfFit is standardized, it is easier to
interpret. A rule of thumb flags as outliers those observations
whose standardized DfFit value is > twice the square root of
p/N, where p is the number of parameters in the model and N is
sample size.
Covariance ratio. This ratio compares the determinant of the
covariance matrix with and without inclusion of a given case.
The closer the covariance ratio approaches 1.0, the less
influential the observation.
Check influential case with Distance Measures
Distance measures in SPSS are also selected under the Save
button dialog.
Centered leverage statistic, h, also called the hat-value, is
available to identify cases which influence regression
coefficients more than others. The leverage statistic varies from
0 (no influence on the model) to almost 1 (completely
determines the model). The maximum value is (N-1)/N, where N
is sample size. A rule of thumb is that cases with leverage under
.2 are not a problem, but if a case has leverage over .5, the case
has undue leverage and should be examined for the possibility
of measurement error or the need to model such cases
separately.
Mahalanobis distance. The higher the Mahalanobis distance for
a case, the more that case's values on independent variables
diverge from average values. As a rule of thumb, the maximum
Mahalanobis distance should not exceed the critical chi-squared
value with degrees of freedom equal to number of predictors
and alpha =.001, or else outliers may be a problem in the data.
Cook's distance, D, is another measure of the influence of a
case. Observations with larger D values than the rest of the data
are those which have unusual influence or leverage. Fox (1991:
34) suggests as a cut-off for detecting influential cases, values
of D greater than 4/(N - k - 1), where N is sample size and k is
the number of independents. Others suggest D > 1 as the
criterion to constitute a strong indication of an outlier problem,
with D > 4/n the criterion to indicate a possible problem.
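Several of the rules of thumb above depend only on sample size and the number of independents, so they can be tabulated before looking at any case. The sketch below does this for the Pampers design (N = 300, k = 9). Counting the constant so that p = k + 1 is an assumption, and the function name is hypothetical.

```python
import math

def influence_cutoffs(n, k):
    """Rule-of-thumb cutoffs for influence and distance measures."""
    p = k + 1  # assumption: parameters = independents plus the constant
    return {
        "dfbeta": 2 / math.sqrt(n),          # alternative |DfBeta| rule
        "std_dffit": 2 * math.sqrt(p / n),   # standardized DfFit rule
        "cooks_d_fox": 4 / (n - k - 1),      # Fox (1991) cutoff
        "cooks_d_possible": 4 / n,           # possible-problem cutoff
        "max_leverage": (n - 1) / n,         # largest possible leverage
    }

cutoffs = influence_cutoffs(n=300, k=9)
for name, value in cutoffs.items():
    print(name, round(value, 4))
```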
Casewise Diagnostics
DuPont Has Designs on Fashion
Case question for week 5:
1. After last week’s assignment of developing scales for
measuring motivation and intention, we are now looking at
reliability of the scales we have developed. How would you
determine the reliability of the scales?
2. How would you assess the validity of the scales?
3. Assuming a mall intercept is being conducted, what are the
element, sampling unit, extent and time frame for this study?
4. Assuming a mall intercept is being conducted, what
sampling technique do you recommend for this study? Why?
5. What nonresponse issues must be considered and how can
they be overcome?
5. Validity Test & Reliability Test
and
Sampling Design & Procedures
Improving Response Rates
Methods of Improving Response Rates
  Reducing Refusals
    Prior Notification
    Motivating Respondents
    Incentives
    Questionnaire Design and Administration
    Follow-Up
  Reducing Not-at-Homes
    Callbacks
Scale Evaluation
  Reliability
    Test-retest
    Alternative Forms
    Split-half
    Internal Consistency
  Validity
    Content
    Criterion
    Construct
      Convergent
      Discriminant
  Generalizability
Validity, Reliability, and their relationship
Validity is the degree
to which a study measures what it was designed to measure. It
deals with the quality of measurement. Reliability is the extent
to which a variable is consistent in what it is intended to measure;
in other words, it is the consistency, dependability, or
repeatability of measures.
Relationship between validity and reliability
Reliability does not
necessarily tell whether the measurement is measuring what is
supposed to be measured. Compared to validity, which
addresses the issue of what should be measured, reliability is
related to how it is measured. Therefore, in order to minimize
our measurement error, both reliability and validity are
examined. A measure may be reliable but not valid, but it
cannot be valid without being reliable. That is, reliability is a
necessary but not sufficient condition for validity.
Validity
Content validity addresses whether the scales
adequately measure the domain content of the construct. It is a
subjective but systematic evaluation of how well the content of
a scale represents the measurement task at hand. There is no
objective statistical test to evaluate content validity.
Researchers must carefully utilize specified theoretical
descriptions of the construct to judge content validity.
Criterion validity reflects whether a scale performs as expected
in relation to other variables selected (criterion variables) as
meaningful criteria, i.e., whether the proposed measures exhibit
generally the same direction and magnitude of correlation
with other variables as measures that have already
been accepted within the social science community.
Construct validity
The establishment of construct validity involves two major
subdomains: convergent validity and discriminant validity.
Convergent validity is the extent to which the scale correlates
positively with other measures of the same construct.
To test for convergent validity, we can use factor analysis
(common factor analysis, PAF) and
examine the factor loadings and the significance level of each
construct. When the factor loadings of intended constructs are
all higher than .50, convergent validity has been
achieved. We can also use average variance extracted (AVE)
to test convergent validity.
Discriminant validity is the extent to which a measure does not
correlate with other constructs from which it is supposed to
differ. In other words, it describes the degree to which one
construct is not similar to any other construct that is
theoretically distinct.
To test for discriminant validity, confirmatory factor analysis
(CFA) can be used.
Reliability
Reliability can be defined as the extent to which measures are
free from random error. Researchers must demonstrate
instruments are reliable since without reliability, research
results using the instrument are not replicable.
Reliability is estimated in one of four ways
Internal consistency
Split-half reliability
Test-retest reliability
Alternative forms
Internal consistency reliability: estimation based on the
correlation among the variables comprising the set (typically,
Cronbach's alpha).
Split-half reliability: estimation based on the correlation of two
equivalent halves of the scale's items.
Test-retest reliability: estimation based on the correlation
between two (or more) administrations of the same item, scale,
or instrument for different times, locations, or populations,
when the two administrations do not differ on other relevant
variables.
Alternative-forms reliability: two equivalent forms of the scale
are constructed and the same respondents are measured at two
different times, with a different form being used each time.
Cronbach’s alpha
Cronbach’s alpha, the coefficient of reliability, is frequently
used to measure internal consistency and stability of an
instrument (Churchill, 1979). It is the average of all possible
split-half coefficients resulting from different ways of splitting
the scale items.
Cronbach’s alpha varies from 0 to 1, and a value of 0.6 or less
generally indicates unsatisfactory internal consistency
reliability.
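Cronbach's alpha is commonly computed as k/(k - 1) times (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below applies that formula to made-up ratings; the function name and the data are illustrative assumptions.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per scale item, respondents in the same order."""
    k = len(items)
    sum_item_variances = sum(statistics.variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - sum_item_variances / total_variance)

# Hypothetical 7-point ratings from five respondents on a three-item scale.
items = [
    [4, 5, 3, 6, 2],
    [5, 5, 3, 7, 2],
    [4, 6, 2, 6, 3],
]

alpha = cronbach_alpha(items)
print(round(alpha, 3), alpha > 0.6)  # above the 0.6 rule of thumb
```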
The Sampling Design Process
Define the Population
Determine the Sampling Frame
Select Sampling Technique(s)
Determine the Sample Size
Execute the Sampling Process
Define the Target Population
The target population is the collection of elements or objects
that possess the information sought by the researcher and about
which inferences are to be made. The target population should
be defined in terms of elements, sampling units, extent, and
time.
An element is the object about which or from which the
information is desired, e.g., the respondent.
A sampling unit is an element, or a unit containing the element,
that is available for selection at some stage of the sampling
process.
Extent refers to the geographical boundaries.
Time is the time period under consideration.
Classification of Sampling Techniques
Sampling Techniques
  Nonprobability Sampling Techniques
    Convenience Sampling
    Judgmental Sampling
    Quota Sampling
    Snowball Sampling
  Probability Sampling Techniques
    Simple Random Sampling
    Systematic Sampling
    Stratified Sampling
    Cluster Sampling
    Other Sampling Techniques
Convenience Sampling
Convenience sampling attempts to obtain a sample of
convenient elements. Often, respondents are selected because
they happen to be in the right place at the right time.
Examples: use of students and members of social organizations;
department stores using charge account lists.
A Graphical Illustration of Convenience Sampling
Group D happens to assemble at a convenient time and place.
So all the elements in this Group are selected. The resulting
sample consists of elements 16, 17, 18, 19 and 20. Note, no
elements are selected from group A, B, C and E.
A   B   C   D   E
1   6   11  16  21
2   7   12  17  22
3   8   13  18  23
4   9   14  19  24
5   10  15  20  25
Judgmental Sampling
Judgmental sampling is a form of convenience sampling in
which the population elements are selected based on the
judgment of the researcher.
test markets
purchase engineers selected in industrial marketing research
expert witnesses used in court
Graphical Illustration of Judgmental Sampling
The researcher considers groups B, C and E to be typical and
convenient. Within each of these groups one or two elements
are selected based on typicality and convenience. The
resulting sample consists of elements 8, 10, 11, 13, and 24.
Note, no elements are selected from groups A and D.
Quota Sampling
Quota sampling may be viewed as two-stage restricted
judgmental sampling.
The first stage consists of developing control categories, or
quotas, of population elements.
In the second stage, sample elements are selected based on
convenience or judgment.
Control          Population     Sample composition
Characteristic   Percentage     Percentage   Number
Sex
  Male           48             48           480
  Female         52             52           520
  Total          100            100          1000
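The quota table above amounts to multiplying each population percentage by the planned sample size. A minimal sketch, with a hypothetical helper name:

```python
def quota_counts(percentages, sample_size):
    """Number of respondents to recruit in each control category."""
    return {
        group: round(sample_size * pct / 100)
        for group, pct in percentages.items()
    }

population_pct = {"Male": 48, "Female": 52}  # from the table above
quotas = quota_counts(population_pct, sample_size=1000)
print(quotas)
```

With percentages that do not divide evenly, rounding can leave the quotas summing to slightly more or less than the target, so the totals are worth checking.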
What sampling technique do you recommend for DuPont Case?
Quota samples are most applicable for mall intercept interviews
because they allow for more precision than regular judgmental
sampling and mall intercept interviews are inherently non-
probabilistic.
We can create control categories along age groups. For example:
Age     %
22–30   20
31–45   43
45–60   18
60+     19
where each percentage is the share of the sample that
should be obtained from that category. Respondents are
approached in the mall with the goal of achieving this age
distribution.
In this case, we also want to bias our selection toward
women, since they purchase most carpeting. Thus, we should
purposely target women in these age groups at a 2 to 1 ratio to
men.
A Graphical Illustration of
Quota Sampling
A quota of one element from each group, A to E, is imposed.
Within each group, one element is selected based on judgment
or convenience. The resulting sample consists of elements 3, 6,
13, 20 and 22. Note, one element is selected from each column
or group.
Snowball Sampling
In snowball sampling, an initial group of respondents is
selected, usually at random.
After being interviewed, these respondents are asked to identify
others who belong to the target population of interest.
Subsequent respondents are selected based on the referrals.
A Graphical Illustration of
Snowball Sampling
Elements 2 and 9 are selected randomly from groups A and B.
Element 2 refers elements 12 and 13. Element 9 refers
element 18. The resulting sample consists of elements 2, 9, 12,
13, and 18. Note, there are no elements from group E.
Simple Random Sampling
Each element in the population has a known and equal
probability of selection.
Each possible sample of a given size (n) has a known and equal
probability of being the sample actually selected.
This implies that every element is selected independently of
every other element.
A Graphical Illustration of
Simple Random Sampling
Select five random numbers from 1 to 25. The resulting sample
consists of population elements 3, 7, 9, 16, and 24.
A   B   C   D   E
1   6   11  16  21
2   7   12  17  22
3   8   13  18  23
4   9   14  19  24
5   10  15  20  25
Systematic Sampling
The sample is chosen by selecting a random starting point and
then picking every ith element in succession from the sampling
frame.
The sampling interval, i, is determined by dividing the
population size N by the sample size n and rounding to the
nearest integer.
When the ordering of the elements is related to the
characteristic of interest, systematic sampling increases the
representativeness of the sample.
Systematic Sampling
If the ordering of the elements produces a cyclical pattern,
systematic sampling may decrease the representativeness of the
sample.
For example, there are 100,000 elements in the population
and a sample of 1,000 is desired. In this case the sampling
interval, i, is 100. A random number between 1 and 100 is
selected. If, for example, this number is 23, the sample consists
of elements 23, 123, 223, 323, 423, 523, and so on.
A Graphical Illustration of
Systematic Sampling
Select a random number between 1 and 5, say 2.
The resulting sample consists of population elements 2,
(2+5=) 7, (2+5x2=) 12, (2+5x3=) 17, and (2+5x4=) 22. Note, all
the elements are selected from a single row.
A   B   C   D   E
1   6   11  16  21
2   7   12  17  22
3   8   13  18  23
4   9   14  19  24
5   10  15  20  25
Stratified Sampling
A two-step process in which the population is partitioned into
subpopulations, or strata.
The strata should be mutually exclusive and collectively
exhaustive in that every population element should be assigned
to one and only one stratum and no population elements should
be omitted.
Next, elements are selected from each stratum by a random
procedure, usually SRS.
A major objective of stratified sampling is to increase precision
without increasing cost.
The elements within a stratum should be as homogeneous as
possible, but the elements in different strata should be as
heterogeneous as possible.
The stratification variables should also be closely related to the
characteristic of interest.
A Graphical Illustration of
Stratified Sampling
Randomly select a number from 1 to 5
for each stratum, A to E. The resulting
sample consists of population elements
4, 7, 13, 19 and 21. Note, one element
is selected from each column.
A   B   C   D   E
1   6   11  16  21
2   7   12  17  22
3   8   13  18  23
4   9   14  19  24
5   10  15  20  25
Cluster Sampling
The target population is first divided into mutually exclusive
and collectively exhaustive subpopulations, or clusters.
Elements within a cluster should be as heterogeneous as
possible, but clusters themselves should be as homogeneous as
possible. Ideally, each cluster should be a small-scale
representation of the population.
Then a random sample of clusters is selected, based on a
probability sampling technique such as SRS.
For each selected cluster, either all the elements are included in
the sample (one-stage) or a sample of elements is drawn
probabilistically (two-stage).
A Graphical Illustration of
Cluster Sampling (2-Stage)
Randomly select 3 clusters, B, D and E.
Within each cluster, randomly select one
or two elements. The resulting sample
consists of population elements 7, 18, 20, 21, and 23. Note, no
elements are selected from clusters A and C.
A   B   C   D   E
1   6   11  16  21
2   7   12  17  22
3   8   13  18  23
4   9   14  19  24
5   10  15  20  25
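The two-stage draw in this illustration can be sketched on the same 25-element population. This is a sketch under assumptions: the function name is hypothetical, both stages use simple random sampling, a fixed two elements are drawn per selected cluster (the illustration draws one or two), and the seed is arbitrary.

```python
import random

# The population from the illustrations: clusters A to E hold elements 1 to 25.
CLUSTERS = {name: list(range(start, start + 5))
            for name, start in zip("ABCDE", range(1, 26, 5))}

def two_stage_cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Stage 1: draw clusters by SRS. Stage 2: draw elements within each by SRS."""
    chosen = sorted(rng.sample(sorted(clusters), n_clusters))
    sample = []
    for name in chosen:
        sample.extend(sorted(rng.sample(clusters[name], per_cluster)))
    return chosen, sample

rng = random.Random(7)  # arbitrary seed so the draw can be repeated
chosen, sample = two_stage_cluster_sample(CLUSTERS, n_clusters=3, per_cluster=2, rng=rng)
print(chosen, sample)
```

Setting per_cluster to the full cluster size gives the one-stage version, in which every element of each selected cluster enters the sample.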
Strengths and Weaknesses of
Basic Sampling Techniques
Nonprobability sampling

Convenience sampling
  Strengths: Least expensive, least time-consuming, most
  convenient
  Weaknesses: Selection bias, sample not representative, not
  recommended for descriptive or causal research
Judgmental sampling
  Strengths: Low cost, convenient, not time-consuming
  Weaknesses: Does not allow generalization, subjective
Quota sampling
  Strengths: Sample can be controlled for certain
  characteristics
  Weaknesses: Selection bias, no assurance of representativeness
Snowball sampling
  Strengths: Can estimate rare characteristics
  Weaknesses: Time-consuming

Probability sampling

Simple random sampling (SRS)
  Strengths: Easily understood, results projectable
  Weaknesses: Difficult to construct sampling frame, expensive,
  lower precision, no assurance of representativeness
Systematic sampling
  Strengths: Can increase representativeness, easier to
  implement than SRS, sampling frame not necessary
  Weaknesses: Can decrease representativeness
Stratified sampling
  Strengths: Includes all important subpopulations, precision
  Weaknesses: Difficult to select relevant stratification
  variables, not feasible to stratify on many variables,
  expensive
Cluster sampling
  Strengths: Easy to implement, cost effective
  Weaknesses: Imprecise, difficult to compute and interpret
  results
Procedures for Drawing
Probability Samples
Simple Random Sampling
1. Select a suitable sampling frame
2. Each element is assigned a number from 1 to N
(pop. size)
3. Generate n (sample size) random numbers
between 1 and N
4. The numbers generated denote the elements that
should be included in the sample
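The four steps above can be sketched in a few lines. This is a minimal sketch, assuming the n numbers are drawn without replacement so that no element is selected twice; the function name and the seed are illustrative only.

```python
import random

def simple_random_sample(N, n, rng=None):
    """Draw n distinct element numbers from a frame numbered 1..N."""
    rng = rng or random.Random()
    # random.sample draws without replacement, so every subset of size n
    # is equally likely to be the sample.
    return sorted(rng.sample(range(1, N + 1), n))

sample = simple_random_sample(N=25, n=5, rng=random.Random(42))  # arbitrary seed
print(sample)
```

The numbers returned are positions in the sampling frame, so they are only as good as the frame built in step 1.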
Procedures for Drawing
Probability Samples
Systematic Sampling
1. Select a suitable sampling frame
2. Each element is assigned a number from 1 to N (pop. size)
3. Determine the sampling interval i: i = N/n. If i is a
fraction, round to the nearest integer
4. Select a random number, r, between 1 and i, as explained in
simple random sampling
5. The elements with the following numbers will comprise the
systematic random sample: r, r+i, r+2i, r+3i, r+4i, ..., r+(n-1)i
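The procedure can be sketched as follows. The function name and the optional start argument are assumptions added for illustration; passing a start reproduces the worked examples from the slides (start 23 with N = 100,000 and n = 1,000, and start 2 on the 25-element grid).

```python
import random

def systematic_sample(N, n, start=None, rng=None):
    """Return the element numbers r, r+i, ..., r+(n-1)i for a frame of size N."""
    # Sampling interval. Note that Python's round() sends exact halves to the
    # nearest even integer, which is one reading of "nearest integer".
    i = max(1, round(N / n))
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, i)  # random start r between 1 and i, inclusive
    # Drop numbers past the end of the frame, which can occur when i was rounded up.
    return [start + k * i for k in range(n) if start + k * i <= N]

big = systematic_sample(100_000, 1_000, start=23)
grid = systematic_sample(25, 5, start=2)
print(big[:6])  # [23, 123, 223, 323, 423, 523]
print(grid)     # [2, 7, 12, 17, 22]
```

Only one random number is drawn; every other element follows from the interval, which is why the method is cheaper than simple random sampling.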
Procedures for Drawing
Probability Samples
Stratified Sampling
1. Select a suitable frame
2. Select the stratification variable(s) and the number of
strata, H
3. Divide the entire population into H strata. Based on the
classification variable, each element of the population is
assigned to one of the H strata
4. In each stratum, number the elements from 1 to N_h (the pop.
size of stratum h)
5. Determine the sample size of each stratum, n_h, where
$\sum_{h=1}^{H} n_h = n$
6. In each stratum, select a simple random sample of size n_h
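Steps 4 to 6 can be sketched as below. The slides do not say how the stratum sample sizes are chosen, so the sketch takes them as an input; the function name, the dictionary layout, and the seed are assumptions made for illustration.

```python
import random

def stratified_sample(strata, allocation, rng=None):
    """Draw a simple random sample of size n_h from each stratum.

    strata     -- {stratum name: list of its elements}
    allocation -- {stratum name: n_h}, the sample size for that stratum
    """
    rng = rng or random.Random()
    sample = {}
    for name, elements in strata.items():
        n_h = allocation[name]
        if n_h > len(elements):
            raise ValueError(f"stratum {name}: n_h exceeds the stratum size N_h")
        sample[name] = sorted(rng.sample(elements, n_h))
    return sample

# The grid from the illustration: strata A to E, one element drawn from each.
strata = {name: list(range(start, start + 5))
          for name, start in zip("ABCDE", range(1, 26, 5))}
allocation = {name: 1 for name in strata}
sample = stratified_sample(strata, allocation, random.Random(3))  # arbitrary seed
n = sum(len(chosen) for chosen in sample.values())
print(sample, n)
```

The total n is the sum of the stratum sizes n_h, and every stratum is guaranteed to appear in the sample.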
Procedures for Drawing
Probability Samples
Cluster Sampling
1. Assign a number from 1 to N to each element in the
population
2. Divide the population into C clusters, of which c will be
included in the sample
3. Calculate the sampling interval i, i = N/c (round to nearest
integer)
4. Select a random number r between 1 and i, as explained in
simple random sampling
5. Identify elements with the following numbers:
r, r+i, r+2i, ..., r+(c-1)i
6. Select the clusters that contain the identified elements
7. Select sampling units within each selected cluster based on
SRS or systematic sampling
8. Remove clusters exceeding sampling interval i. Calculate new
population size N*, number of clusters to be selected
c* = c - 1, and new sampling interval i*.
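Steps 1 to 6 amount to running a systematic sample over the element numbers and keeping the clusters that are hit, so larger clusters are more likely to be selected. The sketch below covers only those steps and assumes clusters occupy consecutive element numbers; the function name is hypothetical, and a cluster larger than the interval raises an error rather than applying step 8.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def select_clusters(cluster_sizes, c, start=None, rng=None):
    """Pick clusters by systematic selection over element numbers (steps 1 to 6).

    cluster_sizes -- {cluster name: number of elements}, in frame order
    c             -- number of clusters to include in the sample
    """
    names = list(cluster_sizes)
    # upper[k] is the last element number that belongs to cluster k.
    upper = list(accumulate(cluster_sizes[name] for name in names))
    N = upper[-1]
    i = round(N / c)  # sampling interval
    if max(cluster_sizes.values()) > i:
        raise ValueError("a cluster exceeds the sampling interval; step 8 applies")
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, i)
    hits = [start + k * i for k in range(c)]
    # Map each identified element number to the cluster that contains it.
    return [names[bisect_left(upper, e)] for e in hits if e <= N]

sizes = {"A": 5, "B": 5, "C": 5, "D": 5, "E": 5}
picked = select_clusters(sizes, c=3, start=2)
print(picked)  # ['A', 'B', 'D']
```

With N = 25 and c = 3 the interval is 8, so a start of 2 identifies elements 2, 10, and 18, which fall in clusters A, B, and D.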
Tennis' Systematic Sampling
Returns a Smash
Tennis magazine conducted a mail survey of its subscribers to
gain a better understanding of its market. Systematic sampling
was employed to select a sample of 1,472 subscribers from the
publication's domestic circulation list. If we assume that the
subscriber list had 1,472,000 names, the sampling interval
would be 1,000 (1,472,000/1,472). A number from 1 to 1,000
was drawn at random. Beginning with that number, every
1,000th subscriber was selected.
A brand-new dollar bill was included with the questionnaire as
an incentive to respondents. An alert postcard was mailed one
week before the survey. A second, follow-up, questionnaire
was sent to the whole sample ten days after the initial
questionnaire. The net effective mailing was 1,396. Six weeks
after the first mailing, 778 completed questionnaires were
returned, yielding a response rate of 56%.
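The figures in this example can be checked with a short calculation. The list size of 1,472,000 is the assumption made in the text itself; the other numbers are as reported.

```python
subscribers = 1_472_000   # assumed size of the circulation list (from the text)
sample_size = 1_472       # subscribers selected
net_mailing = 1_396       # net effective mailing
returned = 778            # completed questionnaires

interval = subscribers // sample_size
response_rate = returned / net_mailing

print(interval)                # 1000
print(f"{response_rate:.1%}")  # 55.7%
```

The response rate is computed on the net effective mailing rather than on the 1,472 originally selected, and it rounds to the 56% reported.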
Discussion question 1: Discuss the advantages of convenience
samples and when it is appropriate to use them.
Convenience sampling is the least expensive and least
time-consuming of all sampling techniques. The sampling units are
accessible, easy to measure, and cooperative. In spite of these
advantages, this form of sampling has serious limitations. Many
potential sources of selection bias are present, including
respondent self-selection. Convenience samples are not
representative of any definable population. Hence, it is not
theoretically meaningful to generalize to any population from a
convenience sample, and convenience samples are not
appropriate for marketing research projects involving
population inferences. Convenience samples are not
recommended for descriptive or causal research, but they can be
used in exploratory research for generating ideas, insights, or
hypotheses. Convenience samples can be used for focus groups,
pretesting questionnaires, or pilot studies. Even in these cases,
caution should be exercised in interpreting the results.
Nevertheless, this technique is sometimes used even in large
surveys.
Discussion question 2: Discuss the advantages of systematic
sampling.
Systematic sampling is less costly and easier than simple
random sampling, because random selection is done only once.
Moreover, the random numbers do not have to be matched with
individual elements as in simple random sampling. Because
some lists contain millions of elements, considerable time can
be saved. This reduces the costs of sampling. If information
related to the characteristic of interest is available for the
population, systematic sampling can be used to obtain a more
representative and reliable (lower sampling error) sample than
simple random sampling. Another relative advantage is that
systematic sampling can even be used without knowledge of the
composition (elements) of the sampling frame. For example,
every ith person leaving a department store or mall can be
intercepted. For these reasons, systematic sampling is often
employed in consumer mail, telephone, and mall intercept
interviews.
Discussion question 3: Discuss the uses of nonprobability and
probability sampling.
Nonprobability sampling is used in concept tests, package tests,
name tests, and copy tests, where projections to the populations
are usually not needed. In such studies, interest centers on the
proportion of the sample that gives various responses or
expresses various attitudes. Samples for these studies can be
drawn using methods such as mall intercept quota sampling. On
the other hand, probability sampling is used when there is a
need for highly accurate estimates of market share or sales
volume for the entire market. National market tracking studies,
which provide information on product category and brand usage
rates, as well as psychographic and demographic profiles of
users, use probability sampling. Studies that use probability
sampling generally employ telephone interviews. Stratified and
systematic sampling are combined with some form of random-
digit dialing to select the respondents.
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 

Pampers CaseIn an increasingly competitive diaper market, P&G’.docx

  • 1. Pampers Case. In an increasingly competitive diaper market, P&G’s marketing department wanted to formulate new approaches to the construction and marketing of Pampers to position them effectively against Huggies without cannibalizing Luvs. They surveyed 300 mothers of infants. Each was given a randomly selected brand of diaper (either Pampers, Luvs, or Huggies) and asked to rate that diaper on nine attributes and to give her overall preference for the brand. Preference was obtained on a 7-point Likert scale (1=not at all preferred, 7=greatly preferred). Diaper ratings on the nine attributes were also obtained on a 7-point Likert scale (1=very unfavorable, 7=very favorable). The study was designed so that each of the three brands appeared 100 times. The goal of the study was to learn which attributes of diapers were most important in influencing purchase preference (Y). The nine attributes used in the study were (variable: attribute; marketing option): X1: count per box; Desire large counts per box?
  • 2. X2: price; Pay a premium price? X3: value; Promote high value. X4: skin care; Offer high degree of skin care. X5: style; Prints/color vs. plain diapers. X6: absorbency; Regular vs. superabsorbency.
  • 3. X7: leakage; Narrow/tapered vs. regular crotch. X8: comfort/size; Extra padding and form-fitting gathers. X9: taping; Re-sealable tape vs. regular tape. Question (will be discussed in week 8): If you don’t have SPSS software at home, you may be able to download a trial version (good for 21 days) from spss.com → software → statistics family → PASW Statistics 17.0 → click “free trial” and download. 1. Run a regression analysis for brand preference that includes all independent variables in the model, and describe how
  • 4. meaningful the model is. Interpret the results for management. 6. Correlation and Regression. The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by $\bar{X} = \sum_{i=1}^{n} X_i / n$, where $X_i$ = observed values of the variable X and n = number of observations (sample size). The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories. (Statistics Associated with Frequency Distribution: Measures of Location)
  • 5. The median of a sample is the middle value when the data are arranged in ascending or descending order. http://www.city-data.com/ (Statistics Associated with Frequency Distribution: Measures of Location) Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other.
  • 6. Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. The kurtosis of a normal distribution is zero. If the kurtosis is positive, then the distribution is more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal distribution. (Statistics Associated with Frequency Distribution: Measures of Shape) Example: find the mean, median, mode, and range for the following list of values: 13, 18, 13, 14, 13, 16, 14, 21, 13. The mean is the usual average: (13 + 18 + 13 + 14 + 13 + 16 + 14 + 21 + 13) ÷ 9 = 15. The median is the middle value, so the list has to be rewritten in order: 13, 13, 13, 13, 14, 14, 16, 18, 21. There are nine numbers in the list, so the middle one is the (9 + 1) ÷ 2 = 10 ÷ 2 = 5th number (the median is the mean of the middle two values if there is an even number of values). So the median is 14. The mode is the number that is repeated more often than any other: 13 is the mode. The largest value in the list is 21, and the smallest is 13, so the range is 21 – 13 = 8. The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample.
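The worked example above can be checked with a few lines of Python. This is only a minimal sketch using the standard-library `statistics` module; the variable names are illustrative and not part of the original handout.

```python
import statistics

# The list of values from the worked example above.
values = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(values)           # sum of the values divided by their count
median = statistics.median(values)       # middle value of the sorted list
mode = statistics.mode(values)           # most frequently occurring value
value_range = max(values) - min(values)  # largest value minus smallest value

print(mean, median, mode, value_range)
```

The four results should agree with the hand calculation in the slide (mean 15, median 14, mode 13, range 8).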
  • 7. Range = X_largest – X_smallest. The variance is the mean squared deviation from the mean. The variance can never be negative. It is a measure of how far a set of numbers is spread out from the mean. http://www.mathsisfun.com/data/standard-deviation.html Deviation is the difference between the value of an observation and the mean of the population, that is, a value minus its mean: x - meanx. The standard deviation is based on the square of this difference. In SPSS, select Analyze, Correlate, Bivariate; click Options; check Cross-product deviations and covariances. The standard deviation is the square root of the variance. (Statistics Associated with Frequency Distribution: Measures of Variability) $s_x = \sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)}$
  • 9. (Statistics Associated with Frequency Distribution: Measures of Variability) Covariance is a measure of how much the deviations of two variables match. The equation is: cov(x,y) = SUM[(x - meanx)(y - meany)] / (n - 1). In SPSS, select Analyze, Correlate, Bivariate; click Options; check Cross-product deviations and
  • 10. covariances. Correlation is a bivariate measure of association (strength) of the relationship between two variables. It varies from 0 (random relationship) to 1 (perfect linear relationship) or -1 (perfect negative linear relationship). It is usually reported in terms of its square (r²), interpreted as percent of variance explained. For instance, if r² is .25, then the independent variable is said to explain 25% of the variance in the dependent variable. In SPSS, select Analyze, Correlate, Bivariate; check Pearson. Correlation: Pearson's r, the most common type, is sometimes also called product-moment correlation. Pearson's r is a measure of association which varies from -1 to +1, with 0 indicating no relationship (random pairing of values) and 1 indicating perfect relationship. In SPSS, select Analyze, Correlate, Bivariate; check Pearson (the default). Multiple Regression: The multiple regression equation takes the form y = b1x1 + b2x2 + ... + bnxn + c. The b's are regression coefficients, representing the amount the dependent variable y changes when the corresponding independent changes 1 unit. The c is the constant, where the regression line intercepts the y axis, representing the amount the dependent y will be when all the independent variables are 0. The standardized versions of the b coefficients are the beta weights, and the ratio of the beta coefficients is the ratio of the relative predictive power of the independent variables. Associated with multiple regression is R² (the squared multiple correlation), which is the percent of variance in the dependent variable
  • 11. explained collectively by all of the independent variables. How big a sample size do I need to do multiple regression? According to Tabachnick and Fidell (2001: 117), a rule of thumb for testing b coefficients is to have N >= 104 + m, where m = number of independent variables. Another popular rule of thumb is that there must be at least 20 times as many cases as independent variables. Statistics Associated with Regression Analysis. Regression coefficient: The estimated parameter b is usually referred to as the non-standardized regression coefficient. Standardized regression coefficient: Also termed the beta coefficient or beta weight, this denotes the standardized regression coefficient; in bivariate regression, $B_{yx} = B_{xy} = r_{xy}$. Sum of squared errors: The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, which is a measure of total error, $\sum e_j^2$.
  • 12. Conducting Regression Analysis: Plot the Scatter Diagram. A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations. The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure. In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum e_j^2$.
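The least-squares procedure described above can be sketched for the two-variable case as follows. This is an illustrative sketch, not the course's SPSS procedure: the function name `least_squares` and the small data set are hypothetical, and only the textbook closed-form formulas for the slope, intercept, sum of squared errors, and Pearson's r are used.

```python
import math

def least_squares(x, y):
    """Fit y = c + b*x by least squares; return slope, intercept, SSE and Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx                 # non-standardized regression coefficient
    c = mean_y - b * mean_x       # constant (intercept)
    # Sum of squared errors: squared vertical distances from the fitted line.
    sse = sum((yi - (c + b * xi)) ** 2 for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)  # Pearson's product-moment correlation
    return b, c, sse, r

# Hypothetical ratings, made up purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, c, sse, r = least_squares(x, y)
print(b, c, sse, r ** 2)
```

Any other line drawn through these points would give a larger sum of squared errors than the fitted one, which is what "least squares" means here.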
  • 13. Determine the Strength and Significance of Association: Significance of r with t test. t statistic: A t statistic with n - 2 degrees of freedom (in simple regression) can be used to test the null hypothesis that no linear relationship exists between X and Y, or H0: r = 0. One tests the hypothesis that the correlation is zero (ρ = 0) using this formula: t = [r*SQRT(n-2)]/[SQRT(1-r²)]. If the computed t value is as high or higher than the table t value, then the researcher concludes the correlation is significant (that is, significantly different from 0). In practice, most computer programs compute the significance of correlation for the researcher without need for manual methods. t test: H0: b = 0, H1: b is not equal to zero. F test: H0: R² = 0, H1: R² is not equal to zero. Determine the Strength and Significance of Association: F test. Another, equivalent test for examining the significance of the linear relationship between X and Y (significance of b) is the test for the significance of the coefficient of determination. The hypotheses in this case are: H0: R²pop = 0, H1: R²pop > 0. F test: The F test is used to test the null hypothesis that the coefficient of multiple determination in the population, R²pop, is zero. This is equivalent to testing the null hypothesis H0: β1 = β2 = ... = βk = 0, which is the same as testing the significance of the regression model
  • 14. as a whole. The test statistic has an F distribution with k and (n - k - 1) degrees of freedom (in multiple regression), where k = number of terms in the equation not counting the constant. F = [R²/k]/[(1 - R²)/(n - k - 1)]. Significance level: p value. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. The decision is often made using the p-value (see sig. in the table): if the p-value is less than the significance level, then the null hypothesis is rejected. The smaller the p-value, the more significant the result is said to be. Thus, we can say, “the null hypothesis is rejected”. Variables. Dependent variable: The dependent variable is the predicted variable in the regression equation. Independent variables are the predictor variables in the regression equation. Dummy variables are a way of adding the values of a nominal or ordinal variable to a regression equation. The standard approach to modeling categorical variables is to include the categorical variables in the regression equation by converting each level of each categorical variable into a variable of its own, usually coded 0 or 1. For instance, the categorical variable "region" may be converted into dummy variables such as "East," "West," "North," or "South." Typically "1" means the attribute of interest is present (e.g., South = 1 means the case is from the region South). We have to leave one of the levels out of the regression model to avoid perfect multicollinearity (singularity;
  • 15. redundancy), which will prevent a solution (for example, we may leave out "North" to avoid singularity). Regression with Dummy Variables:
        Product usage category   Original variable code   D1   D2   D3
        Nonusers                 1                        1    0    0
        Light users              2                        0    1    0
        Medium users             3                        0    0    1
        Heavy users              4                        0    0    0
    $\hat{Y}_i = a + b_1 D_1 + b_2 D_2 + b_3 D_3$. In this case, "heavy users" has been selected as the reference category and has not been directly included in the regression equation.
  • 16. Conducting Multiple Regression Analysis: Strength of Association. $R^2 = SS_{reg} / SS_y$. R², also called the coefficient of multiple determination (the squared multiple correlation), is the percent of the variance in the dependent explained uniquely or jointly by the independents.
  • 17. Conducting Multiple Regression Analysis: Strength of Association. Adjusted R-Square is an adjustment for the fact that, when one has a large number of independents, R² can become artificially high simply by chance. When used for the case of a few independents, R² and adjusted R² will be close. When there are many independents, adjusted R² may be noticeably lower. Always use adjusted R² when comparing models with different numbers of independents. R² is adjusted for the number of independent variables and the sample size by using the following formula: Adjusted $R^2 = R^2 - \frac{k(1 - R^2)}{n - k - 1}$
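The test statistics and the adjustment formula given in the preceding slides can be written out directly. The sketch below uses only the formulas stated above; the function names are illustrative, and the value R² = 0.40 is a made-up figure (n = 300 and k = 9 match the Pampers study design, but no actual result is implied).

```python
import math

def t_for_correlation(r, n):
    """t statistic with n - 2 degrees of freedom for testing H0: correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def f_for_r_squared(r2, n, k):
    """F statistic with k and n - k - 1 degrees of freedom for testing H0: R² = 0."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def adjusted_r_squared(r2, n, k):
    """R² adjusted for the number of independents k and the sample size n."""
    return r2 - k * (1 - r2) / (n - k - 1)

# Hypothetical R² for a model with 9 independents and 300 respondents.
r2, n, k = 0.40, 300, 9
f_stat = f_for_r_squared(r2, n, k)
adj = adjusted_r_squared(r2, n, k)
print(f_stat, adj)

# With a single independent variable, the F test and the t test are equivalent: F = t².
print(f_for_r_squared(0.6 ** 2, 50, 1), t_for_correlation(0.6, 50) ** 2)
```

The adjusted value is always a little below R², and the gap widens as k grows relative to n, which is why adjusted R² is preferred when comparing models with different numbers of independents.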
  • 18. Assumptions. Normality: The error term is normally distributed. The distribution of variables is normal; regression assumes that the variables have normal distributions. Linearity: The means of all these normal distributions of Y, given X, lie on a straight line with slope b. Absence of high multicollinearity. No outliers. Normality: A histogram of standardized residuals should show a roughly normal curve. Skewness and kurtosis can also be used to check normality of the variables. P-P plot: Another alternative for the same purpose is the normal probability plot, with the observed cumulative probabilities of occurrence of the standardized residuals on the Y axis and the expected normal probabilities of occurrence on the X axis, such that a 45-degree line will appear when the observed distribution conforms to the normally expected one.
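As a rough numerical companion to the histogram and P-P plot, skewness and kurtosis of the residuals can be computed by hand. The sketch below uses the plain moment-based definitions, with kurtosis reported as excess over the normal distribution so that a normal curve scores zero, as in the slides. Statistical packages may apply small-sample corrections, so their printed values can differ slightly; the residuals shown are hypothetical.

```python
def skewness_and_excess_kurtosis(residuals):
    """Moment-based skewness and excess kurtosis (both are 0 for a normal distribution)."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Second, third and fourth central moments.
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis

# Hypothetical residuals: perfectly symmetric, so skewness is zero;
# evenly spread rather than bell-shaped, so the distribution is flatter than normal.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt = skewness_and_excess_kurtosis(residuals)
print(skew, kurt)
```

A negative kurtosis value here matches the slide's reading: the distribution is flatter than a normal curve.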
  • 19. Normality: The error term is normally distributed. The variance of the error term is constant (homoscedasticity, also spelled homoskedasticity). Lack of homoscedasticity may mean (1) there is an interaction effect between a measured independent variable and an unmeasured independent variable not in the model; or (2) that some independent variables are skewed while others are not. The error terms are uncorrelated; in other words, the observations have been drawn independently. The Durbin-Watson statistic is a test to see if the assumption of independent observations is met, which is the same as testing to see if autocorrelation is present. As a rule of thumb, a Durbin-Watson statistic in the range of 1.5 to 2.5 means the researcher may reject the notion that the data are autocorrelated (serially dependent) and instead may assume independence of observations. Homoscedasticity: the variance of the error terms is constant. Nonconstant error variance (heteroscedasticity) can indicate the need to respecify the model to include omitted independent variables. Nonconstant error variance can be observed by requesting simple residual plots, as in the illustrations below, where "Training" as independent is used to predict "Score" as dependent: Plot of the dependent on the X axis against standardized predicted values on the Y axis. For the homoscedasticity assumption to be met, observations should be spread about the regression line similarly for the entire X axis. In the illustration below, which
  • 20. is heteroscedastic, the spread is much narrower for low values than for high values of the X variable, Score. This plot shows heteroscedasticity. The variance of the error term should be constant for all values of the independent variables. Heteroscedasticity occurs when the variance of the error term is not constant. The presence of heteroscedasticity can invalidate statistical tests of significance. Residuals: Residuals are the difference between the observed values and those predicted by the regression equation. Unstandardized residuals, referenced as RESID in SPSS, refer in a regression context to the linear difference between the location of an observation (point) and the regression line (or plane or surface) in multidimensional space. Standardized residuals are residuals after they have been constrained to a mean of zero and a standard deviation of 1. A rule of thumb is that outliers are points whose standardized residual is greater than 3.3 in absolute value (corresponding to the .001 alpha level). SPSS will list "Std. Residual" if "casewise diagnostics" is requested under the Statistics button. Studentized residuals are constrained only to have a standard deviation of 1, but are not constrained to a mean of 0. Studentized deleted residuals are residuals which have been constrained to have a standard deviation of 1, after the standard
  • 21. deviation is calculated leaving the given case out. Multicollinearity and its problems: Multicollinearity refers to excessive correlation of the predictor variables. When correlation is excessive (some use the rule of thumb of r > .90), standard errors of the b and beta coefficients become large, making it difficult or impossible to assess the relative importance of the predictor variables. Multicollinearity is less important where the research purpose is sheer prediction since the predicted values of the dependent remain stable, but multicollinearity is a severe problem when the research purpose includes causal modeling. Multicollinearity can result in several problems, including: The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high. The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample. It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable. Predictor variables may be incorrectly included or removed in stepwise regression. Test of multicollinearity: Inspection of the correlation matrix reveals only bivariate multicollinearity, with the typical criterion being bivariate correlations > .90. To assess multivariate multicollinearity, one uses tolerance or VIF (variance inflation factor). Tolerance: As a rule of thumb, if tolerance is less than .20, a problem with multicollinearity is indicated. In SPSS, select
  • 22. Analyze, Regression, Linear; click Statistics; check Collinearity diagnostics to get tolerance. VIF is the variance inflation factor, which is simply the reciprocal of tolerance. VIF >= 4 is an arbitrary but common cut-off criterion for deciding when a given independent variable displays "too much" multicollinearity: values above 4 suggest a multicollinearity problem. Some researchers use the more lenient cutoff of 5.0 or even 10.0 to signal when multicollinearity is a problem. Remedies of Multicollinearity: A simple procedure for adjusting for multicollinearity is to use only variables with low multicollinearity. Alternatively, the set of independent variables can be transformed into a new set of predictors that are mutually independent by using techniques such as principal components analysis or factor analysis. More specialized techniques, such as ridge regression, can also be used. Outliers: The removal of outliers from the data set under analysis can at times dramatically affect the performance of a regression model. Outliers should be removed if there is reason to believe that other variables not in the model explain why the outlier cases are unusual, that is, outliers may well be cases which need a separate model. Alternatively, outliers may suggest that
  • 23. additional explanatory variables need to be brought into the model (that is, the model needs re-specification). We can check outliers with any one of the five measures of case influence statistics (DfBeta, standardized DfBeta, DfFit, standardized DfFit, and the covariance ratio) and distance measures (Mahalanobis, Cook's D, and leverage). We can confidently use the following three ways to detect outliers: casewise diagnostics, Mahalanobis distance (D²), and DfBeta. Check influential cases (outliers) with influence statistics: Influence statistics in SPSS are selected under the Save button dialog. DfBeta, called standardized DfBeta in SPSS, measures the change in b coefficients (measured in standard errors) due to excluding a case from the dataset. A DfBeta coefficient is computed for every observation. If DfBeta > 0, the case increases the slope; if < 0, the case decreases the slope. The case may be considered an influential outlier if |DfBeta| > 2. In an alternative rule of thumb, a case may be an outlier if |DfBeta| > 2/SQRT(n). Standardized DfBeta: Once DfBeta is standardized, it is easier to interpret. The threshold of SDFBETA is usually set at ±2. DfFit: DfFit measures how much the estimate (predicted value) changes as a result of a particular observation being dropped from analysis. The DfFit measure is quite similar to Cook's D. Standardized DfFit: Once DfFit is standardized, it is easier to interpret. A rule of thumb flags as outliers those observations whose standardized DfFit value is > twice the square root of p/N, where p is the number of parameters in the model and N is sample size. Covariance ratio: This ratio compares the determinant of the
  • 24. covariance matrix with and without inclusion of a given case. The closer the covariance ratio approaches 1.0, the less influential the observation. Check influential cases with distance measures: Distance measures in SPSS are also selected under the Save button dialog. Centered leverage statistic, h, also called the hat-value, is available to identify cases which influence regression coefficients more than others. The leverage statistic varies from 0 (no influence on the model) to almost 1 (completely determines the model). The maximum value is (N-1)/N, where N is sample size. A rule of thumb is that cases with leverage under .2 are not a problem, but if a case has leverage over .5, the case has undue leverage and should be examined for the possibility of measurement error or the need to model such cases separately. Mahalanobis distance: The higher the Mahalanobis distance for a case, the more that case's values on independent variables diverge from average values. As a rule of thumb, the maximum Mahalanobis distance should not exceed the critical chi-squared value with degrees of freedom equal to number of predictors and alpha = .001, or else outliers may be a problem in the data. Cook's distance, D, is another measure of the influence of a case. Observations with larger D values than the rest of the data are those which have unusual influence or leverage. Fox (1991: 34) suggests as a cut-off for detecting influential cases, values of D greater than 4/(N - k - 1), where N is sample size and k is the number of independents. Others suggest D > 1 as the criterion to constitute a strong indication of an outlier problem, with D > 4/n the criterion to indicate a possible problem.
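To make the leverage and Cook's D rules of thumb concrete, the sketch below computes both for a regression with a single independent variable, where the hat-value has a simple closed form. It is an illustration under stated assumptions rather than a reproduction of SPSS output: the function name and the data are hypothetical, and Cook's D uses the standard textbook formula with p = 2 estimated parameters (slope and constant).

```python
def influence_simple_regression(x, y):
    """Return (centered leverage, Cook's D) for each case of a one-predictor regression."""
    n = len(x)
    p = 2  # estimated parameters: slope and constant
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    c = mean_y - b * mean_x
    residuals = [yi - (c + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)  # mean squared error
    results = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - mean_x) ** 2 / sxx      # hat-value (uncentered leverage)
        cooks_d = (e ** 2 / (p * mse)) * h / (1.0 - h) ** 2
        results.append((h - 1.0 / n, cooks_d))      # centered leverage = h - 1/n
    return results

# Hypothetical data: nine cases follow a steady trend, the tenth sits far out on X.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.0]
results = influence_simple_regression(x, y)

n, k = len(x), 1
cutoff = 4.0 / (n - k - 1)  # Fox's suggested cut-off for Cook's D
centered_leverage = [lev for lev, _ in results]
flagged = [i for i, (_, d) in enumerate(results) if d > cutoff]
print(flagged, centered_leverage[-1])
```

Only the last case exceeds the 4/(N - k - 1) cut-off, and its centered leverage is also above the .5 level that the slide treats as undue leverage.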
  • 25. Casewise Diagnostics DuPont Has Designs on Fashion Case question for week 5: 1. After last week’s assignment of developing scales for measuring motivation and intention, we are now looking at reliability of the scales we have developed. How would you determine the reliability of the scales? 2. How would you assess the validity of the scales? 3. Assuming a mall intercept is being conducted, what are the element, sampling unit, extent and time frame for this study? 4. Assuming a mall intercept is being conducted, what sampling technique do you recommend for this study? Why? 5. What nonresponse issues must be considered and how can they be overcome? 5. Validity Test & Reliability Test and Sampling Design & Procedures *
  • 26. Improving Response Rates (figure: Methods of Improving Response Rates). Reducing refusals: prior notification, motivating respondents, incentives, questionnaire design and administration, follow-up. Reducing not-at-homes: callbacks.
  • 27. Scale Evaluation (figure). Reliability: test-retest, alternative forms, split-half, internal consistency. Validity: content, criterion, construct (convergent, discriminant). Generalizability.
  • 28. Validity, Reliability, and their relationship. Validity is the degree to which a study measures what it was designed to measure. It deals with the quality of measurement. Reliability is the extent to which a variable is consistent in what it is intended to measure; in other words, it is the consistency, dependability, or repeatability of measures. Relationship between validity and reliability: Reliability does not necessarily tell whether the measurement is measuring what is supposed to be measured. Compared to validity, which addresses the issue of what should be measured, reliability is related to how it is measured. Therefore, in order to minimize our measurement error, both reliability and validity are examined. A measure may be reliable but not valid, but it cannot be valid without being reliable. That is, reliability is a
  • 29. necessary but not sufficient condition for validity. Validity: Content validity addresses whether the scales adequately measure the domain content of the construct. It is a subjective but systematic evaluation of how well the content of a scale represents the measurement task at hand. There is no objective statistical test to evaluate content validity. Researchers must carefully utilize specified theoretical descriptions of the construct to judge content validity. Criterion validity reflects whether a scale performs as expected in relation to other variables selected as meaningful criteria (criterion variables), i.e., whether the proposed measure exhibits generally the same direction and magnitude of correlation with other variables as measures already accepted within the social science community. Construct validity: The establishment of construct validity involves two major subdomains, convergent validity and discriminant validity. Convergent validity is the extent to which the scale correlates positively with other measures of the same construct. To test for convergent validity, we can use factor analysis (common factor analysis, PAF) and examine the factor loadings and the significance level of each construct. When the factor loadings of the intended constructs are all higher than .50, convergent validity is indicated. We can also use AVE (average variance extracted) to test convergent validity.
  • 30. Discriminant validity is the extent to which a measure does not correlate with other constructs from which it is supposed to differ. In other words, it describes the degree to which one construct is not similar to any other construct that is theoretically distinct. To test for discriminant validity, CFA (confirmatory factor analysis) can be used. Reliability: Reliability can be defined as the extent to which measures are free from random error. Researchers must demonstrate that instruments are reliable since, without reliability, research results using the instrument are not replicable. Reliability is estimated in one of four ways: internal consistency, split-half reliability, test-retest reliability, and alternative forms. Internal consistency reliability: estimation based on the correlation among the variables comprising the set (typically, Cronbach's alpha). Split-half reliability: estimation based on the correlation of two equivalent forms of the scale. Test-retest reliability: estimation based on the correlation between two (or more) administrations of the same item, scale, or instrument for different times, locations, or populations, when the two administrations do not differ on other relevant variables.
  • 31. Alternative-forms reliability: two equivalent forms of the scale are constructed and the same respondents are measured at two different times, with a different form being used each time. Cronbach’s alpha: Cronbach’s alpha, the coefficient of reliability, is frequently used to measure internal consistency and stability of an instrument (Churchill, 1979). It is the average of all possible split-half coefficients resulting from different ways of splitting the scale items. Cronbach’s alpha varies from 0 to 1, and a value of 0.6 or less generally indicates unsatisfactory internal consistency reliability. The Sampling Design Process: (1) Define the Population; (2) Determine the Sampling Frame; (3) Select Sampling Technique(s); (4) Determine the Sample Size; (5) Execute the Sampling Process.
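Cronbach's alpha, described earlier in this slide, is usually computed with the variance-based formula alpha = k/(k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below assumes that standard formula (it is not spelled out in the handout), and the item ratings are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of ratings by the same respondents."""
    k = len(items)
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(statistics.variance(item) for item in items)
    # Total scale score per respondent, summed across items.
    totals = [sum(ratings) for ratings in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical 7-point ratings from five respondents on a three-item scale.
items = [
    [4, 5, 3, 6, 2],
    [5, 5, 3, 6, 3],
    [4, 6, 2, 7, 3],
]
alpha = cronbach_alpha(items)
print(alpha)

# Three identical items are perfectly consistent, so alpha reaches its maximum of 1.
identical = [[1, 2, 3, 4, 5]] * 3
print(cronbach_alpha(identical))
```

By the rule of thumb above, a scale like the hypothetical one here, with alpha well above 0.6, would be judged to have satisfactory internal consistency.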
  • 32. Define the Target Population The target population is the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made. The target population should be defined in terms of elements, sampling units, extent, and time. An element is the object about which or from which the information is desired, e.g., the respondent. A sampling unit is an element, or a unit containing the element, that is available for selection at some stage of the sampling process. Extent refers to the geographical boundaries. Time is the time period under consideration. Classification of Sampling Techniques Sampling Techniques
  • 34. Convenience Sampling: Convenience sampling attempts to obtain a sample of convenient elements. Often, respondents are selected because they happen to be in the right place at the right time. Examples: use of students and members of social organizations; department stores using charge account lists. A Graphical Illustration of Convenience Sampling: Group D happens to assemble at a convenient time and place, so all the elements in this group are selected. The resulting sample consists of elements 16, 17, 18, 19, and 20. Note that no elements are selected from groups A, B, C, and E. [Figure: population grid with columns A (elements 1-5), B (6-10), C (11-15), D (16-20), and E (21-25).]
  • 36. Judgmental Sampling: Judgmental sampling is a form of convenience sampling in which the population elements are selected based on the judgment of the researcher. Examples: test markets; purchase engineers selected in industrial marketing research; expert witnesses used in court. A Graphical Illustration of Judgmental Sampling: The researcher considers groups B, C, and E to be typical and convenient. Within each of these groups, one or two elements are selected based on typicality and convenience. The resulting sample consists of elements 8, 10, 11, 13, and 24. Note that no elements are selected from groups A and D. [Figure: population grid with columns A (elements 1-5), B (6-10), C (11-15), D (16-20), and E (21-25).]
  • 38. Quota Sampling: Quota sampling may be viewed as two-stage restricted judgmental sampling. The first stage consists of developing control categories, or quotas, of population elements. In the second stage, sample elements are selected based on convenience or judgment. Example (control characteristic: sex). Male: population composition 48%, sample composition 48% (number = 480). Female: population composition 52%, sample composition 52% (number = 520). Total: population 100%, sample 100% (number = 1,000).
  • 39. What sampling technique do you recommend for the DuPont case? Quota samples are most applicable for mall intercept interviews because they allow for more precision than regular judgmental sampling, and mall intercept interviews are inherently nonprobabilistic. We can create control categories along age groups, for example (age: % of sample): 22–30: 20; 31–45: 43; 45–60: 18; 60+: 19. Each figure is the percentage of the sample size that should be obtained from that category. Respondents are approached in the mall with the goal of achieving this age distribution. In this case, we also want to bias our selection toward women, since they purchase most carpeting. Thus, we should purposely target women in these age groups at a 2-to-1 ratio to men. A Graphical Illustration of Quota Sampling: A quota of one element from each group, A to E, is imposed. Within each group, one element is selected based on judgment or convenience. The resulting sample consists of elements 3, 6, 13, 20, and 22. Note that one element is selected from each column or group. [Figure: population grid with columns A (elements 1-5), B (6-10), C (11-15), D (16-20), and E (21-25).]
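The quota arithmetic on slides 38 and 39 can be sketched as a small helper that turns control-category percentages into target respondent counts. The function name `quota_counts` is hypothetical, and the sample size of 300 used for the age quotas is an assumption made only for illustration; the slides do not give a sample size for the DuPont case.

```python
def quota_counts(percentages, sample_size):
    """Convert control-category percentages into target respondent counts."""
    return {cat: round(sample_size * pct / 100) for cat, pct in percentages.items()}

# Sex quota from slide 38: 48% male, 52% female, sample of 1,000.
sex_quota = quota_counts({"Male": 48, "Female": 52}, 1000)

# Age quota from slide 39; the sample size of 300 is an assumed value.
age_quota = quota_counts({"22-30": 20, "31-45": 43, "45-60": 18, "60+": 19}, 300)

print(sex_quota)
print(age_quota)
```

With other percentages or sample sizes, rounding can leave the counts one or two respondents off the intended total, so the totals are worth checking before fieldwork.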
  • 41. Snowball Sampling: In snowball sampling, an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest. Subsequent respondents are selected based on the referrals. A Graphical Illustration of Snowball Sampling: Elements 2 and 9 are selected randomly from groups A and B. Element 2 refers elements 12 and 13. Element 9 refers element 18. The resulting sample consists of elements 2, 9, 12, 13, and 18.
  • 42. Note that there are no elements from group E. [Figure: population grid with columns A (elements 1-5), B (6-10), C (11-15), D (16-20), and E (21-25).]
  • 43. [Figure: Classification of Sampling Techniques. Recoverable labels: Sampling Techniques; Nonprobability Sampling Techniques.]
  • 45. Simple Random Sampling: Each element in the population has a known and equal probability of selection. Each possible sample of a given size (n) has a known and equal probability of being the sample actually selected. This implies that every element is selected independently of every other element. A Graphical Illustration of Simple Random Sampling: Select five random numbers from 1 to 25. The resulting sample consists of population elements 3, 7, 9, 16, and 24. [Figure: population grid with columns A (elements 1-5), B (6-10), C (11-15), D (16-20), and E (21-25).]
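A minimal sketch of the simple random sampling illustration above, using the 25-element population from the grid. The function name `simple_random_sample` and the seed are assumptions for this example; the draw is without replacement, so every possible sample of size n is equally likely.

```python
import random

# Population elements 1..25, as in the graphical illustration.
population = list(range(1, 26))

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each subset equally likely."""
    rng = random.Random(seed)  # a seed is used here only to make the run repeatable
    return sorted(rng.sample(frame, n))

sample = simple_random_sample(population, 5, seed=42)
print(sample)
```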
  • 47. Systematic Sampling: The sample is chosen by selecting a random starting point and then picking every ith element in succession from the sampling frame. The sampling interval, i, is determined by dividing the population size N by the sample size n and rounding to the nearest integer. When the ordering of the elements is related to the characteristic of interest, systematic sampling increases the representativeness of the sample. If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample. For example, suppose there are 100,000 elements in the population and a sample of 1,000 is desired. In this case the sampling interval, i, is 100. A random number between 1 and 100 is selected. If, for example, this number is 23, the sample consists of elements 23, 123, 223, 323, 423, 523, and so on. A Graphical Illustration of Systematic Sampling: Select a random number between 1 and 5, say 2. The resulting sample consists of population elements 2, (2+5=) 7, (2+5x2=) 12, (2+5x3=) 17, and (2+5x4=) 22. Note that all the elements are selected from a single row. [Figure: population grid with columns A (elements 1-5), B (6-10), C (11-15), D (16-20), and E (21-25).]
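The systematic sampling rule above (interval i = N/n, random start r between 1 and i, then every ith element) can be sketched as follows. The function name `systematic_sample` and its `start` and `seed` parameters are hypothetical conveniences for this example, not part of the slides.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the element numbers r, r+i, ..., r+(n-1)i of a systematic sample."""
    # Sampling interval i = N/n, rounded to the nearest integer.
    # Note: Python's round() sends exact halves to the nearest even integer.
    i = max(1, round(N / n))
    # Random start r between 1 and i, unless one is supplied for reproducibility.
    r = start if start is not None else random.Random(seed).randint(1, i)
    # Keep only numbers that exist in the frame (a rounded-up i can overshoot N).
    return [r + k * i for k in range(n) if r + k * i <= N]

# Grid illustration: N = 25, n = 5, random start 2.
print(systematic_sample(25, 5, start=2))                 # [2, 7, 12, 17, 22]
# Slide example: N = 100,000, n = 1,000, random start 23.
print(systematic_sample(100_000, 1_000, start=23)[:6])   # [23, 123, 223, 323, 423, 523]
```

Only one random number is drawn, which is the practical advantage over simple random sampling noted later in the discussion questions.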
  • 49. Stratified Sampling A two-step process in which the population is partitioned into subpopulations, or strata. The strata should be mutually exclusive and collectively exhaustive in that every population element should be assigned to one and only one stratum and no population elements should be omitted. Next, elements are selected from each stratum by a random procedure, usually SRS. A major objective of stratified sampling is to increase precision without increasing cost. The elements within a stratum should be as homogeneous as possible, but the elements in different strata should be as heterogeneous as possible. The stratification variables should also be closely related to the characteristic of interest.
  • 50. A Graphical Illustration of Stratified Sampling: Randomly select a number from 1 to 5 for each stratum, A to E. The resulting sample consists of population elements 4, 7, 13, 19, and 21. Note that one element is selected from each column. [Figure: population grid with columns A (elements 1-5), B (6-10), C (11-15), D (16-20), and E (21-25).]
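The stratified illustration above can be sketched by treating each column of the grid as a stratum and drawing a simple random sample inside each one. The function name `stratified_sample`, the per-stratum sizes, and the seed are assumptions made for this example.

```python
import random

# Strata A to E from the illustration: A = 1..5, B = 6..10, ..., E = 21..25.
strata = {name: list(range(5 * k + 1, 5 * k + 6)) for k, name in enumerate("ABCDE")}

def stratified_sample(strata, n_h, seed=None):
    """Draw a simple random sample of size n_h[name] within every stratum."""
    rng = random.Random(seed)
    return {
        name: sorted(rng.sample(elements, n_h[name]))
        for name, elements in strata.items()
    }

# One element per stratum, as in the graphical illustration.
n_h = {name: 1 for name in strata}
sample = stratified_sample(strata, n_h, seed=1)
print(sample)
```

Because every stratum contributes to the sample, no subpopulation can be missed, which is the contrast with the cluster design on the following slides.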
  • 51. Cluster Sampling: The target population is first divided into mutually exclusive and collectively exhaustive subpopulations, or clusters. Elements within a cluster should be as heterogeneous as possible, but clusters themselves should be as homogeneous as possible.
  • 52. Ideally, each cluster should be a small-scale representation of the population. Then a random sample of clusters is selected, based on a probability sampling technique such as SRS. For each selected cluster, either all the elements are included in the sample (one-stage) or a sample of elements is drawn probabilistically (two-stage). A Graphical Illustration of Cluster Sampling (2-Stage): Randomly select 3 clusters, B, D, and E. Within each cluster, randomly select one or two elements. The resulting sample consists of population elements 7, 18, 20, 21, and 23. Note that no elements are selected from clusters A and C. [Figure: population grid with columns A (elements 1-5), B (6-10), C (11-15), D (16-20), and E (21-25).]
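A rough sketch of the two-stage cluster illustration above: first a random sample of clusters, then a random sample of elements inside each selected cluster. The function name `two_stage_cluster_sample` and the seed are hypothetical. As a simplification, the sketch draws a fixed number of elements per cluster, whereas the slide selects "one or two".

```python
import random

# Clusters A to E from the illustration: A = 1..5, B = 6..10, ..., E = 21..25.
clusters = {name: list(range(5 * k + 1, 5 * k + 6)) for k, name in enumerate("ABCDE")}

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters by SRS. Stage 2: sample elements within each."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(clusters), n_clusters))   # stage 1
    elements = []
    for name in chosen:
        elements.extend(rng.sample(clusters[name], n_per_cluster))  # stage 2
    return chosen, sorted(elements)

chosen, elements = two_stage_cluster_sample(clusters, 3, 2, seed=7)
print(chosen, elements)
```

Setting `n_per_cluster` to the full cluster size would turn this into the one-stage variant, in which every element of each selected cluster is included.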
  • 54. Strengths and Weaknesses of Basic Sampling Techniques. Nonprobability sampling. Convenience sampling. Strengths: least expensive, least time-consuming, most convenient. Weaknesses: selection bias, sample not representative, not recommended for descriptive or causal research. Judgmental sampling. Strengths: low cost, convenient, not time-consuming. Weaknesses: does not allow generalization, subjective. Quota sampling. Strengths: sample can be controlled for certain characteristics. Weaknesses: selection bias, no assurance of representativeness. Snowball sampling. Strengths: can estimate rare characteristics. Weaknesses: time-consuming.
  • 55. Probability sampling. Simple random sampling (SRS). Strengths: easily understood, results projectable. Weaknesses: difficult to construct sampling frame, expensive, lower precision, no assurance of representativeness. Systematic sampling. Strengths: can increase representativeness, easier to implement than SRS, sampling frame not necessary. Weaknesses: can decrease representativeness. Stratified sampling. Strengths: includes all important subpopulations, precision. Weaknesses: difficult to select relevant stratification variables, not feasible to stratify on many variables, expensive. Cluster sampling. Strengths: easy to implement, cost effective. Weaknesses: imprecise, difficult to compute and interpret results.
  • 56. Procedures for Drawing Probability Samples: Simple Random Sampling. 1. Select a suitable sampling frame. 2. Each element is assigned a number from 1 to N (pop. size). 3. Generate n (sample size) random numbers between 1 and N. 4. The numbers generated denote the elements that should be included in the sample. Procedures for Drawing Probability Samples: Systematic Sampling. 1. Select a suitable sampling frame. 2. Each element is assigned a number from 1 to N (pop. size). 3. Determine the sampling interval i: i = N/n. If i is a fraction, round to the nearest integer. 4. Select a random number, r, between 1 and i, as explained in simple random sampling.
  • 57. 5. The elements with the following numbers will comprise the systematic random sample: r, r+i, r+2i, r+3i, r+4i, ..., r+(n-1)i. Procedures for Drawing Probability Samples: Stratified Sampling. 1. Select a suitable frame. 2. Select the stratification variable(s) and the number of strata, H. 3. Divide the entire population into H strata. Based on the classification variable, each element of the population is assigned to one of the H strata. 4. In each stratum, number the elements from 1 to $N_h$ (the pop. size of stratum h). 5. Determine the sample size of each stratum, $n_h$, where $\sum_{h=1}^{H} n_h = n$.
  • 58. 6. In each stratum, select a simple random sample of size $n_h$. Procedures for Drawing Probability Samples: Cluster Sampling. 1. Assign a number from 1 to N to each element in the population. 2. Divide the population into C clusters, of which c will be included in the sample. 3. Calculate the sampling interval i, i = N/c (round to the nearest integer). 4. Select a random number r between 1 and i, as explained in simple random sampling. 5. Identify elements with the following numbers: r, r+i, r+2i, ..., r+(c-1)i. 6. Select the clusters that contain the identified elements. 7. Select sampling units within each selected cluster based on SRS or systematic sampling. 8. Remove clusters exceeding sampling interval i. Calculate the new population size N*, the number of clusters to be selected c* = c - 1, and the new sampling interval i*.
  • 59. Tennis' Systematic Sampling Returns a Smash: Tennis magazine conducted a mail survey of its subscribers to gain a better understanding of its market. Systematic sampling was employed to select a sample of 1,472 subscribers from the publication's domestic circulation list. If we assume that the subscriber list had 1,472,000 names, the sampling interval would be 1,000 (1,472,000/1,472). A number from 1 to 1,000 was drawn at random. Beginning with that number, every 1,000th subscriber was selected. A brand-new dollar bill was included with the questionnaire as an incentive to respondents. An alert postcard was mailed one week before the survey. A second, follow-up questionnaire was sent to the whole sample ten days after the initial questionnaire. The net effective mailing was 1,396. Six weeks after the first mailing, 778 completed questionnaires had been returned, yielding a response rate of 56% (778/1,396). Discussion question 1: Discuss the advantages of convenience samples and when it is appropriate to use them. Convenience sampling is the least expensive and least time-consuming of all sampling techniques. The sampling units are
  • 60. accessible, easy to measure, and cooperative. In spite of these advantages, this form of sampling has serious limitations. Many potential sources of selection bias are present, including respondent self-selection. Convenience samples are not representative of any definable population. Hence, it is not theoretically meaningful to generalize to any population from a convenience sample, and convenience samples are not appropriate for marketing research projects involving population inferences. Convenience samples are not recommended for descriptive or causal research, but they can be used in exploratory research for generating ideas, insights, or hypotheses. Convenience samples can be used for focus groups, pretesting questionnaires, or pilot studies. Even in these cases, caution should be exercised in interpreting the results. Nevertheless, this technique is sometimes used even in large surveys. Discussion question 2: Discuss the advantages of systematic sampling. Systematic sampling is less costly and easier than simple random sampling because random selection is done only once. Moreover, the random numbers do not have to be matched with individual elements as in simple random sampling. Because some lists contain millions of elements, considerable time can be saved, which reduces the cost of sampling. If information related to the characteristic of interest is available for the population, systematic sampling can be used to obtain a more representative and reliable (lower sampling error) sample than simple random sampling. Another relative advantage is that systematic sampling can even be used without knowledge of the composition (elements) of the sampling frame. For example, every ith person leaving a department store or mall can be
  • 61. intercepted. For these reasons, systematic sampling is often employed in consumer mail, telephone, and mall intercept interviews. Discussion question 3: Discuss the uses of nonprobability and probability sampling. Nonprobability sampling is used in concept tests, package tests, name tests, and copy tests, where projections to the population are usually not needed. In such studies, interest centers on the proportion of the sample that gives various responses or expresses various attitudes. Samples for these studies can be drawn using methods such as mall intercept quota sampling. On the other hand, probability sampling is used when there is a need for highly accurate estimates of market share or sales volume for the entire market. National market tracking studies, which provide information on product category and brand usage rates, as well as psychographic and demographic profiles of users, use probability sampling. Studies that use probability sampling generally employ telephone interviews. Stratified and systematic sampling are combined with some form of random-digit dialing to select the respondents.